Online data communications are quite prevalent and pervasive in modern society, and are becoming more so all the time. Moreover, developments in software, communication protocols, and peripheral devices (e.g., video cameras, three-dimensional video cameras, and the like), along with developments in other computing disciplines, have collectively enabled and facilitated the inclusion of multimedia experiences as part of such communications. Indeed, the multimedia nature and aspects of a given communication session are often the focus and even the essence of such communications. These multimedia experiences take forms such as audio chats, video chats (that are usually also audio chats), online meetings (e.g., web meetings), and many other examples that could be listed as well.
Using the context of online meetings as an illustrative example, it is often the case that one of the participants in a video conference call is a designated presenter, and often this user opts to embed a digital representation of themselves (i.e., a persona) as part of the offered presentation. By way of example, the user may choose to have a video feed embedded into a PowerPoint presentation. In a simple scenario, the video feed may include a depiction of the user as well as background information. The background information may include a view of the wall behind the user as seen from the point of view of the video camera. If the user is outside, the background information may include buildings and trees. In more advanced versions of this video-conferencing paradigm, the persona is isolated from the background information found in the video feed. This allows viewers to experience a more natural sensation, as the embedded persona they see within the presentation is not cluttered and surrounded by distracting and undesired background information.
Improvements over the above-described developments have recently been realized by technology that, among other capabilities and features, extracts what is known as a “persona” of a user from a video feed from a video camera that is capturing video of the user. The extracted persona appears in some examples as a depiction of part of the user (i.e., upper torso, shoulders, arms, hands, neck, and head) and in other examples as a depiction of the entire user. This technology is described in the following patent documents, each of which is incorporated in its respective entirety into this disclosure: (i) U.S. patent application Ser. No. 13/083,470, entitled “Systems and Methods for Accurate User Foreground Video Extraction,” filed Apr. 8, 2011 and published Oct. 13, 2011 as U.S. Patent Application Pub. No. US 2011/0249190, (ii) U.S. patent application Ser. No. 13/076,264, entitled “Systems and Methods for Embedding a Foreground Video into a Background Feed based on a Control input,” filed Mar. 30, 2011 and published Oct. 6, 2011 as U.S. Patent Application Pub. No. US 2011/0242277, and (iii) unpublished U.S. patent application Ser. No. 14/145,874, entitled “System and Methods for Persona Identification Using Combined Probability Maps,” filed Dec. 31, 2013, since published on Jul. 2, 2015 as U.S. Patent Application Pub. No. US 2015/0187076.
Facilitating accurate and precise extraction of the persona, especially the hair of the persona, from a video feed is not a trivial matter. As mentioned, persona extraction is carried out with respect to video data that is received from a camera that is capturing video of a scene in which the user is positioned. The persona-extraction technology substantially continuously (e.g., with respect to each frame) identifies which pixels represent the user and which pixels do not, and accordingly generates “alpha masks” (e.g., generates an alpha mask for each frame). A given alpha mask may take the form of or at least include an array with a respective stored data element corresponding to each pixel in the corresponding frame, where such stored data elements are individually and respectively set equal to 1 (one) for each user pixel and to 0 (zero) for every other pixel (i.e., for each non-user (a.k.a. background) pixel).
The described alpha masks correspond in name with the definition of the “A” in the “RGBA” pixel-data format known to those of skill in the art, where “R” is a red-color value, “G” is a green-color value, “B” is a blue-color value, and “A” is an alpha value ranging from 0 (complete transparency) to 1 (complete opacity). In a typical implementation, the “0” in the previous sentence may take the form of a hexadecimal number such as 0x00 (equal to a decimal value of 0 (zero)), while the “1” may take the form of a hexadecimal number such as 0xFF (equal to a decimal value of 255); that is, a given alpha value may be expressed as an 8-bit number that can be set equal to any integer that is (i) greater than or equal to zero and (ii) less than or equal to 255. Moreover, a typical RGBA implementation provides such an 8-bit number for each of what are known as the red channel, the green channel, and the blue channel as well; as such, each pixel has (i) a red (“R”) color value that can be set to any integer value between 0x00 and 0xFF, (ii) a green (“G”) color value that can be set to any integer value between 0x00 and 0xFF, and (iii) a blue (“B”) color value that can be set to any integer value between 0x00 and 0xFF. And certainly other pixel-data formats could be used, as deemed suitable by those having skill in the relevant art for a given implementation.
When merging an extracted persona with content, the above-referenced persona-based technology creates the above-mentioned merged display in a manner consistent with these conventions; in particular, on a pixel-by-pixel (i.e., pixel-wise) basis, the merging is carried out using pixels from the captured video frame for which the corresponding alpha-mask values equal 1, and otherwise using pixels from the content. Moreover, it is noted that pixel data structures typically also include or are otherwise associated with one or more other values corresponding respectively to one or more other properties of the pixel, where brightness is an example of one such property. In some embodiments, the brightness value is the luma component of the image or video frame. In other embodiments, the brightness value is the pixel values of one of an R, G, or B color channel, or other similar color space (e.g., gamma compressed RGB, or R′G′B′, or YUV, or YCbCr, as examples). In other embodiments, the brightness value may be a weighted average of pixel values from one or more color channels. And other approaches exist as well.
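By way of a non-limiting illustration, the following sketch (in Python with NumPy; the function names are hypothetical and not part of any referenced implementation) shows one way such pixel-wise merging and a weighted-average brightness value might be computed, assuming 8-bit RGB frames and a binary alpha mask as described above.

```python
import numpy as np

def composite_frame(frame, content, alpha):
    """Pixel-wise merge: keep the captured pixel where the alpha-mask value
    equals 1, and otherwise use the corresponding content pixel."""
    # frame, content: H x W x 3 uint8 arrays; alpha: H x W array of 0/1 values.
    mask = (alpha == 1)[..., np.newaxis]   # broadcast the mask over the color channels
    return np.where(mask, frame, content)

def brightness(frame):
    """Brightness as a weighted average of the R, G, and B channels
    (Rec. 601 luma weights, shown here purely as one example)."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```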
This disclosure describes systems and methods for identifying background in video data using geometric primitives. Such systems and methods are useful for scenarios in which a user's persona is to be extracted from a video feed, for example in an online “panel discussion” or, more generally, an online meeting or other online communication session. The present systems and methods facilitate natural interaction by enabling rapid identification of non-persona video data, a particularly troublesome aspect of a comprehensive user-extraction process. In any persona-extraction process, it is just as useful to confirm non-persona (background) pixels as it is to confirm persona (foreground) pixels. The present systems and methods therefore provide an advanced approach for identifying background regions in video data using geometric primitives, which may in turn be used in the context of a comprehensive persona-extraction process.
One embodiment of the systems and methods disclosed herein takes the form of a process. The process includes obtaining video data depicting at least a portion of a user. The process also includes detecting at least one geometric primitive within the video data. The at least one detected geometric primitive is a type of geometric primitive included in a set of geometric-primitive models. The process also includes identifying a respective region within the video data associated with each detected geometric primitive. The process also includes classifying each respective region as background of the video data.
In at least one embodiment, obtaining the video data includes obtaining the video data via a video camera. The video data may include one or more frames of color images. In at least one embodiment, obtaining the video data includes obtaining the video data via a depth camera. The video data may include one or more frames of depth images. In at least one embodiment, obtaining the video data includes obtaining the video data via a 3-D camera. The video data may include one or more frames of 3-D images. In at least one embodiment, obtaining the video data includes obtaining the video data from a data store. The video data may be obtained via one or more of the above listed sources.
In at least one embodiment, the set of geometric-primitive models includes a straight line. In one such embodiment, at least one of the detected geometric primitives is a straight line and the respective region within the video data associated with the detected straight line is a rectangle. In some instances, the rectangle has a length that is equal to a length of the detected straight line and is bisected by the detected straight line. In other instances, the rectangle has a length that is less than a length of the detected straight line and is bisected by the detected straight line.
In at least one embodiment, the set of geometric-primitive models includes a straight line longer than a threshold length. In one such embodiment, at least one of the detected geometric primitives is a straight line longer than the threshold length and the respective region within the video data associated with the detected straight line is a rectangle. In some instances, the rectangle has a length that is equal to a length of the detected straight line and is bisected by the detected straight line. In other instances, the rectangle has a length that is less than a length of the detected straight line and is bisected by the detected straight line.
In at least one embodiment, the set of geometric-primitive models includes an angle. In one such embodiment, at least one of the detected geometric primitives is an angle. In some embodiments, the respective region within the video data associated with the detected angle is a triangle that is made up of two line segments that form the detected angle and one line segment that connects the two line segments that form the detected angle. In some embodiments, the respective region within the video data associated with the detected angle is a sub-region of a triangle that is made up of two line segments that form the detected angle and one line segment that connects the two line segments that form the detected angle. The sub-region may be a quadrilateral sub-region formed by two triangular sub-regions sharing one common side.
In at least one embodiment, the set of geometric-primitive models includes an angle within a threshold tolerance of being a right angle. In one such embodiment, at least one of the detected geometric primitives is an angle within the threshold tolerance of being a right angle. In some embodiments, the respective region within the video data associated with the detected angle is a triangle that is made up of two line segments that form the detected angle and one line segment that connects the two line segments that form the detected angle. In some embodiments, the respective region within the video data associated with the detected angle is a sub-region of a triangle that is made up of two line segments that form the detected angle and one line segment that connects the two line segments that form the detected angle. The sub-region may be a quadrilateral sub-region formed by two triangular sub-regions sharing one common side.
In at least one embodiment, the set of geometric-primitive models includes an angle made up of two line segments, wherein each of the two line segments is longer than a threshold length. In one such embodiment, at least one of the detected geometric primitives is an angle made up of two line segments, wherein each of the two line segments is longer than the threshold length. In some embodiments, the respective region within the video data associated with the detected angle is a triangle that is made up of two line segments that form the detected angle and one line segment that connects the two line segments that form the detected angle. In some embodiments, the respective region within the video data associated with the detected angle is a sub-region of a triangle that is made up of two line segments that form the detected angle and one line segment that connects the two line segments that form the detected angle. The sub-region may be a quadrilateral sub-region formed by two triangular sub-regions sharing one common side.
In at least one embodiment, identifying a respective region within the video data associated with each detected geometric primitive includes obtaining an indication of a foreground region of the video data and selecting, for each detected geometric primitive, a respective region within the video data that does not include any portion of the indicated foreground region.
In at least one embodiment, classifying each respective region as background of the video data includes employing an alpha mask to classify each respective region as background of the video data. In at least one such embodiment, the alpha mask is made up of Boolean indicators (e.g., binary values). In at least one other embodiment, the alpha mask is made up of background-likelihood indicators (e.g., log-likelihood values).
In at least one embodiment, the method further includes generating a background-color model at least in part by using at least one identified respective region. In at least one such embodiment, generating the background-color model includes (i) identifying a respective color of at least one pixel included in the at least one identified respective region and (ii) adding to the background-color model the identified respective color of the at least one pixel. In some embodiments, generating the background-color model includes (i) identifying a respective color of each pixel included in the at least one identified respective region and (ii) adding to the background-color model the identified respective colors of the pixels.
In at least one embodiment, the method further includes updating a background-color model at least in part by using at least one identified respective region. In at least one such embodiment, updating the background-color model includes (i) identifying a respective color of at least one pixel included in the at least one identified respective region and (ii) adding to the background-color model the identified respective color of the at least one pixel. In some embodiments, updating the background-color model includes (i) identifying a respective color of each pixel included in the at least one identified respective region and (ii) adding to the background-color model the identified respective colors of the pixels.
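As a non-limiting sketch of how such a background-color model might be generated or updated from an identified region, the following Python example maintains a three-dimensional color histogram of the sort described below; the function name, bin count, and array layout are illustrative assumptions.

```python
import numpy as np

def update_background_color_model(model, frame, region_mask, bins=32):
    """Add the colors of pixels inside region_mask to a 3-D RGB histogram.

    model: (bins, bins, bins) array of color counts, or None to create one.
    frame: H x W x 3 uint8 image; region_mask: H x W boolean array marking
    a region that has been classified as background.
    """
    if model is None:
        model = np.zeros((bins, bins, bins), dtype=np.int64)
    pixels = frame[region_mask]                      # N x 3 array of RGB values
    indices = (pixels // (256 // bins)).astype(int)  # quantize each channel
    np.add.at(model, (indices[:, 0], indices[:, 1], indices[:, 2]), 1)
    return model
```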
At a high level, the systems and processes described herein use geometric primitive detection and novel processing techniques to identify one or more background regions of video data. The video data depicts at least a portion of a user. The video data may depict a portion of the user (e.g., from the shoulders up) or the entire user. The user may or may not have hair on the top of their head. Identification of background regions using geometric primitive detection helps determine which pixels (i.e., pixels of the video data) are not part of the user. In some cases the classification of each respective region takes the form of an alpha mask as described previously. In other examples, the classification of each respective region associates pixels of the video data with a respective background-likelihood value (e.g., a respective log-likelihood ratio). An indication of the background-classified regions may in turn be used as part of a comprehensive user extraction process.
At least part of the motivation behind the systems and processes described herein is the realization that a depicted persona will not comprise certain geometric primitives. Clearly defined angles and straight lines are not typically found in the human form, and therefore can be assumed to not be part of a to-be-extracted persona if detected within the video data.
The set of geometric-primitive models is a construct that contains one or more archetypal geometric primitives that the systems and methods described herein will look to detect within the video data. The set of geometric-primitive models may include any geometric-primitive model; some useful examples include right angles, long straight lines, squares, triangles, circles, contours, and the like. The set of geometric-primitive models should define shapes or geometric structures that are unlikely to be part of a depicted persona.
A user-hair-color model and a background-color model may each take on a plurality of forms. In general each model is used to indicate which colors are representative of a user-hair color and background of the video data respectively. The models may take on the form of a histogram, a Gaussian mixture, an array of color values and respective color counts, and the like.
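As a further non-limiting sketch, a Gaussian-mixture form of such a color model could be fit and queried as follows; the use of scikit-learn and the particular component count are assumptions made for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_color_model(pixels, components=5):
    """Fit a Gaussian-mixture color model to an N x 3 array of RGB samples
    (e.g., pixels drawn from regions classified as background)."""
    model = GaussianMixture(n_components=components, covariance_type="full")
    model.fit(pixels.astype(np.float64))
    return model

def color_log_likelihood(model, pixels):
    """Per-sample log-likelihood of each RGB value under the fitted model;
    higher values indicate colors that are well explained by the model."""
    return model.score_samples(pixels.astype(np.float64))
```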
In general, any indication, classification, assignment, and the like of pixels, regions, portions, and the like of the video data is relevant within the scope of the systems and processes described herein. As this disclosure describes systems and processes that may be used as part of a comprehensive user-extraction process, it is explicitly noted that it is not required that any classification of pixels as foreground or background be definitive with respect to the entire user-extraction process.
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form part of the specification, and serve to further illustrate embodiments of concepts that include the claimed invention, and explain various principles and advantages of those embodiments.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of embodiments of the present invention.
The apparatus and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
Before proceeding with this detailed description, it is noted that the entities, connections, arrangements, and the like that are depicted in—and described in connection with—the various figures are presented by way of example and not by way of limitation. As such, any and all statements or other indications as to what a particular figure “depicts,” what a particular element or entity in a particular figure “is” or “has,” and any and all similar statements—that may in isolation and out of context be read as absolute and therefore limiting—can only properly be read as being constructively preceded by a clause such as “In at least one embodiment, . . . ” And it is for reasons akin to brevity and clarity of presentation that this implied leading clause is not repeated ad nauseam in this detailed description.
Computing device 1604 may take the form of a desktop computer, laptop computer, tablet computer, smart phone, or the like. In the embodiment shown in
The preceding paragraph is an example of the fact that, in the present disclosure, various elements of one or more of the described embodiments are referred to as modules that carry out (i.e., perform, execute, and the like) various functions described herein. As the term “module” is used herein, each described module includes hardware (e.g., one or more processors, microprocessors, microcontrollers, microchips, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), memory devices, and/or one or more of any other type or types of devices and/or components deemed suitable by those of skill in the relevant art in a given context and/or for a given implementation). Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the particular module, where those instructions could take the form of or at least include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, stored in any non-transitory computer-readable medium deemed suitable by those of skill in the relevant art.
The persona ID modules 1624 operate on the depth data as shown by arrow 1616, on the foreground-background map as shown by connection 1618, on the image pixel data as shown by connection 1620, or on both the foreground-background map and the image pixel data. Each of the persona ID modules 1624 generates a probability map indicating a likelihood that the respective pixels are part of a foreground image as compared to a background image. The persona ID modules, as described more fully below, are configured to operate on certain characteristics of the image and/or depth data to identify characteristics of the data indicative of a person's presence in the scene 1602. The respective probability maps are then combined by combiner module 1622 to provide an aggregate probability map. In some embodiments, the individual probability maps are in the form of a log-likelihood ratio,

log10 [ P(x = f) / P(x = b) ],

which represents the logarithm of the ratio of the probability that the pixel “x” is a foreground (“f”) pixel versus a background (“b”) pixel. Thus, a value of 1 indicates that the pixel is ten times more likely to be in the foreground than in the background, a value of −1 indicates that the pixel is ten times more likely to be in the background than in the foreground, and a value of 0 indicates that the pixel is equally likely to be in the foreground or the background (that is, a likelihood ratio of 1 has a log-likelihood of 0). In such an embodiment, the combiner module 1622 may combine the probability maps by forming a weighted sum of the plurality of maps on a pixel-by-pixel basis. Note that the probability maps need not be rigorously derived from probability theory, but may also be based on heuristic algorithms that provide approximations of relative likelihoods of a pixel being either a foreground or background pixel.
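As a non-limiting illustration of the weighted-sum combination described above, the combiner could operate along the following lines (Python/NumPy; the function name and the choice of weights are illustrative assumptions):

```python
import numpy as np

def combine_probability_maps(maps, weights):
    """Weighted pixel-by-pixel sum of per-module log-likelihood maps.

    maps: list of H x W arrays, one per persona ID module, where positive
    values favor foreground and negative values favor background.
    weights: one scalar weight per map.
    """
    aggregate = np.zeros_like(maps[0], dtype=np.float64)
    for prob_map, weight in zip(maps, weights):
        aggregate += weight * prob_map
    return aggregate
```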
In some embodiments, the present systems and methods relate to a persona ID module 1624 which may operate, as previously discussed, by seeking to identify background regions or pixels in a frame of pixel data from video data using geometric primitives, which may in turn be used in the context of a comprehensive persona extraction process.
The persona extraction module 1626 of computing device 1604 may operate on the aggregate persona probability map as indicated by line 1628 from combiner module 1622. In one embodiment, a graph cut utility (such as what is available from within the OpenCV library) may be utilized. In such an embodiment, the segmentation of the persona extraction may be formulated as a mincut/maxflow problem. In this case, the image is mapped into a graph, and each pixel is mapped to a node. In addition, there are two additional special nodes called the source and the sink. The node for each image pixel is connected to both the source and the sink. If the aggregate persona probability map indicates that that pixel is likely to be foreground, a weight is applied to the edge linking the pixel to the source. If the aggregate persona probability map indicates that that pixel is likely to be background, a weight is applied to the edge linking the pixel to the sink. The magnitude of the weight increases as the probability becomes more certain. In addition, edges are included that link the node for a pixel to the nodes of its neighboring pixels. The weights of these edges are inversely proportional to the likelihood of a boundary appearing there. One possible technique is to set these weights to be large if the two pixels are similar in color and to set them to be small if they are not. Thus, transitioning from foreground to background is favored in areas where the color is also changing. The mincut problem is then solved by configuring the algorithm to remove edges from the graph until the source is no longer connected to the sink. (The algorithm minimizes the total weight of the edges it removes.) Since the node for each pixel is connected to both the source and the sink, one of those edges must be removed by the cut. If the node remains connected to the source (i.e., the edge to the sink was removed), that pixel is marked as foreground. Otherwise, the node is connected to the sink (i.e., the edge to the source was removed), and that pixel is marked as background. The formulation described may be solved efficiently through a variety of techniques.
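As a non-limiting sketch of the mincut/maxflow formulation described above, the following Python example wires up the terminal and neighbor edges using the PyMaxflow library (an assumption made for illustration; the disclosure itself refers to a graph-cut utility such as one available from the OpenCV library, and the particular weighting functions shown are likewise illustrative):

```python
import numpy as np
import maxflow  # PyMaxflow; used here in place of an OpenCV graph-cut utility

def extract_persona(aggregate_llr, frame, smoothness=2.0, sigma=10.0):
    """Min-cut/max-flow segmentation sketch following the description above.

    aggregate_llr: H x W aggregate log-likelihood map (positive => foreground).
    frame: H x W x 3 uint8 color image used for the neighbor weights.
    Returns an H x W boolean mask that is True for foreground (persona) pixels.
    """
    h, w = aggregate_llr.shape
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes((h, w))

    # Terminal edges: the more certain the aggregate map, the larger the
    # weight to the source (foreground) or to the sink (background).
    g.add_grid_tedges(nodes,
                      np.maximum(aggregate_llr, 0.0),   # pixel -> source
                      np.maximum(-aggregate_llr, 0.0))  # pixel -> sink

    # Neighbor edges: large where adjacent pixels are similar in brightness,
    # small where they differ, so the cut prefers color boundaries.
    # (A plain Python loop is used for clarity rather than speed.)
    gray = frame.astype(np.float64).mean(axis=2)
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    diff = gray[y, x] - gray[ny, nx]
                    cap = smoothness * np.exp(-(diff ** 2) / (2 * sigma ** 2))
                    g.add_edge(nodes[y, x], nodes[ny, nx], cap, cap)

    g.maxflow()
    return ~g.get_grid_segments(nodes)  # True where still connected to source
```

In this sketch, larger magnitudes in the aggregate map translate into stronger terminal weights, and the neighbor weights fall off where adjacent pixels differ in brightness, so the resulting cut tends to follow color boundaries, consistent with the description above.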
One embodiment takes the form of the process 100. The process 100 includes obtaining video data depicting at least a portion of a user. The process 100 also includes detecting at least one geometric primitive within the video data. The at least one detected geometric primitive is a type of geometric primitive included in a set of geometric-primitive models. The process 100 also includes identifying a respective region within the video data associated with each detected geometric primitive. The process 100 also includes classifying each respective region as background of the video data.
At element 102 the process 100 includes obtaining video data depicting at least a portion of a user. In at least one embodiment, obtaining the video data includes obtaining the video data via a video camera. The video data may include one or more frames of color images. In at least one embodiment, obtaining the video data includes obtaining the video data via a depth camera. The video data may include one or more frames of depth images. In at least one embodiment, obtaining the video data includes obtaining the video data via a 3-D camera. The video data may include one or more frames of 3-D images. In at least one embodiment, obtaining the video data includes obtaining the video data from a data store. The video data may be obtained via one or more of the above listed sources.
At element 104 the process 100 includes detecting at least one geometric primitive within the video data. The at least one detected geometric primitive is a type of geometric primitive included in a set of geometric-primitive models.
At element 106 the process 100 includes identifying a respective region within the video data associated with each detected geometric primitive.
At element 108 the process 100 includes classifying each respective region as background of the video data. In at least one embodiment, classifying each respective region as background of the video data includes employing an alpha mask to classify each respective region as background of the video data. In at least one such embodiment, the alpha mask is made up of Boolean indicators. In at least one other embodiment, the alpha mask is made up of background-likelihood indicators.
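A minimal, non-limiting skeleton of the process 100 might look as follows in Python; the detect_primitives and identify_region callables are placeholders standing in for elements 104 and 106 and are not functions defined by this disclosure.

```python
import numpy as np

def process_100(frame, detect_primitives, identify_region):
    """Skeleton of process 100 for a single frame of obtained video data
    (element 102): detect geometric primitives, identify a region for each,
    and classify those regions as background."""
    h, w = frame.shape[:2]
    background_mask = np.zeros((h, w), dtype=bool)    # element 108 output
    primitives = detect_primitives(frame)             # element 104
    for primitive in primitives:
        region = identify_region(frame, primitive)    # element 106: boolean mask
        background_mask |= region                     # element 108
    return background_mask
```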
In at least one embodiment, obtaining the video data 202 includes obtaining the video data 202 via a video camera. In at least one embodiment, obtaining the video data 202 includes obtaining the video data 202 via a depth camera. In at least one embodiment, obtaining the video data 202 includes obtaining the video data 202 via a 3-D camera. In at least one embodiment, obtaining the video data 202 includes obtaining the video data 202 from a data store.
In at least one embodiment, the set of geometric-primitive models includes a straight line. In such an embodiment, the systems and methods described herein include detecting, if present within the video data 202, one or more straight lines (such as the detected straight line 302). In some cases every straight line within the video data is detected. This case is not depicted in any of the FIGS. due to the visual complexity; however, it would be understood by those with skill in the art how to do so. In other embodiments not every straight line within the video data is detected, of which
In at least one embodiment, the set of geometric-primitive models includes a straight line longer than a threshold length. In such an embodiment, the systems and methods described herein include detecting, if present within the video data 202, one or more straight lines that are each longer than the threshold length. In some cases every straight line within the video data 202 that is longer than the threshold length is detected. In other embodiments not every such straight line within the video data 202 is detected. In some formulations of this embodiment, every straight line depicted in the video data 202 is detected, and those detected straight lines that are not longer than the threshold length are thereafter disregarded. Stated formally, in some embodiments, detecting, if present within the video data 202, one or more straight lines that are each longer than the threshold length includes (i) detecting, if present within the video data 202, one or more straight lines and (ii) disregarding those detected straight lines that are not longer than the threshold length.
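As a non-limiting sketch, the detect-then-disregard formulation could be realized with a probabilistic Hough transform, for example via OpenCV's HoughLinesP; the edge-detection thresholds and Hough parameters shown are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_long_straight_lines(frame, threshold_length=100):
    """Detect straight lines in a color frame and disregard those that are
    not longer than the threshold length."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=threshold_length, maxLineGap=10)
    kept = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            # Explicit length check mirrors the "detect then disregard" step.
            if np.hypot(x2 - x1, y2 - y1) > threshold_length:
                kept.append((x1, y1, x2, y2))
    return kept
```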
In at least one embodiment, as depicted in
The width of the region 402 is such that (i) the entirety of the detected straight line 302 as well as (ii) parts of the video data 202 that border the sides of the detected straight line 302 are included within the region 402.
In at least one embodiment, as depicted in
The width of the region 502 is such that the entirety of the detected straight line 302 is included within the region 502; however, no other portion of the video data 202 is included within the region 502.
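As a non-limiting sketch of how a rectangular region that is bisected by a detected straight line (such as the region 402 or the region 502) might be rasterized into a mask, the following Python example offsets the line's endpoints perpendicular to the line; the width parameter, which controls how much bordering video data is included, is an illustrative assumption.

```python
import cv2
import numpy as np

def rectangle_region_for_line(shape, line, width=9):
    """Boolean mask for a rectangle whose length equals that of the detected
    line and which is bisected by it. Assumes a non-degenerate line."""
    x1, y1, x2, y2 = line
    length = np.hypot(x2 - x1, y2 - y1)
    # Unit vector perpendicular to the line, used to offset its endpoints.
    px, py = -(y2 - y1) / length, (x2 - x1) / length
    half = width / 2.0
    corners = np.array([[x1 + px * half, y1 + py * half],
                        [x2 + px * half, y2 + py * half],
                        [x2 - px * half, y2 - py * half],
                        [x1 - px * half, y1 - py * half]], dtype=np.int32)
    mask = np.zeros(shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, corners, 1)
    return mask.astype(bool)
```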
In at least one embodiment, the set of geometric-primitive models includes a straight line. In such an embodiment, the systems and methods described herein include detecting, if present within the video data 202, one or more straight lines (such as the detected straight line 602). In some cases every straight line within the video data is detected. This case is not depicted in any of the FIGS. due to the visual complexity; however, it would be understood by those with skill in the art how to do so. In other embodiments not every straight line within the video data is detected, of which
The example described with respect to
In at least one embodiment, as depicted in
The width of the region 702 is such that (i) the entirety of the detected straight line 602 as well as (ii) parts of the video data 202 that border the sides of the detected straight line 602 are included within the region 702.
As stated previously, in one scenario, the arm of the user 204 is indicated as, or is indicated as part of, the foreground region of the video data 202. One example motivation behind identifying the rectangle (i.e., the region 702) to have a length that is less than the length of the detected straight line 602 is to prevent the indicated foreground region of the video data 202 from being included within the identified region 702.
In at least one embodiment, as depicted in
As stated previously, in one scenario, the arm of the user 204 is indicated as, or is indicated as part of, the foreground region of the video data 202. One example motivation behind identifying the region 802 as something other than a rectangle is to prevent the indicated foreground region of the video data 202 from being included within the identified region 802.
The shape of the region 802 may have been determined through use of a multi-step approach. In a first step, a rectangle with a length equal to that of the detected straight line 602 is selected. In a second step, any portion of the selected rectangle that overlaps with any indicated foreground (e.g., the arm of the user 204) is deselected. The remaining selected region is identified as the region 802 (i.e., the remaining selected region is the identified region associated with the detected straight line 602).
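A non-limiting sketch of the second step of this multi-step approach, in which any portion of the selected region that overlaps indicated foreground is deselected, follows; both masks are assumed to be Boolean arrays of the frame's dimensions.

```python
import numpy as np

def deselect_foreground(region_mask, foreground_mask):
    """Keep only the portion of the selected region (e.g., a rectangle built
    around a detected straight line) that does not overlap the indicated
    foreground region."""
    return np.logical_and(region_mask, np.logical_not(foreground_mask))
```

For example, under these assumptions, a region such as the region 802 could be obtained as deselect_foreground(selected_rectangle, foreground_mask).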
As stated previously, in one scenario, the arm of the user 204 is indicated as, or is indicated as part of, the foreground region of the video data 202. One example motivation behind identifying the respective region associated with the detected interpolated straight line 902 (made up of sub-regions 904a-c) as a noncontiguous region is to prevent the indicated foreground region of the video data 202 from being included within the identified disjointed region.
The shape of the region associated with the detected interpolated straight line 902 may have been determined through use of a multi-step approach. In a first step, a rectangle with a length equal to that of the detected interpolated straight line 902 is selected (e.g., a length from the top of the bookcase 206 to the bottom of the bookcase 206). In a second step, any portion of the selected rectangle that overlaps with any indicated foreground (e.g., the arm of the user 204) is deselected. The remaining selected region is identified as the region made up of the noncontiguous sub-regions 904a-c (the remaining selected sub-regions are the identified region associated with the detected interpolated straight line 902).
In another embodiment, the sub-regions 904a-c are generated using the technique discussed in connection with
In at least one embodiment, the set of geometric-primitive models includes an angle. In such an embodiment, the systems and methods described herein include detecting, if present within the video data 202, one or more angles (e.g., the angle 1002). In some cases every angle within the video data 202 is detected. This case is not depicted in any of the FIGS. due to the visual complexity; however, it would be understood by those with skill in the art how to do so. In other embodiments not every angle within the video data 202 is detected, of which
In at least one embodiment, the set of geometric-primitive models includes an angle within a threshold tolerance of being a right angle. In such an embodiment, the systems and methods described herein include detecting, if present within the video data 202, one or more angles within a threshold tolerance of being a right angle (e.g., the angle 1002). In some cases every angle within the video data 202 that is within a threshold tolerance of being a right angle is detected. In other embodiments not every such angle within the video data 202 is detected. In some formulations of this embodiment, every angle depicted in the video data 202 is detected, and those detected angles that are not within the threshold tolerance of being a right angle are thereafter disregarded. Stated formally, in some embodiments, detecting, if present within the video data 202, one or more angles within a threshold tolerance of being a right angle includes (i) detecting, if present within the video data 202, one or more angles and (ii) disregarding those detected angles that are not within the threshold tolerance of being a right angle.
In at least one embodiment, the set of geometric-primitive models includes an angle made up of two line segments, wherein each of the two line segments is longer than a threshold length. In such an embodiment, the systems and methods described herein include detecting, if present within the video data 202, one or more angles made up of two line segments (e.g., the angle 1002 made up of the line segments 1004 and 1006), wherein each of the two line segments is longer than the threshold length. In some cases every such angle within the video data 202 is detected. In other embodiments not every such angle within the video data 202 is detected. In some formulations of this embodiment, every angle depicted in the video data 202 is detected, and those detected angles that are not made up of two line segments that are each longer than the threshold length are thereafter disregarded. Stated formally, in some embodiments, detecting, if present within the video data 202, one or more angles made up of two line segments, wherein each of the two line segments is longer than the threshold length, includes (i) detecting, if present within the video data 202, one or more angles and (ii) disregarding those detected angles that are not made up of two line segments that are each longer than the threshold length.
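As a non-limiting sketch, angle candidates could be formed by pairing previously detected line segments and keeping pairs whose segments each exceed the threshold length and whose angle is within a tolerance of ninety degrees; for brevity the sketch omits the check that the two segments actually meet (e.g., endpoint proximity), and the threshold and tolerance values are illustrative assumptions.

```python
import numpy as np

def angle_between(seg_a, seg_b):
    """Angle in degrees between two line segments given as (x1, y1, x2, y2)."""
    va = np.array([seg_a[2] - seg_a[0], seg_a[3] - seg_a[1]], dtype=float)
    vb = np.array([seg_b[2] - seg_b[0], seg_b[3] - seg_b[1]], dtype=float)
    cos = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
    return np.degrees(np.arccos(np.clip(abs(cos), 0.0, 1.0)))

def detect_near_right_angles(segments, threshold_length=50, tolerance=10.0):
    """Keep pairs of detected segments that are each longer than the threshold
    length and that meet at an angle within the tolerance of a right angle."""
    def seg_len(s):
        return np.hypot(s[2] - s[0], s[3] - s[1])
    angles = []
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            if seg_len(a) > threshold_length and seg_len(b) > threshold_length:
                if abs(angle_between(a, b) - 90.0) <= tolerance:
                    angles.append((a, b))
    return angles
```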
In at least one embodiment, as depicted in
In at least one embodiment, as depicted in
The shape of the region 1202 may be determined through use of a multi-step approach. In a first step, a triangle that is made up of the two line segments 1004 and 1006 that form the detected angle 1002 and the one line segment 1104 that connects the two line segments 1004 and 1006 (i.e., the region 1102 of
In at least one embodiment, as depicted in
In at least one scenario, the user 204 is indicated as, or is indicated as part of, the foreground region of the video data 1302. One example motivation behind identifying the region 1402 as depicted in
The shape of the region 1402 may be determined through use of a multi-step approach. In a first step, a triangle that is made up of the two line segments 1004 and 1006 that form the detected angle 1002 and the one line segment 1104 that connects the two line segments 1004 and 1006 (i.e., the region 1102 of
The communication interface 1502 may include one or more wireless-communication interfaces (for communicating according to, e.g., APCO P25, TETRA, DMR, LTE, Wi-Fi, NFC, Bluetooth, and/or one or more other wireless-communication protocols) and/or one or more wired-communication interfaces (for communicating according to, e.g., Ethernet, USB, eSATA, IEEE 1394, and/or one or more other wired-communication protocols). As such, the communication interface 1502 may include any necessary hardware (e.g., chipsets, antennas, Ethernet cards, etc.), any necessary firmware, and any necessary software for conducting one or more forms of communication with one or more other entities as described herein. The processor 1504 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated digital signal processor (DSP).
The data storage 1506 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non-transitory data-storage technology deemed suitable by those of skill in the relevant art could be used. As depicted in
If present, the user interface 1512 may include one or more input devices (a.k.a. components and the like) and/or one or more output devices (a.k.a. components and the like). With respect to input devices, the user interface 1512 may include one or more touchscreens, buttons, switches, microphones, and the like. With respect to output devices, the user interface 1512 may include one or more displays, speakers, light emitting diodes (LEDs), and the like. Moreover, one or more components (e.g., an interactive touchscreen-and-display component) of the user interface 1512 could provide both user-input and user-output functionality. And certainly other user-interface components could be used in a given context, as known to those of skill in the art. Furthermore, the CCD 1500 may include one or more video cameras, depth cameras, 3-D cameras, infrared-visible cameras, light-field cameras or a combination thereof.
In the foregoing specification, specific embodiments have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of present teachings.
The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all of the claims. The invention is defined solely by the appended claims, including any amendments made during the pendency of this application and all equivalents of those claims as issued.
Moreover in this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “has,” “having,” “includes,” “including,” “contains,” “containing,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises, has, includes, contains a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “comprises . . . a,” “has . . . a,” “includes . . . a,” “contains . . . a” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises, has, includes, contains the element. The terms “a” and “an” are defined as one or more unless explicitly stated otherwise herein. The terms “substantially,” “essentially,” “approximately,” “about,” or any other version thereof, are defined as being close to as understood by one of ordinary skill in the art, and in one non-limiting embodiment the term is defined to be within 10%, in another embodiment within 5%, in another embodiment within 1%, and in another embodiment within 0.5%. The term “coupled” as used herein is defined as connected, although not necessarily directly and not necessarily mechanically. A device or structure that is “configured” in a certain way is configured in at least that way, but may also be configured in ways that are not listed.
It will be appreciated that some embodiments may be comprised of one or more generic or specialized processors (or “processing devices”) such as microprocessors, digital signal processors, customized processors and field programmable gate arrays (FPGAs) and unique stored program instructions (including both software and firmware) that control the one or more processors to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the method and/or apparatus described herein. Alternatively, some or all functions could be implemented by a state machine that has no stored program instructions, or in one or more application specific integrated circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic. Of course, a combination of the two approaches could be used.
Moreover, an embodiment can be implemented as a computer-readable storage medium having computer readable code stored thereon for programming a computer (e.g., comprising a processor) to perform a method as described and claimed herein. Examples of such computer-readable storage mediums include, but are not limited to, a hard disk, a CD-ROM, an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory) and a Flash memory. Further, it is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions and programs and ICs with minimal experimentation.
The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.
Number | Name | Date | Kind |
---|---|---|---|
5001558 | Burley et al. | Mar 1991 | A |
5022085 | Cok | Jun 1991 | A |
5117283 | Kroos et al. | May 1992 | A |
5227985 | DeMenthon | Jul 1993 | A |
5343311 | Morag et al. | Aug 1994 | A |
5506946 | Bar et al. | Apr 1996 | A |
5517334 | Morag et al. | May 1996 | A |
5534917 | MacDougall | Jul 1996 | A |
5581276 | Cipolla et al. | Dec 1996 | A |
5631697 | Nishimura et al. | May 1997 | A |
5687306 | Blank | Nov 1997 | A |
5875040 | Matraszek | Feb 1999 | A |
6119147 | Toomey | Sep 2000 | A |
6125194 | Yeh | Sep 2000 | A |
6150930 | Cooper | Nov 2000 | A |
6288703 | Berman | Sep 2001 | B1 |
6411744 | Edwards | Jun 2002 | B1 |
6618444 | Haskell | Sep 2003 | B1 |
6661918 | Gordon et al. | Dec 2003 | B1 |
6664973 | Iwamoto et al. | Dec 2003 | B1 |
6760749 | Dunlap | Jul 2004 | B1 |
6798407 | Benman | Sep 2004 | B1 |
6937744 | Toyama | Aug 2005 | B1 |
6973201 | Colmenarez | Dec 2005 | B1 |
7050070 | Ida | May 2006 | B2 |
7124164 | Chemtob | Oct 2006 | B1 |
7317830 | Gordon et al. | Jan 2008 | B1 |
7386799 | Clanton | Jun 2008 | B1 |
7420590 | Matusik | Sep 2008 | B2 |
7463296 | Sun | Dec 2008 | B2 |
7518051 | Redmann | Apr 2009 | B2 |
7574043 | Porikli | Aug 2009 | B2 |
7599555 | McGuire | Oct 2009 | B2 |
7602990 | Matusik | Oct 2009 | B2 |
7631151 | Prahlad | Dec 2009 | B2 |
7633511 | Shum | Dec 2009 | B2 |
7634533 | Rudolph | Dec 2009 | B2 |
7668371 | Dorai | Feb 2010 | B2 |
7676081 | Blake | Mar 2010 | B2 |
7692664 | Weiss | Apr 2010 | B2 |
7747044 | Baker | Jun 2010 | B2 |
7755016 | Toda et al. | Jul 2010 | B2 |
7773136 | Ohyama et al. | Aug 2010 | B2 |
7821552 | Suzuki et al. | Oct 2010 | B2 |
7831087 | Harville | Nov 2010 | B2 |
7912246 | Moon | Mar 2011 | B1 |
8094928 | Graepel et al. | Jan 2012 | B2 |
8131011 | Nevatia | Mar 2012 | B2 |
8146005 | Jones | Mar 2012 | B2 |
8175384 | Wang | May 2012 | B1 |
8204316 | Panahpour Tehrani et al. | Jun 2012 | B2 |
8225208 | Sprang | Jul 2012 | B2 |
8264544 | Chang | Sep 2012 | B1 |
8300890 | Gaikwad et al. | Oct 2012 | B1 |
8320666 | Gong | Nov 2012 | B2 |
8335379 | Malik | Dec 2012 | B2 |
8345082 | Tysso | Jan 2013 | B2 |
8363908 | Steinberg | Jan 2013 | B2 |
8379101 | Mathe | Feb 2013 | B2 |
8396328 | Sandrew et al. | Mar 2013 | B2 |
8406494 | Zhan | Mar 2013 | B2 |
8411149 | Maison | Apr 2013 | B2 |
8411948 | Rother | Apr 2013 | B2 |
8422769 | Rother | Apr 2013 | B2 |
8437570 | Criminisi | May 2013 | B2 |
8446459 | Fang | May 2013 | B2 |
8446488 | Yim | May 2013 | B2 |
8477149 | Beato | Jul 2013 | B2 |
8503720 | Shotton | Aug 2013 | B2 |
8533593 | Grossman | Sep 2013 | B2 |
8533594 | Grossman | Sep 2013 | B2 |
8533595 | Grossman | Sep 2013 | B2 |
8565485 | Craig et al. | Oct 2013 | B2 |
8588515 | Bang et al. | Nov 2013 | B2 |
8625897 | Criminisi | Jan 2014 | B2 |
8643701 | Nguyen | Feb 2014 | B2 |
8649592 | Nguyen | Feb 2014 | B2 |
8649932 | Mian et al. | Feb 2014 | B2 |
8655069 | Rother | Feb 2014 | B2 |
8659658 | Vassigh | Feb 2014 | B2 |
8666153 | Hung et al. | Mar 2014 | B2 |
8682072 | Sengamedu | Mar 2014 | B2 |
8701002 | Grossman | Apr 2014 | B2 |
8723914 | Mackie | May 2014 | B2 |
8818028 | Nguyen et al. | Aug 2014 | B2 |
8854412 | Tian | Oct 2014 | B2 |
8874525 | Grossman | Oct 2014 | B2 |
8890923 | Tian | Nov 2014 | B2 |
8890929 | Paithankar | Nov 2014 | B2 |
8913847 | Tang et al. | Dec 2014 | B2 |
8994778 | Weiser | Mar 2015 | B2 |
9008457 | Dikmen | Apr 2015 | B2 |
9053573 | Lin | Jun 2015 | B2 |
9065973 | Graham | Jun 2015 | B2 |
9087229 | Nguyen | Jul 2015 | B2 |
9088692 | Carter | Jul 2015 | B2 |
9285951 | Makofsky | Mar 2016 | B2 |
9542626 | Martinson | Jan 2017 | B2 |
9659658 | Kim | May 2017 | B2 |
20020012072 | Toyama | Jan 2002 | A1 |
20020025066 | Pettigrew | Feb 2002 | A1 |
20020051491 | Challapali | May 2002 | A1 |
20020158873 | Williamson | Oct 2002 | A1 |
20040153671 | Schuyler et al. | Aug 2004 | A1 |
20040175021 | Porter | Sep 2004 | A1 |
20050063565 | Nagaoka | Mar 2005 | A1 |
20060072022 | Iwai | Apr 2006 | A1 |
20060193509 | Criminisi | Aug 2006 | A1 |
20060259552 | Mock | Nov 2006 | A1 |
20060291697 | Luo | Dec 2006 | A1 |
20070036432 | Xu | Feb 2007 | A1 |
20070110298 | Graepel | May 2007 | A1 |
20070133880 | Sun | Jun 2007 | A1 |
20080181507 | Gope et al. | Jul 2008 | A1 |
20080266380 | Gorzynski | Oct 2008 | A1 |
20080273751 | Yuan | Nov 2008 | A1 |
20090003687 | Agarwal | Jan 2009 | A1 |
20090199111 | Emori | Aug 2009 | A1 |
20090245571 | Chien | Oct 2009 | A1 |
20090249863 | Kim | Oct 2009 | A1 |
20090284627 | Bando et al. | Nov 2009 | A1 |
20090300553 | Pettigrew | Dec 2009 | A1 |
20100034457 | Berliner | Feb 2010 | A1 |
20100046830 | Wang | Feb 2010 | A1 |
20100053212 | Kang | Mar 2010 | A1 |
20100128927 | Ikenoue | May 2010 | A1 |
20100171807 | Tysso | Jul 2010 | A1 |
20100302376 | Boulanger | Dec 2010 | A1 |
20100329544 | Sabe | Dec 2010 | A1 |
20110242277 | Do et al. | Oct 2011 | A1 |
20110249190 | Nguyen | Oct 2011 | A1 |
20110249863 | Ohashi | Oct 2011 | A1 |
20110249883 | Can | Oct 2011 | A1 |
20110267348 | Lin et al. | Nov 2011 | A1 |
20120051631 | Nguyen | Mar 2012 | A1 |
20120314077 | Clavenna, II | Dec 2012 | A1 |
20130016097 | Coene | Jan 2013 | A1 |
20130110565 | Means | May 2013 | A1 |
20130129205 | Wang | May 2013 | A1 |
20130142452 | Shionozaki | Jun 2013 | A1 |
20130243313 | Civit | Sep 2013 | A1 |
20130335506 | Carter | Dec 2013 | A1 |
20140003719 | Bai | Jan 2014 | A1 |
20140029788 | Kang | Jan 2014 | A1 |
20140063177 | Tian | Mar 2014 | A1 |
20140112547 | Peeper | Apr 2014 | A1 |
20140119642 | Lee | May 2014 | A1 |
20140153784 | Gandolph | Jun 2014 | A1 |
20140300630 | Flider | Oct 2014 | A1 |
20140307056 | Romea | Oct 2014 | A1 |
20170208243 | Masad | Jul 2017 | A1 |
Number | Date | Country |
---|---|---|
2013019259 | Feb 2013 | WO |
Entry |
---|
Lee, D.S., “Effective Gaussian Mixture Learning for Video Background Subtraction”, IEEE, May 2005. |
Benezeth et al., “Review and Evaluation of Commonly-Implemented Background Subtraction Algorithms”, 2008. |
Piccardi, M., “Background Subtraction Techniques: A Review”, IEEE, 2004. |
Cheung et al., “Robust Techniques for Background Subtraction in Urban Traffic Video”, 2004. |
Kolmogorov et al., “Bi-Layer Segmentation of Binocular Stereo Video”, IEEE, 2005. |
Gvili et al., “Depth Keying”, 2003. |
Crabb et al., “Real-Time Foreground Segmentation via Range and Color Imaging”, 2008. |
Wang, L., et al., “TofCut: Towards robust real-time foreground extraction using a time-of-flight camera.”, Proc. of 3DPVT, 2010. |
Xu, F., et al., “Human detection using depth and gray images”, Advanced Video and Signal Based Surveillance, 2003., Proceedings, IEEE Conference on IEEE, 2003. |
Zhang, Q., et al., “Segmentation and tracking multiple objects under occlusion from multiview video.”, Image Processing, IEEE Transactions on 20.11 (2011), pp. 3308-3313. |
Carsten, R., et al., “Grabcut: Interactive foreground extraction using iterated graph cuts”, ACM Transactions on Graphics (TOG) 23.3 (2004), pp. 309-314. |
Arbelaez, P., et al., “Contour detection and hierarchical image segmentation”, Pattern Analysis and Machine Intelligence, IEEE Transactions on 33.4 (2011): 898-916. |
Izquierdo, M. Ebroul, “Disparity/segmentation analysis: matching with an adaptive window and depth-driven segmentation.” Circuits and Systems for Video Technology, IEEE Transactions on 9.4 (1999): 589-607. |
Working screenshot of Snagit manufactured by Techsmith, released Apr. 18, 2014. |
Yacoob, Y., et al., “Detection, analysis and matching of hair,” in Computer Vision, 2005, ICCV 2005. Tenth IEEE International Conference, vol. 1., No., pp. 741-748, vol. 1, Oct. 17-21, 2005. |
Talukder, A., et al., “Real-time detection of moving objects in a dynamic scene from moving robotic vehicles,” in Intelligent Robots and Systems, 2003. (IROS 2003). Proceedings. 2003 IEEE/RSJ international Conference on, vol. 2, pp. 1308-1313, vol. 2, Oct. 27-31, 2003. |
Sheasby, G., et al., “A robust stereo prior for human segmentation”, In ACCV, 2012. |
Hradis, M., et al., “Real-time Tracking of Participants in Meeting Video”, Proceedings of CESCG, Wien, 2006. |
Number | Date | Country | |
---|---|---|---|
20160343148 A1 | Nov 2016 | US |