Whiteboards, also known as dry-erase boards, differ from blackboards in that whiteboards have a smoother writing surface that allows rapid marking and erasing of markings. Specifically, whiteboards usually include a glossy white surface for making nonpermanent markings, and are used in many offices, meeting rooms, school classrooms, and other work environments. Whiteboards may also be used to facilitate collaboration among multiple remote participants (referred to as collaborating users) who are sharing information. In such collaborations, one or more cameras are pointed at the whiteboard to share a user's written or drawn content with other participants.
In general, in one aspect, the invention relates to a method to extract content written on a marker board. The method includes generating, by a computer processor, a sequence of samples from a video stream, wherein the video stream comprises a series of images of the marker board, generating, by the computer processor, a center of mass (COM) of foreground content of each sample in the sequence of samples, detecting, by the computer processor and based on a predetermined criterion, a stabilized change of the COM in the sequence of samples, and extracting, based at least on detecting the stabilized change, a portion of static written content from the video stream.
In general, in one aspect, the invention relates to a system for extracting content written on a marker board. The system includes a memory, and a computer processor connected to the memory that generates a sequence of samples from a video stream, wherein the video stream comprises a series of images of the marker board, generates a center of mass (COM) of foreground content of each sample in the sequence of samples, detects, based on a predetermined criterion, a stabilized change of the COM in the sequence of samples, and extracts, based at least on detecting the stabilized change, a portion of static written content from the video stream.
In general, in one aspect, the invention relates to a non-transitory computer readable medium (CRM) storing instructions for extracting content written on a marker board. The computer readable program code, when executed by a computer, includes functionality for generating a sequence of samples from a video stream, wherein the video stream comprises a series of images of the marker board, generating a center of mass (COM) of foreground content of each sample in the sequence of samples, detecting, based on a predetermined criterion, a stabilized change of the COM in the sequence of samples, and extracting, based at least on detecting the stabilized change, a portion of static written content from the video stream.
Other aspects of the invention will be apparent from the following description and the appended claims.
Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
In general, embodiments of the invention provide a method, non-transitory computer readable medium, and system for extracting written content from a marker board using a live video stream or pre-recorded video where one or more users are interacting with the marker board. In a collaboration session between collaborating users, the extracted content is sent to the collaborating users in real time while one or more users are writing/drawing on the marker board. One or more embodiments of the invention minimize the amount of extraction updates sent to collaborating users by limiting the extraction updates to occur only when content changes in a specific region of the marker board.
In one or more embodiments of the invention, the buffer (101) is configured to store a marker board image (102). The marker board image (102) is an image of a writing surface of a marker board captured using one or more camera devices (e.g., a video camera, a webcam, etc.). In particular, the marker board image (102) may be one image in a series of images in a video stream (102a) of the captured marker board, and may be of any size and in any image format (e.g., BMP, JPEG, TIFF, PNG, etc.).
The marker board is a whiteboard, blackboard, or other similar type of writing material. The writing surface is the surface of the marker board where a user writes, draws, or otherwise adds marks and/or notations. Throughout this disclosure, the terms “marker board” and “the writing surface of the marker board” may be used interchangeably depending on context.
The marker board image (102) may include content that is written and/or drawn on the writing surface by one or more users. Once written and/or drawn on the writing surface, the content stays unchanged until the content is removed (e.g., the content is erased by a user). In one or more embodiments, the written and/or drawn content is referred to as static written content (108). Additionally, the marker board image (102) may include content resulting from a user's motion in front of the marker board or sensor noise generated by the camera device. The static written content (108) and content due to the user's motion or sensor noise collectively form a foreground content of the marker board image (102).
In one or more embodiments, the buffer (101) is further configured to store the intermediate and final results of the system (100) that are directly or indirectly derived from the marker board image (102) and the video stream (102a). The intermediate and final results include at least an averaged sample (103), a binarized sample (104), a center of mass (COM) (105), a stable sample count (106), a changing status (107), and the static written content (108). Each of these intermediate and final results is described below in detail.
In one or more embodiments, the averaged sample (103) is an average of a contiguous portion of the video stream (102a), where the contiguous portion corresponds to a short time period (e.g., 0.25 seconds) during the collaboration session. Each pixel of the averaged sample (103) is assigned an averaged pixel value of corresponding pixels in all images within the contiguous portion of the video stream (102a). For example, the marker board image (102) may be one of the images in the contiguous portion of the video stream (102a).
Furthermore, the averaged sample (103) represents one of multiple divided regions of the marker board. Each region is referred to as a tile and may be represented as a rectangle, square, or any other planar shape. In other words, the averaged sample (103) is the average of corresponding tiles in all images within the contiguous portion of the video stream (102a). Each tile in the images is the image of a corresponding tile of the marker board. In this disclosure, the term “tile” is also used to refer to an image of the tile.
As the marker board is divided into multiple tiles, the averaged sample (103) becomes part of a collection of averaged samples that represent, in combination, the entire writing surface of the marker board. Furthermore, as the video stream (102a) is divided into a sequence of contiguous portions, the averaged sample (103) is one averaged sample within a sequence of averaged samples.
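The per-tile averaging described above can be sketched as follows. This is an illustrative implementation, not the pseudocode from the TABLE listings; the function name `averaged_sample` and the tile-box argument are assumptions for the example.

```python
import numpy as np

def averaged_sample(frames, tile_box):
    """Average one tile across a contiguous run of video frames.

    frames   -- list of H x W x 3 uint8 images (a short, contiguous
                portion of the video stream, e.g. about 0.25 s of frames)
    tile_box -- (row0, row1, col0, col1) bounds of the tile in the image
    Returns the per-pixel mean of the tile as a float array; each pixel
    of the result is the averaged value of the corresponding pixels in
    all images within the portion.
    """
    r0, r1, c0, c1 = tile_box
    tiles = np.stack([f[r0:r1, c0:c1].astype(np.float64) for f in frames])
    return tiles.mean(axis=0)

# Example: a 4 x 4 tile averaged over three identical blank-board frames.
frames = [np.full((8, 8, 3), 200, dtype=np.uint8) for _ in range(3)]
avg = averaged_sample(frames, (0, 4, 0, 4))
```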
In one or more embodiments, the binarized sample (104) is a binary mask generated using the averaged sample (103). Each pixel of the binarized sample (104) is assigned a binary value that represents the foreground pixels and the background pixels of the averaged sample (103). In this disclosure, the averaged sample (103) and the binarized sample (104) are both referred to as a sample of the video stream (102a). Furthermore, the sequence of averaged samples and the corresponding sequence of binarized samples are both referred to as the sequence of samples.
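A minimal sketch of the binarization step follows. The global mean-minus-offset threshold used here is a simplified stand-in for the adaptive thresholding function described later; the per-channel masks are combined with a bitwise OR as in the text, and the function name and `offset` parameter are assumptions.

```python
import numpy as np

def binarize_sample(avg_sample, offset=15):
    """Produce a binary foreground mask from an averaged sample.

    A pixel is marked foreground in a channel when it is darker than
    that channel's mean by more than `offset` (pen strokes are darker
    than the board surface). Channel masks are OR-ed together.
    """
    fg = np.zeros(avg_sample.shape[:2], dtype=bool)
    for ch in range(avg_sample.shape[2]):
        channel = avg_sample[:, :, ch].astype(np.float64)
        fg |= channel < (channel.mean() - offset)  # bitwise OR across channels
    return fg

# Example: a bright board with one dark stroke pixel at (1, 2).
sample = np.full((4, 4, 3), 230.0)
sample[1, 2] = 40.0
mask = binarize_sample(sample)
```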
In one or more embodiments, the COM (105) is a pixel location in a tile and corresponding sample (e.g., averaged sample (103), binarized sample (104)) whose coordinates are the average of the coordinates of all foreground pixels. As the user writes or draws into a particular tile, the COM (105) of the respective sample changes due to the user's hand motion and/or the added static written content (108).
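The COM computation reduces to averaging the row and column indices of the foreground pixels. A sketch, with the `None` return for an empty mask as an assumption about how an undefined COM might be represented:

```python
import numpy as np

def center_of_mass(binary_mask):
    """Average (row, col) location of all foreground pixels in a
    binarized sample, or None when the tile has no foreground
    (the COM is undefined in that case)."""
    rows, cols = np.nonzero(binary_mask)
    if rows.size == 0:
        return None
    return (rows.mean(), cols.mean())

# Example: three foreground pixels in a 10 x 10 tile.
mask = np.zeros((10, 10), dtype=bool)
mask[2, 2] = mask[2, 6] = mask[6, 4] = True
com = center_of_mass(mask)
```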
In one or more embodiments, the stable sample count (106) is a count of the consecutive samples in the sequence of samples where the COM has not changed more than a predetermined threshold (e.g., 10 pixels). Generally, the stable sample count (106) is zero for a tile where the user is actively writing or drawing.
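The stable sample count update can be sketched as below. The Euclidean-distance comparison anticipates the significant-change test described later; the function name and the reset-to-zero policy for an undefined COM are illustrative assumptions.

```python
import math

def update_stable_count(prev_com, cur_com, stable_count, threshold=10.0):
    """Increment the count of consecutive stable samples, resetting to
    zero when the COM moves more than `threshold` pixels (Euclidean
    distance) or is undefined (None) in either sample."""
    if prev_com is None or cur_com is None:
        return 0
    dist = math.dist(prev_com, cur_com)
    return stable_count + 1 if dist <= threshold else 0

count = 0
count = update_stable_count((5.0, 5.0), (5.0, 8.0), count)   # moved 3 px: stable
count = update_stable_count((5.0, 8.0), (40.0, 8.0), count)  # moved 35 px: reset
```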
In one or more embodiments, the changing status (107) is a status of a sample indicating whether a significant change in the COM (105) has stabilized over a predetermined number (e.g., 10) of subsequent samples. A significant change in the COM (105) that has stabilized over the predetermined number (e.g., 10) of subsequent samples is referred to as a stabilized change.
In one or more embodiments of the invention, the analysis engine (109) is configured to generate a sequence of averaged samples (including the averaged sample (103)) and corresponding binarized samples (including the binarized sample (104)) from the video stream (102a). The analysis engine (109) is further configured to generate the COM (105) for each of the samples.
In one or more embodiments of the invention, the extraction engine (110) is configured to detect a stabilized change of the COM (105) in the sequence of samples, and to extract the static written content (108) in a corresponding tile of the video stream (102a) where the stabilized change is detected. As the user writes or draws across the entire writing surface of the marker board, the static written content (108) in the extracted tile of the video stream (102a) represents only a portion of the entire static written content (108) across the marker board.
In one or more embodiments of the invention, the collaboration engine (111) is configured to generate the static written content (108) by aggregating all portions of the static written content (108) in all of the tiles of the video stream. The collaboration engine (111) is further configured to send the entirety or a portion of the static written content (108) to one or more collaborating users. The act of sending a portion or the entirety of the static written content (108) to collaborating user(s) is referred to as an extraction update of the collaboration session.
In one or more embodiments, the analysis engine (109), the extraction engine (110), and the collaboration engine (111) perform the functions described above using the method described in reference to
Although the system (100) is shown as having four components (101, 109, 110, 111), in one or more embodiments of the invention, the system (100) may have more or fewer components. Furthermore, the functions of each component described above may be split across components. Further still, each component (101, 109, 110, 111) may be utilized multiple times to carry out an iterative operation.
Referring to
In one or more embodiments, each image in the video stream is divided into a number of tiles. In one example, each image may be divided equally into rectangular shaped (or other planar shaped) tiles. Each tile in the image corresponds to a rectangular section of the marker board, and each rectangular section of the marker board is referred to as a tile of the marker board. In another example, the tiles may have different form factors within the image and across the marker board, subject to the constraint that a dimension of a tile should be at least twice the width of writing/drawing strokes in the image.
In one or more embodiments, each sample in the sequence of samples corresponds to a tile of the marker board. In other words, the pixels of each sample correspond to locations across the corresponding tile of the marker board. In one or more embodiments, the series of images is divided into consecutive portions where each portion is contiguous and includes consecutive images in the video stream. In one example, the consecutive portions may all have the same number of consecutive images. In another example, the number of consecutive images may vary from one portion to another. Regardless of whether the number of consecutive images is constant or variable, the consecutive images in each portion are averaged and divided into the tiles for generating a corresponding sample. Each averaged sample is converted into a binarized sample where the two binary pixel values are used to identify the foreground pixels and the background pixels of the sample.
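The tile division step can be sketched as follows. The equal-grid split assumes image dimensions that are multiples of the tile size; the function name is an assumption, and a real implementation would handle the variable form factors mentioned above.

```python
import numpy as np

def split_into_tiles(image, tile_h, tile_w):
    """Divide an image into a row-major list of equally sized
    rectangular tiles. Assumes the image dimensions are exact
    multiples of the tile size; edge handling is omitted."""
    h, w = image.shape[:2]
    return [image[r:r + tile_h, c:c + tile_w]
            for r in range(0, h, tile_h)
            for c in range(0, w, tile_w)]

# Example: an 8 x 8 image split into four 4 x 4 tiles.
img = np.arange(8 * 8).reshape(8, 8)
tiles = split_into_tiles(img, 4, 4)
```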
In Step 201, as discussed above in reference to
In Step 202, as discussed above in reference to
In Step 203, as discussed above in reference to
In Step 204, as discussed above in reference to
In Step 205, as discussed above in reference to
An example main algorithm for performing Steps 200 to 205 above is listed in TABLE 1 below.
The function process_sample(sample_num, avg_color, avg_thresh) is listed in TABLE 2 below.
The function is_stable is listed in TABLE 3 below.
The function sample_differs(sample_num, avg_color, avg_thresh) is listed in TABLE 4 below.
The function significant_change(center_of_mass1, center_of_mass2) is listed in TABLE 5 below.
In the example of
The example method operates on a series of images from a video stream. The video stream may be a pre-recorded collaboration session or a live stream of a current collaboration session. The process described below is repeated for each image of the video stream.
Each image in the video stream is divided into tiles and analyzed for new static written content. Once new static written content has been identified, an update of the tile's static written content is sent to remote participants in the collaboration session.
Automatically transmitting new and stable content when detected advantageously eliminates the user's manual initiation of the capture and sending of the content to remote participants in the collaboration session. Such transmission of a tile's static written content based on determining when new content is available and stable also advantageously minimizes (i.e., reduces) the number of necessary content data transmissions to remote participants in the collaboration session. Furthermore, during the content data transmission, the tiles without new content are excluded from the transmission. Excluding the tiles with no new content advantageously minimizes (i.e., reduces) the amount of content data in each of the content data transmissions to remote participants in the collaboration session. Furthermore, automatically transmitting new and stable content also allows content to be seen by remote participants sooner than if the user had manually initiated the capture.
For each tile, each set of n consecutive (n=10 in an example implementation) images in the video stream is averaged to generate an averaged sample that minimizes the effects of motion and maximizes the effects of static written content. Any physical marks on the whiteboard identified as pixels in each image of the set show up strongly (i.e., exhibit higher numerical values) in the averaged sample. In contrast, any user motion is likely identified as disparate pixels in different images and consequently does not show up strongly (i.e., exhibit lower numerical values) in the averaged sample. For example, consider the two averaged samples (301a) and (301b) shown in
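The effect described above can be demonstrated numerically. In the toy example below (a 1-D row of pixels, with values representing darkness), a static stroke occupies the same pixel in every frame and survives averaging at full strength, while a "moving hand" darkens a different pixel in each frame and is spread thin; the frame count and pixel layout are illustrative.

```python
import numpy as np

# Ten 1-D "frames": a static pen stroke at index 2 in every frame, and
# transient motion darkening a different index in each frame.
n = 10
frames = np.zeros((n, 12))
frames[:, 2] = 255.0                 # static written content: same pixel every frame
for i in range(n):
    frames[i, 3 + (i % 8)] = 255.0   # user motion: disparate pixels per frame
avg = frames.mean(axis=0)
# The static stroke keeps its full value; each motion pixel is
# attenuated because it was dark in only one or two of the ten frames.
```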
Furthermore, averaging tiles in images into averaged samples smooths out imaging sensor noise that occurs from image to image. Image processing of the particular tile, as described below, occurs on an averaged-sample-by-averaged-sample basis. The averaged-sample-by-averaged-sample processing corresponds to the sub-steps of step 4 in the main algorithm listed in TABLE 1 above.
Consistent with the main algorithm listed in TABLE 1, the variable “changing” is initially set to false. This variable “changing” tracks whether or not motion is currently being detected in the averaged sample. User motion in the tile is determined by tracking a COM of all foreground content as described below. In particular, the foreground content includes both static written content and artifacts due to the user motion and/or sensor noise.
The next step of the averaged-sample-by-averaged-sample processing, as detailed by items 4.4 through 4.6 in the main algorithm listed in TABLE 1 above, is to identify foreground content in each averaged sample. This is mainly done by running an adaptive thresholding function on each color channel of the averaged sample and using a bitwise OR to combine all color channels together into a single binary image, such as the binarized sample (302a) or the binarized sample (302b). Furthermore, some post-processing steps are executed to generate the binarized samples (302a) and (302b) for improving the quality of foreground identification in the binary image, such as healing holes (due to faint portions of the pen stroke) and slightly expanding the foreground area (to compensate for imaging artifacts from one averaged sample to another). Identifying foreground content in the two averaged samples (301a) and (301b) results in the corresponding binarized samples (302a) and (302b). In the example of one or more embodiments shown in
Continuing with the averaged-sample-by-averaged-sample processing, the next step is to identify the COM in each binarized sample. The COM is computed as the average location of all foreground (white) pixels in the binarized sample. This is used for motion tracking and stability identification. For the two binarized samples (302a) and (302b), the COM is identified by the icon “x” to generate the marked samples (303a) and (303b). The averaged sample (301a) and the binarized sample (302a) correspond to the marked sample (303a) with the COM (313a); the averaged sample (301b) and the binarized sample (302b) correspond to the marked sample (303b) with the COM (313b). A slight shift exists from the COM (313a) to the COM (313b) due to a noise pattern (312b) being identified as additional foreground from the binarized sample (302a) to the binarized sample (302b).
Once the COM has been identified for each averaged sample, one of two image processing branches (i.e., branch A and branch B) is followed dependent upon whether or not the current state of the tile is considered to be changing or not. If the tile is currently considered to not be changing, then the averaged sample is processed to determine whether it is now changing, as described here (branch B). If the averaged sample being processed is the first averaged sample in the collaboration session (e.g., sample 0), then the variable “changing” is initialized to be false. A vector, center_of_mass_history, is cleared and initialized with the first computed COM. The center_of_mass_history vector is limited to a small size of s (e.g., s=5 in an example implementation) elements and records the most recent s centers of mass. The center_of_mass_history vector is used to smooth out small changes in the COM (e.g., as seen in the marked samples (303a) and (303b) above) by allowing the most recent COM to be averaged together among the s elements.
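The `center_of_mass_history` smoothing described above can be sketched with a bounded deque, which drops the oldest entry automatically once `s` centers have been recorded. The helper name `last_stable_center_of_mass` follows the variable naming in the text; the rest of the scaffolding is an assumption.

```python
from collections import deque

S = 5  # history size s from the example implementation

center_of_mass_history = deque(maxlen=S)  # oldest entry evicted automatically

def last_stable_center_of_mass(history):
    """Average the recorded centers of mass to smooth out small
    frame-to-frame shifts in the COM (e.g., from sensor noise)."""
    rows = sum(p[0] for p in history) / len(history)
    cols = sum(p[1] for p in history) / len(history)
    return (rows, cols)

# Three slightly jittering COM readings average back to the true center.
for com in [(10.0, 10.0), (10.5, 9.5), (9.5, 10.5)]:
    center_of_mass_history.append(com)
stable = last_stable_center_of_mass(center_of_mass_history)
```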
Otherwise, if the averaged sample being processed is not the first averaged sample, the last_stable_center_of_mass is first computed as the average of all centers of mass in center_of_mass_history. It is then determined whether or not there is a significant change from the last_stable_center_of_mass to the currently identified COM in the averaged sample being processed. A significant change occurs when the Euclidean distance between the two points has changed more than a predetermined threshold t (e.g., t=3 in an example implementation). If a significant change has been identified, then the center_of_mass_history vector is cleared, the variable “changing” is set to true, and a count of stable averaged samples, n_stable_averaged_samples, is initialized to 0. Otherwise, if no significant change has been identified, the variable “changing” is set to false.
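The significant-change test is a direct Euclidean-distance comparison, matching the significant_change function named in TABLE 5. A sketch, with the default threshold taken from the t=3 example above:

```python
import math

T = 3.0  # threshold t from the example implementation

def significant_change(center_of_mass1, center_of_mass2, threshold=T):
    """A change is significant when the Euclidean distance between
    the two centers of mass exceeds the threshold."""
    return math.dist(center_of_mass1, center_of_mass2) > threshold

a = significant_change((10.0, 10.0), (10.0, 12.0))  # 2 px apart: not significant
b = significant_change((10.0, 10.0), (14.0, 13.0))  # 5 px apart: significant
```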
If a significant change has been identified (changing is true), the image processing follows the aforementioned branch A and an average change image is created to keep track of the average change across the next several averaged samples where the algorithm looks for stability. If a count of the average sample being stable exceeds a stability window after the significant change, the significant change is a stabilized change. In other words, the stabilized change includes both a significant change and the average sample being stable longer than the stability window after the significant change. The average change image is the average of the subsequent binarized samples (i.e., foreground content) over the stability window and is used to identify which corresponding color pixels to include in a potential update of the tile's static written content. In other words, only those color pixels corresponding to the majority of the foreground pixels across the stability window will be sent in the update. This is done to remove potential flickering at the time of the update.
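The majority-vote selection over the average change image can be sketched as below. Averaging boolean masks over the stability window and keeping pixels above 0.5 selects exactly the pixels that are foreground in a majority of the window's samples; the function name and list-of-masks interface are assumptions.

```python
import numpy as np

def majority_foreground(binarized_window):
    """Average the binarized samples across the stability window and
    keep only pixels that are foreground in a majority of them, which
    removes potential flickering at the time of the update.

    binarized_window -- list of equal-shape boolean foreground masks
    """
    avg_change = np.stack(binarized_window).mean(axis=0)  # the average change image
    return avg_change > 0.5

# Example: three masks over the stability window; only pixels that are
# foreground in at least two of the three survive.
window = [np.array([[True, False], [True,  True]]),
          np.array([[True, False], [False, True]]),
          np.array([[True, True],  [False, True]])]
update_mask = majority_foreground(window)
```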
As further shown in
In branch A of image processing, starting with sample 13, it is determined whether the tile has stabilized, i.e., whether the COM has moved significantly again from the last averaged position, in which case the tile is considered to be still changing. In sample 13, the user's hand does not register a hard edge and so no foreground content is identified. Consequently, the COM becomes undefined and the tile is not considered stabilized. In such a case for sample 13, the center_of_mass_history is cleared, n_stable_averaged_samples is set to 0, and the average change image is cleared and initialized with the current binarized sample.
In one or more embodiments, regardless of whether branch A or branch B of the image processing has been taken, the current COM is pushed onto the back of center_of_mass_history, evicting the oldest entry if the vector exceeds the size s.
Embodiments of the invention may be implemented on virtually any type of computing system, regardless of the platform being used. For example, the computing system may be one or more mobile devices (e.g., laptop computer, smart phone, personal digital assistant, tablet computer, or other mobile device), desktop computers, servers, blades in a server chassis, or any other type of computing device or devices that includes at least the minimum processing power, memory, and input and output device(s) to perform one or more embodiments of the invention. For example, as shown in
Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform embodiments of the invention.
Further, one or more elements of the aforementioned computing system (400) may be located at a remote location and be connected to the other elements over a network (412). Further, one or more embodiments of the invention may be implemented on a distributed system having a plurality of nodes, where each portion of the invention may be located on a different node within the distributed system. In one or more embodiments, the node corresponds to a distinct computing device. Alternatively, the node may correspond to a computer processor with associated physical memory. The node may alternatively correspond to a computer processor or micro-core of a computer processor with shared memory and/or resources.
One or more embodiments of the present invention provide the following improvements in electronic collaboration technologies: automatically sharing user content on a marker board with remote collaborating users without the user having to manually send the content data; limiting the amount of content data transmission to the portion of the marker board with new content; and minimizing the number of content data transmissions by automatically determining when the new content is stable prior to transmission.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.