1. Technical Field
The embodiments herein generally relate to video production systems, and, more particularly, to automatically generating notes and annotating multimedia content specific to a video production.
2. Description of the Related Art
Multi-camera shoots are difficult to manage, and more sophisticated technology can actually decrease productivity in the field. Camera operators need to change batteries and/or flash memory cards more often and at unpredictable times, thereby interrupting shoots and missing important moments and/or schedules. The problem is amplified with more camera operators on the shoot. As video images grow larger with higher-resolution HD, 2K, and 4K cameras, data storage and power requirements increase, as do the times required to transfer and process the data. This further requires assistants on a production team to process data from flash cards by dumping the data to field hard drives, synchronizing video and sound files, and processing the data into an appropriate format for an editor and/or a director. Eventually these data assets are copied again to larger storage devices (such as hard drives). Much information is lost in the process as handwritten notes on papers, notebooks, and on flash cards and drives. Accordingly, there remains a need to effectively transfer the data, and store the data for further processing and annotation.
In view of the foregoing, an embodiment herein provides a method for automatically annotating multimedia content at a base station. The method includes identifying an optimal pairing between a video capturing device and the base station. A video sensor data is received by a processor, based on the optimal pairing, from a video sensor embedded in the video capturing device that captures a video associated with a user. The video sensor data includes a time series of location data, direction data, orientation data, and a position of the user. A set of information associated with the video capturing device is received. The set of information includes at least one of a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier. The video and the video sensor data are synchronized to obtain a synchronized video content based on (i) a transmitted signal power from the video capturing device, and (ii) a received signal power at the base station. The synchronized video content is annotated with the set of information to obtain an annotated video content.
A radiation pattern associated with the video capturing device and a sensitivity pattern associated with the base station may be recorded. The radiation pattern and the sensitivity pattern may be beam-like, lobe-like, or spherical. A comparison between the annotated video content and production data obtained from a production data server may be performed, and recommended digital notes may be automatically generated based on the comparison. At least one user suggested digital note may be received from a user. The annotated video content may be associated with the at least one user suggested digital note. The production data may be selected from the group consisting of a script, scenes, characters, camera operators, a shoot schedule, call times, digital notes, background information, and research notes.
An acknowledgement may be transmitted from the base station to the video capturing device when the video, the video sensor data, and the set of information are received at the base station. At least one of the video, the video sensor data, or the set of information may be erased from a memory of the video capturing device based on the acknowledgement. The method may further comprise recording a radiation pattern associated with the base station and a sensitivity pattern associated with the video capturing device, wherein the radiation pattern and the sensitivity pattern are beam-like, lobe-like, or spherical. The method may further comprise localizing the video capturing device in an environment or with respect to the base station based on the radiation pattern of the base station and the sensitivity pattern of the video capturing device. The method may further comprise identifying the radiation pattern of the base station based on a signal receiving power, the location data, and the orientation data obtained from the video capturing device from at least one location.
In another embodiment, a method for automatically annotating, at a base station, multimedia content obtained from at least one video capturing device including a first video capturing device and a second video capturing device is provided. A first optimal pairing between the base station and the first video capturing device is selected. A first set of information associated with the first video capturing device is received based on the first optimal pairing. The first set of information includes at least one of a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier. A second optimal pairing between the base station and the second video capturing device is selected. A second set of information associated with the second video capturing device is received based on the second optimal pairing. The second set of information includes at least one of a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier. The first video capturing device or the second video capturing device is selected based on the first set of information and the second set of information to obtain a selected video capturing device and a selected set of information.
A video sensor data is obtained from a video sensor embedded in the selected video capturing device that captures a video associated with a user. The video sensor data includes a time series of location data, direction data, orientation data, and a position of the user. The video and the video sensor data are synchronized to obtain a synchronized video content using (i) the first optimal pairing when the selected video capturing device is the first video capturing device, or (ii) the second optimal pairing when the selected video capturing device is the second video capturing device. The synchronized video content is annotated with the selected set of information to obtain an annotated video content. The first set of information may comprise at least one of a radiation pattern associated with the first video capturing device and a sensitivity pattern associated with the base station, wherein the radiation pattern and the sensitivity pattern are beam-like, lobe-like, or spherical. The second set of information may comprise at least one of a radiation pattern associated with the second video capturing device and a sensitivity pattern associated with the base station, wherein the radiation pattern and the sensitivity pattern are beam-like, lobe-like, or spherical.
A comparison between the annotated video content and production data obtained from a production data server may be performed. Recommended digital notes may be generated automatically based on the comparison. An acknowledgement may be transmitted from the base station to the selected video capturing device when the video, the video sensor data, and the selected set of information are received at the base station. At least one of the video, the video sensor data, and the selected set of information may be automatically erased from a memory of the selected video capturing device based on the acknowledgement.
In yet another embodiment, a base station for automatically annotating multimedia content obtained from at least one video capturing device including a first video capturing device and a second video capturing device is provided. The base station includes a memory that stores instructions, a database, and a processor that executes the instructions. An optimal pairing selection module, when executed by the processor, selects (i) a first optimal pairing between the base station and the first video capturing device, and (ii) a second optimal pairing between the base station and the second video capturing device. A video capturing device information receiving module, when executed by the processor, receives a first set of information from the first video capturing device based on the first optimal pairing and a second set of information from the second video capturing device based on the second optimal pairing. The first set of information is selected from the group consisting of a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier specific to the first video capturing device. The second set of information is selected from the group consisting of a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier specific to the second video capturing device.
A device selection module when executed by the processor selects the first video capturing device or the second video capturing device based on the first set of information and the second set of information to obtain a selected video capturing device and a selected set of information. A sensor data obtaining module when executed by the processor obtains from a video sensor embedded in the selected video capturing device that captures a video associated with a user, a video sensor data including a time series of location data, direction data, orientation data, and a position of the user. A synchronization module when executed by the processor, synchronizes the video and the video sensor data to obtain a synchronized video content using (i) the first optimal pairing when the selected video capturing device is the first video capturing device, or (ii) the second optimal pairing when the selected video capturing device is the second video capturing device. An annotation module when executed by the processor annotates the synchronized video content with the selected set of information to obtain an annotated video content.
The first set of information may comprise at least one of the radiation pattern associated with the first video capturing device, and the sensitivity pattern associated with the base station, and wherein the radiation pattern and the sensitivity pattern are beam-like, lobe-like, or spherical. The second set of information may comprise at least one of the radiation pattern associated with the second video capturing device, and the sensitivity pattern associated with the base station, and wherein the radiation pattern and the sensitivity pattern are beam-like, lobe-like, or spherical.
A comparison module, when executed by the processor, may perform a comparison of the annotated video content and production data obtained from a production data server. A digital notes generation module, when executed by the processor, may automatically generate recommended digital notes based on the comparison. At least one user suggested digital note may be received from a user, and the annotated video content may be associated with the at least one user suggested digital note instead of the recommended digital notes. An audio capturing device information obtaining module, when executed by the processor, obtains an audio from an audio capturing device. The sensor data obtaining module obtains an audio sensor data from an audio sensor coupled to the audio capturing device. The audio may be specific to the user. The synchronization module, when executed by the processor, may further synchronize the video, the video sensor data, the audio, the audio sensor data, and the production data to obtain the synchronized video content.
A self-learning module when executed by the processor may learn a pattern of annotating video content, and generate recommended digital notes based on at least one of the synchronized video content, the at least one user suggested digital notes, the recommended digital notes, and previously annotated video content. A communication module when executed by the processor may transmit an acknowledgement from the base station to the selected video capturing device when the video, the video sensor data, and the selected set of information are received at the base station. At least one of the video, the video sensor data, and the selected set of information is automatically erased from a memory of the selected video capturing device based on the acknowledgement.
The optimal pairing selection module, when executed by the processor, (i) focuses a radiation pattern of the first video capturing device, (ii) orients the base station to receive a signal from the first video capturing device, (iii) monitors the signal from the first video capturing device to determine an optimal power of the signal, and (iv) selects the first optimal pairing between the base station and the first video capturing device corresponding to the optimal power of the signal. The optimal pairing selection module, when executed by the processor, (i) focuses a radiation pattern of the second video capturing device, (ii) orients the base station to receive a signal from the second video capturing device, (iii) monitors the signal from the second video capturing device to determine an optimal power of the signal, and (iv) selects the second optimal pairing between the base station and the second video capturing device corresponding to the optimal power of the signal.
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
As mentioned, there remains a need to effectively transfer and store data captured during video shoots for further processing and annotation. The embodiments herein achieve this by providing a system which identifies a first optimal pairing and a second optimal pairing between a base station and at least one video capturing device so that the video capturing device communicates optimally with the base station. A video sensor is embedded in the video capturing device that captures a video associated with a user. A video sensor data and a set of information are obtained from the video capturing device. The video and the video sensor data are synchronized to obtain a synchronized multimedia content using the first optimal pairing and/or the second optimal pairing. The synchronized multimedia content is annotated with the set of information to obtain an annotated multimedia content. Referring now to the drawings, and more particularly to
The user 102 may either be interacting with an audience (not shown in
In one embodiment the radiation patterns are lobe-like and cover an environment. For example, given a reading at a particular location, a receiver measures a radiation signature at that location. In one embodiment, a set of transmitters time-multiplexes to ensure that no pair of transmitters broadcasts simultaneously. Then, the radiation pattern is unambiguously decomposed into its constituent parts for each transmitter. In another embodiment, since a map of the radiation patterns is known, it is possible for the receiver to know the location from within a bounded set of possibilities. In one embodiment, the map of the radiation patterns is acquired by surveying the environment and training the system.
For example, surveying the environment and training the system may be performed by walking around a building with a laptop and recording observed signal intensities of the building's unmodified base stations. This data may be used to train the localizer to localize a user to a precise, correct location across the entire building. This methodology is described in the paper "Practical robust localization over large-scale 802.11 wireless networks," published in Proceedings of MobiCom '04, the 10th Annual International Conference on Mobile Computing and Networking, the complete disclosure of which, in its entirety, is herein incorporated by reference.
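For illustration only, the following is a minimal sketch of such fingerprint-based localization, assuming the survey produced pairs of known locations and observed signal intensities; the coordinates and signal values are hypothetical and not part of the embodiments themselves.

```python
# Minimal sketch of fingerprint-based localization, assuming the survey produced
# (location, signal-strength vector) pairs for the building's base stations.
import math

# Survey data: known (x, y) locations mapped to observed RSSI (dBm) per base station.
fingerprints = {
    (0.0, 0.0): [-40, -70, -80],
    (5.0, 0.0): [-55, -60, -75],
    (5.0, 5.0): [-70, -50, -65],
    (0.0, 5.0): [-60, -72, -55],
}

def localize(observed_rssi):
    """Return the surveyed location whose stored signature is closest
    (Euclidean distance in signal space) to the observed signature."""
    best_loc, best_dist = None, float("inf")
    for loc, signature in fingerprints.items():
        dist = math.dist(signature, observed_rssi)
        if dist < best_dist:
            best_loc, best_dist = loc, dist
    return best_loc

print(localize([-54, -62, -74]))   # -> (5.0, 0.0), the nearest surveyed point
```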
In one embodiment, the radiation pattern and the sensitivity pattern are used to produce one or more orientation estimates. For example, a localization information and an orientation information may then be used to annotate the multimedia content. The first video capturing device 104A transmits a first set of information associated with the first video capturing device 104A through the first optimal pairing. The first set of information may include, but is not limited to, a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier, in one example embodiment. Likewise, the second video capturing device 104B transmits a second set of information associated with the second video capturing device 104B through the second optimal pairing.
Attributes of the radiation patterns and the sensitivity patterns are added to the sensor data from the remote camera to provide a richer data set in order to understand the meaning of the data being captured. In one embodiment, the attributes of the radiation pattern may be added by the remote camera. In another embodiment, the receiver at the base station may add a sensitivity pattern, especially if this pattern is not constant in time due to different modes of operation for the base station.
In one embodiment, the process of forming the radiation pattern and the sensitivity pattern helps to obtain location related information, which becomes even more useful in a noisy wireless environment. In one embodiment, if there are multiple devices, a known beacon, or any other constraints on geometry, then a triangulation is performed to know a precise location, including while indoors, without global positioning system (GPS) information. Based on the attenuation in signal power as a function of distance, a signal data may enable pinpointing a location or relative position. In one embodiment, the video signal is annotated with data that can be translated into more meaningful notes based on one or more geometries of a plurality of devices. In one embodiment, the one or more geometries of the plurality of devices is identified based on directional wireless transmission.
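As an illustration of locating a device from signal power attenuation without GPS, the following sketch inverts a log-distance path-loss model to estimate ranges and then trilaterates a position by least squares; the transmit power, path-loss exponent, and anchor positions are hypothetical assumptions, not values prescribed by the embodiments.

```python
import numpy as np

def rssi_to_distance(rssi_dbm, tx_power_dbm=-30.0, path_loss_exponent=2.0):
    """Invert a log-distance path-loss model: rssi = tx_power - 10*n*log10(d)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))

def trilaterate(anchors, distances):
    """Least-squares position estimate from >=3 anchor positions and ranges."""
    (x1, y1), r1 = anchors[0], distances[0]
    A, b = [], []
    for (xi, yi), ri in zip(anchors[1:], distances[1:]):
        A.append([2 * (xi - x1), 2 * (yi - y1)])
        b.append(r1**2 - ri**2 + xi**2 - x1**2 + yi**2 - y1**2)
    pos, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return pos

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rssi = [-47.0, -47.0, -47.0]                       # hypothetical readings
dists = [rssi_to_distance(r) for r in rssi]
print(trilaterate(anchors, dists))                 # -> approximately (5, 5)
```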
The second set of information may include, but is not limited to, a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier, in one example embodiment. The first optimal pairing and the second optimal pairing may be refined to obtain refined optimal pairings based on learnings from past patterns by the base station 106. The past patterns, the location data and the orientation data may be obtained from the first video capturing device 104A and the second video capturing device 104B. The refined optimal pairings enable faster data transmission between the first video capturing device 104A, the second video capturing device 104B, and the base station 106. In one embodiment, the pattern matching of camera and the sensor data is associated with other information through a learning process in which the user trains the system by reviewing and correcting suggestions made by the system.
Based on the first set of information and the second set of information, the base station 106 prioritizes the first video capturing device 104A or the second video capturing device 104B. For example, when the current memory level of the first video capturing device 104A is lower than the current memory level of the second video capturing device 104B, the base station 106 prioritizes the first video capturing device 104A instead of the second video capturing device 104B. The first video capturing device 104A captures a first video associated with the user 102 and transmits the first video to the base station 106. Similarly, the second video capturing device 104B captures a second video associated with the user 102 and transmits the second video to the base station 106. The second video may be transmitted at a time interval after the first video is transmitted completely. The first video capturing device 104A and the second video capturing device 104B may be configured as any of a video camera, a digital camera, a camcorder, or a mobile communication device, in one example embodiment. It is to be understood that the system may be implemented with only one video capturing device. By way of clarity and for better understanding of the embodiments described herein, two video capturing devices are illustrated. The system 100 may further include additional video capturing devices to capture video from multiple angles in other embodiments. The system may further include a boom microphone that includes an audio sensor that records audio data associated with the user 102. In a preferred embodiment, the radiation pattern of a video capturing device and the sensitivity pattern of the base station 106 are identified based on location data and orientation data obtained from the video capturing device. The system 100 localizes a video capturing device in an environment or with respect to the base station 106 based on the radiation pattern of the base station 106 and the sensitivity pattern of the video capturing device. Further, the system 100 identifies the sensitivity pattern of the base station 106 based on a signal receiving power, a location data, and an orientation data obtained from a video capturing device from at least one location. In one embodiment, a radiation pattern associated with a video capturing device and a sensitivity pattern associated with the base station 106 are recorded.
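By way of illustration, a sensitivity pattern of the base station 106 might be estimated by binning received power by the bearing from the base station to the reporting device, computed from the location and orientation data described above; the readings and bin width below are hypothetical, and this simple angular-bin average is only one possible realization.

```python
import math
from collections import defaultdict

def bearing_deg(base_xy, device_xy, base_heading_deg):
    """Relative bearing from the base station's boresight to the device."""
    dx, dy = device_xy[0] - base_xy[0], device_xy[1] - base_xy[1]
    absolute = math.degrees(math.atan2(dy, dx))
    return (absolute - base_heading_deg) % 360.0

def estimate_pattern(samples, base_xy, base_heading_deg, bin_width=30):
    """Average received power per angular bin -> a coarse sensitivity pattern."""
    bins = defaultdict(list)
    for device_xy, rx_power_dbm in samples:
        b = int(bearing_deg(base_xy, device_xy, base_heading_deg) // bin_width) * bin_width
        bins[b].append(rx_power_dbm)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

# Hypothetical readings reported by a camera at several known locations.
samples = [((10, 0), -45), ((7, 7), -52), ((0, 10), -63), ((12, 1), -44)]
print(estimate_pattern(samples, base_xy=(0, 0), base_heading_deg=0))
```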
The audio sensor 108 that is coupled to the audio capturing device 110 captures a user data that may include a time series of the location data, direction data, and orientation data associated with the user 102. The audio capturing device 110 captures an audio. The audio may be specific to (i) the user 102, (ii) another user, (iii) an audience, or (iv) combinations thereof, in example embodiments. The audio capturing device 110 may be configured as any of a microphone and an audio recorder such as tape recorder, etc., in another example embodiment.
The first video sensor 112A embedded in the first video capturing device 104A captures a first video sensor data that may include a time series of the location data, direction data, orientation data, vibration data, sound data, motion data, camera settings data, lens information, and a position of the user 102. Similarly, the second video sensor 112B embedded in the second video capturing device 104B may capture a second video sensor data that includes a time series of the location data, direction data, orientation data, vibration data, sound data, motion data, camera settings data, lens information, and a position of the user 102.
The boom microphone is a multi-channel sound recorder used by one or more sound engineers or one or more camera operators to record an audio (for better clarity) associated with the user 102 using the audio sensor. Each of the sensors (e.g., the audio sensor 108, the first video sensor 112A, and the second video sensor 112B) is assigned a unique identifier to identify data aggregated from the audio sensor 108, the first video sensor 112A, and the second video sensor 112B at the base station 106 for annotating multimedia content, in one example embodiment.
The base station 106 comprises one or more of a personal computer, a laptop, a tablet device, a smart phone, a mobile communication device, a personal digital assistant, or any other such computing device, in one example embodiment. The base station 106 (i) receives the first video and the first set of information from the first video capturing device 104A, and the first video sensor data from the first video sensor 112A, (ii) synchronizes the first video and the first video sensor data to obtain a first synchronized data using the first optimal pairing and the second optimal pairing, and (iii) annotates the first synchronized data with the first set of information to obtain a first annotated multimedia content. Likewise, the base station 106 (i) receives the second video and the second set of information from the second video capturing device 104B, and the second video sensor data from the second video sensor 112B, (ii) synchronizes the second video and the second video sensor data to obtain a second synchronized data using the first optimal pairing and the second optimal pairing, and (iii) annotates the second synchronized data with the second set of information to obtain a second annotated multimedia content. It is to be understood that the first annotated multimedia content and the second annotated multimedia content may be further annotated with each other.
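A minimal sketch of one way the synchronization step could be realized is shown below: each video frame timestamp is paired with the nearest-in-time video sensor sample. The data structures and timestamps are hypothetical, and the sketch assumes the device and base station clocks have already been aligned (e.g., by the beacon signal described later).

```python
from bisect import bisect_left

def synchronize(frames, sensor_samples):
    """Pair each video frame with the nearest-in-time sensor sample.

    frames:         list of (timestamp_s, frame_id)
    sensor_samples: list of (timestamp_s, sensor_dict), sorted by timestamp
    Returns a list of (frame_id, sensor_dict) pairs -> the synchronized content.
    """
    times = [t for t, _ in sensor_samples]
    synced = []
    for t_frame, frame_id in frames:
        i = bisect_left(times, t_frame)
        # Choose whichever neighboring sample is closer in time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        j = min(candidates, key=lambda k: abs(times[k] - t_frame))
        synced.append((frame_id, sensor_samples[j][1]))
    return synced

frames = [(0.00, "f0"), (0.04, "f1"), (0.08, "f2")]
sensor = [(0.00, {"orientation_deg": 10}), (0.05, {"orientation_deg": 12})]
print(synchronize(frames, sensor))
```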
The base station 106 performs a comparison of the first annotated multimedia content and production data obtained from the database 116 of the production data server 114, and automatically generates recommended digital notes based on the comparison. The recommended digital notes are annotated with the first annotated multimedia content, in one example embodiment. The recommended digital notes are communicated to the computing device 118 associated with the production team for any corrections (or modifications), suggestions, etc., in another example embodiment. The production team comprises any of a producer, a director, a camera operator, an editor, etc. The production team may either confirm the recommended digital notes, or provide at least one user suggested digital note through the computing device 118 (or by directly accessing the base station 106). The user suggested digital notes as received from the production team may be associated with (or annotated to) the first annotated multimedia content. Likewise, the base station 106 performs a comparison for the second annotated multimedia content, and similar recommended digital notes are communicated to the production team for corrections (or modifications), and/or confirmation, etc. The recommended digital notes may be fill-in-the-blank or multiple-choice templates, saving the production team's time in adding detailed notes based on prompts (by the base station 106) on the computing device 118 of the production team. The computing device 118 comprises at least one of a personal computer, a laptop, a tablet device, a smart phone, a mobile communication device, a personal digital assistant, or any other such computing device, in one example embodiment.
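For illustration, the sketch below shows one way a recommended digital note could be generated as a fill-in-the-blank template by comparing annotation metadata against scene records from the production data; the field names and scene data are hypothetical.

```python
def recommend_note(annotation, production_data):
    """Compare annotated metadata with production data and fill a note template.

    annotation:      dict of metadata attached to the synchronized content
    production_data: list of scene dicts from the production data server
    Returns a fill-in-the-blank style note the production team can confirm or edit.
    """
    def overlap(scene):
        return sum(1 for k in ("location", "character", "camera_operator")
                   if scene.get(k) == annotation.get(k))

    best = max(production_data, key=overlap)
    return (f"Scene {best['scene']} ({best['location']}): take featuring "
            f"{annotation.get('character', '____')}, shot by "
            f"{annotation.get('camera_operator', '____')} at "
            f"{annotation.get('timecode', '____')}. Notes: ____")

production_data = [
    {"scene": "12A", "location": "rooftop", "character": "Ada", "camera_operator": "op1"},
    {"scene": "14C", "location": "lobby",   "character": "Ben", "camera_operator": "op2"},
]
annotation = {"location": "rooftop", "character": "Ada",
              "camera_operator": "op1", "timecode": "01:02:17:05"}
print(recommend_note(annotation, production_data))
```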
The production data may include, but not be limited to, script information (scenes), characters and subjects, locations, camera operators, schedule, call times and information, in one example embodiment. Other similar data may be annotated with the first synchronized data and the second synchronized data. Other data may include but not be limited to, training data and data from other scenes and shoots, etc., in another example embodiment.
The base station 106 learns the pattern of annotating multimedia content, and generates one or more digital notes for the annotated multimedia content based on one or more inputs provided by the production data server 114 and the production team. The base station 106 learns the pattern of annotating multimedia content based on (i) one or more recommended digital notes, (ii) one or more user suggested digital notes, and/or (iii) previously annotated multimedia content. The one or more inputs may be based on the information obtained from the database 116 and a third party data source. The one or more inputs may include a generation of digital notes with specific data patterns, and suggestions to annotate one or more recommended sections. The database 116 stores the production data, and annotation data and associated production data from past shoots and the patterns of annotation data from the past shoots.
When the one or more recommended digital notes are obtained and displayed to the production team and do not correlate with a user's intent or user context of the production team, the user may suggest his/her own user suggested digital notes that can be associated with the annotated multimedia content. In other words, one or more user suggested digital notes are received from the user and are associated with the annotated multimedia content over the one or more recommended digital notes (that are recommended by the base station 106). The one or more user suggested digital notes are provided by the user when the one or more recommended digital notes do not match or correlate with the user context (or user intent).
Similarly, the optimal pairing selection module 203 selects a second optimal pairing between the base station 106 and the second video capturing device 104B. The optimal pairing selection module 203 focuses a radiation pattern of the second video capturing device 104B, and orients the base station 106 to receive a signal from the second video capturing device 104B. The optimal pairing selection module 203 further monitors the signal from the second video capturing device 104B to determine an optimal power of the signal, and selects a second optimal pairing configuration between the base station 106 and the second video capturing device 104B corresponding to the optimal power of the signal. In one embodiment, the first optimal pairing may be selected based on a radiation pattern of a transmitter of the first video capturing device 104A (e.g., a first camera), which can be beam-like, lobe-like, or spherical. In one embodiment, the second optimal pairing may be further selected based on a radiation pattern of a transmitter of the second video capturing device 104B (e.g., a second camera), which can be beam-like, lobe-like, or spherical.
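A minimal sketch of the pairing selection loop, assuming a hypothetical hook that measures received power for a candidate base-station orientation, is shown below; the sweep granularity and the stand-in measurement function are illustrative only.

```python
def select_optimal_pairing(candidate_orientations, measure_received_power):
    """Sweep candidate configurations and keep the one with maximum received power.

    candidate_orientations: iterable of orientation settings to try
    measure_received_power: callable(orientation) -> received power in dBm
                            (hypothetical hook into the base station's transceiver)
    """
    best_orientation, best_power = None, float("-inf")
    for orientation in candidate_orientations:
        power = measure_received_power(orientation)
        if power > best_power:
            best_orientation, best_power = orientation, power
    return best_orientation, best_power

# Illustrative stand-in for a real power measurement: peak response near 120 degrees.
fake_measure = lambda az: -40 - 0.2 * abs(az - 120)
print(select_optimal_pairing(range(0, 360, 15), fake_measure))
```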
The video capturing device information receiving module 204 receives a first set of information from the first video capturing device 104A based on the first optimal pairing. Similarly, the video capturing device information receiving module 204 obtains a second set of information from the second video capturing device 104B based on the second optimal pairing. In one embodiment, the first set of information is selected from the group consisting of a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier specific to the first video capturing device 104A. In one embodiment, the second set of information is selected from the group consisting of a current memory level, a current power level, a range data, a location data, an orientation data, lighting, and an identifier specific to the second video capturing device 104B. The device selection module 210 selects either the first video capturing device 104A or the second video capturing device 104B to obtain a selected video capturing device and a selected set of information based on the first set of information and the second set of information. Based on the first set of information and the second set of information, the base station 106 prioritizes the first video capturing device 104A or the second video capturing device 104B. For example, when the current memory level of the first video capturing device 104A is lower than the current memory level of the second video capturing device 104B, the base station 106 prioritizes the first video capturing device 104A instead of the second video capturing device 104B. In other words, the base station 106 requests the first video capturing device 104A to transmit data instead of the second video capturing device 104B.
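By way of illustration, the prioritization rule described above might be expressed as follows; the report fields and device identifiers are hypothetical.

```python
def select_device(device_reports):
    """Pick the device with the most urgent need to offload data.

    device_reports: dict mapping device_id -> its reported set of information,
                    e.g. {"memory_free_pct": 12, "battery_pct": 40}
    Lower free memory wins; remaining battery breaks ties.
    """
    return min(device_reports,
               key=lambda d: (device_reports[d]["memory_free_pct"],
                              device_reports[d]["battery_pct"]))

reports = {
    "camera_104A": {"memory_free_pct": 12, "battery_pct": 40},
    "camera_104B": {"memory_free_pct": 55, "battery_pct": 20},
}
print(select_device(reports))   # -> "camera_104A"
```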
The video capturing device information receiving module 204 obtains the first video associated with the user 102 from the first video capturing device 104A and the first video sensor data from the first video sensor 112A. The first video sensor data includes a time series of the location data, direction data, orientation data, and a position of the user 102. Similarly, the audio capturing device information obtaining module 206 obtains an audio associated with the user 102 from the audio capturing device 110, and an audio sensor data from the audio sensor 108. The audio sensor data includes a time series of the location data, direction data, and orientation data associated with the user 102. The audio may be specific to (i) the user 102, (ii) another user, (iii) an audience, or (iv) combinations thereof, in example embodiments. The audio capturing device information obtaining module 206 may further obtain an audio, and/or an audio sensor data from a boom microphone if used.
The synchronization module 212 synchronizes the first video and the first video sensor data to obtain a first synchronized data using the first optimal pairing and the second optimal pairing. The synchronization module 212 may further synchronize the first video, the first video sensor data, the audio data, and the audio sensor data to obtain the first synchronized data. The annotation module 214 annotates the first synchronized data with the first set of information to obtain a first annotated multimedia content. Likewise, the video capturing device information receiving module 204 receives the second video and the second set of information from the second video capturing device 104B, and the second video sensor data from the second video sensor 112B. The synchronization module 212 synchronizes the second video and the second video sensor data to obtain a second synchronized data using the first optimal pairing and the second optimal pairing. The synchronization module 212 may further synchronize the second synchronized data with the first synchronized data.
The annotation module 214 annotates the second synchronized data with the second set of information to obtain a second annotated multimedia content. It is to be understood that the first annotated multimedia content and the second annotated multimedia content may be further annotated with each other. In one embodiment, the sensor data obtaining module 208 obtains from a video sensor embedded in the selected video capturing device (e.g., the first video capturing device 104A or the second video capturing device 104B) that captures a video associated with a user, a video sensor data including a time series of location data, direction data, orientation data, and a position of the user 102 based on an optimal pairing. The synchronization module 212 synchronizes the video and the video sensor data to obtain a synchronized video content using (i) the first optimal pairing when the selected video capturing device is the first video capturing device 104A, or (ii) the second optimal pairing when the selected video capturing device is the second video capturing device 104B. In another embodiment, the synchronization module 212 synchronizes the video and the video sensor data to obtain a synchronized video content using a transmitted signal power and a received signal power.
In one embodiment, a transmitted signal includes any information which is transmitted from one or more video capturing devices, one or more audio capturing devices, one or more video sensors, and/or one or more audio sensors to the base station 106. Examples of the transmitted signal include a video associated with a user, a video sensor data, a first set of information, a second set of information, an audio from one or more audio capturing devices, audio sensor data from one or more audio sensors, and/or a radiation pattern. A received signal includes the transmitted signal that is received at the base station 106. In one embodiment, the synchronization module 212 synchronizes a video and a video sensor data of the transmitted signal at the base station 106 using a transmitted signal power (i.e., power at which optimum level of signal transmission occurs from a video capturing device), and a received signal power (i.e., power or strength at the receiving base station 106 of the signal transmitted by the video capturing device), to obtain a synchronized video. The annotation module 214 annotates the synchronized video content with the selected set of information to obtain an annotated video content.
A first order of annotations may be a series of angles and power levels in combination with other data from the camera, including ID, lens information, settings, and lighting, as well as accelerometer and orientation data. In one embodiment, the radiation pattern of the video capturing devices 104A-B, the sensitivity pattern of the base station 106, and this first order data may be matched to higher order information by associating it with context-specific information such as scene, location, character, page or line of script, identity of shooter, etc.
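For illustration, the sketch below models a first-order annotation record and its mapping to higher-order, context-specific fields; the field names are hypothetical and not limiting.

```python
from dataclasses import dataclass, asdict

@dataclass
class FirstOrderAnnotation:
    camera_id: str
    antenna_angle_deg: float
    received_power_dbm: float
    lens: str
    settings: dict
    orientation_deg: tuple          # (roll, pitch, yaw)

def to_higher_order(note: FirstOrderAnnotation, context: dict) -> dict:
    """Attach context-specific production information to the raw annotation."""
    record = asdict(note)
    record.update({
        "scene": context.get("scene"),
        "location": context.get("location"),
        "character": context.get("character"),
        "script_line": context.get("script_line"),
        "shooter": context.get("shooter"),
    })
    return record

note = FirstOrderAnnotation("104A", 37.5, -48.0, "35mm", {"iso": 800}, (0.0, 2.1, 181.4))
print(to_higher_order(note, {"scene": "12A", "location": "rooftop", "shooter": "op1"}))
```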
The comparison module 216 performs a comparison of the first annotated multimedia content and production data obtained from the database 116 of the production data server 114. The digital notes generation module 218 automatically generates a first set of recommended digital notes based on the comparison. The first set of recommended digital notes is annotated with the first annotated multimedia content, in one example embodiment. The first set of recommended digital notes is communicated to the computing device 118 associated with the production team for any corrections (or modifications), suggestions, etc. prior to annotating the first set of recommended digital notes with the first annotated multimedia content, in another example embodiment. Likewise, the comparison module 216 performs a comparison of the second annotated multimedia content and production data obtained from the database 116 of the production data server 114, and the digital notes generation module 218 automatically generates a second set of recommended digital notes based on the comparison. Similarly, the second set of recommended digital notes is annotated with the second annotated multimedia content, in one example embodiment. The second set of recommended digital notes is communicated to the computing device 118 associated with the production team for any corrections (or modifications), suggestions, etc. prior to annotating the second set of recommended digital notes with the second annotated multimedia content, in another example embodiment. The production team comprises any of a producer, a director, a camera operator, an editor, etc. The production team may either confirm the first set and the second set of recommended digital notes, or provide at least one user suggested digital note. The user suggested digital notes as received from the production team may be associated with (or annotated to) the first annotated multimedia content and/or the second annotated multimedia content.
The production data may include, but is not limited to, script information (scenes), characters and subjects, locations, camera operators, schedule, call times and information, in one example embodiment. Other similar data may be annotated with the first synchronized data and the second synchronized data. Other data may include, but is not limited to, training data and data from other scenes and shoots, etc., in another example embodiment.
The self-learning module 220 learns the pattern of annotating multimedia content, and generates one or more recommended digital notes for the annotated multimedia content based on one or more inputs provided by the production data server 114 and the production team. The self-learning module 220 learns the pattern of annotating multimedia content based on (i) one or more recommended digital notes, (ii) one or more user suggested digital notes, and/or (iii) previously annotated multimedia content. The one or more inputs may be based on the information obtained from the database 116 and a third party data source. The one or more inputs include a generation of digital notes with specific data patterns, and suggestions to annotate one or more recommended sections.
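A minimal sketch of such self-learning, assuming a simple co-occurrence count between annotation attributes and the notes the production team ultimately accepts, is shown below; the attributes and labels are hypothetical, and a production system could substitute any suitable classifier.

```python
from collections import Counter, defaultdict

class NoteLearner:
    """Learns which note label tends to follow which annotation attributes."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # attribute-value -> Counter of labels

    def train(self, annotation_attrs, accepted_label):
        """Record that the production team accepted/suggested this label."""
        for attr in annotation_attrs:
            self.counts[attr][accepted_label] += 1

    def recommend(self, annotation_attrs):
        """Score labels by how often they co-occurred with these attributes."""
        votes = Counter()
        for attr in annotation_attrs:
            votes.update(self.counts[attr])
        return votes.most_common(1)[0][0] if votes else None

learner = NoteLearner()
learner.train({"location=rooftop", "lens=35mm"}, "establishing shot")
learner.train({"location=rooftop", "lens=85mm"}, "close-up")
print(learner.recommend({"location=rooftop", "lens=35mm"}))  # -> "establishing shot"
```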
When the one or more recommended digital notes are obtained and displayed to the production team and do not correlate with a user's intent or user context of the production team, the user may suggest his/her own user suggested digital notes that can be associated with the annotated multimedia content. In other words, one or more user suggested digital notes are received from the user and are associated with the annotated multimedia content over the one or more recommended digital notes (that are recommended by the base station 106). The one or more user suggested digital notes are provided by the user when the one or more recommended digital notes do not match or correlate with the user context (or user intent).
The communication module 222 communicates an acknowledgement to the first video capturing device 104A, the second video capturing device 104B, and the audio capturing device 110. Upon receipt of the acknowledgement, the data (e.g., the first video, the second video, the first video sensor data, the second video sensor data) stored (and/or recorded) in the respective devices (e.g., the first video capturing device 104A, the second video capturing device 104B, and the audio capturing device 110) are automatically erased (or cleared, or deleted).
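For illustration, the acknowledgement-and-erase handshake might look like the following sketch, in which the device erases its local copy only after the base station acknowledges a matching checksum; the send and receive hooks are hypothetical placeholders for the wireless link.

```python
import hashlib

def transmit_with_ack(device_buffer, send, receive_ack):
    """Send buffered data; erase the local copy only after a matching acknowledgement.

    send:        callable(payload, checksum) -> None   (hypothetical radio link)
    receive_ack: callable() -> checksum acknowledged by the base station
    """
    payload = bytes(device_buffer)
    checksum = hashlib.sha256(payload).hexdigest()
    send(payload, checksum)
    if receive_ack() == checksum:
        device_buffer.clear()          # safe to free flash/RAM on the camera
        return True
    return False                       # keep data; retry when bandwidth allows

buffer = bytearray(b"raw video chunk")
sent = {}
ok = transmit_with_ack(buffer,
                       send=lambda p, c: sent.update(payload=p, checksum=c),
                       receive_ack=lambda: sent["checksum"])
print(ok, len(buffer))                 # True 0 -> buffer erased after acknowledgement
```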
An update associated with the first set of information, the second set of information, and the third set of information may be obtained from the first video capturing device 104A, the second video capturing device 104B, and the third video capturing device in real time (or near real time), in one example embodiment. Alternatively, the update associated with the first set of information, the second set of information, and the third set of information may be obtained from the first video capturing device 104A, the second video capturing device 104B, and the third video capturing device offline (e.g., when there is no shoot scheduled), in another example embodiment.
Based on the first set of information, the second set of information, and the third set of information, the device selection module 210 selects the first video capturing device 104A to transmit the data (e.g., the first video, the first video sensor data) instead of the second video capturing device 104B and the third video capturing device, in one example embodiment. Similarly, the device selection module 210 may select the second video capturing device 104B to transmit the data (e.g., the second video, the second video sensor data) instead of the first video capturing device 104A and the third video capturing device, in another example embodiment. In one embodiment, the device selection module 210 may prioritize either the first video capturing device 104A or the second video capturing device 104B instead of the third video capturing device because the pairing with the base station 106 is better for the first video capturing device 104A and the second video capturing device 104B, as compared to the pairing associated with the third video capturing device. Thus, the base station 106 performs prioritization of bandwidth based on available memory, power, and director prioritization of remote video capturing devices.
Alternatively, the base station 106 may also prompt the production team to select at least one of a video capturing device from the first video capturing device 104A, the second video capturing device 104B, and the third video capturing device for data transmission. The base station 106 may prompt the production team to select through the computing device 118 associated with the production team.
The CMOS sensor 510 (also referred to as an image sensor) includes an integrated circuit containing an array of pixel sensors, each pixel containing a photo detector and an active amplifier, to capture high quality images (or series of frames) and/or videos. The image processor 512 processes the high quality images (or series of frames) and/or videos captured by the image sensor. The motion and absolute orientation fusion unit 514 may include a 3-axis gyroscope, a 3-axis accelerometer, and a 3-axis geomagnetic sensor. Each of these sensors exhibits inherent strengths and weaknesses with respect to motion-tracking and absolute orientation associated with a video recording of the user 102. Each of these sensors is further configured to calculate precise measurement of movement, direction, angular rate, and acceleration in three perpendicular axes.
The central processing unit 516 may be embodied as a micro-controller that is configured to execute instructions stored in a memory (e.g., the read only memory 522, the random access memory 524, and the flash memory 526) including, but not limited to, an operating system, sensor I/O procedures, sensor fusion procedures for combining raw orientation data from multiple degrees of freedom in the orientation sensors to calculate absolute orientation, transceiver procedures for communicating with a receiver unit and determining communications accuracy, power procedures for going into power saving modes, data aggregation procedures for collecting and transmitting data in batches according to a duty cycle, and other applications. The GPS 518 is configured to establish the absolute location of the sensor, and may be made more precise through triangulation of Wi-Fi or beacon signals at known locations.
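As one illustration of the sensor fusion procedures mentioned above, the sketch below fuses gyroscope rate and accelerometer tilt with a complementary filter to estimate a pitch angle; the filter weight, sample rate, and readings are hypothetical, and other fusion algorithms may equally be used.

```python
import math

def complementary_filter(samples, alpha=0.98, dt=0.01):
    """Fuse gyroscope rate and accelerometer tilt into a pitch-angle estimate.

    samples: iterable of (gyro_rate_deg_s, accel_x_g, accel_z_g)
    alpha:   weight given to the integrated gyroscope (smooth, but drifts slowly)
    dt:      sample period in seconds
    """
    pitch = 0.0
    for gyro_rate, ax, az in samples:
        accel_pitch = math.degrees(math.atan2(ax, az))      # absolute but noisy
        pitch = alpha * (pitch + gyro_rate * dt) + (1 - alpha) * accel_pitch
    return pitch

# Hypothetical readings: device pitching up at a steady 5 deg/s for one second.
readings = [(5.0,
             math.sin(math.radians(5 * 0.01 * i)),
             math.cos(math.radians(5 * 0.01 * i))) for i in range(100)]
print(round(complementary_filter(readings), 2))
```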
The clock 520 tracks absolute time so that all data streams (e.g., data feeds that are being recorded such as time series of data) are synchronized and may be reset by a beacon signal or from the GPS 518 or other wireless signal. The user input/output unit 528 enables the production team to provide additional measured features of the user 102 (e.g., heart rate, heart rate variability, blood pressure, respiration, perspiration, etc.) or the environment (e.g., temperature, barometric pressure, moisture or humidity, light, wind, presence of chemicals, etc.). The power source unit 530 may include, but not limited to, a battery, solar cells, and/or an external power supply to power the video capturing device. The transceiver 532 may include an antenna and is configured to transmit collected data and sensor node identification to the base station 106 and may receive a beacon signal to synchronize timing with other sensor nodes, or to indicate standby or active modes of operation. The display unit 534 is configured to display data that includes, but not limited to, information associated with the production team, a video being recorded, and settings data associated with the video capturing device, etc.
The motion and absolute orientation fusion unit 606 may include a 3-axis gyroscope, a 3-axis accelerometer, and a 3-axis geomagnetic sensor. Each of these sensors exhibits inherent strengths and weaknesses with respect to motion-tracking and absolute orientation associated with a video recording of the user 102. Each of these sensors is further configured to calculate precise measurement of movement, direction, angular rate, and acceleration in three perpendicular axes.
The global positioning system (GPS) 608 is configured to establish the absolute location of the sensor, and may be made more precise through triangulation of Wi-Fi or beacon signals at known locations. The clock 610 tracks absolute time so that all data streams (e.g., data feeds that are being recorded, such as time series of data) are synchronized, and may be reset by a beacon signal or from the GPS 608 or other wireless signal. The central processing unit 612 may be embodied as a micro-controller that is configured to execute instructions stored in a memory (e.g., the read only memory 618, the random access memory 620) including, but not limited to, an operating system, sensor I/O procedures, sensor fusion procedures for combining raw orientation data from multiple degrees of freedom in the orientation sensors to calculate absolute orientation, transceiver procedures for communicating with a receiver unit and determining communications accuracy, power procedures for going into power saving modes, data aggregation procedures for collecting and transmitting data in batches according to a duty cycle, and other applications.
The RAID (Redundant Array of Inexpensive Disks) drive 614 allows computer users to achieve high levels of storage reliability from low-cost and less reliable PC-class disk drives. The RAID drive 614 combines multiple disk drive components into a logical unit for the purposes of data redundancy and performance improvement. The power supply unit 616 may include, but is not limited to, a battery, solar cells, and/or an external power supply to power the base station 106.
The transceiver 622 may include an antenna and is configured to transmit collected data and sensor node identification to one or more devices such as (i) the first video capturing device 104A, (ii) the second video capturing device 104B, (iii) the audio sensor 108, (iv) the audio capturing device 110, (v) the first video sensor 112A, and/or (vi) the second video sensor 112B and may receive a beacon signal from the one or more devices to synchronize timing with other sensor nodes, or to indicate standby or active modes of operation. The display unit 624 is configured to display data that includes, but not limited to, information associated with the production team, a video being recorded, and settings data associated with the video capturing device, etc.
The system further includes a user interface adapter 19 that connects a keyboard 15, mouse 17, speaker 24, microphone 22, and/or other user interface devices such as a touch screen device (not shown) to the bus 12 to gather user input. Additionally, a communication adapter 20 connects the bus 12 to a data processing network 25, and a display adapter 21 connects the bus 12 to a display device 23 which may be embodied as an output device such as a monitor, printer, or transmitter, for example.
The embodiments herein can include hardware and software embodiments. The embodiments that comprise software include but are not limited to, firmware, resident software, microcode, etc.
Furthermore, the embodiments herein can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W) and DVD.
A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
Digital content may also be stored in the memory 802 for future processing or consumption. The memory 802 may also store program specific information and/or service information (PSI/SI), including information about digital content (e.g., the detected information bits) available in the future or stored from the past. A user (e.g., the production team) of the computing device 118 may view this stored information on the display 806 and select an item for viewing, listening, or other uses via input, which may take the form of a keypad, scroll, or other input device(s) or combinations thereof. When digital content is selected, the processor 810 may pass the information. The content and PSI/SI may be passed among functions within the computing device 118 using the bus 804.
In step 1010, the first video capturing device 104A or the second video capturing device 104B is selected based on the first set of information and the second set of information to obtain a selected video capturing device and a selected set of information. In step 1012, a video sensor data is obtained by the processor from a video sensor embedded in the selected video capturing device that captures a video associated with a user. The video sensor data includes a time series of location data, direction data, orientation data, and a position of the user. In step 1014, the video and the video sensor data is synchronized to obtain a synchronized video content using (i) the first optimal pairing when the selected video capturing device is the first video capturing device 104A, or (ii) the second optimal pairing when the selected video capturing device is the second video capturing device 104B. In step 1016, the synchronized video content is annotated with the selected set of information to obtain an annotated video content.
The first video capturing device 104A and the second video capturing device 104B capture raw video data and buffer it in a RAM while transmitting uncompressed video over short ranges to the base station 106 using 60 Gigahertz low power wireless transmission, which is very fast and efficient provided there are no obstacles that absorb the signal and the range is short. 60 GHz signals also bounce off walls and objects, resulting in multiple potential transmission paths and echoes. Power requirements are kept low for transmission with pattern forming techniques known in the art, where multi-antenna transceivers choose the best pattern for the clearest signal to the transmitting device, lowering the transmission power requirements.
Transmission power requirements can be further reduced with location and orientation information from the camera and the base station 106 combined with machine learning approaches to pattern finding. The orientation and location information can be features that the model fits with regression techniques to use the orientation data to predict an optimal pairing.
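A minimal sketch of such a regression approach, assuming a hypothetical training log of orientation and location features against received power, is shown below; a simple linear least-squares fit stands in for whatever regression technique is actually employed.

```python
import numpy as np

# Hypothetical training log: [camera_yaw_deg, camera_x, camera_y] -> received power (dBm).
X = np.array([[0, 1.0, 2.0], [45, 2.0, 2.0], [90, 3.0, 1.0], [135, 4.0, 0.5],
              [180, 5.0, 0.0], [225, 4.0, 1.5], [270, 3.0, 3.0], [315, 2.0, 3.5]])
y = np.array([-60, -52, -45, -48, -55, -58, -62, -61])

# Fit a linear model: power ~ w . features + b (a simple stand-in for "regression techniques").
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_power(yaw_deg, cam_x, cam_y):
    return float(np.dot(w, [yaw_deg, cam_x, cam_y, 1.0]))

# The base station can rank candidate pairings by predicted power before measuring.
candidates = [(30, 1.5, 2.0), (100, 3.0, 1.0), (200, 4.5, 0.5)]
print(max(candidates, key=lambda c: predict_power(*c)))
```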
Data transmission from the video capturing devices includes real-time annotations with video capturing device settings data (e.g., the first set of information, the second set of information and the third set of information), range data, lighting data, location and absolute orientation data. The video capturing devices can also store data locally and then transmit data when sufficient bandwidth is available. Local buffers and memory can be cleared automatically when acknowledgement is received by the video capturing devices from the base station 106 that the data was transmitted without any error. The recorded signal, power and the radiation pattern and the sensitivity pattern are used in combination with sensor data to generate situational (or contextual) information for the annotation. The radiation patterns and sensitivity patterns could be trivial, complicated, sub-optimal or optimal, beam-like or lobe-like.
For multiple video capturing device shoots, the above embodiments enable data to be transmitted to the base station 106 either in parallel, if there is sufficient bandwidth, or by taking turns between the video capturing devices. For the taking-turns method, when the channel is in use, a video capturing device saves data to its buffer or flash memory. Channel priority is allocated based on which video capturing device has the greatest need for available memory and power. Thus, the base station 106 allows the production team to know at all times what the battery and local storage levels are in the field, so that unexpected interruptions are minimized. The above methodology also enables sound recorders or microphones to be included in the transmission, either from the video capturing devices or from stand-alone microphone units. There are vectors of data streaming in, and there is a need to perform pattern matching and classification. This is a system that can be trained, as the initial suggestions to the user are refined over time. For example, the system can be trained in a manner similar to how spam filters are trained on email systems and services. The system is also used to map that high level contextual annotation to lower level technologies that can physically be added to cameras and broadband wireless networks where cameras can be used. A position associated with a camera enhances the annotation. The annotation system can record signal strength and the antenna settings (e.g., which imply the radiation pattern and the sensitivity patterns for both cameras and base stations), and incorporate these into the annotation. Similarly, when the annotation gets processed, an estimate of the camera positions can be computed and added into the notes. Then the base stations can adjust their antennas or recommend other base stations given the requirements of the shot, and send a notification upon failure.
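For illustration, the taking-turns channel allocation might be sketched as follows, with the most memory- and power-constrained device granted the next transmission slot while the others continue buffering locally; the device reports and slot model are hypothetical.

```python
import heapq

def schedule_turns(devices, num_slots):
    """Allocate transmission slots to the device with the greatest need first.

    devices: dict device_id -> {"memory_free_pct": ..., "battery_pct": ..., "pending_mb": ...}
    Returns the ordered list of (slot, device_id); devices not transmitting keep buffering.
    """
    def need(d):
        info = devices[d]
        # Lower free memory and lower battery mean higher urgency (smaller sort key).
        return (info["memory_free_pct"], info["battery_pct"])

    heap = [(need(d), d) for d in devices if devices[d]["pending_mb"] > 0]
    heapq.heapify(heap)
    schedule = []
    for slot in range(num_slots):
        if not heap:
            break
        _, device = heapq.heappop(heap)
        schedule.append((slot, device))
        devices[device]["pending_mb"] = 0          # assume the slot drains its backlog
    return schedule

fleet = {
    "camera_104A": {"memory_free_pct": 10, "battery_pct": 35, "pending_mb": 800},
    "camera_104B": {"memory_free_pct": 60, "battery_pct": 15, "pending_mb": 300},
    "boom_mic":    {"memory_free_pct": 80, "battery_pct": 90, "pending_mb": 50},
}
print(schedule_turns(fleet, num_slots=3))
```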
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.