The field of the present technology relates to computing systems. More particularly, embodiments of the present technology relate to video streams.
Participating in the world of sharing on-line videos can be a rich and rewarding experience. For example, one may easily share on-line videos with friends, family, and even strangers. Generally, the modern-day computer allows a user to organize and store a large number of on-line videos. However, in order to store and share hundreds of on-line videos, the user expends much time and effort making hundreds of organizational decisions.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the technology for organizing video data and, together with the description, serve to explain the principles discussed below.
The drawings referred to in this description should not be understood as being drawn to scale unless specifically noted.
Reference will now be made in detail to embodiments of the present technology, examples of which are illustrated in the accompanying drawings. While the technology will be described in conjunction with various embodiment(s), it will be understood that they are not intended to limit the present technology to these embodiments. On the contrary, the present technology is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the various embodiments as defined by the appended claims.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present technology. However, embodiments of the present technology may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present embodiments.
Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present detailed description, discussions utilizing terms such as “receiving”, “comparing”, “associating”, “identifying”, “removing”, “utilizing”, or the like, refer to the actions and processes of a computer system, or similar electronic computing device. The computer system or similar electronic computing device manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices. Embodiments of the present technology are also well suited to the use of other computer systems such as, for example, optical and mechanical computers.
Embodiments in accordance with the present technology pertain to a system for organizing video data and its usage. In one embodiment in accordance with the present technology, the system described herein enables the utilization of a user's deliberately created metadata within a video to organize that video within a database.
More particularly, in one embodiment metadata comprising visual and/or audio cues is included by a user in the video and then utilized to find a corresponding video profile with matching visual and/or audio cues. This video profile may be stored within a database of a plurality of video profiles. Each video profile is a combination of features extracted from the video that are suitable for making subsequent comparisons with the video, as will be described. These features may include the entire video or portions thereof, as well as a point of reference to the original video. The video is then associated with any corresponding video profile that is found. Thus, the video is organized based on metadata that was included in the video by the user.
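To make the notion of a video profile concrete, the following sketch shows one plausible shape for such a record. The field names and the use of a Python dataclass are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class VideoProfile:
    """Features extracted from a video, retained for later comparisons."""
    profile_id: str
    visual_cues: list = field(default_factory=list)     # e.g., fingerprints of tagged objects
    audio_cues: list = field(default_factory=list)      # e.g., fingerprints of spoken phrases or sounds
    source_ref: str = ""                                # point of reference to the original video
    associated_videos: list = field(default_factory=list)  # videos matched to this profile

# The database is then simply a collection of such profiles:
profiles = [VideoProfile("132a"), VideoProfile("132b"), VideoProfile("132n")]
```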
For example, a user may first cover and uncover a video camera's lens while the video camera is recording to create a “dark time” within video “A”. This “dark time” signifies that important visual and/or audio cues will occur shortly. Then, the user may place a visual cue within video “A” by recording a short video of an object, such as a diamond, as part of video “A”. The user then may place an audio cue within video “A” by recording the spoken words, “research project on diamonds”, within video “A”. The visual cue and the audio cue then may be stored as part of a video profile associated with video “A” in a database coupled with the system described herein.
Then, when the user creates a new video to share, video “B”, the user may make a video recording of the diamond at the beginning of video “B”. Embodiments of the present technology then receive video “B” that includes the recording of the diamond. Video “B” and its visual and audio cues within are then compared to a database of a plurality of video profiles in order to find a video profile with matching visual and audio cues.
Once a video profile “C” that matches the visual and audio cues of video “B” is found, video “B” is associated with the group of one or more other videos also associated with video profile “C”. For example, the appropriate association for video “B” is with the group of one or more videos having the visual and/or audio cues, a diamond and the spoken words, “research project on diamonds”. Additionally, in one embodiment, the recording of the diamond and the spoken words, “research project on diamonds”, may be removed from video “B” before video “B” is shared with others.
Thus, embodiments of the present technology enable the organizing of a video based on the comparison of the metadata within this video with a plurality of stored video profiles. This method of organizing enables the associating of a video with videos containing matching metadata, without manual interaction by a user.
Referring still to FIG. 1, video data 110 comprises metadata 120 used to organize video data 110. Metadata 120 is included as part of the audio/video stream. Metadata 120 may comprise a visual cue 145 and/or an audio cue 160. Video data 110 may have an intra-video tag of one or more visual cues 145 and/or audio cues 160.
Visual cue 145 refers to anything viewable that triggers action and/or inaction by system 100. Audio cue 160 refers to any sound that triggers action and/or inaction by system 100. An “intra-video tag” refers to the inclusion, via recording, of metadata 120, such as visual cue 145 and audio cue 160, as part of video data 110. In other words, video data 110 comprises a video or portions thereof that includes metadata 120 as part of its audio/video stream. This metadata assists system 100 in organizing video data 110 into related groups.
In one embodiment, visual cue 145 comprises an object 150 and/or a break in video 155. For example, video data 110 may have an intra-video tag of object 150, such as but not limited to, a piece of jewelry, a purple pen, a shoe, headphones, etc.
Break in video 155 refers to a section of video data 110 that is different from its preceding section or its following section. For example, break in video 155 may be a result of a user covering a camera's lens while in the recording process, thus creating a “dark time”. In another example, break in video 155 may also be a period of “lightness” in which video data 110 is all white. In yet another example, break in video 155 may be a particular sound, such as an audible clap or an audible keyword, which is predetermined to represent the beginning or the ending of a section of video data 110.
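A break such as a “dark time” could plausibly be detected by scanning per-frame luminance. The sketch below uses OpenCV; the particular luminance threshold and minimum duration are assumptions chosen for illustration.

```python
import cv2  # pip install opencv-python

def find_dark_time(path, luma_threshold=16, min_frames=15):
    """Return (start, end) frame indices of the first sustained dark stretch, or None."""
    cap = cv2.VideoCapture(path)
    start, idx = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if gray.mean() < luma_threshold:            # frame is essentially black
            if start is None:
                start = idx
        else:
            if start is not None and idx - start >= min_frames:
                cap.release()
                return (start, idx)                 # long enough to be a deliberate break
            start = None
        idx += 1
    cap.release()
    # handle a dark stretch that runs to the end of the recording
    if start is not None and idx - start >= min_frames:
        return (start, idx)
    return None
```

A period of “lightness” would be the mirror image of this check, and an audible clap could be caught by looking for a short-term energy spike in the audio track instead.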
In one embodiment, audio cue 160 comprises sound 180. Sound 180, for example, may be, but is not limited to, a horn honking, a buzzer buzzing, or a piano key sounding.
Coupled with system 100 is plurality of video profiles 130. In one embodiment, plurality of video profiles 130 is coupled with data store 125. Plurality of video profiles 130 comprises one or more video profiles, for example, video profiles 132a, 132b, . . . , 132n.
More generally, in embodiments in accordance with the present technology, system 100 utilizes metadata 120, such as one or more visual cues 145 and/or audio cues 160 to automatically organize video data 110 by associating video data 110 with a corresponding one of a plurality of video profiles 130. Such a method of organizing video data 110 is particularly useful to match video data 110 with similar video data, without a user having to manually organize the video data 110, thus saving time and resources.
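The overall flow — extract the cues recorded in the video, compare them against the stored profiles, and associate on a sufficiently strong match — might be sketched as follows. The helpers `extract_cues` and `similarity`, and the threshold value, are stand-ins for whatever feature extraction and scoring an implementation actually uses.

```python
def organize(video_path, profiles, extract_cues, similarity, threshold=0.8):
    """Associate a video with the stored profile its intra-video tags best match."""
    cues = extract_cues(video_path)                    # visual/audio cues recorded in the video
    score, best = max(((similarity(cues, p), p) for p in profiles),
                      key=lambda pair: pair[0])
    if score >= threshold:                             # a match must surpass a threshold level
        best.associated_videos.append(video_path)
        return best
    return None                                        # no sufficiently similar profile found
```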
For example, video data 110 may have an intra-video tag of metadata 120. In one embodiment, video data 110 may have an intra-video tag of visual cue 145 such as an object 150, a diamond. In another embodiment, video data 110 may have an intra-video tag of audio cue 160, such as a spoken description of a particular author, “Tom Twain”. In yet another example, video data 110 may have an intra-video tag of more than one object 150, such as a purple pen and a notebook, disposed next to each other.
In one embodiment, a user may cover the lens of a camera and begin video recording, thus generating “dark time” in video data 110, represented by video data “D”. The content of video data “D” resembles a re-enactment of Beethoven's 3rd symphony. This “dark time” is considered to be a break in video “D”. During this “dark time”, the user may include an audio cue 160 within video data “D” by playing sound 180 of a piano note, that of “middle C”. The user may then uncover the lens of the camera while finishing the recording. Metadata 120, including this break in video 155, its associated “dark time”, and the sound of “middle C”, is stored along with plurality of video profiles 130 within data store 125.
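Recognizing a cue such as the note “middle C” (roughly 261.6 Hz) could be approximated by checking the dominant frequency of the audio captured during the dark time. A minimal numpy sketch, assuming mono PCM samples have already been extracted from the recording:

```python
import numpy as np

MIDDLE_C_HZ = 261.63

def dominant_frequency(samples, sample_rate):
    """Return the strongest frequency (Hz) in a mono audio segment."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

def sounds_like_middle_c(samples, sample_rate, tolerance_hz=10.0):
    return abs(dominant_frequency(samples, sample_rate) - MIDDLE_C_HZ) <= tolerance_hz

# A synthetic middle-C tone is recognized:
rate = 44100
t = np.arange(rate) / rate
assert sounds_like_middle_c(np.sin(2 * np.pi * MIDDLE_C_HZ * t), rate)
```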
Referring still to FIG. 1, video comparator 135 compares metadata 120 with plurality of video profiles 130. Plurality of video profiles 130 is stored in data store 125, wherein data store 125 is coupled with system 100, either internally or externally. For example, video comparator 135 compares break in video 155 and its associated “dark time”, and the sound of “middle C”, with plurality of video profiles 130 in order to find a video profile with a matching break in video 155 and its associated “dark time”, and the sound of “middle C”.
Then, video associator 140 associates video data “D” with a corresponding one of the plurality of video profiles 130 based on the comparing. For example, if after comparing, system 100 finds a video profile 132b that matches video data “D”, then video data “D” is associated with video profile 132b. By being associated, video data “D” is placed alongside other videos having similar video profiles. In other words, in one embodiment video data “D” is listed along with a group of one or more other videos that match the video profile of video data “D”.
For example, based on its video profile, video data “D” may be listed with a group of videos, wherein the content of the group of videos includes the following: a child's piano rendition of “Twinkle, Twinkle, Little Star”, a trumpeted salute to a school flag performed by a school band, a German lullaby sung by an aspiring actress, and a lip-synced version of the user's favorite commercial ditty. Of note, each of the group of videos contains the metadata of a break in video and its associated “dark time” and the sound of “middle C”.
In one embodiment, a match is found if the match surpasses a threshold level of similarities and/or differences. A threshold level of similarities and/or differences may be based on any number of variables, such as but not limited to: color, lighting, decibel level, range of tones, movement detection, and association via sound with a particular topic (e.g., colors, numbers, age). For example, even though the spoken words, “purple pen”, are different from the spoken words, “blue pen”, of a video profile, system 100 may still find “purple pen” to match the video profile containing the audio cue of “blue pen”. For instance, a threshold level may be predetermined such that any sound matching a description of a color is to be included within a listing of a group of videos associated with the video profile containing the audio cue of “blue pen”.
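One way such a topic-based threshold could work is with a predetermined table of topics, where cues naming any member of the same topic are treated as matching. The table below is a made-up illustration, not data from the disclosure.

```python
# Hypothetical topic table for illustration only.
TOPICS = {
    "colors": {"blue pen", "purple pen", "red pen"},
    "numbers": {"one", "two", "three"},
}

def cues_match(cue_a, cue_b):
    """Exact match, or both cues fall under the same predetermined topic."""
    if cue_a == cue_b:
        return True
    return any(cue_a in members and cue_b in members for members in TOPICS.values())

assert cues_match("purple pen", "blue pen")   # same topic: colors
assert not cues_match("purple pen", "two")    # different topics
```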
In another embodiment, system 100 associates video data 110 with the corresponding one of plurality of video profiles 130 that most closely matches metadata 120 within video data 110. For example, metadata 120 within video data 110 (represented by video data “E”) may be that of a parrot as object 150. In this example, there exist three video profiles within plurality of video profiles 130: 132a, 132b, and 132c. Video profile 132a includes a frog as object 150. Video profile 132b includes a snake as object 150. Video profile 132c includes a chicken as object 150. System 100 associates video data “E” with video profile 132c, since the chicken of video profile 132c is closest to the metadata of video data “E”, a parrot: a chicken and a parrot both have feathers and share a more similar body type than a parrot and a frog or a parrot and a snake.
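Choosing the closest profile can be framed as a nearest-neighbor search over feature vectors. The sketch below uses cosine similarity over toy three-dimensional vectors (feathers, beak, scales); a real system would use whatever features its extractor produces, and the numbers here are invented for illustration.

```python
import numpy as np
from types import SimpleNamespace

def closest_profile(video_features, profiles):
    """Return the stored profile whose feature vector is most similar."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(profiles, key=lambda p: cosine(video_features, p.features))

# Toy vectors: [feathers, beak, scales]
parrot  = np.array([1.0, 0.9, 0.1])
frog    = SimpleNamespace(name="132a", features=np.array([0.1, 0.0, 0.2]))
snake   = SimpleNamespace(name="132b", features=np.array([0.0, 0.0, 1.0]))
chicken = SimpleNamespace(name="132c", features=np.array([1.0, 0.8, 0.1]))

assert closest_profile(parrot, [frog, snake, chicken]).name == "132c"  # parrot is most like chicken
```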
As described herein, visual cue 145 may be an object, such as a rhinestone. Furthermore, after the rhinestone is used as visual cue 145 once, new videos may be created using the rhinestone as an intra-video tag. For example, the user may create a new video with a recorded visual image of the rhinestone, which gets organized with other videos containing the same intra-video tag of a rhinestone.
In one embodiment, a group of videos on the same topic, making a cake, are considered to be related and all have the intra-video tag of an image of a famous chef covered in flour making his favorite buttery concoction. In another example, a user may provide an intra-video tag of the audio cue, “nine years old”, for each of a group of videos that contain the seemingly unrelated topics of Fred Jones playing a soccer game, Susie Smith entering fourth grade, and Jeff Johnson feeding his new puppy.
In one embodiment, a new video being created, video data “F”, has an intra-video tag comprising more than one item of metadata 120. For example, video data “F” may have the intra-video tag of a skateboard (visual cue 145) and the spoken words, “nine years old” (audio cue 160).
In another embodiment, sound associator 175 associates sound 180 with object 150. In one example, a user records on a first video a purple pen as object 150 as well as the spoken words, “tax preparation”, as sound 180. Sound associator 175 associates sound 180, “tax preparation”, with object 150, the purple pen. In other words, a video profile is created that links the purple pen with the spoken words, “tax preparation”.
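In code, the sound associator's job reduces to recording both cues on the same profile so that either one can later retrieve the group. A minimal sketch, using a plain dictionary as a stand-in for a profile record:

```python
def associate_sound_with_object(profile, sound, obj):
    """Link a spoken sound and a visual object so either cue identifies the group."""
    profile["audio_cues"].append(sound)
    profile["visual_cues"].append(obj)

profile = {"visual_cues": [], "audio_cues": []}   # minimal stand-in for a video profile
associate_sound_with_object(profile, "tax preparation", "purple pen")
```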
Furthermore, each of a group of video conversations related to tax preparation may have an intra-video tag of a “purple pen”. A user wishing to include a new video, video “G”, whose content relates to “conversations of 2008 tax preparation”, within the current group of videos having the intra-video tag of a “purple pen” may simply record within video “G” a visual image of a “purple pen”.
In another embodiment, a user creates a new video having the spoken words, “research project on jewelry”, as its audio cue 160. For example, the user may create a “dark time” in the new video and speak the words, “research project on jewelry”. The video profile of this new video then includes the “dark time” and the spoken words, “research project on jewelry”. In one embodiment, more metadata 120 may be added to this video profile. For example, a visual cue of a diamond may be recorded in the video and linked with the audio cue of the spoken words, “research project on jewelry”.
In one embodiment, object identifier 165 identifies a portion of video data 110 that comprises metadata 120, such as visual cue 145 and/or audio cue 160. Object remover 170 is then able to remove this metadata 120 from video data 110. For example, object identifier 165 identifies the portion of video data 110 that comprises the spoken word, “diamond”. Object remover 170 may then remove the spoken word, “diamond”, from video data 110. Of note, embodiments of the present technology are well suited to enabling removal of metadata 120 at any time, according to preprogrammed instructions or instructions from a user. For example, metadata 120 may be removed before or after video data 110 is shared with others.
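Once the identifier has located the cue's span, removal amounts to cutting that span out of the stream before re-encoding. A minimal sketch over an already-decoded frame list, with the span format matching the (start, end) indices a detector such as the dark-time scan above might report:

```python
def remove_segment(frames, span):
    """Drop the frames holding the cue; `span` is a (start, end) index pair."""
    start, end = span
    return frames[:start] + frames[end:]

frames = list(range(100))                        # stand-in for decoded frames
assert len(remove_segment(frames, (10, 25))) == 85
```

A production system would perform the equivalent cut with a video toolchain (for example, trimming and concatenating with ffmpeg) and would cut the matching span from the audio track as well.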
In yet another embodiment, system 100 matches more than one object 150, such as a pencil and a notebook with a video profile containing both of these objects.
Referring to 210 of FIG. 2, in one embodiment, video data 110 comprising metadata 120 is received. Metadata 120 is then compared with plurality of video profiles 130, and video data 110 is associated with a corresponding one of plurality of video profiles 130 based on the comparing, as described herein.
Thus, embodiments of the present technology provide a method for organizing video data without any manual interaction by a user. Additionally, embodiments provide a method for automatic organizing of video data based on video and/or audio cues. Furthermore, embodiments of the present technology enable a user to automatically associate video data with videos containing matching video data, thus requiring no manual interactions when the user uploads the video data for sharing. Additionally, portions of the video data enabling this organizing may be identified and removed before the video data is uploaded.
With reference now to FIG. 3, an example computer system 300 used in accordance with embodiments of the present technology is shown. System 300 of FIG. 3 includes an address/data bus 304 for communicating information, and one or more processors 306A, 306B, and 306C coupled to bus 304 for processing information and instructions.
System 300 also includes computer usable non-volatile memory 310, e.g., read only memory (ROM), coupled to bus 304 for storing static information and instructions for processors 306A, 306B, and 306C. Also present in system 300 is a data storage unit 312 (e.g., a magnetic or optical disk and disk drive) coupled to bus 304 for storing information and instructions. System 300 also includes an optional alpha-numeric input device 314, including alphanumeric and function keys, coupled to bus 304 for communicating information and command selections to processor 306A or processors 306A, 306B, and 306C. System 300 also includes an optional cursor control device 316 coupled to bus 304 for communicating user input information and command selections to processor 306A or processors 306A, 306B, and 306C. System 300 of embodiments of the present technology also includes an optional display device 318 coupled to bus 304 for displaying information.
Referring still to FIG. 3, system 300 is also well suited to having a cursor directed by other means such as, for example, voice commands. System 300 also includes an I/O device 320 for coupling system 300 with external entities.
Referring still to FIG. 3, computing system 300 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the present technology. Neither should computing environment 300 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the example computing system 300.
Embodiments of the present technology may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Embodiments of the present technology may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer-storage media including memory-storage devices.
Referring to 405 of FIG. 4, in one embodiment, first video data and second video data are received, wherein the second video data comprises metadata providing an intra-video tag of the first video data.
For example, a user creates two videos. The first video data “H” contains a video of the user's wedding dress. The second video data “I” contains a recording of a wedding ring. The user is then able to upload the first video data “H” and the second video data “I”, and organize first video data “H” based on the metadata of second video data “I”, the wedding ring.
For example, the first video data “H” is received. A second video data “I” is also received, wherein second video data “I” comprises metadata 120 that provides an intra-video tag, as described herein, of the first video data “H”. In essence, second video data “I” represents the metadata of first video data “H” for organizational purposes. In one embodiment, the first video data “H” comprises the second video data “I”.
Additionally, in another embodiment, the user decides to create a third video, video data “J”, of the flower arrangement for the wedding. According to embodiments of the present technology, the user is able to upload the third video data “J” and organize third video data “J” based on the metadata of second video data “I”, the wedding ring.
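The distinguishing point of this embodiment is that the cues driving the organization come from a companion video rather than from the video being organized. A sketch, again treating `extract_cues` and `similarity` as hypothetical helpers:

```python
def organize_by_companion(video_path, companion_path, profiles, extract_cues, similarity):
    """Organize one video using the intra-video tags recorded in a companion video."""
    cues = extract_cues(companion_path)           # e.g., the wedding-ring recording in video "I"
    best = max(profiles, key=lambda p: similarity(cues, p))
    best.associated_videos.append(video_path)     # video "H" (or "J") joins the matched group
    return best
```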
In another embodiment, visual cue 145 is utilized as metadata 120 to organize first video data “H”. In yet another embodiment, audio cue 160 is utilized as metadata 120 to organize first video data “H”.
Thus, embodiments of the present technology enable the organizing of video data without manual interaction. Such a method of organizing is particularly useful for sorting large numbers of videos in a short period of time.
Although the subject matter has been described in a language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.