A transmission method in a video apparatus connected to a database system is disclosed. The database system comprises a database of visual entities possibly associated with metadata. A receiving method in the database system is also disclosed. The invention further relates to a corresponding video apparatus and a corresponding database system.
When watching a video, it is known to enhance the video content with additional data known as metadata. These metadata can be digital data as well as textual data that are associated with either video segments (i.e. groups of successive frames) or segments within frames (i.e. surfaces made up of contiguous pixels, such as pixel blocks). Usually, metadata are associated with specific visual entities in the video content. Such visual entities are usually made up of segments within frames that have some semantic meaning and that appear on at least a couple of successive frames. As an example depicted on
The purpose of the invention is to overcome at least one of the disadvantages of the prior art. A transmission method in a video apparatus connected to a database system of visual entities is disclosed. The method comprises:
The method further comprises receiving metadata associated with one of the first and second visual entities when the other one of the first and second entities is selected.
According to a specific embodiment, the method further comprises after selecting the first visual entity, transmitting a first request to the database system to check for the presence of the first visual entity in the database system.
According to a specific embodiment, the method further comprises, after selecting the second visual entity, sending a second request to the database system to check for the presence of the second visual entity in the database system.
Advantageously, selecting the first visual entity comprises pressing down to select the first visual entity.
According to a specific characteristic of the invention, selecting a second visual entity comprises, after pressing down to select the first visual entity, dragging towards the second visual entity, releasing pressure and further pressing down to select the second visual entity.
According to an aspect of the invention, sending a request to the database system to check for the presence of a visual entity in the database system comprises sending at least one graphical feature determined from the visual entity.
According to a specific characteristic, the at least one graphical feature is a set of color histograms determined by dividing the visual entity into small blocks and computing a color histogram for each of the blocks.
A receiving method in a database system of visual entities associated with metadata, the database system being connected to a video apparatus, is also disclosed. The receiving method comprises:
According to an aspect of the invention, linking the first visual entity and the second visual entity comprises associating any metadata of one of the first and second visual entities with the other one of the first and second entities.
Advantageously, the method further comprises receiving from the video apparatus a request to check for the presence of the first visual entity in the database system, checking the presence of the first visual entity in the database system upon reception of the request and adding the first visual entity in the database system when not present.
Advantageously, the method further comprises receiving from the video apparatus a request to check for the presence of the second visual entity in the database system, checking the presence of the second visual entity in the database system upon reception of the request and adding the second visual entity in the database system when not present.
According to an aspect of the invention, receiving a request to check for the presence of a visual entity comprises receiving at least one graphical feature determined from the visual entity.
According to a specific characteristic, the at least one graphical feature is a set of color histograms determined by dividing the visual entity into small blocks and computing a color histogram for each of the blocks.
According to a specific embodiment, checking the presence of a visual entity in the database system comprises comparing the received at least one graphical feature with each graphical feature associated with each visual entity of the database system.
A video apparatus connected to a database system comprising a database of visual entities is disclosed. The video apparatus comprises:
A database system of visual entities associated with metadata, the database system being connected to a video apparatus, is also disclosed. The database system comprises:
A video system comprising a database system of visual entities associated with metadata connected to at least one video apparatus is also disclosed.
Other features and advantages of the invention will appear with the following description of some of its embodiments, this description being made in connection with the drawings in which:
In the figures, the represented boxes are purely functional entities, which do not necessarily correspond to physical separated entities. As will be appreciated by one skilled in the art, aspects of the present principles can be embodied as a system, method or computer readable medium. Accordingly, aspects of the present principles can take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, and so forth), or an embodiment combining software and hardware aspects that can all generally be referred to herein as a “circuit,” “module”, or “system.” Furthermore, aspects of the present principles can take the form of a computer readable storage medium. Any combination of one or more computer readable storage medium(s) may be utilized.
The flowchart and/or block diagrams in the figures illustrate the configuration, operation and functionality of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, or blocks may be executed in an alternative order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of the blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. While not explicitly described, the present embodiments may be employed in any combination or sub-combination.
In reference to
In reference to
In a step 12, a first visual entity VE1 is selected in the video content. Specifically, the video apparatus connected to the database receives a selection of a first visual entity VE1, e.g. made by a user. Indeed, a selection is initiated by a user but received by the video apparatus. The visual entity VE1 is for example selected by a mouse click. More precisely, the user presses down on a mouse button to select the first visual entity VE1. According to a variant, the user directly presses down on a touch screen to select the first visual entity VE1. The first visual entity may also be selected by voice command or gesture command. In a step 14, a second visual entity VE2 is selected in the video content. Specifically, the video apparatus receives a selection of a second visual entity VE2, e.g. made by a user. The second visual entity VE2 is selected in the same way as the first visual entity, i.e. either by a mouse click, by directly tapping on the touch screen, by voice command or by gesture command. After clicking on VE1 at step 12, the selection of VE2 may also be made either by dragging a representation of VE1 or the cursor onto VE2 and then releasing pressure, or by dragging a representation of VE1 or the cursor onto VE2, then releasing pressure and finally clicking/tapping on VE2 to confirm the selection. In this last case, if the final clicking/tapping to confirm the selection occurs far away from the point of pressure release, i.e. at a distance above a threshold value, then the whole process is cancelled. According to a variant, if the time delay between steps 12 and 14 is above a given threshold value, then the whole process is cancelled.
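The cancellation rules above (a confirmation tap too far from the pressure-release point, or too long a delay between the two selections) can be sketched as follows; the threshold values and function names are illustrative assumptions, not taken from the source:

```python
import math

# Illustrative thresholds (the source only states that thresholds exist).
MAX_CONFIRM_DISTANCE = 50.0   # pixels between release point and confirmation tap
MAX_SELECTION_DELAY = 10.0    # seconds between selecting VE1 and VE2

def is_association_valid(release_point, confirm_point, t_select_ve1, t_select_ve2):
    """Return True if the VE1 -> VE2 association gesture should be accepted."""
    dx = confirm_point[0] - release_point[0]
    dy = confirm_point[1] - release_point[1]
    # Cancel if the confirmation tap is too far from the pressure-release point.
    if math.hypot(dx, dy) > MAX_CONFIRM_DISTANCE:
        return False
    # Cancel if too much time elapsed between the two selections.
    if t_select_ve2 - t_select_ve1 > MAX_SELECTION_DELAY:
        return False
    return True
```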
In a step 16, one information item relative to an association of said first visual entity VE1 with said second visual entity VE2 is transmitted to the database system. The information item is for example a simple request to associate in the database both entities.
According to an improved embodiment, the transmission method further comprises, at a step 13 after selecting the first visual entity, transmitting/sending a first request to the database system to check for the presence of the first visual entity VE1 in the database system. Indeed, the first visual entity is possibly a new visual entity not yet recorded in the database. If not present, VE1 is added to the database with its graphical feature so that it can later be recognized. In the same way, the method further comprises, at a step 15 after selecting the second visual entity, sending a second request to said database system to check for the presence of the second visual entity in the database system. According to a specific embodiment of the invention, sending a request to the database system to check for the presence of a visual entity, either the first or the second visual entity, in the database system comprises sending at least one graphical feature or, more generally, a descriptive feature (e.g. position within frames) determined from the visual entity. As an example depicted on
If the second visual entity VE2 is not present in the database, it is a new visual entity VE2 and it is inserted into the database with its graphical feature. To be later recognized as a database entry it has to get a sufficiently discriminative yet generic description. Such a description is for example the set of color histograms.
According to a variant, the steps 12 and 14 are operated first. Then steps 13, 15 and 16 are merged into a single step. More precisely, VE1 is selected first, then VE2 is selected. Finally, a single request is transmitted/sent to the database for checking for the presence in the database of VE1 and VE2 (adding them if necessary with their graphical features) and for linking both entities. According to another variant, only steps 15 and 16 are merged into a single step, i.e. a single request is transmitted to the database to check for the presence in the database of VE2 (adding it if necessary with its graphical feature) and to link both entities.
Later on, when a user selects one visual entity, e.g. VE1, in one of the video apparatuses 20, 30, 40, etc., connected to the database system, he receives the metadata associated with the selected visual entity and the metadata associated with any one of the visual entities linked with the selected visual entity in the database system.
In the database 110, metadata, graphical features and links can be stored as three simple maps:
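The three maps can be sketched as plain dictionaries keyed by visual entity identifiers; the identifiers, field names and feature values below are illustrative, not taken from the source:

```python
# Metadata map: visual entity identifier -> associated metadata.
metadata = {
    "ID_1": {"label": "player A", "team": "home"},
    "ID_2": {"label": "jersey 10"},
}

# Graphical feature map: identifier -> feature
# (here, a grid of per-block color histograms).
features = {
    "ID_1": [[{"red": 12, "blue": 3}]],   # one block with its color histogram
    "ID_2": [[{"white": 9, "blue": 6}]],
}

# Link map: identifier -> identifiers of the linked visual entities.
links = {
    "ID_1": ["ID_2"],
    "ID_2": ["ID_1"],
}
```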
In reference to
At a step 22, an information item (e.g. a request to link two entities) relative to an association of a first visual entity with a second visual entity is received by the database system from said video apparatus.
At a step 24, the first visual entity and the second visual entity are linked upon reception of the information in the database. According to a specific embodiment, linking the first visual entity and the second visual entity comprises associating any metadata of one of said first and second visual entities with the other one of said first and second entities. The link is for example created as a list of pairs of visual entity identifiers as in table 1. According to a variant, each pair is duplicated with reversed first and second components in order to ease the search in the database. As an example, the pair (ID_1, ID_2) is also stored as (ID_2, ID_1).
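Storing each pair together with its reversed duplicate can be sketched as follows; the function names are illustrative:

```python
def add_link(pairs, id_a, id_b):
    """Record a link between two visual entities; the pair is stored
    in both orders so that lookups from either entity are easy."""
    pairs.append((id_a, id_b))
    pairs.append((id_b, id_a))  # reversed duplicate, as in the variant above
    return pairs

def linked_to(pairs, entity_id):
    """Return the identifiers of the visual entities linked with entity_id."""
    return [b for (a, b) in pairs if a == entity_id]
```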
According to a variant, the links are created as a map or dictionary connecting visual entity identifiers together. Such a dictionary is for example defined as a hash map, e.g. {ID_1: [ID_2, ID_5], ID_2: [ID_1, ID_5], ID_3: [ID_4], ID_4: [ID_3], ID_5: [ID_1, ID_2]}.
According to yet another variant, single linked chains or double linked chains of visual entities are stored in the database. On
According to an improved embodiment, the method comprises receiving from the video apparatus a first request to check for the presence of the first visual entity in the database system, checking the presence of the first visual entity in the database system upon reception of the request and adding the first visual entity with its graphical feature in the database system when not present. In the same way, the method further comprises receiving from the video apparatus a second request to check for the presence of the second visual entity in the database system, checking the presence of the second visual entity in the database system upon reception of the request and adding the second visual entity with its graphical feature in the database system when not present. According to a variant, a single request is received from the video apparatus by the database system to check for the presence of the second visual entity and to further link the two entities. According to yet another variant a single request is received to check for the presence of both visual entities and to further link the two entities.
As an example, receiving a request to check for the presence of a visual entity comprises receiving at least one graphical feature determined from the visual entity. The graphical feature is the representation of the visual entity and is for example constituted of a set of color histograms determined by dividing the visual entity into image blocks and computing a color histogram for each of the image blocks. Another graphical feature is, for example, a set of color regions obtained by color segmentation. Another descriptive feature is the size and position of the visual entity within the frame.
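A minimal sketch of the block-histogram feature above, assuming the visual entity is given as a 2-D grid of quantized color values (the block size and names are illustrative):

```python
def block_color_histograms(pixels, block_size):
    """Divide a rectangular grid of quantized color values into square
    blocks and compute a color histogram (color -> count) per block."""
    height, width = len(pixels), len(pixels[0])
    histograms = []
    for by in range(0, height, block_size):
        row = []
        for bx in range(0, width, block_size):
            hist = {}
            for y in range(by, min(by + block_size, height)):
                for x in range(bx, min(bx + block_size, width)):
                    color = pixels[y][x]
                    hist[color] = hist.get(color, 0) + 1
            row.append(hist)
        histograms.append(row)
    return histograms
```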
Checking for the presence of a visual entity in the database system comprises comparing the received graphical feature with each graphical feature associated with each visual entity of the database system. The graphical feature associated with each visual entity of the database system is preferably stored in the database with the visual entity and its metadata. Therefore, for checking whether a visual entity selected by the video apparatus is already stored in the database, the DBMS compares the received graphical feature with the graphical features of all the visual entities of the database. If the DBMS finds a visual entity stored in the database whose graphical feature is close, in the sense of a certain distance, to the graphical feature received, this means that the visual entity is already stored. Otherwise, a new visual entity is added in the database with the received graphical feature.
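The presence check performed by the DBMS can be sketched as a linear scan over the stored features, assuming a feature distance function and a recognition threshold (both are illustrative assumptions):

```python
def find_entity(database_features, query_feature, distance, threshold):
    """Compare the received feature with every stored feature and return
    the identifier of the closest stored visual entity, or None when no
    stored feature is closer than the threshold (i.e. the entity is new)."""
    best_id, best_dist = None, None
    for entity_id, stored_feature in database_features.items():
        d = distance(stored_feature, query_feature)
        if best_dist is None or d < best_dist:
            best_id, best_dist = entity_id, d
    if best_dist is not None and best_dist < threshold:
        return best_id
    return None  # not present: caller adds a new entry with the feature
```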
The distance between two color histograms is for example determined according to the following equation:
(i, j) are the coordinates of the block whose color histograms are compared. The core of this function is d_{i,j}, the distance between colors, for which any well-known distance can be used (L1, L2, Euclidean, etc.).
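The equation itself is not reproduced in this text; one plausible form consistent with the definitions above, taking the L1 distance as an example, is the following (this reconstruction is an assumption, not the source's formula):

```latex
d_{i,j} = \sum_{c} \left| H_{i,j}(c) - H'_{i,j}(c) \right|
```

where $H_{i,j}$ and $H'_{i,j}$ are the color histograms of block $(i, j)$ in the two visual entities, and the sum runs over the color bins $c$.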
Therefore, each histogram of the visual entity to be checked is compared with a spatially corresponding histogram of a visual entity in the database.
Once a distance between color histograms is computed for each block, an overall distance is computed between both visual entities as a weighted function of all the block distances:
with n and m the counts of blocks (rows and columns) describing the object.
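The weighted combination is likewise not reproduced in this text; a plausible form, with the normalization by the total weight being an assumption, is:

```latex
D = \frac{\displaystyle\sum_{i=1}^{n} \sum_{j=1}^{m} W_{i,j}\, d_{i,j}}{\displaystyle\sum_{i=1}^{n} \sum_{j=1}^{m} W_{i,j}}
```

where $W_{i,j}$ is the weight of block $(i, j)$ and $d_{i,j}$ the per-block histogram distance defined above.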
If the overall distance is below a threshold value, then the visual entity is recognized as already stored in the database.
Advantageously, the weights W_{i,j} are defined such that the more external blocks have lower weights. Table 2 below shows an example of such weights.
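The actual Table 2 is not reproduced in this text; purely as an illustration of the stated principle (lower weights for the more external blocks), such a weight matrix could be generated as follows, with the formula being an assumption:

```python
def center_weights(n, m):
    """Build an n x m weight matrix where blocks closer to the center of
    the visual entity get higher weights and the most external blocks
    (those on the border) get the lowest weight."""
    weights = []
    for i in range(n):
        row = []
        for j in range(m):
            # Distance (in blocks) to the nearest border; center blocks are
            # farthest from every border and thus get the largest weight.
            row.append(1 + min(i, n - 1 - i, j, m - 1 - j))
        weights.append(row)
    return weights
```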
According to a variant, the representation of the visual entity is constituted of a pyramidal structure of color histograms as depicted on
Later on, in an optional step 26, the database system possibly transmits both the metadata associated with a selected visual entity and the metadata associated with any one of the visual entities linked with the selected visual entity in the database system. The selection of the visual entity is made in anyone of the video apparatus 20, 30, 40, etc.
At a step 32 in the video receiver, a user selects a first visual entity VE1, for example by pressing and holding (with a finger or a mouse button) over the first visual entity VE1. Specifically, the video receiver receives a selection of a first visual entity VE1, i.e. the one made by the user.
At a step 33 in the video receiver, VE1 graphical features (e.g. colors, shape, gradients, etc.) are extracted from the current frame and a first request is sent to the database system.
At a step 34 in the database system, the presence of the visual entity VE1 in the database is checked by comparing the received graphical feature with the graphical features of the visual entities stored in the database. If the visual entity VE1 is not found to be present in the database, then VE1 is added in the database as a new entry with its graphical feature.
At a step 35 in the video receiver, a user selects a second visual entity VE2, for example by clicking on VE2 or by successively dragging VE1 within the video onto VE2 and releasing pressure. According to a variant, the selection of the second entity VE2 is made by successively dragging VE1 within the video onto VE2, releasing pressure and then clicking on VE2 to confirm the selection. Specifically, the video receiver receives a selection of a second visual entity VE2, i.e. the one made by the user.
At a step 36 in the video receiver, VE2 graphical features (e.g. colors, shape, gradients, etc.) are extracted from the current frame and a second request is sent to the database system.
At a step 37 in the database system, the presence of the visual entity VE2 in the database is checked by comparing the received graphical feature with the graphical features of the visual entities stored in the database. If the visual entity VE2 is not found to be present in the database, then it is added in the database as a new entry with its graphical feature.
At a step 38 in the video receiver, one information item relative to an association of the first visual entity VE1 with the second visual entity VE2 is transmitted to the database system. According to a variant, the step 38 is merged with step 36. In this case, at step 36, VE2 graphical features (e.g. colors, shape, gradients, position, etc.) are extracted from the current frame and a second request is transmitted/sent to the database system to check for the presence of the visual entity VE2 in the database and further to link VE1 and VE2. According to yet another variant, steps 33, 36 and 38 are merged into a single step. In this case, VE1 and VE2 graphical features are extracted from the current frame and a request is transmitted/sent to the database system to check for the presence of the visual entities VE1 and VE2 in the database and further to link VE1 and VE2.
At a step 39 in the database system, VE1 and VE2 are linked.
Later on, in any video receiver connected to the database system, when a user selects VE1 or VE2 to get metadata, he receives both the metadata associated with the selected visual entity and all the metadata associated with visual entities different from the selected visual entity, but linked with it in the database. As depicted on
Each of these elements of
The architecture of the video apparatus 20 and 30 is identical to the architecture described above for apparatus 40.
When a first user is associating two visual entities, the database system thus receives information on this association and thus links the two entities, i.e. creates a link between the two entities. As soon as visual entities are linked in the database, any information/metadata associated with one of these entities is immediately associated with the other one. Consequently, when, later on, a user, i.e. either the first user or another one, selects any one of the two entities, he receives all the metadata associated with both the first visual entity and the second visual entity. Therefore, the second user, when selecting the second entity, is not limited to the reception of the metadata associated with this second entity, but also receives the metadata associated with the first visual entity.
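The retrieval behavior described above can be sketched as follows, using the metadata and link maps introduced earlier; the names are illustrative:

```python
def metadata_for_selection(metadata, links, selected_id):
    """Return the metadata of the selected visual entity together with
    the metadata of every visual entity linked with it in the database."""
    result = [metadata.get(selected_id)]
    for linked_id in links.get(selected_id, []):
        result.append(metadata.get(linked_id))
    return [m for m in result if m is not None]
```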
A small database of such links is then constructed on the fly and moreover shared between users who act then in a collaborative manner. The database contains visual entities associated with metadata and links between visual entities as created upon user's selection.
This database is available for the user during any interactions with the video document. Furthermore, the database system is advantageously used in a collaborative way, each user being connected to friends or communities, sharing their links and associated information with others. Some policy may be proposed to ensure minimal coherence of the database, e.g. the mostly linked together entities over the community will be shared first.
Finally, this database may be associated with the displayed document and provided with it at any replay time (e.g. VoD, catch-up TV, etc.). One can imagine providing a database version according to the user profile or to some community description. A community is defined based on user profiles and centers of interest. As an example, for a football/tennis game, two communities may be defined, e.g. one for each team. The experience for the user is thus enhanced. All the users thus contribute to an overall task and benefit from others' contributions.
Number | Date | Country | Kind
---|---|---|---
13305071.6 | Jan 2013 | EP | regional
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2014/050593 | 1/14/2014 | WO | 00