This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2014-054208, filed on Mar. 17, 2014, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a recording medium storing an extraction program, an extraction method, and an extraction device.
Hitherto, services exist that distribute footage captured of a soccer game, both as a live distribution and as a distribution of Video On Demand (VOD) content. In such services, the extraction of a particular scene from the footage is sometimes desired.
For example, a technique exists that recognizes an exciting scene when a sharp rise in sound intensity is detected in the audio data corresponding to the footage.
There has also been a proposal to identify highlight segments in video including a frame sequence.
Japanese National Publication of International Patent Application No. 2008-511186
In the scene extraction described above, the extraction of breaks in play, for example, is sometimes desired.
According to an aspect of the embodiments, a non-transitory recording medium stores an extraction program that causes a computer to execute a process. The process includes detecting a transition between a close-up captured image and a long distance captured image from captured images obtained by capturing a sports game, and, based on the detected transition, extracting a timing corresponding to a break in play in the sports game, or extracting a captured image corresponding to a break in play in the sports game.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Detailed explanation follows below regarding examples of exemplary embodiments of the technology disclosed herein with reference to the drawings. In each of the exemplary embodiments below, explanation is given regarding an example in which an extraction device of the technology disclosed herein is applied to a footage distribution system that distributes captured footage of a soccer match.
As illustrated in the drawings, the footage distribution system 10 according to the present exemplary embodiment includes an extraction device 20 and a distribution device 30, connected to each other through a network.
First, detailed description is given regarding each section of the distribution device 30.
The editing section 32 acquires captured footage of a soccer match (referred to as “captured footage” hereafter). Note that in the present exemplary embodiment, the footage is media including video data and audio data. The video data included in the captured footage is captured at a frame rate such as 30 fps or 60 fps, and includes plural frames. Each of the frames is associated with time data indicating the elapsed time since capture of the captured footage started. The captured footage also includes audio data containing audio such as cheering in the match stadium, and the audio of commentators, live reports, and the like. The audio data is time-sequenced data indicating a sound intensity level at respective sampling points. Each sampling point is associated with time data that synchronizes with the time data associated with each frame of the video data, such that the audio data and the video data are synchronized with each other.
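As an illustrative aid (not part of the embodiment itself), the time-synchronized structure described above might be modeled as follows; all class, field, and function names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class VideoFrame:
    elapsed_ms: int   # time data: elapsed time since capture started
    pixels: bytes     # placeholder for the frame's image buffer

@dataclass
class AudioSample:
    elapsed_ms: int   # synchronized with the video frames' time data
    intensity: float  # sound intensity level at this sampling point

def audio_for_frame(frame, samples, tolerance_ms=17):
    """Return the audio samples falling within one frame period.
    At 30 fps, consecutive frames are ~33 ms apart, so a tolerance of
    about half a frame period pairs each sample with one frame."""
    return [s for s in samples
            if abs(s.elapsed_ms - frame.elapsed_ms) <= tolerance_ms]
```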
The editing section 32 applies editing instructions, instructed through operation by an operator using a display device and an input device not illustrated in the drawings, to each frame of the video data included in the captured footage using image processing. Editing instructions include, for example, adding an overlay 80 displaying the game state, as illustrated in the drawings. The editing section 32 transmits the edited footage to the extraction device 20.
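By way of a hedged sketch of the kind of image processing involved, the following Python/OpenCV code draws a simple game-state overlay onto one frame; the layout, colors, and function name are illustrative assumptions, since the overlay 80 itself is defined only in the drawings:

```python
import cv2

def add_game_state_overlay(frame, text, org=(40, 60)):
    """Draw a simple game-state overlay (e.g. score and elapsed time)
    onto one video frame; returns the edited frame."""
    edited = frame.copy()
    # Semi-transparent background band so the text stays readable.
    band = edited.copy()
    cv2.rectangle(band, (20, 20), (420, 90), (0, 0, 0), thickness=-1)
    edited = cv2.addWeighted(band, 0.5, edited, 0.5, 0)
    cv2.putText(edited, text, org, cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (255, 255, 255), 2, cv2.LINE_AA)
    return edited
```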
The distribution section 34 acquires footage to which metadata generated by the extraction device 20 has been appended (referred to as “metadata appended footage” hereafter; detailed description given below). The metadata appended footage is converted to distribution footage, according to specified standards, and distributed to the distribution destination terminal (omitted from illustration in the drawings) by the distribution section 34.
Next, detailed description is given regarding each section of the extraction device 20.
The detection section 22 acquires the edited footage transmitted from the distribution device 30. The detection section 22 first detects cut transitions based on the video data included in the edited footage. A cut refers to a continuous portion of the video data captured at the same angle. Specifically, the detection section 22 computes the difference between each frame and the previous frame. The inter-frame difference is, for example, computed as the sum of the differences in pixel data between pairs of corresponding pixels in the respective frames. A frame for which the difference from the previous frame exceeds a specific threshold value is detected as the leading frame of a cut, and the immediately preceding frame is detected as the final frame of the preceding cut, as illustrated in the drawings.
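A minimal sketch of such inter-frame-difference cut detection, assuming OpenCV and NumPy are available; using the mean rather than the raw sum of pixel differences, and the particular threshold value, are illustrative choices, not the embodiment's own parameters:

```python
import cv2
import numpy as np

def detect_cut_boundaries(video_path, threshold=30.0):
    """Detect cut boundaries as frames whose mean absolute pixel
    difference from the previous frame exceeds a threshold.
    Returns the indices of the leading frames of cuts."""
    cap = cv2.VideoCapture(video_path)
    boundaries = []
    prev = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            # Difference between corresponding pixels of consecutive frames.
            diff = np.mean(cv2.absdiff(gray, prev))
            if diff > threshold:
                boundaries.append(index)  # leading frame of a new cut
        prev = gray
        index += 1
    cap.release()
    return boundaries
```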
In the consecutive frames 82, 84, 86, and 88 illustrated in the drawings, the frame at which the camera angle changes produces a large difference from the previous frame; that frame is detected as the leading frame of a cut, and the previous frame is detected as the final frame of the preceding cut.
Moreover, the detection section 22 may designate the group of frames from the leading frame to the final frame of a detected cut as a group of frames representing one cut. Then, the detection section 22 determines whether at least one frame included in the group of frames representing one cut is a frame captured at long distance (referred to as a “long distance frame” below) or a frame captured close-up (referred to as a “close-up frame” below). Note that long distance frames and close-up frames are examples of the long distance capture video and the close-up capture video of the technology disclosed herein. This determination may, for example, be performed by employing an identification model for identifying long distance frames and close-up frames. The identification model may be generated by, for example, training on plural long distance frames and plural close-up frames like those illustrated in the drawings. The detection section 22 detects a transition between a long distance frame and a close-up frame based on the determination result.
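One plausible realization of such an identification model, sketched under the assumption that a color-histogram feature and a pre-trained classifier (e.g. from scikit-learn) are used; long distance shots of a pitch tend to be dominated by green, close-ups far less so. The majority vote over a cut's frames is a swapped-in simplification, not the embodiment's stated rule:

```python
import cv2
import numpy as np

def frame_features(frame, bins=16):
    """Hue/saturation histogram feature vector for one frame."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins],
                        [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def classify_cut(frames, model):
    """Label one cut (a group of frames) as 'long' or 'close-up' using
    a pre-trained identification model with an sklearn-style predict()."""
    votes = model.predict([frame_features(f) for f in frames])
    # Majority vote over the frames of the cut.
    return "long" if np.mean(votes == "long") >= 0.5 else "close-up"
```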
The extraction section 24 extracts a break in play based on the transitions between long distance frames and close-up frames detected by the detection section 22. Generally, in footage of a soccer game, play is captured at long distance, and at each break between single plays a transition is made to a close-up cut, such as of the demeanor of a player on the pitch or on the bench. Accordingly, a location where a transition is made between a long distance frame and a close-up frame is the start or the end of a single play. Note that a single play refers to a continuous flow of play, from when a ball that was temporarily stopped due to a foul, the ball going out, or the like is put back in play by a free-kick, a throw-in, or the like, up until the next stoppage. Note that starting and stopping of the ball here does not refer strictly to movement of the ball itself, but rather to the status of the ball at the recommencement or suspension of play.
Specifically, as illustrated in the drawings, the extraction section 24 extracts the long distance frame at a transition from a close-up frame to a long distance frame as a frame representing the start of a single play, and extracts the long distance frame at a transition from a long distance frame to a close-up frame as a frame representing the end of the single play. The extraction section 24 then extracts the group of frames from the frame representing the start of the single play to the frame representing the end of the single play as a group of frames representing the single play.
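A minimal sketch of this extraction logic, assuming each cut has already been labeled 'long' or 'close-up' as described above; the function name and representation are illustrative:

```python
def extract_plays(cut_labels):
    """Given the time-ordered labels of each cut ('long' or 'close-up'),
    return (start_cut, end_cut) index pairs, one per single play.
    A play starts at a close-up -> long transition and ends at the
    long -> close-up transition that follows."""
    plays = []
    start = None
    for i in range(1, len(cut_labels)):
        prev_label, label = cut_labels[i - 1], cut_labels[i]
        if prev_label == "close-up" and label == "long":
            start = i                     # first long distance cut of the play
        elif prev_label == "long" and label == "close-up" and start is not None:
            plays.append((start, i - 1))  # last long distance cut of the play
            start = None
    return plays

# Example: cuts labeled close-up, long, long, close-up yield one play
# spanning cuts 1..2.
assert extract_plays(["close-up", "long", "long", "close-up"]) == [(1, 2)]
```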
The generation section 26 generates metadata indicating a break in play based on the group of frames extracted by the extraction section 24. Specifically, the generation section 26 generates metadata that associates data indicating the start of a single play with time data associated with a frame representing the start of the single play. Moreover, the generation section 26 generates metadata associating data indicating the end of a single play with time data associated with a frame representing the end of the single play.
The generation section 26 generates a metadata file that stores the plural generated metadata in the sequence of the time data included in the metadata. The metadata file may be generated as, for example, a comma-separated values (CSV) format file. An example of a metadata file is illustrated in the drawings.
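Since the exact column layout of the metadata file is shown only in the drawings, the following is a hypothetical sketch of writing such a CSV file, pairing each frame's time data with data indicating the start or end of a single play:

```python
import csv

def write_metadata_file(plays, path="metadata.csv"):
    """Write one row per break in play. `plays` is a list of
    (start_time, end_time) strings such as
    ('00:05:12.300', '00:05:40.633'), already in time order."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for start_time, end_time in plays:
            writer.writerow([start_time, "start"])
            writer.writerow([end_time, "end"])
```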
Note that although explanation is given here regarding a case in which the metadata is generated using time data associated with frames, the metadata may be generated using other data that identifies each frame, such as frame numbers.
The generation section 26 appends the generated metadata file to the edited footage, and transmits the result to the distribution device 30 as metadata appended footage.
The extraction device 20 may, for example, be implemented by a computer 40 as illustrated in the drawings. The computer 40 includes a CPU 42, memory 44, a storage section 46, and a network interface (I/F) 48, connected to one another through a bus.
The storage section 46 may be implemented by a hard disk drive (HDD), flash memory, or the like. An extraction program 50 that causes the computer 40 to function as the extraction device 20 is stored in the storage section 46 that serves as a recording medium. The CPU 42 reads the extraction program 50 from the storage section 46, expands the extraction program 50 into the memory 44, and sequentially executes the processes included in the extraction program 50.
The extraction program 50 includes a detection process 52, an extraction process 54, and a generation process 56. The CPU 42 operates as the detection section 22 illustrated in the drawings by executing the detection process 52, operates as the extraction section 24 by executing the extraction process 54, and operates as the generation section 26 by executing the generation process 56. The computer 40 executing the extraction program 50 thereby functions as the extraction device 20.
The distribution device 30 may be implemented by, for example, a computer 60 illustrated in the drawings. The computer 60 includes a CPU 62, memory 64, a storage section 66, and a network interface (I/F) 68, connected to one another through a bus.
The distribution device 30 and the extraction device 20 are connected through the network I/F 68 of the distribution device 30, the network, and the network I/F 48 of the extraction device 20.
The storage section 66 may be implemented by a HDD, flash memory, or the like. A distribution program 70 that causes the computer 60 to function as the distribution device 30 is stored in the storage section 66 that serves as a recording medium. The CPU 62 reads the distribution program 70 from the storage section 66 and expands the distribution program 70 into the memory 64, and sequentially executes the processes included in the distribution program 70.
The distribution program 70 includes an editing process 72 and a distribution process 74. The CPU 62 operates as the editing section 32 illustrated in the drawings by executing the editing process 72, and operates as the distribution section 34 by executing the distribution process 74. The computer 60 executing the distribution program 70 thereby functions as the distribution device 30.
The extraction device 20 and the distribution device 30 may each be implemented by, for example, a semiconductor integrated circuit, and more specifically by an application specific integrated circuit (ASIC) or the like.
Explanation next follows regarding the operation of the footage distribution system 10 according to the present exemplary embodiment. When the distribution device 30 is input with captured footage, the distribution device 30 executes the editing processing illustrated in the drawings and outputs edited footage. When the extraction device 20 is input with the edited footage, the extraction device 20 executes the extraction processing illustrated in the drawings and outputs metadata appended footage. When the distribution device 30 is input with the metadata appended footage, the distribution device 30 executes the distribution processing illustrated in the drawings and outputs distribution footage.
First, at step S10 of the editing processing illustrated in the drawings, the editing section 32 acquires the captured footage and applies the editing instructions, instructed by the operator, to the video data using image processing. The editing section 32 then transmits the resulting edited footage to the extraction device 20, and the editing processing ends.
Next, at step S20 of the extraction processing illustrated in the drawings, the detection section 22 acquires the edited footage transmitted from the distribution device 30.
Next, at step S22 the detection section 22 computes differences for each frame compared to the respective previous frame, so as to detect the leading frame and final frame of a cut based on frames for which the difference compared to the previous frame exceeds a specific threshold value.
Next, at step S24 the detection section 22 designates a group of frames, including from the leading frame until the final frame of the detected cut, as a group of frames representing a single cut. Then, the detection section 22 determines whether at least one frame included in the group of frames representing the single cut is a long distance frame, or a close-up frame. The detection section 22 detects a transition between a long distance frame and a close-up frame based on the determination result.
Next, at step S26, the extraction section 24 extracts the long distance frame at a transition from a close-up frame to a long distance frame as a frame representing the start of a single play. Moreover, the extraction section 24 extracts the long distance frame at a transition from a long distance frame to a close-up frame as a frame representing the end of the single play. Namely, the extraction section 24 extracts the group of frames from the frame representing the start of the single play to the frame representing the end of the single play as the group of frames representing the single play.
Next, at step S28, the generation section 26 generates metadata that associates data indicating the start of the single play with the time data associated with the frame representing the start of the single play extracted by the extraction section 24. Moreover, the generation section 26 generates metadata that associates data indicating the end of the single play with the time data associated with the frame representing the end of the single play extracted by the extraction section 24.
Next, at step S30, the generation section 26 generates a metadata file storing the plural metadata generated at step S28 in the sequence of the time data included in the metadata. The generation section 26 then appends the generated metadata file to the edited footage, transmits the result to the distribution device 30 as metadata appended footage, and the extraction processing ends.
Next, at step S40 of the distribution processing illustrated in the drawings, the distribution section 34 acquires the metadata appended footage, converts the metadata appended footage to distribution footage according to the specified standards, and distributes the distribution footage to the distribution destination terminal, whereupon the distribution processing ends.
As explained above, according to the extraction device 20 of the present exemplary embodiment, from frames of video data included in captured footage of a soccer game, a transition between a long distance frame and a close-up frame is detected, and the transition is extracted as a break in play. This thereby enables extraction of a break in play from captured footage of a soccer game.
In each of the exemplary embodiments above, employing the metadata appended footage, to which metadata indicating the start and end of each extracted single play is appended, enables easy location of the respective single-play scenes in the sports game footage (the captured footage or the edited footage) based on the metadata. Moreover, during footage distribution, a supplementary service, such as the transmission of email to a user, may be performed automatically in coordination with breaks in play, based on the metadata.
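As a hedged illustration of how such metadata might be consumed, the following sketch reads back the hypothetical CSV layout used in the earlier example and pairs start and end rows, so that a player or supplementary service can seek directly to each single play:

```python
import csv

def load_play_segments(path="metadata.csv"):
    """Pair each 'start' row with the following 'end' row, yielding
    (start_time, end_time) per single play."""
    segments, pending_start = [], None
    with open(path, newline="") as f:
        for time_data, kind in csv.reader(f):
            if kind == "start":
                pending_start = time_data
            elif kind == "end" and pending_start is not None:
                segments.append((pending_start, time_data))
                pending_start = None
    return segments

# A player could seek to segments[n][0] to show the n-th single play,
# or a service (e.g. an email notification) could be triggered whenever
# playback reaches an 'end' time, i.e. a break in play.
```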
Although explanation has been given of examples in which a footage distribution system includes a distribution device and an extraction device in each of the exemplary embodiments above, there is no limitation thereto. Each of the functional sections of the distribution device, and each of the functional sections of the extraction device may be implemented by a single computer.
Although explanation has been given of cases in which footage appended with metadata indicating the starts and ends of single plays, generated by the extraction device, is distributed by the distribution device in each of the exemplary embodiments above, there is no limitation thereto. For example, the metadata appended footage may be saved as a large volume archive, and the respective single plays extracted and output based on the metadata.
The output from the extraction device may also be employed in applications other than footage distribution systems.
Although explanation has been given above of modes in which the extraction program 50 is pre-stored (installed) in the storage section 46, and the distribution program 70 is pre-stored (installed) in the storage section 66, these programs may instead be provided in a format recorded on a recording medium such as a CD-ROM or a DVD-ROM.
One aspect exhibits the advantageous effect of enabling extraction of breaks in play from captured footage of a soccer game.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the technology disclosed herein have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.