This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2014-140481, filed Jul. 8, 2014, the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an image processor, a method and a program.
There is a need to check the contents of an entire moving image or set of still images and to efficiently access an image of a topic of interest. For example, in the educational field, a technology is well known that divides a lecture video, shot by a video camera while the lecturer uses a slide projector, into a plurality of topics in accordance with major changes in the contents of the slides, generates topic images showing the content of each topic, and displays a list of the topic images. A user can view the topic images and thereby easily find a topic to watch.
Since the conventional technology is based on the premise that slides showing the lesson contents are projected, the video can be divided into topics in accordance with major changes in the contents of the slides. However, the conventional technology cannot divide into topics images of, for example, a blackboard on which writing and erasing are repeated and the contents change from moment to moment. The problem may arise not only in the educational field, but whenever video (a moving image or a set of still images) undivided into chapters is watched. The problem may also arise in, for example, a moving image of a blackboard, shot by a video camera, that describes the progress of public engineering works.
Various embodiments will be described hereinafter with reference to the accompanying drawings.
In general, according to one embodiment, an image processor includes a writing amount detector and an end timing detector. The writing amount detector detects a writing amount in an image. The end timing detector detects an end timing of writing based on the writing amount detected by the writing amount detector.
The embodiments of the image processor can be implemented by various devices such as desktop or laptop general-purpose computers, portable general-purpose computers, other portable information devices, information devices including an imaging device, smart phones and other information processors. In the description below, a laptop general-purpose computer is described as an example. The laptop general-purpose computer (not shown) is constituted by a computer body and a display unit attached to the body by a hinge so as to be openable and closable. The computer body has a thin box-shaped housing. A keyboard, a power button, a touchpad, a speaker, etc., are arranged on the upper surface of the housing. An LCD panel is incorporated into the display unit.
The CPU 12 is a processor that controls operations of various components mounted on the general-purpose computer. The CPU 12 executes various types of software loaded from the nonvolatile storage device 20 to the main memory 16. The software includes an operating system (OS) 16a, an automatic chaptering application program 16b, etc. The automatic chaptering application program 16b analyzes video content, detects topic end timings and divides the video content into a plurality of chapters according to topics.
The CPU 12 also executes a basic input/output system (BIOS) stored in the BIOS-ROM 18. The BIOS is a program for hardware control.
The system controller 14 is a device that connects the CPU 12 with various components. The system controller 14 includes a memory controller that executes access control of the main memory 16. The main memory 16, the BIOS-ROM 18, the storage device 20, the optical disk drive 22, the display controller 26, the sound controller 28, the wireless communication device 30, the embedded controller 34, etc., are connected to the system controller 14.
The display controller 26 controls an LCD 42. The display controller 26 transmits a display signal to the LCD 42 under the control of the CPU 12. The LCD 42 displays a screen image based on the display signal. The sound controller 28 is a controller that processes an audio signal. The sound controller 28 controls audio output of the speaker 44. The wireless communication device 30 is configured to connect to a network by executing wireless communication such as wireless LAN conforming to the IEEE 802.11g standard, 3G mobile communication, or short-range wireless communication such as near-field communication (NFC). The LAN interface 32 is configured to connect to the network by executing, for example, wired communication conforming to the IEEE 802.3 standard. The embedded controller 34 is a one-chip microcomputer including a controller for power management. The embedded controller 34 has a function of powering the general-purpose computer on and off in accordance with operation of the power button by a user (not shown). A keyboard/mouse 46 is connected to the embedded controller 34.
Next, the automatic chaptering application program 16b is schematically described. The automatic chaptering application program 16b is often used together with a video viewing application to access desired information in video of, for example, a lecture where the lecturer projects presentation slides or a lesson where the teacher writes on a blackboard or a whiteboard. The blackboard and the whiteboard are hereinafter collectively called blackboards. The video to be processed is not limited to a moving image, but may be a set of still images. In addition, the video is not limited to educational video, but may be video of a meeting or arrangement using a blackboard. When video of a lecture or lesson undivided into chapters is viewed, the automatic chaptering application program 16b can compute an end timing of each topic, divide the video into a plurality of topics, i.e., chapters, so that the beginning of each chapter can be found, and display a snapshot near each topic end timing as a thumbnail representative of the topic. Therefore, the contents of the entire video can be checked efficiently.
Conventionally, video of a scene where a slide displaying the same contents for a certain time is projected can be divided into chapters in accordance with changes of the slide contents. However, video of, for example, a blackboard on which writing and erasing are repeated and the contents change from moment to moment cannot be divided into chapters, since topic end timings cannot be detected. In contrast, the automatic chaptering application program 16b extracts writing blocks from the video, computes a writing amount of the blocks and computes, based on the writing amount, an end timing (i.e., a start/end point of a topic) showing that writing of a topic is temporarily stopped.
The time-series image acquisition module 52 acquires time-series images to be subjected to automatic chaptering processing from the input signal. The time-series images to be processed are time-series images obtained by capturing a scene where a lecturer delivering a lecture or a chair presiding at a meeting writes characters on a blackboard or whiteboard. If the input signal is encoded in the MPEG format, the signal is decoded in the time-series image acquisition module 52 and the original time-series images are thereby extracted. Each frame image or each field image of the time-series images is accompanied by time data. The time data is used in a background/writing block extraction module 54, a structuring processor 58 and a chapter image generation module 60. In the background/writing block extraction module 54 and the structuring processor 58, writing blocks and writing areas to be described later are computed based on the time data. In the chapter image generation module 60, an image having time data of the end timing may be determined as a chapter image.
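As an illustration only, the following Python sketch shows one way such time-series images with accompanying time data could be acquired, assuming OpenCV is available; the function name, the sampling step, and the use of the decoder's millisecond position are assumptions, not the actual implementation of the time-series image acquisition module 52.

```python
import cv2

def acquire_time_series(path, step=25):
    """Decode a video file into time-series images, keeping the time
    data that accompanies each frame (here, the decoder position in
    seconds). Every `step`-th frame is kept to reduce the load."""
    cap = cv2.VideoCapture(path)
    frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            t = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
            frames.append((t, frame))
        index += 1
    cap.release()
    return frames
```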
The time-series images to be processed are input to the background/writing block extraction module 54. The extraction module 54 analyzes the time-series images, extracts a background in each frame and extracts writing blocks from the background. The background is the largest area (in particular, a blackboard) on which the lecturer can write characters, and is extracted by finding the largest area having pixels of a color unchanged for a long time. In the time-series images, a frame is not necessarily filled with the blackboard, but objects other than the blackboard (for example, the wall of a room) may also be seen.
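A minimal sketch of one way such background extraction could be realized, assuming NumPy and OpenCV and a fixed camera; the stability threshold, the 80% persistence fraction, and the use of a temporal median are illustrative assumptions. Pixels whose color is unchanged for a long time dominate the median, and keeping only the largest connected stable region discards other unchanging objects such as the wall of a room.

```python
import cv2
import numpy as np

def estimate_background(frames, stability_thresh=12):
    """Estimate the background image and its mask from a window of frames.
    Returns (background image, boolean mask of the largest stable area)."""
    stack = np.stack(frames, axis=0).astype(np.float32)   # (T, H, W, 3)
    median = np.median(stack, axis=0)
    # A pixel is "stable" if it rarely deviates from its temporal median.
    deviation = np.abs(stack - median).max(axis=3)        # (T, H, W)
    stable = (deviation < stability_thresh).mean(axis=0) > 0.8
    # Keep only the largest connected stable region (the blackboard).
    n, labels = cv2.connectedComponents(stable.astype(np.uint8))
    if n <= 1:
        return median.astype(np.uint8), stable
    largest = 1 + np.argmax([(labels == i).sum() for i in range(1, n)])
    return median.astype(np.uint8), labels == largest
```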
A writing block is constituted by positional data and time data indicative of a position and a time, respectively, at which an area different from the background is expressed by a writing action. In other words, a start time and an end time of a period when a pixel value is different from the background are described per area. The positional data (area) may be expressed in a unit of pixel or by an area of a certain size including pixels different from the background.
Considering a processing load of the blocks in the subsequent stage, the positional data may be expressed in a unit of character, word or line, not pixel. For example, the writing blocks are expressed as follows.
(s1, xb1, yb1) − (e1, xb1, yb1)
(s2, xb2, yb1) − (e2, xb2, yb1)
. . .
where s is a start time, e is an end time and xb and yb are a set of coordinates in the area. The above example shows that the area (xb1, yb1) has a pixel value different from the background from time s1 to time e1 and the area (xb2, yb1) has a pixel value different from the background from time s2 to time e2.
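The notation above maps naturally onto a small record type. The following is a hypothetical Python representation of a writing block, offered only as an illustration of the positional and time data described here, not a structure defined by the embodiment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WritingBlock:
    """(s, xb, yb) - (e, xb, yb): the area at (xb, yb) holds a pixel
    value different from the background from time s to time e."""
    start: float            # s: time at which the stroke appears
    end: Optional[float]    # e: time at which it is erased (None if present)
    xb: int                 # area coordinates; the unit may be a pixel,
    yb: int                 # character, word or line
```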
If writing blocks are detected, a time-series locus of a writing action and a writing image series at a certain time can be extracted.
If a teacher is seen in the video, writing blocks must be distinguished from the teacher, since the area occupied by the teacher also has pixel values different from the background. Once writing blocks are detected, the positions of the blocks do not change until they are erased. In contrast, the teacher is moving, and thus the position of the teacher changes over time. Based on this difference, the extraction module 54 distinguishes between the writing blocks and the teacher.
The background and the writing blocks extracted in each frame are input to the end computation module 56. The writing blocks are also input to the structuring processor 58. The structuring processor 58 integrates the writing blocks input from the extraction module 54 into writing areas based on the unity of time and space, and outputs the writing areas to the end computation module 56. The unity of time indicates a set of temporally-consecutive writing blocks, and can be expressed as a significant unit. The unity of space indicates a set of writing blocks whose writing pixels are adjacent to each other, and can likewise be expressed as a significant unit. For example, the structuring processor 58 may integrate a plurality of writing blocks into writing areas based on writing directions. The structuring processing is executed because the principle of end computation differs between the case of writing about one topic using the entire blackboard and the case of dividing the blackboard into several areas and writing about a plurality of topics in the respective areas.
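To make the integration concrete, here is a rough sketch of grouping writing blocks into writing areas by the unity of time and space, reusing the hypothetical WritingBlock record above; the greedy single-link grouping and both thresholds are assumptions, not the structuring processor's actual method.

```python
def group_into_areas(blocks, max_gap=5.0, max_dist=1):
    """Greedily merge writing blocks into writing areas: a block joins an
    area if it is spatially adjacent (within max_dist area units) and
    temporally close (start times within max_gap seconds) to any block
    already in that area."""
    areas = []                                # each area is a list of blocks
    for b in sorted(blocks, key=lambda blk: blk.start):
        for area in areas:
            if any(abs(b.xb - o.xb) <= max_dist and
                   abs(b.yb - o.yb) <= max_dist and
                   abs(b.start - o.start) <= max_gap
                   for o in area):
                area.append(b)
                break
        else:
            areas.append([b])
    return areas
```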
The end computation module 56 computes a writing amount in the time-series images based on the background and the writing blocks input from the background/writing block extraction module 54 and/or the writing areas input from the structuring processor 58, and computes, based on the writing amount, an end timing showing that writing of a topic is temporarily stopped. Whether the end computation module 56 uses the background and the writing blocks or the writing areas should preferably be determined depending on the type of time-series images to be processed. The type of images relates to whether the blackboard is used as a whole or per area, as described above. If the type is known in advance, the switching may be performed by the user, or automatically by including type information as attribute information of the contents. If the type is unknown, the writing areas may be used. Moreover, not only one of the two, but both of them may be used.
The writing amount can be computed as a ratio of writing blocks to the background and/or a ratio of writing blocks to a writing area obtained by integrating the writing blocks. General methods of writing on the blackboard include a method using the entire blackboard and a method dividing the blackboard in half. In the former method, when contents are written to fill the blackboard, all the written contents are erased and then new contents are written. In the latter method, the following process is repeated: when contents are written to fill the blackboard divided in half, the contents in the left half are first erased and new contents are written to fill the left half, and then the contents in the right half are erased and new contents are written to fill the right half. The writing amount is often computed correctly by the ratio of writing blocks to the background in the former method, and by the ratio of writing blocks to a writing area in the latter method.
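For illustration, the ratio could be computed from boolean pixel masks as follows; passing the whole background mask or a single writing-area mask as the reference region corresponds to the two methods just described. This is a sketch under those assumptions, not the module's actual code.

```python
import numpy as np

def writing_amount(writing_mask, region_mask):
    """Ratio of writing pixels to a reference region: pass the background
    mask when the entire blackboard is used, or one writing-area mask
    when the blackboard is divided into areas."""
    region_pixels = int(region_mask.sum())
    if region_pixels == 0:
        return 0.0
    return int((writing_mask & region_mask).sum()) / region_pixels
```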
The writing amount increases as the teacher writes on the blackboard. When the blackboard has little or no space to write, writing space is often newly secured by erasing all or part of the written contents. Therefore, the writing amount increases over time, but if there is little or no writing space and the existing writing blocks are erased, the writing amount is temporarily reduced. The increase rate of the writing amount decreases as the writing space becomes smaller. If the writing space runs out, the writing amount stops increasing and is saturated. After that, if the existing writing blocks are erased, the writing amount decreases. Therefore, the end computation module 56 computes at least one of a timing when the writing amount is maximum, a timing when the writing amount reaches a predetermined value (for example, 80%), and a timing when the writing amount is substantially saturated (i.e., when the change rate becomes lower than a threshold value) as an end timing. The basis for computation should preferably be determined depending on the type of time-series images to be processed. The type of images is determined based on, for example, whether the contents are frequently and partly erased and rewritten, or are written using the entire blackboard and infrequently erased. If the type is known in advance, the switching may be performed by the user, or automatically by including type information as attribute information of the contents.
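As a hedged illustration of the first criterion (a timing when the writing amount is maximum, revealed when erasing then reduces the amount), the following sketch scans a writing-amount time series; the drop threshold is an assumption, and the predetermined-value and saturation criteria could be handled by similar tests in the same loop.

```python
def detect_end_timings(amounts, times, drop=0.1):
    """Detect end timings as the peaks of the writing-amount series:
    writing increases the amount, erasing reduces it, so a timing where
    the amount stops increasing and then falls by more than `drop`
    marks the temporary end of writing on a topic."""
    endings, peak_i = [], 0
    for i in range(1, len(amounts)):
        if amounts[i] > amounts[peak_i]:
            peak_i = i                        # still writing: track the peak
        elif amounts[peak_i] - amounts[i] > drop:
            endings.append(times[peak_i])     # erased: the peak was an end
            peak_i = i
    return endings
```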
The output of the end computation module 56 and the time-series images acquired in the time-series image acquisition module 52 are supplied to the chapter image generation module 60. The chapter image generation module 60 divides the time-series images into a plurality of chapters based on the end timings. The chapter image generation module 60 generates a chapter image of each chapter and displays the chapter images on the LCD 42 so that the user can select a position in the time-series images from which reproduction starts. The chapter image is a representative image that expresses the contents of the chapter. For example, an image including the writing blocks and writing areas used for computing the end timing may be determined as the chapter image, since this image has the largest amount of information. Instead, an image including the first set of writing blocks written without interruption after the previous end timing, in which information such as a title or theme is expressed, may be determined as the chapter image.
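A simple sketch of how the division and the chapter-image choice might look, reusing the end timings and per-frame times from the earlier sketches; picking the frame nearest each end timing approximates "the image having time data of the end timing" and is an assumption of this illustration.

```python
def make_chapters(end_timings, frame_times):
    """Divide the time series into chapters at the end timings and pick,
    for each chapter, the frame nearest its end timing as the chapter
    image, i.e., the frame with the largest amount of written content."""
    chapters, start = [], 0.0
    for end in end_timings:
        image_index = min(range(len(frame_times)),
                          key=lambda i: abs(frame_times[i] - end))
        chapters.append({"start": start, "end": end,
                         "image_index": image_index})
        start = end
    return chapters
```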
The LCD 42 can display a plurality of chapter images. When any one of the chapter images is selected by the keyboard/mouse 46, the time-series images are reproduced from a position corresponding to the selected chapter image. To implement such reproduction, the time-series images are supplied to a time-series image reproduction module 62, and chapter designation data indicative of the selected chapter is input from the keyboard/mouse 46 to the time-series image reproduction module 62. Since an end timing is a timing of the end of a lecture on a topic, if reproduction is started from the end timing, the lecture immediately shifts to the next topic. Therefore, reproduction may be started from the end timing immediately preceding the selected end timing.
As described above, by computing end timings based on writing blocks extracted from images of, for example, a blackboard whose contents change from moment to moment, start/end points of topics can be obtained even for such images. Therefore, the time-series images can be divided into chapters at the end timings, the whole of the time-series images can be understood in a short time by viewing representative images of the chapters, and images of a desired topic can be immediately reproduced.
The above is a basic configuration of the present embodiment, which will be hereinafter described in detail with examples.
In Example 1, writing blocks are extracted from a moving image of a lecture using a blackboard, an end timing showing that writing of a topic is temporarily stopped is computed based on the writing blocks, and the moving image is divided into chapters according to the end timings.
The background/writing block extraction module 54 analyzes images input from the time-series image acquisition module 52 and extracts a background and writing blocks.
The background area includes not only the writing blocks expressed by writing actions, but also an occlusion block where the writing blocks and the background are hidden behind the writer. A spatiotemporal analysis is one way to distinguish between the writing blocks and the occlusion block. On the assumption that the field of view of the imaging camera is fixed, the position of the writer causing the occlusion moves over time, whereas the writing blocks expressed by the writing actions do not move until erased. Focusing on this point, images of the background are spatiotemporally analyzed over a certain time as shown in the drawings.
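A minimal sketch of such a spatiotemporal test, assuming a fixed camera and per-frame boolean difference masks against the background; the persistence threshold is illustrative rather than part of the embodiment.

```python
import numpy as np

def split_writing_and_occlusion(diff_masks, persistence=0.9):
    """diff_masks: boolean (H, W) masks, one per frame in the analysis
    window, marking pixels that differ from the background. Pixels that
    differ in nearly every frame are writing (strokes do not move until
    erased); pixels that differ only part of the time are attributed to
    the moving writer (occlusion)."""
    stack = np.stack(diff_masks, axis=0)     # (T, H, W)
    ratio = stack.mean(axis=0)               # fraction of frames differing
    writing = ratio >= persistence
    occlusion = (ratio > 0) & ~writing
    return writing, occlusion
```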
The structuring processor 58 integrates a plurality of writing blocks input from the background/writing block extraction module 54 into a writing area in consideration of the unity of time and space, and outputs the writing area to the end computation module 56.
The end computation module 56 computes a writing amount by using the background and the writing blocks input from the background/writing block extraction module 54 and/or the writing areas input from the structuring processor 58, and computes a timing when the writing amount reaches a maximum value or a predetermined value or a timing when the writing amount is substantially saturated (i.e., when the change rate of the writing amount becomes lower than a threshold value), as an end timing showing that writing of a topic is temporarily stopped.
The writing of the theme area W1 is completed before the writing of the affirmative area W2 and the negative area W3. Each time a new argument is found, writing blocks are added to the affirmative area W2 and the negative area W3. In this example, the contents once written in the three writing areas W1, W2 and W3 are not erased except to correct an error. Therefore, the writing amount being almost unchanged can be used as the predetermined condition to compute an end timing. Thus, as shown in the drawings, a timing when the writing amount of each writing area becomes almost unchanged is computed as an end timing.
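Because the three areas are not erased, the peak-based sketch shown earlier does not apply; saturation of the writing amount can serve as the condition instead. A rough illustration, with the window size and tolerance as assumptions:

```python
def detect_saturation_timing(amounts, times, window=50, eps=0.005):
    """Return the first timing at which the writing amount of an area has
    been almost unchanged over `window` samples, i.e., writing of the
    topic in that area has temporarily stopped."""
    for i in range(window, len(amounts)):
        if abs(amounts[i] - amounts[i - window]) < eps:
            return times[i]
    return None
```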
As described above, on the basis of a writing amount computed from the ratio of writing blocks extracted from a moving image to the background and/or the ratio of those writing blocks to a writing area that is a set of the writing blocks, a timing when writing in the writing blocks and/or the writing areas is temporarily stopped can be computed. Therefore, start/end points of topics can be detected in images showing a writing process in which the contents change from moment to moment, and the images can be divided into a plurality of chapters according to the topics. By sequentially reproducing the start points of the chapters, the end point of each topic alone can be efficiently viewed, the whole of the time-series images can be understood in a short time, and images of a desired topic can be immediately found.
In addition, when extracting the writing blocks, the writing blocks can be distinguished from the occlusion block depending on whether the positions of areas having pixel values different from that of the background temporally change or not. Therefore, the correct writing amount can be computed.
Example 2 aims to facilitate selection of a chapter to be reproduced in a moving image of a lecture using a blackboard, by extracting writing blocks and writing areas, computing, based on them, end timings showing that writing of a topic is temporarily stopped, dividing the moving image into chapters according to the end timings, and displaying chapter images representative of the respective chapters. Example 2 is the same as Example 1 except that the chapter images are generated in the chapter image generation module 60, a screen for selecting a chapter is displayed on the LCD 42, and reproduction is started from a timing corresponding to the selected chapter image. Therefore, only the differences from Example 1 are described in detail.
In the list display shown in the drawings, chapter images representative of the respective chapters are arranged on the LCD 42, and the user can select a desired chapter image with the keyboard/mouse 46.
As described above, a chapter that the user is interested in and desires to view can be immediately reproduced by computing sets of writing blocks extracted from a moving image (i.e., writing areas), computing a timing when writing is temporarily stopped as an end timing, showing the user a list of chapter images representative of the chapters obtained by dividing the moving image according to the end timings, and allowing the user to select a chapter image. Therefore, the desired chapter alone can be efficiently viewed without reproducing all the time-series images. Since each end timing is computed by using a writing amount computed from the ratio of writing blocks to a writing area, the images also include areas unrelated to the computation of the end timings. In this example, such unrelated areas are excluded when a combined chapter image is generated by combining the images related to the end timings. Therefore, when a list of chapter images is displayed for selection of the beginning of a chapter, only the areas actually used for the end computation are combined and displayed. The list of chapter images can thereby be displayed efficiently even on a device having a small screen.
In Example 3, only a writing area corresponding to an end timing is highlighted in blackboard contents of a lecture. Example 3 is different from Example 2 in that the writing area related to the computation of the end timing is highlighted when a chapter image corresponding to the end timing is generated by the chapter image generation module 60. Therefore, the chapter image generation module 60 alone is hereinafter described in detail.
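One way such highlighting could be rendered, as an assumption-laden OpenCV sketch: dim everything outside the writing area related to the end timing and frame the area. The rectangle format and colors are illustrative, and area_rect is assumed to be derived from the structuring processor's output.

```python
import cv2
import numpy as np

def highlight_area(chapter_image, area_rect, dim=0.4):
    """Highlight the writing area related to the end timing by dimming
    the rest of the chapter image and outlining the area.
    area_rect = (x, y, w, h) in pixel coordinates."""
    x, y, w, h = area_rect
    out = (chapter_image.astype(np.float32) * dim).astype(chapter_image.dtype)
    out[y:y + h, x:x + w] = chapter_image[y:y + h, x:x + w]
    cv2.rectangle(out, (x, y), (x + w, y + h), (0, 255, 255), thickness=2)
    return out
```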
In the above embodiment, the general-purpose computer executes all the processing. However, as shown in the drawings, the processing may instead be executed by a system in which a user device 82 is connected via a network 84 to a server 86 having access to a database 88.
The user device 82 requests a list of images of an educational material from the server 86 via the network 84. The server 86 requests images of the educational material from the database 88 and receives the images from the database 88. The server 86 executes the automatic chaptering application program and executes the processing shown in the drawings.
The same effect and advantage as the embodiment can also be achieved by such a structure.
The procedures described in the above embodiment can be executed by a program, that is, software. If a general-purpose computer system in which the program is stored reads the program, the same advantages as those of the image processor of the above embodiment can be achieved. The procedures described in the above embodiment are stored, as a program that can be executed by a computer, in a storage medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disk (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, etc.), a semiconductor memory or the like. The data may be stored in any format as long as the storage medium is readable by a computer or an embedded system. If the computer reads the program from the storage medium and the CPU carries out the instructions described in the program, the same operation as the image processor of the embodiment can be implemented. Of course, the computer may also acquire or read the program via the network.
In addition, each procedure of the present embodiment may be partly executed by the operating system (OS), middleware such as database management software or network software, etc., that operate based on instructions of the program installed from the storage medium into the computer or the embedded system.
Furthermore, the storage medium of the present embodiment is not necessarily independent of the computer or the embedded system. The storage medium also includes a storage medium that downloads and stores or temporarily stores a program transmitted via the LAN, the Internet, etc.
Moreover, the processing of the present embodiment is not necessarily executed by means of a single storage medium, but may be executed by means of a plurality of storage media.
The computer or the embedded system of the present embodiment executes each procedure of the present embodiment based on the program stored in the storage medium, and may be a single device such as a personal computer or a microcomputer, a system in which a plurality of devices are connected through a network, or the like.
The computer of the present embodiment is not limited to a personal computer, but includes an arithmetic processing unit, a microcomputer, etc., included in an information processing device, and is a generic name for a device that can implement the functions of the present embodiment by a program.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind
--- | --- | --- | ---
2014-140481 | Jul. 2014 | JP | national