The present invention relates to an information processing device, a generation method, and a storage medium.
At worksites such as factories, with the aim of improving operations, problems are extracted by measuring and visualizing working hours to evaluate variations in the working hours and comparing work performed by different persons.
Furthermore, as one of work detection methods, there has been a method of generating a region of interest (ROI) according to work for a moving image of a worksite captured by an imaging device, such as a camera, and detecting work of a person based on entrance of the person into the region of interest.
Furthermore, there has been known a technique of performing clustering related to movements of persons (e.g., Patent Documents 1 and 2).
Patent Document 1: Japanese Laid-open Patent Publication No. 2017-090965, Patent Document 2: International Publication Pamphlet No. WO 2011/013299.
According to an aspect of the embodiments, an information processing device includes one or more memories; and one or more processors coupled to the one or more memories and the one or more processors configured to: specify, from a moving image obtained by imaging work of a person, a first plurality of stationary positions at which the person is stationary and a movement order in which the person moves through the first plurality of stationary positions, divide the first plurality of stationary positions into a first plurality of clusters by clustering the first plurality of stationary positions, when a cluster included in the first plurality of clusters includes a pair of stationary positions with a relationship of a movement source and a movement destination in the movement order, divide a second plurality of stationary positions included in the cluster into a second plurality of clusters by clustering the second plurality of stationary positions, and generate a region of interest in the moving image based on the second plurality of clusters.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
A region of interest may be manually generated while viewing a moving image obtained by imaging work of a person, for example. However, since it takes time and effort to manually generate a region of interest, there is a demand for a technique capable of automatically generating a region of interest highly accurately.
In one aspect, an object of the present invention is to provide a technique capable of automatically generating a region of interest highly accurately.
It becomes possible to automatically generate a region of interest highly accurately.
Hereinafter, several embodiments of the present invention will be described in detail with reference to the drawings. Note that corresponding elements in a plurality of drawings are denoted by the same reference sign.
As described above, as one of work detection methods, there has been a method of generating a region of interest for each work area in a moving image of a worksite captured by an imaging device and detecting work of a person based on entrance of the person into the region of interest. Such a detection method is suitable for, for example, use at worksites where a person produces a product while moving through a plurality of work locations, such as cellular manufacturing and a job shop system.
Furthermore, the region of interest may be manually generated while viewing the moving image obtained by imaging the work of the person, for example. However, since it takes time and effort to manually generate a region of interest, there is a demand for a technique capable of automatically generating a region of interest highly accurately.
Here, for example, it is conceivable to generate, assuming that the person is stationary during the work, the region of interest based on a stationary position by detecting the stationary position at which the person is stationary from the moving image obtained by imaging the work.
In one example, a plurality of the stationary positions of the person is detected from moving images obtained by imaging a plurality of types of work. Note that, in the moving images to be used to detect the stationary positions, a plurality of work operations performed by the same person may be captured, or work performed by a plurality of persons may be captured.
In the example of
However, when the stationary positions are clustered, the obtained cluster may not correspond to the work area of the actual work.
For example, within the shooting range of the moving image captured by an imaging device, stationary positions associated with different types of work may be clustered into one cluster when the work areas associated with those different types of work are close to each other.
Furthermore, for example, in the moving image, a length of an object varies depending on a distance from the imaging device used for imaging.
Therefore, for example, when a plurality of stationary positions of persons detected from the moving image is clustered, the difference in distance between stationary positions detected from persons performing different types of work at positions distant from the imaging device may be so small that the positions are detected as one cluster. Alternatively, for example, a plurality of stationary positions detected from a person working in a work area close to the imaging device may vary so widely that the positions are detected as a plurality of clusters.
In
As described above, variations of the stationary positions for one work may differ depending on the position of the work location in the image, the work content, and the like. As a result, stationary positions for one work may be divided into a plurality of clusters, and stationary positions for a plurality of types of work may be grouped into one cluster. Consequently, when the stationary positions are clustered, the obtained cluster may not correspond to the work area of the actual work, which may make it difficult to automatically generate a region of interest corresponding to the work based on the cluster of the stationary positions.
In the embodiment to be described below, clustering is performed in such a manner that a cluster obtained by clustering a plurality of stationary positions of a person detected from a moving image does not include a pair of stationary positions having a relationship of a movement source and a movement destination in the movement order in which the person moves through the plurality of stationary positions. Accordingly, it becomes possible to divide the stationary positions into clusters that may be associated with the work area highly accurately. Then, it becomes possible to automatically generate a region of interest associated with the work area highly accurately based on the obtained clusters. Hereinafter, the embodiment will be described in more detail.
The information processing device 501 generates a region of interest based on moving images captured by the imaging device 502, for example. In one example, the information processing device 501 may be coupled to the imaging device 502, and may obtain moving images from the imaging device 502. In another example, the information processing device 501 may obtain moving images captured by the imaging device 502 via another device.
As described above, for example, it is conceivable to generate, assuming that a position of a person is stationary during work, a region of interest based on a stationary position by detecting the stationary position at which the person is stationary from a moving image obtained by imaging the work. Note that it is conceivable that the stationary position may slightly differ depending on the person, and the stationary position of the same person may slightly differ for each work operation. Therefore, for example, a plurality of work operations performed by the same person may be captured and work performed by a plurality of persons may be captured in the moving images.
Then, the control unit 601 detects a person from the captured moving image. The person may be detected using, for example, a known human detection technique. In one example, human detection may be carried out using a technique using local feature values such as Histogram of Oriented Gradients (HOG), OpenPose, or the like. Subsequently, the control unit 601 detects, for example, a position at which the detected person remains stationary while satisfying a predetermined condition as a stationary position. Examples of further details of the stationary position detection will be described later.
Then, for example, it is conceivable to cluster a plurality of stationary positions detected from the moving images, and to use, as a work area, each of obtained clusters of the stationary positions for region of interest generation.
Then, in the embodiment, the control unit 601 also obtains information regarding the movement order in which a person moves between stationary positions at the time of detecting the stationary positions from moving images. For example, in
In this case, the control unit 601 may cluster the plurality of detected stationary positions by dividing them into two, which is the minimum number of divisions.
Here, it is conceivable that, when a person completes a certain work operation and shifts to another work operation, the person moves from the stationary position of the certain work operation to the stationary position of the other work operation, for example. Accordingly, the control unit 601 determines whether or not a pair of stationary positions having a relationship of a movement source and a movement destination is included in the cluster. When such a pair is included in the same cluster, it may be considered that stationary positions associated with a plurality of types of work are mixed in the cluster. Accordingly, when a pair of stationary positions having a relationship of a movement source and a movement destination is included in the cluster, the control unit 601 further performs clustering by dividing the cluster into two. On the other hand, when no such pair is included in the cluster, the control unit 601 may end the clustering of the cluster.
For example, in
On the other hand, in
As described above, by performing the clustering using the information regarding the movement order, it becomes possible to divide the stationary positions into clusters well associated with work areas. Then, it becomes possible to generate a work area and a corresponding region of interest highly accurately based on the clusters of the stationary positions obtained by the division.
For example, when a pair of stationary positions having a relationship of a movement source and a movement destination is included in the same cluster, it is considered that stationary positions associated with a plurality of types of work are mixed in the cluster, and the cluster is further divided. Accordingly, it becomes possible to highly accurately separate a stationary position of certain work from a stationary position of another work at a close distance. For example, at a worksite where a person produces a product while moving through a plurality of work locations, such as the cellular manufacturing and the job shop system, a flow line may be designed to minimize a movement distance of the person during execution of a series of work operations to enhance work efficiency. Even in such a case, with the clustering according to the embodiment, it becomes possible to highly accurately divide stationary positions associated with two adjacently arranged work areas, and to highly accurately and automatically generate a region of interest based on the clusters of divided stationary positions.
Moreover, in the example of
Note that, while the labels N of the stationary positions are present in association with work with the suffixes a, b, and c in the example of
For example, as described above, by performing the clustering step by step using the information indicating the movement order of the stationary positions, it becomes possible to suppress both excessive division and insufficient division in the clustering and to perform clustering that is well associated with work.
Hereinafter, a region of interest generation process according to the embodiment will be described.
The stationary time period is information indicating, for example, a time period during which a person detected from the moving image remains stationary while satisfying a predetermined condition. In the stationary position information 900 in
The person ID is an identifier assigned to identify a person detected from the moving image. In the person ID of the stationary position information 900, for example, a person ID for identifying a person detected to be in the stationary state in the stationary time period of the record is registered.
The stationary label is a label assigned to, for example, a stationary position detected in the stationary time period of the record. In one example, when a plurality of stationary time periods is detected from the moving image for a specific person identified by the person ID, a series of labels may be assigned as the stationary label according to the order in which the stationary time periods have been detected for the person in the moving image. For example, in the stationary position information 900 in
As the stationary position of the stationary position information 900, for example, information indicating a position of a person detected in the stationary time period of the record may be registered. The stationary position may be expressed by, for example, coordinates indicating a position in a frame image of the moving image. In one example, the coordinates may be represented by the number of pixels in the longitudinal and lateral directions from a predetermined pixel to the stationary position in the frame image.
As the previous label of the stationary position information 900, for the person identified by the person ID of the record, a stationary label of a stationary position detected immediately before the stationary position of the record may be registered. Furthermore, as the subsequent label of the stationary position information 900, for the person identified by the person ID of the record, a stationary label of a stationary position detected immediately after the stationary position of the record may be registered. Note that, in the stationary position information 900 in
Next, an operation flow of the region of interest generation process according to the embodiment will be described.
In step 1001 (hereinafter, step will be written as "S" and, for example, step 1001 will be written as S1001), the control unit 601 carries out human detection from the moving image. For example, the control unit 601 cuts out a frame image of each frame from the moving image. Then, the control unit 601 performs the human detection on the cut out frame image, and extracts information regarding a person and a skeleton of the person, such as joint positions. The human detection and the skeleton extraction may be carried out using, for example, a technique using local feature values such as HOG, or a known technique such as OpenPose. Then, the control unit 601 assigns a person ID to the person detected from the moving image.
In S1002, the control unit 601 detects, for each detected person, a stationary time period during which the person remains stationary while satisfying a predetermined condition in the moving image. For example, for each detected person, the control unit 601 may trace movement of the person in the moving image to determine whether the person is moving or stationary. Note that the determination on whether the person is stationary may be made using various known techniques.
For example, the control unit 601 may specify a stationary time period in the moving image as a section in which a predetermined part of the person based on the skeleton information of the detected person does not move while satisfying a predetermined condition. In one example, the control unit 601 may determine the stationary state in the section between the current frame and the previous frame when a distance between ankle coordinates of the person present in the current frame image and ankle coordinates of the person present in the previous frame image is equal to or less than a predetermined threshold. Then, the control unit 601 may extract, as a stationary time period, a time period during which the stationary state continues for equal to or more than a predetermined number of frames, and may assign a stationary label to the extracted stationary time period. For example, the control unit 601 may assign Nx to the stationary time period as the stationary label. N of Nx may be a person ID, x may be a value representing the order of detection of the stationary time periods, and a label may be assigned in alphabetical order starting with a. Then, the control unit 601 registers, in the stationary position information 900, a record in which the stationary time period detected for the person is associated with the person ID and the stationary label.
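The determination in S1002 may be sketched as follows. This is a minimal illustration only, assuming that the ankle coordinates of one person are already available per frame as a list; the names `detect_stationary_periods`, `max_step`, `min_frames`, and `stationary_labels` are hypothetical and do not appear in the embodiment.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def detect_stationary_periods(
    ankle_track: List[Point],
    max_step: float,
    min_frames: int,
) -> List[Tuple[int, int]]:
    """Return inclusive (start_frame, end_frame) pairs of stationary time periods.

    A section between two successive frames is treated as stationary when the
    ankle moved by max_step pixels or less. A run of at least min_frames such
    sections is kept as one stationary time period.
    """
    periods = []
    run_start = None  # first frame of the current stationary run, if any
    for i in range(1, len(ankle_track)):
        step = math.dist(ankle_track[i - 1], ankle_track[i])
        if step <= max_step:
            if run_start is None:
                run_start = i - 1
        else:
            if run_start is not None and (i - 1) - run_start >= min_frames:
                periods.append((run_start, i - 1))
            run_start = None
    last = len(ankle_track) - 1
    if run_start is not None and last - run_start >= min_frames:
        periods.append((run_start, last))
    return periods


def stationary_labels(person_id: int, count: int) -> List[str]:
    """Build labels of the form Nx, e.g. '1a', '1b' (assumes count <= 26)."""
    return [f"{person_id}{chr(ord('a') + k)}" for k in range(count)]
```

In this sketch both thresholds are fixed values chosen by the caller; the embodiment only requires that they be predetermined.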
In S1003, the control unit 601 specifies a stationary position of the person for each stationary time period detected from the moving image. In one example, the control unit 601 may specify, as the stationary position, a representative position representing a position of a predetermined part of the person in each frame in the stationary time period. For example, the control unit 601 may average the ankle positions in the individual frames in the stationary time period to use it as a stationary position. Note that the representative position according to the embodiment is not limited to this, and another statistical value, such as a median value, may be used as the representative position instead of the average. Then, the control unit 601 may register the coordinates of the specified stationary position in association with the stationary time period in the record of the stationary position information 900 registered in S1002.
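As a minimal sketch of S1003, the representative position may be computed per axis from the ankle positions of the frames in one stationary time period. The function name `representative_position` and the `use_median` flag are assumptions introduced for illustration.

```python
from statistics import mean, median
from typing import List, Tuple

Point = Tuple[float, float]


def representative_position(points: List[Point], use_median: bool = False) -> Point:
    """Return the per-axis average (or median) of the given ankle positions."""
    stat = median if use_median else mean
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (stat(xs), stat(ys))
```

The median variant is less affected by a single frame in which the joint position was detected far from the others, which is one possible reason to prefer it over the average.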
In S1004, the control unit 601 specifies movement information. For example, the control unit 601 may specify, as the movement information, pieces of information indicating the stationary labels of the stationary time periods immediately before and immediately after the certain stationary time period specified for the person, and registers them as the previous label and the subsequent label of the stationary position information 900, respectively.
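One possible in-memory form of a record of the stationary position information 900, together with the registration of the previous label and the subsequent label in S1004, is sketched here. The class name `StationaryRecord`, the field names, and the encoding of the stationary time period as a pair of frame indices are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StationaryRecord:
    period: Tuple[int, int]            # (start_frame, end_frame); hypothetical encoding
    person_id: int
    label: str                         # stationary label such as "1a"
    position: Tuple[float, float]      # coordinates in the frame image
    previous_label: Optional[str] = None
    subsequent_label: Optional[str] = None


def link_movement_order(records: List[StationaryRecord]) -> None:
    """Fill previous_label and subsequent_label for each person, in time order."""
    by_person: Dict[int, List[StationaryRecord]] = {}
    for rec in records:
        by_person.setdefault(rec.person_id, []).append(rec)
    for recs in by_person.values():
        recs.sort(key=lambda r: r.period[0])
        for earlier, later in zip(recs, recs[1:]):
            earlier.subsequent_label = later.label
            later.previous_label = earlier.label
```

Records at the start or end of a person's sequence keep `None` for the missing neighbor in this sketch.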
In S1005, the control unit 601 clusters the stationary positions by dividing them into two. For example, the control unit 601 may cluster the stationary positions registered in the stationary position information 900 into two clusters using an existing clustering technique such as the K-means method.
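The division into two in S1005 may be illustrated with a plain two-cluster K-means loop written without external libraries. This is a sketch only; the function name `two_means` and the deterministic choice of the initial centers (the first point and the point farthest from it) are assumptions and not part of the embodiment.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def two_means(points: List[Point], iterations: int = 50) -> List[int]:
    """Assign each point to cluster 0 or 1 with a basic K-means loop (K = 2)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Assumed seeding: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    assignment = [0] * len(points)
    for _ in range(iterations):
        new_assignment = [
            0 if math.dist(p, c0) <= math.dist(p, c1) else 1 for p in points
        ]
        members0 = [p for p, a in zip(points, new_assignment) if a == 0]
        members1 = [p for p, a in zip(points, new_assignment) if a == 1]
        if not members0 or not members1:
            break  # degenerate input, e.g. all points identical
        c0 = (sum(x for x, _ in members0) / len(members0),
              sum(y for _, y in members0) / len(members0))
        c1 = (sum(x for x, _ in members1) / len(members1),
              sum(y for _, y in members1) / len(members1))
        if new_assignment == assignment:
            break  # assignments no longer change
        assignment = new_assignment
    return assignment
```

A library implementation of K-means could be substituted for this loop; the pure-Python form is used here only to keep the sketch self-contained.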
In S1006, the control unit 601 determines whether or not there is a pair of stationary positions having a relationship of a movement source and a movement destination in the movement order in which the person detected from the moving image moves through the stationary positions in the cluster obtained by the clustering. Then, if there is a pair of stationary positions having a relationship of a movement source and a movement destination (YES in S1006), the flow proceeds to S1007. For example, the control unit 601 may refer to the record of the stationary position information 900 for the stationary position included in the cluster, and may determine as YES in S1006 if the stationary position registered as the previous label or the subsequent label of the record is included in the same cluster.
In S1007, the control unit 601 further performs clustering on the cluster including the pair of stationary positions having a relationship of a movement source and a movement destination by dividing it into two, and the flow returns to S1006 to repeat the process.
On the other hand, if there is no cluster including a pair of stationary positions having a relationship of a movement source and a movement destination in the clusters obtained by the clustering (NO in S1006), the flow proceeds to S1008. For example, the control unit 601 may refer to the record of the stationary position information 900 for the stationary position included in the cluster, and may determine as NO in S1006 if the stationary position registered as the previous label or the subsequent label of the record is not included in the same cluster.
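The loop formed by S1005 to S1007 may be sketched as follows. The sketch assumes that the stationary positions are given as a mapping from stationary label to coordinates and that the movement order is given as a set of (movement source label, movement destination label) pairs; all function names are hypothetical. The guard that stops dividing a cluster that can no longer be split is an addition of this sketch and is not described in the embodiment.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def centroid(pts: List[Point]) -> Point:
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


def split_in_two(labels: List[str], positions: Dict[str, Point]) -> Tuple[List[str], List[str]]:
    """Divide the labels into two groups with a basic K-means loop (K = 2)."""
    pts = [positions[label] for label in labels]
    c0 = pts[0]
    c1 = max(pts, key=lambda p: math.dist(p, c0))
    side = [0] * len(pts)
    for _ in range(50):
        side = [0 if math.dist(p, c0) <= math.dist(p, c1) else 1 for p in pts]
        group0 = [p for p, s in zip(pts, side) if s == 0]
        group1 = [p for p, s in zip(pts, side) if s == 1]
        if not group0 or not group1:
            break
        new_c0, new_c1 = centroid(group0), centroid(group1)
        if (new_c0, new_c1) == (c0, c1):
            break
        c0, c1 = new_c0, new_c1
    left = [label for label, s in zip(labels, side) if s == 0]
    right = [label for label, s in zip(labels, side) if s == 1]
    return left, right


def has_movement_pair(cluster: List[str], moves: Set[Tuple[str, str]]) -> bool:
    """S1006: True if a movement source and its destination share the cluster."""
    members = set(cluster)
    return any(src in members and dst in members for src, dst in moves)


def divide_by_movement_order(
    positions: Dict[str, Point], moves: Set[Tuple[str, str]]
) -> List[List[str]]:
    if not positions:
        return []
    # S1005: initial division into two.
    pending = [c for c in split_in_two(list(positions), positions) if c]
    finished = []
    while pending:
        cluster = pending.pop()
        if not has_movement_pair(cluster, moves):  # NO in S1006
            finished.append(cluster)
            continue
        left, right = split_in_two(cluster, positions)  # S1007
        if not left or not right:
            finished.append(cluster)  # guard: cluster cannot be divided further
            continue
        pending.extend([left, right])
    return finished
```

With two persons who each stop at three locations, of which two are close together, the two nearby locations fall into one cluster after the first division and are separated by the second division because they contain a movement source and its destination.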
In S1008, the control unit 601 generates a region of interest based on the cluster obtained by the clustering, and the flow is terminated. For example, the control unit 601 may generate a region of interest to include at least a part of the stationary positions included in the cluster.
For example, the control unit 601 may generate, as the region of interest, a rectangular region including the maximum value and the minimum value in each axis direction of the coordinates of the stationary position included in the cluster. Alternatively, for example, the control unit 601 may generate an interior region by connecting the stationary positions included in the cluster and arranged on the outermost sides, and use it as the region of interest. Note that the generation of the region of interest based on the cluster of the stationary positions is not limited to those, and may be generated using other techniques.
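Both ways of generating the region of interest may be sketched as follows. Reading the region obtained by connecting the outermost stationary positions as a convex hull is an interpretation made for this sketch, and the function names are hypothetical. The hull is computed with the monotone chain method.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_rectangle(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing all stationary positions."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the outermost positions in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

The rectangle is simpler to use for an entrance check, while the hull follows the shape of the cluster more closely; which is preferable depends on the layout of the work areas.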
As described above, according to the embodiment, it becomes possible to highly accurately generate the region of interest in the moving image based on the stationary positions to be detected.
Subsequently, the control unit 601 of the information processing device 501 generates and outputs a region of interest based on the obtained cluster of the stationary positions (1104 in
Meanwhile,
Although the embodiment has been exemplified above, the embodiment is not limited to this. For example, the operation flow described above is exemplary, and the embodiment is not limited to this. If possible, the operation flow may be performed by changing the processing order, may further include additional processing, may omit a part of the process, or may replace a part of the process.
For example, in the processing of S1005 and S1007 in
Furthermore, although the ankle has been exemplified as a predetermined part of the person used to determine the stationary state in the embodiment described above, the embodiment is not limited to this, and another part may be used. As another example, the stationary state may be determined using coordinates of the heels of the person or coordinates of the center of gravity of the body such as the back.
Furthermore, although the example of determining the stationary state when the movement of the predetermined part in successive frames is equal to or less than the threshold has been described in the embodiment above, the embodiment is not limited to this. In another embodiment, a threshold to be used to determine whether or not a person is stationary may be appropriately set according to the size of the person present in a frame image, such as by multiplying the distance between the knee joint and the ankle of the person present in the frame image by a predetermined coefficient. As described above, since the size of the person present in the frame image varies depending on the distance from the imaging device, it becomes possible to improve the accuracy in determining the stationary state by relatively setting a threshold based on a distance between joints detected from the person or the like.
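The relative threshold may be sketched as follows, assuming that knee and ankle coordinates are available from the skeleton information. The function names and the coefficient value are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def relative_threshold(knee: Point, ankle: Point, coefficient: float) -> float:
    """Scale the threshold by the knee-to-ankle distance seen in the frame image."""
    return coefficient * math.dist(knee, ankle)


def is_stationary_step(
    prev_ankle: Point, cur_ankle: Point, cur_knee: Point, coefficient: float = 0.25
) -> bool:
    """True if the ankle moved no more than the size-dependent threshold."""
    moved = math.dist(prev_ankle, cur_ankle)
    return moved <= relative_threshold(cur_knee, cur_ankle, coefficient)
```

With this form, the same displacement of 10 pixels may be judged as stationary for a person close to the imaging device, whose knee-to-ankle distance spans 100 pixels, and as movement for a distant person whose knee-to-ankle distance spans 10 pixels.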
Moreover, the algorithm for determining the stationary state according to the embodiment is not limited to the example described above. In another example, the control unit 601 may determine a time period during which a person continuously moves for equal to or more than a predetermined number of frames, such as the person keeps moving for 10 consecutive frames, as movement, and may specify a time period with no movement as a stationary time period. Alternatively, the control unit 601 may detect a stationary time period when movement of the person is not detected at equal to or more than a predetermined rate in a predetermined period of time. Moreover, for the stationary state detection, another algorithm may be used as long as one stationary position may be detected for a person detected from a moving image during a period of time from the start to the end of one work, for example.
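The rate-based alternative may be sketched as follows, under the interpretation that a window of frames is treated as stationary when the share of frames without detected movement is equal to or more than a predetermined rate. The function name, the per-frame boolean input, and the sliding-window form are assumptions of this sketch.

```python
from typing import List


def stationary_by_rate(
    moving_flags: List[bool], window: int, min_still_rate: float
) -> List[bool]:
    """For each window start, report whether the window counts as stationary.

    moving_flags[i] is True when movement was detected in frame i.
    """
    result = []
    for start in range(0, len(moving_flags) - window + 1):
        chunk = moving_flags[start:start + window]
        still = chunk.count(False)
        result.append(still / window >= min_still_rate)
    return result
```

Compared with the strict per-frame condition, this form tolerates an occasional frame in which a small movement is detected during one work operation.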
Then, for example, a stationary position and work are well associated when the algorithm for the stationary state detection is adjusted such that one stationary position may be detected from the person performing one work, whereby it becomes possible to improve the accuracy in generating a region of interest for the work based on the clustering described above.
Furthermore, although the example of applying the embodiment to the setting of the region of interest used to detect the work of the person has been described above, the embodiment is not limited to this. For example, an object for which a region of interest is generated may be an object other than a person, such as an animal or a machine. For example, the embodiment may be applied to set a region of interest in a region where another part and another object, which repeatedly move and stop, enter a stationary state in a moving image.
In the embodiment described above, the control unit 601 operates, for example, as the specifying unit 611 in the processing of S1003 and S1004. Furthermore, the control unit 601 operates, for example, as the division unit 612 in the processing of S1005 and S1007. The control unit 601 operates, for example, as the generation unit 613 in the processing of S1008.
The processor 1301 may be, for example, a single processor, a multiprocessor, or a multicore processor. The processor 1301 executes a program describing procedures of the operation flow described above, for example, using the memory 1302, thereby providing a part or all of the functions of the control unit 601 described above. For example, the processor 1301 of the information processing device 501 reads and executes the program stored in the storage device 1303, thereby operating as the specifying unit 611, the division unit 612, and the generation unit 613.
The memory 1302 is, for example, a semiconductor memory, and may include a RAM region and a ROM region. The storage device 1303 is, for example, a hard disk, a semiconductor memory such as a flash memory, or an external storage device. Note that the RAM is an abbreviation for random access memory. In addition, the ROM is an abbreviation for read only memory.
The reading device 1304 accesses a removable storage medium 1305 in accordance with an instruction from the processor 1301. The removable storage medium 1305 is implemented by, for example, a semiconductor device, a medium to and from which information is input and output by magnetic action, a medium to and from which information is input and output by optical action, or the like. Note that the semiconductor device is, for example, a universal serial bus (USB) memory. Furthermore, the medium to and from which information is input and output by magnetic action is, for example, a magnetic disk. The medium to and from which information is input and output by optical action is, for example, a CD-ROM, a DVD, a Blu-ray disc (Blu-ray is a registered trademark), or the like. The CD is an abbreviation for compact disc. The DVD is an abbreviation for digital versatile disc.
The storage unit 602 includes, for example, the memory 1302, the storage device 1303, and the removable storage medium 1305. For example, the storage device 1303 of the information processing device 501 stores moving images obtained by capturing work, and the stationary position information 900.
The communication interface 1306 communicates with another device in accordance with an instruction from the processor 1301. In one example, the communication interface 1306 may exchange data with another device, such as the imaging device 502, via wired or wireless communication. The communication interface 1306 is an example of the communication unit 603 described above.
The input/output interface 1307 may be, for example, an interface between an input device and an output device. The input device is, for example, a device that receives an instruction from a user, such as a keyboard, a mouse, or a touch panel. The output device is, for example, a display device such as a display, or an audio device such as a speaker.
Each program according to the embodiment is provided to the information processing device 501 in the following forms, for example.
(1) Installed on the storage device 1303 in advance.
(2) Provided by the removable storage medium 1305.
(3) Provided from a server such as a program server.
Note that the hardware configuration of the computer 1300 for implementing the information processing device 501 described with reference to
Several embodiments have been described above. However, the embodiments are not limited to the embodiments described above, and it should be understood that the embodiments include various modifications and alternatives of the embodiments described above. For example, it would be understood that various embodiments may be embodied by modifying components without departing from the spirit and scope of the embodiments. Furthermore, it would be understood that various embodiments may be implemented by appropriately combining a plurality of components disclosed in the embodiments described above. Moreover, a person skilled in the art would understand that various embodiments may be implemented by removing some components from all the components indicated in the embodiments or by adding some components to the components indicated in the embodiments.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a continuation application of International Application PCT/JP2020/042193 filed on Nov. 12, 2020 and designated the U.S., the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2020/042193 | Nov 2020 | WO |
| Child | 18191034 | | US |