The present invention relates to a music playing method, a music playing device, and a program.
Playing musical instruments such as a guitar, a violin, and a trumpet is not easy, and it often takes time to practice. Moreover, children, elderly persons, and disabled persons may find it difficult to play musical instruments for physical reasons. However, even a person who has difficulty playing musical instruments has a desire to play musical instruments for interest and entertainment.
Patent Literature 1 discloses art that enables a musical instrument sound to be output without actually playing a musical instrument. Specifically, the art of Patent Literature 1 extracts a skin-colored portion such as a hand or an arm of a user from a captured image of the user, analyzes the optical flow of the skin-colored portion, and outputs a sound of a musical instrument according to the orientation and the magnitude of the optical flow.
However, in the art of Patent Literature 1, since a sound of a musical instrument is output based on the optical flow of a body part such as a hand or an arm of a user, some users may find it difficult to perform the operation for physical reasons. As a result, there is a problem that the users who can use the art may be limited.
In view of the above, an object of the present invention is to provide a music playing method, a music playing device, and a program capable of solving the problem described above, that is, a problem that in a system that can output a sound without actually playing a musical instrument, users who can use it may be limited.
A music playing method according to one aspect of the present invention is configured to include
Further, a music playing device according to one aspect of the present invention is configured to include
Further, a program according to one aspect of the present invention is configured to cause an information processing device to realize
With the configurations described above, the present invention can suppress limitation of users who can use a system that can output a sound without actually playing a musical instrument.
A first exemplary embodiment of the present invention will be described with reference to
[Configuration]
A music playing system of the present embodiment is a system for enabling a sound to be output without the user U actually playing a musical instrument. The music playing system is installed in, for example, an event site, and when the user U attending the event performs a predetermined operation, the system operates to output a sound without a musical instrument being played. However, the music playing system may be installed in any place, and the user U may be any person. For example, the target user U may be a child, an elderly person, or a disabled person, and the music playing system may be installed in any facility such as a dancing school, a gymnastic school, or a rehabilitation facility. That is, the music playing system may be used not only for the purpose of the user U simply playing a musical instrument but also for the purpose of having the user U move his or her body as described below.
As illustrated in
The music playing device 10 is configured of one or a plurality of information processing devices each having an arithmetic device and a storage device. As illustrated in
The acquisition unit 11 (acquisition means) acquires a captured image captured by the camera 1. Then, the acquisition unit 11 detects the user U shown in the captured image, and acquires a plurality of previously set feature points of the user U. Specifically, the acquisition unit 11 uses a posture estimation technique as described in Non-Patent Literature 1 to extract joint positions and body part positions of the user U as feature points, and acquires position information of specific feature points among them. For example, as illustrated in
However, the acquisition unit 11 may extract any feature points of the user U and may use a combination of any feature points as a feature point set. For example, the acquisition unit 11 may use a combination of a plurality of feature points of body parts of the lower half of the body, such as an ankle, a knee, and a hip, as a feature point set. Further, the acquisition unit 11 is not limited to using a combination of feature points of a plurality of joint positions as a feature point set, and may use a combination of a joint position and another body part position such as an eye or a nose as a feature point set. Further, the acquisition unit 11 is not limited to extracting feature points of the user U by using the posture estimation technique described above, and may extract feature points by using any technique. Note that the number of feature points constituting a feature point set is not limited to three, and may be two, or four or more.
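As a concrete illustration of the acquisition processing described above, the following is a minimal Python sketch. The function `estimate_pose` is a hypothetical stand-in for any posture estimation technique (such as the one described in Non-Patent Literature 1) that returns two-dimensional coordinates of named keypoints; the keypoint names and the grouping into feature point sets are assumptions made only for illustration.

```python
import numpy as np

def estimate_pose(frame):
    """Hypothetical wrapper around a posture estimation library.

    Returns a dict mapping keypoint names to (x, y) pixel coordinates,
    e.g. {"right_shoulder": (412, 180), "right_elbow": (455, 260), ...}.
    Any pose estimator could be adapted to provide this output.
    """
    raise NotImplementedError  # placeholder for an actual pose estimator

# Assumed grouping: each feature point set is an ordered tuple of keypoints.
FEATURE_POINT_SETS = {
    "a": ("right_shoulder", "right_elbow", "right_wrist"),  # right arm
    "b": ("left_shoulder", "left_elbow", "left_wrist"),     # left arm
}

def acquire_feature_point_sets(frame):
    """Acquisition unit: extract the configured feature point sets from a frame."""
    keypoints = estimate_pose(frame)
    sets = {}
    for name, joints in FEATURE_POINT_SETS.items():
        if all(j in keypoints for j in joints):
            sets[name] = np.array([keypoints[j] for j in joints], dtype=float)
    return sets  # e.g. {"a": 3x2 array, "b": 3x2 array}
```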
The detection unit 12 (detection means) detects a position relationship among the feature points constituting a feature point set acquired from the user U. In the present embodiment, as illustrated in
Examples of previously set position relationships are illustrated in
Further, the position relationships that are previously set for the second feature point set “b” of a left arm are associated with commands to operate an octave, respectively. For example, the position relationship illustrated in the center drawing of
Here, when the detection unit 12 detects the position relationships of the respective feature point sets "a" and "b", the detection unit 12 determines which previously set position relationship the position relationship actually detected from the user U corresponds to. At that time, the detection unit 12 selects a previously set position relationship that is determined to be the same as the position relationship detected from the user U according to a predetermined reference. That is, the detection unit 12 sets a similarity range for each of the previously set position relationships, and when the position relationship detected from the user U falls within that similarity range, the detection unit 12 determines that the two are the same according to the predetermined reference. Therefore, the detection unit 12 does not require the position relationships of the feature point sets "a" and "b" to be completely the same as any of the previously set position relationships. Specifically, the detection unit 12 determines that two position relationships are the same when the angle defined by the first line segment (a1 or b1) and the second line segment (a2 or b2), or the orientations of the first line segment (a1 or b1) and the second line segment (a2 or b2), match within a predetermined range.
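The comparison against previously set position relationships could be sketched as follows, assuming that each position relationship is characterized by the orientations of the first and second line segments and that "the same according to a predetermined reference" means each orientation matches within a tolerance. The preset angle values, the labels, and the tolerance are illustrative assumptions, not values taken from the embodiment.

```python
import math

def segment_angle(p_from, p_to):
    """Orientation in degrees of the line segment from p_from to p_to."""
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    return math.degrees(math.atan2(dy, dx))

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

# Illustrative presets: (first-segment angle, second-segment angle) -> label.
PRESET_RELATIONSHIPS_A = {
    "do": (0.0, -90.0),   # upper arm horizontal, forearm raised
    "re": (0.0, -45.0),
    "mi": (0.0, 0.0),     # whole arm stretched horizontally
}
TOLERANCE_DEG = 20.0  # similarity range (assumed value)

def detect_relationship(points, presets, tol=TOLERANCE_DEG):
    """Detection unit: return the preset label whose segment orientations
    match the detected ones within the similarity range, or None."""
    shoulder, elbow, wrist = points
    a1 = segment_angle(shoulder, elbow)  # first line segment
    a2 = segment_angle(elbow, wrist)     # second line segment
    for label, (t1, t2) in presets.items():
        if angle_diff(a1, t1) <= tol and angle_diff(a2, t2) <= tol:
            return label
    return None
```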
Note that the detection unit 12 may detect the position relationships of the respective feature point sets “a” and “b” by any method, without being limited to the detection based on the shape linking the feature points as described above. For example, the detection unit 12 may specify the position relationships of the respective feature point sets “a” and “b” from the position relationships of coordinates of the respective feature points. Further, while the detection unit 12 detects the respective position relationships of the two feature point sets “a” and “b”, the detection unit 12 may detect position relationships of one or three or more feature point sets.
Further, in the above description, the detection unit 12 specifies a note in the musical scale by the position relationship of the first feature point set "a", specifies an octave by the position relationship of the second feature point set "b", and consequently specifies one note from the combination of the position relationships of the feature point sets "a" and "b". However, another element may be specified by the position relationship of the first feature point set "a" or the position relationship of the second feature point set "b". For example, the position relationship of the second feature point set "b" may be associated with the type of a musical instrument, so that the position relationship of the second feature point set "b" specifies a musical instrument and the position relationship of the first feature point set "a" specifies a note in the scale of that musical instrument. Further, the detection unit 12 is not limited to specifying one note from a combination of the position relationships of the plurality of feature point sets "a" and "b", and may specify a note from the position relationship of each feature point set individually. Further, the detection unit 12 is not necessarily limited to specifying a single note from the position relationship of a feature point set, and may specify any sound such as a chord in which a plurality of notes sound simultaneously.
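One possible way to combine the two detected position relationships into a single note is sketched below. The use of MIDI note numbers and the C-major degree-to-semitone mapping are assumptions made for illustration; the embodiment itself does not prescribe a particular encoding.

```python
# Semitone offsets of the C-major scale degrees do..si (assumed mapping).
SCALE_OFFSETS = {"do": 0, "re": 2, "mi": 4, "fa": 5, "sol": 7, "la": 9, "si": 11}

def specify_note(scale_label, octave):
    """Combine the right-arm relationship (scale degree) and the
    left-arm relationship (octave) into a MIDI note number.
    MIDI convention: note 60 corresponds to C4."""
    return 12 * (octave + 1) + SCALE_OFFSETS[scale_label]

# Example: "mi" in octave 4 -> MIDI note 64 (E4).
print(specify_note("mi", 4))
```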
The sound output unit 13 (output means) outputs the sound specified based on the position relationships of the feature point sets "a" and "b" as described above from the loudspeaker 3. At that time, the sound output unit 13 acquires the sound source data corresponding to the sound specified based on the position relationships of the feature point sets "a" and "b" from the sound source data stored in the sound information storage unit 15, and plays the sound source data to output the sound from the loudspeaker 3. Note that when one sound is specified from a combination of the position relationships of the feature point sets "a" and "b", the sound output unit 13 outputs that sound, whereas when different sounds are specified from the position relationships of the feature point sets "a" and "b" respectively, the sound output unit 13 outputs the sounds simultaneously, or outputs the sounds sequentially at time intervals.
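A minimal sketch of the sound output processing follows, assuming the sound source data are stored as one WAV file per note and that the third-party `simpleaudio` package is available for playback; the file naming scheme and the package choice are assumptions, not part of the embodiment.

```python
import simpleaudio as sa  # third-party package, assumed to be installed

def output_sound(midi_note, instrument="piano"):
    """Sound output unit: play the sound source data for the specified note.

    Assumes sound source files are laid out as
    sound_sources/<instrument>/<midi_note>.wav (illustrative convention).
    """
    path = f"sound_sources/{instrument}/{midi_note}.wav"
    wave = sa.WaveObject.from_wave_file(path)
    play = wave.play()
    play.wait_done()  # block until playback finishes; could also return immediately
```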
The video image output unit 14 (output means) outputs the captured image captured by the camera 1 so that it is displayed directly on the display 2. At that time, the video image output unit 14 may display the captured image and also display information representing the position relationships of the feature point sets "a" and "b" of the user U detected by the detection unit 12 by superimposing it on the captured image. For example, as illustrated in
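The superimposition of the detected feature point sets on the captured image could be realized, for example, with OpenCV as sketched below; the colors and line widths are arbitrary illustrative choices.

```python
import cv2

def draw_feature_point_set(frame, points, color=(0, 255, 0)):
    """Video image output unit: superimpose a feature point set on the frame.

    points is an ordered sequence of (x, y) coordinates, e.g. shoulder,
    elbow, wrist; consecutive points are joined by line segments.
    """
    pts = [tuple(map(int, p)) for p in points]
    for p_from, p_to in zip(pts, pts[1:]):
        cv2.line(frame, p_from, p_to, color, 3)      # line segments a1/a2 or b1/b2
    for p in pts:
        cv2.circle(frame, p, 6, color, -1)           # the feature points themselves
    return frame
```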
Further, the video image output unit 14 may output sample information, stored in the sample information storage unit 16 and configured of images showing samples of position relationships of feature point sets, to display it on the display 2. At that time, the video image output unit 14 may output a plurality of pieces of sample information on the display 2, or may output the sample information of the position relationship of the feature point set corresponding to the sound that the user U is currently requested to output.
Note that the sound output unit 13 may first output a sound of the sound source data from the loudspeaker 3 and request the user U to take a posture corresponding to the position relationship of the feature point set associated with that sound. In association with this, the video image output unit 14 may output, on the display 2, sample information of the position relationship of the feature point set corresponding to the sound output by the sound output unit 13. Further, when the sound output unit 13 first plays the sound source data and outputs a sound, or when the video image output unit 14 first displays sample information, it is possible to determine whether a position relationship that is the same as the position relationship of the feature point set corresponding to that sound source data or that sample information is detected from the user U by the detection unit 12, and to display the determination result on the display 2.
[Operation]
Next, operation of the music playing device 10 described above will be described mainly with reference to the flowchart of
Then, the music playing device 10 detects the user U shown in the captured image, and acquires a plurality of predetermined feature points of the user U (step S2). For example, as illustrated in
Then, the music playing device 10 detects a position relationship of the acquired feature point set configured of a plurality of feature points (step S3). For example, as illustrated in
Then, the music playing device 10 specifies a sound from the position relationship of each of the detected feature point sets, and outputs the sound from the loudspeaker 3 (step S4). Specifically, the music playing device 10 specifies an octave from the position relationship of the second feature point set "b" of the left arm, specifies a note in the musical scale from the position relationship of the first feature point set "a" of the right arm, and outputs the sound of the finally specified note from the loudspeaker 3. At that time, the music playing device 10 specifies and outputs one note from a combination of the position relationships of the plurality of feature point sets. However, the music playing device 10 may specify a note from each position relationship of the plurality of feature point sets and output the plurality of specified notes. The music playing device 10 may also output the position relationship of a feature point set detected from the user U as described above to display it on the display 2.
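Putting steps S1 to S4 together, a processing loop of the music playing device 10 might look like the following sketch. It reuses the illustrative helpers sketched above (`acquire_feature_point_sets`, `detect_relationship`, `specify_note`, `output_sound`, `draw_feature_point_set`) and therefore inherits all of their assumptions; in particular, the octave is fixed here rather than detected from the second feature point set "b".

```python
import cv2

def run(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()                    # step S1: acquire the captured image
            if not ok:
                break
            sets = acquire_feature_point_sets(frame)  # step S2: acquire feature point sets
            note_label = None
            if "a" in sets:                           # step S3: detect the position relationship
                note_label = detect_relationship(sets["a"], PRESET_RELATIONSHIPS_A)
            octave = 4  # illustrative; would be detected from feature point set "b"
            if note_label is not None:                # step S4: specify and output the sound
                output_sound(specify_note(note_label, octave))
            for points in sets.values():
                draw_feature_point_set(frame, points)
            cv2.imshow("music playing device 10", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```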
<Modifications>
Next, modifications of the music playing device 10 of the present embodiment will be described. In the above description, the music playing device 10 specifies a sound to be output based on the position relationship of a feature point set of the user U. However, the music playing device 10 may specify a sound to be output by further using information of the user U. For this purpose, the music playing device 10 may have a configuration as described below.
The acquisition unit 11 extracts the user U shown in the captured image captured by the camera 1, and detects information of the user U. For example, from the appearance of the user U, the acquisition unit 11 detects information about attributes such as gender, age group, height (tall or short), and the type of clothes (comfortable outfit or uncomfortable outfit), as well as the standing position of the user U in the captured image. That is, the acquisition unit 11 also has a function as a user information detection unit that detects information such as attributes and characteristics of the user. As an example, the acquisition unit 11 specifies a face part from a face image of the user U and detects the gender and the age group from the characteristics of the face part, or specifies the size in the height direction and the position of the user U with respect to the captured image and detects information such as the height and the standing position therefrom. However, the information of the user U may be any information, and such information may be detected by any method.
When there are a plurality of users U in the captured image, the acquisition unit 11 extracts each of the users U, and extracts information such as attributes for each user U. For example, as illustrated in
Then, as similar to the above description, the acquisition unit 11 acquires a feature point set configured of a plurality of feature points of the user U. At that time, when the acquisition unit 11 extracts a plurality of users U1 and U2, the acquisition unit 11 acquires a feature point set for each of the users U1 and U2. Note that while the examples of
The detection unit 12 detects, for each of the users U1 and U2 detected as described above, the position relationship of each feature point set of each of the users U1 and U2. Then, for each of the users U1 and U2, the detection unit 12 specifies a sound to be output by using the information of the users U1 and U2 detected as described above and the position relationship of the feature point set of each of the users U1 and U2. For example, the detection unit 12 specifies a musical instrument according to the gender of each of the users U1 and U2, and specifies the sound of that musical instrument from the position relationship of the feature point set of each of the users U1 and U2. For example, in the example of
While the case of changing the musical instrument according to the information of the users U1 and U2 has been described above, the detection unit 12 may change another element for specifying the sound to be output. For example, the octave may be changed according to the information of the users U1 and U2. Further, the detection unit 12 may change the degree of difficulty of outputting the sound according to the information of the users U1 and U2. For example, while the detection unit 12 specifies the sound of each musical instrument on the basis of the position relationship of the feature point set detected from each of the users U1 and U2, the detection unit 12 may change, according to the information of the users U1 and U2, the reference for determining that the position relationship detected from the user U and the previously set position relationship are the same. As an example, when detecting that a user is a child, the detection unit 12 may set the similarity range of each previously set position relationship to be wider than in the case of detecting that a user is an adult, so that the position relationship detected from the user U is more easily determined to be the same as one of the previously set position relationships, thereby lowering the degree of difficulty of outputting the sound.
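The modification described above, namely choosing the musical instrument and the degree of difficulty from detected user information, could be sketched as follows. The attribute categories, the instrument assignment (the embodiment mentions a trumpet; the flute is an assumption), and the tolerance values are all illustrative.

```python
def select_instrument(user_info):
    """Choose a sound source (instrument) from detected user attributes.

    Illustrative mapping only; any association of attributes to
    instruments could be configured instead.
    """
    if user_info.get("gender") == "male":
        return "trumpet"
    return "flute"

def select_tolerance(user_info, default_deg=20.0):
    """Widen the similarity range for children to lower the difficulty."""
    if user_info.get("age_group") == "child":
        return default_deg * 1.5
    return default_deg

# Example usage with a hypothetical detected-attribute dict:
user_info = {"gender": "male", "age_group": "child"}
instrument = select_instrument(user_info)  # -> "trumpet"
tolerance = select_tolerance(user_info)    # -> 30.0
```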
The sound output unit 13 outputs the specified sound from the loudspeaker 3 as described above. For example, when the musical instrument is specified to be a trumpet, the sound output unit 13 uses the sound source data of a trumpet and outputs the sound specified from the position relationship of the feature point set. At that time, the user U may be allowed to move freely, without sample information being displayed to the user U as described above. Then, the sound output unit 13 may output the sound that the acquisition unit 11 and the detection unit 12 specify from the information of the freely moving user U and the position relationship of the feature point set. Further, the sound output unit 13 may generate a musical score of the output sound or record the sound.
As described above, according to the music playing device and the music playing method of the present embodiment, it is possible to detect the position relationships of a plurality of feature point sets acquired from a captured image of a person, and to output a sound specified based on such position relationships. Therefore, even when an operation similar to the operation of playing a musical instrument is difficult for a child, an elderly person, or a disabled person because of physical reasons, a sound can be output by an easy operation. Further, even in a situation where the optical flow of a body part such as a hand or an arm of a user cannot be detected due to the clothes of the user or the imaging conditions (imaging environment, frame rate, or the like), a sound can be output easily. As a result, a system that allows a sound to be output without a musical instrument actually being played can be used by any user, and it is possible to improve entertainment and to use the system for various purposes.
Next, a second exemplary embodiment of the present invention will be described with reference to
First, a hardware configuration of a music playing device 100 in the present embodiment will be described with reference to
The music playing device 100 can construct, and can be equipped with, an acquisition means 121, a detection means 122, and an output means 123 illustrated in
Note that
The music playing device 100 executes the music playing method illustrated in the flowchart of
As illustrated in
With the configuration described above, the present invention can detect the position relationships of a plurality of feature point sets acquired from a captured image of a person, and output a sound specified based on such position relationships. Therefore, even when an operation similar to the operation of playing a musical instrument is difficult for a child, an elderly person, or a disabled person because of physical reasons, a sound can be output by an easy operation. As a result, a system that allows a sound to be output without a musical instrument actually being played can be used by any user, and it is possible to improve entertainment and to use the system for various purposes.
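For the second exemplary embodiment, the program that causes the information processing device to realize the acquisition means 121, the detection means 122, and the output means 123 could be organized, for example, as follows. This is only a structural sketch reusing the illustrative helpers from the first embodiment, under the same assumptions; it is not the claimed implementation, and in practice each feature point set would have its own preset position relationships.

```python
class MusicPlayingDevice:
    """Structural sketch of the music playing device 100 (second embodiment)."""

    def acquire(self, frame):
        """Acquisition means 121: feature point sets from the captured image."""
        return acquire_feature_point_sets(frame)

    def detect(self, feature_point_sets):
        """Detection means 122: position relationship of each feature point set."""
        # Illustrative: the same presets are used for every set here.
        return {name: detect_relationship(points, PRESET_RELATIONSHIPS_A)
                for name, points in feature_point_sets.items()}

    def output(self, relationships):
        """Output means 123: output the sound specified from the relationships."""
        note_label = relationships.get("a")
        if note_label is not None:
            output_sound(specify_note(note_label, octave=4))
```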
Note that the program described above can be supplied to a computer by being stored in a non-transitory computer-readable medium of any type. Non-transitory computer-readable media include tangible storage media of various types. Examples of non-transitory computer-readable media include magnetic storage media (for example, a flexible disk, a magnetic tape, and a hard disk drive), magneto-optical storage media (for example, a magneto-optical disk), a CD-ROM (Read Only Memory), a CD-R, a CD-R/W, and semiconductor memories (for example, a mask ROM, a PROM (Programmable ROM), an EPROM (Erasable PROM), a flash ROM, and a RAM (Random Access Memory)). Note that the program may also be supplied to a computer by being stored in a transitory computer-readable medium of any type. Examples of transitory computer-readable media include electric signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to a computer via a wired communication channel such as an electric wire or an optical fiber, or via a wireless communication channel.
While the present invention has been described with reference to the exemplary embodiments described above, the present invention is not limited to the above-described embodiments. The form and details of the present invention can be changed within the scope of the present invention in various manners that can be understood by those skilled in the art. Further, at least one of the functions of the acquisition means, the detection means, and the output means described above may be carried out by an information processing device provided and connected to any location on the network, that is, may be carried out by so-called cloud computing.
<Supplementary Notes>
The whole or part of the exemplary embodiments disclosed above can be described as the following supplementary notes. Hereinafter, outlines of the configurations of a music playing method, a music playing device, and a program, according to the present invention, will be described. However, the present invention is not limited to the configurations described below.
A music playing method comprising:
The music playing method according to supplementary note 1, further comprising
The music playing method according to supplementary note 1 or 2, further comprising
The music playing method according to any of supplementary notes 1 to 3, further comprising:
The music playing method according to any of supplementary notes 1 to 4, further comprising:
The music playing method according to supplementary note 5, further comprising
The music playing method according to any of supplementary notes 1 to 6, further comprising:
The music playing method according to supplementary note 7, further comprising
The music playing method according to supplementary note 7 or 8, further comprising
The music playing method according to supplementary note 9, further comprising outputting a sound of a different musical instrument for each person.
The music playing method according to any of supplementary notes 1 to 10, further comprising
The music playing method according to any of supplementary notes 1 to 11, further comprising
A music playing device comprising:
The music playing device according to supplementary note 13, wherein
The music playing device according to supplementary note 13 or 14, wherein
The music playing device according to any of supplementary notes 13 to 15, wherein
The music playing device according to any of supplementary notes 13 to 16, wherein
The music playing device according to supplementary note 17, wherein
The music playing device according to any of supplementary notes 13 to 18, wherein
The music playing device according to supplementary note 19, wherein
The music playing device according to supplementary note 19 or 20, wherein
The music playing device according to supplementary note 21, wherein
The music playing device according to any of supplementary notes 13 to 22, wherein
The music playing device according to any of supplementary notes 13 to 23, wherein
A computer-readable storage medium storing thereon a program for causing an information processing device to realize:
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2020/011762 | 3/17/2020 | WO |