The present disclosure relates to a text providing method and a text providing device.
A plurality of chords constituting a piece of music changes the impression given to a listener depending on their combination (for example, a chord progression in which the chords are arranged in chronological order). Ordinary listeners receive an impression of the music intuitively, and a listener can confirm that impression by analyzing the music based on music theory such as chord progression. A technique of detecting a cadence in a music score indicating the chord progression of a piece of music, displaying an arrow symbol at the cadence part, and changing a color in the display according to the type of cadence is disclosed in, for example, Japanese Laid-Open Patent Publication No. 2020-56938. From the arrow symbol and the color, a user can recognize which part of the chords included in the music corresponds to the cadence and what type of cadence it is.
According to an embodiment of the present disclosure, there is provided a text providing method including obtaining text corresponding to chord input data in which chords are aligned in chronological order, based on a relationship between chord sequence data, in which chords are aligned in chronological order, and explanatory text related to the chords included in the chord sequence data.
According to an embodiment of the present disclosure, there is provided a text providing device including a control unit including a processor and a memory. The control unit is configured to obtain a text corresponding to chord input data in which chords are aligned in chronological order, based on a relationship between chord sequence data, in which chords are aligned in chronological order, and explanatory text related to the chords included in the chord sequence data.
There are various types of chord progression included in music, and understanding the type of chord progression is an important factor underlying the impression of a song. According to the technique described in Japanese Laid-Open Patent Publication No. 2020-56938, a user can recognize the part and the type of a cadence included in a musical score of a piece of music, based on image information such as an arrow symbol and a color. However, unless the user has a certain degree of knowledge of music theory, the user cannot understand the meaning conveyed by the image information and cannot make use of the obtained information.
According to the present disclosure, it is possible to provide an explanatory text related to a chord, based on a plurality of chords aligned in chronological order.
Hereinafter, an embodiment of the present disclosure will be described in detail with reference to the drawings. The following embodiments are examples, and the present disclosure should not be construed as being limited to these embodiments. In the drawings referred to in the present embodiment, the same or similar parts are denoted by the same reference signs or similar reference signs (only denoted by A, B, or the like after a numeral), and repetitive description thereof may be omitted.
The text providing server 1 receives data related to music from the communication terminal 9 via the network NW, and transmits explanatory text corresponding to a chord progression included in the music to the communication terminal 9. In the communication terminal 9, the explanatory text can be displayed on a display. The text providing server 1 generates the explanatory text using a trained model obtained by machine learning. The trained model 155 receives chord input data in which chords constituting a piece of music are aligned in chronological order, and outputs explanatory text related to the chord progression by an arithmetic process using a neural network. The model generating server 3 executes a machine learning process using a teacher data set to generate the trained model used in the text providing server 1. Hereinafter, the text providing server 1 and the model generating server 3 will be described.
The text providing server 1 includes a control unit 11, a communication unit 13, and a memory unit 15. The control unit 11 includes a CPU (processor), a RAM, and a ROM. The control unit 11 executes a program stored in the memory unit 15 by the CPU to perform a process according to an instruction described in the program. The program includes a program 151 for performing a text providing process to be described later.
The communication unit 13 includes a communication module, and is connected to the network NW to transmit and receive various types of data to and from other devices.
The memory unit 15 includes a memory device such as a nonvolatile memory, and stores the program 151 and the trained model 155. The memory unit 15 also stores various other data used in the text providing server 1, and may store a music database 159. The music database 159 is described in another embodiment. The program 151 should be executable by a computer, and may be provided to the text providing server 1 in a state of being stored in a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory. In this case, the text providing server 1 should include a device for reading the recording medium. The program 151 may be provided by downloading via the communication unit 13.
The trained model 155 is generated by machine learning in the model generating server 3 and provided to the text providing server 1. When the chord input data is input, the trained model 155 outputs the explanatory text related to the chords by an arithmetic process using a neural network. In this embodiment, the trained model 155 is a model using an RNN (Recurrent Neural Network). The trained model 155 uses a Seq2Seq (Sequence to Sequence) configuration, that is, it includes an encoder and a decoder as described below. The chord input data and the explanatory text are examples of data described in chronological order, and will be described in detail later. Therefore, it is preferable that the trained model 155 adopts a model that is advantageous for handling time-series data.
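As a reference, the following is a minimal sketch of such an RNN-based encoder-decoder (Seq2Seq) arrangement, written in Python with PyTorch. The use of PyTorch, the layer sizes, the vocabulary size, and the class names are illustrative assumptions and do not represent the actual configuration of the trained model 155.

```python
import torch
import torch.nn as nn

class ChordEncoder(nn.Module):
    """Encodes a chronological sequence of chord vectors into a hidden state."""
    def __init__(self, chord_dim=36, hidden_dim=256):
        super().__init__()
        # chord_dim = 3 chroma vectors x 12 pitch classes (chord tones, bass, tension)
        self.rnn = nn.RNN(chord_dim, hidden_dim, batch_first=True)

    def forward(self, chord_seq):             # chord_seq: (batch, time, chord_dim)
        _, hidden = self.rnn(chord_seq)
        return hidden                          # summary of the chord progression

class TextDecoder(nn.Module):
    """Generates explanatory-text tokens from the encoder's hidden state."""
    def __init__(self, vocab_size=8000, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden):      # token_ids: (batch, time)
        output, hidden = self.rnn(self.embed(token_ids), hidden)
        return self.out(output), hidden        # logits over the text vocabulary

encoder, decoder = ChordEncoder(), TextDecoder()
chords = torch.rand(1, 16, 36)                 # 16 chords described as chroma vectors
tokens = torch.randint(0, 8000, (1, 32))       # 32 tokens of explanatory text
logits, _ = decoder(tokens, encoder(chords))   # logits shape: (1, 32, 8000)
```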
The trained model 155 may be a model using LSTM (Long Short Term Memory) or GRU (Gated Recurrent Unit). The trained model 155 may be a model using CNN (Convolutional Neural Network), Attention (Self-Attention, Source Target Attention), and the like. The trained model 155 may be a model in which a plurality of models is combined. The trained model 155 may be stored in another device connected via the network NW. In this case, the text providing server 1 may be connected to the trained model 155 via the network NW.
The model generating server 3 includes a control unit 31, a communication unit 33, and a memory unit 35. The control unit 31 includes a CPU (processor), a RAM, and a ROM. The control unit 31 executes a program stored in the memory unit 35 by the CPU to perform a process according to an instruction described in the program. The program includes a program 351 for performing a model generating process to be described later. The model generating process is a process for generating the trained model 155 using a teacher data set 355.
The communication unit 33 includes a communication module, and is connected to the network NW to transmit and receive various types of data to and from other devices.
The memory unit 35 includes a memory device such as a nonvolatile memory, and stores the program 351 and the teacher data set 355. In addition, various data used in the model generating server 3 is stored. The program 351 should be executable by a computer, and may be provided to the model generating server 3 in a state of being stored in a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory. In this case, the model generating server 3 should include a device for reading the recording medium. The program 351 may be provided by downloading via the communication unit 33.
A plurality of teacher data sets 355 may be stored in the memory unit 35. The teacher data set 355 is data in which chord sequence data 357 and explanatory text data 359 are associated with each other, and is used for generating the trained model 155. Details of the teacher data set 355 will be described later.
Next, a text providing process (text providing method) executed by the control unit 11 of the text providing server 1 will be described. The text providing process is started, for example, in response to a request from the communication terminal 9.
When the user operates the communication terminal 9 to instruct transmission of the music chord data, the communication terminal 9 transmits the music chord data to the text providing server 1. When the text providing server 1 receives the music chord data, the control unit 11 generates chord input data from the music chord data (step S103). The chord input data is obtained by converting each chord included in the music chord data into a predetermined format. Specifically, the chord input data is data in which each chord is described by chroma vectors.
The chord input data is data obtained by aligning the converted data in chronological order. As described above, when the music chord data is “CM7-Dm7- . . . ”, the chord input data is described as data aligned in the order of the converted data corresponding to “CM7”, the converted data corresponding to “Dm7”, . . .
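The following is an illustrative Python sketch of how one chord could be converted into this chroma-vector format (a first chroma vector for the chord tones, a second for the bass note, and a third for the tension notes, as described later). The hand-coded chord spellings are assumptions for illustration only and are not the actual conversion rules of the text providing server 1.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma(notes):
    """12-dimensional chroma vector with 1.0 at each pitch class contained in notes."""
    vec = [0.0] * 12
    for note in notes:
        vec[PITCH_CLASSES.index(note)] = 1.0
    return vec

def chord_to_vectors(chord_tones, bass_note, tension_notes=()):
    # first chroma vector: chord tones, second: bass note, third: tension notes
    return chroma(chord_tones) + chroma([bass_note]) + chroma(tension_notes)

# "CM7": chord tones C, E, G, B over a C bass; "Dm7": chord tones D, F, A, C over a D bass
chord_input_data = [
    chord_to_vectors(["C", "E", "G", "B"], "C"),
    chord_to_vectors(["D", "F", "A", "C"], "D"),
]  # converted data aligned in chronological order
```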
Returning to
The chord input data shown in
In the explanatory text shown in
The second character group (describing the function of the chord) in the explanatory text shown in
Among the explanatory text shown in
The text output data thus obtained is transmitted to the communication terminal 9 that has transmitted the music chord data. Thus, the user of the communication terminal 9 is provided with explanatory text corresponding to the music chord data. The above is the description of the text providing process.
Next, a model generating process (model generating method) executed by the control unit 31 of the model generating server 3 will be described. The model generating process is started in response to a request from a terminal or the like used by an administrator of the model generating server 3. The model generating process may be started in response to a request from the user, that is, the request from the communication terminal 9.
The explanatory text data 359 is data including explanatory text as shown in
The chord sequence data 357 included in the teacher data set 355 includes a sequence of chords corresponding to a piece of music in this example, and has at least one termination marker EOS. The teacher data set 355 may take various forms. A plurality of examples of the form that the teacher data set 355 may take will be described with reference to
The chord sequence data 357 shown in
The chord sequence data 357 in
The chord sequence data 357 in
The explanatory text data 359 includes explanatory texts ED(A) to ED(E) corresponding to the music sections CL(A) to CL(E), respectively. For example, the explanatory text ED(A) includes a character group describing the chord corresponding to the music section CL(A). The explanatory text data 359 shown in
Returning to
The control unit 31 executes the machine learning by error backpropagation, using the values output from the training model in response to the input of the chord sequence data and the explanatory text data 359 (step S305). Specifically, the weight coefficients in the neural network of the training model are updated by the machine learning. If there are other teacher data sets 355 to be learned (step S307; Yes), the machine learning is performed using the remaining teacher data sets 355 (steps S301, S303, and S305). In the case where there is no other teacher data set 355 to be learned (step S307; No), the control unit 31 ends the machine learning.
The control unit 31 outputs the training model on which the machine learning has been completed as a trained model (step S309), and ends the model generating process. The generated trained model is provided to the text providing server 1 and used as the trained model 155. As described above, the trained model 155 is a model in which the correlation between the chords defined in the chord sequence data 357 and the explanatory text related to those chords has been learned.
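As a reference, the following is a hedged sketch of the training loop of steps S301 to S309, reusing the ChordEncoder and TextDecoder classes sketched earlier. The optimizer, loss function, and teacher-forcing scheme are common choices assumed for illustration and are not details stated in the disclosure.

```python
import torch
import torch.nn as nn

def train(encoder, decoder, teacher_data_sets, epochs=10, lr=1e-3):
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for chord_seq, text_tokens in teacher_data_sets:   # steps S301 and S303
            optimizer.zero_grad()
            hidden = encoder(chord_seq)
            # teacher forcing: feed the reference explanatory text shifted by one token
            logits, _ = decoder(text_tokens[:, :-1], hidden)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             text_tokens[:, 1:].reshape(-1))
            loss.backward()    # error backpropagation (step S305)
            optimizer.step()   # update of the weight coefficients
    return encoder, decoder    # the machine-learned model kept as the trained model (step S309)
```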
In the case where the chord sequence data 357 input in the machine learning includes the end marker EOS in the middle of the data as shown in
In an example shown in
The three teacher data sets shown above are summarized as follows. The first example is a teacher data set in which a divided region is not set as shown in
In particular, as in the first example and the third example, by increasing the number of chords treated as time-series data, it is possible to realize highly accurate machine learning in which the before-and-after relationship of the order of chords is widely taken into consideration. As in the third example, by narrowing down the range of the before-and-after relationship, it is possible to exclude from the object of the machine learning parts that are too far apart from each other and therefore only weakly related, and to realize machine learning with higher accuracy. In the machine learning, only one of these examples may be used, or a plurality of examples may be used in combination.
Next, a correlation between a chord interpretation and the explanatory text will be described in more detail. Here, an example in which II-V-I is detected as an example of a typical chord progression will be described.
The trained model 155 generated by the machine learning described above can output explanatory text indicating that a II-V-I exists even in the case of a chord progression expressed in a form other than the basic form. In some cases, the order of chords constituting a piece of music accidentally includes a sequence corresponding to II-V-I without being intended as a II-V-I. Even in such a case, the trained model 155, generated by machine learning that considers the before-and-after relationship of the order of chords, can output explanatory text that takes into account whether or not the chord progression actually corresponds to II-V-I.
The trained model 155 estimates that an element “Em7-A7-Ab7” in the chord input data shown in
“In a diatonic chord of the Cmaj scale, Em7-A7-Ab7 is a derivative form of II-V-I (Em7-A7-Dm7) in the Dmaj scale, and changes Dm7 to a tritone substitution. GbM7 is II-V for Ab7 in Dbmaj and is inserted to temporarily delay resolution (cadence) to Ab7.”
On the other hand, the trained model 155 estimates that an element of “GbM7-Ab7-DbM7” in the chord input data shown in
As described above, by performing the machine learning using many teacher data sets 355, the trained model 155 can output text output data including appropriate explanatory text even when sequences of chords included in the chord input data resemble each other, by taking into consideration the before-and-after relationship of the similar parts.
In the example shown in
In the embodiment described above, the chord input data may specify a sequence of all the chords included in the music chord data, or may specify a sequence of some of the chords extracted from the sequence of all the chords. In the following description, a section of a piece of music corresponding to a chord included in chord input data is referred to as a specific section. The specific section may be set by the user or may be set by a predetermined method exemplified below.
An example of a predetermined method will be described. The chord input data provided to the trained model 155 need not cover all of the music chord data; if a characteristic part of a piece of music is used, explanatory text characteristic of that piece of music can be obtained. Therefore, it is preferable to set such a characteristic part of the music as a specific section. The characteristic part of the music can be set in various ways, and an example thereof will be described.
In the example described here, the control unit 11 divides a piece of music into a plurality of predetermined determination sections (for example, the music section described above), and sets a determination section satisfying a predetermined condition as a specific section. In this example, a determination section having a chord progression importance level exceeding a predetermined threshold value is set as the specific section by calculating the chord progression importance level in each determination section.
The chord progression importance level is calculated based on various data registered in the music database 159 and the chord progression in the determination section. An example of this calculation method will be described.
The genre information is, for example, information indicating a genre of music such as “rock”, “pop”, and “jazz”. The scale information is information indicating scales such as the “C major scale”, “C minor scale”, and “C# major scale” (including keys in this embodiment). For each scale, the sounds constituting the scale (hereinafter referred to as scale constituent notes) are set.
The chord appearance rate data indicates a ratio of each type of chord to the total number of chords of all the music registered in the music database 159. For example, if the total number of chords is “10000” and the number of chords “Cm” is “100”, the appearance rate of the chord “Cm” is “0.01”.
When calculating the appearance rate of the chord, any of the following determination criteria may be used for the identity of chords similar to each other. Chords having different names may be treated as different chords (“CM7” and “C/B” are different). Chords whose chord tones are the same may be treated as the same chord (“CM7” and “C/B” are the same). Chords whose chord tones and bass note are the same may be treated as the same chord (“CM7” and “G/C” are the same). Chords whose chord tones differ only in the tension notes may be treated as the same chord (“CM7” and “C” are the same).
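One of the identity criteria above (chords with the same chord tones are treated as the same chord) could be checked with pitch-class sets, as in the following sketch; the chord spellings are hand-coded assumptions rather than data taken from the music database 159.

```python
CHORD_TONES = {
    "CM7": {"C", "E", "G", "B"},
    "C/B": {"C", "E", "G", "B"},   # C major triad over a B bass: same tones as CM7
    "Cm":  {"C", "D#", "G"},
}

def same_chord_by_tones(a, b):
    """Identity criterion: chords are the same when their chord-tone sets match."""
    return CHORD_TONES[a] == CHORD_TONES[b]

print(same_chord_by_tones("CM7", "C/B"))  # True under the "same chord tones" criterion
print(same_chord_by_tones("CM7", "Cm"))   # False
```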
The chord progression appearance rate data indicates a ratio of each type of chord progression to the total number of chord progressions of all the music registered in the music database 159. Here, the chord progressions are set in advance by the user or the like. For example, if the total number of chord progressions is “20000” and the number of chord progressions “Dm-G7-CM7” is “400”, the appearance rate of that chord progression is “0.02”.
The determination criteria for the identity of a chord may be the same as those used for the chord appearance rate described above. As the determination criteria for the identity of a chord progression, any of the following criteria may be used. Chord progressions that are similar to each other may be treated as the same chord progression. For example, a derived form of the base form and the form using the tritone substitution shown in
Chord progressions, in which at least two of the chord progressions match, may be treated as the same chord progression. For example, if the chord progression is “Dm-G7-CM7”, “*-G7-CM7”, “Dm-*-CM7”, and “Dm-G7-*” may be treated as the same chord progression. Here, “*” indicates an unspecified chord (any of all chords).
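The wildcard criterion described above can be sketched as a simple pattern match in which “*” is allowed to stand for any chord; the pattern and progression representations are illustrative assumptions.

```python
def matches(pattern, progression):
    """True when every position matches, with "*" accepting any chord."""
    return (len(pattern) == len(progression)
            and all(p in ("*", c) for p, c in zip(pattern, progression)))

progression = ["Dm", "G7", "CM7"]
for pattern in (["*", "G7", "CM7"], ["Dm", "*", "CM7"], ["Dm", "G7", "*"]):
    print(pattern, matches(pattern, progression))  # all True: counted as the same progression
```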
The chord appearance rate data and the chord progression appearance rate data include data for all of the music. In this example, the chord appearance rate data and the chord progression appearance rate data further include data determined corresponding to each genre defined in the genre information. For example, the chord appearance rate data and the chord progression appearance rate data corresponding to the genre “rock” may include the appearance rate of the chord and the appearance rate of the chord progression obtained only from the music corresponding to the genre “rock”. The parameters of the appearance rate (the total number of chords and the total number of chord progressions) may be those for all of the music.
Regarding chords and chord progressions, the appearance rate in the genre “rock” differs from the appearance rate in the genre “jazz”. Therefore, by preparing the appearance rate of the chord and the appearance rate of the chord progression for each genre, it is possible to more accurately determine the characteristic part of the music. The genre information does not necessarily have to be used, and in that case, the chord appearance rate data and the chord progression appearance rate data for each genre need not be present.
In this case, for the music, the key is C, the scale is a major scale, and the genre is pops. These pieces of information may be set in advance by the user or may be set by analyzing the music chord data. In the case where the music chord data is analyzed, for example, the pieces of information may be set based on a comparison with similar chords of music registered in the music database 159, or may be estimated from the sequence of chords using a trained model obtained by machine learning or the like.
The scale element (S) is set to “0” in the case where all of the chord tones are included in the scale constituent notes, and is set to “1” in the case where any of the chord tones are not included in the scale constituent notes. This is because a chord including a sound not included in the scale constituent note can be said to be a characteristic part of a piece of music.
The chord rareness (C) is obtained by a predetermined calculation formula. The calculation formula is determined so that the chord rareness (C) decreases as the chord appearance rate increases. In the case of a C major scale, C and CM7 have a relatively high chord appearance rate, so that the chord rareness (C) is set to a relatively low value.
The chord progression rareness (CP) is obtained by a predetermined calculation formula. The calculation formula is determined so that the chord progression rareness (CP) decreases as the chord progression appearance rate increases. In this case, since the appearance rate of the chord progression “C-Cm-CM7-Cm7” is extremely low, the chord progression rareness (CP) is set to “1” which is a large value.
The chord importance level (CS) is calculated using the scale element (S), the chord rareness (C), and the chord progression rareness (CP). In this case, the calculation formula is CS = a×S + b×C + c×CP, where a = 1/4, b = 1/4, and c = 1/2. The chord progression importance level (CPS) is the mean of the chord importance levels (CS) within the determination section.
The chord progression importance level (CPS) obtained in this way indicates that the larger the numerical value (closer to “1”), the rarer the chord progression compared to other music. That is, a determination section having a large chord progression importance level (CPS) is a characteristic part of music.
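The following is a worked sketch of this calculation. The weighting coefficients follow the formula above, while the individual scale element, chord rareness, and chord progression rareness values are made-up illustrations, since the disclosure only requires that rareness decreases as the appearance rate increases.

```python
def chord_importance(S, C, CP, a=0.25, b=0.25, c=0.5):
    # CS = a*S + b*C + c*CP
    return a * S + b * C + c * CP

def progression_importance(cs_values):
    # CPS is the mean of the chord importance levels in the determination section
    return sum(cs_values) / len(cs_values)

# One determination section, chord progression "C-Cm-CM7-Cm7" (values are illustrative)
section = [
    (0, 0.1, 1.0),   # "C":   within the C major scale, common chord, rare progression
    (1, 0.8, 1.0),   # "Cm":  contains a note outside the scale, rarer chord
    (0, 0.1, 1.0),   # "CM7"
    (1, 0.7, 1.0),   # "Cm7"
]
cps = progression_importance([chord_importance(S, C, CP) for S, C, CP in section])
print(round(cps, 3))  # 0.731: close to 1, so this section is a characteristic part
```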
The method of calculating the index value and the importance level described above is an example, and various calculation methods can be used as long as the importance level of the chord progression (that is, a characteristic part of the music) is obtained. Next, a method of generating chord input data using a specific section will be described. For example, the process of the step S103 shown in
The control unit 11 sets at least one determination section as a specific section on the basis of the chord progression importance level (CPS) calculated for the respective determination sections (step S1037). In this embodiment, a determination section in which the chord progression importance level (CPS) is larger than a predetermined threshold is set as a specific section. A predetermined number of determination sections in order from the determination section having the largest chord progression importance level (CPS) may be set as the specific sections.
The control unit 11 generates chord input data corresponding to the specific section (step S1039). In the chord input data, one specific section may be arranged in one divided region by providing the end marker EOS for each specific section, or in the case where a plurality of consecutive determination sections is set as a plurality of specific sections, the plurality of specific sections may be arranged so as to be included in one divided region.
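Steps S1037 and S1039 could be sketched as follows: determination sections whose chord progression importance level (CPS) exceeds a threshold become specific sections, and each specific section is placed in its own divided region terminated by the end marker EOS. The threshold value and the data layout are illustrative assumptions.

```python
EOS = "EOS"

def build_chord_input(sections, cps_values, threshold=0.6):
    chord_input = []
    for section, cps in zip(sections, cps_values):
        if cps > threshold:                 # step S1037: select specific sections
            chord_input.extend(section)     # step S1039: arrange the chords
            chord_input.append(EOS)         # one specific section per divided region
    return chord_input

sections = [["C", "Am", "F", "G"], ["C", "Cm", "CM7", "Cm7"], ["F", "G", "C", "C"]]
cps_values = [0.35, 0.73, 0.30]
print(build_chord_input(sections, cps_values))
# ['C', 'Cm', 'CM7', 'Cm7', 'EOS']: only the characteristic section is provided to the model
```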
In this way, by providing the generated chord input data to the trained model 155, the trained model 155 can generate explanatory text for the chord progression representing the characteristic part of the music, and output the text output data.
The present disclosure is not limited to the embodiment described above, and includes various other modifications. For example, the embodiments described above have been described in detail to present the present disclosure in an easy-to-understand manner, and are not necessarily limited to those having all the described configurations. In the embodiment, other configurations may be added, one or more configurations may be deleted, or some of the configurations may be substituted. Some modification examples will be described below.
According to the rule-based model, it is necessary to set rules for generating the explanatory text from the chord input data, that is, a correspondence relationship between information corresponding to the chord sequence data 357 and information corresponding to the explanatory text data 359. These rules require a large amount of information. For example, as described above, various types of chord sequences determined to be a II-V-I chord progression are assumed. Therefore, in order to increase the accuracy of the explanatory text, it is necessary to set explanatory text for each of the many types of chord sequences that can be assumed. In order to reduce the amount of information, it may be necessary to simplify the explanatory text as compared with the case where the trained model 155 is used. Although the efficiency is assumed to be lower than when the trained model 155 is used, it is feasible to generate the explanatory text from the chord input data with the rule-based model.
That is, the chord appearance rate data and the chord progression appearance rate data may each be defined by chords in a relative expression with respect to the key of a piece of music. The relative expression may be, for example, the chord obtained by converting the key to “C”, or a degree description such as “I” or “II”. For example, the chord “Em7” when the key is “C” is expressed as “IIIm7”.
In this case, the control unit 11 converts the chord appearance rate data and the chord progression appearance rate data defined by chords of the relative expression into chords of an absolute expression based on the key set for the music. The control unit 11 then calculates the chord importance level (CS) and the chord progression importance level (CPS) based on the appearance rates of the converted chords.
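The conversion from the relative (degree) expression to an absolute chord can be sketched as below; for example, “IIIm7” with the key “C” becomes “Em7”. The degree table and the simple name parsing are simplified assumptions.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
DEGREE_OFFSETS = {"I": 0, "II": 2, "III": 4, "IV": 5, "V": 7, "VI": 9, "VII": 11}

def relative_to_absolute(degree_chord, key):
    """Convert a degree expression such as "IIIm7" into an absolute chord for the given key."""
    i = 0
    while i < len(degree_chord) and degree_chord[i] in "IV":
        i += 1                              # split the Roman numeral from the chord quality
    degree, quality = degree_chord[:i], degree_chord[i:]
    root = PITCH_CLASSES[(PITCH_CLASSES.index(key) + DEGREE_OFFSETS[degree]) % 12]
    return root + quality

print(relative_to_absolute("IIIm7", "C"))  # Em7
print(relative_to_absolute("V7", "F"))     # C7
```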
In the plurality of trained models 155, at least some of the teacher data sets 355 used in the machine learning differ from each other. For example, when machine learning is performed using a plurality of teacher data sets 355 classified by genre (jazz, classical, or the like), a plurality of trained models 155 corresponding to the plurality of genres are generated. The teacher data sets 355 may be classified according to genre type, or may be classified according to musical instrument type. According to this classification, the chord sequence data and the explanatory text data are specialized for the classification. The teacher data sets 355 may also be classified by the creator of the explanatory text included in the explanatory text data 359.
For example, by providing the trained model 155 corresponding to jazz with chord input data corresponding to a piece of music classified into jazz, it is possible to obtain explanatory text with high accuracy. A target for classifying a piece of music corresponding to the chord input data may be set by the user or may be set by analyzing the music.
A plurality of types of explanatory texts may be obtained by providing one chord input data to the plurality of trained models 155. For example, if a plurality of trained models 155 corresponding to a plurality of creators are used, a plurality of types of explanatory texts are obtained and can be compared to select one suitable for the user. Among the explanatory texts obtained from the plurality of trained models 155, a new explanatory text may be generated based on a common point.
The above is the description of the modifications.
As described above, according to an embodiment of the present disclosure, there is provided a text providing method including obtaining text corresponding to chord input data, in which chords are aligned in chronological order, based on a relationship between chord sequence data, in which chords are aligned in chronological order, and explanatory text related to chords included in the chord sequence data.
Obtaining the text may include obtaining the text from a trained model that has learned the relationship by providing the chord input data to the trained model.
The chord input data may include at least a chord tone and a bass note of the chord corresponding to the chord input data.
The chord input data may include at least a chord tone and a tension note of the chord corresponding to the chord input data.
The chord input data may include vector data.
The chord input data may include a first chroma vector corresponding to a chord tone of a chord corresponding to the chord input data.
The chord input data may include a second chroma vector corresponding to a bass note of a chord corresponding to the chord input data.
The chord input data may include a third chroma vector corresponding to a tension note of a chord corresponding to the chord input data.
The text, corresponding to the chord input data, obtained from the trained model may include explanatory text including a first character group describing the chord progression of chords corresponding to the chord input data.
The text, corresponding to the chord input data, obtained from the trained model may include explanatory text including a second character group describing respective functions of the chords corresponding to the chord input data.
The text, corresponding to the chord input data, obtained from the trained model may include explanatory text including a third character group describing the concatenation technique between chords corresponding to the chord input data.
The method may include obtaining music chord data, in which the chords of a piece of music are aligned in chronological order, and extracting a sequence of the chords, in a specific section of the piece of music satisfying a predetermined condition, from the music chord data as the chord input data.
The predetermined condition may include a condition using the chord included in the music chord data and an importance level related to a chord determined according to a key of the music.
The predetermined condition may include a condition using the chord included in the music chord data and an importance level related to a chord determined according to a genre of the music.
A program for causing a computer to execute the text providing method may be provided. A text providing device including a memory unit storing instructions of the program and a processor executing the instructions may be provided.
This application is a Continuation of International Patent Application No. PCT/JP2022/010084, filed on Mar. 8, 2022, which claims the benefit of priority to Japanese Patent Application No. 2021-049200, filed on Mar. 23, 2021, the entire contents of which are incorporated herein by reference.
Related U.S. Application Data: parent application PCT/JP2022/010084, filed March 2022; child application No. 18471376.