The present invention relates generally to a system, apparatus, and method which generates personalized information, and more particularly to a system, apparatus, and method which generates a music composition based upon information such as, for example, images and/or sound files.
With the advent of the Internet, social networking websites have become common. On these websites, individuals may post personal information such as education, accomplishments, employment status, ideals, and favorite songs, places, friends, etc. Viewers of these websites may then learn more about a selected individual or entity by accessing, for example, a page including the user's information, and may select items on the person's web page (e.g., links, etc.) to access other information about the person. For example, a viewer of "Beth's" web page may view information that is unique to Beth, such as Beth's image, her favorite songs, etc. However, although this information, such as Beth's image, may be unique to Beth, it may be desirable to associate other unique information with her to further personalize her web page. Accordingly, further user personalization may be achieved by including information which is composed using a feature unique to Beth such as, for example, Beth's image.
Accordingly, there is a need for a system, apparatus, and method for determining, forming, and providing information unique to a user. Further, there is a need for a social networking system, apparatus, and method which can form and provide information (e.g., musical tunes, etc.) unique to a user via a network.
Therefore, it is an object of the present invention to solve the above-noted and other problems of conventional social networking methods and to provide a system, apparatus, and method which can generate and provide individualized (or unique) information corresponding to a user's input. The system can further output this information directly (e.g., via one or more audio outputs such as, for example, speakers, etc., and/or one or more displays—which are not shown).
Thus, according to an aspect of the present invention, there is provided a system, apparatus, and method which can compose unique pieces of music when provided with, for example, a set of images, sound files (e.g., a user's voice or other sound), user selections (e.g., a rhythm, etc.), etc. The system may include, for example, a user interface such as, for example, one or more displays (mounted either directly or remotely, for example, via a network such as a LAN, a WAN, the Internet, etc.), a telephonic interface, or other suitable interface, as desired. Further, the method of the present invention can run on one or more of a server, a workstation (e.g., a personal computer (PC)), a personal digital assistant (PDA), a mobile station (MS) such as a cellular phone, and/or other suitable computing devices, as desired. These devices may operate independently of each other or may communicate with one or more other devices via, for example, a wired and/or wireless network such as, for example, a LAN, WAN, the Internet, a cellular (telephone) network, etc.
It is also an aspect of the present invention to perform the method of the present invention on one or more computers which can, for example, operate via a network (e.g., wired or wireless) such as, for example, a LAN, a WAN, the Internet, a cellular communication network, and/or combinations thereof.
Although the musical ability of users may vary, outputs of the present invention are substantially independent of the musical ability of a user. Accordingly, the system, apparatus and method of the present invention forms and outputs data which is independent of a person's musical ability.
It is a further aspect of the present invention to provide a core music composition engine which processes an input stream of floating point numbers and generates a pattern representing a musical composition. The method can include the steps of collecting user input information such as, for example, sound and/or image data. This user input information may include one or more images, an audio sample, such as, for example, a person's voice, a sample of any sound, a rhythm, etc. The user input information can include files which may be provided by the user (e.g., formed and/or uploaded by the user), files selected from a predetermined list (e.g., provided by the system), etc. The user can also record audio (e.g., the user's voice, a song, rhythm, etc.) and/or graphic files (e.g., an image such as the user's face, etc.). Accordingly, the system, apparatus, and/or method can provide the user with an interface (e.g., a graphic and/or audio) to select desired information to be input and/or to record information, if desired.
It is a further aspect of the present invention to provide a system which can convert user input information to one or more information streams each of which can include, for example, floating point numbers or some other suitable numbering scheme (e.g., integers). For example, the floating point numbers can have a range which is between 0.0 and 1.0. However, other ranges can also be used, if desired. The system processes the one or more information streams, using, for example, an inference engine, and creates a pattern, e.g., in a format such as XML, which represents a musical composition. The system processes the pattern to create musical notes (e.g., encoded in MIDI format) and optionally converts the musical notes to a suitable format such as an MP3 (MPEG-1 Audio Layer 3) encoded audio file, optionally applying effects processing (e.g., audio compression). The information produced (e.g., the MIDI and/or MP3 format information) can optionally be directly output (e.g., via the speaker and/or display) or can be transmitted via, for example, a network such as a LAN, WAN, the Internet, a mobile communication network, a cellular (e.g., telephone) network, etc. to one or more users. The system according to the present invention may use one or more processors and may be located in one or more locations. For example, a database containing information such as, for example, user input information, produced data, musical notes, etc., may be located at a first location and a processor may be located at another location and communicate with the other devices such as, for example, the database using a suitable means via the network. Further, a user may communicate with the system, apparatus, and/or method via wired and/or wireless communication means (e.g., a PC, a PALM, a cellular telephone, etc.).
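By way of a minimal, non-limiting sketch (in Java, which the system elsewhere employs for its worker processes), the conversion of raw user input into a normalized floating point stream described above may be expressed as follows; the class and method names, and the assumption of 8-bit input samples, are illustrative only:

```java
public class StreamNormalizationSketch {
    /** Map raw 8-bit samples (e.g., bytes of a sound or image file) onto
     *  floating point numbers in the range 0.0 to 1.0, as described above. */
    static double[] toFloatStream(byte[] raw) {
        double[] stream = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            stream[i] = (raw[i] & 0xFF) / 255.0; // 0 maps to 0.0, 255 maps to 1.0
        }
        return stream;
    }
}
```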
Accordingly, it is an aspect of the present invention to provide a system, apparatus, and method for generating audio information based upon information corresponding to a user. The system can include one or more controllers which input user information and form one or more streams of information based upon the user information, create a pattern in accordance with the user information, and generate audio information based upon the pattern. Further, the one or more controllers may communicate with each other using wired and/or wireless (e.g., a cellular) networking systems.
According to the present invention, disclosed is a system and apparatus for generating audio information, including one or more controllers which input user information, form one or more streams of information based upon the user information, create a pattern in accordance with the user information, and generate audio information based upon the pattern. The user information can include at least one of audio and visual data; the audio data can include at least one of a voice and a rhythm, and the visual data can include an image. According to the system, the one or more streams can include floating point numbers. Further, the one or more streams can range from 0 to 1 (or other suitable numbers which can be normalized, if desired). Further, the system can include an inference engine which processes the one or more streams of information. The pattern can be based upon a musical composition corresponding to a music template. Further, the controller can operate so as to convert the generated audio information into audio information having a desired file format which can include a MIDI file or a text file corresponding to a musical score.
It is a further aspect of the present invention to provide a method for generating audio information using at least one controller, the method including the steps of: inputting, using the at least one controller, user information; forming, using the at least one controller, one or more streams of information based upon the user information; creating, using the at least one controller, a pattern in accordance with the user information; and generating, using the at least one controller, the audio information based upon the pattern. According to the method, the user information can include at least one of audio and visual data. Further, the audio data can include at least one of a voice and a rhythm, and the visual data can include an image. Moreover, the one or more streams include floating point numbers which can, for example, have a range of between 0 and 1. The method may also include processing, using an inference engine, the one or more streams of information, and the pattern can be based upon a musical composition corresponding to a music template. It is a further aspect of the method to convert, using the at least one controller, the generated audio information into audio information having a desired file format such as, for example, a MIDI file or a text file corresponding to a musical score.
It is a further aspect of the present invention to provide a method performed by a system including at least one controller, the method including receiving, by the at least one controller, voice information, inputting, by the at least one controller, image information, receiving, by the at least one controller, at least one of sound information and rhythm information, processing the received voice information, image information, and the at least one of sound information and rhythm information, and forming a musical composition based upon the one or more of the received voice information, image information, sound information and rhythm information. The method can also include forming a string of floating point numbers based upon at least one of the voice, image, sound and rhythm information.
Additional advantages of the present invention include the incorporation of features that reduce the complexity and cost of manufacturing.
The invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:
Preferred embodiments of the present invention will now be described in detail with reference to the drawings. For the sake of clarity, certain features of the invention will not be discussed when they would be apparent to those with skill in the art. If desired, one or more steps and/or features of the present invention may be deleted and/or incorporated into other steps and/or features. Further, the method may be performed by one or more controllers operating at one or more locations and/or communicating with each other via wired and/or wireless connections.
When referring to musical instruments of a certain type (e.g., a guitar, a piano, drum, clarinet, etc.), it is assumed the instruments can be synthesized and/or actual sound clips may be used.
A flow chart illustrating a method according to the present invention is shown in
In step 102, an input processor (not shown) performs input processing on the received user information. Music produced by the method of the present invention can vary according to various input information that is input into the system (e.g., into the input processor, etc.). Depending upon processing methods, similar (but not the same) inputs should yield similar outputs (i.e., results). However, similar (but not identical) information input into the system may include files which have different values for a particular sample (e.g., a sound sample, a pixel, etc.). Thus, when processing these similar (but not identical) samples, one or more statistical processes are used to produce a representation that contains sufficient information to drive a subsequent composition process and generate similar output results for similar (if not the same) inputs.
For example, when processing images, a colorfulness measure may be determined by sampling the image a number of times (e.g., 2000, etc.) at, for example, random locations, to determine how many colors are present. A colorfulness measure of 1.0 can be used to indicate that all samples returned a different color, while a colorfulness measure of 0.0 can indicate that all samples returned the same color. Thus, the colorfulness measure can include a single value as opposed to a stream of values as used for other inputs according to the present invention. Further, image luminance (e.g., the average of red, green and blue components of pixels) can also be determined by, for example, sampling in a pattern such as, for example, a spiral pattern working from the center of the image to the outside of the image. The results can then be normalized to fall within the range of, for example, 0.0-1.0 with 0.0 indicating minimum luminance (i.e., black) and 1.0 indicating maximum luminance (i.e., white).
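A minimal sketch of the colorfulness determination described above is given below; the class and method names, the fixed random seed, the requirement that at least two samples be taken, and the normalization by (samples - 1) are illustrative assumptions rather than limitations:

```java
import java.awt.image.BufferedImage;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class ColorfulnessSketch {
    /** Sample the image at random locations (samples >= 2 assumed) and estimate a
     *  colorfulness measure in [0.0, 1.0]: 0.0 when every sample returned the same
     *  color, 1.0 when every sample returned a different color. */
    static double colorfulness(BufferedImage img, int samples, long seed) {
        Random rng = new Random(seed);
        Set<Integer> colors = new HashSet<>();
        for (int i = 0; i < samples; i++) {
            int x = rng.nextInt(img.getWidth());
            int y = rng.nextInt(img.getHeight());
            colors.add(img.getRGB(x, y)); // packed ARGB color at the sample point
        }
        // One distinct color yields 0.0; all-distinct samples yield 1.0.
        return (colors.size() - 1) / (double) (samples - 1);
    }
}
```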
When processing audio information (e.g., sound information), inaudible areas (e.g., silent areas at the beginning and end of a recording) can be recognized and skipped. The audio information can be divided into overlapping segments of a given length (e.g., 1/10 of a second). A Fourier analysis can then be performed on each of the segments to produce an output in bands corresponding to a Bark scale, and the results are output as floating point numbers which correspond to each of the segments of the input audio information. As used in the present invention, the Bark scale typically specifies 24 frequency bands. The system determines a Fourier transform for a given segment of audio information, and energy is determined for each of the 24 frequency bands corresponding to the Bark scale. For each frequency band in the Bark scale, the system: determines a range of FFT (fast Fourier transform) results that fit in the frequency band; sums the squares of a real portion (as opposed to an imaginary portion of complex numbers) of the FFT results in the frequency band; and divides the summed squares by the number of FFT samples within the frequency band. Once the system has computed the values for the entire audio file, the system can normalize the results to ensure that all values are within a specific range such as, for example, 0.0-1.0.
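The per-band energy computation described above may be sketched as follows. The FFT itself is assumed to have been computed elsewhere, and the band edges shown are the conventional Zwicker critical-band edges, which are not enumerated in the present text and are therefore an assumption:

```java
public class BarkBandsSketch {
    // Upper edges (Hz) of 24 Bark-scale bands (conventional Zwicker values; an assumption).
    static final double[] BARK_EDGES = {
        100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
        2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500
    };

    /** Mean squared real FFT output per Bark band for one audio segment.
     *  fftReal holds the real parts of an N-point FFT of the segment. */
    static double[] barkEnergies(double[] fftReal, double sampleRate) {
        int n = fftReal.length;
        double[] energy = new double[BARK_EDGES.length];
        double lowEdge = 0.0;
        for (int band = 0; band < BARK_EDGES.length; band++) {
            double sum = 0.0;
            int count = 0;
            for (int k = 0; k < n / 2; k++) {       // positive-frequency bins only
                double freq = k * sampleRate / n;   // frequency of FFT bin k
                if (freq >= lowEdge && freq < BARK_EDGES[band]) {
                    sum += fftReal[k] * fftReal[k]; // square of the real portion
                    count++;
                }
            }
            energy[band] = count > 0 ? sum / count : 0.0; // divide by bin count
            lowEdge = BARK_EDGES[band];
        }
        return energy; // normalization over the whole file would follow, as described
    }
}
```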
When processing rhythms, a power variation of the input signal is analyzed so as to identify pulses of more than average strength, which are set as "beats." Then, a variation in time between each of the beats is determined and the results are normalized so that they fall into the range of 0.0-1.0 (where, for example, 0.0 represents the shortest delay and 1.0 represents the longest delay of the input rhythms during a certain time frame).
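A minimal sketch of this beat detection and delay normalization follows; the division of the signal into fixed-length power windows and the class and method names are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class RhythmSketch {
    /** Mark windows whose power exceeds the average as beats, then normalize the
     *  inter-beat delays to [0.0, 1.0] (0.0 = shortest delay, 1.0 = longest). */
    static List<Double> beatDelays(double[] windowPower, double windowSeconds) {
        double avg = 0.0;
        for (double p : windowPower) avg += p;
        avg /= windowPower.length;

        List<Integer> beats = new ArrayList<>();
        for (int i = 0; i < windowPower.length; i++) {
            if (windowPower[i] > avg) beats.add(i); // pulse of more than average strength
        }

        List<Double> delays = new ArrayList<>();
        double min = Double.MAX_VALUE, max = 0.0;
        for (int i = 1; i < beats.size(); i++) {
            double d = (beats.get(i) - beats.get(i - 1)) * windowSeconds;
            delays.add(d);
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        if (max > min) {                            // normalize to the range 0.0-1.0
            for (int i = 0; i < delays.size(); i++) {
                delays.set(i, (delays.get(i) - min) / (max - min));
            }
        }
        return delays;
    }
}
```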
According to the present invention, the input processor can process various types (e.g., audio-, image-, video-, and/or motion-types) of information input thereto. For example, the input information may include audio, image, video, graphic, motion, text, motion/position, etc. information and/or combinations thereof. This information may be input in real time or may include saved (e.g., an image file, etc.) information. The input (e.g., a real-time voice input or a saved file, etc.) can be input or selected for input by the system and/or the user, as desired. Accordingly, the input processor can include one or more corresponding input processors which are optionally provided for each type of input information. Thus, for example, textual information may be processed by a text input processor while a motion tracker input (e.g., generated by a game system, such as a Nintendo™ Wii™ remote control) may be processed by a motion-tracker input processor (not shown). Accordingly, the system may include means for determining the type of input information and for determining which of the corresponding input processors to use. It is also envisioned that one or more of the input processors may be formed integrally with and/or incorporated into another input processor.
Referring back to steps 101A-D, processing performed by the input processor on each type of information will now be described in detail.
With reference to image type information (e.g., see, step 101A), when processing image information such as, for example, an image file, the input processor would use an image processor (e.g., in step 102) which would determine how colorful the image is and/or how the luminance of the image changes over the entirety of the image. For example, the colorfulness of an image can be determined by taking a number of samples of the image at various locations and determining how many different colors are present. These various locations can be determined randomly (e.g., using a random number generator), can be determined based upon the size and/or shape of the image, and/or can be predetermined (e.g., at x-, y-, and/or z-axis locations). Further, luminance changes over an image can optionally be determined by sampling in, for example, a spiral pattern from the center of the image outwards. At each sample position, an average luminance value over a square patch (e.g., a few pixels wide) can be determined. The spiral can be scaled such that an equal number of samples are taken for each image independent of size. However, it is also envisioned that the location and/or number of samples can be randomly determined or determined based upon other considerations (e.g., size, color, luminance, etc.), as desired. In yet other embodiments, it is envisioned that digital signal processing (DSP) may be performed on images to determine various features of these images. For example, a facial recognition step may be performed to determine whether different images of a person are of the same person. If it is determined in the facial recognition step that the person is the same person, similar outputs may be output by the system regardless of other inputs. Similarly, the system according to the present invention can optionally determine an image's background and output information accordingly. Thus, for example, if it is determined that the same person is in two different input images, background information such as, for example, snow (indicative of winter), flowers (indicative of spring), green leaves (indicative of summer), and/or brown leaves (indicative of autumn), can be optionally used to determine an appropriate output.
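The spiral luminance sampling described above may be sketched as follows; the number of spiral turns and the patch half-width are illustrative assumptions (the text specifies only a patch "a few pixels wide"), and at least two samples are assumed:

```java
import java.awt.image.BufferedImage;

public class LuminanceSpiralSketch {
    /** Sample average luminance along a spiral from the image center outwards,
     *  normalized to [0.0, 1.0] (0.0 = black, 1.0 = white); samples >= 2 assumed. */
    static double[] spiralLuminance(BufferedImage img, int samples) {
        double cx = img.getWidth() / 2.0, cy = img.getHeight() / 2.0;
        double maxRadius = Math.min(cx, cy) - 1;   // scale the spiral to the image size
        double turns = 4.0;                        // assumed number of spiral turns
        double[] out = new double[samples];
        for (int i = 0; i < samples; i++) {
            double t = (double) i / (samples - 1); // 0.0 at the center, 1.0 at the edge
            double angle = turns * 2 * Math.PI * t;
            double r = maxRadius * t;
            int x = (int) (cx + r * Math.cos(angle));
            int y = (int) (cy + r * Math.sin(angle));
            out[i] = patchLuminance(img, x, y, 2); // 5x5 patch around the sample point
        }
        return out;
    }

    /** Average luminance (mean of R, G, B, normalized) over a square patch. */
    static double patchLuminance(BufferedImage img, int x, int y, int half) {
        double sum = 0;
        int count = 0;
        for (int dy = -half; dy <= half; dy++) {
            for (int dx = -half; dx <= half; dx++) {
                int px = x + dx, py = y + dy;
                if (px < 0 || py < 0 || px >= img.getWidth() || py >= img.getHeight()) {
                    continue; // skip samples that fall outside the image
                }
                int rgb = img.getRGB(px, py);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                sum += (r + g + b) / (3.0 * 255.0);
                count++;
            }
        }
        return count > 0 ? sum / count : 0.0;
    }
}
```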
With reference to sound information (e.g., see, step 101B), when processing sound information such as, for example, a sound file, the input processor (e.g., in step 102) can merge optional left and right stereo signals into a mono stream, if desired. Additionally, any sound information which is determined to be below a certain threshold (e.g., a silent area at the beginning of a sound file) can be optionally skipped to avoid non-relevant data input and processing, as desired. The sound information can then be split into a number of overlapping segments, and a series of filters can be applied to each segment (in series or optionally in parallel) to determine how strongly the sound was represented in a number of different frequency bands. The resultant data is a stream of information that describes how active the sound is in each frequency band over time. As discussed above, a suitable method to determine the frequencies contained in the input sound information can optionally include performing an FFT on the input sound information. The results of the Fourier analysis are then processed so that they correspond to a scale such as, for example, the Bark scale.
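A sketch of this preprocessing (stereo-to-mono merging, skipping of below-threshold areas, and splitting into overlapping segments) is shown below; the 50% overlap ratio is an assumption, as the text specifies only the segment length:

```java
import java.util.ArrayList;
import java.util.List;

public class SoundPreprocessSketch {
    /** Merge stereo to mono, skip leading/trailing near-silence, and split the
     *  remainder into overlapping segments of 1/10 of a second. */
    static List<double[]> segments(double[] left, double[] right,
                                   int sampleRate, double silenceThreshold) {
        int n = left.length;
        double[] mono = new double[n];
        for (int i = 0; i < n; i++) mono[i] = (left[i] + right[i]) / 2.0;

        int start = 0, end = n;   // skip areas below the silence threshold
        while (start < end && Math.abs(mono[start]) < silenceThreshold) start++;
        while (end > start && Math.abs(mono[end - 1]) < silenceThreshold) end--;

        int segLen = sampleRate / 10; // 1/10 of a second per segment
        int hop = segLen / 2;         // assumed 50% overlap between segments
        List<double[]> segs = new ArrayList<>();
        for (int s = start; s + segLen <= end; s += hop) {
            double[] seg = new double[segLen];
            System.arraycopy(mono, s, seg, 0, segLen);
            segs.add(seg);
        }
        return segs;
    }
}
```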
With reference to rhythm information (e.g., see, step 101D), although the rhythm information can be encoded as a sound file, a type of information that is of interest is the pulsations of the rhythm (as opposed to the frequency of the sound waves of the rhythm itself). Thus, when it is determined that a rhythm is being input, the input processor (e.g., in step 102) uses a beat detection algorithm to determine the start and end of each beat and to produce a stream of floating point numbers which indicates the variation of the corresponding time between the beats. However, it is also envisioned that the input processor can determine the frequencies contained in the sound file as well, if desired.
The creation of music “structure” or “pattern” will now be explained with reference to step 104. In this step, a composition process occurs in two stages (although a single or other number of stages is also envisioned). The first stage establishes a basic structure of the music in terms of basic operations and then the basic structure of the music is converted (e.g., using custom software, etc.) into notes which are used in a final composition. Step 104 outputs data such as, for example, an XML file that describes a final piece of music (e.g., in terms of musical processes rather than, for example, musical notes).
The creation of music structure is performed using a conventional inference engine (i.e., a composition "engine," not shown) such as, for example, a CLIPS (C Language Integrated Production System)-type inference engine which processes the streams of input information (e.g., floating point numbers in the range of, for example, 0.0 to 1.0 received from step 102) and generates a corresponding musical structure. Each time a decision is required, a value is taken from an input stream and used to select among the available possibilities. If, for example, the input stream is exhausted before the composition process is finished, then the software can cycle around to the beginning of the input stream and/or re-use a previous value until the composition process is complete. However, rather than reusing previous values, other values can also be used, as desired. Tables relating to facts will be described below with reference to Tables 12-15.
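The decision mechanism described above may be sketched as follows; the wrap-around on exhaustion follows the description, while the class and method names are illustrative. The choose method shows a typical use: mapping an input value in [0.0, 1.0] onto one of n available possibilities.

```java
public class DecisionStreamSketch {
    private final double[] values; // floats in [0.0, 1.0] produced by input processing
    private int index = 0;

    DecisionStreamSketch(double[] values) {
        this.values = values;
    }

    /** Return the next input value, cycling around to the beginning of the
     *  input stream when it is exhausted, as described above. */
    double next() {
        double v = values[index];
        index = (index + 1) % values.length;
        return v;
    }

    /** Use an input value to select among n available possibilities (0 .. n-1). */
    int choose(int n) {
        return Math.min((int) (next() * n), n - 1);
    }
}
```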
Referring to
With reference to the composition process of step 104, there are several optional operations (e.g., I-VIII) which can be performed, as desired, during this composition process. The first operation (i.e., step I) and last three (i.e., steps VI-VIII) are global and preferably operate on all tracks, and the others (i.e., steps II-V) preferably operate on a per-track basis, as desired. However, one or more of these operations or variations thereof can be performed on any selected track, if desired. These operations are better illustrated with reference to Table 1 below.
With reference to step VI above, part switching will now be explained in further detail. According to the part switching method of the present invention, the music may be broken up into a number of "zones" (each having an index z), and transitions such as, for example, changing a set of playing tracks, are performed at a start of a new zone. In the present example, the zones will be given corresponding indexes z, such as, for example, 0, 1, 2, . . . Z, where Z=10. However, other numbers are possible. Each of the zones represents a "slice" of the music taken along the time axis (i.e., in the time domain). For example, zone 1 is the first 30 seconds, zone 2 the second 30 seconds, etc. Each instrument is assigned a weight range (e.g., from 0.4 to 0.9, from, for example, a weight range which is between 0.0 and 1.0) when, for example, an instrument is selected for a track. Then the instrument's weight range is assigned to the corresponding track. The correspondingly assigned weights are used to determine how often the track may play. Thus, for example, a track with a weight of 0.0 would never play while a track with a weight of 1.0 would play all the time. However, other ranges and settings are also envisioned.
With reference to step I above, the "zone profile" selected in the first phase (i.e., step I) of a tune controls how many tracks can play in each zone. For example, for each zone, the zone profile includes a number in the range 0.0 to 1.0 (although other ranges are also envisioned, as desired). The configuration for a particular genre (e.g., see, "beat" method below) specifies a minimum number of tracks that can play at any one time. According to the present example, a zero in the zone profile indicates that only the minimum number of tracks play, whereas a one indicates that all available tracks can play. The actual effect of the zone profile can be optionally modulated by values included within and taken (e.g., by the system) from the input stream.
In order to allow for more variety, optional configuration options (e.g., values) can be used to control the effect of the zone profile described above. For example, a zoneValueWeight value can be optionally assigned to the zone profile to control how much influence the zone profile exerts over the final result. Further, a zoneInputWeight value can be optionally assigned to a value from the input stream for a given zone. The zone input weight and the zone value weight can be used to determine which has more influence on determining whether a track plays in a given zone (i.e., time segment) thereby providing for more variation. Moreover, combinations of these weights can be optionally used to decide whether the number of playing tracks should be entirely defined by the zone profile, entirely defined by the input stream, or a combination thereof. Therefore, a number of “shapes” for the tune can be defined (e.g., by gradually increasing the number of tracks until most tracks are playing and then decreasing the number of tracks at the end of the tune) and variation between tunes using the same zone profile can also be provided.
According to the present invention, for each zone a number of tracks to play can be determined according to Equations (1) and (2) below.
After determining the value of num-tracks-for-zone_z, this value can be optionally clipped to ensure that it lies in the range of min-playing-tracks and num-tracks. In Equations (1) and (2) above, the zoneValue is the value of the zone profile for that zone; the inputValue is the value selected from the input stream for that zone; the num-tracks is the total number of tracks defined for the corresponding tune; and the min-playing-tracks is the minimum number of tracks that is to be played at any one time. As defined below, for each of 0-T tracks, an index t can be optionally assigned.
Using Equations (1) and (2), the method and system of the present invention computes the actual tracks to play over the length of the tune according to the algorithm illustrated in Table 2 below.
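Because Equations (1) and (2) are not reproduced in the present text, the sketch below is an assumption consistent with the description of zoneValue, inputValue, zoneValueWeight, zoneInputWeight, num-tracks, and min-playing-tracks: the zone profile and input-stream values are blended by their weights (assumed not both zero), and the result is clipped as described:

```java
public class ZoneTracksSketch {
    /** Per-zone track count: an assumed weighted blend standing in for
     *  Equations (1) and (2), followed by the clipping step described above. */
    static int numTracksForZone(double zoneValue, double inputValue,
                                double zoneValueWeight, double zoneInputWeight,
                                int numTracks, int minPlayingTracks) {
        // Blend the zone profile value and the input-stream value by their weights.
        double blended = (zoneValueWeight * zoneValue + zoneInputWeight * inputValue)
                       / (zoneValueWeight + zoneInputWeight);
        // 0.0 yields only the minimum number of tracks; 1.0 allows all tracks to play.
        int tracks = minPlayingTracks
                   + (int) Math.round(blended * (numTracks - minPlayingTracks));
        // Clip so that the result lies between min-playing-tracks and num-tracks.
        return Math.max(minPlayingTracks, Math.min(numTracks, tracks));
    }
}
```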
Referring back to
The process can optionally map instrument names to MIDI bank and patch numbers, and can optionally set volume and pan of MIDI tracks according to the tracks defined in the tune structure. Accordingly, the selection of the overall track (as opposed to note) volume and pan (e.g., position in stereo space) is simplified and the system can map an instrument name to MIDI instrument bank and patch numbers.
However, other parts of the process may be more complex and optionally require, for example, the generation of streams of MIDI note messages (one stream per track—where a MIDI note message is a note) from a harmonic maths process (as will be explained below) defined in the XML file received from the preceding stage (e.g., see, step 104,
After completing the MIDI file in step 106, the process continues to step 108.
In step 108, an audio file is generated. In this step, the MIDI file generated in step 106 is transmitted to a software synthesizer configured with sets of instruments for producing output information according to the input received. This output information can include a software sequence such as, for example, an audio file in a WAV (waveform audio format) or other format that can then be output to an encoder such as, for example, an MP3 encoder, to produce the final audio file in step 110.
In order to create a system which can produce audio information corresponding to popular music, the system can produce tracks from fragments of, for example, pre-recorded rhythms as well as harmonic maths-produced note streams, if desired. Additionally, different composition engines may be used by the system to produce music in a number of different genres based upon which of the different composition engines is used for the production. Further, genres may be represented by corresponding spreadsheets describing the various parameters used for the corresponding genre and a set of MIDI files encoding rhythm "layers" for: (1) kick (bass) and snare drums; (2) "ghost" (e.g., off the beat) kick and snare; (3) hi-hat; and/or (4) other percussion (e.g., instruments other than kick (bass) drum, snare drum, and hi-hat). However, other MIDI files encoding other rhythm layers are also envisioned.
To form a complete rhythm track, the system can combine different rhythms for each of these four rhythm layers, allowing a more authentic rhythm track than can be produced using harmonic maths alone. However, although the individual fragments are pre-recorded, the potential number of ways in which they can be combined is large, so that variety is not significantly sacrificed by using this approach. Further, the present technique may also be extended to produce tracks for other instruments crucial to a genre (e.g., a bass guitar, etc.).
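A minimal sketch of this layer combination is shown below; selecting one pre-recorded fragment per layer by a value from the input stream is an assumption (any selection scheme may be used). With F fragments available per layer and four layers, F^4 distinct rhythm tracks are possible, which illustrates why variety is not significantly sacrificed:

```java
import java.util.ArrayList;
import java.util.List;

public class RhythmLayersSketch {
    /** Pick one pre-recorded MIDI fragment per rhythm layer (kick/snare, ghost
     *  kick/snare, hi-hat, other percussion) using values from the input stream. */
    static List<String> combineLayers(List<List<String>> layerFragments,
                                      double[] inputValues) {
        List<String> chosen = new ArrayList<>();
        for (int layer = 0; layer < layerFragments.size(); layer++) {
            List<String> options = layerFragments.get(layer);
            int pick = Math.min((int) (inputValues[layer] * options.size()),
                                options.size() - 1);
            chosen.add(options.get(pick)); // e.g., a MIDI file name for this layer
        }
        return chosen; // the chosen fragments are combined into one rhythm track
    }
}
```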
A flow chart illustrating a musical structure process according to the present invention is shown in
In step 204, the input mapping functions are called at each decision point to pick a value from an input stream to pick a particular feature of the output music.
In step 206, inference rules generate track facts (CLIPS) which contain all the information required to generate a complete track of the output music.
In step 208, track facts are decoded and a complete specification of the composed music is generated in XML format.
A block diagram of an embodiment of the system according to the present invention for interfacing with a network such as the Internet is shown in
The web interface 350 provides an interface for one or more users to interact with software and/or provides processing required to translate user input into a form which can be used by the one or more composition engines 332A-332N. A more detailed description of the web interface 350 will be given below.
The one or more worker processes 328A-328N provide computation means for computationally intensive tasks such as, for example, composing the music and/or converting the note data to an audio file having a desired format (e.g., MP3, AAC, WAV, FLAC, CD (compact disc), etc.). The worker processes 328A-328N can receive job requests generated by the web interface via, for example, the SQL database 324 and can thereafter process the received job requests in, for example, series and/or in parallel, if desired. Accordingly, the greater the number of worker processes running (e.g., one or two per processor core) at the same time (i.e., in parallel), the more work the system can perform during this time. The worker processes are written in, for example, Java and can use a native library to communicate with the C++ software (described above) that is used to compose the music.
The one or more external programs 330A-330N are called by the worker processes 328A-328N to convert the MIDI note data to audio information. The external programs 330A-330N can include one or more UNIX shell scripts each of which can invoke a number of command-line programs (not shown) to perform the conversion of the MIDI note data to audio information.
The SQL database 324 can store all user input data as well as user account information and/or other data required for the web interface to function. In other embodiments, other databases (e.g., local or remote) can be used. Additionally, the databases can use any suitable memory means such as, for example, flash memory, one or more hard discs, etc.
The shared file system 326 can include storage means for storing large data objects such as MP3 audio files which are generated by the system. Each of the functional blocks of the system 300 can read and/or write and/or otherwise access the shared file system 326. For example, MP3 files and other data can be stored in the shared file system 326, and the web interface 350 can access, read, and/or transmit the stored data to other devices over a network such as, for example, the Internet.
The web interface 350 can include components that enable users and/or the system to create accounts, compose pieces of music, and/or access previously composed music. The web interface 350 can include modules to create job requests for processing by the worker processes 328A-328N, and/or interfaces for staff members and/or the system to manage the system and/or to monitor its performance. The major sub-components of the web interface include one or more of: a sound recorder (e.g., a Java applet) 304; a sound picker (e.g., a flash applet) 306; a rhythm recorder (e.g., a flash applet) 308; an MP3 player (e.g., a flash applet) 310; an audio processor (e.g., performed by the C++ software described above) 312; a beat detector (performed by the C++ software described above) 314; an image processor 316; a distributed job controller 318; a user account manager 320; and a system manager 322.
Although only a single web server 350 (e.g., a front end web interface) is illustrated, the system may also include a plurality of web servers 350 that run the web interface. Accordingly, load balancing means such as, for example, load balancing software and/or hardware may be used to balance loads between the plurality of servers 350. Further, although not shown, the server 350 can include one or more of the one or more worker processes 328A-328N, the one or more operative programs such as, for example, external programs 330A-330N, and/or the one or more composition engines 332A-332N, if desired.
The sound recorder 304 can include software and/or hardware for users to record audio directly (e.g., on a user's PC, a stand-alone kiosk, etc.) and/or via a network such as, for example, by using the web (e.g., via the Internet). In the preferred embodiment, the sound recorder includes a Java applet that allows users to record audio using the web without having corresponding recording software installed on their computer. The audio data is sent from the applet to the web server 350 using, for example, an HTTP (hypertext transfer protocol).
The sound picker 306 can provide one or more sounds or other audio information (e.g., from a database) for selection (e.g., by a user). One or more of the selected sounds can then be input into the system (e.g., see, steps 100 and 102 in
The rhythm recorder 308 can record a rhythm via inputs from an input device such as, for example, a microphone, a mouse input (e.g., via an input button), a tracking device (e.g., a digitizer pen, a track ball, a finger pad, a track pen, etc.), a keyboard input, a screen input, etc. Additionally, the rhythm recorder can record a rhythm corresponding to an input from the input device. For example, if using a microphone, sounds indicative of a user clapping or hitting something can be recorded to form a rhythm. Likewise, a user can click a mouse input key to form a rhythm which corresponds to the clicks. Further, a user can tap a digitizer pen on a surface to form a rhythm which corresponds with the taps. The user's input can then be transmitted to the rhythm recorder.
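A minimal sketch of capturing a clicked or tapped rhythm is shown below; the class and method names and the use of wall-clock timestamps are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class ClickRhythmSketch {
    private final List<Long> tapTimesMs = new ArrayList<>();

    /** Called on each mouse click, key press, or pen tap. */
    void onTap() {
        tapTimesMs.add(System.currentTimeMillis());
    }

    /** Convert the recorded taps into inter-beat intervals (in milliseconds),
     *  which can then be normalized as described above for rhythm processing. */
    List<Long> intervals() {
        List<Long> out = new ArrayList<>();
        for (int i = 1; i < tapTimesMs.size(); i++) {
            out.add(tapTimesMs.get(i) - tapTimesMs.get(i - 1));
        }
        return out;
    }
}
```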
The MP3 player 310 provides means for a user to play back audio files such as, for example, MP3 files. Accordingly, the MP3 player can include a flash MP3 player (soft or hard) button which, for example, when selected, plays MP3 files directly in, for example, a Web page accessed by the user. As such players are common in the art, for the sake of clarity, a further description thereof will not be given.
The audio processor 312 may include a collection of Java and/or C++ classes that can process sound files to generate statistical data for input into, for example, the composition process as described above with respect to the input processing process (e.g., see, step 102,
The image processor 316 can process image files and optionally generate statistical information for input into, for example, the composition process as described above with respect to the input processing process (e.g., see, step 102,
The beat detector 314 performs simple beat detection on a selected audio file and outputs statistical data as described above with respect to the input processing process (e.g., see, step 102,
The distributed job controller 318 manages the creation and processing of job requests which will be processed by the worker processes (i.e., 328A-328N, 330A-330N, and/or 332A-332N), and can include, for example, Java classes to perform this management.
User account manager 320 provides a web interface for a user to manage his account. Accordingly, the user account manager 320 can include software such as, for example, Java classes which implement this web interface, as well as supporting classes which provide the functionality to implement user management.
System manager 322 provides a management interface which can be used by, for example, operators (e.g., staff members, etc.) of the system, such that the operators may monitor the system of the present invention. Accordingly, the system manager can include, for example, Java classes to provide the management interface for the operators to monitor the system.
A brief overview of the harmonic maths process (e.g., see, Lawrence Ball, "Harmonic Mathematics, Basic theory & application to audio signals," May 1999, which is incorporated herein by reference), as used by the present invention to generate music, will now be given with reference to Tables 3 and 4 below.
A description of a moving value in a wavetable used by the system of the present invention will now be provided. The values that are adjusted can be, for example: (1) a MIDI note pitch in the range of, for example, 0-127 (or other suitable ranges). Further, adjustments can be optionally made so that the MIDI note pitch can be restricted to values within the current tonality; (2) a MIDI note volume in the range of, for example, 0-127; and (3) a floating point scaling value used to adjust the note length of a MIDI note. An example of a harmonic maths process and the output generated (e.g., see, Table 4) for the parameters listed in Table 3A is shown below with reference to Table 3B.
As described in Tables 3A and 3B, an accumulator is a vector having a length wherein each element of the accumulator is initialized to zero. At each step (i.e., iteration) t, the contents of a mod_vector having the same length as the accumulator vector (e.g., see, Table 5) is added to the accumulator vector. The result, modulo a maximum value, is stored back in the accumulator. The elements of the mod_vector are related by a geometric relationship. For example, if element 1 of the mod_vector is 24, then element 2 would be 48, element 3 would be 72, and so on. Thus, the elements of the accumulator will change at different rates that have a fixed relation to one another (e.g., element 2 changes at twice the rate of element 1). Each time an element of the accumulator passes a multiple of the resolution, the value of the output of the system at that position changes. Typically, the output is used to index into a sequence which represents some useful quantity such as, for example, a series of musical pitches. Accordingly, the harmonic maths process can generate musical pitches, as shown in the sample run of Table 4 above. According to the present application, if a[] is the accumulator and m[] is the mod_vector, then at each iteration, for every element k of a and m, the following operation as defined in Equation (3) is performed:
a[k] = (a[k] + m[k]) % max    Eq. (3)
In Equation (3), % is the modulus operator and max is the maximum value.
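The harmonic maths process of Equation (3) may be sketched as follows; mapping the accumulator value to an output index by integer division by the resolution is an assumption consistent with the statement that the output at a position changes each time that element passes a multiple of the resolution:

```java
import java.util.Arrays;

public class HarmonicMathsSketch {
    /** Run the harmonic maths process: at each iteration, Equation (3) is applied
     *  to every element, and the output index at position k is derived from a[k]. */
    static int[][] run(int[] modVector, int max, int resolution, int iterations) {
        int[] a = new int[modVector.length]; // accumulator, initialized to zero
        int[][] output = new int[iterations][modVector.length];
        for (int t = 0; t < iterations; t++) {
            for (int k = 0; k < a.length; k++) {
                a[k] = (a[k] + modVector[k]) % max; // Equation (3)
                output[t][k] = a[k] / resolution;   // index into, e.g., a pitch sequence
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // mod_vector elements in the fixed relation described above: 24, 48, 72.
        int[][] out = run(new int[] {24, 48, 72}, 1000, 100, 8);
        for (int[] row : out) System.out.println(Arrays.toString(row));
    }
}
```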
The system of the present invention uses a mathematical technique known as the "forms of the math" to create computer graphics videos, musical scores, recordings, and in some cases audio-visual videos with mathematical correspondence of the two media. This technique provides a method for controlling an array of parameters of one or more dimensions over time (e.g., see, Lawrence Ball, Id.; and John Whitney, "Digital Harmony: On the Complementarity of Music and Visual Art," McGraw Hill, 1981). Graphs illustrating the output of a harmonic maths process according to the present invention are shown in
With reference to
An example of the XML format used to encode the tune structure is illustrated in Table 4 below. For the sake of clarity, most of the track definitions have been removed and only a few examples of a preset-driven track and harmonic-maths-driven tracks remain. In addition, the normal XML files used by the method of the present invention may contain a copy of the input streams upon which they are based. However, for the sake of clarity, as the input streams include a long vector of floating point numbers, they are not shown.
Two XML files representing complete tune definitions are illustrated in Table 5 below. The definitions in Table 5 are similar to those in Table 4, but represent a complete tune. The “static_note_generator” and “note” tags permit the representation of pre-recorded rhythmic sections (e.g., a bass drum part, etc.).
One or more spreadsheets can be used to configure the software to generate a variety of different musical styles. For example, Tables 6-11 are provided below to provide a description of salient parts of the present invention illustrated in Table 5. However, for the sake of clarity, a full description of each section of Table 5 will not be provided. With reference to Table 6, a global section defines parameters that apply to the tune overall. As many of these parameters are self-explanatory, for the sake of clarity, a further description thereof will not be given.
With reference to Table 7, the loop duration config (LDC) section specifies how the loops which compose the harmonic maths part of the melody are configured. It specifies the duration of a “loop” (a single iteration of the harmonic maths process), how many notes will be played in each iteration, how many times each iteration will be repeated, the duty cycle (the amount of notes compared to rests making up the duration of an iteration), and what portion of a complete harmonic maths cycle the process can cover.
With reference to Table 5, the LDCMAPS and LOOPLENGTHFILTERS sections describe which harmonic maths parameters may be selected from the space of possible loop duration and loop length values.
For example, with reference to Table 8 below, the rows represent note length values and the columns denote loop durations. The first value at each location (i.e., row, col.) is the loop length and the second the note length produced by dividing the loop duration by the loop length. Thus, with reference to row 2, col. 4, the loop length is 2, and the note length is 1920. As shown, each row can have notes of the same length but different loop lengths.
The LDCMAPS section illustrates how loop duration and loop length combinations can be picked from Table 8. These values are better illustrated with reference to Table 8 below. As used herein, names for the settings have been arbitrarily chosen and include such names as "Manhattan," which allows any combination of values to be selected. Other setting names used herein include "plus," "thick plus," "multiple column," "adjacent column," "column," and "column subset."
With reference to the @LDCMAPS variable, this variable indicates the allowed map types: multiple-column, adjacent-column, plus, manhattan, thick-plus, hash, column-subset, and column. The map types are further defined in Table 10.
With reference to the LOOPLENGTHFILTERS section, this section places further limits on the loop duration and loop length parameters that can be selected. For example, as shown in Table 10 below, only powers of two are allowed for the loop lengths.
The remainder of the spreadsheet contains a number of parameters for each instrument; these are arranged with one instrument per row and one parameter per column, and are further described with reference to Table 11 below.
Examples of fact definitions for the input data streams as expressed in CLIPS are shown in Table 12 below.
A definition of the track fact which is the main output of the composition engine is shown in
A primary function of the wrapper software is to take the input streams and create instances of the input facts. The inference engine runs and produces a number of facts, including several instances of the track fact defined above. The wrapper software then converts the output facts into an XML representation for the next stage. The CLIPS functions which define how to extract values from the input facts are illustrated below with reference to Table
Examples of the input mapper functions called at various decision points are shown in Table 15 below. The input mapper functions can take a value from one of the input streams and convert it into a desired output format.
An example of the portrait process from a sitter's (e.g., a user's) perspective will now be described in more detail below.
A flowchart illustrating a portrait sitting process according to the present invention is shown in
In step 404, a music list and/or a user profile is output (e.g., visually and/or audibly) for use by the user. An example of a visual output (e.g., a webpage) including information informing the user of review and/or update information is shown in
In step 406, introduction information (e.g., an introduction screen or webpage) 407 such as, for example, that which is shown in
In step 408, an optional browser test is performed. The system can then analyze the results of the browser test and determine which settings may be set. For example, if a user does not have a microphone input and cannot record a sound file, then, for example, up to three (or any other suitable number, as desired) pre-recorded sound files can be selected by the user for use by the system. Further, if using known software/hardware configurations (e.g., a kiosk, etc.), this step may be omitted, as desired. After completing step 408, the process continues to step 410.
In step 410, recording information such as is shown in
In step 412, information requesting an upload or a selection of an image is output for the user's selection as shown in
In step 414, information requesting that a sound be recorded, uploaded, and/or selected can be output for a user's selection as is shown in
In step 416, information requesting that the user record, upload, and/or click a rhythm, such as is shown in
In step 418, the system composes music corresponding to the user's inputs and thereafter provides means for playing the user's music as is shown in
If one or more steps in the process shown in the flowchart of
A block diagram illustrating the system including a network according to an embodiment of the present invention is shown in
Certain additional advantages and features of this invention may be apparent to those skilled in the art upon studying the disclosure, or may be experienced by persons employing the novel system and method of the present invention.
While the invention has been described with a limited number of embodiments, it will be appreciated that changes may be made without departing from the scope of the original claimed invention, and it is intended that all matter contained in the foregoing specification and drawings be taken as illustrative and not in an exclusive sense.