The present invention relates generally to identification of matching signals and more particularly to identification, matching and mixing of audio signals based on harmonics by identifying optimal synchronization points.
Rhythm in music is formed by organizing music pieces in relation to time.
Rhythm may also be organized in beats and tempo. For a given music piece, tempos may vary considerably. In music, a unit of time is called a beat. When a rhythm recurs regularly, it results in a melodious series. Therefore, mixing of various music pieces is required to create rhythmically coherent songs.
Therefore, there is a need for an efficient solution to determine the optimal music to mix and the timing of the mixing, so as to provide an optimal mixed music.
This summary is provided to introduce concepts related to a system and method for identifying, matching and mixing audio signals, as further described in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In an embodiment of the invention, there is provided a computer implemented method for identifying, from a signal bank that includes a plurality of signals, a signal matching a first signal, the method including the steps of: receiving, by a processor, the first signal; performing, by the processor, a spectral analysis of the first signal and the plurality of signals, wherein the spectral analysis further includes: computing a chromatogram comprising a plurality of frames; splitting each of the plurality of frames into a plurality of pitch classes; analyzing each of the plurality of pitch classes; and determining a dominant pitch class from the plurality of pitch classes, wherein the dominant pitch class has the highest frequency magnitude; and matching, by the processor, the dominant pitch class of the first signal with the dominant pitch class of at least one of the plurality of signals.
In another embodiment of the invention, there is provided a system for identifying, from a signal bank that includes a plurality of signals, a signal matching a first signal, the system including a processor configured to perform the steps of: receiving, by the processor, the first signal and the plurality of signals from the signal bank from which a matching signal to the first signal is to be searched; performing, by the processor, a spectral analysis of the first signal and the plurality of signals, wherein the spectral analysis further includes: computing a chromatogram comprising a plurality of frames; splitting each of the plurality of frames into a plurality of pitch classes; analyzing each of the plurality of pitch classes; and determining a dominant pitch class from the plurality of pitch classes, wherein the dominant pitch class has the highest frequency magnitude; and matching, by the processor, the dominant pitch class of the first signal with the dominant pitch class of at least one of the plurality of signals.
In yet another embodiment of the invention, there is provided a non-transitory computer-readable storage medium for providing matching of signals, storing instructions that, when executed by a computing device, cause the computing device to perform method steps that include: receiving, by a processor, the first signal; performing, by the processor, a spectral analysis of the first signal and the plurality of signals, wherein the spectral analysis further includes: computing a chromatogram comprising a plurality of frames; splitting each of the plurality of frames into a plurality of pitch classes; analyzing each of the plurality of pitch classes; and determining a dominant pitch class from the plurality of pitch classes, wherein the dominant pitch class has the highest frequency magnitude; and matching, by the processor, the dominant pitch class of the first signal with the dominant pitch class of at least one of the plurality of signals.
Other and further aspects and features of the disclosure will be evident from reading the following detailed description of the embodiments, which are intended to illustrate, not limit, the present disclosure.
The illustrated embodiments of the subject matter will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the subject matter as claimed herein.
A few inventive aspects of the disclosed embodiments are explained in detail below with reference to the various figures. Embodiments are described to illustrate the disclosed subject matter, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art will recognize a number of equivalent variations of the various features provided in the description that follows.
Reference throughout the specification to “various embodiments,” “some embodiments,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases “in various embodiments,” “in some embodiments,” “in one embodiment,” or “in an embodiment” in places throughout the specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.
Now referring to
In case network 104 is a wired network, it may be any one of a local area network (LAN), wide area network (WAN), or metropolitan area network (MAN), etc.
In case network 104 is a wireless network, it may be any one of a wireless LAN, mobile network, satellite network, Bluetooth network, or any other suitable wireless network.
Each of the user devices 102 may be connected to each other through the server 106. Also, it is not necessary that all the connected user devices 102 be connected through a single server. The server 106 may include a processor 200 (to be described in detail later). The server 106 may also be connected to a memory (not shown in figure). The memory may be a remote or a locally placed memory. The memory may include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. The memory may include modules and data. The modules include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. The data, amongst other things, serves as a repository for storing data processed, received, and generated by one or more of the modules.
Now referring to
The modules may further include modules that supplement applications on the processor 200, for example, modules of an operating system. Further, the modules can be implemented in hardware, instructions executed by a processing unit, or by a combination thereof.
The request handling module 202 is configured to receive inputs of a user, such as a raw audio signal, from the user device 102. The request handling module 202 may further convert the received inputs into a format understandable to the processor 200. The request handling module 202 is further connected to the audio file handler 204. The audio file handler 204 stores audio files temporarily and forwards them to the audio analyzer 206. Simultaneously, the audio file handler 204 forwards the received audio files to storage 208 for storage. The audio analyzer 206 is configured to analyze the audio signal. Analysis of audio signals includes analysis of harmonics of the audio signals. Details of the modules will be discussed later in the description.
The audio analyzer 206 forwards the analysis to the database 210 and the storage 208 simultaneously. Further, the audio analyzer 206 also forwards the audio signals after analysis to the audio mixer 212. The audio mixer 212 is configured to mix audio signals with each other.
Further, the audio matching module 214 is configured to identify audio signals matching to the other audio signals based on the analysis that may be accessed by the audio matching module 214 from the database 210.
Details of the interaction of each of the modules, of the processor 200, will be described in detail while describing
Now referring to
At step 302, the processor 200 receives a first signal, also termed a first audio signal, through the request handling module 202. At this step, the processor may also receive instructions from the user as to what needs to be performed. For this method 300, the instruction received may be to identify a matching signal to the first audio signal. At step 304, a spectral analysis of the first audio signal is performed by the audio analyzer 206. It is to be noted that a simultaneous spectral analysis is also performed on a bank of signals stored in the storage 208 of the processor 200, or the post-analysis results of each of the signals within the signal bank are stored in the database 210, which may be accessed by the audio analyzer 206 during the identification method 300. The step 304 includes various sub steps. At the first sub step, that is at step 3042, a chromatogram of the first audio signal is computed, consisting of a plurality of frames. The chromatogram may be generated using any well-known algorithm, such as the short-time Fourier transform (STFT). STFT is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. STFT splits a longer time signal into shorter segments of equal length. The Fourier transform is then computed separately on each of the shorter segments. This generates a Fourier spectrum for each of the shorter segments. These spectra may then be plotted in a graph as a function of time.
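As an illustration of the segment-by-segment transform described above, the following is a minimal sketch in Python using only the standard library. The function name `stft_magnitudes`, the Hann window, and the `frame_len` and `hop` parameters are assumptions made for this example and are not taken from the specification; a practical implementation would normally use an FFT routine rather than the direct sum shown here.

```python
import cmath
import math

def stft_magnitudes(samples, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame.

    Hypothetical helper: splits the signal into equal-length segments
    and computes a direct (non-fast) Fourier transform of each segment.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        segment = samples[start:start + frame_len]
        # Hann window (an assumption) to reduce leakage at segment edges.
        windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len))
                    for n, s in enumerate(segment)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

Each inner list is the spectrum of one segment, so the outer list, read in order, is the spectrum as a function of time.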
At step 3044, each of the plurality of chromatogram frames is further split into a plurality of pitch classes.
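One possible way to split a frame into pitch classes is sketched below. It assumes twelve equal-tempered pitch classes with A4 tuned to 440 Hz, which the specification does not state; the names `PITCH_CLASSES` and `fold_to_pitch_classes` are hypothetical.

```python
import math

# Assumed twelve-tone naming, index 0 = C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def fold_to_pitch_classes(spectrum, sample_rate, frame_len):
    """Sum the bin magnitudes of one frame into 12 pitch-class energies."""
    chroma = [0.0] * 12
    for k, magnitude in enumerate(spectrum):
        if k == 0:
            continue  # bin 0 is DC and has no pitch
        freq = k * sample_rate / frame_len
        # Nearest MIDI note number, with A4 = 440 Hz = MIDI 69 (assumed tuning).
        midi = round(69 + 12 * math.log2(freq / 440.0))
        chroma[midi % 12] += magnitude
    return chroma
```

All octaves of the same note land in the same entry, so the result describes which note names carry energy in the frame regardless of register.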
At step 3048, a dominant pitch class from the plurality of pitch classes analyzed is determined. The dominant pitch class has the highest frequency magnitude. Hence, this step results in a dominant pitch class per frame over a selected time. For example, the most dominant pitch class from the table 400 (from
Step 3048 then counts the number of times each pitch class is the most dominant pitch class over time (that is, the number of times each pitch class appears in the per-frame result above). This count defines the outcome of the spectral analysis algorithm: the most dominant note is the note representing the pitch class that is most dominant over all frames (over time). For exemplary purposes, the table below shows computation of the dominant pitch and class based on table 400:
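The per-frame selection and the counting described for step 3048 can be sketched as follows. This is an illustrative sketch only: the input is assumed to be a list of frames, each holding twelve pitch-class magnitudes, and the names `PITCH_CLASSES` and `dominant_note` are hypothetical.

```python
from collections import Counter

# Assumed twelve-tone naming, index 0 = C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def dominant_note(chroma_frames):
    """Return (note name, per-frame dominant indices) for a chromatogram."""
    # Dominant pitch class of each frame: the one with the highest magnitude.
    per_frame = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in chroma_frames]
    # Count how often each pitch class was dominant and keep the most frequent.
    counts = Counter(per_frame)
    winner, _ = counts.most_common(1)[0]
    return PITCH_CLASSES[winner], per_frame
```

If two pitch classes are dominant in the same number of frames, this sketch returns the one that was encountered first; the specification does not say how such ties should be resolved.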
At step 3050, after determining the most dominant note of the first audio signal, a comparison is made with the dominant notes of the audio signals within the signal bank. A signal is selected for matching if its dominant note is in a harmonic interval with the dominant note of the first audio signal. The harmonic intervals may be the perfect 4th, perfect 5th, or a major 3rd.
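The interval test of step 3050 can be illustrated with a short sketch. It assumes that intervals are measured upward in semitones from the dominant note of the first audio signal (major 3rd = 4, perfect 4th = 5, perfect 5th = 7); the specification does not state the direction, and the names `HARMONIC_INTERVALS` and `harmonic_matches` are hypothetical.

```python
# Assumed twelve-tone naming, index 0 = C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

# Semitone distances of the intervals named in the text.
HARMONIC_INTERVALS = {4: "major 3rd", 5: "perfect 4th", 7: "perfect 5th"}

def harmonic_matches(first_note, bank_notes):
    """Select bank signals whose dominant note is in a harmonic interval.

    bank_notes maps a signal identifier to its dominant note name.
    """
    root = PITCH_CLASSES.index(first_note)
    matches = {}
    for signal_id, note in bank_notes.items():
        # Upward distance in semitones, folded into one octave (assumption).
        interval = (PITCH_CLASSES.index(note) - root) % 12
        if interval in HARMONIC_INTERVALS:
            matches[signal_id] = HARMONIC_INTERVALS[interval]
    return matches
```

For a first signal with dominant note C, bank signals with dominant notes E, F and G would be selected under these assumptions, while a signal with dominant note D would not.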
In an embodiment of the invention, the dominant note analysis of the signals from the signal bank may be stored in the database 210 from which the audio analyzer may fetch such analysis for comparison sake.
Now referring to
Beat analysis step 504 contains multiple sub steps. At the first sub step 5042, the first audio signal and the signals in the bank of signals go through a beat detection process using a recurrent neural network model. This step results in an array of time stamps that represents the times at which a beat was detected (array of beat times). Further, at step 5044, the arrays of time stamps are compared with each other to determine beat time similarity scores. The purpose of this step is to find not only the signal that syncs best with the first audio signal, but also the time at which mixing the first audio signal and the selected matching signal will result in the best possible mix. In order to do that, each array of beat times, representing the beat times of a matching signal from the signal bank, is compared with the array of beat times of the first audio signal. The comparison may be performed in a sliding window method.
Sliding window method 600 is illustrated by flow chart depicted in
At step 604, the first time stamp in the array of time stamps of the matching signal is moved a step forward to be compared with a subsequent time stamp in the array of time stamps of the first audio signal. On every step, the first beat of the sliding signal is aligned to the next beat of the first audio signal. In a case where the sliding array of beat times goes beyond the end of the first audio signal beat times, the non-overlapping beats will be added to the beginning of the first audio signal.
Further at step 606, the above steps are repeated till the first time stamp is matched with all the time stamps in the array of time stamps of the first audio signal. Further, at step 608, a score for each pair of time stamps during the comparison is provided.
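The sliding comparison of steps 604 to 608 can be sketched as follows. The alignment and wrap-around follow the description above, but the scoring formula (mean distance from each shifted beat to the nearest beat of the first audio signal, lower being better) is purely an assumption for illustration, as are the names `sliding_window_scores` and `loop_length`.

```python
def sliding_window_scores(first_beats, match_beats, loop_length):
    """Return a (start_time, score) pair for every alignment.

    first_beats, match_beats: beat times in seconds, in ascending order.
    loop_length: duration of the first audio signal in seconds.
    """
    results = []
    for anchor in first_beats:
        # Align the first beat of the sliding signal to this beat.
        shift = anchor - match_beats[0]
        # Beats running past the end wrap around to the beginning.
        shifted = [(t + shift) % loop_length for t in match_beats]
        total = 0.0
        for s in shifted:
            # Circular distance to the nearest beat of the first signal.
            total += min(min(abs(s - b), loop_length - abs(s - b))
                         for b in first_beats)
        results.append((anchor, total / len(shifted)))  # assumed score
    return results
```

Under this assumed score, the alignment with the lowest value would be taken as the candidate mix point, for example with `min(results, key=lambda r: r[1])`.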
Returning back to
Now referring to
At step 706, mixing of an identified matching signal with a determined start point and an end point is initiated. At step 708, determination of length of the matching audio signal is performed. There may be 3 options that may arise out of the determination step 708.
Option 1, depicted by step 710, applies where the matching signal is shorter in length than the first signal and also completely overlaps the first signal. Then at step 712, a start time of the second signal (the matching signal) over the play timeline of the first audio signal is identified. Further, at step 714, the second signal is laid over the first signal at the identified start time. In this scenario, the exact time within the looping first audio signal at which recording of the matching audio signal started is captured. Using this timestamp, the matching audio signal is laid (mixed) over the looping first audio signal, starting at the captured timestamp.
Option 2, depicted by step 716, applies where the matching signal is shorter in length than the first signal and also partially overlaps the first signal. Then at step 718, the matching signal is sliced at the end point of the first audio signal to generate a pre end-time segment and a post end-time segment of the matching signal. Further, at step 720, the post end-time segment of the matching signal is added at the start point of the first audio signal to generate the mixed signal.
The sliced part, which originally continued past the end time of the looping first audio signal, will be mixed at the beginning of the looping first audio signal. Since the looping first audio signal is played repeatedly in a loop, this mix replicates what the user would have played and heard while recording the mixed signal.
Option 3, depicted by step 722, applies where the matching signal is longer in length than the first signal and also partially overlaps the first signal. Then at step 724, the first audio signal is repeated entirely through the length of the matching signal to generate the mixed signal.
In this scenario, in order to mix the matching signal entirely, the looping first audio signal will be repeated. The matching signal will be mixed at the recording start time. The output mix will be of a new length, because of the repeated appearance of the looping first audio signal.
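The three options can be summarized in a short sketch that operates on lists of samples. This is an illustration under stated assumptions, not the coding of the specification: signals are mono lists of float samples at the same sample rate, mixing is plain sample-wise addition without gain control, `start` is a sample index inside the loop (0 <= start < len(loop)), and the name `mix_over_loop` is hypothetical.

```python
import math

def mix_over_loop(loop, match, start):
    """Mix `match` over the looping first signal `loop`, starting at `start`."""
    loop_len = len(loop)
    end = start + len(match)
    if end <= loop_len:
        # Option 1: the matching signal fits entirely inside the loop.
        out = list(loop)
        for i, s in enumerate(match):
            out[start + i] += s
        return out
    if len(match) <= loop_len:
        # Option 2: slice at the loop end; the tail wraps to the beginning.
        cut = loop_len - start
        pre, post = match[:cut], match[cut:]
        out = list(loop)
        for i, s in enumerate(pre):
            out[start + i] += s
        for i, s in enumerate(post):
            out[i] += s
        return out
    # Option 3: repeat the loop until it covers the whole matching signal.
    repeats = math.ceil(end / loop_len)
    out = list(loop) * repeats
    for i, s in enumerate(match):
        out[start + i] += s
    return out
```

In the third branch the output is longer than the original loop, which corresponds to the new length mentioned above; rounding the repeat count up to whole loops is a choice made for this sketch.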
Exemplary Python™ language coding:
Coding for mixing the matching audio signal to the looping first audio signal
Coding for finding a matching signal to given first audio signal, and the optimal mix point of both signals
Now referring to
Processor 804 may be disposed in communication with one or more input/output (I/O) devices via an I/O interface 806. I/O interface 806 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monoaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
Using I/O interface 806, computer system 802 may communicate with one or more I/O devices. For example, an input device 808 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, sensor (e.g., accelerometer, light sensor, GPS, gyroscope, proximity sensor, or the like), stylus, scanner, storage device, transceiver, video device/source, visors, etc. An output device 810 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, or the like), audio speaker, etc. In some embodiments, a transceiver 812 may be disposed in connection with processor 804. Transceiver 812 may facilitate various types of wireless transmission or reception. For example, transceiver 812 may include an antenna operatively connected to a transceiver chip (e.g., Texas Instruments WiLink WL1283, Broadcom BCM4760IUB8, Infineon Technologies X-Gold 618-PMB9800, or the like), providing IEEE 802.11a/b/g/n, Bluetooth, FM, global positioning system (GPS), 2G/3G HSDPA/HSUPA communications, etc.
In some embodiments, processor 804 may be disposed in communication with a communication network 814 via a network interface 816. Network interface 816 may communicate with communication network 814. Network interface 816 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. Communication network 814 may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc. Using network interface 816 and communication network 814, computer system 802 may communicate with devices 818, 820, and 822. These devices may include, without limitation, personal computer(s), server(s), fax machines, printers, scanners, various mobile devices such as cellular telephones, smartphones (e.g., Apple iPhone, Blackberry, Android-based phones, etc.), tablet computers, eBook readers (Amazon Kindle, Nook, etc.), laptop computers, notebooks, or the like. In some embodiments, the computer system 802 may itself embody one or more of these devices.
In some embodiments, processor 804 may be disposed in communication with one or more memory devices (e.g., a RAM 826, a ROM 828, etc.) via a storage interface 824. Storage interface 824 may connect to memory devices 830 including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), integrated drive electronics (IDE), IEEE-1394, universal serial bus (USB), fiber channel, small computer systems interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, redundant array of independent discs (RAID), solid-state memory devices, solid-state drives, etc.
Memory devices 830 may store a collection of program or database components, including, without limitation, an operating system 832, a user interface application 834, a web browser 836, a mail server 838, a mail client 840, a user/application data 842 (e.g., any data variables or data records discussed in this disclosure), etc. Operating system 832 may facilitate resource management and operation of computer system 802. Examples of operating system 832 include, without limitation, Apple Macintosh OS X, Unix, Unix-like system distributions (e.g., Berkeley Software Distribution (BSD), FreeBSD, NetBSD, OpenBSD, etc.), Linux distributions (e.g., Red Hat, Ubuntu, Kubuntu, etc.), IBM OS/2, Microsoft Windows (XP, Vista/7/8, etc.), Apple iOS, Google Android, Blackberry OS, or the like. User interface 834 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, user interfaces may provide computer interaction interface elements on a display system operatively connected to computer system 802, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, etc. Graphical user interfaces (GUIs) may be employed, including, without limitation, Apple Macintosh operating systems' Aqua, IBM OS/2, Microsoft Windows (e.g., Aero, Metro, etc.), Unix X-Windows, web interface libraries (e.g., ActiveX, Java, Javascript, AJAX, HTML, Adobe Flash, etc.), or the like.
In some embodiments, computer system 802 may implement web browser 836 stored program component. Web browser 836 may be a hypertext viewing application, such as Microsoft Internet Explorer, Google Chrome, Mozilla Firefox, Apple Safari, etc. Secure web browsing may be provided using HTTPS (secure hypertext transport protocol), secure sockets layer (SSL), Transport Layer Security (TLS), etc. Web browsers may utilize facilities such as AJAX, DHTML, Adobe Flash, JavaScript, Java, application programming interfaces (APIs), etc. In some embodiments, computer system 802 may implement mail server 838 stored program component. Mail server 838 may be an Internet mail server such as Microsoft Exchange, or the like. Mail server 838 may utilize facilities such as ASP, ActiveX, ANSI C++/C#, Microsoft .NET, CGI scripts, Java, JavaScript, PERL, PHP, Python, WebObjects, etc. Mail server 838 may utilize communication protocols such as internet message access protocol (IMAP), messaging application programming interface (MAPI), Microsoft Exchange, post office protocol (POP), simple mail transfer protocol (SMTP), or the like. In some embodiments, computer system 802 may implement mail client 840 stored program component. Mail client 840 may be a mail viewing application, such as Apple Mail, Microsoft Entourage, Microsoft Outlook, Mozilla Thunderbird, etc.
In some embodiments, computer system 802 may store user/application data 842, such as the data, variables, records, etc. as described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase. Alternatively, such databases may be implemented using standardized data structures, such as an array, hash, linked list, struct, structured text file (e.g., XML), table, or as object-oriented databases (e.g., using ObjectStore, Poet, Zope, etc.). Such databases may be consolidated or distributed, sometimes among the various computer systems discussed above in this disclosure. It is to be understood that the structure and operation of any computer or database component may be combined, consolidated, or distributed in any working combination.
The order in which the method is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method or alternate methods. Additionally, individual blocks may be deleted from the method without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method may be considered to be implemented in the above described system and/or the apparatus and/or any electronic device (not shown).
The above description does not provide specific details of manufacture or design of the various components. Those of skill in the art are familiar with such details, and unless departures from those techniques are set out, techniques, known, related art or later developed designs and materials should be employed. Those in the art are capable of choosing suitable manufacturing and design details.
Note that throughout the following discussion, numerous references may be made regarding servers, services, engines, modules, interfaces, portals, platforms, or other systems formed from computing devices. It should be appreciated that the use of such terms is deemed to represent one or more computing devices having at least one processor configured to or programmed to execute software instructions stored on a computer readable tangible, non-transitory medium or also referred to as a processor-readable medium. For example, a server can include one or more computers operating as a web server, database server, or other type of computer server in a manner to fulfill described roles, responsibilities, or functions. Within the context of this document, the disclosed devices or systems are also deemed to comprise computing devices having a processor and a non-transitory memory storing instructions executable by the processor that cause the device to control, manage, or otherwise manipulate the features of the devices or systems.
Some portions of the detailed description herein are presented in terms of algorithms and symbolic representations of operations on data bits performed by conventional computer components, including a central processing unit (CPU), memory storage devices for the CPU, and connected display devices. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is generally perceived as a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the discussion herein, it is appreciated that throughout the description, discussions utilizing terms such as “generating,” or “monitoring,” or “displaying,” or “tracking,” or “identifying,” or “receiving,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The methods illustrated throughout the specification, may be implemented in a computer program product that may be executed on a computer. The computer program product may comprise a non-transitory computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or the like. Common forms of non-transitory computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other tangible medium from which a computer can read and use.
Alternatively, the method may be implemented in transitory media, such as a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. It will be appreciated that several of the above-disclosed and other features and functions, or alternatives thereof, may be combined into other systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may subsequently be made by those skilled in the art without departing from the scope of the present disclosure as encompassed by the following claims.
The claims, as originally presented and as they may be amended, encompass variations, alternatives, modifications, improvements, equivalents, and substantial equivalents of the embodiments and teachings disclosed herein, including those that are presently unforeseen or unappreciated, and that, for example, may arise from applicants/patentees and others.
It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
The present application claims priority from U.S. Provisional Patent Application No. 62/647,766 filed on Mar. 25, 2018, incorporated herein as a reference.
Number | Date | Country | |
---|---|---|---|
62647766 | Mar 2018 | US |