This disclosure relates generally to audio processing and, more particularly, to methods and apparatus to extract a pitch-independent timbre attribute from a media signal.
Timbre (e.g., timbre/timbral attributes) is a quality/character of audio, regardless of audio pitch or loudness. Timbre is what makes two different sounds sound different from each other, even when they have the same pitch and loudness. For example, a guitar and a flute playing the same note at the same amplitude sound different because the guitar and the flute have different timbre. Timbre corresponds to a frequency and time envelope of an audio event (e.g., the distribution of energy along time and frequency). The characteristics of audio that correspond to the perception of timbre include spectrum and envelope.
The figures are not to scale. Wherever possible, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
Audio meters are devices that capture audio signals (e.g., directly or indirectly) to process the audio signals. For example, when a panelist signs up to have their exposure to media monitored by an audience measurement entity, the audience measurement entity may send a technician to the home of the panelist to install a meter (e.g., a media monitor) capable of gathering media exposure data from a media output device(s) (e.g., a television, a radio, a computer, etc.). In another example, meters may correspond to instructions being executed on a processor in smart phones, for example, to process received audio and/or video data to determine characteristics of the media.
Generally, a meter includes or is otherwise connected to an interface to receive media signals directly from a media source or indirectly (e.g., a microphone and/or a magnetic-coupling device to gather ambient audio). For example, when the media output device is “on,” the microphone may receive an acoustic signal transmitted by the media output device. The meter may process the received acoustic signal to determine characteristics of the audio that may be used to characterize and/or identify the audio or a source of the audio. When a meter corresponds to instructions that operate within and/or in conjunction with a media output device to receive audio and/or video signals to be output by the media output device, the meter may process/analyze the incoming audio and/or video signals to directly determine data related to the signals. For example, a meter may operate in a set-top-box, a receiver, a mobile phone, etc. to receive and process incoming audio/video data prior to, during, or after being output by a media output device.
In some examples, audio metering devices/instructions utilize various characteristics of audio to classify and/or identify audio and/or audio sources. Such characteristics may include energies of a media signal, energies of the frequency bands of media signals, discrete cosine transform (DCT) coefficients of a media signal, etc. Examples disclosed herein classify and/or identify media based on timbre of the audio corresponding to a media signal.
Timbre (e.g., timbre/timbral attributes) is a quality/character of audio, regardless of audio pitch or loudness. For example, a guitar and a flute playing the same note at the same amplitude sound different because the guitar and the flute have different timbre. Timbre corresponds to a frequency and time envelope of an audio event (e.g., the distribution of energy along time and frequency). Traditionally, timbre has been characterized through various features. However, timbre has not been extracted from audio independently of other aspects of the audio (e.g., pitch). Accordingly, identifying media based on pitch-dependent timbre measurements would require a large database of reference pitch-dependent timbres, with one reference timbre for each category at each pitch. Examples disclosed herein extract a pitch-independent timbre log-spectrum from measured audio, thereby reducing the resources required to classify and/or identify media based on timbre.
As explained above, the extracted pitch-independent timbre may be used to classify media and/or identify media and/or may be used as part of a signaturing algorithm. For example, an extracted pitch-independent timbre attribute (e.g., log-spectrum) may be used to determine that measured audio (e.g., audio samples) corresponds to a violin, regardless of the notes being played by the violin. In some examples, the characteristic audio may be used to adjust audio settings of a media output device to provide a better audio experience for a user. For example, some audio equalizer settings may be better suited for audio from a particular instrument and/or genre. Accordingly, examples disclosed herein may adjust the audio equalizer settings of a media output device based on an identified instrument/genre corresponding to an extracted timbre. In another example, extracted pitch-independent timbre may be used to identify media being output by a media presentation device (e.g., a television, computer, radio, smartphone, tablet, etc.) by comparing the extracted pitch-independent timbre attribute to reference timbre attributes in a database. In this manner, the extracted timbre and/or pitch may be used to provide an audience measurement entity with more detailed media exposure information than conventional techniques that only consider pitch of received audio.
The example audio analyzer 100 of
The example media output device 102 of
The example audio determiner 108 of
The example media interface 200 of
The example audio extractor 202 of
The example audio characteristic extractor 204 of
In some examples, if the example audio characteristic extractor 204 of
The example device interface 206 of the example audio analyzer 100 of
The example device interface 210 of the example audio determiner 108 of
The example timbre processor 212 of
The example audio settings adjuster 216 of
While an example manner of implementing the example audio analyzer 100 and the example audio determiner 108 of
A flowchart representative of example hardware logic or machine readable instructions for implementing the audio analyzer 100 of
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the terms “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, and (6) B with C.
At block 302, the example media interface 200 receives one or more media signals or samples of media signals (e.g., the example media signal 106). As described above, the example media interface 200 may receive the media signal 106 directly (e.g., as a signal to/from the media output device 102) or indirectly (e.g., via a microphone detecting the media signal by sensing ambient audio). At block 304, the example audio extractor 202 determines if the media signal corresponds to video or audio. For example, if the media signal was received using a microphone, the audio extractor 202 determines that the media corresponds to audio. However, if the media signal is a directly received signal, the audio extractor 202 processes the received media signal to determine if the media signal corresponds to audio or to video with an audio component. If the example audio extractor 202 determines that the media signal corresponds to audio (block 304: AUDIO), the process continues to block 308. If the example audio extractor 202 determines that the media signal corresponds to video (block 304: VIDEO), the example audio extractor 202 extracts the audio component from the media signal (block 306).
At block 308, the example audio characteristic extractor 204 determines the log-spectrum of the audio signal (e.g., $X$). For example, the audio characteristic extractor 204 may determine the log-spectrum of the audio signal by performing a constant-Q transform (CQT). At block 310, the example audio characteristic extractor 204 transforms the log-spectrum into the frequency domain. For example, the audio characteristic extractor 204 applies a Fourier transform (FT) to the log-spectrum (e.g., $F(X)$). At block 312, the example audio characteristic extractor 204 determines the magnitude of the transform output (e.g., $|F(X)|$). At block 314, the example audio characteristic extractor 204 determines the pitch-independent timbre log-spectrum of the audio based on the inverse transform (e.g., inverse FT) of the magnitude of the transform output (e.g., $T = F^{-1}(|F(X)|)$). At block 316, the example audio characteristic extractor 204 determines the complex argument of the transform output (e.g., $e^{j \arg(F(X))}$). At block 318, the example audio characteristic extractor 204 determines the timbre-less pitch log-spectrum of the audio based on the inverse transform (e.g., inverse FT) of the complex argument of the transform output (e.g., $P = F^{-1}(e^{j \arg(F(X))})$).
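The decomposition of blocks 310 through 318 can be illustrated with a minimal NumPy sketch. It assumes the log-spectrum X is already available as a one-dimensional array of log-frequency bins (the CQT of block 308 is not reproduced here), and the function name `separate_timbre_and_pitch` is hypothetical. The sketch also models a change of pitch as a circular shift along the log-frequency axis, which is an idealization: under that assumption the shift only changes the phase of F(X), so the magnitude, and therefore T, is unchanged.

```python
import numpy as np


def separate_timbre_and_pitch(log_spectrum):
    """Split a log-frequency spectrum X into timbre (T) and pitch (P) parts.

    T = F^-1(|F(X)|)              (blocks 310-314)
    P = F^-1(exp(j * arg(F(X))))  (blocks 316-318)
    """
    x = np.asarray(log_spectrum, dtype=float)
    fx = np.fft.fft(x)                    # block 310: F(X)
    magnitude = np.abs(fx)                # block 312: |F(X)|
    timbre = np.fft.ifft(magnitude).real  # block 314: T
    phase = np.exp(1j * np.angle(fx))     # block 316: exp(j * arg(F(X)))
    pitch = np.fft.ifft(phase).real       # block 318: P
    return timbre, pitch


# Toy input (an assumption, not real audio): a random non-negative "spectrum"
# and a copy shifted by five bins to imitate the same sound at another pitch.
rng = np.random.default_rng(0)
x = rng.random(64)
x_shifted = np.roll(x, 5)

timbre_a, pitch_a = separate_timbre_and_pitch(x)
timbre_b, pitch_b = separate_timbre_and_pitch(x_shifted)

# The timbre component is the same for both inputs; only the pitch part differs.
print(np.allclose(timbre_a, timbre_b))
```

Because F(T) multiplied by F(P) equals F(X), circularly convolving the two outputs recovers the input log-spectrum, which is a convenient sanity check for an implementation along these lines.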
At block 320, the example audio characteristic extractor 204 determines if the result(s) (e.g., the determined pitch and/or the determined timbre) is satisfactory. As described above in conjunction with
At block 324, the example device interface 206 transmits the results to the example audio determiner 108. At block 326, the example audio characteristic extractor 204 receives classification and/or identification data corresponding to the audio signal. Alternatively, if the audio determiner 108 was not able to match the timbre of the audio signal to a reference, the device interface 206 may transmit instructions for additional data corresponding to the audio signal. In such examples, the device interface 206 may transmit a prompt to a user interface for a user to provide the additional data. Accordingly, the example device interface 206 may provide the additional data to the example audio determiner 108 to generate a new reference timbre attribute. At block 328, the example audio characteristic extractor 204 transmits the classification and/or identification to other connected devices. For example, the audio characteristic extractor 204 may transmit a classification to a user interface to provide the classification to a user.
At block 402, the example device interface 210 receives a measured (e.g., determined or extracted) pitch-less timbre log-spectrum from the example audio analyzer 100. At block 404, the example timbre processor 212 compares the measured pitch-less timbre log-spectrum to the reference pitch-less timbre log-spectra in the example timbre database 214. At block 406, the example timbre processor 212 determines if a match is found between the received pitch-less timbre attribute and the reference pitch-less timbre attributes. If the example timbre processor 212 determines that a match is found (block 406: YES), the example timbre processor 212 classifies the audio (e.g., identifying instruments and/or genres) and/or identifies media corresponding to the audio based on the match (block 408) using additional data stored in the example timbre database 214 corresponding to the matched reference timbre attribute.
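The comparison of blocks 404 through 408 could be sketched as follows. The text does not specify how a match is decided, so the cosine-similarity measure, the 0.9 threshold, the dictionary-based reference store, and the name `match_timbre` are all assumptions made for illustration only.

```python
import numpy as np


def match_timbre(measured, references, threshold=0.9):
    """Return the label of the best-matching reference, or None if no match.

    `references` maps a label (e.g., an instrument name) to a reference
    timbre log-spectrum. Similarity measure and threshold are assumptions.
    """
    measured = np.asarray(measured, dtype=float)
    best_label, best_score = None, threshold
    for label, reference in references.items():
        reference = np.asarray(reference, dtype=float)
        denom = np.linalg.norm(measured) * np.linalg.norm(reference)
        # Cosine similarity; treat an all-zero vector as "no similarity".
        score = float(np.dot(measured, reference) / denom) if denom > 0 else 0.0
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical three-bin reference timbres, for illustration only.
reference_timbres = {"violin": [1.0, 0.0, 0.0], "flute": [0.0, 1.0, 0.0]}
print(match_timbre([0.9, 0.1, 0.0], reference_timbres))
print(match_timbre([1.0, 1.0, 1.0], reference_timbres))
```

A `None` result corresponds to the "block 406: NO" branch, in which additional information about the audio would be requested.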
At block 410, the example audio settings adjuster 216 determines whether the audio settings of the media output device 102 can be adjusted. For example, there may be an enabled setting to allow the audio settings of the media output device 102 to be adjusted based on a classification of the audio being output by the example media output device 102. If the example audio settings adjuster 216 determines that the audio settings of the media output device 102 are not to be adjusted (block 410: NO), the process continues to block 414. If the example audio settings adjuster 216 determines that the audio settings of the media output device 102 are to be adjusted (block 410: YES), the example audio settings adjuster 216 determines a media output device setting adjustment based on the classified audio (block 412). For example, the example audio settings adjuster 216 may select an audio equalizer setting based on one or more identified instruments and/or an identified genre (e.g., from the timbre or based on the identified instruments). At block 414, the example device interface 210 outputs a report corresponding to the classification, identification, and/or media output device setting adjustment. In some examples, the device interface 210 outputs the report to another device for further processing/analysis. In some examples, the device interface 210 outputs the report to the example audio analyzer 100 to display the results to a user via a user interface. In some examples, the device interface 210 outputs the report to the example media output device 102 to adjust the audio settings of the media output device 102.
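A minimal sketch of blocks 410 through 414 is shown below. The preset table, its band names and gain values, and the function names `choose_adjustment` and `build_report` are hypothetical; the text only states that an equalizer setting may be selected based on the identified instrument and/or genre.

```python
# Hypothetical presets: band gains in dB keyed by classification label.
EQ_PRESETS = {
    "violin": {"low": -2.0, "mid": 1.0, "high": 3.0},
    "rock": {"low": 4.0, "mid": -1.0, "high": 2.0},
}


def choose_adjustment(classification, adjustment_enabled, presets=EQ_PRESETS):
    """Blocks 410-412: return an equalizer preset, or None if nothing is adjusted."""
    if not adjustment_enabled:
        return None  # block 410: NO, continue to the report
    return presets.get(classification)  # None when no preset is known


def build_report(classification, adjustment):
    """Block 414: bundle the classification and any setting adjustment."""
    report = {"classification": classification}
    if adjustment is not None:
        report["setting_adjustment"] = adjustment
    return report


report = build_report("violin", choose_adjustment("violin", adjustment_enabled=True))
print(report)
```

Keeping the selection separate from the report lets the same report structure be sent to another device, to the audio analyzer, or to the media output device, as described above.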
If the example timbre processor 212 determines that no match is found (block 406: NO), the example device interface 210 prompts for additional information corresponding to the audio signal (block 416). For example, the device interface 210 may transmit instructions to the example audio analyzer 100 to (A) prompt a user to provide information corresponding to the audio or (B) prompt the audio analyzer 100 to reply with the full audio signal. At block 418, the example timbre database 214 stores the measured pitch-less timbre log-spectrum in conjunction with corresponding data that may have been received.
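The storage step of block 418 might look like the following sketch, which uses an in-memory list as a stand-in for the timbre database. The entry layout and the name `store_new_reference` are assumptions; the text does not describe how the database is organized.

```python
def store_new_reference(database, measured_log_spectrum, additional_data=None):
    """Block 418: keep an unmatched measurement with any data received for it."""
    entry = {
        "log_spectrum": list(measured_log_spectrum),
        # Data gathered at block 416 (e.g., from a user prompt); may be empty.
        "data": dict(additional_data or {}),
    }
    database.append(entry)
    return entry


timbre_db = []  # stand-in for the timbre database (assumption)
store_new_reference(timbre_db, [0.1, 0.2, 0.7], {"instrument": "oboe"})
print(len(timbre_db))
```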
As described in conjunction with
The processor platform 600 of the illustrated example includes a processor 612. The processor 612 of the illustrated example is hardware. For example, the processor 612 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example media interface 200, the example audio extractor 202, the example audio characteristic extractor 204, and/or the example device interface of
The processor 612 of the illustrated example includes a local memory 613 (e.g., a cache). The processor 612 of the illustrated example is in communication with a main memory including a volatile memory 614 and a non-volatile memory 616 via a bus 618. The volatile memory 614 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 616 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 614, 616 is controlled by a memory controller.
The processor platform 600 of the illustrated example also includes an interface circuit 620. The interface circuit 620 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 622 are connected to the interface circuit 620. The input device(s) 622 permit(s) a user to enter data and/or commands into the processor 612. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 624 are also connected to the interface circuit 620 of the illustrated example. The output devices 624 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-plane switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 620 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 620 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 626. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 600 of the illustrated example also includes one or more mass storage devices 628 for storing software and/or data. Examples of such mass storage devices 628 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
The machine executable instructions 632 of
The processor platform 700 of the illustrated example includes a processor 712. The processor 712 of the illustrated example is hardware. For example, the processor 712 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example device interface 210, the example timbre processor 212, the example timbre database 214, and/or the example audio settings adjuster 216.
The processor 712 of the illustrated example includes a local memory 713 (e.g., a cache). The processor 712 of the illustrated example is in communication with a main memory including a volatile memory 714 and a non-volatile memory 716 via a bus 718. The volatile memory 714 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 716 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 714, 716 is controlled by a memory controller.
The processor platform 700 of the illustrated example also includes an interface circuit 720. The interface circuit 720 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 722 are connected to the interface circuit 720. The input device(s) 722 permit(s) a user to enter data and/or commands into the processor 712. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 724 are also connected to the interface circuit 720 of the illustrated example. The output devices 724 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-plane switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 720 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 720 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 726. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 700 of the illustrated example also includes one or more mass storage devices 728 for storing software and/or data. Examples of such mass storage devices 728 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
The machine executable instructions 732 of
From the foregoing, it will be appreciated that the above disclosed methods, apparatus, and articles of manufacture extract a pitch-independent timbre attribute from a media signal. Examples disclosed herein determine a pitch-independent timbre log-spectrum based on audio received directly or indirectly from a media output device. Examples disclosed herein further include classifying the audio (e.g., identifying an instrument) based on the timbre and/or identifying a media source (e.g., a song, a video game, an advertisement, etc.) of the audio based on the timbre. Using examples disclosed herein, timbre can be used to classify and/or identify audio with significantly fewer resources than conventional techniques because the extracted timbre is pitch-independent. Accordingly, audio may be classified and/or identified without the need for multiple reference timbre attributes for multiple pitches. Rather, a pitch-independent timbre may be used to classify audio regardless of the pitch.
Although certain example methods, apparatus and articles of manufacture have been described herein, other implementations are possible. The scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
This patent arises from a continuation of U.S. patent application Ser. No. 16/239,238, entitled “METHODS AND APPARATUS TO EXTRACT A PITCH-INDEPENDENT TIMBRE ATTRIBUTE FROM A MEDIA SIGNAL,” filed on Jan. 3, 2019, which is a continuation of U.S. patent application Ser. No. 15/920,060, entitled “METHODS AND APPARATUS TO EXTRACT A PITCH-INDEPENDENT TIMBRE ATTRIBUTE FROM A MEDIA SIGNAL,” filed on Mar. 13, 2018. Priority to U.S. patent application Ser. No. 16/239,238 and U.S. patent application Ser. No. 15/920,060 is claimed. U.S. patent application Ser. No. 16/239,238 and U.S. patent application Ser. No. 15/920,060 are incorporated herein by reference in their entireties.
Number | Date | Country |
---|---|---|
20200051538 A1 | Feb 2020 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 16239238 | Jan 2019 | US |
Child | 16659099 | | US |
Parent | 15920060 | Mar 2018 | US |
Child | 16239238 | | US |