The present disclosure relates generally to audio processing in electronic devices and, more particularly, to efficient detection of beats in an audio file.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Portable electronic devices are increasingly capable of performing a range of audio operations in addition to simply playing back streams of audio. One such audio operation, crossfading between songs, may take place as one audio stream ends and another begins for a seamless transition between the two audio streams. Typically, an electronic device may crossfade between two audio streams by mixing the two streams over a span of time (e.g., 1-10 seconds), during which the volume level of the first audio stream is slowly decreased while the volume level of the second audio stream is slowly increased.
Some electronic devices may perform a beat-matched, DJ-style crossfade by detecting and matching beats in the audio streams. Conventional techniques for such beat detection in electronic devices may involve complex, resource-intensive processes. These techniques may involve, for example, analyzing a decoded audio stream for certain information indicative of a beat (e.g., energy flux). While such techniques may be accurate, they may consume significant resources and therefore may be unfit for portable electronic devices.
A summary of certain embodiments disclosed herein is set forth below. It should be understood that these aspects are presented merely to provide the reader with a brief summary of these certain embodiments and that these aspects are not intended to limit the scope of this disclosure. Indeed, this disclosure may encompass a variety of aspects that may not be set forth below.
Embodiments of the present disclosure relate to methods and devices for efficient beat-matched, DJ-style crossfading between audio streams. For example, such a method may involve determining beat locations of a first audio stream and a second audio stream and crossfading the first audio stream and the second audio stream such that the beat locations of the first audio stream are substantially aligned with the beat locations of the second audio stream. The beat locations of the first audio stream or the second audio stream may be determined based at least in part on an analysis of frequency data unpacked from one or more compressed audio files.
Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
Present embodiments relate to techniques for beat detection in audio files, which may allow for a beat-matched, DJ-style crossfade operation. Instead of analyzing a fully decoded audio stream to detect locations of beats (which may consume significant resources), present embodiments may involve analyzing a partially decoded audio file to detect such beat locations. Specifically, a compressed audio file representing an audio stream may be unpacked (e.g., decomposed into constituent frames of frequency data). After unpacking the compressed audio file into its constituent frames of frequency data, an embodiment of an electronic device may analyze the frames to detect which frames represent likely beat locations in the audio stream the compressed audio file represents. Such likely beat locations may be identified, for example, by analyzing a series of frames of frequency data for certain changes in frequency (a spectral analysis) or for patterns occurring in the sizes of time windows associated with the frames (a time window analysis).
Having identified likely beat locations in certain of the frames of frequency data, the electronic device may extrapolate likely beat locations elsewhere in the audio stream. In some embodiments, these extrapolated likely beat locations may be confirmed by skipping ahead to another series of frames of frequency data of the audio file where a beat has been extrapolated to be located. The electronic device may test whether a likely beat location occurs using, for example, a spectral analysis or a time window analysis. Beat location information associated with the audio file subsequently may be stored in a database or in metadata associated with the audio file.
Having determined beat locations for the audio stream, the electronic device may perform a beat-matched, DJ-style crossfading operation when the audio stream starts to play. Specifically, the electronic device may perform any suitable crossfading technique, aligning the beats of the starting and ending audio streams by aligning the detected likely beat locations and/or scaling the audio streams. As one audio stream ends and the next begins, the two streams may transition seamlessly, DJ-style.
With the foregoing in mind, a general description of suitable electronic devices for performing the presently disclosed techniques is provided below. In particular,
Turning first to
By way of example, the electronic device 10 may represent a block diagram of the handheld device depicted in
In the electronic device 10 of
The audio decoder 20 may efficiently decode compressed audio files (e.g., AAC files, MP3 files, WMA files, and so forth), into a digital audio stream that can be played back to the user of the electronic device 10. While the audio decoder 20 is decoding one audio file for playback, other data processing circuitry (e.g., the processor(s) 12) may detect likely beat locations in the audio file queued to be played next. The transition from playback of the first audio file to the next audio file may be facilitated by the detected beats, allowing for a beat-matched, DJ-style crossfade operation.
The location-sensing circuitry 22 may represent device capabilities for determining the relative or absolute location of electronic device 10. By way of example, the location-sensing circuitry 22 may represent Global Positioning System (GPS) circuitry, algorithms for estimating location based on proximate wireless networks, such as local Wi-Fi networks, and so forth. The I/O interface 24 may enable electronic device 10 to interface with various other electronic devices, as may the network interfaces 26. The network interfaces 26 may include, for example, interfaces for a personal area network (PAN), such as a Bluetooth network, for a local area network (LAN), such as an 802.11x Wi-Fi network, and/or for a wide area network (WAN), such as a 3G cellular network.
Through the network interfaces 26, the electronic device 10 may interface with a wireless headset that includes a microphone 32. The image capture circuitry 28 may enable image and/or video capture, and the accelerometers/magnetometer 30 may observe the movement and/or a relative orientation of the electronic device 10. When employed in connection with a voice-related feature of the electronic device 10, such as a telephone feature or a voice recognition feature, the microphone 32 may obtain an audio signal of a user's voice.
The handheld device 34 may include an enclosure 36 to protect interior components from physical damage and to shield them from electromagnetic interference. The enclosure 36 may surround the display 18, which may display indicator icons 38. The indicator icons 38 may indicate, among other things, a cellular signal strength, Bluetooth connection, and/or battery life. The I/O interfaces 24 may open through the enclosure 36 and may include, for example, a proprietary I/O port from Apple Inc. to connect to external devices. As indicated in
User input structures 40, 42, 44, and 46, in combination with the display 18, may allow a user to control the handheld device 34. For example, the input structure 40 may activate or deactivate the handheld device 34, the input structure 42 may navigate user interface 20 to a home screen, a user-configurable application screen, and/or activate a voice-recognition feature of the handheld device 34, the input structures 44 may provide volume control, and the input structure 46 may toggle between vibrate and ring modes. The microphone 32 may obtain a user's voice for various voice-related features, and a speaker 48 may enable audio playback and/or certain phone capabilities. Headphone input 50 may provide a connection to external speakers and/or headphones.
As illustrated in
Audio files played by the handheld device 34 may be played back on the speaker 48. In accordance with certain embodiments, when multiple audio streams are played in succession, the handheld device 34 may perform a beat-matched, DJ-style crossfade between the audio streams. Since the handheld device 34 may detect the beat locations in the audio files associated with the streams without using excessive resources, the battery life of the handheld device 34 may not suffer despite this functionality.
Such a beat-matched, DJ-style crossfade generally may take place between two audio streams (e.g., audio stream A and audio stream B) as shown by a flowchart 60 of
A plot 70 of
At the start of the plot 70, audio stream A (curve 76) may be the sole audio stream being output by the electronic device 10. Before audio stream A (curve 76) ends at time t2, the electronic device 10 may begin to decode and/or mix audio stream B (curve 78) at time t1. The crossfading of audio streams A (curve 76) and B (curve 78) may take place between times t1 and t2, during which audio stream B (curve 78) may be gradually increased at a relative level coefficient α and audio stream A (curve 76) may be gradually decreased at a relative level coefficient 1−α. It should be understood that the precise coefficients α and/or 1−α employed during the crossfading operation may vary and, accordingly, need not be linear or symmetrical. Beyond time t2, the electronic device 10 may continue decoding and/or outputting only audio stream B until crossfading to the next audio stream in the same or similar manner.
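By way of illustration only, the mixing described above may be sketched as follows. The function name `crossfade`, the sample-list representation of the streams, and the linear ramp for α are assumptions made for this sketch and are not drawn from the disclosure; as noted above, the coefficients need not be linear or symmetrical.

```python
def crossfade(stream_a, stream_b, fade_len):
    """Mix the tail of stream_a with the head of stream_b over fade_len samples.

    Hypothetical helper: streams are plain lists of float samples.
    """
    if fade_len <= 0 or fade_len > min(len(stream_a), len(stream_b)):
        raise ValueError("fade_len must be positive and fit inside both streams")

    head = list(stream_a[:-fade_len])   # stream A alone (before time t1)
    tail = list(stream_b[fade_len:])    # stream B alone (after time t2)

    mixed = []
    offset_a = len(stream_a) - fade_len
    for i in range(fade_len):
        # alpha rises from 0 toward 1 across the fade (linear ramp assumed here)
        alpha = i / fade_len
        mixed.append((1.0 - alpha) * stream_a[offset_a + i] + alpha * stream_b[i])

    return head + mixed + tail
```

In such a sketch, a non-linear fade could be obtained by replacing the single `alpha` expression with any other monotonic curve.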
To ensure that the beats 80 of the audio stream A (curve 76) and audio stream B (curve 78) are aligned during crossfading, the electronic device 10 may scale audio stream A (curve 76) or audio stream B (curve 78) in any suitable manner. Additionally or alternatively, only certain of the beats 80 may be aligned, such as a beat 80 most centrally located in the crossfade operation, to create the perception of beat alignment.
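One possible way to compute such a scaling and alignment is sketched below, purely as an example. The helper names `beat_period` and `align_streams`, the use of beat times in seconds, and the choice to stretch only audio stream B are assumptions of this sketch. The sketch returns a time-scale factor for stream B and a start time for stream B such that the beat of stream B nearest the center of the crossfade coincides with the most centrally located beat of stream A.

```python
def beat_period(beats):
    """Mean spacing between consecutive beat times, in seconds."""
    if len(beats) < 2:
        raise ValueError("need at least two beats")
    return (beats[-1] - beats[0]) / (len(beats) - 1)


def align_streams(beats_a, beats_b, fade_start, fade_end):
    """Return (scale, start_b) for stream B.

    beats_a: beat times of stream A on the playback timeline.
    beats_b: beat times of stream B measured from the start of stream B.
    scale:   factor applied to stream B's time axis so its beat period matches A's.
    start_b: playback time at which the scaled stream B should begin.
    """
    scale = beat_period(beats_a) / beat_period(beats_b)
    centre = (fade_start + fade_end) / 2.0

    # Beat of stream A most centrally located in the crossfade.
    anchor_a = min(beats_a, key=lambda t: abs(t - centre))

    # Beat of the scaled stream B that would land closest to that anchor
    # if stream B simply started at fade_start.
    scaled_b = [t * scale for t in beats_b]
    anchor_b = min(scaled_b, key=lambda t: abs(fade_start + t - anchor_a))

    return scale, anchor_a - anchor_b
```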
At least the beats 80 of audio stream A (curve 76) or audio stream B (curve 78) may be detected by the electronic device according to the present disclosure.
A compressed audio file 100 (file B) that represents a second audio stream (audio stream B) may be queued for playback by the electronic device 10 after the compressed audio file 90. At any suitable time, including while the audio decoder 20 is actively decoding the compressed audio file 90 into audio stream A, certain data processing circuitry of the electronic device 10 may analyze the compressed audio file 100 for likely beat locations in audio stream B. In certain embodiments, this analysis may be performed as a background task running on the processor(s) 12, and the audio file 100 may be only partially decoded before being analyzed. In other embodiments, partial decoding and/or analysis may take place in any suitable data processing circuitry of the electronic device 10.
The compressed audio file 100 may be partially decoded by an unpacking block 102, which may unpack the frequency data 104 from the audio file 100. This frequency data 104 may represent a series of frames or time windows of audio information in the frequency domain. A beat-analyzing block 106 may analyze the frequency data 104 to determine likely locations of beats in the compressed audio file 100 using any suitable manner, many of which are discussed in greater detail below. For example, the beat-analyzing block 106 may analyze certain frequencies of interest over a series of frames of the frequency data 104 for periodic changes indicative of beats (a spectral analysis) or may analyze a series of frames of the frequency data 104 for patterns occurring in the sizes of time windows associated with the frames (a time window analysis).
The likely location of the beats associated with the compressed audio file 100, as determined by the beat-analyzing block 106, may be stored in a beat database 108 in the nonvolatile storage 16. Additionally or alternatively, the determined location of beats in the audio file 100 may be stored as metadata associated with the audio file 100. Moreover, in certain embodiments, the likely beat locations stored in the beat database 108 may be uploaded to an online database of audio file beat location information hosted, for example, by iTunes® by Apple Inc. The online database of audio file beat location information uploaded by other electronic devices 10 may be used to verify or refine the beat location information stored in the beat database 108.
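As a non-limiting example, a local beat database of this kind might be sketched with a small SQLite table. The table name `beats`, the column names, the JSON encoding of the frame indices, and the helper functions shown are all hypothetical and are offered only to illustrate storing and retrieving likely beat locations keyed by audio file.

```python
import json
import sqlite3


def open_beat_db(path=":memory:"):
    """Open (or create) a hypothetical beat database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS beats ("
        "file_id TEXT PRIMARY KEY, frames TEXT NOT NULL)"
    )
    return conn


def store_beats(conn, file_id, beat_frames):
    """Store the likely beat locations (frame indices) for one audio file."""
    conn.execute(
        "INSERT OR REPLACE INTO beats (file_id, frames) VALUES (?, ?)",
        (file_id, json.dumps(list(beat_frames))),
    )
    conn.commit()


def load_beats(conn, file_id):
    """Return the stored beat locations, or None if the file is unknown."""
    row = conn.execute(
        "SELECT frames FROM beats WHERE file_id = ?", (file_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```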
After the audio decoder 20 has finished decoding the compressed audio file 90 (file A) and stored the resulting audio stream A in the audio data 98 in the memory 14, the audio decoder 20 may begin to decode the compressed audio file 100 (file B). In some embodiments, the audio decoder 20 may decode the compressed audio file 100 in the same manner as the compressed audio file 90 is decoded as shown in
In certain other embodiments, as shown by
After at least the beginning of the compressed audio file 100 (file B) has been decoded and stored in the audio data 98 on the memory 14, the electronic device 10 may begin to perform a beat-matched, DJ-style crossfading operation. For example, as shown in
The crossfading block 116 may mix the audio data 112 and 114 such that a beat-matched crossfading operation takes place, for example, in the manner illustrated by the plot 70 of
As noted above, the beat-analyzing block 106 may detect beats in a compressed audio file in a variety of manners. Notably, these techniques may involve analyzing the partially decoded frequency data 104 rather than the fully decoded audio stream output by the audio decoder 20. As shown in
By way of example, in certain embodiments, the long-term time windows 124 may hold approximately 40 ms of audio information, while the short-term time windows 122 may represent transients and thus may contain approximately ⅛ that, or approximately 5 ms of audio information. For some types of compressed audio files (e.g., AAC), the short-term time windows 122 may occur in groups of 8, representing approximately the same amount of time as 1 long-term time window 124. In other embodiments, the frames 120 of frequency data 104 may include more than two sizes of time windows, typically varying in size between long-term and short-term lengths of time.
Each of the frames 120 may represent specific frequency information for a given point in time, as represented schematically by a plot 130 of
By analyzing a series of the frames 120, the beat-analyzing block 106 may determine when beats are likely to occur in the compressed audio file being analyzed. As noted above, the beat-analyzing block 106 may detect beats in a compressed audio file through a spectral analysis of a series of frames 120, a time window analysis, or a combination of both techniques. For example, as shown in
In particular, the beat-analyzing block 106 may discern a periodic change occurring in certain frequency bands of the frames 120 of frequency data 104 (block 144). For example, the beat-analyzing block 106 may consider certain changes in a frequency band of interest, such as a bass frequency where beats may commonly be found. As should be appreciated, such a frequency band of interest may be any frequency in which a beat may be expected to occur, such as a frequency commonly associated with a percussion instrument. Note that, in this way, higher frequencies also may serve as frequencies of interest (e.g., cymbals or higher-frequency drums may provide beats in certain songs). These certain periodic changes in frequency over the series of frames 120 may represent beats, and thus the beat-analyzing block 106 may identify them as such.
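A minimal sketch of one such spectral analysis follows, under stated assumptions. Each frame is assumed to be a list of coefficient magnitudes indexed by frequency bin, the frequency band of interest is assumed to be a half-open range of bins, and a likely beat is assumed to be a frame whose band energy is a local maximum exceeding a multiple (`ratio`) of the mean band energy. The function names, the energy measure, and the threshold are illustrative choices and are not specified by the disclosure.

```python
def band_energy(frame, lo_bin, hi_bin):
    """Sum of squared coefficients in bins [lo_bin, hi_bin) of one frame."""
    return sum(c * c for c in frame[lo_bin:hi_bin])


def find_band_peaks(frames, lo_bin, hi_bin, ratio=1.5):
    """Return indices of frames whose band energy peaks well above the mean."""
    energy = [band_energy(f, lo_bin, hi_bin) for f in frames]
    if not energy:
        return []
    mean = sum(energy) / len(energy)

    peaks = []
    for i in range(1, len(energy) - 1):
        is_local_max = energy[i] >= energy[i - 1] and energy[i] > energy[i + 1]
        if is_local_max and energy[i] > ratio * mean:
            peaks.append(i)
    return peaks
```

Regularly spaced indices in the returned list would suggest a periodic change in the band of interest, while irregular spacing might suggest that a different band or threshold should be tried.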
Based on such detected likely beat locations, the beat-analyzing block 106 may extrapolate other likely beat locations in the compressed audio file beyond the analyzed series of frames 120 (block 146). Additionally, the beat-analyzing block 106 may perform one or more tests to verify that the extrapolated locations of beats in the audio file appear to represent likely beat locations. By way of example, the beat-analyzing block 106 may analyze a smaller series of frames 120 in another location in the audio file where beats have been extrapolated and therefore are expected to be located. If the beats do not appear to be present among the expected frames 120, the beat-analyzing block 106 may reevaluate a new series of frames 120 to determine a new set of beat locations and re-extrapolate the beat locations, as discussed in greater detail below. After extrapolating and/or verifying the likely locations of beats in the compressed audio file, the beat-analyzing block 106 may cause these likely beat locations to be stored in the beat database 108 in nonvolatile storage 16 or otherwise to be associated with the metadata of the audio file (block 148) for later use in a beat-matched, DJ-style crossfading operation.
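The extrapolation step may be illustrated by the following sketch, which assumes a constant beat period equal to the mean spacing of the detected beats. The function name `extrapolate_beats`, the frame-index representation, and the constant-tempo assumption are illustrative only; an actual embodiment may extrapolate in any suitable manner.

```python
def extrapolate_beats(detected, total_frames):
    """Extrapolate likely beat frames beyond the analyzed series.

    detected:     sorted frame indices of likely beats already found.
    total_frames: number of frames in the compressed audio file.
    Returns (period, extrapolated_frames).
    """
    if len(detected) < 2:
        raise ValueError("need at least two detected beats")
    period = (detected[-1] - detected[0]) / (len(detected) - 1)
    if period <= 0:
        raise ValueError("detected beats must be strictly increasing")

    extrapolated = []
    k = 1
    while True:
        frame = round(detected[-1] + k * period)
        if frame >= total_frames:
            break
        extrapolated.append(frame)
        k += 1
    return period, extrapolated
```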
The spectral analysis discussed with reference to
A frequency band of interest 166 represents a specific band of frequencies being analyzed by the beat-analyzing block 106 for certain changes occurring over the series of frames 120. In the plot 160, the frequency band of interest 166 is a band of frequencies in the bass range. However, it should be understood that in other embodiments the frequency band of interest 166 may represent another band of frequencies in the frame 120 of frequency data 104. Also, in some embodiments, the beat-analyzing block 106 may analyze more than one frequency band of interest 166. For example, one frequency band of interest 166 may be a bass frequency, while another frequency band of interest 166 may be a frequency band associated with other percussion instruments (e.g., cymbals or snare drums).
A plot 170 of
Specifically, a plot 180 of
That is, the beat-analyzing block 106 may discern periodic changes in the frequency band of interest 166 by searching for such peaks in the series of frames 120 being analyzed, as shown by a flowchart 190 of
In addition, or alternatively, to such a spectral analysis, the beat-analyzing block 106 may detect beats in a compressed audio file through a time window analysis of a series of frames 120. For example, as shown in
In particular, the beat-analyzing block 106 may discern a periodic change in the occurrence of short-term time windows 122, which represent relatively rapid changes in the compressed audio file being examined, and long-term time windows 124, which represent relatively slower changes in the compressed audio file being examined (block 204). Since beats in an audio stream may be relatively short-lived transient audio events, beats may be understood to generally occur during a period of short-term time windows 122. By analyzing the periodicity of the occurrence of certain time window sizes, likely locations of beats may be determined where groups of short-term time windows 122 repeat periodically. These certain periodic changes in time window size over the series of frames 120 may represent beats, and thus the beat-analyzing block 106 may identify them as such.
Based on such detected likely beat locations, the beat-analyzing block 106 may extrapolate other likely beat locations in the compressed audio file beyond the analyzed series of frames 120 (block 206). Additionally, the beat-analyzing block 106 may perform one or more tests to verify that the extrapolated locations of beats in the audio file appear to represent likely beat locations. By way of example, the beat-analyzing block 106 may analyze a smaller series of frames 120 in another location in the audio file where beats have been extrapolated and therefore are expected to be located. If the beats do not appear to be present among the expected frames 120, the beat-analyzing block 106 may reevaluate a new series of frames 120 to determine a new set of beat locations and re-extrapolate the beat locations, as discussed in greater detail below. After extrapolating and/or verifying the likely locations of beats in the compressed audio file, the beat-analyzing block 106 may cause these likely beat locations to be stored in the beat database 108 in nonvolatile storage 16 or otherwise to be associated with the metadata of the audio file (block 208) for later use in a beat-matched, DJ-style crossfading operation.
As discussed above with reference to block 204 of the flowchart 200, the beat-analyzing block 106 running on the processor(s) 12 may consider the periodicity of short-term time windows 122 amid long-term time windows 124 in the series of frames 120.
In the plot 220, non-beat periods 226 may be represented by a series of long-term time windows 124, during which the underlying audio may change relatively slowly over time. These non-beat periods 226 may be punctuated by likely beat periods 228, when the audio information changes relatively quickly over a series of short-term time windows 122. It is during these likely beat periods 228 that the beat-analyzing block 106 may ascertain that a likely beat 230 is present. For example, the beat-analyzing block 106 may assume that a beat is likely to occur in the middle of a series of periodic short-term time windows 122, and thus may select the frame 120 in the center of the likely beat period 228.
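The selection of a center frame within each likely beat period may be sketched as follows. The sketch assumes that the window size of each frame has already been reduced to a boolean flag (True for a short-term time window, False for a long-term time window); that representation and the function name `likely_beat_frames` are assumptions made for illustration.

```python
def likely_beat_frames(is_short):
    """Return the center frame index of each run of short-term time windows.

    is_short: sequence of booleans, one per frame (True = short-term window).
    """
    beats = []
    run_start = None
    # A trailing False acts as a sentinel that closes a run ending at the last frame.
    for i, short in enumerate(list(is_short) + [False]):
        if short and run_start is None:
            run_start = i                          # a likely beat period begins
        elif not short and run_start is not None:
            run_end = i - 1                        # last short-term window in the run
            beats.append((run_start + run_end) // 2)
            run_start = None
    return beats
```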
While the plot 220 illustrates, by way of example, that likely beats 230 may be found when short-term time windows 122 punctuate long-term time windows 124, it should be understood that the various time window sizes may not neatly form distinct non-beat periods 226 and likely beat periods 228, as illustrated. Under such conditions, the beat-analyzing block 106 may look for a periodic pattern amid the short-term time windows 122 in the series of frames. For example, the beat-analyzing block 106 may seek a series of short-term time windows 122 occurring at a regular interval, even if there are many other series of short-term time windows 122 among the frames 120 of frequency data 104 that occur sporadically.
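One way to seek such a regular interval amid sporadic short-term time windows is sketched below. The sketch takes candidate frame indices (for example, the centers of runs of short-term time windows) and returns the longest chain of candidates spaced at a regular interval, within a tolerance in frames. The exhaustive search, the function name `periodic_subset`, and the `tolerance` parameter are assumptions of this sketch rather than a technique prescribed by the disclosure, and a more efficient search may be preferable in practice.

```python
def periodic_subset(candidates, tolerance=1):
    """Return the longest chain of candidate frames spaced at a regular interval."""
    cands = sorted(set(candidates))
    best = []
    for i, start in enumerate(cands):
        for nxt in cands[i + 1:]:
            interval = nxt - start          # trial beat period, in frames
            chain = [start, nxt]
            expected = nxt + interval
            while True:
                # Only frames after the last chained frame may extend the chain.
                matches = [
                    c for c in cands
                    if c > chain[-1] and abs(c - expected) <= tolerance
                ]
                if not matches:
                    break
                hit = min(matches, key=lambda c: abs(c - expected))
                chain.append(hit)
                expected = hit + interval
            if len(chain) > len(best):
                best = chain
    return best
```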
The spectral analysis and time window analysis approaches may be combined in certain embodiments. For example, as illustrated by a flowchart 240 of
Similarly, a time window analysis of several of the frames 120 of frequency data 104 may be used to identify specific frequencies to serve as a frequency band of interest 166 for use in a subsequent spectral analysis. Such an embodiment is described by a flowchart 250 of
The spectral changes that may occur at the likely location of the beat as determined by the time window analysis may indicate at which frequencies beats are performed in the audio file being analyzed. For example, in some cases, all of the periodic changes in spectrum may take place in a bass region of frequency, indicating that beats are occurring through bass pulses. Thus, it would be beneficial not to spend resources analyzing other frequency bands in the frames 120 during a spectral analysis, since beats are not expected to occur there. As such, the beat-analyzing block 106 may set the frequency band that is changing as the frequency band of interest 166 in a subsequent spectral analysis of other frames (block 258).
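For illustration, choosing the frequency band of interest from beats found by a time window analysis may be sketched as follows. The sketch assumes frames represented as lists of coefficient magnitudes per frequency bin, candidate bands given as half-open bin ranges, and a change measure equal to the absolute difference in band energy between each beat frame and the frame preceding it. The function name `pick_band_of_interest` and this change measure are hypothetical.

```python
def pick_band_of_interest(frames, beat_frames, bands):
    """Return the candidate band whose energy changes most at the likely beats.

    frames:      list of frames, each a list of per-bin coefficient magnitudes.
    beat_frames: frame indices of likely beats from a time window analysis.
    bands:       list of (lo_bin, hi_bin) half-open bin ranges to compare.
    """
    def energy(frame, band):
        lo, hi = band
        return sum(c * c for c in frame[lo:hi])

    best_band, best_change = None, float("-inf")
    for band in bands:
        change = 0.0
        for i in beat_frames:
            if 0 < i < len(frames):
                # Spectral change between the beat frame and the frame before it.
                change += abs(energy(frames[i], band) - energy(frames[i - 1], band))
        if change > best_change:
            best_band, best_change = band, change
    return best_band
```

The returned band could then serve as the only band examined in a subsequent spectral analysis of other frames, so that resources are not spent on bands where beats are not expected.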
As discussed above, after the beat-analyzing block 106 has extrapolated the likely beat locations based on a time window analysis or spectral analysis, or both, of the frames 120 of frequency data 104, the beat-analyzing block 106 may test whether those beats have been correctly extrapolated. For example,
If a beat is not detected in an extrapolated location (decision block 276), an additional beat detection analysis may take place (block 278). This additional beat detection analysis may involve testing all frames 120 of frequency data 104 of the compressed audio file being tested, or may involve testing only the frames 120 near to where beats have been extrapolated and are expected. After the additional beat detection analysis of block 278, the beat-analyzing block 106 may again extrapolate where beats are likely to occur in the untested portions of frequency data 104. As shown by the flowchart 270, this process may repeat until one or more beats are detected in untested extrapolated locations.
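The repeat-until-confirmed process may be sketched as the following loop. The callables `analyze` and `has_beat_near` stand in for whatever spectral or time window analysis is used and are hypothetical, as are the window size, the choice to skip ahead four beat periods, and the cap on the number of rounds.

```python
def verify_extrapolation(analyze, has_beat_near, total_frames,
                         window=32, max_rounds=8):
    """Detect beats, extrapolate, and confirm at an untested location.

    analyze(start, end)  -> sorted likely beat frames in [start, end)  (hypothetical)
    has_beat_near(frame) -> True if a beat is found around `frame`     (hypothetical)
    Returns (first_beat, period) once an extrapolated beat is confirmed, else None.
    """
    start = 0
    for _ in range(max_rounds):
        if start >= total_frames:
            break
        end = min(start + window, total_frames)
        found = analyze(start, end)
        if len(found) >= 2:
            period = (found[-1] - found[0]) / (len(found) - 1)
            # Skip ahead several periods, to a location not yet tested.
            expected = round(found[-1] + 4 * period)
            if expected < total_frames and has_beat_near(expected):
                return found[0], period
        # Beat not confirmed: analyze a new series of frames and try again.
        start = end
    return None
```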
The specific embodiments described above have been shown by way of example, and it should be understood that these embodiments may be susceptible to various modifications and alternative forms. It should be further understood that the claims are not intended to be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.
Number | Date | Country |
---|---|---|
20120046954 A1 | Feb 2012 | US |