The present application relates generally to audio processing and more specifically, to providing a karaoke system for a mobile device.
Karaoke is a form of interactive entertainment or video game in which (amateur) singers sing along with pre-recorded music (e.g., a music video). The pre-recorded music is typically a known song without the lead vocal (i.e., background music). Lyrics are usually displayed on a video screen, along with a moving symbol, changing color, or music video images, to guide the singer. Backup vocals may also be included in the pre-recording to guide the singer.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
According to embodiments of the present disclosure, a system for karaoke on a mobile device may comprise one or more mobile devices and a computing cloud. In some embodiments, the mobile device comprises at least speakers, a user interface, two or more microphones, and an audio processor. The mobile device may be configured to receive a music track for a song. In some embodiments, a user may, via the user interface, provide options for applying effects to the played music track. In some embodiments, the mobile device may be further configured to record, via the microphones, a sound comprising a mix of the user's voice and the music audio track. The recording process may be controlled by the user through recording control options provided via the user interface. The recorded sound may be further processed to enhance the voice and add sound effects based on processing control options provided by the user via the user interface. In some embodiments, the recorded sound may be re-aligned and mixed with the original music track. In some embodiments, the recorded sound may be uploaded to the cloud and provided for playback on a mobile device.
Embodiments described herein may be practiced on any device configured to receive and/or provide audio, such as, but not limited to, personal computers (PCs), tablet computers, phablet computers, mobile devices, cellular phones, phone handsets, headsets, media devices, and the like.
Other example embodiments of the disclosure and aspects will become apparent from the following description taken in conjunction with the following drawings.
Embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
The present disclosure provides example systems and methods for karaoke on one or more mobile devices. Embodiments of the present disclosure may be practiced on any mobile device configurable, for example, to play a music track, record an acoustic sound, process the acoustic sound, store the acoustic sound, transmit the acoustic sound, and upload the processed acoustic sound through a communications network to social media in a cloud, for instance. While some embodiments of the present disclosure are described with reference to operation of a mobile device, the present disclosure may be practiced with any computer system having an audio device for playing and recording sound.
Referring now to
Musical sound produced by the transducer(s) 270 of the mobile device and the voice of a singing user may be captured by microphones 220 and 230. Although two microphones are shown in this example, other numbers of microphones may be used in some embodiments. The audio processing system 260 may be configured to record an acoustic sound comprising a mix of the music sound and the voice. Acoustic sounds may comprise singing from one or more singers, background music (e.g., from the one or more transducers 270), and ambient sounds (e.g., noise and echo). In some embodiments, a user interface may be provided to receive recording control options 310. The audio processing system 260 may be configured to apply the recording control options 310 to the recording process. The recording control options 310 may include noise suppression, acoustic echo cancellation, suppression of the music component in the acoustic sound, automatic gain control, and de-reverbing.
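As an illustration of one of the listed recording control options, automatic gain control can be sketched as a per-block corrective gain toward a target level. This is a minimal, hypothetical sketch — the function name, parameters, and block-based structure are illustrative assumptions, not the interface of the described system:

```python
import math

def auto_gain_control(samples, target_rms=0.1, max_gain=10.0):
    """Scale a block of samples toward a target RMS level.

    A bounded gain avoids amplifying silence or noise without limit;
    real AGCs also smooth the gain across blocks to avoid pumping.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # silent block: nothing to normalize
    gain = min(target_rms / rms, max_gain)
    return [s * gain for s in samples]
```

A quiet block (e.g., constant 0.01 samples) would be boosted toward the target RMS of 0.1, subject to the `max_gain` cap.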
In some embodiments, the audio processing system 260 may be further configured to re-align and mix the recorded acoustic sound with the original music track. In some embodiments, a user interface may be provided to receive processing control options 320 to control the re-alignment and mixing of the recorded acoustic sound and the original music track. The processing control options 320 may include constant voice volume, asynchronous sample rate conversion, and “dry music.” The “dry music” option may allow leaving the recorded acoustic sound as is.
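The re-alignment and mixing step might be sketched as a sample-offset shift followed by a weighted sum. The function name, `offset` parameter, and gain values below are illustrative assumptions rather than the system's actual interface:

```python
def mix_tracks(voice, music, voice_gain=1.0, music_gain=0.5, offset=0):
    """Mix a recorded voice signal with the original music track.

    `offset` shifts the music track by that many samples to re-align
    it with the recording before summing; samples falling outside the
    music track are treated as silence.
    """
    out = []
    for i in range(len(voice)):
        j = i - offset
        m = music[j] if 0 <= j < len(music) else 0.0
        out.append(voice_gain * voice[i] + music_gain * m)
    return out
```

A "constant voice volume" option could be layered on top by running the voice through gain normalization before this mix; the "dry music" option would simply skip the mix entirely.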
In some embodiments, the audio processing system 260 may be further configured to process the recorded acoustic sound. The additional processing control options 330 may be received via a user interface. The additional processing control options 330 may include a parametric and graphic equalizer filter, a multi-band compander, a dynamic range compressor, and an automatic pitch correction.
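Of the listed effects, a dynamic range compressor is the simplest to sketch: magnitudes above a threshold are scaled down by a ratio. This hard-knee, per-sample version is a deliberate simplification (production compressors add attack/release smoothing and a soft knee), and the names are hypothetical:

```python
def compress(sample, threshold=0.5, ratio=4.0):
    """Hard-knee dynamic range compression of a single sample.

    Magnitude below the threshold passes unchanged; magnitude above
    it is reduced by the compression ratio.
    """
    mag = abs(sample)
    if mag <= threshold:
        return sample
    out_mag = threshold + (mag - threshold) / ratio
    return out_mag if sample > 0 else -out_mag
```

For example, with a 0.5 threshold and 4:1 ratio, an input of 0.9 exceeds the threshold by 0.4 and is reduced to about 0.6.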
In some embodiments, the karaoke recording system 300 may include a monitoring channel, which may allow a singer or a user to listen (e.g., via the transducer(s) 270) to the signal processed acoustic sound while that sound is being recorded and processed. The real-time signal processing may be performed while karaoke recording systems are recording the acoustic sound and during playback.
Various embodiments of the karaoke recording and playback system 300 may store raw or original acoustic sound received by the one or more microphones. In some embodiments, signal processed acoustic sounds may be stored. The original acoustic sounds may include cues. Further cues may be determined during signal processing of the original acoustic sound during recording and stored with the original acoustic signals. The cues may include one or more of inter-microphone level difference, level salience, pitch salience, signal type classification, speaker identification, and the like. During playback of recorded audio and, optionally, associated video, the original acoustic sound and recorded cues may be used to alter the audio provided during playback.
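One of the listed cues, the inter-microphone level difference, can be computed per frame from the relative energies of the two microphone signals. A minimal sketch under that assumption, with hypothetical names:

```python
import math

def level_difference_db(frame_a, frame_b, eps=1e-12):
    """Inter-microphone level difference, in dB, for one frame.

    Positive values mean microphone A received more energy; the small
    eps term guards against log of zero on silent frames.
    """
    energy_a = sum(s * s for s in frame_a) + eps
    energy_b = sum(s * s for s in frame_b) + eps
    return 10.0 * math.log10(energy_a / energy_b)
```

Stored alongside the original acoustic signals, a per-frame sequence of such values could later steer playback-time processing (e.g., which source to emphasize) without re-analyzing the raw audio.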
By recording the original acoustic sounds and, optionally, the signal processed acoustic sounds, different audio modes and signal processing configurations may be used to post-process the original acoustic sound and may create different audio effects, both directional and non-directional. A user listening to and, optionally, watching the recording may explore options provided by different audio modes without irreversibly losing the original acoustic sounds.
Some embodiments of the karaoke recording system 300 may provide a user interface during playback of recorded audio and optionally video. The user interface may include, for example, one or more controls using buttons, icons, sliders, menus, and so forth for receiving indicia from a user during playback. The controls may include graphics, text, or both. During playback, the user may, for example, play, stop, pause, fast forward, and rewind the recorded audio and, optionally, associated video. The user may also change the audio mode, for example, to reduce noise, focus on one or more sound sources, and the like, during playback. In various embodiments, one or more buttons may be provided which, for example, enable the user to control the playback, and change to a different audio mode or toggle among two or more audio modes. For example, there may be one button corresponding to each audio mode; pressing one of the buttons selects the audio mode corresponding to that button.
According to various embodiments of the karaoke recording system, the user interface may also include controls to combine two or more audio and, optionally, video recordings. For example, each recording may have been recorded at the same or different times, and on the same or different karaoke recording systems. Each recording may be of the same singer or singers (e.g., for a duet, trio, and so forth), where they sing together on one recording, for instance, or of different singers. Each recording may be of the same song, a complementary song, a similar song, or a completely different song. In various embodiments, the controls may allow the user to select recordings to combine, align or synchronize the recordings, control playback of the resulting combination (e.g., duet, trio, quartet, quintet, and so forth), and change to a different audio mode or toggle among two or more audio modes. In some embodiments, alignment or synchronization of the recordings may be performed automatically.
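Automatic alignment of two recordings of the same song can be sketched as a bounded search for the lag that maximizes cross-correlation. This brute-force version is illustrative only — a practical system would correlate compact features (or use FFT-based correlation) rather than raw samples:

```python
def best_lag(a, b, max_lag):
    """Return the lag, in samples, that best aligns recording b to a.

    Searches lags in [-max_lag, max_lag] and keeps the one with the
    highest cross-correlation score.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, s in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += s * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best
```

Once the lag is known, the later recording can be shifted by that many samples before the combined mix (duet, trio, and so forth) is rendered.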
In various embodiments, indicia may be received through the one or more buttons during playback and, in real time, the audio provided may be changed responsive to the indicia without stopping the playback. The audio provided during playback may be in accordance with a default audio mode or a last audio mode selected, until initial or further indicia, respectively, from the user is received. There may be latency between the user pressing a button and a change in the audio mode; however, in some embodiments, the lag may not be perceptible or may be acceptable to the user. For example, the delay may be about 100 milliseconds. In some embodiments, the audio recording system may include faster than real-time signal processing.
According to various embodiments of the karaoke recording system, the audio modes may include two or more of: default, background and foreground, background only, and foreground only. The default audio mode may, for example, include the original and/or signal processed acoustic sound. In the background and foreground audio mode, the audio provided during playback may, for example, include sound from both a primary singer and a background. In the background-only audio mode, the audio provided during playback may, for example, include sounds from the background to the exclusion of or otherwise attenuate sound from the foreground. In the foreground-only audio mode, the audio provided during playback may, for example, include sounds from the foreground to the exclusion of or otherwise attenuate sound from the background. Each audio mode may change the sound provided during playback relative to the other modes such that the audio perspective changes.
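Assuming the foreground and background components have already been separated (e.g., using stored cues), applying an audio mode reduces to weighting and summing the components per frame. The mode names and gain values below are illustrative assumptions, not values specified by the system:

```python
def apply_audio_mode(foreground, background, mode):
    """Render one output frame by weighting separated components.

    The "only" modes attenuate rather than fully mute the other
    component, matching the "exclusion of or otherwise attenuate"
    behavior; gain pairs here are arbitrary example values.
    """
    gains = {
        "default": (1.0, 1.0),
        "background_and_foreground": (1.0, 1.0),
        "foreground_only": (1.0, 0.1),
        "background_only": (0.1, 1.0),
    }
    fg_gain, bg_gain = gains[mode]
    return [fg_gain * f + bg_gain * b for f, b in zip(foreground, background)]
```

Because the weighting is applied at playback time from stored components, toggling among modes needs no re-recording and can respond to a button press within a frame.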
The foreground may, for example, include sound originating from one or more audio sources (e.g., singer or singers), background music from speakers, other people, animals, machines, inanimate objects, natural phenomena, and other audio sources that may be visible in a video recording, for instance. The background may, for example, include sound originating from the operator of the karaoke recording system and/or other audio sources (e.g., other primary singers), guidance backup singers, other people, animals, machines, inanimate objects, natural phenomena, and the like.
When combining two or more recordings, there may, for example, be one or more audio modes to include sound from one of the recordings and/or combinations of the recordings to the exclusion of or otherwise attenuate sound from the other recordings not included in the combination. The user interface may also include controls to control the combination of the recordings, e.g., audio mixing, and manipulate each recording's level, frequency content, dynamics, and panoramic position and add effects such as reverb.
A user may switch between different post-processing options when listening to the original and/or signal processed acoustic signals in real time, to compare the perceived audio quality of the different audio modes. The audio modes may include different configurations of directional audio capture (e.g., DirAc, Audio Focus, Audio Zoom, etc.) and multimedia processing blocks (e.g., bass boost, multiband compression, stereo noise bias suppression, equalization filters, and the like). The audio modes may enable a user to select an amount of noise suppression and a direction of audio focus toward one or more singers (e.g., in the same or different recordings; in the foreground, the background, or both; and the like).
In various embodiments, aspects of the user interface may appear in a screen or display during playback, for example, in response to the user touching a screen. Controls may include buttons for controlling playback (e.g., rewind, play/pause, fast forward, and the like), and controlling the audio mode (e.g., representing emphasis on one or more different recordings in a combination of recordings, and in each recording the foreground only; background only; a combination of foreground and background; a combination of foreground, background, and other sounds or properties of sound that were not included in the original acoustic sound). In some embodiments, in response to a user selection, the audio may dynamically change after a slight delay, but stay synchronized with an optional video, such that the sound selected by the user is provided.
In some embodiments, the audio provided, according to one or more audio mode selections made during playback, may be stored. In various embodiments, the stored acoustic sounds may reflect at least one of the default audio mode, a last audio mode selected, and audio modes selected during playback and applied to respective segments of the original audio sounds and/or processed audio sounds. According to some embodiments, the stored audio may be stored (e.g., on the mobile device, in a cloud computing environment, etc.) and/or disseminated, for example, via social media or sharing website/protocol.
In some embodiments, a user may play a recording comprising audio and video portions. A user may touch or otherwise actuate a screen during playback and, in response, buttons may appear (e.g., rewind, play/pause, fast forward buttons, scene, narrator, and the like). The user may touch or otherwise actuate a foreground button and, in response, the audio recording system is configured such that the video portion may continue playing with a sound portion modified to provide an experience associated with the foreground audio mode. The user may continue listening to and watching the recording to determine if the user prefers the foreground audio mode. The user may optionally rewind to an earlier time in the recording if desired. Similarly, the user may touch or otherwise actuate a background button and, in response, the audio recording system is configured such that the video portion may continue playing with a sound portion modified to provide an experience associated with the background audio mode. The user may continue listening to the recording to determine if the user prefers the background audio mode.
Alternatively or in addition, in certain embodiments, a user may select and play two recordings of the same song by different singers from two different karaoke recording systems. An optional video portion displayed to the user may, for example, include video from the two recordings, e.g., side by side, and/or include the video from one of the recordings based on the audio mode selected. The user may touch or otherwise actuate a button and in response, the audio recording system is configured such that the optional video portion may continue playing with a sound portion modified to emphasize sound from a first recording, e.g. a first audio mode. The user may continue listening to and watching the recording to determine if the user prefers the sound from the first recording. The user may optionally rewind to an earlier time in the recording, if desired. Similarly, the user may touch or otherwise actuate another button and in response, the audio recording system is configured such that the optional video portion may continue playing with a sound portion modified to emphasize sound from a second recording (e.g., a second audio mode). The user may continue listening to the recording to determine if the user prefers the second audio mode.
In some embodiments, when the user determines that a certain audio mode is how the final recording should be stored, the user may press a reprocess button, and the audio recording and playback system may begin processing, in the background, the entire audio and, optionally, video according to the last audio mode selected by the user. The user may continue listening and optionally watching or may stop (e.g., exit from an application), while the process continues to completion in the background. The user may track the background process status via the same or a different application.
In some embodiments, the background process may optionally be configured to delete the stored original acoustic sounds associated with the original video, for example, to save space in the karaoke recording system's memory. According to various embodiments, the karaoke recording system may also compress at least one of the audio sounds (e.g., the original acoustic sound, signal processed acoustic sounds, acoustic signals corresponding to one or more of the audio modes, and the like), for example, to conserve space in the karaoke recording system's memory. The user may upload (e.g., to a social media service, the cloud, and the like) the processed audio and video.
In some embodiments, the music track may be provided to a user through one or more transducers 270 (e.g., speakers, headphones, earbuds, and the like). In these embodiments, the acoustic sound being captured by microphones 220 and 230 may be mixed with the music track to be listened to by the user via the transducer(s) 270.
The output sound S2 may be further processed by applying filters, for example, a parametric and graphic equalizer, a multi-band compander, de-reverbing, etc. The input music track S1 may be resampled to a rate of 24 kHz using an asynchronous sample rate conversion and re-aligned and mixed with the output sound S2. A user interface may be provided to receive mixing control options. The output sound S2 may be resampled to a rate of 48 kHz. The output sound S2 may be stored in memory storage 250 or uploaded to a cloud 120.
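Rate conversions such as the 24 kHz and 48 kHz changes described above can be sketched with linear interpolation between neighboring samples. A production asynchronous sample rate converter would use polyphase or band-limited filtering, so this is a simplification with hypothetical names:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a signal by linear interpolation, e.g. 48 kHz -> 24 kHz.

    Each output sample is read from a fractional position in the input
    and interpolated between the two surrounding input samples.
    """
    if not samples:
        return []
    ratio = src_rate / dst_rate
    out = []
    for k in range(int(len(samples) / ratio)):
        pos = k * ratio
        i = int(pos)
        frac = pos - i
        s0 = samples[i]
        s1 = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(s0 + frac * (s1 - s0))
    return out
```

Downsampling a 48 kHz ramp to 24 kHz keeps every other sample; upsampling inserts interpolated samples between the originals.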
The output sound S2 may be further processed by applying filters, for example, a parametric and graphic equalizer, stereo widening, a multi-band compander, dynamic range compression, etc. The input music track S1 may be re-aligned and mixed with the output sound S2. A user interface may be provided to receive mixing control options. The output sound S2 may be stored, for example, in memory storage 250 or uploaded to a cloud 120.
The system 800 may capture acoustic sound via microphones 220 and 230. The acoustic sound may comprise a user's voice V, a noise N, and a music component S1′. The acoustic sound may be recorded to generate an output sound S2 in stereo mode with a sampling rate of 48 kHz. The recording of the acoustic sound may include, for example, noise suppression, acoustic echo cancellation, automatic gain control, and de-reverbing. The reference signal for the echo cancellation may be provided from the input music track S1. The output sound S2 may be further processed by applying filters using a parametric and graphic equalizer, a multi-band compander, and dynamic range compression. The input music track S1 may be re-aligned and mixed with the output sound S2. A user interface may be provided to receive mixing control options. The output sound S2 may be stored, for example, in memory storage 250 or uploaded to a cloud 120.
The system 900 may capture acoustic sound via microphones 220 and 230. The acoustic sound may comprise a user's voice V, a noise N, and a music component S1′. The acoustic sound may be recorded to generate an output sound S2 in stereo mode with a sampling rate of 48 kHz. The recording of the acoustic sound may include noise suppression, acoustic echo cancellation, automatic gain control, and de-reverbing. The reference signal for the echo cancellation may be provided from the input music track S1.
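Echo cancellation driven by a reference signal, as described above, is classically implemented with an adaptive filter. The short least-mean-squares (LMS) sketch below is illustrative only — real acoustic echo cancellers use far longer filters, normalized or frequency-domain adaptation, and double-talk detection — and its names and parameters are assumptions:

```python
def lms_echo_cancel(mic, ref, taps=4, mu=0.01):
    """Suppress the music echo in the microphone signal via LMS.

    The adaptive filter estimates the echo of the reference (music)
    track as heard at the microphone and subtracts it, leaving the
    voice plus residual noise.
    """
    w = [0.0] * taps  # adaptive filter weights
    out = []
    for n in range(len(mic)):
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        est = sum(wi * xi for wi, xi in zip(w, x))  # estimated echo
        e = mic[n] - est                            # echo-reduced output
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out
```

Fed a microphone signal that is a pure scaled copy of the reference, the residual shrinks toward zero as the filter adapts.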
The output sound S2 may be further processed by applying filters, for example, a parametric and graphic equalizer, a multi-band compander, dynamic range compression, etc. Voice morphing and automatic pitch correction may be applied to the output sound S2 to enhance the voice component. A user interface may be provided to receive processing control options.
The input music track S1 may be re-aligned and mixed with the output sound S2. A user interface may be provided to receive mixing control options. Reverb may be further applied to the output sound S2. The output sound S2 may be stored in memory storage 250 or uploaded to a cloud 120.
The components shown in
Mass storage device 1130, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 1110. Mass storage device 1130 may store the system software for implementing embodiments of the present disclosure for purposes of loading that software into main memory 1120.
Portable storage device 1140 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk, digital video disc, or Universal Serial Bus (USB) storage device, to input and output data and code to and from the computing system 1100 of
Input devices 1160 provide a portion of a user interface. Input devices 1160 may include one or more microphones, an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Input devices 1160 may also include a touchscreen. Additionally, the computing system 1100 as shown in
Graphics display system 1170 may include a liquid crystal display (LCD) or other suitable display device. Graphics display system 1170 receives textual and graphical information and processes the information for output to the display device.
Peripheral devices 1180 may include any type of computer support device to add additional functionality to the computer system.
The components provided in the computing system 1100 of
It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the embodiments provided herein. Computer-readable storage media refer to any medium or media that participate in providing instructions to a central processing unit (CPU), a processor, a microcontroller, or the like. Such media may take forms including, but not limited to, non-volatile and volatile media such as optical or magnetic disks and dynamic memory, respectively. Common forms of computer-readable storage media include a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic storage medium, a Compact Disk Read Only Memory (CD-ROM) disk, digital video disk (DVD), BLU-RAY DISC (BD), any other optical storage medium, Random-Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electronically Erasable Programmable Read Only Memory (EEPROM), flash memory, and/or any other memory chip, module, or cartridge.
In some embodiments, the computing system 1100 may be implemented as a cloud-based computing environment, such as a virtual machine operating within a computing cloud. In other embodiments, the computing system 1100 may itself include a cloud-based computing environment, where the functionalities of the computing system 1100 are executed in a distributed fashion. Thus, the computing system 1100, when configured as a computing cloud, may include pluralities of computing devices in various forms, as will be described in greater detail below.
In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors (such as within web servers) and/or that combines the storage capacity of a large grouping of computer memories or storage devices. Systems that provide cloud-based resources may be utilized exclusively by their owners or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.
The cloud may be formed, for example, by a network of web servers that comprise a plurality of computing devices, such as the computing device 200, with each server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource customers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depends on the type of business associated with the user.
Thus, systems and methods for karaoke on a mobile device have been disclosed. The present disclosure is described above with reference to example embodiments; other variations upon the example embodiments are intended to be covered by the present disclosure.
This application claims the benefit of U.S. Provisional Application No. 61/714,598, filed Oct. 16, 2012, and U.S. Provisional Application No. 61/788,498, filed Mar. 15, 2013. The subject matter of the aforementioned applications is incorporated herein by reference for all purposes to the extent such subject matter is not inconsistent herewith or limiting hereof.