The present invention relates to a media rendering system, and more particularly, is related to voice commands for a media rendering system.
Voice-initiated playback of digital media is one of the most used features of commercially available voice agents such as Alexa, Siri, and Google Assistant. However, the customer is limited to playback from only the digital media services offered by the developer of each voice agent. For example, in the United States, Alexa is limited to Amazon Music, Pandora, Spotify, Sirius XM, TuneIn, Deezer, iHeartRadio, and Gimme Radio. Google Assistant is limited to playback of YouTube Music, Google Play Music, Pandora, and Deezer. Apple's Siri is limited to playback of Apple Music.
Developers of commercially available media rendering devices, for example, so-called “smart speakers,” have incorporated these voice agents into their products or systems, allowing the customer to select and render media on many more media services than the voice agent developers allow. Furthermore, it is difficult if not impossible for some voice agents to access media services that are owned by a competing agent. For example, as of this writing, Amazon Alexa customers cannot access Google Play media. In addition, some media may be available on one media service (possibly due to licensing restrictions) and not another.
Customers, on the other hand, want choice and expect to be able to access any of their media sources from any voice agent. The customer wants to access and play the requested media regardless of the native capabilities of the smart speaker's voice agent, and without prior knowledge of any licensing arrangements determining which services offer media by which artists.
A customer's decision to purchase a smart speaker is currently constrained by which media services the device manufacturer offers, rather than being based on the speaker's sound quality, aesthetics, or other criteria. Therefore, there is a need in the industry to address one or more of these shortcomings.
Embodiments of the present invention provide a virtual music service. Briefly described, the present invention is directed to a virtual music service that receives an identifier for a media program from a voice agent, queries a first media service and a second media service for the media program, and receives a first response from the first and/or second media service that includes access information for the media program. One of the first and second media services is selected according to the response based on a predetermined selection criterion. The virtual music service provides the access information for the media program from the selected media service to the media rendering device.
Other systems, methods and features of the present invention will be or become apparent to one having ordinary skill in the art upon examining the following drawings and detailed description. It is intended that all such additional systems, methods, and features be included in this description, be within the scope of the present invention and protected by the accompanying claims.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
The following definitions are useful for interpreting terms applied to features of the embodiments disclosed herein, and are meant only to define elements within the disclosure.
As used within this disclosure, a “voice agent” is a service or a device that receives a voice utterance (for example, an audio stream), parses the voice utterance into a command, and executes the command. Examples of a voice agent include Alexa, Siri, and Google Assistant, among others.
As used within this disclosure, a “smart media player” is a device configured to render digital media from a plurality of media sources. The media sources, for example, media services, are typically external to the smart media player, for example, in communication with the smart media player via a communication network. The media sources generally transmit a media stream to the smart media player (herein referred to as “streaming”). Within this disclosure, the terms “smart media player” and “media rendering device” are used interchangeably.
As used within this disclosure, “media” generally refers to audio, video, or audio synchronized with video. A media stream refers to a digital transmission of a live or recorded media program provided (“streamed”) via a communication network. The media stream may be associated with metadata related to the media stream, for example, providing information regarding the content of the media stream, listing credits of individuals involved with producing the media being streamed, artwork, music lyrics, reviews, promotional material, and other related data.
As used within this disclosure, “rendering” refers to converting a media stream into audio and/or video. This is also referred to as media playback.
As used within this disclosure, an “application program interface (API)” may be thought of as a protocol translator. An API is a set of routines, protocols, and tools for different network elements to communicate with one another. Specifically, a voice agent media service API is an API that allows a voice agent to interact with a particular media service, and a virtual media service (VMS) media service API is an API that allows the virtual media service to interact with a media service. Likewise, the virtual media service may interact with a particular voice agent via a voice agent API.
As used within this disclosure, a “skill” is a software interface provided between a voice assistant and a cloud based music service. The skill may be associated with an API.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
As shown by
The smart media player 150 includes a microphone 160 to detect a voice utterance 190 from the user 180. The smart media player 150 conveys the voice utterance 190 to the voice agent 110, for example, in the form of an audio stream. The voice agent 110 receives the voice utterance 190 and parses the voice utterance 190 to formulate the voice utterance 190 into a command descriptor or directive for execution. The command descriptor may be thought of as a description of the desired action to be executed. For purposes of this disclosure, the command descriptor is assumed to be a request to search for, select and/or render digital media. The voice agent 110 may have a plurality of voice agent user preferences 115 distinct from the smart media player user preferences 155. The voice agent 110 may be integral to the smart media player 150, or may be external to the smart media player 150, for example, the voice agent 110 may be resident in the cloud and accessed via a communication network.
The voice agent 110 communicates with a media service via an application program interface (API). In general, the voice agent 110 has a separate API tailored to each media service, for example, a media service API stored in a voice agent media service API store 116. Therefore, the voice agent 110 may typically only have APIs for a subset of media services 122, 124, 125 of a set 145 of media services 121-128 available to the user. As shown by
For example, the voice agent 110 has APIs for N music services. For the example shown by
The voice agent 110 selects a media service, which in general is the default media service E 125 unless otherwise indicated by the voice utterance 190. The voice agent 110 converts the voice utterance 190 into a command descriptor according to the provided media services API, in this case API E 135 for media service E 125. The command descriptor includes an identifier for a media program 194 based upon the voice utterance 190. In general, the voice agent 110 executes a command to select media from the user selected (default) media service only, in this case media service E 125. Via the API 135, the voice agent 110 provides an identifier for a media program 194 to the default media service 125. If the selected media is available from the default media service (media service E 125), the default media service 125 provides the voice agent 110 with a link 191 to the selected media on the default media service 125 via the default media service API 135.
The default media service 125 may also provide the voice agent 110 with metadata 192 related to the selected media via the default media service API 135. For example, if the selected media is an audio recording, the metadata 192 may include the name of the recording artist, the song title, the album name, the recording label, the recording date, an image of the album cover, and/or other information associated with the audio recording.
The voice agent 110 provides the link 191 to the selected media on the default media service 125 and the metadata 192 to the smart media player 150. The smart media player 150 may then access the selected media from the default media service 125 via the link 191. For example, executing the link 191 may cause media service E 125 to stream the selected media to the smart media player 150 via a media stream 195. The smart media player 150 renders the media stream 195, for example, via an audio transducer 170 and/or a video display (not shown).
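The default-service flow described above can be sketched in code. This is a minimal, hypothetical illustration only: the names (`parse_utterance`, `MediaService`, `resolve`) and the catalog contents are invented for the sketch and do not reflect any actual voice agent implementation.

```python
# Illustrative sketch of the default-service flow: utterance -> command
# descriptor -> query of default media service -> link 191 and metadata 192.
# All identifiers below are hypothetical.

def parse_utterance(utterance: str) -> dict:
    """Stand-in for the voice agent 110: turn an utterance into a command
    descriptor holding an identifier for the requested media program 194."""
    # A real agent would use speech recognition and natural-language
    # understanding; here the utterance is assumed to be plain text.
    return {"action": "play", "media_id": utterance.removeprefix("play ").strip()}

class MediaService:
    """Stand-in for default media service E 125, reached via its API 135."""
    def __init__(self, catalog: dict):
        self.catalog = catalog  # media_id -> (link, metadata)

    def resolve(self, media_id: str):
        """Return (link 191, metadata 192), or None if unavailable."""
        return self.catalog.get(media_id)

service_e = MediaService({
    "Hey Jude": ("https://example.com/stream/hey-jude",
                 {"artist": "The Beatles", "title": "Hey Jude"}),
})
descriptor = parse_utterance("play Hey Jude")
result = service_e.resolve(descriptor["media_id"])
if result:
    link, metadata = result  # the smart media player 150 streams via the link
else:
    print("sorry, I couldn't find that song")
```

The `else` branch corresponds to the failure case discussed next, where the default media service cannot supply the selected media.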
If the selected media is not available from the default media service 125, the default media service 125 indicates this to the voice agent 110 via the API, for example, via an error message. The voice agent 110 may then convey an audio message to the smart media player 150 which, when rendered as audio by the smart media player 150, informs the user 180 that the voice command failed. For example, the audio of the error message may say, “sorry, I couldn't find that song.”
In the event of such a failure, the user 180 may choose to change the voice agent user preferences 115 to a different default media service. Alternatively, the user may utter a subsequent voice utterance that directs the voice agent 110 to query a non-default media service, for example Media Service B 122 or Media Service E 125. However, this may be cumbersome and time consuming, as well as frustrating to the user 180, who may be aware that the selected media is available on another of the media services available to the user 180.
As shown by
As described previously regarding
The voice agent 110 is configured via the voice agent user preferences 115 to select the virtual media service 240 as the default media service, and to access the VMS 240 via a virtual media service API 230. The virtual media service API 230 for the virtual media service 240 preferably has identical or similar inputs and outputs to the voice agent APIs 132, 134, 135 for individual media services 122, 124, 125, for example, receiving as input an identifier for a media program 194 and returning access to the media program, such as a media service link 191 and metadata 192. The virtual media service API 230, like the voice agent media service APIs 132, 134, 135 may also include additional inputs and outputs, for example, user permission data, audio formats, desired streaming data rates, and a media service identifier, among others.
As a result, the VMS 240 interacts with the voice agent 110 via the VMS API 230 in the same or similar manner as an individual media service 122, 124, 125 would interact with the voice agent 110 via an individual media service API 132, 134, 135. Like the individual media service APIs 132, 134, 135, the VMS API 230 provides an identifier for a media program 194 via the VMS 240. Like the individual media service APIs 132, 134, 135, the voice agent receives access to the media program, for example, the link 191 to the selected media on the default media service 125 and the metadata 192 from the VMS API 230. However, instead of providing access to just one media service of the media services 122, 124, 125 that have individual voice agent media service APIs 132, 134, 135, the virtual media service 240 provides access to all media services 121-128 of the aggregated media services 245, even the individual media services 121, 123, 126-128 that do not have an individual voice agent media service API.
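The key property above is that the VMS API 230 presents the same request/response surface as an individual media service API, so the voice agent cannot distinguish the VMS from a single service. A minimal sketch of that idea, using invented names (`MediaServiceAPI`, `get_playable_content`) purely for illustration:

```python
# Sketch of the "same interface shape" property: the virtual media service
# exposes the same inputs (identifier for a media program 194) and outputs
# (link 191, metadata 192) as any single media service API. Hypothetical names.
from typing import Optional, Protocol, Tuple

Link = str
Metadata = dict

class MediaServiceAPI(Protocol):
    """Common surface of the individual APIs 132, 134, 135 and the VMS API 230."""
    def get_playable_content(self, media_id: str) -> Optional[Tuple[Link, Metadata]]: ...

class SingleMediaService:
    """One media service with its own catalog."""
    def __init__(self, catalog: dict):
        self._catalog = catalog
    def get_playable_content(self, media_id):
        return self._catalog.get(media_id)

class VirtualMediaService:
    """Fans the same call out to the aggregated media services 245."""
    def __init__(self, services: list):
        self._services = services
    def get_playable_content(self, media_id):
        for service in self._services:  # try services in priority order
            hit = service.get_playable_content(media_id)
            if hit:
                return hit
        return None

# The voice agent can hold either implementation behind the same interface:
agent_backend: MediaServiceAPI = VirtualMediaService([
    SingleMediaService({}),
    SingleMediaService({"song": ("link-191", {"title": "song"})}),
])
```

Because both classes satisfy the same protocol, swapping the default media service for the virtual media service requires no change on the voice agent side, which is the substitution the user preferences 115 perform.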
Functionality provided by the virtual media service 240 may be executed by one or more modules 350, 360. In general, this functionality includes selecting a media service 121-128 from the aggregated media services 245, and formulating messages to send to the selected media service and interpreting messages received from the selected media service.
The virtual media service 240 includes a media service selection module 350 that prioritizes media services 121-128 of the aggregated media services 245 for search and ranks the results based on rules, for example, via user preferences stored in a media service selection rules store 355, and/or by rules that take commercial considerations into account, for example, agreements between the VMS developers and individual media services. The media service selection module 350 may concurrently or sequentially select a first media service 121 having the highest priority preference and then attempt to obtain the selected media from the first media service 121. If the selected media is not available from the first media service, the media service selection module 350 may select a second media service 122 having the second highest priority preference. This process may continue, for example, selecting a third, fourth, fifth highest priority preference (and so on) until a media service is found that can provide the selected media.
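The ranking performed by the media service selection module 350 might be sketched as follows. The rule names and weights here are invented for illustration; the disclosure only specifies that user preferences and commercial considerations feed into the ranking, not how they are combined.

```python
# Hedged sketch of the selection module 350: media services are scored by
# rules (user preferences plus commercial considerations, per the rules
# store 355) and tried in descending-priority order. Weights are invented.

def rank_services(services, rules):
    """Return the services ordered by descending priority score."""
    def score(name):
        prefs = rules.get("user_preference", {})
        deals = rules.get("commercial_weight", {})
        return prefs.get(name, 0) + deals.get(name, 0)
    return sorted(services, key=score, reverse=True)

rules_355 = {
    "user_preference":   {"service_A": 10, "service_B": 5},
    "commercial_weight": {"service_B": 2,  "service_C": 8},
}
ordered = rank_services(["service_A", "service_B", "service_C"], rules_355)
# service_A scores 10, service_C scores 8, service_B scores 7
```

The resulting order is then walked sequentially (or queried concurrently), as the text describes, until a service can provide the selected media.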
The virtual media service 240 may include a VMS media service API for each media service 121-128 of the aggregated media services 245, for example, stored in a VMS media services API store 365. Each VMS media service API is configured to allow the virtual media service 240 to interact with a particular media service 121-128 of the aggregated media services 245.
Under a second embodiment, shown in
A virtual media service 240 receives an identifier for a media program 194 from a voice agent 110, as shown by block 610. The virtual media service 240 selects a first media service 126 from a plurality of aggregated media services 245, as shown by block 620. For example, the first media service 126 may be selected according to a ranking of media services 121-128 of the aggregated media services 245 as per a plurality of media service selection rules 355 (
If the first response is a fail status, as shown by block 650, the virtual media service 240 selects a second media service 122 from the plurality of aggregated media services and queries the second media service 122 for the media program, as shown by block 670. If the first response is not a fail status, as shown by block 650, the virtual media service 240 forwards the access information (link) for the media program 191 and metadata 192, for example, to the voice agent 110, as shown by block 660. Alternatively, the access information may instead, or in addition, be forwarded to a media player 852 (
While
An identifier for the media program is received from the voice agent 110, as shown by block 710. A first media service 121 and a second media service 122 from the plurality of aggregated media services 245 are queried for the media program, as shown by block 720. A response is received from the first media service 121 and the second media service 122, each response including access information such as a link 191 for the media program and a description 192 of the media program, as shown by block 730. The first media service 121 or the second media service 122 is selected based on a predetermined selection criterion, as shown by block 740. For example, the first media service 121 or the second media service 122 may be selected according to a ranking of media services 121-128 of the aggregated media services 245 as per a plurality of media service selection rules 355 (
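This variant, in which both media services are queried and one successful response is chosen by a predetermined criterion, might be sketched as below. The services, their latencies, and the criterion (prefer the service listed first in the ranking) are simulated assumptions, not a prescribed implementation.

```python
# Sketch of querying the first and second media services in parallel and
# selecting one response by a predetermined criterion. Service behavior
# is simulated; names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def query(service):
    """Simulated media-service query returning access info or None on failure."""
    name, has_media = service
    return (name, f"link-from-{name}") if has_media else None

# (service name, whether it can provide the media program), in ranked order
services = [("first_service", True), ("second_service", True)]

with ThreadPoolExecutor(max_workers=2) as pool:
    responses = list(pool.map(query, services))  # preserves submission order

hits = [r for r in responses if r is not None]
# Predetermined criterion assumed here: prefer the highest-ranked service
# among those that returned access information.
chosen = hits[0] if hits else None
```

Querying concurrently trades extra requests for lower latency relative to the sequential fail-and-retry flow of the previous figure; the selection criterion resolves the case where both services respond successfully.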
While
As previously mentioned, the present system for executing the functionality described in detail above may be a server or a computer, an example of which is shown in the schematic diagram of
The processor 502 is a hardware device for executing software, particularly that stored in the memory 506. The processor 502 can be any custom made or commercially available single core or multi-core processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the present system 500, a semiconductor based microprocessor (in the form of a microchip or chip set), a microprocessor, or generally any device for executing software instructions.
The memory 506 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). Moreover, the memory 506 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 506 can have a distributed architecture, where various components are situated remotely from one another, but can be accessed by the processor 502.
The software 508 defines functionality performed by the system 500, in accordance with the present invention. The software 508 in the memory 506 may include one or more separate programs, each of which contains an ordered listing of executable instructions for implementing logical functions of the system 500, as described below. The memory 506 may contain an operating system (O/S) 520. The operating system essentially controls the execution of programs within the system 500 and provides scheduling, input-output control, file and data management, memory management, and communication control and related services.
The I/O devices 510 may include input devices, for example but not limited to, a keyboard, mouse, scanner, microphone, etc. Furthermore, the I/O devices 510 may also include output devices, for example but not limited to, a printer, display, a transducer (speaker), etc. Finally, the I/O devices 510 may further include devices that communicate via both inputs and outputs, for instance but not limited to, a modulator/demodulator (modem; for accessing another device, system, or network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, or other device.
When the system 500 is in operation, the processor 502 is configured to execute the software 508 stored within the memory 506, to communicate data to and from the memory 506, and to generally control operations of the system 500 pursuant to the software 508, as explained above. The operating system 520 is read by the processor 502, perhaps buffered within the processor 502, and then executed.
When the system 500 is implemented in software 508, it should be noted that instructions for implementing the system 500 can be stored on any computer-readable medium for use by or in connection with any computer-related device, system, or method. Such a computer-readable medium may, in some embodiments, correspond to either or both the memory 506 or the storage device 504. In the context of this document, a computer-readable medium is an electronic, magnetic, optical, or other physical device or means that can contain or store a computer program for use by or in connection with a computer-related device, system, or method. Instructions for implementing the system can be embodied in any computer-readable medium for use by or in connection with the processor or other such instruction execution system, apparatus, or device. Although the processor 502 has been mentioned by way of example, such instruction execution system, apparatus, or device may, in some embodiments, be any computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the processor or other such instruction execution system, apparatus, or device.
Such a computer-readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). Note that the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
In an alternative embodiment, where the system 500 is implemented in hardware, the system 500 can be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
The following example scenario explains how an Alexa music skill system works. It should be noted that the example includes Alexa-specific commands, which are included to explain the command flow-through but are not part of the present invention. The cloud based voice service 810 sends a GetPlayableContent request to the cloud based skill adaptor 830, including:
An action (for example, “resolve to playable content”).
A list of resolved entities (for example, artist, album, track, etc.) that were found in the music partner's catalog for that utterance.
An OAuth 2.0 token authenticating the user (only for skills that have enabled account linking).
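The three items above might arrive in a payload shaped roughly as follows. The field names are loose approximations for illustration only; the exact GetPlayableContent schema belongs to the Alexa music skill API and is not reproduced here.

```python
# Illustrative shape of a GetPlayableContent-style request; field names are
# approximations, not the exact Alexa schema.
get_playable_content_request = {
    "requestType": "GetPlayableContent",
    "action": "resolve to playable content",
    "resolvedEntities": [
        {"type": "ARTIST", "value": "The Beatles"},
        {"type": "TRACK",  "value": "Hey Jude"},
    ],
    # Present only for skills that have enabled account linking.
    "authentication": {"oauth2Token": "<user token>"},
}

# A skill adaptor would parse these three pieces before querying the
# music service's catalog:
action   = get_playable_content_request["action"]
entities = get_playable_content_request["resolvedEntities"]
token    = get_playable_content_request["authentication"]["oauth2Token"]
```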
The cloud based skill adaptor 830 receives and parses the request for the action, the resolved entities, and the authentication details. The cloud based skill adaptor 830 uses this information to communicate with the cloud based music service, for
The cloud based skill adaptor 830 sends a GetPlayableContent response back to the cloud based voice service 810 indicating that the utterance of the user can be satisfied, and includes the identifier for the audio 892.
The Alexa service 850 sends an Initiate API request to the cloud based music service, for
The Alexa service 850 translates the Initiate response into a response on the smart media player 150 and/or an associated networked speaker 875. For example, Alexa might say, “Playing popular songs by The Beatles.” Alexa then queues the first track on the smart media player 150 software for immediate playback.
When the first track is almost done playing on the smart media player 150, the Alexa service requests the next track from the cloud based skill adaptor 830 using a GetNextItem API request. The cloud based skill adaptor 830 returns another playable track to the Alexa service, which is sent to the smart media player 150 for playback. This process repeats until the cloud based skill adaptor 830, in response to a request for the next track, indicates there are no more tracks to play.
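The GetNextItem loop just described can be sketched as below. The adaptor here serves a fixed queue of tracks, which is an assumption made for the sketch; a real skill adaptor would resolve each next item against the music service's catalog.

```python
# Sketch of the GetNextItem loop: the voice service keeps requesting the
# next playable track from the skill adaptor until none remain. The fixed
# queue and the names (SkillAdaptor, play_all) are hypothetical.

class SkillAdaptor:
    """Stand-in for the cloud based skill adaptor 830."""
    def __init__(self, tracks):
        self._tracks = list(tracks)

    def get_next_item(self):
        """Return the next playable track, or None when no tracks remain."""
        return self._tracks.pop(0) if self._tracks else None

def play_all(adaptor, render):
    """Voice-service side: fetch and render tracks until the adaptor
    indicates there are no more tracks to play."""
    while (track := adaptor.get_next_item()) is not None:
        render(track)

played = []
play_all(SkillAdaptor(["track-1", "track-2", "track-3"]), played.append)
```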
From the perspective of the cloud based voice assistant 850, the processes shown in
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. For example,
In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/775,981, filed Dec. 6, 2018, entitled “Virtual Media Service,” which is incorporated by reference herein in its entirety.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2019/064639 | 12/5/2019 | WO | 00

Number | Date | Country
---|---|---
62775981 | Dec 2018 | US