The present invention relates to a multi-functional personal digital assistant device, primarily for use in a kitchen, that can converse with and instruct a user in the preparation and planning of meals. More specifically, the present invention is related to multi-functional personal assistant devices that operate based on speech recognition, artificial intelligence, and wireless Internet access.
Artificial intelligence (AI) is a major scientific advance now delivering huge technology rewards in manufacturing, military, automobiles, virtual reality, and other highly technical industries, and now in homes and kitchens. Constant access to the Internet makes large expensive AI processors available as servers to do jobs off-loaded from even simple desktop devices in consumers' homes.
Computers and digital assistants that can respond to user voice commands and inquiries are now using advanced artificial intelligence methods to engage users in natural speech. This makes it possible for users to verbally pose questions and get intelligent and useful answers spoken in response. New devices like the Amazon Echo, Alexa, and Show are able to play requested music, turn on lights, give weather forecasts, and perform many other skills.
These new devices depend on constant wireless Internet access in order to reach the kind of artificial intelligence processing needed to parse and understand verbal user commands and inquiries, and to reach the wide variety of encyclopedic sources, recipes, cookbooks, music, video, Internet websites, and the vast community online.
The primary function of the multi-functional intelligent personal assistant device of the present invention is to provide cooking assistance via step-by-step voice-navigated recipe video tutorials. However, real-time prompts from a human support team for those who might need a little more hand-holding in the kitchen are contemplated and are available as an additional function of the personal assistant device of the present invention. In addition to the aforementioned functions, the personal assistant device of the present invention is capable of concurrently maintaining a lively conversation with the user, expressing itself by mimicking facial expressions, and keeping the user entertained by providing on-demand and instant access to various music streaming services such as Spotify, Deezer, Google Play All Access, Grooveshark, Last.fm, Pandora Radio, etc., as well as audio news feeds and weather forecasts. Finally, the multi-functional intelligent personal assistant device includes voice-activated timers and reminders which are delivered to the user according to the user's preference, such as by the device's own speech, playback of selected music, or the sound of an alarm.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the invention. The summary is not an extensive overview of the invention. It is neither intended to identify key or critical elements of the invention nor to delineate the scope of the invention. The following summary merely presents some concepts of the invention in a simplified form as a prelude to the description below.
Throughout this disclosure, unless the context dictates otherwise, the word “comprise” or variations such as “comprises” or “comprising,” is understood to mean “includes, but is not limited to” such that other elements that are not explicitly mentioned may also be included. Further, unless the context dictates otherwise, use of the term “a” may mean a singular object or element, or it may mean a plurality, or one or more of such objects or elements.
The multi-functional intelligent personal assistant device of the present invention includes a consumer desktop product that can converse with and instruct a user in the preparation and planning of meals. It is packaged in a large egg-shaped shell with a rounded bottom having a low center of gravity such that the device is always able to eventually assume an upright position. The center of gravity can be manipulated with internal motors and gears such that the egg-shaped shell can be made to tilt and even rotate. These actions are choreographed to give the device an expressive personality that can endear and entertain its users. A portion of the upper hemisphere of the egg-shaped device is dedicated to a visual display screen that is used to exhibit an animation of an eyeball to give lifelike personality to the personal assistant device and to further endear and entertain the users. A principal use of the visual display screen, however, is to display menus, recipes, and videos to the users related to food preparation.
The multi-functional intelligent personal assistant device is responsive to spoken requests for assistance and information from a user, and comprises an egg-shaped main body having a shell that encloses at least one microcomputer, audio subsystem, video display subsystem, sensor subsystem, movement control electro-mechanical subsystem, internal bus, battery, battery charger, wireless transceiver that supports connections with the Internet, and plurality of interconnections, wherein the main body has a rounded bottom end and a low center of gravity in the bottom end so as to assume an upright position when leveled, wherein the movement control electro-mechanical subsystem is operative to respond to control of the microcomputer and includes an accelerometer or similar sensors that allow tilt (the position of the device relative to the vertical axis) to be detected, and may include at least one rotation control system and at least one tilt control system, wherein the rotation control system enables the device to rotate on its axis and the tilt control system enables the device to tilt left, right, fore and aft, wherein the rotation control system includes at least one rotation motor mounted to at least one rotor ring and a set of gears to rotate the device on its axis; wherein the tilt control system includes at least a pair of sliding gear motors and at least a pair of ballast weights to tilt the device left, right, fore and aft by manipulating a center of gravity (COG) of the main body of the device, and wherein the sensor subsystem includes at least one user detection system having a plurality of passive infrared sensors (PIR).
The multi-functional intelligent personal assistant device further comprising at least one microphone and at least one speaker included in the audio subsystem, wherein at least one microphone is operative to receive spoken words from the user, and at least one speaker is operative to produce verbal responses, music, and sounds to the user, wherein words and phrases spoken by the user are recorded into audio files by the microcomputer and audio subsystem, and then transmitted wirelessly to servers on the Internet for understanding and responding back to the user using artificial intelligence (AI) processors and commercial application program interfaces (API).
The multi-functional intelligent personal assistant device further comprising an animation program control of the microcomputer that causes real-time visual changes of the device to be displayed to the user, wherein the visual changes comprise a cartoon animation shown on the video display of the device, and may include changes to the COG of the device such that the device tilts left, right, fore or aft.
The multi-functional intelligent personal assistant device further comprising a video display program control of the microcomputer that transmits digital pictures, graphics, text, photos, and/or videos through the video display subsystem as a response to a spoken command of the user.
The multi-functional intelligent personal assistant device wherein the COG of the main body is within two to three centimeters of a bottom end of the device, such that the whole device stands upright on the bottom end when laid on a flat surface, and wherein the COG may be adjustable under program control of the microcomputer through the movement control electro-mechanical subsystem.
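By way of illustration only, the relationship between ballast position and resting tilt may be approximated as in the following sketch. It assumes, beyond what is stated above, that the rounded bottom behaves like a spherical cap of a known radius and that all non-ballast mass lies on the vertical axis of the shell. The function names, masses, and dimensions are hypothetical and are not taken from any embodiment.

```python
import math


def combined_cog(body_mass, body_cog_height, ballast_mass,
                 ballast_offset, ballast_height):
    """Return (lateral_offset, height) of the combined center of gravity.

    Assumption: the body's own COG lies on the vertical axis, so only the
    ballast contributes a lateral offset. Units are kg and meters.
    """
    total = body_mass + ballast_mass
    lateral = ballast_mass * ballast_offset / total
    height = (body_mass * body_cog_height
              + ballast_mass * ballast_height) / total
    return lateral, height


def resting_tilt_deg(lateral, height, bottom_radius):
    """Resting tilt angle, in degrees, on a flat surface.

    Assumption: spherical bottom of radius bottom_radius. At rest the COG
    hangs directly below the center of that sphere, which gives
    tan(tilt) = lateral / (bottom_radius - height).
    """
    if height >= bottom_radius:
        raise ValueError("COG must lie below the sphere center to self-right")
    return math.degrees(math.atan2(lateral, bottom_radius - height))


# Hypothetical numbers: 0.8 kg body, 0.2 kg ballast slid 2 cm off axis.
lateral, height = combined_cog(0.8, 0.025, 0.2, 0.02, 0.03)
tilt = resting_tilt_deg(lateral, height, bottom_radius=0.06)
```

Under these assumptions, sliding the ballast back to the axis returns the tilt to zero, and a lower combined COG produces a smaller tilt for the same ballast travel.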
The multi-functional intelligent personal assistant device further comprising a plurality of passive infrared sensors (PIR) and a plurality of far-field microphones that are dispersed around the circumference of the device, and at least one pinhole camera with infrared cut-off filter (IR filter).
The multi-functional intelligent personal assistant device further comprising a speech recognition sorter that redirects speech recognition requests and audio files to a human concierge whenever artificial intelligence fails a speech recognition task.
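A minimal sketch of such a sorter follows. It assumes that the AI service reports a confidence score with each transcript and that a fixed threshold decides when recognition has failed. Neither the score, the threshold value, nor the names used here are specified in this disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recognition:
    """Result returned by a hypothetical AI speech recognition service."""
    text: str
    confidence: float  # assumed range 0.0 to 1.0


def route_request(result: Optional[Recognition],
                  min_confidence: float = 0.6) -> str:
    """Return "ai" when the AI result is usable, otherwise "human".

    A missing result, an empty transcript, or a confidence below the
    (assumed) threshold is treated as a failed recognition task.
    """
    if result is None or not result.text.strip():
        return "human"
    if result.confidence < min_confidence:
        return "human"
    return "ai"
```

For example, `route_request(Recognition("let's cook pancakes", 0.92))` selects the AI path, while an empty or low-confidence transcript is redirected to the human concierge.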
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
Embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. Indeed, the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like numbers refer to like elements throughout. Where possible, any terms expressed in the singular form herein are meant to also include the plural form and vice versa, unless explicitly stated otherwise. Also, as used herein, the term “a” and/or “an” shall mean “one or more,” even though the phrase “one or more” is also used herein.
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of, and not restrictive on, the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other changes, combinations, omissions, modifications and substitutions, in addition to those set forth in the above paragraphs, are possible. Those skilled in the art will appreciate that various adaptations and modifications of the just described embodiments can be configured without departing from the scope and spirit of the invention. Therefore, it is to be understood that, within the scope of the appended claims, the invention may be practiced other than as specifically described herein.
Embodiments of the present invention include a voice-operated smart assistant with a screen display and an expressive, lifelike personality designed primarily to assist a user with food preparation in a kitchen. Such assistance is provided by the device by its ability to recognize user's commands, and, in response, provide step-by-step voice-navigated recipe video tutorials, music streaming, audio news feeds, weather forecasts, as well as multiple voice-activated timers, and reminders.
A number of passive infrared sensors (PIR) and far-field microphones 310-312 are dispersed around the circumference and used to sense where the user is and to better capture and recognize what the user is saying. A capacitive sense control button 320 is configured as a soft button to take on a variety of different control functions of the device. A main pinhole camera with infrared cut-off filter (IR filter) 322 permits the user to be imaged and located. In one embodiment, the rounded bottom 308 is automatically turned by an internal motor and gear so that the front and pinhole camera 322 face the user.
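One way the sensor readings could be turned into a rotation command is sketched below. It assumes the sensors are spaced evenly around the circumference and that each reports a simple triggered or not-triggered state; the sensor count, the bearing convention, and all function names are hypothetical.

```python
import math


def sensor_bearings(count):
    """Bearings, in degrees, of `count` sensors spaced evenly around the shell."""
    return [i * 360.0 / count for i in range(count)]


def estimate_user_bearing(triggered, count):
    """Circular mean of the triggered sensors' bearings, in degrees.

    `triggered` is a list of sensor indices. Returns None when no sensor
    fired or when the triggered sensors cancel out (for example, two
    opposite sensors), since no direction can be inferred.
    """
    bearings = sensor_bearings(count)
    x = sum(math.cos(math.radians(bearings[i])) for i in triggered)
    y = sum(math.sin(math.radians(bearings[i])) for i in triggered)
    if math.hypot(x, y) < 1e-9:
        return None
    return math.degrees(math.atan2(y, x)) % 360.0


def shortest_rotation(current, target):
    """Signed rotation, in degrees within (-180, 180], from current to target."""
    delta = (target - current) % 360.0
    return delta - 360.0 if delta > 180.0 else delta
```

The signed result of `shortest_rotation` could be passed to the rotation motor so that the device turns through the smaller arc to face the user.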
One of the embodiments of the present invention contemplates the use of a digital light processing (DLP) projector system to display an image on the display screen for the user, as shown in
In one embodiment, the video images projected onto display screen 506 include cartoon animations of an eyeball, recipes, instructional how-to videos, etc. The display screen 506 is confined by a cutout 512 in an upper inner shell 514, such as above PIR sensors 311 and 312 in shell 302 of
Turning now to
A bowl-shaped mezzanine frame 802 with a flat floor 803 carries a sliding Y-axis gear motor and ballast weight 806 above floor 804 on fixed slider rods 806. Similarly, a sliding X-axis gear motor and ballast weight 808 is carried below floor 804 on fixed slider rods 810. Not shown in
An antenna 1180 is provided for Bluetooth low energy (BLE) and WiFi wireless communication with transceiver 1132. Wireless is the primary way Internet connectivity is supported.
The software needed for the various embodiments includes a cloud application and device firmware. The cloud application provides an applications programming interface (API) to work with the hardware of
Embodiments of the present invention may leverage various external AI service providers such as IBM (IBM Watson), Google (TensorFlow), Microsoft (Azure), etc., to enable the device of the present invention to converse with a user. For example, the IBM Watson Speech to Text service uses speech recognition capabilities to convert Arabic, English, Spanish, French, Brazilian Portuguese, Japanese, and Mandarin speech into text. The transcribing of audio begins with using the microphones to record an audio file, e.g., in Waveform Audio File Format (WAV), Free Lossless Audio Codec (FLAC), the Opus audio coding format, etc. The API can be directed to turn on and recognize audio coming from the microphone in real-time, recognize audio coming from different real-time audio sources, or recognize audio from a file. In all cases, real-time streaming is available, so as the audio is being sent to the server, partial recognition results are also being returned. The Speech to Text API enables the building of smart apps that are voice triggered.
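As a non-limiting sketch of the recording step, the following shows how captured samples might be packed into a WAV file and split into chunks for a streaming upload, using only the Python standard library. The 16 kHz mono 16-bit format and the chunk size are assumptions, and the call that would actually transmit each chunk to a speech service is deliberately omitted because it depends on the provider's API.

```python
import io
import struct
import wave


def pcm_to_wav_bytes(samples, sample_rate=16000):
    """Pack signed 16-bit mono PCM samples into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono (assumed)
        wav.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        # Little-endian signed 16-bit integers, as WAV PCM expects.
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buffer.getvalue()


def chunked(data, size=4096):
    """Yield successive byte chunks, as a streaming upload might send them."""
    for start in range(0, len(data), size):
        yield data[start:start + size]
```

Sending the audio in chunks rather than as one file is what would allow a service that supports streaming to return partial recognition results while the user is still speaking.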
The above description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles described herein can be applied to other embodiments without departing from the spirit or scope of the invention. Thus, it is to be understood that the description and drawings presented herein represent a presently preferred embodiment of the invention and are therefore representative of the subject matter which is broadly contemplated by the present invention. It is further understood that the scope of the present invention fully encompasses other embodiments that may become obvious to those skilled in the art and that the scope of the present invention is accordingly limited by nothing other than the appended claims.
Operation Model (business operation, process sheets):
1. Customer talks<1.a
2. Server side uses one or more AI services to recognize what the customer wants>2.a
3. Human Assistance receives new/reopened chat together with any last message from customer >3.a
4. Text-to-speech processor generates an audio file>4
5. Device plays the audio
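The five steps above can be sketched as a single function whose stages are supplied by the caller. The sketch assumes that step 3 is reached only when the AI services do not produce a reply, which is one reading of the list and not the only one. Every callable passed in is a hypothetical stand-in for a server-side or device-side component.

```python
def handle_utterance(audio, recognize, answer, ask_human, synthesize, play):
    """Run one pass of the operation model with caller-supplied stages.

    recognize(audio)       -> transcript text, or None on failure  (step 2)
    answer(text)           -> reply text, or None if not understood (step 2)
    ask_human(audio, text) -> reply text from Human Assistance      (step 3)
    synthesize(reply)      -> audio bytes from text-to-speech       (step 4)
    play(speech)           -> plays the audio on the device         (step 5)
    """
    text = recognize(audio)
    reply = answer(text) if text else None
    if reply is None:
        # Assumed fallback: hand the chat and last message to a human.
        reply = ask_human(audio, text)
    speech = synthesize(reply)
    play(speech)
    return reply
```

Passing the stages in as arguments keeps the flow independent of any particular AI provider or text-to-speech processor.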
AI Egg 1208 converts this to “9. Pancakes summary” for device 1204 to display to the user 1202. The user 1202 can then be walked through the preparation with more voice interaction and interpretation.
Referring to
Please note: the Speech API and conversational AI are third-party servers/systems, while AI EGG and EGGSPERT are parts of the HelloE Server.
Embodiments of the present invention are not limited to providing recipes and cooking instructions to users preparing foods. For example, embodiments may provide kit assembly instructions, user operation manuals, certified maintenance procedures, pre-approved emergency procedures, disaster escape plans, weapons loading, bank procedures, driving instruction, etc. A third party can be the one to launch the action, e.g., “Let's cook pancakes”, or the third party can be the one to receive the pancake recipe and preparation instructions.
Although particular embodiments of the present invention have been described and illustrated, such is not intended to limit the invention. Modifications and changes will no doubt become apparent to those skilled in the art, and it is intended that the invention only be limited by the scope of the appended claims.