The present invention is in the field of smart homes, and more particularly in the fields of televisions, robotics, and voice assistants.
There is a need to provide useful, robust, and automated services to a person. Many current services are tied to the television (TV) and, therefore, are only provided or useful if a user or an object of interest is within view of stationary cameras and/or agents embedded in or stored on the TV. Other current services are tied to a voice assistant such as Alexa, Google Assistant, or Siri. Some voice assistants are stationary; others are provided in a handheld device (usually a smartphone). Again, usage is restricted when the user is not near the stationary voice assistant or is not carrying a handheld voice assistant. Services may further be limited to appliances and items that are capable of communication, for instance over the internet, a wireless network, or a personal network. A user may be out of range of a TV or voice assistant when in need, or an object of interest may be out of range of those agents.
One example of an object of interest is an appliance such as a washing machine. Monitoring appliances can be impossible or very difficult because of the above-described conditions. Interfacing the TV with the appliance electronically may be very difficult, for example when the appliance is not communication-enabled. Thus, while at home watching TV, people tend to forget about appliances that are performing household tasks. Sometimes an appliance finishes a task and the user wants to know when it is done. Other times an appliance may have a problem that requires the user's immediate attention. The user may not be able to hear audible alarms or beeps from the appliance when watching TV in another room.
A further problem is that currently available devices and services offer their users inadequate help. For example, a user who comes home from work may have a pattern of turning on a TV and all connected components. Once the components are on, the user may need to press multiple buttons on multiple remote controls to find desired content or to surf to a channel that may offer such content. There are currently one-button solutions that load specific scenes and groups of devices, but they do not immediately load what the user wants, and they do not help cut down on wait time. Another example of inadequate assistance is during the occurrence of an important family event. Important or noteworthy events may occur when no one is recording audio/video or taking pictures. One participant must act as the recorder or photographer and is unable to be in the pictures without using a selfie stick or a tripod and timer.
A yet further, but very common, problem is losing things in the home. Forgetting the last placement of a TV remote control, keys, phones, and other small household items is very common. Existing services (e.g., Tile) for locating such items are very limited or non-existent for some commonly misplaced items. One example shortcoming is that a signaling beacon must be attached to an item to locate it. The signaling beacon needs to be capable of determining its location, for example by using the Global Positioning System (GPS). Communication may be via Bluetooth (BT), infrared (IR) light, WiFi, etc. GPS in particular, but also the radio or optical link, can require considerable energy, draining batteries quickly. GPS may not be available everywhere in a home, and overall the signaling beacons are costly and inconvenient. Many cellphones include a find-my-phone feature, which allows users to look up the GPS location of their phone or to make it ring, provided the phone is on and signed up for the service. However, for many reasons such services and beacons may fail. Further, it is quite possible to lose the devices delivering the location services.
Until now, there has not been a comprehensive solution for the above problems. Embodiments of the invention can solve them all at once.
As described above, many current services are tied to the television (TV) and, therefore, are only provided or useful if a user or an object of interest is within view of stationary cameras and/or agents embedded in or stored on the TV. Embodiments of the invention overcome this limitation and provide a method and an apparatus for assisting a TV user.
In a first aspect, an embodiment provides a television (TV) capable of interacting with a robot. The TV and the robot are in a location, and the robot is capable of moving around in the location. The TV includes a camera for capturing local images, an image recognition processor coupled to the camera, a microphone for capturing local sounds, a loudspeaker, a voice assistant coupled with the microphone and loudspeaker, and a wireless transceiver that is capable of performing two-way communication.
The TV is configured to:
(i) communicate with the robot via the wireless transceiver;
(ii) communicate with and control other devices via the wireless transceiver;
(iii) communicate with a user via at least one of the voice assistant and the wireless transceiver;
(iv) issue commands to the robot;
(v) receive remote images and remote sounds streamed by the robot;
(vi) monitor the local images and remote images using the image recognition processor;
(vii) recognize objects of interest and beings using the image recognition processor;
(viii) monitor the local sounds and remote sounds in the voice assistant;
(ix) recognize situations based on results from the image recognition processor and the voice assistant; and
(x) monitor the user directly and via the robot.
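By way of illustration only, the following minimal Python sketch models a possible interface corresponding to items (i)-(x) above. All class, method, and variable names (Television, Robot, issue_command, and so on) are hypothetical and are chosen solely for this example; embodiments are not limited to this structure or to the Python language.

    # Hypothetical sketch of the TV-robot interaction; names are illustrative only.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Frame:
        source: str     # "local" (TV camera) or "remote" (robot camera)
        pixels: bytes   # raw image data

    @dataclass
    class AudioChunk:
        source: str     # "local" (TV microphone) or "remote" (robot microphone)
        samples: bytes  # raw audio data

    class Robot:
        """Stand-in for robot 110; executes commands and streams images and sounds."""
        def execute(self, command: str) -> None:
            print(f"robot executing: {command}")

        def stream(self) -> Tuple[Frame, AudioChunk]:
            return Frame("remote", b""), AudioChunk("remote", b"")

    class Television:
        """Stand-in for TV 100 with its camera, microphone, image recognition
        processor, voice assistant, and wireless transceiver."""
        def __init__(self, robot: Robot):
            self.robot = robot
            self.frames: List[Frame] = []
            self.audio: List[AudioChunk] = []

        def issue_command(self, command: str) -> None:    # item (iv)
            self.robot.execute(command)

        def receive_remote_streams(self) -> None:         # item (v)
            frame, chunk = self.robot.stream()
            self.frames.append(frame)
            self.audio.append(chunk)

        def monitor(self) -> None:                        # items (vi)-(x), stubbed
            # A real system would run image recognition and the voice assistant here.
            print(f"monitoring {len(self.frames)} frames and {len(self.audio)} audio chunks")

    tv = Television(Robot())
    tv.issue_command("patrol the kitchen")
    tv.receive_remote_streams()
    tv.monitor()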
In an embodiment, the TV is configured to receive information from the robot measured with one or more health status sensors. The TV may also be configured to receive information from sensors for at least one of ambient temperature, infrared light, ultra-violet light, smoke, carbon monoxide, humidity, location, and movement.
The TV may accept commands from an authorized being. A command may include a text, an interaction with a graphical user interface, a voice command, body language, or a gesture.
In a further embodiment, the TV is configured to receive a model of the location from the robot, and to recognize a change in the location. The TV may be configured to recognize, remember, and report a placement of an object of interest, and to report the placement of the object of interest to the user if the placement is not regular.
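A minimal sketch of such placement tracking is given below, assuming a simple dictionary-based store of last-seen positions and a user-defined set of "regular" placements. The names PlacementTracker, observe, and where_is are hypothetical and purely illustrative.

    # Hypothetical placement tracker; the data structures are illustrative only.
    from typing import Dict, Optional, Set, Tuple

    Position = Tuple[float, float]   # simplified 2-D coordinate within location 120

    class PlacementTracker:
        def __init__(self) -> None:
            self.last_seen: Dict[str, Position] = {}     # object name -> last placement
            self.regular: Dict[str, Set[Position]] = {}  # object name -> usual placements

        def observe(self, obj: str, position: Position) -> None:
            """Called whenever the image recognition processor locates an object."""
            self.last_seen[obj] = position
            if obj in self.regular and position not in self.regular[obj]:
                # Placement is not regular: report it to the user, e.g. on the TV screen.
                print(f"notice: {obj} was left at {position}")

        def where_is(self, obj: str) -> Optional[Position]:
            """Answers a user query such as 'where is the remote control?'."""
            return self.last_seen.get(obj)

    tracker = PlacementTracker()
    tracker.regular["remote control"] = {(1.0, 2.0)}   # couch-side table
    tracker.observe("remote control", (4.5, 0.5))      # kitchen counter -> reported
    print(tracker.where_is("remote control"))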
In a yet further embodiment, an object of interest is an appliance, and the TV identifies the state of the appliance and determines a priority for displaying the state and a priority for displaying other TV content, such as news or entertainment. The TV immediately displays the state to the user if the priority for displaying the state is higher than the priority for displaying other TV content.
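The following short sketch illustrates one possible way to compare the two priorities; the numeric priority scale and the function name choose_display are assumptions made only for this example.

    # Hypothetical priority comparison; the numeric scale is an assumption.
    def choose_display(state_priority: int, content_priority: int,
                       appliance_state: str, tv_content: str) -> str:
        """Return what the TV should display right now."""
        if state_priority > content_priority:
            return f"ALERT: {appliance_state}"   # e.g., a water leak warning
        return tv_content                        # continue news or entertainment

    print(choose_display(9, 5, "washing machine: water leak detected", "evening news"))
    print(choose_display(2, 5, "dishwasher: cycle finished", "evening news"))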
In even further embodiments, the TV determines if a situation is regular or non-regular, and it takes actions based thereon and based on the type of situation. For example, if the situation includes an emergency, the TV seeks immediate help to mitigate the emergency and keep a being safe. It may also categorize, capture, record, and document the situation.
In a second aspect, an embodiment provides a method for a TV to interact with a robot. The method comprises the steps of receiving and recording a data stream, analyzing it to recognize an object, being, or situation, selecting a recognized object, being, or situation, and determining its status. The embodiment invites a user command based on the selection, and determines from a received user command if the status must be changed. If so, it changes the status directly or via the robot.
A further understanding of the nature and the advantages of particular embodiments disclosed herein may be realized by reference to the remaining portions of the specification and the attached drawings.
The invention is described with reference to the drawings, in which:
As noted above, many current services are tied to the television (TV) and are only provided or useful if a user or an object of interest is within view of stationary cameras and/or agents embedded in or stored on the TV. Embodiments of the invention overcome this limitation and provide a method and an apparatus for assisting a TV user, as described in the following.
Robot 110 may be autonomous, or partially or fully controlled by TV 100. Even if robot 110 is autonomous, TV 100 and robot 110 share a protocol that enables TV 100 to issue commands to robot 110, wherein the commands include implicit or explicit instructions for robot 110 to collect certain information, and to provide the certain information back to TV 100. Robot 110 includes sensors for at least one of ambient temperature, infrared light, ultra-violet light, smoke, carbon monoxide, humidity, location, and movement, and TV 100 is configured to receive and process information from the sensors. In embodiments, TV 100 uses the information to assist a user 190 as further detailed herein.
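For illustration, the shared protocol could resemble the following sketch, in which TV 100 asks robot 110 for a sensor reading and robot 110 reports back. The JSON message fields (type, sensor, room, value) are hypothetical and do not limit the embodiments.

    # Hypothetical command/report exchange between TV 100 and robot 110.
    import json

    def make_command(sensor: str, room: str) -> str:
        """TV 100 asks robot 110 to collect a sensor reading in a given room."""
        return json.dumps({"type": "collect", "sensor": sensor, "room": room})

    def handle_command(message: str) -> str:
        """Robot 110 parses the command, takes a (stubbed) reading, and reports back."""
        request = json.loads(message)
        reading = 21.5 if request["sensor"] == "temperature" else 0.0
        return json.dumps({"type": "report", "sensor": request["sensor"],
                           "room": request["room"], "value": reading})

    command = make_command("temperature", "kitchen")
    print(handle_command(command))   # TV 100 receives and processes this report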
Robot 110 may be shaped like a human or another animal, e.g., Sony's aibo robot dog, or like a machine that is capable of locomotion, such as a vacuum cleaning robot, or like any other device that may be capable of assisting a TV user.
Location 120 may be a building, a home, an apartment, an office, a yard, a shop, a store, or generally any location where a TV user may require assistance.
Voice assistant 170 may be or include a proprietary system, such as Alexa, Echo, Google Assistant, and Siri, or it may be or include a public-domain system. It may be a general-purpose voice assistant system, or it may be application-specific, or application-oriented.
Wireless transceiver 180 may be configured to use any protocol, such as WiFi, Bluetooth, Zigbee, ultra-wideband (UWB), Z-Wave, 6LoWPAN, Thread, 2G, 3G, 4G, 5G, LTE, LTE-M1, narrowband IoT (NB-IoT), MiWi, or any other protocol used for RF electromagnetic links, and it may be configured to use an optical link, including infrared (IR).
TV 100 may be configured to communicate with user 190 in various ways. It may communicate using texts, sounds, and images. It may communicate directly using voice assistant 170, microphone 150, and loudspeaker 160. Embodiments may show user 190 information or alerts directly on the screen of TV 100, and may monitor user 190 for gestures, or body language in general, using camera 130 and image recognition processor 140. Further embodiments may use wireless transceiver 180 to communicate with user 190, for example when user 190 uses a Bluetooth or WiFi headset. Yet further embodiments may communicate with user 190 via robot 110, or via another third-party device, including a mobile phone or smartphone, a tablet, or a computer. For example, in an embodiment, TV 100 may call user 190 via a mobile phone network and leave a voice message, a text message, a video message, or another type of message, or it may talk with user 190. In an even further embodiment, TV 100 may command robot 110 to make a gesture to user 190. For example, it could command Sony's aibo robot dog to wag its tail or drop its ears.
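As a non-limiting illustration, the choice of communication channel could be sketched as follows; the channel names and the function notify_user are assumptions introduced only for this example.

    # Hypothetical channel selection for reaching user 190; the channel names
    # are illustrative and do not refer to any specific product API.
    def notify_user(message: str, user_context: str) -> str:
        """Pick a delivery channel based on where the user is and what is nearby."""
        if user_context == "in front of TV":
            return f"on-screen banner: {message}"
        if user_context == "wearing BT headset":
            return f"headset audio via wireless transceiver 180: {message}"
        if user_context == "away from home":
            return f"text message to smartphone: {message}"
        # Fall back to sending robot 110 to find the user and relay the message.
        return f"robot 110 relays: {message}"

    print(notify_user("The dryer has finished.", "wearing BT headset"))
    print(notify_user("Smoke detected in the kitchen.", "away from home"))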
TV 100 is configured to receive remote images and remote sounds streamed by robot 110. It forwards received remote images to image recognition processor 140 and received remote sounds to voice assistant 170. TV 100 uses image recognition processor 140 and voice assistant 170 to communicate with humans, and/or with animals in general. Also, using image recognition processor 140 and/or voice assistant 170, TV 100 recognizes and monitors beings, objects of interest, and aspects of location 120.
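A minimal sketch of this routing is shown below, with ImageRecognitionProcessor and VoiceAssistant standing in for blocks 140 and 170; the stubbed analyze methods and their return values are illustrative assumptions.

    # Hypothetical routing of received streams to the two recognition subsystems;
    # both classes are stand-ins with stubbed results.
    class ImageRecognitionProcessor:        # stand-in for block 140
        def analyze(self, frame: bytes) -> str:
            return "no object recognized"

    class VoiceAssistant:                   # stand-in for block 170
        def analyze(self, audio: bytes) -> str:
            return "no utterance recognized"

    def route(stream_kind: str, payload: bytes,
              irp: ImageRecognitionProcessor, va: VoiceAssistant) -> str:
        """Forward remote images to 140 and remote sounds to 170."""
        if stream_kind == "image":
            return irp.analyze(payload)
        if stream_kind == "audio":
            return va.analyze(payload)
        raise ValueError(f"unknown stream kind: {stream_kind}")

    print(route("image", b"", ImageRecognitionProcessor(), VoiceAssistant()))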
An object of interest may comprise anything commonly found in the household of a user 190, or anything not commonly found but that is particular to user 190. In case location 120 is not a household but, for example, an office, the object of interest may comprise anything commonly or particularly found in the office. In case location 120 is not a household or an office, the object of interest may comprise anything commonly or particularly found in location 120. The being may comprise user 190, a family member, a friend, an acquaintance, a visitor, a pet, a co-worker, or any other human or animal of interest to user 190. The situation may be user-defined, or automatically defined based on artificial intelligence learning techniques. It may be a regular situation or a non-regular situation. The situation may be desired or undesired. It may include an emergency, a party, a burglary, a child's first steps, a wedding, a ceremony, a transgression, or any other event that is relevant to user 190.
Method 300 comprises the following steps.
Step 350—receiving local images from camera 130. An image may be still, or streaming. A still image may be a single image taken from a stream of images.
Step 352—receiving remote images from another source. Again, an image may be still or streaming. The other source may be robot 110, or some other device, appliance, or apparatus.
Step 354—receiving local sounds from microphone 150.
Step 356—receiving remote sounds from another source. The other source may be robot 110, or some other device, appliance, or apparatus.
Step 358 (optional)—receiving data from another sensor. The sensor may be included in robot 110 or in some other device, appliance, or apparatus. The sensor may measure ambient temperature, infrared light, ultra-violet light, smoke, carbon monoxide, humidity, location, movement, or any other physical quantity that is relevant for assisting user 190. The sensor may be a health status sensor, for measuring a person's temperature, blood pressure, heart rate, blood oxygenation, brain activity, blood composition, or any other physical quantity relevant to the person's health.
Step 360—processing local images from Step 350 and/or remote images from Step 352 in image recognition processor 140 to obtain at least partial results in recognizing an object of interest, a being, or a situation.
Step 364—processing local sounds from Step 354 and/or remote sounds from Step 356 in voice assistant 170 to obtain at least partial results in recognizing an object of interest, a being, or a situation.
Step 368 (optional)—processing the data received in Step 358 to obtain additional results in recognizing an object of interest, a being, or a situation.
Step 370—combining one or more at least partial results from steps 360-368 to obtain combined results in recognizing an object of interest, a being, or a situation. The combined results may be final or provisional. The combined results may include one or more candidate final results with probability information.
Step 380 (optional)—based on the combined results, monitoring the object of interest, the being, and/or the situation.
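By way of illustration, the combination of Step 370 could be sketched as follows, using a simple additive, normalized scoring rule; this rule, the label strings, and the confidence values are assumptions made only for this example, and embodiments may use any other fusion technique.

    # Hypothetical combination step (Step 370): merging partial recognition
    # results into candidate results with probability information. The simple
    # additive, normalized rule below is an assumption, not a required algorithm.
    from collections import defaultdict
    from typing import Dict, List

    def combine(partial_results: List[Dict[str, float]]) -> Dict[str, float]:
        """Each partial result maps a candidate label (object, being, or situation)
        to a confidence in [0, 1]; the combined scores are normalized to sum to 1."""
        scores: Dict[str, float] = defaultdict(float)
        for result in partial_results:
            for label, confidence in result.items():
                scores[label] += confidence
        total = sum(scores.values()) or 1.0
        return {label: score / total for label, score in scores.items()}

    image_result = {"washing machine alarm": 0.6, "doorbell": 0.1}   # from Step 360
    sound_result = {"washing machine alarm": 0.8}                    # from Step 364
    sensor_result = {"kitchen smoke": 0.05}                          # from Step 368
    print(combine([image_result, sound_result, sensor_result]))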
Step 910—Receiving one or more data streams. The data streams may include video and/or audio, and data from any other sensors configured to provide data to the TV. The data streams may come from a camera, microphone, or other sensor built into the TV, from a camera, microphone, or other sensor built into the robot, or from another external camera, microphone, or sensor.
Step 920—Recording at least one of the one or more data streams.
Step 930—Analyzing the at least one of the one or more data streams to recognize an object of interest, a being, and/or a situation. The TV uses an image recognition processor to analyze a video stream, and a voice assistant to analyze an audio stream.
Step 940—(Optional) Instructing the robot to observe additional objects around the object of interest, the being, and/or the situation, and including the additional objects in the analysis.
Step 950—Selecting one of a recognized object of interest, a being, and a situation, and determining its status.
Step 960—Inviting a user to command an action based upon the status and the selected object of interest, being, or situation.
Step 970—Upon receiving a user command, determining if the status must be changed, and upon determining that the status must be changed, changing the status. The TV may change the status directly, or may instruct the robot to change the status, or it may work with the robot to change the status.
Step 980—(Optional) Repeating steps 910-970.
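The following compact Python sketch walks once through steps 910-970; every function name and stubbed return value is hypothetical and serves only to illustrate the order of operations.

    # Hypothetical end-to-end pass through steps 910-970; all names and stubbed
    # return values are illustrative only.
    def receive_streams():                        # Step 910
        return [("video", b""), ("audio", b"")]

    def record(streams):                          # Step 920
        return list(streams)                      # a real TV would persist these

    def analyze(streams):                         # Step 930
        # Image recognition processor for video, voice assistant for audio.
        return [("washing machine", "cycle finished")]

    def observe_additional(robot_available):      # Step 940 (optional)
        return [("laundry basket", "full")] if robot_available else []

    def select(recognized):                       # Step 950
        return recognized[0]                      # one recognized item and its status

    def invite_user_command(selection):           # Step 960
        print(f"TV: {selection[0]} is '{selection[1]}'. Say 'restart', 'pause', or 'ignore'.")
        return "restart"                          # stubbed user reply

    def run_cycle(robot_available=True):
        streams = record(receive_streams())
        recognized = analyze(streams) + observe_additional(robot_available)
        selection = select(recognized)
        command = invite_user_command(selection)
        if command != "ignore":                   # Step 970: change the status if required,
            print(f"changing status of {selection[0]} ({command}), directly or via the robot")

    run_cycle()                                   # Step 980: repeat as needed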
Although the invention has been described with respect to particular embodiments thereof, these particular embodiments are merely illustrative, and not restrictive. For example, the illustrations show a dog-shaped robot. However, a robot of any shape meets the spirit and ambit of the invention, and embodiments may work with a single robot or multiple robots, whatever their shape. The illustrations and examples show a single TV embodying the invention. However, embodiments may spread their methods over multiple TVs that act in parallel and in collaboration. Methods may be implemented in software, stored in a tangible and non-transitory memory, and executed by a single processor or by multiple processors. Alternatively, methods may be implemented in hardware, for example custom-designed integrated circuits or field-programmable gate arrays (FPGAs). The examples distinguish between an image recognition processor and a voice assistant. However, the image recognition processor and the voice assistant may share a processor or set of processors, and differ only in the software, or software routines, being executed.
Any suitable programming language can be used to implement the routines of particular embodiments including C, C++, Java, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. The routines can execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular embodiments. In some particular embodiments, multiple steps shown as sequential in this specification can be performed at the same time.
Particular embodiments may be implemented in a computer-readable non-transitory storage medium for use by or in connection with the instruction execution system, apparatus, system, or device. Particular embodiments can be implemented in the form of control logic in software or hardware or a combination of both. The control logic, when executed by one or more processors, may be operable to perform that which is described in particular embodiments.
Particular embodiments may be implemented by using a programmed general-purpose digital computer, or by using application-specific integrated circuits, programmable logic devices, field-programmable gate arrays, or optical, chemical, biological, quantum, or nanoengineered systems, components, and mechanisms. In general, the functions of particular embodiments can be achieved by any means as is known in the art. Distributed, networked systems, components, and/or circuits can be used. Communication, or transfer, of data may be wired, wireless, or by any other means.
It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
A “processor” includes any suitable hardware and/or software system, mechanism or component that processes data, signals or other information. A processor can include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor can perform its functions in “real time,” “offline,” in a “batch mode,” etc. Portions of processing can be performed at different times and at different locations, by different (or the same) processing systems. Examples of processing systems can include servers, clients, end user devices, routers, switches, networked storage, etc. A computer may be any processor in communication with a memory. The memory may be any suitable processor-readable storage medium, such as random-access memory (RAM), read-only memory (ROM), magnetic or optical disk, or other non-transitory media suitable for storing instructions for execution by the processor.
As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
Thus, while particular embodiments have been described herein, latitudes of modification, various changes, and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of particular embodiments will be employed without a corresponding use of other features without departing from the scope and spirit as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit.
This application is related to U.S. patent application Ser. No. ______, entitled “Method & Apparatus for Assisting an autonomous robot”, filed on ______ (Attorney Ref. 020699-112710US/Client Ref. 201805922.01), which is hereby incorporated by reference, as if set forth in full in this specification. This application is further related to U.S. patent application Ser. No. ______, entitled ______, filed on ______ (Attorney Ref. 020699-112720US/Client Ref. 201805934.01), which is hereby incorporated by reference, as if set forth in full in this specification.