The present subject matter relates to methods, systems and apparatuses that provide an improved speech-based user interface with a lighting device, for example, for navigational guidance to the location of an item within a space, a part of which is illuminated by the lighting device.
The use of voice as an input to a mobile device or computer terminal has become more prevalent as voice recognition systems, such as Siri®, Cortana®, Alexa®, and OK Google®, have become easier to use and more accurate in their recognition results. These voice recognition systems may take advantage of positioning systems, such as the Global Positioning System (GPS) and positioning systems provided by cellular service providers, and mapping services, such as Google Maps®, to provide outdoor navigation assistance. Information may be provided to the user as audio, e.g., synthesized speech responses, or via the display of the user's device. These examples require that the user have a mobile device or computer terminal at their disposal. In addition, the described systems presume that the user wants to use voice input to their mobile device for navigation purposes, which consumes the device's battery life.
Voice-based interfaces have also been used in indoor settings to provide voice-based user commands to lighting devices and other appliances. For example, lighting devices that provide a voice-based interface allowing the user to control the lighting device are known. A voice-based interface also allows the user to obtain information from the Internet, such as stock quotes or sports scores.
Hence, there is room for further improvement in an apparatus for use as a lighting device or system that incorporates a speech-based user interface for assisting a user in locating items within a premises.
An example of an apparatus includes a general illumination light source, a speech-based user interface, a communication interface, a memory, and a processor. The general illumination light source is configured to emit general illumination light for illuminating a space of a premises. The speech-based user interface includes a microphone with an audio coder that detects speech-related audio inputs from a source of speech, and a controllable speaker with an audio decoder. The speaker is configured to output an audio message in a specified direction toward the source of speech. The communication interface is configured to be coupled to a data network and an application server. The memory stores program instructions and is coupled to the processor. The processor is also coupled to the general illumination light source, the audio coder, the audio decoder, and the communication interface. The processor, upon executing the program instructions stored in the memory, configures the apparatus to perform functions. The functions include enabling the microphone and audio coder, and outputting an audio greeting or prompt via the controllable speaker. A record and coded data collection process is initiated, by which the microphone and audio coder detect speech from a specified location beneath the apparatus. Coded data is received from the audio coder. The coded data is forwarded, via the communication interface, to a natural language processing service for recognition of the coded data. A recognition result is obtained, via the communication interface, from the natural language processing service. The processor processes the recognition result to identify an item identifier. The item identifier is forwarded, via the communication interface, to an application server. A location of the identified item in the premises and navigation-related information are obtained, via the communication interface, from the application server. The obtained location of the identified item and navigation-related information are encoded into an inquiry response for output.
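Purely as an illustrative aid, and not as part of the claimed apparatus, the function sequence recited above can be sketched in Python with every service call stubbed out; all names below are hypothetical placeholders:

```python
"""Illustrative sketch only: the recited function sequence with every service
call stubbed out. None of these names are part of the described apparatus."""

class SpeechInterface:
    def enable_microphone(self):
        print("microphone and audio coder enabled")

    def record_coded_data(self):
        # stand-in for the record and coded data collection process
        return b"...coded speech from the specified location..."

    def speak_directed(self, text):
        # stand-in for the controllable speaker's directional output
        print(f"[directed audio] {text}")

def recognize(coded_data):
    # stand-in for the natural language processing service round trip
    return {"tokens": ["where", "are", "the", "cheerios"]}

def lookup_item(item_id):
    # stand-in for the application server item-location query
    return {"location": "aisle 7, bay 1", "nav": "turn left, then walk past 3 aisles"}

def handle_inquiry(ui):
    ui.enable_microphone()
    ui.speak_directed("Hello. What item are you looking for?")  # greeting/prompt
    result = recognize(ui.record_coded_data())                  # recognition result
    item_id = "cheerios" if "cheerios" in result["tokens"] else None
    if item_id:
        info = lookup_item(item_id)                             # location + navigation data
        ui.speak_directed(f"The item is in {info['location']}: {info['nav']}.")

handle_inquiry(SpeechInterface())
```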
An example of a method is also described. In the method, a directional microphone of a speech-based user interface is enabled to detect sounds in a subarea beneath a lighting device in an area in which the lighting device is located. The speech-based user interface is incorporated in the lighting device. The detected sound is processed to identify speech-related sound from the subarea. A speech prompt is output via a speaker of the speech-based user interface. The speech prompt is audible to a person within the subarea, and is output from the speaker as speech that has an audio amplitude higher within the subarea than outside the subarea. Upon receipt of a spoken request output by the directional microphone in response to the speech prompt, a voice recognition process based on the spoken request is initiated. The spoken request includes an item identifier. In response to an output result of the voice recognition process containing an item identifier, a database containing a location, within the premises, of the item corresponding to the item identifier is accessed. Based on information in the database, navigation instructions enabling traversal by the person from the subarea to the location of the item within the premises are provided via the speaker of the speech-based user interface. The navigation instructions are provided as speech that has an audio amplitude higher within the subarea than outside the subarea.
An example of a system is also described that includes a premises-related server, a natural language processing service, and a number of lighting devices. The premises-related server is configured to provide information related to identified items within a premises. The natural language processing service provides recognition results in response to receipt of coded speech data, and is coupled to communicate with the premises-related server via a data network. The number of lighting devices are coupled to the premises-related server. Each lighting device of the number of lighting devices includes a general illumination light source, a speech-based user interface, a communication interface, a memory, and a processor. The general illumination light source is configured to emit general illumination light for illuminating an area of a premises. The speech-based user interface includes a microphone coupled to an audio coder that detects speech-related audio inputs from a source of speech, and a controllable speaker coupled to an audio decoder. The speaker is configured to output an audio message in a specified direction for presentation to the source of speech. The communication interface is configured to enable communications of the respective lighting device via the data network. The processor is coupled to the general illumination light source, the audio coder, the audio decoder, the communication interface, and the memory. The processor, upon executing the program instructions stored in the memory, configures the lighting device to perform functions. The functions include monitoring coded speech-related sound data provided by the audio coder based on speech-related sound detected by the microphone. Upon identification of encoded speech-related sound data representing a spoken keyword, a source localization process is performed that identifies, within the area, a subarea from which the spoken keyword originated. A primary lighting device of the number of lighting devices is identified as being closest to the subarea. In response to the identification of the primary lighting device, responsibility is established for further processing by the primary lighting device. The processor of the primary lighting device is further configured to, in response to the source localization process identifying the subarea, initiate a record and coded data collection process by the microphone and audio coder of the primary lighting device that detects speech from the identified subarea. Coded speech data based on speech originating from the identified subarea is received from the audio coder of the primary lighting device. The coded speech data is forwarded via the communication interface of the primary lighting device to the natural language processing service. A recognition result from the natural language processing service is obtained via the communication interface of the primary lighting device. The recognition result is processed to identify an item identifier. The item identifier is forwarded to the premises-related server via the communication interface of the primary lighting device. A location of the identified item in the premises is obtained from the premises-related server via the communication interface of the primary lighting device. The obtained location, with item and location-related data, is encoded as an inquiry response for output by the speaker of the primary lighting device. The encoded inquiry response includes an encoded audio response message for output as speech.
Audio directional control signals are determined to configure the controllable speaker of the primary lighting device to output speech substantially limited to the identified subarea. The encoded inquiry response is forwarded to the audio decoder coupled to the speaker of the primary lighting device. The audio decoder decodes the encoded inquiry response. An audio output generated by the speaker of the primary lighting device includes speech based on the decoded inquiry response and the audio directional control signals. The generated audio output is substantially limited to the identified subarea of the premises.
Additional objects, advantages and novel features of the examples will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The objects and advantages of the present subject matter may be realized and attained by means of the methodologies, instrumentalities and combinations particularly pointed out in the appended claims.
The drawing figures depict one or more implementations by way of example only, not by way of limitations. In the figures, like reference numerals refer to the same or similar elements.
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.
Reference now is made in detail to the examples illustrated in the accompanying drawings and discussed below.
In the example, the apparatus 100 may have a covering 105 that distinguishes the apparatus 100 from other devices, including other lighting devices in the premises. In addition to the covering 105, or as an alternative to the covering 105, the apparatus 100 may include a general illumination light source (described in more detail with reference to another example) that illuminates the specified location 120 to indicate to a person, such as P1, where to stand to use the speech-based user interface on the apparatus 100. The specified location 120 may, for example, be a preselected subarea within the premises 110. The apparatus 100 may be tuned to interact with a person, such as P1, standing at the specified location 120; therefore, the apparatus 100 may not respond to speech from persons, such as P2, who are outside the specified location 120. When person P1 moves into the specified location 120, the apparatus 100, as will be explained in more detail with reference to another example, may generate an audio prompt that is directed to and intended to be heard by a person, such as P1, within the extent of the apparatus' audible output 130. For example, when person P1 is interacting with the speech-based user interface of the apparatus 100, the person P2 is not intended to hear any audio messages output by the apparatus 100 because person P2 is outside the extent of the audible output 130.
An example of a configuration of an apparatus, such as 100, will be described in more detail with reference to
In the example of
At a high level, the apparatus 200 may be a lighting fixture or other type of lighting device. As described in the following examples, the apparatus 200 includes a general illumination light source 213, the processor 223, one or more memories 225, a communication interface 241, one or more microphones 235, 239, and one or more speakers 237; and the apparatus 200 may include one or more sensors, such as a person detection sensor 233.
As noted, an example of an implementation of the processor is the microprocessor (μP) 223, which serves as the programmable central processing unit of the apparatus 200. The μP 223, for example, may be a type of device similar to microprocessors used in servers, in personal computers, in tablet computers, in smartphones, or in other general purpose computerized devices. Although the drawing shows a single μP 223 for convenience, the apparatus 200 may use a multi-processor architecture. The μP 223 in the example is of a type configured to communicate data at relatively high speeds via one or more standardized interface buses (not shown).
Typical examples of memories 225 include read only memory (ROM), random access memory (RAM), flash memory, a hard drive, and the like. In this example, the memory or memories 225 store executable programming for the μP 223 as well as data for processing by or resulting from processing of the μP 223.
The example apparatus 200 is a lighting device and therefore, includes a light source, e.g. a set of light emitting diodes 213. The source 213 may be in an existing light fixture or other lighting device coupled to the other device components, or the source 213 may be an incorporated source, e.g. as might be used in a new design or installation. The source 213 may be a general illumination light source configured to emit general illumination light for illuminating a space of a premises. For example, the source 213 may be any type of light source that is suitable to the general illumination application (e.g. task lighting, broad area lighting, object or personnel illumination, information luminance, etc.) desired for the space or area in which the particular apparatus 200 is or will be operated. Although the source 213 in the apparatus 200 may be any suitable type of light source, many such devices will utilize the most modern and efficient sources available, such as solid state light sources, e.g. LED type light sources.
Power is supplied to the light source 213 by an appropriate driver 231. The source driver 231 may be a simple switch controlled by the processor of the device 200, for example, if the source 213 is an incandescent bulb or the like that can be driven directly from the AC current. Power for the apparatus 200 is provided by a power supply circuit (not shown) which supplies appropriate voltage(s)/current(s) to the source driver 231 to power the light source 213 as well as to the components of the device 200. Since the source 213 is shown as LEDs, for example, the driver would be a corresponding type of LED driver as shown at 231. Although not shown, the apparatus 200 may have or connect to a back-up battery or other back-up power source to supply power for some period of time in the event of an interruption of power from the AC mains.
The source driver circuit 231 receives a control signal as an input from the processor 223 of the device 200, to at least turn the source 213 ON/OFF. Depending on the particular type of source 213 and associated driver 231, the processor input may control other characteristics of the source operation, such as dimming of the light output, pulsing of the light output to/from different intensity levels, color characteristics of the light output, or the like. These functions may be used to get the attention of a person and/or indicate the specified location, such as 120.
The apparatus 200 also includes one or more communication interfaces 241. The communication interfaces at least include an interface configured to provide two way data communication for the μP (and thus for the device 200) via a data communication network 227. In the example of
An apparatus like 100 in the
In the example, the apparatus 200 has a speech-based user interface 250 that includes a number of microphones such as 235, 239, an audio coder (processor) 245, one or more speakers 237, and an audio decoder (driver) 246. The number of microphones 235, 239 are configured for detection of speech-related sound and to support associated signal processing to determine the direction of detected speech-related sound. For ease of discussion, the description refers to two microphones 235, 239, but more or fewer microphones may be used depending upon the implementation. Examples of microphones that may be used with the apparatus 200 include digital/analog-type, micro-electro-mechanical system (MEMS), condenser, optical microphones, or the like. For example, the microphones 235, 239, with the audio coder or audio processor 245, may detect speech-related audio inputs from a source of speech, such as a person, a person with a speech synthesizer, or a robot. For example, the signal processing techniques relate to phase delay of signals from multiple microphones for beamforming (e.g., for directional sound pickup), source localization, blind source separation (to identify and/or characterize different sounds received by the number of microphones 235, 239), and to selectively accept only the desired speech-related sound signal.

The apparatus 200 in this example also includes a radio frequency (RF) transceiver, such as 249. The RF transceiver 249 may detect the presence of a mobile phone in the specified location, such as 120, by detecting one or more of a cellular radio frequency, a Bluetooth frequency, or a Wi-Fi frequency. The RF transceiver 249 may also be used to communicate with the mobile device 297 (e.g., via Bluetooth or Wi-Fi) in the specified location or a subarea of the premises. In another example, the apparatus 200 may output ultrasonic encoded signals that are detectable by the mobile device 297. For example, the mobile device 297 microphone and speaker may be configured to respectively detect and output sound in the ultrasonic frequency range. Alternatively, the mobile device 297 may be coupled to a device that detects and outputs audio frequencies in the ultrasonic range. In order to avoid detecting mobile phones of persons other than the user of the apparatus 200, the RF transceiver 249 and antenna 248 may be configured with a low gain setting or the like such that any signals transmitted by the RF transceiver 249 are attenuated outside the specified location and do not have sufficient power for reception by a mobile device outside the specified location or subarea. Alternatively, or in addition, the radio frequency transceiver is configured to emit signals at a power setting at which the power of the emitted signals is higher in the specified location or subarea than outside the specified location or subarea. In the space encompassed by the specified location or subarea, the transmit power of the radio frequency transceiver is sufficiently high that its signals normally are received by a mobile device currently within that space. In contrast, in a space outside the specified location or subarea, the transmit power of the radio frequency transceiver is sufficiently low that its signals normally would not be received with sufficient signal strength to be detectable by a mobile device in the space.
Alternatively, or in addition, the RF transceiver 249 may utilize an antenna array, such as 248, to shape the radio frequency beam output from the RF transceiver 249 so as to transmit and receive only in an area substantially within, and/or not extending much beyond, the specified location.
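Before turning to the audio output path, the phase-delay processing of signals from the multiple microphones 235, 239 described above can be illustrated with a brief, non-limiting sketch. The Python function below estimates a source bearing by cross-correlating two microphone channels (time-difference-of-arrival); the sample rate, microphone spacing, and speed of sound are assumed example values:

```python
import numpy as np

def tdoa_bearing(sig_a, sig_b, fs=16000, mic_spacing_m=0.10, c=343.0):
    """Estimate a source bearing from two microphone channels by picking the
    cross-correlation peak (time-difference-of-arrival). fs, spacing, and the
    speed of sound are assumed example values; 0 degrees is broadside."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)      # sample offset between channels
    delay_s = lag / fs
    sin_theta = np.clip(delay_s * c / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# Toy usage: the same noise burst arrives 3 samples later at microphone B.
burst = np.random.default_rng(0).standard_normal(256)
mic_a = np.concatenate([burst, np.zeros(3)])
mic_b = np.concatenate([np.zeros(3), burst])
print(f"estimated bearing: {tdoa_bearing(mic_a, mic_b):.1f} degrees off broadside")
```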
In the example, the speech-based user interface 250 of the apparatus 200 also includes an audio output component, such as one or more speakers 237, configured to provide information output to the user. The one or more speakers 237 may be controllable speakers coupled to an audio decoder or driver, such as 246. The controllable speakers 237 output audio, and are controllable to direct the output audio in a specified direction, in this example for presentation to the source of speech detected via the microphone 235 and/or 239. For example, the speakers 237 may be phased array speakers controllable to output audio that is directed to a person in the specified location 120, and the outputted audio has an amplitude that is higher within the specified location than outside the specified location. In the space encompassed by the specified location 120, the amplitude is sufficiently high to normally be heard by a person currently within that space. In contrast, in a space outside the specified location, the amplitude is sufficiently low that it normally would not be heard by a person currently within that space. Alternatively, or in addition, the speakers 237, or additional speakers at, for example, the perimeter of the apparatus, may be configured to output sound that provides destructive interference. The apparatus may be configured such that the destructive interference occurs at the ears of a person standing outside the specified location to achieve absolute cancellation. For example, the processor 223 and the person detection sensor 233 may be configured to enable tracking of a person immediately outside the specified location and to acquire an approximate height of the person. Using this information, the processor may control the speakers 237 or the additional speakers to deliver phase-delayed sound directly to the ears of the person outside the specified location. The apparatus 200 may be equipped with additional directional speakers that point outward, away from the covering, such as 105 of
The example apparatus 200 utilizes an audio input circuit that is or includes an audio coder or processor, as shown at 245. The audio coder 245 converts an audio-responsive analog signal from the microphone 235 to a digital format and supplies the digital audio to the μP 223 for processing and/or to a memory 225 for temporary storage. The audio coder 245 may also be an audio processor configured to perform tasks such as audio conditioning and noise cancellation. Conversely, the audio decoder 246 receives digitized audio via the bus and converts the digitized audio to an analog signal, which the audio decoder 246 outputs to drive the speaker 237. The audio decoder 246 may also receive audio directional control signals that cause the decoder/driver 246 to configure the controllable speakers 237 to output speech substantially limited to an identified subarea of the premises, such as the specified location 120. “Speech” is an analog audio sound that includes spoken/verbal information for human communication. The speakers 237 may be one or more of various types of directional speakers, i.e., speakers that direct sound, such as speech, in a narrow path to a specified location within the premises, in which the directed sound has an amplitude higher within the specified location than outside the specified location such that the directed sound is substantially limited to the specified location. The signals to directionalize audio output may be actual signals to adjust aspects of speaker operation; or, in a speaker array arrangement, the signals to directionalize audio output may be variations in parameters (e.g., phase and amplitude) superimposed on the actual analog audio output signals going from the driver 246 to the speaker components of the array.
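One plausible way to realize such directionalizing parameter variations, offered only as a sketch, is to compute per-element time delays that steer a uniform line array of speaker elements toward a target angle; the element count, spacing, and angle below are assumptions, not parameters from this description:

```python
import numpy as np

def steering_delays(num_elements=8, spacing_m=0.04, steer_deg=20.0, c=343.0):
    """Per-element time delays for a uniform line array so that the emitted
    wavefronts add in phase toward steer_deg. The element count, spacing, and
    angle are assumed example values."""
    positions = np.arange(num_elements) * spacing_m
    delays = positions * np.sin(np.radians(steer_deg)) / c
    return delays - delays.min()      # shift so every delay is non-negative

# Each audio sample fed to element i would be delayed by delays[i] seconds.
print(["%.1f us" % (d * 1e6) for d in steering_delays()])
```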
The speakers 237 of the speech-based user interface 250 may be various types of controllable audio output, or audio reproduction, devices. For example, the speaker 237 may be a steerable ultrasonic array that enables sound to be targeted to a relatively small area, such as the arrays described in an MIT thesis available at dspace.mit.edu/handle/1721.1/7987 or, for example, in U.S. Pat. No. 8,128,342 B2. For example, the audio decoder or parametric speaker driver may be configured to be responsive to an audio message and audio directional control signals. The speaker 237 generates an audio message by outputting component ultrasonic sounds that, when combined, form speech that is directed, based on the audio directional control signals, to a subarea of an area in proximity to the apparatus 200. The generated audio message is intended, by this directional output, to be audible as speech in the subarea, and the speech has a higher amplitude within the subarea than outside the subarea. The subarea may be, for example, the specified location 120 in
In the example, the apparatus 200 may optionally include a camera 240 configured to detect visible user input activity, from which may be ascertained user disposition (e.g., frustration, amazement, or the like), user age, or the like. For example, the person using the speech-based navigation service may be a hearing-impaired person, in which case the camera 240 may be used to assist in identifying the hearing-impaired person based on recognizing an approximate age of the person (e.g., an older person is more apt to have a hearing impairment). The apparatus 200 may also have an image (still or video) output component, such as a projector 243, or a display in a software configurable lighting device as described in U.S. patent application Ser. No. 15/244,402 which published as US 2017/00618904, the disclosure of which is incorporated herein by reference in its entirety. The display or image output component, such as projector 243, may be configured to output information, such as navigation results, to the user in a visual format, for example, a directional indicator (e.g., an arrow or the like) or a premises map with item location indicators, presented, for example, on the floor in or near the specified location 120 of
The actual user interface elements, e.g. speaker 237 and/or microphone 235, may be in the apparatus 200 or may be outside the apparatus 200 with some other link to a lighting fixture. If outside the apparatus 200, the link may be a hard media (wire or fiber) or a wireless media.
For example, the apparatus 200 and/or the system 10 can incorporate a voice recognition/command type interface via a lighting device and a network to obtain information, to access item location and premises navigation applications/functions, etc. For example, a user in the lighted space can ask questions related to location information of items held in inventory in the premises by speaking the questions. The system 10, as will be explained in more detail with reference to the other examples, is configured to provide, in response to item location questions received by the microphone 235, navigation-related information relevant to the item location to the user. It may be appropriate at this time to describe a couple of specific examples of an apparatus 200.
The example of
The apparatus 400 in the example of
It may be appropriate at this time to discuss a process example that may be performed using the apparatus examples described with reference to
The apparatus 510, such as apparatus 300 and 400, may be installed in a premises, such as a grocery store, a retail establishment, a warehouse, an indoor market, a shopping mall, or the like. For example, the apparatus 510 may be affixed to a ceiling of the premises and hang into a portion of the premises frequented by persons, such as an entrance way, an end of an aisle, a customer stand, or the like. In addition, the apparatus 510 includes a processor coupled via a communication interface (shown in other examples) to an application specific server 540 (shown in other examples) and a voice recognition service (shown in other examples), such as a natural language processing service 560. The natural language processing service 560 may be hosted on a server within the premises or external to the premises. Examples of the natural language processing service 560 are provided, for example, by Google®, Amazon® (e.g., Alexa®), Apple® (e.g., Siri®), Microsoft® (e.g., Cortana®), and others. The process executed by the system 500 is able to interact with persons with and without a mobile device 580. The availability of a mobile device 580 allows the system 500 to provide services, such as discounts, loyalty/affinity program rewards, or the like, and/or augmented navigation, such as a store map for presentation on a display of the mobile device, real-time navigation updates, or the like, in addition to the item location and navigation-related services. At an initial interaction between the apparatus 510 and a person (not shown in this example), the apparatus 510 may begin the process executed by system 500 using speech-related processes provided through a speech-based user interface of the apparatus 510. The apparatus 510 incorporating the speech-based user interface may be used without a mobile device 580.
As shown in
For example, in response to the detected presence of a person, either using a person detection sensor, a mobile device detector, an RF transceiver, or detecting via the speech-based user interface a keyword that triggers the speech-based navigation services, the apparatus 510 via a processor may alter a characteristic of the emitted general illumination light, such as continuous light output or white light output, to emphasize a subarea, or specified location, of the area in the vicinity of the apparatus 510. The premises may include signage informing persons that the emphasized subarea is where a person is to stand in order to interact with the apparatus to obtain the speech-based navigation service. As discussed above, the subarea may be directly beneath the apparatus 510 or beneath and to a side of the apparatus 510 (at 512). For example, the subarea may be beneath and to the side of the apparatus 510 if the apparatus 510 were angled and not pointed directly downward. The apparatus 510, at 513, may initiate a timer to determine whether the person is interested in using the system 500 or is not interested (e.g., merely passing by the system 500). For example, the person detection sensor may be configured to continuously detect the person's presence for a preset amount of time (e.g., 5 seconds, 10 seconds, or the like) as a way to confirm the person's intent to use the apparatus 510. If the person's presence is not detected continuously for the preset amount of time, the apparatus 510 returns to the idle state at 511.
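The intent-confirmation timer at 513 can be sketched, under the stated example settings, as a simple dwell-time check; the sensor callable below is a hypothetical stand-in for the person detection sensor output:

```python
import time

def confirm_presence(sensor_read, dwell_s=5.0, poll_s=0.25):
    """Return True only if presence is reported continuously for dwell_s
    seconds, mirroring the intent-confirmation timer at 513; any gap sends
    the apparatus back to the idle state at 511. sensor_read is a hypothetical
    callable returning True/False; the timing values are the example settings."""
    start = time.monotonic()
    while time.monotonic() - start < dwell_s:
        if not sensor_read():
            return False          # person left: treat as merely passing by
        time.sleep(poll_s)
    return True
```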
The apparatus 510 may optionally, at 514, alter a characteristic of emitted light, such as changing a composition of the emitted general illumination light directed to the subarea by increasing an amount of one of the colors of red, green or blue, or flashing the emitted general illumination light directed to the subarea to indicate the apparatus's readiness to begin receiving speech inputs usable in the speech-related item location and navigation process.
At 515 of
The process 500 proceeds to
The apparatus 510 processor may be configured to perform noise cancellation and echo cancellation of any sounds detected outside the specified location, such that the recording and coded data collection at 517 captures only the speech detected from the specified location beneath the apparatus. The apparatus 510 processor forwards, via a communication interface, such as 241 of
Returning to step 519 of
Continuing with the example at step 519, the person may speak an inquiry or request related to the location of an item in the premises, such as “Where are the Cheerios®?” In addition, the system may be connected to the internet, such as network 295 of
In response to the recognition result from a voice recognition process of the natural language processing service 560 containing tokens that include at least an item identifier, the apparatus 510 may access a database containing a location, within the premises, of the item related to the item identifier; or, depending upon the tokens provided in the recognition result, the apparatus 510 may access the internet via a data network, such as 270 in
The application specific server 540 may resolve, at 541, the inquiry and request to generate a database query for a location of an item corresponding to the item identifier(s) in the request. The database may return a query response at 542. The returned query response may include information related to the item and identified item navigation-related information. The query response may include information related to the item(s), such as brand name, size(s) (e.g., 12 ounces, 32 ounces), and location(s) of the item(s) in the premises (e.g., aisle 7, end unit A, shelving unit 345, Bay 1). The identified item navigation-related information, which is forwarded to the apparatus 510, may include, for example, navigation instructions (e.g., turn left, turn right, walk 5 feet, 6 feet, look up, look down) and landmarks (e.g., a support post, an aisle end, other signs and displays) along a path through the premises to the item, to direct the person to the item location in the premises. The navigation instructions enable the person to traverse from the subarea to the location of the item within the premises.
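By way of a hedged illustration of the query at 541-542, the following sketch uses an in-memory SQLite table; the schema and column names are assumptions, not a data model given in this description:

```python
import sqlite3

# Illustrative only: the table and column names are assumptions, not a schema
# given in this description.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE items (
    item_id TEXT PRIMARY KEY, brand TEXT, size TEXT,
    aisle INTEGER, bay INTEGER, shelf INTEGER, landmarks TEXT)""")
db.execute("INSERT INTO items VALUES ('cheerios', 'Cheerios', '12 ounces', "
           "7, 1, 3, 'end display of baby food')")

row = db.execute(
    "SELECT brand, size, aisle, bay, shelf, landmarks FROM items WHERE item_id = ?",
    ("cheerios",)).fetchone()
print(row)   # query response: item data plus navigation-related landmark data
```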
At step 521, the apparatus 510 obtains a location of the identified item in the premises from the application specific server 540. The apparatus 510 may form an inquiry response for speech synthesis by encoding the obtained location of the identified item and navigation-related information as an inquiry response having navigation instructions for output by the apparatus 510 speaker. The encoded inquiry response is forwarded to the apparatus speaker. The audio navigation instructions are output as speech directed toward the specified location and have an amplitude higher within the specified location than outside the specified location. More specifically, the speaker generates audio information based on the encoded inquiry response, the generated audio information conveying the location of the identified item in the premises and location-related information. The generated audio information is directed to the specified location, and has an amplitude higher within the specified location than outside the specified location. For example, the generated audio information includes the navigation instructions that describe a path through the premises to the identified item location. Alternatively, or in addition, as a graphical output, other devices, such as lighting devices within the premises, may be configured to display directional prompts, such as arrows or flashing lights, or to display signage or animated graphics showing a path to the identified item location, or to multiple locations if a number of item locations are identified.
At 522, the apparatus 510 may cause the speaker to present an audio prompt, audible only to the person in the specified location, asking if there is a next question or if further assistance is needed. The processor, at 522, may determine whether another question is being presented by the user. For example, if the apparatus 510 receives a YES response to the audio prompt, the process returns to step 519. If the apparatus 510 receives a NO response to the audio prompt, the process proceeds to step 523. At 523, the apparatus 510, using a radio frequency transceiver, such as a Bluetooth® transceiver, a Wi-Fi transceiver, a cellular transceiver or other radio frequency transceiver, or another communication method, such as the ultrasonic communications described above, determines whether a mobile device 580 is detected near the person using the apparatus 510 (i.e., within a specified area). As a note, the process steps 523-527 may occur in parallel with steps 517-522; however, for ease of explanation, the process steps 523-527 are described as occurring serially after steps 517-522.
Returning to the example, if the determination at 523 is NO, i.e., a mobile device is not near the person, the process executed by system 500 proceeds to 528, at which the apparatus 510 outputs a farewell to the user. If the determination is YES at 523, the process 500 proceeds to
Upon determining at 523 that a mobile device is near the person using the apparatus 510, the apparatus 510 determines at 524 of
At 581, the mobile device 580 receives the notification, and, at 582, an application (e.g., a retail store branded application, a loyalty/affinity program, an indoor positioning program, or the like) associated with the premises executing on the mobile device opens and presents information (e.g., discounts, coupons, maps, item information, or the like) on a display device of the mobile device 580. After step 582, the process executed by system 500 returns to 526 and proceeds to 528 to deliver a farewell message.
Returning to step 524, the apparatus 510 may send via a low-power RF signal a query, such as a Bluetooth advertisement packet, that is intended for receipt by a mobile device in the specified location or subarea. If a mobile device is present in the subarea and has Bluetooth enabled, the mobile device, such as 580, receives the advertisement packet and may begin a pairing process with the apparatus 510, which indicates that the mobile device's Bluetooth is active. In response to the determination of YES, i.e., Bluetooth is active in the vicinity of the specified area, the process executed by system 500 proceeds to step 527. At 527, the apparatus 510 may transmit, or "push", a data packet containing a URL for a premises coupon and/or location information with respect to the premises to be used by the mobile device. The location information may include a premises map, item locations within the premises and on the map, and other item-related or premises-related information, e.g., sale item locations or cash register availability. The mobile device 580, in response to receiving the transmitted data packet(s), may launch an application related to the premises (e.g., a retail store specific application, a shopping mall application, or the like) to receive the location information, which may be information usable by the application executing on the mobile device. In the example, the premises-related application may be previously installed on the mobile device 580, or the data packet may include information for obtaining the application from the internet or a premises server. The mobile device 580 may also provide information to the apparatus 510 that allows the apparatus 510 to uniquely identify the mobile device 580 and also enables the apparatus 510 to provide information related to the identified item to the mobile device 580. For example, the application executing on the mobile device 580 may provide mobile device identifying information to the apparatus 510, which may be passed to the application specific server 540. The application specific server 540 may use the mobile device identifying information to determine the types of items and conditions for a coupon. The application specific server 540 may deliver to the apparatus 510 coupons, discounts, and other item related information. The apparatus 510, upon connecting to the mobile device 580, may present coupons, location information of items, navigation related information, and the like via a display device and/or an audio device of the mobile device 580.
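One concrete, but assumed, way to implement such a URL push over low-power Bluetooth is an Eddystone-URL advertisement frame. The sketch below only assembles the frame bytes; actual over-the-air transmission is radio-driver specific and omitted, and the coupon URL is hypothetical:

```python
# Illustrative only: assembles an Eddystone-URL advertisement frame; the URL is
# hypothetical and the radio-specific transmission step is omitted.

EDDYSTONE_UUID = b"\xaa\xfe"        # 16-bit service UUID 0xFEAA, little-endian
URL_FRAME = 0x10                    # Eddystone-URL frame type
HTTPS_WWW = 0x01                    # URL scheme prefix code for "https://www."

def eddystone_url_frame(path: bytes, tx_power_dbm: int = -20) -> bytes:
    service_data = bytes([URL_FRAME, tx_power_dbm & 0xFF, HTTPS_WWW]) + path
    return (bytes([3, 0x03]) + EDDYSTONE_UUID                        # 16-bit UUID list AD
            + bytes([len(service_data) + 3, 0x16]) + EDDYSTONE_UUID  # service data AD
            + service_data)

frame = eddystone_url_frame(b"example-store.com/coupon")   # hypothetical coupon URL
print(frame.hex())
```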
Upon delivering the data packets to the mobile device 580, the process executed by system 500 proceeds to 528 at which the apparatus 510 delivers a farewell message to the user.
When the apparatus 510 pushes notifications containing information related to the identified item to the mobile device 580 in the specified location or subarea, the apparatus 510 may deliver the notifications via a low-power Bluetooth-compatible transmission detectable only by the mobile device 580 within the subarea. The radio frequency signal, when decoded by the mobile device 580, includes location information that may include navigation instructions having item location information, allowing the mobile device to present on the display device a map of the premises and a static presentation of navigation instructions to the identified item. The static presentation of navigation instructions may include the presentation of text directions, such as: go to aisle 5, turn right, and after the in-aisle display of wheat crackers, look to the right at the shelf about 2 feet from the bottom of the shelves for the identified item (e.g., the Cheerios). Alternatively, the static presentation may include a map of the premises with a line drawn from the location of the apparatus to the Cheerios. Since the presentation is static, the provided navigation instructions would not show the person's progress toward the identified item. Dynamic navigation systems, such as visible light communication (VLC) indoor positioning and indoor RF position determination systems, may be used to provide a user with their progress toward the identified item. In another alternative, the navigation instructions may be presented via the mobile device's audio output device.
After delivery of the farewell message at 528 is complete, the apparatus 510 disables the microphones at 529, and proceeds to the idle state 511.
In some examples, the location information delivered to the mobile device 580 includes additional content, such as recipes, matching accessories (if the item is clothing), other items commonly purchased with the identified item (e.g., an oil filter if the identified item is a case of motor oil), or the like. Alternatively, or in addition, at 527, the apparatus 510 may prompt the person to allow the apparatus to access the person's mobile device to access a loyalty program application executing on the mobile device, or to access information, such as user preferences or other loyalty program information, that may be stored on the mobile device or accessible through the mobile device's connection with an external network (e.g., a cellular network, a Wi-Fi network, or the like).
In some instances, there may be difficulty with a person's interaction with the apparatus 510. For example, the apparatus 510 may be configured to detect a person's frustration with the apparatus during the process executed by system 500 based on an analysis of repeated requests by the same person for a particular item, in which case the system 500 may determine that the person is having difficulty and may trigger a customer service alert to a staff member of the premises to provide personal assistance to the person. Upon resolution of the difficulty, the apparatus 510 may be configured to respond to a communication from the staff member causing the apparatus 510 to return to the idle state at 511, or may respond to a determination that a person is no longer present as in step 513.
The above discussion is only a general description of but one example of a process that may be implemented using the apparatuses described in the discussion of the examples in
It is contemplated that additional implementations may be provided that utilize different apparatuses than those of
Each of the apparatuses 660 in this example may operate as a speech-based user interface; the apparatuses cooperate, using keyword-based active listening, to locate and identify persons requesting speech-based navigation assistance. The apparatus 660 includes a microphone 661 and a speaker 662. The microphone 661 may be an omnidirectional microphone or an array of microphones. The general illumination light source 630 is configured to emit general illumination light for illuminating a space in the premises 610. Each of the remaining lighting devices L2-L5 is configured in a manner similar to lighting device L1, and therefore a detailed discussion of each lighting device will be omitted. In addition, the person detection sensor 631 may be included as part of the lighting device L1 to provide the additional benefit of power management and/or energy conservation features to the system 600. For example, the person detection sensor 631 may be used in combination with the microphone 661 to provide an indication of whether persons are in the vicinity of the lighting devices L1-L5. When no person is detected via the person detection sensor 631, and no speech (for example, from a conversation) and/or no person-generated noises (such as footsteps, a cart moving down an aisle, or the like) are detected via the microphone 661, the respective lighting device light source may be turned OFF or dimmed.
In addition to the number of lighting devices L1-L5, the system 600 includes a premises network 607 and a premises-related server 620. The lighting devices L1-L5 and the server 620 may be coupled to the premises network 607. The premises network 607 may also be a lighting-control network that enables control of the light sources of the lighting devices L1-L5. Each of the lighting devices L1-L5 may be commissioned into the lighting-control network. The lighting devices L1-L5 and the server 620 may cooperate to detect a keyword-based inquiry and to output an audio message in response to the detected keyword-based inquiry.
The lighting devices L1-L5 have a similar hardware configuration as described with reference to earlier examples. However, aspects of the lighting devices L1-L5 may be different. For example, an example of apparatus 660 will be described in more detail with reference to the apparatus 700 of
Each microphone of the radial array of microphones 719 may detect sound and be coupled to an audio coder that provides the coded sound data to a processor for keyword detection analysis. Keyword detection analysis may be a speech recognition algorithm intended to recognize the utterance of a particular set of keywords. Alternatively, each microphone of the radial array of microphones may be coupled to a processor. In this alternative example, the processor is configured to encode the analog signals received from the microphones into encoded sound data. The audio processor is further configured to analyze the encoded sound data from each of the microphones to identify from which direction the detected sound was received. Different forms of such sound data analysis are known; for example, spatial perception, sound localization, blind source separation, or the like may be utilized. In addition, the audio processor may also be configured to perform echo cancellation and/or other noise suppression techniques.
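As a deliberately simplified, non-limiting sketch of such direction analysis for a radial array, the following Python function returns the heading of the microphone with the highest short-term energy; a real implementation would use the richer localization techniques named above:

```python
import numpy as np

def loudest_direction(frames, mic_headings_deg):
    """Crude direction estimate for a radial microphone array: return the
    heading of the microphone with the highest short-term RMS energy. This is
    a deliberate simplification of the localization techniques named above;
    frames has shape (num_mics, num_samples)."""
    rms = np.sqrt(np.mean(np.asarray(frames, dtype=float) ** 2, axis=1))
    return float(mic_headings_deg[int(np.argmax(rms))])

# Toy usage: 8 microphones at 45-degree spacing; the mic at 135 degrees hears most.
rng = np.random.default_rng(1)
frames = rng.standard_normal((8, 512)) * 0.1
frames[3] += rng.standard_normal(512)
print(loudest_direction(frames, np.arange(8) * 45.0))   # -> 135.0
```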
A benefit of the apparatus 700 is that the radial array of microphones and the controllable speaker permit a person using the speech-related navigation service to move about the premises, as compared to the hypercardioid microphone in the example of
For example, the system 600 may perform a speech-related navigation process similar to that described with reference to the process example shown in
In the operational example, the system 600, including the lighting devices L1-L5 and the premises server 620, is located in a premises 610 that is, for example, a "brand name retail store" or the like. The items 650 and 651 may be maintained on shelving bays 1 and 4. Shelving bays 2 and 3 may also store items, but for ease of illustration none are shown. In an alternative example, each of the lighting devices L1-L5 is shown coupled to a server 620 either via a wired or wireless connection. The processor of each lighting device L1-L5 may forward the encoded sound data via the wired or wireless connection to the server 620.
The operation of the system 600 will be described in more detail with reference to the flowchart of
A processor in each of the lighting devices L1-L5 is configured to perform the following process 800 of
For example, the processor, such as 635 of lighting device L1 as shown in
In addition or alternatively, the server 620 may be configured to receive the encoded sound data from each of the lighting devices L1-L5, and perform a blind source separation algorithm to determine which microphones of the lighting devices L1-L5 detected the keyword.
When a subarea is identified as the location of the source of the spoken keyword, a lighting device that is determined to be closest to the subarea is identified as a primary lighting device of the number of lighting devices (820). For example, the lighting devices may be commissioned into a lighting-control network, and as a result of the commissioning, the locations of each of the lighting devices L1-L5 within the premises are known. Based on the identified location of the source of the spoken keyword, a lighting device L1-L5 may be selected based on the location of the lighting device provided during commissioning. Commissioning within the lighting-control network also allows an additional benefit of utilizing the sound detection capabilities of the lighting devices L1-L5 to turn off or dim light sources of lighting devices in areas in which persons are not detected, either by noting the lack of conversation or an absence of presence signals output by the person detection sensors. In response to the identification of the primary lighting device, the system 600 establishes responsibility for further processing by the primary lighting device. For example, the person P10 may have been the person who uttered the keyword, and as such, lighting device L1, which is closest to person P10, is designated the primary lighting device. When designated as a primary lighting device, the primary lighting device processor performs the communication functions with the person.
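Primary-device selection at 820 reduces to a nearest-fixture computation over the commissioned locations; in the sketch below the coordinates are illustrative example grid positions, not values from this description:

```python
import math

# Illustrative only: the fixture coordinates are assumed example grid positions
# recorded at commissioning, not values given in this description.
FIXTURES = {"L1": (2.0, 3.0), "L2": (8.0, 3.0), "L3": (14.0, 3.0),
            "L4": (2.0, 9.0), "L5": (8.0, 9.0)}

def primary_device(subarea_xy):
    """Pick the commissioned fixture closest to the localized subarea (820)."""
    return min(FIXTURES, key=lambda name: math.dist(FIXTURES[name], subarea_xy))

print(primary_device((3.1, 4.0)))   # -> 'L1', which takes over the interaction
```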
In particular, the primary lighting device processor, in response to the source localization process identifying the subarea, initiates a record and coded data collection process by the microphone and audio coder of the primary lighting device that only detects speech from the identified subarea (825). The primary lighting device is provided with the location of the subarea within the area. The location of the subarea may be provided as grid coordinates, latitude and longitude, or the like. The primary lighting device processor may be further configured to use the location of the subarea to tune the radial microphone array to focus on detecting speech-related sounds from the direction of the subarea. The processor may determine audio direction control signals to configure the controllable speaker to output speech to the identified subarea (827). For example, the audio direction control signals may be used by the processor to tune the ultrasonic transducer array of the speaker to direct all sound output by the speaker toward the subarea, so that the outputted sound has an amplitude that is higher within the subarea than in areas outside the subarea. In the space encompassed by the identified subarea, the audio amplitude is sufficiently high to normally be heard by a person currently within that space. In contrast, in a space outside the identified subarea, the audio amplitude is sufficiently low that it normally would not be heard by a person currently within that space.
The subarea may have dimensions such as an approximately 4 feet by 4 feet area or smaller, such as an approximately 2 feet by 2 feet area. The subarea is not limited to being square, but may also be rectangular, circular, or another shape. For example, the subarea may be circular with a diameter of approximately 3 feet, or the like. The foregoing dimensions are only examples, and the actual size of the subarea depends on various factors, such as the distance the subarea is away from a particular lighting device, the angles between the lighting device and the subarea, configuration of shelving, and the like. In this configuration, the primary lighting device processor proceeds to execute a process similar to that described with reference to
For example, person P10 speaks a request, such as “In what aisle are the Cheerios located?” The processor of the primary lighting device (i.e. L1) receives coded speech data from the audio coder or audio processor coupled to the radial microphone array (830). The coded speech data is forwarded (at 835), via the communication interface, such as 636 of
The primary lighting device obtains, via the communication interface, a recognition result from the natural language processing service (840). The processor of the primary lighting device processes the recognition result to identify an item identifier in the recognition result (845). Upon identifying the item identifier, the processor may forward the item identifier to a premises-related server (850). The premises-related server may access a database, such as 618, to retrieve information related to the item identifier, and returns the item identifier information. The item identifier information may include a stock number or UPC of the item, an item description (e.g., size, shape, packaging type, such as can, bottle, box, or the like), and/or the item location expressed in grid coordinates, aisle and bay or shelf number, latitude and longitude, or the like. The primary lighting device processor obtains, at 855, a location of the identified item in the premises from the item identifier information provided by the premises-related server.
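Steps 845-855 amount to matching recognition-result tokens against an item catalog and fetching the item's location record; the catalog entries and record fields in this sketch are illustrative assumptions:

```python
# Illustrative only: the catalog entries and record fields are assumptions,
# not a data model given in this description.
CATALOG = {"cheerios": "0001", "ketchup": "0002"}            # token -> item identifier
LOCATIONS = {"0001": {"aisle": 7, "bay": 1, "shelf": 3},
             "0002": {"aisle": 9, "shelf": "second from top"}}

def locate_item(recognition_tokens):
    for token in recognition_tokens:
        item_id = CATALOG.get(token.lower().strip("?.,"))
        if item_id:                                          # item identifier found (845)
            return item_id, LOCATIONS[item_id]               # premises location (855)
    return None, None

print(locate_item("In what aisle are the Cheerios located".split()))
```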
At 860, the obtained location with item and location-related data is encoded by the processor as an inquiry response for output by the speaker. The location-related data may include navigation instructions to provide the person in the subarea with directions to the item location. The navigation instructions indicate a path through the premises to the identified item location. The encoded inquiry response is forwarded to the audio decoder coupled to the speaker of the primary lighting device (870) for decoding and application to the speaker. The speaker of the primary lighting device generates audio output including speech based on the decoded inquiry response and the audio directional control signals (875). For example, the inquiry response may be presented to the person in the subarea as speech in the form of a spoken message. The generated audio (i.e., speech) is output, via the speaker's directional output capabilities, in a manner substantially limited to the identified subarea of the premises; the generated speech has a higher amplitude within the identified subarea than outside the identified subarea. As a result, the chances of distracting other persons, such as P20, near the identified subarea around P10 are mitigated, and the user P10 has some privacy with regard to their inquiry. For example, the spoken message may state, as speech directed to the identified subarea, “The ketchup that you requested is located in aisle 9; please turn right, walk past 3 aisles, and turn left into aisle 9 after passing the end display of baby food. Once in aisle 9, walk to the shelves with pickles on the right-hand side, and the ketchup will be to the left of the pickles at the second shelf from the top. Should you need further assistance, please let us know.” Of course, the inquiry response may contain information for generating different forms of spoken messages or combinations of pre-arranged inquiry response messages.
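The encoding at 860 can be sketched as template-based message assembly from the item record and navigation steps, with wording modeled on the ketchup example above; the function and field names are hypothetical:

```python
def inquiry_response(item, aisle, steps):
    """Assemble the spoken inquiry-response message (860) from the item record
    and a list of navigation steps; wording is modeled on the ketchup example
    above, and the function/field names are hypothetical."""
    return (f"The {item} that you requested is located in aisle {aisle}: "
            + ", ".join(steps)
            + ". Should you need further assistance, please let us know.")

msg = inquiry_response("ketchup", 9,
                       ["turn right", "walk past 3 aisles",
                        "turn left into aisle 9 after the end display of baby food"])
print(msg)   # text handed to speech synthesis and the directional speaker
```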
In the example of
In an operational example, the mobile device 910 has a processor that executes a retail store application 909. The retail store application 909 receives, via the voice input 912, a request spoken by a user of the mobile device 910. The retail store application 909 utilizes the voice assistant system 911, which may be an available natural language processing service, such as Siri, Cortana, OK Google, or the like, to recognize the spoken request. The voice assistant system 911 provides a recognition result to the retail store application 909. The retail store application 909 may parse the recognition result to locate an item identifier, and may forward the item identifier to the API 917. The API 917 forwards, via a wireless connection, the item identifier to a retail store application API 925 executing on the premises-related server 920. The retail store application API 925 may enable the premises-related server 920 to couple to the database 930. The database 930 may store information related to the premises in which the mobile device 910 is located. In response to receiving the item identifier, the retail store application API 925 may forward a request for location-related data related to the item identifier. The database 930 may return the location-related data corresponding to the item identifier to the premises-related server 920. The premises-related server 920 forwards the location-related data to the mobile device 910. The retail store application 909, in response to receiving the location-related data, may process the location-related data to generate navigation instructions for output from the output 915 of the mobile device 910. The navigation instructions may be text-based instructions, speech-related instructions, or map-based instructions for output on one or more of the mobile device's outputs 915.
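The round trip just described can be sketched as follows, with the HTTP exchange stubbed out and the endpoint path and field names purely hypothetical:

```python
import json

def premises_api(request):
    # Stand-in for retail store application API 925 plus database 930; a real
    # deployment would make an HTTP call to the premises-related server 920.
    return {"item_id": request["item_id"],
            "location": {"aisle": 7, "bay": 1},
            "nav": ["enter aisle 7", "bay 1 is on the left"]}

request = {"endpoint": "/api/item-location",     # hypothetical endpoint path
           "item_id": "cheerios"}
response = premises_api(request)
print(json.dumps(response, indent=2))            # rendered as text, speech, or map
```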
As shown by the above discussion, at least some functions of devices associated or in communication with the networked system 600 of
A server (see e.g.
A mobile device (see
The mobile device example in
Program aspects of the technology discussed above may be thought of as “products” or “articles of manufacture” typically in the form of executable code and/or associated data (software or firmware) that is carried on or embodied in a type of machine readable medium. “Storage” type media include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software or firmware programming. All or portions of the programming may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a premises-related server into the apparatus 200 of
The term “coupled” as used herein refers to any logical, physical or electrical connection, link or the like by which signals produced by one system element are imparted to another “coupled” element. Unless described otherwise, coupled elements or devices are not necessarily directly connected to one another and may be separated by intermediate components, elements or communication media that may modify, manipulate or carry the signals.
It will be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein. Relational terms such as first and second and the like may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “includes,” “including,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “a” or “an” does not, without further constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Unless otherwise stated, any and all measurements, values, ratings, positions, magnitudes, sizes, and other specifications that are set forth in this specification, including in the claims that follow, are approximate, not exact. They are intended to have a reasonable range that is consistent with the functions to which they relate and with what is customary in the art to which they pertain. It is intended by the following claims to claim any and all modifications and variations that fall within the true scope of the present concepts.
Number | Name | Date | Kind |
---|---|---|---|
8128342 | Dunn et al. | Mar 2012 | B2 |
8909382 | Malakuti | Dec 2014 | B1 |
9418115 | Ganick | Aug 2016 | B2 |
9517175 | Sisbot | Dec 2016 | B1 |
9668080 | Sun | May 2017 | B2 |
20070075898 | Markhovsky | Apr 2007 | A1 |
20090171571 | Son | Jul 2009 | A1 |
20100159943 | Salmon | Jun 2010 | A1 |
20110137881 | Cheng | Jun 2011 | A1 |
20120229048 | Archer | Sep 2012 | A1 |
20130026224 | Ganick et al. | Jan 2013 | A1 |
20140280316 | Ganick | Sep 2014 | A1 |
20140354160 | Aggarwal et al. | Dec 2014 | A1 |
20140375222 | Rains, Jr. | Dec 2014 | A1 |
20150102745 | Pijlman | Apr 2015 | A1 |
20150130355 | Rains, Jr. | May 2015 | A1 |
20150371319 | Argue | Dec 2015 | A1 |
20160335917 | Lydecker | Nov 2016 | A1 |
20160373269 | Okubo | Dec 2016 | A1 |
20170134853 | Beaty | May 2017 | A1 |
20170176964 | O'Keeffe | Jun 2017 | A1 |
Entry |
---|
Politis et al., “Sector-Based Parametric Sound Field Reproduction in the Spherical Harmonic Domain”, IEEE Journal of Selected Topics in Signal Processing, 2015, 16 pages. |
Shi, “Investigation of the steerable parametric loudspeaker based on phased array techniques,” Thesis, May 2013, 3 pages. |
Pompei, “Sound From Ultrasound: The Parametric Array as an Audible Sound Source”, MIT Thesis, Jun. 2002, 132 pages. |
Number | Date | Country | |
---|---|---|---|
20180376243 A1 | Dec 2018 | US |