1. Field
This disclosure relates generally to a personal health system, and more specifically but not exclusively, to a method and apparatus for identifying a user using voice recognition technology.
2. Description
A Personal Health System (PHS) gathers patient data readings from approved medical peripherals, aggregates this data, forwards it to a medical facility, and may also perform trending and other analysis on the data. As currently specified, the first version of the PHS is a single-user device, and peripherals are connected to the PHS platform via USB or Bluetooth. The PHS is intended to support multiple-user scenarios in the near future, which will likely include multi-patient homes and nursing homes. When multiple patients may use the same PHS, it is necessary to recognize a patient correctly and match the data collected from that patient to the right profile. Therefore, it is desirable to employ patient recognition technologies to recognize a patient whether the patient uses the PHS at the center console or at a remote peripheral of the PHS.
The features and advantages of the disclosed subject matter will become apparent from the following detailed description of the subject matter.
According to embodiments of the subject matter disclosed in this application, speaker recognition/identification technology may be used to recognize/identify a patient who intends to use a personal health system (“PHS”) and to match collected data to the right patient's profile. The PHS may be used by multiple patients at different locations via a center console or a remote peripheral. The center console and the remote peripheral are each equipped with a voice input/output device to play back prompts from the PHS and to collect voice data from a patient. The peripheral then sends the voice data to the PHS. The PHS uses the voice data collected from either the center console or a peripheral to recognize/identify the patient. If the patient is correctly recognized/identified, the patient's profile is retrieved and measurements taken from the patient may be added to the profile. When multiple patients use the PHS simultaneously at different locations, the PHS may recognize each of the patients and correctly store measurement data from each patient in his/her profile.
Reference in the specification to “one embodiment” or “an embodiment” of the disclosed subject matter means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosed subject matter. Thus, appearances of the phrase “in one embodiment” in various places throughout the specification are not necessarily all referring to the same embodiment.
Peripherals may be connected to PHS 110 using different approaches. For example, peripheral A 120 may connect to the PHS via a USB (Universal Serial Bus) wire 125; peripheral B 130 may connect to the PHS via a Bluetooth® wireless channel 135; peripheral C 140 may connect to the PHS via a Wi-Fi (Wireless Fidelity) wireless channel 145; and peripheral N 150 may connect to the PHS via a WiMAX (Worldwide Interoperability for Microwave Access) wireless channel 155. These are only a few examples of connections between a peripheral and the PHS. In fact, any wired or wireless technology may be used for connecting a peripheral with the PHS. Additionally, a peripheral may be connected with the PHS via different channels at the same time or at different times. For example, a USB wired channel, a Bluetooth® wireless channel, and a WiMAX wireless channel may exist between a peripheral and the PHS at the same time or at different times. The peripheral or a user may choose which channel is used during which time period.
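As a non-limiting illustration, the channel selection just described may be sketched as follows. The Channel and Peripheral classes, the channel names, and the preference order are assumptions made for illustration only and are not part of the PHS specification.

```python
# Illustrative sketch: a peripheral holding several channels to the PHS and
# choosing one of them. Class names and the preference order are assumptions.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Channel:
    name: str          # e.g., "usb", "bluetooth", "wifi", "wimax"
    available: bool    # whether the link is currently up


@dataclass
class Peripheral:
    peripheral_id: str
    channels: Dict[str, Channel] = field(default_factory=dict)
    preferred_order: tuple = ("usb", "bluetooth", "wifi", "wimax")

    def add_channel(self, channel: Channel) -> None:
        # A peripheral may hold several channels at the same time.
        self.channels[channel.name] = channel

    def select_channel(self, user_choice: Optional[str] = None) -> Channel:
        # The peripheral or the user may choose which channel to use.
        if user_choice and user_choice in self.channels and self.channels[user_choice].available:
            return self.channels[user_choice]
        for name in self.preferred_order:
            ch = self.channels.get(name)
            if ch and ch.available:
                return ch
        raise RuntimeError("no channel to the PHS is currently available")


# Usage: a peripheral with both a USB and a Bluetooth link prefers USB.
p = Peripheral("peripheral_B_130")
p.add_channel(Channel("usb", available=True))
p.add_channel(Channel("bluetooth", available=True))
print(p.select_channel().name)  # -> "usb"
```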
PHS 110 may be used in different ways. In one scenario, a patient may access PHS 110 at different locations. For example, a patient may check his data at the center console; at a different time, he may measure his blood pressure at one peripheral (e.g., peripheral B 130) and have the measurement data stored in his profile in the PHS. In another scenario, multiple patients may access PHS 110 simultaneously from different locations. For example, person 160 may access the PHS directly at the center console of the PHS, while person 170 accesses the PHS through peripheral A 120. When multiple patients use a PHS, it is necessary for the PHS to recognize a patient and retrieve the correct profile for that patient. In fact, privacy laws require that a patient record be kept confidential and not be accessed by another person who does not have a lawful right to do so. Even if a PHS is intended to be used by a single user, it is still desirable for the PHS to identify a user as the intended patient before letting the user access the patient's data.
According to an embodiment of the subject matter disclosed in this application, speaker recognition/identification technology may be used for patient recognition and identification. For example, when a user starts using a peripheral or tries to access a PHS through its center console, the user may be prompted to speak a phrase/sentence (e.g., the user's name). For a single-user PHS, the PHS may process the phrase/sentence and try to identify the user by comparing the processed phrase/sentence with the intended user's model in a database. If the user is identified as the intended user, the PHS will authorize the user to use the peripheral or the center console. For a multi-user PHS, on the other hand, the PHS may process the phrase/sentence and try to recognize the user by comparing the processed phrase/sentence with the models of a number of the PHS's intended users. If the user's speech matches one model, the PHS may verify with the user whether s/he is indeed the recognized user. If the answer is positive, the PHS may pull the user's profile from a database and authorize the user to use the peripheral or the center console. If a user fails the identification/recognition process, the user may be prompted to speak the same phrase/sentence or a different one again. The PHS may then perform the identification/recognition process again based on the newly collected phrase/sentence. If the user passes the identification/recognition process this time, the PHS may authorize the user to use the system; otherwise, the user may be asked to go through the identification/recognition process again. If the user continues to fail the identification/recognition process a number of times (e.g., 3 times), the PHS may reject the user and not allow the user to use the peripheral or the center console.
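A minimal sketch of this prompt/recognize/confirm/retry flow is given below. The recognizer, the prompts, and the profile store are illustrative stand-ins supplied as arguments; they are assumptions for illustration and not an actual PHS interface.

```python
# Sketch of the prompt / recognize / confirm / retry flow described above.
# The callables and the profile dictionary are hypothetical stand-ins for the
# corresponding PHS components.
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # e.g., reject the user after 3 failed attempts


def authorize_user(
    record_phrase: Callable[[], bytes],
    recognize_speaker: Callable[[bytes], Optional[str]],
    confirm: Callable[[str], bool],
    profiles: dict,
    multi_user: bool = True,
) -> Optional[dict]:
    for _ in range(MAX_ATTEMPTS):
        print("PHS: Please say your name.")    # a voice prompt in a real PHS
        speech = record_phrase()               # audio from the voice I/O device
        candidate = recognize_speaker(speech)  # best-matching enrolled patient, or None
        if candidate is None:
            continue                           # failed; re-prompt and try again
        if multi_user and not confirm(f"You are {candidate}, right?"):
            continue                           # best match rejected by the user
        return profiles.get(candidate)         # authorized: retrieve the profile
    print("PHS: You cannot use the device now; please contact a representative.")
    return None


# Usage with trivial stand-ins: this toy "recognizer" always returns "Karen Smith".
profiles = {"Karen Smith": {"name": "Karen Smith", "measurements": []}}
profile = authorize_user(
    record_phrase=lambda: b"...",
    recognize_speaker=lambda speech: "Karen Smith",
    confirm=lambda question: True,
    profiles=profiles,
)
print(profile)
```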
PHS 110 may comprise a patient management application 210, a data storage device 230, a data collector 240, a detector & prompter 250, and a speaker recognition/identification module 260. Patient management application 210, data collector 240, detector & prompter 250, data storage device 230, and speaker recognition/identification module 260 may each be implemented purely in software, purely in hardware, or as a combination of software and hardware. Each of the above components in PHS 110 may run in or in connection with a computing system that has at least one processor (not shown in the figure).
In one embodiment, patient management application 210 may be a software application running on a processor of a computing system. Among the many functions it may perform, the patient management application may pull the patient profile from data storage device 230 after a patient is identified or recognized correctly. The patient management application may then receive measurement data from a peripheral or the center console through data collector 240 and store the data in the patient profile. In one embodiment, the measurement data from the peripheral or the center console may be stored along with the patient profile in the patient's medical record in data storage device 230. A patient may decide to have more than one measurement done. If this is the case, the patient management application may aggregate all of the new measurement data from the same patient together, forward the data to a medical facility, and/or further perform some analysis on the data. For example, the patient management application may perform trending analysis on the data. If anything abnormal is found with the patient, the patient management application may send an alert to the patient's doctor and/or the patient himself/herself. Furthermore, patient management application 210 may control and/or coordinate other components of PHS 110 such as data collector 240, detector & prompter 250, and speaker recognition/identification module 260.
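As one illustration of the trending analysis mentioned above, the following sketch flags a rising blood-pressure trend. The specific thresholds and the alert condition are assumptions chosen for illustration; the disclosure does not prescribe a particular analysis.

```python
# Illustrative trending/alert check over aggregated measurements. The
# threshold values are assumptions, not values from the disclosure.
from statistics import mean


def check_blood_pressure_trend(readings: list) -> list:
    """readings: chronological list of (systolic, diastolic) measurements."""
    alerts = []
    systolic = [s for s, _ in readings]
    if systolic and max(systolic) >= 180:
        alerts.append("single reading in hypertensive-crisis range")
    if len(systolic) >= 6 and mean(systolic[-3:]) - mean(systolic[:3]) > 15:
        alerts.append("upward trend in systolic pressure over recent readings")
    return alerts


# Usage: an upward trend triggers an alert to the doctor and/or the patient.
readings = [(122, 80), (126, 82), (128, 81), (138, 88), (142, 90), (145, 92)]
for message in check_blood_pressure_trend(readings):
    print("ALERT:", message)  # in the PHS, this could be forwarded to a medical facility
```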
Detector & prompter 250 may detect a patient who is trying to use PHS 110 through a peripheral or the center console. A patient may be detected when the patient presses a key at a peripheral or the center console, or when the patient tries to use a measurement device at a peripheral or the center console. Once a patient is detected, detector & prompter 250 may prompt the patient to speak a phrase/sentence. Patient management application 210 may then direct speaker recognition/identification module 260 to receive the patient's speech, process it, and use it to perform patient recognition/identification. If the patient is correctly recognized/identified, detector & prompter 250 may then inform the patient that s/he can now use the peripheral or the center console; otherwise, the patient may be re-prompted to either repeat the phrase/sentence or speak a new phrase/sentence. If speaker recognition/identification fails more than a certain number of times (e.g., 3 times), detector & prompter 250 may inform the patient that s/he cannot use the system right now and suggest that s/he contact a service representative.
After a patient is successfully recognized in the situation where the PHS is intended to be used by multiple users, detector & prompter 250, under the direction of patient management application 210, may further confirm with the patient via voice or some other means (e.g., a screen display, if available) whether the patient is indeed the recognized one. For example, the detector & prompter may ask the patient via voice, “You are Karen Smith, right?” If the answer is positive, the detector & prompter may say, “Thank you, you may now use the device.” In one embodiment, the detector & prompter or the patient management application may include a speech synthesis module to synthesize any prompt or response to a patient. In another embodiment, no speech synthesis module may be necessary, and the detector & prompter or the patient management application may pre-record prompts and responses if the number of prompts and responses is limited.
Speaker recognition/identification module 260 may include several components (not shown in the figure) such as a pre-processor, a feature extractor, and a pattern recognizer. The pre-processor may receive a speech signal from user interface 220 or peripheral 270, convert the signal to digital form, and pre-emphasize the signal to compensate for transmission loss at certain frequency ranges. The feature extractor may segment the pre-processed speech signal into overlapping frames and extract features from each frame. A number of types of features may be extracted, including energy, zero-crossing rate, formants, mel-frequency cepstral coefficients (MFCCs), etc. Each frame is represented by a feature vector, which may include a single type of feature (e.g., MFCCs) or a combination of a few speech features. After feature extraction, an input speech signal is represented by a sequence of feature vectors.
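The pre-emphasis and feature-extraction steps may be sketched as follows, assuming the librosa package is available. The sample rate, pre-emphasis coefficient, frame length, hop size, and number of MFCCs are illustrative choices rather than values specified by the disclosure.

```python
# Sketch of pre-processing (pre-emphasis) and MFCC feature extraction using
# librosa. Parameter values are illustrative assumptions.
import numpy as np
import librosa


def extract_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    signal, sr = librosa.load(wav_path, sr=16000)  # digitized speech at 16 kHz
    # Pre-emphasis to compensate for loss at higher frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 25 ms frames with a 10 ms hop produce overlapping frames.
    mfcc = librosa.feature.mfcc(
        y=emphasized, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
    )
    return mfcc.T  # shape (n_frames, n_mfcc): one feature vector per frame


# features = extract_features("patient_phrase.wav")  # hypothetical recording
```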
The pattern recognizer in speaker recognition/identification module 260 may compare the feature vector sequence with one or more templates or models. For speaker identification, there is typically one template or model for an intended patient, and the pattern recognizer compares the feature vector sequence with that template or model. If the feature vector sequence matches the template or the model, the user is identified as the intended patient; otherwise, the user may be asked to go through the identification process again. For speaker recognition, there may be multiple templates or models, one for each of multiple intended users. The pattern recognizer compares the feature vector sequence with each of the templates or models to find the best match for the vector sequence. In one embodiment, the user may be recognized as the patient corresponding to the best-matched template or model. In another embodiment, the pattern recognizer may further determine whether the match between the feature vector sequence and the best-matched template or model is close enough. If the answer is positive, the user may be recognized as the patient corresponding to the best-matched template or model; otherwise, the pattern recognizer may decide that the user cannot be recognized as any of the intended users (i.e., the user fails the recognition process) and the user may be asked to go through the recognition process again. After the user fails the recognition/identification process a number of times (e.g., 3 times), the user may be rejected by the system.
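The two decision modes just described (identification against a single model, and recognition as the best of several models subject to a closeness check) may be sketched as follows, operating on similarity scores assumed to have been produced already by the pattern recognizer. The threshold value is an illustrative assumption.

```python
# Sketch of the identification and recognition decisions over precomputed
# similarity scores (higher means a closer match). The threshold is assumed.
from typing import Dict, Optional

THRESHOLD = -50.0   # e.g., a minimum acceptable per-frame log-likelihood


def identify(score_for_intended_user: float) -> bool:
    """Single-user PHS: accept or reject against the one enrolled model."""
    return score_for_intended_user >= THRESHOLD


def recognize(scores: Dict[str, float]) -> Optional[str]:
    """Multi-user PHS: pick the best-matching enrolled patient, if close enough."""
    if not scores:
        return None
    best_patient = max(scores, key=scores.get)
    return best_patient if scores[best_patient] >= THRESHOLD else None


# Usage: the best match is returned only when its score clears the threshold.
print(recognize({"Karen Smith": -32.4, "John Doe": -61.0}))   # -> "Karen Smith"
print(recognize({"Karen Smith": -80.1, "John Doe": -75.3}))   # -> None (user fails)
```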
The pattern recognizer in speaker recognition/identification module 260 may choose one of several available technologies for comparing the feature vector sequence with the template(s) or model(s). For example, the pattern recognizer may use hidden Markov model (HMM) based technology, in which an HMM is trained using speech collected from each intended patient and is used as that patient's model. A Viterbi approach is used to compute a likelihood score for the feature vector sequence against each of the HMMs. The intended patient whose HMM produces the highest likelihood score may be considered the candidate for the user. In one embodiment, the pattern recognizer may further determine whether the highest likelihood score is below a pre-determined threshold. If it is, the pattern recognizer may decide that the user cannot be recognized as the candidate patient, and the user may be asked to try again by submitting another piece of speech.
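A sketch of HMM-based speaker models follows, assuming the hmmlearn package as the HMM implementation (the disclosure itself only requires an HMM trained per patient and a Viterbi-style likelihood). The number of states, covariance type, and training iterations are illustrative assumptions.

```python
# Sketch: one GaussianHMM per enrolled patient, trained on that patient's
# feature frames; an utterance is Viterbi-scored against every model.
# hmmlearn and all parameter values are assumptions for illustration.
import numpy as np
from hmmlearn.hmm import GaussianHMM


def train_speaker_models(enrollment: dict) -> dict:
    """enrollment maps patient name -> feature matrix of shape (n_frames, n_features)."""
    models = {}
    for name, features in enrollment.items():
        model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
        model.fit(features)            # train this patient's model
        models[name] = model
    return models


def score_utterance(models: dict, features: np.ndarray) -> dict:
    # Viterbi (best-path) log-likelihood of the utterance under each model,
    # normalized by the number of frames so utterance length does not dominate.
    scores = {}
    for name, model in models.items():
        logprob, _ = model.decode(features, algorithm="viterbi")
        scores[name] = logprob / len(features)
    return scores
```

These per-frame scores could then feed the threshold-based decision sketched earlier.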
Peripheral 270 may include a voice input/output (I/O) device 275, which plays prompts or responses from PHS 110 to the user and accepts the user's speech. Voice I/O device 275 may simply be a headset including a microphone and a loudspeaker. Peripheral 270 may use a wireless technology to connect to PHS 110. In such a case, all the connections between peripheral 270 and PHS components (e.g., speaker recognition/identification module 260, detector & prompter 250, and data collector 240) for data and control signal transmission may be through wireless channels. When peripheral 270 is a Bluetooth® device, the peripheral may need to be upgraded to support the Bluetooth® headset profile.
Once a patient is successfully recognized/identified, patient management application 210 may direct detector & prompter 250 to prompt the patient to proceed with any medical measurement, and direct data collector 240 to collect any medical measurement data from a peripheral or the center console and transmit such data to the patient management application. In one embodiment, data collector 240 may directly store the measurement data in the patient profile or the patient medical record in data storage device 230. Data collector 240 may include circuitry to perform simple processing of raw measurement data from a peripheral or the center console. For example, if the raw measurement data is analog, the data collector may convert it into digital form.
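A minimal sketch of this storage step is shown below; the record layout and field names are assumptions made for illustration, not a format required by the disclosure.

```python
# Sketch: after recognition, a measurement from a peripheral is timestamped
# and stored in the recognized patient's record. Field names are assumed.
from datetime import datetime, timezone


def store_measurement(record: dict, kind: str, value, unit: str, source: str) -> None:
    record.setdefault("measurements", []).append({
        "kind": kind,          # e.g., "blood_pressure_systolic"
        "value": value,
        "unit": unit,
        "source": source,      # peripheral or center console that produced it
        "taken_at": datetime.now(timezone.utc).isoformat(),
    })


# Usage: a blood-pressure reading from peripheral B is added to Karen's record.
record = {"patient": "Karen Smith"}
store_measurement(record, "blood_pressure_systolic", 128, "mmHg", "peripheral_B_130")
print(record["measurements"][0]["kind"], record["measurements"][0]["value"])
```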
In the above description, it is assumed that peripheral 270 only collects a user's speech without any further processing. In another embodiment, peripheral 270 may have sufficient computing power to perform a certain amount of processing on received speech. For example, some or all of the pre-processing and/or feature extraction work may be performed by peripheral 270. In other words, the workload of speaker recognition/identification may be distributed between peripheral 270 and PHS 110. In such a situation, instead of directly transmitting raw speech to PHS 110, peripheral 270 transmits intermediate results (e.g., a pre-processed speech signal or an extracted speech feature vector sequence) to PHS 110. If only speech features are transmitted from peripheral 270 to PHS 110, the bandwidth requirement for the transmission channel may be reduced. Similarly, the workload of speaker recognition/identification may also be distributed between user interface 220 and PHS 110.
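A back-of-the-envelope sketch of the bandwidth reduction from transmitting feature vectors instead of raw speech follows. The sample rate, sample width, frame rate, and MFCC dimension are illustrative assumptions.

```python
# Rough payload comparison: raw PCM speech versus MFCC feature vectors for a
# short spoken phrase. All numeric choices are assumptions for illustration.
SECONDS = 2.0                                    # length of the spoken phrase
RAW_BYTES = int(16000 * 2 * SECONDS)             # 16 kHz, 16-bit mono PCM
FRAMES = int(SECONDS / 0.010)                    # one frame every 10 ms
FEATURE_BYTES = FRAMES * 13 * 4                  # 13 MFCCs per frame, float32

print(f"raw speech:      {RAW_BYTES} bytes")      # 64000 bytes
print(f"feature vectors: {FEATURE_BYTES} bytes")  # 10400 bytes
print(f"reduction:       ~{RAW_BYTES / FEATURE_BYTES:.0f}x")
```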
If it is determined at block 325 that the patient is not correctly recognized/identified, it may be further determined at block 330 whether the number of failed recognition/identification attempts has exceeded a predetermined number (e.g., 3 times). If it has, the PHS may reject the patient and suggest that the patient seek help from a representative at block 340; otherwise, the patient may be re-prompted via voice or text to speak the same or a new phrase/sentence at block 340 to go through the speaker recognition/identification process again from block 315 through block 330.
A PHS using speaker recognition/identification technology as described above may be implemented in a computing system 400 as shown in
Additionally, chipset 430 may comprise a memory controller 425 that is coupled to a main memory 450 through a memory bus 455. The main memory 450 may store data and sequences of instructions that are executed by multiple cores of the processor 410 or any other device included in the system. The memory controller 425 may access the main memory 450 in response to memory transactions associated with multiple cores of the processor 410, and other devices in the computing system 400. In one embodiment, memory controller 425 may be located in processor 410 or in some other circuitry. The main memory 450 may comprise various memory devices that provide addressable storage locations which the memory controller 425 may read data from and/or write data to. The main memory 450 may comprise one or more different types of memory devices such as Dynamic Random Access Memory (DRAM) devices, Synchronous DRAM (SDRAM) devices, Double Data Rate (DDR) SDRAM devices, or other memory devices.
Moreover, chipset 430 may include a disk controller 470 coupled to a hard disk drive (HDD) 490 (or other disk drives not shown in the figure) through a bus 495. The disk controller allows processor 410 to communicate with the HDD 490. In some embodiments, disk controller 470 may be integrated into a disk drive (e.g., HDD 490). There may be different types of buses coupling disk controller 470 and HDD 490, for example, the advanced technology attachment (ATA) bus and PCI Express (PCI-E) bus.
An OS (not shown in the figure) may run in processor 410 to control the operations of the computing system 400. The OS may facilitate a patient management application (such as 210 in
Although an example embodiment of the disclosed subject matter is described with reference to block and flow diagrams in
In the preceding description, various aspects of the disclosed subject matter have been described. For purposes of explanation, specific numbers, systems and configurations were set forth in order to provide a thorough understanding of the subject matter. However, it is apparent to one skilled in the art having the benefit of this disclosure that the subject matter may be practiced without the specific details. In other instances, well-known features, components, or modules were omitted, simplified, combined, or split in order not to obscure the disclosed subject matter.
Various embodiments of the disclosed subject matter may be implemented in hardware, firmware, software, or a combination thereof, and may be described by reference to or in conjunction with program code, such as instructions, functions, procedures, data structures, logic, application programs, design representations or formats for simulation, emulation, and fabrication of a design, which when accessed by a machine results in the machine performing tasks, defining abstract data types or low-level hardware contexts, or producing a result.
For simulations, program code may represent hardware using a hardware description language or another functional description language which essentially provides a model of how designed hardware is expected to perform. Program code may be assembly or machine language, or data that may be compiled and/or interpreted. Furthermore, it is common in the art to speak of software, in one form or another, as taking an action or causing a result. Such expressions are merely a shorthand way of stating execution of program code by a processing system which causes a processor to perform an action or produce a result.
Program code may be stored in, for example, volatile and/or non-volatile memory, such as storage devices and/or an associated machine readable or machine accessible medium including solid-state memory, hard-drives, floppy-disks, optical storage, tapes, flash memory, memory sticks, digital video disks, digital versatile discs (DVDs), etc., as well as more exotic mediums such as machine-accessible biological state preserving storage. A machine readable medium may include any mechanism for storing, transmitting, or receiving information in a form readable by a machine, and the medium may include a tangible medium through which electrical, optical, acoustical or other form of propagated signals or carrier wave encoding the program code may pass, such as antennas, optical fibers, communications interfaces, etc. Program code may be transmitted in the form of packets, serial data, parallel data, propagated signals, etc., and may be used in a compressed or encrypted format.
Program code may be implemented in programs executing on programmable machines such as mobile or stationary computers, personal digital assistants, set top boxes, cellular telephones and pagers, and other electronic devices, each including a processor, volatile and/or non-volatile memory readable by the processor, at least one input device and/or one or more output devices. Program code may be applied to the data entered using the input device to perform the described embodiments and to generate output information. The output information may be applied to one or more output devices. One of ordinary skill in the art may appreciate that embodiments of the disclosed subject matter can be practiced with various computer system configurations, including multiprocessor or multiple-core processor systems, minicomputers, mainframe computers, as well as pervasive or miniature computers or processors that may be embedded into virtually any device. Embodiments of the disclosed subject matter can also be practiced in distributed computing environments where tasks may be performed by remote processing devices that are linked through a communications network.
Although operations may be described as a sequential process, some of the operations may in fact be performed in parallel, concurrently, and/or in a distributed environment, and with program code stored locally and/or remotely for access by single or multi-processor machines. In addition, in some embodiments the order of operations may be rearranged without departing from the spirit of the disclosed subject matter. Program code may be used by or in conjunction with embedded controllers.
While the disclosed subject matter has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications of the illustrative embodiments, as well as other embodiments of the subject matter, which are apparent to persons skilled in the art to which the disclosed subject matter pertains are deemed to lie within the scope of the disclosed subject matter.