The present invention relates generally to video telephony devices and more particularly to a video telephony device that provides registered users, once visually identified by the device, with customized access to a user-specific database that may include a personal phonebook, a personal graphical user interface (GUI), alerts, screensavers, and the like.
Many conventional telephones have an electronic phone book capability, which stores names, telephone numbers, and other personal information so that they can be accessed as needed. This phone-book capability can allow a user to make a telephone call without searching another medium, such as a printed phone directory or an address book. Additionally, when a call arrives, the caller's number delivered with the call can be compared with the data registered in the phone book and the corresponding name displayed, so that the user knows who is calling before answering.
Video telephones are available that are capable of handling image information. Such video telephones transmit and receive voice and image data simultaneously so that the calling and called parties can talk to each other while viewing the images sent from the opposite parties. The video telephone also may be able to record an image received while talking or record an image that is taken by a camera incorporated into the video telephone. Recorded image data associated with a caller may in some cases also be stored in the electronic phone book. When a user accesses the phone book to place a call, this capability can permit the user to conduct a search while viewing image information. On the other hand, when a call is received, this capability can allow the image information to be displayed together with name information, helping the user immediately identify the caller.
Conventional telephones, with or without video capability, are generally used by a number of different individuals. For example, telephones located in a residence are usually accessed by various family members. The electronic phone book associated with such telephones, however, is generally a single common phone book for the entire household. Because of the complexity of presenting user-specific information, telephones typically do not provide personal phonebooks or other information and preferences for each and every user.
Conventional video telephones or other telephony devices offer a user-facing camera that has not been previously employed to provide simplified access to and navigation through the telephone. Specifically, the sensor or camera of a video telephony device is not used to detect and recognize a user by taking an image of the user and the user's facial features as he or she approaches the device.
In the system and method described herein, image identification software such as facial feature software is used to compare features of the user to images stored in a database or lookup table of the video telephony device. If the software makes a match of the face with one stored in the database, the user will be presented with his or her personal information such as a personal phonebook. In addition, the device also may be automatically adjusted in accordance with any personal preferences of the identified user such as a personally configured user interface. In this way, for instance, the user will not be required to navigate through a complicated menu of choices before retrieving his or her information and preferences. Depending on the features and functionality offered by the video telephony device, the personal information and/or preferences may include such things as a personal phonebook, a personally configured graphical user interface (GUI), alerts, screensavers, and the like. Other personal information that may be made available includes call logs, buddy lists, journals, blogs, and web sites.
In addition to presenting the user with personal information and preferences, other menus may be offered to the user that allow customization of various features and settings. For example, such menus may include a restriction menu, a settings menu and a control and action menu. The restriction menu allows the user to impose on other users restrictions on usage of the video telephony device. For example, a parent may wish to provide control restrictions on children who also may be users of the video telephony device. For instance, a parent may not want a child to place calls after 6 pm on weeknights (except for emergency numbers such as 911). Also, a resident may wish to prevent guests (unregistered users) from placing long distance calls. The settings menu allows the user to customize various characteristics of the video telephony device that impact the individual user's interaction with the device, such as the volume and screen brightness, for example. Each user can establish and customize his or her own settings. The control and action menu allows each user to enter new data (e.g., a new phonebook entry) or edit old entries. In some cases the control and action menu may be a part of the device's operating system.
In general, personal information that is made available may be information that is directly associated with the user such as phonebook entries. In addition, as previously mentioned, the information may be restrictions or the like that are imposed on the user and thus are indirectly associated with the user. For instance, if the usage of a child (the user) is restricted by entries in the restriction menu that a parent (another user) has established, then the restriction menu is information that is indirectly associated with the user.
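The restriction examples above (no calls after 6 pm on weeknights except to emergency numbers, and no long distance calls for guests) can be sketched as a simple permission check. This is a minimal illustration only: the `Restrictions` fields, the `call_permitted` and `is_long_distance` names, the treatment of Monday through Friday as weeknights, and the rule that a leading "1" plus ten digits marks a long distance number are all assumptions, not details taken from the description.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional

EMERGENCY_NUMBERS = {"911"}  # always allowed, as in the example in the text


@dataclass
class Restrictions:
    # Hypothetical layout; the text names a restriction menu but not its fields.
    weeknight_cutoff: Optional[time] = None  # no calls at or after this time on weeknights
    allow_long_distance: bool = True


def is_long_distance(number: str) -> bool:
    # Assumption for illustration: "1" followed by ten digits is long distance.
    digits = "".join(ch for ch in number if ch.isdigit())
    return len(digits) == 11 and digits.startswith("1")


def call_permitted(restrictions: Restrictions, number: str, now: datetime) -> bool:
    """Return True if a call to `number` at time `now` is allowed."""
    if number in EMERGENCY_NUMBERS:
        return True
    if not restrictions.allow_long_distance and is_long_distance(number):
        return False
    cutoff = restrictions.weeknight_cutoff
    is_weeknight = now.weekday() < 5  # assumption: Monday (0) through Friday (4)
    if cutoff is not None and is_weeknight and now.time() >= cutoff:
        return False
    return True


# Example: a child restricted after 6 pm, and a guest barred from long distance.
child = Restrictions(weeknight_cutoff=time(18, 0))
guest = Restrictions(allow_long_distance=False)
```

Because the restrictions are stored per user but authored by someone else, a check of this kind would run against the record of the identified caller rather than that of the parent who set it.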
At the outset, it should be noted that the features and functionality discussed herein may be embodied in a video telephony device that can transmit and receive information over any of a variety of different external communication media supporting any type of service, including voice over broadband (VoBB) and legacy services. VoBB is defined herein to include voice over cable modem (VoCM), voice over DSL (VoDSL), voice over Internet protocol (VoIP), fixed wireless access (FWA), fiber to the home (FTTH), and voice over ATM (VoATM). Legacy services include the integrated services digital network (ISDN), plain old telephone service (POTS), cellular and 3G. Accordingly, the external communication medium may be a wireless network, a conventional telephone network, a data network (e.g., the Internet), a cable modem system, a cellular network and the like.
Various industry standards have been evolving for video telephony services such as those promulgated by the International Telecommunication Union (ITU). The standards and protocols that are employed will depend on the external communication medium that is used to communicate the voice and video information. For example, if the video telephony device employs a POTS service, protocols may be employed such as the CCITT H.261 specification for video compression and decompression and encoding and decoding, the CCITT H.221 specification for full duplex synchronized audio and motion video communication framing, and the CCITT H.242 specification for call setup and disconnect. On the other hand, video telephony devices operating over the Internet can use protocols embodied in video conference standards such as H.323 as well as H.263 and H.264 for video encoding and G.723.1, G.711 and G.729 for audio encoding. Of course, any other appropriate standards and protocols may be employed. For example, IETF protocols such as SIP and RTP/RTCP may be employed.
Of these components, the main controller 10, the personalized user information database 11, the image memory 32, the video codec 12, the LCD interface 13, the camera interface 16, the multiplexing and separating section 17, the communications interface 18, the voice codec 20, and the manual entry control circuit portion 26 are connected together via a main bus 27.
The multiplexing and separating section 17, which manages the incoming and outgoing video and audio data to and from the external communications network, is connected with the video codec 12, the communications system interface 18, and the voice codec 20 via sync buses 28, 29, and 30, respectively. The main controller 10 includes a CPU, a ROM, a RAM, and so on. The operations of the various portions of the video telephony device are under control of the main controller 10. The main controller 10 performs various functions in software according to data stored in the ROM, RAM, personalized user information database 11, image memory 32 and face template memory 34.
The personalized user information database 11 is used to store a database of information for each registered user. Each database is composed of plural records. Each record may comprise, for instance, a personal phonebook (including, e.g., a phone book memory number, a phone number, a name, various addresses and any other appropriate information such as typically found in a contact list), a personally configured graphical user interface (GUI) for display on display unit 14, and/or alerts, screensavers, call logs, buddy lists, journals, blogs, and web sites or other preferences. When retrieved, the personal phonebook may be presented to the user on the display unit 14.
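One way to picture the per-user records described above is as a small set of typed structures keyed by user. The sketch below is illustrative only: the class names (`PhonebookEntry`, `UserRecord`, `UserDatabase`) and the choice of fields are assumptions drawn loosely from the record contents listed in the text, not a layout given by the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PhonebookEntry:
    memory_number: int          # phone book memory number
    name: str
    phone_number: str
    addresses: List[str] = field(default_factory=list)


@dataclass
class UserRecord:
    # Hypothetical fields covering the items the text says a record may hold.
    user_name: str
    phonebook: List[PhonebookEntry] = field(default_factory=list)
    gui_theme: Optional[str] = None       # personally configured GUI
    screensaver: Optional[str] = None
    alerts: List[str] = field(default_factory=list)
    call_log: List[str] = field(default_factory=list)


class UserDatabase:
    """Stand-in for the personalized user information database 11."""

    def __init__(self) -> None:
        self._records: Dict[str, UserRecord] = {}

    def register(self, record: UserRecord) -> None:
        self._records[record.user_name] = record

    def lookup(self, user_name: str) -> Optional[UserRecord]:
        # Returns None for an unregistered user.
        return self._records.get(user_name)


db = UserDatabase()
db.register(UserRecord(
    user_name="mom",
    phonebook=[PhonebookEntry(1, "Office", "555-0100")],
    gui_theme="large-text",
))
```

Once a face has been matched to a user name, a single `lookup` call would be enough to retrieve everything needed to present that user's phonebook and interface.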
The user folders 521-525 may or may not include all the same record fields. For instance, it generally will not be necessary for the mom and dad folders to include the restrictions record. Alternatively, the restrictions record may be present in the mom and dad folders, but they may simply remain unpopulated. On the other hand, the public folder 524 will not need the image data record 54, and thus, as shown in
The video codec 12 decodes and reproduces encoded video data, and sends the reproduced video data to the display interface 13. Furthermore, the video codec 12 encodes video data supplied from the camera portion 15 via the camera interface 16 and creates video data encoded in accordance with, e.g., MPEG-4.
The display interface 13 converts the video data supplied from the video codec 12 into a signal form that can be processed by the display 14, and sends the converted data to the display 14. The display 14 may be, for example, a color or monochrome liquid crystal display having sufficient video displaying capabilities (such as resolution) to display video with MPEG-4, and displays a picture according to video data supplied from the display interface 13.
For example, a CCD or CMOS camera may be used as the camera 15, which picks up an image of an object, creates video data, and sends it to the camera interface 16. The camera interface 16 receives the video data from the camera 15, converts the data into a form that can be processed by the video codec 12, and supplies the data to the codec 12.
The multiplexing and separating portion 17 is responsible for managing the incoming and outgoing video and audio data to and from the external communications network via communications system interface 18. Specifically, the multiplexing and separating portion 17 multiplexes encoded video data supplied from the video codec 12 via the sync bus 28, the encoded audio data supplied from the voice codec 20 via the sync bus 30, and other data supplied from the main controller 10 via the main bus 27 by a given method (e.g., H.221). The multiplexing and demultiplexing portion 17 supplies the multiplexed data as transmitted data to the external communications interface 18 via the sync bus 29.
The multiplexing and demultiplexing portion 17 demultiplexes encoded video data, encoded audio data, and other data from the transmitted data supplied from the communications interface 18 via the sync bus 29. The multiplexing and demultiplexing portion 17 supplies the demultiplexed data to the video codec 12, the voice codec 20, and the main controller 10, respectively, via the sync buses 28, 30, and the main bus 27.
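The multiplexing and demultiplexing roles described above can be illustrated with a toy framing scheme. The sketch below is not H.221: it assumes a made-up format in which each chunk is prefixed by a one-byte channel tag and a two-byte length, and the names `multiplex`, `demultiplex`, `VIDEO`, `AUDIO`, and `CONTROL` are hypothetical. It only shows how three streams can share one transmitted stream and be separated again.

```python
import struct
from typing import Dict, Iterable, List, Tuple

# Hypothetical channel tags for the three kinds of data named in the text.
VIDEO, AUDIO, CONTROL = 0, 1, 2

# Assumed toy header: 1-byte channel tag, 2-byte big-endian payload length.
_HEADER = struct.Struct(">BH")


def multiplex(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Interleave (channel, payload) chunks into one byte stream."""
    out = bytearray()
    for channel, payload in chunks:
        out += _HEADER.pack(channel, len(payload))
        out += payload
    return bytes(out)


def demultiplex(stream: bytes) -> Dict[int, List[bytes]]:
    """Split a multiplexed stream back into per-channel payload lists."""
    channels: Dict[int, List[bytes]] = {VIDEO: [], AUDIO: [], CONTROL: []}
    offset = 0
    while offset < len(stream):
        channel, length = _HEADER.unpack_from(stream, offset)
        offset += _HEADER.size
        channels[channel].append(stream[offset:offset + length])
        offset += length
    return channels


sent = multiplex([(VIDEO, b"frame-1"), (AUDIO, b"pcm-1"), (CONTROL, b"ring")])
received = demultiplex(sent)
```

In the device described here, the video and audio payloads would come from and go to the codecs over the sync buses, while the control payloads would travel over the main bus to the main controller.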
The external communications interface 18 is used to make a connection to the external communications network, which, as previously mentioned, may be any suitable network such as, but not limited to, a wireless network, a conventional telephone network, a data network (e.g., the Internet), and a cable modem system. The interface 18 makes various calls for communications via the communications network and sends and receives voice and video data via communications paths established in the network.
The voice codec 20 digitizes the analog audio signal applied via the microphone 21 and the microphone interface 22. The codec 20 encodes the signal by a given audio encoding method such as ADPCM to create encoded audio data, and sends the encoded audio data to the multiplexing and demultiplexing portion 17 via the sync bus 30.
The voice codec 20 decodes the encoded audio data supplied from the multiplexing and demultiplexing portion 17 into an analog audio signal, which is supplied to the speaker interface 23.
The microphone 21 converts sound from the surroundings into an audio signal and supplies it to the microphone interface 22, which in turn converts the audio signal supplied from the microphone 21 into a signal form that can be processed by the voice codec 20 and supplies it to the voice codec 20.
The speaker interface 23 converts the audio signal supplied from the voice codec 20 into a signal form capable of being processed by the speaker 24, and supplies the converted signal to the speaker 24. The speaker 24 converts the audio signal supplied from the speaker interface 23 into an audible signal at an increased level.
The manual control portion 25 receives various instructions of the user to be applied to the main controller 10. The manual control portion 25 has control buttons for specifying various functions, push buttons for entering phone numbers and various numerical values, and a power switch for turning on and off the operation of the present terminal. The manual entry control circuit portion 26 recognizes the contents of an instruction entered from the manual control portion 25 and informs the main controller 10 of the contents of the instruction.
Image memory 32 stores (at least on a temporary basis) one or more facial images of each individual who will be using the video telephony device 100. Prior to use, a registration process will be performed in which these individuals will have their images captured by camera 15 and stored in image memory 32. The images will be associated with the names of each individual, which may be entered manually via the manual control portion 25. The stored images of each individual are converted to a facial representation or template. The representation or template may correspond to an image or simply a set of points and vectors between them identifying selected features of the face. Alternatively, the representation may be a single parameter corresponding to something as simple as eye color or the distance between the individual's eyes. These representations or templates are stored in face template memory 34. Once the representations or templates have been obtained, the images stored in image memory 32 may be deleted. If desired, image memory 32 and face template memory 34 may be implemented as part of the memory 120 incorporated in main controller 10. This memory may also store an image recognition software program, discussed below.
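As a rough illustration of a template built from "a set of points and vectors between them", the sketch below reduces a handful of facial landmark coordinates to pairwise distances normalized by the distance between the eyes, and then picks the closest stored template. Everything specific here is an assumption made for illustration: the landmark names, the `make_template` and `best_match` functions, the Euclidean comparison, and the threshold value. Real facial recognition software would use a far richer representation.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Template = Tuple[float, ...]


def make_template(landmarks: Dict[str, Point]) -> Template:
    """Reduce landmark points to pairwise distances, scaled by eye spacing.

    Scaling by the inter-eye distance makes the template independent of
    how large the face appears in the captured image.
    """
    eye_dist = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    names = sorted(landmarks)  # fixed order so templates are comparable
    return tuple(
        math.dist(landmarks[a], landmarks[b]) / eye_dist
        for a, b in combinations(names, 2)
    )


def best_match(probe: Template, stored: Dict[str, Template],
               threshold: float = 0.1) -> Optional[str]:
    """Return the name of the closest stored template, or None if too far."""
    best_name, best_score = None, float("inf")
    for name, template in stored.items():
        score = math.dist(probe, template)
        if score < best_score:
            best_name, best_score = name, score
    return best_name if best_score <= threshold else None


# Hypothetical landmark coordinates captured at registration.
mom = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60), "mouth": (50, 80)}
dad = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 70), "mouth": (50, 100)}
face_template_memory = {"mom": make_template(mom), "dad": make_template(dad)}
```

Storing only the derived tuple, rather than the image, is what allows the captured images to be deleted once the templates have been obtained.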
In steps 310-315 of the flowchart of
If a face is detected in step 310 (e.g., if an individual approaches the video telephony device and thus enters the field of view of camera 15), decision step 315 succeeds and the system proceeds to construct a face template of the detected face. Thus, in step 325 the system extracts the detected face from the video signal provided by camera 15. The system proceeds to step 330 where it converts the facial image into a facial representation or template that is temporarily stored in memory.
At this point, the system attempts to match the acquired facial representation against the facial representations of the N individuals stored in face template memory 34. As shown in
Continuing with
In one alternative, instead of performing the continuous loop established by steps 310 and 315 in which the video telephony device repeatedly searches for a face, the camera may be used as a proximity detector to determine when a face has come within some predetermined distance (e.g., 2 feet) of the telephony device. In this case the telephony device may remain in a sleep mode until the triggering event (e.g., detection of a face) occurs, at which point an interrupt is sent to the controller 10 (or a software event generated) requesting it to begin the registration process. In the sleep mode the video telephony device may power down or place in a standby mode a variety of different components including, for example, the display 14, the camera portion 15, and the main controller 10. In some cases the video telephony device may incorporate a dedicated sensor that serves as the proximity detector instead of the camera. For example, a heat sensor, motion detector or the like may be used as a proximity detector to determine when a triggering event (e.g., detection of motion, detection of body temperature) has occurred that is indicative of the presence of an individual who is ready to use the telephony device.
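The sleep-until-triggered behavior can be sketched as a two-state machine driven by proximity readings. The sketch assumes the dedicated-sensor variant, so that the camera itself can be powered down; the `TelephonyDevice` class, its method names, and the idea of a sensor driver reporting distances in feet are hypothetical and serve only to illustrate the wake-up logic.

```python
from enum import Enum, auto


class Mode(Enum):
    SLEEP = auto()
    ACTIVE = auto()


class TelephonyDevice:
    TRIGGER_DISTANCE_FEET = 2.0  # example threshold from the text

    def __init__(self) -> None:
        self.mode = Mode.SLEEP
        # Components the text says may be powered down in sleep mode.
        self.powered = {"display": False, "camera": False, "main_controller": False}

    def on_proximity_reading(self, distance_feet: float) -> None:
        """Called by a (hypothetical) proximity sensor driver for each reading."""
        if self.mode is Mode.SLEEP and distance_feet <= self.TRIGGER_DISTANCE_FEET:
            self._wake()

    def _wake(self) -> None:
        # Plays the role of the interrupt or software event sent to the controller.
        for component in self.powered:
            self.powered[component] = True
        self.mode = Mode.ACTIVE

    def sleep(self) -> None:
        for component in self.powered:
            self.powered[component] = False
        self.mode = Mode.SLEEP


device = TelephonyDevice()
device.on_proximity_reading(6.0)   # too far away: stays asleep
device.on_proximity_reading(1.5)   # within range: wakes up
```

Compared with the continuous search loop, this arrangement leaves only the sensor running while the device is idle.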
Returning to step 500, if instead of registering a new user, the user specifies that he or she is an existing user, the process continues with step 550 instead of step 510. In step 550 the user confirms that he or she wants to revise the preferences and information stored in the user database 11. Preliminarily, in step 560, the user is asked if he or she has had a change in facial features that may be sufficient to prevent recognition as an existing user. That is, a query could be presented to the user along the lines of “Do you want to re-initialize the phone so that it will recognize your current appearance or look?” For example, the user may have recently grown or shaved a beard or begun wearing glasses, which could interfere with the recognition process. In this case the user is requested in step 580 to enter a PIN or other personal identifier that may be used as an alternative form of identification and which has been previously stored in user database 11. Once the user has been so recognized, a new image of the user is obtained in step 585, from which is extracted a new facial representation or template that is stored in face template memory 34 (either replacing or supplementing the currently stored facial representation or template), after which the process continues with step 590. On the other hand, if in response to the query of step 560 the user indicates in step 570 that there has been no change in facial features, the process proceeds to step 565 in which a facial representation is acquired for comparison to the stored representations of registered users, after which the process proceeds to step 590. In step 590, the user is presented with the opportunity to edit and revise his or her various records stored in user database 11.
Although various embodiments are specifically illustrated and described herein, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and are within the purview of the appended claims without departing from the spirit and intended scope of the invention. For example, while the above systems and methods have been described in terms of a video telephony device that resides in or on a fixed location such as a desk, the systems and methods could also be used in a cellphone or other mobile phone environment.
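The branch between PIN-based re-initialization (steps 580 and 585) and ordinary facial recognition (step 565) can be sketched as a single function that reports whether the user may go on to edit records in step 590. The function and parameter names are hypothetical, the exact-equality matcher is a placeholder for real template comparison, and the handling of a wrong PIN or a failed match (returning False) is an assumption, since the description does not say what happens in those cases.

```python
from typing import Callable, Dict, Optional, Tuple

Template = Tuple[float, ...]


def exact_match(probe: Template, templates: Dict[str, Template]) -> Optional[str]:
    # Placeholder matcher: real software would tolerate small differences.
    for name, template in templates.items():
        if template == probe:
            return name
    return None


def update_existing_user(
    user: str,
    appearance_changed: bool,                 # answer to the step 560 query
    capture_template: Callable[[], Template], # hypothetical camera + extraction hook
    templates: Dict[str, Template],           # stands in for face template memory 34
    pins: Dict[str, str],                     # PINs previously stored in user database 11
    entered_pin: Optional[str] = None,
    matcher: Callable[[Template, Dict[str, Template]], Optional[str]] = exact_match,
) -> bool:
    """Return True if the user may proceed to edit records (step 590)."""
    if appearance_changed:
        # Step 580: identify the user by PIN instead of by face.
        if entered_pin is None or pins.get(user) != entered_pin:
            return False  # assumption: a wrong PIN ends the flow
        # Step 585: capture a new template; this sketch replaces the old one.
        templates[user] = capture_template()
        return True
    # Step 565: acquire a representation and compare it with stored ones.
    return matcher(capture_template(), templates) == user


templates = {"mom": (1.0, 2.0)}
pins = {"mom": "4321"}
```

This sketch replaces the stored template in step 585; supplementing it instead, as the text also allows, would mean keeping a list of templates per user.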