Video conferencing has become a popular way to participate in meetings, seminars and other such activities. In a multi-party video conferencing session, users often see a remote participant on their conference displays but have no idea who that participant is. Other times users have a vague idea of who someone is but would like to know for certain, or may know the names of some people but not which name goes with which person. Sometimes users want to know not only a person's name but also other information, such as what company that person works for, and so forth. This is even more problematic in a one-to-many video conference, where there may be relatively large numbers of people who do not know one another.
At present, there is no way for users to obtain such information, other than by chance, by multiple (often time-consuming) introductions in which people verbally introduce themselves (including remotely over video), or if a person has a name tag, name plate or the like that the user is able to see. It is desirable for users to have information about others in video conferencing sessions, including without having to rely on verbal introductions and the like.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards a technology by which an entity such as a person or object is recognized, with associated metadata used to identify that entity when it appears in a video session. For example, when a video session shows a person's face or an object, that face or object may be labeled (e.g., via text overlay) with a name and/or other related information.
In one aspect, an image of a face that is shown within a video session is captured. Facial recognition is performed to obtain metadata associated with the recognized face. The metadata is then used to label the video session, such as to identify a person corresponding to the recognized face when the recognized face is being shown during the video session. The facial recognition matching process may be narrowed by other, known narrowing information, such as calendar information that indicates who the invitees are to a meeting that is being shown in the video session.
Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards automatically inserting metadata (e.g., overlaid text) into a live or prerecorded/played back video conferencing session based on a person or object currently on the display screen. In general, this is accomplished by automatically identifying the person or object, and then using that identification to retrieve relevant information, such as the person's name and/or other data.
It should be understood that any of the examples herein are non-limiting. Indeed, the use of facial recognition is described herein as one type of identification mechanism for persons; however, other sensors, mechanisms and/or ways that work to identify people, as well as to identify other entities such as inanimate objects, are equivalent. As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in computing, data retrieval, and/or video labeling in general.
The narrowing module 108 receives data from the sensor or sensors 106 and provides it to a recognition mechanism 110 (note that in an alternative implementation, one or more of the sensors may more directly provide their data to the recognition mechanism 110). In general, the recognition mechanism 110 queries a data store 112 to identify the entity 104 based on the sensor-provided data. Note that as described below, the query may be formulated to narrow the search based upon narrowing information received from the narrowing module 108.
Assuming that a match is found, the recognition mechanism 110 outputs a recognition result, e.g., the metadata 102 for the sensed entity 104. This metadata may be in any suitable form, e.g., an identifier (ID) useful for further lookup, and/or a set of results already looked up, such as in the form of text, graphics, video, audio, animation, or the like.
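By way of example only, the following Python sketch illustrates one possible shape of this recognition flow; the data structures, the placeholder similarity scoring and all identifiers are illustrative assumptions rather than part of the described technology:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Metadata:
    entity_id: str   # an identifier useful for further lookup
    label: str       # e.g., the person's name, for overlay as text
    extra: dict      # e.g., company, title, links

# Hypothetical data store (112): entity_id -> (feature vector, metadata).
DATA_STORE: dict[str, tuple[list[float], Metadata]] = {}

def similarity(a: list[float], b: list[float]) -> float:
    # Placeholder scoring; a real matcher would use a proper
    # face-embedding distance (cosine, Euclidean, etc.).
    return sum(x * y for x, y in zip(a, b))

def recognize(features: list[float],
              candidates: Optional[set[str]] = None,
              threshold: float = 0.8) -> Optional[Metadata]:
    """Match sensed features against the data store; `candidates`,
    when supplied by the narrowing module (108), restricts the search."""
    space = (DATA_STORE if candidates is None
             else {k: v for k, v in DATA_STORE.items() if k in candidates})
    best_id, best_score = None, threshold
    for entity_id, (stored, _) in space.items():
        score = similarity(features, stored)
        if score > best_score:
            best_id, best_score = entity_id, score
    return DATA_STORE[best_id][1] if best_id else None
```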
A video source 114, such as a video camera (which also may be a sensor as indicated by the dashed block/line) or a video playback mechanism, provides a video output 116, e.g., a video stream. When the entity 104 is shown, the metadata 102 is used (directly or to access other data) by a labeling mechanism 118 to associate corresponding information with the video feed. In the example of
Another example output is to have a display or the like viewable to occupants of a meeting or conference room, possibly accompanying a video screen. When a speaker stands behind a podium, or when one person of a panel of speakers is talking, the person's name may appear on the display. A questioner in the audience may similarly be identified and have his or her information output in this way.
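By way of example only, a minimal sketch of such a labeling mechanism follows, assuming the OpenCV (cv2) and NumPy libraries and an already-determined face bounding box; the coordinates and the name/company text are purely illustrative:

```python
import cv2
import numpy as np

def label_frame(frame: np.ndarray,
                box: tuple[int, int, int, int],
                label: str) -> np.ndarray:
    """Draw a rectangle around the recognized face and overlay the
    label text just above it."""
    x, y, w, h = box
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, label, (x, max(y - 10, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame

# Usage with a synthetic frame and an illustrative name/company:
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame = label_frame(frame, (200, 120, 160, 160), "Jane Doe - Contoso")
```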
For facial recognition, the search of the data store 112 may be time consuming, whereby narrowing the search based upon other information may be more efficient. To that end, the narrowing module 108 also may receive additional information related to the entity from any suitable information provider 122 (or providers). For example, a video camera may be set up in a meeting room, and calendar information that establishes who the invitees to the meeting room are at that time may be used to help narrow the search. Conference participants typically register for the conference, and thus a list of those participants may be provided as additional information for narrowing the search. Other ways of obtaining narrowing information may include making predictions based on organization information, learning meeting attendance patterns based upon past meetings (e.g., which people typically go to meetings together), and so forth. The narrowing module 108 can convert such information to a form useable by the recognition mechanism 110 in formulating a query or the like to narrow the search candidates.
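By way of example only, the following sketch shows how a narrowing module might convert calendar information into a candidate set; the calendar structure and identifiers are hypothetical stand-ins for whatever calendar or registration system actually supplies this data:

```python
from datetime import datetime

# Hypothetical calendar data: room -> list of (start, end, invitee IDs).
CALENDAR = {
    "conf-room-3": [
        (datetime(2011, 5, 2, 9, 0), datetime(2011, 5, 2, 10, 0),
         {"emp-001", "emp-042", "emp-107"}),
    ],
}

def invitees_for(room: str, when: datetime) -> set[str]:
    """Return the invitee IDs for whatever meeting is booked in `room`
    at time `when`; the result becomes the candidate set passed to the
    recognition mechanism (e.g., the `candidates` argument above)."""
    for start, end, ids in CALENDAR.get(room, []):
        if start <= when <= end:
            return set(ids)
    return set()  # no narrowing information; search the full scope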
Instead of or in addition to facial recognition, various other types of sensors are feasible for use in identification and/or narrowing. For example, a microphone can be coupled to voice recognition technology that can match a speaker's voice to a name; a person can speak their name as a camera captures their image, with the name recognized as text. Badges and/or nametags may be read to directly identify someone, such as via text recognition, or by being outfitted with visible barcodes, RFID technology or the like. Sensing may also be used for narrowing a facial or voice recognition search; e.g., many types of badges are already sensed upon entry to a building, and/or RFID technology can be used to determine who has entered a meeting or conference room. A cellular telephone or other device may broadcast a person's identity, e.g., via Bluetooth® technology.
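By way of example only, the identities reported by such sensors may be combined into a single candidate set for narrowing; the sensor feeds and IDs below are hypothetical:

```python
def combined_candidates(*sensor_reads: set[str]) -> set[str]:
    """Union the entity IDs reported by each sensor; any one sensor may
    have seen a person the others missed, so the union keeps recall high
    while remaining far smaller than the full data store."""
    candidates: set[str] = set()
    for ids in sensor_reads:
        candidates |= ids
    return candidates

# e.g., badge swipes at the building entrance plus RFID reads in the room:
room_candidates = combined_candidates({"emp-042"}, {"emp-042", "emp-107"})
```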
Moreover, the data store 112 may be populated by a data provider 124 with less than all of the available data that can be searched. For example, a corporate employee database may maintain pictures of its employees as used with their ID badges. Visitors to a corporate site may be required to have their photograph taken, along with providing their name, in order to be allowed entry. A data store of only employees and current visitors may be built and searched first. For a larger enterprise, an employee who enters a particular building may do so via his or her badge, and thus the employees currently present within a building are generally known via a badge reader, whereby a per-building data store may be searched first.
In the event a suitable match (e.g., to a sufficient probability level) is not found while searching, the search may be expanded. Using one of the examples above, if one employee enters a building with another and does not use his or her badge for entry, then a search of the building's known occupants will not find a suitable match. In such a situation, the search may be expanded to the entire employee database, and so on (e.g., previous visitors). Note that ultimately the result may be “person not recognized” or the like. Bad input may also cause problems, e.g., poor lighting, poor viewing angle, and so forth.
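By way of example only, the following sketch captures such a tiered search, trying the smallest, most likely store first and expanding outward on a miss; the tier ordering and the matcher are illustrative assumptions:

```python
from typing import Callable, Optional

def tiered_search(features: list[float],
                  tiers: list[dict],
                  match: Callable[[list[float], dict], Optional[str]]
                  ) -> Optional[str]:
    """`tiers` is ordered narrowest-first, e.g. [building occupants and
    current visitors, entire employee database, previous visitors];
    `match` returns an entity ID or None for a given store."""
    for store in tiers:
        result = match(features, store)
        if result is not None:
            return result       # matched at this scope; stop expanding
    return None                 # ultimately "person not recognized"
```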
An object may be similarly recognized for labeling. For example, a user may hold up a device, such as a digital camera, or show a picture of one. A suitable data store may be searched with the image to find the exact brand name, model, suggested retail price, and so on, which may then be used to label the user's view of the image.
When a video capture source 226 obtains a facial image 228, the image is provided to the face recognition mechanism 230, which calls the web service (or any other mechanism that provides metadata for a given face or entity) requesting a label (or other metadata) be returned with the face. The web service responds with the label, which is then passed to a face labeling mechanism 232, such as one that overlays text on the image, thereby providing a labeled image 234 of the face. The face recognition mechanism 230 can store facial/labeling information in a local cache 236 for efficiency in labeling the face the next time that the face appears.
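By way of example only, a sketch of the caching flow around the face recognition mechanism (230) and local cache (236) follows; the service callback is hypothetical, and a real cache key would be a stable face signature that matches the same face across frames rather than the raw-byte hash used here:

```python
import hashlib

CACHE: dict[str, str] = {}  # face key -> previously returned label

def face_key(image_bytes: bytes) -> str:
    # Stand-in key; a real key would be a stable face signature.
    return hashlib.sha256(image_bytes).hexdigest()

def get_label(image_bytes: bytes, call_service) -> str:
    """Consult the local cache (236) first; on a miss, call the web
    service and remember its answer for the next time the face appears."""
    key = face_key(image_bytes)
    if key not in CACHE:
        CACHE[key] = call_service(image_bytes)
    return CACHE[key]
```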
The facial recognition thus may be performed at a remote service, by sending the image of the person's face, possibly along with any narrowing information that is known, to the service. The service may then perform the appropriate query formulation and/or matching. However, some or all of the recognition may be performed locally. For example, the user's local computer may extract a set of features representative of a face, and use or send those features to search a remote database of such features. Still further, the service may be receiving the video feed; if so, a frame number and location within the frame where the face appears may be sent to the service, whereby the service can extract the image for processing.
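By way of example only, the two request shapes just described might be serialized as follows; all field names are assumptions, not an actual service protocol:

```python
import json

def features_request(features: list[float], candidates: list[str]) -> str:
    # Locally extracted features plus any known narrowing information.
    return json.dumps({"features": features, "candidates": candidates})

def frame_reference_request(frame_number: int,
                            box: tuple[int, int, int, int]) -> str:
    # When the service already receives the feed, only a frame number
    # and the face's location within that frame need be sent.
    x, y, w, h = box
    return json.dumps({"frame": frame_number,
                       "face_box": {"x": x, "y": y, "w": w, "h": h}})
```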
Moreover, as described above, the metadata need not include a label, but rather may be an identifier or the like from which a label and/or other information may be looked up. For example, an identifier may be used to determine a person's name or identity, biographical information such as the person's company, links to that person's website, publications and so forth, his or her telephone number, email address, place within an organizational chart, and the like.
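By way of example only, expanding an identifier into displayable information might look like the following sketch; the directory contents are entirely fictitious:

```python
# Hypothetical directory keyed by the identifier returned as metadata.
DIRECTORY = {
    "emp-042": {
        "name": "Jane Doe",
        "company": "Contoso",
        "website": "https://example.com/~jdoe",
        "telephone": "555-0100",
        "email": "jdoe@example.com",
        "manager": "emp-007",  # place within an organizational chart
    },
}

def expand(entity_id: str) -> dict:
    """Resolve an identifier into displayable information; the label may
    show only the name, with the rest revealed on user interaction."""
    return DIRECTORY.get(entity_id, {"name": "(person not recognized)"})
```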
Such additional information may be dependent on user interaction with the user interface 220. For example, the user may at first see only a label, but be able to expand and collapse additional information with respect to that label. A user may be able to otherwise interact with a label (e.g., click on it) to obtain more viewing options.
Steps 306 and 308 represent the use of narrowing information when available. As described above, any narrowing information may be used to make the search more efficient, at least initially. The above example of calendar information used to provide a list of meeting attendees, or a registration list of conference participants, can make a search far more efficient.
Step 310 represents formulating a query to match a face to a person's identity. As described above, the query may include a list of faces to search. Note that step 310 also represents searching a local cache or the like when available.
Step 312 represents receiving the results of the search. In the example of
If no match is found, step 316 represents evaluating whether the search scope may be expanded for another search attempt. By way of example, consider a meeting in which someone who was not invited decides to attend. Narrowing the search via calendar information will result in not finding a match for that uninvited person. In such an event, the search scope may be expanded (step 320) in some way, such as to look for people in the company who are hierarchically above or below the attendees, e.g., the people they report to or who report to them. Note that the query may need to be reformulated to expand the search scope, and/or a different data store may be searched. If still no match is found at step 314, the search expansion may continue to the entire employee database or visitor database if needed, and so on. If no match is found, step 318 can return something that indicates this non-recognized state.
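By way of example only, one such expansion strategy from step 320 may be sketched as follows; the organizational-chart structure and identifiers are hypothetical:

```python
# Hypothetical organizational chart: entity_id -> (manager, direct reports).
ORG = {
    "emp-042": ("emp-007", {"emp-301", "emp-302"}),
    "emp-007": (None, {"emp-042", "emp-043"}),
}

def expand_scope(candidates: set[str]) -> set[str]:
    """Widen the candidate set by one hop up and down the organizational
    chart; a caller may apply this repeatedly (reformulating the query
    each time) before falling back to the entire employee database."""
    widened = set(candidates)
    for person in candidates:
        manager, reports = ORG.get(person, (None, set()))
        if manager:
            widened.add(manager)
        widened |= reports
    return widened
```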
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer 410 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 410 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 410. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above may also be included within the scope of computer-readable media.
The system memory 430 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 431 and random access memory (RAM) 432. A basic input/output system 433 (BIOS), containing the basic routines that help to transfer information between elements within computer 410, such as during start-up, is typically stored in ROM 431. RAM 432 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 420. By way of example, and not limitation,
The computer 410 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, described above and illustrated in
The computer 410 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 480. The remote computer 480 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 410, although only a memory storage device 481 has been illustrated in
When used in a LAN networking environment, the computer 410 is connected to the LAN 471 through a network interface or adapter 470. When used in a WAN networking environment, the computer 410 typically includes a modem 472 or other means for establishing communications over the WAN 473, such as the Internet. The modem 472, which may be internal or external, may be connected to the system bus 421 via the user input interface 460 or other appropriate mechanism. A wireless networking component such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 410, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
An auxiliary subsystem 499 (e.g., for auxiliary display of content) may be connected via the user interface 460 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 499 may be connected to the modem 472 and/or network interface 470 to allow communication between these systems while the main processing unit 420 is in a low power state.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.