This application claims priority to Korean Patent Application No. ______ filed on ______, the contents of which are incorporated herein by reference in their entirety.
1. Technical Field
Embodiments of the present invention are directed to an electronic device and a method of operating the electronic device, and more specifically to an electronic device that may be used for a videoconference and a method of controlling the electronic device.
2. Discussion of the Related Art
Tele-presence refers to a set of technologies that allow a person to feel as if they were present at a remote location. Tele-presence technologies reproduce, at the remote location, the five-sense information a person would perceive in a specific space. Element technologies for tele-presence may include video, audio, tactile, and network transmission technologies. Such tele-presence technologies are adopted for video conference systems. Tele-presence-based video conference systems provide higher-quality communications and allow users to concentrate more fully on the conversation compared to conventional video conference systems.
The tele-presence technologies for video conference systems, although differing somewhat from manufacturer to manufacturer, may be applied to video, audio, and network transmission technologies as follows:
For video technologies, the tele-presence technologies are applied to generate natural eye-contact images that make a user feel more as if he were actually facing another user, and to generate high-resolution images. For audio technologies, the tele-presence technologies are applied as audio playback technologies that create a sense of space based on a speaker's location. For network transmission technologies, the tele-presence technologies are applied as real-time image/sound transmission technologies based on an MCU (Multipoint Control Unit).
In contrast to video, audio, and network transmission technologies for video conference systems, which have been actively researched, data sharing between attendees in a conference is still not satisfactory. Current video conference systems use a separate monitor for data sharing. Accordingly, when a user shifts his eyes from an image screen to a data screen, eye contact is not maintained, reducing the feeling of actually facing another user. Moreover, a brief interruption in conversation occurs with every data manipulation because the data manipulation is conducted through a peripheral device, such as a mouse.
Embodiments of the present invention provide an electronic device and a method of operating the electronic device, which may allow for a vivid video conference.
According to an embodiment of the present invention, there is provided an electronic device including a memory, a communication unit configured to receive a video image streamed from a first electronic device, a display unit configured to display the video image, and a control unit configured to identify a specific area included in the video image, to store a first image displayed on the specific area at a first time point in the memory, and to store in the memory a second image displayed on the specific area at a second time point when a variation of an image is equal to or more than a predetermined threshold.
The control unit is configured to store the first image so that the first image corresponds to the first time point and to store the second image so that the second image corresponds to the second time point.
The first and second images are still images.
The control unit is configured to determine the variation of the image displayed on the specific area based on a variation between the image displayed on the specific area and the first image stored in the memory.
The control unit is configured to analyze the video image to identify the specific area.
The control unit is configured to receive information on the specific area from the first electronic device through the communication unit and to identify the specific area based on the received information.
According to an embodiment of the present invention, there is provided an electronic device including a memory, a communication unit configured to receive a video image streamed from a first electronic device, a display unit configured to display the video image, and a control unit configured to identify a specific area included in the video image, to store in the memory a still image reflecting a content displayed on the specific area whenever the content changes, so that the still image corresponds to a time point when the content changes, to determine a time point corresponding to a predetermined request when the request is received, and to call a still image corresponding to the determined time point from the memory.
The control unit is configured to display both the streamed video image and the called still image on the display unit.
The control unit is configured to display the still image on a second area of the video image, the second area not overlapping the specific area.
The control unit is configured to replace an image displayed on the specific area of the streamed video image with the still image and to display the replaced still image on the display unit.
According to an embodiment of the present invention, there is provided an electronic device including a memory, a communication unit configured to receive at least one multimedia data clip from at least one second electronic device, a display unit configured to display the at least one multimedia data clip, and a control unit configured to identify a first speaker corresponding to audio data included in the at least one multimedia data clip, to obtain information corresponding to the identified first speaker, and to store the obtained information so that the obtained information corresponds to a first time point for the at least one multimedia data clip.
The first time point is when the first speaker begins to speak.
The control unit is configured to analyze audio data included in the at least one multimedia data clip and to determine that the first time point is when a human voice included in the audio data is sensed.
The control unit is configured to analyze video data included in the at least one multimedia data clip to identify the first speaker.
The control unit is configured to identify the first speaker based on a lip motion included in the video data.
Information relating to the first speaker includes at least one of personal information on the first speaker, information on a place where the first speaker is positioned, and a keyword which the first speaker speaks.
According to an embodiment of the present invention, there is provided an electronic device including a communication unit configured to receive at least one multimedia data clip streamed from at least one second electronic device, a memory configured to store the at least one multimedia data clip, a display unit configured to display video data included in the at least one multimedia data clip, and a control unit configured to, whenever a speaker corresponding to audio data included in the at least one multimedia data clip changes, store information corresponding to the speaker so that the information corresponds to a time point when the speaker changes, to determine a time point corresponding to a predetermined input when the predetermined input is received, and to call at least part of a multimedia data clip corresponding to the determined time point from the memory.
The control unit is configured to display both the video data included in the streamed at least one multimedia data clip and video data included in the called at least part of the multimedia data clip.
The electronic device further includes a sound output unit, wherein the control unit is configured to output through the sound output unit at least one of audio data included in the streamed at least one multimedia data clip and audio data included in the called at least part of the multimedia data clip.
The control unit is configured to display both the video data included in the streamed at least one multimedia data clip and text data corresponding to audio data included in the called at least part of the multimedia data clip.
The embodiments of the present invention may provide the following effects.
First, the second user who attends the video conference at the second place may store the image for the presentation material without separately receiving data for the presentation material provided by the first user who conducts the video conference at the first place. The image for the presentation material may be extracted from the video image provided through the video conference and stored by the electronic device used by the second user, without any inconvenient process such as the second user previously receiving separate electronic data (for example, electronic files) for the presentation material used for the video conference by the first user.
Second, in the case that the conference is performed with materials that are difficult to convert into data (for example, samples used for introducing a prototype model), or the first user has not converted the presentation material used for the conference into electronic data in advance, according to an embodiment, the presentation material may be converted into image data at the same time it is used for the conference, so that the second user may later see again the presentation material used for the video conference.
Third, the second user may review the previous pages of the presentation material used for the video conference hosted by the first user while the video conference is in progress, thereby enabling a more efficient video conference.
Fourth, the electronic device may continue to monitor the speakers during the course of the video conference and may store various types of information on the speakers so that the information corresponds to the time points when the speakers begin to speak, thereby generating metadata for the video conference. Further, the video conference metadata may be used in various manners, thus enhancing user convenience. For example, the video conference metadata may be used to prepare brief minutes of the video conference to be provided to the attendees of the conference, or may be used to provide a search function that allows the attendees to review the conference.
Fifth, by identifying the speaker and outputting the multimedia data clip corresponding to the speaker in a different manner from those for the other multimedia data clips, more attention can be directed toward the user who is making a speech in the video conference, thereby enabling the video conference to proceed more efficiently.
Finally, during or after the video conference, specific time points of the multimedia data for the video conference may be searched to review the video conference, and the multimedia data corresponding to the searched time points may be output.
The embodiments of the present invention will become readily apparent by reference to the following detailed description when considered in conjunction with the accompanying drawings wherein:
The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. The invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
Hereinafter, a mobile terminal relating to the present invention will be described in more detail with reference to the accompanying drawings. In the following description, the suffixes “module” and “unit” are given to components of the mobile terminal merely to facilitate description and do not have meanings or functions distinguished from each other.
Referring to
The communication unit 110 may include one or more modules that enable communication between the electronic device 100 and a communication system or between the electronic device 100 and another device. For instance, the communication unit 110 may include a broadcast receiving unit 111, an Internet module 113, and a near-field communication module 114.
The broadcast receiving unit 111 receives broadcast signals and/or broadcast-related information from an external broadcast managing server through a broadcast channel.
The broadcast channel may include a satellite channel and a terrestrial channel. The broadcast managing server may refer to a server that generates broadcast signals and/or broadcast-related information and broadcasts the signals and/or information, or a server that receives pre-generated broadcast signals and/or broadcast-related information and broadcasts the signals and/or information to a terminal. The broadcast signals may include TV broadcast signals, radio broadcast signals, and data broadcast signals, as well as combinations of data broadcast signals with TV broadcast signals or radio broadcast signals.
The broadcast-related information may refer to information relating to broadcast channels, broadcast programs, or broadcast service providers. The broadcast-related information may be provided through a communication network.
The broadcast-related information may exist in various forms, such as, for example, EPGs (Electronic Program Guides) of DMB (Digital Multimedia Broadcasting) or ESGs (Electronic Service Guides) of DVB-H (Digital Video Broadcast-Handheld).
The broadcast receiving unit 111 may receive broadcast signals using various broadcast systems. Broadcast signals and/or broadcast-related information received through the broadcast receiving unit 111 may be stored in the memory 160.
The Internet module 113 may refer to a module for access to the Internet. The Internet module 113 may be provided inside or outside the electronic device 100.
The near-field communication module 114 refers to a module for near-field communication. Near-field communication technologies may include Bluetooth, RFID (Radio Frequency Identification), IrDA (Infrared Data Association), UWB (Ultra Wideband), and ZigBee technologies.
The user input unit 120 is provided for a user's entry of audio or video signals and may include a camera 121 and a microphone 122.
The camera 121 processes image frames including still images or videos as obtained by an image sensor in a video call mode or image capturing mode. The processed image frames may be displayed by the display unit 151. The camera 121 may perform 2D or 3D image capturing or may be configured as one or a combination of 2D and 3D cameras.
The image frames processed by the camera 121 may be stored in the memory 160 or may be transmitted to an outside device through the communication unit 110. According to an embodiment, two or more cameras 121 may be included in the electronic device 100.
The microphone 122 receives external sound signals in a call mode, recording mode, or voice recognition mode and processes the received signals as electrical voice data. The microphone 122 may perform various noise cancelling algorithms to remove noises created when receiving the external sound signals. A user may input various voice commands through the microphone 122 to the electronic device 100 to drive the electronic device 100 and to perform functions of the electronic device 100.
The output unit 150 may include a display unit 151 and a sound output unit 152.
The display unit 151 displays information processed by the electronic device 100. For example, the display unit 151 displays a UI (User Interface) or GUI (Graphic User Interface) associated with the electronic device 100. The display unit 151 may be at least one of a liquid crystal display, a thin film transistor liquid crystal display, an organic light emitting diode display, a flexible display, and a 3D display. The display unit 151 may be configured in a transparent or light-transmissive type, which may be called a “transparent display,” examples of which include transparent LCDs. The display unit 151 may have a light-transmissive rear structure in which a user may view an object positioned behind the terminal body through an area occupied by the display unit 151 in the terminal body.
According to an embodiment, two or more display units 151 may be included in the electronic device 100. For instance, the electronic device 100 may include a plurality of display units 151 that are integrally or separately arranged on a surface of the electronic device 100 or on respective different surfaces of the electronic device 100.
When the display unit 151 and a sensor sensing a touch (hereinafter, referred to as a “touch sensor”) are layered (this layered structure is hereinafter referred to as a “touch screen”), the display unit 151 may be used as an input device as well as an output device. The touch sensor may include, for example, a touch film, a touch sheet, or a touch pad.
The touch sensor may be configured to convert a change in pressure or capacitance, which occurs at a certain area of the display unit 151, into an electrical input signal. The touch sensor may be configured to detect the pressure exerted during a touch as well as the position or area of the touch.
Upon touch on the touch sensor, a corresponding signal is transferred to a touch controller. The touch controller processes the signal to generate corresponding data and transmits the data to the control unit 180. By doing so, the control unit 180 may recognize the area of the display unit 151 where the touch occurred.
The sound output unit 152 may output audio data received from the communication unit 110 or stored in the memory 160. The sound output unit 152 may output sound signals associated with functions (e.g., call signal receipt sound, message receipt sound, etc.) performed by the electronic device 100. The sound output unit 152 may include a receiver, a speaker, and a buzzer.
The memory 160 may store a program for operation of the control unit 180 and may temporarily store input/output data (for instance, phone books, messages, still images, videos, etc.). The memory 160 may store data relating to vibrations and sounds having various patterns, which are output when the touch screen is touched.
The memory 160 may include at least one storage medium of a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., an SD or XD memory), a RAM (Random Access Memory), an SRAM (Static Random Access Memory), a ROM (Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), a PROM (Programmable Read-Only Memory), a magnetic memory, a magnetic disc, and an optical disc. The electronic device 100 may operate in association with a web storage that performs the storage function of the memory 160 over the Internet.
The interface unit 170 functions as a path between the electronic device 100 and any external device connected to the electronic device 100. The interface unit 170 receives data or power from an external device and transfers the data or power to each component of the electronic device 100 or enables data to be transferred from the electronic device 100 to the external device. For instance, the interface unit 170 may include a wired/wireless headset port, an external recharger port, a wired/wireless data port, a memory card port, a port connecting a device having an identification module, an audio I/O (Input/Output) port, a video I/O port, and an earphone port.
The control unit 180 controls the overall operation of the electronic device 100. For example, the control unit 180 performs control and processing associated with voice calls, data communication, and video calls. The control unit 180 may include an image processing unit 182 for image processing. The image processing unit 182 is described in greater detail below in relevant parts.
The power supply unit 190 receives internal or external power under control of the control unit 180 and supplies the power to each component for operation of the component.
The embodiments described herein may be implemented in software, in hardware, in a combination thereof, or in a recording medium readable by a computer or a similar device. When implemented in hardware, the embodiments may use at least one of ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, micro-controllers, microprocessors, and electrical units for performing functions. According to an embodiment, the embodiments may be implemented by the control unit 180.
When implemented in software, some embodiments, such as procedures or functions, may entail a separate software module for enabling at least one function or operation. Software code may be implemented by a software application written in an appropriate programming language. The software code may be stored in the memory 160 and executed by the control unit 180.
Referring to
The electronic device 100 may be any electronic device having the display unit 151 that can display images. The electronic device 100 may be a stationary terminal, such as a TV shown in
The camera 121 may be an optical electronic device that performs image capturing in a front direction of the electronic device 100. The camera 121 may be a 2D camera for 2D image capturing and/or a 3D camera for 3D image capturing. Although in
The control unit 180 may trace a user U having a control right upon discovering the user U. The issuance and tracing of the control right may be performed based on an image captured by the camera 121. For example, the control unit 180 may analyze a captured image and continuously determine whether a specific user U exists, whether the specific user U performs a gesture necessary for obtaining the control right, and whether the specific user U moves.
The control unit 180 may analyze a gesture of a user having the control right based on a captured image. For example, when the user U makes a predetermined gesture but does not own the control right, no function may be conducted. However, when the user U has the control right, a predetermined function corresponding to the predetermined gesture may be conducted.
The gesture of the user U may include various operations using his/her body. For example, the gesture may include the operation of the user sitting down, standing up, running, or even moving. Further, the gesture may include operations using the user's head, foot, or hand H. For convenience of illustration, a gesture of using the hand H of the user U is described below as an example. However, the embodiments of the present invention are not limited thereto.
According to an embodiment, analysis of a hand gesture may be conducted in the following ways.
First, the user's fingertips are detected, the number and shape of the fingertips are analyzed, and the result is then converted into a gesture command.
The detection of the fingertips may be performed in two steps.
First, a step of detecting a hand area may be performed using a human skin tone. A group of candidates for the hand area is designated and contours of the candidates are extracted based on the skin tone. Among the candidates, a candidate whose contour has a number of points within a predetermined range may be selected as the hand.
Secondly, as a step of determining the fingertips, the contour of the candidate selected as the hand is traversed, and a curvature is calculated based on inner products between adjacent points. Since fingertips show sharp variations in curvature, a point at which the change in curvature exceeds a threshold value is chosen as a fingertip of the hand. The fingertips thus extracted may be converted into meaningful commands during gesture-command conversion.
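The following is a minimal sketch of the fingertip-detection steps described above, assuming OpenCV and NumPy are available; the skin-tone bounds, contour point-count range, curvature step, and threshold are illustrative values chosen for this example rather than values prescribed by this disclosure.

```python
# A sketch of skin-tone hand detection followed by curvature-based fingertip detection.
import cv2
import numpy as np

SKIN_LOW = np.array([0, 48, 80], dtype=np.uint8)     # assumed HSV skin-tone lower bound
SKIN_HIGH = np.array([20, 255, 255], dtype=np.uint8)  # assumed HSV skin-tone upper bound

def detect_fingertips(frame_bgr, contour_points_range=(50, 400), curv_thresh=0.7):
    # Step 1: detect hand-area candidates from skin tone and extract their contours.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep candidates whose contour has a point count within the predetermined range.
    hands = [c for c in contours
             if contour_points_range[0] <= len(c) <= contour_points_range[1]]

    fingertips = []
    for hand in hands:
        pts = hand.reshape(-1, 2).astype(np.float32)
        # Step 2: traverse the contour and estimate curvature from inner products
        # between vectors to neighboring points; a sharp bend marks a fingertip.
        for i in range(len(pts)):
            prev_pt, cur, nxt = pts[i - 5], pts[i], pts[(i + 5) % len(pts)]
            v1, v2 = prev_pt - cur, nxt - cur
            denom = float(np.linalg.norm(v1) * np.linalg.norm(v2)) or 1e-6
            cos_angle = float(np.dot(v1, v2)) / denom
            if cos_angle > curv_thresh:   # nearly parallel vectors => sharp point
                fingertips.append(tuple(cur.astype(int)))
    return fingertips
```

The list of fingertip coordinates returned by such a routine could then be mapped, by number and arrangement, onto gesture commands during the gesture-command conversion mentioned above.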
According to an embodiment, for a gesture command made with respect to a synthesized virtual 3D image (3D object), it is often necessary to judge whether contact has occurred between the virtual 3D image and the user's gesture. For example, as is often the case, it may be necessary to determine whether there is contact between an actual object and a virtual object in order to manipulate the virtual object superimposed on the actual object.
Whether the contact is present or not may be determined by various collision detection algorithms. For instance, a rectangle bounding box method and a bounding sphere method may be adopted for such judgment.
The rectangle bounding box method compares areas of rectangles surrounding a 2D object for collision detection. The rectangle bounding box method has merits such as a low computational burden and ease of implementation. The bounding sphere method determines whether there is a collision by comparing radii of spheres surrounding a 3D object.
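As an illustration of these two tests, the sketch below shows an axis-aligned rectangle bounding-box check for the 2D case and a bounding-sphere check for the 3D case; the object representations (x, y, width, height rectangles and center/radius pairs) are assumptions made for this example.

```python
# A sketch of the two collision-detection tests mentioned above.
import math

def rects_collide(rect_a, rect_b):
    # Axis-aligned rectangle bounding-box test; rects are (x, y, width, height).
    ax, ay, aw, ah = rect_a
    bx, by, bw, bh = rect_b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def spheres_collide(center_a, radius_a, center_b, radius_b):
    # Bounding-sphere test: collision when the distance between centers
    # does not exceed the sum of the radii.
    return math.dist(center_a, center_b) <= radius_a + radius_b
```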
For example, a depth camera may be used for manipulation of a real hand and a virtual object. Depth information on the hand obtained by the depth camera is converted into distance units of the virtual world for purposes of rendering the virtual image, and a collision with the virtual object may be detected based on the resulting coordinates.
Hereinafter, an exemplary environment in which the embodiments of the present invention are implemented is described.
Referring to
A voice and/or motion of the first user U1 may be obtained and converted into video data and/or audio data by an electronic device 200 arranged in the first place. Further, the video data and/or audio data may be transferred through a predetermined network (communication network) to another electronic device 300 positioned in the second place. The first electronic device 300 may output the transferred video data and/or audio data through an output unit in a visual or auditory manner. The second electronic device 200 and the first electronic device 300 each may be the same or substantially the same as the electronic device 100 described in connection with
For example, the first user U1 may transfer his image and/or voice through the second electronic device 200 to the first electronic device 300 and may receive and output an image and/or voice of the second user U2. Likewise, the first electronic device 300 may also perform the same functions and operations as the second electronic device 200.
Hereinafter, a method of controlling an electronic device according to an embodiment of the present invention is described. For purposes of illustration, the control method is performed by the electronic device 100 described in connection with
Referring to
The first electronic device 300 may further perform a step of determining whether a variation of the image displayed on the specific area is equal to or larger than a predetermined threshold (S140) and a step of storing an image (e.g., a second image) displayed on the specific area when the variation of the image is equal to or larger than the predetermined threshold (S150). When the variation of the image is smaller than the predetermined threshold, the first electronic device 300 may continue to monitor whether the variation of the image becomes equal to or larger than the threshold (S140).
The first electronic device 300 may continuously display video images received from the second electronic device 200 on the display unit 151. When the first electronic device 300 receives a predetermined request while performing steps S100 to S150 (S160), the first electronic device 300 obtains an image corresponding to the request among the images stored in the memory 160 in steps S130 and/or S150 (S170) and displays the obtained image on the display unit 151 (S180). Hereinafter, the steps are described in greater detail.
The first electronic device 300 positioned in the second place may receive a video image from the second electronic device 200 (S100). The video image may be streamed from the second electronic device 200 to the first electronic device 300.
The video image may be obtained by the second electronic device 200. For instance, the video image may include a scene relating to a video conference performed by the first user U1 or a scene relating to an online lecture conducted by the first user U1 as obtained by the second electronic device 200.
The video image may be a video image that is obtained by the camera 121 included in the second electronic device 200 and reflects a real situation. The video image may be a composite image of a virtual image and a video image reflecting a real situation. At least part of the video image reflecting the real situation may be replaced by another image.
The video image may be directly transmitted from the second electronic device 200 to the first electronic device 300 or may be transmitted from the second electronic device 200 to the first electronic device 300 via a server (not shown).
The first electronic device 300 may generate a control signal for visually representing a video image (S110). The first electronic device 300 may visually output the video image through the display unit 151 or a beam projector (not shown) according to the control signal.
The first electronic device 300 may identify a specific area included in the video image (S120). For instance, the first electronic device 300 may identify an area which displays a presentation material necessary for performing a video conference.
The first electronic device 300 may identify the specific area SA on which the presentation material is displayed as described above. The first electronic device 300 may employ various methods to identify the specific area SA. For example, the first electronic device 300 may use an image processing technology to analyze the video image and identify an area on which marks such as letters and/or diagrams are intensively displayed, so that the specific area SA may be identified. As another example, the second electronic device 200 may transmit location information of the specific area SA to the first electronic device 300 together with or separately from the video image upon transmission of the video image. The first electronic device 300 may then identify the specific area SA based on the transmitted location information.
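As one hedged illustration of the image-processing approach just mentioned, the sketch below locates a region where edges of letters and diagrams are concentrated, assuming OpenCV and NumPy; the edge-density heuristic, kernel size, and threshold are assumptions for illustration only, not the only way the specific area SA might be identified.

```python
# A sketch of locating a candidate specific area SA from stroke (edge) density.
import cv2
import numpy as np

def find_presentation_area(frame_bgr, density_thresh=0.15):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                       # strokes of letters/diagrams
    # Dilate so dense clusters of strokes merge into one connected region.
    dense = cv2.dilate(edges, np.ones((25, 25), np.uint8))
    contours, _ = cv2.findContours(dense, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    best = None
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        roi_edges = edges[y:y + h, x:x + w]
        density = float(np.count_nonzero(roi_edges)) / max(w * h, 1)
        # Keep the largest region whose edge density exceeds the threshold.
        if density >= density_thresh and (best is None or w * h > best[2] * best[3]):
            best = (x, y, w, h)
    return best   # bounding box of the candidate specific area SA, or None
```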
Subsequently, the first electronic device 300 may store the image displayed on the specific area (S130). For example, the first electronic device 300 may store the image for the presentation material included in the video image. The image may be stored in the memory 160.
The presentation material may include a number of pages or may include a video material. The image displayed on the specific area, which is stored in step S130, may be a still image for the part of the presentation material displayed on the specific area at a particular time point. For example, in the case that the presentation material includes several pages, the image stored in step S130 (hereinafter, referred to as “a first image”) may be an image for a particular page that is displayed, among the pages included in the presentation material, at a time point when step S120 and/or step S130 is performed. In the case that the presentation material includes a video, the image stored in step S130 may be an image for a particular frame displayed, among a plurality of frames included in the video (presentation material), at a time point when step S120 and/or step S130 is performed.
The second user who attends the video conference at the second place may store the image for the presentation material without separately receiving data for the presentation material provided by the first user who conducts the video conference at the first place. The image for the presentation material may be extracted from the video image provided through the video conference and stored by the electronic device used by the second user, without any inconvenient process such as the second user previously receiving separate electronic data (for example, electronic files) for the presentation material used for the video conference by the first user.
For example, in the case that the conference is performed with materials that are difficult to convert into data (for example, samples used for introducing a prototype model), or the first user has not converted the presentation material used for the conference into electronic data in advance, according to an embodiment, the presentation material may be converted into image data at the same time it is used for the conference, so that the second user may later see again the presentation material used for the video conference.
The first electronic device 300 determines whether the variation of the image displayed on the specific area is equal to or more than a predetermined threshold (S140), and when the variation is determined to be not less than the threshold, the first electronic device 300 may store the image displayed on the specific area (S150). However, when the variation is less than the threshold, the first electronic device 300 may continue to monitor whether the variation becomes equal to or more than the threshold (S140) without separately storing the image.
To perform step S140, the first electronic device 300 receives and displays the video image and continues to monitor the specific area. The first electronic device 300 may continuously perform step S140 and monitor whether there is any change to the presentation material displayed on the specific area (e.g., content displayed on the specific area).
For example, in the case that the first user U1 changes the presentation material from a first material to a second material while performing the conference at the first place, the first electronic device 300 may sense a change to the image displayed on the specific area (S140) and may store the image displayed on the specific area after such change separately from the first image stored in step S130 (S150).
As another example, in the case that the presentation material includes a plurality of pages, when the first user U1 changes the presentation material from an Nth page to an N+1th page, the first electronic device 300 may sense a change to the image displayed on the specific area (S140) and may store the image displayed on the specific area after such change separately from the first image stored in step S130 (S150).
For example, in the case that the presentation material is a video material, the first electronic device 300 may sense a change to the image displayed on the specific area (S140) and may store the image displayed on the specific area after such change separately from the first image stored in step S130 (S150). For example, an image corresponding to an Nth frame of the video material is stored as the first image, and an image corresponding to an N+a-th frame whose variation from the image corresponding to the Nth frame is equal to or more than a predetermined threshold may be stored in step S150 (where a is an integer equal to or more than 1). For example, when a difference (variation) between the image corresponding to the Nth frame and the images corresponding to the N+1th frame and the N+2th frame does not exceed the threshold, the first electronic device 300 does not store the images corresponding to the N+1th and N+2th frames of the video material. However, when a change (variation) between the image corresponding to the Nth frame and an image corresponding to the N+3th frame of the video material is in excess of the threshold, the first electronic device 300 stores the image corresponding to the N+3th frame in step S150. Accordingly, even when the presentation material provided by the first user is a video material, an image corresponding to a frame positioned at a boundary where the image changes significantly may be stored in the first electronic device 300, so that the second user may review the presentation material later.
The first electronic device 300 may compare an image displayed in real time on the specific area with the first image stored in step S130 (or, as described below, with the most recently stored image when steps S140 and S150 are repeated) and may yield a variation. For example, in the case that the presentation material is a video, the image currently displayed on the specific area is an image corresponding to the Nth frame, and the stored first image (or the most recently stored image) is an image corresponding to the N−5th frame, the first electronic device 300 may compare the image corresponding to the Nth frame with the image corresponding to the N−5th frame, rather than with the image corresponding to the N−1th frame, and may produce a variation.
Subsequently, the first electronic device 300 repeats steps S140 and S150 and stores the image displayed on the specific area SA in the memory 160 whenever the image changes by more than the threshold. The first electronic device 300 may store the plurality of images stored in steps S130 and/or S150 in order of storage.
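The following is a minimal sketch of the monitoring described in steps S130 to S150, assuming OpenCV and NumPy; the mean-absolute-difference measure and the threshold value are illustrative assumptions, and the `area_images` source of specific-area images is a hypothetical helper rather than part of this disclosure.

```python
# A sketch of storing a still image of the specific area whenever its variation
# from the most recently stored image reaches a threshold (steps S130/S140/S150).
import time
import cv2
import numpy as np

def variation(img_a, img_b):
    # Mean absolute pixel difference between the current specific-area image
    # and the most recently stored image (one possible variation measure).
    return float(np.mean(cv2.absdiff(img_a, img_b)))

def monitor_specific_area(area_images, threshold=12.0):
    """area_images: iterable yielding the image shown on area SA over time."""
    stored = []                   # (time point, still image) pairs, in order of storage
    last = None
    for img in area_images:
        # The first image is always stored (S130); later images are stored only
        # when their variation from the last stored image reaches the threshold (S140).
        if last is None or variation(img, last) >= threshold:
            stored.append((time.time(), img))     # S150: store with its time point
            last = img                            # later comparisons use this image
    return stored
```

Storing each image together with its time point, as in this sketch, also supports the ordering and time-based retrieval described below.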
In storing the plurality of images, the first electronic device 300 may number images stored while the video conference is performed in order of storage as shown in (a) of
In storing the plurality of images, the first electronic device 300, as shown in (b) of
While performing steps S120 to S150, the first electronic device 300 may continue to display the video image received from the second electronic device 200 on the display unit 151. While continuously performing steps S100 to S150, the first electronic device 300 may receive a predetermined request (S160).
The second user U2 may want to review the presentation material that has just been explained by the first user U1 while viewing the video conference hosted by the first user U1. In this case, the second user U2 may input a predetermined request to the first electronic device 300. Alternatively, the second user U2 may input the predetermined request to another electronic device (not shown) connected to the first electronic device 300 by wire or wirelessly, and the other electronic device may transfer the predetermined request, and/or the fact that the predetermined request has been generated, to the first electronic device 300. Hereinafter, unless stated otherwise, it is assumed that the second user U2 directly inputs the predetermined request to the first electronic device 300.
The predetermined request may be input by various methods.
Referring to
Referring to
Referring to
Subsequently, the first electronic device 300 may obtain an image corresponding to the received request among the images stored in the memory 160 in steps S130 and/or S150 (S170) and may display the obtained image on the display unit 151 (S180).
Obtaining the image corresponding to the predetermined request in step S170 may be performed by various methods.
For example, as described in connection with
As another example, in the case that the predetermined request is input by a user's voice command (e.g., when the user says “to the previous page”) as described in connection with (a) of
As still another example, as described in connection with (b) of
As yet still another example, in the case that the predetermined request is input by the control button CB separately displayed on the display unit 151 as described in connection with
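As a hedged illustration of step S170, the sketch below looks up a stored image matching a request in the ordered (time point, image) list produced by the monitoring sketch above; the request forms follow the examples just described, and the function and parameter names are hypothetical.

```python
# A sketch of obtaining the stored image that corresponds to a predetermined request.
def get_image_for_request(stored, request, current_index=None):
    """stored: list of (time point, image) pairs in order of storage."""
    if current_index is None:
        current_index = len(stored) - 1            # most recently stored image
    if request == "previous":                      # e.g., "to the previous page"
        return stored[max(current_index - 1, 0)][1]
    if isinstance(request, int):                   # e.g., "N pages back"
        return stored[max(current_index - request, 0)][1]
    if isinstance(request, float):                 # a time point selected on a progress bar
        for ts, img in reversed(stored):
            if ts <= request:
                return img                         # last image stored before that time point
        return stored[0][1]
    return None
```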
As described above, the obtained image may be displayed on the display unit 151. The first electronic device 300 may display the obtained image by various methods.
Referring to
Referring to
As such, the second user may review the previous pages of the presentation material used for the video conference hosted by the first user while the video conference is in progress, thereby enabling a more efficient video conference.
A method of controlling an electronic device according to an embodiment of the present invention is now described.
Referring to
In the environment as illustrated in
As shown in
According to an embodiment, the control method may further include a step of determining whether the speaker corresponding to the human voice included in the multimedia data changes from the first speaker to a second speaker (S250), a step of, when the speaker changes to the second speaker, identifying the changed second speaker (S260), a step of obtaining information relating to the second speaker (S270), and a step of storing the obtained information so that the obtained information corresponds to a second time point when the speaker changes to the second speaker (S280). The information relating to the first speaker and/or second speaker may include personal information on each speaker, information on the place where each speaker is positioned during the course of the video conference, and keywords included in each speaker's speech. Each step is now described in greater detail.
The first electronic device 300 may receive multimedia data obtained by the other electronic devices 200, 400, and 500 (S200). The first electronic device 300 may receive first multimedia data obtained for the first user U1 attending the video conference at the first place, third multimedia data obtained for the third to sixth users U3, U4, U5, and U6 attending the video conference at the third place, and fourth multimedia data obtained for the seventh and eighth users U7 and U8 attending the video conference at the fourth place.
The multimedia data may include video data reflecting in real time each user attending the video conference and images surrounding the users as well as audio data reflecting in real time each user attending the video conference and sound surrounding the users.
The first electronic device 300 may output the received multimedia data through the output unit 150 in real time. For example, the video data included in the multimedia data may be displayed on the display unit 151, and the audio data included in the multimedia data may be audibly output through the sound output unit 152. According to an embodiment, while the received multimedia data is displayed, the second multimedia data obtained for the second user U2 and his surroundings directly by the camera 121 and/or microphone 122 of the first electronic device 300 may be also displayed. Hereinafter, unless stated otherwise, the “multimedia data obtained by the first electronic device 300” includes multimedia data obtained by and received from the electronic devices 200, 400, and 500 and multimedia data directly obtained by the user input unit 120 of the first electronic device 300.
The received multimedia data may be stored in the memory 160.
The first electronic device 300 may display all or at least selected one of the video data clips included in the multimedia data obtained by the first electronic device 300 on the display unit 151. Likewise, all or at least selected one of the audio data clips included in the multimedia data may also be output through the sound output unit 152. As used herein, the “multimedia data clip” may refer to part of the multimedia data, the “video data clip” may refer to part of the video data, and the “audio data clip” may refer to part of the audio data.
As described above, while outputting the multimedia data obtained by the first electronic device 300 through the output unit 150, the first electronic device 300 may sense a human voice by analyzing the audio data included in at least one multimedia data clip of the multimedia data (S210). Hereinafter, the time when the human voice is sensed in step S210 is referred to as “first time point”.
For example, as shown in
Subsequently, the first electronic device 300 may identify a speaker corresponding to the sensed voice (S220). For example, the first electronic device 300 may identify which user has generated the voice among the first to eighth users U1, U2, U3, U4, U5, U6, U7, and U8 that attend the video conference. To identify the speaker corresponding to the sensed voice, the first electronic device 300 may identify which electronic device has sent the multimedia data including the voice among the electronic devices 200, 400, and 500. For example, in the case that the voice is included in the multimedia data received from the second electronic device 200 located at the first place, the first electronic device 300 may determine that the speaker corresponding to the voice is the first user U1 located at the first place.
According to an embodiment, the first electronic device 300 may analyze the video data included in the multimedia data to identify the speaker corresponding to the sensed voice. According to an embodiment, the first electronic device 300 may analyze images for respective users reflected by the video data after the voice has been sensed to determine which user has generated the voice. For example, the first electronic device 300 may determine the current speaker by recognizing each user's face and analyzing the recognized face (e.g., each user's lips). For example, as shown in (b) of
The first electronic device 300 may use both the method of identifying the electronic device that has sent the multimedia data and the method of analyzing the video data included in the multimedia data to identify the speaker corresponding to the sensed voice.
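As one hedged illustration of the lip-motion analysis mentioned above, the sketch below detects attendee faces and scores motion in the lower (mouth) region of each face across consecutive grayscale frames; the Haar cascade and the motion-energy heuristic are assumptions for illustration, not the method required by this disclosure.

```python
# A sketch of suggesting the current speaker from lip motion in the video data.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def lip_motion_scores(prev_gray, cur_gray):
    """Return a list of (face_box, motion_score); the largest score suggests
    the attendee whose lips are moving, i.e., the current speaker."""
    scores = []
    for (x, y, w, h) in face_cascade.detectMultiScale(cur_gray, 1.1, 5):
        # Compare the lower third of the face (mouth region) between frames.
        mouth_prev = prev_gray[y + 2 * h // 3 : y + h, x : x + w]
        mouth_cur = cur_gray[y + 2 * h // 3 : y + h, x : x + w]
        if mouth_prev.shape == mouth_cur.shape and mouth_prev.size:
            motion = float(np.mean(cv2.absdiff(mouth_prev, mouth_cur)))
            scores.append(((x, y, w, h), motion))
    return scores
```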
Step S210 and/or step S220 need not be performed by the first electronic device 300. According to an embodiment, step S210 and/or S220 may be performed by the other electronic devices 200, 400, and 500. For example, each electronic device 200, 400, or 500 may determine whether a voice is included in the multimedia data it obtains, and if the voice is determined to be included, may analyze the video data included in the multimedia data to determine which speaker corresponds to the sensed voice, as described above. If step S210 and/or S220 is performed by each of the electronic devices 200, 400, and 500, information on the speaker determined by the electronic devices 200, 400, and 500 may be transmitted to the first electronic device 300 along with the multimedia data. Information on the time point when the voice was sensed may also be transmitted to the first electronic device 300.
The first electronic device 300 may then obtain information relating to the identified speaker (S230). Such information may be diverse. For example, the information may include personal information on the speaker, information on the place where the speaker was positioned during the video conference, and keywords which the speaker has spoken.
The personal information on the speaker may be obtained from a database of personal information on the conference attendees, which is previously stored. The database may be provided in the first electronic device 300 or in at least one of the electronic devices 200, 400, and 500, or may be established in a distributed manner across the electronic devices 200, 300, 400, and 500. Alternatively, the database may be provided in a server (not shown) connected over a communication network. The personal information may include, e.g., the names, job positions, and divisions of the conference attendees.
The first electronic device 300 may receive information on the place where the speaker is located from the electronic devices 200, 400, and 500. Alternatively, the first electronic device 300 may obtain the place information based on IP addresses used by the electronic devices 200, 400, and 500 for the video conference. The place information may include any information that conceptually distinguishes one place from another, as well as any information, such as an address, that geographically specifies a location. For example, the place information may include an address, such as “xxx, Yeoksam-dong, Gangnam-gu, Seoul”, a team name, such as “Financial Team” or “IP group”, a branch name, such as “US branch of XX company” or “Chinese branch of XX company”, or a company name, such as “A corporation” or “B corporation”.
While the identified speaker makes a speech, the first electronic device 300 may analyze what the speaker says, determine words, phrases, or sentences that are repeatedly spoken, and treat the repeatedly spoken words, phrases, or sentences as keywords of the speech the speaker has made. The first electronic device 300 may directly analyze the audio data reflecting what the speaker has spoken, or may convert the audio data into text data through an STT (Speech-To-Text) engine and analyze the converted text data.
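A minimal sketch of this keyword extraction is shown below, assuming the speech has already been converted to text by an STT engine; the stop-word list, minimum repetition count, and word-length filter are illustrative assumptions.

```python
# A sketch of treating repeatedly spoken words as keywords of a speaker's speech.
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "that", "we"}

def extract_keywords(transcript, min_count=2, top_n=5):
    words = re.findall(r"[A-Za-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # Words repeated at least min_count times are treated as keywords of the speech.
    return [w for w, c in counts.most_common(top_n) if c >= min_count]
```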
Subsequently, the first electronic device 300 may store the obtained information so that the obtained information corresponds to the first time point (S240).
The first time point may be information specifying a time determined on a time line whose counting commences when the video conference begins. For example, if the fifth user U5 starts to speak 15 seconds after the video conference has commenced, the first time point may be “15 seconds”.
In the example described in connection with
Hereinafter, a set of the information (e.g., personal information on users, place information, keywords, etc.) stored corresponding to the time point when the speaker begins to speak (e.g., the first time point as in the above example) is referred to as “metadata”, and the name, division, position, place information, and keywords are referred to as fields of the metadata. In the above-described example, it has been described that the metadata for the video conference includes the personal information, place information, and keywords. However, this is merely an example, and according to an embodiment, other fields may be added to the metadata for the video conference.
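The sketch below shows one way such metadata entries and their fields could be represented and stored against time points; the record and function names are hypothetical, and the field set simply mirrors the example fields above.

```python
# A sketch of video conference metadata entries keyed to the time points
# at which speakers begin to speak (steps S240/S280).
from dataclasses import dataclass, field
from typing import List

@dataclass
class MetadataEntry:
    time_point: float          # seconds from the start of the video conference
    name: str                  # personal information on the speaker
    division: str
    position: str
    place: str                 # place where the speaker is located
    keywords: List[str] = field(default_factory=list)

conference_metadata: List[MetadataEntry] = []

def record_speaker(time_point, name, division, position, place, keywords):
    # Store the obtained information so that it corresponds to the time point
    # when the speaker begins to speak.
    conference_metadata.append(
        MetadataEntry(time_point, name, division, position, place, list(keywords)))
```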
According to an embodiment, the control method may continue to monitor whether the speaker corresponding to the human voice included in the multimedia data changes from the current speaker (e.g., the fifth user in the above-described example) to another user (S250). In the example illustrated in
The first electronic device 300 may determine whether the speaker changes by analyzing the audio data included in the multimedia data received from the electronic devices 200, 400, and 500. For example, the first electronic device 300 may determine whether a human voice included in the audio data is identical to the previous voice, and if not, may determine that the speaker has changed.
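As a hedged illustration of this check, the sketch below compares two audio segments using an assumed voice_embedding() function (not defined here) that maps a segment to a fixed-length vector; the cosine-similarity threshold is an illustrative value.

```python
# A sketch of deciding whether the speaker has changed by comparing the current
# voice with the previous voice (step S250).
import numpy as np

def is_same_speaker(segment_a, segment_b, voice_embedding, sim_thresh=0.75):
    emb_a = np.asarray(voice_embedding(segment_a), dtype=float)
    emb_b = np.asarray(voice_embedding(segment_b), dtype=float)
    cos_sim = float(np.dot(emb_a, emb_b) /
                    (np.linalg.norm(emb_a) * np.linalg.norm(emb_b) + 1e-9))
    # A similarity below the threshold suggests the speaker has changed.
    return cos_sim >= sim_thresh
```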
If it is determined in step S250 that there is a change of speaker, the first electronic device 300 may identify the changed speaker (S260). For convenience of description, the speaker who is identified as speaking next after the first speaker SP1, who first started to speak after the video conference commenced, is referred to as a “second speaker SP2”. In the example illustrated in
Step S260 may be performed by the same or substantially the same method as step S220. To identify the speaker corresponding to the sensed voice, the first electronic device 300 may identify which electronic device has sent the multimedia data including the voice among the electronic devices 200, 400, and 500, or may analyze the video data included in the multimedia data, or may use both identifying the electronic device that has sent the multimedia data and analyzing the video data included in the multimedia data.
Similar to steps S210 and/or S220, steps S250 and/or S260 are not necessarily performed by the first electronic device 300. Each of the electronic devices 200, 400, and 500 may perform step S250 and/or S260.
Subsequently, the first electronic device 300 may obtain information relating to the identified second speaker (S270) and may store the obtained information so that the information corresponds to the time point when the speaker changed (e.g., the second time point) (S280). Steps S270 and S280 may be performed in a manner identical or similar to steps S230 and S240.
Thereafter, the first electronic device 300 may repeatedly perform steps S250 to S280. Accordingly, whenever the speaker making a speech in the video conference changes, the first electronic device 300 may obtain information on the changed speaker and may store the information so that it corresponds to the time point when the change occurred. For example, when the person speaking in the video conference changes from the fourth user U4 to the seventh user U7 as shown in
Referring to
As such, the first electronic device 300 may continue to monitor the speakers during the course of the video conference and may store various types of information on the speakers so that the information corresponds to the time points when the speakers begin to speak, thereby generating metadata for the video conference. Further, the video conference metadata may be used in various manners, thus enhancing user convenience. For example, the video conference metadata may be used to prepare brief minutes of the video conference to be provided to the attendees of the conference, or may be used to provide a search function that allows the attendees to review the conference.
As described above, the first electronic device 300 may output the received multimedia data in real time through the output unit 150 while simultaneously generating the video conference metadata. For example, the video data included in the multimedia data may be displayed on the display unit 151, and the audio data included in the multimedia data may be audibly output through the sound output unit 152. While the received multimedia data is displayed, the second multimedia data obtained for the second user U2 and his surroundings directly by the camera 121 and/or microphone 122 of the first electronic device 300 may be output as well. The first electronic device 300 may display the whole video data included in the multimedia data obtained by the first electronic device 300 on the display unit 151 at once.
According to an embodiment, when identifying the current speaker in step S220 and/or S260, the first electronic device 300 may identify a multimedia data clip including the identified speaker among a plurality of multimedia data clips obtained and transmitted by the electronic devices 200, 400, and 500, and may output the identified multimedia data clip through the output unit 150 by a different method from the output methods for the other multimedia data clips.
For instance, the first electronic device 300 may display the identified multimedia data clip (hereinafter, referred to as the “multimedia data clip for speaker”) so that the multimedia data clip for speaker appears larger than the other multimedia data clips (hereinafter, referred to as the “multimedia data clips for listener”). For example, as shown in (b) of
As another example, the first electronic device 300 may output both the video and audio data included in the multimedia data clip for speaker, among the plurality of multimedia data clips, through the output unit 150 while outputting only the video data included in the multimedia data clips for listener, excluding their audio data. For example, the whole video data included in the plurality of multimedia data clips may be displayed on the display unit 151, whereas only the audio data included in the multimedia data clip for speaker may be selectively output through the sound output unit 152.
As still another example, the first electronic device 300 may output only the video and audio data corresponding to the multimedia data clip for speaker among the plurality of multimedia data clips through the output unit 150 while receiving the multimedia data clips for listener from the electronic devices 200, 400, and 500 and storing the received data clips in the memory 160 without outputting the stored data clips through the display unit 151 or the sound output unit 152.
Although it has been described that the control method is performed by the first electronic device 300 located at the second place, the embodiments of the present invention are not limited thereto. For example, according to an embodiment, the control method may be performed by each of the electronic device 200 located at the first place, the electronic device 400 located at the third place, and the electronic device 500 located at the fourth place.
By identifying the speaker and outputting the multimedia data clip corresponding to the speaker in a different manner from those for the other multimedia data clips, more attention can be directed toward the user who is making a speech in the video conference, thereby enabling the video conference to proceed more efficiently.
Hereinafter, a method of controlling an electronic device according to an embodiment of the present invention is described. The metadata described above may be used to search the multimedia data for the video conference for specific time points. For ease of description, those described in connection with
Referring to
The first electronic device 300 may receive a predetermined input (S300). The predetermined input, which is provided to specify particular time points of the multimedia data for the video conference stored in the first electronic device 300, may be received by various methods. Any method of receiving the search conditions may be used for entry of the predetermined input.
For example, referring to (a) of
As another example, the first electronic device 300 may receive the predetermined input using a touch input method. Referring to (b) of
As still another example, the first electronic device 300 may receive the predetermined input using voice recognition. Referring to (c) of
According to an embodiment, a combination of the above-described methods may be used to receive the predetermined input. For example, if the user touches the input window F1 corresponding to the “Name” field followed by saying “search Jack!” with the screen image displayed as in (a) of
Subsequently, the first electronic device 300 may obtain information on a time point for the received input (S310). The time information may be information specifying a time point determined on a time line counted from when the video conference begins. For example, according to an embodiment, the “time information” described in connection with
For example, in the case that the video conference metadata is generated and stored as described in connection with
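As a hedged illustration of steps S300 and S310, the sketch below searches the MetadataEntry records from the earlier sketch for time points matching the entered search conditions; the condition format (field name to value) mirrors the input-window example above and is an assumption for illustration.

```python
# A sketch of obtaining the time points that match the entered search conditions.
def find_time_points(conference_metadata, conditions):
    """conditions: e.g., {"name": "Jack"} or {"place": "US branch", "keywords": "budget"}."""
    matches = []
    for entry in conference_metadata:
        ok = True
        for fld, value in conditions.items():
            stored = getattr(entry, fld, None)
            if isinstance(stored, list):            # e.g., the keywords field
                ok = ok and any(value.lower() in k.lower() for k in stored)
            else:
                ok = ok and stored is not None and value.lower() in str(stored).lower()
        if ok:
            matches.append(entry.time_point)
    return matches   # time points from which playback of the stored data may start
```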
Accordingly, the first electronic device 300 may output the multimedia data corresponding to the obtained time point (S320). For example, the first electronic device 300 may store the multimedia data relating to the video conference, and may call the stored multimedia data and output it through the sound output unit 152 and/or the display unit 151 from the part corresponding to the time information obtained in step S310.
If steps S300 and S310 are performed while the video conference is in progress through the first electronic device 300, step S320 may be conducted by various methods as follows. Hereinafter, for convenience of description, the multimedia data for the video conference now in progress is referred to as “current multimedia data”, and the multimedia data corresponding to the time information obtained in steps S300 and S310 is referred to as “past multimedia data”.
According to an embodiment, the first electronic device 300 may display both video data included in the current multimedia data and the video data included in the past multimedia data on the display unit 151 and may output only the audio data included in the current multimedia data through the sound output unit 152 without outputting the audio data included in the past multimedia data. Referring to
According to an embodiment, the first electronic device 300 may display the video data included in the current multimedia data on a region of the display unit 151 and may output the audio data included in the current multimedia data through the sound output unit 152. The video data included in the past multimedia data is not displayed, and text data converted from the audio data included in the past multimedia data may be displayed on another region of the display unit 151. Referring to
In the case that the current multimedia data includes a plurality of multimedia data clips, the screen image displayed on the third region R3 may correspond to a multimedia data clip including the speaker currently speaking in the conference among the plurality of multimedia data clips, while the other multimedia data clips are displayed on the second region R2. Referring to
According to an embodiment, the first electronic device 300 may output the audio data included in the current multimedia data through the sound output unit 152 while not outputting the audio data included in the past multimedia data and may display the video data included in the past multimedia data on the display unit 151 while not displaying the video data included in the current multimedia data.
According to an embodiment, the first electronic device 300 may output the audio data included in the past multimedia data through the sound output unit 152 while not outputting the audio data included in the current multimedia data and may display the video data included in the current multimedia data on the display unit 151 while not displaying the video data included in the past multimedia data.
Alternatively, the first electronic device 300 may output the current and past multimedia data by various methods.
As such, during or after the video conference, specific time points of the multimedia data for the video conference may be searched to review the video conference, and the multimedia data corresponding to the searched time points may be output.
In the methods of controlling an electronic device according to the embodiments, not every step is necessary, and according to an embodiment, the steps may be selectively included. The steps need not be performed in the order described above, and according to an embodiment, a later step may be performed before an earlier step.
The steps in the methods of controlling an electronic device may be performed separately or in combination thereof. According to an embodiment, steps in a method may be performed in combination with steps in another method.
The methods of controlling an electronic device may be stored in a computer readable medium in the form of codes or a program for performing the methods.
The invention has been explained above with reference to exemplary embodiments. It will be evident to those skilled in the art that various modifications may be made thereto without departing from the broader spirit and scope of the invention. Further, although the invention has been described in the context of its implementation in particular environments and for particular applications, those skilled in the art will recognize that the present invention's usefulness is not limited thereto and that the invention can be beneficially utilized in any number of environments and implementations. The foregoing description and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Number | Date | Country
---|---|---
61541289 | Sep 2011 | US