INTERACTION METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM

Information

  • Patent Application
  • Publication Number
    20220300066
  • Date Filed
    February 25, 2022
  • Date Published
    September 22, 2022
Abstract
Methods, apparatuses, devices, and computer-readable storage media for interactions between interactive objects and users are provided. In one aspect, a computer-implemented method includes: obtaining an image, acquired by a camera, of a surrounding of a display device that displays an interactive object through a transparent display screen, detecting at least one of a face or a body in the image to obtain a detection result, and driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result.
Description
TECHNICAL FIELD

The present disclosure relates to the field of computer vision technology, and in particular to an interaction method, apparatus, device and storage medium.


BACKGROUND

Human-computer interaction is mostly implemented through user input based on keys, touches, and voice, and through a response presented as an image, text, or a virtual human on the screen of a device. Currently, virtual humans are mostly developed on the basis of voice assistants, their output is generated only from voice input to the device, and the interaction between the user and the virtual human remains superficial.


SUMMARY

The embodiments of the present disclosure provide a solution for interactions between interactive objects (e.g., virtual humans) and users.


In a first aspect, a computer-implemented method for interactions between interactive objects and users is provided. The computer-implemented method includes: obtaining an image, acquired by a camera, of a surrounding of a display device, wherein the display device displays an interactive object through a transparent display screen; detecting at least one of a face or a body in the image to obtain a detection result; and driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result.


In the embodiments of the present disclosure, by detecting an image of the surrounding of the display device, and driving the interactive object displayed on the transparent display screen of the display device to respond according to a detection result, the response of the interactive object can better match the needs of a user, so that the interaction between the user and the interactive object is more realistic and vivid, and the user experience is improved.


In an example, a reflection of the interactive object is displayed by the display device on one of the transparent display screen or a base plate.


By displaying a stereoscopic image on the transparent display screen and forming a reflection of the interactive object on the transparent display screen or the base plate to achieve the stereoscopic effect, the displayed interactive object is more stereoscopic and vivid, so that the interaction experience of the user is improved.


In an example, the interactive object includes a virtual human with a stereoscopic effect.


By using a virtual human with a stereoscopic effect to interact with the user, the interaction process is more natural and the interaction experience of the user is improved.


In an example, the detection result includes at least one current service state of the display device; wherein the at least one current service state includes at least one of a waiting for user state, a user leaving state, a user detected state, a service activated state or an in-service state.


By driving the interactive object to respond in combination with the current service state of the device, the response of the interactive object can better match the interaction needs of the user.


In an example, detecting the at least one of the face or the body in the image to obtain the detection result includes one of: in response to determining that the face and the body are not detected at a current time, and the face and the body are not detected within a preset time period before the current time, determining that the current service state is the waiting for user state; in response to determining that the face and the body are not detected at the current time, and the face and the body are detected within the preset time period before the current time, determining that the current service state is the user leaving state; or in response to determining that the at least one of the face or the body is detected at the current time, determining that the current service state of the display device is the user detected state.


In the case where there is no user interacting with the interactive object, by determining whether the display device is currently in the waiting for user state or the user leaving state, and driving the interactive object to make different responses accordingly, the display state of the interactive object better matches the interaction needs and is more targeted.


In an example, the detection result further includes at least one of user attribute information or user historical operation information; the method further includes at least one of: in response to determining that the current service state of the display device is the user detected state, obtaining the user attribute information through the image; or, searching for the user historical operation information that matches feature information of at least one of the face or the body.


By obtaining historical operation information of the user and driving the interactive object with the historical operation information of the user, the interactive object can respond to the user in a more targeted manner.


In an example, the method further includes: in response to determining that at least one user is detected in the image, obtaining feature information of the at least one user; determining a target user from the at least one user according to the feature information of the at least one user; and driving the interactive object displayed on the transparent display screen of the display device to respond to the target user.


By determining the target user among the at least two users according to the feature information of the at least two users, and driving the interactive object to respond to the target user, the target user for interaction can be selected in a multi-user scenario, and switching between and responding to different target users can be realized, thereby improving the user experience.


In an example, the method further includes: obtaining environment information of the display device; wherein driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result includes: driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result and the environment information.


In an example, the environment information includes at least one of a geographic location of the display device, an IP address of the display device, or a weather or a date of an area where the display device is located.


By obtaining the environment information of the display device and driving the interactive object to respond in combination with the environment information, the response of the interactive object can better match actual interaction needs, and the interaction between the user and the interactive object can be more natural and vivid, so that the user experience is improved.


In an example, driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result and the environment information includes: obtaining a preset response label matching the detection result and the environment information; and driving the interactive object displayed on the transparent display screen to make a response corresponding to the response label.


In an example, driving the interactive object displayed on the transparent display screen to make the response corresponding to the response label includes: inputting the preset response label to a trained neural network for the neural network to output at least one driving content corresponding to the response label, wherein the at least one driving content is used to drive the interactive object to output one or more of corresponding actions, expressions, or voices.


By configuring corresponding response labels for a combination of different detection results and different environmental information, and using the response labels to drive the interactive object to output one or more of the corresponding actions, expressions, or voices, the interactive object can be driven according to different states and different scenarios of the device to make different responses, so that the responses of the interactive object are more diversified.
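
As a non-limiting illustration, the matching of a preset response label to a combination of a detection result and environment information could be sketched as a simple table lookup. The label names, the use of the service state and the weather as lookup keys, and the per-state fallback are assumptions made for this sketch only; the step of feeding the label to a trained neural network is not shown.

```python
from typing import Dict, Optional, Tuple

# Hypothetical label table: (service state, weather) -> preset response label.
RESPONSE_LABELS: Dict[Tuple[str, str], str] = {
    ("waiting_for_user", "sunny"): "welcome_sunny",
    ("waiting_for_user", "rainy"): "welcome_rainy",
    ("user_leaving", "sunny"): "goodbye_sunny",
    ("user_leaving", "rainy"): "goodbye_rainy",
}

# Hypothetical fallback labels used when no weather-specific label is configured.
DEFAULT_LABELS: Dict[str, str] = {
    "waiting_for_user": "welcome_generic",
    "user_leaving": "goodbye_generic",
}


def match_response_label(service_state: str, weather: str) -> Optional[str]:
    """Return the preset label for this combination, or a per-state fallback."""
    label = RESPONSE_LABELS.get((service_state, weather))
    if label is None:
        # No exact match: fall back to a label that depends on the state only.
        label = DEFAULT_LABELS.get(service_state)
    return label
```

In such a sketch, the returned label would then be passed to whatever component produces the driving content for the interactive object.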


In an example, the method further includes: in response to determining that the current service state is the user detected state, after driving the interactive object to respond, tracking a user detected in the image of the surrounding of the display device; in the process of tracking the user, in response to detecting first trigger information output by the user, determining that the display device enters the service activated state, and driving the interactive object to display a first service matching the first trigger information.


Through the interaction method provided by the embodiments of the present disclosure, the user does not need to press keys, touch the device, or input voice. The user only needs to stand by the display device, and the interactive object displayed on the display device can make a targeted welcome action, follow an instruction from the user, and display services according to the needs or interests of the user, so that the user experience is improved.


In an example, the method further includes: when the display device is in the service activated state, in response to detecting second trigger information output by the user, determining that the display device enters the in-service state, and driving the interactive object to display a second service matching the second trigger information.


After the display device enters the user detected state, recognition methods at two granularities are provided. When the first trigger information output by the user is detected, the first (coarse-grained) recognition method enables the device to enter the service activated state, and drives the interactive object to display the service matching the first trigger information. When the second trigger information output by the user is detected, the second (fine-grained) recognition method enables the device to enter the in-service state, and drives the interactive object to provide the corresponding service. Through the above two granularities of recognition, the interaction between the user and the interactive object can be smoother and more natural.


In an example, the method further includes: in response to determining that the current service state is the user detected state, obtaining position information of the user relative to the interactive object displayed on the transparent display screen according to a position of the user in the image; and adjusting an orientation of the interactive object according to the position information so that the interactive object faces the user.


By automatically adjusting the body orientation of the interactive object according to the position of the user, the interactive object always faces the user, such that the interaction is friendlier, and the user's interaction experience is improved.
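
As a non-limiting illustration, the position of the user relative to the interactive object could be approximated by a horizontal angle computed from the user's position in the image. The sketch below assumes a pinhole camera model, a camera axis aligned with the front of the display screen, and a hypothetical default horizontal field of view of 60 degrees; none of these are specified by the present disclosure, and the sign convention depends on how the camera is mounted.

```python
import math


def user_yaw_degrees(user_center_x: float, image_width: int,
                     horizontal_fov_deg: float = 60.0) -> float:
    """Estimate the horizontal angle of the user relative to the camera axis.

    Positive values mean the user appears to the right of the image center.
    """
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    # Normalized offset from the image center, clamped to the range [-1, 1].
    offset = (user_center_x - image_width / 2) / (image_width / 2)
    offset = max(-1.0, min(1.0, offset))
    # Pinhole model: tan(angle) scales linearly with the pixel offset.
    half_fov = math.radians(horizontal_fov_deg / 2)
    return math.degrees(math.atan(offset * math.tan(half_fov)))
```

The resulting angle could be used as the target orientation of the interactive object, so that the object turns toward the side of the screen on which the user stands.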


In a second aspect, an interaction device is provided, the interaction device includes: at least one processor; and one or more memories coupled to the at least one processor and storing programming instructions for execution by the at least one processor to perform the method of any of the embodiments of the present disclosure.


In a third aspect, a non-transitory computer readable storage medium is provided, the non-transitory computer readable storage medium having machine-executable instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform the method of any of the embodiments of the present disclosure.


It is appreciated that methods in accordance with the present disclosure may include any combination of the aspects and features described herein. That is, methods in accordance with the present disclosure are not limited to the combinations of aspects and features specifically described herein, but also include any combination of the aspects and features provided.


The details of one or more embodiments of the present disclosure are set forth in the accompanying drawings and the description below. Other features and advantages of this specification will be apparent from the description and drawings, and from the claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a flowchart illustrating an interaction method according to at least one embodiment of the present disclosure.



FIG. 2 is a schematic diagram illustrating an interactive object according to at least one embodiment of the present disclosure.



FIG. 3 is a schematic structural diagram illustrating an interaction apparatus according to at least one embodiment of the present disclosure.



FIG. 4 is a schematic structural diagram illustrating an interaction device according to at least one embodiment of the present disclosure.





DETAILED DESCRIPTION OF THE EMBODIMENTS

Examples will be described in detail herein, with the illustrations thereof represented in the drawings. When the following descriptions involve the drawings, like numerals in different drawings refer to like or similar elements unless otherwise indicated. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.


The term “and/or” in the present disclosure is merely an association relationship for describing associated objects, and indicates that there may be three relationships, for example, A and/or B may indicate that there are three cases: A alone, both A and B, and B alone. In addition, the term “at least one” herein means any one or any combination of at least two of the multiple, for example, including at least one of A, B, and C, and may be any one or more elements selected in the set formed by A, B and C.



FIG. 1 is a flowchart illustrating an interaction method according to at least one embodiment of the present disclosure. As shown in FIG. 1, the method includes steps 101 to 103.


At step 101, an image of the surrounding of a display device, acquired by a camera, is obtained, where the display device displays an interactive object through a transparent display screen.


The surrounding of the display device includes any direction within a preset range of the display device, for example, the surrounding may include one or more of a front direction, a side direction, a rear direction, or an upper direction of the display device.


The camera for acquiring images can be installed on the display device or used as an external device which is independent of the display device. The image acquired by the camera can be displayed on the transparent display screen of the display device. There may be more than one camera.


Optionally, the image acquired by the camera may be a frame in a video stream, or may be an image acquired in real time.


At step 102, at least one of a face or a body in the image is detected to obtain a detection result.


By performing face and/or body detection on the image of the surrounding of the display device, a detection result is obtained. For example, the detection result may indicate whether there is a user around the display device and the number of users. Related information of the user can be obtained from the image through face and/or body detection technology, or can be queried by using the image of the user. In addition, an action, a posture, or a gesture of the user can also be detected through image detection technology. Those skilled in the art should understand that the above detection results are only examples, and other detection results may also be included.


At step 103, the interactive object displayed on the transparent display screen of the display device is driven to respond according to the detection result.


In response to different detection results, the interactive object can be driven to make different responses. For example, when there is no user around the display device, the interactive object is driven to output welcome actions, expressions, voices, and so on.


In the embodiments of the present disclosure, by detecting an image of the surrounding of the display device, and driving the interactive object displayed on the transparent display screen of the display device to respond according to a detection result, the response of the interactive object can better match the needs of the user, so that the interaction between the user and the interactive object is more realistic and vivid, and the user experience is improved.
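
As a non-limiting illustration, steps 101 to 103 could be organized as one pass of a loop in which image acquisition, detection, and driving are supplied as interchangeable functions. The function name `interaction_step` and the callable parameters are hypothetical; the sketch shows only the order of the steps and does not implement any particular camera, detector, or rendering component.

```python
from typing import Any, Callable, Dict


def interaction_step(capture: Callable[[], Any],
                     detect: Callable[[Any], Dict[str, Any]],
                     drive: Callable[[Dict[str, Any]], None]) -> Dict[str, Any]:
    """Run one pass of the method and return the detection result."""
    image = capture()        # step 101: obtain an image of the surrounding
    result = detect(image)   # step 102: face and/or body detection
    drive(result)            # step 103: drive the interactive object to respond
    return result
```

A caller would invoke this function repeatedly, for example once per frame of a video stream.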


In some embodiments, the interactive object displayed on the transparent display screen of the display device includes a virtual human with a stereoscopic effect.


By using the virtual human with a stereoscopic effect to interact with users, the interaction is more natural and the interaction experience of the user can be improved.


Those skilled in the art should understand that the interactive object is not limited to the virtual human with a stereoscopic effect, but may also be a virtual animal, a virtual item, a cartoon character, and other virtual images capable of realizing interaction functions.


In some embodiments, the stereoscopic effect of the interactive object displayed on the transparent display screen can be realized by the following method.


Whether the human eye perceives an object as stereoscopic is usually determined by the shape of the object itself and the light and shadow effects of the object. The light and shadow effects are, for example, highlights and shadows in different areas of the object, and the projection of light on the ground after the object is irradiated (that is, the reflection).


Using the above principles, in an example, when the stereoscopic video or image of the interactive object is displayed on the transparent display screen, the reflection of the interactive object is also displayed on the transparent display screen, so that the human eye can observe the interactive object with a stereoscopic effect.


In another example, a base plate is provided under the transparent display screen, and the transparent display screen is perpendicular or inclined to the base plate. While the transparent display screen displays the stereoscopic video or image of the interactive object, the reflection of the interactive object is displayed on the base plate, so that the human eye can observe the interactive object with a stereoscopic effect.


In some embodiments, the display device further includes a housing, and the front side of the housing is configured to be transparent, for example, by being made of materials such as glass or plastic. Through the front side of the housing, the image on the transparent display screen and the reflection of the image on the transparent display screen or the base plate can be seen, so that the human eye can observe the interactive object with the stereoscopic effect, as shown in FIG. 2.


In some embodiments, one or more light sources are also provided in the housing to provide light for the transparent display screen.


In the embodiments of the present disclosure, the stereoscopic video or image of the interactive object is displayed on the transparent display screen, and the reflection of the interactive object is formed on the transparent display screen or the base plate to achieve the stereoscopic effect, so that the displayed interactive object is more stereoscopic and vivid, and the interaction experience of the user is improved.


In some embodiments, the detection result may include a current service state of the display device. The current service state includes, for example, any one of a waiting for user state, a user detected state, a user leaving state, a service activated state, and an in-service state. Those skilled in the art should understand that the current service state of the display device may also include other states, and is not limited to the above.


When no face or body is detected in the image of the surrounding of the display device, there is no user around the display device, that is, the display device is not currently interacting with a user. This situation covers two states: a state in which no user has interacted with the device within a preset time period before the current time, that is, the waiting for user state; and a state in which a user has completed the interaction within a preset time period before the current time, that is, the user leaving state. For these two different states, the interactive object should be driven to make different responses. For example, for the waiting for user state, the interactive object can be driven to make a response of welcoming the user in combination with the current environment; and for the user leaving state, the interactive object can be driven to make a response of ending the interaction with the last user who has completed the interaction.


In an example, the waiting for user state can be determined by the following method. In response to determining that the face and the body are not detected at the current time, and the face and the body are not detected within a preset time period before the current time, for example, 5 seconds, it is determined that the current service state of the display device is the waiting for user state.


In an example, the user leaving state can be determined by the following method. In response to determining that the face and the body are not detected at the current time, and the face and the body are detected within a preset time period before the current time, for example, 5 seconds, it is determined that the current service state of the display device is the user leaving state.
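
As a non-limiting illustration, the determination of the waiting for user state, the user leaving state, and the user detected state could be sketched as follows. The names `ServiceState` and `determine_state`, and the representation of the detection history as a single timestamp of the most recent detection, are assumptions made for this sketch; the 5-second default corresponds to the example preset time period given above.

```python
from enum import Enum
from typing import Optional


class ServiceState(Enum):
    WAITING_FOR_USER = "waiting for user"
    USER_LEAVING = "user leaving"
    USER_DETECTED = "user detected"


def determine_state(detected_now: bool, now: float,
                    last_detected_at: Optional[float],
                    preset_period: float = 5.0) -> ServiceState:
    """Classify the current service state from the detection history.

    detected_now: whether a face and/or body is detected in the current image.
    last_detected_at: time (in seconds) of the most recent earlier detection,
    or None if no user has been detected so far.
    """
    if detected_now:
        return ServiceState.USER_DETECTED
    # Nothing detected now: check whether a user was seen within the period.
    recently_seen = (last_detected_at is not None
                     and now - last_detected_at <= preset_period)
    if recently_seen:
        return ServiceState.USER_LEAVING
    return ServiceState.WAITING_FOR_USER
```

For example, with the default period, a device that last detected a user 3 seconds ago and detects nobody now would be classified as being in the user leaving state.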


When the display device is in the waiting for user state or the user leaving state, the interactive object may be driven to respond according to the current service state of the display device. For example, when the display device is in the waiting for user state, the interactive object displayed on the display device can be driven to make a welcome action or gesture, or make some interesting actions, or output a welcome voice. When the display device is in the user leaving state, the interactive object can be driven to make a goodbye action or gesture, or output a goodbye voice.


In the case where the face and/or the body is detected from the image of the surrounding of the display device, it means that there is a user around the display device, and the current service state at the moment when the user is detected can be determined as the user detected state.


When a user is detected around the display device, user feature information of the user can be obtained through the image. For example, the number of users around the device can be determined from the results of face and/or body detection; for each user, face and/or body detection technology can be used to obtain information related to the user from the image, for example, a gender of the user, an approximate age of the user, etc. The interactive object can be driven to make different responses to users of different genders and different ages.


In the user detected state, user historical operation information of the detected user stored in the display device and/or in the cloud can also be obtained, to determine whether the user is a regular customer or a VIP customer. The user historical operation information may also include a name, a gender, an age, a service record, and remarks of the user. The user historical operation information may include information input by the user, and may also include information recorded by the display device and/or the cloud. By obtaining the user historical operation information, the interactive object can be driven to respond to the user in a more targeted way.


In an example, the user historical operation information matching the user may be searched according to the detected feature information of the face and/or body of the user.


When the display device is in the user detected state, the interactive object can be driven to respond according to the current service state of the display device, the user feature information obtained from the image, and the user historical operation information obtained by searching. When a user is detected for the first time, the historical operation information of the user may be empty; in that case, the interactive object is driven according to the current service state, the user feature information, and the environment information.


In the case that a user is detected in the image of the surrounding of the display device, the face and/or body of the user can be detected through the image first to obtain user feature information of the user. For example, the user is a female and the age of the user is between 20 and 30 years old; then, according to the face and/or body feature information, the historical operation information of the user is searched in the display device and/or the cloud, for example, a name of the user, a service record of the user, etc. After the user is detected, the interactive object is driven to make a targeted welcoming action to the female user, and to show the female user services that can be provided for the female user. According to the services previously used by the user included in the historical operation information of the user, the order of providing services can be adjusted, so that the user can find the service of interest more quickly.
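
As a non-limiting illustration, adjusting the order of services according to the services previously used by the user could be sketched as a stable sort by usage count. The function name `order_services` and the representation of the service record as a list of service names are assumptions made for this sketch.

```python
from collections import Counter
from typing import List, Sequence


def order_services(available: Sequence[str],
                   service_record: Sequence[str]) -> List[str]:
    """Place previously used services first, most frequently used at the top."""
    usage = Counter(service_record)
    # sorted() is stable, so services with equal counts keep their default order.
    return sorted(available, key=lambda name: -usage[name])
```

When the service record is empty, for example for a user detected for the first time, the default order is returned unchanged.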


When at least two users are detected in images of the surrounding of the device, feature information of the at least two users can be obtained first, and the feature information can include at least one of user posture information or user attribute information, and the feature information corresponds to user historical operation information, where the user posture information can be obtained by recognizing the action of the user in the image.


Next, a target user among the at least two users is determined according to the obtained feature information of the at least two users. The feature information of each user can be comprehensively evaluated in combination with the actual scene to determine the target user.


After the target user is determined, the interactive object displayed on the transparent display screen of the display device can be driven to respond to the target user.
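
As a non-limiting illustration, the comprehensive evaluation of the feature information of multiple users could be sketched as a weighted score, with the highest-scoring user selected as the target user. The fields of `UserFeatures` and the weights in `score` are hypothetical and chosen only for this sketch; the present disclosure does not prescribe a particular evaluation rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class UserFeatures:
    user_id: str
    is_waving: bool        # posture information (hypothetical field)
    facing_screen: bool    # posture information (hypothetical field)
    body_box_area: float   # fraction of the image covered, a rough proximity cue


def score(user: UserFeatures) -> float:
    """Combine posture and proximity cues into one value (illustrative weights)."""
    return 2.0 * user.is_waving + 1.0 * user.facing_screen + user.body_box_area


def select_target_user(users: Sequence[UserFeatures]) -> Optional[UserFeatures]:
    """Return the highest-scoring user, or None when no user is detected."""
    if not users:
        return None
    return max(users, key=score)
```

Re-running the selection on each new image would allow the target user to change when, for example, another user starts waving.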


In some embodiments, when the user is detected, after the interactive object is driven to respond, the user detected in the image of the surrounding of the display device is tracked, for example, by tracking the facial expression and/or the action of the user, and whether to make the display device enter the service activated state is determined according to whether the user shows an active interaction expression and/or action.


In an example, in the process of tracking the user, designated trigger information can be set, for example, common facial expressions and/or actions for greetings, such as blinking, nodding, waving, raising hands, and slaps. To distinguish it from the trigger information described below, the designated trigger information here may be referred to as first trigger information. When the first trigger information output by the user is detected, it is determined that the display device has entered the service activated state, and the interactive object is driven to display the service matching the first trigger information, for example, through voice or through text information on the screen.


Current common somatosensory interaction requires the user to raise a hand for a period of time to activate the service. After selecting a service, the user needs to keep the hand still for several seconds to complete the activation. In the interaction method provided by the embodiments of the present disclosure, the user does not need to raise a hand for a period of time to activate the service, and does not need to keep the hand still to complete the selection. By automatically determining the designated trigger information of the user, the service can be automatically activated, so that the device is in the service activated state, which spares the user from raising a hand and waiting for a period of time, and improves the user experience.


In some embodiments, in the service activated state, designated trigger information can be set, such as a specific gesture and/or a specific voice command. To distinguish it from the trigger information described above, the designated trigger information here may be referred to as second trigger information. When the second trigger information output by the user is detected, it is determined that the display device has entered the in-service state, and the interactive object is driven to display a service matching the second trigger information.


In an example, the corresponding service is executed according to the second trigger information output by the user. For example, the services that can be provided to the user include a first service option, a second service option, a third service option, etc., and corresponding second trigger information can be configured for each service option. For example, the voice “one” can be set as the second trigger information corresponding to the first service option, the voice “two” can be set as the second trigger information corresponding to the second service option, and so on. When it is detected that the user outputs one of the voices, the display device enters the service option corresponding to the second trigger information, and the interactive object is driven to provide the service according to the content set by the service option.


In the embodiments of the present disclosure, after the display device enters the user detected state, recognition methods at two granularities are provided. When the first trigger information output by the user is detected, the first (coarse-grained) recognition method enables the device to enter the service activated state, and drives the interactive object to display the service matching the first trigger information. When the second trigger information output by the user is detected, the second (fine-grained) recognition method enables the device to enter the in-service state, and drives the interactive object to provide the corresponding service. Through the above two granularities of recognition, interactions between the user and the interactive object can be smoother and more natural.
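
As a non-limiting illustration, the transitions from the user detected state to the service activated state and then to the in-service state could be sketched as a small state machine. The class name `InteractionSession`, the string identifiers of the states and triggers, and the assumption that trigger information has already been recognized and reduced to a string are all hypothetical.

```python
from typing import Optional

# First trigger information: greeting-like expressions or actions (examples only).
FIRST_TRIGGERS = {"blink", "nod", "wave", "raise_hand"}

# Second trigger information: a voice command mapped to a service option.
SECOND_TRIGGERS = {
    "one": "first service option",
    "two": "second service option",
}


class InteractionSession:
    """Tracks the service state of the display device for one detected user."""

    def __init__(self) -> None:
        self.state = "user_detected"
        self.active_service: Optional[str] = None

    def on_trigger(self, trigger: str) -> str:
        """Apply one recognized trigger and return the resulting state."""
        if self.state == "user_detected" and trigger in FIRST_TRIGGERS:
            # Coarse-grained recognition: activate the service menu.
            self.state = "service_activated"
        elif self.state == "service_activated" and trigger in SECOND_TRIGGERS:
            # Fine-grained recognition: enter the selected service option.
            self.state = "in_service"
            self.active_service = SECOND_TRIGGERS[trigger]
        return self.state
```

In this sketch, a second trigger received before any first trigger is ignored, which reflects the ordering of the two recognition stages described above.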


Through the interaction method provided by the embodiments of the present disclosure, the user does not need to press keys, touch the device, or input voice. The user only needs to stand by the display device, and the interactive object displayed on the display device can make a targeted welcome action, follow an instruction from the user, and display services according to the needs or interests of the user, so that the user experience is improved.


In some embodiments, the environmental information of the display device may be obtained, and the interactive object displayed on the transparent display screen of the display device can be driven to respond according to a detection result and the environmental information.


The environmental information of the display device may be obtained through a geographic location of the display device and/or an application scenario of the display device. The environmental information may be, for example, the geographic location of the display device, an internet protocol (IP) address, or the weather, date, etc. of the area where the display device is located. Those skilled in the art should understand that the above environmental information is only an example, and other environmental information may also be included.


For example, when the display device is in the waiting for user state or the user leaving state, the interactive object may be driven to respond according to the current service state and the environmental information of the display device. For example, when the display device is in the waiting for user state and the environmental information includes the time, location, and weather condition, the interactive object displayed on the display device can be driven to make a welcome action and gesture, or make some interesting actions, and output the voice “it's XX o'clock, X (month) X (day), X (year), weather is XX, welcome to XX shopping mall in XX city, I am glad to serve you”. In addition to the general welcome actions, gestures, and voices, the current time, location, and weather condition are also included, which not only provides more information but also makes the response of the interactive object better matched to the interaction needs and more targeted.
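As a non-limiting illustration, the welcome voice above may be produced by filling a template with the environmental information. The sketch assumes the environmental information has already been collected into a simple record; the names `EnvironmentInfo` and `welcome_voice`, and the exact time format, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EnvironmentInfo:
    now: datetime   # current time at the display device
    city: str       # e.g. derived from the geographic location
    venue: str      # application scenario, e.g. a shopping mall
    weather: str    # weather condition of the area


def welcome_voice(env: EnvironmentInfo) -> str:
    """Fill the welcome sentence with time, date, weather, and location."""
    return (
        f"It's {env.now:%H:%M}, {env.now:%B} {env.now.day}, {env.now.year}, "
        f"weather is {env.weather}, welcome to {env.venue} in {env.city}, "
        "I am glad to serve you"
    )
```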


By performing user detection on the image of the surrounding of the display device, the interactive object displayed on the display device is driven to respond according to the detection result and the environmental information of the display device, so that the response of the interactive object better matches the interaction needs, and the interaction between the user and the interactive object is more realistic and vivid, thereby improving the user experience.


In some embodiments, a matching preset response label may be obtained according to the detection result and the environmental information; then, the interactive object is driven to make a corresponding response according to the response label. The response label may correspond to the driving text of one or more of the action, expression, gesture, or voice of the interactive object. For different detection results and environmental information, the corresponding driving text can be obtained according to the response label, so that the interactive object can be driven to output one or more of a corresponding action, expression, gesture, or voice.


For example, if the current service state is the waiting for user state, and the environment information indicates that the location is Shanghai, the corresponding response label may be that the action is a welcome action, and the voice is “Welcome to Shanghai”.


For another example, if the current service state is the user detected state, the environment information indicates that the time is morning, the user attribute information indicates a female, and the user historical record indicates that the last name is Zhang, the corresponding response label can be: the action is welcome, the voice is “Good morning, madam Zhang, welcome, and I am glad to serve you”.


By configuring corresponding response labels for the combination of different detection results and different environmental information, and using the response labels to drive the interactive object to output one or more of the corresponding actions, expressions, and voices, the interactive object can be driven according to different states of the device and different scenarios to make different responses, so that the responses from the interactive object are more diversified.
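As a non-limiting illustration, the matching of a preset response label to a combination of detection result and environmental information may be sketched as a rule table, using the two examples given above. The sketch assumes the detection result and environmental information are flattened into one dictionary; the names `RULES` and `match_response_label`, and the dictionary keys, are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical rule table: each rule lists the conditions it requires ("when")
# and the response label to use when all of them hold.
RULES: List[Dict[str, Dict[str, str]]] = [
    {
        "when": {"state": "user detected", "time": "morning",
                 "gender": "female", "last_name": "Zhang"},
        "label": {"action": "welcome",
                  "voice": "Good morning, madam Zhang, welcome, and I am glad to serve you"},
    },
    {
        "when": {"state": "waiting for user", "location": "Shanghai"},
        "label": {"action": "welcome", "voice": "Welcome to Shanghai"},
    },
]


def match_response_label(context: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the label of the first rule whose conditions all hold in the context."""
    for rule in RULES:
        if all(context.get(key) == value for key, value in rule["when"].items()):
            return rule["label"]
    return None  # no preset label is configured for this combination
```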


In some embodiments, the response label may be input to a trained neural network, and the driving text corresponding to the response label may be output, so as to drive the interactive object to output one or more of the corresponding actions, expressions, or voices.


The neural network may be trained with a sample response label set, wherein each sample response label is annotated with corresponding driving text. After the neural network is trained, the neural network can output corresponding driving text for the input response label, so as to drive the interactive object to output one or more of the corresponding actions, expressions, or voices. Compared with directly searching for the corresponding driving text on the display device or in the cloud, the trained neural network can generate driving text for a response label that has no preset driving text, so as to drive the interactive object to make an appropriate response.


In some embodiments, high-frequency and important scenarios can also be optimized through manual configuration. That is, for a combination of the detection result and the environmental information that occurs frequently, the driving text can be manually configured for the corresponding response label. When the scenario appears, the corresponding driving text is automatically called to drive the interactive object to respond, so that the actions and expressions of the interactive object are more natural.
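As a non-limiting illustration, the combination of manually configured driving text with generated driving text may be sketched as a lookup with a fallback. The callable `generate` stands in for the trained neural network and is an assumption of this sketch; the name `driving_text_for` is hypothetical.

```python
from typing import Callable, Dict


def driving_text_for(
    label: str,
    manual: Dict[str, str],
    generate: Callable[[str], str],
) -> str:
    """Prefer manually configured driving text; otherwise fall back to a generator."""
    if label in manual:
        return manual[label]  # high-frequency scenario configured by hand
    return generate(label)    # stand-in for the trained neural network
```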


In one embodiment, in response to the display device being in the user detected state, according to the position of the user in the image, position information of the interactive object displayed in the transparent display screen relative to the user is obtained; and the orientation of the interactive object is adjusted according to the position information so that the interactive object faces the user.


By automatically adjusting the body orientation of the interactive object according to the position of the user, the interactive object always faces the user, such that the interaction between the user and the interactive object is friendlier, and the user's interaction experience is improved.
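As a non-limiting illustration, the orientation adjustment may be sketched as converting the horizontal position of the user in the image into a yaw angle. The sketch assumes a pinhole camera mounted at the center of the display device and facing outward, with a known horizontal field of view; these assumptions, the sign convention, and the name `yaw_toward_user` are hypothetical and not specified by the present disclosure.

```python
import math


def yaw_toward_user(user_center_x: float, image_width: int, horizontal_fov_deg: float) -> float:
    """Return the yaw angle in degrees from the camera axis to the user.

    Positive values mean the user appears to the right of the image center.
    """
    # Focal length in pixels for a pinhole camera with the given field of view.
    focal_px = (image_width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    offset_px = user_center_x - image_width / 2
    return math.degrees(math.atan2(offset_px, focal_px))
```

Whether a positive angle turns the interactive object to its own left or right depends on whether the camera image is mirrored, so the sign may need to be flipped in a particular setup.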


In some embodiments, the image of the interactive object is acquired by a virtual camera. The virtual camera is a virtual software camera applied to 3D software and used to acquire images, and the interactive object is displayed on the screen through the 3D image acquired by the virtual camera. Therefore, a perspective of the user can be understood as the perspective of the virtual camera in the 3D software, which may lead to a problem that the interactive object cannot have eye contact with the user.


In order to solve the above problem, in at least one embodiment of the present disclosure, while the body orientation of the interactive object is adjusted, the line of sight of the interactive object is also kept aligned with the virtual camera. Since the interactive object faces the user during the interaction process, and the line of sight remains aligned with the virtual camera, the user may have the impression that the interactive object is looking at them, such that the comfort of the user's interaction with the interactive object is improved.



FIG. 3 is a schematic structural diagram illustrating an interaction apparatus according to at least one embodiment of the present disclosure. As shown in FIG. 3, the apparatus may include: an image obtaining unit 301, a detection unit 302 and a driving unit 303.


The image obtaining unit 301 is configured to obtain an image, acquired by a camera, of a surrounding of a display device; where the display device displays an interactive object through a transparent display screen; the detection unit 302 is configured to detect at least one of a face or a body in the image to obtain a detection result; the driving unit 303 is configured to drive the interactive object displayed on the transparent display screen of the display device to respond according to the detection result.
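As a non-limiting illustration, the cooperation of the three units may be sketched as one processing step in which each unit is supplied as a callable. The class name `InteractionApparatus`, the method `step`, and the use of a string as the detection result are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class InteractionApparatus:
    obtain_image: Callable[[], Any]   # plays the role of image obtaining unit 301
    detect: Callable[[Any], str]      # plays the role of detection unit 302
    drive: Callable[[str], None]      # plays the role of driving unit 303

    def step(self) -> str:
        """Run one obtain, detect, and drive cycle and return the detection result."""
        image = self.obtain_image()
        result = self.detect(image)
        self.drive(result)
        return result
```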


In some embodiments, the display device displays a reflection of the interactive object on the transparent display screen, or displays the reflection of the interactive object on a base plate.


In some embodiments, the interactive object includes a virtual human with a stereoscopic effect.


In some embodiments, the detection result includes at least a current service state of the display device; the current service state includes any of a waiting for user state, a user leaving state, a user detected state, a service activated state and an in-service state.


In some embodiments, the detection unit 302 is specifically configured to: in response to that the face and the body are not detected at a current time, and the face and the body are not detected within a preset time period before the current time, determine that the current service state is the waiting for user state.


In some embodiments, the detection unit 302 is specifically configured to: in response to that the face and the body are not detected at a current time, and the face and the body are detected within a preset time period before the current time, determine that the current service state is the user leaving state.


In some embodiments, the detection unit 302 is specifically configured to: in response to that at least one of the face or the body is detected at the current time, determine that the current service state of the display device is the user detected state.
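As a non-limiting illustration, the three determinations made by the detection unit 302 may be sketched as a single function. The sketch assumes that per-frame detection flags for the preset time period before the current time are available; the name `classify_service_state` is hypothetical.

```python
from typing import Sequence


def classify_service_state(detected_now: bool, detected_recently: Sequence[bool]) -> str:
    """Classify the service state from current and recent detections.

    detected_now: True if at least one of a face or a body is detected at the current time.
    detected_recently: detection flags for frames within the preset time period
        before the current time.
    """
    if detected_now:
        return "user detected"
    if any(detected_recently):
        return "user leaving"
    return "waiting for user"
```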


In some embodiments, the detection result further includes user attribute information and/or user historical operation information; the apparatus further includes an information acquiring unit, configured to: obtain the user attribute information through the image; and/or, search for the user historical operation information that matches feature information of at least one of the face or the body of the user.


In some embodiments, the apparatus further includes a target determining unit, configured to: in response to that at least two users are detected, obtain feature information of the at least two users; determine a target user from the at least two users according to the feature information of the at least two users. The driving unit 303 is configured to drive the interactive object displayed on the transparent display screen of the display device to respond to the target user.
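As a non-limiting illustration, the selection of a target user may be sketched as choosing the user with the largest detected bounding box, which roughly corresponds to the user closest to the camera. This selection criterion is an assumption of the sketch and is only one possible use of the feature information; the names `DetectedUser` and `pick_target_user` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectedUser:
    user_id: str
    box_width: int   # width of the detected face or body box, in pixels
    box_height: int  # height of the detected face or body box, in pixels


def pick_target_user(users: Sequence[DetectedUser]) -> Optional[DetectedUser]:
    """Return the user with the largest bounding-box area, or None if no user is given."""
    if not users:
        return None
    return max(users, key=lambda user: user.box_width * user.box_height)
```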


In some embodiments, the apparatus further includes an environment information acquiring unit for acquiring environment information of the display device, and the driving unit 303 is specifically configured to: drive the interactive object displayed on the transparent display screen of the display device to respond according to the detection result and the environment information.


In some embodiments, the environment information includes at least one of a geographic location, an internet protocol (IP) address of the display device, and a weather or date of an area where the display device is located.


In some embodiments, the driving unit 303 is specifically configured to obtain a preset response label matching with the detection result and the environment information; drive the interactive object displayed on the transparent display screen to make a response corresponding to the response label.


In some embodiments, when the driving unit 303 is configured to drive the interactive object displayed on the transparent display screen of the display device to make a corresponding response according to the response label, the driving unit 303 is specifically configured to input the response label to a trained neural network to output driving content corresponding to the response label, wherein the driving content is used to drive the interactive object to output one or more of corresponding actions, expressions, or voices.


In some embodiments, the apparatus further includes a service activation unit, configured to: in response to determining that the current service state is the user detected state, after driving the interactive object to respond, track the user detected in the image of the surrounding of the display device; in the process of tracking the user, in response to detecting first trigger information output by the user, determine that the display device enters the service activated state, and drive the interactive object to display a service matching the first trigger information.


In some embodiments, the apparatus further includes a service unit, and the service unit is configured to: when the display device is in the service activated state, in response to detecting second trigger information output by the user, determine that the display device enters the in-service state, and drive the interactive object to display a service matching the second trigger information.


In some embodiments, the apparatus further includes a direction adjusting unit, configured to: in response to determining that the current service state detected by the detection unit is the user detected state, obtain position information of the user relative to the interactive object displayed on the transparent display screen according to a position of the user in the image; adjust an orientation of the interactive object according to the position information so that the interactive object faces the user.


At least one embodiment of the present disclosure also provides an interaction device. As shown in FIG. 4, the device includes a memory 401 and a processor 402. The memory 401 is used to store computer instructions executable by the processor, and when the instructions are executed, the processor 402 is prompted to implement the method described in any embodiment of the present disclosure.


At least one embodiment of the present disclosure also provides a computer-readable storage medium, having a computer program stored thereon, when the computer program is executed by a processor, the processor implements the interaction method according to any of the foregoing embodiments of the present disclosure.


Those skilled in the art should understand that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. One or more embodiments of the present disclosure may take the form of a computer program product which is implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer-usable program codes.


The various embodiments in the present disclosure are described in a progressive manner, and the same or similar parts between the various embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, since the apparatus embodiments are basically similar to the method embodiments, the description is relatively simple, and for related parts, please refer to the description of the method embodiments.


The specific embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps described in the claims can be performed in a different order than in the embodiments and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or sequential order shown in order to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.


The embodiments of the subject and functional operation in the present disclosure can be implemented in the following: a digital electronic circuit, a tangible computer software or firmware, a computer hardware including the structure disclosed in the present disclosure and structural equivalents thereof, or a combination of one or more of the above. Embodiments of the subject matter of the present disclosure may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier to be executed by a data processing apparatus or to control the operation of the data processing apparatus. Alternatively or additionally, program instructions may be encoded on an artificially generated propagating signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode and transmit information to a suitable receiver device for execution by a data processing device. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more thereof.


The processes and logic flows in the present disclosure may be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating in accordance with input data and generating an output. The processing and logic flows may also be performed by dedicated logic circuitry, such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the apparatus may also be implemented as dedicated logic circuitry.


Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from read only memory and/or random access memory. The basic components of the computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Typically, the computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks or optical disks, or the like, or the computer will be operatively coupled with such mass storage devices to receive data therefrom or to transfer data thereto, or both. However, a computer does not necessarily have such a device. Furthermore, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.


Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and memory may be supplemented by or incorporated into a dedicated logic circuit.


While this disclosure includes numerous specific implementation details, these should not be construed as limiting the scope of the disclosure or the claimed scope, but are primarily used to describe features of some embodiments of the disclosure. Certain features of various embodiments of the present disclosure may also be implemented in combination in a single embodiment. On the other hand, various features in a single embodiment may also be implemented separately in multiple embodiments or in any suitable sub-combination. Moreover, while features may function in certain combinations as described above and even initially so claimed, one or more features from the claimed combination may in some cases be removed from the combination, and the claimed combination may point to a variation of the sub-combination or alternative of the sub-combination.


Similarly, although operations are depicted in a particular order in the figures, this should not be construed as requiring these operations to be performed in the particular order shown or in sequential order, or requiring all of the illustrated operations to be performed to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the above embodiments should not be construed as requiring such separation in all embodiments, and it should be understood that the described program components and systems may generally be integrated together in a single software product or encapsulated into multiple software products.


Thus, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the acts described in the claims may be performed in different orders and still achieve the desired results. Moreover, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing may be advantageous.


The foregoing is merely some embodiments of the present disclosure, and is not intended to limit the present disclosure. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principle of the present disclosure should be included within the scope of the present disclosure.

Claims
  • 1. A computer-implemented method for interactions between interactive objects and users, the computer-implemented method comprising: obtaining an image of a surrounding of a display device, wherein the display device displays an interactive object through a transparent display screen; detecting at least one of a face or a body in the image to obtain a detection result; and driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result.
  • 2. The computer-implemented method of claim 1, wherein the interactive object comprises a virtual human with a stereoscopic effect.
  • 3. The computer-implemented method of claim 1, wherein a reflection of the interactive object is displayed by the display device on one of the transparent display screen or a base plate.
  • 4. The computer-implemented method of claim 1, comprising: in response to determining that at least one user is detected in the image, obtaining feature information of the at least one user; determining a target user from the at least one user according to the feature information of the at least one user; and driving the interactive object displayed on the transparent display screen of the display device to respond to the target user.
  • 5. The computer-implemented method of claim 1, further comprising: obtaining environment information of the display device, wherein driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result comprises: driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result and the environment information.
  • 6. The computer-implemented method of claim 5, wherein the environment information comprises at least one of: a geographic location of the display device, an IP address of the display device, a weather or date of an area where the display device is located.
  • 7. The computer-implemented method of claim 5, wherein driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result and the environment information comprises: obtaining a preset response label matching with the detection result and the environment information; and driving the interactive object displayed on the transparent display screen to make a response corresponding to the preset response label.
  • 8. The method of claim 7, wherein driving the interactive object displayed on the transparent display screen to make the response corresponding to the preset response label comprises: inputting the preset response label to a trained neural network to output at least one driving content corresponding to the preset response label, wherein the at least one driving content is used to drive the interactive object to output one or more of corresponding actions, expressions, or voices.
  • 9. The computer-implemented method of claim 1, wherein the detection result comprises at least one current service state of the display device, and wherein the at least one current service state comprises at least one of a waiting for user state, a user leaving state, a user detected state, a service activated state, or an in-service state.
  • 10. The computer-implemented method of claim 9, wherein detecting the at least one of the face or the body in the image to obtain the detection result comprises one of: in response to determining that the face and the body are not detected at a current time and that the face and the body are not detected within a preset time period before the current time, determining that the current service state is the waiting for user state, in response to determining that the face and the body are not detected at a current time and that the face and the body are detected within a preset time period before the current time, determining that the current service state is the user leaving state, or in response to determining that the at least one of the face or the body is detected at the current time, determining that the current service state is the user detected state.
  • 11. The computer-implemented method of claim 9, wherein the detection result further comprises at least one of user attribute information or user historical operation information, and wherein the computer-implemented method further comprises at least one of: in response to determining that the current service state of the display device is the user detected state, obtaining the user attribute information through the image, or searching for the user historical operation information that matches feature information of the at least one of the face or the body.
  • 12. The computer-implemented method of claim 9, further comprising: in response to determining that the current service state is the user detected state, after driving the interactive object to respond, tracking a user detected in the image of the surrounding of the display device; during tracking the user, in response to detecting first trigger information output by the user, determining that the display device enters the service activated state; and driving the interactive object to display a first service matching the first trigger information.
  • 13. The computer-implemented method of claim 12, further comprising: when the display device is in the service activated state, in response to detecting second trigger information output by the user, determining that the display device enters the in-service state; and driving the interactive object to display a second service matching the second trigger information.
  • 14. The computer-implemented method of claim 9, further comprising: in response to determining that the current service state is the user detected state, obtaining position information of the user relative to the interactive object displayed on the transparent display screen according to a position of the user in the image; and adjusting an orientation of the interactive object according to the position information so that the interactive object faces the user.
  • 15. An interaction device, comprising: at least one processor; and one or more memories coupled to the at least one processor and storing programming instructions for execution by the at least one processor to perform operations for interactions between interactive objects and users, the operations comprising: obtaining an image of a surrounding of a display device, wherein the display device displays an interactive object through a transparent display screen; detecting at least one of a face or a body in the image to obtain a detection result; and driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result.
  • 16. The interaction device of claim 15, wherein the detection result comprises at least one current service state of the display device, and wherein the at least one current service state comprises at least one of a waiting for user state, a user leaving state, a user detected state, a service activated state, or an in-service state.
  • 17. The interaction device of claim 16, wherein detecting the at least one of the face or the body in the image to obtain the detection result comprises one of: in response to determining that the face and the body are not detected at a current time, and the face and the body are not detected within a preset time period before the current time, determining that the current service state is the waiting for user state, in response to determining that the face and the body are not detected at a current time, and the face and the body are detected within a preset time period before the current time, determining that the current service state is the user leaving state, or in response to determining that the at least one of the face or the body is detected at the current time, determining that the current service state of the display device is the user detected state.
  • 18. The interaction device of claim 16, wherein the detection result further comprises at least one of user attribute information or user historical operation information, and wherein the operations further comprise at least one of: in response to determining that the current service state of the display device is the user detected state, obtaining the user attribute information through the image, or searching for the user historical operation information that matches feature information of the at least one of the face or the body.
  • 19. The interaction device of claim 15, the operations further comprise: in response to that at least one user is detected, obtaining feature information of the at least one user; determining a target user from the at least one user according to the feature information of the at least one user; and driving the interactive object displayed on the transparent display screen of the display device to respond to the target user.
  • 20. A non-transitory computer readable storage medium having machine-executable instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform operations for interactions between interactive objects and users, the operations comprising: obtaining an image of a surrounding of a display device, wherein the display device displays an interactive object through a transparent display screen; detecting at least one of a face or a body in the image to obtain a detection result; and driving the interactive object displayed on the transparent display screen of the display device to respond according to the detection result.
Priority Claims (1)
Number Date Country Kind
201910804635.X Aug 2019 CN national
CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of international application no. PCT/CN2020/104291, filed on Jul. 24, 2020, which claims priority to Chinese patent application no. 201910804635.X, filed on Aug. 28, 2019, all of which are incorporated herein by reference in their entireties.

Continuations (1)
Number Date Country
Parent PCT/CN2020/104291 Jul 2020 US
Child 17680837 US