The present principles generally relate to augmented reality (AR) apparatuses and methods, and in particular, to an exemplary augmented reality system in which content characteristics are used to affect the individual viewing experience of the content.
This section is intended to introduce a reader to various aspects of art, which may be related to various aspects of the present principles that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are augmented (or supplemented) by computer-generated sensory inputs such as, e.g., sound, video, graphics, GPS data, and/or other data. It is related to a more general concept called mediated reality, in which a view of reality is modified by a computer. As a result, the technology functions by enhancing one's current perception of reality. Augmented reality blends virtual reality (VR) with real life, as developers can create images within applications that blend in with content in the real world. With augmented reality devices, users are able to interact with virtual content in the real world, and are able to distinguish between the two.
One well-known AR device is Google Glass, developed by Google X. Google Glass is a wearable computer which has a video camera and a head-mounted display in the form of a pair of glasses. In addition, various improvements and apps have also been developed for Google Glass.
Accordingly, an exemplary method is presented, comprising: acquiring metadata associated with video content to be displayed by an augmented reality (AR) video system, the AR video system including a display screen and at least one pair of AR glasses, the metadata indicating respectively a characteristic of a corresponding scene of the video content; acquiring respective viewer profile data for a plurality of viewers of the video content, the respective viewer profile data indicating respective viewing preference for each of the plurality of viewers of the video content; determining an objectionable scene included in the video content based on the respective viewer profile data and the metadata; creating a modified form of the video content according to the objectionable scene, the video content constituting an unmodified form of the video content; and providing one of the forms of the video content to the display screen and providing the other form of the video content to the at least one pair of AR glasses.
In another exemplary embodiment, an apparatus is presented, comprising: a pair of AR glasses; a display screen; and a processor configured to: acquire metadata associated with video content to be displayed by the apparatus, the metadata indicating respectively a characteristic of a corresponding scene of the video content; acquire respective viewer profile data for a plurality of viewers of the video content, the respective viewer profile data indicating respective viewing preference for each of the plurality of viewers of the video content; determine an objectionable scene included in the video content based on the respective viewer profile data and the metadata; create a modified form of the video content according to the objectionable scene, the video content constituting an unmodified form of the video content; and provide one of the forms of the video content to the display screen and provide the other form of the video content to the pair of AR glasses.
In another exemplary embodiment, a computer program product stored in a non-transitory computer-readable storage medium is presented, comprising program code instructions for: acquiring metadata associated with video content to be displayed by an augmented reality (AR) video system, the AR video system including a display screen and at least one pair of AR glasses, the metadata indicating respectively a characteristic of a corresponding scene of the video content; acquiring respective viewer profile data for a plurality of viewers of the video content, the respective viewer profile data indicating respective viewing preference for each of the plurality of viewers of the video content; determining an objectionable scene included in the video content based on the respective viewer profile data and the metadata; creating a modified form of the video content according to the objectionable scene, the video content constituting an unmodified form of the video content; and providing one of the forms of the video content to the display screen and providing the other form of the video content to the at least one pair of AR glasses.
The above-mentioned and other features and advantages of the present principles, and the manner of attaining them, will become more apparent and the invention will be better understood by reference to the following description of embodiments of the present principles taken in conjunction with the accompanying drawings, wherein:
The examples set out herein illustrate exemplary embodiments of the present principles. Such examples are not to be construed as limiting the scope of the invention in any manner.
The present principles determine one or more viewers who are viewing video content in an augmented reality environment. Once a viewer's identity is determined by the AR system, his or her viewer profile data may be determined from the determined identity of the viewer. In addition, respective content metadata for one or more video contents available for viewing on the AR system are also acquired and determined in order to provide respectively a content profile for each content. A comparison of the content profile and the viewer profile may then be performed. The result of the comparison is a list of possibly objectionable scenes and the corresponding possible user-selectable actions. One exemplary user-selectable action may be a modification such as, e.g., a replacement or an obscuring of a potentially objectionable scene of the video content.
Therefore, a modified content may be created by replacing or obscuring the objectionable content or scenes of one or more of the original contents. In one exemplary embodiment, the modification of the content may be performed a period of time before a potentially objectionable content is to be shown to the one or more viewers of the content. In another exemplary embodiment, the modification is performed by a parent or a guardian of at least one of the viewers. In another exemplary embodiment, the modification is performed by a curator of the video content (e.g., a keeper, a custodian, and/or an acquirer of the content).
In another embodiment, an exemplary apparatus and method are employed in a system having one or more augmented reality devices such as, e.g., one or more pairs of AR glasses. The system may also include a non-AR display screen to display and present the content to be viewed and shared by one or more viewers. Accordingly, different forms of the same content may be presented on the different AR glasses and also on the shared screen.
In another aspect, the present principles provide an advantageous AR system to efficiently distribute different forms of video content depending on the respective viewing profile data of the viewers. In one exemplary embodiment according to the present principles, an exemplary AR system determines whether an objectionable scene would be objectionable to a majority of the viewers. If it is determined that the objectionable scene would be objectionable to the majority of viewers, the system provides the video content in modified form to the display screen to be viewed and shared by the majority of viewers, and provides the video content in unmodified form to the plurality of AR glasses. If, on the other hand, it is determined that the objectionable scene would not be objectionable to the majority of viewers, the system provides the video content in unmodified form to the display screen to be viewed and shared by the majority of viewers, and provides the video content in modified form to the plurality of AR glasses. In one embodiment, the exemplary AR system may be deployed in a people transporter such as an airplane, bus, train, or a car, or in a public space such as at a movie theater or stadium, or even in a home theater environment.
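By way of non-limiting illustration, the following Python sketch shows one way such majority-based routing could be implemented; the function and field names are hypothetical and are not part of any standardized AR interface.

```python
# Illustrative sketch of majority-based feed routing; names are hypothetical.
def route_feeds(viewers, scene, is_objectionable):
    """Decide which form of the content goes to the shared screen and
    which goes to the AR glasses, based on a majority vote."""
    objectors = [v for v in viewers if is_objectionable(v, scene)]
    if len(objectors) > len(viewers) / 2:
        # Majority object: the shared screen shows the modified form,
        # while the non-objecting minority's glasses show the unmodified form.
        return {"screen": "modified", "glasses": "unmodified"}
    # Majority do not object: the shared screen stays unmodified, and the
    # objecting minority's glasses overlay the modified form.
    return {"screen": "unmodified", "glasses": "modified"}
```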
The present description illustrates the present principles. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present principles and are included within its spirit and scope.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present principles and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the present principles, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the present principles. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
Reference in the specification to “one embodiment”, “an embodiment”, or “an exemplary embodiment” of the present principles, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrases “in one embodiment”, “in an embodiment”, or “in an exemplary embodiment”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as readily apparent by one of ordinary skill in this and related arts, for as many items listed.
An exemplary system 100 in
Various exemplary user devices 160-1 to 160-n in
User devices 160-1 to 160-n shown in
An exemplary user device 160-1 in
Device 160-1 may also comprise a display 191 which is driven by a display driver/bus component 187 under the control of the processor 165 via a display bus 188 as shown in
In addition, the exemplary device 160-1 in
Exemplary device 160-1 also comprises a memory 185 which may represent both a transitory memory such as RAM, and a non-transitory memory such as a ROM, a hard drive, a CD drive, a Blu-ray drive, and/or a flash memory, for processing and storing different files and information as necessary, including computer program products and software (e.g., as represented by flow chart diagrams of
Also as shown in
According to the present principles, AR system 100 may determine one or more viewers who are viewing video content in the augmented reality environment of system 100. An exemplary device 160-1 in
Another example of a sensor 181 may be an audio sensor such as a microphone, and/or a visual sensor such as a camera, so that voice recognition and/or facial recognition may be used to identify a viewer, as is well known in the art. In another exemplary embodiment according to the present principles, sensor 181 may be an RFID reader for reading a respective RFID tag pre-provisioned with the identity of the respective viewer. In another example, sensor 181 may represent a monitor for monitoring a respective electronic connection or activity of a person or a person's device in a room or on a network. Such an exemplary person identity sensor may be, e.g., a Wi-Fi router which keeps track of different devices or logins on the network served by the Wi-Fi router, or a server which keeps track of logins to emails or online accounts being serviced by the server. In addition, other exemplary sensors may be location-based sensors such as GPS and/or Wi-Fi location tracking sensors, which may be used in conjunction with, e.g., applications commonly found on mobile devices, such as the Google Maps app on an Android mobile device, to readily identify the respective locations of the users and the user devices.
Also as shown in
Continuing with
Server 105 also comprises a memory 125 which may represent both a transitory memory such as RAM, and a non-transitory memory such as a ROM, a hard drive, a CD-ROM drive, a Blu-ray drive, and/or a flash memory, for processing and storing different files and information as necessary, including computer program products and software (e.g., as represented by flow chart diagrams of
In addition, server 105 is connected to network 150 through a communication interface 120 for communicating with other servers or web sites (not shown) and one or more user devices 160-1 to 160-n, as shown in
According to the present principles, once a viewer's identity is determined by the AR system 100 as described above using sensors (e.g., 181 and/or 182), his or her viewer profile may be determined from the determined identity of the viewer. The viewer profile data of a viewer indicate the viewing preferences (including viewing restrictions) of that viewer. The viewer profile may include data such as, e.g., age, political beliefs, religious preferences, sexual orientation, native language, violence tolerance, nudity tolerance, potential content triggers (e.g., PTSD, bullying), demographic information, tolerance for offensive language, preferences (e.g., actors, directors, lighting), sensitivity to racial conflict, medical issues (e.g., seizures, nausea), etc.
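By way of non-limiting illustration, such viewer profile data might be represented in memory as follows; the field names and value scales are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ViewerProfile:
    """Illustrative in-memory shape for the viewer profile data listed above."""
    viewer_id: str
    age: int
    native_language: str = "en"
    violence_tolerance: int = 3        # assumed scale: 0 (none) .. 5 (any)
    nudity_tolerance: int = 3          # same assumed scale
    content_triggers: List[str] = field(default_factory=list)   # e.g., ["PTSD"]
    preferred_actors: List[str] = field(default_factory=list)
    medical_issues: List[str] = field(default_factory=list)     # e.g., ["seizures"]
```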
In one exemplary embodiment according to the present principles, the viewer profile data may be acquired from pre-entered viewer profile data already provided by each corresponding viewer of the AR viewing system 100. In another embodiment, the viewer profile may be acquired automatically from different sources and websites such as social network profiles (e.g., profiles on LinkedIn, Facebook, Twitter), people information databases (e.g., anywho.com, peoplesearch.com), personal devices (e.g., contact information on mobile phones or wearables), machine learning inferences, browsing history, content consumption history, purchase history, etc. These viewer profile data may be stored in, e.g., memory 125 of server 105 and/or memory 185 of device 160-1 in
In addition, respective content metadata for one or more video contents available for viewing on the AR system 100 are also acquired and determined in order to provide a content profile for each content. Content metadata that are acquired and determined may comprise, e.g., content ratings (e.g., MPAA ratings), cast and crew of the content, plot information, genre, offensive-scene specific details and/or ratings (e.g., adult content, violent content, other triggers), location information, annotation of where AR changes are available, emotional profile, etc. Likewise, these content metadata may be acquired from auxiliary information embedded in the content (as provided by the content and/or content metadata creator) or from crowdsourcing (internal and/or external). Accordingly, the content metadata may be gathered automatically, by machine learning inferences and from Internet sources such as third-party content databases (e.g., Rotten Tomatoes, IMDb), and/or manually provisioned by a person associated with the content and/or metadata provider. These content metadata may also be stored in, e.g., memory 125 of server 105 and/or memory 185 of device 160-1 of
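Again by way of non-limiting illustration, a per-scene content profile built from such metadata might look like the following; all field names and scales are assumptions for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneProfile:
    """Illustrative per-scene content profile derived from content metadata."""
    scene_id: str
    frame_range: Tuple[int, int]       # first and last frame of the scene
    mpaa_rating: str = "PG"
    nudity_rating: int = 0             # assumed 0..5 scale
    violence_rating: int = 0           # assumed 0..5 scale
    triggers: List[str] = field(default_factory=list)
    ar_replacement_available: bool = False   # AR-change annotation
```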
According to the present principles, a comparison of the content profile and the viewer profile may be performed by, e.g., processor 110 and/or processor 165. The comparison may be performed via, e.g., a hard threshold based on the viewer profile data: for example, if the viewer's age is less than 10, content with adult or nudity scenes will be deemed objectionable to that viewer. The comparison may also be done using a soft threshold, with machine learning inferences used to determine viewing patterns.
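A minimal sketch of such a hard-threshold comparison follows, using plain dictionaries; the rule encodes the age example above, and the field names are assumptions made for this illustration.

```python
def scene_is_objectionable(viewer, scene):
    """Hard-threshold comparison of a scene's content profile against a
    viewer profile; a soft threshold could instead use a trained model."""
    # Hard rule from the example above: viewers under 10 may not see
    # scenes tagged as adult content or containing nudity.
    if viewer["age"] < 10 and (scene["adult"] or scene["nudity_rating"] > 0):
        return True
    # Generic tolerance checks against the scene's ratings.
    return (scene["nudity_rating"] > viewer["nudity_tolerance"]
            or scene["violence_rating"] > viewer["violence_tolerance"])

viewer = {"age": 9, "nudity_tolerance": 0, "violence_tolerance": 2}
scene = {"adult": False, "nudity_rating": 0, "violence_rating": 3}
print(scene_is_objectionable(viewer, scene))  # True: violence above tolerance
```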
Accordingly, this comparison determines whether the content is appropriate for a viewer and whether content modification should first be performed by, e.g., a parent or a guardian of the viewer, as further described below. This comparison may be performed by the content provider 105, the viewer, or a third party (e.g., a parent/guardian or an external organization), and may be done in real time or off-line. The result of the comparison is a list of possibly objectionable scenes and the corresponding possible user-selectable actions for the video content.
In one embodiment, the content server 105 is aware of when the objectionable content will be presented to the viewers. Using the viewers' profile information, it can then detect that pre-screening by a parent, guardian, or curator is required, and the content provider will then present a preview of the questionable scenes. For example, when content that may be inappropriate for a viewer (e.g., based on age, gender, or race) is being watched by that viewer alone, with no parent, guardian, or curator present, the streaming service 105 would notify the parent, guardian, or curator with a representative list of objectionable scenes and a corresponding list of actions that could be applied to those scenes. In another embodiment, one or more of the above functions may be performed by the user device 160-1 in conjunction with the AR glasses 125-1, as described further below.
The representative list of objectionable scenes is created from the whole list of objectionable scenes by clustering the inappropriate scenes into groups based on a similarity measure. One way to perform this clustering is with a well-known algorithm such as the K-means algorithm. Of course, other well-known clustering algorithms may also be used to form the groupings, as readily appreciated by one skilled in the art.
As shown in
The representative scene for each group may be selected, e.g., as the objectionable scene which is closest to the centroid of the corresponding group. Thereafter, for example, the video clip of the representative scene is displayed to represent the respective clustered group, as illustrated in elements 662 and 664 of
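By way of non-limiting illustration, the grouping and representative selection described above could be sketched as follows with scikit-learn's K-means; the two-feature encoding of each scene is an assumption made only to keep the example short.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one objectionable scene encoded (for illustration only) as
# [nudity_rating, violence_rating].
features = np.array([[4, 0], [5, 1], [0, 5], [1, 4]], dtype=float)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

for k, center in enumerate(kmeans.cluster_centers_):
    members = np.where(kmeans.labels_ == k)[0]
    # Representative scene = the member closest to the cluster centroid.
    rep = members[np.argmin(np.linalg.norm(features[members] - center, axis=1))]
    print(f"group {k}: scenes {members.tolist()}, representative scene {rep}")
```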
One example of a machine learning aspect of the present principles is the use of computer algorithms to automatically determine, e.g., the nudity and violence scenes of the video content and their respective nudity and violence ratings. Various well-known algorithms may be used to provide these functions and capabilities. For example, nudity scene detection and a corresponding rating for a video scene may be determined by using various skin detection techniques, such as those described in and referenced by, e.g., H. Zheng, H. Liu, and M. Daoudi, "Blocking objectionable images: adult images and harmful symbols," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), June 2004, pp. 1223-1226. In addition, many other nudity detection algorithms may also be used, such as those described in and referenced by Lopes, A., Avila, S., Peixoto, A., Oliveira, R., and de A. Araújo, A. (2009), "A bag-of-features approach based on hue-SIFT descriptor for nude detection," European Signal Processing Conference (EUSIPCO), pages 1552-1556.
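The cited techniques are considerably more sophisticated, but the flavor of skin-based detection can be conveyed by a deliberately simple heuristic: score each frame by the fraction of pixels falling inside a fixed skin-tone range. The HSV bounds below are rough, illustrative values only, not taken from the cited work.

```python
import cv2
import numpy as np

def skin_ratio(frame_bgr):
    """Fraction of pixels falling inside a crude HSV skin-tone range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], dtype=np.uint8)
    upper = np.array([25, 180, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    return float(np.count_nonzero(mask)) / mask.size

# A scene-level nudity rating could then be derived by, e.g., averaging
# this ratio over the scene's frames and bucketing it onto a 1-5 scale.
```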
Likewise, various violent scene detection techniques have also been proposed and may be used to automatically determine violent scenes in video content and provide associated ratings in accordance with the present principles, as described, e.g., in C. H. Demarty, B. Ionescu, Y. G. Jiang, and C. Penet, "Benchmarking Violent Scenes Detection in Movies," Proceedings of the 2014 12th International Workshop on Content-Based Multimedia Indexing (CBMI), 2014. For example, violent scene detection and ratings may be determined from the occurrence of bloody images, facial expressions, and motion information, as described in Liang-Hua Chen et al., "Violence Detection in Movies," 2011 Eighth International Conference on Computer Graphics, Imaging and Visualization (CGIV). As the authors of that article note, their experimental results show that the proposed approach works reasonably well in detecting most of the violent scenes in the content.
In one embodiment according to the present principles, content provider 105 may provide content which already has associated content metadata that define precisely which plurality of frames constitutes one scene of the content. The provided metadata also include a corresponding description of the characteristics of the scene. Such characteristics may include, for example, violence and nudity ratings from 1 to 5. In one exemplary embodiment, such characterization data may be provisioned by a content screener manually going through the content and delineating each scene of interest for the entire content.
In another exemplary embodiment, a set of descriptive words may be collected for each scene from the content metadata, and a similarity measure between two scenes may be a distance measurement between their respective collections of words. This information is then used to cluster the scenes together (for example, into nudity, violence, and horror groups) using the well-known K-means algorithm, as described before.
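One simple distance measurement between such word collections is the Jaccard distance between the corresponding word sets, sketched below; note that for set-valued distances a distance-based method such as agglomerative clustering (or a vectorization step before K-means) would typically be used.

```python
def jaccard_distance(words_a, words_b):
    """Distance between two scenes' descriptive-word sets (0 = identical)."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

scene1 = ["nudity", "bedroom", "adult"]
scene2 = ["nudity", "beach"]
scene3 = ["gunfight", "blood", "violence"]
print(jaccard_distance(scene1, scene2))  # 0.75 -> likely the same group
print(jaccard_distance(scene1, scene3))  # 1.0  -> a different group
```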
Thereafter, the notification provided may be a representative list of the clustered groups of objectionable scenes, along with corresponding actions which may be performed by a user (e.g., editing actions such as remove, replace, or approve). In an alternative embodiment, a default set of actions may be provided automatically. The default set of actions may be created based on one or more filters (such as, e.g., children-friendly, race-friendly, or religion-friendly image or scene replacements) created beforehand. Therefore, if no action is taken by the user within a certain time period, a default filter may be applied accordingly.
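A minimal sketch of this fallback behavior follows; the filter name and the review-window mechanics are illustrative assumptions.

```python
import time

DEFAULT_FILTER = "children_friendly"   # hypothetical preconfigured filter

def resolve_action(user_choice, deadline, now=None):
    """Return the user's explicit edit if given; otherwise apply the
    default filter once the review window has expired."""
    now = time.time() if now is None else now
    if user_choice is not None:
        return user_choice            # e.g., "remove", "replace", "approve"
    if now >= deadline:
        return DEFAULT_FILTER         # no action taken in time: default applies
    return None                       # still within the review window
```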
The modification of the video content may be an overlay of replacement content over the original content to be shown on a display device. For this modification to be performed, each scene of the video content is defined and associated with an appropriate content profile, as described above. In addition, each element of a scene may be associated with such a profile. For example, each area of a nudity scene may be defined to detail the spatial characteristics of the area. This may be done via coordinates, a shape map, a polygon definition, etc., as is well known in the art.
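By way of non-limiting illustration, the sketch below obscures a polygonal region of a frame (as could be taken from the scene's spatial metadata) by blurring it and compositing the result over the original; the polygon coordinates and file name are hypothetical.

```python
import cv2
import numpy as np

def obscure_region(frame, polygon_points):
    """Blur the area inside the given polygon and leave the rest intact."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.array(polygon_points, dtype=np.int32)], 255)
    blurred = cv2.GaussianBlur(frame, (51, 51), 0)
    # Take blurred pixels inside the polygon, original pixels outside.
    return np.where(mask[:, :, None] == 255, blurred, frame)

frame = cv2.imread("frame_0421.png")  # hypothetical extracted frame
modified = obscure_region(frame, [(100, 80), (300, 80), (300, 260), (100, 260)])
```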
In addition, the AR glasses 125-1 also include a communication interface 260 which is connected to the external device interface 183 of the user device 160-1 of
The user device 160-1 in the embodiment of
In the example depicted in
In the embodiment of
In one embodiment as shown in
As indicated,
Furthermore, the head mounted AR glasses 125-1 may be one of many alternatives that embed or allow the user to see a private screen through specialty lenses and may be a part of a head-mounted display (HMD), a headset, a harness, a helmet for augmented reality displays, or other wearable and non-wearable arrangements as may be appreciated by those skilled in the art. In the alternative, none of the components may be connected physically or a subset of them may be physically connected selectively as may be appreciated by those skilled in the art.
Referring back to the embodiment of
As determined at step 360 of
At step 450 of
Each of the groups of objectionable scenes 612 and 614 also has a corresponding video clip or a graphical image (as represented by elements 662 and 664) to provide an efficient review of the objectionable content by the user 615. As described previously, a representative scene may be selected, e.g., based on the objectionable scene which is the closest to the centroid of the corresponding group, as discussed in connection with
In addition, the user interface screen 600 also provides one or more exemplary user-selectable menu choices 651-660 for the list of objectionable scenes 610. Therefore, the user 615 of the AR glasses 125-1 may accept or reject each of the one or more representative scenes being displayed on the AR glasses 125-1 by moving a selection icon 680 on the user interface screen 600 as shown in
For example, a user may select "Yes" 652 for the "Replace all scenes" user selection icon 651 (illustrated with a shaded background), and in response, all six scenes in the adult content group 612 will be replaced with a preselected non-objectionable scene. Of course, other user-selectable edits are available by selecting the other user selection choices shown in
In addition,
At step 1130 of
In another exemplary embodiment, as shown at step 1170, the objectionable scene of the video content is provided to the pair of AR glasses for a user of the AR glasses a period of time before the objectionable scene is to be shown to the at least one of the viewers of the video content. Therefore, the objectionable scene may be modified by the user before the modified content is shown to the other viewers. As described previously, the user modifying the content may be a parent or guardian of at least one of the viewers, or a curator of the video content. Also as described before, the modification may be performed by replacing the objectionable scene with an unobjectionable scene, or by obscuring the objectionable scene.
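A short sketch of this lead-time behavior follows; the window length and field names are assumptions made for this illustration.

```python
REVIEW_LEAD_SECONDS = 120  # illustrative review window before playback

def scenes_due_for_review(objectionable_scenes, playhead_seconds):
    """Scenes whose start time falls within the upcoming review window,
    to be pushed to the reviewing user's AR glasses ahead of playback."""
    return [s for s in objectionable_scenes
            if 0 <= s["start"] - playhead_seconds <= REVIEW_LEAD_SECONDS]
```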
At step 1230 of
Accordingly, the present AR video system is able to efficiently provide the appropriate form of the video content to a shared display screen to be viewed and shared by the majority of the viewers of the AR video system. Therefore, the present principles provide an AR video system which is well-suited to be deployed in a people transporter such as an airplane, bus, train, or a car, or in a public space such as at a movie theater or stadium, or even in a home theater environment where multiple viewers may enjoy a shared viewing experience even though some scenes of the shared content may not be preferred or appropriate for all of the viewers.
Also, in certain video editing applications in accordance with the present principles, virtual reality (VR) glasses may also be used to provide a private content editing experience for a user. Examples of some well-known VR glasses include, e.g., Oculus Rift (see www.oculus.com), PlayStation VR (from Sony), and Gear VR (from Samsung).
The foregoing has provided by way of exemplary embodiments and non-limiting examples a description of the method and systems contemplated by the inventors. It is clear that various modifications and adaptations may become apparent to those skilled in the art in view of the description. However, such various modifications and adaptations fall within the scope of the teachings of the various embodiments described above.
While several embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the functions and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the present embodiments. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings herein is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereof, the embodiments disclosed may be practiced otherwise than as specifically described and claimed. The present embodiments are directed to each individual feature, system, article, material and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials and/or methods, if such features, systems, articles, materials and/or methods are not mutually inconsistent, is included within the scope of the present embodiment.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2016/081287 | 12/15/2016 | WO | 00
Number | Date | Country
---|---|---
62268650 | Dec 2015 | US
62268652 | Dec 2015 | US
62268656 | Dec 2015 | US