The present disclosure generally relates to making the metaverse and other virtual environments more accessible to visually-impaired users and to visually-impaired and hearing-impaired users.
The metaverse refers to a virtual universe or digital realm where people can interact with each other and their surroundings in a shared online space. The metaverse is often described as an immersive and interconnected virtual reality space. The metaverse is envisioned as a comprehensive and persistent virtual environment that can be accessed from different devices and platforms. It is not limited to a single virtual world but encompasses multiple interconnected virtual worlds and virtual reality experiences. In the metaverse, users can explore virtual environments, interact with objects, and engage in activities. The metaverse aims to provide a sense of presence and immersion, allowing users to feel as if they are physically present in the virtual space. It may incorporate technologies such as virtual reality (VR), augmented reality (AR), mixed reality (MR), haptic technology, and advanced graphics rendering to enhance the sensory experience.
In some implementations, a metaverse environment reader receives a first scene of a metaverse. The metaverse environment reader performs semantic segmentation and object detection steps to identify a plurality of objects in the first scene. Next, the metaverse environment reader indexes the plurality of objects based on the semantic segmentation and object detection steps. In an example, indexing the plurality of objects based on the segmenting of the first scene involves determining an order of importance of the plurality of objects in the first scene based at least on a location and a size of each object of the plurality of objects, and then sorting the plurality of objects of the first scene based on the determined order of importance. Then, the metaverse environment reader creates a description of the first scene based on the indexing of the first scene, where the description is an audio, haptic, or braille representation of the first scene. Next, the metaverse environment reader generates one or more electrical signals which include an encoding of the description of the first scene. Then, the metaverse environment reader conveys the one or more electrical signals encoded with the description of the first scene to a user device to be presented on a user interface.
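For illustration only, the following minimal sketch outlines the described flow of receiving a scene, indexing detected objects, creating a description, and encoding it for conveyance to a user device. The class and method names are hypothetical and do not correspond to any particular platform API; the bodies are placeholders standing in for the components described herein.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DetectedObject:
    label: str                          # e.g., "couch", "television"
    bbox: Tuple[int, int, int, int]     # (x_min, y_min, x_max, y_max) in scene coordinates
    prominence: float                   # score used for the importance ordering

class MetaverseEnvironmentReader:
    """Illustrative outline only; method bodies stand in for the components described herein."""

    def read_scene(self, scene_frame) -> bytes:
        objects = self.detect_objects(scene_frame)     # semantic segmentation + object detection
        ordered = self.index_objects(objects)          # indexing: sort by order of importance
        description = self.describe(ordered)           # natural-language description of the scene
        return self.encode(description)                # signals conveyed to a user device

    def detect_objects(self, scene_frame) -> List[DetectedObject]:
        # Placeholder: would be backed by segmentation/detection models.
        return []

    def index_objects(self, objects: List[DetectedObject]) -> List[DetectedObject]:
        # Larger, more centrally located objects are treated as more important.
        return sorted(objects, key=lambda o: o.prominence, reverse=True)

    def describe(self, ordered: List[DetectedObject]) -> str:
        return "The scene contains: " + ", ".join(o.label for o in ordered) + "."

    def encode(self, description: str) -> bytes:
        # Placeholder for the audio, haptic, or braille encoding.
        return description.encode("utf-8")
```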
Non-transitory computer program products (i.e., physically embodied computer program products) are also described that store instructions, which when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform the operations described herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including a connection over a network (e.g., the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,
The metaverse is a hypothetical iteration of the Internet as a single, universal, and immersive virtual world. Users typically connect to the metaverse via virtual reality (VR) or augmented reality (AR) headsets. The metaverse can also refer to a network of three-dimensional (3D) virtual worlds that focus on social and economic connections. Components of metaverse technology have been deployed within online video games. Deployments of the metaverse often involve generating a persistent 3D world with the user represented as an avatar. It is noted that the “metaverse” may also be referred to more generally as a “virtual environment”. Also, the descriptions of techniques for enhancing and making the metaverse experience more accessible may also be implemented to enhance and make video games, online games, and/or other types of video and/or audio content more accessible.
The current state of inaccessible metaverse platforms and experiences significantly excludes and leaves behind individuals who are blind and individuals who are both blind and deaf. The lack of accessibility in the metaverse impacts these specific populations in multiple ways. A first impact relates to the visual exclusion of individuals from the metaverse. The visual nature of the metaverse creates a significant barrier for individuals who are blind or blind and deaf. Inaccessible interfaces prevent such individuals from accessing and navigating virtual environments. Without accessibility support, these individuals are unable to fully engage with the metaverse and participate in virtual activities.
A second impact relates to the navigational difficulties for blind individuals or blind and deaf individuals. This is caused by inaccessible navigation interfaces. For example, the lack of haptic cues poses challenges for individuals who are blind or blind and deaf. These individuals rely on assistive technologies and accessible navigation features to explore virtual environments, move within the metaverse, and interact with virtual objects. Without these accessibility features, their ability to navigate and engage with the metaverse is severely limited. The inaccessibility of the metaverse denies individuals who are blind or blind and deaf the opportunity to fully participate within virtual environments. Accordingly, solutions to the above problems are presented herein.
Before addressing the solutions to the above problems, a general description of the metaverse will be presented. The metaverse consists of various scenes and objects that users can encounter and interact with. A metaverse scene typically represents a virtual environment, such as a living room, a park, a city street, a train station, a shopping mall, and so on. These scenes are often highly detailed and immersive, designed to provide a rich and engaging experience to users. Metaverse objects are the digital entities that populate these virtual environments. The metaverse objects can be anything from simple geometric shapes to complex 3D models like a television, coffee table, and a couch in a living room. These objects have various properties and behaviors, allowing users to interact with them in different ways. For example, people (avatars) can walk around the living room, the television can be turned on or off, and so on.
In addition to static objects, the metaverse can also include dynamic, interactive objects, such as vehicles (e.g., a train or a car), which have the ability to move. Dynamic or interactive objects may have programmed behaviors, enabling them to react to user actions. Overall, the metaverse scenes encompass the virtual environments users explore, while metaverse objects are the digital entities that populate these environments. Together, they form the building blocks of the metaverse, providing a metaverse platform for immersive experiences, social interactions, and creative expression.
Referring now to
The one or more client devices 110, the one or more servers 130, and the one or more servers 140 may be communicatively coupled via a network 120. The one or more client devices 110 may include processor-based devices including, for example, a mobile device, a wearable apparatus, a virtual reality (VR) or augmented reality (AR) headset, a personal computer, a workstation, an Internet-of-Things (IoT) appliance, and/or the like. The network 120 may be a wired network and/or wireless network including, for example, a public land mobile network (PLMN), a local area network (LAN), a virtual local area network (VLAN), a wide area network (WAN), the Internet, and/or the like.
The one or more servers 130 and 140 may include any number of processing devices, memory devices, and/or the like for executing software applications. Server 130 and/or server 140 may be part of a private computing platform or part of a public cloud platform, depending on the implementation. The cloud platform may include resources, such as at least one computer (e.g., a server), data storage, and a network (including network equipment) that couples the computer(s) and storage. The cloud platform may also include other resources, such as operating systems, hypervisors, and/or other resources, to virtualize physical resources (e.g., via virtual machines) and provide deployment (e.g., via containers) of applications (which provide services, for example, on the cloud platform, and other resources). In the case of a “public” cloud platform, the services may be provided on-demand to a client, or tenant, via the Internet. For example, the resources at the public cloud platform may be operated and/or owned by a cloud service provider (e.g., Amazon Web Services, Azure), such that the physical resources at the cloud service provider can be shared by a plurality of tenants. Alternatively, or additionally, the cloud platform may be a “private” cloud platform, in which case the resources of the cloud platform may be hosted on an entity's own private servers (e.g., dedicated corporate servers operated and/or owned by the entity). Alternatively, or additionally, the cloud platform may be considered a “hybrid” cloud platform, which includes a combination of on-premises resources as well as resources hosted by a public or private cloud platform. For example, a hybrid cloud service may include web servers running in a public cloud while application servers and/or databases are hosted on premise (e.g., at an area controlled or operated by the entity, such as a corporate entity).
Server 140 includes metaverse engine 150 for generating a metaverse including various immersive scenes for one or more users of client device 110 to interact with. Metaverse engine 150 may also be referred to as metaverse platform 150. Server 130 includes metaverse environment reader and navigation assistant 135A which reads and analyzes the metaverse generated by metaverse engine 150. Based on the reading and analysis, metaverse environment reader and navigation assistant 135A generates an accessible version of the metaverse for one or more users of client device 110. The accessible version of the metaverse is intended to enable visually-impaired (i.e., blind) users or visually-impaired and hearing-impaired (i.e., deaf) users to enjoy a more interactive experience of the metaverse than would be attainable from the original version of the metaverse generated by metaverse engine 150. It is noted that the terms “blind” and “visually-impaired” may be used interchangeably herein. Similarly, the terms “deaf” and “hearing-impaired” may be used interchangeably herein.
Turning now to
Once all of the virtual objects of a scene are labeled, an object manager 230 generates an ordering of the objects that constitute the scene. In this step, the object manager 230 generates a description of the scene in an accurate and coherent manner, using the labels of the various virtual objects, to provide an immersive and interconnected experience. For example, when generating a description of a living room scene in the metaverse, if the generated description mentions the couch, the coffee table, and then the television, it creates a mental image where the couch and coffee table are positioned in front of the television. However, if the generated description reverses the order and mentions the television first, followed by the couch and then the coffee table, it conveys a different arrangement where the television is positioned in front of the couch.
Similarly, the ordering of objects can influence the user's attention and focus within the scene. If the generated description of a scene starts with the most prominent or central object, followed by the secondary objects, it helps guide the user's visual and mental exploration of the virtual environment. This can be particularly relevant when describing complex scenes with multiple objects or when emphasizing a specific feature or interaction. Furthermore, the ordering of objects can also affect the narrative or storytelling within the metaverse scene. By strategically arranging the objects and describing them in a specific sequence, object manager 230 can create a sense of progression or build-up in the user's experience. For example, describing a path leading to a grand castle gate, followed by a description of the castle's towering walls and finally the magnificent courtyard, sets the stage for a narrative journey through the scene.
In an example, the location of each object in a given scene is maintained by the developer of the scene. In this example, the order of objects is determined on the fly with respect to the avatar by mathematically calculating the distance of each object from the avatar based on their locations. In another example, object manager 230 determines the ordering of the various objects in a given metaverse scene and generates a coherent and immersive description of the scene using the labels of the various metaverse objects in the scene. The ordering of the various objects in the metaverse can be determined in multiple ways. First, the ordering of each object may be manually maintained by the developer of the scene, similar to maintaining a tab index order in graphical user interface (GUI) screens. Second, one or more image processing tools may be used to identify the locations of the various objects in the scene, and this information is then used to determine the order of each object in the scene. In an example, the image processing may include a semantic segmentation step and an object detection step. The semantic segmentation step may involve segmenting a scene into various object regions, such as buildings, trees, vehicles, or the like. Segmenting the scene makes it possible to determine the size, location, and prominence of objects in the scene. Next, object manager 230 may determine the order of objects based on their size, location, or prominence in the scene. The prominence of an object may be determined based on the size and location of the object; for example, if a first object is in the center of the scene, it will be considered more prominent than a second object on the periphery of the scene. Similarly, if a third object is larger than a fourth object, then the third object will be considered more prominent than the fourth object. Algorithms such as U-Net, Fully Convolutional Networks (FCN), Mask R-CNN (region-based convolutional neural network), or others may be used for the semantic segmentation step. The object detection step may use algorithms such as Faster R-CNN, YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), or others. These algorithms use deep learning techniques to detect objects with bounding boxes and provide their approximate positions.
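As one non-limiting illustration of the prominence-based ordering described above, detected bounding boxes could be scored by their size and their distance from the scene center, and then sorted. The labels, box coordinates, and equal weighting below are assumptions for the example, not a prescribed implementation.

```python
import math

def prominence(bbox, scene_w, scene_h):
    """Score an object by its size and how close it sits to the scene center.

    bbox is (x_min, y_min, x_max, y_max) in pixels, as produced by a
    detector such as Faster R-CNN or YOLO.
    """
    x_min, y_min, x_max, y_max = bbox
    area = (x_max - x_min) * (y_max - y_min)
    size_score = area / (scene_w * scene_h)                  # 0..1, larger objects score higher

    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    dist = math.hypot(cx - scene_w / 2, cy - scene_h / 2)    # distance from the scene center
    max_dist = math.hypot(scene_w / 2, scene_h / 2)
    center_score = 1.0 - dist / max_dist                     # 0..1, central objects score higher

    return 0.5 * size_score + 0.5 * center_score             # equal weighting is an assumption

# Example: order detections for a living-room scene (labels and boxes are illustrative).
detections = [
    ("television", (100, 50, 400, 250)),
    ("couch", (50, 300, 600, 500)),
    ("coffee table", (250, 450, 450, 550)),
]
ordered = sorted(detections, key=lambda d: prominence(d[1], 640, 600), reverse=True)
print([label for label, _ in ordered])
```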
Once the order of the objects is determined, object manager 230 may use various natural language processing (NLP) deep learning techniques like Recurrent Neural Networks (RNNs), Transformer models, Generative Pre-trained Transformer (GPT) models, or other models to generate an accurate and coherent description of the scene. Similarly, to help the user navigate across the metaverse, navigation assistance module 240 is part of metaverse platform 220. The navigation assistance module 240 includes a sub-module to obtain the user's (i.e., avatar's) current direction orientation using the metaverse application programming interfaces (APIs).
In the metaverse, proximity alerts, direction guidance, and point of interest descriptions work together to enhance the user's navigation, awareness, and understanding of the virtual environment. Proximity alerts notify users when they are near certain objects within the metaverse scene. These alerts can be triggered based on predefined distances or spatial boundaries. For example, if a user approaches an important object or enters a specific area, a proximity alert may be generated to draw their attention. Direction guidance provides users with navigational assistance within the metaverse and helps users find their way to specific objects of interest. Direction guidance may be based on pre-defined paths, user-set waypoints, or a combination of both, and helps users navigate complex scenes, follow specific routes, or locate points of interest within the metaverse. Point of interest (POI) descriptions provide users with detailed information and context about significant objects within the metaverse. When a user approaches a POI, a description or additional details about the object can be provided.
To enable these proximity alerts, direction guidance, and POI features, the objects within the metaverse scene may have additional properties or metadata assigned to them. For example, objects may be tagged with proximity thresholds and descriptive information that is used by the metaverse platform 220 to generate proximity alerts, direction guidance, and point of interest descriptions. By enhancing the objects with these properties, the metaverse platform 220 may intelligently detect user proximity, calculate navigation paths, and provide relevant information, which is converted into a format that the user will understand, to enhance the user experience.
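For illustration, a minimal sketch of proximity alerts driven by per-object metadata (position, proximity threshold, and description) might look like the following. The object names, coordinates, and thresholds are hypothetical.

```python
import math

# Hypothetical per-object metadata of the kind described above: each object carries
# a position, a proximity threshold, and a descriptive string.
scene_objects = {
    "fountain": {"position": (12.0, 0.0, -3.5), "proximity_threshold": 2.0,
                 "description": "A stone fountain with running water."},
    "ticket booth": {"position": (4.0, 0.0, 8.0), "proximity_threshold": 3.0,
                     "description": "The station ticket booth, staffed by an attendant."},
}

def proximity_alerts(avatar_position, objects):
    """Return alert strings for every object the avatar is within threshold distance of."""
    alerts = []
    for name, meta in objects.items():
        dx, dy, dz = (a - b for a, b in zip(avatar_position, meta["position"]))
        distance = math.sqrt(dx * dx + dy * dy + dz * dz)
        if distance <= meta["proximity_threshold"]:
            alerts.append(f"You are {distance:.1f} meters from the {name}. {meta['description']}")
    return alerts

print(proximity_alerts((5.0, 0.0, 7.0), scene_objects))
```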
With the various enhancements in the metaverse platform 220, the metaverse environment reader 210 will receive the textual cues for metaverse scenes, metaverse navigation, and various spatial audio outputs via a platform-dependent metaverse integration component. A text analyzer module of metaverse environment reader 210 is coupled to the metaverse integration component via accessibility APIs which are platform agnostic. The text analyzer module will analyze the text for composing an utterance of words. After text analysis, a linguistic analysis process may be performed. In an example, the linguistic analysis process may have three steps. These three steps may include phrasing of the text (e.g., dividing a large phrase or sentence into smaller phrases), intonation (e.g., the fall and rise of the voice in speaking), and lastly duration for speaking. Additionally, various preferences may be applied to impart a particular accent or dialect for a given language (e.g., American English, British English) or for particular regions of a country. After the linguistic analysis process, the sound waves may be generated and provided to the user via a user interaction interface (e.g., headphones, speakers). Also, additional preferences that may be applied include volume selections for increasing or decreasing the volume of the audio output. For a blind and deaf user, the analyzed text may be converted into an output of a haptic device such as a refreshable braille display.
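As a non-limiting example, the audio output step with user volume and speaking-rate preferences could be sketched with an off-the-shelf text-to-speech engine such as pyttsx3. The specific engine, preference values, and voice selection are illustrative assumptions, not a required implementation.

```python
import pyttsx3  # off-the-shelf text-to-speech engine, used purely for illustration

def speak_description(text: str, volume: float = 0.8, rate: int = 160) -> None:
    """Render an analyzed scene description as audio, applying user volume and rate preferences."""
    engine = pyttsx3.init()
    engine.setProperty("volume", volume)   # 0.0 .. 1.0, per the user's volume preference
    engine.setProperty("rate", rate)       # words per minute; a slower rate can aid clarity
    voices = engine.getProperty("voices")
    if voices:
        # Accent or dialect preferences would choose among the available voice entries;
        # the first available voice is used here as a placeholder.
        engine.setProperty("voice", voices[0].id)
    engine.say(text)
    engine.runAndWait()

speak_description("You are in a living room. A couch faces a television across a coffee table.")
```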
It is noted that metaverse environment reader 210 may be implemented using any suitable combination of program instructions, firmware, and/or circuitry. In an example, metaverse environment reader 210 is implemented by a processing device (e.g., central processing unit (CPU), graphics processing unit (GPU)) executing program instructions. In another example, metaverse environment reader 210 is implemented by a programmable logic device (e.g., field programmable gate array (FPGA)). In a further example, metaverse environment reader 210 is implemented by dedicated circuitry (e.g., application specific integrated circuit (ASIC)). In a still further example, metaverse environment reader 210 is implemented by any combination of the above mechanisms and/or with other types of circuitry and executable instructions. Metaverse environment reader 210 may be implemented as part of a server (e.g., server 130 of
Referring now to
The scene analysis module 330 may have sub-modules for semantic segmentation and object detection. The semantic segmentation sub-module may segment a scene into various object regions, such as buildings, trees, vehicles, and so on. The semantic segmentation sub-module may determine the order of objects based on their size, location, or prominence in the scene. The semantic segmentation sub-module may use algorithms such as U-Net, FCN (Fully Convolutional Networks), and Mask R-CNN. The object detection sub-module may use algorithms such as Faster R-CNN, YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), and the like. These algorithms use deep learning techniques to detect objects with bounding boxes and provide their approximate positions.
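For illustration, obtaining bounding boxes and labels from a pre-trained detector of the kind named above could be sketched with torchvision's COCO-pretrained Mask R-CNN. The confidence threshold and the stand-in input tensor (used in place of an actual rendered frame) are assumptions for the example.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load a COCO-pretrained Mask R-CNN (one of the detector families named above).
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# A rendered metaverse scene would be converted to a 3xHxW float tensor in [0, 1];
# a random tensor stands in here so the snippet runs without an actual frame.
frame = torch.rand(3, 480, 640)

with torch.no_grad():
    prediction = model([frame])[0]

# Keep confident detections; boxes are (x_min, y_min, x_max, y_max) in pixels.
keep = prediction["scores"] > 0.7
boxes = prediction["boxes"][keep]
labels = prediction["labels"][keep]   # integer COCO category ids
print(boxes.shape, labels.tolist())
```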
After the scene analysis process is performed, various NLP deep learning techniques like Recurrent Neural Networks (RNNs), Transformer models, or GPT (Generative Pre-trained Transformer) models may be used to generate an accurate and coherent description of the scene. Similarly, the spatial audio processing module 350 may have a sub-module for extracting features from the spatial sound and another sub-module to analyze these extracted features in the context of the current metaverse scene. Once the features have been extracted from the spatial sound and analyzed, a text generation module will generate text which is presented to the user in a format which the user will be able to understand.
In an example, the extraction of various features from the spatial audio signal is a multi-step process. These steps may include the following: (1) Pre-process the spatial audio signal to remove noise and normalize the volume. Various algorithms like spectral subtraction, Wiener filtering, or adaptive filtering may be used for noise reduction. Peak normalization or dynamic range compression algorithms may be used for volume normalization. (2) Algorithms like the Short-Time Fourier Transform (STFT) and Mel-frequency cepstral coefficients (MFCCs) may be used to extract features. These extracted features may be run against a pre-trained classification model or sound event detection algorithms (built using CNNs, Recurrent Neural Networks (RNNs), long short-term memory (LSTM), and/or Gaussian Mixture Model (GMM) algorithms) to recognize the object from the sound.
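A minimal sketch of steps (1) and (2), using the librosa audio library for peak normalization and STFT/MFCC feature extraction, is shown below. The downstream classifier is omitted, and the file path is hypothetical.

```python
import numpy as np
import librosa  # audio analysis library, used here to illustrate the feature-extraction step

def extract_spatial_audio_features(path: str):
    """Pre-process one channel of a spatial audio clip and extract STFT/MFCC features."""
    y, sr = librosa.load(path, sr=None, mono=True)   # step 1a: load the audio signal
    y = librosa.util.normalize(y)                    # step 1b: simple peak normalization

    stft = np.abs(librosa.stft(y))                           # step 2a: short-time spectrum
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)       # step 2b: MFCC features

    # These features would then be fed to a pre-trained classifier (e.g., a CNN or GMM)
    # to recognize the sound source; that model is outside the scope of this sketch.
    return stft, mfcc

# Usage (the path is hypothetical):
# stft, mfcc = extract_spatial_audio_features("train_station_ambience.wav")
```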
Once the spatial audio processing module 350 has extracted the various features from the audio signal, the features may be analyzed in the context of the metaverse virtual scene to provide a more accurate and informative text description. This step may consider factors like the location, time of day, or other environmental cues that could influence the interpretation of the sound. Once the spatial audio signal is processed and analyzed in the context of the metaverse scene, the corresponding text description may be generated using deep learning techniques like RNNs or LSTMs or Transformer-based models (e.g., generative pre-trained transformer (GPT), bidirectional encoder representations from transformers (BERT)).
After the scene analysis and spatial audio processing steps, the text analyzer module will analyze the text description for composing an utterance of words. After text analysis is performed by the text analyzer module, a linguistic analysis process may be performed. The linguistic analysis process may have three steps: first, phrasing of the text (e.g., dividing a large phrase or sentence into smaller phrases); second, intonation (e.g., the rise and fall of the voice in speaking); and third, duration for speaking.
After the linguistic analysis process is complete, the sound waves may be generated. For a blind and deaf user, the analyzed text may be converted into an output of a haptic device, such as a refreshable braille display. In addition to describing the scene and the spatial audio in text to users, the navigation assistance module 340 may work in tandem with the scene analyzer module 330 to provide proximity alerts (based on configured proximity thresholds), direction guidance, and a detailed point of interest description of any identified significant object.
Turning now to
Referring now to
The metaverse environment reader may utilize any number of deep learning engines to analyze scene 500. The first deep learning engine used by the metaverse environment reader may have been pre-trained with some number of images of different environments to be able to identify scene 500 as a living room. Then, after scene 500 has been classified as a living room, the metaverse environment reader may use a second deep learning engine that has been pre-trained specifically for a living room environment. The second deep learning engine may have been pre-trained with a large number of images of living rooms to enable the second deep learning engine to identify the objects that are typically present in a living room environment. In other examples, the metaverse environment reader may use more than two different types of deep learning engines depending on the type of metaverse scene being analyzed. Alternatively, two or more deep learning engines may be fused together into a single deep learning engine to perform multiple functions.
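As an illustrative sketch of this two-stage approach, a general scene-type classifier could select a specialized detector from a registry. The classifier and detector classes below are placeholders standing in for pre-trained deep learning engines; their outputs are hard-coded for the example.

```python
class SceneTypeClassifier:
    def classify(self, scene_image) -> str:
        return "living room"  # stands in for a pre-trained environment classifier's output

class LivingRoomDetector:
    def detect(self, scene_image):
        return ["couch", "coffee table", "television"]

class StreetSceneDetector:
    def detect(self, scene_image):
        return ["car", "traffic light", "storefront"]

# Registry mapping each recognized environment type to a specialized second-stage engine.
SPECIALIZED_DETECTORS = {
    "living room": LivingRoomDetector(),
    "city street": StreetSceneDetector(),
}

def analyze_scene(scene_image):
    scene_type = SceneTypeClassifier().classify(scene_image)          # first engine
    detector = SPECIALIZED_DETECTORS.get(scene_type, StreetSceneDetector())
    return scene_type, detector.detect(scene_image)                   # second engine

print(analyze_scene(None))
```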
The metaverse environment reader may also include a navigation assistance module that tracks, based on data received from the user's headset, the location and orientation of user 505 in relation to the objects of scene 500. If the user 505 stands up from chair 507 and starts to move around the living room, then the navigation assistance module may generate proximity alerts as the distance between user 505 and the various objects falls below corresponding thresholds for these objects. Also, the description of the living room may change based on which direction user 505 is facing and based on which objects are prominent when viewed from the current orientation of user 505.
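For illustration, direction guidance derived from the user's (avatar's) position and heading might be sketched as follows. The floor-plane coordinate system and heading convention are assumptions for the example.

```python
import math

def direction_guidance(avatar_pos, avatar_heading_deg, target_pos, target_name):
    """Turn avatar position/heading plus a target location into a spoken-style cue.

    Positions are (x, z) floor-plane coordinates; heading is degrees clockwise from
    the +z axis. Both conventions are assumptions for this sketch.
    """
    dx = target_pos[0] - avatar_pos[0]
    dz = target_pos[1] - avatar_pos[1]
    distance = math.hypot(dx, dz)
    bearing = math.degrees(math.atan2(dx, dz))                    # absolute bearing to the target
    relative = (bearing - avatar_heading_deg + 180) % 360 - 180   # normalized to -180..180

    if abs(relative) < 20:
        turn = "straight ahead"
    elif relative > 0:
        turn = f"about {abs(relative):.0f} degrees to your right"
    else:
        turn = f"about {abs(relative):.0f} degrees to your left"
    return f"The {target_name} is {distance:.1f} meters away, {turn}."

print(direction_guidance((0.0, 0.0), 0.0, (3.0, 4.0), "doorway"))
```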
Turning now to
In an example, the metaverse environment reader may utilize a deep learning engine that has been trained with a large number (e.g., hundreds, thousands) of images of different famous landmarks from around the world. This trained deep learning engine may identify the environment of scene 600 as Paris, France based on a recognition of the Eiffel Tower. Then, another deep learning engine may be utilized, with this other deep learning engine trained specifically using a relatively large number of images of Paris, France. Other deep learning engines may also be utilized for specific neighborhoods within Paris. Also, the time of year (e.g., winter, spring, summer, autumn) of scene 600 may be determined using a specifically trained deep learning engine, and then another deep learning engine trained for the particular time of year may be utilized for another round of analysis of scene 600. This approach may fine-tune the analysis of scene 600 based on whether there is snow on the ground, leaves on the trees, and so on.
Additionally, fine-tuning based on the type of weather may be utilized by the metaverse environment reader, with an identification of the time of day (e.g., daytime, nighttime), type of weather (e.g., rain, overcast, sunny), and other factors being identified by a first deep learning engine and then subsequent analysis being performed by a second deep learning engine, with the second deep learning engine selected based on the results generated by the first deep learning engine. Additional deep learning engines may be utilized, with the results of the second deep learning engine being utilized to select a third deep learning engine, results from the third deep learning engine being utilized to select a fourth deep learning engine, and so on.
When analyzing and segmenting scene 600, the metaverse environment reader may determine the order of importance of the objects in scene 600. For example, the metaverse environment reader may utilize a specially trained deep learning engine to determine the most prominent object in scene 600 and then the other less prominent objects in their order of importance in scene 600. In scene 600, the deep learning engine may determine that Eiffel Tower 620 is the most prominent object in scene 600 based on its size, based on an environmental context, and based on one or more other factors. When a description of scene 600 is generated to be presented to the visually-impaired user or to the visually-impaired and hearing-impaired user, the metaverse environment reader may begin with a description of the Eiffel Tower 620 since the deep learning engine has determined that Eiffel Tower 620 is the most prominent object in scene 600. The description generated and presented to the user may be based on the order of importance determined by the trained deep learning engine.
Referring now to
Next, the metaverse environment reader determines an order of importance of the plurality of objects in the scene based at least on the environment context, a location and a size of each object, and a distance of each object to an avatar (block 715). In an example, the metaverse environment reader uses a deep learning engine to determine the environment context and to determine an order of importance of the plurality of objects in the scene. Then, the metaverse environment reader sorts the plurality of objects of the scene based on the determined order of importance (block 720). The sorted order of objects in the scene corresponds to the order of importance. As used herein, the term “indexing” may be defined as determining an order of importance of the plurality of objects in a scene and sorting the plurality of objects based on the determined order of importance. Accordingly, performing blocks 715 and 720 may be referred to as indexing the plurality of objects in the scene.
In an example, there could be different deep learning algorithms that determine the order based on different parameters. For example, a first deep learning engine may employ an algorithm that is based on the size of objects, a second deep learning engine may be based on the location of objects, a third deep learning engine may be based on the environment context, and so on. Next, the metaverse environment reader creates a description of the scene based on the sorting of the objects of the scene, where the description is an audio, haptic, and/or braille representation of the scene (block 725). The description of the scene may include a listing of objects in a given order based on the order of importance determined from the sorting of the objects of the scene. Then, the metaverse environment reader causes the description of the scene to be generated on a user device to be presented to a user (block 730). After block 730, method 700 may end. It is noted that method 700 may be repeated for each separate scene (e.g., first scene, second scene, third scene) of the metaverse.
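As one non-limiting sketch, the different ordering criteria could be combined as weighted scores, with the most important object described first. The weights, object attributes, and example labels below are illustrative assumptions.

```python
# Each scoring term stands in for one of the engines mentioned above
# (size-based, location-based, and environment-context-based).
objects = [
    {"label": "Eiffel Tower", "size": 0.60, "centrality": 0.9, "context_weight": 1.0},
    {"label": "street lamp",  "size": 0.05, "centrality": 0.4, "context_weight": 0.2},
    {"label": "park bench",   "size": 0.08, "centrality": 0.6, "context_weight": 0.3},
]

def importance(obj, w_size=0.4, w_loc=0.3, w_ctx=0.3):
    """Weighted combination of the size, location, and context criteria (weights assumed)."""
    return w_size * obj["size"] + w_loc * obj["centrality"] + w_ctx * obj["context_weight"]

ordered = sorted(objects, key=importance, reverse=True)
description = "This scene shows " + ", then ".join(o["label"] for o in ordered) + "."
print(description)   # the most important object is described first
```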
Turning now to
In some implementations, the current subject matter may be implemented in a system 900, as shown in
Turning now to
As shown in
In an example, point of interest description 1075 is generated to describe metadata for objects in a scene, with the metadata not being based on the visual information of the scene or on the visual information of the objects. In this example, point of interest description 1075 is generated for scenes with pre-labeled objects. For example, the point of interest description may include information like “the castle was built in the year 1223 and was conquered in 1305 by the Spanish conqueror.” In other examples, the point of interest description may include information like “it is very cold in this scene” or “the drink smells like coffee”. In an example, scene reader 1055 may identify a contextual setting of a scene, and scene reader 1055 may generate an abstract term for the scene based on the identified contextual setting. For example, scene reader 1055 may identify a scene as a playground, a theater, a train station, a forest, and so on. The identification of the scene with an abstract term is a way to make a visually impaired person understand the environment they are in. Listing the objects in the environment based on their importance corresponds to a detailed description of the scene. In an example, object reader 1050 identifies and describes individual objects of a scene. For example, object reader 1050 may identify and describe an individual object with a description such as “this is a wooden chair with armrests”. The description may come from a pre-defined label, or the description may come from a generated label using visual information associated with the object.
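For illustration, the distinction between a point of interest description drawn from pre-labeled metadata and an object description built from visual attributes might be sketched as follows. The field names and example entries are hypothetical.

```python
# Pre-labeled metadata not derived from visual information (point of interest descriptions).
poi_metadata = {
    "castle": {"fact": "The castle was built in the year 1223."},
}

# Visual attributes that a generated label could be built from (object descriptions).
visual_attributes = {
    "chair": {"material": "wooden", "features": ["armrests"]},
}

def describe_point_of_interest(name: str) -> str:
    meta = poi_metadata.get(name)
    return meta["fact"] if meta else f"No additional information is available for the {name}."

def describe_object(name: str) -> str:
    attrs = visual_attributes.get(name, {})
    features = " and ".join(attrs.get("features", []))
    material = attrs.get("material", "")
    return f"This is a {material} {name} with {features}."

print(describe_point_of_interest("castle"))   # description from pre-labeled metadata
print(describe_object("chair"))               # description from visual attributes
```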
It should be understood that the structure and arrangement of components of environment reader 1040 presented in
Referring now to
Then, the metaverse environment reader determines an order of the identified objects based on their size, location, and prominence in the scene (block 1125). In an example, the metaverse environment reader uses a machine learning engine to determine the order of the identified objects. The machine learning engine may use any of various machine learning algorithms such as U-Net, FCN, and Mask R-CNN. Next, the metaverse environment reader uses an NLP deep learning engine to generate a scene description using the labels of the individual objects based on the determined order (block 1130).
Then, the metaverse environment reader generates one or more electrical signals which include an encoding of the scene description (block 1135). Next, the metaverse environment reader conveys the electrical signals encoded with the scene description to a user device to be presented to a user via a user interface of the user device (block 1140). The scene description may be an audio, haptic, and/or braille representation of the scene. After block 1140, method 1100 may end. It is noted that method 1100 may be repeated for each separate scene of the metaverse.
Turning now to
Next, an environment reader (e.g., metaverse environment reader 210) receives the labeled metaverse scene, the determined order of objects, textual cues, metaverse navigation outputs, spatial audio outputs, and a textual description of sound from the metaverse platform via a platform-dependent metaverse integration component (block 1230). A text analyzer module of the metaverse environment reader analyzes the received text for composing an utterance of words, and a linguistic analysis process is performed (block 1235). Then, sound waves are generated and provided to a blind user, or text is generated and provided, via a haptic device, to a blind and deaf user for the metaverse scene (block 1240). After block 1240, method 1200 may end.
The systems and methods disclosed herein can be embodied in various forms including, for example, a data processor, such as a computer that also includes a database, digital electronic circuitry, firmware, software, or in combinations of them. Moreover, the above-noted features and other aspects and principles of the present disclosed implementations can be implemented in various environments. Such environments and related applications can be specially constructed for performing the various processes and operations according to the disclosed implementations or they can include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer, network, architecture, environment, or other apparatus, and can be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines can be used with programs written in accordance with teachings of the disclosed implementations, or it can be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.
Although ordinal numbers such as first, second, and the like can, in some situations, relate to an order, as used in this document ordinal numbers do not necessarily imply an order. For example, ordinal numbers can be merely used to distinguish one item from another (e.g., to distinguish a first event from a second event), but need not imply any chronological ordering or a fixed reference system (such that a first event in one paragraph of the description can be different from a first event in another paragraph of the description).
The foregoing description is intended to illustrate but not to limit the scope of the invention, which is defined by the scope of the appended claims. Other implementations are within the scope of the following claims.
These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include program instructions (i.e., machine instructions) for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives program instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such program instructions non-transitorily, such as for example as would a non-transient solid state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as would a processor cache or other random access memory associated with one or more physical processor cores.
To provide for interaction with a user, the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
The subject matter described herein can be implemented in a computing system that includes a back-end component, such as for example one or more data servers, or that includes a middleware component, such as for example one or more application servers, or that includes a front-end component, such as for example one or more client computers having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described herein, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, such as for example a communication network. Examples of communication networks include, but are not limited to, a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
The computing system can include clients and servers. A client and server are generally, but not exclusively, remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” Use of the term “based on,” above and in the claims, is intended to mean “based at least in part on,” such that an unrecited feature or element is also permissible.
In view of the above-described implementations of subject matter this application discloses the following list of examples, wherein one feature of an example in isolation or more than one feature of said example taken in combination and, optionally, in combination with one or more features of one or more further examples are further examples also falling within the disclosure of this application:
Example 1: A method, comprising: receiving, by a metaverse environment reader, a first scene of a metaverse; performing, by the metaverse environment reader, semantic segmentation and object detection steps to identify a plurality of objects in the first scene; indexing, by the metaverse environment reader, the plurality of objects of the first scene based on the semantic segmentation and object detection steps; creating, by the metaverse environment reader, a description of the first scene based on the indexing of the first scene, wherein the description is an audio, haptic, or braille representation of the first scene; generating, by the metaverse environment reader, one or more electrical signals which include an encoding of the description of the first scene; and conveying, by the metaverse environment reader, the one or more electrical signals encoded with the description of the first scene to a user device to be presented on a user interface.
Example 2: The method of Example 1, further comprising: determining, by the metaverse environment reader, an order of importance of the plurality of objects in the first scene based at least on a location and a size of each object of the plurality of objects; and sorting, by the metaverse environment reader, the plurality of objects of the first scene based on the determined order of importance.
Example 3: The method of any of Examples 1-2, further comprising: receiving, by a navigation assistance module, a current direction orientation and a current location of an avatar in the first scene of the metaverse; determining, by the navigation assistance module, a distance to each object of the plurality of objects in the first scene based on the current location of the avatar in the first scene; generating, by the navigation assistance module, a proximity alert in response to the distance from the current location of the avatar to each object of the plurality of objects being less than a threshold; and providing, by the navigation assistance module, direction guidance to the user device to be presented on the user interface to guide the avatar through the first scene based on the current location and the current direction orientation of the avatar in the first scene.
Example 4: The method of any of Examples 1-3, wherein the navigation assistance module is a sub-module of the metaverse environment reader.
Example 5: The method of any of Examples 1-4, wherein the first scene is unlabeled, and wherein the method further comprises generating a label for each object of the plurality of objects based on the segmenting of the first scene.
Example 6: The method of any of Examples 1-5, wherein the description of the first scene comprises: a listing of objects in a given order based on the determined order of importance; and the label for each object of the plurality of objects.
Example 7: The method of any of Examples 1-6, further comprising segmenting the first scene with a trained deep learning engine.
Example 8: The method of any of Examples 1-7, further comprising receiving an indication of a type of user accessing the first user device, wherein the indication can specify whether the type of user is visually-impaired or visually-impaired and hearing-impaired.
Example 9: The method of any of Examples 1-8, wherein segmenting the first scene comprises partitioning the first scene into a plurality of object regions.
Example 10: The method of any of Examples 1-9, wherein determining an order of objects in the first scene is performed by a trained deep learning engine and further based on a prominence of each object of the plurality of objects in the first scene.
Example 11: A system, comprising: at least one processor; and at least one memory including program instructions which when executed by the at least one processor causes operations comprising: receiving, by a metaverse environment reader, a first scene of a metaverse; performing, by the metaverse environment reader, semantic segmentation and object detection steps to identify a plurality of objects in the first scene; indexing, by the metaverse environment reader, the plurality of objects of the first scene based on the semantic segmentation and object detection steps; creating, by the metaverse environment reader, a description of the first scene based on the indexing of the first scene, wherein the description is an audio, haptic, or braille representation of the first scene; generating, by the metaverse environment reader, one or more electrical signals which include an encoding of the description of the first scene; and conveying, by the metaverse environment reader, the one or more electrical signals encoded with the description of the first scene to a user device to be presented on a user interface.
Example 12: The system of Example 11, wherein the program instructions are further executable by the at least one processor to cause operations comprising: determining, by the metaverse environment reader, an order of importance of the plurality of objects in the first scene based at least on a location and a size of each object of the plurality of objects; and sorting, by the metaverse environment reader, the plurality of objects of the first scene based on the determined order of importance.
Example 13: The system of any of Examples 11-12, wherein the program instructions are further executable by the at least one processor to cause operations comprising: receiving, by a navigation assistance module, a current direction orientation and a current location of an avatar in the first scene of the metaverse; determining, by the navigation assistance module, a distance to each object of the plurality of objects in the first scene based on the current location of the avatar in the first scene; generating, by the navigation assistance module, a proximity alert in response to the distance from the current location of the avatar to each object of the plurality of objects being less than a threshold; and providing, by the navigation assistance module, direction guidance to the user device to be presented on the user interface to guide the avatar through the first scene based on the current location and the current direction orientation of the avatar in the first scene.
Example 14: The system of any of Examples 11-13, wherein the navigation assistance module is a sub-module of the metaverse environment reader.
Example 15: The system of any of Examples 11-14, wherein the first scene is unlabeled, and wherein the program instructions are further executable by the at least one processor to cause operations comprising generating a label for each object of the plurality of objects based on the segmenting of the first scene.
Example 16: The system of any of Examples 11-15, wherein the description of the first scene comprises: a listing of objects in a given order based on the determined order of importance; and the label for each object of the plurality of objects.
Example 17: The system of any of Examples 11-16, wherein the program instructions are further executable by the at least one processor to cause operations comprising segmenting the first scene with a trained deep learning engine.
Example 18: The system of any of Examples 11-17, wherein the program instructions are further executable by the at least one processor to cause operations comprising receiving an indication of a type of user accessing the first user device, wherein the indication can specify whether the type of user is visually-impaired or visually-impaired and hearing-impaired.
Example 19: The system of any of Examples 11-18, wherein segmenting the first scene comprises partitioning the first scene into a plurality of object regions.
Example 20: A non-transitory computer readable medium storing instructions, which when executed by at least one data processor, cause operations comprising: receiving, by a metaverse environment reader, a first scene of a metaverse; performing, by the metaverse environment reader, semantic segmentation and object detection steps to identify a plurality of objects in the first scene; indexing, by the metaverse environment reader, the plurality of objects of the first scene based on the semantic segmentation and object detection steps; creating, by the metaverse environment reader, a description of the first scene based on the indexing of the first scene, wherein the description is an audio, haptic, or braille representation of the first scene; generating, by the metaverse environment reader, one or more electrical signals which include an encoding of the description of the first scene; and conveying, by the metaverse environment reader, the one or more electrical signals encoded with the description of the first scene to a user device to be presented on a user interface.
Example 21: The method of any of Examples 1-9, further comprising: receiving, by the metaverse environment reader, a second scene of the metaverse, wherein objects in the second scene are labeled; creating, by the metaverse environment reader, a second description of the second scene based on one or more labels of one or more objects in the second scene; generating, by the metaverse environment reader, one or more second electrical signals which include an encoding of the second description of the second scene; and conveying, by the metaverse environment reader, the one or more second electrical signals encoded with the second description of the second scene to the user device to be presented on the user interface.
Example 22: The method of Example 21, further comprising generating a point of interest description describing metadata for the one or more objects in the second scene, wherein the metadata is not based on visual information of the second scene or of the one or more objects in the second scene.
Example 23: The method of any of Examples 1-9 or 21-22, further comprising: identifying, by an object reader, the plurality of objects in the first scene; generating, by the object reader, descriptions of the plurality of objects in the first scene based on visual information associated with the plurality of objects; and generating, by the object reader, descriptions of the one or more objects in the second scene from the one or more labels.
Example 24: The method of any of Examples 1-9 or 21-23, further comprising: identifying, by a scene reader, a contextual setting of the first scene; and generating, by the scene reader, an abstract term for the first scene based on the identified contextual setting.
The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and sub-combinations of the disclosed features and/or combinations and sub-combinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations can be within the scope of the following claims.