The present disclosure relates generally to methods and systems for dynamic image processing and, in particular, to methods and systems for determining a target object, taking a target image of the target object, and displaying a virtual image related to the target object for a viewer.
People with vision impairment or handicap oftentimes need to carry vision aids to improve convenience in their daily lives. Vision aids typically include lenses or compound lens devices, such as magnifying glasses or binoculars. In recent years, portable video cameras and mobile devices have also been used as vision aids. However, these devices of the current art usually have many shortcomings. For example, magnifying glasses and binoculars have very limited fields of view, and portable video cameras or mobile devices may be too complicated to operate. Additionally, these vision aids may be too cumbersome to carry around for a prolonged period of time. Furthermore, these vision aids are not practical for viewing moving targets, such as the bus number on a moving bus. In another aspect, people with vision impairment or handicap are more vulnerable to environmental hazards while traveling. These environmental hazards may cause slips, trips, and falls, such as a gap, unevenness, or sudden change in height occurring on the road, or may cause collisions with objects, such as fast-moving vehicles or glass doors. None of the vision aids in the current art has the capability to alert people with vision impairment or handicap about these environmental hazards. The present invention aims to provide solutions to these drawbacks of the current art.
The present disclosure relates to systems and methods that improve a viewer's interaction with the real world by applying a virtual image display technology. In detail, such systems and methods determine a target object, take a target image of the target object, process the target image for a virtual image, and then display the virtual image at a predetermined size, color, contrast, brightness, location, and/or depth for the viewer. As a result, the viewer, possibly with impaired vision, may clearly comprehend and interact with the real world with comfort, such as by reading texts/languages, identifying persons and objects, locating persons and objects, tracking a moving object, walking up and down stairs, and moving without collision with persons and objects. The target object and the virtual image may respectively be two-dimensional or three-dimensional.
In one embodiment of the present invention, a system for dynamic image processing comprises a target detection module, an image capture module, a process module, and a display module. The target detection module is configured to determine a target object for a viewer. The image capture module is configured to take a target image of the target object. The process module receives the target image, processes the target image based on a predetermined process mode, and provides information of a virtual image related to the target image to the display module. The display module is configured to display the virtual image by respectively projecting multiple right light signals to a viewer's first eye and corresponding multiple left light signals to a viewer's second eye. In addition, a first right light signal and a corresponding first left light signal are perceived by the viewer to display a first virtual binocular pixel of the virtual image with a first depth that is related to a first angle between the first right light signal and the corresponding first left light signal projected into the viewer's eyes.
The target detection module may have multiple detection modes. In a first embodiment, the target detection module may include an eye tracking unit to track the eyes of the viewer to determine a target object. In a second embodiment, the target detection module may include a gesture recognition unit to recognize a gesture of the viewer to determine a target object. In a third embodiment, the target detection module may include a voice recognition unit to recognize a voice of the viewer to determine a target object. In a fourth embodiment, the target detection module may automatically determine a target object by executing predetermined algorithms.
The image capture module may be a camera that takes a target image of the target object for further image processing. The image capture module may include an object recognition unit to recognize the target object, such as a mobile phone, a wallet, an outlet, or a bus. The object recognition unit may also perform an OCR (optical character recognition) function to identify the letters and words on the target object. The image capture module may also be used to scan the surroundings to identify and locate the target object by employing the object recognition unit.
The process module may process the target image in various manners based on a predetermined operation mode of the system, in order to generate information of the virtual image for the display module.
The display module may comprise a right light signal generator, a right combiner, a left light signal generator, and a left combiner. The right light signal generator generates multiple right light signals, which are redirected by the right combiner to project into the viewer's first eye to form a right image. The left light signal generator generates multiple left light signals, which are redirected by the left combiner to project into the viewer's second eye to form a left image. In some embodiments, the system may further comprise a depth sensing module, a position module, a feedback module, and/or an interface module. The depth sensing module may measure the distance between an object in the surroundings, including the target object, and the viewer. The position module may determine the position and direction of the viewer indoors and outdoors. The feedback module provides feedback to the viewer if a predetermined condition is satisfied. The interface module allows the viewer to control various functions of the system.
The present invention may include several system operation modes related to image processing, including a reading mode, a finding mode, a tracking mode, a collision-free mode, and a walking guidance mode. In the reading mode, after receiving the target image from the image capture module, the process module may separate the texts/languages in the target object from other information and use an OCR function to recognize the letters and words in the texts/languages. In addition, the process module may separate marks, signs, drawings, charts, sketches, and logos from background information for the viewer. Depending on each viewer's vision characteristics, which result from the physical features of the viewer's eyes and are measured during the calibration stage, the viewer's display preferences are set up; the process module accordingly magnifies the size, adopts certain colors for these two types of information, adjusts the contrast and brightness to an appropriate level, and decides the location and depth at which the virtual image is to be displayed.
In the finding mode, the process module may separate geometric features of the target object in the target image, such as points, lines, edges, curves, corners, contours, and/or surfaces, from other information. Then, based on the viewer's display preferences, the process module processes the virtual image to be displayed to have a color, contrast, and brightness that can easily catch the viewer's attention.
In the tracking mode, after the target detection module determines the target object, such as a bus, the image capture module scans the surroundings to identify and locate the target object. The process module processes the target image to generate information for the virtual image based on specific applications. Once the target object is located, the virtual image is usually displayed so that it is superimposed on the target object and then remains on the target object while the target object is moving.
In the collision-free mode, the system continuously scans the surroundings, recognizes the objects in the surroundings, detects how fast these objects move towards the viewer, and identifies a potential collision object which may collide with the viewer within a predetermined time period. The process module may generate information for the virtual image. Then the display module displays the virtual image to warn the viewer about the potential collision.
In the walking guidance mode, the system continuously scans the surroundings, in particular the pathway in front of the viewer, recognizes the objects in the surroundings, detects the ground level of the area into which the viewer is expected to walk within a predetermined time period, and identifies an object which may cause slips, trips, or falls. The process module may process the target image to obtain the surface of the target object for generating information of the virtual image. The display module then displays the virtual image so that it is superimposed on the target object, such as stairs.
In some embodiments, the system further includes a support structure that is wearable on the head of the viewer. The target detection module, the image capture module, the process module, and the display module may be carried by the support structure. In one embodiment, the system is a head wearable device, such as a virtual reality (VR) goggle or a pair of augmented reality (AR)/mixed reality (MR) glasses.
The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is used in conjunction with a detailed description of certain specific embodiments of the technology. Certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be specifically defined as such in this Detailed Description section.
The present disclosure relates to systems and methods that improve a viewer's interaction with the real world by applying a virtual image display technology. In detail, such systems and methods determine a target object, take a target image of the target object, process the target image for a virtual image, and then display the virtual image at a predetermined size, color, contrast, location, and/or depth for the viewer. As a result, the viewer, possibly with impaired vision, may clearly comprehend and interact with the real world with comfort, such as by reading texts/languages, identifying persons and objects, locating persons and objects, walking up and down stairs, and moving without collision with persons and objects. The target object and the virtual image may respectively be two-dimensional or three-dimensional.
In general, the virtual image is related to the target image. More specifically, a first type of virtual image may include texts/languages, handwritten or printed, on the target object, which are captured in the target image and then recognized. This type of virtual image is usually displayed at a larger font size and higher contrast for the viewer to read and comprehend the contents of the texts/languages. A second type of virtual image may include geometric features of the target object, which are captured in the target image and then recognized, including points, lines, edges, curves, corners, contours, or surfaces. This type of virtual image is usually displayed in a bright and complementary color to highlight the shape and/or location of the target object. In addition to the texts/languages on the target object or the geometric features of the target object, the virtual image may include additional information obtained from other resources, such as libraries, electronic databases, transportation control centers, or webpages via internet or telecommunication connection, or from other components of the system, such as a distance from the target object to the viewer provided by a depth sensing module. Moreover, the virtual image may include various signs relating the above information and the target object, for example with respect to their locations.
As shown in
The target detection module 110 may have multiple detection modes. In a first embodiment, the target detection module 110 may include an eye tracking unit 112 to track the eyes of the viewer to determine a target object. For example, the target detection module 110 uses the eye tracking unit 112 to detect the fixation location and depth of the viewer's eyes, and then determines the object disposed at the fixation location and depth to be the target object. In a second embodiment, the target detection module 110 may include a gesture recognition unit 114 to recognize a gesture of the viewer to determine a target object. For example, the target detection module 110 uses the gesture recognition unit 114 to detect the direction in which the viewer's index finger points and the object to which it points, and then determines the object pointed to by the viewer's index finger to be the target object. In a third embodiment, the target detection module 110 may include a voice recognition unit 116 to recognize a voice of the viewer to determine a target object. For example, the target detection module 110 uses the voice recognition unit 116 to recognize the meaning of the viewer's voice, and then determines the object to which the voice refers to be the target object. In a fourth embodiment, the target detection module 110 may automatically (without any action by the viewer) determine a target object by executing predetermined algorithms. For example, the target detection module 110 uses a camera or a lidar (light detection and ranging) device to continuously scan the surroundings, detect how fast the objects move towards the viewer, identify a potential collision object which may collide with the viewer within a predetermined time period, and then determine the potential collision object to be the target object.
The image capture module 120 may be a camera that takes a target image of the target object for further image processing. The image capture module 120 may include an object recognition unit 122 to recognize the target object, such as a mobile phone, a wallet, an outlet, or a bus. The object recognition unit 122 may also perform an OCR (optical character recognition) function to identify the letters and words on the target object. The image capture module 120 may also be used to scan the surroundings to identify and locate the target object by employing the object recognition unit 122.
The process module 150 may include processors, such as a CPU, a GPU, and AI (artificial intelligence) processors, and memories, such as SRAM, DRAM, and flash memories. The process module 150 may process the target image in various manners based on a predetermined operation mode of the system 100, in order to generate information of the virtual image for the display module 160. In addition, the process module 150 may use the following methods to improve the quality of the virtual image: (1) sampling and quantization to digitize the virtual image, where the quantization level determines the number of grey (or R, G, B separated) levels in the digitized virtual image; (2) histogram analysis and/or histogram equalization to effectively spread out the most frequent intensity values, i.e., stretching out the intensity range of the virtual image; and (3) gamma correction or contrast selection to adjust the virtual image.
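As a minimal sketch of the three enhancement steps listed above, assuming an 8-bit greyscale input and using OpenCV and NumPy, the processing could look like the following; the function name, quantization level, and gamma value are illustrative assumptions, not part of the disclosed system:

```python
# Illustrative sketch only: quantization, histogram equalization, and
# gamma correction applied to an 8-bit single-channel image.
import cv2
import numpy as np

def enhance_virtual_image(gray: np.ndarray,
                          levels: int = 64,
                          gamma: float = 0.8) -> np.ndarray:
    # (1) Quantization: reduce 256 grey levels to `levels` grey levels.
    step = 256 // levels
    quantized = ((gray // step) * step).astype(np.uint8)

    # (2) Histogram equalization: spread out the most frequent intensities,
    #     stretching the intensity range of the image.
    equalized = cv2.equalizeHist(quantized)

    # (3) Gamma correction: adjust the overall brightness/contrast response.
    normalized = equalized / 255.0
    corrected = np.power(normalized, gamma) * 255.0
    return corrected.astype(np.uint8)
```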
The display module 160 is configured to display the virtual image by respectively projecting multiple right light signals to a viewer's first eye and corresponding multiple left light signals to a viewer's second eye. In addition, a first right light signal and a corresponding first left light signal are perceived by the viewer to display a first virtual binocular pixel of the virtual image with a first depth that is related to a first angle between the first right light signal and the corresponding first left light signal projected into the viewer's eyes. The display module 160 includes a right light signal generator 10, a right combiner 20, a left light signal generator 30, and a left combiner 40. The right light signal generator 10 generates multiple right light signals which are redirected by a right combiner 20 to project into the viewer's first eye to form a right image. The left light signal generator 30 generates multiple left light signals which are redirected by a left combiner 40 to project into the viewer's second eye to form a left image.
The system 100 may further comprise a depth sensing module 130. The depth sensing module 130 may measure the distance between an object in the surroundings, including the target object, and the viewer. The depth sensing module 130 may be a depth sensing camera, a lidar, or another ToF (time of flight) sensor. Other devices, such as a structured light module, an ultrasonic module, or an IR module, may also function as a depth sensing module used to detect the depths of objects in the surroundings. The depth sensing module may detect the depths of the viewer's gesture and provide such information to the gesture recognition unit to facilitate the recognition of the viewer's gesture. The depth sensing module 130, alone or together with a camera, may be able to create a depth map of the surroundings. Such a depth map may be used for tracking the movement of target objects, hands, and pen-like styluses, and further for detecting whether a viewer's hand touches a specific object or surface.
The system 100 may further comprise a position module 140 which may determine the position and direction of the viewer indoors and outdoors. The position module 140 may be implemented by the following components and technologies: GPS, gyroscopes, accelerometers, mobile phone networks, WiFi, ultra-wideband (UWB), Bluetooth, other wireless networks, and beacons for indoor and outdoor positioning. The position module 140 may include an integrated inertial measurement unit (IMU), an electronic device that measures and reports a body's specific force, angular rate, and sometimes the orientation of the body, using a combination of accelerometers, gyroscopes, and sometimes magnetometers. A viewer using the system 100 comprising a position module 140 may share his/her position information with other viewers via various wired and/or wireless communication means. This function may facilitate one viewer locating another viewer remotely. The system may also use the viewer's location from the position module 140 to retrieve information about the surroundings of the location, such as maps and nearby stores, restaurants, gas stations, banks, churches, etc.
The system 100 may further comprise a feedback module 170. The feedback module 170 provides feedback, such as sounds and vibrations, to the viewer if a predetermined condition is satisfied. The feedback module 170 may include a speaker to provide sounds, such as sirens to warn the viewer so that he/she can take actions to avoid a collision or prevent falls, and/or a vibration generator to provide various types of vibrations. These types of feedback may be set up by the viewer through an interface module 180.
The system 100 may further comprise an interface module 180 which allows the viewer to control various functions of the system 100. The interface module 180 may be operated by voices, hand gestures, or finger/foot movements, and may take the form of a pedal, a keyboard, a mouse, a knob, a switch, a stylus, a button, a stick, a touch screen, etc.
All components in the system may be used exclusively by one module or shared by two or more modules to perform the required functions. In addition, two or more modules described in this specification may be implemented as one physical module, and one module described in this specification may be implemented by two or more separate modules. An external server 190 is not part of the system 100 but can provide extra computation power for more complicated calculations. Each of the modules described above and the external server 190 may communicate with one another in a wired or wireless manner. The wireless manner may include WiFi, Bluetooth, near field communication (NFC), internet, telecommunication, radio frequency (RF), etc.
The present invention may include several system operation modes related to image processing, including a reading mode, a finding mode, a tracking mode, a collision-free mode, and a walking guidance mode. The first operation mode may be a reading mode for the viewer. In the reading mode, after receiving the target image from the image capture module 120, the process module 150 may separate the texts/languages (the first information type in the reading mode) in the target object from other information and use an OCR function to recognize the letters and words in the texts/languages. In addition to texts and languages, the process module 150 may separate marks, signs, drawings, charts, sketches, and logos (the second information type in the reading mode) from background information for the viewer. Then, depending on each viewer's vision characteristics, which result from the physical features of the viewer's eyes and are measured during the calibration stage, the viewer's display preferences are set up; the process module 150 accordingly magnifies the size, adopts certain colors for these two types of information, including texts/languages, marks, etc., adjusts the contrast to an appropriate level, and decides the location and depth at which the virtual image is to be displayed. For example, the virtual image may need to be displayed at a size equivalent to a visual acuity of 0.5 for one viewer but 0.8 for another viewer. The size corresponding to a visual acuity of 0.5 is larger than that corresponding to 0.8. Thus, when the size corresponding to a visual acuity of 0.5 is used, a smaller amount of information, such as words, may be displayed within the same area or space. Similarly, one viewer's eyes may be more sensitive to green light while another viewer's eyes may be more sensitive to red light. During the calibration, the system may set up preferences of size, color, contrast, brightness, location, and depth for each individual viewer to customize the virtual image display. Such optimal display parameters may reduce visual fatigue and improve visibility for the viewer. To facilitate the viewer's reading of these two types of information, the size, color, contrast, location, and/or depth may be further adjusted depending on the color and light intensity of the surrounding environment. For example, when the light intensity of the surrounding environment is low, the virtual image needs to be displayed with higher light intensity or higher contrast. In addition, the virtual image needs to be displayed in a color complementary to the color of the surrounding environment.
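By way of illustration only, and not as part of the disclosure, if Snellen-style optotypes are assumed (a just-legible letter subtends 5 arcminutes of visual angle at the threshold acuity), the letter size required for a given decimal acuity and virtual-image depth can be estimated as follows; the function name and values are illustrative assumptions:

```python
# Illustrative calculation: letter size needed for a given decimal visual
# acuity, assuming Snellen-style optotypes (5-arcminute threshold letters).
import math

def letter_height(acuity: float, viewing_distance_m: float) -> float:
    """Physical letter height (meters) legible at the given acuity."""
    arcmin = 5.0 / acuity                  # acuity 0.5 -> 10', acuity 0.8 -> 6.25'
    radians = math.radians(arcmin / 60.0)  # arcminutes -> degrees -> radians
    return viewing_distance_m * math.tan(radians)

# Letters displayed at a 1 m virtual depth:
print(letter_height(0.5, 1.0))   # ~0.0029 m (about 2.9 mm)
print(letter_height(0.8, 1.0))   # ~0.0018 m (about 1.8 mm)
```

Under this assumption, the 0.5-acuity setting calls for letters roughly 1.6 times taller than the 0.8-acuity setting, which illustrates why less information fits within the same display area or space.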
For reading an article or a book, the virtual image with magnified font size and appropriate color/contrast may be displayed at a location adjacent to (close to but not overlapping with) the target object and at approximately the same depth as the target object. As a result, the viewer can easily read the texts/languages in the virtual image without shifting the depth of focus back and forth. For reading a sign or mark that is far away, the virtual image may be displayed at a depth closer to the viewer, along with an estimated distance between the viewer and the target object, for example 50 meters.
The second operation mode may be a finding mode for the viewer. In one scenario, the viewer may want to find his/her car key, mobile phone, or wallet. In another scenario, the viewer may want to find switches (such as light switches) or outlets (such as electric outlets). In the finding mode, the process module 150 may separate geometric features of the target object, such as points, lines, edges, curves, corners, contours, and/or surfaces, from other information. The process module 150 may use several known algorithms, such as corner detection, curve fitting, edge detection, global structure extraction, feature histograms, line detection, connected-component labeling, image texture, and motion estimation, to extract these geometric features. Then, based on the viewer's display preferences, the process module 150 processes the virtual image to be displayed to have a color, contrast, and brightness that can easily catch the viewer's attention. In one embodiment, the virtual image may include complementary colors, such as red and green, which flash alternately and repeatedly. To facilitate the viewer finding/locating the target object, such a virtual image is usually displayed so that it is superimposed on the target object and at approximately the same depth as the target object. In addition to the geometric features of the target object, the process module 150 may further include marks or signs, such as an arrow, from the location where the viewer's eyes fixate to the location where the target object is located, to guide the viewer's eyes to the target object. Again, the color, contrast, and brightness may be further adjusted depending on the color and light intensity of the surrounding environment.
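A minimal sketch of the geometric-feature extraction step, assuming standard OpenCV routines for edge detection, corner detection, and contour extraction, is shown below; the thresholds and parameter values are illustrative assumptions rather than the disclosed algorithm:

```python
# Illustrative sketch: extract edges, corners, and contours of a target
# object from a greyscale target image.
import cv2
import numpy as np

def extract_features(gray: np.ndarray):
    edges = cv2.Canny(gray, 50, 150)                         # edge detection
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=100,
                                      qualityLevel=0.01,
                                      minDistance=10)        # corner detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # contours
    return edges, corners, contours
```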
The third operation mode may be a tracking mode for the viewer. In one scenario, the viewer wants to take a transportation vehicle, such as a bus, and needs to track the movement of the transportation vehicle until it stops for passengers. In another scenario, the viewer has to keep his/her eyesight on a moving object, such as a running dog or cat, or a flying drone or kite. The process module 150 processes the target image to generate information for the virtual image based on specific applications. For example, for tracking a bus, the virtual image may be the bus number, including Arabic numerals and alphabetic characters, with a circle around the bus number. For tracking a running dog, the virtual image may be the contour of the dog. In the tracking mode, the virtual image usually needs to be displayed so that it is superimposed on the target object and at approximately the same depth as the target object so that the viewer may easily locate the target object. In addition, to track a target object that is moving, the virtual image has to remain superimposed on the target object while it is moving. Thus, based on the target images continuously taken by the image capture module 120, the process module 150 has to calculate the next location and depth at which the virtual image is to be displayed, and even predict the moving path of the target object, if possible. Such information for displaying a moving virtual image is then provided to the display module 160.
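One simple way to estimate the next display position, offered here only as an illustrative sketch under a constant-velocity assumption (the disclosure does not specify the prediction method), is the following:

```python
# Illustrative sketch: constant-velocity prediction of the target object's
# next (x, y, depth) position from its two most recent observed positions.
from typing import Tuple

Position = Tuple[float, float, float]   # (x, y, depth)

def predict_next(prev: Position, curr: Position,
                 dt_prev: float, dt_next: float) -> Position:
    # Velocity estimated from the two most recent target images.
    vx = (curr[0] - prev[0]) / dt_prev
    vy = (curr[1] - prev[1]) / dt_prev
    vz = (curr[2] - prev[2]) / dt_prev
    # Extrapolate to the next display frame.
    return (curr[0] + vx * dt_next,
            curr[1] + vy * dt_next,
            curr[2] + vz * dt_next)
```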
The fourth operation mode may be a collision-free mode. The viewer may want to avoid colliding with a car, a scooter, a bike, a person, or a glass door, regardless of whether he or she is moving or remains still. In the collision-free mode, the process module 150 may provide calculation power to support the target detection module 110, which uses a camera or a lidar (light detection and ranging) device to continuously scan the surroundings, recognize the objects in the surroundings, detect how fast these objects move towards the viewer, and identify a potential collision object which may collide with the viewer within a predetermined time period, for example 30 seconds. Once a potential collision object is determined to be the target object, the process module 150 may process the target image to obtain the contour of the target object for generating information of the virtual image. To alert the viewer to take actions immediately to avoid a collision accident, the virtual image has to catch the viewer's attention right away. For that purpose, the virtual image may include complementary colors, such as red and green, which flash alternately and repeatedly. Similar to the tracking mode, the virtual image may be displayed so that it is superimposed on the target object and at approximately the same depth as the target object. In addition, the virtual image usually has to remain superimposed on the target object as it moves quickly towards the viewer.
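A minimal sketch of the time-to-collision test implied above, assuming object distances are measured by the depth sensing module at successive times, could look like this; the function name and the 30-second horizon are illustrative:

```python
# Illustrative sketch: flag an object as a potential collision object if it
# would reach the viewer within the predetermined period (e.g., 30 seconds).
def is_potential_collision(distance_prev: float, distance_curr: float,
                           dt: float, horizon_s: float = 30.0) -> bool:
    closing_speed = (distance_prev - distance_curr) / dt   # m/s toward viewer
    if closing_speed <= 0:
        return False                     # object is not approaching
    time_to_collision = distance_curr / closing_speed
    return time_to_collision <= horizon_s
```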
The fifth operation mode may be a walking guidance mode. The viewer may want to prevent slips, trips, and falls when he/she walks. In one scenario, when the viewer walks up or down stairs, he or she does not want to miss a step or take an infirm step that causes a fall. In another scenario, the viewer may want to be aware of an uneven ground surface (such as the step connecting a road and a sidewalk), a hole, or an obstacle (such as a brick or rock) before he or she walks close to it. In the walking guidance mode, the target detection module 110 may use a camera (the image capture module 120 or a separate camera) or a lidar (light detection and ranging) device to continuously scan the surroundings, in particular the pathway in front of the viewer, recognize the objects in the surroundings, detect the ground level of the area into which the viewer is expected to walk within a predetermined time period, for example 5 seconds, and identify an object, for example one having a height difference of more than 10 cm, which may cause slips, trips, or falls. The process module 150 may provide computation power to support the target detection module 110 to identify such an object. Once such an object is determined to be the target object, the process module 150 may process the target image to obtain the surface of the target object for generating information of the virtual image. To alert the viewer to take actions immediately to avoid slips, trips, and falls, the virtual image may further include an eye-catching sign displayed at the location at which the viewer's eyes fixate at that moment.
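A minimal sketch of the ground-level check, assuming the scanned area ahead of the viewer is represented as a grid of estimated ground heights derived from the depth sensing module, is shown below; the grid representation and the 10 cm threshold are illustrative assumptions:

```python
# Illustrative sketch: flag grid cells whose ground height differs from the
# viewer's current ground level by more than a threshold (e.g., 10 cm).
import numpy as np

def find_trip_hazards(ground_heights: np.ndarray,
                      reference_height: float,
                      threshold_m: float = 0.10) -> np.ndarray:
    """ground_heights: 2D grid of estimated ground heights (meters)
    for the area the viewer is expected to walk into."""
    height_difference = np.abs(ground_heights - reference_height)
    return height_difference > threshold_m   # boolean hazard mask
```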
As shown in
As shown in
As described above, the viewer's first eye 50 perceives the right image 162 of the virtual image 70 and the viewer's second eye 60 perceives the left image 164 of the virtual image 70. For a viewer with an appropriate image fusion function, he/she would perceive a single virtual image at the first location and the first depth because his/her brain would fuse the right image 162 and the left image 164 into one binocular virtual image. However, if a viewer has a weak eye with impaired vision, he/she may not have an appropriate image fusion function. In this situation, the viewer's first eye 50 and second eye 60 may respectively perceive the right image 162 at a first right image location and depth, and the left image 164 at a first left image location and depth (double vision). The first right image location and depth may be close to but different from the first left image location and depth. In addition, the locations and depths of both the first right image and the first left image may be close to the first targeted location and the first targeted depth. Again, the first targeted depth D1 is related to the first angle θ1 between the first right light signal 16′ and the corresponding first left light signal 36′ projected into the viewer's eyes.
The display module 160 displays the virtual image 70 moving from the second location and the second depth (collectively the “second position” or “T2”) to the first position T1. The first depth D1 is different from the second depth D2. The second depth D2 is related to a second angle θ2 between the second right light signal 18′ and the corresponding second left light signal 38′.
As shown in
As shown in
The display module 160 and the method of generating virtual images at predetermined locations and depths, as well as the method of moving the virtual images as desired, are discussed in detail below. The PCT international application PCT/US20/59317, filed on Nov. 6, 2020, titled “SYSTEM AND METHOD FOR DISPLAYING AN OBJECT WITH DEPTHS,” is incorporated herein by reference in its entirety.
As shown in
The distance between the right pupil 52 and the left pupil 62 is the interpupillary distance (IPD). Similarly, the second angle between the second redirected right light signal (the second right light signal) 18′ and the corresponding second redirected left light signal (the second left light signal) 38′ is θ2. The second depth D2 is related to the second angle θ2. In particular, the second depth D2 of the second virtual binocular pixel 74 of the virtual object 70 at T2 can be determined approximately by the second angle θ2 between the light path extensions of the second redirected right light signal and the corresponding second redirected left light signal, using the same formula. Since the second virtual binocular pixel 74 is perceived by the viewer to be further away from the viewer (i.e., with a larger depth) than the first virtual binocular pixel 72, the second angle θ2 is smaller than the first angle θ1.
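The formula referred to above is not reproduced in this excerpt; a standard convergence-triangulation relation that is consistent with the described geometry, assuming symmetric convergence of the two eyes, would be:

```latex
% Assumed convergence-triangulation relation (not reproduced in this excerpt)
D \approx \frac{\mathrm{IPD}}{2\tan\!\left(\theta/2\right)}, \qquad
D_1 \approx \frac{\mathrm{IPD}}{2\tan\!\left(\theta_1/2\right)}, \qquad
D_2 \approx \frac{\mathrm{IPD}}{2\tan\!\left(\theta_2/2\right)}
```

Under this relation, a smaller convergence angle corresponds to a larger depth, which is consistent with the statement that θ2 is smaller than θ1 for the farther second virtual binocular pixel 74.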
Furthermore, although the redirected right light signal 16′ for RLS_2 and the corresponding redirected left light signal 36′ for LLS_2 together display a first virtual binocular pixel 72 with the first depth D1, the redirected right light signal 16′ for RLS_2 may present an image of the same or a different view angle from the corresponding redirected left light signal 36′ for LLS_2. In other words, although the first angle θ1 determines the depth of the first virtual binocular pixel 72, the redirected right light signal 16′ for RLS_2 may or may not be a parallax of the corresponding redirected left light signal 36′ for LLS_2. Thus, the intensity of the red, green, and blue (RGB) colors and/or the brightness of the right light signal and the left light signal may be approximately the same or slightly different, because of shading, view angle, and so forth, to better present some 3D effects.
As described above, the multiple right light signals are generated by the right light signal generator 10, redirected by the right combiner 20, and then directly scanned onto the right retina to form a right image 162 (right retina image 86 in
With reference to
In one embodiment shown in
A virtual object perceived by a viewer in area C may include multiple virtual binocular pixels but is represented by one virtual binocular pixel in this disclosure. To precisely describe the location of a virtual binocular pixel in space, each location in space is provided with a three-dimensional (3D) coordinate, for example an XYZ coordinate. Other 3D coordinate systems can be used in other embodiments. As a result, each virtual binocular pixel has a 3D coordinate with a horizontal direction, a vertical direction, and a depth direction. The horizontal direction (or X axis direction) is along the direction of the interpupillary line. The vertical direction (or Y axis direction) is along the facial midline and perpendicular to the horizontal direction. The depth direction (or Z axis direction) is normal to the frontal plane and perpendicular to both the horizontal and vertical directions. The horizontal direction coordinate and the vertical direction coordinate are collectively referred to as the location in the present invention.
As shown in
The look-up table may be created by the following processes. At the first step, obtain an individual virtual map based on the viewer's IPD, created by the virtual image module during initiation or calibration, which specifies the boundary of area C where the viewer can perceive a virtual object with depths because of the fusion of the right retina image and the left retina image. At the second step, for each depth along the Z axis direction (each point at the Z-coordinate), calculate the convergence angle to identify the pair of right pixel and left pixel respectively on the right retina image and the left retina image, regardless of the X-coordinate and Y-coordinate locations. At the third step, move the pair of right pixel and left pixel along the X axis direction to identify the X-coordinate and Z-coordinate of each pair of right pixel and left pixel at a specific depth, regardless of the Y-coordinate location. At the fourth step, move the pair of right pixel and left pixel along the Y axis direction to determine the Y-coordinate of each pair of right pixel and left pixel. As a result, the 3D coordinates, such as XYZ, of each pair of right pixel and left pixel respectively on the right retina image and the left retina image can be determined to create the look-up table. In addition, the third step and the fourth step are interchangeable.
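A condensed sketch of these four steps is given below, assuming the triangulation relation stated earlier and a simplified pinhole-style mapping from disparity to pixel pairs; the actual mapping between virtual binocular pixels and retina-image pixels is device-specific, and all names and parameters here are illustrative assumptions:

```python
# Illustrative sketch: build a look-up table mapping each (x, y, z) virtual
# binocular pixel to a (right pixel, left pixel) pair and convergence angle.
import math

def build_lookup_table(ipd_m, focal_px, depths_m, xs, ys):
    table = {}
    for z in depths_m:
        # Step 2: per-depth quantities, independent of X and Y.
        theta = 2.0 * math.atan(ipd_m / (2.0 * z))   # convergence angle
        disparity = focal_px * ipd_m / z             # pixel disparity (assumed model)
        for x in xs:             # Step 3: sweep along the X axis
            for y in ys:         # Step 4: sweep along the Y axis
                right_pixel = (x - disparity / 2.0, y)
                left_pixel = (x + disparity / 2.0, y)
                table[(x, y, z)] = (right_pixel, left_pixel, theta)
    return table
```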
The light signal generators 10 and 30 may use laser, light emitting diode (“LED”) including mini and micro LED, organic light emitting diode (“OLED”), superluminescent diode (“SLD”), LCoS (liquid crystal on silicon), liquid crystal display (“LCD”), or any combination thereof as their light source. In one embodiment, the light signal generators 10 and 30 are laser beam scanning projectors (LBS projectors), each of which may comprise a light source including a red color light laser, a green color light laser, and a blue color light laser; a light color modifier, such as a dichroic combiner and a polarizing combiner; and a two-dimensional (2D) adjustable reflector, such as a 2D microelectromechanical system (“MEMS”) mirror. The 2D adjustable reflector can be replaced by two one-dimensional (1D) reflectors, such as two 1D MEMS mirrors. The LBS projector sequentially generates and scans light signals one by one to form a 2D image at a predetermined resolution, for example 1280×720 pixels per frame. Thus, one light signal for one pixel is generated and projected at a time towards the combiner 20, 40. For a viewer to see such a 2D image from one eye, the LBS projector has to sequentially generate the light signals for each pixel, for example 1280×720 light signals, within the time period of persistence of vision, for example 1/18 second. Thus, the time duration of each light signal is about 60.28 nanoseconds.
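The per-signal duration quoted above can be checked with simple arithmetic:

```python
# Verify the ~60.28 ns per-pixel figure for 1280x720 signals in 1/18 second.
pixels_per_frame = 1280 * 720            # 921,600 light signals per frame
frame_period_s = 1 / 18                  # persistence-of-vision window
per_signal_s = frame_period_s / pixels_per_frame
print(per_signal_s * 1e9)                # ~60.28 nanoseconds per light signal
```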
In another embodiment, the light signal generators 10 and 30 may be digital light processing projectors (“DLP projectors”) which can generate a 2D color image at one time. Texas Instruments' DLP technology is one of several technologies that can be used to manufacture the DLP projector. The whole 2D color image frame, which for example may comprise 1280×720 pixels, is simultaneously projected towards the combiners 20, 40.
The combiner 20, 40 receives and redirects multiple light signals generated by the light signal generator 10, 30. In one embodiment, the combiner 20, 40 reflects the multiple light signals so that the redirected light signals are on the same side of the combiner 20, 40 as the incident light signals. In another embodiment, the combiner 20, 40 refracts the multiple light signals so that the redirected light signals are on a different side of the combiner 20, 40 from the incident light signals. When the combiner 20, 40 functions as a reflector, the reflection ratio can vary widely, such as 20%-80%, depending in part on the power of the light signal generator. People with ordinary skill in the art know how to determine the appropriate reflection ratio based on the characteristics of the light signal generators and the combiners. In addition, in one embodiment, the combiner 20, 40 is optically transparent to the ambient (environmental) light from the opposite side of the incident light signals so that the viewer can observe the real-time image at the same time. The degree of transparency can vary widely depending on the application. For an AR/MR application, the transparency is preferably more than 50%, such as about 75% in one embodiment.
The combiner 20, 40 may be made of glass or plastic material, like a lens, coated with certain materials, such as metals, to make it partially transparent and partially reflective. One advantage of using a reflective combiner, instead of a waveguide as in the prior art, for directing light signals to the viewer's eyes is the elimination of undesirable diffraction effects, such as multiple shadows, color displacement, etc.
The foregoing description of embodiments is provided to enable any person skilled in the art to make and use the subject matter. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the novel principles and subject matter disclosed herein may be applied to other embodiments without the use of the innovative faculty. The claimed subject matter set forth in the claims is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein. It is contemplated that additional embodiments are within the spirit and true scope of the disclosed subject matter. Thus, it is intended that the present invention covers modifications and variations that come within the scope of the appended claims and their equivalents.
This application claims the benefit of provisional application 63/085,161, filed on Sep. 30, 2020, titled “DYNAMIC IMAGE PROCESSING SYSTEMS AND METHODS FOR AUGMENTED REALITY DEVICES”, which is incorporated herein by reference in its entirety. In addition, the PCT international application PCT/US20/59317, filed on Nov. 6, 2020, titled “SYSTEM AND METHOD FOR DISPLAYING AN OBJECT WITH DEPTHS”, and the PCT international application PCT/US21/46078, filed on Aug. 18, 2021, titled “SYSTEMS AND METHODS FOR SUPERIMPOSING VIRTUAL IMAGE ON REAL-TIME IMAGE”, are incorporated herein by reference in their entireties.