A computing system, such as a mobile device, typically includes various types of sensors, such as an image sensor, a motion sensor, etc., to generate sensor data about the operation conditions of the mobile device. The computing system can include a display to output certain contents. The computing system may operate an application that can determine the operation conditions based on the sensor data, and generate the contents accordingly. For example, a virtual reality (VR)/mixed reality (MR)/augmented reality (AR) application can determine the location of a user of the mobile device based on the sensor data, and generate virtual or composite images including virtual contents based on the location, to provide an immersive experience.
The application can benefit from increased resolutions and operation speeds of the sensors and the display. However, various constraints, such as area and power constraints imposed by the mobile device, can limit the resolutions and operation speeds of the sensors and the displays, which in turn can limit the performance, and the user experience, of the application that relies on the sensors and the display to provide inputs and outputs.
SUMMARY
The disclosure relates generally to a sensing and display system, and more specifically, an integrated sensing and display system.
In one example, an apparatus is provided. The apparatus includes a first semiconductor layer that includes an image sensor; a second semiconductor layer that includes a display; a third semiconductor layer that includes compute circuits configured to support an image sensing operation by the image sensor and a display operation by the display; and a semiconductor package that encloses the first, second, and third semiconductor layers, the semiconductor package further including a first opening to expose the image sensor and a second opening to expose the display. The first, second, and third semiconductor layers form a first stack structure along a first axis. The third semiconductor layer is sandwiched between the first semiconductor layer and the second semiconductor layer in the first stack structure.
In some aspects, the first semiconductor layer includes a first semiconductor substrate and a second semiconductor substrate forming a second stack structure along the first axis, the second stack structure being a part of the first stack structure. The first semiconductor substrate includes an array of pixel cells. The second semiconductor substrate includes processing circuits to process outputs of the array of pixel cells.
In some aspects, the first semiconductor substrate includes at least one of: silicon or germanium.
In some aspects, the first semiconductor layer further includes a motion sensor.
In some aspects, the first semiconductor layer includes a semiconductor substrate that includes: a micro-electromechanical system (MEMS) to implement the motion sensor; and a controller to control an operation of the MEMS and to collect sensor data from the MEMS.
In some aspects, the second semiconductor layer includes a semiconductor substrate that includes an array of light emitting diodes (LEDs) to form the display.
In some aspects, the semiconductor substrate forms a device layer. The second semiconductor layer further includes a thin-film circuit layer on the device layer configured to transmit control signals to the array of LEDs.
In some aspects, the device layer comprises a group III-V material. The thin-film circuit layer comprises indium gallium zinc oxide (IGZO) thin-film transistors (TFTs).
In some aspects, the compute circuits include a sensor compute circuit and a display compute circuit. The sensor compute circuit includes an image sensor controller configured to control the image sensor to perform the image sensing operation to generate a physical image frame. The display compute circuit includes a content generation circuit configured to generate an output image frame based on the physical image frame, and a rendering circuit configured to control the display to display the output image frame.
In some aspects, the compute circuits include a frame buffer. The image sensor controller is configured to store the physical image frame in the frame buffer. The content generation circuit is configured to replace one or more pixels of the physical image frame in the frame buffer to generate the output image frame, and to store the output image frame in the frame buffer. The rendering circuit is configured to read the output image frame from the frame buffer and to generate display control signals based on the output image frame read from the frame buffer.
In some aspects, the sensor compute circuit includes a sensor data processor configured to determine pixel locations of a region of interest (ROI) that encloses a target object in the physical image frame. The image sensor controller is configured to enable a subset of pixel cells of an array of pixel cells of the image sensor to capture a subsequent physical image frame based on the pixel locations of the ROI.
In some aspects, the content generation circuit is configured to generate the output image frame based on a detection of the target object by the sensor data processor.
In some aspects, the first semiconductor layer further includes a motion sensor. The sensor data processor is further configured to determine at least one of a state of motion or a location of the apparatus based on an output of the motion sensor. The image sensor controller is configured to enable the subset of pixel cells based on the at least one of a state of motion or a location of the apparatus.
In some aspects, the content generation circuit is configured to generate the output image frame based on the at least one of a state of motion or a location of the apparatus.
In some aspects, the first semiconductor layer is connected to the third semiconductor layer via 3D interconnects.
In some aspects, the first semiconductor layer is connected to the third semiconductor layer via 2.5D interconnects.
In some aspects, the third semiconductor layer is connected to the second semiconductor layer via metal bumps.
In some aspects, the apparatus further comprises a laser diode adjacent to the image sensor and configured to project structured light.
In some aspects, the apparatus further comprises a light emitting diode (LED) adjacent to the display to support an eye-tracking operation.
In some aspects, the third semiconductor layer further includes a power management circuit.
In some aspects, the image sensor is divided into a plurality of tiles of image sensing elements. The display is divided into a plurality of tiles of display elements. A frame buffer of the compute circuits is divided into a plurality of tile frame buffers. Each tile frame buffer is directly connected to a corresponding tile of image sensing elements and a corresponding tile of display elements. Each tile of image sensing elements is configured to store a subset of pixels of a physical image frame in the corresponding tile frame buffer. Each tile of display elements is configured to output a subset of pixels of an output image frame stored in the corresponding tile frame buffer.
In some examples, a method of generating an output image frame is provided. The method comprises: generating, using an image sensor, an input image frame, the image sensor comprising a plurality of tiles of image sensing elements, each tile of image sensing elements being connected to a corresponding tile frame buffer which is also connected to a corresponding tile of display elements of a display; storing, using each tile of image sensing elements, a subset of pixels of the input image frame at the corresponding tile frame buffer in parallel; replacing, by a content generator, at least some of the pixels of the input image frame stored at the tile frame buffers to generate the output image frame; and controlling each tile of display elements to fetch a subset of pixels of the output image frame from the corresponding tile frame buffer to display the output image frame.
These illustrative examples are mentioned not to limit or define the scope of this disclosure, but rather to provide examples to aid understanding thereof. Illustrative examples are discussed in the Detailed Description, which provides further description. Advantages offered by various examples may be further understood by examining this specification.
Illustrative embodiments are described with reference to the following figures.
The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated may be employed without departing from the principles of, or benefits touted in, this disclosure.
In the appended figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
In the following description, for the purposes of explanation, specific details are set forth to provide a thorough understanding of certain inventive embodiments. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive.
As described above, a computing system, such as a mobile device, typically includes various types of sensors, such as an image sensor, a motion sensor, etc., to generate sensor data about the operation conditions of the mobile device. The computing system can also include a display to output certain contents. The mobile device may also operate an application that receives sensor data from the sensors, generates contents based on the sensor data, and outputs the contents via the display.
One example of such an application is a VR/MR/AR application, which can generate virtual content based on the sensor data of the mobile device to provide the user with a simulated experience of being in a virtual world, or in a hybrid world having a mixture of physical objects and virtual objects. For example, a mobile device may be in the form of, for example, a head-mounted display (HMD), smart glasses, etc., to be worn by a user and cover the user's eyes. The HMD may include image sensors to capture images of a physical scene surrounding the user. The HMD may also include a display to output the images of the scene. Depending on the user's orientation/pose, the HMD may capture images from different angles of the scene and display the images to the user, thereby simulating the user's vision. To provide a VR/MR/AR experience, the application can determine various information, such as the orientation/pose of the user, the location of the scene, physical objects present in the scene, etc., and generate contents based on the information. For example, the application can generate a virtual image representing a virtual scene to replace the physical scene the mobile device is in, and display the virtual image. As another example, the application may generate a composite image including a part of the image of the physical scene as well as virtual contents, and display the composite image to the user. The virtual contents may include, for example, a virtual object to replace a physical object in the physical scene, texts or other image data to annotate a physical object in the physical scene, etc. As the virtual/composite images displayed to the user change as the user moves or changes orientation/pose, the application can provide the user with a simulated experience of being immersed in a virtual/hybrid world.
The VR/MR/AR application, as well as the immersive experience provided by the application, can benefit from increased resolutions and operation speeds of the image sensor and the display. By increasing the resolutions of the image sensor and the display, more detailed images of the scene can be captured and (in the case of AR/MR) displayed to the user to provide improved simulation of vision. Moreover, in the case of VR, a more detailed virtual scene can be constructed based on the captured images and displayed to the user. Also, by increasing the operation speeds of the image sensor and the display, the images captured and displayed can change more responsively to changes in the location/orientation/pose of the user. All these can improve the user's simulated experience of being immersed in a virtual/hybrid world.
Although a mobile device application can benefit from the increased resolutions and operation speeds of the image sensor and the display, various constraints, such as area and power constraints imposed by the mobile device, can limit the resolutions and operation speeds of the image sensor and the display. Specifically, an image sensor typically includes an array of image sensing elements (e.g., photodiodes), whereas a display typically includes an array of display elements (e.g., light emitting diodes (LEDs)). The mobile device further includes compute circuits, such as image processing circuits, rendering circuits, memory, etc., that support the operations of the display elements and image sensing elements. Due to the small form factor of the mobile device/HMD, limited space is available to fit the image sensor, the display, and their compute circuits, which in turn can limit the numbers of image sensing elements and display elements, as well as the quantities of computation and memory resources included in the compute circuits, all of which can limit the achievable image sensing and display resolutions. The limited available power of a mobile device also constrains the numbers of image sensing elements and display elements.
In addition, operating the image sensor and the display at a high frame rate requires moving a large quantity of image data and content data within the mobile device at a high data rate. But moving those data at a high data rate can consume substantial compute resources and power, especially when the data are moved over discrete electrical buses (e.g., a mobile industry processor interface (MIPI) bus) within the mobile device. Due to the limited available power and computation resources at the mobile device, the data rate for movement of image data and content data within the mobile device is also limited, which in turn can limit the achievable speeds of operation, as well as the achievable resolutions, of the image sensor and the display.
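For illustration, the following Python sketch estimates the raw data rate implied by moving uncompressed frames between an image sensor, compute circuits, and a display. The resolution, bit depth, and frame rate are hypothetical values chosen only to show the scale involved, not parameters disclosed for any particular system.

```python
# Back-of-the-envelope data-rate estimate for moving raw image frames.
# The numbers below are illustrative assumptions, not disclosed parameters.

def link_data_rate_gbps(width_px: int, height_px: int, bits_per_pixel: int,
                        frames_per_second: int) -> float:
    """Raw (uncompressed) data rate in gigabits per second."""
    bits_per_frame = width_px * height_px * bits_per_pixel
    return bits_per_frame * frames_per_second / 1e9

if __name__ == "__main__":
    # Example: a 2048 x 2048 sensor, 10-bit pixels, 90 frames per second.
    rate = link_data_rate_gbps(2048, 2048, 10, 90)
    print(f"Raw sensor-to-compute data rate: {rate:.1f} Gb/s")  # ~3.8 Gb/s
```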
This disclosure relates to an integrated system that can address at least some of the issues above. Specifically, a system may include a sensor, compute circuits, and a display. The compute circuits can include sensor compute circuits to interface with the sensor and display compute circuits to interface with the display. The compute circuits can receive sensor data from the sensor, generate content data based on the sensor data, and provide the content data to the display. The sensor can be formed on a first semiconductor layer and the display can be formed on a second semiconductor layer, whereas the compute circuits can be formed on a third semiconductor layer. The first, second, and third semiconductor layers can form a stack structure with the third semiconductor layer sandwiched between the first semiconductor layer and the second semiconductor layer. Moreover, each of the first, second, and third semiconductor layers can also include one or more semiconductor substrates stacked together. The stack structure can be enclosed at least partially within a semiconductor package having at least a first opening to expose the display. The integrated system can be part of a mobile device (e.g., a head-mounted display (HMD)), and the semiconductor package can have input/output (I/O) pins to connect with other components of the mobile device, such as a host processor that executes a VR/AR/MR application.
In some examples, the first, second, and third semiconductor layers can be fabricated with heterogeneous technologies (e.g., different materials, different process nodes) to form a heterogeneous system. The first semiconductor layer can include various types of sensor devices, such as an array of image sensing elements, each including one or more photodiodes, as well as circuits (e.g., analog-to-digital converters) to digitize the sensor outputs. Depending on the sensing wavelength, the first semiconductor substrate can include various materials such as silicon, germanium, etc. In addition, the first semiconductor substrate may also include a motion sensor, such as an inertial measurement unit (IMU), which can include a micro-electromechanical system (MEMS). Both the array of image sensing elements and the MEMS of the motion sensor can be formed on a first surface of the first semiconductor substrate facing away from the second and third semiconductor layers, and the semiconductor package can have a second opening to expose the array of image sensing elements.
Moreover, the second semiconductor layer can include an array of display elements, each including a light emitting diode (LED), to form the display, which can be in the form of tiled displays or a single display for both left and right eyes. The second semiconductor layer may include a sapphire substrate or a gallium nitride (GaN) substrate. The array of display elements can be formed in one or more semiconductor layers on a second surface of the second semiconductor substrate facing away from the first and third semiconductor layers. The semiconductor layers may include various materials depending on the color of light to be emitted by the LEDs, such as group III-V materials (e.g., gallium nitride (GaN), indium gallium nitride (InGaN), aluminum gallium indium phosphide (AlInGaP)), lead selenide (PbSe), lead sulfide (PbS), graphene, etc. In some examples, the second semiconductor layer may further include indium gallium zinc oxide (IGZO) thin-film transistors (TFTs) to transmit control signals to the array of display elements. In some examples, the second semiconductor layer may also include a second array of image sensing elements on the second surface of the second semiconductor layer to collect images of the user's eyes while the user is watching the display.
Further, the third semiconductor layer can include digital logic and memory cells to implement the compute circuits. The third semiconductor layer may include silicon transistor devices, such as fin field-effect transistors (FinFETs), gate-all-around FETs (GAAFETs), etc., to implement the digital logic, as well as memory devices, such as MRAM, ReRAM, or SRAM devices, to implement the memory cells. The third semiconductor layer may also include other devices, such as analog transistors, capacitors, etc., to implement analog circuits, such as analog-to-digital converters (ADCs) to quantize the sensor signals, display drivers to transmit current to the LEDs of the display elements, etc.
In addition to the sensor, display, and compute circuits, the integrated system may include other components to support the VR/AR/MR application on the host processor. For example, the integrated system may include one or more illuminators for active sensing. For example, the integrated system may include a laser diode (e.g., a vertical-cavity surface-emitting laser (VCSEL)) to project light for depth sensing. The laser diode can be formed on the first surface of the first semiconductor substrate to project light (e.g., structured light) into the scene, and the image sensor on the first surface of the first semiconductor layer can detect light reflected from the scene. As another example, the integrated system may include a light emitting diode (LED) to project light towards the user's eyes when the user watches the display. The LED can be formed on the second surface of the second semiconductor layer facing the user's eyes. Images of the eyes can then be captured by the image sensor on the second surface to support, for example, eye tracking. In addition, the integrated system can include various optical components, such as lenses and filters, positioned over the image sensor on the first semiconductor layer and the display on the second semiconductor layer to control the optical properties of the light entering the lenses and exiting the display. In some examples, the lenses can be wafer-level optics.
The integrated system further includes first interconnects to connect between the first semiconductor layer and the third semiconductor layer to enable communication between the image sensor in the first semiconductor layer and the sensor compute circuits in the third semiconductor layer. The integrated system also includes second interconnects to connect between the third semiconductor layer and the second semiconductor layer to enable communication between the display/image sensor in the second semiconductor layer and the sensor/display compute circuits in the third semiconductor layer. Various techniques can be used to implement the first and second interconnects to connect between the third semiconductor layer and each of the first and second semiconductor layers. In some examples, at least one of the first and second interconnects can include 3D interconnects, such as through silicon vias (TSVs), micro-TSVs, copper-copper bumps, etc. In some examples, at least one of the first and second interconnects can include 2.5D interconnects, such as an interposer. In such examples, the system can include multiple semiconductor substrates, each configured as a chiplet. For example, the array of image sensing elements of the image sensor can be formed in one chiplet or divided into multiple chiplets. Moreover, the motion sensor can also be formed in another chiplet. Each chiplet can be connected to an interposer via, for example, micro-bumps. The interposer is then connected to the third semiconductor layer via, for example, micro-bumps.
As described above, the compute circuits in the third semiconductor layer can include sensor compute circuits to interface with the sensor and display compute circuits to interface with the display. The sensor compute circuits can include, for example, an image sensor controller, an image sensor frame buffer, a motion data buffer, and a sensor data processor. Specifically, the image sensor controller can control the image sensing operations performed by the image sensor by, for example, providing global signals (e.g., clock signals, various control signals) to the image sensor. The image sensor controller can also enable a subset of the array of image sensing elements to generate a sparse image frame. The image sensor frame buffer can store one or more image frames generated by the array of image sensing elements. The motion data buffer can store motion measurement data (e.g., pitch, roll, yaw) measured by the IMU. The sensor data processor can process the image frames and the motion measurement data. For example, the sensor data processor can include an image processor to process the image frames to determine the location and the size of a region of interest (ROI) enclosing a target object, and transmit image sensor control signals back to the image sensor to enable the subset of image sensing elements corresponding to the ROI. The target object can be defined by the application on the host processor, which can send the target object information to the system. In addition, the sensor data processor can include circuits, such as a Kalman filter, to determine a location, an orientation, and/or a pose of the user based on the IMU data. The sensor compute circuits can transmit the processing results, such as the location and size of the ROI and the location, orientation, and/or pose information of the user, to the display compute circuits.
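As an illustration of the ROI-driven sparse-capture flow described above, the following Python sketch shows one way an image sensor controller could enable only the pixel cells covering a reported ROI for the next frame. The class names, fields, and array-based enable map are assumptions made for the sketch, not the disclosed circuit implementation.

```python
# Minimal sketch of ROI-driven sparse capture. All names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class ROI:
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive

class ImageSensorController:
    def __init__(self, rows: int, cols: int):
        # Full-frame capture by default: every pixel cell enabled.
        self.enable_map = np.ones((rows, cols), dtype=bool)

    def program_roi(self, roi: ROI) -> None:
        """Enable only the pixel cells covering the ROI for the next capture."""
        self.enable_map[:] = False
        self.enable_map[roi.y0:roi.y1, roi.x0:roi.x1] = True

    def capture(self, scene: np.ndarray) -> np.ndarray:
        """Return a sparse frame: disabled pixel cells produce zeros."""
        return np.where(self.enable_map, scene, 0)

# Usage: after the sensor data processor locates a target object, only a small
# fraction of the pixel cells are read out for the following frame.
ctrl = ImageSensorController(rows=480, cols=640)
ctrl.program_roi(ROI(x0=300, y0=200, x1=364, y1=248))
sparse_frame = ctrl.capture(np.random.randint(0, 255, (480, 640)))
```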
The display compute circuits can generate (or update) content based on the processing results from the sensor compute circuits, and generate display control signals to the display to output the content. The display compute circuits can include, for example, a content generation circuit, a display frame buffer, a rendering circuit, etc. Specifically, the content generation circuit can receive a reference image frame, which can be a virtual image frame from the host processor, a physical image frame from the image sensor, etc. The content generation circuit can generate an output image frame based on the reference image frame, as well as the sensor processing results. For example, in a case where the virtual image frame is received from the host processor, the content generation circuit can perform a transformation operation on the virtual image frame to reflect a change in the user's viewpoint based on the location, orientation, and/or pose information of the user. As another example, in a case where a physical image frame is received from the image sensor, the content generation circuit can generate the output image frame as a composite image by adding virtual content such as, for example, replacing a physical object with a virtual object, adding virtual annotations, etc. The content generation circuit can also perform additional post-processing of the output image frame to, for example, compensate for optical and motion warping effects. The content generation circuit can then store the output image frame at the display frame buffer. The rendering circuit can include control logic and LED driver circuits. The control logic can read pixels of the output image frame from the display frame buffer according to a scanning pattern, and transmit display control signals to the LED driver circuits to render the output image frame.
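The read-modify-write flow of the frame buffer can be sketched as follows. The function names, the mask-based pixel replacement, and the raster scan-out loop are illustrative assumptions rather than the disclosed rendering circuit.

```python
# Illustrative sketch: the physical frame sits in a frame buffer, content
# generation overwrites the pixels of a replaced physical object with virtual
# content, and rendering scans the buffer out row by row. Names are assumptions.
import numpy as np

def compose_output_frame(frame_buffer: np.ndarray,
                         object_mask: np.ndarray,
                         virtual_content: np.ndarray) -> None:
    """Replace masked pixels of the physical frame with virtual content, in place."""
    frame_buffer[object_mask] = virtual_content[object_mask]

def scan_out(frame_buffer: np.ndarray):
    """Yield rows in raster order, standing in for display control signals."""
    for row in frame_buffer:
        yield row  # a real rendering circuit would drive LED driver currents here

# Usage with a hypothetical 8-bit RGB buffer:
fb = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)   # physical frame
mask = np.zeros((480, 640), dtype=bool)
mask[100:200, 150:300] = True                                    # replaced object
virtual = np.zeros_like(fb)
virtual[..., 1] = 255                                            # green virtual object
compose_output_frame(fb, mask, virtual)
for _row in scan_out(fb):
    pass
```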
In some examples, the sensor, the compute circuits, and the display can be arranged to form a distributed sensing and display system, in which the display is divided into tiles of display elements and the image sensor is divided into tiles of image sensing elements. Each tile of display elements in the second semiconductor substrate is directly connected, via the second on-chip interconnects, to a corresponding tile memory in the third semiconductor substrate. Each tile memory is, in turn, connected to a corresponding tile of image sensing elements in the first semiconductor substrate. To support an AR/MR application, each tile of image sensing elements can generate a subset of pixel data of a scene and store the subset of pixel data in the corresponding tile memory. The content generation circuit can edit a subset of the stored pixel data to add in the virtual contents. The rendering circuit can then transmit display controls to each tile of display elements based on the pixel data stored in the corresponding tile memories.
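A minimal sketch of this tiled arrangement, assuming a 2 x 2 tiling and NumPy arrays standing in for the tile memories, is shown below. In hardware the per-tile writes and reads proceed in parallel over dedicated interconnects; the sketch runs them sequentially for clarity.

```python
# Sketch of the distributed sensing and display arrangement. Purely illustrative.
import numpy as np

TILE_ROWS, TILE_COLS = 2, 2
H, W = 480, 640
tile_buffers = [[None for _ in range(TILE_COLS)] for _ in range(TILE_ROWS)]

def sensor_tile_write(frame: np.ndarray, ty: int, tx: int) -> None:
    """Each tile of image sensing elements stores its sub-frame in its own tile buffer."""
    th, tw = H // TILE_ROWS, W // TILE_COLS
    tile_buffers[ty][tx] = frame[ty*th:(ty+1)*th, tx*tw:(tx+1)*tw].copy()

def display_tile_read(ty: int, tx: int) -> np.ndarray:
    """Each tile of display elements fetches only from its corresponding tile buffer."""
    return tile_buffers[ty][tx]

frame = np.random.randint(0, 255, (H, W), dtype=np.uint8)
for ty in range(TILE_ROWS):
    for tx in range(TILE_COLS):
        sensor_tile_write(frame, ty, tx)
reassembled = np.block([[display_tile_read(ty, tx) for tx in range(TILE_COLS)]
                        for ty in range(TILE_ROWS)])
assert np.array_equal(reassembled, frame)  # no centralized frame buffer needed
```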
With the disclosed techniques, an integrated system in which sensor, compute, and display are integrated within a semiconductor package can be provided. Such an integrated system can improve the performance of the sensor and the display while reducing footprint and power consumption. Specifically, by putting the sensor, compute, and display within a semiconductor package, the distances travelled by the data between the sensor and the compute circuits, and between the compute circuits and the display, can be greatly reduced, which can improve the speed of data transfer. The speed of data transfer can be further improved by the 2.5D and 3D interconnects, which can provide high-bandwidth and short-distance routes for the transfer of data. All these allow the image sensor and the display to operate at a higher frame rate to improve their operation speeds. Moreover, as the sensor and the display are integrated within a rigid stack structure, relative movement between the sensor and the display (e.g., due to thermal expansion) can be reduced, which can reduce the need to calibrate the sensor and the display to account for the movement.
In addition, the integrated system can reduce footprint and power consumption. Specifically, by stacking the compute circuits and the sensors on the back of the display, the overall footprint occupied by the sensors, the compute circuits, and the display can be reduced, especially compared with a case where the display, the sensor, and the compute circuits are scattered at different locations. The stacking arrangement is also likely to achieve the minimum overall footprint, given that the display typically has the largest footprint (compared with the sensor and compute circuits). Moreover, the image sensors can be oriented to face the opposite direction from the display to provide simulated vision, which allows the image sensors to be placed on the back of the display, while placing the motion sensor on the back of the display typically does not affect the overall performance of the system.
Moreover, in addition to improving the data transfer rate, the 2.5D/3D interconnects between the semiconductor substrates also allow the data to be transferred more efficiently compared with, for example, discrete buses such as those defined under the MIPI specification. For example, a C-PHY Mobile Industry Processor Interface (MIPI) bus requires a few picojoules (pJ) per bit, while wireless transmission through a 60 GHz link requires a few hundred pJ per bit. In contrast, due to the high bandwidth and the short routing distance provided by the on-chip interconnects, the power consumed in the transfer of data over 2.5D/3D interconnects is typically just a fraction of a pJ per bit. Furthermore, due to the higher transfer bandwidth and the reduced transfer distance, the data transfer time can also be reduced, which allows support circuit components (e.g., clocking circuits, signal transmitter and receiver circuits) to be powered off for a longer duration to further reduce the overall power consumption of the system.
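A rough comparison based on the per-bit energy figures quoted above is sketched below. The assumed traffic of 3.8 Gb/s is a hypothetical example, and the per-bit values are order-of-magnitude placeholders taken from the ranges mentioned.

```python
# Rough link-power comparison. All numeric values are illustrative assumptions.
BITS_PER_SECOND = 3.8e9  # hypothetical raw frame traffic

links_pj_per_bit = {
    "MIPI C-PHY (discrete bus)": 3.0,       # "a few pJ per bit"
    "60 GHz wireless link": 300.0,          # "a few hundred pJ per bit"
    "2.5D/3D in-package interconnect": 0.3, # "a fraction of a pJ per bit"
}

for link, pj in links_pj_per_bit.items():
    milliwatts = BITS_PER_SECOND * pj * 1e-12 * 1e3
    print(f"{link:35s} ~{milliwatts:8.1f} mW")
# MIPI ~11.4 mW, 60 GHz ~1140 mW, in-package interconnect ~1.1 mW
```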
The integrated system also allows implementation of a distributed sensing and display system, which can further improve the system performance. Specifically, compared with a case where the image sensors store an image at a centralized frame buffer from which the display fetches the image, which typically requires sequential accesses of the frame buffer to write and read a frame, a distributed sensing and display system allows each tile of image sensing elements to store a subset of pixel data of a scene into each corresponding tile memory in parallel. Moreover, each tile of display elements can also fetch the subset of pixel data from the corresponding tile memory in parallel. The parallel access of the tile memories can speed up the transfer of image data from the image sensor to the displays, which can further increase the operation speeds of the image sensor and the displays.
The disclosed techniques may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, for example, a VR, an AR, an MR, a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured (e.g., real-world) content. The artificial reality content may include video, audio, haptic feedback, or some combination thereof, any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a 3D effect to the viewer). Additionally, in some embodiments, artificial reality may also be associated with applications, products, accessories, services, or some combination thereof, that are used to, for example, create content in an artificial reality and/or are otherwise used in (e.g., performing activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.
Near-eye display 100 includes a frame 105 and a display 110. Frame 105 is coupled to one or more optical elements. Display 110 is configured for the user to see content presented by near-eye display 100. In some embodiments, display 110 comprises a waveguide display assembly for directing light from one or more images to an eye of the user.
Near-eye display 100 further includes image sensors 120a, 120b, 120c, and 120d. Each of image sensors 120a, 120b, 120c, and 120d may include a pixel array configured to generate image data representing different fields of views along different directions. For example, sensors 120a and 120b may be configured to provide image data representing two fields of view towards a direction A along the Z axis, whereas sensor 120c may be configured to provide image data representing a field of view towards a direction B along the X axis, and sensor 120d may be configured to provide image data representing a field of view towards a direction C along the X axis.
In some embodiments, sensors 120a-120d can be configured as input devices to control or influence the display content of the near-eye display 100, to provide an interactive VR/AR/MR experience to a user who wears near-eye display 100. For example, sensors 120a-120d can generate physical image data of a physical environment in which the user is located. The physical image data can be provided to a location tracking system to track a location and/or a path of movement of the user in the physical environment. A system can then update the image data provided to display 110 based on, for example, the location and orientation of the user, to provide the interactive experience. In some embodiments, the location tracking system may operate a SLAM algorithm to track a set of objects in the physical environment and within a field of view of the user as the user moves within the physical environment. The location tracking system can construct and update a map of the physical environment based on the set of objects, and track the location of the user within the map. By providing image data corresponding to multiple fields of view, sensors 120a-120d can provide the location tracking system a more holistic view of the physical environment, which can lead to more objects being included in the construction and updating of the map. With such an arrangement, the accuracy and robustness of tracking a location of the user within the physical environment can be improved.
In some embodiments, near-eye display 100 may further include one or more active illuminators 130 to project light into the physical environment. The light projected can be associated with different frequency spectrums (e.g., visible light, infra-red light, ultra-violet light, etc.), and can serve various purposes. For example, illuminator 130 may project light in a dark environment (or in an environment with a low intensity of infrared (IR) light, ultraviolet (UV) light, etc.) to assist sensors 120a-120d in capturing images of different objects within the dark environment to, for example, enable location tracking of the user. Illuminator 130 may project certain markers onto the objects within the environment, to assist the location tracking system in identifying the objects for map construction/updating.
In some embodiments, illuminator 130 may also enable stereoscopic imaging. For example, one or more of sensors 120a or 120b can include both a first pixel array for visible light sensing and a second pixel array for infra-red (IR) light sensing. The first pixel array can be overlaid with a color filter (e.g., a Bayer filter), with each pixel of the first pixel array being configured to measure the intensity of light associated with a particular color (e.g., one of red, green, or blue colors). The second pixel array (for IR light sensing) can also be overlaid with a filter that allows only IR light through, with each pixel of the second pixel array being configured to measure the intensity of IR light. The pixel arrays can generate an RGB image and an IR image of an object, with each pixel of the IR image being mapped to each pixel of the RGB image. Illuminator 130 may project a set of IR markers on the object, the images of which can be captured by the IR pixel array. Based on a distribution of the IR markers of the object as shown in the image, the system can estimate a distance of different parts of the object from the IR pixel array, and generate a stereoscopic image of the object based on the distances. Based on the stereoscopic image of the object, the system can determine, for example, a relative position of the object with respect to the user, and can update the image data provided to display 110 based on the relative position information to provide the interactive experience.
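One way the marker distribution can be converted into depth is structured-light triangulation, sketched below under the assumption of a known baseline between illuminator 130 and the IR pixel array and a simple pinhole camera model. The calibration values are hypothetical.

```python
# Minimal structured-light triangulation sketch. Calibration values are made up.

def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Depth (meters) of a marker from its shift relative to the reference pattern."""
    if disparity_px <= 0:
        raise ValueError("marker not displaced; depth unresolved")
    return focal_length_px * baseline_m / disparity_px

# Example: 1000-pixel focal length, 5 cm illuminator-to-sensor baseline,
# a marker displaced by 25 pixels from its reference position -> 2 m away.
print(depth_from_disparity(1000.0, 0.05, 25.0))  # 2.0
```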
As discussed above, near-eye display 100 may be operated in environments associated with a wide range of light intensities. For example, near-eye display 100 may be operated in an indoor environment or in an outdoor environment, and/or at different times of the day. Near-eye display 100 may also operate with or without active illuminator 130 being turned on. As a result, image sensors 120a-120d may need to have a wide dynamic range to be able to operate properly (e.g., to generate an output that correlates with the intensity of incident light) across a wide range of light intensities associated with different operating environments for near-eye display 100.
As discussed above, to avoid damaging the eyeballs of the user, illuminators 140a, 140b, 140c, 140d, 140e, and 140f are typically configured to output lights of very low intensities. In a case where image sensors 150a and 150b comprise the same sensor devices as image sensors 120a-120d of
Moreover, the image sensors 120a-120d may need to be able to generate an output at a high speed to track the movements of the eyeballs. For example, a user's eyeball can perform a rapid movement (e.g., a saccade movement) in which there can be a quick jump from one eyeball position to another. To track the rapid movement of the user's eyeball, image sensors 120a-120d need to generate images of the eyeball at high speed. For example, the rate at which the image sensors generate an image frame (the frame rate) needs to at least match the speed of movement of the eyeball. The high frame rate requires a short total exposure time for all of the pixel cells involved in generating the image frame, as well as a high speed for converting the sensor outputs into digital values for image generation. Moreover, as discussed above, the image sensors also need to be able to operate in an environment with low light intensity.
Waveguide display assembly 210 is configured to direct image light to an eyebox located at exit pupil 230 and to eyeball 220. Waveguide display assembly 210 may be composed of one or more materials (e.g., plastic, glass, etc.) with one or more refractive indices. In some embodiments, near-eye display 100 includes one or more optical elements between waveguide display assembly 210 and eyeball 220.
In some embodiments, waveguide display assembly 210 includes a stack of one or more waveguide displays including, but not restricted to, a stacked waveguide display, a varifocal waveguide display, etc. The stacked waveguide display is a polychromatic display (e.g., a red-green-blue (RGB) display) created by stacking waveguide displays whose respective monochromatic sources are of different colors. The stacked waveguide display is also a polychromatic display that can be projected on multiple planes (e.g., multi-planar colored display). In some configurations, the stacked waveguide display is a monochromatic display that can be projected on multiple planes (e.g., multi-planar monochromatic display). The varifocal waveguide display is a display that can adjust a focal position of image light emitted from the waveguide display. In alternate embodiments, waveguide display assembly 210 may include the stacked waveguide display and the varifocal waveguide display.
Waveguide display 300 includes a source assembly 310, an output waveguide 320, and a controller 330. For purposes of illustration,
Source assembly 310 generates image light 355. Source assembly 310 generates and outputs image light 355 to a coupling element 350 located on a first side 370-1 of output waveguide 320. Output waveguide 320 is an optical waveguide that outputs expanded image light 340 to an eyeball 220 of a user. Output waveguide 320 receives image light 355 at one or more coupling elements 350 located on the first side 370-1 and guides received input image light 355 to a directing element 360. In some embodiments, coupling element 350 couples the image light 355 from source assembly 310 into output waveguide 320. Coupling element 350 may be, for example, a diffraction grating, a holographic grating, one or more cascaded reflectors, one or more prismatic surface elements, and/or an array of holographic reflectors.
Directing element 360 redirects the received input image light 355 to decoupling element 365 such that the received input image light 355 is decoupled out of output waveguide 320 via decoupling element 365. Directing element 360 is part of, or affixed to, first side 370-1 of output waveguide 320. Decoupling element 365 is part of, or affixed to, second side 370-2 of output waveguide 320, such that directing element 360 is opposed to the decoupling element 365. Directing element 360 and/or decoupling element 365 may be, for example, a diffraction grating, a holographic grating, one or more cascaded reflectors, one or more prismatic surface elements, and/or an array of holographic reflectors.
Second side 370-2 represents a plane along an x-dimension and a y-dimension. Output waveguide 320 may be composed of one or more materials that facilitate total internal reflection of image light 355. Output waveguide 320 may be composed of, for example, silicon, plastic, glass, and/or polymers. Output waveguide 320 has a relatively small form factor. For example, output waveguide 320 may be approximately 50 mm wide along the x-dimension, 30 mm long along the y-dimension, and 0.5-1 mm thick along the z-dimension.
Controller 330 controls scanning operations of source assembly 310. The controller 330 determines scanning instructions for the source assembly 310. In some embodiments, the output waveguide 320 outputs expanded image light 340 to the user's eyeball 220 with a large field of view (FOV). For example, the expanded image light 340 is provided to the user's eyeball 220 with a diagonal FOV (in x and y) of 60 degrees and/or greater and/or 150 degrees and/or less. The output waveguide 320 is configured to provide an eyebox with a length of 20 mm or greater and/or equal to or less than 50 mm; and/or a width of 10 mm or greater and/or equal to or less than 50 mm.
Moreover, controller 330 also controls image light 355 generated by source assembly 310, based on image data provided by image sensor 370. Image sensor 370 may be located on first side 370-1 and may include, for example, image sensors 120a-120d of
After receiving instructions from the remote console, mechanical shutter 404 can open and expose the set of pixel cells 402 in an exposure period. During the exposure period, image sensor 370 can obtain samples of light incident on the set of pixel cells 402, and generate image data based on an intensity distribution of the incident light samples detected by the set of pixel cells 402. Image sensor 370 can then provide the image data to the remote console, which determines the display content and provides the display content information to controller 330. Controller 330 can then determine image light 355 based on the display content information.
Source assembly 310 generates image light 355 in accordance with instructions from the controller 330. Source assembly 310 includes a source 410 and an optics system 415. Source 410 is a light source that generates coherent or partially coherent light. Source 410 may be, for example, a laser diode, a vertical cavity surface emitting laser, and/or a light emitting diode.
Optics system 415 includes one or more optical components that condition the light from source 410. Conditioning light from source 410 may include, for example, expanding, collimating, and/or adjusting orientation in accordance with instructions from controller 330. The one or more optical components may include one or more lenses, liquid lenses, mirrors, apertures, and/or gratings. In some embodiments, optics system 415 includes a liquid lens with a plurality of electrodes that allows scanning of a beam of light with a threshold value of scanning angle to shift the beam of light to a region outside the liquid lens. Light emitted from the optics system 415 (and also source assembly 310) is referred to as image light 355.
Output waveguide 320 receives image light 355. Coupling element 350 couples image light 355 from source assembly 310 into output waveguide 320. In embodiments where coupling element 350 is a diffraction grating, a pitch of the diffraction grating is chosen such that total internal reflection occurs in output waveguide 320, and image light 355 propagates internally in output waveguide 320 (e.g., by total internal reflection), toward decoupling element 365.
Directing element 360 redirects image light 355 toward decoupling element 365 for decoupling from output waveguide 320. In embodiments where directing element 360 is a diffraction grating, the pitch of the diffraction grating is chosen to cause incident image light 355 to exit output waveguide 320 at angle(s) of inclination relative to a surface of decoupling element 365.
In some embodiments, directing element 360 and/or decoupling element 365 are structurally similar. Expanded image light 340 exiting output waveguide 320 is expanded along one or more dimensions (e.g., may be elongated along x-dimension). In some embodiments, waveguide display 300 includes a plurality of source assemblies 310 and a plurality of output waveguides 320. Each of source assemblies 310 emits a monochromatic image light of a specific band of wavelength corresponding to a primary color (e.g., red, green, or blue). Each of output waveguides 320 may be stacked together with a distance of separation to output an expanded image light 340 that is multi-colored.
Near-eye display 100 is a display that presents media to a user. Examples of media presented by the near-eye display 100 include one or more images, video, and/or audio. In some embodiments, audio is presented via an external device (e.g., speakers and/or headphones) that receives audio information from near-eye display 100 and/or control circuitries 510 and presents audio data based on the audio information to a user. In some embodiments, near-eye display 100 may also act as an AR eyewear glass. In some embodiments, near-eye display 100 augments views of a physical, real-world environment, with computer-generated elements (e.g., images, video, sound).
Near-eye display 100 includes waveguide display assembly 210, one or more position sensors 525, and/or an inertial measurement unit (IMU) 530. Waveguide display assembly 210 includes source assembly 310, output waveguide 320, and controller 330.
IMU 530 is an electronic device that generates fast calibration data indicating an estimated position of near-eye display 100 relative to an initial position of near-eye display 100 based on measurement signals received from one or more of position sensors 525.
Imaging device 535 may generate image data for various applications. For example, imaging device 535 may generate image data to provide slow calibration data in accordance with calibration parameters received from control circuitries 510. Imaging device 535 may include, for example, image sensors 120a-120d of
The input/output interface 540 is a device that allows a user to send action requests to the control circuitries 510. An action request is a request to perform a particular action. For example, an action request may be to start or end an application or to perform a particular action within the application.
Control circuitries 510 provide media to near-eye display 100 for presentation to the user in accordance with information received from one or more of: imaging device 535, near-eye display 100, and input/output interface 540. In some examples, control circuitries 510 can be housed within system 500 configured as a head-mounted device. In some examples, control circuitries 510 can be a standalone console device communicatively coupled with other components of system 500. In the example shown in
The application store 545 stores one or more applications for execution by the control circuitries 510. An application is a group of instructions that when executed by a processor generates content for presentation to the user. Examples of applications include: gaming applications, conferencing applications, video playback applications, or other suitable applications.
Tracking module 550 calibrates system 500 using one or more calibration parameters and may adjust one or more calibration parameters to reduce error in determination of the position of the near-eye display 100.
Tracking module 550 tracks movements of near-eye display 100 using slow calibration information from the imaging device 535. Tracking module 550 also determines positions of a reference point of near-eye display 100 using position information from the fast calibration information.
Engine 555 executes applications within system 500 and receives position information, acceleration information, velocity information, and/or predicted future positions of near-eye display 100 from tracking module 550. In some embodiments, information received by engine 555 may be used for producing a signal (e.g., display instructions) to waveguide display assembly 210 that determines a type of content presented to the user. For example, to provide an interactive experience, engine 555 may determine the content to be presented to the user based on a location of the user (e.g., provided by tracking module 550), or a gaze point of the user (e.g., based on image data provided by imaging device 535), a distance between an object and user (e.g., based on image data provided by imaging device 535).
In some examples, image sensor 600 may also include an illuminator 622, an optical filter 624, an imaging module 628, and a sensing controller 640. Illuminator 622 may be an IR illuminator, such as a laser or a light emitting diode (LED), that can project IR light for 3D sensing. The projected light may include, for example, structured light or light pulses. Optical filter 624 may include an array of filter elements overlaid on the plurality of photodiodes 612a-612d of each pixel cell including pixel cell 602a. Each filter element can set a wavelength range of incident light received by each photodiode of pixel cell 602a. For example, a filter element over photodiode 612a may transmit the visible blue light component while blocking other components, a filter element over photodiode 612b may transmit the visible green light component, a filter element over photodiode 612c may transmit the visible red light component, whereas a filter element over photodiode 612d may transmit the IR light component.
Image sensor 600 further includes an imaging module 628. Imaging module 628 may further include a 2D imaging module 632 to perform 2D imaging operations and a 3D imaging module 634 to perform 3D imaging operations. The operations can be based on digital values provided by ADCs 616. For example, based on the digital values from each of photodiodes 612a-612c, 2D imaging module 632 can generate an array of pixel values representing an intensity of an incident light component for each visible color channel, and generate an image frame for each visible color channel. Moreover, 3D imaging module 634 can generate a 3D image based on the digital values from photodiode 612d. In some examples, based on the digital values, 3D imaging module 634 can detect a pattern of structured light reflected by a surface of an object, and compare the detected pattern with the pattern of structured light projected by illuminator 622 to determine the depths of different points of the surface with respect to the pixel cells array. For detection of the pattern of reflected light, 3D imaging module 634 can generate pixel values based on intensities of IR light received at the pixel cells. As another example, 3D imaging module 634 can generate pixel values based on time-of-flight of the IR light transmitted by illuminator 622 and reflected by the object.
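For the time-of-flight mode mentioned above, the depth of a surface point is half the round-trip distance travelled by the IR pulse emitted by illuminator 622. A minimal sketch, with made-up timing values, is shown below.

```python
# Illustrative time-of-flight depth calculation. Timing values are hypothetical.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_depth_m(round_trip_time_s: float) -> float:
    """Depth in meters for a measured round-trip time of an IR pulse."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

# A 10 ns round trip corresponds to roughly 1.5 m.
print(f"{tof_depth_m(10e-9):.2f} m")  # ~1.50 m
```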
Image sensor 600 further includes a sensing controller 640 to control different components of image sensor 600 to perform 2D and 3D imaging of an object.
In some examples, display 700 can be configured as a scanning display in which the LEDs configured to emit light of a particular color are formed as a strip (or multiple strips). For example, display elements/LEDs 702a, 702b, 702c can be assembled to form a strip 704 on a semiconductor substrate 706 to emit green light. In addition, strip 708 can be configured to emit red light, whereas strip 710 can be configured to emit blue light.
In addition, display 700 includes a display controller circuit 714, which can include graphic pipeline 716 and global configuration circuits 718, which can generate, respectively, digital display data 720 and global configuration signal 722 to control LED array 712 to output an image. Specifically, graphic pipeline 716 can receive instructions/data from, for example, a host device to generate digital pixel data for an image to be output by LED array 712. Graphic pipeline 716 can also map the pixels of the image to the groups of LEDs of LED array 712 and generate digital display data 720 based on the mapping and the pixel data. For example, for a pixel having a target color in the image, graphic pipeline 716 can identify the group of LEDs of LED array 712 corresponding to that pixel, and generate digital display data 720 targeted at the group of LEDs. The digital display data 720 can be configured to scale a baseline output intensity of each LED within the group to set the relative output intensities of the LEDs within the group, such that the combined output light from the group can have the target color.
In addition, global configuration circuits 718 can control the baseline output intensity of the LEDs of LED array 712, to set the brightness of the output of LED array 712. In some examples, global configuration circuits 718 can include a reference current generator as well as current mirror circuits to supply global configuration signal 722, such as a bias voltage, to set the baseline bias current of each LED of LED array 712.
Display 700 further includes a display driver circuits array 730, which includes digital and analog circuits to control LED array 712 based on digital display data 720 and global configuration signal 722. Display driver circuit array 730 may include a display driver circuit for each LED of LED array 712. The controlling can be based on supplying a scaled baseline bias current to each LED of LED array 712, with the baseline bias current set by global configuration signal 722, while the scaling can be set by digital display data 720 for each individual LED. For example, as shown in
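The interplay between digital display data 720, global configuration signal 722, and the driver circuits can be sketched as follows. The per-LED scale factors and the microamp baseline current are illustrative assumptions, not disclosed circuit parameters.

```python
# Sketch of per-pixel LED scaling in display 700. Values are illustrative only.

def display_data_for_pixel(target_rgb: tuple[float, float, float]) -> dict[str, float]:
    """Per-LED intensity scale factors (0..1) for one pixel's LED group."""
    r, g, b = target_rgb
    return {"red": r, "green": g, "blue": b}

def driver_currents_ua(scales: dict[str, float], baseline_bias_ua: float) -> dict[str, float]:
    """Scaled baseline bias current supplied to each LED of the group, in microamps."""
    return {color: scale * baseline_bias_ua for color, scale in scales.items()}

# A dim orange pixel with a 100 uA baseline set by the global configuration circuits:
scales = display_data_for_pixel((0.8, 0.4, 0.0))
print(driver_currents_ua(scales, baseline_bias_ua=100.0))
# {'red': 80.0, 'green': 40.0, 'blue': 0.0}
```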
One example of application 810 hosted by compute circuits 808 is a VR/MR/AR application, which can generate virtual content based on the sensor data of the mobile device to provide the user with a simulated experience of being in a virtual world, or in a hybrid world having a mixture of physical objects and virtual objects. To provide a VR/MR/AR experience, the application can determine various information, such as the orientation/pose of the user, the location of the scene, physical objects present in the scene, etc., and generate contents based on the information. For example, the application can generate a virtual image representing a virtual scene to replace the physical scene the mobile device is in, and display the virtual image. As the displayed virtual image is updated when the user moves or changes orientation/pose, the application can provide the user with a simulated experience of being immersed in a virtual world.
As another example, the application may generate a composite image including a part of the image of the physical scene as well as virtual contents, and display the composite image to the user, to provide AR/MR experiences.
The performance of application 810, as well as the immersive experience provided by the application, can be improved by increasing the resolutions and operation speeds of image sensors 600a and 600b and displays 700a and 700b. By increasing the resolutions of the image sensors and the displays, more detailed images of the scene can be captured and (in the case of AR/MR) displayed to the user to provide improved simulation of vision. Moreover, in the case of VR, a more detailed virtual scene can be constructed based on the captured images and displayed to the user. Moreover, by increasing the operation speeds of the image sensors and the displays, the images captured and displayed can change more responsively to changes in the location/orientation/pose of the user. All these can improve the user's simulated experience of being immersed in a virtual/hybrid world.
Although it is desirable to increase the resolutions and operation speeds of image sensors 600a and 600b and displays 700a and 700b, various constraints, such as area and power constraints imposed by mobile device 800, can limit the resolutions and operation speeds of the image sensors and the displays. Specifically, due to the small form factor of mobile device 800, very limited space is available to fit image sensors 600 and displays 700 and their support components (e.g., sensing controller 640, imaging module 628, display driver circuits array 730, display controller circuit 714, compute circuits 808), which in turn can limit the numbers of image sensing elements and display elements, as well as the quantities of available computation and memory resources, all of which can limit the achievable image sensing and display resolutions. The limited available power of mobile device 800 also constrains the numbers of image sensing elements and display elements.
In addition, operating the image sensor and the display at a high frame rate requires moving a large quantity of image data and content data within the mobile device at a high data rate. But moving those data at a high data rate can involve massive compute resources and power consumption, especially when the data are moved over discrete electrical buses 812 and 820 within mobile device 800, over a considerable distance between compute circuits 808 and each of image sensors 600 and displays 700. Due to the limited available power and computation resources at mobile device 800, the data rate for movement of image data and content data within the mobile device is also limited, which in turn can limit the achievable speeds of operation, as well as the achievable resolutions of the image sensors and the displays.
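To make the data-movement burden concrete, the following sketch estimates the raw data rate and link power for one illustrative configuration. The resolution, bit depth, frame rate, and energy-per-bit figures are assumptions chosen for illustration only and are not taken from the disclosure.

```python
# Illustrative estimate of raw image-data bandwidth and transfer power.
# All numbers below are assumptions chosen for illustration.

WIDTH, HEIGHT = 2048, 2048        # pixels per frame (assumed)
BITS_PER_PIXEL = 10               # assumed ADC output bit depth
FRAME_RATE = 90                   # assumed frames per second
ENERGY_PER_BIT_J = 5e-12          # ~a few pJ/bit, typical of a discrete bus link

bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAME_RATE
link_power_w = bits_per_second * ENERGY_PER_BIT_J

print(f"Data rate: {bits_per_second / 1e9:.2f} Gbit/s")
print(f"Link power at {ENERGY_PER_BIT_J * 1e12:.0f} pJ/bit: {link_power_w * 1e3:.1f} mW")
```

Under these assumptions the image data alone approaches 4 Gbit/s, which illustrates why both the available power and the bus bandwidth of the mobile device constrain the achievable resolutions and frame rates.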
Sensors 902, display 904, and compute circuits 906 can be formed in different semiconductor layers which can be stacked. Each semiconductor layer can include one or more semiconductor substrates/wafers that can also be stacked to form the layer. For example, image sensor 902a and IMU 902b can be formed on a semiconductor layer 912, display 904 can be formed on a semiconductor layer 914, whereas compute circuits 906 can be formed on a semiconductor layer 916. Semiconductor layer 916 can be sandwiched between semiconductor layer 912 and semiconductor layer 914 (e.g., along the z-axis) to form a stack structure. In the example of
The stack structure of semiconductor layers 912, 914, and 916 can be enclosed at least partially within a semiconductor package 910 to form an integrated system. Semiconductor package 910 can be positioned within a mobile device, such as mobile device 800. Semiconductor package 910 can have an opening 920 to expose pixel cell array 602 and an opening 921 to expose LED array 712. Semiconductor package 910 further includes input/output (I/O) pins 930, which can be electrically connected to compute circuits 906 on semiconductor layer 916, to provide connection between integrated system 900 and other components of the mobile device, such as a host processor that executes a VR/AR/MR application, power system, etc. I/O pins 930 can be connected to, for example, semiconductor layer 916 via bond wires 932.
Integrated system 900 further includes interconnects to connect the semiconductor substrates. For example, image sensor 902a of semiconductor layer 912 is connected to semiconductor layer 916 via interconnects 922a to enable movement of data between image sensor 902a and sensor compute circuits 906a, whereas IMU 902b of semiconductor layer 912 is connected to semiconductor layer 916 via interconnects 922b to enable movement of data between IMU 902b and sensor compute circuits 906a. In addition, semiconductor layer 916 is connected to semiconductor layer 914 via interconnects 924 to enable movement of data between display compute circuits 906b and display 904. As described below, various techniques can be used to implement the interconnects, which can be implemented as 3D interconnects, such as through silicon vias (TSVs), micro-TSVs, Copper-Copper bumps, etc., and/or 2.5D interconnects, such as an interposer.
In addition, semiconductor substrate 1010 can include processing circuits 1012 formed on a front side surface 1014. Processing circuits 1012 can include, for example, analog-to-digital converters (ADC) to quantize the charge generated by photodiodes 612 of pixel cell array 602, memory devices to store the outputs of the ADC, etc. Other components, such as metal capacitors or device capacitors, can also be formed on front side surface 1014 and sandwiched between semiconductor substrates 1000 and 1010 to provide additional charge storage buffers to support the quantization operations.
Semiconductor substrates 1000 and 1010 can be connected with vertical 3D interconnects, such as Copper bonding 1016 between front side surface 1006 of semiconductor substrate 1000 and front side surface 1014 of semiconductor substrate 1010, to provide electrical connections between the photodiodes and processing circuits. Such arrangements can reduce the routing distance of the pixel data from the photodiodes to the processing circuits.
In addition, integrated system 900 further includes a semiconductor substrate 1020 to implement IMU 902b. Semiconductor substrate 1020 can include a MEMS 1022 and a MEMS controller 1024 formed on a front side surface 1026 of semiconductor substrate 1020. MEMS 1022 and MEMS controller 1024 can form an IMU, with MEMS controller 1024 controlling the operations of MEMS 1022 and generating sensor data from MEMS 1022.
Moreover, semiconductor layer 916, which implements sensor compute circuits 906a and display compute circuits 906b, can include a semiconductor substrate 1030 and a semiconductor substrate 1040 forming a stack. Semiconductor substrate 1030 can implement sensor compute circuits 906a to interface with image sensor 902a and IMU 902b. Sensor compute circuits 906a can include, for example, an image sensor controller 1032, an image sensor frame buffer 1034, a motion data buffer 1036, and a sensor data processor 1038. Image sensor controller 1032 can control the sensing operations performed by the image sensor by, for example, providing global signals (e.g., clock signals, various control signals) to the image sensor. Image sensor controller 1032 can also enable a subset of the pixel cells of pixel cell array 602 to generate a sparse image frame. In addition, image sensor frame buffer 1034 can store one or more image frames generated by pixel cell array 602, whereas motion data buffer 1036 can store motion measurement data (e.g., pitch, roll, yaw) measured by the IMU.
Sensor data processor 1038 can process the image frames stored in image sensor frame buffer 1034 and the motion measurement data stored in motion data buffer 1036 to generate a processing result. For example, sensor data processor 1038 can include an image processor to process the image frames to determine the location and the size of a region of interest (ROI) enclosing a target object. The target object can be defined by the application on the host processor, which can send the target object information to the system. In addition, sensor data processor 1038 can include circuits, such as a Kalman filter, to determine a state of motion, such as a location, an orientation, etc., of mobile device 800 based on the motion measurement data. Based on the image processing results and the state of motion, image sensor controller 1032 can predict the location of the ROI for the next image frame, and enable a subset of the pixel cells of pixel cell array 602 corresponding to the ROI to generate a subsequent sparse image frame. The generation of a sparse image frame can reduce the power consumption of the image sensing operation as well as the volume of pixel data transmitted by pixel cell array 602 to sensor compute circuits 906a. In addition, sensor data processor 1038 can also transmit the image processing and motion data processing results to display compute circuits 906b for display 904.
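The ROI prediction and sparse-capture behavior described above can be sketched as follows. This is an illustrative model only; the class, function names, and margin parameter are hypothetical and do not represent the actual circuits of image sensor controller 1032 or sensor data processor 1038.

```python
# Hypothetical sketch of the sparse-capture loop: an ROI detected in the
# current frame is shifted by the estimated image-plane motion to predict
# where the target object will appear next, and only the pixel cells inside
# the predicted ROI are enabled for the following frame.

from dataclasses import dataclass


@dataclass
class ROI:
    x: int      # top-left column of the region of interest
    y: int      # top-left row
    w: int      # width in pixels
    h: int      # height in pixels


def predict_roi(current: ROI, dx_px: float, dy_px: float, margin: int = 8) -> ROI:
    """Shift the ROI by the estimated image-plane motion and pad it with a margin."""
    return ROI(
        x=int(current.x + dx_px) - margin,
        y=int(current.y + dy_px) - margin,
        w=current.w + 2 * margin,
        h=current.h + 2 * margin,
    )


def enable_mask(roi: ROI, array_w: int, array_h: int):
    """Return a per-pixel-cell enable map (True = capture) for a sparse frame."""
    return [
        [roi.x <= col < roi.x + roi.w and roi.y <= row < roi.y + roi.h
         for col in range(array_w)]
        for row in range(array_h)
    ]


if __name__ == "__main__":
    roi = predict_roi(ROI(40, 40, 16, 16), dx_px=5.0, dy_px=-3.0)
    mask = enable_mask(roi, array_w=128, array_h=128)
    print(sum(map(sum, mask)), "of", 128 * 128, "pixel cells enabled")
```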
In addition, semiconductor substrate 1040 can implement display compute circuits 906b to interface with display 904 of semiconductor layer 914. Display compute circuits 906b can include, for example, a content generation circuit 1042, a display frame buffer 1044, and a rendering circuit 1046. Specifically, content generation circuit 1042 can receive a reference image frame, which can be a virtual image frame received externally from, for example, a host processor via I/O pins 930, or a physical image frame received from image sensor frame buffer 1034. Content generation circuit 1042 can generate an output image frame based on the reference image frame as well as the image processing and motion data processing results.
Specifically, in a case where the virtual image frame is received from the host processor, the content generation circuit can perform a transformation operation on the virtual image frame to reflect a change in the user's viewpoint based on the location and/or orientation information from the motion data processing results, to provide the user with a simulated experience of being in a virtual world. As another example, in a case where a physical image frame is received from the image processor, content generation circuit 1042 can generate the output image frame as a composite image based on adding virtual content such as, for example, replacing a physical object in the physical image frame with a virtual object, adding virtual annotations to the physical frame, etc., as described in
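The two content-generation paths described above can be illustrated with the following sketch, which approximates the viewpoint transformation as a simple image-plane shift and models compositing as pixel replacement within an ROI. The NumPy-based functions are illustrative assumptions, not the operations actually performed by content generation circuit 1042.

```python
# Hypothetical sketch of the two content-generation paths: warping a virtual
# frame toward a new viewpoint, or compositing virtual content over a
# physical frame inside a detected ROI. Frames are NumPy arrays for
# illustration only.

import numpy as np


def reproject_virtual_frame(virtual_frame: np.ndarray, shift_x: int, shift_y: int) -> np.ndarray:
    """Approximate a viewpoint change as an image-plane translation."""
    return np.roll(virtual_frame, shift=(shift_y, shift_x), axis=(0, 1))


def composite(physical_frame: np.ndarray, virtual_patch: np.ndarray, x: int, y: int) -> np.ndarray:
    """Replace the pixels inside an ROI of the physical frame with virtual content."""
    out = physical_frame.copy()
    h, w = virtual_patch.shape[:2]
    out[y:y + h, x:x + w] = virtual_patch
    return out


if __name__ == "__main__":
    physical = np.zeros((64, 64, 3), dtype=np.uint8)
    annotation = np.full((8, 8, 3), 255, dtype=np.uint8)   # a white marker patch
    print(composite(physical, annotation, x=20, y=30).sum())
```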
Content generation circuit 1042 can store the output image frame at display frame buffer 1044. Rendering circuit 1046 can include display driver circuits array 730 as well as control logic circuits. The control logic circuits can read pixels of the output image frame from display frame buffer 1044 according to a scanning pattern, and transmit control signals to display driver circuits array 730, which can then control LED array 712 to display the output image frame.
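A minimal sketch of the read-and-drive behavior, assuming a simple row-by-row raster pattern (the actual scanning pattern is not specified here), is shown below; the callback stands in for the hand-off from the control logic to display driver circuits array 730.

```python
# Hypothetical raster-scan model: the control logic walks the display frame
# buffer row by row and hands each row's pixel values to the driver circuits
# for the corresponding row of LEDs.

def raster_scan(frame_buffer, drive_row):
    """frame_buffer: 2D list of pixel values; drive_row: callback into the driver array."""
    for row_index, row_pixels in enumerate(frame_buffer):
        drive_row(row_index, row_pixels)


if __name__ == "__main__":
    buffer = [[(row + col) % 256 for col in range(4)] for row in range(3)]
    raster_scan(buffer, lambda r, pixels: print(f"row {r}: {pixels}"))
```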
Semiconductor substrate 1010 (of semiconductor layer 912), as well as semiconductor substrates 1030 and 1040 (of semiconductor layer 916), can include digital logic and memory cells. Semiconductor substrates 1010, 1030, and 1040 may include silicon transistor devices, such as FinFETs, GAAFETs, etc., to implement the digital logic, as well as memory devices, such as MRAM devices, ReRAM devices, SRAM devices, etc., to implement the memory cells. The semiconductor substrates may also include other devices, such as analog transistors, capacitors, etc., to implement analog circuits, such as analog-to-digital converters (ADCs) to quantize the sensor signals, display driver circuits to transmit current to LED array 712, etc.
In some examples, semiconductor layer 914, which implements LED array 712, can include a semiconductor substrate 1050 that includes a device layer 1052, and a thin-film circuit layer 1054 deposited on device layer 1052. LED array 712 can be formed in a layered epitaxial structure including a first doped semiconductor layer (e.g., a p-doped layer), a second doped semiconductor layer (e.g., an n-doped layer), and a light-emitting layer (e.g., an active region). Device layer 1052 has a light emitting surface 1056 facing away from the light receiving surface of pixel cell array 602, and an opposite surface 1058 that is opposite to light emitting surface 1056.
Thin-film circuit layer 1054 is deposited on the opposite surface 1058 of device layer 1052. Thin-film circuit layer 1054 can include a transistor layer (e.g., a thin-film transistor (TFT) layer); an interconnect layer; and/or a bonding layer (e.g., a layer comprising a plurality of pads for under-bump metallization). Device layer 1052 can provide a support structure for thin-film circuit layer 1054. Thin-film circuit layer 1054 can include circuitry for controlling the operation of the LEDs in the array of LEDs, such as circuitry that routes the current from the display driver circuits to the LEDs. Thin-film circuit layer 1054 can include materials such as, for example, c-axis aligned crystal indium-gallium-zinc oxide (CAAC-IGZO), amorphous indium gallium zinc oxide (a-IGZO), low-temperature polycrystalline silicon (LTPS), amorphous silicon (a-Si), etc.
Semiconductor substrates 1000, 1010, 1020, 1030, 1040, and 1050, of semiconductor layers 912, 914, and 916, can be connected via 3D interconnects, such as through silicon vias (TSVs), micro-TSVs, Copper-Copper bumps, etc. For example, as described above, semiconductor substrates 1000 and 1010 can be connected via Copper bonding 1016. In addition, semiconductor substrates 1010, 1030, and 1040 can be connected via through silicon vias 1060 (TSVs), which penetrate through the semiconductor substrates. Moreover, semiconductor substrates 1020, 1030, and 1040 can be connected via TSVs 1062, which penetrate through the semiconductor substrates. Further, semiconductor substrates 1040 and 1050 can be connected via a plurality of metal bumps, such as micro bumps 1064, which interface with thin-film circuit layer 1054.
In some examples, integrated sensing and display system 900 may further include a power management circuit (not shown in
In some examples, at least some of semiconductor layers 912, 914, and 916 can be connected via 2.5D interconnects to form a multi-chip module (MCM).
In addition, integrated system 900 may further include one or more illuminators for active sensing. For example, referring to
Referring back to
To reduce the delay incurred by memory access during content generation, in some examples, compute circuits 906 of integrated system 900 can include a shared frame buffer to be accessed by both sensor compute circuits 906a and display compute circuits 906b. Image sensor 902a can store a physical image frame at the shared frame buffer. Content generation circuit 1042 can read the physical image frame from the shared frame buffer and replace pixels of the physical image frame to add in virtual contents to generate a composite image frame. Rendering circuit 1046 can then read the composite image frame from the shared frame buffer and output it to LED array 712. By removing the step of storing the input/output frame at a separate display frame buffer, the delay incurred by the sequential memory accesses can be reduced.
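A minimal sketch of the shared-frame-buffer flow, assuming a simple in-memory model rather than the actual circuits, is shown below. The class and method names are hypothetical.

```python
# Hypothetical sketch of a shared frame buffer: the sensor side writes the
# captured frame in place, the content generator overwrites ROI pixels with
# virtual content in the same buffer, and the rendering side reads the same
# buffer, avoiding a copy into a separate display frame buffer.

class SharedFrameBuffer:
    def __init__(self, width: int, height: int):
        self.pixels = [[0] * width for _ in range(height)]

    def write_captured_frame(self, frame):
        # Sensor compute path: store the physical image frame.
        for y, row in enumerate(frame):
            self.pixels[y][:] = row

    def overlay(self, patch, x: int, y: int):
        # Display compute path: replace pixels in place to add virtual content.
        for dy, row in enumerate(patch):
            self.pixels[y + dy][x:x + len(row)] = row

    def read_for_display(self):
        # Rendering path: read the composite frame from the same storage.
        return self.pixels


if __name__ == "__main__":
    fb = SharedFrameBuffer(8, 8)
    fb.write_captured_frame([[1] * 8 for _ in range(8)])
    fb.overlay([[9, 9], [9, 9]], x=3, y=3)
    print(fb.read_for_display()[3])
```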
In some examples to further reduce the delay, a distributed sensing and display system can be implemented in which the display is divided into tiles of display elements and the image sensor is divided into tiles of image sensing elements. Each tile of display elements is directly connected to a corresponding tile memory in the third semiconductor substrate. Each tile memory is, in turn, connected to a corresponding tile of image sensing elements. Each tile memory can be accessed in parallel to store the physical image frame captured by the image sensor and to replace pixels to add in virtual contents. As each tile memory is typically small, the access time for each tile memory is relatively short, which can further reduce the delay incurred by memory access to content generation.
Each of tile frame buffers 1404a-1404e can be accessed in parallel by sensor compute circuits 906a to write subsets of pixels of a physical image frame captured by the corresponding array of pixel cells 1403. Each of tile frame buffers 1404a-1404e can also be accessed in parallel by display compute circuits 906b to replace pixels to add in virtual contents. The sharing of the frame buffer between sensor compute circuits 906a and display compute circuits 906b, as well as the parallel access of the tile frame buffers, can substantially reduce the delay incurred in the transfer of pixel data and speed up the generation of content.
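The parallel access of the tile frame buffers can be illustrated with the following sketch, in which software threads stand in for the independent per-tile hardware paths. The tile count and tile size are arbitrary illustrative values.

```python
# Hypothetical sketch of tiled frame buffers written in parallel: each tile of
# pixel cells stores its subset of the frame into its own small buffer, so no
# tile waits on another tile's memory access. Threads model the per-tile
# hardware paths purely for illustration.

from concurrent.futures import ThreadPoolExecutor

NUM_TILES = 5
TILE_PIXELS = 1024

tile_buffers = [[0] * TILE_PIXELS for _ in range(NUM_TILES)]


def store_tile(tile_index: int, tile_pixels):
    # Each tile of image sensing elements writes only to its own buffer.
    tile_buffers[tile_index][:] = tile_pixels
    return tile_index


captured_tiles = [[tile_index] * TILE_PIXELS for tile_index in range(NUM_TILES)]

with ThreadPoolExecutor(max_workers=NUM_TILES) as pool:
    done = list(pool.map(store_tile, range(NUM_TILES), captured_tiles))

print("tiles stored:", done)
```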
Method 1500 starts with step 1502, in which an image sensor, such as an image sensor including arrays of pixel cells 1403 (e.g., 1403a-e), generates an image frame of a scene. Each array of pixel cells 1403 can form a tile of image sensing elements and can be connected to a corresponding tile frame buffer (e.g., one of tile frame buffers 1404a-e), which in turn is connected to a corresponding tile of display elements of a display (e.g., one of arrays of LEDs 1409a-e). The arrays of pixel cells 1403 can collectively capture light from the scene and generate the image frame of the scene.
It should be appreciated that while some examples may employ multiple tiles of image sensing elements, the method may also employ an image sensor having a single array of pixel cells 1403 forming a single tile of image sensing elements connected to a corresponding frame buffer.
In step 1504, each tile of image sensing elements can store a subset of pixels of the image frame at the corresponding tile frame buffer in parallel. For example, array of pixel cells 1403a can store a subset of pixels at tile frame buffer 1404a, array of pixel cells 1403b can store another subset of pixels at tile frame buffer 1404b, etc. The storage of the pixels at the respective tile frame buffers can be performed in parallel as each tile frame buffer is connected directly to its tile of image sensing elements, as shown in
In step 1506, a content generator, such as content generation circuit 1042, can replace at least some of the pixels of the input image frame stored at the tile frame buffer(s) to generate the output image frame. In some examples, the pixels can be replaced to provide an annotation generated by sensor data processor 1038 based on, for example, detecting a target object in the input image frame, as shown in
In step 1508, a rendering circuit, such as rendering circuit 1046, can control each tile of display elements to fetch a subset of pixels of the output image frame from the corresponding tile frame buffer to display the output image frame. The rendering circuit can control the tiles of display elements based on a scanning pattern. Upon receiving a signal to output content, a tile of display elements can fetch the pixel data, which can include the pixel data of the original input frame or pixel data inserted by content generation circuit 1042, from the corresponding tile frame buffer and output the pixel data. If an image sensor with only a single tile of image sensing elements is employed, the rendering circuit controls the display elements to fetch the output image frame from the single frame buffer.
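The following sketch ties steps 1502-1508 together as plain functions over lists. It is an illustrative model of the data flow only; the function names mirror the step numbers and are not part of the disclosure.

```python
# Hypothetical end-to-end sketch mirroring steps 1502-1508: capture, per-tile
# storage, in-place pixel replacement, and per-tile fetch for display. Tiles
# are simple lists; all names are illustrative, not the described circuits.

def step_1502_capture(num_tiles, tile_size):
    """Each tile of pixel cells captures its subset of the image frame."""
    return [[tile] * tile_size for tile in range(num_tiles)]


def step_1504_store(captured_tiles):
    """Each tile stores its pixels in its own tile frame buffer (in parallel in hardware)."""
    return [list(tile) for tile in captured_tiles]


def step_1506_replace(tile_buffers, tile_index, start, virtual_pixels):
    """The content generator replaces some pixels in one tile buffer with virtual content."""
    tile_buffers[tile_index][start:start + len(virtual_pixels)] = virtual_pixels


def step_1508_render(tile_buffers):
    """The rendering circuit fetches each tile's pixels for its tile of display elements."""
    return [list(tile) for tile in tile_buffers]


if __name__ == "__main__":
    buffers = step_1504_store(step_1502_capture(num_tiles=5, tile_size=8))
    step_1506_replace(buffers, tile_index=2, start=3, virtual_pixels=[99, 99])
    print(step_1508_render(buffers)[2])
```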
With the disclosed techniques, an integrated system in which sensor, compute, and display are integrated within a semiconductor package can be provided. Such an integrated system can improve the performance of the sensor and the display while reducing the footprint and power consumption. Specifically, by putting the sensor, compute, and display within a semiconductor package, the distances travelled by the data between the sensor and the compute and between the compute and the display can be greatly reduced, which can improve the speed of transfer of data. The speed of data transfer can be further improved by the 2.5D and 3D interconnects, which can provide high-bandwidth and short-distance routes for the transfer of data. In addition, the integrated system also allows implementation of a distributed sensing and display system, which can further improve the system performance, as described above. All of these allow the image sensor and the display to operate at a higher frame rate to improve their operation speeds. Moreover, as the sensor and the display are integrated within a rigid stack structure, relative movement between the sensor and the display (e.g., due to thermal expansion) can be reduced, which can reduce the need to calibrate the sensor and the display to account for the movement.
In addition, the integrated system can reduce the footprint and power consumption. Specifically, by stacking the compute circuits and the sensors on the back of the display, the overall footprint occupied by the sensors, the compute circuits, and the display can be reduced especially compared with a case where the display, the sensor, and the compute circuits are scattered at different locations. The stacking arrangements are also likely to achieve the minimum and optimum overall footprint, given that the displays typically have the largest footprint (compared with sensor and compute circuits), and that the image sensors need to be facing opposite directions from the display to provide simulated vision.
Moreover, in addition to improving the data transfer rate, the 2.5D/3D interconnects between the semiconductor substrates also allow the data to be transferred more efficiently compared with, for example, discrete buses such as those defined under the MIPI specification. For example, a Mobile Industry Processor Interface (MIPI) C-PHY link requires a few pico-Joules (pJ)/bit, while wireless transmission through a 60 GHz link requires a few hundred pJ/bit. In contrast, due to the high bandwidth and the short routing distance provided by the on-chip interconnects, the power consumed in the transfer of data over 2.5D/3D interconnects is typically just a fraction of a pJ/bit. Furthermore, due to the higher transfer bandwidth and the reduced transfer distance, the data transfer time can also be reduced, which allows support circuit components (e.g., clocking circuits, signal transmitter and receiver circuits) to be powered off for a longer duration to further reduce the overall power consumption of the system.
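As a rough comparison, the following sketch computes the energy to move one illustrative image frame over each of the three link types mentioned above, using per-bit energies within the quoted ranges. The frame size and the exact pJ/bit values are assumptions chosen for illustration.

```python
# Rough comparison of per-frame transfer energy across link types. The frame
# size and pJ/bit values are illustrative assumptions within the ranges
# quoted in the text.

FRAME_BITS = 2048 * 2048 * 10          # one assumed 10-bit, 2048x2048 frame

links_pj_per_bit = {
    "MIPI C-PHY (discrete bus)": 3.0,   # "a few pJ/bit"
    "60 GHz wireless link": 300.0,      # "a few hundred pJ/bit"
    "2.5D/3D interconnect": 0.3,        # "a fraction of a pJ/bit"
}

for name, pj in links_pj_per_bit.items():
    energy_uj = FRAME_BITS * pj * 1e-12 * 1e6   # joules -> microjoules
    print(f"{name:28s} {energy_uj:10.1f} uJ per frame")
```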
An integrated sensing and display system, such as integrated system 900, can improve the performance of the sensor and the display while reducing the footprint and power consumption. Specifically, by putting sensors 902, compute circuits 906, and display 904 within a single semiconductor package 910, rather than scattering them around at different locations within the mobile device, the distances travelled by the data between sensors 902 and compute circuits 906, and between compute circuits 906 and display 904, can be greatly reduced, which can improve the speed of transfer of data. The speed of data transfer can be further improved by the 2.5D/3D interconnects 922 and 924, which can provide high-bandwidth and short-distance routes for the transfer of data. All of these allow image sensor 902a and display 904 to operate at a higher frame rate to improve their operation speeds.
Moreover, as sensors 902 and display 904 are integrated within a rigid stack structure, relative movement between sensors 902 and display 904 (e.g., due to thermal expansion) can be reduced. Compared with a case where the sensor and the display are mounted on separate printed circuit boards (PCBs) that are held together on non-rigid structures, integrated system 900 can reduce the relative movement between sensors 902 and display 904 which can accumulate over time. The reduced relative movement can be advantageous as the need to re-calibrate the sensor and the display to account for the movement can be reduced. Specifically, as described above, image sensors 600 can be positioned on mobile device 800 to capture images of a physical scene with the field-of-views (FOVs) of left and right eyes of a user, whereas displays 700 are positioned in front of the left and right eyes of the user to display the images of the physical scene, or virtual/composite images derived from the captured images, to simulate the vision of the user. If there are relative movements between the image sensors and the displays, the image sensors and/or the display may need to be calibrated (e.g., by post-processing the image frames prior to being displayed) to correct for the relative movements in order to simulate the vision of the user. By integrating the sensors and the display within a rigid stack structure, the relative movements between the sensors and the display can be reduced, which can reduce the need for the calibration.
In addition, integrated system 900 can reduce the footprint and power consumption. Specifically, by stacking compute circuits 906 and sensors 902 on the back of display 904, the overall footprint occupied by sensors 902, display 904, and compute circuits 906 can be reduced, especially compared with a case where sensors 902, display 904, and compute circuits 906 are scattered at different locations within mobile device 800. The stacking arrangements are also likely to achieve the minimum and optimum overall footprint, given that display 904 typically has the largest footprint compared with sensors 902 and compute circuits 906, and that image sensor 902a can be oriented to face an opposite direction from display 904 to provide simulated vision.
Moreover, in addition to improving the data transfer rate, the 2.5D/3D interconnects between the semiconductor substrates, such as interconnects 922a, 922b, and 924, also allow the data to be transferred more efficiently compared with, for example, discrete buses such as those defined under the MIPI specification. As a result, the power consumed by the system in the data transfer can be reduced. For example, a Mobile Industry Processor Interface (MIPI) C-PHY link requires a few pico-Joules (pJ)/bit, while wireless transmission through a 60 GHz link requires a few hundred pJ/bit. In contrast, due to the high bandwidth and the short routing distance provided by the on-chip interconnects, the power consumed in the transfer of data over 2.5D/3D interconnects is typically just a fraction of a pJ/bit. Furthermore, due to the higher transfer bandwidth and the reduced transfer distance, the data transfer time can also be reduced, which allows the support circuit components (e.g., clocking circuits, signal transmitter and receiver circuits) to be powered off for a longer duration to further reduce the overall power consumption of the system. All of these can reduce the power consumption of integrated system 900 as well as mobile device 800 as a whole.
Some portions of this description describe the embodiments of the disclosure in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, and/or hardware.
Steps, operations, or processes described may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In some embodiments, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.
Embodiments of the disclosure may also relate to an apparatus for performing the operations described. The apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Embodiments of the disclosure may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer readable storage medium and may include any embodiment of a computer program product or other data combination described herein.
The language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the disclosure be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the disclosure, which is set forth in the following claims.
This application claims priority to U.S. Provisional Patent Application 63/131,937, titled “Integrated Sensing and Display System,” filed Dec. 30, 2020, the entirety of which is incorporated herein by reference.