The present invention generally relates to image processing. More particularly, the present invention relates to a computing system for performing real-time neural network-based image rendering.
Image processing techniques using machine learning models, such as neural networks, have been developed for rendering high-quality images. For example, neural radiance field techniques based on neural networks have recently been developed to synthesize photorealistic images from novel viewpoints (i.e., perspectives). For instance, a neural radiance field of an object can be encoded into a neural network based on a training dataset comprising images depicting the object from various viewpoints. Once the neural network is trained, intensity and color values of pixels of an image of the object can be obtained, and the image can be rendered. In general, conventional neural radiance field-based image rendering techniques have their limitations. For example, due to the large amount of data that needs to be processed by current implementations of neural radiance field-based image processing systems, real-time image rendering is often not practical. As such, better solutions are needed.
Described herein is a computing core for rendering an image. The computing core can comprise a position encoding logic and a plurality of pipeline logics connected in series in a pipeline. The position encoding logic can be configured to transform coordinates and directions of sampling points corresponding to a portion of the image into high dimensional representations. The plurality of pipeline logics can be configured to output, based on the high dimensional representation of the coordinates and the high dimensional representation of the directions, intensity and color values of pixels corresponding to the portion of the image in one pipeline cycle. The plurality of pipeline logics can be configured to run in parallel.
In some embodiments, the plurality of pipeline logics can comprise a first pipeline logic, a second pipeline logic, and a third pipeline logic. The first pipeline logic can be configured to receive the high dimensional representation of the coordinates, the second pipeline logic can be configured to receive the high dimensional representation of the coordinates and an output of the first pipeline logic, and the third pipeline logic can be configured to receive the high dimensional representation of the directions and an output of the second pipeline logic, and output intensity and color values of the pixels corresponding to the portion of the image.
In some embodiments, the position encoding logic can be configured to execute Fourier feature mapping to transform the coordinates and the directions of the sampling points to the high dimensional representation of the coordinates and the high dimensional representation of the directions, respectively.
In some embodiments, a first memory and a second memory can be coupled to the position encoding logic. The first memory can be configured to store the high dimensional representation of the coordinates and the second memory can be configured to store the high dimensional representation of the directions. The first memory and the second memory can be static random access memory (SRAM) modules.
In some embodiments, the first memory and the second memory can be first-in-first-out memories. The first memory can be configured to store the high dimensional representation of the coordinates and the second memory can be configured to store the high dimensional representation of the directions.
In some embodiments, the plurality of pipeline logics can be configured to encode a machine learning model based on a neural network. Each of the plurality of pipeline logics can be configured to perform computations associated with particular neural layers of the neural network.
In some embodiments, the neural network can be a neural radiance field.
In some embodiments, the neural radiance field can be encoded through the neural layers of the neural network.
In some embodiments, the neural network can comprise ten neural layers.
In some embodiments, the first pipeline logic can be configured to execute computations associated with the first four neural layers of the neural network based on the high dimensional representation of the coordinates to output a first positional encoding representation.
In some embodiments, the second pipeline logic can be configured to execute computations associated with the next three neural layers of the neural network based on a concatenation of the high dimensional representation of the coordinates and the first positional encoding representation to output a second positional encoding representation.
In some embodiments, the third pipeline logic can be configured to execute computations associated with the final three neural layers of the neural network based on a concatenation of the high dimensional representation of the directions and the second positional encoding representation to output the intensity and color values of the pixels.
In some embodiments, the high dimensional representation of the coordinates can comprise 63 dimensions and the high dimensional representation of the directions can comprise 27 dimensions.
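By way of illustration, these dimensions are consistent with a standard NeRF-style Fourier feature mapping that concatenates the raw input with sine and cosine features at a set of frequencies: 10 frequency bands applied to a 3-dimensional coordinate give 3 + 3×2×10 = 63 dimensions, and 4 bands applied to a 3-dimensional direction give 3 + 3×2×4 = 27 dimensions. The sketch below shows one such mapping; the band counts, the π-scaled frequency schedule, and the 3-dimensional direction input are illustrative assumptions rather than requirements of the embodiments described herein.

```python
import numpy as np

def fourier_encode(x: np.ndarray, num_bands: int) -> np.ndarray:
    """Map each component of x to [x, sin(2^k * pi * x), cos(2^k * pi * x)]
    for k = 0 .. num_bands-1 (a standard NeRF-style frequency encoding; the
    exact mapping used by a given embodiment is not specified here)."""
    freqs = 2.0 ** np.arange(num_bands) * np.pi           # (num_bands,)
    angles = x[..., None] * freqs                         # (..., dim, num_bands)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return np.concatenate([x, enc.reshape(*x.shape[:-1], -1)], axis=-1)

# 3-D sampling-point coordinate -> 3 + 3*2*10 = 63 dimensions
position = np.array([0.1, -0.4, 0.7])
print(fourier_encode(position, num_bands=10).shape)       # (63,)

# 3-D unit view direction -> 3 + 3*2*4 = 27 dimensions
direction = np.array([0.0, 0.0, 1.0])
print(fourier_encode(direction, num_bands=4).shape)       # (27,)
```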
In some embodiments, each of the plurality of pipeline logics can comprise a multiply-accumulate array.
Described herein is a computing system comprising a plurality of the computing cores. The plurality of the computing cores can be configured to render portions of an image in parallel.
Described herein is a computer-implemented image rendering method. A computing system can be configured to divide an image to be rendered into rows of image portions. The computing system can obtain, for each image portion, coordinates and directions of sampling points corresponding to pixels of the image portion. The computing system can transform, for each image portion, the coordinates and directions into high dimensional representations. The computing system can determine, through a computing core, based on the high dimensional representations, intensity and color values of the pixels. The computing system can reconstruct the image based on the intensity and color values of pixels of the rows of image portions.
In some embodiments, the coordinates and directions of the sampling points can be transformed into the high dimensional representations based on a Fourier feature mapping technique.
In some embodiments, the computing core can be configured to execute computations associated with a machine learning model encoded with a neural radiance field and the computing core is associated with a row of image portions.
In some embodiments, the machine learning model can be based on a neural network.
These and other features of the apparatuses, systems, methods, and non-transitory computer-readable media disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for purposes of illustration and description only and are not intended as a definition of the limits of the invention.
Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
The figures depict various embodiments of the disclosed technology for purposes of illustration only, wherein the figures use like reference numerals to identify like elements. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated in the figures can be employed without departing from the principles of the disclosed technology described herein.
Provided herein are technical solutions that address problems arising from conventional methods of image rendering as discussed above. In various embodiments, a computing system can be configured to render images based on neural radiance field techniques. The computing system can comprise a plurality of computing cores (or processing cores). The plurality of computing cores can be configured to render images in parallel through a machine learning model, such as a neural network. By taking advantage of parallel processing, the plurality of computing cores can accelerate computations associated with the machine learning model. Such a parallel computer architecture is desirable because each of the plurality of computing cores can be configured to process a particular portion of an image through the machine learning model. For example, assume that the machine learning model is implemented with a neural network. In this example, each computing core can be dedicated to computations associated with the neural network to render a portion of an image. These and other features of the disclosed technology are discussed herein.
In some embodiments, the first pipeline stage 102 can be configured to perform image rendering tasks relating to transforming positional encoding of sampling points in the three-dimensional imaging space. The first pipeline stage 102 can take in a position input vector 110 (e.g., “Position”) comprising 63 dimensions of spatial information (e.g., coordinates) and process the position input vector 110 through layers (e.g., neural layers) of the first pipeline stage 102. Based on the position input vector 110, the first pipeline stage 102 can output a first output vector 112 (e.g., a positional encoding vector representation), as part of an input, to the second pipeline stage 104. In some embodiments, the first output vector 112 can comprise 256 dimensions. In this regard, the first pipeline stage 102 can transform the position input vector 110 from 63 dimensions into a vector representation having 256 dimensions or features. The first output vector 112 can be concatenated with the position input vector 110 prior to being inputted into the second pipeline stage 104. As shown in
In some embodiments, the second pipeline stage 104 can be configured to perform image rendering tasks relating to processing of transformed positional encoding of sampling points of the three-dimensional imaging space. The second pipeline stage 104 can take in a first input vector 114 that is a concatenation of the position input vector 110 and the first output vector 112. In this regard, the first input vector 114 can have 319 dimensions or features (i.e., 63 dimensions of the position input vector 110 plus 256 dimensions of the first output vector 112). Based on the first input vector 114, the second pipeline stage 104 can output a second output vector 116 (i.e., a positional encoding vector representation), as part of an input, to the third pipeline stage 106. In some embodiments, the second output vector 116 outputted by the second pipeline stage 104 can comprise 256 dimensions or features. In this regard, the second pipeline stage 104 reduces the dimension of the first input vector 114 from 319 dimensions to 256 dimensions or features. As shown in
In some embodiments, the third pipeline stage 106 can be configured to output intensity and color values of sampling points of the three-dimensional imaging space (e.g., “Color” and “Intensity”). The third pipeline stage 106 can be configured to output the intensity and color values based on positional encoding and directions of sampling points. The third pipeline stage 106 can take in a second input vector 118 that is a concatenation of a direction vector 120 and the second output vector 116. Based on the second input vector 118, the third pipeline stage 106 can output the intensity and color values of the sampling points. In some embodiments, the direction vector 120 can be associated with camera rays of pixels of an image to be rendered. The camera rays are generated from a perspective of the image. In general, the camera rays can provide information relating to directions of sampling points in the three-dimensional imaging space that correspond to the pixels of the image. In this way, the pixels can be adapted to have intensity and color values of the sampling points. In some embodiments, the direction vector 120 can comprise 27 dimensions or features. When the direction vector 120 is concatenated with the second output vector 116, the resulting vector (i.e., the second input vector 118) has 283 dimensions or features. As shown in
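To make the stage-to-stage data flow concrete, the following is a minimal NumPy sketch of the three pipeline stages as fully connected layers with the interface sizes described above (63, 319 = 63 + 256, and 283 = 27 + 256 dimensions). The ReLU activations, the hidden widths of the final stage, and the randomly initialized placeholder weights are illustrative assumptions; in an actual system the weights would encode a trained neural radiance field.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, out_dim):
    """One fully connected layer with ReLU; the random weights are placeholders
    standing in for trained neural radiance field parameters."""
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
    return np.maximum(x @ w, 0.0)

def stage1(position_63):
    h = position_63
    for _ in range(4):                             # first four neural layers: 63 -> 256
        h = layer(h, 256)
    return h                                       # first positional encoding representation

def stage2(position_63, stage1_out):
    h = np.concatenate([position_63, stage1_out], axis=-1)   # 63 + 256 = 319
    for _ in range(3):                             # next three neural layers: 319 -> 256
        h = layer(h, 256)
    return h                                       # second positional encoding representation

def stage3(direction_27, stage2_out):
    h = np.concatenate([direction_27, stage2_out], axis=-1)  # 27 + 256 = 283
    for _ in range(2):                             # two of the final three neural layers
        h = layer(h, 128)                          # (128 is an assumed hidden width)
    out = layer(h, 4)                              # tenth layer: one intensity + three color values
    return out[..., 0], out[..., 1:]

position = rng.standard_normal(63)                 # encoded coordinates of a sampling point
direction = rng.standard_normal(27)                # encoded direction of the sampling point
intensity, color = stage3(direction, stage2(position, stage1(position)))
print(intensity.shape, color.shape)                # () (3,)
```

Because each stage consumes only the previous stage's output together with a stored encoding, the three stages can operate on different sampling points at the same time, which is what allows them to run as a pipeline.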
Image rendering tasks performed by the first pipeline logic 204, the second pipeline logic 206, and the third pipeline logic 208 have already been discussed with reference to
In some embodiments, the position encoding logic 302 can be configured to transform coordinates and directions of sampling points to high dimensional representations. For example, the position encoding logic 302 can transform the coordinates of the sampling points from dimensions of three to a vector representation comprising 63 dimensions. As another example, the position encoding logic 302 can transform the directions of the sampling points from dimensions of two to a vector representation comprising 27 dimensions. Once transformed, the position encoding logic 302 can store the high dimensional representation of the coordinates and the high dimensional representation of the directions in the position SRAM 310 and the direction SRAM 312, respectively.
In some embodiments, the position SRAM 310 can be configured to temporarily store high dimensional representations of coordinates of sampling points corresponding to pixels of an image to be rendered. The high dimensional representations of the coordinates can be later accessed by the first pipeline logic 304 and the second pipeline logic 306 for further processing. In some embodiments, the direction SRAM 312 can be configured to temporarily store high dimensional representations of directions of sampling points corresponding to pixels of an image to be rendered. The high dimensional representations of the directions can be later accessed by the third pipeline logic 308, along with an output of the second pipeline logic 306, for processing intensity and color values of the sampling points.
In some embodiments, the first pipeline logic 304 can comprise a compute unit 304a communicatively coupled to SRAMs 304b, 304c and an output SRAM 304d. In various embodiments, the compute unit 304a can include at least one multiply-accumulate (MAC) array. The MAC array is a logic that can be configured or programmed to compute a product of two numbers and add the resulting product to an accumulator. In general, the MAC array can operate on integer values or, in some cases, floating-point values. Many variations are possible. In some embodiments, the compute unit 304a can be configured to access the high dimensional representations of the coordinates stored in the position SRAM 310 and perform calculations (i.e., neural calculations) associated with a portion of a machine learning model (i.e., a neural network) encoded by the first pipeline logic 304. As discussed in relation to
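As a point of reference, the multiply-accumulate operation performed by a MAC array can be pictured with a short software sketch. The serial loop below is only a stand-in for what a hardware array performs in parallel, and the 8-bit integer operands and layer-evaluation framing are illustrative assumptions.

```python
import numpy as np

def mac_layer(inputs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Evaluate one fully connected layer as explicit multiply-accumulate steps.
    A hardware MAC array performs many of these product-plus-accumulate
    operations in parallel; this serial loop only illustrates the arithmetic."""
    acc = np.zeros(weights.shape[1], dtype=np.int64)        # one accumulator per output
    for j in range(weights.shape[1]):
        for i in range(inputs.shape[0]):
            acc[j] += int(inputs[i]) * int(weights[i, j])   # multiply, then accumulate
    return acc

# Illustrative 8-bit integer operands; the precision used by an actual compute
# unit (integer or floating point) is not fixed here.
x = np.array([3, -1, 4], dtype=np.int8)
w = np.array([[2, 0], [1, 5], [-3, 2]], dtype=np.int8)
print(mac_layer(x, w))                                      # [-7  3]
```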
In some embodiments, the second pipeline logic 306 can comprise a compute unit 306a communicatively coupled to an input SRAM 306b, SRAMs 306c, 306d, and an output SRAM 306e. Similar to the first pipeline logic 304, in various embodiments, the compute unit 306a can include at least one multiply-accumulate (MAC) array. In some embodiments, the second pipeline logic 306 can be configured to access data stored in the output SRAM 304d of the first pipeline logic 304 and concatenate this data with the high dimensional representations of the coordinates stored in the position SRAM 310 prior to storing the concatenated data in the input SRAM 306b. The compute unit 306a can perform calculations (i.e., neural calculations) associated with a portion of the machine learning model executed by the second pipeline logic 306 based on the concatenated data through clock cycles of the computing core. Similar to the SRAMs 304b, 304c of the first pipeline logic 304, the SRAMs 306c, 306d can be configured in a ping-pong configuration to temporarily store resulting data as the concatenated data is processed through layers of the machine learning model (i.e., neural layers of the neural network) by the second pipeline logic 306. Upon completion, the compute unit 306a can store the resulting data in the output SRAM 306e to be accessed by the third pipeline logic 308.
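The ping-pong arrangement can be pictured with a simple software analogue: two scratch buffers alternate between supplying a layer's input and receiving its output as data advances through successive neural layers. The 256-element buffers, layer widths, and activation in the sketch below are illustrative assumptions and not features of the described hardware.

```python
import numpy as np

def run_layers_ping_pong(x, weight_list):
    """Pass data through successive layers while alternating between two scratch
    buffers, mimicking a ping-pong SRAM pair: one buffer supplies the current
    layer's input while the other receives its output, then the roles swap."""
    buffers = [np.zeros(256), np.zeros(256)]
    buffers[0][: x.shape[0]] = x
    src, dst = 0, 1
    for w in weight_list:
        out = np.maximum(buffers[src][: w.shape[0]] @ w, 0.0)   # compute one layer
        buffers[dst][: out.shape[0]] = out                      # write into the other buffer
        src, dst = dst, src                                     # swap roles for the next layer
    return buffers[src][: weight_list[-1].shape[1]]

rng = np.random.default_rng(0)
weights = [rng.standard_normal((256, 256)) * 0.01 for _ in range(3)]
print(run_layers_ping_pong(rng.standard_normal(256), weights).shape)   # (256,)
```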
In some embodiments, the third pipeline logic 308 can comprise a compute unit 308a communicatively coupled to an input SRAM 308b, a SRAM 308c, and an output SRAM 308d. Similar to the first pipeline logic 304 and the second pipeline logic 306, in various embodiments, the compute unit 308a can include at least one multiply-accumulate (MAC) array. In some embodiments, the third pipeline logic 308 can be configured to access data stored in the output SRAM 306e of the second pipeline logic 306 and concatenate this data with the high dimensional representations of the directions stored in the direction SRAM 312 prior to storing the concatenated data in the input SRAM 308b. The compute unit 308a can perform calculations (i.e., neural calculations) associated with a portion of the machine learning model executed by the third pipeline logic 308 based on the concatenated data through clock cycles of the computing core. The SRAM 308c can be configured to temporarily store resulting data as the concatenated data is processed through layers of the machine learning model (i.e., neural layers of the neural network) by the third pipeline logic 308. Upon completion of data processing, the compute unit 308a can output and store intensity and color values of sampling points corresponding to pixels of an image to be rendered in the output SRAM 308d. The intensity and color values can be later accessed to render the image.
In some embodiments, as shown in
At block 406, the processor 402 can divide an image to be rendered into rows of image portions.
At block 408, the processor 402 can obtain, for each image portion, coordinates and directions of sampling points corresponding to pixels of the image portion.
At block 410, the processor 402 can transform, for each image portion, the coordinates and directions of the sampling points into high dimensional representations.
At block 412, the processor 402 can determine, through a computing core, based on the high dimensional representations, intensity and color values of the pixels.
At block 414, the processor 402 can reconstruct the image based on intensity and color values of pixels of the rows of image portions.
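A compact software sketch of blocks 406 through 414 is given below for a single computing core. The helper callables sample_rays, encode, and computing_core are hypothetical stand-ins for ray generation, the position encoding logic, and a computing core; compositing of multiple sampling points along a ray is omitted, and in a multi-core system each row of image portions could be dispatched to its own core.

```python
import numpy as np

def render_image(height, width, portion_w, sample_rays, encode, computing_core):
    """Sketch of blocks 406-414 for a single computing core. `sample_rays`,
    `encode`, and `computing_core` are hypothetical stand-ins for ray
    generation, the position encoding logic, and one computing core."""
    image = np.zeros((height, width, 3))
    for row in range(height):                                   # block 406: rows of image portions
        for col_start in range(0, width, portion_w):            # one image portion in the row
            for col in range(col_start, min(col_start + portion_w, width)):
                coords, dirs = sample_rays(row, col)            # block 408: sampling point per pixel
                pos_hd, dir_hd = encode(coords), encode(dirs)   # block 410: high dimensional representations
                intensity, color = computing_core(pos_hd, dir_hd)  # block 412
                image[row, col] = intensity * color             # block 414 (compositing of multiple
                                                                # samples along a ray is omitted)
    return image

# Minimal dummy usage with constant-color stand-ins for the real components.
img = render_image(
    height=2, width=4, portion_w=2,
    sample_rays=lambda r, c: (np.zeros(3), np.zeros(3)),
    encode=lambda v: v,                                         # placeholder encoding
    computing_core=lambda p, d: (1.0, np.array([0.5, 0.5, 0.5])),
)
print(img.shape)                                                # (2, 4, 3)
```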
The techniques described herein, for example, are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include circuitry or digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
The computer system 500 also includes a main memory 506, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Such instructions, when stored in storage media accessible to processor 504, render computer system 500 into a special-purpose machine that is customized to perform the operations specified in the instructions.
The computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 502 for storing information and instructions.
The computer system 500 may be coupled via bus 502 to output device(s) 512, such as a cathode ray tube (CRT) or LCD display (or touch screen), for displaying information to a computer user. Input device(s) 514, including alphanumeric and other keys, are coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516. The computer system 500 also includes a communication interface 518 coupled to bus 502.
Unless the context requires otherwise, throughout the present specification and claims, the word “comprise” and variations thereof, such as “comprises” and “comprising,” are to be construed in an open, inclusive sense, that is as “including, but not limited to.” Recitation of numeric ranges of values throughout the specification is intended to serve as a shorthand notation of referring individually to each separate value falling within the range inclusive of the values defining the range, and each separate value is incorporated in the specification as if it were individually recited herein. Additionally, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. The phrases “at least one of,” “at least one selected from the group of,” or “at least one selected from the group consisting of,” and the like are to be interpreted in the disjunctive (e.g., not to be interpreted as at least one of A and at least one of B).
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may be in some instances. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
A component being implemented as another component may be construed as the component being operated in a same or similar manner as the another component, and/or comprising same or similar features, characteristics, and parameters as the another component.
This application is a continuation application of International Application No. PCT/CN2021/136922, filed on Dec. 9, 2021, the entire contents of which are incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2021/136922 | Dec 2021 | WO |
| Child | 18646818 | | US |