This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2023-0036561, filed on Mar. 21, 2023, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
The following description relates to a method and apparatus with super-sampling.
Super-sampling may be a method of generating a high-resolution image with high quality by removing an aliasing effect occurring in a low-resolution image. Super-sampling may be performed in real-time by using deep learning.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In a general aspect, here is provided a processor-implemented method including merging a first super-sampled image frame, having been generated at a first time point, with a second input image frame corresponding to a super-sampling target for a second time point to generate a merged image, and generating a second super-sampled image frame by performing a super-sampling operation at the second time point that includes increasing a bit-precision of a result of an execution, by the processor, of a super-sampling neural network model provided with a decreased bit-precision version of the merged image.
The method may include determining change data corresponding to a change between the second input image frame and a first input image frame corresponding to a super-sampling target at the first time point, and generating the merged image by mixing pixels of the first super-sampled image frame and pixels of the second input image frame based on the determined change data.
The generating of the merged image may include applying a corresponding pixel of the second input image frame to a position satisfying a replacement condition among pixel positions of the merged image and warping and applying a corresponding pixel of the first output image frame to a position violating the replacement condition among the pixel positions of the merged image using the change data.
The generating of the second super-sampled image frame may include generating a first temporary image having a narrower dynamic range than the merged image by performing tone mapping on the merged image and generating a second temporary image having a lower bit-precision than the first temporary image by performing data type conversion on the first temporary image.
The method may include performing buffer layout conversion on the second temporary image to determine network input data, where the network input data has a depth characteristic layout instead of a spatial characteristic layout of the second temporary image.
The merged image may be expressed as a real number data type, and the network input data may be expressed as an integer number data type.
The decreasing of the bit-precision may be performed by a first processor and the increasing of the bit-precision may be performed by a second processor, where the first processor is one of a first plurality of processors and the second processor is one of a different, second plurality of processors.
The method may include storing the network input data in a first memory space of the first processor and duplicating the network input data from the first memory space to a second memory space of the second processor while an operation of the first processor is stopped.
The method may include storing the network output data in the second memory space and duplicating the network output data from the second memory space to the first memory space while the operation of the first processor is stopped, where, in response to the duplication of the network output data being completed, the operation of the first processor is resumed.
The method may include rendering the second input image frame at the second time point, and, according to an asynchronous pipeline method, the first output image frame is displayed at the second time point instead of the first time point at which the first output image frame was generated, and the second output image frame is displayed at a third time point instead of the second time point.
The second output image frame has higher quality than the second input image frame.
In a general aspect, here is provided an electronic device including a first processor configured to merge a first super-sampled image frame, having been generated at a first time point, with a second input image frame corresponding to a super-sampling target for a second time point to generate a merged image and to generate a second super-sampled image frame by performing a super-sampling operation at the second time point that includes increasing a bit-precision of a result of an execution, by a second processor, of a super-sampling neural network model provided with a decreased bit-precision version of the merged image.
To generate the merged image, the first processor may be configured to generate the merged image by mixing pixels of the first output image frame and pixels of the second input image frame based on determined change data and determine the change data corresponding to a change between the second input image frame and a first input image frame corresponding to a super-sampling target at the first time point.
The first processor may be configured to apply a corresponding pixel of the second input image frame to a position satisfying a replacement condition among pixel positions of the merged image and warp and apply a corresponding pixel of the first output image frame to a position violating the replacement condition among the pixel positions of the merged image using the change data.
The first processor may be configured to generate a first temporary image having a narrower dynamic range than the merged image by performing tone mapping on the merged image and generate a second temporary image having lower bit-precision than the first temporary image by performing data type conversion on the first temporary image.
The first processor may be configured to determine network input data based on the decreased bit-precision, store the network input data in a first memory space of the first processor, and duplicate the network input data from the first memory space to a second memory space of the second processor while an operation of the first processor is stopped.
The first processor may be configured to render the second input image frame at the second time point, and, according to an asynchronous pipeline method, the first output image frame is displayed at the second time point instead of the first time point, and the second output image frame is displayed at a third time point instead of the second time point.
In a general aspect, here is provided a processor-implemented method including merging a first super-sampling image result at a first time point with a second image target at a second time point to generate a merged image and generating a super-sampled second output image at the second time point by increasing a bit-precision of a result of a super-sampling neural network provided with a decreased bit-precision image of the merged image.
The merging may include determining change data corresponding to a change between the second image target and a super-sampling target at the first time point and mixing pixels of the first super-sampling image result and pixels of the second image target based on the change data to determine the merged image.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same, or like, drawing reference numerals may be understood to refer to the same, or like, elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences within and/or of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, except for sequences within and/or of operations necessarily occurring in a certain order. As another example, the sequences of and/or within operations may be performed in parallel, except for at least a portion of sequences of and/or within operations necessarily occurring in an order, e.g., a certain order. Also, descriptions of features that are known after an understanding of the disclosure of this application may be omitted for increased clarity and conciseness.
The features described herein may be embodied in different forms, and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application. The use of the term “may” herein with respect to an example or embodiment (e.g., as to what an example or embodiment may include or implement) means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto. The use of the terms “example” or “embodiment” herein have a same meaning (e.g., the phrasing “in one example” has a same meaning as “in one embodiment”, and “one or more examples” has a same meaning as “in one or more embodiments”).
Throughout the specification, when a component or element is described as being “on”, “connected to,” “coupled to,” or “joined to” another component, element, or layer it may be directly (e.g., in contact with the other component, element, or layer) “on”, “connected to,” “coupled to,” or “joined to” the other component, element, or layer or there may reasonably be one or more other components, elements, layers intervening therebetween. When a component, element, or layer is described as being “directly on”, “directly connected to,” “directly coupled to,” or “directly joined” to another component, element, or layer there can be no other components, elements, or layers intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
Although terms such as “first,” “second,” and “third”, or A, B, (a), (b), and the like may be used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Each of these terminologies is not used to define an essence, order, or sequence of corresponding members, components, regions, layers, or sections, for example, but used merely to distinguish the corresponding members, components, regions, layers, or sections from other members, components, regions, layers, or sections. Thus, a first member, component, region, layer, or section referred to in the examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
The terminology used herein is for describing various examples only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As non-limiting examples, terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof, or the alternate presence of alternatives to the stated features, numbers, operations, members, elements, and/or combinations thereof. Additionally, while one embodiment may set forth such terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” to specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, other embodiments may exist where one or more of the stated features, numbers, operations, members, elements, and/or combinations thereof are not present.
Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and specifically in the context of an understanding of the disclosure of the present application. Terms, such as those defined in commonly used dictionaries, are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and specifically in the context of the disclosure of the present application, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.
A machine learning model (e.g., a neural network through deep learning) may be trained to perform an inference according to a trained purpose or task. For example, a neural network may be trained to intuitively map input data and output data onto each other in a nonlinear relationship. As a non-limiting example, a neural network may be trained to infer a super-sampling result of an input image.
In a non-limiting example, the image quality of image frames may be enhanced through a super-sampling operation 120. Through the super-sampling operation 120, the input image frame 101 with low resolution may be converted into an output image frame 102 with high resolution. The input image frame 101 may be a target of the super-sampling operation 120 and the output image frame 102 may be a result of the super-sampling operation 120. Depending on the type of the application engine 110, an image frame may be referred to as a texture.
The input image frame 101 may be a rendering result of the application engine 110, and an output video may be determined through the output image frame 102. The application engine 110 may render a plurality of input image frames, and a plurality of output image frames may be generated based on the super-sampling operation 120 on the plurality of input image frames. The application engine 110 may perform post-processing on the plurality of output image frames and may display the plurality of output image frames as a video output.
The super-sampling operation 120 may be performed using a machine learning model (e.g., a neural network) trained to perform super-sampling. The machine learning model used for the super-sampling operation 120 may be a neural reconstruction model. The machine learning model may perform reconstruction, such as a denoising operation, on a network input image and may remove artifacts from the network input image. The machine learning model may be made up of several machine learning models.
A machine learning model may include a deep neural network (DNN) including a plurality of layers. In a non-limiting example, the DNN may include at least one of a fully connected network (FCN), a convolutional neural network (CNN), or a recurrent neural network (RNN). In an example, at least a portion of the layers included in the neural network may correspond to a CNN and the other portion of the layers may correspond to an FCN. The CNN may be referred to as convolutional layers and the FCN may be referred to as fully connected layers. According to one embodiment, a machine learning model including a neural encoder and a neural decoder may be a neural auto-encoder.
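As a purely illustrative, non-limiting sketch (the framework choice of tf.keras, the layer counts, channel widths, and the 12-channel space-to-depth input are assumptions and not the claimed model), a small convolutional encoder-decoder of the kind described above may look as follows:

```python
# Hypothetical sketch of a small convolutional neural auto-encoder for
# super-sampling; the framework (tf.keras), layer sizes, and channel counts
# are illustrative assumptions only.
import tensorflow as tf

def build_supersampling_autoencoder(in_channels=12, out_channels=3):
    # in_channels=12 assumes a 2x2 space-to-depth packing of an RGB input.
    inp = tf.keras.Input(shape=(None, None, in_channels))
    # Neural encoder: convolutional layers that reduce spatial resolution.
    x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    # Neural decoder: transposed convolutions that restore spatial resolution.
    x = tf.keras.layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    out = tf.keras.layers.Conv2DTranspose(out_channels, 3, strides=2, padding="same")(x)
    return tf.keras.Model(inp, out)
```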
In an example, the neural network may be trained based on deep learning and then, after training, may perform an inference or inferences according to the trained purpose by intuitively mapping input and output to each other through a nonlinear relationship. Deep learning is a machine learning technique for solving a problem, such as image or speech recognition, from a big data set, as a non-limiting example. In an example, deep learning may be construed as an optimization problem-solving process of finding a point at which energy, loss, or cost is minimized while training a neural network using prepared training data. Through supervised learning, unsupervised learning, or reinforcement learning of deep learning, a structure of the neural network or parameters and hyper-parameters corresponding to a model may be derived. The parameters may include connection weightings (weights) between respective layers of the plurality of layers of the neural network. If the width (e.g., the number of nodes in a layer) and the depth (e.g., the number of layers) of the neural network are sufficiently large, the neural network may have a capacity sufficient to implement a predetermined function. The neural network may achieve an optimized performance (e.g., meeting an accuracy threshold) when learning from a sufficiently large amount of training data through an appropriate training process.
In one or more examples, the neural network may have been previously trained for the particular task (e.g., super-sampling), or such training may be initially performed before the neural network is used for inference operations. That the neural network has been trained means that the neural network is ready for inference, i.e., inferences are operations of trained models. In an example, a “start”, implementation, or execution of the neural network may include loading the neural network into a memory, or may include that input data for inference is input or provided to the neural network after the neural network is loaded into the memory.
In a non-limiting example, a super-sampling operation 202 may include an input processing 214 operation, a network execution 215 operation, and an output processing 216 operation. The network execution 215 includes the execution of the example trained super-sampling neural network. Through the input processing 214, the network input data used for the network execution 215 may be determined. A merged image may be generated based on the input image frame 211, the change data 212, and an output image frame 213, and the network input data may be determined by adjusting bit-precision of the merged image. Detailed operations related to input processing 214 are described in greater detail below.
Network output data may be generated through the network execution 215. In an example, the super-sampling neural network model may generate the network output data based on the network input data. Image quality may be enhanced through network execution 215. Detailed operations related to network execution 215 are described in greater detail below. The bit-precision of the network output data may be adjusted through output processing 216. An output image frame 217 corresponding to a result of super-sampling the input image frame 211 at the t-th time point may be generated. Detailed operations related to output processing 216 are described in greater detail below.
In an example, the network execution 215 may be performed by a processor (e.g., a neural processing unit (NPU) or other processor) processing operations of the super-sampling neural network model. In an example, the processor may perform multiply-accumulate (MAC) operations that may make up a large percentage of the operations of the super-sampling neural network model. In an example, a data format specialized in the application engine 201 may be converted into a data format specialized in the network execution 215 through the input processing 214. A data format specialized in the network execution 215 may be converted into a data format specialized in the application engine 201 through the output processing 216.
In an example, data that is used for the network execution 215 may have a bit-precision that is relatively low compared to the precision of the data used for the application engine 201. In an example, a real data type may be used for the application engine 201 and an integer data type may be used for the network execution 215. The real data type may be a float data type. Based on the relatively low bit-precision of this data, the super-sampling operation 202 may be utilized in environments where limited resources are available, such as in a mobile environment.
The application engine 201 may perform post-processing on the output image frame 217 and may display the output image frame 217. The output image frame 217 may be a result of super-sampling at the t-th time point and may be used for the super-sampling operation 202 at the t+1-th time point.
In an example, the application engine 201 may generate an input image frame 221 as a result of rendering at the t+1-th time point. The input image frame 211 at the t-th time point may be an N-th frame and the input image frame 221 at the t+1-th time point may be an N+1-th frame. The application engine 201 may generate change data 222 representing the difference between a rendering result at the t-th time point and a rendering result at the t+1-th time point. In an example, the change data 222 may include motion vectors of corresponding pixels of the rendering result at the t-th time point and the rendering result at the t+1-th time point.
Input processing 223 may be performed based on the input image frame 221, the change data 222, and the output image frame 226. Network execution 224 may be performed based on the network input data upon input processing 223, and output processing 225 may be performed based on the network output data upon network execution 224. An output image frame 226 may be generated based on the output processing 225.
The application engine 201 may perform post-processing on the output image frame 226 and may display the output image frame 226. The output image frame 226 may be a result of super-sampling at the t+1-th time point and may be used for the super-sampling operation 202 at a subsequent time point. In the output video displayed by the application engine 201, the output image frame 217 at the t-th time point may be the N-th frame and the output image frame 226 at the t+1-th time point may be the N+1-th frame.
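The per-time-point flow described above may be summarized by the following hypothetical sketch; the function and method names (render, motion_vectors, input_processing, network_execution, output_processing, display) are placeholders standing in for the operations 214/223, 215/224, and 216/225, not an actual API.

```python
# Hypothetical per-frame super-sampling loop; all names are placeholders.
def supersample_stream(application_engine, model, num_frames):
    prev_output = None  # output image frame of the previous time point (e.g., 217)
    for t in range(num_frames):
        input_frame = application_engine.render(t)           # low-resolution rendering result
        change_data = application_engine.motion_vectors(t)   # change between time points t-1 and t
        network_input = input_processing(input_frame, change_data, prev_output)  # merge + lower bit-precision
        network_output = network_execution(model, network_input)                 # quantized inference
        output_frame = output_processing(network_output, input_frame)            # raise bit-precision back
        application_engine.display(output_frame)             # post-processing and display
        prev_output = output_frame                            # reused at the next time point
    return prev_output
```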
In an example, training of the super-sampling neural network may use training data including input images, and, in an example, the training may also include performing the input and output processings 214 and 220.
The first processor 301 may perform input processing 314 based on an input image frame 311, change data 312, and an output image frame 313. The output image frame 313 may be loaded from a history buffer 303. The history buffer 303 may store the output image frames 313 and 321 of each time point.
Based on the input processing 314, the network input data may be generated. In an example, the network input data may be stored in an input buffer 315. The first processor 301 and the second processor 302 may each have an individual memory for operations. The individual memory of the first processor 301 may be referred to as a first memory and the individual memory of the second processor 302 may be referred to as a second memory. The history buffer 303 may be a memory space of the first memory.
The input buffer 315 may be a memory space of the first memory. The network input data of the input buffer 315 may be transmitted to an input buffer 316 for the network execution 317. The input buffer 316 may be a memory space of the second memory. After the network input data is stored in the input buffer 315 of the first memory of the first processor 301, the network input data may be duplicated from the input buffer 315 of the first memory to the input buffer 316 of the second memory while operations of the first processor 301 are stopped. In an example, the operations of the first processor 301 may be stopped by locking the first processor 301.
The second processor 302 may execute the super-sampling neural network model based on the network input data stored in the input buffer 316 of the second memory and may generate network output data. After the network output data is stored in an output buffer 318 of the second memory, the network output data may be duplicated from the output buffer 318 of the second memory to an output buffer 319 of the first memory while operations of the first processor 301 are stopped. After the duplication of network output data is completed, the operations of the first processor 301 may resume.
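The buffer hand-off described above may be sketched as follows. The threading.Lock object and in-place copies are hypothetical stand-ins for whatever platform-specific synchronization and memory-copy primitives are actually used; the sketch only illustrates the ordering of the lock, copy, execute, and copy-back steps.

```python
# Hypothetical sketch of the first-memory / second-memory hand-off; the lock
# and copy operations stand in for platform-specific primitives.
import threading

first_processor_lock = threading.Lock()

def run_network_step(input_buffer_315, input_buffer_316,
                     output_buffer_318, output_buffer_319, second_processor):
    # Stop (lock) the first processor while duplicating the network input data
    # from the first memory to the second memory.
    with first_processor_lock:
        input_buffer_316[:] = input_buffer_315   # duplicate network input data

    # The second processor executes the super-sampling neural network model.
    output_buffer_318[:] = second_processor.execute(input_buffer_316)

    # Stop the first processor again while duplicating the network output data
    # back from the second memory to the first memory.
    with first_processor_lock:
        output_buffer_319[:] = output_buffer_318  # duplicate network output data
    # After the duplication is completed, operations of the first processor resume.
```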
The first processor 301 may perform output processing 320 based on the network output data. The output image frame 321 may be generated based on the output processing 320. The output image frame 321 may be stored in the history buffer 303 and may be used for input processing at a subsequent time point.
In an example, the super-sampling may be implemented through various system architectures. Based on the above description, the first processor 301 and the second processor 302 may use individual memories respectively referred to as the first memory and the second memory. In an example, the first memory and the second memory may be physically separate memories. The input buffer 315, the output buffer 319, and the history buffer 303 may be provided in the first memory of the first processor 301 and the input buffer 316 and the output buffer 318 may be provided in the second memory.
In an example, the first processor 301 and the second processor 302 may use a common memory. The common memory may be physically one memory (i.e., the same memory unit). In this case, the input buffer 315, the output buffer 319, and the history buffer 303 may be provided in a first partial space for the first processor 301 in the common memory and the input buffer 316 and the output buffer 318 may be provided in a second partial space for the second processor 302 in the common memory. The first partial space and the second partial space may be internally considered as separate memories in the system.
In an example, the second processor 302 may directly access the first memory of the first processor 301. In this case, the input buffer 315, the output buffer 319, and the history buffer 303 may be provided in the first memory of the first processor 301 and the second processor 302 may perform network execution 317 by directly accessing the input buffer 315 and may directly store the network output data upon network execution 317 in the output buffer 319. In this process, the input buffer 316 and the output buffer 318 may not be used.
An example in which individual memories referred to as the first memory and the second memory are used may be representatively described below. However, this description does not limit the application of other embodiments.
In an example, the warping 410 and merging 420 may be performed separately. The warping 410 of a previous output image frame 401 may be performed based on change data 402 corresponding to a change between a previous input image frame and a current input image frame 403. Based on the warping 410, a warped image may be generated. A previous time point may be the t−1-th time point and a current time point may be the t-th time point. The merging 420 of the warped image and the current input image frame 403 may be performed by mixing pixels of the warped image and pixels of the current input image frame 403. Based on the merging 420, a merged image may be generated.
In an example, the warping 410 and merging 420 may be combined. The change data 402 corresponding to a change between the previous input image frame and the current input image frame 403 may be determined and the merged image may be determined by mixing pixels of the previous output image frame 401 and pixels of the current input image frame 403 based on the change data 402. In an example, the merged image may be determined by applying a corresponding pixel of the current input image frame 403 to a position satisfying a replacement condition among pixel positions of the merged image and then warping and applying a corresponding pixel of the previous output image frame 401 to a position violating the replacement condition using the change data 402. When the warping 410 and merging 420 are combined and performed, a number of operations that may be required to determine a merged image may be minimized.
In an example, a first temporary image having a narrower dynamic range than the merged image may be generated through tone mapping 430 on the merged image. In an example, an image (e.g., the merged image) before tone mapping 430 may use a 16-bit or a 32-bit high dynamic range (HDR) and an image (e.g., the first temporary image) after the tone mapping 430 may use an 8-bit standard dynamic range (SDR). In an example, as described in Equation 1 shown below, instead of defining a single tone mapping function for the entire range, a tone mapping function may be defined for a partial range.
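Equation 1 itself is not reproduced in the text above; the following is a plausible reconstruction based solely on the symbol descriptions that follow, and is not asserted to be the exact claimed formulation:

PartialTonemap(I) = Tonemap(I), if I is within the custom range (e.g., [0, 1])
PartialTonemap(I) = const, otherwise        (Equation 1, reconstructed)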
In Equation 1, the PartialTonemap(I) may denote a function performing partial tone mapping on I, and I may denote an input image. In addition, Tonemap(I) may denote a function performing tone mapping on I, and const may denote a constant. In an example, when a custom range is set to [0, 1], Tonemap(I) may be applied to only pixel values in the custom range and pixel values outside the custom range may be converted into constants. In an example, a pixel value exceeding 1 may be fixed to 1.
In an example, pixel values of an image before tone mapping 430 may mostly be distributed in [0, 1], but pixel values for very bright areas may be greater than 1. However, because the effect of super-sampling on such very bright areas is not significant, the information loss upon an application of partial tone mapping may be very small. On the other hand, a limited bit-width may be used efficiently through partial tone mapping. In addition, because quantization to decreased bit-precision is applied to only a partial range through the partial tone mapping during the data type conversion 440, information loss due to quantization may be minimized. In addition, when a residual image, which is described in greater detail below, is used, super-sampling may be effectively applied to values in [0, 1] while values greater than or equal to 1 may maintain their original values.
In an example, a second temporary image having a bit-precision that is lower than that of the first temporary image may be generated through data type conversion 440 on the first temporary image. The data type conversion 440 may be quantization. In an example, a pixel value of an image (e.g., the first temporary image) before the data type conversion 440 may be expressed as a real data type and an image (e.g., the second temporary image) after the data type conversion 440 may be expressed as an integer data type. In an example, based on the tone mapping 430 and the data type conversion 440, a 16-bit or 32-bit real number representation may be converted into an 8-bit integer representation. In an example, based on the data type conversion 440, a quantized data type may include uint8, int8, a custom quantized format, and the like. However, the quantized data type is not limited thereto, and various bit-widths of less than 8 bits, such as 6 bits or 4 bits, may also be used for the quantized data type.
Network input data may be determined based on buffer layout conversion 450 on the second temporary image. The network input data may have a depth characteristic layout of the second temporary image instead of a spatial characteristic layout through buffer layout conversion 450 on the second temporary image. The buffer layout conversion 450 may correspond to space-to-depth conversion. However, the type of the buffer layout conversion 450 is not limited thereto and according to one embodiment, the buffer layout conversion 450 may be omitted or a different type of conversion from space-to-depth conversion may be performed.
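A hedged Python/NumPy sketch of this precision-lowering chain on the merged image is given below; the gamma tone-mapping curve, the [0, 1] custom range, the uint8 target type, and the 2x2 space-to-depth block are illustrative assumptions, and an actual implementation may instead run as a GPU shader.

```python
# Hypothetical input-processing sketch: partial tone mapping, quantization to
# uint8, and space-to-depth buffer layout conversion (all choices illustrative).
import numpy as np

def partial_tonemap(image, const=1.0, gamma=2.2):
    # Tone mapping is applied only inside the custom range [0, 1]; values
    # outside the range are fixed to a constant (here, 1). The gamma curve is
    # an example choice, not the claimed tone-mapping function.
    tonemapped = np.clip(image, 0.0, 1.0) ** (1.0 / gamma)
    return np.where(image <= 1.0, tonemapped, const)

def quantize_uint8(image):
    # Data type conversion: 16/32-bit real values in [0, 1] -> 8-bit integers.
    return np.clip(np.rint(image * 255.0), 0, 255).astype(np.uint8)

def space_to_depth(image, block=2):
    # Buffer layout conversion: move each block x block spatial neighborhood
    # into the channel (depth) dimension.
    h, w, c = image.shape
    image = image.reshape(h // block, block, w // block, block, c)
    return image.transpose(0, 2, 1, 3, 4).reshape(h // block, w // block, block * block * c)

def lower_bit_precision(merged_image):
    sdr = partial_tonemap(merged_image)   # first temporary image (narrower dynamic range)
    q = quantize_uint8(sdr)               # second temporary image (lower bit-precision)
    return space_to_depth(q)              # network input data (depth-characteristic layout)
```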
In a non-limiting example, a replacement condition 510 based on a pixel position of the merged image 503 may be defined, and pixels of the merged image 503 may be determined by mixing pixels of the input image frame 501 with pixels of the output image frame 502 based on whether the replacement condition 510 is satisfied.
In an example, among the pixel positions of the merged image 503, a position satisfying the replacement condition 510 may be classified as a replacement position and a position violating the replacement condition may be classified as a maintenance position. A corresponding pixel of the input image frame 501 may be applied to the replacement position. A corresponding pixel of the output image frame 502, warped using the change data, may be applied to the maintenance position.
In a non-limiting example, the replacement condition 510 may be based on a grid that is defined based on a size difference between the input image frame 501 and the output image frame 502 and an arbitrary cell position in the grid. When a width of the output image frame 502 is a times the width of the input image frame 501 and a height of the output image frame 502 is b times the height of the input image frame 501, an a*b grid may be defined. An arbitrary cell position in the a*b grid may be selected. A corresponding cell position in the merged image 503 may satisfy the replacement condition 510. For example, when the width of the output image frame 502 is twice the width of the input image frame 501 and the height of the output image frame 502 is twice the height of the input image frame 501, a 2*2 grid may be defined. Among cells of (1, 1), (1, 2), (2, 1), and (2, 2) of the grid, the cell position (1, 1) may be selected. A fixed cell position may be applied to all pixel positions satisfying the replacement condition or a different cell position may be applied to each pixel position. A pixel position corresponding to the cell position in the merged image 503 may satisfy the replacement condition 510.
In a non-limiting example, the image merging operation may be performed based on pseudocode of Table 1.
In Table 1, OutputPixelPos may be a position (e.g., an x coordinate and a y coordinate) of a pixel whose value is to be obtained in the merged image 503, UpscaleFactor may be the size (or resolution) difference (or ratio) between the input image frame 501 and the output image frame 502, sampleInputPixelPos( ) may be a function that sets a position of the input image frame 501 to be sampled with respect to OutputPixelPos, getMotion( ) may be a function that retrieves a value of the change data (e.g., a motion vector) at OutputPixelPos, and isInputTextureLocation( ) may be a function that determines whether coordinates of the merged image 503 satisfy the replacement condition 510. In association with getMotion( ), additional processing, such as bilinear sampling, may be performed.
According to the code of Table 1, when OutputPixelPos satisfies the condition of isInputTextureLocation( ), a pixel value of the input image frame 501 may be substituted into the position, and when OutputPixelPos does not satisfy the condition, a pixel value of the output image frame 502 may be warped and then substituted. The isInputTextureLocation( ) function may determine a position of the merged image 503 to be substituted with information of the input image frame 501.
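Table 1 itself is not reproduced here. The following Python-style sketch is a hypothetical reconstruction of the logic described above, reusing the named helper functions as placeholders; the motion-vector convention, the nearest-sample lookup, and the fixed (0, 0) grid cell are assumptions.

```python
# Hypothetical reconstruction of the Table 1 image-merging logic; array inputs
# are assumed to be NumPy-style arrays, and all conventions are illustrative.
def sample_input_pixel_pos(output_pixel_pos, upscale_factor):
    # Position of the input image frame to be sampled for this output position.
    return (output_pixel_pos[0] // upscale_factor, output_pixel_pos[1] // upscale_factor)

def get_motion(change_data, output_pixel_pos):
    # Motion vector at this output position (nearest sample; bilinear sampling
    # could be used instead, as noted above).
    return change_data[output_pixel_pos]

def is_input_texture_location(output_pixel_pos, upscale_factor, cell=(0, 0)):
    # Example replacement condition: an a*b grid with one fixed selected cell.
    return (output_pixel_pos[0] % upscale_factor,
            output_pixel_pos[1] % upscale_factor) == cell

def merge_pixel(output_pixel_pos, input_frame, prev_output_frame,
                change_data, upscale_factor):
    if is_input_texture_location(output_pixel_pos, upscale_factor):
        # Replacement position: substitute the corresponding input-frame pixel.
        return input_frame[sample_input_pixel_pos(output_pixel_pos, upscale_factor)]
    # Maintenance position: warp the previous output-frame pixel using the
    # motion vector from the change data (sign convention is an assumption).
    dy, dx = get_motion(change_data, output_pixel_pos)
    y = int(round(output_pixel_pos[0] - dy))
    x = int(round(output_pixel_pos[1] - dx))
    return prev_output_frame[y, x]
```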
The isInputTextureLocation( ) function may vary depending on the algorithm, and in this process, camera jitter may also be considered. In an example, for the input image frame 501 and the output image frame 502, a merging method such as concatenation in a channel dimension direction may be used, as well as a method of merging into a high-resolution image such as the merged image 503.
A neural network model may be executed for the network output data generation 640. Execution of the neural network model may include an inference operation of the neural network model. In an example, the inference operation may be performed through various available execution libraries, such as tflite, onnx runtime, and the Snapdragon neural processing engine (SNPE).
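As one hedged illustration of such an execution library, the standard TensorFlow Lite interpreter API may be used roughly as follows; the model file name, the uint8 input assumption, and the batch-dimension handling are assumptions, not part of the described method.

```python
# Illustrative network execution with the TensorFlow Lite interpreter; the
# model file name and tensor layout are assumptions.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="supersampling_int8.tflite")  # hypothetical file
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def execute_supersampling_network(network_input_data: np.ndarray) -> np.ndarray:
    # network_input_data: quantized (e.g., uint8) tensor shaped as the model expects.
    interpreter.set_tensor(input_details[0]["index"],
                           network_input_data[np.newaxis, ...])  # add a batch dimension
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]["index"])[0]  # network output data
```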
The network output data may be an image with enhanced image quality or a residual image to be added to an input image frame to obtain an image with enhanced image quality. The neural network model may be trained to output an image with enhanced image quality or a residual image. When the residual image is used, the neural network model may omit an upsampling or addition operation, and thus, this may be advantageous for lightweighting the model.
In an example, the inverse tone mapping 710 may correspond to an inverse operation of tone mapping of input processing. A third temporary image having a wider dynamic range than network output data may be generated through the inverse tone mapping 710 on the network output data. In an example, an image (e.g., the network output data) before the inverse tone mapping 710 may use 8-bit SDR and an image (e.g., the third temporary image) after the inverse tone mapping 710 may use 16-bit or 32-bit HDR.
In an example, the data type conversion 720 may be an inverse operation of the data type conversion of the input processing. A fourth temporary image having a bit-precision that is higher than the third temporary image may be generated through data type conversion 720 on the third temporary image. In an example, a pixel value of an image (e.g., the third temporary image) before data type conversion 720 may be expressed as an integer data type and an image (e.g., the fourth temporary image) after the data type conversion 720 may be expressed as a real data type. In an example, based on inverse tone mapping 710 and data type conversion 720, 8-bit integer representation may be converted into 16-bit or 32-bit real number representation.
In an example, the buffer layout conversion 730 may be an inverse operation of buffer layout conversion of input processing. An output image frame may be determined based on the buffer layout conversion 730. The output image frame may have a spatial characteristic layout of the fourth temporary image instead of a depth characteristic layout through the buffer layout conversion 730 on the fourth temporary image. In an example, the layout conversion 730 may correspond to depth-to-space conversion.
In a non-limiting example, Table 2 shown below may represent an example of pseudocode used to determine an output image frame when the network output data 701 is an image with enhanced image quality.
In a non-limiting example, Table 3 shown below may represent an example of pseudocode used to determine an output image frame when the network output data 701 is a residual image to be added to an input image frame to obtain an image with enhanced image quality.
In an example, OutputPixelPos may be a position (e.g., an x coordinate and a y coordinate) of a pixel that is desired to obtain a value in an output image frame, UpscaleFactor may be the size (or resolution) difference (or a ratio) between an input image frame and an output image frame, sampleInputPixelPos( ) may be a function for setting a position of an input image frame to be sampled with respect to OutputPixelPos, and convertToFloat( ) may be a function for converting quantized data into real data.
In an example, Tonemap( ) and InverseTonemap( ) may mutually be an inverse conversion relationship. Buffer layout conversion, such as space-to-depth conversion or depth-to-space conversion, may be performed together by appropriately adjusting BufferIndex.
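Tables 2 and 3 themselves are not reproduced here. The sketch below is a hypothetical Python rendering of the described output processing, covering both the enhanced-image case (Table 2) and the residual-image case (Table 3); the dequantize-then-inverse-tone-map ordering, the inverse gamma curve, and the pre-upsampled residual base are implementation assumptions.

```python
# Hypothetical output-processing sketch: dequantization, inverse tone mapping,
# depth-to-space conversion, and optional residual addition (all illustrative).
import numpy as np

def convert_to_float(network_output):
    # Data type conversion: 8-bit integers -> real values in [0, 1].
    return network_output.astype(np.float32) / 255.0

def inverse_tonemap(image, gamma=2.2):
    # Inverse of the assumed gamma tone-mapping curve used during input processing.
    return image ** gamma

def depth_to_space(image, block=2):
    # Buffer layout conversion back to a spatial-characteristic layout.
    h, w, c = image.shape
    out_c = c // (block * block)
    image = image.reshape(h, w, block, block, out_c)
    return image.transpose(0, 2, 1, 3, 4).reshape(h * block, w * block, out_c)

def output_processing(network_output, upsampled_input=None):
    img = inverse_tonemap(convert_to_float(network_output))  # wider range, higher bit-precision
    img = depth_to_space(img)                                 # spatial-characteristic layout
    if upsampled_input is not None:
        # Residual case (Table 3): add the residual to an upsampled input frame
        # (the upsampling step here is assumed to happen elsewhere).
        img = img + upsampled_input
    return img
```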
In an example, a pipeline having latency between a rendered frame (e.g., the N-th frame) and a display frame (e.g., the N−1-th frame) may be referred to as an asynchronous pipeline. According to a sequential pipeline method that displays the output image frame 846 of the N-th frame at the t-th time point and displays an output image frame (not shown) of the N+1-th frame at the t+1-th time point, a latency may occur in response to suspension or resumption of a processor (e.g., the first processor). According to the asynchronous pipeline method, the latency that may occur in response to suspension or resumption of the processor (e.g., the first processor) may be removed.
Although a difference of one frame between a rendered frame and a displayed frame occurs according to the asynchronous pipeline method, as higher frames-per-second (FPS) rates are used, the resulting difference may occur in a short enough period of time that a user may not substantially be able to sense it.
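For instance, assuming an illustrative output rate of 60 FPS (a figure chosen here for illustration and not taken from the description above), the one-frame offset corresponds to roughly 1/60 s ≈ 16.7 ms; at 120 FPS it would correspond to roughly 8.3 ms.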
More specifically, an application engine 801 may render the N-th frame at the t-th time point. The input image frame 811 may correspond to the N-th frame. At the t-th time point, the input processing 814 may be performed based on the input image frame 811, change data 812, and an output image frame 813. Network input data based on input processing 814 may be stored in the input buffer 815. The input buffer 815 may correspond to a first memory space (e.g., the first memory or the first partial space) of the first processor. In an example, to execute a neural network model based on the network input data of the input buffer 815, the first processor may be locked for buffer transmission. This locking and/or unlocking may cause an increased amount of latency. Accordingly, the neural network model may be executed based on the network input data based on the N−1-th frame instead of the N-th frame at the t-th time point.
In an example, network execution 822 may be performed based on the network input data of the previous input buffer 821 at the t-th time point. The network input data of the previous input buffer 821 may be based on the N−1-th frame rendered at the t−1-th time point. The previous input buffer 821 may correspond to a second memory space (e.g., the second memory or the second partial space) of the second processor. Network output data based on the network execution 822 may be stored in an output buffer 823 of the second memory space. The network output data of the output buffer 823 may be transmitted to the first memory space and output processing 825 may be performed based on the network output data of the first memory space. When the network output data corresponds to a residual image, an input image frame 824 may be used for output processing 825. The output image frame 826 may be generated based on the output processing 825.
In an example, the application engine 801 may perform post-processing and a display operation based on the output image frame 826 at the t-th time point. The output image frame 826 may correspond to a high-resolution version of the N−1-th frame. Network input data of the N−1-th frame between the t−1-th time point and the t-th time point may be transmitted to the previous input buffer 821 of the second memory space from an input buffer (not shown) of the first memory space, a network execution 822 may be performed, the network output data may be transmitted to an output buffer (not shown) of the first memory space from the output buffer 823 of the second memory space, and the output image frame 826 may be generated. Accordingly, the output image frame 826 of the N−1-th frame rendered at the t−1-th time point may be displayed instead of the output image frame 846 of the N-th frame rendered at the t-th time point. The output image frame 846 may be displayed at the t+1-th time point.
More specifically, the application engine 801 may render the N+1-th frame at the t+1-th time point. The input image frame 831 may correspond to the N+1-th frame. Input processing 834 may be performed based on the input image frame 831, change data 832, and an output image frame 833 at the t+1-th time point. Network input data based on the input processing 834 may be stored in an input buffer 835. The input buffer 835 may correspond to the first memory space of the first processor.
In an example, the network execution 842 may be performed based on network input data of a previous input buffer 841 at the t+1-th time point. The network input data of the previous input buffer 841 may be based on the N-th frame rendered at the t-th time point. The previous input buffer 841 may correspond to the second memory space of the second processor. Network output data based on the network execution 842 may be stored in an output buffer 843. The network output data of the output buffer 843 may be transmitted to the first memory space, and output processing 845 may be performed based on the network output data of the first memory space. When the network output data corresponds to a residual image, an input image frame 844 may be used for output processing 845. The output image frame 846 may be generated based on output processing 845.
In a non-limiting example, the application engine 801 may perform post-processing and a display operation based on the output image frame 846 at the t+1-th time point. The output image frame 846 may correspond to a high-resolution version of the N-th frame. The output image frame 846 of the N-th frame rendered at the t-th time point may be displayed instead of the output image frame (not shown) of the N+1-th frame rendered at the t+1-th time point. The output image frame (not shown) of the N+1-th frame may be displayed at the t+2-th time point.
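A highly simplified sketch of this asynchronous timing is given below; the single-threaded loop, the placeholder function names, and the omission of the residual-image and buffer-locking details are simplifying assumptions, since an actual implementation would overlap rendering on the first processor with network execution on the second processor.

```python
# Hypothetical sketch of the asynchronous pipeline: at each time point the
# neural network runs on the frame rendered one time point earlier, so the
# displayed frame lags the rendered frame by one frame.
def asynchronous_pipeline(engine, model, num_frames):
    previous_network_input = None   # network input data prepared for frame N-1
    previous_output_frame = None    # most recent super-sampled frame
    for t in range(num_frames):
        input_frame = engine.render(t)                       # frame N rendered at time point t
        change_data = engine.motion_vectors(t)
        # Input processing for frame N; its network execution happens at t+1.
        network_input = input_processing(input_frame, change_data, previous_output_frame)
        if previous_network_input is not None:
            # Network execution on frame N-1's data at time point t.
            network_output = network_execution(model, previous_network_input)
            previous_output_frame = output_processing(network_output)
            engine.display(previous_output_frame)            # frame N-1 is displayed at time point t
        previous_network_input = network_input
```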
In an example, the electronic device 900a may include the first processor 910 configured to generate a merged image by merging a first output image frame corresponding to a super-sampling result at a first time point with a second input image frame corresponding to a super-sampling target at a second time point and determine network input data by decreasing bit-precision of the merged image and the second processor 920 configured to generate network output data by executing a neural network model based on the network input data. The first processor 910 may generate a second output image frame corresponding to a super-sampling result at the second time point by increasing bit-precision of the network output data.
In an example, to generate the merged image, the first processor 910 may determine change data corresponding to a change between the second input image frame and the first input image frame corresponding to a super-sampling target at the first time point and may generate the merged image by mixing pixels of the first output image frame and pixels of the second input image frame based on the change data.
In an example, to determine the merged image, the first processor 910 may apply a corresponding pixel of the second input image frame to a position satisfying a replacement condition among pixel positions of the merged image and may warp a corresponding pixel of the first output image frame using the change data and apply the warped corresponding pixel to a position violating the replacement condition among the pixel positions of the merged image.
In an example, to determine the network input data, the first processor 910 may generate a first temporary image having a narrower dynamic range than the merged image by performing tone mapping on the merged image and may generate a second temporary image having lower bit-precision than the first temporary image by performing data type conversion on the first temporary image.
In an example, after the network input data is stored in the first memory 911 of the first processor 910, while operations of the first processor 910 are stopped, the network input data may be duplicated from the first memory 911 to the second memory 921 of the second processor 920. After the network output data is stored in the second memory 921, while operations of the first processor 910 are stopped, the network output data may be duplicated from the second memory 921 to the first memory 911 and after duplication of the network output data is completed, operations of the first processor 910 may resume.
In an example, the second input image frame may correspond to a result of rendering at a second time point and based on the asynchronous pipeline method, the first output image frame may be displayed at the second time point instead of the second output image frame and the second output image frame may be displayed at the third time point.
The first processor 910 and the second processor 920 may be configured to execute computer-readable instructions that configure the first processor 910 and/or the second processor 920 to control the electronic apparatus 100 to perform one or more or all operations and/or methods involving the super-sampling of images, and may include any one or a combination of two or more of, for example, a central processing unit (CPU), a graphic processing unit (GPU), a neural processing unit (NPU), and/or tensor processing units (TPUs), but is not limited to the above-described examples.
In addition, the descriptions provided above with reference to the preceding figures may generally apply here.
Referring to the corresponding figure, an electronic device 1000 may include a processor 1010, a memory 1020, a camera 1030, a storage device 1040, an input device 1050, an output device 1060, and a network interface 1070.
The processor 1010 may be configured to execute computer-readable instructions that configure the processor 1010 to control the electronic device 1000 to perform one or more or all operations and/or methods involving the super-sampling of images. For example, the processor 1010 may process instructions stored in the memory 1020 or the storage device 1040, and may perform the operations described above with reference to the preceding figures.
The camera 1030 may capture a photo and/or record a video. The storage device 1040 may include a computer-readable storage medium or computer-readable storage device. The storage device 1040 may store more information than the memory 1020 for a long time. For example, the storage device 1040 may include a magnetic hard disk, an optical disc, a flash memory, a floppy disk, or other non-volatile memories known in the art.
The input device 1050 may receive an input from the user in traditional input manners through a keyboard and a mouse and in new input manners such as a touch input, a voice input, and an image input. For example, the input device 1050 may include a keyboard, a mouse, a touch screen, a microphone, or any other device that detects the input from the user and transmits the detected input to the electronic device 1000. The output device 1060 may provide an output of the electronic device 1000 to the user through a visual, auditory, or haptic channel. The output device 1060 may include, for example, a display, a touch screen, a speaker, a vibration generator, or any other device that provides the output to the user. The network interface 1070 may communicate with an external device through a wired or wireless network.
In addition, the descriptions provided with reference to the preceding figures may generally apply to the following operations.
In an example, operation 1110 may include determining change data corresponding to a change between the second input image frame and the first input image frame corresponding to a super-sampling target at the first time point and generating the merged image by mixing pixels of the first output image frame and pixels of the second input image frame based on the change data. The determining of the merged image may include applying a corresponding pixel of the second input image frame to a position satisfying a replacement condition among pixel positions of the merged image, and warping a corresponding pixel of the first output image frame using the change data, and applying the warped corresponding pixel to a position violating the replacement condition among the pixel positions of the merged image.
In an example, operation 1120 may include generating a first temporary image having a narrower dynamic range than the merged image by performing tone mapping on the merged image and generating a second temporary image having lower bit-precision than the first temporary image by performing data type conversion on the first temporary image.
In an example, operation 1120 may include performing buffer layout conversion on the second temporary image such that the network input data has a depth characteristic layout instead of a spatial characteristic layout of the second temporary image.
In an example, the merged image may be expressed as a real data type and network input data may be expressed as an integer data type.
In a non-limiting example, a first processor may be used for determining the network input data and a second processor may be used for generating the network output data. The first processor and the second processor may be different types of processors. After the network input data is stored in a first memory space of the first processor, while operations of the first processor are stopped, the network input data may be duplicated from the first memory space to a second memory space of the second processor. After the network output data is stored in the second memory space, while operations of the first processor are stopped, the network output data may be duplicated from the second memory space to the first memory space and after duplication of the network output data is completed, operations of the first processor may resume.
In an example, the second input image frame may correspond to a result of rendering at the second time point and, based on the asynchronous pipeline method, the first output image frame may be displayed at the second time point instead of the second output image frame, and the second output image frame may be displayed at the third time point.
In an example, the second output image frame may have higher quality than the second input image frame.
The processors, memories, machine learning models, neural networks, electronic apparatuses, electronic apparatus 100, super-sampling operation 120, application engine 110, application engine 201, first processor 301, second processor 302, and application engine 801 described herein and disclosed herein with respect to the figures described above are implemented by or representative of hardware components.
The methods illustrated in the figures described above that perform the operations described in this application are performed by computing hardware, for example, by one or more processors or computers, implemented as described above executing instructions or software to perform the operations described in this application that are performed by the methods.
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software includes higher-level code that is executed by the one or more processors or computer using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions herein, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media, and thus, not a signal per se. As described above, or in addition to the descriptions above, examples of a non-transitory computer-readable storage medium include one or more of any of read-only memory (ROM), random-access programmable read only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, blue-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), flash memory, a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and/or any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.
Therefore, in addition to the above and all drawing disclosures, the scope of the disclosure is also inclusive of the claims and their equivalents, i.e., all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.