The field relates generally to processing a rendered image, and more specifically to neural denoising and upscaling of a rendered image.
Rendering images using a computer has evolved from low-resolution, simple line drawings with limited colors made familiar by arcade games decades ago to complex, photo-realistic images that are rendered to provide content such as immersive game play, virtual reality, and high-definition CGI (Computer-Generated Imagery) movies. While some image rendering applications such as rendering a computer-generated movie can be completed over the course of many days, other applications such as video games and virtual reality or augmented reality may entail real-time rendering of relevant image content. Because computational complexity may increase with the degree of realism desired, efficient rendering of real-time content while providing acceptable image quality is an ongoing technical challenge.
Producing realistic computer-generated images typically involves a variety of image rendering techniques, from correctly rendering the perspective of the viewer to rendering different surface textures and providing realistic lighting. But rendering an accurate image takes significant computing resources, and becomes more difficult when the rendering must be completed many tens to hundreds of times per second to produce desired frame rates for game play, augmented reality, or other applications. Specialized graphics rendering pipelines can help manage the computational workload, balancing image quality against rendered images or frames per second using techniques such as taking advantage of the history of a rendered image to improve texture rendering. Rendered objects that are small or distant may be rendered using fewer triangles than objects that are close, and other compromises between rendering speed and quality can be employed to provide the desired balance between frame rate and image quality.
In some embodiments, an entire image may be rendered at a lower resolution than the eventual display resolution, significantly reducing the computational burden in rendering the image. Even with such techniques, the resolution at which an image may be rendered while maintaining a reasonable frame rate for applications such as gaming or virtual/augmented reality may be significantly lower than the display resolution of a modern smartphone, tablet computer, or other device. This may result in an upsampled image that has resolution scaling or upsampling artifacts, or that includes other imperfections, as a result of such constraints on the number of calculations that can be performed to produce each image.
Similarly, it may be desirable to reduce the number of rays that are traced to produce a high quality rendered image, such as tracing only those rays that are visible to the viewer by starting from the viewer's perspective and tracing backward to the light source. Even with such techniques, the number of light rays that can be traced per image while maintaining a reasonable frame rate for applications such as gaming or virtual/augmented reality may be orders of magnitude lower than might be captured by a camera photographing the same scene in real life. This may result in a rendered image that contains pixel noise, has resolution scaling or upsampling artifacts, or includes other imperfections as a result of such constraints on the number of calculations that can be performed to produce each image.
For reasons such as these, it is desirable to manage image artifacts in rendered images, such as in real-time rendered image streams.
The claims provided in this application are not limited by the examples provided in the specification or drawings, but their organization and/or method of operation, together with features, and/or advantages may be best understood by reference to the examples provided in the following detailed description and in the drawings, in which:
Reference is made in the following detailed description to accompanying drawings, which form a part hereof, wherein like numerals may designate like parts throughout that are corresponding and/or analogous. The figures have not necessarily been drawn to scale, such as for simplicity and/or clarity of illustration. For example, dimensions of some aspects may be exaggerated relative to others. Other embodiments may be utilized, and structural and/or other changes may be made without departing from what is claimed. Directions and/or references, for example, such as up, down, top, bottom, and so on, may be used to facilitate discussion of drawings and are not intended to restrict application of claimed subject matter. The following detailed description therefore does not limit the claimed subject matter and/or equivalents.
In the following detailed description of example embodiments, reference is made to specific example embodiments by way of drawings and illustrations. These examples are described in sufficient detail to enable those skilled in the art to practice what is described, and serve to illustrate how elements of these examples may be applied to various purposes or embodiments. Other embodiments exist, and logical, mechanical, electrical, and other changes may be made.
Features or limitations of various embodiments described herein, however important to the example embodiments in which they are incorporated, do not limit other embodiments, and any reference to the elements, operation, and application of the examples serves only to aid in understanding these example embodiments. Features or elements shown in various examples described herein can be combined in ways other than shown in the examples, and any such combination is explicitly contemplated to be within the scope of the examples presented here. The following detailed description does not, therefore, limit the scope of what is claimed.
As graphics processing power available to smart phones, personal computers, and other such devices continues to grow, computer-rendered images continue to become increasingly realistic in appearance. These advances have enabled real-time rendering of complex images in sequential image streams, such as may be seen in games, augmented reality, and other such applications, but typically still involve significant constraints or limitations based on the graphics processing power available. For example, images may be rendered at a lower resolution than the eventual desired display resolution, with the render resolution based on the desired image or frame rate, the processing power available, the level of image quality acceptable for the application, and other such factors. Similarly, ray-traced images can be denoised, reducing the impact of the limited number of rays that can be traced in generating real-time rendering of image streams.
Rendering images at a lower resolution than the display resolution can significantly reduce the computational burden of rendering sequential images in real time, enabling even relatively low power devices such as smartphones to display complex rendered graphics at frame rates that provide for engaging game play or that are suitable for other graphics applications such as augmented reality. For example, an image rendered at 1000×500 pixels that is subsequently scaled and displayed on a 2000×1000 pixel display may be scaled using a scale factor of 2×2, resulting in approximately a quarter the computational workload for the rendering unit relative to rendering a full-scale image.
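As a purely illustrative sketch of the arithmetic in this example (the resolutions are those of the example above and not a required configuration), the relative rendering workload may be estimated as follows in Python:

    # Illustrative only: relative pixel workload when rendering at a reduced
    # resolution and upscaling to the display resolution.
    render_w, render_h = 1000, 500        # render resolution from the example above
    display_w, display_h = 2000, 1000     # display resolution from the example above

    scale = (display_w / render_w, display_h / render_h)              # (2.0, 2.0)
    workload_ratio = (render_w * render_h) / (display_w * display_h)  # 0.25
    print(scale, workload_ratio)  # roughly a quarter of the pixels are rendered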
The number of light rays that are traced in a large or complex image to produce a realistic image with acceptably low levels of illumination noise is very high, and similarly presents a computational challenge for real-time rendering of image sequences. High-quality image sequences such as rendered movies often take days or weeks to render, far longer than the hour or two runtime of the movie itself, which is impractical on mobile devices, for game play, for augmented or virtual reality, or for other applications where real-time rendering of images with acceptable image quality is important.
Ray tracing computation can be reduced for an image frame by significantly limiting the number of traced light rays, again resulting in a significant reduction in computations per rendered image frame. But tracing fewer light rays per pixel results in less accurate and noisier illumination, especially more lighting noise in areas that receive relatively few light rays. Monte Carlo (randomized) selection or sampling of light rays to trace can improve approximation of real-life light tracing, but hundreds or thousands of ray-traced samples per pixel may still be desired to produce photo-realistic images with acceptably low noise using traditional techniques.
Problems such as these may be addressed in the same graphics pipeline, such as by performing rendering at a low resolution and upsampling the rendered image to a desired display resolution, and/or by applying denoising techniques such as trilinear filtering or other filtering methods to a rendered image. But simply combining various methods such as these sequentially into a graphics rendering pipeline can reduce the effectiveness of the methods employed, such as where an upsampling method employing jitter to reduce image artifacts such as checkerboarding or aliasing is weakened by a denoising solution that employs a history buffer that effectively reduces or eliminates the impact of jitter on the upsampled pixels.
Some examples presented herein therefore provide for improved upsampling of a rendered image using jitter values and denoising of the rendered image using a denoising method that does not interfere with jittered sampling of the rendered image, allowing both the denoising and upsampling steps to work as intended. In a more detailed example, one or more projected lighting components in a rendered image are offset from pixel location centers in the rendered image according to associated jitter vectors, and the lighting components are denoised in a way that preserves the associated jitter vectors. The denoised image is then transformed to an upsampled image in a second image format using the associated jitter vectors, such that the upsampled image has a spatial resolution higher than the rendered image.
Performing denoising before upscaling may provide benefits such as reducing the number of pixels that are processed in performing denoising, and allowing an upsampling algorithm to perform rectification or invalidation of warped history using the best available image (such as where pixels become disoccluded in the current image frame). Implementing a denoising solution before upsampling presents a potential problem, however, in that denoising using a history buffer may have the effect of averaging the jittered samples taken from each pixel location over time, reducing the effectiveness of using jittered samples to accurately accumulate pixel values into the upsampled high resolution output over time, an accumulation that would otherwise reduce checkerboarding and other such image artifacts in the output image. Using a denoising process that preserves the jittered or offset pixel location samples through the denoising process is therefore desirable, in that it enables the upsampling algorithm to use the jittered pixel samples to reduce image artifacts such as checkerboarding and aliasing.
Trained neural network 106 in this example may use image data provided as an input tensor to generate a feedback tensor comprising an image at a resolution of 540×960 pixels. An output tensor of trained neural network 106 may comprise denoising filter parameters provided to ray denoising block 108, and upsampling/blending parameters such as an alpha blending coefficient provided to upsample/accumulate block 110. Although trained neural network 106 in this example comprises a single neural network configured to generate multiple output tensors for different stages of the image processing pipeline such as denoising and upsampling/accumulation, other examples may employ separate or discrete neural networks to perform functions such as these and/or other functions. Ray denoising block 108 may receive the preprocessed image and denoising filter parameters, and may perform denoising using a filter such as a trilinear filter or other suitable filter to reduce the amount of noise in the preprocessed rendered image. A ray denoising filter employed in block 108 in this example may reduce pixel noise in the image to be upsampled before sending the denoised jittered image to upsample/accumulate block 110 in a way that preserves jitter offset pixel sampling locations. This may be achieved, for example, using a denoising technique that does not rely on a history buffer in a denoising process that may cause a pixel sample location to average out to a pixel center (e.g., no longer jitter offset) location over time.
A denoised image with jitter offset pixel sampling locations still intact may be provided to upsample/accumulate block 110 for upsampling. In the example of
In a further example, one or more of these processes such as preprocessing at 104, ray denoising at 108, and/or upsampling and accumulating or blending at 110 may be performed separately for different lighting components or types, such as specular lighting, diffuse lighting, and albedo (or the degree to which light impacting a surface is diffusely reflected). In a more detailed example, separate history buffers may be maintained and employed for blending at 110 for each of specular lighting, diffuse lighting, and albedo. An upsampled and blended output image provided from 110 in this example is a 720×1080 image, and may include lighting component outputs that are provided as an accumulated history to pre-processing block 104 as shown in
Render block 206 may employ rendering and ray tracing to produce a current image frame FRAME N, using jitter offsets and/or other such inputs. The rendered image frame may be rendered at a lower resolution than a desired display resolution, and may be provided as an array of pixel values sampled at jitter offsets from the center of respective pixel locations in the image frame to reduce checkerboarding, aliasing, and other such artifacts. In a further example, an image frame may comprise different color channels such as red, green, and blue color brightness levels for individual pixel locations, and may include separate images or channels for different lighting components such as specular, diffuse, and albedo. A rendered image may be provided to denoise block 208, which may be any type of image denoising filter. In a more detailed example, a trilinear filter is employed to reduce pixel noise and various resolution sampling artifacts in the rendered image. Some example denoising filters 208 may avoid biasing the received rendered image during the denoising process using historical image signal values such as from FRAME N−1 or previous image frames. Disadvantageously, incorporation of such historical image signal values may have the effect of averaging or integrating the pixel sample locations over prior image frames such that the sampling jitter offsets are essentially filtered out and resolve to the center of each pixel location. A denoising filter at block 208 in this example therefore may be employed in such a way that it retains jitter offsets of pixel sampling locations provided by render block 206, passing along a denoised but still jittered image to upsample block 210.
Upsample block 210 may be operable to map pixel values in a lower resolution rendered image offset by jitter vectors to a higher resolution using the jitter vectors. In more detailed examples, the mapped signal intensity values may be scaled based on a length of the jitter vectors, such that pixel values associated with shorter jitter vectors closer to pixel centers are weighted more heavily than pixel values associated with longer jitter vectors. Upsample block 210 may generate a sparse image having null pixel locations, which may be filled by interpolating between and/or among mapped image signal intensity values in nearby upscaled image pixel locations and/or image signal intensity values of a warped history image frame. The upsampled image may be provided to validate (rectify) block 214 and to accumulate (blend) block 216, and in some examples may comprise different outputs to blocks 214 and 216. In another example, upsample block 210 may upsample the denoised image from denoise block 208 to the same resolution, such as to perform temporal antialiasing in conjunction with accumulate (blend) block 216. More detailed embodiments of upsampling are described herein.
A rendered image may also be provided by render block 206 along with motion vectors (e.g., for objects or textures in image frame FRAME N relative to rendering instance N−1) to the reproject (resample) block 212, which may receive an accumulated image history output at 202 referenced to a previous rendering instance. Reproject history block 212 may re-project or warp an accumulated image history output onto a current rendered image, such as by using rendered or estimated motion vectors from render block 206 or another source. This may serve to align pixel values of an output image frame from a previous rendering instance with pixel values of the upsampled image frame in a current rendering instance.
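For purposes of illustration only, a simplified Python sketch of such a re-projection or warping step is shown below. The sketch assumes per-pixel motion vectors expressed in pixels and uses nearest-neighbor sampling for brevity (an actual implementation might use bilinear sampling); the function and variable names are hypothetical and are not part of the described embodiments.

    import numpy as np

    def reproject_history(history, motion_x, motion_y):
        # Gather-style warp: for each pixel of the current frame, read the history
        # pixel displaced by the per-pixel motion vector (in pixels), using
        # nearest-neighbor sampling for brevity.
        h, w = history.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        src_x = np.clip(np.round(xs - motion_x).astype(int), 0, w - 1)
        src_y = np.clip(np.round(ys - motion_y).astype(int), 0, h - 1)
        return history[src_y, src_x]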
Validate (rectify) block 214 may use a warped or pixel-aligned version of a past image frame to mitigate effects of image content being disoccluded or becoming visible in the current image frame FRAME N that were not visible in the prior image frame FRAME N−1, such as by reducing or clamping an influence of accumulated historic image frames on disoccluded pixels in the current image frame. In a further example, rectification or history invalidation may also be performed for changes in view-dependent lighting, such as where the camera's current view direction has changed sufficiently from a prior image's point of view to render specular highlight and/or reflection detail invalid between image frames.
Rectify history block 214 may also receive predicted blending coefficients such as from a trained neural network (not shown) to predict or estimate a degree to which a warped history is to be combined with the current rendered frame. A resulting rectified history output from block 214 may therefore comprise a combination of accumulated image history as well as some “clamped” pixels from a current rendered image, such as at locations where pixels become disoccluded in image frame FRAME N.
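One assumption of how such rectification might limit the influence of invalid history is neighborhood clamping, sketched below in simplified Python for purposes of illustration only; the described embodiments are not limited to this approach, and the names used are hypothetical.

    import numpy as np

    def rectify_history(warped_history, current, radius=1):
        # Clamp each warped-history pixel to the local minimum/maximum of the
        # current frame so that stale or disoccluded history cannot dominate.
        h, w = current.shape[:2]
        rectified = np.empty_like(warped_history)
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - radius), min(h, y + radius + 1)
                x0, x1 = max(0, x - radius), min(w, x + radius + 1)
                lo = current[y0:y1, x0:x1].min(axis=(0, 1))
                hi = current[y0:y1, x0:x1].max(axis=(0, 1))
                rectified[y, x] = np.clip(warped_history[y, x], lo, hi)
        return rectified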
Accumulate history block 216 in some examples may selectively combine or blend a re-projected, rectified history received from rectify history block 214 with a current upscaled image frame. An accumulation process at accumulate history block 216 may use a predicted parameter “alpha” received from a trained neural network to determine a proportion of the current denoised and upsampled image frame to be blended with the re-projected, rectified history. Output from accumulate history block 216 may be provided to post-processing block 218, which may perform various filtering and/or processing functions to form an output image for display. An accumulated or blended image output from accumulate block 216 may also be stored or provided for processing the next image frame, as reflected by re-project block 214 (corresponding to the current frame's re-project block 212, which similarly receives prior frame accumulated image history from accumulate block 202). An example architecture of
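A minimal Python sketch of such an accumulation or blending step is shown below for purposes of illustration only, assuming an alpha parameter (scalar or per-pixel) such as might be predicted by a trained neural network; the names used are hypothetical.

    import numpy as np

    def accumulate(rectified_history, upsampled_current, alpha):
        # 'alpha' (scalar or per-pixel) is the proportion of the current denoised
        # and upsampled frame blended with the re-projected, rectified history.
        return (1.0 - alpha) * rectified_history + alpha * upsampled_current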
An upsampled image with 1.5× the resolution of the original image shown at 302 is shown at 304, with the jitter-derived sampling points shown in the original image at 302 mapped to the corresponding locations in the upsampled image. The pixels in the upsampled image that include a sampling point may tend to cluster when non-integer upsampling ratios such as 1.5× upsampling are used, forming a checkerboard in the example shown at 304, as denoted by the shaded boxes which contain the mapped sampling points. The empty pixel locations shown at 304 do not contain a sampling point from the original image 302, and in some examples may be filled by interpolation, image history, or by a combination of interpolation and history or other such means. In some examples, integer upsampling ratios may result in a more uniform distribution of pixels with sampling points within the upsampled image, alleviating much of the checkerboarding, aliasing, and other such issues visible in the 1.5× upsampling example shown at 304.
The 1.5× upsampled image shown at 304 illustrates another problem with traditional upscaling algorithms: sample points very near the edge of upsampled pixels in 304 dictate the color or shading of the entire pixel in which the sampling point resides, but have no influence on neighboring pixels, some of which may be a tiny fraction of a pixel width away from the sampling point and some of which may not have a sampling point located within them (i.e., may be empty). Further, because the upsampled pixels shown at 304 span multiple original pixels shown at 302, the upsampled pixels cannot be accurately mapped back to the original image without further information, such as separately retaining the sampling points used or other such original image information.
A scatter operation may place a sampling point from a low resolution pixel shown at 502 into a high resolution pixel shown at 504. The high resolution image may have a sampling point in only one of the four high resolution pixels shown, and may comprise a part of a sparse image. The high resolution pixel and corresponding high resolution image may be used to update a history buffer, and in further examples may have an effect on the history buffer that varies with factors such as recent disocclusion of the pixel. As the upscaling factor increases, the scatter operation shown may have increasingly visible image artifacts, due to factors such as the jittered and scattered sampling points occupying an increasingly small percentage of high resolution pixels.
In a more detailed example, the low resolution coordinate position of the sampling point as shown at 502 can be translated to the sampling point's corresponding pixel location in the high resolution image shown at 504 using expression [1] as follows:
where:
Once the pixel location of the sampling point in the high resolution image is obtained such as by using the above equation, the sample may be written sparsely into the high resolution image using expression [2] as follows:
where:
The resulting high resolution image, having each low resolution pixel's sampling point intensity or color value written into a corresponding pixel location in the high resolution image, is a sparse high resolution image in which no collisions (or low resolution pixel sampling locations written into the same high resolution pixel location) occur if the same jitter vector is used to sample all pixels in the low resolution image, because there are always fewer low resolution sampling points (or pixels) than there are pixel locations in the high resolution image.
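Because expressions [1] and [2] themselves are not reproduced above, the simplified Python sketch below shows one plausible form of the scatter operation that is consistent with the surrounding description; it assumes jitter offsets expressed in low resolution pixel units relative to the pixel center, and the names used are hypothetical rather than part of the described embodiments.

    import numpy as np

    def scatter_upsample(low_res, jitter, scale):
        # Assumed form of the scatter described for expressions [1] and [2]:
        # translate each low resolution sampling point (pixel center plus a
        # jitter offset, in low resolution pixel units) to a high resolution
        # pixel location, then write the sample sparsely into that location.
        jx, jy = jitter                     # same jitter vector for every pixel
        lh, lw = low_res.shape[:2]
        hh, hw = int(round(lh * scale)), int(round(lw * scale))
        high_res = np.zeros((hh, hw) + low_res.shape[2:], dtype=low_res.dtype)
        mask = np.zeros((hh, hw), dtype=bool)
        for y in range(lh):
            for x in range(lw):
                hx = min(int((x + 0.5 + jx) * scale), hw - 1)   # plausible expression [1]
                hy = min(int((y + 0.5 + jy) * scale), hh - 1)
                high_res[hy, hx] = low_res[y, x]                # sparse write, expression [2]
                mask[hy, hx] = True
        return high_res, mask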
The sparse high resolution image constructed by upsampling according to expressions [1] and [2] and as shown in
where:
Because these methods may result in image artifacts such as checkerboarding, aliasing, stair stepping, and the like due at least in part to the nature of upsampling at arbitrary non-integer values, the pixel values in the upscaled high resolution image in some examples may be further weighted based on factors such as jitter vector length or the distance of the pixel sampling point from the center of the pixel. In one such example, the pixel shown at 502 shows a black dot at the center of the pixel, which is the point from which a jitter vector is derived. A sampling point represented by a gray dot is relatively far from the center of the pixel, and the distance and direction from the center of the pixel to the sampling point are represented by a jitter vector shown as a line and an arrow, such that the jitter vector comprises both direction and distance values describing the path from the black dot at the center of pixel 502 to the sampling point represented by the gray dot.
The jitter vector's length therefore varies with the distance from the center of the pixel to the sampling point, and is inversely related to the degree to which the sampling point is likely to represent the image color or intensity value of a sampling point taken at the center of the pixel. Because sampling points with jitter vector lengths that are relatively long (e.g., approaching 0.7× the pixel height and width) are much less likely to represent the average value of sampling points within the pixel than samples taken near the center of the pixel, jitter vector lengths may be used to scale the contribution of sampling points to an accumulated history buffer such as the accumulate (blend) accumulated history buffer shown at 202 and 216 of
In a more detailed example, a jitter scale value may be calculated using expression [4] as follows:
where:
Expression [4] therefore uses linear interpolation to pick a jitterScale scaling value between 1 and MinScale to be applied to the sampling point before adding it to the accumulated history. The scaling value is derived by linear interpolation between 1 and MinScale, based on the jitter vector's length relative to the maximum possible jitter length. Although the jitterScale scaling value in this example varies linearly with the jitter vector length from the center of the pixel to the sampling point, other examples may use non-linear functions such as Gaussian functions, exponential functions, or the like to derive the scaling value from jitter vector length.
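Because expression [4] itself is not reproduced above, the simplified Python sketch below shows one plausible form of the jitterScale calculation that is consistent with this description; the MinScale value is hypothetical, and the maximum jitter length is assumed here to be the pixel-center-to-corner distance.

    import math

    def jitter_scale(jitter_x, jitter_y, min_scale=0.5):
        # Linearly interpolate between 1 and MinScale based on the jitter vector's
        # length relative to the maximum possible jitter length, assumed here to be
        # the pixel-center-to-corner distance (about 0.707 pixel widths).
        max_len = math.sqrt(0.5)
        t = min(math.hypot(jitter_x, jitter_y) / max_len, 1.0)
        return (1.0 - t) * 1.0 + t * min_scale   # lerp(1, MinScale, t)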
In a further example, upsampling artifacts such as checkerboarding, aliasing, and the like may be mitigated by filling empty pixels in the upscaled image with interpolated pixel values. By blending in samples interpolated from the sparse samples in the upscaled image, the sparse samples mapped to the upscaled image may help smooth upsampling artifacts in neighboring upsampled pixel locations. In the high resolution upsampled pixels shown at 504, a sparse sample is mapped into an upscaled image, such that the sparse sample occupies only the lower right upsampled image pixel. The other three pixel locations are unoccupied and do not contain sampling points, and so the upsampled image is deemed to be sparse (i.e., not filled with pixel sample values). The high resolution pixels containing sampling points (e.g., as identified by a mask) may be multiplied by an alpha or blending value predicted by a neural network for blending into the accumulated history, and scaled by the jitter vector length such that sampling points nearer the center of the pixel are more heavily weighted. Empty pixels in the sparse sample shown at 504 are excluded from this process and remain empty.
Interpolated pixel values for empty pixel locations may be added, such as based on a predicted beta blending value, further reducing the effect of upsampling artifacts on an accumulated history buffer and output or displayed image. In a more detailed example, interpolated samples are calculated for at least the empty pixel locations that do not contain a sampling point, such as where an inverse mask may be employed to ensure that interpolated samples are only included for locations that do not have a sampling point. An interpolated pixel array masked to include only pixel locations with empty pixel values may be multiplied by a beta coefficient (β) such as may be predicted by a trained neural network to determine a degree to which the interpolated samples should be added to the accumulated history buffer. Interpolated samples may be derived in various examples using traditional interpolation algorithms such as bilinear or cubic interpolation algorithms, or may be derived using a trained neural network, a kernel prediction network, or other such interpolation algorithm.
In a further example, blending a portion of the interpolated pixel color values into empty pixels of sparse upscaled images may be calculated using expressions [5] and [6] as follows:
where:
Expression [5] uses the alpha(x, y) parameter to determine a degree of blending the sparse samples sparseSamples(x, y) into an accumulated history warpedHistory(x, y) to generate an output accumAlpha(x, y). A mask mask(x, y) limits the output to pixel locations having a sampling point, and each sample having a sampling point is further scaled by both alpha(x, y) and a jitterScale derived from jitter vector length or distance from the sampling point to the center of the sparse pixel for each pixel with a sampling point in the upscaled image. Expression [6] shown below uses the accumAlpha(x, y) calculated using expression [5] plus interpolated pixel data and an inverse mask to generate an output including interpolated pixel values where empty pixel values exist in accumAlpha(x, y) as follows:
where:
Expression [6] uses a linear interpolation function lerp and an inverse mask calculated using mask(x, y) to selectively output accumAlpha(x, y) for pixel locations that contain sampling points, and interpolated pixel values Interp(jitteredColor(x, y)) scaled by a blending factor beta(x, y) for pixel locations that are empty or that do not contain sampling points. The output out(x, y) of expression [6] comprises an output that can be added to an accumulated history buffer such as the accumulate (blend) blocks 102 and 116 of
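Because expressions [5] and [6] themselves are not reproduced above, the simplified Python sketch below shows one plausible form of the two blending steps that is consistent with the description; all arrays are assumed to be two-dimensional (one color channel) at the upscaled resolution, and the exact formulas are assumptions rather than the expressions of the described embodiments.

    import numpy as np

    def blend_sparse_and_interpolated(sparse_samples, mask, warped_history,
                                      alpha, beta, jitter_scale, interpolated):
        # All arrays are 2-D (one color channel) at the upscaled resolution;
        # 'mask' is True where an upscaled pixel contains a mapped sampling point.
        m = mask.astype(sparse_samples.dtype)
        # Assumed form of expression [5]: blend jitter-scaled sparse samples into
        # the warped history only at masked (occupied) pixel locations.
        a = alpha * jitter_scale * m
        accum_alpha = (1.0 - a) * warped_history + a * sparse_samples
        # Assumed form of expression [6]: at empty pixel locations (inverse mask),
        # blend in a beta-weighted portion of the interpolated pixel values.
        b = beta * (1.0 - m)
        return (1.0 - b) * accum_alpha + b * interpolated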
The processes shown in
As shown here, the original image 602 may be progressively downsampled to produce images of lower resolution and lower noise that can be aliased to rendered image objects, such as by using a GPU's filtering hardware commonly used for trilinear interpolation. The downsampling is performed in some examples using a de-noising filter such as a filter generated by a neural network-based kernel prediction network, which in some examples comprises a convolutional neural network that generates a kernel of scalar weights or coefficients applied to neighboring pixels to calculate each de-noised pixel of an image. In a more detailed example, a kernel prediction network may predict, for each pixel, a matrix of kernel coefficients or weights for neighboring pixels that can be used to compute the de-noised pixel value from its original value and the values of its neighboring pixels. Filter coefficients generated by a kernel prediction network may be used to build successive layers of the pyramid of
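For purposes of illustration only, the simplified Python sketch below applies per-pixel kernels such as a kernel prediction network might produce; the 3×3 kernel size and the names used are hypothetical, and the network that would predict the kernel coefficients is not shown.

    import numpy as np

    def apply_predicted_kernels(image, kernels):
        # 'kernels' holds a hypothetical 3x3 weight matrix for every pixel, such
        # as a kernel prediction network might output (shape: height x width x 3 x 3).
        # Each de-noised pixel is the weighted sum of its 3x3 neighborhood.
        h, w = image.shape
        padded = np.pad(image, 1, mode='edge')
        out = np.zeros_like(image)
        for y in range(h):
            for x in range(w):
                patch = padded[y:y + 3, x:x + 3]
                out[y, x] = np.sum(patch * kernels[y, x])
        return out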
In the example of
Although the number of pyramid levels in the example of
The image pyramid of
The resulting tri-linear filtering process described in conjunction with
The rendered image is therefore denoised at 704, such as by using a filtering or denoising process that preserves the jitter vector offset information associated with each sampled pixel. In one example, this comprises avoiding use of a history buffer that might serve to average out the jitter offsets of sampling points over time and thereby effectively average out the influence of the jitter vectors on the denoised output. The denoising filter in a more detailed example comprises a trilinear filter, which may use sequentially downsampled versions of the rendered image and a desired level of detail for each pixel location to select the original or a sequentially downsampled version of the rendered image from which to draw color information. In a further example, the trilinear filter may interpolate between downsampled versions or between the original and a downsampled version of the original image. In another example, the trilinear filter may perform filtering separately for at least two of two or more lighting components of the rendered image.
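For purposes of illustration only, the simplified Python sketch below selects a color from an image pyramid using a per-pixel level of detail, interpolating between the two nearest pyramid levels; how the level of detail is chosen is not specified here, the pyramid is assumed to halve in resolution at each level, and the names used are hypothetical.

    import numpy as np

    def trilinear_lookup(pyramid, lod, x, y):
        # 'pyramid' is a list of 2-D arrays, level 0 at full resolution and each
        # subsequent level at half the previous resolution; (x, y) are given in
        # full-resolution coordinates and 'lod' is a non-negative level of detail.
        lo = min(int(np.floor(lod)), len(pyramid) - 1)
        hi = min(lo + 1, len(pyramid) - 1)
        frac = lod - lo

        def sample(level):
            img = pyramid[level]
            sy = min(int(y / (2 ** level)), img.shape[0] - 1)   # nearest-neighbor within a level
            sx = min(int(x / (2 ** level)), img.shape[1] - 1)
            return img[sy, sx]

        # Linearly interpolate between the two nearest pyramid levels.
        return (1.0 - frac) * sample(lo) + frac * sample(hi)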
The denoised image is upsampled at 706 to a higher spatial resolution than the rendered image, such as to transform an image rendered at a lower resolution due to computational constraints to a higher resolution for display on an electronic device. The upsampling in a more detailed example uses the jitter vector information for the pixels of the denoised image to perform the upsampling, such as by using jitter offsets to weight the pixel contribution to an output image and/or using averaging to fill sparse or empty upsampled pixel locations. The upsampling in a further example may be performed separately for two or more lighting components, such as specular, diffuse, and albedo.
The upsampled image is combined with a warped history buffer image at 708 to provide an output image, which in some further examples is at an upsampled resolution that matches a display resolution of an electronic device. The output image is derived from a rendered image that is denoised to reduce the pixel noise inherent in rendered images having limited rays cast per pixel in the rendering and sampling process, and that is subsequently upscaled using a method that takes advantage of the jitter vectors employed in choosing sampling points for the rendered image pixels, using jitter information retained through the denoising process. The upsampling process therefore has the benefit of operating on a less noisy image due to the prior denoising step, but retains the ability to use jitter information, such as jitter vector weighting of pixel samples and/or averaging of sparse or empty pixels, to produce an upsampled image having reduced upsampling artifacts such as checkerboarding and aliasing.
Various parameters in the examples presented herein, such as blending coefficients, denoising or trilinear filter parameters, and other such parameters, may be determined using machine learning techniques such as a trained neural network. In some examples, a neural network may comprise a graph comprising nodes to model neurons in a brain. In this context, a "neural network" means an architecture of a processing device defined and/or represented by a graph including nodes to represent neurons that process input signals to generate output signals, and edges connecting the nodes to represent input and/or output signal paths between and/or among neurons represented by the graph. In particular implementations, a neural network may comprise a biological neural network, made up of real biological neurons, or an artificial neural network, made up of artificial neurons, for solving artificial intelligence (AI) problems, for example. In an implementation, such an artificial neural network may be implemented by one or more computing devices such as computing devices including a central processing unit (CPU), graphics processing unit (GPU), digital signal processing (DSP) unit and/or neural processing unit (NPU), just to provide a few examples. In a particular implementation, neural network weights associated with edges to represent input and/or output paths may reflect gains to be applied and/or whether an associated connection between connected nodes is to be an excitatory connection (e.g., weight with a positive value) or an inhibitory connection (e.g., weight with a negative value). In an example implementation, a neuron may apply a neural network weight to input signals, and sum weighted input signals to generate a linear combination.
In one example embodiment, edges in a neural network connecting nodes may model synapses capable of transmitting signals (e.g., represented by real number values) between neurons. Responsive to receipt of such a signal, a node/neuron may perform some computation to generate an output signal (e.g., to be provided to another node in the neural network connected by an edge). Such an output signal may be based, at least in part, on one or more weights and/or numerical coefficients associated with the node and/or edges providing the output signal. For example, such a weight may increase or decrease a strength of an output signal. In a particular implementation, such weights and/or numerical coefficients may be adjusted and/or updated as a machine learning process progresses. In an implementation, transmission of an output signal from a node in a neural network may be inhibited if a strength of the output signal does not exceed a threshold value.
According to an embodiment, a node 802, 804 and/or 806 may process input signals (e.g., received on one or more incoming edges) to provide output signals (e.g., on one or more outgoing edges) according to an activation function. An “activation function” as referred to herein means a set of one or more operations associated with a node of a neural network to map one or more input signals to one or more output signals. In a particular implementation, such an activation function may be defined based, at least in part, on a weight associated with a node of a neural network. Operations of an activation function to map one or more input signals to one or more output signals may comprise, for example, identity, binary step, logistic (e.g., sigmoid and/or soft step), hyperbolic tangent, rectified linear unit, Gaussian error linear unit, Softplus, exponential linear unit, scaled exponential linear unit, leaky rectified linear unit, parametric rectified linear unit, sigmoid linear unit, Swish, Mish, Gaussian and/or growing cosine unit operations. It should be understood, however, that these are merely examples of operations that may be applied to map input signals of a node to output signals in an activation function, and claimed subject matter is not limited in this respect.
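As a minimal Python sketch, for purposes of illustration only, of a node applying a weighted sum and an activation function (a rectified linear unit is used here as one of the listed examples; the names are hypothetical):

    import numpy as np

    def neuron_output(inputs, weights, bias=0.0):
        # Weight and sum the input signals, then map the linear combination to an
        # output signal with an activation function (rectified linear unit here).
        z = np.dot(weights, inputs) + bias
        return max(0.0, z)

    print(neuron_output(np.array([0.2, -0.5]), np.array([1.5, 0.4])))  # approximately 0.1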
Additionally, an “activation input value” as referred to herein means a value provided as an input parameter and/or signal to an activation function defined and/or represented by a node in a neural network. Likewise, an “activation output value” as referred to herein means an output value provided by an activation function defined and/or represented by a node of a neural network. In a particular implementation, an activation output value may be computed and/or generated according to an activation function based on and/or responsive to one or more activation input values received at a node. In a particular implementation, an activation input value and/or activation output value may be structured, dimensioned and/or formatted as “tensors”. Thus, in this context, an “activation input tensor” as referred to herein means an expression of one or more activation input values according to a particular structure, dimension and/or format. Likewise in this context, an “activation output tensor” as referred to herein means an expression of one or more activation output values according to a particular structure, dimension and/or format.
In particular implementations, neural networks may enable improved results in a wide range of tasks, including image recognition and speech recognition, just to provide a couple of example applications. To enable performing such tasks, features of a neural network (e.g., nodes, edges, weights, layers of nodes and edges) may be structured and/or configured to form "filters" that may have a measurable/numerical state such as a value of an output signal. Such a filter may comprise nodes and/or edges arranged in "paths" that are to be responsive to sensor observations provided as input signals. In an implementation, a state and/or output signal of such a filter may indicate and/or infer detection of a presence or absence of a feature in an input signal.
In particular implementations, intelligent computing devices to perform functions supported by neural networks may comprise a wide variety of stationary and/or mobile devices, such as, for example, automobile sensors, biochip transponders, heart monitoring implants, Internet of things (IoT) devices, kitchen appliances, locks or like fastening devices, solar panel arrays, home gateways, smart gauges, robots, financial trading platforms, smart telephones, cellular telephones, security cameras, wearable devices, thermostats, Global Positioning System (GPS) transceivers, personal digital assistants (PDAs), virtual assistants, laptop computers, personal entertainment systems, tablet personal computers (PCs), PCs, personal audio or video devices, personal navigation devices, just to provide a few examples.
According to an embodiment, a neural network may be structured in layers such that a node in a particular neural network layer may receive output signals from one or more nodes in an upstream layer in the neural network, and provide an output signal to one or more nodes in a downstream layer in the neural network. One specific class of layered neural networks may comprise a convolutional neural network (CNN) or space invariant artificial neural network (SIANN) that enables deep learning. Such CNNs and/or SIANNs may be based, at least in part, on a shared-weight architecture of convolution kernels that shift over input features and provide translation equivariant responses. Such CNNs and/or SIANNs may be applied to image and/or video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain-computer interfaces, financial time series, just to provide a few examples.
Another class of layered neural network may comprise a recurrent neural network (RNN), which is a class of neural networks in which connections between nodes form a directed cyclic graph along a temporal sequence. Such a temporal sequence may enable modeling of temporal dynamic behavior. In an implementation, an RNN may employ an internal state (e.g., memory) to process variable length sequences of inputs. This may be applied, for example, to tasks such as unsegmented, connected handwriting recognition or speech recognition, just to provide a few examples. In particular implementations, an RNN may emulate temporal behavior using finite impulse response (FIR) or infinite impulse response (IIR) structures. An RNN may include additional structures to control how stored states of such FIR and IIR structures are aged. Structures to control such stored states may include a network or graph that incorporates time delays and/or has feedback loops, such as in long short-term memory networks (LSTMs) and gated recurrent units.
According to an embodiment, output signals of one or more neural networks (e.g., taken individually or in combination) may at least in part, define a “predictor” to generate prediction values associated with some observable and/or measurable phenomenon and/or state. In an implementation, a neural network may be “trained” to provide a predictor that is capable of generating such prediction values based on input values (e.g., measurements and/or observations) optimized according to a loss function. For example, a training process may employ backpropagation techniques to iteratively update neural network weights to be associated with nodes and/or edges of a neural network based, at least in part on “training sets.” Such training sets may include training measurements and/or observations to be supplied as input values that are paired with “ground truth” observations or expected outputs. Based on a comparison of such ground truth observations and associated prediction values generated based on such input values in a training process, weights may be updated according to a loss function using backpropagation. The neural networks employed in various examples can be any known or future neural network architecture, including traditional feed-forward neural networks, convolutional neural networks, or other such networks.
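For purposes of illustration only, the simplified Python sketch below performs gradient-descent updates of a single-layer predictor against paired ground-truth outputs using a mean-squared-error loss; training of the multi-layer networks described herein would use backpropagation through many layers, and the data shown are random placeholders.

    import numpy as np

    # Random placeholder training set: input measurements paired with ground truth.
    rng = np.random.default_rng(0)
    inputs = rng.normal(size=(32, 4))
    targets = rng.normal(size=(32, 1))
    weights = np.zeros((4, 1))

    for _ in range(100):
        predictions = inputs @ weights
        error = predictions - targets
        grad = 2.0 * inputs.T @ error / len(inputs)   # gradient of the mean-squared-error loss
        weights -= 0.1 * grad                         # update weights from the gradient
    print(float(np.mean((inputs @ weights - targets) ** 2)))  # final training loss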
Smartphone 924 may also be coupled to a public network in the example of
Signal processing and/or filtering architectures 916, 918, and 928 of
Trained neural network 106 (
Computing devices such as cloud server 902, smartphone 924, and other such devices that may employ signal processing and/or filtering architectures can take many forms and can include many features or functions including those already described and those not described herein.
As shown in the specific example of
Each of components 1002, 1004, 1006, 1008, 1010, and 1012 may be interconnected (physically, communicatively, and/or operatively) for inter-component communications, such as via one or more communications channels 1014. In some examples, communication channels 1014 include a system bus, network connection, inter-processor communication network, or any other channel for communicating data. Applications such as image processor 1022 and operating system 1016 may also communicate information with one another as well as with other components in computing device 1000.
Processors 1002, in one example, are configured to implement functionality and/or process instructions for execution within computing device 1000. For example, processors 1002 may be capable of processing instructions stored in storage device 1012 or memory 1004. Examples of processors 1002 include any one or more of a microprocessor, a controller, a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), an image signal processor (ISP), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or similar discrete or integrated logic circuitry.
One or more storage devices 1012 may be configured to store information within computing device 1000 during operation. Storage device 1012, in some examples, is known as a computer-readable storage medium. In some examples, storage device 1012 comprises temporary memory, meaning that a primary purpose of storage device 1012 is not long-term storage. Storage device 1012 in some examples is a volatile memory, meaning that storage device 1012 does not maintain stored contents when computing device 1000 is turned off. In other examples, data is loaded from storage device 1012 into memory 1004 during operation. Examples of volatile memories include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories known in the art. In some examples, storage device 1012 is used to store program instructions for execution by processors 1002. Storage device 1012 and memory 1004, in various examples, are used by software or applications running on computing device 1000 such as image processor 1022 to temporarily store information during program execution.
Storage device 1012, in some examples, includes one or more computer-readable storage media that may be configured to store larger amounts of information than volatile memory. Storage device 1012 may further be configured for long-term storage of information. In some examples, storage devices 1012 include non-volatile storage elements. Examples of such non-volatile storage elements include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
Computing device 1000, in some examples, also includes one or more communication modules 1010. Computing device 1000 in one example uses communication module 1010 to communicate with external devices via one or more networks, such as one or more wireless networks. Communication module 1010 may be a network interface card, such as an Ethernet card, an optical transceiver, a radio frequency transceiver, or any other type of device that can send and/or receive information. Other examples of such network interfaces include Bluetooth, 4G, LTE, 5G, and WiFi radios, Near-Field Communications (NFC), and Universal Serial Bus (USB). In some examples, computing device 1000 uses communication module 1010 to wirelessly communicate with an external device such as via public network 922 of
Computing device 1000 also includes in one example one or more input devices 1006. Input device 1006, in some examples, is configured to receive input from a user through tactile, audio, or video input. Examples of input device 1006 include a touchscreen display, a mouse, a keyboard, a voice responsive system, video camera, microphone or any other type of device for detecting input from a user.
One or more output devices 1008 may also be included in computing device 1000. Output device 1008, in some examples, is configured to provide output to a user using tactile, audio, or video stimuli. Output device 1008, in one example, includes a display, a sound card, a video graphics adapter card, or any other type of device for converting a signal into an appropriate form understandable to humans or machines. Additional examples of output device 1008 include a speaker, a light-emitting diode (LED) display, a liquid crystal display (LCD) or organic LED (OLED) display, or any other type of device that can generate output to a user.
Computing device 1000 may include operating system 1016. Operating system 1016, in some examples, controls the operation of components of computing device 1000, and provides an interface from various applications such as image processor 1022 to components of computing device 1000. Operating system 1016, in one example, facilitates the communication of various applications such as image processor 1022 with processors 1002, communication unit 1010, storage device 1012, input device 1006, and output device 1008. Applications such as image processor 1022 may include program instructions and/or data that are executable by computing device 1000. As one example, image processor 1022 may implement a signal processing and/or filtering architecture 1024 to perform image processing tasks or rendered image processing tasks such as those described above, which in a further example comprises using signal processing and/or filtering hardware elements such as those described in the above examples. These and other program instructions or modules may include instructions that cause computing device 1000 to perform one or more of the other operations and actions described in the examples presented herein.
Features of example computing devices in
The term electronic file and/or the term electronic document, as applied herein, refer to a set of stored memory states and/or a set of physical signals associated in a manner so as to thereby at least logically form a file (e.g., electronic) and/or an electronic document. That is, it is not meant to implicitly reference a particular syntax, format and/or approach used, for example, with respect to a set of associated memory states and/or a set of associated physical signals. If a particular type of file storage format and/or syntax, for example, is intended, it is referenced expressly. It is further noted an association of memory states, for example, may be in a logical sense and not necessarily in a tangible, physical sense. Thus, although signal and/or state components of a file and/or an electronic document, for example, are to be associated logically, storage thereof, for example, may reside in one or more different places in a tangible, physical memory, in an embodiment.
In the context of the present patent application, the terms “entry,” “electronic entry,” “document,” “electronic document,” “content,”, “digital content,” “item,” and/or similar terms are meant to refer to signals and/or states in a physical format, such as a digital signal and/or digital state format, e.g., that may be perceived by a user if displayed, played, tactilely generated, etc. and/or otherwise executed by a device, such as a digital device, including, for example, a computing device, but otherwise might not necessarily be readily perceivable by humans (e.g., if in a digital format).
Also, for one or more embodiments, an electronic document and/or electronic file may comprise a number of components. As previously indicated, in the context of the present patent application, a component is physical, but is not necessarily tangible. As an example, components with reference to an electronic document and/or electronic file, in one or more embodiments, may comprise text, for example, in the form of physical signals and/or physical states (e.g., capable of being physically displayed). Typically, memory states, for example, comprise tangible components, whereas physical signals are not necessarily tangible, although signals may become (e.g., be made) tangible, such as if appearing on a tangible display, for example, as is not uncommon. Also, for one or more embodiments, components with reference to an electronic document and/or electronic file may comprise a graphical object, such as, for example, an image, such as a digital image, and/or sub-objects, including attributes thereof, which, again, comprise physical signals and/or physical states (e.g., capable of being tangibly displayed). In an embodiment, digital content may comprise, for example, text, images, audio, video, and/or other types of electronic documents and/or electronic files, including portions thereof, for example.
Also, in the context of the present patent application, the term “parameters” (e.g., one or more parameters), “values” (e.g., one or more values), “symbols” (e.g., one or more symbols) “bits” (e.g., one or more bits), “elements” (e.g., one or more elements), “characters” (e.g., one or more characters), “numbers” (e.g., one or more numbers), “numerals” (e.g., one or more numerals) or “measurements” (e.g., one or more measurements) refer to material descriptive of a collection of signals, such as in one or more electronic documents and/or electronic files, and exist in the form of physical signals and/or physical states, such as memory states. For example, one or more parameters, values, symbols, bits, elements, characters, numbers, numerals or measurements, such as referring to one or more aspects of an electronic document and/or an electronic file comprising an image, may include, as examples, time of day at which an image was captured, latitude and longitude of an image capture device, such as a camera, for example, etc. In another example, one or more parameters, values, symbols, bits, elements, characters, numbers, numerals or measurements, relevant to digital content, such as digital content comprising a technical article, as an example, may include one or more authors, for example. Claimed subject matter is intended to embrace meaningful, descriptive parameters, values, symbols, bits, elements, characters, numbers, numerals or measurements in any format, so long as the one or more parameters, values, symbols, bits, elements, characters, numbers, numerals or measurements comprise physical signals and/or states, which may include, as parameter, value, symbol bits, elements, characters, numbers, numerals or measurements examples, collection name (e.g., electronic file and/or electronic document identifier name), technique of creation, purpose of creation, time and date of creation, logical path if stored, coding formats (e.g., type of computer instructions, such as a markup language) and/or standards and/or specifications used so as to be protocol compliant (e.g., meaning substantially compliant and/or substantially compatible) for one or more uses, and so forth.
Although specific embodiments have been illustrated and described herein, any arrangement that achieves the same purpose, structure, or function may be substituted for the specific embodiments shown. This application is intended to cover any adaptations or variations of the example embodiments of the invention described herein. These and other embodiments are within the scope of the following claims and their equivalents.