This invention pertains generally to the fields of optical computations and neural networks, and in particular to methods and systems for implementing a layered neural network on an analog platform and to optically perform operations of a layered neural network, for various applications including LiDAR.
Light Detection and Ranging (LiDAR) devices can be used in applications where accurate and reliable perception of the environment is required, including autonomous driving systems and robotics. Among environment sensors, three-dimensional (3D) LiDAR devices and systems could play an increasingly important role, because their resolution and field of view can exceed those of radar and ultrasonic sensors, and they can provide direct distance measurements allowing reliable detection of many kinds of obstacles. Moreover, the robust and precise depth measurements of surroundings provided by LiDAR systems can often make them a leading choice for environmental sensing.
A typical LiDAR system operates by scanning its field of view with one or several laser beams or signals. This can be done using a properly designed beam steering sub-system. A laser beam can be generated with an amplitude-modulated laser diode emitting a near-infrared wavelength. The laser beam can then be reflected by the environment back to the scanner, and received by a photodetector. Fast electronics can filter the laser beam signal and measure differences between the transmitted and received signals, which can be proportional to a distance travelled by the signal. A range can be estimated with a sensor model based on such differences. Differences and variations in reflected energy, due to reflection off of different surface materials and propagation through different mediums, can be compensated for with signal processing.
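As a simple illustration of such a range estimate, the following sketch assumes a basic time-of-flight model in which the measured delay between transmitted and received signals is proportional to the round-trip distance; the helper name is hypothetical:

```python
# Hypothetical sketch: estimating range from the measured delay between
# transmitted and received signals, assuming a simple time-of-flight model.
C = 299_792_458.0  # speed of light (m/s)

def estimate_range(delay_s: float) -> float:
    """Range is half the round-trip distance travelled by the laser signal."""
    return C * delay_s / 2.0

print(estimate_range(1e-6))  # a 1 microsecond delay -> ~149.9 m
```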
LiDAR outputs can include unstructured 3D point clouds corresponding to the scanned environments, and intensities corresponding to the reflected laser energies. A 3D point cloud can be a collection of data points analogous to the real world in three dimensions, where each point is defined by its own position. In addition, point clouds can have canonical formats, making it easy to convert other 3D representation formats to point clouds and vice versa. A difficulty in dealing with point clouds is that for a 360-degree sweep, they can be unstructured and can typically contain around 100,000 3D points, and up to 120 points per square meter, making their processing a significant computational challenge.
Compared to two-dimensional (2D) image-based detection, LiDAR devices and 3D cameras are capable of capturing data providing rich geometric shape and scale information. The 3D data involved can provide opportunities for a better understanding of the surrounding environment, and has numerous applications in different areas, including autonomous driving, robotics, remote sensing, and medical treatment. However, unlike images, the sparsity and the highly variable point density, caused by factors such as non-uniform sampling of a 3D space, effective range of a sensor, and the relative positions of points, can make the processing of LiDAR point clouds challenging. Those factors can make the point searching and indexing operations intensive and relatively expensive. One way to tackle these challenges is to project point clouds into a 2D or 3D space, such as bird's-eye-view (i.e. BEV or top view) or a spherical-front-view (i.e. SFV or panoramic view), in order to generate a structured (e.g. matrix and/or tensor) form that can be used with standard algorithms.
Among different approaches to represent LiDAR data, a point cloud representation can preserve the original geometric information in 3D space without any discretization.
While a point cloud representation can preserve more information about a scene, the processing of such an unstructured data representation can become a challenge in LiDAR systems. One approach is to manually craft feature representations for point clouds that are tuned for 3D object detection. However, such manual design choices lack the capability of fully exploiting 3D shape information and the invariances required for detection tasks.
Conventional approaches developed to process point clouds of LiDAR systems have utilized many-core digital electronics-based signal processing units, e.g., central processing units (CPUs) and graphical processing units (GPUs), to perform the required computation. Improvements made by vendors such as NVIDIA® and AMD have involved leveraging a GPU as a low-cost massively parallel data-streaming computing platform. Accordingly, there have been a variety of functions developed and optimized for multi-core CPU and GPU environments, using specialized programming interfaces such as NVIDIA's Compute Unified Device Architecture (CUDA). As an example, the low power embedded GeForce GT 650M GPU from NVIDIA® has been investigated as a prototyping platform to implement LiDAR data processing in real-time. However, CPU- and GPU-based clusters are, in general, costly and they limit accessibility to high performance computing. In particular, the low time and energy efficiency of a GPU or CPU, and the limited memory resources of an underutilized platform, can limit the performance of proposed algorithms, even for theoretically very efficient ones.
Therefore, there is a need for methods and systems of computing that can obviate or mitigate one or more limitations of the prior art by meeting the time and computation density requirements of large data sets such as point clouds, and in particular those used for LiDAR applications.
This background information is provided to reveal information believed by the applicant to be of possible relevance to the present invention. No admission is necessarily intended, nor should be construed, that any of the preceding information constitutes prior art against the present invention.
Embodiments of the present invention can overcome processing challenges by making use of an analog computing platform to implement a layered neural network. In particular, they can overcome the challenge of processing a point cloud representing a large image, and particularly a point cloud of a LiDAR system. A computing platform implementing an analog neural network (ANN) according to an embodiment, can perform in the analog domain, i.e., the electronic and/or optical domain, and by doing so, the energy and time efficiency of data processing tasks can be significantly improved, including LiDAR data processing. Moreover, by performing related computations in the electronic and/or optical domain, an analog computing platform according to embodiments can minimize the number of data converters, i.e. analog-to-digital converters (ADC) and digital-to-analog converters (DAC), in a system such as a LiDAR system.
By implementing a layered neural network with an analog computing platform according to embodiments, computing speed and efficiency can be improved. Embodiments include analog implementation of various layers, as well as concatenations and combination of such layers.
By implementing a LiDAR system with an analog computing platform according to embodiments, image processing can be performed with increased speed and efficiency, and in real-time.
An aspect of the disclosure provides an analog computing platform operative to implement at least one layer of a neural network. Such an analog computing platform can include an interface operative to receive elements of a first matrix and elements of a second matrix in the analog domain. An analog computing platform can further include a layered neural network including at least one optical processing chip operative to optically perform multiply-and-accumulate (MAC) operations with the matrix elements in the analog domain. Such an end-to-end analog computation architecture can result in the capability of performing very large numbers of operations (per second) in the analog domain. In some embodiments, such an architecture for analog computation results in the capability of performing PMAC operations per second. In some embodiments, an interface can include at least one digital-to-analog converter (DAC) for converting elements of the first matrix and elements of the second matrix into the analog domain. In some embodiments, for example where a digital output is required, the analog computing platform further includes at least one analog-to-digital converter (ADC) operative to output the result of the MAC operations in a digital format. Accordingly, inputs, including in some embodiments training parameters of the neural network, can be supplied in the digital domain to the analog computing platform. In some embodiments, an analog computing platform can further include a summation unit operative to add bias values over the results of MAC operations. In such embodiments, at least one layer of a neural network is a convolutional layer, and the matrix elements include elements of a kernel matrix. In some embodiments, an analog computing platform can further include a summation unit operative to add bias values over the results of MAC operations, wherein at least one layer of a neural network is a fully connected layer, and the matrix elements include elements of a kernel matrix. In some embodiments, at least one layer of a neural network implemented by an analog computing platform can be a batch normalization layer, the matrix elements include learned parameters, and the results of the MAC operations are biased by a learned parameter. In some embodiments, an analog computing platform can further include a CMOS circuit, wherein at least one layer of a neural network is a max pooling layer, and the CMOS circuit includes one or more comparators configured to identify in a matrix the matrix element having the maximum value. In some embodiments, at least one layer of a neural network implemented by an analog computing platform can be an average pooling layer, the first matrix can include a number k² of elements, the second matrix can be constructed such that each of its elements is 1/k², and the MAC operations between the elements of the first matrix and the elements of the second matrix result in an average value for the elements in the first matrix. In some embodiments, an analog computing platform can further include a CMOS circuit, at least one layer of a neural network can include a rectified linear unit (ReLU) non-linear function, and the CMOS circuit can be configured to perform a ReLU non-linear function over one or more matrix elements.
In some embodiments, an analog computing platform can further include a CMOS circuit, at least one layer of a neural network can include a sigmoid function, and the CMOS circuit can be configured to perform a sigmoid function over one or more matrix elements. In some embodiments, an analog computing platform can include at least two different layers of a neural network, implemented in concatenation. In some embodiments, an analog computing platform can operate on matrix elements that include point coordinates from a point cloud. In some embodiments, an analog computing platform can operate on matrix elements including point coordinates that are Cartesian, and point coordinates can be linearly translated from previous point coordinates, such that each point of a point cloud is defined by non-negative values. In some embodiments, an analog computing platform can operate on data from a point cloud obtained with a LiDAR system. In some embodiments, the implementation of at least one layer of a neural network with an analog computing platform can be performed as part of a LiDAR system operation. In some embodiments, an analog computing platform can include at least one optical processing chip, operative to optically perform MAC operations with matrix elements in the analog domain, and an optical processing chip can have a Broadcast-and-Weight architecture that includes modulated microring resonators.
An aspect of the disclosure provides a method for realizing at least one layer of a neural network with an analog computing platform, comprising: receiving matrix elements with an interface, and optically performing multiply-and-accumulate (MAC) operations with an optical processing chip and the matrix elements; wherein the MAC operations are part of a layered neural network. In some embodiments, MAC operations with the matrix elements can be optically performed in series. In some embodiments, a method can further comprise the analog computing platform performing, with a summation unit, the addition of bias values over the results of MAC operations, wherein at least one layer of a neural network is a convolutional layer. In some embodiments, a method can further comprise the analog computing platform performing with a summation unit the addition of bias values over the results of MAC operations, and directing the results of each MAC operation to a subsequent layer; wherein the at least one layer of a neural network is a fully connected layer. In some embodiments, the matrix elements can include learned parameters, the results of MAC operations can be biased by at least one learned parameter provided by an interface, and the at least one layer of a neural network is a batch normalization layer. In some embodiments, a method can further comprise an analog computing platform that further includes a CMOS circuit with comparators configured to identify in a matrix the matrix element having the maximum value, and wherein the at least one layer of a neural network is a max pooling layer. In some embodiments, a method can further include a first matrix including a number k² of elements, a second matrix constructed such that each of its elements is 1/k², and MAC operations between the elements of the first matrix and the elements of the second matrix that result in an average value for the elements in the first matrix; wherein the at least one layer of a neural network is an average pooling layer. In some embodiments, a method can further include using a CMOS circuit configured to perform a ReLU non-linear function over one or more matrix elements, where the at least one layer of a neural network includes a ReLU non-linear function. In some embodiments, a method can further include using a CMOS circuit configured to perform a sigmoid function over one or more matrix elements, where the at least one layer of a neural network includes a sigmoid function. In some embodiments, a method can implement at least two different layers of a neural network in concatenation. In some embodiments, a method can include operating on matrix elements comprising Cartesian coordinates that were linearly translated to non-negative values.
An aspect of the disclosure provides a LiDAR system in which the processing of data is performed with a layered neural network implemented on an analog computing platform operative to optically perform at least one multiply-and-accumulate (MAC) operation with matrix elements received via an interface, the matrix elements including point cloud data from the LiDAR system.
An aspect of the disclosure provides a method of performing LiDAR operations comprising: scanning points of a physical environment, recording the scanned points as spherical coordinates, converting the spherical coordinates of data points into Cartesian coordinates, translating linearly the Cartesian coordinates of each scanned point such as to have non-negative values, defining each point coordinate as a matrix element, and processing the matrix elements with an analog computing platform operative to realize layers of a neural network.
In a typical LiDAR system, one or several laser beams or signals can be generated with an amplitude-modulated laser diode emitting a near-infrared wavelength, steered with a properly designed beam steering sub-system, reflected by the environment back to the scanner, and received by a photodetector.
Because of the large number of points a point cloud can contain, their processing can be intensive. In order to generate a structured form that can be used with standard algorithms, a point cloud can be projected into a 3D space, such as a voxel representation or a spherical-front-view (i.e. SFV, panoramic view, or range-view), or into a 2D space such as a bird's-eye-view representation (i.e. BEV or top view), and coordinates can be structured as a matrix or a tensor.
A major breakthrough in recognition and object detection tasks was due to moving from hand-crafted feature representations, to machine-learned feature extraction methods. Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. Among different deep neural networks, convolutional neural networks (CNNs) have been shown to be very accurate in many image recognition tasks such as image classification, object detection and in particular person detection. However, deep learning on 3D point clouds still faces several significant challenges related to the small scale of datasets, the high dimensionality of 3D point clouds, and their unstructured nature.
Deep Neural Networks (DNN) have been shown to be powerful tools for many vision tasks. In particular, they have been considered as an opportunity to improve the accuracy and processing time of point cloud processing in LiDAR systems. Numerous methods have been proposed to address different challenges in point cloud processing, regarding efficiencies in time and energy, which are required in real-time tasks such as object detection, object classification, segmentation, etc. Some of the proposed approaches involve converting an unstructured point cloud into a structured grid, and others exploit the exclusive benefits of deep learning over a raw point cloud, without the need for conversion to a structured grid.
Despite the fast growth of DNNs in object detection over datasets having a large number of object classes, real-time visual object detection and classification in a driving environment is still very challenging, because of the speed and accuracy required to meet a real-world environment. A main challenge in processing a point cloud is having sufficient computing power and time efficiency. The running of algorithms over large datasets is intensive, and computational complexity can grow exponentially with an increase in the number of points. In order to effectively process a large dataset, a fast and efficient processor is required.
When a layered neural network is used for processing a point cloud in a LiDAR application, the computational cost largely arises from the large-size matrix multiplications that have to be performed in each layer of the neural network. The number of layers typically increases as the complexity of the tasks being performed by a network increases, and therefore so does the number of matrix multiplications.
In general, the number of applications for neural networks, the size of datasets from which they are configured (i.e. trained), and their complexity, are increasing, and by some accounts, exponentially so. Accordingly, the digital-based processing units that have been used for LiDAR point cloud processing tasks, such as GPUs, are also facing challenges in supporting the ever-increasing computation complexity of point cloud processing.
One challenge is that a GPU cannot be used as a standalone device for hardware acceleration. This is because a GPU depends on a CPU for data offloading, and for the scheduling of algorithm executions. The execution time of data movement and algorithm scheduling can be considerable in comparison with computation time. Although parallel processing in a GPU can play an important role in computation efficiency, it is mainly beneficial for small to moderate amounts of computation, e.g., image sizes smaller than 150×150 pixels. Larger images can yield an increased execution time, partly because a single GPU does not have enough processors to handle all pixels at the same time (and because of other memory read/write constraints). Because per-pixel computations are not parallelized, the processing time can exhibit an approximately linear dependence on the mean number of active bins per pixel.
Embodiments of the present invention can overcome processing challenges by making use of an analog deep neural network. In particular, they can overcome the challenge of processing a point cloud representing a large image, and particularly a point cloud of a LiDAR system. A computing platform implementing an analog neural network (ANN) according to an embodiment, can perform in the analog domain, i.e., the electronic and/or optical domain, and by doing so, the energy and time efficiency of data processing tasks can be significantly improved, including LiDAR data processing. Moreover, by performing related computations in the electronic and/or optical domain, a computing platform according to embodiments can minimize the number of data converters, i.e. analog-to-digital converters (ADC) and digital-to-analog converters (DAC), in a system such as a LiDAR system.
The challenges and limitations of digital-electronic processing units in providing the time and energy efficiency required by LiDAR technology applications highlight the demand for a fast, energy-efficient, and high-performance approach that can be employed in LiDAR large-size point cloud processing. In embodiments, an analog platform such as one with a hybrid CMOS-photonics architecture, can be utilized to implement an analog neural network (ANN), and in particular for point cloud processing in a LiDAR system.
An ANN according to embodiments can be based on an analog implementation of multiply-and-accumulate (MAC) operations. For instance, a MAC operation can be implemented using a photonics-based Broadcast-and-Weight (B&W) architecture. An optical B&W architecture utilizes wavelength division multiplexing (WDM) and an array of microring modulators (MRM) to implement MAC operations in an optical or photonic platform. Because the bandwidth of a photonic system can be very large, i.e. in the THz range, a photonic implementation of MAC operations can offer significant potential improvements over digital electronics in energy (a factor >10²), speed (a factor >10³), and compute density (a factor >10²). Considering that a neural network can process very large numbers of matrix-to-matrix multiplications, and MAC operations with very large matrices, optical neural networks according to embodiments offer the benefits of high optical bandwidth and lossless light propagation, when performing computations, and offer orders of magnitude improvements in terms of energy, speed, and compute density, as compared to neural networks based on digital electronics (i.e., GPU and TPU).
Embodiments include the implementation of an analog neural network on a photonics-based computing platform, such as an optical neural network (ONN) based on a hybrid CMOS-photonics system. Example embodiments will be discussed with reference to examples of a LiDAR system, but it should be appreciated that the invention is not limited to LiDAR systems. Optical neural network layers according to embodiments can be utilized in an application to process point clouds, whether or not they are ordered, and in various kinds of 2D or 3D structures, such as those from a LiDAR system.
A system according to an embodiment can process a point cloud as can be generated using 3D laser scanners and LiDAR systems and techniques. A point cloud is a dataset representing a large number of individual spatial measurements, typically collected by an instrument system. If the instrument system is a LiDAR system, a point cloud can include points that lie on many different surfaces in the scanned view. Each point can represent a single laser scan measurement corresponding to a location in 3D space. It can be identified using a local coordinate system of the scanner, such as a spherical coordinate system, and be transformed and recorded as Cartesian coordinates relative to an origin, i.e. (x, y, z).
To transform a spherical coordinate into a Cartesian coordinate, a transformation such as the following can be applied:

x = r cos(ϵ) cos(α)

y = r cos(ϵ) sin(α)

z = r sin(ϵ)
where:
r is the range of distance from the scanner to a surface,
α is an azimuthal angle from a reference vertical plane,
ϵ is an elevation angle from a reference horizontal plane.
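For illustration, a minimal sketch of this transformation (the function name is hypothetical):

```python
import math

def spherical_to_cartesian(r: float, azimuth: float, elevation: float):
    """Convert a scanner measurement (r, α, ϵ) to Cartesian (x, y, z)."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z

print(spherical_to_cartesian(10.0, math.radians(30.0), math.radians(10.0)))
```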
In a case where intensity information is present, a point cloud can have four dimensions (4D), i.e. (x, y, z, i).
The perception required by a LiDAR application can be obtained by processing the information such as points of the LiDAR's environment as captured by a LiDAR system, e.g. spatial coordinates, distance, intensity, etc. A deep neural network can then be used to process the data points into images and perform tasks such as object recognition, classification, segmentation and more.
The processing of a point cloud can require a very large amount of GPU memory and processing capability, and because of limitations of digital electronics in detection speed, power consumption, and accuracy, a deep neural network of the prior art can be limited and insufficient for some applications.
Embodiments include a photonics-based (i.e. optical) computing platform on which MAC operations can be implemented with a B&W architecture. In a B&W architecture, different optical wavelengths propagate on separate waveguides. They are weighted by separate modulated MRMs, and transmitted back to the same waveguide. The signals on all wavelengths can be accumulated by detecting the total optical power from all wavelengths, using a balanced photodetector. Using such a platform, a multiplication between vectors, including vectorized matrices where a matrix is created from a point cloud, can be performed.
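As a purely numeric illustration (not a device model), the weighted accumulation performed by such a platform reduces to a dot product between the input intensities and the MRM weights; the helper name below is hypothetical:

```python
# Illustrative numeric model of a B&W link: each input is carried on its own
# wavelength, each MRM applies a transmission weight, and the balanced
# photodetector accumulates the weighted signals, yielding a dot product.
def bw_dot_product(intensities, weights):
    assert len(intensities) == len(weights)
    return sum(p * w for p, w in zip(intensities, weights))

print(bw_dot_product([0.2, 0.5, 0.1], [0.9, -0.4, 0.3]))
```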
Embodiments include a generic optical platform operative to implement MAC operations for different layers of a trained neural network, particularly for processing point clouds, and in particular for processing point clouds as used in LiDAR applications.
In an embodiment, a generic optical platform can be used for an inference phase of a neural network, where trainable variables such as weights and biases, have already been obtained and recorded in a (digital) memory. In an inference phase, digital-to-analog converters (DACs) can be utilized to import into an optical platform the weights and biases of each layer, for layer computations to be performed in the analog domain, i.e. optically with a generic optical platform.
In addition to computing the layers of a neural network, mathematical operations such as non-linear activation functions, summation, and subtraction can also be realized with an analog computing platform, such as an optical computing platform coupled with an analog electronic processor of an embodiment. For example, some embodiments include integrated electronic circuits coupled with an optical computing platform. Accordingly, DAC 505 and DAC 515 in an optical platform of an embodiment can be used for converting trained values of weights and biases of each layer, as recorded in a digital memory, to the analog domain, such that if required, they can be applied as modulation 525 and weight bank 535 voltages to the MRMs. Similarly, an ADC 545 can be used for converting a final (analog) result into the digital domain. In some embodiments, reading from or writing to digital memory is reduced and is limited to reading the input and weight values from the digital memory, and hence the usage of DACs and ADCs in the architecture is minimized. Such an end-to-end analog computation architecture results in the capability of performing very large numbers of operations (per second) in the analog domain. In some embodiments, such an analog computation architecture results in the capability of performing PMAC operations per second.
In an embodiment, optical and electrical signals can implement the layers of a neural network without requiring a digital interface. The removal of such analog-to-digital and digital-to-analog conversions can lead to significant improvements in time and energy efficiency of applications, in particular in applications such as LiDAR, where the data itself can often be generated in an analog fashion. An optical platform according to an embodiment can make use of a hybrid CMOS-photonics architecture to process point cloud data from a LiDAR scanning system.
In embodiments, an optical platform can be used to implement neural network layers of a PointNet architecture, which is a neural network architecture that can be used for many applications, including but not limited to LiDAR applications. For instance, a PointNet architecture can include convolutional layers, batch normalization layers, pooling layers and fully-connected (dense) layers, and embodiments include the optical implementation of these layers, as well as other customized layers, on an optical platform according to embodiments as described.
A PointNet architecture can be subdivided into portions 710, one of which for example is referred to as a T-Net portion 715. A T-Net portion can include a convolution layer with a size 64 multiplication 720, a convolution layer with a size 128 multiplication 725, and a convolution layer with a size 1024 multiplication 730. It can also include a max pooling layer 735, a size 512 fully-connected (FC) layer 740, and a size 256 fully-connected (FC) layer 745. Trainable weights 750 and trainable biases 755 can also be applied with a multiplication 760 and an addition 765 respectively, to provide a resulting vector 775 representing a processed initial vector 780.
Embodiments include the implementation of a convolutional layer with an optical platform according to embodiments. Similar to 2D image processing, a neural network used for point cloud processing can include many layers, where a convolutional layer is one of the main layers. In an embodiment, a convolutional layer can be implemented optically using an optical platform according to an embodiment. Generally, a convolutional layer can involve separate channels of calculation, each channel for processing a separate input matrix. For example, in image processing, when an image is defined by the red-green-blue (RGB) color model, each color of red, green and blue can be processed through a different one of three channels of a convolutional layer. In each channel, an input matrix can be processed by undergoing a sequence of convolution operations with a respective one of three kernel matrices. Each one of the three channels can produce a scalar, and the three scalars can be summed and recorded as a single element of an output matrix.
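For illustration, a minimal sketch of how one output element can be produced from three channels, as described above; the random values stand in for image data and all names are illustrative:

```python
import numpy as np

# Sketch of one output element of a convolutional layer: each input channel
# is convolved with its own kernel, each channel yields a scalar, and the
# scalars are summed into a single element of the output matrix.
rng = np.random.default_rng(0)
patches = rng.random((3, 3, 3))   # one 3x3 patch per R, G, B channel
kernels = rng.random((3, 3, 3))   # one 3x3 kernel per channel

out_element = sum(
    float(np.sum(patches[c] * kernels[c]))  # MAC over one channel
    for c in range(3)
)
print(out_element)
```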
A B&W protocol, as described above, can be used to implement such channel computations optically.
An optical platform according to embodiments can perform multiplication operations of a convolution operation. In order to implement summations as well, such as those required to add bias values following multiplication operations, the analog computing platform according to embodiments can further include an analog CMOS-based electronic summation unit.
A simple two-stage circuit can be seen as a combination of two CMOS inverters that have different ratios of NMOS versus PMOS gate lengths, which yields shifted DC characteristics. When Vin is low and both outputs are high, transistor M1 is inactive such that Vout1 transitions with increasing Vin according to a CMOS inverter characteristic with one NMOS device and two series PMOS devices. In contrast, when Vin is high and both outputs are low, M2 is inactive such that Vout2 transitions with decreasing Vin according to a CMOS inverter characteristic with two series NMOS devices and one PMOS device. Since Vout1 cannot transition high unless Vout2 is also high, and Vout2 cannot transition low unless Vout1 is also low, the circuit provides guaranteed monotonicity in the quantizer characteristic regardless of the presence of mismatch. The number of outputs can be readily increased, for an n-output example. The outputs are summed together by means of a summing amplifier, which is used to combine the voltages present on two or more inputs into a single output voltage.
In order to complete a convolution operation over an input matrix 805, which can include an image matrix for each of the kᵢ₊₁ channels of a filter, the analog architecture, including the optical computation core and the electronic summation unit, can be utilized a number of times equal to kᵢ₊₁(n−f+1). The final result of a convolutional layer can be recorded in a non-volatile analog memory device so that it can be utilized by a subsequent layer. In an optical platform according to embodiments, the use of an analog memory device can make analog-to-digital conversion unnecessary.
A fully-connected (i.e. dense) layer is a neural network layer in which each input is connected to an activation unit of a subsequent layer. In many models of machine learning, the final layers can be fully-connected layers operative to compile data extracted from previous layers and to produce a final output. After convolutional layers, fully-connected layers can be the second most time-consuming layers of a neural network computation.
In an embodiment, a fully-connected layer i can have kᵢ neurons, an input matrix can be of size (n×kᵢ), and a trainable weight matrix can be of size (kᵢ×kᵢ₊₁), where kᵢ₊₁ denotes the number of neurons in the next layer, i+1. An optical implementation of a fully-connected layer can include kᵢ₊₁ parallel wavelength channels with kᵢ (all-pass) MRMs to implement the elements of each row of the input matrix, and kᵢ (add-drop) MRMs to implement corresponding elements of the columns of the weight matrix. In an embodiment, bias values can be added after multiplication using an electronic summation unit 905, as described previously, operative to perform summation operations. In order to complete a computation over a complete input matrix, a fully-connected layer including a summation unit can be utilized one or more times, such that each time, a portion of the computation, as supported by the computation unit, can be completed.
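A minimal numeric sketch of this fully-connected computation, assuming the dimensions given above; all names and values are illustrative:

```python
import numpy as np

# Sketch of the fully-connected computation: MACs between the input matrix
# and the weight matrix, with biases added by the electronic summation unit.
n, k_i, k_next = 4, 8, 5
rng = np.random.default_rng(1)
X = rng.random((n, k_i))          # input matrix, size (n x k_i)
W = rng.random((k_i, k_next))     # trained weights, size (k_i x k_{i+1})
b = rng.random((k_next,))         # trained biases

Y = X @ W + b                     # MACs performed optically, bias added electronically
print(Y.shape)                    # (4, 5)
```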
In a neural network, a batch normalization layer can be utilized for normalizing data processing by the network. Batch normalization refers to the application of a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. This can have the effect of stabilizing a learning process and significantly reducing the number of training epochs required to train a deep network.
In a batch normalization layer, when a network is used in an inference phase, each element of an input y can be normalized with a learned mean parameter μ and a learned variance parameter σ² to produce ŷ, a normalized version of input element y:

ŷ = (y − μ)/√(σ² + ε)

where ε is a small constant added for numerical stability. The normalized value ŷ can be scaled with learned parameter γ and shifted with learned parameter β to produce z:

z = γŷ + β

In order to implement such a batch normalization layer in an optical platform according to an embodiment, the above steps can be summarized as:

z = α̂y + β̂, where α̂ = γ/√(σ² + ε) and β̂ = β − α̂μ

In vector form, this can be expressed as:

z = α̂ ⊙ y + β̂

the elements of vector α̂ being derived from the learned parameters γ and σ² (one in each dimension), and the elements of vector β̂ being derived from the learned parameters β, γ, μ and σ² (one in each dimension).
These steps can allow implementation of a batch normalization computation as a multiplication using an optical B&W protocol as described above, together with an electronic addition.
A batch normalization layer with an input matrix X of size (n×kᵢ), and hence vectors α̂ and β̂ of size (1×kᵢ), can be implemented with an optical platform having kᵢ parallel waveguide channels, each one including an all-pass MRM to realize each element of its input, and an add-drop MRM to represent each element of α̂. The elements of β̂ can be added at the end, using a CMOS-based summation unit 905. A batch normalization over an entire batch of data (i.e. a point cloud) can be completed by using an optical platform of an embodiment a plurality of times, such as n times. An optical platform of an embodiment implementing a batch normalization layer is illustrated in the accompanying drawings.
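A short sketch illustrating, under standard batch normalization definitions, that the folded parameters α̂ and β̂ reproduce the direct computation; all names and values are illustrative:

```python
import numpy as np

# Sketch verifying that inference-time batch normalization folds into a
# single multiply (alpha_hat) and add (beta_hat), matching the optical
# implementation described above.
rng = np.random.default_rng(2)
y = rng.random(6)
mu, var = rng.random(6), rng.random(6) + 0.1
gamma, beta = rng.random(6), rng.random(6)
eps = 1e-5

z_direct = gamma * (y - mu) / np.sqrt(var + eps) + beta

alpha_hat = gamma / np.sqrt(var + eps)   # applied by the add-drop MRMs
beta_hat = beta - alpha_hat * mu         # added by the summation unit
z_folded = alpha_hat * y + beta_hat

print(np.allclose(z_direct, z_folded))   # True
```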
In a neural network, and especially in a neural network that includes one or more convolutional layers, a pooling layer can be used to progressively reduce the spatial size of a data (e.g. point cloud) representation, in order to reduce the number of parameters and the amount of computations in the network. A pooling layer can have a filter with a specific size, with which a spatial size reduction can be applied to the input. Embodiments include the implementation of a pooling approach referred to as “max pooling”, and the implementation of a pooling approach referred to as “average pooling” with an optical platform according to embodiments.
A max pooling layer with a kernel size of (k×k) can be used to compare the elements of a partition of an input matrix having the same size as the kernel matrix, and to select the element in the partition having the maximum value. Implementation of a max pooling layer with a kernel size of (k×k) can be performed by using k electronics-based comparators to find the maximum value among the k² elements of each (k×k) partition of an input matrix. The size of the complete input matrix can determine how many times a max pooling layer architecture should be used.
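A minimal sketch of the (k×k) max pooling reduction described above; the helper name is hypothetical:

```python
import numpy as np

# Sketch of (k x k) max pooling: each k x k partition of the input is reduced
# to its maximum element, as the analog comparators would do.
def max_pool(x: np.ndarray, k: int) -> np.ndarray:
    h, w = x.shape
    trimmed = x[:h - h % k, :w - w % k]
    return trimmed.reshape(h // k, k, w // k, k).max(axis=(1, 3))

print(max_pool(np.arange(16.0).reshape(4, 4), 2))  # [[5. 7.] [13. 15.]]
```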
In a case where Vin1 1247, Vin2 1249, and Vinn 1251 correspond to different branches and Vin1 > Vin2 > . . . > Vinn, transistors MI1, MF1 and MS1 operate in the saturation region, while the devices of the other branches, MIi 1230, MFi and MSi, operate in the cut-off, triode and cut-off regions, respectively. The drain-source voltage of each MFi device decreases almost to zero, and the output current, Iout, becomes a copy of the current of the winning input branch, which is equal to 0.5 Ib. The currents of the other branches are almost zero, causing the currents of Mout 1245 and MI1 1230 to equalize, such that VMAX = Vin1.
The functionality of an average pooling layer with a kernel size of (k×k) is similar to that of a max pooling layer, except that it selects the average of the k² elements of the input matrix partition under computation. In order to implement average pooling with an optical computing platform according to an embodiment, average pooling can be transformed into a weighted summation operation. In that regard, to implement an average pooling layer with a kernel size of (k×k), the scalar 1/k² can be multiplied with each element of the corresponding (k×k) partition of the input matrix, and the resulting values can then be accumulated using a photodetector. Hence, an architecture can include k² parallel waveguide channels, each one including one all-pass MRM and one add-drop MRM. The size of an input matrix can determine how many times an optical platform is to be utilized.
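A short numeric illustration of recasting average pooling as a 1/k² weighted summation, the MAC pattern the photodetector accumulates; the values are illustrative:

```python
import numpy as np

# Average pooling as a weighted summation: each element of a (k x k)
# partition is multiplied by 1/k^2 and the products are accumulated.
k = 2
partition = np.array([[1.0, 3.0], [5.0, 7.0]])
weights = np.full((k, k), 1.0 / k**2)
print(np.sum(partition * weights))  # 4.0, the average of the partition
```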
In a neural network, non-linearity can be provided by one or more activation layers. The output of an activation layer can be generated by applying non-linear functions to its input. As should be appreciated by a person skilled in the art, some of the widely-used activation functions include rectified linear unit functions (ReLU functions) and sigmoid functions. These functions can be performed by means of specially designed analog electronic circuits. They can be fabricated on a separate CMOS chip and be integrated with an optical platform according to embodiments. Since outputs from optical neural network layers are in the electronic domain, it is beneficial to implement activation functions electronically, as this prevents conversions from electronics to optics before activation layers are applied.
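For reference, software definitions of these two activation functions (in the embodiments they are realized by analog CMOS circuits rather than software):

```python
import numpy as np

# Reference definitions of the activation functions mentioned above.
def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(relu(np.array([-1.0, 2.0])), sigmoid(np.array([0.0])))  # [0. 2.] [0.5]
```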
A first current mirror can be made from a pair of back-to-back n-channel transistors (M7, M8) with their input ports connected in parallel with an input reference current Iref. A differential amplifier can be made from CMOS devices, wherein the p-channel MOSFETs (M5, M6) and n-channel MOSFETs (M3, M4) have the same small-signal model, exhibiting controlled current behavior. The two p-channel MOSFETs (M5, M6) are used as load devices, and the two n-channel MOSFETs (M3, M4) are used as driven devices. The amplifier has two inputs, connected to the output voltage Vs of a resistor circuit section and to a voltage source VCC/2, respectively. It has one current output Iout, which is a differential current of the differential amplifier.
A second current mirror can be made from a pair of back-to-back n-channel transistors (M7, M9) with their input ports connected in parallel with the input reference current Iref. A third current mirror can be made from two pairs of back-to-back p-channel transistors (M10, M11, M12, and M13) with their input ports connected in parallel. It has an input reference current Io9, provided by the replicated current of the second current mirror, and an output current Io13, which is a current replicated from the input reference current Io9 of the third current mirror. Finally, an output current Iout is the sum of the output current Io13 of the third current mirror and the output current I1 of the differential amplifier.
In embodiments, different neural network layers implemented with an optical platform as described can be concatenated to each other to construct an optical neural network operative to process data, including a point cloud as used in LiDAR applications. As an example, an optical platform of an embodiment can implement a convolutional layer 910, a batch normalization layer 1120, and an activation layer, concatenated in series and operative to process a point cloud from data generated by a LiDAR system.
A batch normalization layer 1120 can include a scalar-matrix chip 1620 for multiplying an analog input 1105 and a learning parameter 1110, and a summation unit 1625 to perform additions of learning parameters 1115 as required for normalization. Data and learning parameters can be provided by a digital memory 1605 and additional DACs 1630. Results can be recorded in an analog memory block 1635 for use in a subsequent, concatenated layer such as a non-linear activation block 1640. In an embodiment, a non-linear activation block 1640 can be operative to realize a ReLU function 1405. In another embodiment, a non-linear activation block 1640 can be operative to realize a sigmoid function 1505.
An optical platform can include a portion 1605 implementing a convolutional layer, a batch normalization layer, and activation layers, and can further include an analog memory 1845, in order to record the result of data having been processed by all layers of the optical platform and to make it readily available for further processing.
An optical platform having a concatenated architecture according to an embodiment can be utilized to implement architectures such as PointNet and SalsaNet, or portions thereof, as well as many other neural network architectures. As an example, a portion of the PointNet architecture is referred to as the T-Net portion, and an embodiment can be used to perform its function optically.
As an example, a multiplication involving a matrix of size 64×64 1710 can be performed by sharing a first portion 1715 of an optical platform realizing a convolutional layer, a batch normalization layer, and activation layers. A multiplication involving a matrix of size 128×128 1720 can be performed with a second portion 1725 realizing a convolutional layer, a batch normalization layer, and activation layers, and a multiplication involving matrices of size 1024×1024 1730 can be performed with a third portion 1735 realizing a convolutional layer, a batch normalization layer, and activation layers.
An optical platform having a concatenated architecture according to an embodiment can be utilized to implement LiDAR processing, or portions thereof. As an example, optical platforms according to embodiments can be used to process data in a LiDAR system. To do so, an optical platform according to embodiments can further include a Field Analog Vision Module, and a Digital Memory.
Embodiments include an optical computing platform for implementing neural networks required for point cloud processing in LiDAR applications. By exploiting the high bandwidth and lossless propagation of optical signals, embodiments can allow significant improvements in time and energy efficiency over digital electronics-based processing units of the prior art such as CPUs and GPUs. Such improvements are possible because optical signals can have a spectral bandwidth of 5 THz, which can provide information at 5 Tb/s for each spatial mode and polarization.
Also, computations in the optical domain can be performed with minimal, or theoretically even zero, energy consumption, in particular for linear or unitary operations.
Moreover, photonic devices do not have the problem of data movement and clock distribution time along metal wires, and the number of photonic devices required to perform MAC operations can be small, greatly reducing computing latency.
Furthermore, a photonic computing system according to embodiments improves over an all-optical network, because it is based on amplitude and does not require phase information. Hence, the problem of phase noise accumulation can be eliminated. Also, because the Broadcast-and-Weight protocol is not limited to a single wavelength, its use in an embodiment can increase the overall capacity of a system.
In summary, compared to digital electronics of the prior art, a photonic MAC system according to embodiments can potentially offer significant improvements in energy efficiency (up to a factor of >10²), computation speed (up to a factor of >10³), and compute density (up to a factor of >10²). These figures of merit are orders of magnitude better than the performance achievable by digital electronics.
In an optical network implementing a B&W protocol according to embodiments, input values can be mapped as intensities of light signals, which are positive values. However, because data provided by a point cloud is based on the position of different points as defined by a coordinate system, e.g. Cartesian coordinates, the input data to a neural network can include negative values. In order to support inputs having arbitrary values, an embodiment can include a pre-processing step by which the points of an input point cloud obtained by a LiDAR system can be linearly transformed, such that each point can be mapped onto a positive-valued point with Cartesian coordinates.
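A minimal sketch of such a pre-processing step, translating each axis by its minimum value so that all coordinates become non-negative; the helper name is hypothetical:

```python
import numpy as np

# Sketch of the pre-processing step: translate a point cloud so every
# Cartesian coordinate is non-negative, preserving relative positions.
def to_non_negative(points: np.ndarray) -> np.ndarray:
    """points: (N, 3) array of (x, y, z) coordinates."""
    return points - points.min(axis=0)

cloud = np.array([[-2.0, 1.0, -0.5], [3.0, -4.0, 2.0]])
print(to_non_negative(cloud))  # all values >= 0
```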
Such linear mapping does not change the relative positions of different points, and therefore, for most computation tasks performed in a LiDAR application, such as object detection and part segmentation, linear mapping does not affect point cloud processing and a network's output. In such tasks, and in many others, the hidden (middle) layers of a neural network can include a ReLU function as a non-linear activation function, and this can guarantee a positive-valued output, and hence a positive-valued input for the next layer. Accordingly, inputs for middle layers can be positive, and using a linear transformation once for an input layer can be sufficient.
In an optical platform implementing a neural network according to embodiments, negative-valued inputs can be used. In applications, including LiDAR applications, the coordinates of different points of a point cloud can be processed by different neural networks, and the coordinates can have positive and negative values. Embodiments can therefore be used for applications requiring arbitrary values as inputs, including LiDAR applications.
Optical platforms implementing neural networks according to embodiments include the implementation of generic analog neural networks, i.e. electronics- and photonics-based neural networks. A neural network based on electronics and photonics can be implemented with an optical platform that further includes electronic components, e.g. a hybrid CMOS-Photonics architecture.
In an embodiment, the neural network computations required for processing point clouds of a LiDAR system can be performed with a hybrid CMOS-Photonics architecture. In particular, matrix multiplications can be performed on a photonics-based computing platform with a B&W architecture, and other computation steps such as summation, subtraction, comparison, and activation functions in neural network layers, can be implemented using electronics-based components. Since a LiDAR architecture can be modified to have an interface appropriate for processing data in the analog domain, a photonics-based neural network according to an embodiment can be an analog neural network (ANN) capable of processing LiDAR-generated data.
Since a neural network mainly performs matrix-to-matrix multiplications, an analog architecture that is capable of realizing those multiplications, while meeting the latency and power requirements, can also be utilized to develop an analog neural network. One or more memristor-based photonic crossbar arrays can be used, where matrices can be realized using a phase-change-material (PCM) memory array and a photonic optical frequency comb, and computation can be performed by measuring the optical transmission of passive optical components. Alternatively, an integrated photonics-based tensor core can be used, where wavelength division multiplexed input signals in the optical domain are modulated by high-speed modulators, propagated through a photonic memory, and weighted in a quantized electro-absorption scheme. Considering the nature of such a task, the size of the dataset that should be processed, and the time and power requirements, other types of ANNs can be integrated with a LiDAR system.
With an embodiment, the layers of a neural network can be implemented in the analog domain, and hence, in an architecture according to embodiments, the capabilities of both an electronic and an optical computing platform can be exploited. Moreover, because any part of a neural network can be performed as an analog computation, digital-to-analog or analog-to-digital data conversions are not necessarily required for computations to be performed. Accordingly, the number of ADCs and DACs can be minimized, which can result in a significant improvement in the power consumption of a LiDAR system according to embodiments. Indeed, although embodiments have been discussed with respect to a system which utilizes ADCs and DACs (as the system receives digital inputs, and produces digital outputs), it should be appreciated that other embodiments do not need the ADCs and DACs if the inputs and outputs are analog.
An optical platform implementing layers of a neural network according to embodiments can be utilized to implement a neural network instead of a GPU or a CPU. By implementing a plurality of layers with one or more optical platforms according to embodiments, an embodiment can implement feedforward neural networks (FFNN), convolutional neural networks (CNN), and other deep neural networks (DNN).
A platform according to embodiments can implement an inference phase of a neural network. This means that trainable parameters of a layer, such as weights and biases, can be pre-trained, and an optical platform according to embodiments can obtain and use the weights and biases to apply an inference phase over the inputs. However, a similar platform can also be used for a forward propagation step in a training phase of neural networks. Because an optical platform according to embodiments has a higher bandwidth and higher energy efficiency than platforms of the prior art, it can be used to facilitate training in applications that require training in real-time.
The use of an optical platform according to embodiments in a feedforward step of a training phase can be similar to its use in an inference phase. A significant difference is that in contrast to an inference phase, where weights and biases of each layer remain constant, the weight applied to each layer in a training phase can change with each individual batch of data (e.g. each point cloud).
The training of neural networks in an application relying on fast and accurate perception of environmental dynamics, such as LiDAR systems, can be intensive and difficult. However, the use of an optical platform according to embodiments to perform point cloud processing can significantly improve the time and energy efficiency of such applications. In particular, the high bandwidth and energy efficiency of an optical platform according to an embodiment can improve the total efficiency of a processing system, and sufficiently so to allow training in real-time.
An optical platform according to an embodiment can be implemented with different numbers of wavelengths. By increasing the number of wavelengths, the number of MRMs also increases. This can increase a computation rate but at the expense of making a control circuitry and an optical platform more complex. There can be limits to the number of wavelengths and MRMs on a single chip and they can be defined based on technical and theoretical considerations.
Embodiments include a platform to implement neural networks.
A platform according to embodiments can include a processing step to support point clouds having negative-valued Cartesian coordinates. A limitation of B&W architecture can be addressed by a processing step in which a point cloud is linearly transformed such that each point can be described with positive-valued coordinates. Because a transformation according to embodiments does not change the relative position of cloud points, tasks that are related to the objects, such as object detection or classification, can be performed as required for LiDAR and other applications.
Embodiments can be used for implementing neural networks in a wide variety of applications. For example, deep neural networks that have been developed for addressing different problems in the next generations of wireless communications, i.e., 5G and 6G, can be implemented using an optical computing platform according to embodiments. In particular, an optical neural network platform according to embodiments can be beneficial for ultra-reliable, low-latency, massive MIMO systems, where low latency of transmission and computation are required.
A CNN can be a deep neural network that can include a convolutional structure. The CNN can include a feature extractor that can consist of a convolutional layer and a sub-sampling layer. The feature extractor may be considered to be a filter. A convolution process may be considered as performing convolution on an input image or a convolutional feature map by using a trainable filter. The convolutional layer may indicate a neural cell layer at which convolution processing can be performed on an input signal in the CNN. The convolutional layer can include one neural cell that can be connected only to neural cells in some neighboring layers. One convolutional layer usually can include several feature maps, and each of these feature maps may be formed by some neural cells that can be arranged in a rectangle. Neural cells at the same feature map can share one or more weights. These shared weights can be referred to as a convolutional kernel by a person skilled in the art. The shared weight can be understood as being unrelated to a manner and a position of image information extraction. A hidden principle can be that statistical information of a part may also be used in another part. Therefore, in all positions on the image, the same image information obtained through learning can be used. A plurality of convolutional kernels can be used at the same convolutional layer to extract different image information. Generally, a larger quantity of convolutional kernels can indicate that richer image information can be reflected by a convolution operation. A convolutional kernel can be initialized in a form of a matrix of a random size. In a training process of the CNN, a proper weight can be obtained by performing learning on the convolutional kernel. In addition, a direct advantage that can be brought by the shared weight is that a connection between layers of the CNN can be reduced and the risk of overfitting can be lowered.
In the process of training a deep neural network, to enable the deep neural network to produce a predicted value that can be as close as possible to a desired value, the predicted value of a current network and a desired target value can be compared, and a weight vector of each layer of the neural network can be updated based on the difference between the predicted value and the desired target value. An initialization process can be performed before the first update. This initialization process can include a parameter that can be preconfigured for each layer of the deep neural network. As a non-limiting example, if the predicted value of a network is excessively high, a weight vector can be adjusted to reduce the predicted value. This adjustment can be performed multiple times until the neural network can predict the desired target value. This adjustment process is known to those skilled in the art as training a deep neural network using a process of minimizing loss. The loss function and the objective function are mathematical equations that can be used to determine the difference between the predicted value and the target value.
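A minimal sketch of this adjustment loop, assuming a squared-error loss on a linear predictor; all names and values are illustrative:

```python
import numpy as np

# Sketch of the update loop described above: compare the prediction with the
# target, and adjust the weight vector to reduce the difference (gradient
# descent on a squared-error loss).
w = np.array([0.5, -0.3])
x, target = np.array([1.0, 2.0]), 1.5
lr = 0.1

for _ in range(100):
    pred = w @ x
    grad = 2 * (pred - target) * x   # d(loss)/dw for loss = (pred - target)^2
    w -= lr * grad

print(w @ x)  # close to 1.5
```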
CNNs can use an error back propagation (BP) algorithm in a training process to revise the values of the parameters of an initial super-resolution model, so that the reconstruction error loss of the super-resolution model is reduced. An error loss is generated in the process of forward propagation from an input signal to an output signal. The parameters of the initial super-resolution model can be updated through back propagation of the error loss information, so that the error loss converges. The back propagation algorithm is a backward pass, dominated by the error loss, intended to obtain optimal parameters of the super-resolution model, such as, as a non-limiting example, a weight matrix.
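A minimal back propagation sketch (an assumed two-layer network with a tanh non-linearity, not the super-resolution model itself) shows how the error loss measured at the output is propagated backward to update the weight matrices of both layers:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3)) * 0.5   # layer-1 weight matrix
W2 = rng.normal(size=(1, 4)) * 0.5   # layer-2 weight matrix
x = np.array([0.2, -0.4, 0.7])       # input signal (hypothetical)
target = np.array([1.0])
lr = 0.05

for _ in range(500):
    h = np.tanh(W1 @ x)                   # forward propagation
    y = W2 @ h                            # output signal
    err = y - target                      # error loss at the output
    gW2 = np.outer(err, h)                # gradient w.r.t. W2
    gh = W2.T @ err                       # error propagated back to the hidden layer
    gW1 = np.outer(gh * (1 - h ** 2), x)  # through the tanh derivative to W1
    W2 -= lr * gW2                        # updates dominated by the error loss
    W1 -= lr * gW1
print((W2 @ np.tanh(W1 @ x)).item())      # approaches the target 1.0
```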
Target model/rule 1401 can be obtained through training by training device 1420. Training device 1420 can be applied to different systems or devices. As a non-limiting example, training device 1420 can be applied to an execution device 1410. Execution device 1410 can be a terminal, such as, as non-limiting examples, a mobile terminal, a tablet computer, a notebook computer, an AR/VR device, or an in-vehicle terminal, or can be a server, a cloud end, or the like. Execution device 1410 can be provided with an I/O interface 1412 configured to perform data interaction with an external device. A user can input data to the I/O interface 1412 via customer device 1440.
A preprocessing module 1413 can be configured to perform preprocessing based on the input data received from the I/O interface 1412.
A preprocessing module 1414 can be configured to perform preprocessing based on the input data received from the I/O interface 1412.
Embodiments of the present disclosure can include a related processing process in which execution device 1410 performs preprocessing of the input data, or computation module 1411 of execution device 1410 performs computation. Execution device 1410 may invoke data, code, or the like from a data storage system 1450 to perform the corresponding processing, and may store in data storage system 1450 data, one or more instructions, or the like obtained through the corresponding processing.
I/O interface 1412 can return a processing result to customer device 1440.
It should be appreciated that training device 1420 may generate a corresponding target model/rule 1401 for different targets or different tasks based on different training data. The corresponding target model/rule 1401 can be used to implement the foregoing target or accomplish the foregoing task.
Convolutional layer/pooling layer 1520, as illustrated by FIG. 15, can include layers 1521 to 1526 as examples, where a given layer can be a convolutional layer or a pooling layer.
The convolutional layer 1521 may include a plurality of convolutional operators. A convolutional operator can also be referred to as a kernel. In image processing, the role of the convolutional operator is equivalent to a filter that extracts specific information from an input image matrix. The convolutional operator may be a predefined weight matrix. In the process of performing a convolution operation on an image, the weight matrix is usually processed one pixel after another (or two pixels after two pixels, depending on the value of a stride) in a horizontal direction on the input image, to extract a specific feature from the image. The size of the weight matrix can be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and in the convolution operation the weight matrix extends through the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension. However, a single weight matrix is often not used; instead, a plurality of weight matrices with the same dimensions (rows x columns), in other words a plurality of homogeneous matrices, can be used. The outputs of these weight matrices are stacked to form the depth dimension of the convolutional image, where that depth is determined by the foregoing "plurality". Different weight matrices may be used to extract different features from the image. For example, one weight matrix can be used to extract image edge information, another weight matrix can be used to extract a specific color from the image, and still another weight matrix can be used to blur out unwanted noise in the image. The plurality of weight matrices have the same size (rows x columns), so the feature graphs obtained after extraction by these weight matrices also have the same size, and the extracted feature graphs of the same size are combined to form the output of the convolution operation.
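The following sketch (shapes, kernel count, and stride are assumptions made for illustration) applies several weight matrices of the same size, each spanning the full depth of the input, and stacks their outputs to form the depth dimension of the convolutional output:

```python
import numpy as np

def conv2d_multi(image, kernels, stride=1):
    """image: (H, W, C); kernels: (K, kh, kw, C) -> output: (oh, ow, K).
    Each kernel spans the full input depth C; the K single-depth outputs
    are stacked to form the output's depth dimension."""
    H, W, C = image.shape
    K, kh, kw, _ = kernels.shape
    oh, ow = (H - kh) // stride + 1, (W - kw) // stride + 1
    out = np.zeros((oh, ow, K))
    for k in range(K):
        for i in range(oh):
            for j in range(ow):
                patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw, :]
                out[i, j, k] = np.sum(patch * kernels[k])
    return out

image = np.random.rand(8, 8, 3)          # input with depth 3
kernels = np.random.randn(5, 3, 3, 3)    # five same-size kernels
print(conv2d_multi(image, kernels, stride=2).shape)  # (3, 3, 5)
```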
In an actual application, the weight values in the weight matrices are obtained through a large amount of training. The weight matrices formed by these trained weight values can be used to extract information from an input image, so that the convolutional neural network 1500 performs accurate prediction.
When the convolutional neural network 1500 has a plurality of convolutional layers, an initial convolutional layer (such as 1521) can extract a relatively large quantity of common features. A common feature may also be referred to as a low-level feature. As the depth of the convolutional neural network 1500 increases, a feature extracted by a deeper convolutional layer (such as 1526) becomes more complex, such as, as a non-limiting example, a feature with high-level semantics. A feature with higher-level semantics is more applicable to a to-be-resolved problem.
Because the quantity of training parameters often needs to be reduced, a pooling layer usually needs to periodically follow a convolutional layer. To be specific, at the layers 1521 to 1526 shown in 1520 in FIG. 15, one convolutional layer can be followed by one pooling layer, or a plurality of convolutional layers can be followed by one or more pooling layers; in image processing, the purpose of the pooling layer can be to reduce the spatial size of the extracted features.
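As a hedged illustration of such a pooling layer (the 2×2 window is an assumption), the following sketch applies max pooling that quarters the spatial size of a feature map, and with it the quantity of downstream parameters:

```python
import numpy as np

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling; trailing rows/columns that do
    not fill a complete window are dropped."""
    H, W = x.shape
    x = x[:H - H % k, :W - W % k]
    return x.reshape(H // k, k, W // k, k).max(axis=(1, 3))

feature_map = np.random.rand(6, 6)
print(max_pool2d(feature_map).shape)  # (3, 3): spatial size reduced 4x
```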
After the image is processed by the convolutional layer/pooling layer 1520, the convolutional neural network 1500 can still be incapable of outputting the desired output information. As described above, the convolutional layer/pooling layer 1520 extracts features and reduces the quantity of parameters brought by the input image. However, to generate the final output information (the desired category information or other related information), the convolutional neural network 1500 generates an output with a quantity of one or a group of desired categories by using the neural network layer 1530. Therefore, the neural network layer 1530 may include a plurality of hidden layers (such as 1531, 1532, to 153n in FIG. 15) and an output layer 1540.
The output layer 1540 can follow the plurality of hidden layers in the neural network layer 1530; in other words, the output layer 1540 can be the final layer of the entire convolutional neural network 1500. The output layer 1540 can include a loss function similar to categorical cross-entropy, and is specifically used to calculate a prediction error. Once forward propagation (propagation in the direction from 1510 to 1540 in FIG. 15) of the entire convolutional neural network 1500 is completed, back propagation (propagation in the direction from 1540 to 1510 in FIG. 15) begins to update the weight values and biases of the foregoing layers, so as to reduce the loss of the convolutional neural network 1500 and the error between the result output by the output layer 1540 and an ideal result.
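For illustration only (a generic softmax/cross-entropy pairing, assumed rather than taken from the disclosure), the following sketch computes a categorical cross-entropy prediction error and the output-layer gradient that starts back propagation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])      # raw scores from the final layer
target = np.array([1.0, 0.0, 0.0])       # one-hot desired category
probs = softmax(logits)
loss = -np.sum(target * np.log(probs))   # cross-entropy prediction error
grad = probs - target                    # gradient fed into back propagation
print(loss, grad)
```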
It should be noted that the convolutional neural network 1500 shown in FIG. 15 is merely an example of a convolutional neural network; in actual application, a convolutional neural network may exist in the form of another network model.
An aspect of the disclosure provides an analog computing platform operative to implement at least one layer of a neural network. Such an analog computing platform can include an interface operative to receive elements of a first matrix and elements of a second matrix in the analog domain. The analog computing platform can further include a layered neural network including at least one optical processing chip operative to optically perform multiply-and-accumulate (MAC) operations with the matrix elements in the analog domain. Such an end-to-end analog computation architecture can result in the capability of performing very large numbers of operations per second in the analog domain. In some embodiments, such an architecture for analog computation results in the capability of performing on the order of peta MAC (PMAC) operations per second. In some embodiments, an interface can include at least one digital-to-analog converter (DAC) for converting elements of the first matrix and elements of the second matrix into the analog domain. In some embodiments, for example where a digital output is required, the analog computing platform further includes at least one analog-to-digital converter (ADC) operative to output the result of the MAC operations in a digital format. Accordingly, inputs, including in some embodiments training parameters of the neural network, can be supplied in the digital domain to the analog computing platform. In some embodiments, an analog computing platform can further include a summation unit operative to add bias values over the results of MAC operations; in such embodiments, the at least one layer of a neural network is a convolutional layer, and the matrix elements include elements of a kernel matrix. In some embodiments, an analog computing platform can further include a summation unit operative to add bias values over the results of MAC operations, wherein the at least one layer of a neural network is a fully connected layer and the matrix elements include elements of a kernel matrix. In some embodiments, at least one layer of a neural network implemented by an analog computing platform can be a batch normalization layer, the matrix elements include learned parameters, and the results of the MAC operations are biased by a learned parameter. In some embodiments, an analog computing platform can further include a CMOS circuit, wherein at least one layer of a neural network is a max pooling layer, and the CMOS circuit includes one or more comparators configured to identify in a matrix the matrix element having the maximum value. In some embodiments, at least one layer of a neural network implemented by an analog computing platform can be an average pooling layer: the first matrix can include a number k² of elements, the second matrix can be constructed such that each of its elements is 1/k², and the MAC operations between the elements of the first matrix and the elements of the second matrix result in an average value for the elements in the first matrix (a worked sketch follows this paragraph). In some embodiments, an analog computing platform can further include a CMOS circuit, at least one layer of a neural network can include a rectified linear unit (ReLU) non-linear function, and the CMOS circuit can be configured to perform the ReLU non-linear function over one or more matrix elements.
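The average-pooling arithmetic recited above can be checked with a short sketch (NumPy is used only to illustrate the arithmetic, not the optical implementation): the MAC between a k×k patch and a matrix whose every element is 1/k² yields exactly the average of the patch:

```python
import numpy as np

k = 2
patch = np.array([[4.0, 8.0],
                  [2.0, 6.0]])           # first matrix: k^2 elements
weights = np.full((k, k), 1.0 / k**2)    # second matrix: every element 1/k^2
mac_result = np.sum(patch * weights)     # multiply, then accumulate
print(mac_result, patch.mean())          # 5.0 5.0 -- the MAC equals the average
```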
In some embodiments, an analog computing platform can further include a CMOS circuit, at least one layer of a neural network can include a sigmoid function, and the CMOS circuit can be configured to perform the sigmoid function over one or more matrix elements (illustrated in the sketch following this paragraph). In some embodiments, an analog computing platform can implement at least two different layers of a neural network in concatenation. In some embodiments, an analog computing platform can operate on matrix elements that include point coordinates from a point cloud. In some embodiments, the point coordinates can be Cartesian and can be linearly translated from previous point coordinates, such that each point of the point cloud is defined by non-negative values. In some embodiments, an analog computing platform can operate on data from a point cloud obtained with a LiDAR system. In some embodiments, the implementation of at least one layer of a neural network with an analog computing platform can be performed as part of a LiDAR system operation. In some embodiments, an analog computing platform can include at least one optical processing chip operative to optically perform MAC operations with matrix elements in the analog domain, and the optical processing chip can have a Broadcast-and-Weight architecture that includes modulated microring resonators.
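As a simple illustration of the sigmoid referred to above (values are hypothetical; the CMOS realization itself is analog), the function can be applied element-wise over matrix elements:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

mac_results = np.array([[-2.0, 0.0],
                        [1.5, 4.0]])   # hypothetical MAC outputs
print(sigmoid(mac_results))            # element-wise sigmoid over the matrix
```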
An aspect of the disclosure provides a method for realizing at least one layer of a neural network, comprising, by an analog computing platform: receiving matrix elements with an interface, and optically performing multiply-and-accumulate (MAC) operations with an optical processing chip and the matrix elements, wherein the MAC operations are part of a layered neural network. In some embodiments, the MAC operations with the matrix elements can be optically performed in series (see the sketch following this paragraph). In some embodiments, a method can further comprise the analog computing platform performing, with a summation unit, the addition of bias values over the results of MAC operations, wherein the at least one layer of a neural network is a convolutional layer. In some embodiments, a method can further comprise the analog computing platform performing, with a summation unit, the addition of bias values over the results of MAC operations, and directing the result of each MAC operation to a subsequent layer, wherein the at least one layer of a neural network is a fully connected layer. In some embodiments, the matrix elements can include learned parameters, the results of MAC operations can be biased by at least one learned parameter provided by an interface, and the at least one layer of a neural network is a batch normalization layer. In some embodiments, the analog computing platform can further include a CMOS circuit with comparators configured to identify in a matrix the matrix element having the maximum value, wherein the at least one layer of a neural network is a max pooling layer. In some embodiments, a first matrix can include a number k² of elements, a second matrix can be constructed such that each of its elements is 1/k², and the MAC operations between the elements of the first matrix and the elements of the second matrix result in an average value for the elements in the first matrix, wherein the at least one layer of a neural network is an average pooling layer. In some embodiments, a method can further include using a CMOS circuit configured to perform a ReLU non-linear function over one or more matrix elements, wherein the at least one layer of a neural network includes a ReLU non-linear function. In some embodiments, a method can further include using a CMOS circuit configured to perform a sigmoid function over one or more matrix elements, wherein the at least one layer of a neural network includes a sigmoid function. In some embodiments, a method can implement at least two different layers of a neural network in concatenation. In some embodiments, a method can include operating on matrix elements comprising Cartesian coordinates that were linearly translated to non-negative values.
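A hedged sketch of MAC operations performed in series (plain Python, with hypothetical operands) accumulates one partial product per step, mirroring a serialized stream of analog multiply results:

```python
# Serial multiply-and-accumulate: one product is formed and added per step.
a = [0.5, -1.0, 2.0, 0.25]   # elements of the first matrix (hypothetical)
b = [4.0, 3.0, -0.5, 8.0]    # elements of the second matrix (hypothetical)

acc = 0.0
for ai, bi in zip(a, b):
    acc += ai * bi           # one MAC operation per step, in series
print(acc)                   # 0.0 for these values
```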
An aspect of the disclosure provides a LiDAR system in which the processing of data is performed with a layered neural network implemented on an analog computing platform operative to optically perform at least one multiply-and-accumulate (MAC) operation with matrix elements received via an interface, the matrix elements including point cloud data from the LiDAR system.
An aspect of the disclosure provides a method of performing LiDAR operations comprising: scanning points of a physical environment, recording the scanned points as spherical coordinates, converting the spherical coordinates of the data points into Cartesian coordinates, linearly translating the Cartesian coordinates of each scanned point so that they have non-negative values, defining each point coordinate as a matrix element, and processing the matrix elements with an analog computing platform operative to realize layers of a neural network.
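The coordinate pre-processing recited in this method can be sketched as follows (angle conventions and sample values are assumptions made for illustration): spherical LiDAR returns are converted to Cartesian coordinates and then linearly translated so that every coordinate is non-negative:

```python
import numpy as np

def spherical_to_cartesian(r, azimuth, elevation):
    """Convert range/azimuth/elevation returns (radians) to x, y, z."""
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=1)

r = np.array([10.0, 12.5, 7.3])          # hypothetical scanned ranges
az = np.radians([30.0, 150.0, 260.0])
el = np.radians([-2.0, 5.0, 0.5])

points = spherical_to_cartesian(r, az, el)
points -= points.min(axis=0)   # linear translation: all coordinates >= 0
print(points.min())            # 0.0 -- points ready to use as matrix elements
```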
Embodiments have been described above in conjunction with aspects of the present invention upon which they can be implemented. Those skilled in the art will appreciate that embodiments may be implemented in conjunction with the aspect with which they are described, but may also be implemented with other embodiments of that aspect. It will be apparent to those skilled in the art when embodiments are mutually exclusive or otherwise incompatible with each other. Some embodiments may be described in relation to one aspect, but may also be applicable to other aspects, as will be apparent to those of skill in the art.
Although the present invention has been described with reference to specific features and embodiments thereof, it is evident that various modifications and combinations can be made thereto without departing from the invention. The specification and drawings are, accordingly, to be regarded simply as an illustration of the invention as defined by the appended claims, and are contemplated to cover any and all modifications, variations, combinations or equivalents that fall within the scope of the present invention.
The present application is a continuation of International Application No. PCT/CA2021/051212 filed Sep. 1, 2021 and entitled “METHODS AND SYSTEMS TO OPTICALLY REALIZE NEURAL NETWORKS”, the contents of which are incorporated herein in their entirety.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CA2021/051212 | Sep 2021 | WO
Child | 18441649 | | US