Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet as filed with the present application are hereby incorporated by reference under 37 CFR 1.57.
The present disclosure relates to storage device architecture, and more particularly, to data processing inside of the storage device via machine learning.
Machine learning techniques, such as neural networks, are frequently being utilized by modern computing systems. These technologies can operate on large data sets and thus can require large amounts of storage space. However, current memory architectures do not allow for scalability of big data analysis. The present disclosure addresses these and other problems.
The innovations described in the claims each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of the claims, some prominent features of this disclosure will now be briefly described.
While certain embodiments are described, these embodiments are presented by way of example only, and are not intended to limit the scope of protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions, and changes in the form of the methods and systems described herein may be made without departing from the scope of protection.
Various embodiments of this disclosure provide a data storage device configured to perform neural network computations, the device comprising: a non-volatile memory comprising a first memory region configured to store data provided by a host system and a second memory region configured to store data related to neural network computations; a controller configured to: store data in the first memory region and retrieve data from the first memory region in response to at least one data transfer command received from the host system; and perform neural network computations in the second memory region.
In the data storage device of the preceding paragraph or any paragraphs herein, the second memory region can be configured to store a plurality of memory streams, each stream including a contiguous set of physical memory storage units of the non-volatile memory, and the controller can be further configured to perform neural network computations on the plurality of memory streams.
In the data storage device of the preceding paragraph or any paragraphs herein, the controller can be further configured to identify each memory stream of the plurality of memory streams by a common identifier.
In the data storage device of the preceding paragraph or any paragraphs herein, the plurality of memory streams can comprise a first memory stream and a second memory stream.
In the data storage device of the preceding paragraph or any paragraphs herein, the controller can be further configured to store input data for neural network computations in at least one memory stream of the plurality of memory streams.
In the data storage device of the preceding paragraph or any paragraphs herein, the plurality of memory streams can comprise at least one input memory stream and at least one output memory stream, and wherein the controller can be further configured to perform neural network computations on data stored in the at least one input memory stream and store a result of the neural network computations in the at least one output memory stream.
In the data storage device of the preceding paragraph or any paragraphs herein, the controller can be further configured to receive from the at least one output memory stream the result of the neural network computations and provide the result to the host system.
In the data storage device of the preceding paragraph or any paragraphs herein, the controller can comprise a plurality of processor cores configured to process a plurality of memory streams substantially concurrently.
In the data storage device of the preceding paragraph or any paragraphs herein, the controller can include an I/O core, and the device can further comprise another controller that includes a neural network core. The I/O core can be responsible for performing I/O operations on data, while the neural network core can be separately responsible for performing neural network computations.
Various embodiments of this disclosure provide a method for performing neural network computations within a data storage device, the method comprising, by a controller of the data storage device: receiving, from a host system, a first request to perform analysis of data stored in a memory region of a non-volatile memory of the data storage device; locking the memory region of the non-volatile memory; copying the memory region of the non-volatile memory; unlocking the memory region of the non-volatile memory; and initiating processing of the data by applying a neural network on the copied data.
In the method of the preceding paragraph or any paragraphs herein, the neural network can include a systolic flow engine.
In the method of the preceding paragraph or any paragraphs herein, neural network parameters can be stored in the non-volatile memory, and the processing of the data via the neural network can occur within the data storage device.
The method of the preceding paragraph or any paragraphs herein can further comprise, by the controller: receiving, from the host system, a second request to perform an operation on data stored in the memory region of the non-volatile memory; and in response to determining that the memory region is locked, storing the second request in a journal until the memory region becomes unlocked.
In the method of the preceding paragraph or any paragraphs herein, the operation can include a write operation.
The method of the preceding paragraph or any paragraphs herein can further comprise, by the controller: in response to determining that processing of the data via the neural network has been completed, deleting the copy of the memory region.
Various embodiments of this disclosure provide a data storage device configured to perform neural network computations, the device comprising: a non-volatile memory comprising a first memory region configured to store data provided by a host system and a second memory region configured to store data related to neural network computations; a first controller configured to: receive, from the host system, a first request to perform analysis of data stored in the first memory region; set a locked state for the first memory region; copy the data stored in the first memory region into the second memory region; set an unlocked state for the first memory region; and perform neural network computations on the copy of the data stored in the second memory region; and a second controller configured to: receive, from the host system, a second request to perform an operation on data stored in the first memory region; in response to determining that the first memory region is in an unlocked state, perform the operation; and in response to determining that the first memory region is in a locked state, store the second request in a journal; and perform neural network computations on the copy of the data stored in the second memory region.
In the data storage device of the preceding paragraph or any paragraphs herein, the journal can prevent writing to the first memory region.
In the data storage device of the preceding paragraph or any paragraphs herein, the first controller can be further configured to: receive a second request to perform analysis of data stored in the first memory region while the first memory region is in a locked state and copy the data stored in the first memory region into a third memory region without waiting for the first memory region to be in an unlocked state.
In the data storage device of the preceding paragraph or any paragraphs herein, the first controller can be further configured to retrieve from an output memory stream the result of the neural network computations and provide the result to the host system, the output memory stream comprising a contiguous set of physical memory storage units of the second memory region of the non-volatile memory.
In the data storage device of the preceding paragraph or any paragraphs herein, the first controller can comprise a plurality of processing cores configured to process a plurality of memory streams substantially concurrently.
Overview
Traditional memory architectures, such as the architecture found in non-volatile memory (NVM), magnetic random-access memory (MRAM), resistive random-access memory (ReRAM), Nantero random-access memory (NRAM), and/or the like, can have low latency properties, providing opportunities to increase performance of computer systems dramatically. However, these traditional memory architectures are unable to efficiently take advantage of the non-volatile memory. Traditional memory architectures suffer from critical drawbacks. In particular, if some data is not pre-fetched into the page cache, then persistent data is transferred to the dynamic random-access memory (DRAM) from persistent storage when the data is processed.
Furthermore, current memory chip architectures do not allow for scalability of big data analysis. With such architectures, large amounts of data would have to be transferred to and from the DRAM and the persistent storage devices. As such, simply increasing the number of cores for increased data processing does not address the issues described herein. For example, the storage device may have to copy data to a host side, and the host side may have to process the data. Then, one set of data needs to be copied in DRAM, the CPUs would process the set of data, and the next set of data would then be copied again for processing. This creates a large bottleneck for performance and cannot scale for large data processing. As such, the data processing would take a large amount of time and resources. Moreover, this would result in large overhead in the software stack. Furthermore, with separate CPU cores, each CPU can be dedicated to a subset of data, such as for modifying the subset of data, resulting in an inconsistent state of data across the CPUs. Moreover, increasing the size of the DRAM also comes with inefficiencies, such as an increase in power consumption. Furthermore, the CPU may not be able to address a DRAM over a certain size, and thus the DRAM is not scalable.
Generally, some embodiments of systems and methods described herein improve memory chip architecture by processing data inside of the storage device.
Accordingly, in some embodiments, the memory chip architecture can reduce or eliminate a bottleneck based on transferring data between the storage device and the DRAM (or another type of memory). Advantageously, data processing on the storage device can be scalable, with the ability to process large amounts of data.
In the embodiment of
Data Streams Stored in Persistent Space
In some embodiments, the storage device 502 can receive a request to process a particular stream. In some embodiments, the persistent space 504 can store data persistently. The data can persist through a power off state of the storage device 502. The persistent space 504 can provide a data stream that can be used as an analogue of a file.
In some embodiments, the contiguous space that corresponds to streams can store data in the persistent space 504 and can be distinguished based on an identifier corresponding to the stream. As such, the streams can be advantageous for a machine learning process, such as a neural network, stored within the storage device because the machine learning process can apply any of the requests stored in the stream for data processing. The machine learning process can identify a stream's identifier and an offset inside of the stream, and can then process the data inside of the stream. Streams can be advantageous over storing objects because objects can include metadata that may not be needed for neural network processing. Typically, objects are stored using object-based storage, resulting in significant restrictions on the applicability of neural networks. For example, certain neural networks may not be configured to exclude the metadata from the object data or receive as input the relevant data with the metadata, rendering the neural network inoperable. In some embodiments, the stream can store data without the metadata and/or only the relevant data for the neural network. The neural network can receive the relevant data as a byte stream from the contiguous space that corresponds to streams and can store data in the persistent space 504. Advantageously, such neural networks can be simplified in complexity by not having to be trained to differentiate between the metadata and the relevant data. Moreover, objects may or may not be stored contiguously, whereas a stream approach can enable contiguous storage.
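By way of illustration only, the addressing of stream data by identifier and offset described above can be sketched as follows. The class name `StreamStore` and its methods are hypothetical and are not part of this disclosure; the sketch models each stream as a contiguous byte buffer holding only payload bytes, with no per-object metadata.

```python
from dataclasses import dataclass, field


@dataclass
class StreamStore:
    """Hypothetical model of a persistent space holding contiguous streams."""

    # Maps a stream identifier to that stream's contiguous bytes.
    streams: dict[int, bytearray] = field(default_factory=dict)

    def append(self, stream_id: int, payload: bytes) -> int:
        """Append raw payload bytes to a stream and return the write offset."""
        buf = self.streams.setdefault(stream_id, bytearray())
        offset = len(buf)
        buf.extend(payload)
        return offset

    def read(self, stream_id: int, offset: int, length: int) -> bytes:
        """Return `length` bytes starting at `offset` inside the stream."""
        buf = self.streams[stream_id]
        if offset < 0 or length < 0 or offset + length > len(buf):
            raise ValueError("requested range falls outside the stream")
        return bytes(buf[offset:offset + length])


store = StreamStore()
first = store.append(7, b"pixels-A")
second = store.append(7, b"pixels-B")
chunk = store.read(7, second, 8)  # bytes handed to the neural network
```

In this sketch, a consumer needs only the stream identifier, an offset, and a length to obtain a byte range, which is the property the stream-based approach relies on.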
The neural network can efficiently implement specialized algorithms for data processing. Artificial neural networks (or connectionist systems or machine learning models) can learn to perform certain tasks based on training data. Moreover, such training can occur without task-specific programming. For example, a neural network can learn to identify images that contain cats by analyzing training data of example images that have been manually labeled as “cat” or “no cat.” The neural network can adjust its weightings in the nodes to identify cats in other images.
The neural network engine used by the disclosed embodiments can be configured as any type of neural network. The neural network engine can define a neural network based on one or more factors, including (1) the number of nodes in one layer, (2) the number of hidden layers, (3) the type of activation function, and/or (4) the matrix of weights for every connection between nodes of layers. In some embodiments, the neural network can be defined based on a functionality, and the neural network engine can retrieve a predefined neural network corresponding to the desired functionality.
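As a non-limiting sketch, the four factors listed above (nodes per layer, number of hidden layers, activation function, and weight matrices) can be captured in a small configuration record and evaluated with a plain forward pass. The names `NetworkConfig`, `forward`, and `relu` are assumptions made for this example only and do not describe the actual neural network engine.

```python
from dataclasses import dataclass
from typing import Callable

Matrix = list[list[float]]


@dataclass
class NetworkConfig:
    layer_sizes: list[int]                 # nodes per layer, input layer first
    activation: Callable[[float], float]   # applied to every node output
    weights: list[Matrix]                  # weights[k][j][i]: node i of layer k to node j of layer k+1

    def validate(self) -> None:
        """Check that the weight matrices match the declared layer sizes."""
        if len(self.weights) != len(self.layer_sizes) - 1:
            raise ValueError("need one weight matrix per pair of adjacent layers")
        for k, matrix in enumerate(self.weights):
            rows, cols = self.layer_sizes[k + 1], self.layer_sizes[k]
            if len(matrix) != rows or any(len(row) != cols for row in matrix):
                raise ValueError(f"weight matrix {k} has the wrong shape")


def forward(config: NetworkConfig, inputs: list[float]) -> list[float]:
    """Propagate the inputs through each layer in turn."""
    config.validate()
    if len(inputs) != config.layer_sizes[0]:
        raise ValueError("input length does not match the input layer")
    values = list(inputs)
    for matrix in config.weights:
        values = [
            config.activation(sum(w * v for w, v in zip(row, values)))
            for row in matrix
        ]
    return values


def relu(x: float) -> float:
    return max(0.0, x)


# One hidden layer with two nodes, one output node.
config = NetworkConfig(
    layer_sizes=[2, 2, 1],
    activation=relu,
    weights=[[[1.0, -1.0], [0.5, 0.5]], [[1.0, 2.0]]],
)
output = forward(config, [3.0, 1.0])
```

The number of hidden layers is implied by `layer_sizes` (here, one), so a different network can be described by changing only the configuration record.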
Streams Stored as Contiguous Sequences
The file system can allocate some physical sectors of the memory to store files. However, a storage device can be fragmented, and the files may be broken into several pieces, stored in different areas as contiguous spaces accessible via logical block addresses (“LBA”), and may not be stored contiguously. In such cases, for reading and writing to the storage device, the host would use a number of logical block addresses to store and/or retrieve data.
In some embodiments, the data is stored as streams in a contiguous area of the smart storage device.
In some embodiments, the host or host side 602 can request an inference operation of a neural network, such as a systolic flow engine, based on the stream's LBA and length. The host side 602 can send the start LBA number and the length of each extent that the host side 602 would like to process in the neural network. In other embodiments, the host side 602 can send the extent ID. The storage device 608 can receive the LBA number and lengths for each of the extents, can determine and/or configure the number of neural networks for processing the data, and process the streams by the neural networks on the storage device 608 side. Moreover, a stream-based approach of a contiguous sequence of physical memory units can provide an efficient way of processing data in a neural network on the storage device 608. Furthermore, a stream-based approach can enable in-memory neural network data processing for different file sizes, as well as files that change in size over time.
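For illustration, a request of the kind described above, in which the host supplies a start LBA and a length for each extent, might be modeled as shown below. The `Extent` type, the `read_extents` helper, and the 512-byte sector size are assumptions made for this sketch; the disclosure does not specify a request format or sector size.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # assumed sector size in bytes, for illustration only


@dataclass(frozen=True)
class Extent:
    start_lba: int  # first logical block address of the extent
    length: int     # extent length, in sectors


def read_extents(media: bytes, extents: list[Extent]) -> list[bytes]:
    """Return the contiguous bytes addressed by each (start LBA, length) pair."""
    streams = []
    for extent in extents:
        begin = extent.start_lba * SECTOR_SIZE
        end = begin + extent.length * SECTOR_SIZE
        if extent.start_lba < 0 or extent.length <= 0 or end > len(media):
            raise ValueError(f"extent {extent} falls outside the media")
        streams.append(media[begin:end])
    return streams


# Four sectors of media: sector 0 holds 0x00, sectors 1-2 hold 0x01, sector 3 holds 0x02.
media = bytes([0]) * SECTOR_SIZE + bytes([1]) * (2 * SECTOR_SIZE) + bytes([2]) * SECTOR_SIZE
request = [Extent(start_lba=1, length=2), Extent(start_lba=3, length=1)]
streams = read_extents(media, request)
```

Each returned element is a contiguous byte range that could be passed to a neural network on the storage device side without the host transferring the data itself.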
In some embodiments, the storage device 608 can implement the locking functions that enable consistency of the data representation in the stream for the neural network operations. Advantageously, even though the files can have various lengths and can change in size over time, the neural network core can still process data of a fixed size. As shown in the storage device 608, the file may not be stored in a contiguous sequence of physical memory units, but can be stored in a set of one or more storage device memory streams, such as Stream #1 610A, Stream #2 610B, Stream #N 610N (collectively referred to herein as storage device streams 610), that are distributed in different places in the storage device. Streams can be identified by a common identifier, which can be unique. Each of the storage device streams can include a contiguous sequence of physical memory storage units (such as cells, pages, sectors, etc.).
The storage device 608 can process neural network computations on the file 604 using the extents. Advantageously, the extent-based approach allows the storage device 608 to resolve conflicts between a neural network core and another core (e.g. core that processes I/O operations) because of the advantage of locking only the relevant file for the neural network and not the entire memory storage. Also, because each stream is contiguous, the neural network core can process multiple streams substantially simultaneously or substantially in parallel. This can increase efficiency of neural network processing.
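The substantially parallel processing of multiple streams described above can be sketched, under stated assumptions, with a pool of workers that each handle one stream. The function `infer` is a placeholder that merely sums bytes and stands in for a neural network pass; `process_streams` is likewise a hypothetical name. Host-side threads are used here only to show the structure of the dispatch, not to model the processing cores of a storage device.

```python
from concurrent.futures import ThreadPoolExecutor


def infer(stream: bytes) -> int:
    """Placeholder for a neural network pass over one stream (here, a byte sum)."""
    return sum(stream)


def process_streams(streams: dict[int, bytes], workers: int = 4) -> dict[int, int]:
    """Run `infer` over every stream concurrently, keyed by stream identifier."""
    stream_ids = list(streams)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with stream_ids.
        outputs = pool.map(infer, (streams[sid] for sid in stream_ids))
        return dict(zip(stream_ids, outputs))


results = process_streams({1: b"\x01\x02", 2: b"\x03", 3: b""})
```

Because each stream is an independent contiguous range, no coordination between workers is needed in this sketch beyond collecting the results.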
Neural Network and I/O Core Architecture
In some embodiments, the neural network core 806 can copy the data 812 into a knowledge space 810 of the persistent space in order to perform neural network computations. Advantageously, creating the copy can free the I/O core 808 to perform I/O commands on the actual data 812 while the neural network core 806 is performing neural network computations in parallel or substantially in parallel. This can be useful if neural network computations take a prolonged period of time and/or if an I/O command is received from the host while neural network processing is being performed. Moreover, creating the copy allows for all neural network computations to be performed on a separate copy of the data, and not on a copy that can be modified by the I/O core 808. Moreover, the copying enables protection for the real data 812 in the event of an error (e.g. real data 812 becoming corrupted via the data processing in the neural network). In some embodiments, the storage device 802 can store the relevant streams for data processing into the knowledge 810 data space. In some embodiments, the output of the neural networks can be stored in the knowledge space 810 and/or data space 812 of the persistent space. Data can be stored, at least in the knowledge space 810, using the streams, as described herein.
In some embodiments, the neural network core 806 can configure one neural network to process data at a given time. In some embodiments, the neural network core 806 can configure a plurality of neural networks to process the same set of data at the same or substantially same time. For example, the neural network core 806 can configure a neural network to identify a person in the picture and another neural network that can identify a background location of the picture. The picture can be inputted into both neural networks for parallel or substantially parallel processing.
In some cases, the cores 806 and 808 can be implemented by different controllers or processors. For example, the I/O core 808 can be a separate core from the neural network core 806. The I/O core can have a dedicated persistent space of data 812 that is used for storing persistent data. The neural network core 806 can be an independent core, such as an ASIC, CPU, or FPGA, with a dedicated persistent space of knowledge 810 data to execute training, inference and data processing.
In some embodiments, the I/O core 808 can communicate with the host without knowledge of the underlying data processing via the neural network. For example, the host can request the I/O core 808 to perform a particular operation on a set of data, such as a read/write request. The particular operation can be an inference operation of a neural network that may require substantial processing resources. The I/O core 808 can then store the data to the persistent space (as described herein). In some embodiments, the neural network core 806 can receive the input data from the host, configure the neural network to perform one or more inference operations, process the data through the neural network, and send the output data to the host. Moreover, the neural network core 806 can execute training and/or inference operations of the neural network in parallel or substantially in parallel with the other operations being performed by the I/O core 808.
In some embodiments, the storage device can lock the corresponding input data as the input data is pushed into the neural network. The I/O core does not need to wait for the neural network core to finish an inference operation because it only needs to lock the initial data for the time period of copying data from the data 812 to the knowledge area 810. The host can access the data without modification, such as via a read operation.
In some embodiments, the neural network core can push the data into the neural network. The circuitry between the layers can include one or more memory cells to store the outputs of a previous layer as inputs to the next layer.
In some embodiments, data can be back-propagated through the layers of the persistent space for training purposes. For example, training data can be forward propagated through the neural network. Based on the output of the neural network, the neural network core can back propagate through each layer by increasing the weight for the nodes that contributed to the desired output and vice versa.
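As a simplified illustration of adjusting weights based on the output, the sketch below applies a delta-rule (gradient descent on squared error) update to a single linear node. This is a stand-in chosen for brevity and is not multi-layer backpropagation or the training procedure of the disclosed embodiments; the function name `train_step` and the learning rate are assumptions of this example.

```python
def train_step(
    weights: list[float],
    inputs: list[float],
    target: float,
    learning_rate: float = 0.1,
) -> tuple[list[float], float]:
    """Forward-propagate one sample, then move each weight toward the target.

    Weights on inputs that push the prediction toward the target are
    increased, and weights that push it away are decreased.
    """
    prediction = sum(w * x for w, x in zip(weights, inputs))
    error = target - prediction
    updated = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
    return updated, error


weights = [0.0, 0.0]
sample, target = [1.0, 2.0], 5.0
for _ in range(200):
    weights, error = train_step(weights, sample, target)

prediction = sum(w * x for w, x in zip(weights, sample))
```

With the values chosen here, each step halves the remaining error, so the prediction approaches the target of 5.0 and the weights approach 1.0 and 2.0.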
Inference Operation Execution without Copying Stream Data
In step 3, the storage device 902 can receive parameters to define a neural network configuration. In some embodiments, the neural network configuration is predefined, such as predefined during the manufacturing process. In other embodiments, the neural network can be configured and/or reconfigured based on certain parameters, such as a number of nodes, number of layers, set of weights for the nodes, type of functionality, and/or the like. In step 4, the storage device 902 can store the configuration for the neural network in a neural network configuration section 908 of the persistent space 906. As illustrated and described herein, the persistent space 906 can include one or more memory streams.
In step 5, the storage device 902 can receive a request for an inference operation. The neural network core, such as the systolic flow engine 904, can process the one or more streams of data at step 6, and return the result(s) of the processing at step 7.
In some cases, the I/O core 912 may receive a request to update an existing stream at step 1A during a neural network operation. If the I/O core 912 updates the same stream as the stream being processed by the neural network, there may be issues with the neural network not processing a static set of data, but data that is being changed while processing. This can result in errors in the inputs of the neural network and/or change the outputs of the neural network. For example, if the I/O core receives a request to replace a picture of a car stored in a stream with a picture of a building while the neural network is processing that picture, the neural network may be processing neither the car picture nor the building picture. Instead, the neural network may be processing a hybrid mix of the car and building pictures, resulting in an erroneous result.
Inference Operation Execution with Copying Stream Data
These problems can be addressed as shown in
In step 2, the I/O core 1006 can create a view (or a copy) of the relevant stream of data by sending the copy of the data 1014 to the view 1008, all within the persistent space 1004. In step 3, the I/O core 1006 can indicate to the neural network core 1002 that the data is now unlocked. In step 4, the neural network core 1002 can process the data stored in the view 1008 through a neural network and can store the result in step 5 in a result space 1010.
In some embodiments, the stream that is being copied from the data 1014 to the view 1008 can be locked/unlocked individually during the copying. In some embodiments, all of the streams needed for the neural network can be locked at once, copied over, and unlocked at the same time. In some embodiments, the entire data storage 1014 can be locked, relevant streams copied, and the entire data 1014 unlocked.
In step 2A, an I/O operation and/or other operation that is received while the data is locked can be stored in a journal 1012. After the data is unlocked in step 3, in step 4A, the requests stored in the journal 1012 can be replayed and performed on the data 1014. Advantageously, the neural network can process the data in the view 1008 without affecting I/O operations that need to be performed on the data 1014 and/or without pausing to wait for the I/O operations to be completed.
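The lock, copy, unlock, and journal replay sequence described above can be sketched as follows. The `Region` class, its `snapshot` method, and the `during_copy` hook are hypothetical names introduced for this example. The sketch is single-threaded and uses the hook only to simulate a write that arrives while the region is locked; an actual device would require real synchronization between the cores.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Region:
    """Hypothetical memory region with a lock flag and a write journal."""

    data: bytearray
    locked: bool = False
    journal: list[tuple[int, bytes]] = field(default_factory=list)

    def write(self, offset: int, payload: bytes) -> bool:
        """Apply a write, or journal it if the region is locked.

        Returns True if the write was applied immediately.
        """
        if self.locked:
            self.journal.append((offset, payload))
            return False
        self.data[offset:offset + len(payload)] = payload
        return True

    def snapshot(self, during_copy: Optional[Callable[[], None]] = None) -> bytes:
        """Lock, copy the data into a view, unlock, then replay the journal."""
        self.locked = True
        try:
            view = bytes(self.data)
            if during_copy is not None:
                during_copy()  # simulates I/O arriving while the region is locked
        finally:
            self.locked = False
        pending, self.journal = self.journal, []
        for offset, payload in pending:
            self.write(offset, payload)
        return view


region = Region(bytearray(b"car!"))
view = region.snapshot(during_copy=lambda: region.write(0, b"bldg"))
```

After the call, `view` still holds the original bytes for neural network processing, while the journaled write has been replayed onto `region.data`, so neither side observes a mix of old and new data.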
Other Variations
Any of the embodiments disclosed herein can be used with any of the concepts disclosed in co-pending U.S. patent application Ser. No. 16/363,661, filed on Mar. 25, 2019, and titled “ENHANCED MEMORY DEVICE ARCHITECTURE FOR MACHINE LEARNING”.
Those skilled in the art will appreciate that in some embodiments additional system components can be utilized, and disclosed system components can be combined or omitted. Although some embodiments describe video data transmission, disclosed systems and methods can be used for transmission of any type of data. In addition, although some embodiments utilize erasure coding, any suitable error correction schemes can be used. The actual steps taken in the disclosed processes may differ from those shown in the figures. Depending on the embodiment, certain of the steps described above may be removed, others may be added. Accordingly, the scope of the present disclosure is intended to be defined only by reference to the appended claims.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the protection. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the protection. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the protection. For example, the systems and methods disclosed herein can be applied to hard disk drives, hybrid hard drives, and the like. In addition, other forms of storage (such as DRAM or SRAM, battery backed-up volatile DRAM or SRAM devices, EPROM, EEPROM memory, etc.) may additionally or alternatively be used. As another example, the various components illustrated in the figures may be implemented as software and/or firmware on a processor, ASIC/FPGA, or dedicated hardware. Also, the features and attributes of the specific embodiments disclosed above may be combined in different ways to form additional embodiments, all of which fall within the scope of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of this disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will further be understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Further, references to “a method” or “an embodiment” throughout are not intended to mean the same method or same embodiment, unless the context clearly indicates otherwise.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the various embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of this disclosure. The example embodiments were chosen and described in order to best explain the principles of this disclosure and the practical application, and to enable others of ordinary skill in the art to understand this disclosure for various embodiments with various modifications as are suited to the particular use contemplated.
Although the present disclosure provides certain preferred embodiments and applications, other embodiments that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this disclosure. Accordingly, the scope of the present disclosure is intended to be defined only by reference to the appended claims.
20200293866 | Guo | Sep 2020 | A1 |
20200327367 | Ma et al. | Oct 2020 | A1 |
20200387798 | Newage et al. | Dec 2020 | A1 |
Number | Date | Country |
---|---|---|
197131902 | Feb 1973 | AU |
771045 | Dec 1973 | BE |
930619 | Jul 1973 | CA |
2139302 | Oct 1978 | DE |
3373210 | Sep 2018 | EP |
196704 | Aug 1975 | ES |
2104032 | Apr 1972 | FR |
1316899 | May 1973 | GB |
37434 | Jan 1974 | IL |
197900473 | May 1979 | KR |
361090 | Oct 1973 | SE |
2017006512 | Jan 2017 | WO |
2019075267 | Apr 2019 | WO |
Entry |
---|
Ogunmolu et al.; “Nonlinear Systems Identification Using Deep Dynamic Neural Networks”; Oct. 2016; available at: https://www.researchgate.net/publication/308896333_Nonlinear_Systems_Identification_Using_Deep_Dynamic_Neural_Networks. |
Ogunmolu et al.; “Nonlinear Systems Identification Using Deep Dynamic Neural Networks”; in arXiv preprint arXiv:1610.01439; Oct. 5, 2016. |
Lu et al.; “FlexFlow: a Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks”; 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA); 2017; pp. 553-564; available at: https://ieeexplore.ieee.org/document/7920855. |
Qi et al.; “FPGA design of a multicore neuromorphic processing system”; NAECON 2014; IEEE National Aerospace and Electronics Conference; 2014; pp. 255-258; available at: https://ieeexplore.ieee.org/abstract/document/7045812. |
Shafiee et al.; “ISAAC: a Convolutional Neural Network Accelerator with in-Situ Analog Arithmetic in Crossbars”; in Proceedings of the 43rd ISCA; pp. 14-26; IEEE Press; 2016; available at: https://ieeexplore.ieee.org/document/7551379. |
Chi et al.; “PRIME: a Novel Processing-in-memory Architecture for Neural Network”; Jun. 2016; available at https://dl.acm.org/doi/10.1145/3007787.3001140. |
Girones et al.; “Systolic Implementation of a Pipelined on-Line Backpropagation”; Sep. 1999; available at: https://ieeexplore.ieee.org/document/758891. |
International Search Report and Written Opinion from International Application No. PCT/US2018/066593, dated Mar. 29, 2019, 11 pages. |
International Search Report and Written Opinion from International Application No. PCT/US2018/066917, dated Mar. 29, 2019, 11 pages. |
Mahapatra et al.; “Mapping of Neural Network Models onto Systolic Arrays”, Journal of Parallel and Distributed Computing, vol. 60, Issue 6, Jun. 2000, pp. 677-689; available at: https://www.sciencedirect.com/science/article/abs/pii/S0743731500916344. |
Pending U.S. Appl. No. 15/981,679, filed May 16, 2018, entitled “Systolic Neural Network Engine With Crossover Connection Optimization”, Luiz M. Franca-Neto. |
Pending U.S. Appl. No. 16/363,661, filed Mar. 25, 2019, entitled “Enhanced Memory Device Architecture for Machine Learning”, Luiz M. Franca-Neto. |
Chen et al., “Eyeriss: a Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks”, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 367-379, doi: 10.1109/ISCA.2016.40. (Year: 2016). |
Gokmen et al., “Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices”, Oct. 10, 2017, Front. Neurosci. 11:538. doi: 10.3389/fnins.2017.00538 (Year: 2017). |
James E. Smith, “Decoupled Access/Execute Computer Architectures”, Apr. 1982, SIGARCH Comput. Archit. News 10, 3 (Apr. 1982), 112-119. DOI: https://doi.org/10.1145/1067649.801719 (Year: 1982). |
Jones et al., “Learning in Linear Systolic Neural Network Engines: Analysis and Implementation”, Jul. 1994, IEEE Transactions on Neural Networks, vol. 5, No. 4, pp. 584-593 (Year: 1994). |
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, Jun. 24-28, 2017, In Proceedings of ISCA'17, 12 pages. (Year: 2017). |
Du et al.; “A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things”; in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, No. 1, pp. 198-208; Jan. 2018; available at: https://ieeexplore.ieee.org/document/8011462. |
U.S. Appl. No. 16/234,184, filed Dec. 27, 2018, Franca-Neto. |
U.S. Appl. No. 16/233,876, filed Dec. 27, 2018, Franca-Neto. |
U.S. Appl. No. 15/981,624, filed May 16, 2018, Franca-Neto. |
U.S. Appl. No. 16/233,968, filed Dec. 27, 2018, Franca-Neto. |
U.S. Appl. No. 16/234,166, filed Dec. 27, 2018, Franca-Neto. |
U.S. Appl. No. 15/981,664, filed May 16, 2018, Franca-Neto. |
U.S. Appl. No. 15/981,719, filed May 16, 2018, Franca-Neto. |
U.S. Appl. No. 15/981,711, filed May 16, 2018, Franca-Neto. |
U.S. Appl. No. 15/981,735, filed May 16, 2018, Franca-Neto. |
Parhami et al.; “Periodically Regular Chordal Rings”; IEEE Transactions on Parallel and Distributed Systems; vol. 10, No. 6; Jun. 1999; available at: https://ieeexplore.ieee.org/document/774913. |
Number | Date | Country |
---|---|---|
20200311537 A1 | Oct 2020 | US |