The present disclosure relates to an information processing apparatus, an information processing method, an information processing program, and an information processing system.
Devices incorporating a deep neural network (DNN) function are known. In general, DNN processing involves a large calculation cost, and the model size tends to increase as the model becomes capable of executing more complicated and advanced DNN processing. Therefore, a technique has been proposed in which the DNN function is divided, some of the divided DNN functions are executed on the device, and the remaining DNN functions are executed on an external server or the like.
When a DNN function is divided, it is necessary to perform appropriate division according to the function and processing amount.
An object of the present disclosure is to provide an information processing apparatus, an information processing method, an information processing program, and an information processing system capable of appropriately dividing a network function.
For solving the problem described above, an information processing apparatus according to one aspect of the present disclosure has a controller configured to select, from a plurality of networks, one or more first networks executed on a one-to-one basis by one or more first processors different from each other, and select, from the plurality of networks, a second network executed by a second processor; and a transmission unit configured to transmit the one or more first networks to the one or more first processors on a one-to-one basis, and transmit the second network to the second processor, wherein the second processor executes the second network using output data as an input, the output data being output as a result of executing a network selected from the one or more first networks for at least one processor among the one or more first processors, and the controller selects, from the plurality of networks, the second network according to the output data.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. Note that, in the following embodiments, the same parts are denoted by the same reference signs, and redundant description thereof is omitted.
Hereinafter, the embodiments of the present disclosure will be described in the following order.
A first embodiment of the present disclosure will be described. In an information processing system according to the first embodiment of the present disclosure, a controller determines a first network, executed by a first processor, that performs processing on input data received and a second network, executed by a second processor, that performs processing on an output of the first network.
Note that, in the present disclosure, a neural network such as a deep neural network (DNN) is applicable to the network.
First, a basic configuration according to the first embodiment will be described.
In
The network controller 20 is configured in a server on a communication network such as the Internet, for example, and capable of communicating with the first processor 11 and the second processor 12 via the communication network. A specific example of an arrangement of the network controller 20, a sensing device 10, the first processor 11, the second processor 12, and an application execution unit 30 will be described later.
A task is input to the network controller 20. Here, the task refers to processing executed using a neural network (DNN or the like). The task is input to the network controller 20, for example, as a model of the neural network used for the task. The network controller 20 analyzes the input task and divides it into at least two tasks. For example, when the task is divided into two tasks, a first task and a second task, the network controller 20 assigns the first task to the first processor 11 and the second task to the second processor 12.
In other words, it can be said that the network controller 20 divides the neural network used by the task into the first neural network used by the first task and the second neural network used by the second task, and assigns the first neural network to the first processor 11 and the second neural network to the second processor 12.
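As a non-limiting illustration of this division, the following Python sketch splits a network modeled as an ordered list of layers into a preceding-stage (first) network and a subsequent-stage (second) network; the class names, function names, and layer contents are illustrative assumptions and do not represent the actual implementation.

```python
# Minimal sketch: a network modeled as an ordered list of layers is divided into a first
# (preceding-stage) network and a second (subsequent-stage) network at a chosen index.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Layer:
    name: str
    fn: Callable  # the layer's computation (placeholder)

def divide_network(layers: List[Layer], split_index: int) -> Tuple[List[Layer], List[Layer]]:
    first_network = layers[:split_index]   # assigned to the first processor 11
    second_network = layers[split_index:]  # assigned to the second processor 12
    return first_network, second_network

def run(network: List[Layer], data):
    for layer in network:
        data = layer.fn(data)
    return data

layers = [Layer("conv", lambda x: x * 2), Layer("pool", lambda x: x + 1), Layer("fc", lambda x: x - 3)]
first_network, second_network = divide_network(layers, split_index=2)
intermediate = run(first_network, 10)       # executed by the first processor
output = run(second_network, intermediate)  # executed by the second processor on the output above
print(intermediate, output)                 # 21 18
```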
For example, an output of the sensing device 10 is input to the first processor 11 as input data. When the sensing device 10 is an imaging device, the output of the sensing device 10 is a captured image (image signal) obtained by imaging with the imaging device. The input data to the first processor 11 is data to be processed by the task input to the network controller 20 and is not limited to the output of the sensing device 10. The first processor 11 executes the first task on the input data and outputs a processing result. The second processor 12 executes the second task on an output of the first processor 11 and outputs a processing result as output data.
The output data output from the second processor 12 is supplied to the application execution unit 30. The application execution unit 30 may include, for example, an application program installed in an information processing apparatus such as a general computer.
Note that, in
In
In Steps S101 and S102, the network controller 20 may analyze and divide the network based on a function of the network, a hardware element related to hardware that executes the network, an application element related to an application that uses an output of the network, and the like.
Hereinafter, unless otherwise specified, the description will be given on an assumption that the network controller 20 divides the network into a first network that performs processing in a preceding stage in the network and a second network that performs processing on an output of the first network.
In next Step S103, the network controller 20 determines a processor that executes each network divided in Step S102. For example, the network controller 20 determines the first processor 11 as a processor that executes processing by the first network, and determines the second processor 12 as a processor that executes processing by the second network.
In next Step S104, the network controller 20 transmits each of the divided networks to the processor determined to execute it. Specifically, the network controller 20 transmits the first network to the first processor 11 and transmits the second network to the second processor 12.
The first processor 11 uses the output of the sensing device 10 as input data and executes processing by the first network on the input data. The second processor 12 executes processing by the second network on an output of the first processor 11. An output of the second processor 12 is supplied to the application execution unit 30.
As described above, in the first embodiment, the network controller 20 determines the first network to be executed by the first processor and the second network to be executed by the second processor that performs processing on the output of the first processor 11. Therefore, by applying the first embodiment, the network can be appropriately divided and allocated to the first processor 11 and the second processor 12.
Next, the configuration according to the first embodiment will be described in more detail.
In
The imaging block 110 and the signal processing block 120 are electrically connected by connection lines CL1, CL2, and CL3, which are internal buses. As will be described later, the imaging block 110 may correspond to the sensing device 10 in
The imaging block 110 includes an imaging unit 111, an imaging processing unit 112, an output controller 113, an output interface (I/F) 114, and an imaging controller 115, and images a subject to obtain a captured image.
The imaging unit 111 includes a pixel array in which a plurality of pixels is arranged in a matrix. Each of the pixels is a light receiving element that outputs a signal corresponding to light received by photoelectric conversion. The imaging unit 111 is driven by the imaging processing unit 112 and images the subject.
In other words, light from an optical system (not illustrated) enters the imaging unit 111. The imaging unit 111 receives incident light from the optical system in each pixel included in the pixel array, performs photoelectric conversion, and outputs an analog image signal corresponding to the incident light.
The size of an image according to the image signal output from the imaging unit 111 can be selected from, for example, a plurality of sizes such as, in width×height, 3968 pixels×2976 pixels, 1920 pixels×1080 pixels, and 640 pixels×480 pixels. The image size that can be output by the imaging unit 111 is not limited to this example. Furthermore, for the image output by the imaging unit 111, for example, it is possible to select whether to set a color image of red, green, and blue (RGB) or a monochrome image of only luminance. These selections for the imaging unit 111 may be performed as a type of shooting mode setting.
Note that information based on the output of each pixel arranged in a matrix in the pixel array is referred to as a frame. In the imaging device 100, the imaging unit 111 repeatedly acquires information on the pixels in the matrix at a predetermined rate (frame rate) in chronological order. The imaging device 100 collectively outputs the information acquired for each frame.
Under the control of the imaging controller 115, the imaging processing unit 112 performs imaging processing related to imaging of the image in the imaging unit 111, such as driving of the imaging unit 111, analog to digital (AD) conversion of an analog image signal output from the imaging unit 111, and imaging signal processing.
Examples of the imaging signal processing performed by the imaging processing unit 112 include processing of obtaining brightness for each predetermined small region of the image output from the imaging unit 111 by calculating an average pixel value for the region, high dynamic range (HDR) processing of converting the image output from the imaging unit 111 into an HDR image, defect correction, and development.
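As one possible reading of the brightness calculation described above, the following sketch averages pixel values over fixed-size small regions of an image; the block size, image size, and the use of NumPy are illustrative assumptions only.

```python
import numpy as np

def region_brightness(image: np.ndarray, block: int = 60) -> np.ndarray:
    """Average pixel value for each block x block small region (illustrative sketch)."""
    h, w = image.shape[:2]
    rows, cols = h // block, w // block
    cropped = image[:rows * block, :cols * block]        # crop to a whole number of blocks
    return cropped.reshape(rows, block, cols, block, -1).mean(axis=(1, 3, 4))

# Example: a 1920x1080 (width x height) monochrome frame stored as (height, width, channels).
frame = np.random.randint(0, 256, size=(1080, 1920, 1), dtype=np.uint8)
print(region_brightness(frame, block=60).shape)  # (18, 32) brightness values
```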
As a captured image, the imaging processing unit 112 outputs a digital image signal obtained by AD conversion or the like of the analog image signal output from the imaging unit 111. Furthermore, the imaging processing unit 112 can also output a RAW image, as a captured image, that is not subjected to processing such as development. Note that an image in which each pixel has information on each color of RGB obtained by processing such as development on the RAW image is referred to as an RGB image.
The captured image output by the imaging processing unit 112 is supplied to the output controller 113 and also supplied to an image compressor 125 of the signal processing block 120 via the connection line CL2. In addition to the captured image supplied from the imaging processing unit 112, a signal processing result of signal processing using the captured image and the like is supplied from the signal processing block 120 to the output controller 113 via the connection line CL3.
The output controller 113 performs output control of selectively outputting the captured image from the imaging processing unit 112 and the signal processing result from the signal processing block 120 from (one) output I/F 114 to the outside (e.g., memory connected to the outside of the imaging device 100). In other words, the output controller 113 selects the captured image from the imaging processing unit 112 or the signal processing result from the signal processing block 120, and supplies the image or the result to the output I/F 114.
The output I/F 114 is an I/F that outputs the captured image and the signal processing result supplied from the output controller 113 to the outside. For example, a relatively high-speed parallel I/F such as a mobile industry processor interface (MIPI) can be adopted as the output I/F 114.
In the output I/F 114, the captured image from the imaging processing unit 112 or the signal processing result from the signal processing block 120 is output to the outside according to the output control of the output controller 113. Therefore, for example, when only the signal processing result from the signal processing block 120 is necessary outside and the captured image itself is not necessary, only the signal processing result can be output. As a result, an amount of data output from the output I/F 114 to the outside can be reduced.
Furthermore, in the signal processing block 120, signal processing for obtaining a signal processing result required outside is performed, and the signal processing result is output from the output I/F 114, so that it is not necessary to perform signal processing outside. As a result, a load on an external block can be reduced.
The imaging controller 115 includes a communication I/F 116 and a register group 117.
The communication I/F 116 is a first communication I/F, for example, a serial communication I/F such as an inter-integrated circuit (I2C) interface, and exchanges necessary information, such as information read from and written to the register group 117, with the outside (e.g., a controller that controls a device equipped with the imaging device 100).
The register group 117 includes a plurality of registers and stores imaging information related to imaging of an image by the imaging unit 111 and various other types of information. For example, the register group 117 stores the imaging information received from the outside in the communication I/F 116 and an imaging signal processing result (e.g., brightness for each small area of the captured image) of the imaging processing unit 112.
Examples of the imaging information stored in the register group 117 include ISO sensitivity (analog gain at the time of AD conversion in the imaging processing unit 112), exposure time (shutter speed), frame rate, focus, shooting mode, and clipping range (information).
The shooting mode includes, for example, a manual mode in which an exposure time, a frame rate, and the like are manually set, and an automatic mode in which the exposure time, the frame rate, and the like are automatically set according to a scene. Examples of the automatic mode include modes corresponding to various shooting scenes such as a night scene and a person's face.
Furthermore, the clipping range represents a range clipped from an image output by the imaging unit 111 when a part of the image output by the imaging unit 111 is clipped and output as a captured image in the imaging processing unit 112. By specifying the clipping range, for example, only a range in which a person is captured can be clipped from the image output by the imaging unit 111. Note that, as image clipping, there is a method of clipping only an image (signal) in the clipping range from the imaging unit 111 in addition to a method of clipping from the image output by the imaging unit 111.
The imaging controller 115 controls the imaging processing unit 112 according to the imaging information stored in the register group 117, thereby controlling imaging of an image by the imaging unit 111.
Note that the register group 117 may store output control information regarding output control by the output controller 113 in addition to the imaging information and the imaging signal processing result in the imaging processing unit 112. The output controller 113 can perform the output control of selectively outputting the captured image and the signal processing result according to the output control information stored in the register group 117.
Furthermore, in the imaging device 100, the imaging controller 115 and a CPU 121 of the signal processing block 120 are connected via the connection line CL1, and the CPU 121 can read and write information from and to the register group 117 via the connection line CL1. In other words, in the imaging device 100, reading and writing of information from and to the register group 117 can be performed not only from the communication I/F 116 but also from the CPU 121.
The signal processing block 120 includes the central processing unit (CPU) 121, a digital signal processor (DSP) 122, a memory 123, a communication I/F 124, the image compressor 125, and an input I/F 126, and performs predetermined signal processing using the captured image or the like obtained by the imaging block 110. Note that the CPU 121 is not limited thereto, and may be a microprocessor unit (MPU) or a microcontroller unit (MCU).
The CPU 121, the DSP 122, the memory 123, the communication I/F 124, and the input I/F 126 configuring the signal processing block 120 are connected to each other via a bus, and can exchange information as necessary.
The CPU 121 executes the program stored in the memory 123 to perform control of the signal processing block 120, reading and writing of information from and to the register group 117 of the imaging controller 115 via the connection line CL1, and other various processes.
For example, by executing the program, the CPU 121 functions as an imaging information calculation unit that calculates imaging information by using a signal processing result obtained by signal processing in the DSP 122, and feeds back new imaging information calculated by using the signal processing result to the register group 117 of the imaging controller 115 via the connection line CL1 to store the new imaging information.
Therefore, as a result, the CPU 121 can control imaging by the imaging unit 111 and imaging signal processing by the imaging processing unit 112 according to the signal processing result of the captured image.
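A minimal sketch of this feedback loop is shown below; the register names, the target brightness value, and the proportional update rule are hypothetical and serve only as an example of calculating new imaging information from a signal processing result.

```python
# Sketch of the feedback described above; register names, the target brightness, and the
# proportional update rule are hypothetical and serve only as an example.
def calculate_imaging_information(mean_brightness: float, exposure_us: int,
                                  target_brightness: float = 118.0) -> int:
    """Derive new imaging information (an exposure time) from a signal processing result."""
    if mean_brightness <= 0:
        return exposure_us
    return int(exposure_us * target_brightness / mean_brightness)

register_group = {"exposure_us": 10_000}                 # stands in for the register group 117
signal_processing_result = {"mean_brightness": 200.0}    # e.g., obtained by the DSP 122

# Role of the CPU 121: calculate new imaging information and feed it back to the registers.
register_group["exposure_us"] = calculate_imaging_information(
    signal_processing_result["mean_brightness"], register_group["exposure_us"])
print(register_group)  # {'exposure_us': 5900}
```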
In addition, the imaging information stored in the register group 117 by the CPU 121 can be provided (output) to the outside from the communication I/F 116. For example, focus information in the imaging information stored in the register group 117 can be provided from the communication I/F 116 to a focus driver (not illustrated) that controls the focus.
By executing the program stored in the memory 123, the DSP 122 functions as a signal processing unit that performs signal processing using the captured image supplied from the imaging processing unit 112 to the signal processing block 120 via the connection line CL2 and information received by the input I/F 126 from the outside.
The memory 123 includes a static random access memory (SRAM), a dynamic RAM (DRAM), and the like, and stores data and the like necessary for processing in the signal processing block 120.
For example, the memory 123 stores a program received from the outside in the communication I/F 124, the captured image compressed by the image compressor 125 and used in signal processing in the DSP 122, the signal processing result of signal processing performed by the DSP 122, information received by the input I/F 126, and the like.
The communication I/F 124 is a second communication I/F, for example, a serial communication I/F such as a serial peripheral interface (SPI), and exchanges necessary information, such as a program executed by the CPU 121 or the DSP 122, with the outside (e.g., a memory and a controller (not illustrated)).
For example, the communication I/F 124 downloads a program executed by the CPU 121 or the DSP 122 from the outside, supplies the program to the memory 123, and stores the program. Therefore, various processes can be executed by the CPU 121 or the DSP 122 by the program downloaded by the communication I/F 124.
Note that, in addition to programs, the communication I/F 124 can exchange arbitrary data with the outside. For example, the communication I/F 124 can output the signal processing result obtained by signal processing in the DSP 122 to the outside. In addition, the communication I/F 124 outputs information according to an instruction of the CPU 121 to an external device, whereby the external device can be controlled according to the instruction of the CPU 121.
The signal processing result obtained by the signal processing in the DSP 122 can be output from the communication I/F 124 to the outside and can be written in the register group 117 of the imaging controller 115 by the CPU 121. The signal processing result written in the register group 117 can be output from the communication I/F 116 to the outside. The same applies to the processing result of the processing performed by the CPU 121.
The captured image is supplied from the imaging processing unit 112 to the image compressor 125 via the connection line CL2. The image compressor 125 performs a compression process of compressing the captured image to generate a compressed image having a smaller data amount than the captured image. The compressed image generated by the image compressor 125 is supplied to the memory 123 via a bus and stored therein.
Here, the signal processing in the DSP 122 can be performed using not only the captured image itself but also the compressed image generated from the captured image by the image compressor 125. Since the compressed image has a smaller amount of data than the captured image, it is possible to reduce the load of signal processing in the DSP 122 and to save the storage capacity of the memory 123 that stores the compressed image.
As the compression process in the image compressor 125, for example, scale-down for converting a captured image of 3968 pixels×2976 pixels into an image of 640 pixels×480 pixels can be performed. Furthermore, when the signal processing in the DSP 122 is performed on luminance and the captured image is an RGB image, YUV conversion for converting the RGB image into, for example, a YUV image can be performed as the compression process.
Note that the image compressor 125 can be realized by software or can be realized by dedicated hardware.
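The following sketch gives one possible form of the compression process described above (nearest-neighbor scale-down followed by an RGB-to-YUV conversion using the common BT.601-style coefficients); it is an illustrative approximation, not the actual implementation of the image compressor 125.

```python
import numpy as np

def scale_down(image: np.ndarray, out_h: int = 480, out_w: int = 640) -> np.ndarray:
    """Nearest-neighbor scale-down, e.g., 3968x2976 pixels to 640x480 pixels (sketch only)."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows][:, cols]

def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    """RGB to YUV conversion with the common BT.601-style coefficients."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return rgb.astype(np.float32) @ m.T

captured = np.random.randint(0, 256, size=(2976, 3968, 3), dtype=np.uint8)  # height x width x RGB
compressed = rgb_to_yuv(scale_down(captured))
print(compressed.shape)  # (480, 640, 3): a smaller amount of data than the captured image
```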
The input I/F 126 is an I/F that receives external information. The input I/F 126 receives, for example, an output of an external sensor (external sensor output) from the external sensor, and supplies the external sensor output to the memory 123 via a bus. Then, the memory 123 stores the external sensor output. As the input I/F 126, for example, a parallel I/F such as MIPI can be adopted, similarly to the output I/F 114.
Furthermore, as the external sensor, for example, a distance sensor that senses information regarding a distance can be adopted. Alternatively, as the external sensor, an image sensor that senses light and outputs an image corresponding to the light, i.e., an image sensor different from the imaging device 100, can be adopted.
In the DSP 122, in addition to the captured image or the compressed image generated from the captured image, signal processing can be performed using the external sensor output received by the input I/F 126 from the external sensor and stored in the memory 123 as described above.
The DSP 122, or the DSP 122 and the CPU 121 may correspond to the first processor 11 in
In the imaging device 100 configured as described above, signal processing using the captured image obtained by imaging by the imaging unit 111 or the compressed image generated from the captured image is performed by the DSP 122, and the signal processing result of the signal processing and the captured image are selectively output from the output I/F 114. Therefore, it is possible to downsize the imaging device that outputs the information required by the user.
Here, when the signal processing of the DSP 122 is not performed in the imaging device 100, so that the captured image is output instead of a signal processing result, i.e., when the imaging device 100 is configured as an image sensor that merely captures and outputs an image, the imaging device 100 can be configured only with the imaging block 110, without the output controller 113 being provided.
For example, as illustrated in
Note that the die refers to a small thin piece of silicon in which an electronic circuit is built, and a piece in which one or more dies are sealed is referred to as a chip.
In
The die 130 on the top side and the die 131 on the bottom side are electrically connected, for example, by forming a through hole that penetrates the die 130 and reaches the die 131. The present disclosure is not limited thereto. The dies 130 and 131 may be electrically connected by performing metal-metal wiring such as Cu—Cu bonding that directly connects metal wiring such as Cu exposed on a bottom surface side of the die 130 and metal wiring such as Cu exposed on a top surface side of the die 131.
Here, in the imaging processing unit 112, as a method of performing AD conversion of the image signal output from the imaging unit 111, for example, a column-parallel AD method or an area AD method can be adopted.
In the column-parallel AD method, for example, an AD converter (ADC) is provided for each column of pixels configuring the imaging unit 111, and the ADC of each column is in charge of AD conversion of the pixel signals of the pixels in that column, whereby AD conversion of the image signals of the pixels in one row is performed in parallel across the columns. When the column-parallel AD method is adopted, a part of the imaging processing unit 112 that performs AD conversion of the column-parallel AD method may be mounted on the die 130 on the top side.
In the area AD method, pixels configuring the imaging unit 111 are divided into a plurality of blocks, and the ADC is provided for each block. Then, the ADC of each block is in charge of AD conversion of the pixel signals of the pixels in the block, whereby AD conversion of the image signals of the pixels in the plurality of blocks is performed in parallel. In the area AD method, the block is a minimum unit, and the AD conversion (reading and AD conversion) of image signals can be performed only for necessary pixels among the pixels configuring the imaging unit 111.
Note that, when an area of the imaging device 100 is allowed to be large, the imaging device 100 can be configured with one die.
Furthermore, in the example in
Here, in an imaging device in which chips of a sensor chip, a memory chip, and a DSP chip are connected in parallel by a plurality of bumps (hereinafter also referred to as a bump-connected imaging device), a thickness is greatly increased and the device is increased in size as compared with the one-chip imaging device 100 configured in a stacked structure.
Furthermore, in the bump-connected imaging device, it may be difficult to secure a sufficient rate at which the captured image is output from the imaging processing unit 112 to the output controller 113 due to signal deterioration or the like at a connected portion of the bumps.
According to the imaging device 100 having the stacked structure, it is possible to prevent the above-described increase in size of the device and the inability to secure a sufficient rate between the imaging processing unit 112 and the output controller 113. Therefore, according to the imaging device 100 having the stacked structure, it is possible to downsize the imaging device that outputs information required for processing in a subsequent stage of the imaging device 100.
When the information required in the subsequent stage is the captured image, the imaging device 100 can output the captured image (RAW image, RGB image, etc.). Furthermore, when information required in the subsequent stage is obtained by signal processing using the captured image, the imaging device 100 can obtain and output the signal processing result as the information required by the user by performing the signal processing in the DSP 122.
As the signal processing performed by the imaging device 100, i.e., the signal processing of the DSP 122, for example, a recognition process of recognizing a predetermined recognition target from the captured image can be adopted.
Furthermore, for example, the imaging device 100 can receive, by the input I/F 126, an output of a distance sensor such as a time of flight (ToF) sensor arranged to have a predetermined positional relationship with the imaging device 100. In this case, as the signal processing of the DSP 122, for example, a fusion process can be adopted to integrate the output of the distance sensor and the captured image to obtain an accurate distance, such as a process of removing noise of the distance image obtained from the output of the distance sensor received by the input I/F 126 using the captured image.
Furthermore, for example, the imaging device 100 can receive, by the input I/F 126, an image output by an image sensor arranged to have a predetermined positional relationship with the imaging device 100. In this case, as the signal processing of the DSP 122, for example, a simultaneous localization and mapping (SLAM) process using the image received by the input I/F 126 and the captured image as stereo images can be adopted.
Note that, in
In
The analysis unit 200, the division unit 201, the transmission unit 202, and the learning unit 203 included in the network controller 20 are configured by executing an information processing program according to the first embodiment on the CPU. However, the present disclosure is not limited thereto. Some or all of the analysis unit 200, the division unit 201, the transmission unit 202, and the learning unit 203 may be configured by hardware circuits that operate in cooperation with each other.
The task is input to the network controller 20. The task is input as, for example, a model of the neural network used by the task. The task input to the network controller 20 is delivered to the analysis unit 200 and the learning unit 203.
The analysis unit 200 analyzes the task delivered. For example, by analyzing the task, the analysis unit 200 extracts processing performed in the preceding stage of the task and processing performed in the subsequent stage based on a processing result of the preceding stage. The analysis result by the analysis unit 200 is passed to the division unit 201 together with the task input to the network controller 20.
When the task is divided into two tasks, the division unit 201 divides the task into a preceding-stage network (first network) and a subsequent-stage network (second network) based on the analysis result received from the analysis unit 200. The division unit 201 further determines, from the processors included in the device group 21, a processor to which each of the first network and the second network is applied. The device group 21 includes, for example, the imaging device 100 and a signal processing device that performs signal processing and the like on the output of the imaging device 100.
In this example, the division unit 201 determines the first processor 11, to which the input data is input as described with reference to
The division unit 201 passes, to the transmission unit 202, the first network and the second network obtained by dividing the task input to the network controller 20, together with information indicating the processors to which the first network and the second network are to be applied.
The transmission unit 202 transmits the first network and the second network transferred from the division unit 201 to the respective determined processors among the processors included in the devices of the device group 21. In this example, the transmission unit 202 transmits the first network to the first processor 11 and transmits the second network to the second processor 12.
The processors included in the device group 21 and to which the first network and the second network are applied can return, for example, a processing result of at least one of the first network and the second network to the network controller 20. The processing result is passed to the learning unit 203 in the network controller 20.
The learning unit 203 can retrain, using the processing result received, the network input as the task into the network controller 20. The learning unit 203 passes the retrained network to the analysis unit 200. The analysis unit 200 analyzes the retrained network received from the learning unit 203, and the division unit 201 divides the retrained network and updates the first network and the second network.
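The flow of the network controller 20 described above may be summarized by the following schematic sketch; the function bodies are placeholders that only mark the roles of the analysis unit 200, the division unit 201, the transmission unit 202, and the learning unit 203, and the concrete split rule and retraining step are illustrative assumptions.

```python
# Schematic sketch of the network controller 20; the module boundaries follow the description,
# but every function body is an illustrative placeholder, not the actual algorithm.
from typing import Dict, List, Tuple

def analyze(task: List[str]) -> int:
    # Analysis unit 200: pick a division point (here, simply the middle of the layer list).
    return len(task) // 2

def divide(task: List[str], split: int) -> Tuple[List[str], List[str]]:
    # Division unit 201: preceding-stage (first) network and subsequent-stage (second) network.
    return task[:split], task[split:]

def transmit(first_network: List[str], second_network: List[str]) -> None:
    # Transmission unit 202: deliver each network to its assigned processor (stubbed here).
    print("to first processor 11:", first_network)
    print("to second processor 12:", second_network)

def retrain(task: List[str], processing_result: Dict[str, float]) -> List[str]:
    # Learning unit 203: return a network updated using results fed back from the processors.
    return task  # placeholder; actual retraining would update the network's weights

task = ["conv1", "conv2", "conv3", "fc1", "fc2"]
for _ in range(2):  # e.g., initial deployment plus one retraining round
    split = analyze(task)
    first_network, second_network = divide(task, split)
    transmit(first_network, second_network)
    task = retrain(task, {"accuracy": 0.9})  # result fed back from the device group 21
```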
In
The storage device 2014 is a nonvolatile storage medium such as a hard disk drive or a flash memory. The CPU 2010 controls the entire operation of the server 2000 using the RAM 2012 as a work memory according to a program stored in the storage device 2014 or the ROM 2011.
The display device 2013 includes a display device that displays an image, and a display controller that converts a display control signal generated by the CPU 2010 into a display signal that can be displayed on the display device. The input device 2017 receives a user input, and a pointing device such as a mouse, a keyboard, or the like can be applied. Devices applicable to the input device 2017 are not limited thereto.
The data I/F 2015 is an interface for inputting/outputting data to/from an external device. A universal serial bus (USB) or the like can be applied as the data I/F 2015, but an applicable interface method is not particularly limited. The communication I/F 2016 controls communication via a communication network such as the Internet.
As described above, the server 2000 includes the CPU 2010, the ROM 2011, the RAM 2012, and the like, and is configured as a general computer. Not limited to this, the server 2000 may be configured using a cloud computing service by cloud computing.
In the server 2000, the CPU 2010 executes the information processing program for realizing the function according to the first embodiment, thereby configuring each of the analysis unit 200, the division unit 201, the transmission unit 202, and the learning unit 203 described above as, for example, a module on a main storage area in the RAM 2012.
The information processing program can be acquired from the outside via a communication network (not illustrated) by communication via the communication I/F 2016, for example, and can be installed on the server 2000. However, the present disclosure is not limited thereto, and the information processing program may be provided by being stored in a detachable storage medium such as a compact disk (CD), a digital versatile disk (DVD), or a universal serial bus (USB) memory.
Next, a system configuration according to the first embodiment will be described.
In addition, processing by the first network obtained by dividing the network is referred to as a first-phase process in the network, and processing by the second network is referred to as a second-phase process in the network. In
First to seventh examples of the system configuration according to the first embodiment will be described more specifically with reference to
Note that an imaging device 150 in
Note that, in the configuration in
Note that, in the configuration of
The output of the imaging device 150 is transmitted to the cloud network 310 via wired or wireless communication such as the Internet or an intranet, and is input to the signal processing unit 301. The signal processing unit 301 can correspond to the application execution unit 30 in the information processing system 1 described with reference to
Note that, in the configuration of
The processing result of the signal processing unit 301a in the information processing apparatus 300a is transmitted to the cloud network 310 via wired or wireless communication such as the Internet or an intranet, and is input to the signal processing unit 301b. The information processing apparatus 300a may directly transmit the processing result output from the signal processing unit 301a to the cloud network 310, or may read the processing result from the storage device 303 and transmit the processing result to the cloud network 310.
The processing result of the signal processing unit 301b is output to the outside of the cloud network 310, for example, and is provided, for example, to the user. The processing result by the signal processing unit 301b may be stored in a storage device (not illustrated) included in the cloud network 310.
In an example in
Note that, in the configuration in
In
The output of the imaging device 100 is transmitted to the information processing apparatus 300b via, for example, wired or wireless communication, and is input to the second DNN 52. The output of the second DNN 52 is output to the application execution unit 30 in the information processing system 1 described with reference to
Note that, in the configuration in
The output of the imaging device 100 is transmitted to the cloud network 310 via wired or wireless communication such as the Internet or an intranet, and is input to the second DNN 52. The processing result by the second DNN 52 is input to the signal processing unit 301a. The signal processing unit 301a can correspond to the application execution unit 30 in the information processing system 1 described with reference to
Note that, in the configuration in
The processing result of the signal processing unit 301a in the information processing apparatus 300b is transmitted to the cloud network 310 via wired or wireless communication such as the Internet or an intranet, and is input to the signal processing unit 301b. The information processing apparatus 300b may directly transmit the processing result output from the signal processing unit 301a to the cloud network 310, or may read the processing result from the storage device 303 and transmit the processing result to the cloud network 310.
The processing result by the signal processing unit 301b is output to the outside of the cloud network 310 and provided, for example, to the user. The processing result by the signal processing unit 301b may be stored in a storage device (not illustrated) included in the cloud network 310.
In an example in
Note that, in the configuration in
Next, a second embodiment of the present disclosure will be described. In the second embodiment, a network is divided into a plurality of networks based on performance required for the network (referred to as required performance).
The required performance includes at least one of performance required for a hardware element and performance required for an application element. The application element is performance required, for the output data output from a network, by an application that executes processing on the output data.
Prior to the description of the second embodiment, an existing technology related to the second embodiment will be described to facilitate understanding.
The smart camera 1100 includes an imaging device 100 using a CIS and a host unit 140 that performs processing on the output data output from the imaging device 100 and controls the imaging device 100. Furthermore, the imaging device 100 includes a DNN processor 1110 that executes processing by the DNN, and a memory of which, for example, 8 megabytes (MB) of capacity can be used for processing by the DNN processor 1110. Furthermore, the network controller 1000 is configured on, for example, a cloud network or a server, and includes a DSP converter 1011 and a packager 1012.
A DNN 50 mounted on the DNN processor 1110 of the smart camera 1100 is input to the network controller 1000 and delivered to the DSP converter 1011. The DSP converter 1011 converts the DNN 50 into a format that can be executed by the DNN processor 1110. Furthermore, the DSP converter 1011 optimizes the converted DNN 50 so that the DNN falls within the capacity of the memory connected to the DNN processor 1110 in the smart camera 1100.
The DNN 50 whose format has been converted and optimized (hereinafter referred to as an optimized DNN) is encrypted and packaged by the packager 1012 and transmitted to the smart camera 1100. In the smart camera 1100, the host unit 140 transfers the optimized DNN transmitted from the network controller 1000 to a memory connected to the DNN processor 1110, and mounts the optimized DNN on the DNN processor 1110.
In this system, the DSP converter 1011 returns an error when the data size of the optimized DNN is larger than the capacity of the memory connected to the DNN processor 1110. In an example in
Furthermore, according to the existing technology, even when a part of the optimized DNN can be executed by the host unit 140 of the smart camera 1100 or in the cloud network, this point is not automatically taken into consideration by the DSP converter 1011. As described above, according to the existing technology, since the DSP converter 1011 returns an error indicating that the optimized DNN cannot be mounted, execution of the DNN is frequently limited even when the smart camera 1100 is operating.
As another method, it is conceivable to manually divide the DNN on the user side so as to adapt it to the constraints of the device (the smart camera 1100 in this example). Then, a portion of the DNN that performs preceding-stage processing is executed on the DNN processor 1110, and a portion for the subsequent stage is executed on the host unit 140.
However, in general, there may be various types of devices with different specifications that are not recognized by the user or a developer of the DNN 50. Therefore, it is considered extremely difficult to manually and efficiently divide the DNN 50 to be executed by various processors and components in the system. It may also be necessary to divide the DNN in different ways for certain applications based on system constraints.
Therefore, in the second embodiment of the present disclosure, the DNN 50 to be mounted on the device is divided based on performance (required performance) required for execution of processing by the DNN 50. As a result, for example, in the smart camera 1100, DNNs can be appropriately mounted on the DNN processor 1110 and the host unit 140 included in the imaging device 100, and processing by a more advanced DNN can be executed as a system.
Next, a configuration and processing according to the second embodiment will be described.
In the example in
Each of the smart cameras 1100a, 1100b, and 1100c includes an imaging device 100 including the DNN processor 1110 and a memory (not illustrated) that can use, for example, 8 MB of capacity for processing by the DNN processor 1110. Each DNN processor 1110 may include the function of the first processor 11.
For example, the imaging device 100 may have the configuration of the imaging device 100 illustrated in
Furthermore, the smart cameras 1100a, 1100b, and 1100c include respective host units 140a, 140b, and 140c that control processing on the output of the imaging device 100 included in each of the smart cameras and the operation of the imaging device 100. In the example in
The MCU 141 and the MPU 142 included in the host units 140a and 140b may each include the function of the second processor 12. In addition, the MPU 142 and the ACCL 143 included in the host unit 140c may include the function of the second processor 12, either in cooperation with each other or individually.
In the following description, unless otherwise specified, the smart camera 1100a among the smart cameras 1100a, 1100b, and 1100c will be described as a target device on which the DNN 50 is mounted.
In
The CIS converter 225 converts the input DNN into a format that can be executed by the DNN processor 1110. At this time, the CIS converter 225 calculates a key performance indicator (KPI) for when processing by the input DNN is executed on the DNN processor 1110. The key performance indicator calculated here is the performance (required performance) required for processing by the input DNN.
The host DNN converter 226 converts the input DNN into a format that can be executed by the host unit 140a (MCU 141).
The device setting database 221 stores target device capability information indicating the capability of a target device.
The DNN 50 to be operated in the smart camera system 2 is input to the optimization unit 220.
The optimization unit 220 uses the optimization algorithm 230 to determine which part of the input DNN 50 is to be operated on which hardware component on the smart camera system 2. The optimization algorithm 230 also considers requirements (application elements) for an application that uses an output of processing by the DNN 50, in addition to the capabilities of each hardware (hardware elements), to efficiently divide the DNN 50. Examples of the application element include a frame rate and transmission size required by the application and cost between components in the application.
More specifically, the optimization unit 220 delivers the input DNN 50 to the CIS converter 225. The CIS converter 225 obtains the key performance indicator based on the DNN 50 transferred and returns the key performance indicator to the optimization unit 220.
In addition, a target device type, an application parameter, and a user flag are input to the optimization unit 220.
The target device type indicates a type of the target device. The application parameter is a parameter for specifying a requirement for the output in an application that uses the output of the target device. The application parameter includes, for example, at least one of a frame rate of the output of the target device, a total processing time in the application for the output, and other constraints in the application.
The user flag specifies whether or not the user gives priority to executing the DNN on a specific component in the smart camera system. For example, the user flag specifies whether the DNN is executed only by the DNN processor 1110 of the smart camera 1100a, only by the host unit 140a, by the DNN processor 1110 and the host unit 140a, or by the host unit 140a and the cloud network.
Designation by the user flag is necessary, for example, when an application used by the user emphasizes privacy with respect to the output of the smart camera 1100a and it is desired to execute processing by the DNN only on a specific component such as the DNN processor 1110 (imaging device 100).
The target device type, the application parameter, and the user flag may be specified by the user.
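For reference, the inputs to the optimization unit 220 described above could be represented by a structure such as the following; the field names and example values are assumptions introduced only for illustration.

```python
# Possible representation of the inputs to the optimization unit 220 (field names are assumptions).
from dataclasses import dataclass
from typing import Optional

@dataclass
class ApplicationParameter:
    frame_rate_fps: Optional[float] = None            # frame rate required for the output
    total_processing_time_ms: Optional[float] = None  # total processing time in the application
    max_transfer_bits: Optional[int] = None           # other constraints, e.g., transmission size

@dataclass
class OptimizationRequest:
    target_device_type: str                           # type of the target device
    application_parameter: ApplicationParameter
    user_flag: str                                    # e.g., "dnn_processor_only", "host_and_cloud"

request = OptimizationRequest(
    target_device_type="smart_camera_1100a",
    application_parameter=ApplicationParameter(frame_rate_fps=30.0,
                                               total_processing_time_ms=33.0),
    user_flag="dnn_processor_and_host",
)
print(request)
```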
Further, the optimization unit 220 acquires target device capability information from the device setting database 221. The target device capability information includes various information regarding the target device (referred to as the smart camera 1100a).
The target device capability information may include, for example, at least one of the following pieces of information.
The optimization unit 220 determines which part of the DNN 50 is operated on which hardware component in the smart camera system 2 by using the optimization algorithm 230 based on each piece of information (target device type, application parameter, user flag, target device capability information, key performance indicator) described above. The optimization unit 220 divides the DNN 50 according to this determination.
For example, the optimization unit 220 divides the DNN 50 into a first network executed by the DNN processor 1110, a second network executed by the host unit 140a (MCU 141), and a third network executed by a server or a cloud network. Here, the optimization unit 220 can determine a division position of the DNN 50 so that, for example, the divided first network and second network can be processed temporally continuously by a pipeline process.
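One simple heuristic consistent with this idea is to choose the division position that balances the latency of the two pipeline stages, since the slower stage limits the pipeline interval. The sketch below illustrates this with assumed per-layer latencies; the actual optimization algorithm 230 is not limited to this rule.

```python
# Sketch: choose a division position so that the two pipeline stages have balanced latency.
# The per-layer latencies (in milliseconds) are illustrative assumptions.
layer_latency_ms = {"conv1": 4.0, "conv2": 6.0, "conv3": 5.0, "fc1": 3.0, "fc2": 2.0}

def balanced_split(latencies: dict) -> int:
    """Return the index of the first layer of the second network."""
    names = list(latencies)
    best_index, best_stage_time = 1, float("inf")
    for i in range(1, len(names)):
        first = sum(latencies[n] for n in names[:i])
        second = sum(latencies[n] for n in names[i:])
        stage_time = max(first, second)  # the slower stage limits the pipeline interval
        if stage_time < best_stage_time:
            best_index, best_stage_time = i, stage_time
    return best_index

split = balanced_split(layer_latency_ms)
print("first network:", list(layer_latency_ms)[:split])   # ['conv1', 'conv2']
print("second network:", list(layer_latency_ms)[split:])  # ['conv3', 'fc1', 'fc2']
```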
The compatibility test processing unit 222 tests the compatibility of the first, second, and third networks, into which the DNN 50 has been divided by the optimization unit 220, with the components to which they are applied (the DNN processor 1110, the host unit 140a (MCU 141), and the server or cloud network) by a DSP process 240, a host unit process 241, and a server process 242, respectively. For example, the compatibility test processing unit 222 may execute each test by simulation.
In addition, the compatibility test processing unit 222 obtains a key performance indicator (KPI) based on a test result and presents the obtained key performance indicator to, for example, the user.
The CIS converter 225 converts the first network determined to have no problem in compatibility by the DSP process 240 into a format that can be executed by the DNN processor 1110. The host DNN converter 226 converts the second network determined by the host unit process 241 to have no problem in compatibility into a format that can be executed by the host unit 140a (MCU 141).
The packager 223 encrypts and packages the first network converted by the CIS converter 225 and the second network converted by the host DNN converter 226, and transmits the first network and the second network to the smart camera 1100a. In addition, the packager 223 passes the third network, which is determined to have no problem in compatibility by the server process 242, to the DNN execution unit 224.
The smart camera 1100a transfers the converted first network passed from the packager 223 to a memory connected to the DNN processor 1110, and mounts the first network on the DNN processor 1110. Similarly, the smart camera 1100a transfers the converted second network transferred from the packager 223 to the host unit 140a, and mounts the second network on the host unit 140a (MCU 141).
In Step S10a, the DNN 50 is input to the optimization unit 220. In addition, in Step S10b, the target device type, the application parameter, and the user flag are input to the optimization unit 220. In Step S11a, the optimization unit 220 passes the DNN 50 to the CIS converter 225. The CIS converter 225 calculates the key performance indicator based on the transferred DNN 50. The calculated key performance indicator may include information indicating the runtime memory capacity and the execution cycles required when executing the process by the DNN 50. The CIS converter 225 returns the calculated key performance indicator to the optimization unit 220 (Step S11b).
In Step S12, the optimization unit 220 acquires the target device capability information of the target device (smart camera 1100a in this example) from the device setting database 221.
In Step S13, the optimization unit 220 uses the optimization algorithm 230 to determine efficient arrangement of the DNN 50 for each hardware component in the smart camera system 2 based on each piece of information acquired in Step S10b and the target device capability information acquired in Step S12.
The optimization unit 220 determines the arrangement in consideration of not only each independent piece of information but also a combination thereof. For example, the optimization unit 220 may consider a transfer data size and a transfer speed in data transfer between the DNN processor 1110 in the smart camera 1100a, the host unit 140a, and the cloud network or the server, and a total latency by an application executed by the application execution unit 30.
The optimization unit 220 divides the DNN 50 into a plurality of subnetworks (e.g. first, second and third networks) by the optimization algorithm 230. The optimization unit 220 allocates arrangement of the divided subnetworks to respective hardware components of the smart camera system 2. The key performance indicator when executing each subnetwork can be obtained by a simulator for each hardware component.
In Step S200, the optimization unit 220 acquires the key performance indicator (KPI) acquired from the CIS converter 225 in Step S12b as a condition.
In next Step S201, the optimization unit 220 extracts one combination of parameters using each piece of information acquired in Step S10b and each piece of target device capability information acquired in Step S12 as parameters. In next Step S202, the optimization unit 220 executes simulation when each subnetwork is executed in each hardware component by the combination of the extracted parameters (DSP process 240, host unit process 241, and server process 242).
In next Step S203, the optimization unit 220 compares the key performance indicator obtained by the simulation with the key performance indicator acquired as a condition in Step S200.
In next Step S204, the optimization unit 220 determines whether or not the key performance indicator obtained in Step S202 satisfies the condition based on a comparison result in Step S203. When determining that the key performance indicator obtained in Step S202 does not satisfy the condition (Step S204, “No”), the optimization unit 220 returns the process to Step S201 and extracts one unprocessed combination of parameters.
On the other hand, when the optimization unit 220 determines in Step S204 that the key performance indicator obtained in Step S202 satisfies the condition (Step S204, “Yes”), the process proceeds to Step S205. In Step S205, the optimization unit 220 applies each subnetwork to each hardware component whose arrangement has been determined.
After the process in Step S205, a series of processes according to the flowchart of
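A simplified rendering of this parameter search (Steps S200 to S205) is sketched below; the candidate parameters, the simulate() model, and the key performance indicator fields are all illustrative assumptions standing in for the DSP process 240, the host unit process 241, and the server process 242.

```python
# Simplified rendering of Steps S200-S205; all values and the simulate() model are assumptions.
import itertools

kpi_condition = {"memory_mb": 8.0, "latency_ms": 33.0}   # Step S200: KPI acquired as the condition

split_candidates = [1, 2, 3]                             # candidate division positions
placements = ["dsp+host", "dsp+cloud", "host+cloud"]     # candidate hardware arrangements

def simulate(split: int, placement: str) -> dict:
    # Step S202: stand-in for the DSP process 240 / host unit process 241 / server process 242.
    transfer_penalty_ms = 5.0 if "cloud" in placement else 0.0
    return {"memory_mb": 2.0 * split, "latency_ms": 10.0 * split + transfer_penalty_ms}

def satisfies(kpi: dict, condition: dict) -> bool:
    # Steps S203 and S204: compare the simulated KPI with the condition.
    return all(kpi[key] <= condition[key] for key in condition)

for split, placement in itertools.product(split_candidates, placements):  # Step S201
    kpi = simulate(split, placement)
    if satisfies(kpi, kpi_condition):
        print("Step S205: apply subnetworks with", split, placement, kpi)
        break
```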
In an example in Section (a), the DNN is divided into the first network and the second network immediately after a layer N. Here, it is assumed that a transfer size of the layer N is 256 pixels×256 pixels×32 bits, and the layer N−1 (Layer N−1) immediately before the layer N has a size of 64 pixels×64 pixels×32 bits. In other words, in the example in the section (a), data of 256 pixels×256 pixels×32 bits=2097152 bits is transferred from the DNN processor 1110 to the host unit 140a or the server.
Section (b) illustrates an example of a case where optimization is performed on the state of Section (a). In an example in Section (b), a division position of the DNN is changed to a position immediately before the layer N (between the layer N and the layer N−1). The last layer of the first network is the layer N−1, and data of 64 pixels×64 pixels×32 bits=131072 bits is transferred from the DNN processor 1110 to the host unit 140a or the server. Therefore, in the example in Section (b), an amount of data to be transferred is 1/16 as compared with the case of Section (a).
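The transfer sizes in this example can be verified with the short calculation below (a numerical check only, not part of the method).

```python
# Numerical check of the transfer sizes in the example above (not part of the method itself).
def transfer_bits(width: int, height: int, bits_per_element: int) -> int:
    return width * height * bits_per_element

after_layer_n = transfer_bits(256, 256, 32)          # division immediately after layer N
after_layer_n_minus_1 = transfer_bits(64, 64, 32)    # division immediately after layer N-1
print(after_layer_n, after_layer_n_minus_1)          # 2097152 131072
print(after_layer_n // after_layer_n_minus_1)        # 16 -> the transferred data becomes 1/16
```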
Returning to the description of
For example, the compatibility test processing unit 222 simulates an operation when each subnetwork is executed in each hardware component, and calculates an approximate key performance indicator of the execution. The compatibility test processing unit 222 compares the calculated key performance indicator with the key performance indicator acquired from the CIS converter 225 in Step S12b, and determines whether or not each subnetwork can be arranged in each hardware component.
The compatibility test processing unit 222 delivers the subnetwork (first network) determined to have no problem in compatibility as a result of the test to the CIS converter 225 (Step S15a). The CIS converter 225 converts the subnetwork passed into a format that can be executed by the DNN processor 1110.
In addition, the compatibility test processing unit 222 passes a subnetwork (second network) determined to have no problem in compatibility as a result of the compatibility test to the host DNN converter 226. The host DNN converter 226 converts the subnetwork passed into a format that can be executed by the host unit 140a (MCU 141).
Further, the compatibility test processing unit 222 presents the key performance indicator calculated by the compatibility test to, for example, the user (Step S16). The user may determine a result of the compatibility test based on the key performance indicator presented.
The compatibility test processing unit 222 passes the subnetwork (first network) converted by the CIS converter 225 and the subnetwork (second network) converted by the host DNN converter 226 to the packager 223 (Step S17). The packager 223 encrypts and packages each subnetwork passed from the compatibility test processing unit 222, and transmits the encrypted and packaged subnetworks to the smart camera 1100a (Step S18). In addition, the packager 223 passes, to the DNN execution unit 224, a subnetwork (third network) that is determined by the server process 242 to have no problem in compatibility.
Next, effects according to the second embodiment will be described.
In Sections (a) to (c) of
First, processing according to the existing technology will be described with reference to Section (a) of
In the existing technology illustrated in Section (a) of
The existing technology illustrated in Section (a) of
In Section (a) of
In Section (a) of
The existing technology illustrated in Section (a) of
Therefore, an interval at which the processing result using the DNN processor 1110 is output (interval of the dotted arrows) is 30 msec, i.e., twice the frame interval (15 msec) that the imaging device 100 can originally output. The number of frames (frame rate) of the DNN processing result that can be output by the DNN processor 1110 per unit time (for example, 1 sec) is ½ of a frame rate R that can be originally output by the imaging device 100, i.e., R/2. When the DNN processing result is output, the frame rate at which the input data used for the DNN process is output is also R/2.
Next, an effect of the processing according to the second embodiment on the frame rate will be described with reference to Section (b) in
Section (b) in
In the second embodiment, the smart camera 1100a includes a plurality of processors capable of processing by DNNs, and the plurality of processors executes processing using DNNs in a pipeline. The pipeline process by the plurality of processors can be commonly applied to the first embodiment described above and third to fifth embodiments described later.
As an example of performing the pipeline process using a plurality of processors, Section (b) of
Specifically, the DNN process is performed on information of one frame input from the imaging unit 111 in the imaging device 100 by using the first processor (e.g., the DNN processor 1110 included in the imaging device 100) and the second processor (e.g., the MCU 141 included in the host unit 140a). The total time required for this processing is 20 msec, which is the same as in the case of the existing technology illustrated in Section (a) of
Note that the first processor and the second processor can respectively correspond to the first processor 11 and the second processor 12 described with reference to
This DNN process is divided into a first DNN process executed by the first processor and a second DNN process executed by the second processor. Here, time required for the first DNN process and time required for the second DNN process are each 10 msec, which is shorter than the frame interval that can be output by the imaging device 100.
For the first frame input from the imaging unit 111 in the imaging device 100, first, the first processor (DNN processor 1110) executes the first DNN process, and then, the second processor (MCU 141 of the host unit 140a) executes the second DNN process. In this case, in a time zone overlapping with at least a part of the time zone in which the second processor executes the second DNN process on the first frame, the first processor executes, in parallel, the first DNN process on the second frame, i.e., the next frame; that is, a so-called pipeline process is executed.
The dotted arrow extending downward from the rear end of each band indicating the second DNN process in Section (b) of
In Section (b) of
In the existing technology described using Section (a) of
In particular, in the case of the example illustrated in Section (b) of
In a form in which not only the result of the DNN process but also the input data used for the DNN process is output together, an interval of outputting the input data used for the DNN process indicated by the solid arrow at the tip of each band of the first DNN process is also the same as the frame interval that can be originally output by the imaging device 100. Therefore, the frame rate at which the input data is output is also the same as the frame rate at which the imaging device 100 outputs data.
The second embodiment includes the plurality of processors, and the plurality of processors is configured to execute the process using DNNs in the pipeline. As a result, in the second embodiment, it is possible to increase the frame rate at which the process using the DNN is performed and the frame rate at which the processing result is output, with respect to the existing technology not having this configuration. In other words, with respect to the existing technology, the second embodiment can shorten the time from the start of the process using the DNN to the output of the processing result of the process.
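As a numerical illustration of the pipeline effect described above, the following sketch schedules a two-stage DNN process using the figures quoted in this section (10 msec per stage, 15 msec frame interval) and prints when each result becomes available. The scheduling model itself (no transfer overhead, one frame in flight per stage) is a simplifying assumption.

    # Simplified two-stage pipeline model. The 10 ms stage times and the 15 ms
    # frame interval come from this section; ignoring transfer overhead is an
    # illustrative simplification.
    FRAME_INTERVAL_MS = 15.0
    FIRST_DNN_MS = 10.0    # first DNN process (e.g., DNN processor 1110)
    SECOND_DNN_MS = 10.0   # second DNN process (e.g., MCU 141 of the host unit 140a)

    def pipeline_output_times(num_frames):
        stage1_free = 0.0
        stage2_free = 0.0
        outputs = []
        for i in range(num_frames):
            arrival = i * FRAME_INTERVAL_MS
            start1 = max(arrival, stage1_free)
            stage1_free = start1 + FIRST_DNN_MS
            start2 = max(stage1_free, stage2_free)
            stage2_free = start2 + SECOND_DNN_MS
            outputs.append(stage2_free)
        return outputs

    times = pipeline_output_times(4)
    print(times)                                      # [20.0, 35.0, 50.0, 65.0]
    print([b - a for a, b in zip(times, times[1:])])  # [15.0, 15.0, 15.0]: one result per frame interval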
Next, an effect on latency of processing according to the second embodiment will be described with reference to Section (c) in
In the second embodiment, the plurality of processors is provided, and the plurality of processors is configured to execute processing using DNNs in the pipeline, so that latency caused by performing the DNN process can be reduced as compared with the existing technology that does not have this configuration.
More specifically, in the second embodiment, it is possible to reduce the time required from when a sensing device (for example, the imaging device 100) outputs data that is a sensing result to when the data is processed by a first processor that performs DNN processing and the processing result is output to a subsequent stage that uses the processing result.
Here, as processing using the DNN according to the second embodiment, a process of detecting and discriminating an object existing in front of a vehicle in a vehicle that performs automatic driving will be described as an example.
In this example, the following processes (a) and (b) are executed as the process of detecting and discriminating the object.
In the case of the existing technology illustrated in Section (a) of
On the other hand, in the case of the second embodiment illustrated in Section (c) of
After the first DNN process, at a time point when the second DNN process executed by the second processor is completed, a result of discriminating the type of the detected object can be output to the travel controller that controls the travel of the own vehicle. On receiving the processing result of the first DNN process, the travel controller can already start the brake operation of the own vehicle. Therefore, the period from when the smart camera 1100a outputs the data of the first frame to when the first processor outputs the result of the first DNN process for the first frame is the latency for the travel controller (latency of the first DNN in the drawing).
As described above, the second embodiment includes the plurality of processors, and the plurality of processors is configured to execute processing using DNNs in the pipeline. As a result, the latency can be reduced (shortened) as compared with the existing technology described using Section (a) of
Next, a third embodiment of the present disclosure will be described. In the third embodiment, a network is divided into a plurality of networks based on functions in each unit of the network.
Prior to the description of the third embodiment, an existing technique related to the third embodiment will be described for easy understanding.
Generally, while the DNN process has a large calculation cost, calculation resources are limited in edge devices. As a result, there are the following problems.
The edge device is a terminal information device connected to the Internet. Examples of the edge device include a monitoring camera that transmits information such as image data via the Internet, an electronic device provided in a home appliance and having a communication function with respect to the Internet, and the like.
In order to overcome these problems, many methods for creating a lightweight and high-performance model using data quantization and the like have been proposed, but the DNN process that can be performed by the edge device is still limited.
Therefore, when a high-performance DNN process is performed, data acquired by an edge device is transmitted to a device such as a server having a relatively large amount of calculation resources, and the process is performed there. However, there are many problems: for example, the communication band becomes a bottleneck, or data including personal information needs to be transmitted over the network. In particular, in image processing, which tends to require a large amount of calculation resources, these problems become more pronounced.
In the example in
In other words, in the existing technology, when the plurality of processes is performed in one or more DNN processes, the data acquired by the edge device is transferred to a single device (the information processing apparatus 400 in this example) having a large amount of calculation resources, particularly memory, and the DNN process is performed there. In this case, since image data involves a relatively large amount of data transferred per unit time, the communication band used for the data transfer may become a bottleneck of the transfer speed. In addition, the image data may include personal information such as a face.
A schematic configuration according to the third embodiment will be described.
In
It is assumed that the second processor 12 can use more abundant hardware resources than the first processor 11 and can execute processing at a higher speed and with higher functionality than the first processor 11. As an example, the capacity of the memory that is connected to the second processor 12 and that can be used by the second processor may be larger than the capacity of the memory that is connected to the first processor 11 and that can be used by the first processor. A frequency at which the second processor operates may be higher than a frequency at which the first processor operates.
A processing result of the first processor 111 is transmitted to the second processor 12. The processing result of the first processor 111 can be, for example, a feature amount map or metadata obtained by processing of the RGB image by the first DNN 51.
Furthermore, the information processing system according to the third embodiment can include a plurality of first processors 111, 112, . . . , and 11N each executing processing by the first DNN 51. For example, the network controller 20 divides the processing in the first phase into the first DNN process, second DNN process, . . . , and N-th DNN process. The network controller 20 determines the first processors 111, 112, . . . , and 11N as processors for executing the first DNN process, the second DNN process, . . . , and the Nth DNN process, respectively.
In this case, the DNN processes executed by the first processors 111, 112, . . . , and 11N are not necessarily the same, and may be different. For example, processes in the first phases different from each other and having a common process in the second phase may be assigned to the first processors 111, 112, . . . , and 11N, respectively. The output of each of the first processors 111, 112, . . . , 11N is transmitted to the second processor 12.
The configuration described with reference to
In Step S300, the network controller 20 trains a model of the given DNN 50 as necessary. In next Step S301, the network controller 20 analyzes a task executed by the model. In next Step S302, the network controller 20 divides the model based on the result of the task analysis in Step S301.
For example, the network controller 20 extracts, based on the model analysis in Step S301, a layer group related to the processing of the preceding stage and a layer group related to the processing in the subsequent stage that receives the processing result of the preceding stage in each layer configuring the DNN 50. In model division in Step S302, the network controller 20 divides the DNN 50 into a model of a layer group related to the processing in the preceding stage (first DNN 51) and a model of a layer group related to the processing in the subsequent stage (second DNN).
In Step S303, the network controller 20 transmits the divided models to the corresponding devices (processors). The network controller 20 transmits, for example, the model of the layer group related to the processing in the preceding stage to the first processor 11 to which input data is input. In addition, the model of the layer group related to the processing in the subsequent stage is transmitted to the second processor 12 to which the output of the first processor 11 is input.
Devices to which respective models are transmitted execute processing according to each model. Each device may transmit a processing result of the processing according to each model to the network controller 20.
In next Step S304, the network controller 20 receives data (for example, a processing result) transmitted from each device that has transmitted each model. In next Step S305, the network controller 20 retrains the model, i.e., the DNN 50 based on the data transmitted from each device. In Step S305, only a part of the divided network may be retrained.
In next Step S306, the network controller 20 determines whether or not the model has been updated by retraining in Step S305. When the network controller 20 determines that no update has occurred in the model (Step S306, “No”), the process returns to Step S304. On the other hand, when it is determined that the update has occurred in the model, the network controller 20 returns the process to Step S302 and executes the model division on the updated model.
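The control flow of Steps S300 to S306 can be outlined as follows. All helper functions are trivial stubs introduced only so that the outline is executable; the iteration back to Step S302 or Step S304 described above is indicated in comments rather than reproduced as an endless loop.

    # Outline of Steps S300-S306. The helper functions are trivial stubs and
    # not part of the disclosure; only the control flow follows the description.
    def train_if_needed(model):                 # Step S300
        return model

    def analyze_task(model):                    # Step S301
        return {"num_layers": len(model)}

    def divide_model(model, analysis):          # Step S302
        mid = analysis["num_layers"] // 2
        return [model[:mid], model[mid:]]       # preceding-stage / subsequent-stage layer groups

    def retrain(model, device_data):            # Step S305 (possibly only part of the model)
        updated = bool(device_data)             # stub: assume an update occurred when data arrived
        return model, updated

    def controller_iteration(model, device_data):
        model = train_if_needed(model)
        analysis = analyze_task(model)
        submodels = divide_model(model, analysis)        # Step S302
        # Step S303: each submodel would be transmitted to its device here.
        model, updated = retrain(model, device_data)     # Steps S304-S305
        # Step S306: if updated, return to Step S302; otherwise return to Step S304.
        return submodels, updated

    print(controller_iteration(["conv1", "conv2", "fc1", "fc2"], device_data=["result"]))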
As described above, in the third embodiment, the first DNN 51 and the second DNN 52 obtained by dividing the DNN 50 are executed by different hardware such as the first processor 111 and the second processor 12. Therefore, in the third embodiment, it is possible to implement execution of a model that is difficult to execute with a single device, reduction in communication bandwidth, improvement in security, and the like. In addition, in the third embodiment, one of the pre-stage processing and the post-stage processing can be made common, and the other can be changed. Furthermore, in the third embodiment, parallel processing is possible in the edge device.
Here, in Patent Literature 1, the DNN process is divided into a preceding stage and a subsequent stage, but the DNN process performed in the preceding stage and the subsequent stage is uniquely fixed. Further, in Patent Literature 1, a single process is performed.
On the other hand, in the third embodiment, the network model used in the subsequent stage DNN can be switched according to the processing result of the preceding stage DNN. Furthermore, in the third embodiment, a configuration in which the plurality of DNN models is processed in parallel can be adopted in processing in both the preceding stage and the subsequent stage.
Furthermore, in Patent Literature 1, regarding the division method of the DNN process, the pre-stage processing is a convolutional neural network (CNN), and the post-stage processing is a long short-term memory (LSTM).
On the other hand, in the third embodiment, the DNN process for realizing a function A as a whole can be performed, or the DNN process for realizing a function B as a whole can be performed. When the function A is realized as a whole, the function A is divided into functions A #1 and A #2; the function A #1 can be executed by the processor #1 (e.g., the first processor 11), which includes the hardware resources necessary for executing the function A #1, and the function A #2 can be executed by the processor #2 (e.g., the second processor 12), which includes the hardware resources necessary for executing the function A #2.
Similarly, when the function B is realized as a whole and the function B is divided into functions B #1 and B #2, if the processor #1 lacks the hardware resources required for executing the function B #1 while the processor #2 has sufficient resources, the processor #2 executes the function B #1. Likewise, when the processor #1 lacks the hardware resources required for executing the function B #2 while the processor #2 has sufficient resources, the processor #2 executes the function B #2.
The processor #2 alone can execute the function A and also execute the function B. In this case, when the function A is executed, in order to prioritize a desired characteristic, the function A #1 is executed by the processor #1, and the function A #2 is executed by the processor #2. On the other hand, when the function B is executed, the function B as a whole is executed by the processor #2 without giving priority to the above characteristic.
Such a configuration is not disclosed in Patent Literature 1.
Hereinafter, each example of the configuration according to the third embodiment will be described with reference to
A first example of the third embodiment will be described. The first example of the third embodiment is an example in which an output of the preceding stage is determined by processing in the subsequent stage when the second DNN 52, which executes the second-phase process as the subsequent stage, performs processing using a processing result of the first DNN 51, which executes the first-phase process as the preceding stage.
In the first processor 11, the first-phase process (described as phase #1 in the drawing) is executed by the first DNN 51. In the example in the drawing, the first processor 11 can output three types of data, namely an input tensor 60, an output tensor 61, and RAW data 62, according to the model to be applied.
The input tensor 60 is, for example, an RGB image obtained by performing processing such as development on an output of the imaging unit 111 so as to give RGB color information to each pixel. The output tensor 61 is, for example, processing result data obtained by performing processing by the first DNN 51 on the captured image. In the example in the drawing, a detection result obtained by the first DNN 51 detecting an object (person) based on the captured image is set as the output tensor 61. The output tensor 61 may include, for example, a feature amount map as the detection result. The RAW data 62 is the output of the imaging unit 111 on which processing such as development has not been performed.
The second processor 12 performs the second-phase process (described as phase #2 in the drawing) by the second DNN 52. The second processor 12 transmits, to the first processor 11, a model for outputting data required in the second-phase process among the input tensor 60, the output tensor 61, and the RAW data 62. The first processor 11 stores the model transmitted from the second processor 12 in, for example, the memory 123 (see
When the first processor 11 does not need to change the model, the second processor 12 may only specify the type of the output data without transmitting the model to the first processor 11. For example, when the input tensor 60 and the RAW data 62 are required in the second DNN 52, there is a possibility that it is not necessary to change the model in the first processor 11.
Note that, in the configuration of
Next, a second example of the third embodiment will be described. The second example of the third embodiment is an example of changing the model applied to the second DNN 52 that executes the processing in the second phase that is the subsequent stage.
In the example in
A first specific example in the second example of the third embodiment will be described. The first specific example is an example in which the DNN of the subsequent stage, i.e., the second processor 12, is switched according to the processing result of the first DNN 51 in the preceding stage, i.e., the first processor 11.
The second processor 12 is configured on, for example, a server connected to the first processor 11 via a communication network, and can switch between two second DNNs 52a and 52b having different functions according to a counting result of the first DNN 51b.
More specifically, each of the second DNNs 52a and 52b executes the tracking process of tracking the detected person based on the result of the person detection by the first DNN 51b. In this case, as the first DNN process in the second phase, the second DNN 52a executes a first tracking process that is highly accurate but handles only a small number of people. On the other hand, as the second DNN process in the second phase, the second DNN 52b executes a second tracking process that is less accurate but handles a large number of people.
The second processor 12 compares the number of people counted by the first processor 11 with a threshold th. As a result of comparison, the second processor 12 executes the process by the second DNN 52a when the number of people is less than the threshold th (Yes), and executes the process by the second DNN 52b when the number of people is equal to or more than the threshold th (No).
The comparison in the second processor 12 may be realized by a program different from the second DNNs 52a and 52b, or may be included in functions of the second DNNs 52a and 52b.
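For illustration, the switching in this specific example can be written as a simple branch on the person count received from the preceding stage. The threshold value and the two tracker functions below are stand-ins assumed for this sketch, not implementations of the second DNNs 52a and 52b.

    # Illustrative subsequent-stage switching on the person count from the
    # first processor. The threshold value and the tracker functions are
    # assumed stand-ins for the second DNNs 52a and 52b.
    THRESHOLD_PEOPLE = 10   # assumed value for the threshold th

    def second_dnn_52a_stub(detections):
        return {"tracker": "high accuracy, small number of people", "tracked": len(detections)}

    def second_dnn_52b_stub(detections):
        return {"tracker": "low accuracy, large number of people", "tracked": len(detections)}

    def second_phase(detections):
        if len(detections) < THRESHOLD_PEOPLE:       # fewer people than th
            return second_dnn_52a_stub(detections)
        return second_dnn_52b_stub(detections)       # th people or more

    print(second_phase([{"person": i} for i in range(3)]))
    print(second_phase([{"person": i} for i in range(25)]))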
A second specific example in the second example of the third embodiment will be described. The second specific example is an example in which a plurality of second DNNs having different functions is executed in parallel in the subsequent stage, i.e., the second processor 12.
The second processor 12 is configured on, for example, a server connected to the first processor 11 via a communication network, and executes two second DNNs 52c and 52d having different functions in parallel. In this example, the second DNN 52c executes, as the second-phase process, a segmentation process based on the feature amount map generated by the first DNN 51c. Furthermore, the second DNN 52d executes, as the second-phase process, a posture estimation process based on the feature amount map generated by the first DNN 51c.
In
The second DNN 52c includes a pyramid pooling module 520, a decoder module 521, and an auxiliary loss module 522. The first feature amount map is input to the pyramid pooling module 520, and is output as information of 21 classes×(height) 475 pixels×(width) 475 pixels by the pyramid pooling module 520 and the decoder module 521. Furthermore, the second feature amount map is input to the auxiliary loss module 522 and is output as information of 21 classes×(height) 475 pixels×(width) 475 pixels. Outputs of the decoder module 521 and the auxiliary loss module 522 are set as outputs of the second DNN 52c.
The first DNN 51c further generates a third feature amount map of 128 channels×(height) 46 pixels×(width) 46 pixels based on the image 55 including information of 3 channels (e.g., RGB)×(height) 475 pixels×(width) 475 pixels. The second DNN 52d generates part affinity fields (PAFs) of 38 channels×(height) 46 pixels×(width) 46 pixels in Block 1_1 of Stage 1 based on the third feature amount map output from the first DNN 51c. Furthermore, in Block 1_2 of Stage 1, a heat map of 19 channels×(height) 46 pixels×(width) 46 pixels is generated.
The second DNN 52d integrates the PAFs and the heat map generated in Stage 1 and the feature amount map generated in the first DNN 51c, and generates PAFs of 38 channels×(height) 46 pixels×(width) 46 pixels in Block 2_1 of Stage 2 based on the integrated information. Furthermore, in Block 2_2 of Stage 2, a heat map of 19 channels×(height) 46 pixels×(width) 46 pixels is generated.
The second DNN 52d repeatedly executes Stage 2-related processes in Stage 3 to Stage 6 to generate PAFs of 38 channels×(height) 46 pixels×(width) 46 pixels in Block 6_1 of Stage 6. Furthermore, the second DNN 52d generates a heat map of 19 channels×(height) 46 pixels×(width) 46 pixels in Block 6_2 of Stage 6. The second DNN 52d outputs the PAFs and the heat map generated in Stage 6 as a posture estimation result in the second DNN 52d.
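Purely as bookkeeping for the tensor shapes named in this example, the following sketch lists them as (channels, height, width) tuples and prints their element counts; it performs no network computation, and the tuple notation is an editorial convention rather than part of the embodiments.

    # Shape bookkeeping only; no network computation is performed.
    shapes = {
        "second DNN 52c output (decoder)":        (21, 475, 475),
        "second DNN 52c output (auxiliary loss)": (21, 475, 475),
        "PAFs per stage (second DNN 52d)":        (38, 46, 46),
        "heat map per stage (second DNN 52d)":    (19, 46, 46),
    }

    for name, (c, h, w) in shapes.items():
        print(f"{name}: {c}x{h}x{w} = {c * h * w} elements")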
Note that, in the example in
Next, a third example of the third embodiment will be described. The third example of the third embodiment is an example in which different DNN processes are performed by a plurality of edge devices as the preceding stage, and DNN process of aggregating a plurality of outputs of the preceding stage is executed in the subsequent stage.
Each of the first processors 11-1, 11-2, 11-3, and so on passes each processing result by the processing of the first DNNs 51-1, 51-2, 51-3, and so on to the second processor 12. For example, in the second processor 12 mounted on the server, a second DNN 52e executes the DNN process of aggregating each processing result by the processing of the first DNNs 51-1, 51-2, 51-3, and so on as the second-phase process.
A first specific example in the third example of the third embodiment will be described. The first specific example is an example in which data mining is performed using the output of the preceding stage, i.e., each of the first processors 11-11, 11-12, 11-13, and so on, in the subsequent stage, that is, the second processor 12.
The first processor 11-11 transmits, to the second processor 12, the posture information output through the processing of the first DNN 51-11. The first processor 11-12 transmits, to the second processor 12, the posture information output by the processing of the first DNN 51-12. Furthermore, the first processor 11-13 transmits, to the second processor 12, the face information obtained by the processing of the first DNN 51-13 in association with, for example, the feature amount based on the face and the ID for specifying the face.
The second processor 12 performs data mining by the processing of a second DNN 52f based on each piece of information transmitted from the first processors 11-11, 11-12, and 11-13. As the data mining process executed by the second DNN 52f, behavior analysis or the like can be considered. For example, based on the posture information and the age information transmitted from the first processors 11-11 and 11-12, the second DNN 52f refers to a database (DB) 56 in which behavior information regarding the posture and the age is registered, thereby performing the behavior analysis on the person indicated by the ID.
At this time, since the first DNN 51-13 that performs the face authentication transmits the ID and the feature amount instead of the face image, it is possible to protect the privacy of the person whose behavior has been analyzed.
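For illustration, the aggregation in this specific example can be sketched as joining posture records with ID/feature records and consulting a behavior database. The record layouts, the join key, and the database contents below are assumptions made only for this sketch.

    # Illustrative aggregation for the data-mining example. Record layouts,
    # the join key, and the behavior database contents are assumptions.
    posture_records = [
        {"person_id": "ID-1", "posture": "sitting", "age_group": "adult"},
        {"person_id": "ID-2", "posture": "walking", "age_group": "child"},
    ]
    face_records = [
        {"person_id": "ID-1", "feature": [0.12, 0.88]},   # feature amount, not a face image
        {"person_id": "ID-2", "feature": [0.45, 0.31]},
    ]
    behavior_db_56 = {   # stand-in for the database (DB) 56
        ("sitting", "adult"): "resting",
        ("walking", "child"): "playing",
    }

    def analyze_behavior(postures, faces, db):
        known_ids = {f["person_id"] for f in faces}
        return {r["person_id"]: db.get((r["posture"], r["age_group"]), "unknown")
                for r in postures if r["person_id"] in known_ids}

    print(analyze_behavior(posture_records, face_records, behavior_db_56))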
A second specific example in the third example of the third embodiment will be described. The second specific example is an example in which a person or the like is tracked using the output of the preceding stage, i.e., each of the first processors 11-21, 11-22, 11-23, and so on in the subsequent stage, i.e., the second processor 12.
Each of the first processors 11-21, 11-22, and 11-23 transmits, to the second processor 12 in association with each other, the feature amount of the person authenticated by each of the first DNNs 51-21, 51-22, and 51-23, an ID (in this example, target X and target Y) for identifying the person, and information indicating time when the person with the ID is authenticated.
The second processor 12 can acquire a device position information 57 indicating a position of each imaging device (imaging devices A, B, and C) that acquires an image for authentication by each of the first processors 11-21, 11-22, and 11-23. The device position information 57 may be given in advance, or may be obtained based on an object or the like included in the captured image captured by each imaging device.
The second processor 12 tracks, by a second DNN 52g that performs the second-phase process, the movement trajectory of the person indicated by the ID, based on the feature amount transmitted from each of the first processors 11-21, 11-22, and 11-23, the time information, and the position information of each imaging device. The person indicated in the information transmitted from each of the first processors 11-21, 11-22, and 11-23 can be identified and specified by comparing the feature amounts.
In the example in the drawing, by the processing of the first DNN 51-21 of the first processor 11-21, the target Y is recognized at time “10:00”, and the target X is recognized at the time “10:01”. Furthermore, by the processing of the first DNN 51-22 of the first processor 11-22, the target X is recognized at time “10:00”, and the target Y is recognized at the time “10:01”. Furthermore, the targets X and Y are recognized at time “10:02” by the processing of the first DNN 51-23 of the first processor 11-23.
By the processing of the second DNN 52g, the second processor 12 detects, through the tracking process based on the information transmitted from each of the first processors 11-21, 11-22, and 11-23, that the target X moves in the order of the position of the imaging device B, the position of the imaging device A, and the position of the imaging device C. Similarly, the second processor 12 detects, by the processing of the second DNN 52g, that the target Y moves in the order of the position of the imaging device A, the position of the imaging device B, and the position of the imaging device C.
At this time, since each of the first DNNs 51-21, 51-22, and 51-23 transmits the ID and the feature amount, instead of the face image, it is possible to protect the privacy of the tracked person.
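The association performed in this example can be illustrated by matching feature amounts across observations and ordering the matches by time to obtain a trajectory per target. The nearest-feature matching rule and all feature values below are assumptions; the camera/time assignments follow the example above.

    # Illustrative trajectory reconstruction from (camera, time, feature)
    # observations. The nearest-feature matching rule and the feature values
    # are assumptions; camera/time assignments follow the example in the text.
    observations = [
        {"camera": "B", "time": "10:00", "feature": (0.90, 0.10)},
        {"camera": "A", "time": "10:00", "feature": (0.10, 0.90)},
        {"camera": "A", "time": "10:01", "feature": (0.88, 0.12)},
        {"camera": "B", "time": "10:01", "feature": (0.12, 0.90)},
        {"camera": "C", "time": "10:02", "feature": (0.90, 0.11)},
        {"camera": "C", "time": "10:02", "feature": (0.10, 0.88)},
    ]
    references = {"target X": (0.90, 0.10), "target Y": (0.10, 0.90)}   # assumed reference features

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def trajectories(obs, refs):
        tracks = {name: [] for name in refs}
        for o in obs:
            nearest = min(refs, key=lambda name: squared_distance(refs[name], o["feature"]))
            tracks[nearest].append((o["time"], o["camera"]))
        return {name: [camera for _, camera in sorted(points)] for name, points in tracks.items()}

    print(trajectories(observations, references))
    # {'target X': ['B', 'A', 'C'], 'target Y': ['A', 'B', 'C']}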
Next, a fourth example of the third embodiment will be described. The fourth example of the third embodiment is an example in which lightweight inference is executed at high speed by the edge device as the preceding stage, and high-accuracy and high-functional inference is executed by, for example, a server as the subsequent stage.
Note that, hereinafter, “lightweight inference executed at high speed” is referred to as “coarse inference”, and “highly accurate and highly functional inference” is referred to as “detailed inference”.
A configuration of an example of an information processing system according to the fourth example of the third embodiment will be described. In
The first DNN 51-31 executes the first DNN process in the first phase. The first DNN 51-32 executes the second DNN process in the first phase. Furthermore, the first DNN 51-33 executes the third DNN process in the first phase. The first to third DNN processes in the first phase are processes of performing predetermined coarse inference based on input data. Coarse inference results 71-31, 71-32, and 71-33, which are results of the coarse inference by each of the first DNNs 51-31, 51-32, and 51-33, are provided, for example, to the user.
The first processor 11-31 (first DNN 51-31) outputs the input data as an input tensor 70-31. The first processor 11-32 (first DNN 51-32) outputs the input data as an input tensor 70-32. Similarly, the first processor 11-33 (first DNN 51-33) outputs the input data as an input tensor 70-33.
In the second processor 12, a second DNN 52h that performs the second-phase process performs detailed inference using the input tensors 70-31, 70-32, and 70-33 output from the first processors 11-31, 11-32, and 11-33, respectively. A detailed inference result that is a result of the detailed inference by the second DNN 52h is provided, for example, to the user.
Content of the coarse inference and the detailed inference is not particularly limited, but it is conceivable to infer the presence or absence of a person and the number of persons by the coarse inference based on the captured image, and infer recognition and action analysis of the person by the detailed inference.
According to the fourth example of the third embodiment, the user can first grasp the approximate situation of the site based on the coarse inference result and then grasp the detailed situation of the site based on the detailed inference result. As a result, it becomes easy to take a more appropriate response to the situation of the site.
Next, an application example in the fourth example of the third embodiment will be described. This application example is an example in which the second processor 12 retrains each of the first DNNs 51-31, 51-32, and 51-33 by using each of the coarse inference results 71-31, 71-32, and 71-33 in the fourth example described with reference to
Each of the coarse inference results 71-31, 71-32, and 71-33 output from the first processors 11-31, 11-32, and 11-33 is passed to the relearning unit 72. The relearning unit 72 retrains the first DNNs to obtain retrained first DNNs 51-31′, 51-32′, and 51-33′ by using the detailed inference result obtained by the processing of the second DNN 521 and the coarse inference results 71-31, 71-32, and 71-33.
The relearning unit 72 can perform the relearning using a method called “distillation”. The “distillation” generally refers to a technique for improving the performance of a target network (in this example, the first DNNs 51-31 to 51-33) by using an output of an existing network (in this example, the second DNN 521). In this case, the existing network is assumed to be a network having a large scale, high performance, and/or a lot of training data. On the other hand, the target network is assumed to be a network having a small scale, low performance, and/or insufficient training data. As described above, it is known that performance is further improved when the target network learns not only from its own training data but also from the outputs of other networks.
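A minimal sketch of the distillation idea is given below: the target (edge-side) network is trained against a weighted combination of a loss toward the hard label and a loss toward the existing (server-side) network's softened output. The temperature, the weighting factor, and the logit values are illustrative assumptions.

    # Pure-Python illustration of a distillation loss. Temperature, weighting,
    # and all logit values are illustrative assumptions.
    import math

    def softmax(logits, temperature=1.0):
        exps = [math.exp(z / temperature) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def cross_entropy(target_probs, predicted_probs, eps=1e-12):
        return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

    def distillation_loss(student_logits, teacher_logits, hard_label, temperature=4.0, alpha=0.5):
        soft_loss = cross_entropy(softmax(teacher_logits, temperature),
                                  softmax(student_logits, temperature))
        one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
        hard_loss = cross_entropy(one_hot, softmax(student_logits))
        return alpha * soft_loss + (1.0 - alpha) * hard_loss

    print(distillation_loss(student_logits=[1.0, 0.2, -0.5],
                            teacher_logits=[2.0, 0.1, -1.0],
                            hard_label=0))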
The second processor 12 transmits the first DNNs 51-31′ to 51-33′ retrained by the relearning unit 72 to the first processors 11-31 to 11-33, respectively. In each of the first processors 11-31 to 11-33, the first DNNs 51-31′ to 51-33′ transmitted from the second processor 12 update the first DNNs 51-31 to 51-33, respectively. In this case, among the retrained first DNNs 51-31′ to 51-33′, the second processor 12 may transmit only models with improved performance than before the retraining to the first processors 11-31 to 11-33.
As described above, in the application example in the fourth example of the third embodiment, since the accuracy of the coarse inference by the first DNNs 51-31 to 51-33 is improved according to the use time and the like, the user can acquire the inference result with higher accuracy at high speed.
Next, a fifth example of the third embodiment will be described. The fifth example of the third embodiment is an example in which the processing result by the first DNN that performs the first-phase process is transmitted to the second DNN that performs the second-phase process while reducing a data amount. More specifically, by the first DNN process, an object area is detected based on the image, and information on only the detected object area is transmitted to the second DNN. The second DNN performs processing on the object area transmitted from the first DNN.
An image 73 including an object 74 is input to the first processor 11. In the first processor 11, a first DNN 51d that performs processing of the first phase performs object detection on the input image 73, and detects a smallest rectangular region including the object 74 as an object area 75. The first DNN 51d extracts the object area 75 from the image 73 and transmits the object area 75 to the second processor 12. The second processor 12 executes the DNN process of processing based on the image of the object area 75 by a second DNN 52j that performs the second-phase process.
Since the first processor 11 transmits the image of the object area 75 extracted from the image 73 to the second processor 12, the amount of communication between the first processor 11 and the second processor 12 is reduced. Furthermore, in the second processor 12, the second DNN 52j executes the DNN process on the image of the object area 75, which is small relative to the original image 73, so that the processing load can be reduced as compared with the case of executing the DNN process on the image 73.
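The bandwidth saving in this example comes from transmitting only the detected object area. The sketch below crops the smallest enclosing rectangle out of an image represented as a nested list and compares the data sizes; the image size and the bounding-box coordinates are assumed values.

    # Illustration of cutting out the object area 75 from the image 73 before
    # transmission. The image size and bounding box are assumed values.
    def crop(image, top, left, height, width):
        return [row[left:left + width] for row in image[top:top + height]]

    full_image = [[0] * 640 for _ in range(480)]                     # assumed 640x480 image 73
    bbox = {"top": 100, "left": 200, "height": 64, "width": 64}      # assumed object area 75

    object_area = crop(full_image, **bbox)

    full_pixels = len(full_image) * len(full_image[0])
    crop_pixels = len(object_area) * len(object_area[0])
    print(crop_pixels, "/", full_pixels, "=", round(crop_pixels / full_pixels, 4))   # about 1.3% of the data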
The fifth example of the third embodiment is not limited to this example, and it is also possible to cause the first DNN 51d to relearn based on a processing result by the second DNN 52j, for example. For example, in a case where more detailed detection processing is performed on the object 74 included in the image of the object area 75 by the second DNN 52j, it is conceivable to cause the first DNN 51d to relearn by the above-described distillation method using the detection result of the second DNN 52j.
Furthermore, in the first DNN 51d, the object area 75 can be detected using the RGB image as the image 73, and only the data of the object area 75 in the RAW data corresponding to the image 73 can be transmitted to the second DNN 52j. Furthermore, when a target object is not detected from the image 73 in the first DNN 51d, data may not be transmitted to the second DNN 52j.
Note that the first to fifth examples of the third embodiment described above can be implemented not only independently but also in appropriate combination within a range in which there is no contradiction.
Next, a fourth embodiment of the present disclosure will be described. The fourth embodiment relates to implementation of the first processor 11, the second processor 12, and the like. More specifically, the fourth embodiment relates to a structure of a signal path for transmitting data between the sensing device 10, the first processor 11, and the second processor 12.
In
In the sensing device 10, the first processor 11, and the second processor 12, information based on an output of the sensing device 10 and information of a result of performing arithmetic processing based on the information are handled as data in units of a plurality of bits.
In
Note that, in
A first signal path example according to the fourth embodiment will be described.
In
A second signal path example according to the fourth embodiment will be described.
Note that the high-speed serial transfer signal line 81 is a signal line having a structure for transferring data according to a higher clock frequency than the parallel transfer signal line 80 illustrated in
In
When the processing by the first DNN 51 ends at time t3, the first processor 11 starts parallel/serial conversion (P/S conversion) from the head data of the output of the first DNN 51. At time t4, data transfer from the first processor 11 to the second processor 12 via the high-speed serial transfer signal line 81 is started. At time t5, the second processor 12 starts serial/parallel conversion (S/P conversion) from the head of the transferred data, and at time t6, processing by the second DNN 52 is started in the second processor 12. The output of the processing result by the second DNN 52 is started at time t7.
As described above, in the present disclosure, the processing by the first DNN 51 and the processing by the second DNN 52 are executed by the pipeline process.
The latency of the processing by the first DNN 51 is from time t0 when the pixel data is output from the sensing device 10 to time t3 when the output of the processing result of the first DNN 51 is started. In addition, the latency of the processing by the first DNN 51 and the second DNN 52 is from the time t0 to time t7 at which the output of the processing result of the second DNN 52 is started.
A third signal path example according to the fourth embodiment will be described.
In
At time t12, the first processor 11 starts S/P conversion from the head data of the pixel data received via the high-speed serial transfer signal line 81. At time t13, the first processor 11 starts processing by the first DNN 51 on the pixel data transferred from the sensing device 10 via the high-speed serial transfer signal line 81.
When the processing by the first DNN 51 ends at time t14, the first processor 11 starts the P/S conversion from the head data of the output of the first DNN 51. At time t15, data transfer from the first processor 11 to the second processor 12 via the high-speed serial transfer signal line 81 is started. At time t16, the second processor 12 starts S/P conversion from the head of the transferred data, and at time t17, processing by the second DNN 52 is started in the second processor 12. The output of the processing result by the second DNN 52 is started at time t18.
As described above, in the present disclosure, the processing by the first DNN 51 and the processing by the second DNN 52 are executed by the pipeline process.
The latency of the processing by the first DNN 51 is from time t10 when the pixel data is output from the sensing device 10 to time t14 when the output of the processing result of the first DNN 51 is started. Furthermore, the latency of the processing by the second DNN 52 is from the time t10 to time t18 at which the output of the processing result of the second DNN 52 is started.
Comparing
Therefore, in consideration of the latency, it can be said that the second signal path example of the fourth embodiment illustrated in
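The latency relation between the second and third signal path examples can be illustrated by a simple additive model in which the serial path between the sensing device 10 and the first processor 11 inserts an additional P/S conversion, serial transfer, and S/P conversion before the first DNN process. All durations below are hypothetical placeholders, not the times t0 to t18 in the drawings.

    # Additive latency model comparing the second and third signal path
    # examples. All durations (ms) are hypothetical placeholders.
    DURATIONS_MS = {"first_dnn": 10.0, "second_dnn": 10.0,
                    "p_s_conversion": 1.0, "serial_transfer": 1.0, "s_p_conversion": 1.0}

    def latency_second_example(d):
        # Parallel transfer from the sensing device; serial transfer only
        # between the first and second processors.
        first_done = d["first_dnn"]
        second_done = (first_done + d["p_s_conversion"] + d["serial_transfer"]
                       + d["s_p_conversion"] + d["second_dnn"])
        return first_done, second_done

    def latency_third_example(d):
        # Serial transfer (with conversions) also between the sensing device
        # and the first processor.
        first_done = d["p_s_conversion"] + d["serial_transfer"] + d["s_p_conversion"] + d["first_dnn"]
        second_done = (first_done + d["p_s_conversion"] + d["serial_transfer"]
                       + d["s_p_conversion"] + d["second_dnn"])
        return first_done, second_done

    print(latency_second_example(DURATIONS_MS))   # (10.0, 23.0)
    print(latency_third_example(DURATIONS_MS))    # (13.0, 26.0)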
A fourth signal path example according to the fourth embodiment will be described. The fourth signal path example according to the fourth embodiment is an example when the sensing device 10 and the first processor 11 have a stacked structure or are superimposed. For example, the imaging unit 111 included in the sensing device 10 is configured on a first die, and a portion other than the imaging unit 111 included in the sensing device 10 and the first processor 11 are configured on a second die. The first die and the second die are bonded together to form a one-chip semiconductor device having a stacked structure.
Hereinafter, this structure in which a plurality of devices is stacked or superimposed is referred to as the stacked structure as appropriate.
Note that, in
Hereinafter, the substrate 502 will be described as an interposer substrate 502.
In
The present disclosure is not limited to this example, and as illustrated in
In this way, adoption of the stacked structure facilitates a layout of the signal path of the data between the imaging unit 111 and the first processor 11. Furthermore, the imaging device 150 can be downsized by adopting the stacked structure.
A fifth signal path example according to the fourth embodiment will be described.
In
In the structure according to the fifth signal path example according to the fourth embodiment illustrated in
At the same time, in the structure according to the fifth signal path example, the first processor 11 and the second processor 12 disposed outside the stacked structure of the sensing device 10 and the first processor 11 are connected by a high-speed serial transfer signal line 81. As a result, by applying the structure of the fifth signal path example, an effect of facilitating the layout of the wiring in the substrate 16a can also be obtained.
Furthermore, in the structure according to the fifth signal path example, the first processor 11 and the second processor 12 are connected by the high-speed serial transfer signal line 81, and the sensing device 10 and the first processor 11 are connected by a parallel transfer signal line constituted by a plurality of signal lines 500. Therefore, by applying the fifth signal path example, as described with reference to
A sixth signal path example according to the fourth embodiment will be described.
A seventh signal path example according to the fourth embodiment will be described.
In
Not limited to this example, as illustrated in
As described above, in the seventh signal path example according to the fourth embodiment, the stacked structure in which the imaging unit 111, the first processor 11, and the second processor 12 are stacked or superimposed is adopted. Therefore, the layout of the signal path of data is facilitated between the imaging unit 111 and the first processor 11 and between the first processor 11 and the second processor 12. Furthermore, by adopting the stacked structure in which the imaging unit 111, the first processor 11, and the second processor 12 are stacked or superimposed, the imaging device 150 can be further downsized as compared with the fourth signal path example according to the fourth embodiment described above.
An eighth signal path example according to the fourth embodiment will be described.
In the example in
According to the configuration illustrated in
Note that, in the configuration illustrated in
A ninth signal path example according to the fourth embodiment will be described.
Note that the configuration of adding the signal path connecting the first processor 11 and the communication unit 14 is also applicable to the first to eighth signal path examples according to the fourth embodiment described above.
Next, a fifth embodiment of the present disclosure will be described. The fifth embodiment relates to a configuration of a memory used by the first processor 11 and the second processor 12. In the following description, it is assumed that the first processor 11 and the second processor 12 are configured on the same substrate 16a.
First, a first example of the fifth embodiment will be described. The first example of the fifth embodiment is an example in which the first processor 11 and the memory used by the first processor 11 are provided in a die or a chip of one semiconductor integrated circuit, and the second processor 12 and the memory used by the second processor 12 are provided in a die or a chip of another semiconductor integrated circuit.
In
The input unit 600 is an input interface in the first processor 11a, and receives an output from the sensing device 10. The first arithmetic processing unit 610a has a single-core configuration and executes a predetermined arithmetic operation according to a program. The other circuit 611 may include a circuit (such as a clock circuit) used for arithmetic processing by the first arithmetic processing unit 610a and a circuit that performs arithmetic processing other than the first arithmetic processing unit 610a. The output unit 602 is an output interface in the first processor 11a, and outputs a computation result by the first arithmetic processing unit 610a to the outside of the first processor 11a.
The first memory 603 is a memory used for arithmetic processing by the first arithmetic processing unit 610a. For example, the first DNN 51 is stored in the first memory 603. The first arithmetic processing unit 610a reads the first DNN 51 from the first memory 603 and executes, for example, the DNN process by the first phase.
The second processor 12a includes an input unit 620, an arithmetic circuit unit 621, an output unit 622, and a second memory 604. The arithmetic circuit unit 621 includes a second arithmetic processing unit 630a and the other circuit 631.
The input unit 620 is an input interface in the second processor 12a, and receives an output from the first processor 11. The second arithmetic processing unit 630a has a single-core configuration and executes a predetermined arithmetic operation according to a program. The other circuit 631 may include a circuit (such as a clock circuit) used for arithmetic processing by the second arithmetic processing unit 630a and a circuit that performs arithmetic processing other than the second arithmetic processing unit 630a. The output unit 622 is an output interface in the second processor 12a, and outputs a computation result by the second arithmetic processing unit 630a to the outside of the second processor 12a.
The second memory 604 is a memory used for arithmetic processing by the second arithmetic processing unit 630a. For example, the second DNN 52 is stored in the second memory 604. The second arithmetic processing unit 630a reads the second DNN 52 from the second memory 604 and executes, for example, the DNN process by the second phase.
The first arithmetic processing unit 610a and the second arithmetic processing unit 630a may have a multi-core configuration.
The first arithmetic processing unit 610b includes a plurality of cores 6121, 6122, and so on capable of executing arithmetic processing (also respectively indicated as Core #1, Core #2, and so on in the drawing). Similarly, the second arithmetic processing unit 630b includes a plurality of cores 6321, 6322, and so on capable of executing arithmetic processing (also respectively indicated as Core #1, Core #2, and so on in the drawing).
Note that, in the configuration illustrated in
According to the configuration in
Next, a second example of the fifth embodiment will be described. The second example of the fifth embodiment is an example in which the first processor 11 and the memory used by the first processor 11 are provided in dies or chips of different semiconductor integrated circuits, and the second processor 12 and the memory used by the second processor 12 are provided in dies or chips of different semiconductor integrated circuits.
In the example in
In this manner, by configuring the first memory 603 and the second memory 604 in dies or chips different from those of the first processor 11c and the second processor 12c, it is easy to increase capacities of the first memory 603 and the second memory 604 as compared with the configurations illustrated in
Furthermore, in the configuration in
Also in the configuration illustrated in
Next, a third example of the fifth embodiment will be described. The third example of the fifth embodiment is an example in which the memory used by the first processor 11 and the memory used by the second processor 12 are made common. The memory commonly used by the first processor 11 and the second processor 12 is configured in a die different from the die in which the first processor 11 and the second processor 12 are configured.
In the example in
As described above, by using the memory 605 in which the memory used by the first processor 11e and the memory used by the second processor 12e are common and configuring the memory 605 in a die different from those of the first processor 11e and the second processor 12e, it is easy to increase the capacity of the memory 605 used by the first processor 11e and the second processor 12e as compared with the configuration illustrated in
In this manner, by sharing the memory used by the first processor 11e and the second processor 12e and configuring the memory in a die different from those of the first processor 11e and the second processor 12e, it is easy to increase the capacity of the memory used by the first processor 11e and the second processor 12e as compared with the configurations illustrated in
Also in the configuration illustrated in
Note that the effects described in the present specification are merely examples and not limited, and other effects may be provided.
The present technology can also have the following configurations.
(1) An information processing apparatus including
Furthermore, the present technology can also have the following configurations.
(30) An information processing apparatus comprising:
Priority application: No. 2022-039541, filed March 2022, JP (national).
Filing document: PCT/JP2023/009541, filed Mar. 13, 2023 (WO).