This application claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2023-0029483, filed in the Korean Intellectual Property Office on Mar. 6, 2023, and Korean Patent Application No. 10-2023-0088285, filed in the Korean Intellectual Property Office on Jul. 7, 2023, the entire contents of which are hereby incorporated by reference.
The present disclosure relates to a method for creating an operation call list, and more specifically, to a method and system for creating an operation call list including at least one primitive operation for artificial intelligence calculation.
An application programmer who writes a deep learning application may create a source code combining a plurality of calculations using a deep learning framework (e.g., TensorFlow). The source code may be implemented by utilizing operations included in operation libraries (e.g., Nvidia's cuDNN, Intel's MKL-DNN, etc.) distributed by hardware manufacturers.
The source code implemented with the artificial intelligence program is compiled and converted into binary code through a processor, and the processor executes the binary code to perform an operation associated with the artificial intelligence calculation.
Meanwhile, the compiled binary code can operate normally only on a designated type of processor. For example, a binary code compiled for a first accelerator provided by a specific manufacturer may operate normally only on the first accelerator, but may not operate normally on a second accelerator provided by another manufacturer.
Meanwhile, a computing system including large-scale resources capable of simultaneously processing complex artificial intelligence calculations may be built, and various types of artificial intelligence calculations may be simultaneously processed through such a computing system. This computing system including the large-scale resources may include various types of processors. For example, processors associated with various manufacturers, such as a first accelerator provided by a first manufacturer, a second accelerator provided by a second manufacturer, a third processor provided by a third manufacturer, and so on, may be included in the computing system.
In this computing system including various types of processors as described above, compiling in the manner of the related art may result in creation of a binary code dependent on a specific type of processor. Accordingly, there is a demand for a compilation technology that is universally applicable to various types of processors without depending on a specific type of processor.
In order to solve the problems described above, the present disclosure provides a method, a computer program stored in a computer readable recording medium, a computer readable recording medium, and an apparatus (system) for creating an operation call list for artificial intelligence calculation.
The present disclosure may be implemented in a variety of ways, including methods, apparatus (systems) and/or computer programs stored on computer readable storage media.
A method for creating an operation call list for artificial intelligence calculation is provided, which may be performed by one or more processors and may include acquiring a trace from a source program including an artificial intelligence calculation, in which the trace includes at least one of a code or a primitive operation associated with the source program, and creating a call list including a plurality of primitive operations based on the trace, in which the plurality of primitive operations may be included in an operation library accessible to each of a plurality of accelerators.
In addition, the acquiring the trace may include acquiring at least one of the code or the primitive operation associated with the artificial intelligence calculation by executing the source program, and acquiring the trace including the acquired at least one code or primitive operation. In addition, the creating the call list may include determining a correlation for each of the plurality of primitive operations, and creating the call list including the determined correlation for each of the plurality of primitive operations.
In addition, the correlation may be a relationship in which output data of a first primitive operation included in the call list is input to a second primitive operation included in the call list.
In addition, the creating the call list may include creating a graph representing a call order and a correlation of the plurality of primitive operations based on the plurality of primitive operations included in the call list.
In addition, the method may further include transmitting the created call list to at least one of a plurality of accelerators, and the accelerator may be configured to, upon receiving the call list, access the operation library and call the plurality of primitive operations included in the call list.
The method may further include creating a new call list based on the call list by applying the call list to at least one compiler pass.
In addition, the creating the new call list may include determining, from among the primitive operations included in the call list, a plurality of primitive operations to be merged based on identifiers of primitive operations, merging the determined plurality of primitive operations into one primitive operation, and creating the new call list by changing the call list to include the merged primitive operation.
In addition, the input data for each of the determined plurality of primitive operations may be input to the merged primitive operation.
In addition, the creating the new call list may include determining the number of the plurality of accelerators to be provided with the call list, dividing input data included in the call list based on the determined number of the plurality of accelerators, and creating the new call list by changing the call list to include the divided input data.
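As an illustrative, non-limiting sketch of such an input split (assuming, purely for illustration, that a call list is represented as a Python dict carrying an "input" list; the schema and names are assumptions, not part of the disclosure):

```python
def split_input_data(call_list, num_accelerators):
    """Divide the call list's input data into one shard per accelerator.

    `call_list` is assumed to be a dict with an "input" list; each
    returned call list is identical except for its own input shard.
    """
    data = call_list["input"]
    # Near-even contiguous shards: the first len(data) % n shards get one extra item.
    n, rem = divmod(len(data), num_accelerators)
    shards, start = [], 0
    for i in range(num_accelerators):
        size = n + (1 if i < rem else 0)
        shards.append(data[start:start + size])
        start += size
    return [dict(call_list, input=shard) for shard in shards]
```

Each resulting call list keeps the same primitive operations but carries only its own input shard, so the accelerators can process the shards in parallel.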
In addition, the creating the new call list may include determining the number of the plurality of accelerators to be provided with the call list, dividing the call list into a plurality of sub call lists based on the determined number of the plurality of accelerators, and creating the new call list by changing the call list such that the divided plurality of sub call lists are pipelined.
In addition, the method may further include, after the creating the new call list, transmitting the divided plurality of sub call lists to the plurality of accelerators, and the first accelerator receiving a first sub call list and the second accelerator receiving a second sub call list pipelined with the first sub call list may be included in the same node.
In addition, the creating the new call list may include inserting at least one command into at least one of the first sub call list or the second sub call list such that output data based on the first sub call list is provided as input data of a primitive operation included in the second sub call list.
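A minimal sketch of such a pipelined division, assuming for illustration that a call list is a Python list of operation names and using illustrative "send"/"recv" transfer commands (these command names are assumptions, not part of the disclosure):

```python
def pipeline_split(ops, num_accelerators):
    """Split a call list into contiguous sub call lists, one per
    accelerator, and insert transfer commands so each stage's output
    is provided as input data to the next stage.
    """
    n, rem = divmod(len(ops), num_accelerators)
    stages, start = [], 0
    for i in range(num_accelerators):
        size = n + (1 if i < rem else 0)
        stages.append(list(ops[start:start + size]))
        start += size
    for i in range(len(stages) - 1):
        stages[i].append(("send", i + 1))      # forward this stage's output
        stages[i + 1].insert(0, ("recv", i))   # receive it as the next stage's input
    return stages
```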
In addition, the method may further include, after the creating the new call list, transmitting the divided plurality of sub call lists to the plurality of accelerators, and a first accelerator receiving a first sub call list may be included in a first node, and a second accelerator receiving a second sub call list pipelined with a first sub call list may be included in a second node, and the first node may be a neighboring node adjacent to the second node.
In addition, the creating the new call list may include determining the number of the plurality of accelerators to be provided with the call list, dividing a plurality of parameters applied to each of the primitive operations included in the call list based on the determined number of the plurality of accelerators, and creating a new call list by changing the call list to include the divided parameters.
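As an illustrative sketch of such a parameter division (assuming, for illustration only, that the parameters applied to each primitive operation are kept as per-operation lists, e.g. the columns of a weight matrix):

```python
def split_parameters(params, num_accelerators):
    """Divide each operation's parameter list into per-accelerator
    shards, e.g. splitting a weight matrix column-wise.

    `params` maps an op identifier to a list of parameter columns
    (an illustrative schema, not part of the disclosure).
    """
    def shard(seq, i, n):
        # Contiguous near-even shard i of n for sequence `seq`.
        k, rem = divmod(len(seq), n)
        start = i * k + min(i, rem)
        return seq[start:start + k + (1 if i < rem else 0)]

    return [{op: shard(cols, i, num_accelerators) for op, cols in params.items()}
            for i in range(num_accelerators)]
```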
In addition, the creating the new call list may include determining, from among the primitive operations included in the call list, a plurality of primitive operations to be merged based on at least one of data structure or identifier associated with the primitive operations, merging the determined plurality of primitive operations into one primitive operation, and creating the new call list by changing the call list to include the merged primitive operation.
In addition, the creating the new call list may include identifying, from among the plurality of primitive operations included in the call list, at least one independently performed primitive operation, changing an execution order of the identified at least one primitive operation, and creating a new call list by changing the call list to include the changed at least one primitive operation.
In addition, the changing the execution order of the identified at least one primitive operation may include changing the execution order of the at least one primitive operation such that the execution order of the identified at least one primitive operation is advanced.
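A minimal sketch of such a reordering, assuming an illustrative schema in which each primitive operation records its input and output tensor identifiers; operations whose inputs are not produced by any other operation in the list are treated as independently performed and advanced:

```python
def advance_independent_ops(call_list):
    """Advance independently performed operations to the front of the
    call list, preserving the relative order otherwise. Each op is a
    dict with "name", "inputs", and "output" (an assumed schema).
    """
    produced = {op["output"] for op in call_list}
    independent = [op for op in call_list
                   if not any(i in produced for i in op["inputs"])]
    dependent = [op for op in call_list if op not in independent]
    return independent + dependent
```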
There may be provided a non-transitory computer-readable recording medium storing instructions for causing a computer to perform the method described above.
An information processing system is provided, which may include a memory, and one or more processors connected to the memory and configured to execute one or more computer-readable programs included in the memory, in which the one or more programs may include instructions for acquiring a trace from a source program including an artificial intelligence calculation, in which the trace includes at least one of a code or a primitive operation associated with the source program, and creating a call list including a plurality of primitive operations based on the trace, in which the plurality of primitive operations may be included in an operation library accessible to each of a plurality of accelerators.
According to some examples of the present disclosure, a call list including a plurality of primitive operations can be created based on a trace including code or primitive operations associated with an artificial intelligence calculation. A plurality of primitive operations included in the call list can be normally executed in various types of accelerators without depending on the accelerator type.
According to some examples of the present disclosure, while the artificial intelligence program is running (i.e., during runtime), a plurality of codes and/or primitive operations related to calculations can be extracted, and a trace including the plurality of extracted codes and/or primitive operations can be created. Based on the trace, the call list can completely include all primitive operations essential for the artificial intelligence calculations.
According to some examples of the present disclosure, a call list can be optimized by applying the call list to one or more compiler passes. With the optimized call list, computing resources can be saved and calculation results can be output more quickly.
The effects of the present disclosure are not limited to the effects described above, and other effects not described herein can be clearly understood by those of ordinary skill in the art (referred to as “ordinary technician”) from the description of the claims.
The above and other objects, features and advantages of the present disclosure will be described with reference to the accompanying drawings described below, where similar reference numerals indicate similar elements, but not limited thereto, in which:
Hereinafter, example details for the practice of the present disclosure will be described in detail with reference to the accompanying drawings. However, in the following description, detailed descriptions of well-known functions or configurations will be omitted if it may make the subject matter of the present disclosure rather unclear.
In the accompanying drawings, the same or corresponding components are assigned the same reference numerals. In addition, in the following description of various examples, duplicate descriptions of the same or corresponding components may be omitted. However, even if descriptions of components are omitted, it is not intended that such components are not included in any example.
Advantages and features of the disclosed examples and methods of accomplishing the same will be apparent by referring to examples described below in connection with the accompanying drawings. However, the present disclosure is not limited to the examples disclosed below, and may be implemented in various forms different from each other, and the examples are merely provided to make the present disclosure complete, and to fully disclose the scope of the disclosure to those skilled in the art to which the present disclosure pertains.
The terms used herein will be briefly described prior to describing the disclosed example(s) in detail. The terms used herein have been selected as general terms which are widely used at present in consideration of the functions of the present disclosure, and this may be altered according to the intent of an operator skilled in the art, related practice, or introduction of new technology. In addition, in specific cases, certain terms may be arbitrarily selected by the applicant, and the meaning of the terms will be described in detail in a corresponding description of the example(s). Therefore, the terms used in the present disclosure should be defined based on the meaning of the terms and the overall content of the present disclosure rather than a simple name of each of the terms.
As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Likewise, the plural forms are intended to include the singular forms as well, unless the context clearly indicates otherwise. Further, throughout the description, when a portion is stated as “comprising (including)” a component, it is intended to mean that the portion may additionally comprise (or include or have) another component, rather than excluding the same, unless specified to the contrary.
Further, the term “module” or “unit” used herein refers to a software or hardware component, and “module” or “unit” performs certain roles. However, the meaning of the “module” or “unit” is not limited to software or hardware. The “module” or “unit” may be configured to be in an addressable storage medium or configured to execute on one or more processors. Accordingly, as an example, the “module” or “unit” may include components such as software components, object-oriented software components, class components, and task components, and at least one of processes, functions, attributes, procedures, subroutines, program code segments, drivers, firmware, micro-codes, circuits, data, databases, data structures, tables, arrays, and variables. Furthermore, functions provided in the components and the “modules” or “units” may be combined into a smaller number of components and “modules” or “units”, or further divided into additional components and “modules” or “units.”
The “module” or “unit” may be implemented as a processor and a memory, or may be implemented as a circuit (circuitry). Terms like “circuit” and “circuitry” may refer to circuits in hardware, but may also refer to circuits in software. The “processor” should be interpreted broadly to encompass a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and so forth. Under some circumstances, the “processor” may refer to an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), and so on. The “processor” may refer to a combination of processing devices, e.g., a combination of a DSP and a microprocessor, a combination of a plurality of microprocessors, a combination of one or more microprocessors in conjunction with a DSP core, or any other combination of such configurations. In addition, the “memory” should be interpreted broadly to encompass any electronic component that is capable of storing electronic information. The “memory” may refer to various types of processor-readable media such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, registers, and so on. The memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. The memory integrated with the processor is in electronic communication with the processor.
In the present disclosure, a “system” may refer to at least one of a server device and a cloud device, but is not limited thereto. For example, the system may include one or more server devices. In another example, the system may include one or more cloud devices. In still another example, the system may include both the server device and the cloud device operated in conjunction with each other.
In addition, terms such as first, second, A, B, (a), (b), etc. used in the following examples are only used to distinguish certain components from other components, and the nature, sequence, order, etc. of the components are not limited by the terms.
In addition, in the following examples, if a certain component is stated as being “connected,” “combined” or “coupled” to another component, it is to be understood that there may be yet another intervening component “connected,” “combined” or “coupled” between the two components, although the two components may also be directly connected or coupled to each other.
In addition, as used in the following examples, “comprise” and/or “comprising” does not foreclose the presence or addition of one or more other elements, steps, operations, and/or devices in addition to the recited elements, steps, operations, or devices.
In the present disclosure, “each of a plurality of A” may refer to each of all components included in the plurality of A, or may refer to each of some of the components included in a plurality of A.
Before describing various examples of the present disclosure, terms used will be described.
In the examples of the present disclosure, “artificial intelligence calculation” may refer to any calculation associated with a machine learning model (e.g., an artificial neural network model, etc.). For example, the artificial intelligence calculation may be a calculation performed in each layer included in the artificial neural network model. For example, the artificial intelligence calculation may include an addition calculation, a subtraction calculation, a maximum value computation calculation, a minimum value computation calculation, a floating point multiplication calculation, weighting calculation, a convolution calculation, a matrix multiplication calculation, a batch normalization calculation, a Rectified Linear Unit (ReLU) calculation, a pooling calculation, a Long Short-Term Memory (LSTM) calculation, a Gated Recurrent Unit (GRU) calculation, etc., performed in a layer included in the artificial neural network model, but is not limited thereto.
The “artificial intelligence program” may herein be a source program that performs calculations associated with artificial intelligence or artificial neural network models. For example, the artificial intelligence program may be a source program associated with deep learning calculation.
The “code” may herein refer to any code prepared to execute a program, and may refer to a source code, for example. In addition, codes may be associated with instructions for calculations.
In performing the artificial intelligence calculations, a “primitive operation” may herein refer to an operation of a processor associated with basic codes and/or basic instructions. For example, the primitive operation may be included in a set of calculation operations frequently used to infer a result value in a machine learning model. For example, the primitive operation may include operations related to calculations such as addition, subtraction, maximum value calculation, minimum value calculation, floating point multiplication, convolution calculation, matrix multiplication, batch normalization, ReLU, pooling, LSTM, GRU, etc., but is not limited thereto.
A “trace” may herein include at least one code and/or at least one primitive operation associated with the artificial intelligence calculation. For example, the trace may be created by collecting calculation-related codes and/or primitive operations extracted during runtime in which an artificial intelligence program is executed. The trace may include a correlation with an execution order of each code and/or primitive operation.
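As an illustrative sketch of such runtime trace acquisition (in practice the framework's operation dispatch would be hooked; the explicit wrapping below, and all names in it, are assumptions for illustration):

```python
class Trace:
    """Records calculation-related primitive operations, in execution
    order, while the program runs."""
    def __init__(self):
        self.records = []  # (execution order, op identifier, inputs)

    def op(self, name, func):
        # Wrap a primitive operation so that every call is recorded.
        def wrapped(*args):
            self.records.append((len(self.records), name, args))
            return func(*args)
        return wrapped

trace = Trace()
add = trace.op("Add", lambda a, b: a + b)
relu = trace.op("ReLU", lambda x: max(x, 0))
relu(add(2, -5))  # executing the program populates the trace
```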
An “accelerator” may herein refer to any processor or circuitry that performs artificial intelligence calculations. For example, the accelerator may refer to a processor or circuitry capable of performing artificial intelligence calculations quickly, and may include a graphics processing unit (GPU), a neural processing unit (NPU), a tensor processing unit (TPU), etc., for example, but is not limited thereto.
An “operation library” may herein be a collection or library of codes associated with a call of primitive operations. For example, the operation library may include a first code for calling a first primitive operation associated with an addition, a second code for calling a second primitive operation associated with a subtraction, a third code for calling a third primitive operation associated with a maximum value calculation, and a fourth code for calling a fourth primitive operation associated with a minimum value calculation. Additionally, the operation library may include a fifth code for calling a fifth primitive operation associated with a floating point multiplication, a sixth code for calling a sixth primitive operation associated with a convolution calculation, a seventh code for calling a seventh primitive operation associated with a matrix multiplication calculation, and an eighth code for calling an eighth primitive operation associated with a batch normalization. In addition, the operation library may include a code associated with any primitive operation.
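A toy sketch of such an operation library, assuming for illustration that it is a mapping from a primitive-operation identifier to the code that calls the operation (the identifiers and the dispatch function are assumptions, not part of the disclosure):

```python
# A toy "operation library": primitive-operation identifiers mapped to
# the code that calls each primitive operation (illustrative names).
OPERATION_LIBRARY = {
    "Add": lambda a, b: a + b,   # first code: addition
    "Sub": lambda a, b: a - b,   # second code: subtraction
    "Max": max,                  # third code: maximum value calculation
    "Min": min,                  # fourth code: minimum value calculation
}

def call(op_name, *args):
    """Look up a primitive operation by its identifier and call it,
    as an accelerator executing a call list would."""
    return OPERATION_LIBRARY[op_name](*args)
```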
Hereinafter, various examples of the present disclosure will be described in detail with reference to the accompanying drawings.
In addition, the call list 130 may be created based on the trace 120. According to some examples, the trace 120 may include a plurality of primitive operations, in which case a plurality of primitive operations may be extracted from the trace 120, and the call list 130 including the extracted plurality of primitive operations may be created. The primitive operations included in the call list 130 may not be in the binary form, but may take at least one form of a data structure, serialized data, or text in memory. The plurality of primitive operations included in the call list 130 may be represented in a graph form, as illustrated in
The plurality of primitive operations may include any operations included in the operation library. In this case, the operation library may be a library accessible to a plurality of accelerators provided by a plurality of manufacturers.
A correlation for each of the plurality of primitive operations included in the call list 130 may be determined, and the determined correlation for each of the plurality of primitive operations may be included in the call list 130. The correlation may herein refer to a relationship in which a calculation of a specific primitive operation is performed based on another primitive operation. For example, in a relationship in which output data of a first primitive operation is input to a second primitive operation, it may be determined that the first primitive operation and the second primitive operation have a correlation. In addition, an execution order of each of the plurality of primitive operations included in the call list 130 may be determined.
A call order and the correlation of the plurality of primitive operations may be represented in a graph form, based on the plurality of primitive operations included in the call list 130. An example of a graph representation of the call list 130 will be described below with reference to
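A minimal sketch of such a graph construction, assuming an illustrative schema in which each primitive operation records its input and output tensor identifiers; an edge (i, j) represents the correlation in which op i's output is an input of op j:

```python
def build_graph(call_list):
    """Build (call order, correlation edges) for a call list.

    Each op is a dict with "name", "inputs", and "output" (an assumed
    schema); ops keep their call order, and an edge (i, j) means op i's
    output data is input to op j.
    """
    edges = []
    for j, consumer in enumerate(call_list):
        for i, producer in enumerate(call_list[:j]):
            if producer["output"] in consumer["inputs"]:
                edges.append((i, j))
    return [op["name"] for op in call_list], edges
```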
The call list 130 may be transmitted to at least one accelerator, and the at least one accelerator may access the operation library to call and execute a plurality of primitive operations included in the call list 130. According to some examples, the at least one accelerator may compile the plurality of primitive operations included in the call list 130 to create binary codes and then execute the created binary codes to call the plurality of primitive operations.
The call list 130 may be applied to a compiler pass 100 such that the call list 130 may be optimized. The compiler pass 100 may herein be a module for optimizing the call list 130. An optimized call list 140 may be created as the original call list 130 is changed. Various methods of optimizing the call list 130 through the compiler pass 100 will be described below with reference to
The compiler pass 100 may be included in an information processing system. In addition, operations related to the compiler pass 100 may be executed by one or more processors included in the information processing system.
The memory 210 may include any non-transitory computer-readable recording medium. The memory 210 may include a permanent mass storage device such as read only memory (ROM), disk drive, solid state drive (SSD), flash memory, and so on. In another example, a non-volatile mass storage device such as ROM, SSD, flash memory, disk drive, and so on may be included in the information processing system 200 as a separate permanent storage device that is distinct from the memory. In addition, the memory 210 may store an operating system and at least one program code (e.g., a code for creating a call list).
These software components may be loaded from a computer-readable recording medium separate from the memory 210. Such a separate computer-readable recording medium may include a recording medium directly connectable to the information processing system 200, and may include a computer-readable recording medium such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like, for example. In another example, the software components may be loaded into the memory 210 through the communication module 230 rather than the computer-readable recording medium. For example, at least one program may be loaded into the memory 210 based on a computer program (e.g., a program for creating an operation call list, etc.) installed by files provided by developers or a file distribution system that distributes application installation files through the communication module 230.
The processor 220 may be configured to process the commands of the computer program by performing basic arithmetic, logic, and input and output calculations. The commands may be provided to the processor 220 from a user terminal (not illustrated) or another external system via the memory 210 or the communication module 230. For example, the processor 220 may transmit a call list including a plurality of primitive operations to the accelerator. The accelerator may be included in the information processing system 200 or may be included in another server or system.
The communication module 230 may provide a configuration or function for the user terminal (not illustrated) and the information processing system 200 to communicate with each other through a network, and may provide a configuration or function for the information processing system 200 to communicate with an external system (e.g., a separate cloud system). For example, control signals, commands, data, and the like provided under the control of the processor 220 of the information processing system 200 may be transmitted through the communication module 230 and the network, and received by the user terminal and/or the external system through the communication module of the user terminal and/or the external system. For example, the processor 220 may transmit a call list to an accelerator included in another server or system through the communication module 230.
In addition, the input and output interface 240 of the information processing system 200 may be a means for interfacing with a device (not illustrated) for inputting or outputting, which may be connected to the information processing system 200 or included in the information processing system 200. In
The processor 220 of the information processing system 200 may be configured to manage, process, and store the information, data, etc. received from a plurality of user terminals and/or a plurality of external systems. The processor 220 may acquire a trace from the artificial intelligence program and create a call list including a plurality of primitive operations based on the trace.
The trace acquisition unit 310 may acquire a trace from an artificial intelligence program. The trace acquisition unit 310 may execute an artificial intelligence program, extract a plurality of codes, primitive operations, etc. associated with a calculation, and create a trace including the plurality of extracted codes and/or primitive operations. For example, when the artificial intelligence program is executed, the trace acquisition unit 310 may acquire all codes, all primitive operations, etc. executed through the artificial intelligence program, extract a plurality of codes, primitive operations, etc. associated with calculations from the acquired codes and/or primitive operations, and create a trace including the plurality of extracted codes, primitive operations, etc. The codes, primitive operations, etc. related to the calculation are those associated with the calculation of the accelerator and may be codes, primitive operations, etc. associated with an operation included in the operation library.
The call list creation unit 320 may create a call list including a plurality of primitive operations based on the trace created by the trace acquisition unit 310. The primitive operation included in the call list may not be in the binary form, but may take at least one form of a data structure, serialized data, or text in memory. In addition, the plurality of primitive operations included in the call list may be in the form of a graph. The call list creation unit 320 may represent the call order and correlation of the plurality of primitive operations in a graph form, based on the plurality of primitive operations included in the call list. An example of the graph representation of the call list will be described below with reference to
The call list optimization unit 330 may optimize the call list created by the call list creation unit 320. The optimized call list may be created based on the original call list. The original call list may herein be a call list created by the call list creation unit 320. The call list optimization unit 330 may optimize the original call list by applying the original call list to at least one compiler pass. The compiler pass may be a module for optimizing the original call list, and the call list optimization unit 330 may include at least one compiler pass.
The original call list may be optimized based on a calculation type of the primitive operation, an identifier (e.g., name) of the primitive operation, the number of accelerators to which the call list is transmitted, etc. Referring to
The order of primitive operations may be determined based on the execution order of the codes, primitive operations, etc. included in the trace, and input data and output data of primitive operations may be determined based on input data associated with the code and output data associated with the code.
In addition, operations having a correlation among a plurality of primitive operations included in the call list 500 may be determined. If the output data of the first primitive operation is input as the input data of the second primitive operation, it may be determined that there is a correlation between the first primitive operation and the second primitive operation. Taking
A call list may be transmitted to at least one accelerator. Upon receiving the call list, the accelerator may access the operation library to call and execute a plurality of primitive operations included in the call list.
Meanwhile, the original call list may be applied to at least one compiler pass, so that a plurality of primitive operations included in the original call list may be optimized. The original call list may be a call list created based on the trace, and may be a call list before applying a compiler pass.
Hereinafter, with reference to
It may be determined whether or not the first call list 720 is applicable to the first compiler pass 710 based on a plurality of primitive operation identifiers (e.g., names, indexes, etc.) included in the first call list 720. For example, the first compiler pass 710 may determine whether or not there are a plurality of primitive operations to be merged, based on the plurality of primitive operation identifiers and the execution order included in the first call list 720. The first compiler pass 710 may be a module provided to merge a plurality of primitive operations into one primitive operation.
Merge reference data including identifiers (e.g., names) of a plurality of mergeable primitive operations and operation identifiers used during merging may be stored in the information processing system, and the first compiler pass 710 may determine a plurality of primitive operations to be merged from among all primitive operations included in the first call list 720 based on the stored merge reference data. The first compiler pass 710 may determine, as the primitive operations to be merged, a plurality of continuously executed primitive operations, and merge the determined primitive operations to be merged into one primitive operation.
Input data to the fourth primitive operation (FusedCBR) may be determined from among the plurality of pieces of input data to each of the first to third primitive operations to be merged. If a plurality of primitive operations are merged into one operation, an identifier of at least one piece of input data to the merged primitive operation may be recorded in the merge reference data; this identifier may refer to any one of the plurality of pieces of input data to the primitive operations to be merged. Accordingly, the input data to the fourth primitive operation (FusedCBR) may be determined based on the merge reference data.
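By way of a non-limiting illustration, the merging performed by the first compiler pass may be sketched as follows. The names (`MERGE_REFERENCE`, `fuse`, the dictionary layout) are hypothetical and are not part of the disclosed implementation; the sketch assumes merge reference data that maps a sequence of consecutively executed operation names to a fused operation name.

```python
# Hypothetical sketch of a merge compiler pass: merge reference data maps a
# run of consecutively executed primitive operations to one fused operation.
MERGE_REFERENCE = {
    ("Conv", "BatchNorm", "ReLU"): "FusedCBR",
}

def fuse(call_list):
    ops = list(call_list)
    for pattern, fused_name in MERGE_REFERENCE.items():
        n = len(pattern)
        i = 0
        while i + n <= len(ops):
            window = tuple(op["name"] for op in ops[i:i + n])
            if window == pattern:
                fused = {
                    "name": fused_name,
                    # the fused op consumes the first op's input data and
                    # produces the last op's output data
                    "inputs": ops[i]["inputs"],
                    "outputs": ops[i + n - 1]["outputs"],
                }
                ops[i:i + n] = [fused]
            else:
                i += 1
    return ops

first_call_list = [
    {"name": "Conv", "inputs": ["Input"], "outputs": ["Act 0"]},
    {"name": "BatchNorm", "inputs": ["Act 0"], "outputs": ["Act 1"]},
    {"name": "ReLU", "inputs": ["Act 1"], "outputs": ["Output"]},
]
second_call_list = fuse(first_call_list)
print([op["name"] for op in second_call_list])  # ['FusedCBR']
```

The fused call list contains fewer operations, consistent with the reduction in computing resources described below.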
The second call list 730 that passed through the first compiler pass 710 may have fewer operations than the first call list 720. If the calculation is performed based on the second call list 730, fewer computing resources may be used and the calculation speed may be faster than performing calculation based on the first call list 720.
Each of the input data and the activation may be divided into a plurality of pieces of data and activations. In this case, a plurality of sub call lists 830 and 840 may be created to include the divided input data and activations. For example, the input data (Input) may be divided into first sub-input data (Input-1) and second sub-input data (Input-2); the first sub-input data (Input-1) may be included in the first sub call list 830, and the second sub-input data (Input-2) may be included in the second sub call list 840.
Through the second compiler pass 810, the input data (Input) may be divided into a plurality of pieces of data based on the size of the mini-batch and processed. For example, the input data (Input) may be divided into a plurality of pieces of sub-input data based on the number of accelerators and the size of the mini-batch for parallel processing of the first call list 820. For example, if the size of the mini-batch of the input data (Input) is “16” and the number of accelerators is “2”, the input data (Input) may be divided into first sub-input data (Input-1) and second sub-input data (Input-2), each with a mini-batch size of “8”. If the input data (Input) is divided into the sub-input data (Input-1, Input-2) in units of mini-batches, the activations (Act 0 to Act 2) may also be divided into sub-activations (Act 0-1 to Act 2-2) in units of mini-batches, and additionally, the output data (Output) may also be divided into sub-output data (Output-1, Output-2) in units of mini-batches. If the input data (Input) is divided into the sub-input data (Input-1, Input-2), partial calculations based on the sub-input data (Input-1, Input-2) are performed through a machine learning model (e.g., an artificial neural network model, etc.), and the activation (Act) output through the machine learning model may also be divided and output. As illustrated in
The number of divisions of the input data and the activation may correspond to the number of accelerators to which the call lists are transmitted. For example, if there are (n) accelerators to which the call list is transmitted (where n is a natural number equal to or greater than 2), the number of divisions of the input data and the activation may also be (n). The divided pieces of data may be equal or different in size and number. For example, if the input data is divided into first sub-input data and second sub-input data, the size of the first sub-input data and the size of the second sub-input data may be the same, or the size of the first sub-input data may be greater or less than the size of the second sub-input data. The number of accelerators to which the call list is transmitted may be determined based on user input, or may be determined based on the number of pieces into which each of the input data, activations, etc. can be divided.
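By way of a non-limiting illustration, the division of a mini-batch among accelerators may be sketched as follows. The function name `split_batch` is hypothetical; the numbers follow the example above (mini-batch size “16”, “2” accelerators), and the sketch also shows the case where the pieces differ in size because the batch does not divide evenly.

```python
# Illustrative sketch (hypothetical name): divide a mini-batch among
# accelerators for data parallelism; piece sizes may be equal or differ
# by one sample when the batch does not divide evenly.
def split_batch(num_samples, num_accelerators):
    base, rest = divmod(num_samples, num_accelerators)
    return [base + (1 if i < rest else 0) for i in range(num_accelerators)]

print(split_batch(16, 2))  # [8, 8] -> sizes of Input-1 and Input-2
print(split_batch(10, 4))  # [3, 3, 2, 2] -> unequal ("differential") sizes
```

Each resulting size would correspond to one piece of sub-input data placed in one sub call list.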
The first sub call list 830 acquired from the second compiler pass 810 may include some of the divided input data and activations, and the second sub call list 840 may include the rest of the divided input data and activations. As illustrated in
Among the call lists 830 and 840 acquired from the second compiler pass 810, the first sub call list 830 may be transmitted to the first accelerator (GPU 0), and the second sub call list 840 may be transmitted to the second accelerator (GPU 1), so that primitive operations included in the first sub call list 830 and primitive operations included in the second sub call list 840 may be executed in parallel through the first accelerator (GPU 0) and the second accelerator (GPU 1). The dividing the input data, the activation, etc. described above may be performed by the second compiler pass 810.
The results of execution through the first accelerator (GPU 0) and the second accelerator (GPU 1) may be aggregated by one or more accelerators and/or one or more processors. For example, the second accelerator (GPU 1) may execute all primitive operations included in the second sub call list 840 and then transmit the acquired second sub-output data (Output-2) to the first accelerator (GPU 0). In addition, the first accelerator (GPU 0) may create final result data based on the first sub-output data (Output-1) acquired by executing all primitive operations included in the first sub call list 830 and the second sub-output data (Output-2) received from the second accelerator (GPU 1). The final result data may be a value associated with a gradient. For example, if the first sub-output data (Output-1) acquired by the first accelerator (GPU 0) is associated with a first gradient and the second sub-output data (Output-2) received from the second accelerator (GPU 1) is associated with a second gradient, the final result data may be acquired by reflecting both the first and second gradients.
In some examples, the results (Output-1 and Output-2) of execution through the first accelerator (GPU 0) and the second accelerator (GPU 1) may be subsequently processed without being aggregated in a specific accelerator. For example, the sub-output data (Output-1 and Output-2) created through each of the first accelerator (GPU 0) and the second accelerator (GPU 1) may be stored in a memory of a specific accelerator. In this case, the specific accelerator may create final result data based on the sub output data (Output-1, Output-2) stored in its memory. As another example, the sub-output data (Output-1 and Output-2) created through each of the first accelerator (GPU 0) and the second accelerator (GPU 1) may be stored in the main memory of the information processing system. In this case, the processor included in the information processing system may create final result data based on the sub output data (Output-1 and Output-2) stored in the main memory. As still another example, the first sub-output data (Output-1) created through the first accelerator (GPU 0) may be managed by the first accelerator (GPU 0) without being transmitted to another accelerator or memory, and similarly, the second sub-output data (Output-2) created through the second accelerator (GPU 1) may also be managed by the second accelerator (GPU 1) without being transmitted to another accelerator or memory. In this case, the first sub-output data (Output-1) and the second sub-output data (Output-2) may be continuously maintained in the divided state.
Meanwhile, a target accelerator to which the plurality of sub call lists 830 and 840 are transmitted may be determined based on the location of the accelerator. In order to minimize communication delay during gathering of the output data, the target accelerators may be determined such that the first accelerator (GPU 0) and the second accelerator (GPU 1) receiving the sub call lists 830 and 840 are included in the same node. The node may herein refer to a physically or logically separated computing device (e.g., a server, a user terminal, etc.). If a plurality of accelerators included in the same node cannot be found, the target accelerators may be determined such that the first node including the first accelerator (GPU 0) and the second node including the second accelerator (GPU 1) are located adjacent to each other. In this case, the first node may be a neighbor node to the second node.
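By way of a non-limiting illustration, the location-based selection of target accelerators described above may be sketched as follows. The names (`choose_targets`, the `(accelerator_id, node_id)` tuples) are hypothetical; the sketch prefers a single node that can host every sub call list and otherwise falls back to accelerators spread across distinct (assumed neighboring) nodes.

```python
# Hypothetical sketch: choose target accelerators so that, when possible,
# all sub call lists are transmitted to accelerators in the same node.
def choose_targets(accelerators, count):
    """accelerators: list of (accelerator_id, node_id) pairs."""
    by_node = {}
    for acc_id, node_id in accelerators:
        by_node.setdefault(node_id, []).append(acc_id)
    # prefer a single node with enough accelerators for every sub call list
    for accs in by_node.values():
        if len(accs) >= count:
            return accs[:count]
    # otherwise gather accelerators node by node (assumed neighboring nodes)
    picked = []
    for accs in by_node.values():
        picked.extend(accs)
        if len(picked) >= count:
            break
    return picked[:count]

accs = [("GPU 0", "node-A"), ("GPU 1", "node-A"), ("GPU 2", "node-B")]
print(choose_targets(accs, 2))  # ['GPU 0', 'GPU 1'] -- same node preferred
```

With three sub call lists, the sketch would fall back to spanning both nodes.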
As described above, when the first call list 820 is passed through the second compiler pass 810, the plurality of sub call lists 830 and 840 for parallel processing of data may be acquired. Each of the plurality of sub call lists 830 and 840 is transmitted to a plurality of accelerators (GPU 0 and GPU 1), and each primitive operation included in the plurality of sub call lists 830 and 840 may be executed in parallel. If a plurality of accelerators are used for the parallel processing of data, the calculation speed may be further improved.
If there are a plurality of accelerators to which the call list is transmitted, the first call list 920 may be applied to the third compiler pass 910. The number of accelerators to which the sub call lists 930 and 940 are transmitted may be determined based on user input, or may be determined based on the number by which each of the input data, output data, activations, etc. can be divided.
As illustrated in
Pipelining may be performed between the divided first sub call list 930 and second sub call list 940. For pipelining, a command may be inserted into at least one of the first sub call list 930 and the second sub call list 940 such that result data output through the primitive operations included in the first sub call list 930 are provided as input data of the primitive operations included in the second sub call list 940. For example, a command associated with providing the result data may be inserted into the first sub call list 930 such that the data output through the last executed primitive operation of the primitive operations included in the first sub call list 930 is provided as the input data of the first executed primitive operation of the primitive operations included in the second sub call list 940. The command associated with providing the result data may be a command for transmitting data output through the last executed primitive operation of the primitive operations included in the first sub call list 930 to the second accelerator (GPU 1). In this case, if the output data based on the first sub call list 930 is received, the second accelerator (GPU 1) may sequentially execute the primitive operations included in the second sub call list 940.
A first command for transmitting the result data (Act 1) output from the second primitive operation 934 to the second accelerator (GPU 1) may be inserted into the first sub call list 930. Additionally or alternatively, a second command for receiving the result data (Act 1) of the second primitive operation 934 from the first accelerator (GPU 0) may be inserted into the second sub call list 940. The inserting of the command and the dividing of the call list described above may be performed through the third compiler pass 910.
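By way of a non-limiting illustration, the dividing of the call list and the inserting of the send/receive commands may be sketched as follows. The names (`divide_and_pipeline`, the `Send`/`Recv` command entries, the dictionary layout) are hypothetical and are not part of the disclosed implementation.

```python
# Hypothetical sketch of a pipelining compiler pass: divide a call list at a
# boundary, append a send command to the first sub call list, and prepend a
# matching receive command to the second sub call list.
def divide_and_pipeline(call_list, boundary, src="GPU 0", dst="GPU 1"):
    first, second = call_list[:boundary], call_list[boundary:]
    handoff = first[-1]["outputs"][0]  # data produced last in the first list
    first = first + [{"name": "Send", "data": handoff, "to": dst}]
    second = [{"name": "Recv", "data": handoff, "from": src}] + second
    return first, second

ops = [
    {"name": "linear_0", "inputs": ["Input"], "outputs": ["Act 0"]},
    {"name": "linear_1", "inputs": ["Act 0"], "outputs": ["Act 1"]},
    {"name": "linear_2", "inputs": ["Act 1"], "outputs": ["Output"]},
]
sub1, sub2 = divide_and_pipeline(ops, 2)
print([op["name"] for op in sub1])  # ['linear_0', 'linear_1', 'Send']
print([op["name"] for op in sub2])  # ['Recv', 'linear_2']
```

Here the send command plays the role of the first command and the receive command the role of the second command described above.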
Meanwhile, a target accelerator to transmit the plurality of sub call lists 930 and 940 to may be determined based on the location of the accelerator. In order to minimize communication delay between the accelerators, a target accelerator to transmit the plurality of sub call lists 930 and 940 to may be determined such that the first accelerator (GPU 0) and the second accelerator
(GPU 1) that communicate with each other through the inserted command are included in the same node. The node may herein refer to a physically or logically separated computing device (e.g., a server, a user terminal, etc.).
If a plurality of accelerators included in the same node cannot be found, the target accelerators to which the plurality of sub call lists 930 and 940 are transmitted may be determined such that the first node including the first accelerator (GPU 0) and the second node including the second accelerator (GPU 1) are located adjacent to each other. In this case, the first node may be a neighbor node to the second node.
As described above, if the divided sub call lists 930 and 940 are processed through a plurality of accelerators, the entire computing resources of the information processing system can be managed more efficiently. For example, based on the pipeline, the plurality of sub call lists 930 and 940 may be processed in parallel through a plurality of accelerators, thereby increasing the throughput of the accelerator per unit time and shortening the total calculation time. In addition, since the whole calculation is divided into small calculations and processed in parallel through a plurality of accelerators, memory resources used by each of the plurality of accelerators may be reduced. In addition, the divided sub call lists 930 and 940 are allocated to the accelerators with low load and processed, so that the number of accelerators in an idle state can be minimized.
The input data (input) may be divided into a plurality of pieces of sub-input data in units of batches and processed.
Two or more of the sub call lists 1020 to 1050 may be pipelined.
For pipelining, a command may be inserted into at least one of the first sub call list 1020 and the second sub call list 1030 such that result data output through the primitive operations included in the first sub call list 1020 are provided as input data of the primitive operations included in the second sub call list 1030. For example, a command associated with providing the result data may be inserted into the first sub call list 1020 such that each of the data (Act 0-1 to Act 0-4) output through the first primitive operation (linear_0) included in the first sub call list 1020 is provided as input data of the primitive operation (linear_1) included in the second sub call list 1030. The command associated with providing the result data may be a command for transmitting data output through the first primitive operation (linear_0) included in the first sub call list 1020 to the second accelerator (GPU 1).
Likewise, a command associated with providing the result data may be inserted into the second sub call list 1030 such that each of the data (Act 1-1 to Act 1-4) output through the second primitive operation (linear_1) included in the second sub call list 1030 is provided as input data of the primitive operation (linear_2) included in the third sub call list 1040. In addition, a command associated with providing the result data may be inserted into the third sub call list 1040 such that each of the data (Act 2-1 to Act 2-4) output through the third primitive operation (linear_2) included in the third sub call list 1040 is provided as input data of the fourth primitive operation (linear_3) included in the fourth sub call list 1050.
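By way of a non-limiting illustration, once the input data is divided into micro-batches, the four pipelined sub call lists can overlap in time as sketched below. The function name `pipeline_schedule` and the simple forward-only timing model (stage s processes micro-batch m at time step s + m) are hypothetical simplifications.

```python
# Illustrative sketch: with the input divided into micro-batches, pipeline
# stage s can process micro-batch m at time step s + m, so stages on
# different accelerators execute in parallel.
def pipeline_schedule(num_stages, num_microbatches):
    """Return {time_step: [(stage, microbatch), ...]} for a simple
    forward-only pipeline."""
    schedule = {}
    for s in range(num_stages):
        for m in range(num_microbatches):
            schedule.setdefault(s + m, []).append((s, m))
    return schedule

sched = pipeline_schedule(4, 4)
print(len(sched))  # 7 time steps instead of 16 for fully serial execution
print(sched[3])    # [(0, 3), (1, 2), (2, 1), (3, 0)] -- all stages busy
```

In the example above, the four stages would correspond to the sub call lists 1020 to 1050 and the four micro-batches to the divided sub-input data.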
As illustrated in
Meanwhile,
As described above, if the sub call lists 1020 to 1050 are processed through a plurality of accelerators, the entire computing resources of the information processing system can be managed more efficiently. For example, based on the pipeline, the plurality of sub call lists 1020 to 1050 may be processed in parallel through a plurality of accelerators, thereby increasing the throughput of the accelerator per unit time and shortening the total calculation time. In addition, since the whole calculation is divided into small calculations and processed in parallel through a plurality of accelerators, memory resources used by each of the plurality of accelerators may be reduced. In addition, the divided sub call lists 1020 to 1050 are allocated to the accelerators with low load and processed, so that the number of accelerators in an idle state can be minimized.
Each piece of parameter data may include a plurality of parameters. The plurality of parameters may be a plurality of weights applied to nodes included in a specific layer. In a calculation, each of the plurality of weights may be applied to a variable (e.g., an input value) and/or a constant.
The number of pieces into which the parameter data is divided may be determined based on the number of accelerators to which the call list is transmitted, and the parameter data may be divided into the determined number of pieces. Each of a plurality of parameters applied to each of a plurality of primitive operations may likewise be divided by the determined number. The divided pieces of parameter data may include equal or different numbers of parameters.
The plurality of sub call lists 1130 and 1140 including each of the divided parameter data may be created.
If an operation is performed using the divided parameter data, only a part of the primitive operations may be performed in a specific accelerator. For example, if the first accelerator (GPU 0) executes a first primitive operation 1132 included in the first sub call list 1130, the first accelerator (GPU 0) may only perform a calculation based on the first sub-parameter data (P0-1), but may not be able to perform a calculation based on the second sub-parameter data (P0-2). Similarly, if the second accelerator (GPU 1) executes a first primitive operation 1142 included in the second sub call list 1140, the second accelerator (GPU 1) may only perform a calculation based on the second sub-parameter data (P0-2), but may not be able to perform a calculation based on the first sub-parameter data (P0-1).
Accordingly, the result data of the first primitive operation 1132 executed by the first accelerator (GPU 0) and the result data of the first primitive operation 1142 executed by the second accelerator (GPU 1) should be aggregated to perform the original primitive operation 1122 normally. For example, in the original primitive operation 1122, it can be assumed that the calculation is “Act 0=(a*input)−(b*input)”, where “a” is the first sub-parameter (P0-1) and “b” is the second sub-parameter (P0-2). In this case, the result data of the first primitive operation 1132 executed by the first accelerator (GPU 0) may be calculated based on “(a*input)”, and the result data of the first primitive operation 1142 executed by the second accelerator (GPU 1) may be calculated based on “(b*input)”. In order to calculate a final value for “Act 0” from the partially performed calculations, the result data partially calculated in one accelerator should be transmitted to another accelerator or processor.
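By way of a non-limiting illustration, the partial calculation and aggregation above may be sketched as follows, using the same example calculation “Act 0=(a*input)−(b*input)”. The function names and the concrete values of “a”, “b”, and the input are hypothetical.

```python
# Illustrative sketch of parameter parallelism: each accelerator holds only
# its own sub-parameter data, computes a partial result, and the master
# accelerator aggregates the partial results to recover Act 0.
def partial_calc(param, x):
    # one accelerator's share: it can only apply its own sub-parameter
    return param * x

def aggregate(partials):
    # master accelerator combines partial results: (a * x) - (b * x)
    return partials[0] - partials[1]

a, b, x = 3.0, 2.0, 5.0          # hypothetical values
gpu0_result = partial_calc(a, x)  # computed on GPU 0 with P0-1
gpu1_result = partial_calc(b, x)  # computed on GPU 1 with P0-2
act0 = aggregate([gpu0_result, gpu1_result])
print(act0)  # 5.0 == (3.0 * 5.0) - (2.0 * 5.0)
```

Neither partial result alone equals “Act 0”, which is why the partial result must be transmitted to the aggregating accelerator or processor.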
A command for sharing a calculation result with each accelerator may be inserted into a call list. For example, a master accelerator may be determined from among a plurality of accelerators to which the call list is transmitted. For example, after determining the state of the accelerator to which the call list is transmitted, the accelerator with the lowest utilization rate may be determined to be the master accelerator, and the rest may be determined to be slave accelerators. In some examples, among a plurality of accelerators included in the information processing system, an accelerator to which the call list is not transmitted may be determined to be a master accelerator.
At least one command for transmitting a calculation result of a primitive operation to the master accelerator may be inserted into the sub call list transmitted to the slave accelerator. If a calculation for a specific primitive operation is completed, the slave accelerator may immediately transmit the completed calculation result to the master accelerator based on the inserted command. The calculation result should be transmitted from the slave accelerator to the master accelerator in real time in order to minimize a delay time for the primitive operations that are performed sequentially in the master accelerator.
The master accelerator may calculate a final calculation result of the original primitive operation based on the calculation result of the directly executed primitive operation and the calculation result of the primitive operation received from the slave accelerator. In addition, a command for transmitting the final calculation result to the slave accelerator may be inserted into the call list transmitted to the master accelerator.
Referring to
Based on the inserted command, the first accelerator (GPU 0) may receive a calculation result of the first primitive operation 1142 included in the second sub call list 1140 from the second accelerator (GPU 1). In addition, the first accelerator (GPU 0) may calculate the final calculation result (Act 0) of the first primitive operation 1122 based on the partial calculation result of the first primitive operation 1132 included in the first sub call list 1130 and the primitive operation result received from the second accelerator (GPU 1), and transmit the calculated final calculation result to the second accelerator (GPU 1).
Each of the first accelerator (GPU 0) and the second accelerator (GPU 1) may perform partial calculations on second primitive operations 1134 and 1144, and the partial calculation results for the second operations 1134 and 1144 may be aggregated by the first accelerator (GPU 0), which is the master accelerator, so that a final calculation on the second operation can be performed.
The inserting the command and the dividing the parameter data described above may be performed by the fourth compiler pass 1110.
Meanwhile, a target accelerator to transmit the plurality of sub call lists 1130 and 1140 to may be determined based on the location of the accelerator. The first accelerator (GPU 0) and the second accelerator (GPU 1) may be included in the same node or in neighboring nodes such that communication delay between the master accelerator and the slave accelerator can be minimized.
As described above, if the first call list 1120 is passed through the fourth compiler pass 1110, a plurality of sub call lists 1130 and 1140 for parallel processing of parameters may be acquired. Each of the plurality of sub call lists 1130 and 1140 is transmitted to a plurality of accelerators (GPU 0 and GPU 1), and each primitive operation included in the plurality of sub call lists 1130 and 1140 may be executed in parallel. If such parallel processing is performed, calculation speed can be further improved. In addition, since the calculation of the accelerator is performed based on some of the parameter data, memory resources used by the accelerator may be reduced.
It may be determined whether or not the first call list 1220 is applicable to the fifth compiler pass 1210, based on the data structure and operation identifier (e.g., name, index, etc.) associated with each of the plurality of primitive operations included in the first call list 1220. That is, it may be determined whether or not there are a plurality of primitive operations to be merged, based on the data structure and the operation identifier associated with each of the plurality of primitive operations included in the first call list 1220. The fifth compiler pass 1210 may be a module for merging a plurality of primitive operations associated with the data structure into one primitive operation.
If it is determined that there are a plurality of primitive operations to be merged, a plurality of primitive operations to be merged may be determined from among all primitive operations included in the first call list 1220 based on the data structure and the operation identifier associated with the plurality of primitive operations included in the first call list 1220. A plurality of primitive operations for defining or changing the data structure may be determined to be operations to be merged. For example, if a plurality of Reshape operations having a consecutive order are retrieved from the first call list 1220, the plurality of Reshape operations having the consecutive order may be determined to be the primitive operations to be merged.
As illustrated in area 1222 of
The plurality of primitive operations 1222 determined to be the targets to be merged may be merged into one primitive operation 1232. As illustrated in
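By way of a non-limiting illustration, the merging of consecutive Reshape operations by the fifth compiler pass may be sketched as follows. The function name `merge_reshapes` and the dictionary layout are hypothetical; the sketch relies on the fact that a run of consecutive Reshape operations only redefines the data structure, so only the final shape matters.

```python
# Hypothetical sketch: collapse each run of consecutive Reshape operations
# into a single Reshape to the final shape.
def merge_reshapes(call_list):
    merged = []
    for op in call_list:
        if (op["name"] == "Reshape" and merged
                and merged[-1]["name"] == "Reshape"):
            # replace the previous Reshape with one going to the final shape
            merged[-1] = {"name": "Reshape", "shape": op["shape"]}
        else:
            merged.append(dict(op))
    return merged

first_call_list = [
    {"name": "Reshape", "shape": (2, 8)},
    {"name": "Reshape", "shape": (4, 4)},
    {"name": "Reshape", "shape": (16,)},
    {"name": "MatMul"},
]
second_call_list = merge_reshapes(first_call_list)
print([op["name"] for op in second_call_list])  # ['Reshape', 'MatMul']
print(second_call_list[0]["shape"])             # (16,)
```

The merged call list performs one data-structure change instead of three, consistent with the faster calculation of final output data described below.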
The merging the primitive operations described above may be performed through the fifth compiler pass 1210. If the second call list 1230 is created through the fifth compiler pass 1210, final output data may be calculated more quickly compared to the first call list 1220.
It may be determined whether or not there is at least one independently performed primitive operation among the plurality of primitive operations included in the first call list 1320. If it is determined that there is at least one independently performed primitive operation, the first call list 1320 may be applied to the sixth compiler pass 1310. The independently performed primitive operation may be an operation that can be independently performed without being affected by the execution result of the primitive operation in the previous order. In addition, the sixth compiler pass 1310 may be a module for outputting calculation results more quickly by adjusting the order of primitive operations.
If it is determined that there is at least one independently executed primitive operation in the first call list 1320, an execution order of the at least one independently executed primitive operation may be changed. In addition, the sixth compiler pass 1310 may create the second call list 1330 such that at least one modified primitive operation is included.
Referring to
The second call list 1330 may be created such that an execution order of an independently executable primitive operation (e.g., B=Read( ) in
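By way of a non-limiting illustration, the reordering performed by the sixth compiler pass may be sketched as follows. The function name `hoist_independent` and the dictionary layout are hypothetical; an operation is treated as independently executable when none of its inputs is produced by another operation in the list (e.g., B=Read( )), and such operations are advanced while relative order is otherwise preserved.

```python
# Hypothetical sketch: advance independently executable primitive operations
# (those whose inputs are not produced by any other operation in the list).
def hoist_independent(call_list):
    produced = set()
    for op in call_list:
        produced.update(op["outputs"])
    independent = [op for op in call_list
                   if not set(op["inputs"]) & produced]
    dependent = [op for op in call_list
                 if set(op["inputs"]) & produced]
    return independent + dependent

first_call_list = [
    {"name": "A=Read()", "inputs": [], "outputs": ["A"]},
    {"name": "A'=Compute(A)", "inputs": ["A"], "outputs": ["A'"]},
    {"name": "B=Read()", "inputs": [], "outputs": ["B"]},
    {"name": "C=Add(A',B)", "inputs": ["A'", "B"], "outputs": ["C"]},
]
second_call_list = hoist_independent(first_call_list)
print([op["name"] for op in second_call_list])
# ["A=Read()", "B=Read()", "A'=Compute(A)", "C=Add(A',B)"]
```

Advancing B=Read( ) lets the read overlap the computation of A', so the final output can be produced sooner.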
If the second call list 1330 is created through the sixth compiler pass 1310, final output data can be calculated more quickly compared to the first call list 1320. That is, the time the final output data (e.g., Tensor C in
The processor may acquire a trace from a source program including an artificial intelligence calculation, at S1410. The trace may include code and/or primitive operations associated with the source program. The processor may execute the source program to acquire a plurality of codes and/or primitive operations associated with the artificial intelligence calculation, and acquire a trace including the acquired plurality of codes and/or primitive operations.
The processor may create a call list including a plurality of primitive operations based on the trace, at S1420. A plurality of primitive operations may be included in an operation library accessible to each of the plurality of accelerators.
The processor may determine a correlation for each of a plurality of primitive operations, and create a call list including the determined correlation for each of the plurality of primitive operations. The correlation may be a relationship in which output data of the first primitive operation included in the call list is input to the second primitive operation included in the call list.
Additionally or alternatively, the processor may create a graph representing a call order and the correlation of the plurality of primitive operations based on the plurality of primitive operations included in the call list.
The processor may transmit the created call list to at least one of a plurality of accelerators. The accelerator may be configured to, upon receiving the call list, access the operation library and call the plurality of primitive operations included in the call list.
The processor may apply the call list to at least one compiler pass to create a new call list based on the call list, at S1510. The call list may be the original call list before being applied to the compiler pass. For example, the processor may determine, from among the primitive operations included in the call list, a plurality of primitive operations to be merged, based on the identifier of the primitive operation, and merge the determined plurality of primitive operations into one primitive operation. The processor may modify the call list to include the merged primitive operations to create a new call list. In this case, the input data for each of the determined plurality of primitive operations may be input to the merged primitive operation.
As another example, the processor may determine the number of the plurality of accelerators provided with the call list, and divide at least one of the input data, the activation, or the output data included in the call list based on the determined number of the plurality of accelerators. The processor may change the call list to include at least one of the divided input data, activation data, or output data to create a new call list.
As another example, the processor may determine the number of the plurality of accelerators provided with the call list, and divide the call list into a plurality of sub call lists based on the determined number of the plurality of accelerators. The processor may change the call list such that the divided sub call lists are pipelined to create a new call list. The processor may transmit the plurality of divided sub call lists to the plurality of accelerators. In this case, the first accelerator receiving a first sub call list and the second accelerator receiving a second sub call list pipelined with the first sub call list may be included in the same node. As another example, the first accelerator receiving the first sub call list may be included in the first node, the second accelerator receiving the second sub call list pipelined with the first sub call list may be included in the second node, and the first node may be a neighboring node adjacent to the second node. The processor may insert at least one command into at least one of the first sub call list or the second sub call list such that output data based on the first sub call list is provided as input data of a primitive operation included in the second sub call list.
As still another example, the processor may determine the number of the plurality of accelerators provided with the call list, and divide a plurality of parameters applied to each of the primitive operations included in the call list based on the determined number of the plurality of accelerators. The processor may change the call list to include the divided parameters to create a new call list.
As still another example, the processor may determine, from among the primitive operations included in the call list, a plurality of primitive operations to be merged, based on at least one of a data structure or an identifier associated with the primitive operations, and merge the determined plurality of primitive operations into one primitive operation. The processor may modify the call list to include the merged primitive operation to create a new call list.
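An identifier-based merging pass of this kind can be sketched as below. The fusible identifier pairs and the naming of the merged operation are hypothetical examples of a fusion rule, not rules stated in the disclosure.

```python
# Illustrative sketch: merging adjacent primitive operations whose
# identifiers match a fusible pattern (e.g. a convolution followed by
# a bias add) into a single fused operation. The identifier values
# and fusion rule are hypothetical.

FUSIBLE = {("conv2d", "bias_add"), ("matmul", "relu")}

def fuse_operations(call_list):
    fused, i = [], 0
    while i < len(call_list):
        if (i + 1 < len(call_list)
                and (call_list[i]["op"], call_list[i + 1]["op"]) in FUSIBLE):
            # replace the pair with one merged primitive operation
            fused.append({"op": call_list[i]["op"] + "_" + call_list[i + 1]["op"]})
            i += 2  # both source operations are consumed
        else:
            fused.append(call_list[i])
            i += 1
    return fused

ops = [{"op": "conv2d"}, {"op": "bias_add"}, {"op": "pool"}]
out = fuse_operations(ops)
```

Merging reduces the number of kernel launches and intermediate buffers the accelerator must handle for the same computation.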
As still another example, the processor may identify, from among a plurality of primitive operations included in the call list, at least one primitive operation that is performed independently, and change an execution order of the identified at least one primitive operation. In this case, the processor may change the execution order such that the identified at least one primitive operation is executed earlier. The processor may change the call list to reflect the changed execution order of the at least one primitive operation to create a new call list.
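The reordering pass can be sketched as hoisting operations with no dependencies to the front of the call list. The dependency encoding (`"deps"` as a list of producer indices) is a hypothetical representation chosen for illustration.

```python
# Illustrative sketch: advancing the execution order of primitive
# operations that do not depend on any earlier operation's output.
# The "deps" dependency encoding is hypothetical.

def hoist_independent_ops(call_list):
    independent = [op for op in call_list if not op.get("deps")]
    dependent = [op for op in call_list if op.get("deps")]
    # relative order within each group is preserved
    return independent + dependent

ops = [{"op": "a", "deps": []},
       {"op": "b", "deps": [0]},   # consumes the output of "a"
       {"op": "c", "deps": []}]
out = hoist_independent_ops(ops)
```

A production pass would also rewrite the dependency references after the reorder; the sketch only shows the ordering decision itself.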
If the creation of the new call list is completed, the processor may transmit the new call list to at least one accelerator, at S1520.
In the embodiments described above, several compiler passes have been described as examples used for optimizing the call list, but various other compiler passes may also be used to optimize the call list. Additionally, the optimized call list may be transmitted to at least one accelerator.
The flowchart and description described above are merely examples, and may be implemented differently in some examples. For example, in some examples, the order of respective steps may be changed, some steps may be repeatedly performed, some steps may be omitted, or some steps may be added.
The method described above may be provided as a computer program stored in a computer-readable recording medium for execution on a computer. The medium may be a type of medium that continuously stores a program executable by a computer, or temporarily stores the program for execution or download. In addition, the medium may be a variety of recording means or storage means having a single piece of hardware or a combination of several pieces of hardware, and is not limited to a medium that is directly connected to any computer system, and accordingly, may be present on a network in a distributed manner. An example of the medium includes a medium configured to store program instructions, including a magnetic medium such as a hard disk, a floppy disk, and a magnetic tape, an optical medium such as a CD-ROM and a DVD, a magnetic-optical medium such as a floptical disk, and a ROM, a RAM, a flash memory, and so on. In addition, other examples of the medium may include an app store that distributes applications, a site that supplies or distributes various software, and a recording medium or a storage medium managed by a server.
The methods, operations, or techniques of the present disclosure may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. Those skilled in the art will further appreciate that various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented in electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such a function is implemented as hardware or software varies depending on design requirements imposed on the particular application and the overall system. Those skilled in the art may implement the described functions in varying ways for each particular application, but such implementation should not be interpreted as causing a departure from the scope of the present disclosure.
In a hardware implementation, processing units used to perform the techniques may be implemented in one or more ASICs, DSPs, digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, electronic devices, other electronic units designed to perform the functions described in the present disclosure, a computer, or a combination thereof.
Accordingly, various example logic blocks, modules, and circuits described in connection with the present disclosure may be implemented or performed with general purpose processors, DSPs, ASICs, FPGAs or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, or any combination of those designed to perform the functions described herein. The general purpose processor may be a microprocessor, but in the alternative, the processor may be any related processor, controller, microcontroller, or state machine. The processor may also be implemented as a combination of computing devices, for example, a DSP and microprocessor, a plurality of microprocessors, one or more microprocessors associated with a DSP core, or any other combination of the configurations.
In the implementation using firmware and/or software, the techniques may be implemented with instructions stored on a computer-readable medium, such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, compact disc (CD), magnetic or optical data storage devices, etc. The instructions may be executable by one or more processors, and may cause the processor(s) to perform certain aspects of the functions described in the present disclosure.
If implemented in software, the techniques described above may be stored on a computer-readable medium as one or more instructions or codes, or may be sent via a computer-readable medium. The computer-readable media include both the computer storage media and the communication media including any medium that facilitates the transmission of a computer program from one place to another. The storage media may also be any available media that may be accessible to a computer. By way of non-limiting example, such a computer-readable medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other media that can be used to transmit or store desired program code in the form of instructions or data structures and can be accessible to a computer. In addition, any connection is properly referred to as a computer-readable medium.
For example, if the software is sent from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, the fiber optic cable, the twisted pair, the digital subscriber line, or the wireless technologies such as infrared, radio, and microwave are included within the definition of the medium. The disks and discs used herein include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically using a laser. The combinations described above should also be included within the scope of the computer-readable media.
The software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium may be connected to the processor, such that the processor may read or write information from or to the storage medium. Alternatively, the storage medium may be integrated into the processor. The processor and the storage medium may exist in the ASIC. The ASIC may exist in the user terminal. Alternatively, the processor and the storage medium may exist as separate components in the user terminal.
Although the examples described above have been described as utilizing aspects of the currently disclosed subject matter in one or more standalone computer systems, aspects are not limited thereto, and may be implemented in conjunction with any computing environment, such as a network or distributed computing environment. Furthermore, the aspects of the subject matter in the present disclosure may be implemented in multiple processing chips or devices, and storage may similarly be effected across a plurality of devices. Such devices may include PCs, network servers, and portable devices.
Although the present disclosure has been described in connection with some examples herein, various modifications and changes can be made without departing from the scope of the present disclosure, which can be understood by those skilled in the art to which the present disclosure pertains. In addition, such modifications and changes should be considered within the scope of the claims appended herein.
Number | Date | Country | Kind
--- | --- | --- | ---
10-2023-0029483 | Mar 2023 | KR | national
10-2023-0088285 | Jul 2023 | KR | national