This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2016-000151, filed on Jan. 4, 2016, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a matrix division method and a parallel processing apparatus.
Numerical simulations based on analytical schemes, such as structural analyses, fluid analyses, and electromagnetic field analyses, are employed in designing structures, electronic circuits, and the like. Many of the equations defining the physical laws governing the objects of analysis are expressed as partial differential equations dealing with continuous physical quantities. Therefore, in numerical simulations, problems of solving partial differential equations are replaced by problems of solving matrix equations using a discretization method, such as a finite element method (FEM).
The coefficient matrix of such a matrix equation is a high-dimensional, large-scale sparse matrix in which most of the elements are zero. Therefore, in order to reduce the computational load and memory usage, an iterative method is used, which keeps modifying the solution to the matrix equation by repeated computation until the correct solution is found. Examples of iterative methods include the conjugate gradient (CG) method, the bi-conjugate gradient (BiCG) method, the conjugate residual (CR) method, the conjugate gradient squared (CGS) method, and the incomplete Cholesky conjugate gradient (ICCG) method.
There are disclosed techniques for enabling high-speed magnetic field analyses and structural analyses using finite element methods and iterative methods. There is also a disclosed technique that prepares both the Cholesky method (a type of direct method) and the ICCG method (a type of iterative method) for solving simultaneous linear equations. According to this technique, either the Cholesky method or the ICCG method is selected depending on whether the count of non-zero elements included in a stiffness matrix falls within a range that fits in the memory of a computer.
Japanese Laid-open Patent Publication No. 2010-122850
Japanese Laid-open Patent Publication No. 2005-207900
Japanese Laid-open Patent Publication No. 5-73527
Japanese Laid-open Patent Publication No. 2012-204835
In the case of employing an iterative method, a plurality of processes (each being an executable unit of computing processing) are run in parallel by dividing a coefficient matrix into a plurality of row groups (sets of rows) and assigning each of the processes to a different row group. If the coefficient matrix is a band matrix whose non-zero elements are confined to the diagonal and a few of the immediately adjacent diagonals, there is less imbalance in operation load among the processes. On the other hand, if the coefficient matrix includes some rows having a significantly larger number of non-zero elements compared to others, each process assigned to a row group including many non-zero elements acts as a bottleneck and slows down the entire parallel processing.
For example, consider analyzing, with a finite element method and an iterative method, the magnetic field produced by the electric current that flows through a conductor when a voltage is applied to a part of the conductor. In this analysis, the vector potential component and the current component in each finite element are unknowns. The region corresponding to the vector potential component within the coefficient matrix is defined by the connectivity among the finite elements, and its non-zero elements are, therefore, confined to the diagonal and a few of the immediately adjacent diagonals. As for the coefficients corresponding to the current component, on the other hand, many finite elements have non-zero values. As a result, each row of matrix elements corresponding to the current component includes many non-zero elements. In the case of a structural analysis, rows with a great number of non-zero elements appear when constraint conditions are added.
In parallel processing, execution results obtained from one process are used in other processes. Therefore, a process falling behind in its operation is not able to pass its operation results on to other processes, thus causing a delay in the entire processing. That is, the process falling behind acts as a bottleneck and slows down the entire processing. Ideally, the processing power increases linearly with the number of processes executable in parallel (the parallel process count). Under conditions where such a bottleneck is present, however, very little increase in the processing power takes place with increasing parallel process count once the parallel process count exceeds a certain number.
According to one embodiment, there is provided a non-transitory computer-readable storage medium storing a matrix computing program that causes a processor of a computer including memory and the processor to perform a procedure. The computer performs processing for computing a matrix equation that includes a sparse matrix as a coefficient matrix. The procedure includes acquiring, from the memory, a threshold used to determine whether the count of non-zero elements included in each of the rows of the sparse matrix is large; identifying, within the sparse matrix, a first row whose count of non-zero elements is larger than the threshold; extending the sparse matrix by dividing the identified first row into a plurality of second rows; and dividing the extended sparse matrix into a plurality of row groups and assigning a process, being an executable unit of the processing, to each of the row groups.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
Several embodiments will be described below with reference to the accompanying drawings. In the following description and the accompanying drawings, like reference numerals refer to like elements having substantially the same functions, and a repeated description thereof may be omitted.
A first embodiment is described next with reference to the accompanying drawing.
For some patterns of non-zero elements (non-zero patterns) included in a coefficient matrix, no improvement in the processing speed is observed with increasing parallel process count once the parallel process count exceeds a certain number. That is, there are non-zero patterns that reduce the scalability of the parallel processing. The first embodiment provides a technique for improving the scalability of the parallel processing even in the case of dealing with a coefficient matrix including such a non-zero pattern.
As illustrated in the accompanying drawing, a parallel processor 10 according to the first embodiment includes a storing unit 11 and computing units 12A to 12F.
The storing unit 11 is a volatile storage device, such as random access memory (RAM), or a non-volatile storage device, such as a hard disk drive (HDD) or flash memory. The computing units 12A to 12F are processors, such as central processing units (CPUs) or digital signal processors (DSPs). In addition, some of the computing units 12A to 12F may be devices for general-purpose computing on graphics processing units (GPGPU). The computing units 12A to 12F execute a program stored in the storing unit 11 or other memory. The computing units 12A to 12F are able to run a plurality of processes in parallel. The term process here means an executable unit of computing processing. For example, the computing units 12A to 12F are able to run Processes P1 to P6, respectively, in parallel. The example described here assumes a parallel process count of six.
Consider the finite element model illustrated in part (A) of the corresponding figure, in which constraint conditions couple the degrees of freedom of many finite elements. The constraint conditions result in the coefficient matrix A having rows and columns each including a great number of non-zero elements (see the lowermost row and rightmost column of the coefficient matrix A in part (B) of the figure). If such a matrix is simply divided into row groups of equal row counts, the process assigned to the row group containing the dense row bears a far heavier operation load than the other processes.
In order to correct the above-described operation load imbalance, the parallel processor 10 divides each row with a great number of non-zero elements in the coefficient matrix A. To this end, the storing unit 11 stores a threshold TH used to determine whether the count of non-zero elements included in each row of the sparse matrix is large. The threshold TH is set in advance, for example, according to the number of rows in the coefficient matrix A and the width of the band of non-zero elements (band region) confined to the diagonal and a few of the immediately adjacent diagonals.
For example, the computing unit 12A identifies, within the sparse matrix (coefficient matrix A), a first row J whose number of non-zero elements exceeds the threshold TH. The computing unit 12A then divides the first row J into two second rows J1 and J2 to thereby extend the sparse matrix, as illustrated in part (C) of the figure. In the example of part (C) of the figure, the extended coefficient matrix A is divided into row groups whose non-zero counts are roughly equal, and each of Processes P1 to P6 is assigned to a different row group.
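The row-splitting operation can be sketched in a few lines of Python. This is a minimal illustration assuming SciPy's CSR format; the function name and the half-and-half split policy are this sketch's choices, not the patent's prescribed implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def extend_matrix(A, threshold):
    """Split each row whose non-zero count exceeds `threshold` into
    two rows, each carrying roughly half of the non-zero elements."""
    pieces = []
    for i in range(A.shape[0]):
        row = A.getrow(i)
        if row.nnz > threshold:
            half = row.nnz // 2
            for cols, vals in ((row.indices[:half], row.data[:half]),
                               (row.indices[half:], row.data[half:])):
                pieces.append(csr_matrix(
                    (vals, (np.zeros(len(cols), dtype=int), cols)),
                    shape=(1, A.shape[1])))
        else:
            pieces.append(row)
    return vstack(pieces, format="csr")
```

Note that multiplying the two new rows by a vector yields two partial sums whose total equals the product of the original row with that vector, which is what allows the split to be undone after the matrix-vector product.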
By dividing each row with a great number of non-zero elements and then dividing the coefficient matrix A into a plurality of row groups according to the number of non-zero elements as described above, it is possible to distribute the operation load almost equally across the processes assigned to the individual row groups. As a result, slowdowns in which some processes stall while waiting on relatively delayed operations of other processes become less likely, which in turn removes the restrictions on scalability attributed to the operation load imbalance. The first embodiment has been described thus far.
Next described is a second embodiment. The second embodiment is directed to a method for solving a matrix equation problem whose coefficient matrix is a sparse matrix, and provides a parallel processing method for efficiently processing the problem by assigning a plurality of processes, each being an executable unit of computing processing, to sub-regions of the coefficient matrix and executing the processes in parallel. The parallel processing method provides a technique for appropriately dividing the coefficient matrix to prevent imbalance in the operation load distribution across the processes and improve the scalability of the parallel processing.
b-1. Hardware
Next described is the hardware of an information processor capable of implementing the parallel processing method according to the second embodiment, with reference to the accompanying drawings. The information processor 100 includes a CPU group 101 made up of CPUs 101a, 101b, . . . , and 101f, memory 102, a communication interface 103, a display interface 104, and a device interface 105.
The CPUs 101a, 101b, . . . , and 101f function, for example, as computing or control units and control all or part of the operations of the hardware elements based on various programs stored in the memory 102. Each of the CPUs 101a, 101b, . . . , and 101f may include a plurality of processor cores. Note that the CPU group 101 may include one or more GPGPUs. The memory 102 is an example of a storage device for temporarily or permanently storing, for example, a program to be loaded into the CPUs 101a, 101b, . . . , and 101f, data used for their computation, and various parameters that vary as the program is executed. The memory 102 may be a volatile storage device, such as RAM, or a non-volatile storage device, such as an HDD or flash memory.
The communication interface 103 is a communication device used to connect to a network 201. The communication interface 103 is, for example, a wired or wireless local area network (LAN) communication circuit or an optical communication circuit. The network 201 is a wired or wireless network, such as the Internet or a LAN. The display interface 104 is a connection device used to connect to a display unit 202. The display unit 202 is, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display panel (PDP), or an electro-luminescence display (ELD). The device interface 105 is a connection device used to connect to an external device, such as an input unit 203. The device interface 105 is, for example, a universal serial bus (USB) port, an IEEE 1394 port, a small computer system interface (SCSI), or an RS-232C port. To the device interface 105, a removable storage medium (not illustrated) or an external device, such as a printer, may be connected. The removable storage medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
A computing method according to the second embodiment may be implemented using the single information processor 100 provided with a plurality of CPUs, as illustrated in the figure, or alternatively using a plurality of information processors 100a, 100b, . . . , and 100f, each provided with one of the CPUs 101a, 101b, . . . , and 101f and connected to the others via the network 201.
Each of the memory units 102a, 102b, . . . , and 102f is the same as the memory 102 described above. Each of the communication interfaces 103a, 103b, . . . , and 103f is the same as the communication interface 103 described above. Note that hardware components corresponding to the display interface 104 and the device interface 105 are omitted from each of the information processors 100a, 100b, . . . , and 100f for simplicity of illustration. The information processors 100a, 100b, . . . , and 100f are able to operate as a distributed processing system that distributes computing operations across the CPUs 101a, 101b, . . . , and 101f by sending and receiving results of the computing operations to and from one another.
The computing method according to the second embodiment can be implemented using the hardware of the information processor 100 described above; the following description assumes this configuration for the purpose of illustration.
b-2. Functions
Next described are functions of the information processor 100, with reference to the accompanying drawings.
As illustrated, the information processor 100 includes a storing unit 111, a matrix extending unit 112, a process assigning unit 113, and a parallel computing unit 114.
The storing unit 111 stores a coefficient matrix 111a and a threshold 111b. The coefficient matrix 111a is the coefficient matrix included in the matrix equation subject to the computing processing. For example, in the case of solving a matrix equation "Ax=b" which includes vectors b and x and a coefficient matrix A, the information of the coefficient matrix A is stored in the storing unit 111. In this case, information on the vector b (the right-side vector) is also stored in the storing unit 111. The threshold 111b is used to determine whether the non-zero count of each row of the coefficient matrix 111a is large. The threshold is set, for example, according to the size of the coefficient matrix A and the number of processes (each being an executable unit of the computing processing) executed by the information processor 100 in parallel.
The matrix extending unit 112 counts the number of non-zero elements (the non-zero count) included in each row of the coefficient matrix A and identifies rows whose non-zero count exceeds the threshold. The matrix extending unit 112 then divides each of the identified rows into a plurality of rows to thereby extend the coefficient matrix A. That is, the matrix extending unit 112 breaks up each row with a high non-zero count into a plurality of rows each with a low non-zero count.
The process assigning unit 113 divides the coefficient matrix A extended by the matrix extending unit 112 into a plurality of row groups according to the number of processes to be executed by the information processor 100 in parallel. Note that each of the row groups is a set of one or more rows. Then, the process assigning unit 113 assigns a process to each of the row groups. The parallel computing unit 114 causes a plurality of CPUs (some or all of the CPUs 101a to 101f) included in the CPU group 101 to execute the processes assigned to the individual row groups.
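The patent does not spell out the grouping algorithm itself, but the intent, row groups with roughly equal total non-zero counts, can be sketched as a greedy pass over the rows. Everything below (the names and the assumption that groups are contiguous runs of rows) is illustrative:

```python
def divide_into_row_groups(nnz_per_row, n_groups):
    """Greedily cut the row sequence into `n_groups` contiguous groups
    whose total non-zero counts are roughly equal."""
    target = sum(nnz_per_row) / n_groups
    groups, current, acc = [], [], 0
    for i, nnz in enumerate(nnz_per_row):
        current.append(i)
        acc += nnz
        rows_left = len(nnz_per_row) - i - 1
        groups_left = n_groups - len(groups) - 1
        # Close the group once it reaches the target, as long as enough
        # rows remain to populate the remaining groups.
        if acc >= target and groups_left > 0 and rows_left >= groups_left:
            groups.append(current)
            current, acc = [], 0
    groups.append(current)
    return groups  # e.g., [[0, 1, 2], [3, 4], ...]; one process per group
```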
Next described is a method for applying the technique of the second embodiment to a calculation algorithm for solving a matrix equation using the ICCG method, with reference to the accompanying drawings.
The ICCG method is a hybrid computation scheme that combines a preconditioning step, called the incomplete Cholesky (IC) decomposition, with the conjugate gradient (CG) method. For example, in the case where a matrix A is the target matrix, the matrix A is decomposed using a diagonal matrix D and an off-diagonal matrix L, as expressed by Equation (1) below. In this regard, the diagonal matrix D and the off-diagonal matrix L are obtained by Equations (2) and (3), respectively.
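The bodies of Equations (1) to (3) appear only as images in the original publication. A standard LDL^T-type incomplete Cholesky formulation consistent with the surrounding description is the following (the patent's exact notation may differ); here L is unit lower triangular, and the sums run only over index pairs inside the non-zero pattern of A:

$$A \approx L D L^{T} \qquad (1)$$

$$d_{ii} = a_{ii} - \sum_{k<i} l_{ik}^{2}\, d_{kk} \qquad (2)$$

$$l_{ij} = \frac{1}{d_{jj}} \Bigl( a_{ij} - \sum_{k<j} l_{ik}\, l_{jk}\, d_{kk} \Bigr), \quad i > j \qquad (3)$$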
The incomplete Cholesky decomposition is a scheme for calculating a matrix C (C = LDL^T ≈ A) under the restriction that "the arrangement pattern of non-zero elements (non-zero pattern) included in the off-diagonal matrix L is the same as the non-zero pattern of the matrix A". Note that the superscript T denotes the matrix transpose. Placing such a restriction offers the advantage that the size of an array for storing values of the off-diagonal matrix L can be determined in advance. In addition, the ICCG method has the advantage of reducing the computational load and memory usage in the case where the coefficient matrix A is a sparse matrix, like the one illustrated in part (A) of the figure.
A pseudocode for the ICCG method to solve the matrix equation "Ax=b", which includes the vectors b and x and the coefficient matrix A, is given in part (B) of the figure.
By executing the pseudocode of part (B) of the figure, the solution vector x is obtained. When the coefficient matrix A is large, however, the matrix-vector products and other operations performed in each iteration carry a heavy computational load, and processing them with a single process takes a long time.
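The pseudocode itself is likewise reproduced only as a figure. As a stand-in, the following Python sketch shows a generic preconditioned conjugate gradient loop, which becomes the ICCG method when `precond_solve` applies the incomplete Cholesky factors (solving LDL^T z = r); it is not the patent's exact code. The vectors p and q below correspond to the vectors {p} and {q} discussed later.

```python
import numpy as np

def pcg(A, b, precond_solve, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient; with an incomplete Cholesky
    preconditioner this is the ICCG method."""
    x = np.zeros_like(b)
    r = b - A @ x                  # initial residual
    z = precond_solve(r)           # solve (L D L^T) z = r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        q = A @ p                  # matrix-vector product: the dominant cost
        alpha = rz / (p @ q)
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```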
In view of this, a method is employed which divides the coefficient matrix A into a plurality of sub-regions (for example, {A(1)} and {A(2)}), as illustrated in part (A) of the figure, and assigns a process to each of the sub-regions so that the operations are executed in parallel. Part (B) of the figure illustrates the parallel computation performed on the divided sub-regions.
The coefficient matrix A illustrated in part (A) of the figure has 13 rows in total; its non-zero elements are confined to a band along the diagonal, except that the lowermost row and the rightmost column each include a great number of non-zero elements. In part (A) of the figure, each row other than the lowermost one includes only about 5 non-zero elements, whereas the lowermost row includes 11 (or more) non-zero elements.
In the case of dividing this coefficient matrix A, whose total number of rows amounts to 13, into ten row groups, the created row groups include one row group with 11 (or more) non-zero elements and row groups with only about 5 non-zero elements each. As described above, the computation of a CPU in charge of one row group needs computational results obtained by another CPU in charge of a different row group. This means that a CPU in charge of a row group with a low non-zero count waits for a different CPU in charge of a row group with a high non-zero count to produce its computational results. That is, an improvement in processing power commensurate with the parallel process count (10 in this case) is unlikely to be achieved.
In view of the above, according to the second embodiment, the matrix extending unit 112 identifies a row whose non-zero count exceeds a threshold S (for example, S=7) (such a row is referred to as a "large row") and divides the large row into a plurality of rows each with a low non-zero count, as illustrated in part (B) of the figure.
In addition, the matrix extending unit 112 extends the vectors {p} and {q} corresponding to the large row. For example, in the case where the element of the vector {q} corresponding to the large row is Q, elements Q1 and Q2 are defined, each corresponding to one of the rows created by the division. Note here that Q=Q1+Q2. In the case where the element of the vector {p} corresponding to the large row is P, the values of the elements corresponding to the rows created by the division depend on the placement of the non-zero elements. In the example of part (B) of the figure, the value P is carried over to the row, created by the division, that contains the corresponding non-zero element.
By extending the coefficient matrix A in the above-described manner, it is possible to divide the coefficient matrix A into a plurality of row groups in consideration of the balance of the non-zero counts, as illustrated in part (A) of the figure.
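The restoring equation referenced at this point is not reproduced in this text. From the definitions in the next paragraph, it presumably sums the divided elements back into the original element:

$$Q_{k} = \sum_{m=1}^{N_{k}^{c}} Q(k, m), \qquad k = 1, \dots, c$$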
Note here that c is the number of large rows included in the coefficient matrix A; Nkc is the number of rows created by dividing the kth large row; and Q(k, m) is the mth divided element obtained by dividing the element of the vector {q} corresponding to the kth large row. For example, in part (B) of the figure described above, the element Q corresponding to the large row is restored as Q = Q1 + Q2.
b-3. Flow of Processing
Next described is the flow of processing performed by the information processor 100 to solve the matrix equation "Ax=b", which includes the vectors b and x and the coefficient matrix A, with reference to the accompanying flowchart.
[Step S101] The matrix extending unit 112 acquires a parallel process count N, the threshold S, the coefficient matrix A, the right-side vector b, and a total count of non-zero elements nall. The parallel process count N is the number of processes to be used in the parallel processing, and corresponds to the number of CPUs in the case of assigning each of the processes to a different CPU. The threshold S is set in advance according to, for example, the size of the coefficient matrix A. The total count of non-zero elements nall is the sum total of non-zero elements included in the coefficient matrix A.
[Steps S102 and S107] The matrix extending unit 112 repeats steps S103 to S106 while changing the parameter k from 1 to nd. nd is the total number of rows included in the coefficient matrix A.
[Step S103] The matrix extending unit 112 counts the number of non-zero elements in the kth row, nk, of the coefficient matrix A.
[Step S104] The matrix extending unit 112 determines whether the number of non-zero elements nk counted in step S103 exceeds the threshold S. If the number of non-zero elements nk exceeds the threshold S, the processing moves to step S105. On the other hand, if the number of non-zero elements nk does not exceed the threshold S, the processing moves to step S107, and step S102 and the subsequent steps are then carried out if the parameter k is equal to or below nd.
[Step S105] The matrix extending unit 112 determines the kth row as a large row. The large row is regarded as a row including a great number of non-zero elements and is, therefore, going to be divided.
[Step S106] The matrix extending unit 112 adds nk to a parameter nc. The parameter nc is a parameter for counting the sum total of non-zero elements in one or more large rows included in the coefficient matrix A. The process moves to step S107 after the completion of step S106, and step S102 and the subsequent steps are then carried out if the parameter k is equal to or below nd.
[Step S108] The matrix extending unit 112 calculates a large-row division number Nc. The large-row division number Nc is a parameter indicating into how many row groups the region of the large rows included in the coefficient matrix A is to be divided. For example, if a single large row (the lowermost row in the example of the figure described above) is present and Nc is 2, the large row is divided into two rows. The large-row division number Nc is obtained by Equation (5) below.
Equation (5) is based on a relational expression (4) below. That is, Nc is determined in such a manner that the non-zero count obtained by dividing the non-zero elements of the large row into Nc groups becomes approximately equal to the non-zero count obtained by dividing non-zero elements included outside of the region of the large row into (N-Nc) groups. This technique of determining Nc reduces the imbalance in the non-zero count among a plurality of row groups created by dividing the coefficient matrix A.
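The bodies of Equations (4) and (5) are not reproduced here either, but the description above pins them down. The balancing condition (4) and its solution (5) are presumably:

$$\frac{n_c}{N_c} \approx \frac{n_{all} - n_c}{N - N_c} \qquad (4)$$

$$N_c = \mathrm{ROUND}\!\left( N \cdot \frac{n_c}{n_{all}} \right) \qquad (5)$$

Indeed, solving (4) for Nc gives Nc = N · nc / nall exactly, which (5) rounds to an integer.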
Note that how to determine Nc when a plurality of large rows are present in the coefficient matrix A is described later.
[Step S109] The matrix extending unit 112 divides the region of one or more large rows included in the coefficient matrix A into Nc row groups (matrix extension). For example, in the case where a single large row is present in the coefficient matrix A, the matrix extending unit 112 divides the large row into Nc row groups in the manner illustrated in part (B) of the figure described above.
[Step S110] The process assigning unit 113 divides the region of the coefficient matrix A, except for the region of the one or more large rows, into (N-Nc) row groups (region division). In the example described above, where N is 10 and Nc is 2, this region is divided into eight row groups.
[Step S111] The process assigning unit 113 assigns a process to each of the rows obtained by dividing the large row in step S109, and also assigns a process to each of the row groups obtained by the region division in step S110.
At this point of time, the process assigning unit 113 generates matrix information pieces, each having the data structure described below.
Matrix information pieces are individually passed on to a corresponding one of CPUs in charge of processes to be executed in parallel. That is, a matrix information piece is generated for each of the processes. The item “size of rows” indicates the size of rows to be handled by the corresponding process in charge (the number of rows included in the corresponding row group). The item “count of non-zero elements” indicates the number of non-zero elements included in the row group to be handled by the process in charge. The item “leading row number” indicates in which row of the coefficient matrix A the beginning of the row group to be handled by the process in charge is located.
The item "array of row-by-row array leading numbers" is an array indicating, for each row to be handled by the process in charge, the position within the array of coefficients at which the data of the first non-zero element of that row is stored. The item "array of column numbers" is an array indicating the column of the coefficient matrix A in which each non-zero element stored in the array of coefficients is located. The item "array of coefficients" is an array storing the values of the non-zero elements to be handled by the process. Distributing each of such matrix information pieces to the appropriate process allows for efficient passing of data on non-zero elements, which in turn contributes to saving memory usage.
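The three arrays read like a compressed sparse row (CSR) layout restricted to one process's row group. A hypothetical rendering of such a matrix information piece follows; the field names are this sketch's, not the patent's:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MatrixInfo:
    """Per-process matrix information piece (illustrative).
    The three arrays form a CSR-style view of the rows the process owns."""
    n_rows: int            # "size of rows": rows in the row group
    n_nonzeros: int        # "count of non-zero elements"
    leading_row: int       # "leading row number" within the full matrix A
    row_ptr: List[int]     # "array of row-by-row array leading numbers"
    col_idx: List[int]     # "array of column numbers"
    coeffs: List[float]    # "array of coefficients" (non-zero values)
```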
[Step S112] The process assigning unit 113 generates communication information pieces used to execute a plurality of processes in parallel, according to the details of the assignment of the processes. Once the details of the assignment are confirmed in step S111, it is determined which CPU is in charge of each of the row groups. It is then identified, for each CPU to proceed with its computing operation, from which CPUs it is going to acquire computational results and to which CPUs it is going to provide its own computational results. The communication information pieces are information enabling such transmission and reception of computational results among the CPUs.
For example, the process assigning unit 113 generates communication information pieces, each having a data structure that indicates, for the corresponding process, the partner processes and the elements of the computational results to be transmitted and received.
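The exact layout of a communication information piece appears only in a figure; a hypothetical counterpart to the matrix information sketch above might look like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CommInfo:
    """Hypothetical per-process communication information piece."""
    send_to: List[int]           # ranks of the CPUs that need this CPU's results
    send_elems: List[List[int]]  # vector indices to transmit to each partner
    recv_from: List[int]         # ranks from which results are acquired
    recv_elems: List[List[int]]  # local vector indices filled by each partner
```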
[Step S113] The parallel computing unit 114 carries out transmission and reception of the matrix information pieces generated in step S111 and the communication information pieces generated in step S112. In this regard, the CPU having performed steps S101 to S112 transmits, to each of the other CPUs, the matrix and communication information pieces corresponding to that CPU, and each CPU receives the information pieces transmitted to it. That is, the matrix and communication information pieces are distributed to the CPUs individually in charge of the processes to be executed in parallel.
[Step S114] The parallel computing unit 114 causes a plurality of CPUs to operate in parallel so that the individual CPUs perform computing operations (calculation of unknowns) on their corresponding row groups based on the matrix information pieces while cooperating with one another based on the communication information pieces. After the completion of step S114, the series of processing ends.
b-4. Application Example: Magnetic Field Analysis
Next described is an example of applying the technique of the second embodiment to a magnetic field analysis. In the following example, let us assume for the purpose of illustration that CPU #1, CPU #2, . . . , and CPU #10 are available for parallel computation.
Coefficient Matrix
Equations for describing the magnetic field produced by a coil are given as Equations (6) to (8) below using a vector potential A, a current density J, magnetic resistivity ν, interlinkage magnetic flux Φ, resistance R, an electric current I, terminal voltage V, a cross-sectional area of the coil S, and a unit directional vector n of the current density in the coil. As for the left-hand side of Equation (7), the first term represents the induced electromotive force due to temporal change in magnetic flux, and the second term represents a voltage drop due to the resistance.
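Equations (6) to (8) are reproduced only as images in the original. Using the variables listed above, a standard formulation consistent with the description is the following (the patent's exact notation, for instance any additional eddy-current terms, may differ):

$$\nabla \times (\nu\, \nabla \times \boldsymbol{A}) = \boldsymbol{J} \qquad (6)$$

$$\frac{d\Phi}{dt} + R\,I = V \qquad (7)$$

$$\boldsymbol{J} = \frac{I}{S}\, \boldsymbol{n} \qquad (8)$$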
Now let us consider a finite element model where 25 nodes are provided in a rectangular coil, and a shaded area is a region where the current is unknown while the remaining area is a region where the current is known, as illustrated in part (A) of the figure.
If the current density J is fixed, the coefficient matrix A becomes a band matrix, as illustrated in part (B) of the figure. When the current is treated as an unknown, on the other hand, the row and column corresponding to the current unknowns include a great number of non-zero elements.
Matrix Extension and Region Division
In the case where the threshold S is 10, the row corresponding to the current unknowns becomes a large row. If the parallel process count N is 10, the large-row division number Nc is obtained by Equations (9) and (10) below, based on the above-cited Equations (4) and (5). The large-row division number Nc is 2 according to the result of Equation (10). The division number of the region except for the region of the large row, (N-Nc), is therefore 8. Note that ROUND(•) is a function for rounding off a value passed thereto, and MOD(•) is a function for outputting the remainder.
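The bodies of Equations (9) and (10) are not reproduced. Given the above-cited Equations (4) and (5), Equation (10) is presumably the instantiation below, with n_c and n_all taking the values of this model; the mention of MOD(•) suggests an additional remainder adjustment that is not recoverable from the text:

$$N_c = \mathrm{ROUND}\!\left( 10 \cdot \frac{n_c}{n_{all}} \right) = 2, \qquad N - N_c = 10 - 2 = 8$$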
By dividing the large row into two and dividing the remaining rows into eight row groups, the coefficient matrix A is extended and divided into ten row groups in total, and each of CPUs #1 to #10 is assigned to a different row group.
Communication and Matrix Information Pieces
When the extension of the coefficient matrix A and the process assignment are completed as illustrated in the figure, it is determined which CPU is in charge of which row group. Once the assignment details are confirmed, it is identified which elements of the computational results each CPU provides to, and acquires from, which other CPUs. In addition, based on the assignment details, the sources and destinations of the data transmission among the CPUs are identified. Once the data of the computational results transmitted and received among the CPUs and the sources and destinations of the data transmission are identified, communication information pieces describing them are generated and distributed to the individual CPUs.
After obtaining the above-cited communication information pieces, the CPUs execute a program code like the one illustrated in the figure to transmit and receive the computational results.
As for the transmission processing, in Lines 4 to 7, values to be transmitted are copied to a transmission array, and in Line 10, the data of the array is transmitted to another CPU using an MPI function. As for the reception processing, in Line 17, the data of an array is received from another CPU using an MPI function, and in Lines 18 to 20, the received values are copied to a vector used for the computing operation of the recipient CPU. This transmission and reception of data is wrapped in the Share(•) function.
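The program code itself is in a figure that is not reproduced here. As a rough functional equivalent of the Share(•) routine, the following mpi4py sketch copies values to a transmission array, sends them, and scatters received values into the local vector. A real implementation would use non-blocking sends and receives to avoid deadlock, and the CommInfo fields are the hypothetical ones sketched earlier.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def share(vec, info):
    """Exchange vector elements among CPUs per a CommInfo-style object."""
    for rank, idx in zip(info.send_to, info.send_elems):
        buf = np.ascontiguousarray(vec[idx])  # copy values to a transmission array
        comm.Send(buf, dest=rank, tag=0)      # transmit the array to the partner CPU
    for rank, idx in zip(info.recv_from, info.recv_elems):
        buf = np.empty(len(idx), dtype=vec.dtype)
        comm.Recv(buf, source=rank, tag=0)    # receive the partner's results
        vec[idx] = buf                        # copy received values into the local vector
    return vec
```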
In implementing the above-described matrix extension, the column vectors are also extended along with the extension of the coefficient matrix A. In this regard, the column vector element (Q) for storing a component of the matrix-vector product corresponding to the large row is divided. Therefore, a step of restoring the element is incorporated in the computing operation. The Reduce sum function is used to perform the step of restoring the element.
Prior to executing the Reduce sum function, a program code like the one illustrated in part (A) of the figure is executed to collect the divided elements, and the Reduce sum function then restores the element Q by summing them.
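Neither code fragment is reproduced here. The restoring step amounts to summing the partial products Q1, Q2, . . . held by the processes that share a large row; with mpi4py, one hedged rendering is the following, where the sub-communicator over those processes is this sketch's assumption:

```python
import numpy as np
from mpi4py import MPI

def reduce_sum(partial_q, owners_comm):
    """Restore Q = Q1 + Q2 + ... by summing, across the processes that
    share one large row, their partial matrix-vector products.
    `partial_q` is a (possibly 1-element) NumPy array; `owners_comm` is
    assumed to be a sub-communicator spanning the sharing processes."""
    total = np.empty_like(partial_q)
    owners_comm.Allreduce(partial_q, total, op=MPI.SUM)
    return total
```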
Advantageous Effect: Scalability of Parallel Processing
The application of the technique of the second embodiment described thus far achieves high parallel scalability: the processing speed keeps improving with increasing parallel process count, rather than leveling off as it does when a large row is left undivided.
Supplementary Note: Case of a Plurality of Coils
The above-described application example is directed to the scheme for analyzing a magnetic field produced by one coil to which a terminal voltage is applied. In the above description, the number of coils is set to one for the purpose of illustration; however, the technique of the second embodiment may also be applied to, for example, an inductor model composed of a plurality of coils wound around a core. In this case, a plurality of large rows corresponding to the plurality of coils are included in the coefficient matrix A. Therefore, the procedure for calculating the large-row division number Nc is extended as represented by Equations (11) to (15) below.
The degrees of freedom of all the coils nc_all are obtained by Equation (11) below where ycoil is the number of coils and ncy is the degrees of freedom associated with the yth coil (total unknowns). In addition, based on Equation (12), the large-row division number Nc is obtained by Equation (13). Further, the division number of each large row Ncy (the number of divisions of a large row corresponding to the yth coil) is obtained by Equation (14).
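Equations (11) to (15) again appear only as images. By analogy with Equations (4) and (5), plausible reconstructions, offered as assumptions rather than the patent's verified formulas, are:

$$n_c^{all} = \sum_{y=1}^{y_{coil}} n_c^{y} \qquad (11)$$

$$\frac{n_c^{all}}{N_c} \approx \frac{n_{all} - n_c^{all}}{N - N_c} \qquad (12)$$

$$N_c = \mathrm{ROUND}\!\left( N \cdot \frac{n_c^{all}}{n_{all}} \right) \qquad (13)$$

$$N_c^{y} = \mathrm{ROUND}\!\left( N_c \cdot \frac{n_c^{y}}{n_c^{all}} \right) \qquad (14)$$

$$\langle n_c \rangle = \frac{n_c^{all}}{N_c} \qquad (15)$$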
Note that the average degrees of freedom of the coils assigned to one process, <nc>, is obtained by Equation (15). If ncy is less than <nc>, Ncy becomes 0. In this case, large rows corresponding to two or more coils are grouped together so that at least one process is assigned to the large row group. For example, the process assigning unit 113 rearranges the numbers of the coils in ascending order of ncy (ncy ≤ nc(y+1)) (corresponding to the order of the large rows) and groups together large rows whose Ncy is less than 1. In this regard, the process assigning unit 113 forms the group by combining large rows in such a manner that the sum of the division numbers of all the grouped large rows exceeds 1.
When the identification number of each group is denoted by ID_G, the division number of each group by Pa_G, the degrees of freedom of each group by Dof_G, and the division number of each large row by Pa, these values are calculated using the program code (the DistCoilPa(•) function) illustrated in the figure.
If the division number of a group is 1, the process assigning unit 113 does not divide the corresponding large row or rows. If the division number of a group is larger than 1 and the group includes only one large row, the process assigning unit 113 divides that large row by the corresponding division number Pa. If the division number of a group is larger than 1 and the group includes more than one large row, the division number must have exceeded 1 when the last large row was added to the group; the process assigning unit 113 therefore divides only the last large row of the group by the division number.
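The DistCoilPa(•) code is in an unreproduced figure; the following Python sketch implements the grouping rules just described (sort by ascending degrees of freedom, accumulate large rows until the summed division number exceeds 1). All names and the exact rounding are this sketch's assumptions.

```python
def dist_coil_pa(dofs, n_c):
    """Group large rows (one per coil) so that every group gets at least
    one process; `dofs[y]` is the degrees of freedom of coil y and `n_c`
    is the large-row division number Nc."""
    total = sum(dofs)
    order = sorted(range(len(dofs)), key=lambda y: dofs[y])  # ascending n_c^y
    groups, current, acc = [], [], 0.0
    for y in order:
        current.append(y)
        acc += n_c * dofs[y] / total   # this coil's share of the division number
        if acc >= 1.0:                 # summed division number exceeds 1: close group
            groups.append((current, max(1, round(acc))))
            current, acc = [], 0.0
    if current:                        # leftover large rows join the last group
        if groups:
            coils, pa = groups[-1]
            groups[-1] = (coils + current, pa)
        else:
            groups.append((current, 1))
    return groups  # list of (coil indices, group division number Pa_G)
```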
An example of assigning four CPUs #1 to #4 to five large rows (corresponding to five coils) is illustrated in the figure.
Note that, in order to adjust the degrees of freedom, a part of the degrees of freedom corresponding to the fourth coil, X4, is assigned to CPU #1. The degrees of freedom X4 are calculated by Equation (16) below, where MOD(•) is a function for calculating the remainder. By placing a plurality of large rows corresponding to a plurality of coils into one group and assigning a process to each group in this manner, the operation load is distributed with little imbalance even when the coefficient matrix A includes a plurality of large rows.
The second embodiment has been described thus far.
According to an aspect, it is possible to improve the scalability of parallel processing.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention.
Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.