Molecular dynamics simulation is the simulation of the motion of molecules by means of a computer. As an important HPC (High Performance Computing) application, it is often adopted in investigating the properties of substances. By using molecular dynamics simulation to trace the motion of all molecules, the overall properties of a substance can be derived, so that matters at the molecular level can be dealt with. This is of practical significance in research fields such as materials, biology, optics, medical science, and the like.
In order to obtain the motion trajectories of molecules in molecular dynamics simulation, the motion of all the molecules needs to be traced at every moment. Accordingly, there exist a large number of iterative simulation computation steps. In molecular dynamics simulation, during each iterative step, properties such as force, acceleration, velocity, position, etc. of each molecule that are capable of indicating the current state of the molecule need to be calculated, respectively.
It can be appreciated that, computationally, molecular dynamics simulation is an extremely large task, since there is a large number of molecules to be simulated and a large number of simulation steps to be performed.
In molecular dynamics simulation, the vast majority of calculation time is spent on calculating the acting forces between pairs of molecules. When the acting force on a specific molecule is calculated, all the surrounding adjacent molecules of the specific molecule need to be taken into account, i.e. the acting force between each of these surrounding adjacent molecules and the specific molecule needs to be calculated individually, and then operations such as summation are performed on these acting forces.
In existing solutions of molecular dynamics simulation, the whole substance space on which molecular dynamics simulation needs to be performed is often divided into M×M×M cubic cells or cuboid cells in the space coordinates system, so as to facilitate finding adjacent molecules. That is, each molecule belongs to a specific cell depending on its position. The description is given hereinafter by taking M×M×M cubic cells as an example; however, a person skilled in the art can appreciate that the case of cuboid cells is similar. For M×M×M cubic cells, the edge length of each cell is equal to a cut-off radius, which is a predetermined value. If the distance between two molecules is larger than the cut-off radius, the acting force between the two molecules is ignored. In this manner, the calculation of acting forces between molecules is simplified.
More specifically,
27 cells=26 surrounding adjacent cells+the central cell 102 itself.
At present, there exists a plurality of different algorithms for optimizing the calculation of acting force between molecules, among which the linkcell method has the best performance. In the linkcell method, according to Newton's third law, i.e. force_{a→b} = −force_{b→a}, the acting force between two molecules is calculated only once. On this basis, when the acting force on a specific molecule is calculated in the linkcell method, the amount of calculation can be reduced almost by half by searching only 14 cells, instead of all the 27 cells of the conventional solution, i.e. the following is taken into account in the linkcell method:
14 cells=13 surrounding adjacent cells+the central cell 102 itself.
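The two cell counts above can be checked with a short sketch. The selection rule used here (keep an offset when it is lexicographically non-negative in z, then y, then x) is an illustrative assumption; any rule that keeps exactly one offset of each opposite pair yields the same count of 14.

```python
from itertools import product

# All 27 offsets (dx, dy, dz) with each component in {-1, 0, 1},
# including (0, 0, 0) for the central cell itself.
FULL_SHELL = list(product((-1, 0, 1), repeat=3))


def is_forward(offset):
    """Return True for the central cell and for one offset of each pair {d, -d}.

    The ordering (z first, then y, then x) is an assumption made for
    illustration only.
    """
    dx, dy, dz = offset
    return (dz, dy, dx) >= (0, 0, 0)


# The central cell plus 13 of its 26 neighbours.
HALF_SHELL = [d for d in FULL_SHELL if is_forward(d)]

if __name__ == "__main__":
    print(len(FULL_SHELL), len(HALF_SHELL))
```

Because each pair of adjacent cells is visited from exactly one side, each pairwise force is computed once and can be applied to both molecules with opposite sign.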
More specifically,
However, the above existing solutions of molecular dynamics simulation are all implemented on a platform of a single processor system, and implementation on such a platform yields less than ideal simulation performance.
The Cell Broadband Engine (CBE) is a single-chip multiprocessor system. As shown in
As the Inventors herein have recognized, if the above existing solutions of molecular dynamics simulation are directly applied to a multiprocessor system such as CBE 130, the performance will not be enhanced greatly. The reason is as follows.
In the existing solutions of molecular dynamics simulation, data of molecules of each cell are discretely stored in the memory, and the discretely stored data of molecules in a cell are concatenated together by means of a linked list. That is, each cell has a linked list corresponding thereto, which comprises pointers pointing to storage location of data of all the molecules within the cell. In addition, a global array is utilized to store headers of all the linked lists.
Moreover, in the existing solutions of molecular dynamics simulation, considering that molecules are in constant motion, and one molecule may move from one cell to another cell or even beyond the adjacent cell(s), the subordination relationship between the molecules and the cells is therefore adjusted after each iterative computation step. This adjustment is realized by adjusting the linked list. More specifically, the storage location of data of the molecules is kept unchanged, and by adjusting the linked list, data of the molecules whose subordination relationship with the cell has changed are removed from the linked list of the original cell, and linked to the linked lists of the cells into which the molecules have newly moved, so as to reflect the position change of the molecules in the simulated substance space.
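The linked-list bookkeeping of the existing solutions can be sketched as follows. This is a minimal illustration, assuming molecules are identified by integer indices and that the links are stored in two arrays; the class name `LinkedCells` and its methods are hypothetical, and the text itself speaks of pointers rather than indices.

```python
class LinkedCells:
    """Cell lists in which molecule data never moves; only the links change."""

    def __init__(self, num_cells, num_molecules):
        # Global array of list headers, one per cell (-1 marks an empty list).
        self.head = [-1] * num_cells
        # next_mol[i] is the molecule that follows molecule i in its cell's list.
        self.next_mol = [-1] * num_molecules

    def insert(self, cell, mol):
        """Link molecule `mol` at the front of the list of `cell`."""
        self.next_mol[mol] = self.head[cell]
        self.head[cell] = mol

    def remove(self, cell, mol):
        """Unlink molecule `mol` from the list of `cell`."""
        prev, cur = -1, self.head[cell]
        while cur != mol:
            if cur == -1:
                raise KeyError(f"molecule {mol} is not in cell {cell}")
            prev, cur = cur, self.next_mol[cur]
        if prev == -1:
            self.head[cell] = self.next_mol[mol]
        else:
            self.next_mol[prev] = self.next_mol[mol]
        self.next_mol[mol] = -1

    def move(self, old_cell, new_cell, mol):
        """Adjust the subordination relationship after an iterative step."""
        self.remove(old_cell, mol)
        self.insert(new_cell, mol)

    def members(self, cell):
        """Walk the list of `cell`, one molecule at a time."""
        out, cur = [], self.head[cell]
        while cur != -1:
            out.append(cur)
            cur = self.next_mol[cur]
        return out
```

Reading a cell requires following the list element by element, and the molecules it names may lie anywhere in memory.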
Suppose that the above solutions are applied to CBE 130. When each SPU 141-148 acquires data of molecules of a required cell from main memory 180 of CBE 130 into its local storage 161-168, so as to perform simulation computations such as calculation of acting force between molecules, the storage locations of the data of molecules within the cell in the main memory 180 are discrete. It is therefore necessary to use the linked list corresponding to the cell to locate in turn the storage location of the data of each molecule within the cell, and to use DMA operations in turn to acquire these data of molecules into the local storage 161-168. Since the data of molecules are discretely stored, a DMA operation needs to be performed each time the data of a molecule is acquired, i.e. the data of only one molecule can be acquired in one DMA operation. Accordingly, in order to acquire data of molecules of the required cell, each SPU 141-148 needs to use DMA operations to repeatedly exchange data between itself and main memory 180, which leads to a sharp decrease in simulation performance.
Therefore, it is necessary to design a solution of molecular dynamics simulation, which is suitable for multiprocessor systems such as the CBE 130. The present invention relates to the data processing field, and more specifically, to a method and an apparatus for performing molecular dynamics simulation on a multiprocessor system.
In view of the above problem, the present invention provides a method and an apparatus for performing molecular dynamics simulation on a multiprocessor system which, by continuously storing data of molecules of each cell in the simulated substance space in a memory area corresponding to the cell, enables each accelerator in the multiprocessor system to use fewer DMA operations to acquire data of molecules of a plurality of cells from the main memory into its local storage, thereby reducing the frequent data exchanges between the accelerator and the main memory and enhancing the simulation performance.
According to one aspect of the invention, there is provided a method for performing molecular dynamics simulation on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprising: dividing a substance space on which molecular dynamics simulation is to be performed into a plurality of cells; storing data of molecules of the plurality of cells in the main memory of the multiprocessor system in the manner in which data of molecules of each cell are continuously stored in a memory area corresponding to the cell; and the plurality of accelerators repeatedly acquiring the data of molecules of the plurality of cells from the main memory and performing molecular dynamics simulation computations in parallel in the manner in which data of molecules of at least one cell are acquired in one DMA operation.
According to another aspect of the invention, there is provided an apparatus for performing molecular dynamics simulation in a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the apparatus comprising: a cell dividing unit for dividing a substance space on which molecular dynamics simulation is to be performed into a plurality of cells; a molecular data storing unit for storing data of molecules of the plurality of cells in the main memory of the multiprocessor system in the manner in which data of molecules of each cell are continuously stored in a memory area corresponding to the cell; and a simulation unit for enabling the plurality of accelerators to repeatedly acquire the data of molecules of the plurality of cells from the main memory and perform molecular dynamics simulation computations in parallel in the manner in which data of molecules of at least one cell are acquired in one DMA operation.
It is believed that the features, advantages, and objectives of the present invention will be better understood from the following detailed description of the embodiments of the present invention, taken in conjunction with the drawings.
Next, a detailed description of the preferred embodiments of the present invention will be given with reference to the drawings.
The method for performing molecular dynamics simulation on a multiprocessor system according to the present embodiment differs from the manner of the abovementioned existing solutions of molecular dynamics simulation, in which data of molecules of each cell are discretely stored and concatenated together by means of a linked list. The method of the present embodiment adopts a manner in which data of molecules within each cell of the substance space on which molecular dynamics simulation is to be performed are continuously stored in the memory area corresponding to the cell, respectively, in the main memory of the multiprocessor system.
The description is given hereinafter by taking M×M×M cubic cells as an example; however, a person skilled in the art can appreciate that the case of cuboid cells is similar.
More specifically, as shown in
At block 410, data of molecules of the plurality of cells are stored in the main memory of the multiprocessor system in the manner that data of molecules of each cell are continuously stored in a memory area corresponding to this cell. As to this block, a detailed description will be given later with reference to
At block 415, the plurality of accelerators repeatedly acquire the data of molecules of the plurality of cells from the main memory and perform molecular dynamics simulation computations in parallel in the manner that data of molecules of at least one cell are acquired in one DMA operation. As to this block, a detailed description will be given later with reference to
Next, the block 410 of storing the data of molecules in
As shown in a process 500 in
More specifically, in the case where as described at the block 405, the simulated substance space is divided into M×M×M cubic cells, at this block 505, M×M×M memory areas are set in the main memory of the multiprocessor system, so as to store data of molecules of the M×M×M cells, respectively.
In addition, in a preferred embodiment, at this block, the plurality of memory areas are set continuously in the main memory.
In addition, since as mentioned earlier, molecules are in constant motion and one molecule may move from one cell to another cell, the number of molecules in a cell also changes accordingly. For this reason, at this block, each of the plurality of memory areas is set to be large enough, so that even if the number of molecules in the corresponding cell changes, the memory area is capable of storing all the data of molecules in the cell entirely. More specifically, first, the maximum possible number of molecules in a cell may be preset, and then, the size of each of the plurality of memory areas is set based on the maximum possible number of molecules. In addition, preferably, the plurality of memory areas have the same size.
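The sizing rule described above amounts to simple arithmetic, sketched below. The record size, the maximum molecule count, and the optional per-area header are hypothetical values chosen only for illustration.

```python
def area_size_bytes(max_molecules, bytes_per_molecule, header_bytes=0):
    """Size of one memory area, fixed by the preset maximum molecule count."""
    return header_bytes + max_molecules * bytes_per_molecule


def total_bytes(M, max_molecules, bytes_per_molecule, header_bytes=0):
    """Main-memory footprint of M*M*M equally sized memory areas."""
    return M ** 3 * area_size_bytes(max_molecules, bytes_per_molecule, header_bytes)


if __name__ == "__main__":
    # Assumed example: 16x16x16 cells, at most 32 molecules per cell,
    # 32 bytes per molecule record.
    print(area_size_bytes(32, 32))   # bytes per memory area
    print(total_bytes(16, 32, 32))   # bytes for all memory areas
```

Giving every area the same size trades some unused space for the ability to locate any cell's data by multiplication alone.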
At block 510, correspondence relationship between the plurality of cells and the plurality of memory areas is determined.
In an embodiment, relative position coordinates may be set for the plurality of cells in the space coordinates system, and the correspondence relationship between the plurality of cells and the plurality of memory areas is determined based on the relative position coordinates.
More specifically, referring to
Then, on the basis of the relative position coordinates, and in the case where as mentioned earlier, the simulated substance space is divided into M×M×M cubic cells, correspondence between the cell whose coordinates are (x, y, z) and the corresponding memory area in the plurality of memory areas is established according to the following equation (1):
Index = x + M×y + M²×z (1),
wherein Index denotes the sequence number of the memory area corresponding to the cell whose coordinates are (x, y, z). That is, based on Index, it can be determined that the memory area corresponding to the cell whose coordinates are (x, y, z) is the Index-th memory area, counted from the initial address, among the plurality of (in this case, M×M×M) memory areas.
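Equation (1) and its inverse can be written as a short sketch. Zero-based coordinates and a zero-based Index are assumed here; the function names are hypothetical.

```python
def cell_index(x, y, z, M):
    """Equation (1): Index = x + M*y + M^2*z for zero-based coordinates."""
    if not all(0 <= c < M for c in (x, y, z)):
        raise ValueError("coordinates must lie in the range 0..M-1")
    return x + M * y + M * M * z


def cell_coords(index, M):
    """Inverse of equation (1): recover (x, y, z) from Index."""
    z, rem = divmod(index, M * M)
    y, x = divmod(rem, M)
    return x, y, z


if __name__ == "__main__":
    print(cell_index(1, 0, 0, 4), cell_index(0, 1, 0, 4), cell_index(0, 0, 1, 4))
```

Since x varies fastest, cells that are neighbours along the x-axis receive consecutive sequence numbers.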
More specifically, referring to
In addition, based on the relative position coordinates for the cells in the space coordinates system, the adjacency relationship between the respective cells can also be determined. For example, it can be determined that the cell whose coordinates are (x=1, y=0, z=0) is the adjacent cell on the right side of the cell whose coordinates are (x=0, y=0, z=0), and the cell whose coordinates are (x=0, y=1, z=0) is the adjacent cell right above the cell whose coordinates are (x=0, y=0, z=0).
The above correspondence relationship between the cells and memory areas, and the adjacency relationship between the cells, are to be applied to the process in which the respective accelerators acquire a plurality of relevant cells so as to perform molecular dynamics simulation computations. Therefore, if these relationships can be directly determined based on the coordinates of the cells, a great convenience will be offered to the respective accelerators.
Although it is described hereinabove that the relative position coordinates for the cells are utilized to determine the correspondence relationship between the cells and memory areas, the present invention is not limited to this, and corresponding numbers may be directly set for the plurality of cells and the plurality of memory areas, so as to establish one-to-one correspondence between the plurality of cells and the plurality of memory areas according to the numbers.
At block 515, based on the correspondence relationship between the plurality of cells and the plurality of memory areas, data of molecules of the plurality of cells are stored in the respective corresponding memory areas among the plurality of memory areas, respectively, wherein, data of molecules of each cell is continuously stored in the memory area corresponding to the cell.
In addition, in an embodiment, at the beginning of each of the plurality of memory areas, the amount of data of molecules stored in the memory area, namely the number of molecules in the cell corresponding to the memory area, is indicated, so as to facilitate the access of a corresponding accelerator among the plurality of accelerators to the data in the memory area.
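A memory area that begins with a molecule count can be sketched with Python's `struct` module. The field widths and the record layout (a 32-bit count followed by three 32-bit floats per molecule) are assumptions for illustration; the text does not specify them.

```python
import struct

HEADER = struct.Struct("<I")    # molecule count (assumed 32-bit unsigned)
RECORD = struct.Struct("<3f")   # one molecule: x, y, z (assumed record layout)


def pack_area(molecules, max_molecules):
    """Build one fixed-size memory area: count first, then contiguous records."""
    if len(molecules) > max_molecules:
        raise ValueError("more molecules than the memory area can hold")
    buf = bytearray(HEADER.size + max_molecules * RECORD.size)
    HEADER.pack_into(buf, 0, len(molecules))
    for i, mol in enumerate(molecules):
        RECORD.pack_into(buf, HEADER.size + i * RECORD.size, *mol)
    return buf


def unpack_area(buf):
    """Read the count, then only that many records; the rest is unused space."""
    (count,) = HEADER.unpack_from(buf, 0)
    return [RECORD.unpack_from(buf, HEADER.size + i * RECORD.size)
            for i in range(count)]
```

With the count stored at the start, a reader that has fetched the whole area knows how many of its records are valid without any further lookup.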
The above is a detailed description of the molecule data storing process as shown in
Next, the block 415 of acquiring the data of molecules and performing molecular dynamics simulation computation of the method as shown in
As shown in
More specifically, as shown in a diagram 650 in
In addition, in an embodiment, according to the rule of load balancing, the plurality of cells are divided into a plurality of equal parts. That is, in the case where as mentioned hereinabove, the simulated substance space is divided into M×M×M cells and the number of the accelerators is m, the plurality of cells are divided along the z-axis into a plurality of parts corresponding to the accelerators in number, each part comprising M/m layers of cells.
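The equal division along the z-axis can be sketched as follows, assuming that M is an exact multiple of the number of accelerators m; the function name is hypothetical.

```python
def split_layers(M, m):
    """Divide M layers along the z-axis into m parts of M/m layers each."""
    if m <= 0 or M % m != 0:
        raise ValueError("this sketch assumes M is a multiple of m")
    per_part = M // m
    return [range(k * per_part, (k + 1) * per_part) for k in range(m)]


if __name__ == "__main__":
    for k, part in enumerate(split_layers(16, 8)):
        print(f"accelerator {k}: layers {list(part)}")
```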
It should be noted that although dividing is carried out along the z-axis direction in
At block 610, as shown in
At block 615, the plurality of accelerators, for their respective parts, acquire data of molecules and perform molecular dynamics simulation computations layer by layer in parallel in the manner that data of molecules of at least one cell are acquired in one DMA operation, wherein the plurality of accelerators are spaced apart from each other by multiple layers of cells throughout the parallel processing.
As mentioned earlier, since molecules are in constant motion, and one molecule may move from one cell to another cell, it is necessary to adjust the subordination relationship between the molecules and the cells after each iterative computation step. In the present embodiment, as mentioned earlier, data of molecules of each cell are continuously stored in a memory area corresponding to the cell, therefore, adjustment of the subordination relationship between molecules and cells can be realized by directly moving data of molecules between the memory areas corresponding to the respective cells.
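Moving molecule data between memory areas can be sketched as below. Each Python list stands in for one contiguous memory area; the periodic wrap-around in `cell_of`, the capacity check, and all names are assumptions made for illustration.

```python
def cell_of(pos, cutoff, M):
    """Cell coordinates for a position; periodic wrap-around is assumed."""
    return tuple(int(c // cutoff) % M for c in pos)


def rebin(areas, cutoff, M, capacity):
    """Move records whose cell has changed into the area of their new cell.

    `areas` maps every cell coordinate (x, y, z) to a list of positions.
    Returns the number of records moved.
    """
    moved = []
    for cell, records in areas.items():
        stay = []
        for pos in records:
            target = cell_of(pos, cutoff, M)
            if target == cell:
                stay.append(pos)
            else:
                moved.append((target, pos))
        records[:] = stay  # compact in place so the area stays contiguous
    for target, pos in moved:
        if len(areas[target]) >= capacity:
            raise OverflowError("memory area too small for this cell")
        areas[target].append(pos)
    return len(moved)
```

Unlike the linked-list adjustment, the records themselves are copied, which is what keeps each cell's data in one piece.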
However, in the case where molecular dynamics simulation is performed in parallel on a plurality of accelerators, if cells on two different accelerators are too close to each other within the simulated substance space, then as shown in
In the present embodiment, by dividing the plurality of cells into the plurality of parts, each comprising multiple layers of cells, and spacing the respective accelerators apart from each other by multiple layers of cells throughout the parallel processing, the data collision that is likely to occur when the subordination relationships between molecules and cells are adjusted can be avoided.
More specifically, in the present embodiment, as shown by
In this manner, when the plurality of accelerators acquire data of molecules of their respective first layers of cells, these first layers are spaced apart from each other by multiple layers of cells, and accordingly, since the plurality of accelerators perform parallel processing in the same layer sequence, the spacing state can be maintained, i.e. the current layers processed in parallel by the respective accelerators are always spaced apart from each other by multiple layers of cells.
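The claim that the spacing is maintained can be checked with a small sketch. It assumes that every accelerator starts at the first layer of its part and advances one layer per step in lockstep, and it measures gaps with wrap-around at the boundary of the space; both the lockstep model and the names are illustrative assumptions.

```python
def current_layers(M, m, step):
    """Layer handled by each accelerator at a given step of lockstep processing."""
    if M % m != 0:
        raise ValueError("this sketch assumes M is a multiple of m")
    per_part = M // m
    if not 0 <= step < per_part:
        raise ValueError("step must be within one part")
    return [k * per_part + step for k in range(m)]


def layer_gaps(layers, M):
    """Distance from each current layer to the next one, wrapping around at M."""
    ordered = sorted(layers)
    n = len(ordered)
    return [(ordered[(i + 1) % n] - ordered[i]) % M for i in range(n)]


if __name__ == "__main__":
    for step in range(4):
        layers = current_layers(16, 4, step)
        print(step, layers, layer_gaps(layers, 16))
```

Under these assumptions the gap between any two concurrently processed layers stays equal to M/m at every step.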
Accordingly, the case where two cells located on different accelerators are too close to each other within the substance space, thus giving rise to a data collision, can be avoided.
In addition, in the case where another manner of dividing is adopted, such as dividing the plurality of cells into a plurality of parts along the x-axis or y-axis, the present invention can also be realized according to the above.
Next, the block 615 of acquiring data of molecules and performing molecular dynamics simulation layer-by-layer in
It should be noted that as described earlier with respect to the linkcell method, when simulation computations such as calculation of acting force between molecules are performed on molecules in a certain central cell, data of molecules in the central cell itself and 13 surrounding adjacent cells, in total 14 cells, need to be taken into account, and thus acquired.
In contrast with this, in the process as shown in
More specifically, as shown in a process 900 in
At a block 910, the currently processed layer is divided into a plurality of columns.
More specifically, referring to
At this block, the currently processed layer is divided into a plurality of columns because, in a multiprocessor system, the size of the local storage of an accelerator is generally limited. For example, in CBE 130, the capacity of the local storage 161-168 of each SPU 141-148 is only 256 KB. In such a case, since one layer of cells in the simulated substance space generally comprises a large amount of molecule data and the capacity of the local storage of each accelerator is generally far from sufficient to hold it, it is necessary to acquire the data of molecules in the layer part by part, in sequence, for processing.
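How a column width might follow from the local storage capacity can be sketched as follows. The 256 KB figure comes from the text; the memory area size, the share of local storage reserved for molecule data, and the number of bars held at once are assumptions, and extra cells needed at column edges are ignored.

```python
LOCAL_STORE_BYTES = 256 * 1024  # capacity of one SPU local storage


def max_cells_per_bar(area_bytes, resident_bars=5,
                      data_budget=LOCAL_STORE_BYTES // 2):
    """Largest number of cells per bar such that the resident bars fit the budget.

    `data_budget` (half of local storage) is an assumed allowance; the rest
    is left for code, stack, and results.
    """
    width = data_budget // (resident_bars * area_bytes)
    if width < 1:
        raise ValueError("one cell per bar does not fit in the budget")
    return width


def split_into_columns(M, width):
    """Divide the x-range 0..M-1 of a layer into columns of at most `width` cells."""
    return [range(start, min(start + width, M)) for start in range(0, M, width)]


if __name__ == "__main__":
    w = max_cells_per_bar(area_bytes=1024)
    print(w, [len(c) for c in split_into_columns(64, w)])
```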
Accordingly, in the present embodiment, the currently processed layer is divided into a plurality of columns along the x-axis, so that as shown in
At a block 915, the first column in the currently processed layer is set as the current column.
At a block 920, for a bar (hereinafter referred to as the central bar) on which molecular dynamics simulation computations are to be performed in the current column, the accelerator acquires data of molecules of a plurality of bars relevant to the molecular dynamics simulation computations of the central bar into its local storage in the manner in which data of molecules of at least one cell are acquired in one DMA operation, wherein, as shown in
More specifically, as shown in
Accordingly, it can be appreciated that in the case where the first bar in the current column serves as the central bar on which molecular dynamics simulation computations are to be performed, the accelerator is required to acquire data of molecules of the central bar itself, the next bar after the central bar, and the 3 adjacent bars above the central bar, in total 5 bars, into its local storage. In addition, it can be appreciated by a person skilled in the art that since the first bar is located on the boundary of the current layer, the bar that is on the upper layer of the central bar and at the side opposite to the side on which the central bar is located is used as one of the three adjacent bars above the central bar. Moreover, all similar boundary problems arising in the simulation computations are dealt with in a similar manner.
In addition, in the case where the central bar on which molecular dynamics simulation computations are to be performed is not the first bar in the current column, it is not necessary for the accelerator to reacquire data of molecules of all the 5 bars relevant to the molecular dynamics simulation computations of the central bar into its local storage. Since some bars among the 5 bars have been stored in the local storage of the accelerator during the molecular dynamics simulation computations of the preceding bar, only the bars among the 5 bars that have not been stored in the local storage of the accelerator need to be acquired, i.e. only the next adjacent bar of the central bar and the next bar above the central bar need to be acquired.
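The reuse of bars between consecutive central bars can be sketched as a sliding window. The sketch assumes that bars within a column are indexed by y, that "above" means layer z+1, and that the boundary wraps around to the opposite side; wrap-around of the layer index itself is omitted, and all names are hypothetical.

```python
def required_bars(y, z, num_bars):
    """The 5 bars relevant to central bar y of layer z, as (bar, layer) pairs."""
    up = z + 1
    return [
        (y, z),                     # the central bar itself
        ((y + 1) % num_bars, z),    # the next bar in the same layer
        ((y - 1) % num_bars, up),   # three adjacent bars in the layer above
        (y, up),
        ((y + 1) % num_bars, up),
    ]


def fetch_plan(z, num_bars):
    """For each central bar in turn, list the bars not yet in local storage."""
    resident = set()
    plan = []
    for y in range(num_bars):
        needed = required_bars(y, z, num_bars)
        plan.append([bar for bar in needed if bar not in resident])
        resident = set(needed)  # bars that are no longer needed are dropped
    return plan


if __name__ == "__main__":
    for y, new_bars in enumerate(fetch_plan(0, 6)):
        print(y, new_bars)
```

In this model the first central bar triggers five acquisitions and every later one only two, matching the description above.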
In addition, in the present embodiment, since data of molecules within each cell are continuously stored in the memory area corresponding to the cell, all the data of molecules in the cell can be acquired into the local storage of the accelerator in one DMA operation. Accordingly, at this block, the accelerator may acquire a plurality of required bars in the manner in which one cell is acquired in one DMA operation.
Further, in the case where as described at block 505 in
Furthermore, in the case where as described at block 510 in
Further, at this block, based on the determined correspondence relationship between the plurality of cells and the plurality of memory areas, the accelerator acquires data of molecules of the plurality of bars.
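Turning the correspondence relationship into a transfer request can be sketched as follows. The sketch assumes equally sized memory areas set continuously from a base address and numbered by equation (1), so that the cells of a bar along the x-axis occupy one contiguous range; hardware limits on the size of a single transfer are not modelled, and the function names are hypothetical.

```python
def cell_index(x, y, z, M):
    """Equation (1) with zero-based coordinates."""
    return x + M * y + M * M * z


def bar_dma_range(x_start, x_end, y, z, M, area_bytes, base=0):
    """Address and length covering cells x_start..x_end-1 of the bar at (y, z)."""
    if not 0 <= x_start < x_end <= M:
        raise ValueError("invalid x-range for a bar")
    address = base + cell_index(x_start, y, z, M) * area_bytes
    length = (x_end - x_start) * area_bytes
    return address, length


if __name__ == "__main__":
    # Assumed example: M = 8, 1024-byte memory areas, bar of 4 cells at y=2, z=1.
    print(bar_dma_range(0, 4, 2, 1, 8, 1024))
```

A single (address, length) pair is enough to describe a whole bar, whereas the linked-list layout needs one pair per molecule.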
Next, at a block 925, the accelerator utilizes the data of molecules of the plurality of bars stored in its local storage to perform molecular dynamics simulation computations of the data of molecules of the central bar. At this block, the data of molecules of the plurality of bars are utilized to accomplish molecular dynamics simulation computations of all the cells in the central bar. That is, for all the cells in the central bar, molecular dynamics simulation computations are performed by utilizing in turn data of molecules of the relevant cells in the plurality of bars.
At a block 930, it is determined whether all the bars in the current column have undergone molecular dynamics simulation computations. If so, the process turns to a block 940; otherwise, proceeds to a block 935.
At block 935, the next bar of the central bar in the current column is set as the central bar on which molecular dynamics simulation computations are to be performed, then the process returns to block 920, to process the next bar.
At block 940, it is determined whether there is a column in the current layer that has not been processed. If so, the process proceeds to a block 945; otherwise, turns to a block 950.
At block 945, the next column of the current column is set as the column to be processed, then the process returns to block 920, to process the next column.
At block 950, it is determined whether there is a layer in the part assigned to the accelerator that has not been processed. If so, the process proceeds to a block 955; otherwise, the process ends.
At block 955, the layer to be processed next is set. In the case where each accelerator processes the part assigned thereto upwardly layer by layer from the bottommost layer in the z-axis direction, the layer above the currently processed layer is set as the layer to be processed next, and in the case where the part is processed downwardly layer by layer from the topmost layer, the layer below the currently processed layer is set as the layer to be processed next.
Next, the process returns to block 910, to process the next layer.
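The control flow of blocks 910 to 955 reduces to three nested loops, sketched below as a generator. The mapping of loops to block numbers in the comments is an interpretation of the description, and the argument names are hypothetical.

```python
def processing_order(layers, columns, num_bars):
    """Yield (layer, column, bar) in the order one accelerator processes them."""
    for z in layers:                 # blocks 950/955: next layer of the part
        for col in columns:          # blocks 915/940/945: next column of the layer
            for y in range(num_bars):    # blocks 920-935: next central bar
                yield z, col, y


if __name__ == "__main__":
    for item in processing_order([0, 1], ["col0", "col1"], 2):
        print(item)
```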
The above is a detailed description of the process of acquiring data of molecules and performing simulation computation layer-by-layer in
It should be noted that although the process in
In addition, in the process in
In addition, in the process in
The above is a detailed description of the method for performing molecular dynamics simulation on a multiprocessor system of the present embodiment. In the present embodiment, by continuously storing data of molecules of each cell in the simulated substance space in a memory area corresponding to the cell, each accelerator can use fewer DMA operations to acquire data of molecules of a plurality of cells from the main memory into its local storage, so that the frequent data exchanges between the accelerator and the main memory are reduced. Further, by continuously storing the plurality of cells in the simulated substance space based on position relationships, each accelerator can acquire data of molecules of a bar constituted by a plurality of cells into its local storage in one DMA operation, so that molecular dynamics simulation is performed bar by bar and the data exchanges between the accelerator and the main memory are further reduced. Accordingly, as compared with the abovementioned existing solutions of molecular dynamics simulation, the molecular dynamics simulation method of the present embodiment can increase the ratio of the time spent on calculation to the time spent on data transfer, thereby enhancing simulation performance.
Under the same inventive concept, the present invention provides an apparatus for performing molecular dynamics simulation in a multiprocessor system, which will be described as follows with reference to the figures.
More specifically, as shown in
Cell dividing unit 11 divides a substance space on which molecular dynamics simulation needs to be performed into a plurality of cubic cells. Molecular data storing unit 12 stores data of molecules of the plurality of cells in the main memory of the multiprocessor system in the manner in which data of molecules of each cell are continuously stored in a memory area corresponding to the cell.
As shown in
Memory area setting unit 121 sets a plurality of memory areas corresponding in number to the plurality of cells in the main memory of the multiprocessor system. In a preferred embodiment, the memory area setting unit 121 continuously sets the plurality of memory areas in the main memory.
The correspondence relationship determining unit 122 determines the correspondence relationship between the plurality of cells and the plurality of memory areas. In a preferred embodiment, the correspondence relationship determining unit 122 sets for the plurality of cells relative position coordinates in the space coordinates system, and by means of calculation of the relative position coordinates, determines the correspondence relationship between the plurality of cells and the plurality of memory areas.
The storing unit 123 stores data of molecules of the plurality of cells in the plurality of memory areas respectively based on the correspondence relationship between the plurality of cells and the plurality of memory areas in the manner in which data of molecules of each cell are continuously stored in a memory area corresponding to the cell.
The plural part dividing unit 13 divides the plurality of cells into a plurality of corresponding parts based on the number of the plurality of accelerators, wherein each part comprises multiple layers of cells.
Assigning unit 14 assigns the plurality of parts to the plurality of accelerators, so that each accelerator processes one part thereamong.
Simulation unit 15 enables the plurality of accelerators to repeatedly acquire the data of molecules of the plurality of cells from the main memory and perform molecular dynamics simulation computations in parallel in the manner in which data of molecules of at least one cell are acquired in one DMA operation.
More specifically, simulation unit 15 enables the plurality of accelerators, for their respective parts, to acquire data of molecules and perform molecular dynamics simulation computations layer by layer in parallel in the manner in which data of molecules of at least one cell are acquired in one DMA operation, wherein the plurality of accelerators are spaced apart from each other by multiple layers of cells throughout the parallel processing. Further, simulation unit 15 enables the plurality of accelerators, for their respective parts, to acquire data of molecules and perform molecular dynamics simulation computations bar by bar within each layer in parallel, wherein one bar comprises a plurality of cells.
As shown in
Molecular data acquiring unit 181 enables the plurality of accelerators, for their respective parts, and layer by layer from the first layer: for the respective bars in the current layer, to acquire data of molecules of a plurality of bars relevant to the molecular dynamics simulation computations of the current bar into the local storages in parallel in the manner in which data of molecules of at least one cell are acquired in one DMA operation. In addition, molecular data acquiring unit 181 enables the plurality of accelerators to acquire data of molecules of their respective plurality of bars based on the determined correspondence relationship between the plurality of cells and the plurality of memory areas, respectively.
Simulation computation unit 182 enables the plurality of accelerators to utilize the data of molecules of the plurality of bars stored in their respective local storages to perform molecular dynamics simulation computations in parallel.
In an embodiment, simulation unit 15 also comprises an optional column dividing unit 183, which divides the currently processed layer into a plurality of columns for each of the plurality of accelerators based on the capacity of the local storages of the plurality of accelerators and the number of molecules in each of the plurality of cells.
In such a case, molecular data acquiring unit 181 and simulation computation unit 182 enable the plurality of accelerators, for their respective current bars in each of the plurality of columns, to acquire data of molecules of a plurality of bars relevant to the molecular dynamics simulation computations of the current bars into the local storages and utilize the data of molecules of the plurality of bars to perform molecular dynamics simulation computations of the current bars in parallel in the manner in which data of molecules of at least one cell are acquired in one DMA operation.
In one embodiment, the memory areas corresponding to the respective cells in each bar are continuously set in the main memory.
In such a case, simulation unit 15 enables the plurality of accelerators, for their respective parts, to acquire data of molecules and perform molecular dynamics simulation computations bar by bar within each layer in parallel in the manner in which data of molecules of one bar are acquired in one DMA operation.
The above is a detailed description of the apparatus for performing molecular dynamics simulation in a multiprocessor system of the present embodiment. Herein, apparatus 10 and the components thereof can be implemented with specifically designed circuits or chips or be implemented by a computer (processor) executing corresponding programs.
The present invention also provides a program product, which comprises program code for implementing all the above methods on a multiprocessor system, and a bearing medium that bears the program code.
While the method and apparatus for performing molecular dynamics simulation on a multiprocessor system of the present invention have been described in detail with some exemplary embodiments, these embodiments are not exhaustive, and those skilled in the art may make various variations and modifications within the spirit and scope of the present invention. Therefore, the present invention is not limited to these embodiments, and its scope is defined only by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
200910003257.1 | Jan 2009 | CN | national |