This application claims benefit of Chinese patent application number 200810211887.3, filed Sep. 18, 2008, which is herein incorporated by reference.
The present invention relates to a traffic generator. More particularly, the present invention relates to a traffic generator for testing the performance of a graphic processing unit.
A graphics processing unit (GPU) is a dedicated graphics rendering device for a personal computer, workstation, or game console. Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. Generally, a GPU can sit on top of a video card, or it can be integrated directly into the motherboard.
When testing the performance of a GPU, a traffic generator and a traffic monitor are arranged. The traffic generator produces data to be processed by the GPU, and the traffic monitor then observes the traffic so as to evaluate the performance of the GPU. Since a modern GPU is required to process image data of different formats, testing the GPU becomes more complex.
In the technical field of high performance GPUs, a traffic generator is in great demand for simulating multiple engines (“clients”) which send a series of read and write requests. It is necessary to test the efficiency of the memory system of the GPU under multiple clients to see whether the design can meet the performance requirement. For example, the engines in the HD Video Decode flows include: SEC, VLD, MSPDEC, MSPPP, Display, and Graphics. However, at the very beginning of the design phase, it is hard to have so many real clients implemented. As a result, a traffic generator capable of emulating a plurality of different engines is required.
The present invention provides a general traffic generator capable of emulating a plurality of changeable engines to test the performance of a graphic processing unit. The present invention also provides a simpler method for emulating a plurality of changeable engines with a single device to test the performance of a graphic processing unit.
According to an embodiment of the present invention, the traffic generator for testing the performance of a graphic processing unit comprises: at least one simulated engine module for generating at least one read stream and/or at least one write stream, and an output arbiter for selecting a stream to be output from a group comprising the at least one read stream and/or the at least one write stream; wherein the selected stream is arranged to be output to the memory system of the graphic processing unit.
According to another embodiment of the present invention, the method for testing the performance of a graphic processing unit comprises: setting a configuration of at least one simulated engine module and an output arbiter; generating at least one read stream and/or at least one write stream by the at least one simulated engine module; selecting a stream to be output from a group comprising the at least one read stream and/or the at least one write stream by the output arbiter; outputting the selected stream to the memory system of the graphic processing unit.
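The selection step performed by the output arbiter can be sketched as follows. This is only an illustrative sketch: the source does not state an arbitration policy, so round-robin is an assumption, and the names `Request`, `RoundRobinArbiter`, and `select` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Request:
    stream: str   # name of the stream that produced the request
    kind: str     # "read" or "write"
    address: int  # target address in the memory system
    size: int     # request size in bytes


class RoundRobinArbiter:
    """Picks one pending request per call, rotating across stream queues.

    Round-robin is an assumed policy, not one named in the source.
    """

    def __init__(self, queues: List[Deque[Request]]):
        self.queues = queues
        self.next_index = 0  # queue that gets first chance on the next call

    def select(self) -> Optional[Request]:
        n = len(self.queues)
        for offset in range(n):
            i = (self.next_index + offset) % n
            if self.queues[i]:
                # Start the next search after the queue that was just served.
                self.next_index = (i + 1) % n
                return self.queues[i].popleft()
        return None  # no stream has a pending request


if __name__ == "__main__":
    reads = deque([Request("rd0", "read", 0x1000, 32),
                   Request("rd0", "read", 0x1020, 32)])
    writes = deque([Request("wr0", "write", 0x8000, 32)])
    arbiter = RoundRobinArbiter([reads, writes])
    while (req := arbiter.select()) is not None:
        print(req.kind, hex(req.address))
```

Each selected request would then be forwarded to the memory system under test; a priority-based or weighted policy could be substituted without changing the interface.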
The traffic generator and method for testing the performance of a graphic processing unit of the present invention are capable of simulating the traffic of many changeable clients without actually creating these clients one by one. By modifying the configurations controlled by the configuration module, the traffic generator of the present invention becomes a more flexible instrument for testing the performance of graphic processing units under different environments.
To make the aforementioned and other objects, features, and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Referring to
According to the preferred embodiment of the present invention, the configuration module 12 is capable of determining the characteristics of the traffic generator, such as the number and type of the engines simulated. That is to say, the number of the simulated engine modules is not limited to three in the present invention.
Furthermore, the configuration module 12 is capable of defining the characteristics of each generated stream, such as throughput and access pattern. As a result, the engines simulated by the traffic generator may have different behaviors. For example, the configuration module 12 may define the address and size of each read or write request. Once a start address such as 0x1000 is determined, the configuration module 12 may further define the access pattern, such as sequential or random. In the sequential pattern, the address increases at equal intervals. For example, if the request size is 32B, the sequential addresses to be accessed are 0x1000, 0x1020, 0x1040, 0x1060 . . . . The sequential pattern can be used to simulate display traffic with a pitch surface. In the random pattern, each address is generated randomly within the scope of each surface, e.g., 0x1300, 0x2200, 0x1800 . . . . The random pattern can be used to simulate the motion compensation stream in the MSPDEC engine. Other streams can have many more complex access patterns. Video engines, for example, have an access pattern called “semi sequential.”
As illustrated in
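The sequential and random patterns described above can be sketched as two small address generators. The function names are hypothetical, and aligning random addresses to the request size is an assumption, since the source only requires that they fall within the surface. The semi-sequential pattern is not sketched because its definition is not given here.

```python
import random
from typing import Iterator, Optional


def sequential_addresses(start: int, request_size: int, count: int) -> Iterator[int]:
    """Yield `count` addresses spaced `request_size` bytes apart from `start`."""
    for i in range(count):
        yield start + i * request_size


def random_addresses(surface_start: int, surface_size: int, request_size: int,
                     count: int, seed: Optional[int] = None) -> Iterator[int]:
    """Yield `count` random addresses that stay inside the surface.

    Assumption: addresses are aligned to `request_size`.
    """
    slots = surface_size // request_size
    if slots == 0:
        raise ValueError("surface is smaller than one request")
    rng = random.Random(seed)
    for _ in range(count):
        yield surface_start + rng.randrange(slots) * request_size


if __name__ == "__main__":
    # 32-byte requests starting at 0x1000, as in the text.
    print([hex(a) for a in sequential_addresses(0x1000, 32, 4)])
    # Random requests inside a hypothetical 8 KB surface at 0x1000.
    print([hex(a) for a in random_addresses(0x1000, 0x2000, 32, 3, seed=1)])
```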
Besides access patterns, the configuration module 12 is capable of defining the throughput of each stream, which determines when each request is sent. Take the display client as an example: in the worst case, each line has 2048 pixels, each pixel occupies 4 bytes, and the monitor scans one line every 7.28 μsecs. So we get the throughput:
throughput = (2048 pixels × 4 B/pixel) / 7.28 μs ≈ 1125 B/μs
To test whether high throughput traffic will stress the graphic processing unit, the throughput can be increased. Please note that since each client is composed of several read or write streams, each stream may have different access pattern and throughput parameters in the configuration module 12.
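The display-client arithmetic, and the way a throughput setting translates into request timing, can be checked with a short calculation. The function names are hypothetical, and the 32-byte request size is borrowed from the access-pattern example rather than stated for the display client.

```python
def line_throughput_bytes_per_us(pixels_per_line: int, bytes_per_pixel: int,
                                 line_time_us: float) -> float:
    """Bytes that must be fetched per microsecond to scan one line in time."""
    return pixels_per_line * bytes_per_pixel / line_time_us


def request_interval_ns(throughput_bytes_per_us: float,
                        request_size_bytes: int) -> float:
    """Average spacing between requests needed to sustain the throughput."""
    return request_size_bytes / throughput_bytes_per_us * 1000.0


if __name__ == "__main__":
    # Worst-case display line from the text: 2048 pixels, 4 B each, 7.28 us.
    tput = line_throughput_bytes_per_us(2048, 4, 7.28)
    print(f"throughput: {tput:.1f} B/us")
    # Assumed 32-byte requests.
    print(f"request interval: {request_interval_ns(tput, 32):.2f} ns")
```

Raising the configured throughput shortens the request interval, which is how a stress test would load the memory system more heavily.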
According to a preferred embodiment of the present invention, the configuration module comprises a knobfile for recording the above-mentioned characteristics and parameters of the data stream. When the designer of the graphic processing unit would like to test the graphic processing unit, the designer can simulate different kinds of plural engines with the traffic generator by editing the knobfile, so as to test the graphic processing unit under a predetermined environment. If the designer would like to test the graphic processing unit under another environment (with different clients), the knobfile is modified.
As an example, consider a knobfile for simulating a copy engine, which is a client that copies data from a source surface to a destination surface. The knobfile contains the following contents for a read stream:
In the above content described in the knobfile, the first two lines define the read stream number and read stream name, the next five lines define the start address, surface size and surface type, and the next five lines define the burst size, throughput and access pattern. In the same manner, the write stream for the copy engine can be defined as follows:
After reading the above content described in the knobfile, the configuration module 12 enables the traffic generator 100 to act as a copy engine. In the preferred embodiment of the present invention, the knobfile is an external configuration file. Therefore, the user can easily modify the content of the knobfile so as to simulate different engines with the traffic generator. In summary, to create different engines with a traffic generator, a user must define how many engines and how many streams the traffic generator has and what characteristics each stream has. Such a definition of the traffic generator may be obtained by analyzing the behaviors of clients or the results from previous generation chips. Therefore, the traffic generator can simulate not only existing clients but also those still being implemented. To create a new client, the user just adds the relevant content describing the stream characteristics of that client to the knobfile.
Given the above, the advantage of the present invention is that it simulates the traffic of many clients without actually creating these clients one by one. By editing the knobfile or the configurations stored in the configuration module, the traffic generator of the present invention can simulate different engines, and thus becomes a more flexible instrument for testing the performance of graphic processing units.
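One way to picture the per-stream parameters listed above is as an in-memory record per stream, grouped into an engine. The sketch below is not the knobfile syntax, which is not reproduced here. All field names, the `make_copy_engine` helper, and every numeric value are hypothetical placeholders.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class StreamConfig:
    number: int                     # stream number
    name: str                       # stream name
    direction: str                  # "read" or "write"
    start_address: int
    surface_size: int               # bytes
    surface_type: str               # placeholder label, e.g. "pitch"
    burst_size: int                 # bytes per request
    throughput_bytes_per_us: float
    access_pattern: str             # e.g. "sequential" or "random"


@dataclass(frozen=True)
class EngineConfig:
    name: str
    streams: List[StreamConfig]


def make_copy_engine(src: int, dst: int, surface_size: int,
                     throughput: float) -> EngineConfig:
    """Build a copy engine: one read stream on the source surface and one
    write stream on the destination surface with matching parameters."""
    read = StreamConfig(0, "copy_read", "read", src, surface_size,
                        "pitch", 32, throughput, "sequential")
    # The write stream mirrors the read stream except for identity and target.
    write = replace(read, number=1, name="copy_write",
                    direction="write", start_address=dst)
    return EngineConfig("copy", [read, write])


if __name__ == "__main__":
    engine = make_copy_engine(0x1000, 0x9000, 0x2000, 500.0)
    for s in engine.streams:
        print(s.name, s.direction, hex(s.start_address), s.access_pattern)
```

Simulating a different client then amounts to building a different list of stream records, which mirrors the idea of editing the external configuration rather than the generator itself.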
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention, provided that they fall within the scope of the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
200810211887.3 | Sep 2008 | CN | national |