This application claims priority to provisional application 202241020656, filed Apr. 6, 2022, in India, the entirety of which is incorporated by reference herein.
The present disclosure relates to memory placement in a computing device and more particularly, to configuring computing device memory latency.
Computing devices, such as digital signal processors (DSPs), run algorithms to mathematically manipulate digitized signals, such as, but not limited to, voice, audio, video, sonar, and radar signals. Signal processing is implemented using one or more processors, or microprocessors, that comprise one or more data memory systems, for example, internal memory, cached memory, and external memory, among others.
Computing device performance is measured in many ways, but the most common metric is the time required for a processor to accomplish a task. That time depends on the placement of data in memory: placing data to achieve the fastest access time decreases processing time. Processors have multiple memory types. Lower-level memory types, i.e., Level 1, are smaller in size with short access times, making them “faster”. Higher-level memory types, i.e., Level 2 through Level 16, are larger in size with longer access times, making them “slower”. The hierarchy runs from Level 1 (L1) as the fastest to Level 16 (L16) as the slowest. Different systems will have differing numbers of memory levels. Memory latency corresponds to the duration of time from initiating a request to access memory to reading or writing data at the requested memory location.
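The memory hierarchy described above may be sketched as an ordered latency table. The following is a minimal illustration only; the level names, sizes, and cycle counts are assumptions for explanatory purposes, not values from any particular processor architecture.

```python
# Toy model of a memory hierarchy: lower levels are smaller and faster.
# Sizes and latencies below are illustrative assumptions only.
MEMORY_LEVELS = [
    {"name": "L1", "size_kb": 64, "latency_cycles": 4},
    {"name": "L2", "size_kb": 512, "latency_cycles": 12},
    {"name": "L3", "size_kb": 4096, "latency_cycles": 40},
    {"name": "external", "size_kb": 1 << 20, "latency_cycles": 200},
]

def fastest_level_fitting(block_kb):
    """Return the fastest level whose capacity can hold a data block."""
    for level in MEMORY_LEVELS:  # ordered fastest to slowest
        if block_kb <= level["size_kb"]:
            return level["name"]
    return None  # block exceeds every level's capacity
```

This reflects the trade-off stated above: a small block can be served by fast L1, while a larger block must fall to a slower, larger level.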
Typically, placement of data in memory requires: 1) calculating how much memory a processing pipeline will require, 2) gaining an understanding of how each type of memory is used and how frequently it is used, 3) placing memory blocks in such a way that the fastest memory is allocated to a part of a process which would benefit the most from faster access, 4) compiling code after completing memory placement, and 5) flashing the files on flash-based memory storage devices.
To optimize processor performance, the code changes described above are performed iteratively. After each code change, the code is compiled and flashed before its effect on the processor can be measured. In practice, multiple code changes are required because multiple blocks compete for the lowest-latency, fastest memory. A disadvantage of this approach is that it is tedious and time-consuming and offers an engineer little flexibility; it typically results in under-utilized processing resources.
The inventive subject matter includes one or more embodiments of an interface system for configuring memory placement in a computing device having a plurality of processing modules, each of which has data to be stored in a plurality of memory locations of the computing device to be accessed and communicated to a system for execution. The interface system comprises a communication protocol for receiving a memory layout from the computing device; a graphical user interface for displaying a configuration mapping each of the plurality of processing modules to a memory location within the memory layout; a memory placement request for a modification to the configuration, the memory placement request being entered by a user at the graphical user interface and correlating a processing module with a latency level for a memory location in the plurality of memory locations; and configuration data generated at the graphical user interface, the configuration data representing placement of data to be stored in the plurality of memory locations based on the memory placement request for each processing module and a memory capacity of the computing device.
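The placement request and configuration data described above can be sketched as simple data structures. This is a minimal illustration under stated assumptions; all class and field names here are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryPlacementRequest:
    """A user request correlating a processing module with a latency level."""
    module_name: str
    latency_level: str  # e.g. "L1", "L2", as presented at the GUI

@dataclass
class ConfigurationData:
    """Placement of each module's data, bounded by the device's capacity."""
    device_capacity_kb: int
    placements: dict = field(default_factory=dict)  # module -> latency level

    def apply(self, request: MemoryPlacementRequest):
        """Record one user-entered placement request."""
        self.placements[request.module_name] = request.latency_level
```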
In one or more embodiments, the graphical user interface includes a plurality of options for the user to select the memory placement request, wherein one or more user selected placement requests are used to generate the configuration data.
In one or more embodiments, a consumption guide is displayed showing consumption levels for the configuration data.
In one or more embodiments, an allocator on the computing device allocates a memory layout according to the configuration data.
In one or more embodiments, a results profile is displayed indicating the results of the configuration of the memory layout as modified according to the configuration data.
The inventive subject matter includes one or more embodiments of a method for allocating memory locations for a predetermined number of processing modules, the memory locations being on a computing device having a predefined memory layout, the processing modules having instructions to be stored in memory locations according to a configuration where they may be accessed and carried out by the computing device, the method being carried out by a processor for a graphical user interface in communication with the computing device. The method includes the steps of receiving the predefined memory layout of the computing device at the graphical user interface, displaying the configuration of the predefined memory layout of the computing device at the graphical user interface, receiving user input memory placement requests for allocation of data to memory locations, the user input memory placement requests being input at the graphical user interface, generating, in the processor for the graphical user interface, configuration data for the allocation of data to memory locations in the predefined memory layout of the computing device, the configuration data being based on the user input memory placement requests for memory allocation and a memory capacity of the computing device, and displaying, at the graphical user interface, a consumption guide showing estimated consumption levels for the allocation of data to memory locations as determined by the configuration data.
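The GUI-side steps above can be sketched as a single function that turns user placement requests into configuration data and an estimated consumption guide. This is a non-authoritative sketch; the request format and the consumption-guide fields are assumptions for illustration.

```python
def configure_memory(placement_requests, capacity_kb):
    """Sketch of the GUI-side method: take user placement requests
    (module -> (latency level, size in KB)), generate configuration data,
    and estimate consumption against the device's memory capacity."""
    config = {"capacity_kb": capacity_kb, "placements": {}}
    for module, (level, size_kb) in placement_requests.items():
        config["placements"][module] = {"level": level, "size_kb": size_kb}
    used = sum(p["size_kb"] for p in config["placements"].values())
    consumption_guide = {
        "used_kb": used,
        "capacity_kb": capacity_kb,
        "percent_used": round(100 * used / capacity_kb, 1),
    }
    return config, consumption_guide
```

The consumption guide gives the user an estimate before anything is sent to the device, which is what allows placement requests to be prioritized without a compile-and-flash cycle.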
In one or more embodiments, the method further comprises the steps of presenting a plurality of options for user selections of requests for memory allocations and generating the configuration data based on the user selections.
In one or more embodiments, the method further comprises sending the configuration data to the computing device, where memory is allocated according to the configuration data.
In one or more embodiments, the method further comprises displaying, at the graphical user interface, a results profile, the results profile shows memory allocations made according to the configuration data.
In one or more embodiments, the method further comprises the step of prioritizing the user input memory placement requests based on estimated consumption levels and the results profile.
Elements and steps in the figures are illustrated for simplicity and clarity and have not necessarily been rendered according to any sequence. For example, steps that may be performed concurrently or in different order are illustrated in the figures to help to improve understanding of embodiments of the present disclosure.
While various aspects of the present disclosure are described with reference to
The CPU 108 has internal memory 110 that is divided into levels, L1 112, L2 114 to Ln 116 (where n depends on processor architecture), that are intended to be used in a manner that minimizes time for memory access. This is referred to as local memory, and as described above, the levels are differentiated by size and speed of access. Memory on the CPU 108 is typically accessed faster than external memory such as program memory 104 and data memory 106. Level L1 112 may be considered the fastest and level Ln 116 may be considered the slowest. Level L2 114 is slower than level L1 112 but faster than level Ln 116. Data is supplied to and from the computing device 102 by way of an Input/Output (I/O) block 118.
Each type of memory, i.e., program memory 104, data memory 106, and internal memory levels 112, 114, 116, has its own fetch time. Depending on where data is placed, or configured, in a memory layout, the CPU 108 can fetch data faster and perform manipulations at a higher rate. In one or more embodiments, a graphical user interface (GUI) 120 receives user input 122 which indicates a configuration for memory placement to accommodate signal flow in the computing device 102. The GUI 120 provides a user with the ability to input modifications to memory placement. From the GUI 120 the user may select specific levels to which the programs and data will be placed in memory to test and preview the effect the selections will have on the computing device 102.
The GUI 120 generates configuration data 124, which is communicated to the computing device 102 via a general communications port 126 and reflects the user-requested modification to the configuration for memory placement within a memory layout of the computing device. The general communications port 126 is used for sending and receiving all configuration and feedback data between the GUI 120 and the computing device 102. The configuration data 124, once generated, may be sent to the computing device 102 via a communication protocol 123, where it is stored by the computing device 102 in a persistent memory 105 of the program memory 104 and/or data memory 106. There is no need for hard-coding; compiling and flashing are not needed, and an allocator 128 in the computing device 102 configures memory placement according to the configuration data 124.
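Because the disclosure names a communication protocol but not its encoding, the transfer of configuration data can only be sketched under an assumed wire format. The length-prefixed JSON framing below is purely an assumption for illustration.

```python
import json

def encode_configuration(config: dict) -> bytes:
    """Serialize configuration data for transmission to the device.
    Length-prefixed JSON is an assumed wire format, not the disclosed one."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return len(payload).to_bytes(4, "big") + payload

def decode_configuration(frame: bytes) -> dict:
    """Device-side counterpart: strip the length prefix and parse."""
    length = int.from_bytes(frame[:4], "big")
    return json.loads(frame[4:4 + length].decode("utf-8"))
```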
A framework 107 connects, by way of communication protocol 123, to the GUI 120, to provide information about the memory capacity of the computing device 102. The computing device memory capacity is presented to the user and used by the GUI when generating the configuration data 124. The GUI 120 presents the user with options for selections as user inputs 122. The selections are used to generate the configuration data 124. For example, an object, such as an audio module in a multi-channel audio system, has memory types and sizes. The user selects, by way of the GUI 120, the latency requested to be assigned to the module.
The configuration data 124 is generated at the GUI and communicated back to the computing device 102, where the allocator 128 will determine, based on the computing device memory capacity, whether the computing device may fulfill the requested assignments. The allocator 128 will accommodate the configuration data 124 to the best of the computing device's memory capacity.
In the event the allocator 128 is not able to accommodate latency assignments as requested by the user and set out in the configuration data 124, default settings will be applied at the computing device 102. The allocator 128 will provide details on the allocations made and send profiling results to the GUI where it is displayed to the user as a results profile. The results profile shows the user how the configuration data allocated the data into memory locations and how it will affect the computing device's memory capacity and its processing performance. The profiling results and the configuration of the memory layout may be used to allocate memory having the fastest fetch time to the object considered, by the user, to be most critical.
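The allocator behavior described above, honoring requested placements where capacity allows and otherwise applying defaults while reporting the outcome, can be sketched as follows. The per-level capacities, default settings, and results-profile fields are illustrative assumptions.

```python
def allocate(config, level_capacity_kb, defaults):
    """Device-side allocator sketch: honor each requested placement if the
    level still has capacity, otherwise fall back to the module's default
    level, and record every outcome in a results profile."""
    remaining = dict(level_capacity_kb)
    profile = {}
    for module, req in config["placements"].items():
        level, size = req["level"], req["size_kb"]
        if remaining.get(level, 0) >= size:
            remaining[level] -= size
            profile[module] = {"level": level, "fallback": False}
        else:
            default = defaults[module]  # assumed per-module default setting
            remaining[default] -= size
            profile[module] = {"level": default, "fallback": True}
    return profile, remaining  # profile is reported back to the GUI
```

The returned profile corresponds to the results profile sent back to the GUI: it shows which requests were honored and which fell back, so the user can reserve the fastest memory for the object considered most critical.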
Configuration data is generated 204 at the GUI and contains all such requests for memory placement that a user would like to be “tested” for an indication of how the memory placement requests affect a performance of the computing device. The configuration data is sent 206 to the computing device by way of the communication protocol, where it is allocated 208 to the computing device memory by an allocator. The allocator allocates the memory based on a memory capacity of the computing device. Whether or not the computing device may fulfill the requested assignments is reported 210 back to the GUI to be displayed to the user. The goal of the user is to optimize the performance of the computing device, so the user may assess the results to determine whether modifications to the memory placement need to be requested.
The configuration data is communicated to the computing device and tested without a need to compile and flash a code change prior to measuring its effect on the computing device. This scenario allows a user, through the GUI and its connection to the computing device, the flexibility to test the computing device performance for optimizing performance efficiently and quickly through memory placement without the need for time consuming hard-coded memory placement.
As an example, the system and method are described as a tuning tool for an audio system.
According to the method 300, the tuning tool connects 302 with the audio system over a communication protocol and, upon connection, the audio system sends a layout of its cores and memory configuration to the graphical user interface (GUI). The GUI displays 304 the configuration and the existing layout of cores in the computing device of the audio system. The audio system has a signal flow associated therewith which becomes viewable at the GUI.
Referring to the method 300 shown in
Referring again to
Referring back to
Performance metrics of a consumption guide 500 are displayed at the GUI as shown in
An allocator on the computing device of the audio system allocates 322 memory placements in accordance with the configuration data and the memory capacity of the DSP and system performance is checked at the computing device. The results are presented at the GUI as a results profile of CPU consumption data (per audio module) and actual memory allocation (which is audio system dependent). The memory placement allocated by the allocator is then visible 324 on the GUI where the user can see a visual representation of the individual memory blocks and their latency.
In the foregoing specification, the present disclosure has been described with reference to specific exemplary embodiments. The specification and figures are illustrative, rather than restrictive, and modifications are intended to be included within the scope of the present disclosure. Accordingly, the scope of the present disclosure should be determined by the claims and their legal equivalents rather than by merely the examples described.
For example, the steps recited in any method or process claims may be executed in any order, may be executed repeatedly, and are not limited to the specific order presented in the claims. Additionally, the components and/or elements recited in any apparatus claims may be assembled or otherwise operationally configured in a variety of permutations and are accordingly not limited to the specific configuration recited in the claims. For example, the latencies of multiple memory blocks may be modified at the same time in an Excel file and imported to the tuning tool at the GUI.
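The bulk-import example above, modifying the latencies of multiple memory blocks in a spreadsheet and importing them into the tuning tool, can be sketched as a simple parser. The two-column module/latency format assumed here is hypothetical; the disclosure does not specify the file layout.

```python
import csv
import io

def import_latency_sheet(csv_text):
    """Parse module/latency rows exported from a spreadsheet into a
    mapping usable as memory placement requests (format is assumed)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {module.strip(): level.strip() for module, level in reader}
```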
According to the inventive subject matter, memory placement is no longer hardcoded, and DSP engineers can easily and quickly improve MIPS utilization. Instead of modifying code, re-compiling, and re-flashing to modify memory placement, the user may simply modify the memory placement by inputting selections at the GUI to generate a configuration file. The configuration file is sent to the DSP via xTP, resulting in a much more flexible and faster method to optimize processor performance and memory placement.
Any method or process described may be carried out by executing instructions with one or more devices, such as a processor or controller, memory (including non-transitory), sensors, network interfaces, antennas, switches, actuators to name just a few examples.
Benefits, other advantages, and solutions to problems have been described above with regard to particular embodiments; however, any benefit, advantage, solution to problem or any element that may cause any particular benefit, advantage or solution to occur or to become more pronounced are not to be construed as critical, required or essential features or components of any or all the claims.
The terms “comprise”, “comprises”, “comprising”, “having”, “including”, “includes” or any variation thereof, are intended to reference a non-exclusive inclusion, such that a process, method, article, composition or apparatus that comprises a list of elements does not include only those elements recited but may also include other elements not expressly listed or inherent to such process, method, article, composition or apparatus. Other combinations and/or modifications of the above-described structures, arrangements, applications, proportions, elements, materials, or components used in the practice of the present disclosure, in addition to those not specifically recited, may be varied, or otherwise particularly adapted to specific environments, manufacturing specifications, design parameters or other operating requirements without departing from the general principles of the same.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202241020656 | Apr 2022 | IN | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2023/065457 | 4/6/2023 | WO |