Portable computing devices (“PCDs”) are becoming necessities for people on personal and professional levels. These devices may include cellular telephones, portable digital assistants (“PDAs”), portable game consoles, palmtop computers, and other portable electronic devices. PCDs commonly contain integrated circuits, or systems on a chip (“SoC”), that include numerous components designed to work together to deliver functionality to a user. For example, a SoC may contain any number of processing engines such as modems, central processing units (“CPUs”) made up of cores, graphical processing units (“GPUs”), etc. that read and write data and instructions to and from memory components on the SoC.
The data and instructions are transmitted between the devices via a collection of wires known as a bus. A bus may include two parts, an address bus and a data bus: the data bus is used to actually transfer data and instructions between the processing engines and the memory components, and the address bus is used to transmit metadata that specifies a physical address within a memory component from/to which data or instructions are read/written.
The efficient use of bus bandwidth and memory capacity in a PCD is important for optimizing the functional capabilities of processing components on the SoC. Commonly, the utilization of memory capacity is optimized by compressing data so that the data requires less space in the memory. Data compression also increases bus bandwidth availability (bytes per memory access) for transactions coming from, or heading to, the memory component; however, compression reduces efficiency in accessing the memory component (bytes per clock cycle) because compressed transactions are smaller than uncompressed ones. Furthermore, compression requires the use of “padding” or “filler data” to round up transactions to integer multiples of the memory's minimum access length (MAL). The combination of many small transactions and the transmission of filler data when working with compressed data presents an opportunity to improve the efficiency of memory utilization.
Therefore, there is a need in the art for a system and method that addresses the inefficiencies associated with transactions carrying lossless and lossy compressed data.
Various embodiments of methods and systems for managing compressed data transaction sizes in a system on a chip (“SoC”) in a portable computing device (“PCD”) are disclosed. An exemplary method begins by determining lengths of compressed data tiles associated in a group, wherein the compressed data tiles are contained within a compressed image file. The group may be defined by, but is not limited to being defined by, sequential tiles that are in a row in an image file, sequential tiles that are in a column in an image file, or tiles that are contiguous in an area in an image file. Based on the determined lengths, the compressed data tiles may be aggregated into one or more multi-tile transactions that are written to a DRAM memory component. A metadata file may be generated in association with each multi-tile transaction to identify said transaction as a multi-tile transaction and provide offset data to distinguish data associated with the compressed data tiles. Using the metadata, embodiments of the solution may provide for random access and modification of the compressed data stored in association with a multi-tile transaction.
In the drawings, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same figure. Letter character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all figures.
The word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect described herein as “exemplary” is not necessarily to be construed as exclusive, preferred or advantageous over other aspects.
In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application,” as referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
In this description, reference to “DRAM” or “DDR” memory components will be understood to envision any of a broader class of volatile random access memory (“RAM”) and will not limit the scope of the solutions disclosed herein to a specific type or generation of RAM. That is, it will be understood that various embodiments of the systems and methods provide a solution for managing transactions of data that has been compressed according to lossless and/or lossy compression algorithms and are not necessarily limited in application to compressed data transactions associated with double data rate memory. Moreover, it is envisioned that certain embodiments of the solutions disclosed herein may be applicable to DDR, DDR-2, DDR-3, low power DDR (“LPDDR”) or any subsequent generation of DRAM.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer generally to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution, unless specifically limited to a certain computer-related entity. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
In this description, the terms “central processing unit (“CPU”),” “digital signal processor (“DSP”),” “graphical processing unit (“GPU”),” and “chip” are used interchangeably. Moreover, a CPU, DSP, GPU or chip may be comprised of one or more distinct processing components generally referred to herein as “core(s).”
In this description, the terms “engine,” “processing engine,” “processing component” and the like are used to refer to any component within a system on a chip (“SoC”) that transfers data over a bus to or from a memory component. As such, a processing component may refer to, but is not limited to refer to, a CPU, DSP, GPU, modem, controller, etc.
In this description, the term “bus” refers to a collection of wires through which data is transmitted from a processing engine to a memory component or other device located on or off the SoC. It will be understood that a bus consists of two parts, an address bus and a data bus, where the data bus transfers actual data and the address bus transfers information specifying the location of the data in a memory component (i.e., address and associated metadata). The terms “width,” “bus width,” and “bandwidth” refer to an amount of data, i.e. a “chunk size,” that may be transmitted per cycle through a given bus. For example, a 16-byte bus may transmit 16 bytes of data per cycle, whereas a 32-byte bus may transmit 32 bytes of data per cycle. Moreover, “bus speed” refers to the number of times a chunk of data may be transmitted through a given bus each second. Similarly, a “bus cycle” or “cycle” refers to transmission of one chunk of data through a given bus.
In this description, the term “portable computing device” (“PCD”) is used to describe any device operating on a limited capacity power supply, such as a battery. Although battery operated PCDs have been in use for decades, technological advances in rechargeable batteries coupled with the advent of third generation (“3G”) and fourth generation (“4G”) wireless technology have enabled numerous PCDs with multiple capabilities. Therefore, a PCD may be a cellular telephone, a satellite telephone, a pager, a PDA, a smartphone, a navigation device, a smartbook or reader, a media player, a combination of the aforementioned devices, a laptop computer with a wireless connection, among others.
To make efficient use of bus bandwidth and DRAM capacity, data is often compressed according to lossless or lossy compression algorithms, as would be understood by one of ordinary skill in the art. Because the data is compressed, it takes less space to store and uses less bandwidth to transmit. However, because DRAM typically requires transactions to be integer multiples of a minimum amount of data to be transacted at a time (a minimum access length, i.e. “MAL”), a transaction of compressed data may require filler data to round up to an integer multiple of the minimum access length in bytes. Filler data, or “padding,” is used to “fill” the unused capacity in a transaction so that the transaction length meets the integer-multiple-of-MAL requirement.
For example, consider a 256-byte data tile transferred from a processing engine to a DRAM that has a 64-byte MAL. If transmitted in its uncompressed form, the 256-byte data transfer should ideally take 8 cycles of 32-byte data chunks to complete when transmitted on a bus with a 32-byte bandwidth. When compressed, the same data tile may occupy only 93 bytes and thus require only 3 cycles to transfer; however, to meet the integer-multiple-of-MAL requirement, an additional 35 bytes of padding or fill is added to the transaction in order to meet the 64-byte MAL of the DRAM. In this case, the transaction containing the compressed tile becomes 128 bytes, of which 93 bytes are compressed data and 35 bytes are padding/fill that contain no useful information. The need to include the padding/filler data may result in an effectively reduced capacity on the bus, as a portion of the bandwidth used for the transaction goes to transmitting the filler data instead of the compressed data.
Advantageously, embodiments of multi-tile transaction solutions combat the degradation in effective bus capacity by aggregating multiple compressed tiles into a single transaction, thereby increasing the size of each transaction, reducing the overall number of transactions in the system and reducing the ratio of padding bytes to useful compressed data in each transaction. Exemplary embodiments of multi-tile transaction solutions are described in more detail below with reference to the figures.
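The arithmetic in this example can be checked with a short sketch. It uses the 64-byte MAL, the 32-byte bus and the 93-byte compressed tile from the text; the function names `padded_length` and `bus_cycles` are illustrative only, and the sketch assumes the padding bytes travel over the bus along with the compressed data.

```python
# Minimal sketch: MAL round-up and bus-cycle arithmetic for one compressed tile.


def padded_length(compressed_bytes: int, mal: int) -> int:
    """Round a transaction length up to the next integer multiple of the MAL."""
    return (compressed_bytes + mal - 1) // mal * mal


def bus_cycles(num_bytes: int, bus_width: int) -> int:
    """Bus cycles needed to move num_bytes over a bus carrying bus_width bytes per cycle."""
    return (num_bytes + bus_width - 1) // bus_width


MAL = 64        # DRAM minimum access length, in bytes (from the example)
BUS_WIDTH = 32  # bus width, in bytes per cycle (from the example)

raw_tile = 256   # uncompressed tile length, in bytes
compressed = 93  # compressed tile length, in bytes

txn = padded_length(compressed, MAL)
print(bus_cycles(raw_tile, BUS_WIDTH))    # 8 cycles for the uncompressed tile
print(bus_cycles(compressed, BUS_WIDTH))  # 3 cycles if no padding were needed
print(txn, txn - compressed)              # 128 35 (transaction length, padding bytes)
print(bus_cycles(txn, BUS_WIDTH))         # 4 cycles spent on the padded transaction
```

The last line makes the loss concrete: one of the four cycles carries mostly filler rather than compressed data.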
Turning to
Returning to the
In the uncompressed image frame, each tile may be of a size K, whereas in the compressed image frame each tile may be of a size K or less (K bytes when no compression is possible, K-1 bytes, K-2 bytes, K-3 bytes, . . . , 1 byte). In the illustration, the various tiles that form the compressed image frame are represented by differing levels of shading depending on the extent of compression that resulted from the compression block having applied its compression algorithm to the data held by the given tile. Notably, the compression block 113 creates a companion buffer for compressed image frame metadata, as would be understood by one of ordinary skill in the art. The compressed image frame metadata contains a record of the size, type and attributes for each compressed tile in the compressed image frame. Because DRAM access may be limited to units of the minimum access length (MAL), the size of a given compressed tile may be represented in the metadata as the number of MALs required to represent the compressed tile size (e.g., 1 MAL, 2 MAL, . . . N MAL). This size description in the metadata allows a future reader of the buffer to request from the memory only the minimum amount of data needed to decompress each tile back to the original size K.
Looking to the exemplary four sequential tiles in their uncompressed states, each tile (#1,0; #2,0; #3,0; #4,0) is 256 bytes long (other lengths are envisioned). When compressed, the exemplary four sequential tiles have lengths of 112 bytes, 56 bytes, 33 bytes and 177 bytes, respectively. Assuming the MAL is 64 bytes, the transaction lengths for the four tiles may be, respectively, 128 bytes (112 bytes compressed data plus 16 bytes padding), 64 bytes (56 bytes compressed data plus 8 bytes padding), 64 bytes (33 bytes compressed data plus 31 bytes padding) and 192 bytes (177 bytes compressed data plus 15 bytes padding). Notably, to transact all four of the exemplary sequential tiles, methods known in the art make four transactions, one for each compressed tile.
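As a rough illustration of the MAL-granular size field, the sketch below records a compressed tile's size as a count of MALs and derives the read length a reader would request. The `TileMetadata` layout, its `compressed` flag (standing in for the "type" attribute) and the 64-byte MAL are hypothetical; the actual metadata format is not specified here.

```python
from dataclasses import dataclass

MAL = 64  # hypothetical minimum access length, in bytes


@dataclass
class TileMetadata:
    """Hypothetical per-tile entry in the companion metadata buffer."""
    size_mals: int    # compressed size, expressed as a count of MALs
    compressed: bool  # hypothetical stand-in for the 'type' attribute


def make_metadata(compressed_len: int, tile_size: int) -> TileMetadata:
    # Round the compressed length up to whole MALs; a tile that did not
    # compress keeps its full size K and is flagged as uncompressed.
    size_mals = (compressed_len + MAL - 1) // MAL
    return TileMetadata(size_mals=size_mals, compressed=compressed_len < tile_size)


def read_length(meta: TileMetadata) -> int:
    """Bytes a reader requests from DRAM to fetch this tile."""
    return meta.size_mals * MAL


meta = make_metadata(112, tile_size=256)
print(meta.size_mals, read_length(meta))  # 2 128
```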
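The sketch below tallies the cost of the four-tile example. Besides the one-transaction-per-tile case described above, it evaluates two hypothetical groupings (adjacent pairs, and all four tiles together), assuming the compressed tiles inside a transaction are packed back to back with padding only at the end of the transaction. The helper names are illustrative.

```python
MAL = 64  # minimum access length from the example, in bytes


def padded_length(n: int, mal: int = MAL) -> int:
    """Round a transaction length up to an integer multiple of the MAL."""
    return (n + mal - 1) // mal * mal


def summarize(groups):
    """Return (transaction count, bytes transacted, padding bytes) for a grouping.

    Each inner list is one transaction; its compressed tiles are assumed to be
    packed back to back, with padding only at the end of the transaction.
    """
    txn_lengths = [padded_length(sum(g)) for g in groups]
    useful = sum(sum(g) for g in groups)
    return len(txn_lengths), sum(txn_lengths), sum(txn_lengths) - useful


tiles = [112, 56, 33, 177]  # compressed lengths of tiles #1,0 .. #4,0

print(summarize([[t] for t in tiles]))      # (4, 448, 70) one transaction per tile
print(summarize([tiles[0:2], tiles[2:4]]))  # (2, 448, 70) hypothetical pairs
print(summarize([tiles]))                   # (1, 384, 6) hypothetical single transaction
```

For these particular lengths, pairing halves the number of transactions while leaving the padding unchanged, whereas aggregating all four tiles also reduces the padding from 70 bytes to 6 bytes.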
Turning to
Notably, although exemplary MTT embodiments described herein make use of sequential tiles in a row, MTT embodiments are not limited to aggregating sequential tiles in a row of an image frame. Rather, it is envisioned that MTT embodiments may aggregate tiles according to any logical grouping such as, but not limited to, sequential tiles in a row, sequential tiles in a column, contiguous tiles in an area, etc.
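One way to enumerate the candidate groups mentioned above is sketched below for a frame of `cols` by `rows` tiles. The 0-based (column, row) indexing and the function names are assumptions made for illustration; an embodiment may define groups differently.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (column, row) index of a tile within the frame; 0-based here


def row_groups(cols: int, rows: int, n: int) -> List[List[Tile]]:
    """Up to n sequential tiles from the same row per group."""
    return [[(c, r) for c in range(c0, min(c0 + n, cols))]
            for r in range(rows) for c0 in range(0, cols, n)]


def column_groups(cols: int, rows: int, n: int) -> List[List[Tile]]:
    """Up to n sequential tiles from the same column per group."""
    return [[(c, r) for r in range(r0, min(r0 + n, rows))]
            for c in range(cols) for r0 in range(0, rows, n)]


def area_groups(cols: int, rows: int, w: int, h: int) -> List[List[Tile]]:
    """Contiguous w x h blocks of tiles (blocks at the frame edge may be smaller)."""
    return [[(c, r) for r in range(r0, min(r0 + h, rows))
             for c in range(c0, min(c0 + w, cols))]
            for r0 in range(0, rows, h) for c0 in range(0, cols, w)]


print(row_groups(4, 1, 2))         # [[(0, 0), (1, 0)], [(2, 0), (3, 0)]]
print(area_groups(4, 4, 2, 2)[0])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```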
Returning to the
It is envisioned that certain MTT embodiments may look to combine more than two tiles into a single transaction. For example,
The metadata for CT1 and CT2 in
Any entity that wants to read the compressed tiles 621(1) or 621(2) may first read the metadata 625(1) to ascertain the header field type. For a “Normal” header field, such as shown in
Advantageously, the producer of the data in
Once an MTT is written into the DRAM as shown in
Random writes are also possible for the compressed tiles that are already written to the DRAM as a MTT shown in
Random writes may also be possible for a new or modified CT1 tiles that are already written to the DRAM as a MTT shown in
If the size of the new CT1 is larger than the original CT1 631(1), then it cannot be written into the same location as the original CT1 631(1) because it would overwrite the existing CT2 data 631(2). In this case, the data and metadata layout shown in
Notably, the 646(3) field may be used by a reader to locate the starting byte of CT2 within the footprint of the uncompressed Tile #1 in DRAM. The third field 646(1) in the metadata is the size of CT2 in MALs, which allows a reader that is accessing CT2 to issue a read instruction of the correct size to the DRAM. The fourth field of the metadata 646(2) is the size of the modified CT1 (MCT1), so that a reader that needs to access MCT1 may issue a read transaction that reads a number of MALs equal to 646(2) starting at the start of the footprint of the uncompressed Tile #2 in DRAM.
The above algorithm/method demonstrates combining up to two compressed tiles into a multi-tile transaction. This algorithm may be readily expanded to increase the maximum number of compressed tiles in one transaction beyond two (three, four, etc.). Such an expansion may require a commensurate increase in the size of the metadata.
At decision block 810, an MTT embodiment may determine whether tiles in the group may be combined into a single transaction (a determination based on sums of the tile lengths). If no combination of the tiles is possible, or desirable, the “NO” branch is followed and the method 800 returns. If combination of tiles in the group into a single transaction is possible, and desirable, the “YES” branch is followed to block 815 and a multi-tile transaction is generated to include compressed tile data from multiple tiles in the group. Metadata associated with the MTT transaction is generated so that each compressed tile in the MTT transaction may be randomly accessed and updated.
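A reader-side sketch of how these fields might be used to form DRAM read requests for a two-tile MTT follows. The field names, the `Header` values (only the "Normal" type is named in the text; the others are hypothetical) and the assumption that the two uncompressed tile footprints are adjacent in DRAM are illustrative; alignment of the CT2 start byte to a MAL boundary is not handled.

```python
from dataclasses import dataclass
from enum import Enum

MAL = 64         # hypothetical minimum access length, in bytes
TILE_SIZE = 256  # hypothetical uncompressed tile footprint (K), in bytes


class Header(Enum):
    NORMAL = "normal"            # each compressed tile sits in its own footprint
    MTT = "mtt"                  # hypothetical: CT1 and CT2 share Tile #1's footprint
    MTT_RELOCATED = "mtt-moved"  # hypothetical: modified CT1 lives in Tile #2's footprint


@dataclass
class PairMetadata:
    header: Header
    ct1_size_mals: int  # CT1 (or MCT1) read size in MALs; cf. field 646(2)
    ct2_size_mals: int  # CT2 read size in MALs; cf. field 646(1)
    ct2_offset: int     # start byte of CT2 inside Tile #1's footprint; cf. field 646(3)


def locate(meta: PairMetadata, which: int, tile1_base: int) -> tuple:
    """Return (start_address, read_length_bytes) for CT1 (which=1) or CT2 (which=2)."""
    tile2_base = tile1_base + TILE_SIZE  # assumes the two footprints are adjacent
    if which == 1:
        base = tile2_base if meta.header is Header.MTT_RELOCATED else tile1_base
        return base, meta.ct1_size_mals * MAL
    if which == 2:
        if meta.header is Header.NORMAL:
            return tile2_base, meta.ct2_size_mals * MAL
        return tile1_base + meta.ct2_offset, meta.ct2_size_mals * MAL
    raise ValueError("which must be 1 or 2")


meta = PairMetadata(Header.MTT, ct1_size_mals=2, ct2_size_mals=1, ct2_offset=112)
print(locate(meta, 1, 0x1000), locate(meta, 2, 0x1000))  # (4096, 128) (4208, 64)
```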
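Decision block 810 can be approximated by a greedy pass over the tiles of a group. This sketch assumes, as the metadata layout described above suggests, that tiles are combined only while their summed compressed length fits in a single uncompressed tile footprint (K = 256 bytes here), and it caps each transaction at `max_tiles` tiles (two by default, matching the two-tile algorithm). The function name and the greedy policy are assumptions, not the disclosed implementation.

```python
from typing import List

TILE_SIZE = 256  # hypothetical uncompressed tile footprint K, in bytes


def plan_transactions(lengths: List[int], tile_size: int = TILE_SIZE,
                      max_tiles: int = 2) -> List[List[int]]:
    """Greedily split a group's compressed tile lengths into transactions.

    Consecutive tiles join the current transaction while their summed compressed
    length still fits in one uncompressed tile footprint and the tile count stays
    within max_tiles. Returns lists of tile indices; any list longer than one is
    a multi-tile transaction.
    """
    plan: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, n in enumerate(lengths):
        if current and (used + n > tile_size or len(current) == max_tiles):
            plan.append(current)  # close the current transaction
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        plan.append(current)
    return plan


print(plan_transactions([112, 56, 33, 177]))               # [[0, 1], [2, 3]]
print(plan_transactions([112, 56, 33, 177], max_tiles=4))  # [[0, 1, 2], [3]]
```

Because the 256-byte footprint is itself a multiple of the 64-byte MAL, checking the unpadded sum here gives the same answer as checking the MAL-padded transaction length.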
At decision block 820, the method 800 may recognize whether a compressed tile previously stored in a memory space associated with a MTT transaction requires updating. If not, then the “NO” branch is followed and the method 800 returns. If updating of compressed data previously aggregated into a MTT transaction is required, then the “YES” branch is followed to decision block 825.
At decision block 825, if the modified compressed tile is equal in size to, or smaller than, the original compressed tile, then the “NO” branch is followed to block 830 and the modified data is written over the original compressed data. If, however, the modified compressed tile is larger than the original compressed tile, then the “YES” branch is followed to block 835.
At block 835, the modified data may be written into an unused memory location associated with a second transaction that was previously not needed as a result of the multi-tile aggregation. The method 800 continues to block 840 and the metadata associated with the MTT transaction is updated to reflect that the original data of the compressed tile stored per the first transaction is invalid and that the “fresh” modified data for the compressed tile may be found in the memory space associated with the previously unused, second transaction. The method 800 returns.
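The update path of method 800 (decision block 825 through block 840) is sketched below for the first compressed tile of a two-tile MTT. The `Ct1Slot` bookkeeping, the string return values and the use of the original CT1 length as the in-place capacity are hypothetical simplifications; once CT1 has been relocated, the whole second footprint is treated as its capacity.

```python
from dataclasses import dataclass

MAL = 64         # hypothetical minimum access length, in bytes
TILE_SIZE = 256  # hypothetical uncompressed tile footprint K, in bytes


@dataclass
class Ct1Slot:
    """Hypothetical bookkeeping for the first compressed tile (CT1) of a two-tile MTT."""
    capacity: int            # bytes CT1 may occupy without overwriting CT2
    size_mals: int           # metadata: CT1 read size in MALs
    relocated: bool = False  # metadata: CT1 now lives in the second tile's footprint


def update_ct1(slot: Ct1Slot, new_length: int) -> str:
    """Place a modified CT1 following decision block 825 and blocks 830-840."""
    if not 0 < new_length <= TILE_SIZE:
        raise ValueError("compressed tile length must be in 1..TILE_SIZE")
    slot.size_mals = (new_length + MAL - 1) // MAL
    if new_length <= slot.capacity:
        return "overwrite"  # block 830: write over CT1 where it currently lives
    # Block 835: write into the footprint left unused by the aggregation.
    # Block 840: metadata marks the old copy invalid and points to the new one.
    slot.relocated = True
    slot.capacity = TILE_SIZE
    return "relocate"


slot = Ct1Slot(capacity=112, size_mals=2)
print(update_ct1(slot, 100), update_ct1(slot, 150), slot)
# prints: overwrite relocate Ct1Slot(capacity=256, size_mals=3, relocated=True)
```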
In general, multi-tile transaction (“MTT”) aggregator module 101 may be formed from hardware and/or firmware and may be responsible for combining multiple compressed tiles of an image frame into a single transaction. It is envisioned that write bursts to a DRAM memory 115 (generally labeled 112 in the
As illustrated in
As further illustrated in
The CPU 110 may also be coupled to one or more internal, on-chip thermal sensors 157A as well as one or more external, off-chip thermal sensors 157B. The on-chip thermal sensors 157A may comprise one or more proportional to absolute temperature (“PTAT”) temperature sensors that are based on a vertical PNP structure and are usually dedicated to complementary metal oxide semiconductor (“CMOS”) very large-scale integration (“VLSI”) circuits. The off-chip thermal sensors 157B may comprise one or more thermistors. The thermal sensors 157 may produce a voltage drop that is converted to digital signals with an analog-to-digital converter (“ADC”) controller (not shown). However, other types of thermal sensors 157 may be employed.
The touch screen display 132, the video port 138, the USB port 142, the camera 148, the first stereo speaker 154, the second stereo speaker 156, the microphone 160, the FM antenna 164, the stereo headphones 166, the RF switch 170, the RF antenna 172, the keypad 174, the mono headset 176, the vibrator 178, thermal sensors 157B, the PMIC 180 and the power supply 188 are external to the on-chip system 102. It will be understood, however, that one or more of these devices depicted as external to the on-chip system 102 in the exemplary embodiment of a PCD 100 in
In a particular aspect, one or more of the method steps described herein may be implemented by executable instructions and parameters stored in the memory 112 or as part of the MTT aggregator module 101 and/or the image CODEC module 113. Further, the MTT aggregator module 101, the image CODEC module 113, the memory 112, the instructions stored therein, or a combination thereof may serve as a means for performing one or more of the method steps described herein.
As the processing engines 201 generate data transfers for transmission via bus 211 to cache memory 116 and/or DRAM memory 115, the image CODEC module 113 may compress tile-sized units of an image frame, as would be understood by one of ordinary skill in the art. Subsequently, the MTT aggregator module 101 may seek to combine two or more of the compressed tiles into a single transaction and work with the memory controller 114 to make more efficient use of DRAM 115 capacity and bus 211 bandwidth. Using the metadata associated with a MTT transaction, the memory controller 114 may service access requests and/or write requests from the processing engines 201 for data stored in DRAM 115 according to an MTT transaction.
The CPU 110 may receive commands from the MTT aggregator module(s) 101 that may comprise software and/or hardware. If embodied as software, the module(s) 101 comprise instructions that are executed by the CPU 110 that issues commands to other application programs being executed by the CPU 110 and other processors.
The first core 222, the second core 224 through to the Nth core 230 of the CPU 110 may be integrated on a single integrated circuit die, or they may be integrated or coupled on separate dies in a multiple-circuit package. Designers may couple the first core 222, the second core 224 through to the Nth core 230 via one or more shared caches and they may implement message or instruction passing via network topologies such as bus, ring, mesh and crossbar topologies.
Bus 211 may include multiple communication paths via one or more wired or wireless connections, as is known in the art and described above in the definitions. The bus 211 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the bus 211 may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.
When the logic used by the PCD 100 is implemented in software, as is shown in
In the context of this document, a computer-readable medium is an electronic, magnetic, optical, or other physical device or means that may contain or store a computer program and data for use by or in connection with a computer-related system or method. The various logic elements and data stores may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The computer-readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random-access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). Note that the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
In an alternative embodiment, where one or more of the startup logic 250, management logic 260 and perhaps the MTT interface logic 270 are implemented in hardware, the various logic may be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
The memory 112 is a non-volatile data storage device such as a flash memory or a solid-state memory device. Although depicted as a single device, the memory 112 may be a distributed memory device with separate data stores coupled to the digital signal processor 110 (or additional processor cores).
The startup logic 250 includes one or more executable instructions for selectively identifying, loading, and executing a select program for aggregating and managing multi-tile transactions. The startup logic 250 may identify, load and execute a select MTT program. An exemplary select program may be found in the program store 296 of the embedded file system 290. The exemplary select program, when executed by one or more of the core processors in the CPU 110, may operate in accordance with one or more signals provided by the MTT aggregator module 101 to create and manage multi-tile transactions.
The management logic 260 includes one or more executable instructions for terminating an MTT program on one or more of the respective processor cores, as well as selectively identifying, loading, and executing a more suitable replacement program. The management logic 260 is arranged to perform these functions at run time or while the PCD 100 is powered and in use by an operator of the device. A replacement program may be found in the program store 296 of the embedded file system 290.
The interface logic 270 includes one or more executable instructions for presenting, managing and interacting with external inputs to observe, configure, or otherwise update information stored in the embedded file system 290. In one embodiment, the interface logic 270 may operate in conjunction with manufacturer inputs received via the USB port 142. These inputs may include one or more programs to be deleted from or added to the program store 296. Alternatively, the inputs may include edits or changes to one or more of the programs in the program store 296. Moreover, the inputs may identify one or more changes to, or entire replacements of one or both of the startup logic 250 and the management logic 260. By way of example, the inputs may include a change to the number of compressed tiles that may be combined into a single MTT transaction and/or to the definition of a group of tiles eligible for aggregation into a single MTT transaction.
The interface logic 270 enables a manufacturer to controllably configure and adjust an end user's experience under defined operating conditions on the PCD 100. When the memory 112 is a flash memory, one or more of the startup logic 250, the management logic 260, the interface logic 270, the application programs in the application store 280 or information in the embedded file system 290 may be edited, replaced, or otherwise modified. In some embodiments, the interface logic 270 may permit an end user or operator of the PCD 100 to search, locate, modify or replace the startup logic 250, the management logic 260, applications in the application store 280 and information in the embedded file system 290. The operator may use the resulting interface to make changes that will be implemented upon the next startup of the PCD 100. Alternatively, the operator may use the resulting interface to make changes that are implemented during run time.
The embedded file system 290 includes a hierarchically arranged memory management store 292. In this regard, the file system 290 may include a reserved section of its total file system capacity for the storage of information for the configuration and management of the various MTT algorithms used by the PCD 100.
Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described. However, the invention is not limited to the order of the steps described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps may be performed before, after, or in parallel (substantially simultaneously) with other steps without departing from the scope and spirit of the invention. In some instances, certain steps may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example. Therefore, disclosure of a particular set of program code instructions or detailed hardware devices is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the drawings, which may illustrate various process flows.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 62/293,646, filed on Feb. 10, 2016, entitled, “SYSTEM AND METHOD FOR MULTI-TILE DATA TRANSACTIONS IN A SYSTEM ON A CHIP,” and to U.S. Provisional Patent Application Ser. No. 62/495,577, filed on Jun. 23, 2016, entitled, “SYSTEM AND METHOD FOR MULTI-TILE DATA TRANSACTIONS IN A SYSTEM ON A CHIP.” The entire contents of these applications are hereby incorporated by reference.
Number | Date | Country
---|---|---
62293646 | Feb 2016 | US
62495577 | Jun 2016 | US