The present disclosure relates generally to data storage, and more particularly, to a system and method for data warehouse and fine granularity scheduling for a System on Chip.
System on Chip (SoC) bulk memory (e.g., Level 3 (L3) RAM) and off-chip memory (e.g., double data rate (DDR) memory) found in most wireless communication devices are often used very inefficiently, with much of the memory sitting idle with old data that will not be reused, or storing data that is double- or triple-buffered to simplify processing access to tables and arrays. This can lead to significant waste of power and chip area. Some existing technologies employ a simple global memory map of all available bulk memory and software organization of data, such as static mapping. Hand optimization of memory usage via “overlays” of data is employed in some real-time embedded systems; however, such techniques are difficult and time-consuming to create, and have poor code reuse properties. Some Big Data servers employ various memory management techniques in file servers; however, these techniques are usually complicated and have large overhead requirements that make them unsuitable for an SoC.
According to one embodiment, there is provided a data warehouse. The data warehouse includes a memory and a controller disposed on a substrate that is associated with a System on Chip (SoC). The controller is operatively coupled to the memory. The controller is configured to receive data from a first intellectual property (IP) block executing on the SoC; store the data in the memory; and in response to a trigger condition, output at least a portion of the stored data to the SoC for use by a second IP block. An organization scheme for the stored data in the memory is abstracted with respect to the first and second IP blocks.
According to another embodiment, there is provided a method. The method includes receiving, by a controller of a data warehouse, data from a first IP block executing on an SoC, the controller disposed on a substrate, the substrate different from the SoC. The method also includes storing, by the controller, the data in a memory disposed on the substrate, the memory operatively coupled to the controller. The method further includes, in response to a trigger condition, outputting, by the controller, at least a portion of the stored data to the SoC for use by a second IP block. An organization scheme for the stored data in the memory is abstracted with respect to the first and second IP blocks.
For a more complete understanding of the present disclosure, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, wherein like numbers designate like objects.
To facilitate understanding of this disclosure, it may be helpful to distinguish between ‘memory’ and ‘storage’, as the terms are used herein. Memory is a physical construct that has no associated semantics. That is, memory has no awareness of what data is stored in it. Memory may be used by multiple different software applications for storage of data. In contrast, storage is associated with indicators, pointers, labels, and the like, that provide context for the stored data, including relationships between memory addresses, and so on.
In current systems that utilize System on Chip (SoC) technology, data that is not going to be used for a while may not be stored on-chip, but instead may be stored off-chip in long-term DDR memory. However, some systems are beginning to encounter significant challenges in terms of DDR memory access. There are at least two factors driving this. A first factor is the physical analog interface from the SoC chip to the DDR memory. Although SoC chips continue to improve according to Moore's law, there has not been a similar improvement in the analog interface to the DDR memory. Thus, the interface is becoming more and more of a bottleneck. A second factor is that, in some systems, many masters on the SoC drive access to the DDR memory (which acts as a slave component to the different masters). That is, there may be a large number of different intellectual property (IP) blocks with software components (e.g., software applications), hardware components, or both, that need to store data in the DDR memory or retrieve data from it. In many systems, these IP blocks do not work together under a coordinated access scheme. Each IP block may carve out oversized sections of the DDR memory, which leads to unused or inefficiently used memory. Also, the pattern in which the IP blocks access memory is uncoordinated and may lead to bursts of heavy data access followed by periods of no access. This is an inefficient use of the limited DDR access bandwidth.
The present disclosure describes many technical advantages over conventional memory management techniques. For example, one technical advantage is memory management and processing that is performed close to the DDR memory itself. Another technical advantage is simplified digital signal processor (DSP) access to the DDR memory. Another technical advantage is efficient bulk storage with lower memory-access overhead. Another technical advantage is better code reusability at the software application level, due to the local management of data at the DDR memory. And another technical advantage is the ability of simple hardware accelerators (HACs) to access complex data structures stored in the DDR memory.
In this example, the communication system 100 includes user equipment (UE) 110a-110c, radio access networks (RANs) 120a-120b, a core network 130, a public switched telephone network (PSTN) 140, the Internet 150, and other networks 160. While certain numbers of these components or elements are shown in
The UEs 110a-110c are configured to operate and/or communicate in the system 100. For example, the UEs 110a-110c are configured to transmit and/or receive wireless signals or wired signals. Each UE 110a-110c represents any suitable end user device and may include (or may be referred to as) a user equipment/device (UE), wireless transmit/receive unit (WTRU), mobile station, fixed or mobile subscriber unit, pager, cellular telephone, personal digital assistant (PDA), smartphone, laptop, computer, touchpad, wireless sensor, or consumer electronics device.
The RANs 120a-120b here include base stations 170a-170b, respectively. Each base station 170a-170b is configured to wirelessly interface with one or more of the UEs 110a-110c to enable access to the core network 130, the PSTN 140, the Internet 150, and/or the other networks 160. For example, the base stations 170a-170b may include (or be) one or more of several well-known devices, such as a base transceiver station (BTS), a Node-B (NodeB), an evolved NodeB (eNodeB), a Home NodeB, a Home eNodeB, a site controller, an access point (AP), a wireless router, or a server, router, switch, or other processing entity with a wired or wireless network connection.
In the embodiment shown in
The base stations 170a-170b communicate with one or more of the UEs 110a-110c over one or more air interfaces 190 using wireless communication links. The air interfaces 190 may utilize any suitable radio access technology.
It is contemplated that the system 100 may use multiple channel access functionality, including such schemes as described above. In particular embodiments, the base stations and UEs may implement LTE, LTE-A, and/or LTE-B. Of course, other multiple access schemes and wireless protocols may be utilized.
The RANs 120a-120b are in communication with the core network 130 to provide the UEs 110a-110c with voice, data, application, Voice over Internet Protocol (VoIP), or other services. Understandably, the RANs 120a-120b and/or the core network 130 may be in direct or indirect communication with one or more other RANs (not shown). The core network 130 may also serve as a gateway access for other networks (such as PSTN 140, Internet 150, and other networks 160). In addition, some or all of the UEs 110a-110c may include functionality for communicating with different wireless networks over different wireless links using different wireless technologies and/or protocols.
Although
As shown in
The UE 110 also includes at least one transceiver 202. The transceiver 202 is configured to modulate data or other content for transmission by at least one antenna 204. The transceiver 202 is also configured to demodulate data or other content received by the at least one antenna 204. Each transceiver 202 includes any suitable structure for generating signals for wireless transmission and/or processing signals received wirelessly. Each antenna 204 includes any suitable structure for transmitting and/or receiving wireless signals. One or multiple transceivers 202 could be used in the UE 110, and one or multiple antennas 204 could be used in the UE 110. Although shown as a single functional unit, a transceiver 202 could also be implemented using at least one transmitter and at least one separate receiver.
The UE 110 further includes one or more input/output devices 206. The input/output devices 206 facilitate interaction with a user. Each input/output device 206 includes any suitable structure for providing information to or receiving information from a user, such as a speaker, microphone, keypad, keyboard, display, or touch screen.
In addition, the UE 110 includes at least one memory 208. The memory 208 stores instructions and data used, generated, or collected by the UE 110. For example, the memory 208 could store software or firmware instructions executed by the processing unit(s) 200 and data used by the processing unit(s) 200. Each memory 208 includes any suitable volatile and/or non-volatile storage and retrieval device(s). Any suitable type of memory may be used, such as random access memory (RAM), read only memory (ROM), hard disk, optical disc, subscriber identity module (SIM) card, memory stick, secure digital (SD) memory card, and the like. In accordance with the embodiments described herein, the memory 208 may comprise DDR memory, L3 memory, any other suitable memory, or a combination of two or more of these. Together, the memory 208 and at least one processing unit 200 could be implemented as a data warehouse, as described in greater detail below. The memory 208 and the at least one processing unit 200 associated with the data warehouse may be disposed in close proximity on a substrate, such as a chip. In particular embodiments, the memory 208 and the at least one processing unit 200 associated with the data warehouse may be part of the SoC.
As shown in
Each transmitter 252 includes any suitable structure for generating signals for wireless transmission to one or more UEs or other devices. Each receiver 254 includes any suitable structure for processing signals received wirelessly from one or more UEs or other devices. Although shown as separate components, at least one transmitter 252 and at least one receiver 254 could be combined into a transceiver. Each antenna 256 includes any suitable structure for transmitting and/or receiving wireless signals. While a common antenna 256 is shown here as being coupled to both the transmitter 252 and the receiver 254, one or more antennas 256 could be coupled to the transmitter(s) 252, and one or more separate antennas 256 could be coupled to the receiver(s) 254. Each memory 258 includes any suitable volatile and/or non-volatile storage and retrieval device(s). In accordance with the embodiments described herein, each memory 258 may comprise DDR memory, L3 memory, bulk on-chip memory, any other suitable memory, or a combination of two or more of these. Together, the memory 258 and at least one processing unit 250 could be implemented as a data warehouse, as described in greater detail below. The memory 258 and the at least one processing unit 250 associated with the data warehouse may be disposed in close proximity on a substrate, such as a chip. In particular embodiments, the memory 258 and the at least one processing unit 250 associated with the data warehouse may be part of the SoC.
Additional details regarding the UEs 110 and the base stations 170 are known to those of skill in the art. As such, these details are omitted here. It should be appreciated that the devices illustrated in
As shown in
Data in DDR memory (e.g., data arrays, buffers, tables, etc.) is generally moved around the system in bulk, moving from processing to storage and back. There are common aspects for some of the DDR data movements when considered from the point of view of the physical (PHY) layer. Typically, the data will not be changed, i.e., the data will be read out of memory the same as it is written into memory. The total amount of data to be stored is large, and the stored data has the same or a similar type or data structure (e.g., usually the data is separated by “user” or UE). Typically, there are few or no real-time requirements. Every time the memory is accessed, only a small part of the data is visited, either by request (i.e., event driven) or periodically. If the data is fetched by request, typically it is known in advance when the data is needed (e.g., through MAC/RRC, it is known which user's data will be needed for the next one or several subframes).
From the description above, it can be seen that there are similarities between DDR memory access and how a commercial or industrial warehouse operates. The “goods” (i.e., data) are shipped from many sources to a “warehouse” (i.e., DDR memory) for storage and are packed in “boxes” (i.e., data blocks or data records). The warehouse may have multiple “floors” (i.e., subframes) or one floor. In each floor, the boxes are organized in different “rows” (i.e., users/UEs, or certain processes of a user/UE). Whenever the goods are packed in the boxes and put in the warehouse, the locations of the boxes are tracked in a warehouse inventory log (e.g., a register).
In a warehouse, it is generally known when the boxes will be moved or sent to their destination. However, the sender and receiver generally do not know the exact location of their box in the warehouse. Instead, they may have a tracking label that a warehouse management system uses to store and find the box.
Likewise, data in DDR memory can be stored and tracked using a “data warehouse” analogy. Data blocks (e.g., data arrays, buffers, tables, etc., which are analogous to warehouse “boxes”) come in different sizes and are stored as a unit in the memory (the “warehouse”) by the data warehouse management system. Each data block is given a tracking number for retrieval purposes. The time to move (i.e., output) data in DDR memory is predictable, e.g., either by request from an IP block, periodically according to a schedule, or in response to another trigger condition, such as back pressure or a lack of space in the memory to store newly received data. With the help of the register, the data can be pre-arranged and output in advance. For example, the data from different users can be packed, arranged, or assembled beforehand, and the pre-arranged data can be delivered to the “consumer” (e.g., an IP block or software application that uses the data) in advance or at a designated time. Since the required data is pre-arranged in the DDR memory module and there are few interactions between the DDR memory module and other cluster nodes, the efficiencies are higher, both from the perspective of cluster node scheduling and of transmission (e.g., the data is transmitted in bursts). Thus, a “data warehouse” parallel can be used for DDR memory access.
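By way of illustration, the following minimal C sketch shows how such a tracking scheme might be organized. The names (dw_store, dw_retrieve, dw_entry), the sizes, and the simple bump allocation are illustrative assumptions rather than a required implementation; they show only how a producer and a consumer can exchange data through tracking numbers while the physical DDR layout stays hidden from both.

```c
#include <stdint.h>
#include <string.h>

#define DW_MAX_BOXES 256    /* illustrative inventory capacity        */
#define DW_DDR_BYTES 65536  /* illustrative stand-in for a DDR region */

/* Hypothetical inventory entry (one line of the "register"):
 * "row" = user, "floor" = subframe, in the warehouse analogy. */
typedef struct {
    uint32_t ddr_offset, length;
    uint16_t user_id, subframe;
    uint8_t  in_use;
} dw_entry;

static uint8_t  dw_ddr[DW_DDR_BYTES];  /* the "warehouse"             */
static dw_entry dw_log[DW_MAX_BOXES];  /* the inventory log           */
static uint32_t dw_next_free;          /* simple bump allocator       */

/* Store a box; returns a tracking id (the log index) or -1 if full.
 * The caller never learns the physical DDR offset. */
int dw_store(const void *data, uint32_t len, uint16_t user, uint16_t sf)
{
    for (int id = 0; id < DW_MAX_BOXES; id++) {
        if (!dw_log[id].in_use && dw_next_free + len <= DW_DDR_BYTES) {
            dw_log[id] = (dw_entry){ dw_next_free, len, user, sf, 1 };
            memcpy(&dw_ddr[dw_next_free], data, len);
            dw_next_free += len;
            return id;                 /* the tracking number */
        }
    }
    return -1;
}

/* Retrieve (and release) a box by tracking id; returns bytes copied. */
int dw_retrieve(int id, void *dst, uint32_t max_len)
{
    if (id < 0 || id >= DW_MAX_BOXES || !dw_log[id].in_use ||
        dw_log[id].length > max_len)
        return -1;
    memcpy(dst, &dw_ddr[dw_log[id].ddr_offset], dw_log[id].length);
    dw_log[id].in_use = 0;             /* box leaves the warehouse */
    return (int)dw_log[id].length;
}
```

Because callers hold only tracking numbers, the warehouse remains free to rearrange, pack, or pre-stage the underlying bytes at any time, which is what enables the pre-arrangement described above.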
In accordance with the above description, embodiments of this disclosure provide systems and methods to abstract the storage and retrieval of data (“data warehousing”). The disclosed embodiments also allow large blocks to be automatically split up during transport to minimize double-buffering overhead (“fine granularity scheduling”). By using the “data warehouse” concept, the digital signal processor (DSP) is less involved in the data movements from the DDR to the DSP cluster. Furthermore, the data is managed locally at the DDR interface instead of at the DSP. Data is already “pushed” to the DSP cluster memory when the DSP cluster is ready to process the data.
Certain embodiments can include hardware that is physically connected close to the DDR, so the access latency to the DDR is small. The embodiments provide a single, centralized point of organization and management that all data masters can go through to access data in the DDR. In certain embodiments, the system scheduler (e.g., the MAC scheduler) may know in advance when data needs to be moved and can retrieve the data into a “holding area” of memory close to the DDR interface for rapid retrieval.
The disclosed embodiments are described with respect to at least two components: a controller and the DDR memory. The controller interfaces with the SoC architecture. In particular, the controller and the DDR memory may be disposed on the same substrate, which may include the SoC. In other embodiments, the controller and the DDR memory may be disposed on a substrate (e.g., a chip) that is separate from the SoC. The controller performs operations such as sending and receiving FLITs (flow control digits), segmenting packets into FLITs, and preparing a header for each FLIT. The controller is also responsible for operations such as generating and terminating back pressure credit messages, and user management functions.
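As an illustration of the segmentation step, the following C sketch splits a packet into FLITs, each carrying its own header. The 16-byte payload size and the header fields (packet_id, seq, last, len) are hypothetical choices made for the example; an actual controller would use the FLIT format defined by its interconnect.

```c
#include <stdint.h>
#include <string.h>

#define FLIT_PAYLOAD 16  /* illustrative payload size, not a mandated value */

/* Hypothetical FLIT header: just enough to reassemble the packet. */
typedef struct {
    uint16_t packet_id;  /* which packet this FLIT belongs to            */
    uint16_t seq;        /* position of this FLIT within the packet      */
    uint8_t  last;       /* set on the final FLIT of the packet          */
    uint8_t  len;        /* valid payload bytes (< FLIT_PAYLOAD on tail) */
} flit_header;

typedef struct {
    flit_header hdr;
    uint8_t     payload[FLIT_PAYLOAD];
} flit;

/* Segment a packet into FLITs; returns the number of FLITs written,
 * or -1 if the output array is too small. */
int segment_packet(uint16_t packet_id, const uint8_t *pkt, uint32_t pkt_len,
                   flit *out, uint32_t max_flits)
{
    uint32_t n = (pkt_len + FLIT_PAYLOAD - 1) / FLIT_PAYLOAD;  /* ceiling */
    if (n > max_flits)
        return -1;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t off   = i * FLIT_PAYLOAD;
        uint32_t chunk = pkt_len - off < FLIT_PAYLOAD ? pkt_len - off
                                                      : FLIT_PAYLOAD;
        out[i].hdr = (flit_header){ packet_id, (uint16_t)i,
                                    i == n - 1, (uint8_t)chunk };
        memcpy(out[i].payload, pkt + off, chunk);
    }
    return (int)n;
}
```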
Although the disclosed embodiments are described primarily with respect to an LTE system, it will be understood that these embodiments can also be applied to a UMTS (Universal Mobile Telecommunications System) system. Likewise, while the disclosed embodiments are described primarily with respect to SoC architecture, the systems and methods of the disclosed embodiments are also applicable to other architectures.
Before describing how data is managed, it is helpful to first describe how data can be stored in a data warehouse in DDR memory.
In
Since UL HARQ is synchronous, the redundancy version (RV) data can be packed in advance and “pushed out” in a synchronized fashion. There are two options for handling the retransmission data: (1) keep all the RV data, or (2) keep only the combined data. For Option 1, the data from all the redundancy versions will be used for HARQ combining and decoding (e.g., incremental redundancy, or ‘IR’). That is, every time the HARQ data is needed from the DDR memory, all of the stored RV data will be output. Different methods for storing all RV data for Option 1 are shown in
Option 1: Keep All RV Data
In the data storage scheme 400b in
The data storage schemes 400a-400b have the same or similar memory requirements. Considered from the point of view of timing, the data storage scheme 400a is very straightforward. However, the data storage scheme 400b may have a smaller user list and time table, and thus be easier to manage. In some embodiments, if all RV data is kept, the data storage scheme 400b may be advantageous for HARQ DDR storage.
Option 2: Keep Only Combined Data
It may also be possible that only the combined RV data is stored (for example, in Chase combining or IR). In
As described above with respect to
In one aspect of operation, the data warehouse first determines the size of the data to be stored. Based on the data size, the data warehouse can determine how many buffers are needed for each UE. For example, in one embodiment, 128 bytes are chosen for the buffer size. Of course, in other embodiments, the buffer size can be larger or smaller, depending on the system configuration. It is assumed that 100 bytes are to be stored for UE0, 200 bytes are to be stored for UE1, and 1200 bytes are to be stored for UE2. Based on a 128-byte buffer size, the stored data will use 1, 2, and 10 buffers, respectively. Thus, the data warehouse allocates one buffer (buffer 0) to UE0, two buffers (buffers 1 and 2) to UE1, and ten buffers (buffers 3 through 12) to UE2. Based on the allocated buffers, the data warehouse will create the user table 501 or the user table 502. The user table 501 includes the number of allocated buffers (i.e., 1, 2, or 10) for each UE. In contrast, the user table 502 includes the buffer number of the starting buffer (i.e., 0, 1, or 3) for each UE. Each user table 501-502 also includes the word count for each user.
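The allocation just described reduces to a ceiling division plus a running buffer index. The following C sketch builds both user-table variants; the function and field names are illustrative, not part of the disclosure.

```c
#include <stdint.h>

#define BUF_SIZE 128  /* example buffer size from the text */

/* Buffers needed for one UE: ceiling of data size over buffer size. */
static uint32_t buffers_needed(uint32_t bytes)
{
    return (bytes + BUF_SIZE - 1) / BUF_SIZE;
}

/* Build both user-table variants for a set of UEs.
 * count_table[u] holds the number of buffers for UE u (table 501 style);
 * start_table[u] holds the first buffer number for UE u (table 502 style);
 * word_count[u]  holds the stored byte count for UE u (both tables). */
void build_user_tables(const uint32_t *bytes_per_ue, uint32_t num_ues,
                       uint32_t *count_table, uint32_t *start_table,
                       uint32_t *word_count)
{
    uint32_t next_buffer = 0;
    for (uint32_t u = 0; u < num_ues; u++) {
        uint32_t n = buffers_needed(bytes_per_ue[u]);
        count_table[u] = n;
        start_table[u] = next_buffer;
        word_count[u]  = bytes_per_ue[u];
        next_buffer   += n;
    }
}
/* With bytes_per_ue = {100, 200, 1200}, this yields buffer counts
 * {1, 2, 10} and starting buffers {0, 1, 3}, matching the example. */
```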
The data and the user table 501 or 502 can be stored for eight subframes. At the seventh subframe (or the beginning of the eighth subframe), the data warehouse can send out the data for the first subframe. Based on the user table of the first subframe, the data warehouse can pre-arrange the data for the first subframe and send it out to the DSP cluster. Once the data is sent out, that subframe's user table and data in the DDR memory will not be used anymore. After the DSP cluster processes the HARQ data and writes the new HARQ data to the DDR memory, the data warehouse can overwrite the old data and create a new user table for the current subframe.
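This eight-subframe rotation behaves like a ring buffer indexed by subframe number. The following C sketch shows only the reuse pattern, with the actual user tables and data elided; the structure and helper names are hypothetical.

```c
#include <stdint.h>

#define NUM_SUBFRAMES 8  /* data and user tables are kept for 8 subframes */

/* Hypothetical per-subframe slot; the real user table and data buffers
 * live in DDR and are elided here. */
typedef struct {
    int holds_data;  /* nonzero while an old subframe's data is pending */
} subframe_slot;

static subframe_slot ring[NUM_SUBFRAMES];

/* Called once per subframe with a monotonically increasing counter n.
 * The slot filled NUM_SUBFRAMES ticks ago is pushed out to the DSP
 * cluster just before the new subframe's data overwrites it. */
void on_subframe(uint32_t n)
{
    uint32_t slot = n % NUM_SUBFRAMES;
    if (ring[slot].holds_data) {
        /* pre-arrange and send the old data for this slot, then free it */
        ring[slot].holds_data = 0;
    }
    /* store the new data and a new user table into the freed slot */
    ring[slot].holds_data = 1;
}
```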
In some systems, the data source 601 and the destination 602 use data in different quantities. For example, the data source 601 may create the data 605 for the destination 602 in 1000-kilobyte blocks. However, the destination 602 may consume and process the data 605 in smaller-sized blocks (e.g., tens of kilobytes). Thus, the data warehouse 600 can receive and store the large blocks of data 605 from the data source 601, and then provide the data 605 in smaller blocks to the destination 602. In particular, the data source 601 may send the data 605 to the data warehouse 600 as complete “boxes” each containing 1000 KB of data 605. The data warehouse 600 sets up each box for fine granularity scheduling during storage. Later, upon receipt of a request for data 605 for the destination 602, the data warehouse 600 divides or separates a 1000 KB box of data into smaller boxes (e.g., tens of kilobytes each), and sends one or more of the smaller boxes to the destination 602.
Thus, the data warehouse 600 abstracts the source 601 and destination 602 with respect to each other, and provides a data “interface” between the source 601 and destination 602, which may not be able to communicate directly with each other. This can dramatically reduce buffering in the DSP cluster and the HAC.
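A minimal C sketch of the splitting step follows. It hands out consumer-sized views into a stored box without copying; the “piece” terminology and the function name are illustrative, and a real data warehouse would issue DMA transfers rather than pointers.

```c
#include <stdint.h>

/* Return a read-only view of piece `index` of a stored box.  The source
 * wrote box_len bytes as a single unit; the destination consumes up to
 * piece_len bytes at a time.  Returns the number of valid bytes in the
 * piece, or 0 when index is past the end of the box. */
uint32_t next_piece(const uint8_t *box, uint32_t box_len,
                    uint32_t piece_len, uint32_t index,
                    const uint8_t **piece_out)
{
    uint32_t off = index * piece_len;
    if (off >= box_len)
        return 0;                   /* no more pieces */
    *piece_out = box + off;         /* a view into the box, no copy */
    uint32_t remain = box_len - off;
    return remain < piece_len ? remain : piece_len;
}
/* Example: a 1000 KB box consumed in 50 KB pieces yields 20 pieces,
 * i.e., one large box split into twenty smaller boxes. */
```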
In
In
In
As shown in
The data warehouse controller 801 manages the input and output of data stored in the DDR memory 806. To optimize the processing, the data warehouse controller 801 programs the DMA 803 to accelerate the movement of data to and from the DDR memory 806. The data warehouse controller 801 can include one or more tables or lists that link boxes of data by user, subframe, or any other logical entity. Data is physically stored in the DDR memory 806 using one or more dynamic buffer management algorithms.
The buffer management unit 805, under control of the data warehouse controller 801, allocates and frees data buffers in the DDR memory 806 so that the memory can be used and reused as required. The cluster interconnect 807 is an interconnect to the remaining portions of the DSP or HAC cluster or the SoC. The cluster interconnect interface module 802 provides a connection between the data warehouse 800 and the DDR memory 806, and provides a connection between the data warehouse 800 and the cluster interconnect 807.
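One plausible realization of the allocate/free behavior of the buffer management unit 805 is a free-list allocator over fixed-size buffers, sketched below in C. The pool size and names are illustrative assumptions; the point is that allocation and release are constant-time, so buffers can be recycled continuously as data flows through the warehouse.

```c
#include <stdint.h>

#define NUM_BUFFERS 1024  /* illustrative pool size */

/* Minimal free-list buffer manager: buffers are fixed-size slots in DDR,
 * chained through a next_free[] array so alloc and free are O(1). */
static int32_t next_free[NUM_BUFFERS];
static int32_t free_head;

void bm_init(void)
{
    for (int32_t i = 0; i < NUM_BUFFERS - 1; i++)
        next_free[i] = i + 1;            /* chain every buffer */
    next_free[NUM_BUFFERS - 1] = -1;     /* end of the list */
    free_head = 0;
}

/* Allocate one buffer; returns its number, or -1 if the pool is empty. */
int32_t bm_alloc(void)
{
    int32_t b = free_head;
    if (b >= 0)
        free_head = next_free[b];
    return b;
}

/* Return a buffer to the pool so the memory can be reused.
 * (Sketch only: b is assumed to be a valid, previously allocated buffer.) */
void bm_free(int32_t b)
{
    next_free[b] = free_head;
    free_head = b;
}
```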
In some embodiments, some or all of the functions or processes of the one or more of the devices are implemented or supported by a computer program that is formed from computer readable program code and that is embodied in a computer readable medium. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory.
It may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation. The term “or” is inclusive, meaning and/or. The phrases “associated with” and “associated therewith,” as well as derivatives thereof, mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like.
While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.