1. Field
The present disclosure relates generally to power savings in multi-core computing systems, and more specifically to synchronizing operation of multiple cores in a multi-core system.
2. Background
With the advent of multiple processors or multiple cores on a single chip (also known as SOCs), processing tasks have been distributed to various processors or cores that specialize in a given function. For instance, some smartphones now comprise a core for OS activities including audio decoding, a core for video decoding, a core for rendering and composing graphical frames, another core for handling WiFi data, and yet another core for telephony. While some cores on multi-core processors operate in parallel or at least partially in parallel, many of them operate sequentially, each operating only once data is received from a preceding core in a sequential chain of cores. When no core in a system or a subsystem is processing data, the cores are able to enter one of various sleep modes where power consumption is reduced, and the longer this idle period, the deeper the sleep mode that can be entered (and hence the more power that can be conserved). When cores operate sequentially, there are only short periods where all cores are idle, and thus the device is not able to enter its deepest modes of sleep.
As an example,
Once the MDP core has informed the apps core that it has processed the data (time t6), no more core activity occurs until t8, when the apps core begins reading the next media frame. So, between t7 and t8 the system can enter a sleep mode, but due to the short nature of this idle period, the system cannot select a very deep sleep mode. A duration of the idle period is determined based on an expected next activity of any one or more cores. A timer typically expires and triggers a next activity, and thus the difference between an expiry timer and the current time gives an idle period. The cores themselves can also enter various sleep modes when they are not in operation. For instance, the apps core can enter a deeper sleep state between t2 and t4, then between t5 and t6. There is therefore a need in the art for systems and methods to enable multi-core systems, where two or more cores operate sequentially, to see longer system and core idle times and thus deeper modes of sleep.
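The idle-period calculation described above, the difference between the next timer expiry and the current time, can be sketched as follows. This is a minimal illustration only: the function name, the microsecond units, and the per-core timer table are assumptions made for the example and do not come from the disclosure.

```python
def idle_period_us(now_us, next_expiry_us_by_core):
    """Return the expected system idle period in microseconds.

    The idle period ends when the earliest pending timer expires, so it is
    the difference between that expiry time and the current time.
    A timer that is already due yields an idle period of zero.
    """
    earliest_expiry = min(next_expiry_us_by_core.values())
    return max(0, earliest_expiry - now_us)


# Hypothetical timer expiries for three cores (values are illustrative).
timers = {"apps": 8000, "dsp": 9500, "mdp": 12000}
idle = idle_period_us(7000, timers)  # 8000 - 7000 = 1000
```

Because the earliest expiry bounds the idle period, a single core with a near-term timer is enough to keep the whole system out of its deeper sleep modes.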
Embodiments disclosed herein address the above stated needs by providing a multi-core system that triggers each of the multiple cores to begin processing at the same time and once per processing period. Cores that have to operate sequentially can be instructed to process different data blocks each processing period, such that they operate sequentially relative to a given data block, but in parallel for a given processing period.
One aspect of the disclosure can be described as a multi-core system comprising a peripheral memory device, a controller, a memory, a first core, and a second core. The peripheral memory device can comprise data to be read and processed. The controller can send a control signal once per processing period. The first core can be coupled to the memory and coupled to the controller so as to receive the control signal and to read a first portion of the data from the peripheral memory device upon receipt of a first instance of the control signal. The first core can further process the first portion of the data and convert it to a processed first portion of the data. The first core can then write the processed first portion of the data to the memory and the first core can further be configured to read a second portion of the data from the peripheral memory device upon receipt of a second instance of the control signal.
Another aspect of the disclosure can be described as a method of operating a multi-core system. The method can include sending a first instance of a control signal to two or more cores of a computing device. The method can further include, upon receiving the first instance of the control signal, reading, via a first of the two or more cores, a first portion of data from a peripheral memory device, processing the first portion of data, and writing the first portion of data to a memory. The method can further include sending a second instance of the control signal to the two or more cores of the computing device. The method can yet further include, upon receiving the second instance of the control signal, reading, via the first of the two or more cores, a second portion of data from the peripheral memory device. The method can also include, upon receiving the second instance of the control signal, reading, via the second of the two or more cores, the first portion of data from the memory.
Yet another aspect of the disclosure can be described as a non-transitory, tangible computer readable storage medium, encoded with processor readable instructions to perform a method for operating a multi-core system. The method can include sending a first instance of a control signal to two or more cores of a computing device. The method can further include, upon receiving the first instance of the control signal, reading, via a first of the two or more cores, a first portion of data from a peripheral memory device, processing the first portion of data, and writing the first portion of data to a memory. The method can further include sending a second instance of the control signal to the two or more cores of the computing device. The method can yet further include, upon receiving the second instance of the control signal, reading, via the first of the two or more cores, a second portion of data from the peripheral memory device. The method can also include, upon receiving the second instance of the control signal, reading, via the second of the two or more cores, the first portion of data from the memory.
The term “processing period” is used herein to mean a fixed period of time during which a single block of data can be sequentially processed, although multiple blocks of data can be processed if multiple cores are operating in parallel.
The term “control signal” is used herein to mean any signal (e.g., an instruction or interrupt, to name two) that triggers two or more cores (or processors) to begin operating at the same time.
The term “peripheral memory device” is used herein to mean any memory component that is read via a peripheral bus of a computing system.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
The term “system idle time” is used herein to mean a period during which no processors are active and hence the system can enter various system level modes of sleep.
The term “core idle time” is used herein to mean a period during which a given core is inactive and hence the core can enter various core level modes of sleep.
The term “data blocks” is used herein to mean any chunk or group of individual pieces of information, such as a frame in video playback. Other examples of data blocks include registers, bit streams, interrupts, and file systems.
By offsetting the operations of sequentially-related cores by one or more processing periods, and triggering processing of multiple cores at the same moment in a given processing period, sequentially-related cores can be operated in parallel thereby creating longer idle periods between an end of activity in a given processing period and the start of activity in a next processing period.
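The benefit described above can be illustrated with a short arithmetic sketch. It assumes a simplified model that is not stated in the disclosure: in sequential operation the cores' active times add up within a processing period, whereas with a common trigger the active time is bounded by the slowest core. The period length and per-core durations below are hypothetical values chosen only for illustration.

```python
def system_idle_ms(period_ms, core_durations_ms, parallel):
    """Return the system idle time remaining in one processing period.

    Sequential operation: cores run one after another, so busy time is the
    sum of the individual durations.
    Parallel operation: all cores start together, so busy time is the
    longest individual duration.
    """
    busy = max(core_durations_ms) if parallel else sum(core_durations_ms)
    return max(0, period_ms - busy)


# Hypothetical numbers: a 33 ms period and three cores busy for 5, 8 and 6 ms.
sequential_idle = system_idle_ms(33, [5, 8, 6], parallel=False)  # 33 - 19 = 14
parallel_idle = system_idle_ms(33, [5, 8, 6], parallel=True)     # 33 - 8 = 25
```

Under these assumptions the contiguous idle window grows from 14 ms to 25 ms per period, which is the kind of extension that can allow a deeper sleep mode to be selected.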
All three cores initiate processing at t1 (e.g., as initiated via a common triggering signal or command), with core activity ending at t6. Because the cores are able to operate in parallel, the idle or sleep period has been extended as compared to the idle or sleep period illustrated in
Achieving the core activity seen in
More specifically, core 1 processes a first data block in processing period 1. Then, core 2 processes this first data block in a next processing period (processing period 2), while core 1 processes a second data block. In processing period 3, core 1 processes a third data block, core 2 processes the second data block, and core 3 processes the first data block.
While core 2 is offset from core 1 and core 3 by one processing period each, and core 1 is offset from core 3 by two processing periods, these offsets are not limiting, but rather can be tailored to a given system. For instance, in an embodiment, two cores can handle a given data block in the same processing period where the cores are not sequentially related to each other. For instance, a second and third core may handle audio and video decoding, respectively, for frame n during a given processing period, while another core handles data reading of frame n+1 and a fourth core handles writing of decoded data to memory for a frame n−1, all during the same processing period. In other words, there are scenarios where two different cores can process the same data block in the same processing period and thus do not have to be offset from each other. Other arrangements of offsets are also envisioned and can be implemented without departing from the scope of this disclosure.
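The one-period offset between sequentially-related cores described above can be sketched as a small schedule generator. The function names and the zero-based core index are assumptions made for this example; the uniform offset of one period per core is only one of the arrangements the disclosure contemplates.

```python
def block_for(core_index, period, offset_per_core=1):
    """Return the 1-based data block a core handles in a 1-based period.

    core_index is zero-based (core 1 -> 0, core 2 -> 1, ...). Each core lags
    the preceding core by offset_per_core processing periods. None means the
    core has no data yet and stays inactive for that period.
    """
    block = period - core_index * offset_per_core
    return block if block >= 1 else None


def schedule(num_cores, num_periods):
    """Return one row per processing period listing each core's data block."""
    return [
        [block_for(core, period) for core in range(num_cores)]
        for period in range(1, num_periods + 1)
    ]


# Three cores over three periods, matching the description above:
# period 1: core 1 -> block 1; cores 2 and 3 inactive
# period 2: core 1 -> block 2; core 2 -> block 1; core 3 inactive
# period 3: core 1 -> block 3; core 2 -> block 2; core 3 -> block 1
rows = schedule(3, 3)
```

The `None` entries in the first rows correspond to the start-up periods in which a downstream core has nothing to process yet.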
One result of the above-noted offsets is that as processing of data blocks begins, there may be processing periods where a core remains inactive for the entirety of one or more processing periods. In
The second aspect of achieving the illustrated parallel processing and hence longer idle times is that when cores are active during a given processing period, they begin processing at or substantially at the same time. Traditionally, cores begin processing as soon as data is ready for processing. So, for sequentially-related cores, such as those illustrated in
Instead, this disclosure suggests modifying a core activation trigger so as to trigger on a control signal rather than on indication that data from a preceding core is ready to be processed. So, a control signal is sent to the cores at some point during a given processing period. In
In the case of core 2, the first of these two requirements is met at time t2, but no control signal has been received at that point. At t1′, core 2 receives a control signal and now has data to process, so it begins processing data block n. Similarly, core 3 does not have data to process until t4″, does not thereafter receive a control signal until t1″, and thus remains inactive until t1″.
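The gating described above, where a core begins processing only when a control signal arrives and it already has data available, can be sketched as a small simulation. The `Core` class, the `run_period` function, and the use of an in-memory queue to stand in for the shared memory are all hypothetical names and simplifications introduced for this example.

```python
from collections import deque


class Core:
    """A core with a queue of data blocks waiting to be processed."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()


def run_period(cores):
    """Simulate one control signal delivered to every core at the same time.

    Which cores are active is decided from the state at signal time, so a
    block written by core i during this period is not picked up by core i+1
    until the next control signal.
    """
    active = [i for i, core in enumerate(cores) if core.inbox]
    log = []
    for i in active:
        block = cores[i].inbox.popleft()
        log.append((cores[i].name, block))
        if i + 1 < len(cores):
            # Hand the processed block to the next core in the chain.
            cores[i + 1].inbox.append(block)
    return log


cores = [Core("core1"), Core("core2"), Core("core3")]
cores[0].inbox.extend(["n", "n+1", "n+2"])  # blocks waiting for the first core
period_1 = run_period(cores)  # only core1 is active
period_2 = run_period(cores)  # core1 and core2 are active
period_3 = run_period(cores)  # all three cores are active
```

Taking the snapshot of active cores before any processing is the design choice that models triggering on the control signal rather than on data readiness.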
As seen, the cores still operate on a given data block in a sequential manner, but are able to operate in parallel in a given processing period. Hence, for the purposes of this disclosure, operation of the cores will be referred to as parallel while the operation of
The control signal can take a variety of forms. For instance, the control signal can be an interrupt (analog or digital), a timer, a register read/write, a shared memory communication, an inter-processor communication method, a file system read/write, an input/output signal, a function call, or an analog wave. The control signal can be generated in hardware or in software controlling hardware. In the embodiments illustrated in
The processing periods and the duration of core activity for each of the cores in
Additionally, this disclosure describes all cores beginning processing for a processing period at the moment that they receive the control signal. However, some variation between cores is possible without straying from the disclosure. For instance, and comparing
Similarly,
Furthermore, it should be understood that this disclosure has so far only discussed systems where each separate core has different functionality and is thus tailored for different tasks. However, in some known instances, the task of processing a given data block has been distributed between multiple cores operating at the same time (see, e.g., Kuroda et al., Multimedia Processors, Proceedings of the IEEE, Vol. 86, No. 6, June 1998). Yet, one will also note that this distribution of processing of a single data block between similar or identical cores does not overlap with the newly disclosed form of parallel processing where different cores perform different types of processing on a given data block and thus must sequentially operate on a given data block.
In a first processing period (e.g., processing period 1 in
In a second processing period (e.g., processing period 2 in
In the third processing period (e.g., processing period 3 in
The peripheral memory device 402 can include any device or component comprising a memory and being accessed by the apps core 404 via a peripheral bus (Universal Serial Bus, Thunderbolt, PCI, PCI Express, Videoport, FireWire). A USB drive, DVD, Blu-ray, iPhone, iPod, iPad, smartphone, and Blu-ray player are just a few non-limiting examples of a peripheral memory device 402 or devices that include a peripheral memory device 402.
The controller 406 can be a hardware component, software module embodied in a non-transitory tangible computer readable medium, firmware, or some combination of the above. The controller 406 can include any component, device, or module configured to send a simultaneous control signal to the two or more cores that initiates parallel processing of one or more of the cores. In particular, each instance of the control signal causes one or more of the two or more cores to begin processing data if a given core has data available to process.
The two or more cores can communicate with each other and with the peripheral memory device and the memory via one or more system busses, peripheral busses, and/or memory buses, to name a few non-limiting examples.
When it is said that the system 400 enters a sleep mode it is meant that any one or more components or subsystems within the system 400 can enter a sleep mode. These include, but are not limited to, the two or more cores 404, 408, 410, buses that connect cores, memory devices (e.g., 412) and memory controllers, buses that connect cores to memory controllers, or buses that connect multiple memory devices (e.g., SD card, DDR, EMMC, etc.), to name a few examples. This can also include lowering a voltage and/or frequency of a bus when cores that use the bus are in a sleep state or lowering the voltage and/or frequency of a memory controller when all cores are in a sleep state.
In a first processing period (e.g., processing period 1 in
In a second processing period (e.g., processing period 2 in
In the third processing period (e.g., processing period 3 in
The peripheral memory device 502 can include any device or component comprising a memory and being accessed by the apps core 504 via a peripheral bus (Universal Serial Bus, Thunderbolt, PCI, PCI Express, Videoport, FireWire). A USB drive, DVD, Blu-ray, iPhone, iPod, iPad, smartphone, and Blu-ray player are just a few non-limiting examples of a peripheral memory device 502 or devices that include a peripheral memory device 502.
The controller 506 can be a hardware component, software module embodied in a non-transitory tangible computer readable medium, firmware, or some combination of the above. The controller 506 can include any component, device, or module configured to send a simultaneous control signal to the apps core 504, the DSP 508, and the MDP 510 that initiates parallel processing of one or more of these three components. In particular, the control signal causes one or more of the apps core 504, DSP 508, or MDP 510 to begin processing data if a given core has data available to process.
The apps core 504 can read data from the peripheral memory device 502 via the system bus. The apps core 504 can write processed data to the memory 512 via the system bus and the memory bus. The DSP 508 can read and write data to the memory 512 via the system bus and the memory bus. The MDP 510 can read and write data to the memory 512 via the system bus and the memory bus.
While
This disclosure uses the terms sleep and/or idle modes without specifying a type of sleep or idle mode. However, one of skill in the art is well aware that such sleep or idle modes can comprise different modes or levels depending on an amount of idle time available. For instance, there can be five levels of system level sleep, sometimes referred to as Levels 1-5. Level 1 enables the system level bus connecting multiple cores (e.g., system bus in
As another example of different modes of sleep, there can be four core level modes of sleep, sometimes referred to as Levels 1-4. In Level 1, a core idles until an interrupt is received. In Level 2, a core frequency is reduced. In Level 3, a core is powered off, but the cache remains on. In Level 4, the deepest sleep mode that a processor can enter, a core and its cache are powered off. In some embodiments, the controller 406, 506 can determine what sleep mode to place one or more cores into.
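The four core level sleep modes described above can be sketched as an enumeration together with a selection routine that picks the deepest level an idle period allows. The level names and the minimum idle thresholds are hypothetical and chosen only for illustration; the disclosure does not specify how long an idle period each level requires.

```python
from enum import IntEnum


class CoreSleep(IntEnum):
    """Core level sleep modes, from shallowest (1) to deepest (4)."""

    WAIT_FOR_INTERRUPT = 1   # core idles until an interrupt is received
    REDUCED_FREQUENCY = 2    # core frequency is reduced
    POWER_OFF_CACHE_ON = 3   # core is powered off, cache remains on
    POWER_OFF_CACHE_OFF = 4  # core and its cache are powered off


# Hypothetical minimum idle time (microseconds) that justifies each level.
MIN_IDLE_US = {
    CoreSleep.WAIT_FOR_INTERRUPT: 0,
    CoreSleep.REDUCED_FREQUENCY: 100,
    CoreSleep.POWER_OFF_CACHE_ON: 1000,
    CoreSleep.POWER_OFF_CACHE_OFF: 5000,
}


def deepest_level(idle_us):
    """Return the deepest sleep level whose threshold fits within idle_us."""
    best = CoreSleep.WAIT_FOR_INTERRUPT
    for level in CoreSleep:  # iterates in definition order, shallow to deep
        if idle_us >= MIN_IDLE_US[level]:
            best = level
    return best
```

With these assumed thresholds, a longer idle period maps to a deeper level, which is why extending the idle period through parallel operation can translate into greater power savings.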
Upon receipt of the control signal, the first core can read a first portion of data from a storage device, for instance, a peripheral memory device. Processing the data involves whatever functionality the first core is responsible for (e.g., the apps core 504 in
Where the first core has previously processed data (e.g., data block n−1), receipt of the control signal can trigger the second core to read, process, and write data previously processed by the first core (e.g., data block n−1), in parallel with the first core's processing of a different block of data (e.g., data block n). The reading by the second core can involve reading the first portion of data from the memory. Processing by the second core can involve processing the first portion of data and this processing depends on the functionality of the second core. For instance, the DSP 508 in
The control signal can be sent by a controller such as controller 406 in
The systems and methods described herein can be implemented in a computer system in addition to the specific physical devices described herein.
Computer system 800 includes at least a processor 801 such as a central processing unit (CPU) or an FPGA to name two non-limiting examples. Cores 404, 408, and 410 in
Processor(s) 801 (or central processing unit(s) (CPU(s))) optionally contains a cache memory unit 802 for temporary local storage of instructions, data, or computer addresses. Processor(s) 801 are configured to assist in execution of computer-readable instructions stored on at least one non-transitory, tangible computer-readable storage medium. Computer system 800 may provide functionality as a result of the processor(s) 801 executing software embodied in one or more non-transitory, tangible computer-readable storage media, such as memory 803, storage 808, storage devices 835, and/or storage medium 836 (e.g., read only memory (ROM)). For instance, the method of operating a multi-core system resulting in the timing charts of
The memory 803 may include various components (e.g., non-transitory, tangible computer-readable storage media) including, but not limited to, a random access memory component (e.g., RAM 804) (e.g., a static RAM “SRAM”, a dynamic RAM “DRAM”, etc.), a read-only component (e.g., ROM 805), and any combinations thereof. ROM 805 may act to communicate data and instructions unidirectionally to processor(s) 801, and RAM 804 may act to communicate data and instructions bidirectionally with processor(s) 801. ROM 805 and RAM 804 may include any suitable non-transitory, tangible computer-readable storage media described below. In some instances, ROM 805 and RAM 804 include non-transitory, tangible computer-readable storage media for carrying out the methods behind the timing charts in
Fixed storage 808 is connected bidirectionally to processor(s) 801, optionally through storage control unit 807. Fixed storage 808 provides additional data storage capacity and may also include any suitable non-transitory, tangible computer-readable media described herein. Storage 808 may be used to store operating system 809, EXECs 810 (executables), data 811, API applications 812 (application programs), and the like. For instance, the storage 808 could be implemented for storage of a duration of the processing period as described in
In one example, storage device(s) 835 may be removably interfaced with computer system 800 (e.g., via an external port connector (not shown)) via a storage device interface 825. Particularly, storage device(s) 835 and an associated machine-readable medium may provide nonvolatile and/or volatile storage of machine-readable instructions, data structures, program modules, and/or other data for the computer system 800. In one example, software may reside, completely or partially, within a machine-readable medium on storage device(s) 835. In another example, software may reside, completely or partially, within processor(s) 801.
Bus 840 connects a wide variety of subsystems. Herein, reference to a bus may encompass one or more digital signal lines serving a common function, where appropriate. Bus 840 may be any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combinations thereof, using any of a variety of bus architectures. As an example and not by way of limitation, such architectures include an Industry Standard Architecture (ISA) bus, an Enhanced ISA (EISA) bus, a Micro Channel Architecture (MCA) bus, a Video Electronics Standards Association local bus (VLB), a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, an Accelerated Graphics Port (AGP) bus, a HyperTransport (HTX) bus, a serial advanced technology attachment (SATA) bus, and any combinations thereof.
Computer system 800 may also include an input device 833. In one example, a user of computer system 800 may enter commands and/or other information into computer system 800 via input device(s) 833. Examples of an input device(s) 833 include, but are not limited to, an alpha-numeric input device (e.g., a keyboard), a pointing device (e.g., a mouse or touchpad), a joystick, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), an optical scanner, a video or still image capture device (e.g., a camera), and any combinations thereof. Input device(s) 833 may be interfaced to bus 840 via any of a variety of input interfaces 823 including, but not limited to, serial, parallel, game port, USB, FIREWIRE, THUNDERBOLT, or any combination of the above.
In particular embodiments, when computer system 800 is connected to network 830 (such as a cellular network), computer system 800 may communicate with other devices, such as mobile devices and enterprise systems, connected to network 830. Communications to and from computer system 800 may be sent through network interface 820. For example, network interface 820 may receive incoming communications (such as requests or responses from other devices) in the form of one or more packets (such as Internet Protocol (IP) packets) from network 830, and computer system 800 may store the incoming communications in memory 803 for processing. Computer system 800 may similarly store outgoing communications (such as requests or responses to other devices) in the form of one or more packets in memory 803, to be communicated to network 830 from network interface 820. Processor(s) 801 may access these communication packets stored in memory 803 for processing.
Examples of the network interface 820 include, but are not limited to, a network interface card, a modem, and any combination thereof. Examples of a network 830 or network segment 830 include, but are not limited to, a wide area network (WAN) (e.g., the Internet, an enterprise network), a local area network (LAN) (e.g., a network associated with an office, a building, a campus or other relatively small geographic space), a telephone network, a direct connection between two computing devices, and any combinations thereof. For instance, a cellular network and a home WiFi network are exemplary implementations of the network 830. A network, such as network 830, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used.
Information and data can be displayed through a display 832. Examples of a display 832 include, but are not limited to, a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a cathode ray tube (CRT), a plasma display, and any combinations thereof. The display 832 can interface to the processor(s) 801, memory 803, and fixed storage 808, as well as other devices, such as input device(s) 833, via the bus 840. The display 832 is linked to the bus 840 via a video interface 822, and transport of data between the display 832 and the bus 840 can be controlled via the graphics control 821.
In addition to a display 832, computer system 800 may include one or more other peripheral output devices 834 including, but not limited to, an audio speaker, a printer, and any combinations thereof. Such peripheral output devices may be connected to the bus 840 via an output interface 824. Examples of an output interface 824 include, but are not limited to, a serial port, a parallel connection, a USB port, a FIREWIRE port, a THUNDERBOLT port, and any combinations thereof.
In addition or as an alternative, computer system 800 may provide functionality as a result of logic hardwired or otherwise embodied in a circuit, which may operate in place of or together with software to execute one or more processes or one or more steps of one or more processes described or illustrated herein. Reference to software in this disclosure may encompass logic, and reference to logic may encompass software. Moreover, reference to a non-transitory, tangible computer-readable medium may encompass a circuit (such as an IC) storing software for execution, a circuit embodying logic for execution, or both, where appropriate. The present disclosure encompasses any suitable combination of hardware, software, or both.
Those of skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Within this specification, the same reference characters are used to refer to terminals, signal lines, wires, etc. and their corresponding signals. In this regard, the terms “signal,” “wire,” “connection,” “terminal,” and “pin” may be used interchangeably, from time to time, within this specification. It also should be appreciated that the terms “signal,” “wire,” or the like can represent one or more signals, e.g., the conveyance of a single bit through a single wire or the conveyance of multiple parallel bits through multiple parallel wires. Further, each wire or signal may represent bi-directional communication between two, or more, components connected by a signal or wire as the case may be.
Those of skill will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein (e.g., the methods behind the timing charts in
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.