Portable computing devices (e.g., cellular telephones, smart phones, tablet computers, portable digital assistants (PDAs), portable game consoles, wearable devices, and other battery-powered devices) and other computing devices continue to offer an ever-expanding array of features and services, and provide users with unprecedented levels of access to information, resources, and communications. To keep pace with these service enhancements, such devices have become more powerful and more complex. Portable computing devices now commonly include a system on chip (SoC) comprising one or more chip components embedded on a single substrate (e.g., one or more central processing units (CPUs), a graphics processing unit (GPU), digital signal processors, etc.). The SoC may be coupled to one or more volatile memory devices (e.g., dynamic random access memory (DRAM)) and one or more non-volatile storage devices (e.g., flash storage) via high-performance data and control interface(s).
In a conventional SoC boot operation, the SoC boots from an internal read only memory (ROM). The boot ROM firmware typically requires internal memory and complex drivers to securely boot the SoC over emerging boot devices and interfaces, such as, for example, Universal Flash Storage (UFS), Peripheral Component Interconnect Express (PCIe), Non-Volatile Memory Host Controller Interface Specification (NVMe), Universal Serial Bus 3 (USB3), etc. Each boot relies on a combination of different types of bootable processing systems (e.g., multi-core central processing unit (CPU) systems, digital signal processors (DSPs), and other processing subsystems) with different sizes and types of dedicated internal memories (e.g., CPU cache, static random access memory (SRAM), and tightly coupled memory (TCM)).
The different types of processor chips and the varying size of their internal memories present unique challenges to securely booting up the SoC. For example, the increasing complexity of SoCs and double data rate (DDR) technologies, increasing secure boot requirements to enable higher hashing and cryptographic algorithms (RSA, error code correction (ECC), etc.), and the variety of storage boot devices and interfaces necessitate significant internal memory size requirements to initialize and calibrate the peripherals while still meeting stringent boot time key performance indicators (KPIs) and performance for multiple market segments (e.g., mobile computer, automotive, and Internet of Things (IoT)).
One potential solution to this problem is to expand or reserve internal memory (e.g., SRAM) that is only required temporarily during boot. However, increasing the SRAM size can add significant area/power cost, and it may increase the average unit cost (AuC) for the SoC if the increased internal memory size cannot be justified for post-boot use cases. Another proposed solution in the context of a heterogeneous processor cluster architecture is to repurpose application processor internal memory (e.g., using a CPU level 2 (L2) cache) as tightly coupled internal memory. However, these solutions are challenged by size. For example, the processors used in a Little processor cluster are typically smaller in size to provide power saving advantages compared to the processors used in a Big processor cluster designed for performance. Furthermore, while enabling higher performance differentiated CPU designs (e.g., ARM-based register transfer level (RTL) designs) can provide faster time-to-market (TTM), they introduce challenges in SoC integration, which may require additional custom wrapper functionality. For example, the addition of custom wrapper functionality may disrupt the otherwise smooth RTL integration and introduce risk to SoC tapeout and verification, which negatively impacts TTM. The increased need to optimize process node scaling only makes this integration and TTM even more challenging.
Accordingly, there is a need for improved systems and methods to optimize internal memory usage in the SoC during a secure boot-up.
Systems, methods, and computer programs are disclosed for securely booting a system on chip. One embodiment is a system comprising a system on chip (SoC) and a virtual collated internal memory pool (VCIMP). The SoC comprises a bootable processing device having a first internal memory, a read only memory (ROM), and one or more bootable processing subsystems each having a dedicated internal memory. The bootable processing device is configured to execute a bootloader in the ROM. The VCIMP provides time-shared control and access to the one or more bootable processing subsystems during execution of a boot sequence. The VCIMP comprises a contiguous logical-to-physical address mapping of the first internal memory residing on the bootable processing device and the dedicated internal memories residing on the corresponding one or more bootable processing subsystems.
Another embodiment is a method for securely booting a system on chip (SoC). The method comprises powering up an SoC comprising a bootable processing device having a first internal memory, a read only memory (ROM), and one or more bootable processing subsystems each having a dedicated internal memory. The first internal memory residing on the bootable processing device and the dedicated internal memories residing on the corresponding one or more bootable processing subsystems are mapped to a virtual collated internal memory pool (VCIMP). The bootable processing device executes a bootloader in the ROM. The one or more bootable processing subsystems are provided time-shared access to the VCIMP during execution of the bootloader.
In the Figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same Figure. Letter character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all Figures.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
In this description, the terms “communication device,” “wireless device,” “wireless telephone”, “wireless communication device,” and “wireless handset” are used interchangeably. With the advent of third, fourth, and fifth generation (“3/4/5G”) wireless technology, greater bandwidth availability has enabled more portable computing devices with a greater variety of wireless capabilities. Therefore, a portable computing device may include a cellular telephone, a smartphone, a pager, a personal digital assistant (PDA), a navigation device, a wearable device (e.g., a smartwatch, fitness watch), any handheld computer with a wireless connection or link, Internet of Things (IoTs) devices, etc.
As described below in more detail, the system 100 collates various internal memories on dedicated components residing on the SoC 102 into a virtual collated internal memory pool (VCIMP) 150. The VCIMP 150 may comprise a contiguous virtual memory map, which may be directly accessed (e.g., direct memory access (DMA), read/write, etc.) by a plurality of SoC masters. In this regard, it should be appreciated that the VCIMP 150 may support a hardware framework that provides connectivity to the plurality of SoC masters, an access protection unit capability, and system memory map allocation capability. As a result, the VCIMP 150 appears as a single shared internal memory pool for time-shared, intelligent, and secure use by the SoC masters during the boot-up of the SoC 102. The VCIMP 150 enables various methods for optimizing internal memory usage in the SoC 102. It should be further appreciated that the VCIMP 150 enables an intelligent temporal sharing framework in, for example, hardware that boot firmware may better use and, thereby, reduce SoC AuC, optimize off-the-shelf and improved processor designs supporting register transfer level (RTL) without adding complexities of custom wrappers, and enable integration of different chip CPU cores and derivative chip customization.
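By way of illustration only, the contiguous logical-to-physical mapping described above may be modeled in software as follows. The region names, physical base addresses, sizes, and the virtual base address are hypothetical values chosen for the example and do not come from this description. The sketch is a simplified model of the address translation, not the hardware framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str       # hypothetical label for one internal memory
    phys_base: int  # physical base address (made-up value)
    size: int       # size in bytes


def build_pool(regions, virt_base):
    """Lay the scattered regions end to end in one contiguous virtual window."""
    table = []
    cursor = virt_base
    for region in regions:
        table.append((cursor, region))
        cursor += region.size
    return table, cursor - virt_base


def translate(table, vaddr):
    """Return (region name, physical address) for a virtual address in the pool."""
    for vbase, region in table:
        if vbase <= vaddr < vbase + region.size:
            return region.name, region.phys_base + (vaddr - vbase)
    raise ValueError(f"address {vaddr:#x} is outside the pool")


# Three scattered internal memories collated into a single window.
regions = [
    Region("boot_sram", 0x1460_0000, 0x4_0000),  # 256 KiB
    Region("cpu_l2", 0x1780_0000, 0x8_0000),     # 512 KiB
    Region("dsp_tcm", 0x0B00_0000, 0x2_0000),    # 128 KiB
]
table, total = build_pool(regions, virt_base=0x8000_0000)
```

In this model, a master sees one 0xE0000-byte window starting at the virtual base, and an access that crosses from one region to the next simply resolves to a different physical memory.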
The SoC 102 comprises various on-chip components. In the embodiment of
The multi-core processing system 108 may support a heterogeneous processor cluster architecture comprising a plurality of processor clusters coupled to a cache controller. As known in the art, each processor cluster may comprise one or more processors or processor cores (e.g., central processing unit (CPU), graphics processing unit (GPU), digital signal processor (DSP), etc.) with a corresponding shared cache.
In the embodiment of
Processor clusters 122 and 124 may have independent shared cache memory used by the corresponding processors in the cluster to reduce the average time to access data from a main memory. In an embodiment, the shared cache memory and the main memory may be organized as a hierarchy of cache levels (e.g., level one (L1), level two (L2), level three (L3)). In the embodiment illustrated in
The multi-core processing system 108 may execute a high-level operating system (HLOS) 144 configured to generate and/or access the time-shared VCIMP 150. As described below in more detail, the VCIMP 150 may be hardcoded in a designated register transfer level (RTL) or configured by the HLOS 144.
As further illustrated in the embodiment of
It should be appreciated that system 100 may support flexible memory map layouts for the temporal sharing. The flexible memory map layouts enable software to dynamically infer and enable intelligent decision-making through late-in-product-cycle and SKU/derivative chip options.
The SoC 102 may collate one or more of the dedicated memories illustrated in
At block 204, the method 200 maps the first internal memory residing on the bootable processing device and the dedicated internal memories residing on the corresponding one or more bootable processing subsystems to a virtual collated internal memory pool (VCIMP) 150. At block 206, the bootable processing device executes a bootloader in ROM 112. At block 208, the method 200 provides time-shared or temporal access to the VCIMP 150 to the one or more bootable processing subsystems during execution of the bootloader. In an embodiment, the time-shared access to the first internal memory and the dedicated internal memories may be performed until the bootable processing subsystems are brought out of a reset state. The time-shared access may be performed during a plurality of stages of the boot procedure, and may terminate upon entry into HLOS 144. At block 210, the method 200 may disable the mapping of the first internal memory and the dedicated internal memories to the VCIMP 150 to enable the one or more bootable processing subsystems to reclaim access control of their dedicated internal memories.
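The map, share, and unmap sequence of blocks 204 through 210 may be sketched as a small state model. The class name, method names, and client names below are hypothetical and serve only to illustrate the ordering constraints. Real boot firmware would program hardware registers rather than Python objects.

```python
class Vcimp:
    """Toy model of the pool lifecycle: mapped during boot, unmapped afterwards."""

    def __init__(self, memories):
        self.owners = dict(memories)  # memory name -> owning subsystem
        self.mapped = False
        self.holder = None            # client currently using the pool

    def map_all(self):
        """Corresponds to block 204: collate the memories into the pool."""
        self.mapped = True

    def acquire(self, client):
        """Corresponds to block 208: grant one client a turn on the pool."""
        if not self.mapped:
            raise RuntimeError("pool is not mapped")
        if self.holder is not None:
            raise RuntimeError(f"pool is in use by {self.holder}")
        self.holder = client

    def release(self, client):
        if self.holder != client:
            raise RuntimeError(f"{client} does not hold the pool")
        self.holder = None

    def unmap_all(self):
        """Corresponds to block 210: return each memory to its owner."""
        if self.holder is not None:
            raise RuntimeError("pool is still in use")
        self.mapped = False
        return dict(self.owners)


pool = Vcimp({"sram": "boot_cpu", "l2": "app_cpu", "tcm": "dsp"})
pool.map_all()
log = []
for client in ("boot_rom", "secondary_bootloader"):
    pool.acquire(client)   # one client at a time
    log.append(client)
    pool.release(client)
reclaimed = pool.unmap_all()
```

After `unmap_all()` returns, any further `acquire()` call fails, which mirrors the statement that the subsystems reclaim access control of their dedicated internal memories.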
In this manner, the time-shared memory region provided via the VCIMP 150 may be made available to any SoC component/master, as needed, during an initial boot sequence of the SoC 102. The time-shared access during the boot process may be managed as a time-shared resource among clients or it can be managed as a group of memory segments, one for each client. Regardless of the implementation, the respective clients use the VCIMP 150 to locally access the collated internal memories. The time-shared memory may no longer be accessible as VCIMP 150 by the clients after boot is completed. It should be appreciated that the time-shared access may refer to the usage of the VCIMP 150 during boot by, for example, one client for a period of time and then another client for another period of time, and so on. In other embodiments, the virtual memory space may be partitioned such that each client gets a portion of it during boot.
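The two management styles described above, sequential use of the whole pool versus one segment per client, may be contrasted with a short sketch. The function names, the 4 KiB alignment, and the client names are assumptions made for the example only.

```python
def time_share(pool_size, clients):
    """Each client gets the whole pool in turn: (slot, client, offset, size)."""
    return [(slot, name, 0, pool_size) for slot, name in enumerate(clients)]


def partition(pool_size, requests, align=0x1000):
    """Each client gets its own aligned segment for the duration of boot."""
    segments = {}
    offset = 0
    for name, size in requests.items():
        size = (size + align - 1) // align * align  # round up to the alignment
        if offset + size > pool_size:
            raise MemoryError(f"no room for {name}")
        segments[name] = (offset, size)
        offset += size
    return segments


schedule = time_share(0xE0000, ["boot_rom", "secondary_bootloader"])
segments = partition(0xE0000, {"ufs": 0x1800, "usb": 0x1000})
```

Time sharing gives every client the full capacity but serializes them, while partitioning lets clients run concurrently at the cost of a smaller share each.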
The VCIMP 150 may support a hardware framework that provides connectivity from different SoC masters, access protection unit capability, and system memory map allocation capability to collate the multiple scattered internal memories in the SoC 102 and make them appear as a single shared internal memory pool for intelligent and secure temporal use. It should be appreciated that post-boot usage of the VCIMP 150 may include further logic to relinquish, under secure control, the memories in the VCIMP 150 to their respective processor and functional subsystems. In one embodiment, the details of the different memories mapped to the VCIMP 150 may be implemented in hardware RTL through registers or software-maintained tables with information on ranges and attributes or parameters (such as whether the memory is cacheable or not, the type of SoC master accessibility (e.g., UFS, PCIe, USB, DMA, etc.)), which software may infer and intelligently allocate. It should be further appreciated that this hardware framework may enable SoC leverage across chips in the same family/architecture and derivative chips/SKUs that may fuse out certain CPUs or subsystems (e.g., via eFuse 158). One exemplary implementation strategy may be to enable an application processor's instruction cache and/or data cache while executing or accessing the time-shared VCIMP 150 during boot for boot-time KPI competitiveness.
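A software-maintained table of ranges and attributes of the kind mentioned above may look like the following sketch. The entry fields, the first-fit policy, and all names and values are hypothetical; the description does not specify a table layout or an allocation algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolEntry:
    name: str
    offset: int         # offset of the range inside the pool
    size: int
    cacheable: bool
    masters: frozenset  # SoC masters that may reach this range
    fused_out: bool = False  # True when the owning subsystem is fused out


def usable(table, master, need_cacheable=None):
    """Entries that survive fusing and match the requested attributes."""
    return [
        e for e in table
        if not e.fused_out
        and master in e.masters
        and (need_cacheable is None or e.cacheable == need_cacheable)
    ]


def allocate(table, master, size, need_cacheable=None):
    """First-fit choice of an entry large enough for the request, else None."""
    for e in usable(table, master, need_cacheable):
        if e.size >= size:
            return e.name, e.offset
    return None


TABLE = [
    PoolEntry("boot_sram", 0x00000, 0x40000, True, frozenset({"cpu", "ufs", "usb"})),
    PoolEntry("cpu_l2", 0x40000, 0x80000, True, frozenset({"cpu"})),
    PoolEntry("dsp_tcm", 0xC0000, 0x20000, False, frozenset({"cpu", "pcie"}), fused_out=True),
]
```

Because the same code reads whichever table the chip presents, a derivative SKU with a fused-out subsystem only needs a different table, not different boot logic.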
At block 506, the SoC 102 may enable a default connectivity state from a plurality of bootable masters (e.g., flash controller 157, boot interface controllers 156 to USB 158 and/or PCIe 160, etc.) to the VCIMP 150. In this regard, the default connectivity state enables the various SoC masters to directly access (e.g., direct memory access (DMA), read/write access) the coalesced memories in the VCIMP 150. At block 508, the SoC 102 may enable a default “open” security access state to the VCIMP 150. The SoC 102 may bring a bootable core of CPU subsystem(s) out of reset to execute secure firmware in ROM 112 (block 510). The bootable core of CPU subsystem(s) may comprise, for example, one of the CPUs residing in the processor cluster 122 or the processor cluster 124.
It should be appreciated that the VCIMP 150 enables a boot process that separately provides secure and non-secure access control. A secure world or mode of operation is illustrated in blocks 512, 514, 516, 518, and 520. A non-secure world or mode of operation is illustrated in blocks 522, 524, and 526. In this manner, low-level access control may be configured for both the secure and non-secure world ranges in the same coalesced memory range provided by VCIMP 150.
At block 512, a primary bootloader (e.g., BootROM) may initialize and use the VCIMP 150 to meet SRAM 110 memory needs and to load/authenticate a secondary bootloader. In an embodiment, SRAM memory needs may include executable-software RAM needs, such as, for example, stack, heap, scratch memory, etc. It should be appreciated that the terms primary bootloader and secondary bootloader generally refer to different stages of boot firmware (e.g., boot firmware in ROM versus RAM). A primary boot stage may be executed from ROM 112. A secondary boot stage may be executed pre-DDR from internal SRAM 110. A tertiary boot stage may be executed from DRAM 104.
At block 514, the SoC 102 may transfer firmware execution control out of a secure ROM (e.g., ROM 112) and into a secure SRAM space (e.g., SRAM 110). Secure software in ROM 112 or SRAM 110 may optionally enable access control on part of the VCIMP 150 space (block 516), such that only secure logic can read and/or write portions of VCIMP 150, thereby preventing non-secure access. At block 518, the method 500 may lock down security accesses for critical processor subsystems and/or functionality. In an ARM-based embodiment, TrustZone-based differentiation may be used to differentiate a secure world versus a non-secure world, which the SoC 102 may use to lock down certain resources, hardware, functionality, features, etc. that may be enabled/disabled in secure world firmware but not in non-secure world firmware. In other embodiments, the SoC 102 may choose to implement similar separation of secure versus non-secure worlds using local security protection units in hardware with their own signaling of secure versus non-secure state of execution.
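The default open state of block 508, the partial access control of block 516, and the lock-down of block 518 may be modeled as follows. This is a toy software model with hypothetical names; it does not represent TrustZone or any particular protection unit interface.

```python
class AccessControl:
    """Toy protection unit over pool offsets; the default state is open."""

    def __init__(self):
        self.secure_ranges = []  # list of (start, end) half-open offset ranges
        self.locked = False

    def protect(self, start, size, caller_secure):
        """Mark a range secure-only; only secure software may do so, before lock."""
        if self.locked:
            raise PermissionError("configuration is locked")
        if not caller_secure:
            raise PermissionError("only secure software may configure access")
        self.secure_ranges.append((start, start + size))

    def lock(self):
        """Freeze the configuration, as in the lock-down step."""
        self.locked = True

    def allowed(self, offset, caller_secure):
        """Secure callers reach everything; others only unprotected offsets."""
        in_secure = any(s <= offset < e for s, e in self.secure_ranges)
        return caller_secure or not in_secure


ac = AccessControl()
open_before = ac.allowed(0x100, caller_secure=False)  # default open state
ac.protect(0x0, 0x1000, caller_secure=True)           # partial protection
ac.lock()
```

Once the range is protected, a non-secure access to offset 0x100 is refused while the remainder of the pool stays reachable, which is the sense in which secure and non-secure ranges coexist in the same coalesced range.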
At block 520, the SoC 102 may switch to a non-secure mode of operation in which access may be given to, for example, OEM-specific board initializations and differentiation. At block 522, the SoC 102 may initialize the DRAM controller 152, perform operations such as DDR calibrations, and load and/or authenticate other images in the SoC 102. For example, the SoC 102 may handle various non-secure early boot initializations. In this regard, the SoC 102 may be configured to make various intelligent decisions (block 524) depending on the mapping in VCIMP 150 to, for example, select the destination addresses of executables and data segments. For example, intelligent runtime decision-making may be based on memory checks in the coalesced memory range of the VCIMP 150 to accommodate different types of processor subsystems with variable internal memory sizes, as well as improved binning and process node yield improvement options. At block 526, the SoC bootloader may transition control out of the VCIMP 150 to the DRAM 104. In one embodiment, post-ROM bootloader firmware may transition control, although any ROM-based bootloader (primary) or post-ROM-based bootloader firmware (secondary, tertiary, etc.) may be implemented. At block 528, the SoC 102 may switch back to a secure mode of operation to relinquish the internal memories mapped to the VCIMP 150 back to their respective processor subsystems. At block 530, the SoC 102 may disable connectivity between the VCIMP 150 and the bootable processing subsystems and other processing subsystems on the SoC 102. At block 532, the processing subsystems may reclaim ownership of their internal memories to be used when they securely boot up.
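One possible form of the runtime memory check of block 524 is sketched below. The policy shown (prefer the pool when the image fits, otherwise fall back to DRAM once it has been initialized) is an assumption made for illustration; the description does not prescribe a specific decision rule, and the function name is hypothetical.

```python
def choose_destination(image_size, pool_free, dram_ready):
    """Pick a load destination for an image; an illustrative policy only."""
    if image_size <= pool_free:
        return "vcimp"  # image fits in the coalesced internal memory
    if dram_ready:
        return "dram"   # DRAM controller already initialized and calibrated
    raise MemoryError("image does not fit and DRAM is not initialized")


small = choose_destination(0x1000, pool_free=0x2000, dram_ready=False)
large = choose_destination(0x3000, pool_free=0x2000, dram_ready=True)
```

The same check yields different answers on chips whose subsystems contribute different amounts of internal memory, which is how one firmware image could accommodate variable internal memory sizes.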
As mentioned above, the system 100 may be incorporated into any desirable computing system.
A display controller 628 and a touch screen controller 630 may be coupled to the CPU 602. In turn, the touch screen display 606 external to the on-chip system 622 may be coupled to the display controller 628 and the touch screen controller 630.
Further, as shown in
As further illustrated in
As depicted in
It should be appreciated that one or more of the method steps described herein may be stored in the memory as computer program instructions, such as the modules described above. These instructions may be executed by any suitable processor in combination or in concert with the corresponding module to perform the methods described herein.
Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described. However, the invention is not limited to the order of the steps described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps may be performed before, after, or in parallel with (substantially simultaneously with) other steps without departing from the scope and spirit of the invention. In some instances, certain steps may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example.
Therefore, disclosure of a particular set of program code instructions or detailed hardware devices is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the Figures which may illustrate various process flows.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, NAND flash, NOR flash, M-RAM, P-RAM, R-RAM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (“DSL”), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
Disk and disc, as used herein, includes compact disc (“CD”), laser disc, optical disc, digital versatile disc (“DVD”), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Alternative embodiments will become apparent to one of ordinary skill in the art to which the invention pertains without departing from its spirit and scope. Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.