HARDWARE-BASED CRYPTOGRAPHIC PROTECTION OF TOKENS

BACKGROUND

Federal Information Processing Standards (FIPS) Publication 202, dated August 2015, published by the National Institute of Standards and Technology (NIST), describes standards for a cryptographic function referred to as Secure Hash Algorithm 3 (SHA-3). SHA-3 is a subset of a cryptographic primitive family referred to as KECCAK. NIST Special Publication 800-185, dated December 2016, describes standards for various SHA-3 derived functions, including KECCAK Message Authentication Code (KMAC). Such cryptographic functions may find utility in various computer security applications.

There is an ongoing need for improved computational devices to enable ever increasing demand for modeling complex systems, providing reduced computation times, and other considerations. In particular, there is an ongoing desire to improve security circuits that are included in or otherwise support operation of integrated circuits. It is with respect to these and other considerations that the present improvements have been needed. Such improvements may become critical as the desire to improve computational efficiency becomes even more widespread.

BRIEF DESCRIPTION OF DRAWINGS

Various examples in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1A is a block diagram of an example of an apparatus in one implementation.

FIG. 1B is a block diagram of another example of an apparatus in one implementation.

FIG. 1C is a block diagram of another example of an apparatus in one implementation.

FIG. 1D is a block diagram of another example of an apparatus in one implementation.

FIG. 2 is a block diagram of an example of an electronic device in one implementation.

FIG. 3 is a block diagram of an example of a system in one implementation.

FIG. 4 is a block diagram of another example of a system in one implementation.

FIG. 5 is a block diagram of another example of a key generator in one implementation.

FIG. 6 is a block diagram of another example of a key generator in one implementation.

FIG. 7 is a block diagram of another example of a system that includes technology for hardware-based cryptographic protection of tokens in one implementation.

FIG. 8 illustrates an example of another computing system.

FIG. 9 illustrates a block diagram of an example processor and/or System on a Chip (SOC) that may have one or more cores and an integrated memory controller.

FIG. 10A is a block diagram illustrating both an example in-order pipeline and an example register renaming, out-of-order issue/execution pipeline according to examples.

FIG. 10B is a block diagram illustrating both an example in-order architecture core and an example register renaming, out-of-order issue/execution architecture core to be included in a processor according to examples.

FIG. 11 illustrates examples of execution unit(s) circuitry.

FIG. 12 is a block diagram of a register architecture according to some examples.

FIG. 13 is a block diagram illustrating the use of a software instruction converter to convert binary instructions in a source instruction set architecture to binary instructions in a target instruction set architecture according to examples.

DETAILED DESCRIPTION

The present disclosure relates to methods, apparatus, systems, and non-transitory computer-readable storage media for hardware-based cryptographic protection of tokens. According to some examples, the technologies described herein may be implemented in one or more electronic devices. Non-limiting examples of electronic devices that may utilize the technologies described herein include any kind of mobile device and/or stationary device, such as cameras, cell phones, computer terminals, desktop computers, electronic readers, facsimile machines, kiosks, laptop computers, netbook computers, notebook computers, internet devices, payment terminals, personal digital assistants, media players and/or recorders, servers (e.g., blade server, rack mount server, combinations thereof, etc.), set-top boxes, smart phones, tablet personal computers, ultra-mobile personal computers, wired telephones, combinations thereof, and the like. More generally, the technologies described herein may be employed in any of a variety of electronic devices including integrated circuitry which is operable to provide hardware-based cryptographic protection of tokens.

In the following description, numerous details are discussed to provide a more thorough explanation of the examples of the present disclosure. It will be apparent to one skilled in the art, however, that examples of the present disclosure may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring examples of the present disclosure.

Note that in the corresponding drawings of the examples, signals are represented with lines. Some lines may be thicker, to indicate a greater number of constituent signal paths, and/or have arrows at one or more ends, to indicate a direction of information flow. Such indications are not intended to be limiting. Rather, the lines are used in connection with one or more exemplary examples to facilitate easier understanding of a circuit or a logical unit. Any represented signal, as dictated by design needs or preferences, may actually comprise one or more signals that may travel in either direction and may be implemented with any suitable type of signal scheme.

Throughout the specification, and in the claims, the term “connected” means a direct connection, such as electrical, mechanical, or magnetic connection between the things that are connected, without any intermediary devices. The term “coupled” means a direct or indirect connection, such as a direct electrical, mechanical, or magnetic connection between the things that are connected or an indirect connection, through one or more passive or active intermediary devices. The term “circuit” or “module” may refer to one or more passive and/or active components that are arranged to cooperate with one another to provide a desired function. The term “signal” may refer to at least one current signal, voltage signal, magnetic signal, or data/clock signal. The meaning of “a,” “an,” and “the” include plural references. The meaning of “in” includes “in” and “on.”

The term “device” may generally refer to an apparatus according to the context of the usage of that term. For example, a device may refer to a stack of layers or structures, a single structure or layer, a connection of various structures having active and/or passive elements, etc. Generally, a device is a three-dimensional structure with a plane along the x-y direction and a height along the z direction of an x-y-z Cartesian coordinate system. The plane of the device may also be the plane of an apparatus which comprises the device.

The term “scaling” generally refers to converting a design (schematic and layout) from one process technology to another process technology and subsequently being reduced in layout area. The term “scaling” generally also refers to downsizing layout and devices within the same technology node. The term “scaling” may also refer to adjusting (e.g., slowing down or speeding up—i.e. scaling down, or scaling up respectively) of a signal frequency relative to another parameter, for example, power supply level.

The terms “substantially,” “close,” “approximately,” “near,” and “about,” generally refer to being within +/−10% of a target value. For example, unless otherwise specified in the explicit context of their use, the terms “substantially equal,” “about equal” and “approximately equal” mean that there is no more than incidental variation between among things so described. In the art, such variation is typically no more than +/−10% of a predetermined target value.

It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the examples of the invention described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein.

Unless otherwise specified the use of the ordinal adjectives “first,” “second,” and “third,” etc., to describe a common object, merely indicate that different instances of like objects are being referred to and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking or in any other manner.

The terms “left,” “right,” “front,” “back,” “top,” “bottom,” “over,” “under,” and the like in the description and in the claims, if any, are used for descriptive purposes and not necessarily for describing permanent relative positions. For example, the terms “over,” “under,” “front side,” “back side,” “top,” “bottom,” “over,” “under,” and “on” as used herein refer to a relative position of one component, structure, or material with respect to other referenced components, structures or materials within a device, where such physical relationships are noteworthy. These terms are employed herein for descriptive purposes only and predominantly within the context of a device z-axis and therefore may be relative to an orientation of a device. Hence, a first material “over” a second material in the context of a figure provided herein may also be “under” the second material if the device is oriented upside-down relative to the context of the figure provided. In the context of materials, one material disposed over or under another may be directly in contact or may have one or more intervening materials. Moreover, one material disposed between two materials may be directly in contact with the two layers or may have one or more intervening layers. In contrast, a first material “on” a second material is in direct contact with that second material. Similar distinctions are to be made in the context of component assemblies.

The term “between” may be employed in the context of the z-axis, x-axis or y-axis of a device. A material that is between two other materials may be in contact with one or both of those materials, or it may be separated from both of the other two materials by one or more intervening materials. A material “between” two other materials may therefore be in contact with either of the other two materials, or it may be coupled to the other two materials through an intervening material. A device that is between two other devices may be directly connected to one or both of those devices, or it may be separated from both of the other two devices by one or more intervening devices.

As used throughout this description, and in the claims, a list of items joined by the term “at least one of” or “one or more of” can mean any combination of the listed terms. For example, the phrase “at least one of A, B or C” can mean A; B; C; A and B; A and C; B and C; or A, B and C. It is pointed out that those elements of a figure having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.

In addition, the various elements of combinatorial logic and sequential logic discussed in the present disclosure may pertain both to physical structures (such as AND gates, OR gates, or XOR gates), or to synthesized or otherwise optimized collections of devices implementing the logical structures that are Boolean equivalents of the logic under discussion.

Computer devices may employ a wide variety of security mechanisms via hardware (HW), software (SW), firmware (FW), and combinations thereof. Some computer devices and/or parts may include secure features or functionality that involve some sort of authentication to unlock such secure features or functionality. In some systems, FW may be utilized to unlock the device/part via token injection. For example, tokens may be processed by FW cryptographic methods for authenticity. For example, FW may support various token-based directives. Token-based directives provide flexibility, and may be utilized by part suppliers, OEMs, customers, etc., and accordingly benefit from being well protected.

Various HW authentication technologies may utilize a cryptographic hash function and may be completely implemented in HW. Such HW authentication may be considered highly secure (e.g., even a successful brute force attack (e.g., complexity of breaking SHA3-384) may allow only a single unit exploitation). A problem is that such HW security may not provide the flexibility needed by suppliers, OEMs, customers, etc. for the wide variety of potential applications of the device/part.

Some examples described herein overcome one or more of the foregoing problems. Some examples provide technology for HW-based cryptographic protection of tokens. Some examples may provide HW-level security technology for tokens, while preserving the ability for FW to control the timing and sequencing of the enabling, disabling or modification of the features. Some examples may ensure that HW performs the authentication and ungating of secure features. Some examples may not depend on injecting a secret into the part. Some examples may additionally provide technology to perform HW-based critical rule checking to ensure the part is in a state that allows specific modes to be enabled regardless of FW/token requests (e.g., further enhancing the defense in depth).

Some examples may improve security on token processing for a wide variety of devices/part that support unlock/debug tokens. In some examples, a HW layer may increase layers of defense in depth. Some examples may comply with various applicable National Institute of Standards and Technology (NIST) standards to ensure that HW authenticates the token is meant for the target unit and that the integrity of the message is guaranteed. Some examples may also enforce rules of feature enablement/unlock to better assure protection of secrets, security modes, and/or features.

Advantageously, some examples may provide HW-level security while providing FW-level flexibility. Some examples may provide improved security that includes unit specific HW authentication to maintain the many benefits of tokens to suppliers and OEMs, and maintains the flexibility of FW control. Some examples may utilize a low cost HW cryptography engine. Some examples may additionally or alternatively implement one way functions. Some examples may additionally or alternatively implement a custom rule processing engine to further enhance token processing.

Some examples may extend existing HW-based authentication techniques to token-based requests. Some examples may share a hardware engine or a subset of the hardware engine that otherwise supports HW-based unlock key hash processing for a device/part, advantageously reducing the combined HW costs. Some examples may enable a low cost/secure path for child chiplets that may require or benefit from the adoption of tokens (i.e. core chiplets).

Some examples may improve protection from attacks by increasing the defense in depth complementing the FW protection. Some examples may implement a HW without a FW protection. Some examples may implement a HW cryptographic module to ensure that a request can be trusted and that the request comes from a trusted source. Advantageously, some examples may support the use of tokens with elevated security techniques that are roughly on par with HW-only unlock techniques. With trusted tokens, some examples may provide greater levels of security for post-sale soft SKUs, controlled unlocks, etc. For example, a customer that depends on Off-line Sort, Assembly, and Test (OSAT) external manufacturing may have full control of the unlock security, debug, and/or even functional feature enablement via HW secure tokens.

In some examples, the utilization of a per-part hardware secret (xPPHS) key as part of a token may substantially enhance security control. In some examples, the part-specific (secret) key or a portion of the part-specific key or an additional key is to be provided by the part-supplier or OEM so they are in control of the secrets or parts of the secrets. In some examples the secret can be made unique per part or kept consistent for a group of parts.

In some examples, the one way function is considered cryptographically and quantum secure from exploitation. This ensures that from a token a new nefarious token cannot be created as it requires the reversing of the one way function depicted in some examples.

With reference to FIG. 1A, an example of an apparatus 100 (e.g., an integrated circuit (IC), an electronic system, a SOC, etc.) may include first circuitry 110 (e.g., a part of the apparatus 100, a component of the apparatus 100, an IP block of the apparatus 100, a feature of the apparatus 100, etc.) that is to be selectively locked and unlocked, second circuitry 112 (e.g., that implements FW) to process one or more tokens including a token (e.g., an unlock token) for the first circuitry 110, and hardware authentication circuitry 114 to authenticate the token for the first circuitry 110 in response to a request from the second circuitry 112. In some examples, the hardware authentication circuitry 114 may be configured to generate a per-part unlock key (e.g., at provision time), and provide the per-part unlock key for secure external storage (e.g., a secure key storage database). In some examples, the hardware authentication circuitry 114 may be further configured to generate a per-part unlock key at runtime, compute a tag based on the generated per-part unlock key, and determine whether the token is authentic based on a comparison of the computed tag and a tag (e.g., an unlock tag) from the token. For example, the hardware authentication circuitry 114 may be configured to utilize symmetric key cryptography to generate the per-part unlock key at runtime to authenticate against another unlock key generated prior to runtime (e.g., the per-part unlock generated at provision time and provided to the secure external storage).

With reference to FIG. 1B, an example of an apparatus 120 may be similarly configured as the apparatus 100, while like reference numerals indicating like elements. The apparatus 120 may further include hardware ungate circuitry 122 to selectively gate and ungate one or more features of the first circuitry 110 in response to an indication of a result of the authentication. In some examples, the hardware ungate circuitry 122 may be configured to compare one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and selectively gate and ungate one or more features of the first circuitry 110 based on the comparison. In some examples, the second circuitry 112 may be configured to is further to selectively enable and disable one or more ungated features of the first circuitry 110.

For example, the apparatus 100 and/or 120 may be integrated/incorporated with/in any of the processors described herein. In particular, any/all of the circuitry 110, 112, 114 and/or 122 may be integrated/incorporated with/in the processor 800, the processor 870, the processor 815, the coprocessor 838, and/or the processor/coprocessor 880 (FIG. 8), the processor 900 (FIG. 9), the core 1090 (FIG. 10B), the execution units 1062 (FIGS. 10B and 11), and the processor 1316 (FIG. 13). Some examples of any/all of the circuitry 110, 112, 114, 122 may be integrated/incorporated with/in a parallel computing application, a GPU, a SIMD processor, and/or an AI processor.

With reference to FIG. 1C, an example of an apparatus 140 (e.g., an IC, an electronic system, a SOC, etc.) may include first circuitry 150 (e.g., a part of the apparatus 140, a component of the apparatus 140, an IP block of the apparatus 140, a feature of the apparatus 140, etc.) that is to be selectively locked and unlocked, second circuitry 152 (e.g., that implements FW) to process one or more tokens including a token (e.g., an unlock token) for the first circuitry 150, and hardware ungate circuitry 154 to selectively gate and ungate one or more features of the first circuitry 150 in response to an indication of whether the token is authentic. In some examples, the hardware ungate circuitry 154 may be configured to compare one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and selectively gate and ungate one or more features of the first circuitry 150 based on the comparison. In some examples, the second circuitry 152 may be configured to selectively enable and disable one or more ungated features of the first circuitry 150.

With reference to FIG. 1D, an example of an apparatus 160 may be similarly configured as the apparatus 140, while like reference numerals indicating like elements. The apparatus 160 may further include hardware authentication circuitry 170 to authenticate the token for the first circuitry 150 in response to a request from the second circuitry 152. In some examples, the hardware authentication circuitry 170 may be configured to generate a per-part unlock key, and provide the per-part unlock key for secure external storage. In some examples, the hardware authentication circuitry 170 may be further configured to generate a per-part unlock key at runtime, compute a tag based on the generated per-part unlock key, and determine whether the token is authentic based on a comparison of the computed tag and a tag (e.g., an unlock tag) from the token. For example, the hardware authentication circuitry 170 may be configured to utilize symmetric key cryptography to generate the per-part unlock key at runtime to authenticate against another unlock key generated prior to runtime.

For example, the apparatus 140 and/or 160 may be integrated/incorporated with/in any of the processors described herein. In particular, any/all of the circuitry 150, 152, 154 and/or 170 may be integrated/incorporated with/in the processor 800, the processor 870, the processor 815, the coprocessor 838, and/or the processor/coprocessor 880 (FIG. 8), the processor 900 (FIG. 9), the core 1090 (FIG. 10B), the execution units 1062 (FIGS. 10B and 11), and the processor 1316 (FIG. 13). Some examples of any/all of the circuitry 150, 152, 154, 170 may be integrated/incorporated with/in a parallel computing application, a GPU, a SIMD processor, and/or an AI processor.

With reference to FIG. 2, an example of an electronic device 200 (e.g., an IC, a SOC, etc.) may include one or more technical features 210A through 210N (collectively “features 210”) to be selectively locked and unlocked. For example, the features 210 may correspond to debug features (e.g., trace data, register information, etc.), performance features (e.g., memory, cores, clock speeds, etc.), functional features (e.g., sensors, interfaces, cameras, GPUs, etc.), etc. The electronic device 200 may further include memory to store firmware (FW) 220 to process one or more tokens including a token 230 (e.g., an unlock token) for at least one of the features 210 that includes a message portion M and a tag portion T, and hardware authentication circuitry 240 to authenticate the message portion M of the token 230 based at least in part on the tag portion T of the token 230.

In some examples, the hardware authentication circuitry 240 may be configured to generate a per-device (e.g., the “part” corresponds to a complete device) unlock key (K1) at provision time that is unique to the device 200, and provide the generated per-device unlock key (K1) for secure external storage. In some examples, the hardware authentication circuitry 240 may be further configured to generate a per-device unlock key (K2) at runtime based at least in part on the message portion M of the token 230, compute a tag (T2) based on the generated per-device unlock key (K2), and determine whether the message portion M of the token 230 is authentic based on a comparison of the computed tag (T2) and the tag portion T of the token 230. For example, the hardware authentication circuitry 240 may be configured to utilize symmetric key cryptography to generate the per-device unlock (K2) at runtime to authenticate against another unlock key (e.g., K1) generated at provision time that is unique to the device 200.

In some examples, the device 200 may optionally include hardware ungate circuitry 250 to selectively gate and ungate the one or more technical features 210 of the device 200 in response to an indication that the one or more technical features 210 are respectively one of locked or unlocked. In some examples, the hardware ungate circuitry 250 may be further configured to compare one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and selectively gate and ungate the one or more technical features 210 of the device 200 based on the comparison. In some examples, the firmware 220 may be configured to selectively enable and disable ungated ones of the one or more technical features 210 of the device 200.

For example, the device 200 may be integrated/incorporated with/in any of the processors described herein. In particular, any/all of the features 210, FW 220, circuitry 240 and/or circuitry 250 may be integrated/incorporated with/in the processor 800, the processor 870, the processor 815, the coprocessor 838, and/or the processor/coprocessor 880 (FIG. 8), the processor 900 (FIG. 9), the core 1090 (FIG. 10B), the execution units 1062 (FIGS. 10B and 11), and the processor 1316 (FIG. 13). Some examples of any/all of the features 210, FW 220, circuitry 240 and/or circuitry 250 may be integrated/incorporated with/in a parallel computing application, a GPU, a SIMD processor, and/or an AI processor.

With reference to FIG. 3, an example of a system 300 includes a server 310, a key store 320, a remote debugger 330, and a target device 340, coupled or in communication as shown. The target device 340 includes device FW 350, a hardware authentication module (HAM) 360, and a hardware ungate module (HUM) 370. The HAM 360 includes key generator(s) 362, 364 and a tag verifier 366. The device 340 further stores or has secure access to a first uniquification value 382 (e.g., provided by an OEM), a second uniquification value 384 (e.g., provided by a supplier), and a global secret value 386.

In one example application, the target device 340 is configured such that the device must be unlocked for debug via a token directive. The remote debugger 330 (e.g., a system controlled by an authorized user/debugger) wants to unlock the target device 340 via a token. The remote debugger 330 may have access to the server 310 that assists in generating an authentication tag T (e.g., via the key store 320) to support message authentication in the HAM 360 of the target device 340.

In order to generate an authentication tag (T) from the token's message (M), a per-part authentication key will be combined with the token's message M via an industry standard cryptographic function. To generate the respective per-part authentication key, the HAM 360 may utilize the first cryptographic key generator 362 that generates a symmetric key (e.g., utilizing symmetric key cryptography technology, sometimes also referred to as symmetric encryption, where a secret key may be leveraged for both encryption and decryption functions).

Collateral such as OEM provisioned uniquification salt (e.g., utilized for the first uniquification value 382), and supplier provisioned uniquification salt (e.g., utilized for the second uniquification value 384), and the global secret value 386 may be required to generate the per-part authentication key. In some systems, the uniquification salts may be discarded after respective manufacturing provisioning (e.g., the system may have no additional supplier uniquification salt(s)). The per-part authentication key that the HAM 360 generates is generated by HW and supplied to a manufacturer's respective provisioning system. For example, the supplied per-part authentication key may be stored in a respective secure key store or database (e.g., the key store 320) that is only accessible by the respective manufacturer's authorized users.

When debug of the target device 340 is required, the remote debugger 330 will authenticate themselves to the server 310. The server 310 will then use the stored per-part authentication key from the key store 320 and a tag generator 312 that utilizes industry standard cryptography algorithms to process the token's message (M) and generate the respective tag (T).

The remote debugger 330 will then send a token comprised of message M and tag T to the target device 340. The device FW 350 will initially receive the token utilizing any suitable token handling technology. In some examples, the device FW 350 will forward the token message M and tag T to the HAM 360 for further authentication. The HAM 360 will invoke the key generator(s) 362, 364 (e.g., a cryptography engine) to regenerate the respective per-part authentication key. Then the tag verifier 366 will be prompted to verify the authenticity of the message M. The tag verification may again use the same cryptograph HW to combine the message M with the internally generated per-part authentication key to HW generate a tag (T2) that is then compared to the token supplied tag T. A compare that shows that the supplied tag T and the generated tag (T2) are identical indicates that the message M is authentic for the target device 340 (e.g., successful authentication), and the integrity of the message M has been preserved.

An authentication success status and the message M may then be passed to the HUM 370 to apply custom rule checking including verification of various device state information to HW ensure that the requested features/unlocks are approved and safe to enable. If the rule checking passes, the HUM 370 will not enable the features. Instead, the HUM 370 will ungate the corresponding features such that the device FW 350 may enable the needed features at the appropriate time. If rule checking fails, however, the HUM 370 will not ungate the corresponding features and the HUM 370 may even take additional security measures.

Various examples may use any suitable symmetric cryptography technology (e.g., or MAC, one-wat function, etc.), so long as both the external systems (e.g., the server 310) and the target devices utilize the same symmetric cryptography technology. Some examples may utilize proprietary cryptography. More preferably, some examples may utilize any suitable industry standard cryptography such as, for example, KECCAK Message Authentication Code (KMAC) technology as described in NIST Special Publication 800-185, dated December 2016.

FIG. 4 shows an example of a system 400 that utilizes KMAC-based token authentication (e.g., KMAC-SHAKE256 (KMAC256)-based FW/HW token authentication. Various operations may be illustrated with reference to dashed circles numbered zero (0) through nine (9) in FIG. 4. At (0), in some HW examples, a HAM 410 uses a HW key generator 412 to generate a per part key (e.g., utilizing a global secret value and OEM and supplier uniquification values to generate a per-part-hardware secret key xPPHS1) that will be stored in respective key store DBs 420. Such initial key generation and exporting for storage may be performed once per key type for the life of the device, or while the device is in a certain state (e.g., manufacturing).

At (1), for normal debug usage, a remote debugger 430 generates a customized message (M) and sends the message M over to a trusted server 440 to generate a MAC tag on M for the target device/part. At (2), the trusted server 440 that has a secure access to the product database gets the per-part unlock key xPPHS1 (e.g., from the secure key store 420) and generates a secure MAC tag T using a KMAC256 algorithm on M and the retrieved key xPPHS. At (3), the server 440 then hands over the tag T to the debugger 430.

At (4), the remote debugger 430 sends the message M and the tag T as an unlock token to the target device. At (5), in some examples, device FW 450, upon receiving the unlock token, sends the message M and the tag T to the HAM 410. At (6), the HAM 410 regenerates a new per part unlock key xPPHS2 (e.g., with the key generator 412 using a KMAC256 algorithm and the various uniquification values). At (7), the HAM 410 uses HW tag verification 414 to compute a KMAC256 tag T2 on the received message M using the per-part unlock key xPPHS2 and to compare the computed tag T2 with the received tag T. At (8), if both tags are the same, then the HAM 410 sends an authentication pass result along with the message M (e.g., that in some examples may carry a configuration request) to a HUM 460. At (9), in some examples, the HUM 460 upon receiving the authentication result, verifies HW rules against a current HW state, FW state, fuse setting(s), etc., to determine if the HUM 460 will ungate the requested configuration settings/modes of the device or unlock the device accordingly.

FIG. 5 shows an example of a key generator 500 for HW KMAC-based per part unlock key generation. In an example operation of the key generator 600, a per-part secret (xPPHS) or a per-part unlock key is first generated through a hash function MAC (HMAC)-based key derivation function (HKDF). To generate the single key xPPHS, a HKDF-Extract algorithm may take a salt value and an initial key material (IKM) value as inputs: HKDF-Extract (salt, IKM). In some examples, HKDF-Extract may utilize KMAC256 to compute a per part unlock key on a salt value and an IKM value, where an OEM uniquification value is input for the salt value and where the IKM value is computed as a concatenation of following inputs a family secret constant (K) and a fuse value. For example, the family secret may correspond to a global secret value and the fuse value may correspond to a supplier uniquification value that the supplier programs into appropriate fuses at provision time (e.g., following supplier test of the device). In some examples, the family secret may correspond to some other combination of the foregoing and/or other values.

In some examples, the per family constant (K) and/or the IKM value may be generated through an in-device physical unclonable function (PUF) circuit. A PUF circuit may provide robust technology to provide a per part constant for the global secret. Given the benefit of the present specification and drawings, those skilled in the art will appreciate that the use of KMAC256 in various of the examples may be replaced by other MAC/KDF schemes in other implementations. A wide variety of technology and techniques may be utilized in various examples to generate a symmetric authentication key that can be used for token authentication that is resistant to cross-OEM attacks.

FIG. 6 shows an example of a key generator 600 that utilizes KMAC cryptographic technology for per part unlock key generation. For example, the key generator 600 may implement or otherwise comply with a recommendation for cryptographic key generation as indicated in NIST SP 800-133 Rev. 2 (published June 2020). Advantageously, examples may support symmetric key generation where suppliers, OEMs, customers, etc. may all utilize industry standard cryptographic technology. The key generator may implement or otherwise comply with a HW KMAC as indicated in NIST SP 800-185 (published December 2016). The key generator 600 may implement a KMAC KDF-Extract function that provides inputs for K (a key bit string of any length, including zero), X (a main input bit string of any length), L (an integer length of the K), and S (an optional customization bit string of any length, including zero, where if no customization is desired, S is set to the empty string). An output of the key generator may be a KMAC function of K and X (e.g., OUT=KMAC (K, X, L, S)). KMAC may be particularly useful in some examples because KMAC is a one-way function.

In some examples, random bit generators (RBGs) 610, 620, 630 may be respectively utilized to provide various secret/unique values as inputs for the key generator 600. In some examples, the RBGs 610, 620, 630 may implement or otherwise comply with a recommendation for random number generation using deterministic random bit generators as indicated in NIST SP 800-90A Rev. 1 (published June 2015).

The inputs available to generate the symmetric authentication key are at least two (K, X). One input may correspond to a global secret, which has various options including HW, SW, FW, PUF, etc., embedded by the device creator. The other input may correspond to a uniquification value that can be unique for each part and that may be provisioned after fabrication, for example during assembly and test. Some examples may combine multiple secrets into one (e.g., through XOR, concatenation, etc.) or into a symmetric key (e.g., utilizing HMAC/KMAC in a KDF-extract capacity where the key input receives a salt, and the main input receives the IKM). Some examples may utilize a global secret as part of the main input of KMAC. In some applications, the global secret may be a combined value.

Additionally, the OEM uniquification value may be preferred for the salt because it is customary that salt values change between key generation (e.g., for different devices) and the global secret may be global for an entire family of devices. In the case there is more than one supplier uniquification value/source, then the second uniquification value (e.g., K2 in FIG. 6) may be concatenated with the constant (K1) as the IKM. Concatenation (e.g., or XOR, etc.) of the two values K1, K2 may add strength to the IKM because the global constant (K1) and the supplier uniquification value (K2) may be orthogonal in terms of methods of maliciously extracting the two constants (e.g., optical vs. electrical vs SW vs layout, etc.) including the skillset and equipment needed to do such malicious extraction. Advantageously, examples that utilize multiple uniquification values owned by multiple provisioning entities may add further levels of protection. Alternatively, in some examples, a PUF circuit may be utilized for the main input (e.g., in place of K1∥K2).

In some examples, the computed xPPHS is stored into the product database before deployment of the device. Note that the device itself does not need to store the per part unlock key. Rather, the device may recompute the per part unlock key from K and X when required.

In an example attack scenario, if the secret key used based on XOR, then OEM-1 can easily compute the other input, the constant, after OEM-1 has the output key and the fuse value. After OEM-1 obtains the constant, then by obtaining OEM-2's uniquification value (i.e. by optical means), OEM-1 can compute the other OEM-2's key. But by using a KMAC, having the output and one of the inputs would not allow OEM-1 to compute the other input, the constant.

Examples that use multiple uniquification values may inhibit or prevent the following attack scenario. OEM-1 wants to unlock parts from OEM-2 using partially provisioned parts (e.g., parts that are provisioned only by the supplier but not yet provisioned by the OEM). If OEM-1 is able to maliciously attack and extract the uniquification value of OEM-2's fully provisioned part and the supplier's uniquification value, then OEM-1 could use OEM-2's uniquification value to provision a new partially provisioned part. But OEM-1 can't provision the supplier's uniquification value because the supplier uniquification value has been pre-provisioned in the OEM-1's partially provisioned value thus OEM-1 cannot make the part produce the same secret (e.g., computed per part unlock key) as OEM-2's unit did when OEM-2 provisioned the part.

FIG. 7 illustrates a computer system or computing device 700 (also referred to as device 700), where technology in one or more circuit blocks of the device 700, in accordance with some examples, provides hardware-based cryptographic protection of tokens.

In some examples, device 700 represents an appropriate computing device, such as a computing tablet, a mobile phone or smart-phone, a laptop, a desktop, an Internet-of-Things (IoT) device, a server, a wearable device, a set-top box, a wireless-enabled e-reader, or the like. It will be understood that certain components are shown generally, and not all components of such a device are shown in device 700.

In an example, the device 700 comprises a SOC 701. An example boundary of the SOC 701 is illustrated using dotted lines in FIG. 7, with some example components being illustrated to be included within SOC 701. However, SOC 701 may include any appropriate components of device 700.

In some examples, device 700 includes processor 704. Processor 704 can include one or more physical devices, such as microprocessors, application processors, microcontrollers, programmable logic devices, processing cores, or other processing means. The processing operations performed by processor 704 include the execution of an operating platform or OS on which applications and/or device functions are executed. The processing operations include operations related to I/O (input/output) with a human user or with other devices, operations related to power management, operations related to connecting computing device 700 to another device, and/or the like. The processing operations may also include operations related to audio I/O and/or display I/O.

In some examples, processor 704 includes multiple processing cores 707a, 707b, 707c (also referred to individually or collectively as core(s) 707). Although merely three cores 707a, 707b, 707c are illustrated in FIG. 7, the processor 704 may include any other appropriate number of processing cores, e.g., tens, or even hundreds of processing cores. Processor cores 707a, 707b, 707c may be implemented on a single integrated circuit (IC) chip. Moreover, the chip may include one or more shared and/or private caches, buses or interconnections, graphics and/or memory controllers, or other components.

In some examples, processor 704 includes cache 706. In an example, sections of cache 706 may be dedicated to individual cores 707 (e.g., a first section of cache 706 dedicated to core 707a, a second section of cache 706 dedicated to core 707b, and so on). In an example, one or more sections of cache 706 may be shared among two or more of cores 707. Cache 706 may be split in different levels, e.g., level 1 (L1) cache, level 2 (L2) cache, level 3 (L3) cache, etc.

In some examples, a core 707 of the processor 704 may include a fetch unit to fetch instructions (including instructions with conditional branches) for execution by the core 707. The instructions may be fetched from any storage devices such as the memory 730. Core 707 may also include a decode unit to decode the fetched instruction. For example, the decode unit may decode the fetched instruction into a plurality of micro-operations. Core 707 may include a schedule unit to perform various operations associated with storing decoded instructions. For example, the schedule unit may hold data from the decode unit until the instructions are ready for dispatch, e.g., until all source values of a decoded instruction become available. In one example, the schedule unit may schedule and/or issue (or dispatch) decoded instructions to an execution unit for execution.

The execution unit may execute the dispatched instructions after they are decoded (e.g., by the decode unit) and dispatched (e.g., by the schedule unit). In an example, the execution unit may include more than one execution unit (such as an imaging computational unit, a graphics computational unit, a general-purpose computational unit, etc.). The execution unit may also perform various arithmetic operations such as addition, subtraction, multiplication, and/or division, and may include one or more an arithmetic logic units (ALUs). In an example, a co-processor (not shown) may perform various arithmetic operations in conjunction with the execution unit.

Further, execution unit may execute instructions out-of-order. Hence, core 707 may be an out-of-order processor core in one example. Core 707 may also include a retirement unit. The retirement unit may retire executed instructions after they are committed. In an example, retirement of the executed instructions may result in processor state being committed from the execution of the instructions, physical registers used by the instructions being de-allocated, etc. The processor 704 may also include a bus unit to enable communication between components of the processor 704 and other components via one or more buses. Processor 704 may also include one or more registers to store data accessed by various components of the cores 707 (such as values related to assigned app priorities and/or sub-system states (modes) association.

In some examples, device 700 comprises connectivity circuitries 731. For example, connectivity circuitries 731 includes hardware devices (e.g., wireless and/or wired connectors and communication hardware) and/or software components (e.g., drivers, protocol stacks), e.g., to enable device 700 to communicate with external devices. Device 700 may be separate from the external devices, such as other computing devices, wireless access points or base stations, etc.

In an example, connectivity circuitries 731 may include multiple different types of connectivity. To generalize, the connectivity circuitries 731 may include cellular connectivity circuitries, wireless connectivity circuitries, etc. Cellular connectivity circuitries of connectivity circuitries 731 refers generally to cellular network connectivity provided by wireless carriers, such as provided via GSM (global system for mobile communications) or variations or derivatives, CDMA (code division multiple access) or variations or derivatives, TDM (time division multiplexing) or variations or derivatives, 3rd Generation Partnership Project (3GPP) Universal Mobile Telecommunications Systems (UMTS) system or variations or derivatives, 3GPP Long-Term Evolution (LTE) system or variations or derivatives, 3GPP LTE-Advanced (LTE-A) system or variations or derivatives, Fifth Generation (5G) wireless system or variations or derivatives, 5G mobile networks system or variations or derivatives, 5G New Radio (NR) system or variations or derivatives, or other cellular service standards. Wireless connectivity circuitries (or wireless interface) of the connectivity circuitries 731 refers to wireless connectivity that is not cellular, and can include personal area networks (such as Bluetooth, Near Field, etc.), local area networks (such as Wi-Fi), and/or wide area networks (such as WiMax), and/or other wireless communication. In an example, connectivity circuitries 731 may include a network interface, such as a wired or wireless interface, e.g., so that a system example may be incorporated into a wireless device, for example, cell phone or personal digital assistant.

In some examples, device 700 comprises control hub 732, which represents hardware devices and/or software components related to interaction with one or more I/O devices. For example, processor 704 may communicate with one or more of display 722, one or more peripheral devices 724, storage devices 727, one or more other external devices 729, etc., via control hub 732. Control hub 732 may be a chipset, a Platform Control Hub (PCH), and/or the like.

For example, control hub 732 illustrates one or more connection points for additional devices that connect to device 700, e.g., through which a user might interact with the system. For example, devices (e.g., devices 729) that can be attached to device 700 include microphone devices, speaker or stereo systems, audio devices, video systems or other display devices, keyboard or keypad devices, or other I/O devices for use with specific applications such as card readers or other devices.

As mentioned above, control hub 732 can interact with audio devices, display 722, etc. For example, input through a microphone or other audio device can provide input or commands for one or more applications or functions of device 700. Additionally, audio output can be provided instead of, or in addition to display output. In another example, if display 722 includes a touch screen, display 722 also acts as an input device, which can be at least partially managed by control hub 732. There can also be additional buttons or switches on computing device 700 to provide I/O functions managed by control hub 732. In one example, control hub 732 manages devices such as accelerometers, cameras, light sensors or other environmental sensors, or other hardware that can be included in device 700. The input can be part of direct user interaction, as well as providing environmental input to the system to influence its operations (such as filtering for noise, adjusting displays for brightness detection, applying a flash for a camera, or other features).

In some examples, control hub 732 may couple to various devices using any appropriate communication protocol, e.g., PCIe (Peripheral Component Interconnect Express), USB (Universal Serial Bus), Thunderbolt, High Definition Multimedia Interface (HDMI), Firewire, etc.

In some examples, display 722 represents hardware (e.g., display devices) and software (e.g., drivers) components that provide a visual and/or tactile display for a user to interact with device 700. Display 722 may include a display interface, a display screen, and/or hardware device used to provide a display to a user. In some examples, display 722 includes a touch screen (or touch pad) device that provides both output and input to a user. In an example, display 722 may communicate directly with the processor 704. Display 722 can be one or more of an internal display device, as in a mobile electronic device or a laptop device or an external display device attached via a display interface (e.g., DisplayPort, etc.). In one example display 722 can be a head mounted display (HMD) such as a stereoscopic display device for use in virtual reality (VR) applications or augmented reality (AR) applications.

In some examples and although not illustrated in the figure, in addition to (or instead of) processor 704, device 700 may include Graphics Processing Unit (GPU) comprising one or more graphics processing cores, which may control one or more aspects of displaying contents on display 722.

Control hub 732 (or platform controller hub) may include hardware interfaces and connectors, as well as software components (e.g., drivers, protocol stacks) to make peripheral connections, e.g., to peripheral devices 724.

It will be understood that device 700 could both be a peripheral device to other computing devices, as well as have peripheral devices connected to it. Device 700 may have a “docking” connector to connect to other computing devices for purposes such as managing (e.g., downloading and/or uploading, changing, synchronizing) content on device 700. Additionally, a docking connector can allow device 700 to connect to certain peripherals that allow computing device 700 to control content output, for example, to audiovisual or other systems.

In addition to a proprietary docking connector or other proprietary connection hardware, device 700 can make peripheral connections via common or standards-based connectors. Common types can include a Universal Serial Bus (USB) connector (which can include any of a number of different hardware interfaces), DisplayPort including MiniDisplayPort (MDP), High Definition Multimedia Interface (HDMI), Firewire, or other types.

In some examples, connectivity circuitries 731 may be coupled to control hub 732, e.g., in addition to, or instead of, being coupled directly to the processor 704. In some examples, display 722 may be coupled to control hub 732, e.g., in addition to, or instead of, being coupled directly to processor 704.

In some examples, device 700 comprises memory 730 coupled to processor 704 via memory interface 734. Memory 730 includes memory devices for storing information in device 700. Memory can include nonvolatile (state does not change if power to the memory device is interrupted) and/or volatile (state is indeterminate if power to the memory device is interrupted) memory devices. Memory 730 can be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory device, phase-change memory device, or some other memory device having suitable performance to serve as process memory. In one example, memory 730 can operate as system memory for device 700, to store data and instructions for use when the one or more processors 704 executes an application or process. Memory 730 can store application data, user data, music, photos, documents, or other data, as well as system data (whether long-term or temporary) related to the execution of the applications and functions of device 700.

Elements of various examples and examples are also provided as a machine-readable medium (e.g., memory 730) for storing the computer-executable instructions (e.g., instructions to implement any other processes discussed herein). The machine-readable medium (e.g., memory 730) may include, but is not limited to, flash memory, optical disks, CD-ROMs, DVD ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, phase change memory (PCM), or other types of machine-readable media suitable for storing electronic or computer-executable instructions. For example, examples of the disclosure may be downloaded as a computer program (e.g., BIOS) which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals via a communication link (e.g., a modem or network connection).

In some examples, device 700 comprises temperature measurement circuitries 740, e.g., for measuring temperature of various components of device 700. In an example, temperature measurement circuitries 740 may be embedded, or coupled or attached to various components, whose temperature are to be measured and monitored. For example, temperature measurement circuitries 740 may measure temperature of (or within) one or more of cores 707a, 707b, 707c, voltage regulator 714, memory 730, a mother-board of SOC 701, and/or any appropriate component of device 700.

In some examples, device 700 comprises power measurement circuitries 742, e.g., for measuring power consumed by one or more components of the device 700. In an example, in addition to, or instead of, measuring power, the power measurement circuitries 742 may measure voltage and/or current. In an example, the power measurement circuitries 742 may be embedded, or coupled or attached to various components, whose power, voltage, and/or current consumption are to be measured and monitored. For example, power measurement circuitries 742 may measure power, current and/or voltage supplied by one or more voltage regulators 714, power supplied to SOC 701, power supplied to device 700, power consumed by processor 704 (or any other component) of device 700, etc.

In some examples, device 700 comprises one or more voltage regulator circuitries, generally referred to as voltage regulator (VR) 714. VR 714 generates signals at appropriate voltage levels, which may be supplied to operate any appropriate components of the device 700. Merely as an example, VR 714 is illustrated to be supplying signals to processor 704 of device 700. In some examples, VR 714 receives one or more Voltage Identification (VID) signals, and generates the voltage signal at an appropriate level, based on the VID signals. Various type of VRs may be utilized for the VR 714. For example, VR 714 may include a “buck” VR, “boost” VR, a combination of buck and boost VRs, low dropout (LDO) regulators, switching DC-DC regulators, etc. Buck VR is generally used in power delivery applications in which an input voltage needs to be transformed to an output voltage in a ratio that is smaller than unity. Boost VR is generally used in power delivery applications in which an input voltage needs to be transformed to an output voltage in a ratio that is larger than unity. In some examples, each processor core has its own VR which is controlled by Power Control Unit (PCU) 710a/b and/or Power Management Integrated Circuit (PMIC) 712. In some examples, each core has a network of distributed LDOs to provide efficient control for power management. The LDOs can be digital, analog, or a combination of digital or analog LDOs.

In some examples, device 700 comprises one or more clock generator circuitries, generally referred to as clock generator 716. Clock generator 716 generates clock signals at appropriate frequency levels, which may be supplied to any appropriate components of device 700. Merely as an example, clock generator 716 is illustrated to be supplying clock signals to processor 704 of device 700. In some examples, clock generator 716 receives one or more Frequency Identification (FID) signals, and generates the clock signals at an appropriate frequency, based on the FID signals.

In some examples, device 700 comprises battery 717 supplying power to various components of device 700. Merely as an example, battery 717 is illustrated to be supplying power to processor 704. Although not illustrated in the figures, device 700 may comprise a charging circuitry, e.g., to recharge the battery, based on Alternating Current (AC) power supply received from an AC adapter.

In some examples, device 700 comprises PCU 710 (also referred to as Power Management Unit (PMU), Power Controller, etc.). In an example, some sections of PCU 710 may be implemented by one or more processing cores 707, and these sections of PCU 710 are symbolically illustrated using a dotted box and labelled PCU 710a. In an example, some other sections of PCU 710 may be implemented outside the processing cores 707, and these sections of PCU 710 are symbolically illustrated using a dotted box and labelled as PCU 710b. PCU 710 may implement various power management operations for device 700. PCU 710 may include hardware interfaces, hardware circuitries, connectors, registers, etc., as well as software components (e.g., drivers, protocol stacks), to implement various power management operations for device 700.

In some examples, device 700 comprises PMIC 712, e.g., to implement various power management operations for device 700. In some examples, PMIC 712 is a Reconfigurable Power Management ICs (RPMICs) and/or an IMVP (Intel® Mobile Voltage Positioning). In an example, the PMIC is within an IC chip separate from processor 704. The PMIC 712 may implement various power management operations for device 700. PMIC 712 may include hardware interfaces, hardware circuitries, connectors, registers, etc., as well as software components (e.g., drivers, protocol stacks), to implement various power management operations for device 700.

In an example, device 700 comprises one or both PCU 710 or PMIC 712. In an example, any one of PCU 710 or PMIC 712 may be absent in device 700, and hence, these components are illustrated using dotted lines.

Various power management operations of device 700 may be performed by PCU 710, by PMIC 712, or by a combination of PCU 710 and PMIC 712. For example, PCU 710 and/or PMIC 712 may select a power state (e.g., P-state) for various components of device 700. For example, PCU 710 and/or PMIC 712 may select a power state (e.g., in accordance with the ACPI (Advanced Configuration and Power Interface) specification) for various components of device 700. Merely as an example, PCU 710 and/or PMIC 712 may cause various components of the device 700 to transition to a sleep state, to an active state, to an appropriate C state (e.g., CO state, or another appropriate C state, in accordance with the ACPI specification), etc. In an example, PCU 710 and/or PMIC 712 may control a voltage output by VR 714 and/or a frequency of a clock signal output by the clock generator, e.g., by outputting the VID signal and/or the FID signal, respectively. In an example, PCU 710 and/or PMIC 712 may control battery power usage, charging of battery 717, and features related to power saving operation. In accordance with some examples, technology for hardware-based cryptographic protection of tokens may be integrated with one or more of the PCU 710 and/or PMIC 712.

The clock generator 716 can comprise a phase locked loop (PLL), frequency locked loop (FLL), or any suitable clock source. In some examples, each core of processor 704 has its own clock source. As such, each core can operate at a frequency independent of the frequency of operation of the other core. In some examples, PCU 710 and/or PMIC 712 performs adaptive or dynamic frequency scaling or adjustment. For example, clock frequency of a processor core can be increased if the core is not operating at its maximum power consumption threshold or limit. In some examples, PCU 710 and/or PMIC 712 determines the operating condition of each core of a processor, and opportunistically adjusts frequency and/or power supply voltage of that core without the core clocking source (e.g., PLL of that core) losing lock when the PCU 710 and/or PMIC 712 determines that the core is operating below a target performance level. For example, if a core is drawing current from a power supply rail less than a total current allocated for that core or processor 704, then PCU 710 and/or PMIC 712 can temporarily increase the power draw for that core or processor 704 (e.g., by increasing clock frequency and/or power supply voltage level) so that the core or processor 704 can perform at higher performance level. As such, voltage and/or frequency can be increased temporarily for processor 704 without violating product reliability.

In an example, PCU 710 and/or PMIC 712 may perform power management operations, e.g., based at least in part on receiving measurements from power measurement circuitries 742, temperature measurement circuitries 740, charge level of battery 717, and/or any other appropriate information that may be used for power management. To that end, PMIC 712 is communicatively coupled to one or more sensors to sense/detect various values/variations in one or more factors having an effect on power/thermal behavior of the system/platform. Examples of the one or more factors include electrical current, voltage droop, temperature, operating frequency, operating voltage, power consumption, inter-core communication activity, etc. One or more of these sensors may be provided in physical proximity (and/or thermal contact/coupling) with one or more components or logic/IP blocks of a computing system. Additionally, sensor(s) may be directly coupled to PCU 710 and/or PMIC 712 in at least one example to allow PCU 710 and/or PMIC 712 to manage processor core energy at least in part based on value(s) detected by one or more of the sensors.

Also illustrated is an example software stack of device 700 (although not all elements of the software stack are illustrated). Merely as an example, processors 704 may execute application programs 750, OS 752, one or more Power Management (PM) specific application programs (e.g., generically referred to as PM applications 757), and/or the like. PM applications 757 may also be executed by the PCU 710 and/or PMIC 712. OS 752 may also include one or more PM applications 756a, 756b, 756c (e.g., including an OSPM). The OS 752 may also include various drivers 754a, 754b, 754c, etc., some of which may be specific for power management purposes. In some examples, device 700 may further comprise a Basic Input/Output System (BIOS) 720. BIOS 720 may communicate with OS 752 (e.g., via one or more drivers 754), communicate with processors 704, etc.

For example, one or more of PM applications 757, 756, drivers 754, BIOS 720, etc. may be used to implement power management specific tasks, e.g., to control voltage and/or frequency of various components of device 700, to control wake-up state, sleep state, and/or any other appropriate power state of various components of device 700, control battery power usage, charging of the battery 717, features related to power saving operation, etc.

In some examples, multiple tasks are variously performed each with a respective one of application programs 750 and/or OS 752. At a given time during operation of computing device 700, at least some of the tasks each result in, or otherwise correspond to, a respective input being received via one or more human interface devices (HIDs). Said tasks each further include or otherwise correspond to a different respective data flow by which computing device 700 communicates with one or more networks (e.g., via connectivity circuitries 731). User input and/or other characteristics of user behavior are detected with the one or more HIDs, and provide a basis for detecting a relative interest by the user in one task over one or more other co-pending tasks. By way of illustration and not limitation, OS 752 provides a kernel space in which Qos logic, a filter driver, and/or other suitable software logic executes to detect a task which is currently of relatively greater user interest, and to prioritize a data flow which corresponds to said task. An indication of the relative prioritization of tasks (e.g., and the relative prioritization of corresponding data flows) is communicated, for example, from processor 704 to connectivity circuitries 731. Based on such signaling, connectivity circuitries 731 variously processes data packets according to the prioritization of tasks relative to each other.

In accordance with some examples, the SOC 701 further includes a HAM 760 and a HUM 765. The HAM 760 and/or the HUM 765 includes one or more features or aspects of the various other examples described herein for hardware-based cryptographic protection of tokens.

The HAM 760 and/or the HUM 765 may be implemented as separate circuit blocks. Alternatively, all or portions of the HAM 760 and/or HUM 765 may be implemented in one or more other circuit blocks of the SOC 701 (e.g., or outside the SOC 701), including the processor 704, the control hub 732, the PMIC 712 and/or a PCU (e.g., such as PCU 710a inside the core 707a, or such as the PCU 710b outside the processor 704).

Those skilled in the art will appreciate that a wide variety of devices may benefit from the foregoing examples. The following exemplary core architectures, processors, and computer architectures are non-limiting examples of devices that may beneficially incorporate examples of the technology described herein.

Example Computer Architectures.

Detailed below are descriptions of example computer architectures. Other system designs and configurations known in the arts for laptop, desktop, and handheld personal computers (PC) s, personal digital assistants, engineering workstations, servers, disaggregated servers, network devices, network hubs, switches, routers, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, micro controllers, cell phones, portable media players, hand-held devices, and various other electronic devices, are also suitable. In general, a variety of systems or electronic devices capable of incorporating a processor and/or other execution logic as disclosed herein are generally suitable. Some examples may be particularly beneficial for parallel computing applications, a GPU (e.g., as part of a discrete graphics card), a SIMD processor, an AI processor, ML applications, and neural network processing applications.

FIG. 8 illustrates an example computing system. Multiprocessor system 800 is an interfaced system and includes a plurality of processors or cores including a first processor 870 and a second processor 880 coupled via an interface 850 such as a point-to-point (P-P) interconnect, a fabric, and/or bus. In some examples, the first processor 870 and the second processor 880 are homogeneous. In some examples, first processor 870 and the second processor 880 are heterogenous. Though the example system 800 is shown to have two processors, the system may have three or more processors, or may be a single processor system. In some examples, the computing system is a system on a chip (SOC).

Processors 870 and 880 are shown including integrated memory controller (IMC) circuitry 872 and 882, respectively. Processor 870 also includes interface circuits 876 and 878; similarly, second processor 880 includes interface circuits 886 and 888. Processors 870, 880 may exchange information via the interface 850 using interface circuits 878, 888. IMCs 872 and 882 couple the processors 870, 880 to respective memories, namely a memory 832 and a memory 834, which may be portions of main memory locally attached to the respective processors.

Processors 870, 880 may each exchange information with a network interface (NW I/F) 890 via individual interfaces 852, 854 using interface circuits 876, 894, 886, 898. The network interface 890 (e.g., one or more of an interconnect, bus, and/or fabric, and in some examples is a chipset) may optionally exchange information with a coprocessor 838 via an interface circuit 892. In some examples, the coprocessor 838 is a special-purpose processor, such as, for example, a high-throughput processor, a network or communication processor, compression engine, graphics processor, general purpose graphics processing unit (GPGPU), neural-network processing unit (NPU), embedded processor, or the like.

A shared cache (not shown) may be included in either processor 870, 880 or outside of both processors, yet connected with the processors via an interface such as P-P interconnect, such that either or both processors' local cache information may be stored in the shared cache if a processor is placed into a low power mode.

Network interface 890 may be coupled to a first interface 816 via interface circuit 896. In some examples, first interface 816 may be an interface such as a Peripheral Component Interconnect (PCI) interconnect, a PCI Express interconnect or another I/O interconnect. In some examples, first interface 816 is coupled to a power control unit (PCU) 817, which may include circuitry, software, and/or firmware to perform power management operations with regard to the processors 870, 880 and/or co-processor 838. PCU 817 provides control information to a voltage regulator (not shown) to cause the voltage regulator to generate the appropriate regulated voltage. PCU 817 also provides control information to control the operating voltage generated. In various examples, PCU 817 may include a variety of power management logic units (circuitry) to perform hardware-based power management. Such power management may be wholly processor controlled (e.g., by various processor hardware, and which may be triggered by workload and/or power, thermal or other processor constraints) and/or the power management may be performed responsive to external sources (such as a platform or power management source or system software).

PCU 817 is illustrated as being present as logic separate from the processor 870 and/or processor 880. In other cases, PCU 817 may execute on a given one or more of cores (not shown) of processor 870 or 880. In some cases, PCU 817 may be implemented as a microcontroller (dedicated or general-purpose) or other control logic configured to execute its own dedicated power management code, sometimes referred to as P-code. In yet other examples, power management operations to be performed by PCU 817 may be implemented externally to a processor, such as by way of a separate power management integrated circuit (PMIC) or another component external to the processor. In yet other examples, power management operations to be performed by PCU 817 may be implemented within BIOS or other system software.

Various I/O devices 814 may be coupled to first interface 816, along with a bus bridge 818 which couples first interface 816 to a second interface 820. In some examples, one or more additional processor(s) 815, such as coprocessors, high throughput many integrated core (MIC) processors, GPGPUs, accelerators (such as graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays (FPGAs), or any other processor, are coupled to first interface 816. In some examples, second interface 820 may be a low pin count (LPC) interface. Various devices may be coupled to second interface 820 including, for example, a keyboard and/or mouse 822, communication devices 827 and storage circuitry 828. Storage circuitry 828 may be one or more non-transitory machine-readable storage media as described below, such as a disk drive or other mass storage device which may include instructions/code and data 830. Further, an audio I/O 824 may be coupled to second interface 820. Note that other architectures than the point-to-point architecture described above are possible. For example, instead of the point-to-point architecture, a system such as multiprocessor system 800 may implement a multi-drop interface or other such architecture.

Example Core Architectures, Processors, and Computer Architectures.

Processor cores may be implemented in different ways, for different purposes, and in different processors. For instance, implementations of such cores may include: 1) a general purpose in-order core intended for general-purpose computing; 2) a high-performance general purpose out-of-order core intended for general-purpose computing; 3) a special purpose core intended primarily for graphics and/or scientific (throughput) computing. Implementations of different processors may include: 1) a CPU including one or more general purpose in-order cores intended for general-purpose computing and/or one or more general purpose out-of-order cores intended for general-purpose computing; and 2) a coprocessor including one or more special purpose cores intended primarily for graphics and/or scientific (throughput) computing. Such different processors lead to different computer system architectures, which may include: 1) the coprocessor on a separate chip from the CPU; 2) the coprocessor on a separate die in the same package as a CPU; 3) the coprocessor on the same die as a CPU (in which case, such a coprocessor is sometimes referred to as special purpose logic, such as integrated graphics and/or scientific (throughput) logic, or as special purpose cores); and 4) a system on a chip (SOC) that may be included on the same die as the described CPU (sometimes referred to as the application core(s) or application processor(s)), the above described coprocessor, and additional functionality. Example core architectures are described next, followed by descriptions of example processors and computer architectures.

FIG. 9 illustrates a block diagram of an example processor and/or SOC 900 that may have one or more cores and an integrated memory controller. The solid lined boxes illustrate a processor 900 with a single core 902(A), system agent unit circuitry 910, and a set of one or more interface controller unit(s) circuitry 916, while the optional addition of the dashed lined boxes illustrates an alternative processor 900 with multiple cores 902(A)-(N), a set of one or more integrated memory controller unit(s) circuitry 914 in the system agent unit circuitry 910, and special purpose logic 908, as well as a set of one or more interface controller units circuitry 916. Note that the processor 900 may be one of the processors 870 or 880, or co-processor 838 or 815 of FIG. 8.

Thus, different implementations of the processor 900 may include: 1) a CPU with the special purpose logic 908 being integrated graphics and/or scientific (throughput) logic (which may include one or more cores, not shown), and the cores 902(A)-(N) being one or more general purpose cores (e.g., general purpose in-order cores, general purpose out-of-order cores, or a combination of the two); 2) a coprocessor with the cores 902(A)-(N) being a large number of special purpose cores intended primarily for graphics and/or scientific (throughput); and 3) a coprocessor with the cores 902(A)-(N) being a large number of general purpose in-order cores. Thus, the processor 900 may be a general-purpose processor, coprocessor or special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, GPGPU (general purpose graphics processing unit), a high throughput many integrated core (MIC) coprocessor (including 30 or more cores), embedded processor, or the like. The processor may be implemented on one or more chips. The processor 900 may be a part of and/or may be implemented on one or more substrates using any of a number of process technologies, such as, for example, complementary metal oxide semiconductor (CMOS), bipolar CMOS (BiCMOS), P-type metal oxide semiconductor (PMOS), or N-type metal oxide semiconductor (NMOS).

A memory hierarchy includes one or more levels of cache unit(s) circuitry 904(A)-(N) within the cores 902(A)-(N), a set of one or more shared cache unit(s) circuitry 906, and external memory (not shown) coupled to the set of integrated memory controller unit(s) circuitry 914. The set of one or more shared cache unit(s) circuitry 906 may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, such as a last level cache (LLC), and/or combinations thereof. While in some examples interface network circuitry 912 (e.g., a ring interconnect) interfaces the special purpose logic 908 (e.g., integrated graphics logic), the set of shared cache unit(s) circuitry 906, and the system agent unit circuitry 910, alternative examples use any number of well-known techniques for interfacing such units. In some examples, coherency is maintained between one or more of the shared cache unit(s) circuitry 906 and cores 902(A)-(N). In some examples, interface controller units circuitry 916 couple the cores 902 to one or more other devices 918 such as one or more I/O devices, storage, one or more communication devices (e.g., wireless networking, wired networking, etc.), etc.

In some examples, one or more of the cores 902(A)-(N) are capable of multi-threading. The system agent unit circuitry 910 includes those components coordinating and operating cores 902(A)-(N). The system agent unit circuitry 910 may include, for example, power control unit (PCU) circuitry and/or display unit circuitry (not shown). The PCU may be or may include logic and components needed for regulating the power state of the cores 902(A)-(N) and/or the special purpose logic 908 (e.g., integrated graphics logic). The display unit circuitry is for driving one or more externally connected displays.

The cores 902(A)-(N) may be homogenous in terms of instruction set architecture (ISA). Alternatively, the cores 902(A)-(N) may be heterogeneous in terms of ISA; that is, a subset of the cores 902(A)-(N) may be capable of executing an ISA, while other cores may be capable of executing only a subset of that ISA or another ISA.

Example Core Architectures-In-order and out-of-order core block diagram.

FIG. 10A is a block diagram illustrating both an example in-order pipeline and an example register renaming, out-of-order issue/execution pipeline according to examples. FIG. 10B is a block diagram illustrating both an example in-order architecture core and an example register renaming, out-of-order issue/execution architecture core to be included in a processor according to examples. The solid lined boxes in FIGS. 10A-B illustrate the in-order pipeline and in-order core, while the optional addition of the dashed lined boxes illustrates the register renaming, out-of-order issue/execution pipeline and core. Given that the in-order aspect is a subset of the out-of-order aspect, the out-of-order aspect will be described.

In FIG. 10A, a processor pipeline 1000 includes a fetch stage 1002, an optional length decoding stage 1004, a decode stage 1006, an optional allocation (Alloc) stage 1008, an optional renaming stage 1010, a schedule (also known as a dispatch or issue) stage 1012, an optional register read/memory read stage 1014, an execute stage 1016, a write back/memory write stage 1018, an optional exception handling stage 1022, and an optional commit stage 1024. One or more operations can be performed in each of these processor pipeline stages. For example, during the fetch stage 1002, one or more instructions are fetched from instruction memory, and during the decode stage 1006, the one or more fetched instructions may be decoded, addresses (e.g., load store unit (LSU) addresses) using forwarded register ports may be generated, and branch forwarding (e.g., immediate offset or a link register (LR)) may be performed. In one example, the decode stage 1006 and the register read/memory read stage 1014 may be combined into one pipeline stage. In one example, during the execute stage 1016, the decoded instructions may be executed, LSU address/data pipelining to an Advanced Microcontroller Bus (AMB) interface may be performed, multiply and add operations may be performed, arithmetic operations with branch results may be performed, etc.

By way of example, the example register renaming, out-of-order issue/execution architecture core of FIG. 10B may implement the pipeline 1000 as follows: 1) the instruction fetch circuitry 1038 performs the fetch and length decoding stages 1002 and 1004; 2) the decode circuitry 1040 performs the decode stage 1006; 3) the rename/allocator unit circuitry 1052 performs the allocation stage 1008 and renaming stage 1010; 4) the scheduler(s) circuitry 1056 performs the schedule stage 1012; 5) the physical register file(s) circuitry 1058 and the memory unit circuitry 1070 perform the register read/memory read stage 1014; the execution cluster(s) 1060 perform the execute stage 1016; 6) the memory unit circuitry 1070 and the physical register file(s) circuitry 1058 perform the write back/memory write stage 1018; 7) various circuitry may be involved in the exception handling stage 1022; and 8) the retirement unit circuitry 1054 and the physical register file(s) circuitry 1058 perform the commit stage 1024.

FIG. 10B shows a processor core 1090 including front-end unit circuitry 1030 coupled to execution engine unit circuitry 1050, and both are coupled to memory unit circuitry 1070. The core 1090 may be a reduced instruction set architecture computing (RISC) core, a complex instruction set architecture computing (CISC) core, a very long instruction word (VLIW) core, or a hybrid or alternative core type. As yet another option, the core 1090 may be a special-purpose core, such as, for example, a network or communication core, compression engine, coprocessor core, general purpose computing graphics processing unit (GPGPU) core, graphics core, or the like.

The front-end unit circuitry 1030 may include branch prediction circuitry 1032 coupled to instruction cache circuitry 1034, which is coupled to an instruction translation lookaside buffer (TLB) 1036, which is coupled to instruction fetch circuitry 1038, which is coupled to decode circuitry 1040. In one example, the instruction cache circuitry 1034 is included in the memory unit circuitry 1070 rather than the front-end circuitry 1030. The decode circuitry 1040 (or decoder) may decode instructions, and generate as an output one or more micro-operations, micro-code entry points, microinstructions, other instructions, or other control signals, which are decoded from, or which otherwise reflect, or are derived from, the original instructions. The decode circuitry 1040 may further include address generation unit (AGU, not shown) circuitry. In one example, the AGU generates an LSU address using forwarded register ports, and may further perform branch forwarding (e.g., immediate offset branch forwarding, LR register branch forwarding, etc.). The decode circuitry 1040 may be implemented using various different mechanisms. Examples of suitable mechanisms include, but are not limited to, look-up tables, hardware implementations, programmable logic arrays (PLAs), microcode read only memories (ROMs), etc. In one example, the core 1090 includes a microcode ROM (not shown) or other medium that stores microcode for certain macroinstructions (e.g., in decode circuitry 1040 or otherwise within the front-end circuitry 1030). In one example, the decode circuitry 1040 includes a micro-operation (micro-op) or operation cache (not shown) to hold/cache decoded operations, micro-tags, or micro-operations generated during the decode or other stages of the processor pipeline 1000. The decode circuitry 1040 may be coupled to rename/allocator unit circuitry 1052 in the execution engine circuitry 1050.

The execution engine circuitry 1050 includes the rename/allocator unit circuitry 1052 coupled to retirement unit circuitry 1054 and a set of one or more scheduler(s) circuitry 1056. The scheduler(s) circuitry 1056 represents any number of different schedulers, including reservations stations, central instruction window, etc. In some examples, the scheduler(s) circuitry 1056 can include arithmetic logic unit (ALU) scheduler/scheduling circuitry, ALU queues, address generation unit (AGU) scheduler/scheduling circuitry, AGU queues, etc. The scheduler(s) circuitry 1056 is coupled to the physical register file(s) circuitry 1058. Each of the physical register file(s) circuitry 1058 represents one or more physical register files, different ones of which store one or more different data types, such as scalar integer, scalar floating-point, packed integer, packed floating-point, vector integer, vector floating-point, status (e.g., an instruction pointer that is the address of the next instruction to be executed), etc. In one example, the physical register file(s) circuitry 1058 includes vector registers unit circuitry, writemask registers unit circuitry, and scalar register unit circuitry. These register units may provide architectural vector registers, vector mask registers, general-purpose registers, etc. The physical register file(s) circuitry 1058 is coupled to the retirement unit circuitry 1054 (also known as a retire queue or a retirement queue) to illustrate various ways in which register renaming and out-of-order execution may be implemented (e.g., using a reorder buffer(s) (ROB(s)) and a retirement register file(s); using a future file(s), a history buffer(s), and a retirement register file(s); using a register maps and a pool of registers; etc.). The retirement unit circuitry 1054 and the physical register file(s) circuitry 1058 are coupled to the execution cluster(s) 1060. The execution cluster(s) 1060 includes a set of one or more execution unit(s) circuitry 1062 and a set of one or more memory access circuitry 1064. The execution unit(s) circuitry 1062 may perform various arithmetic, logic, floating-point or other types of operations (e.g., shifts, addition, subtraction, multiplication) and on various types of data (e.g., scalar integer, scalar floating-point, packed integer, packed floating-point, vector integer, vector floating-point). While some examples may include a number of execution units or execution unit circuitry dedicated to specific functions or sets of functions, other examples may include only one execution unit circuitry or multiple execution units/execution unit circuitry that all perform all functions. The scheduler(s) circuitry 1056, physical register file(s) circuitry 1058, and execution cluster(s) 1060 are shown as being possibly plural because certain examples create separate pipelines for certain types of data/operations (e.g., a scalar integer pipeline, a scalar floating-point/packed integer/packed floating-point/vector integer/vector floating-point pipeline, and/or a memory access pipeline that each have their own scheduler circuitry, physical register file(s) circuitry, and/or execution cluster—and in the case of a separate memory access pipeline, certain examples are implemented in which only the execution cluster of this pipeline has the memory access unit(s) circuitry 1064). It should also be understood that where separate pipelines are used, one or more of these pipelines may be out-of-order issue/execution and the rest in-order.

In some examples, the execution engine unit circuitry 1050 may perform load store unit (LSU) address/data pipelining to an Advanced Microcontroller Bus (AMB) interface (not shown), and address phase and writeback, data phase load, store, and branches.

The set of memory access circuitry 1064 is coupled to the memory unit circuitry 1070, which includes data TLB circuitry 1072 coupled to data cache circuitry 1074 coupled to level 2 (L2) cache circuitry 1076. In one example, the memory access circuitry 1064 may include load unit circuitry, store address unit circuitry, and store data unit circuitry, each of which is coupled to the data TLB circuitry 1072 in the memory unit circuitry 1070. The instruction cache circuitry 1034 is further coupled to the level 2 (L2) cache circuitry 1076 in the memory unit circuitry 1070. In one example, the instruction cache 1034 and the data cache 1074 are combined into a single instruction and data cache (not shown) in L2 cache circuitry 1076, level 3 (L3) cache circuitry (not shown), and/or main memory. The L2 cache circuitry 1076 is coupled to one or more other levels of cache and eventually to a main memory.

The core 1090 may support one or more instructions sets (e.g., the x86 instruction set architecture (optionally with some extensions that have been added with newer versions); the MIPS instruction set architecture; the ARM instruction set architecture (optionally with optional additional extensions such as NEON)), including the instruction(s) described herein. In one example, the core 1090 includes logic to support a packed data instruction set architecture extension (e.g., AVX1, AVX2), thereby allowing the operations used by many multimedia applications to be performed using packed data.

Example Execution Unit(s) Circuitry.

FIG. 11 illustrates examples of execution unit(s) circuitry, such as execution unit(s) circuitry 1062 of FIG. 10B. As illustrated, execution unit(s) circuitry 1062 may include one or more ALU circuits 1101, optional vector/single instruction multiple data (SIMD) circuits 1103, load/store circuits 1105, branch/jump circuits 1107, and/or Floating-point unit (FPU) circuits 1109. ALU circuits 1101 perform integer arithmetic and/or Boolean operations. Vector/SIMD circuits 1103 perform vector/SIMD operations on packed data (such as SIMD/vector registers). Load/store circuits 1105 execute load and store instructions to load data from memory into registers or store from registers to memory. Load/store circuits 1105 may also generate addresses. Branch/jump circuits 1107 cause a branch or jump to a memory address depending on the instruction. FPU circuits 1109 perform floating-point arithmetic. The width of the execution unit(s) circuitry 1062 varies depending upon the example and can range from 16-bit to 1,024-bit, for example. In some examples, two or more smaller execution units are logically combined to form a larger execution unit (e.g., two 128-bit execution units are logically combined to form a 256-bit execution unit).

Example Register Architecture.

FIG. 12 is a block diagram of a register architecture 1200 according to some examples. As illustrated, the register architecture 1200 includes vector/SIMD registers 1210 that vary from 128-bit to 1,024 bits width. In some examples, the vector/SIMD registers 1210 are physically 512-bits and, depending upon the mapping, only some of the lower bits are used. For example, in some examples, the vector/SIMD registers 1210 are ZMM registers which are 512 bits: the lower 256 bits are used for YMM registers and the lower 128 bits are used for XMM registers. As such, there is an overlay of registers. In some examples, a vector length field selects between a maximum length and one or more other shorter lengths, where each such shorter length is half the length of the preceding length. Scalar operations are operations performed on the lowest order data element position in a ZMM/YMM/XMM register; the higher order data element positions are either left the same as they were prior to the instruction or zeroed depending on the example.

In some examples, the register architecture 1200 includes writemask/predicate registers 1215. For example, in some examples, there are 8 writemask/predicate registers (sometimes called k0 through k7) that are each 16-bit, 32-bit, 64-bit, or 128-bit in size. Writemask/predicate registers 1215 may allow for merging (e.g., allowing any set of elements in the destination to be protected from updates during the execution of any operation) and/or zeroing (e.g., zeroing vector masks allow any set of elements in the destination to be zeroed during the execution of any operation). In some examples, each data element position in a given writemask/predicate register 1215 corresponds to a data element position of the destination. In other examples, the writemask/predicate registers 1215 are scalable and consists of a set number of enable bits for a given vector element (e.g., 8 enable bits per 64-bit vector element).

The register architecture 1200 includes a plurality of general-purpose registers 1225. These registers may be 16-bit, 32-bit, 64-bit, etc. and can be used for scalar operations. In some examples, these registers are referenced by the names RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP, and R8 through R15.

In some examples, the register architecture 1200 includes scalar floating-point (FP) register file 1245 which is used for scalar floating-point operations on 32/64/80-bit floating-point data using the x87 instruction set architecture extension or as MMX registers to perform operations on 64-bit packed integer data, as well as to hold operands for some operations performed between the MMX and XMM registers.

One or more flag registers 1240 (e.g., EFLAGS, RFLAGS, etc.) store status and control information for arithmetic, compare, and system operations. For example, the one or more flag registers 1240 may store condition code information such as carry, parity, auxiliary carry, zero, sign, and overflow. In some examples, the one or more flag registers 1240 are called program status and control registers.

Segment registers 1220 contain segment points for use in accessing memory. In some examples, these registers are referenced by the names CS, DS, SS, ES, FS, and GS.

Machine specific registers (MSRs) 1235 control and report on processor performance. Most MSRs 1235 handle system-related functions and are not accessible to an application program. Machine check registers 1260 consist of control, status, and error reporting MSRs that are used to detect and report on hardware errors.

One or more instruction pointer register(s) 1230 store an instruction pointer value. Control register(s) 1255 (e.g., CR0-CR4) determine the operating mode of a processor (e.g., processor 870, 880, 838, 815, and/or 900) and the characteristics of a currently executing task. Debug registers 1250 control and allow for the monitoring of a processor or core's debugging operations.

Memory (mem) management registers 1265 specify the locations of data structures used in protected mode memory management. These registers may include a global descriptor table register (GDTR), interrupt descriptor table register (IDTR), task register, and a local descriptor table register (LDTR) register.

Alternative examples may use wider or narrower registers. Additionally, alternative examples may use more, less, or different register files and registers. The register architecture 1200 may, for example, be used in register file/memory, or physical register file(s) circuitry 1058.

Emulation (including binary translation, code morphing, etc.).

In some cases, an instruction converter may be used to convert an instruction from a source instruction set architecture to a target instruction set architecture. For example, the instruction converter may translate (e.g., using static binary translation, dynamic binary translation including dynamic compilation), morph, emulate, or otherwise convert an instruction to one or more other instructions to be processed by the core. The instruction converter may be implemented in software, hardware, firmware, or a combination thereof. The instruction converter may be on processor, off processor, or part on and part off processor.

FIG. 13 is a block diagram illustrating the use of a software instruction converter to convert binary instructions in a source ISA to binary instructions in a target ISA according to examples. In the illustrated example, the instruction converter is a software instruction converter, although alternatively the instruction converter may be implemented in software, firmware, hardware, or various combinations thereof. FIG. 13 shows a program in a high-level language 1302 may be compiled using a first ISA compiler 1304 to generate first ISA binary code 1306 that may be natively executed by a processor with at least one first ISA core 1316. The processor with at least one first ISA core 1316 represents any processor that can perform substantially the same functions as an Intel® processor with at least one first ISA core by compatibly executing or otherwise processing (1) a substantial portion of the first ISA or (2) object code versions of applications or other software targeted to run on an Intel processor with at least one first ISA core, in order to achieve substantially the same result as a processor with at least one first ISA core. The first ISA compiler 1304 represents a compiler that is operable to generate first ISA binary code 1306 (e.g., object code) that can, with or without additional linkage processing, be executed on the processor with at least one first ISA core 1316. Similarly, FIG. 13 shows the program in the high-level language 1302 may be compiled using an alternative ISA compiler 1308 to generate alternative ISA binary code 1310 that may be natively executed by a processor without a first ISA core 1314. The instruction converter 1312 is used to convert the first ISA binary code 1306 into code that may be natively executed by the processor without a first ISA core 1314. This converted code is not necessarily to be the same as the alternative ISA binary code 1310; however, the converted code will accomplish the general operation and be made up of instructions from the alternative ISA. Thus, the instruction converter 1312 represents software, firmware, hardware, or a combination thereof that, through emulation, simulation or any other process, allows a processor or other electronic device that does not have a first ISA processor or core to execute the first ISA binary code 1306.

Techniques and architectures for hardware-based cryptographic protection of tokens are described herein. In the above description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of certain examples. It will be apparent, however, to one skilled in the art that certain examples can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the description

ADDITIONAL NOTES AND EXAMPLES

Example 1 includes an apparatus, comprising first circuitry that is to be selectively locked and unlocked, second circuitry to process one or more tokens including an unlock token for the first circuitry, and hardware authentication circuitry to authenticate the unlock token for the first circuitry in response to a request from the second circuitry.

Example 2 includes the apparatus of Example 1, wherein the hardware authentication circuitry is further to generate a per-part unlock key, and provide the per-part unlock key for secure external storage.

Example 3 includes the apparatus of any of Examples 1 to 2, wherein the hardware authentication circuitry is further to generate a per-part unlock key at runtime, compute a tag based on the generated per-part unlock key, and determine whether the unlock token is authentic based on a comparison of the computed tag and an unlock tag from the unlock token.

Example 4 includes the apparatus of Example 3, wherein the hardware authentication circuitry is further to utilize symmetric key cryptography to generate the per-part unlock key at runtime to authenticate against another unlock key generated prior to runtime.

Example 5 includes the apparatus of any of Examples 1 to 4, further comprising hardware ungate circuitry to selectively gate and ungate one or more features of the first circuitry in response to an indication of a result of the authentication.

Example 6 includes the apparatus of Example 5, wherein the hardware ungate circuitry is further to compare one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and selectively gate and ungate one or more features of the first circuitry based on the comparison.

Example 7 includes the apparatus of any of Examples 1 to 6, wherein the second circuitry is further to selectively enable and disable one or more ungated features of the first circuitry.

Example 8 includes an apparatus, comprising first circuitry that is to be selectively locked and unlocked, second circuitry to process one or more tokens including an unlock token for the first circuitry, and hardware ungate circuitry to selectively gate and ungate one or more features of the first circuitry in response to an indication of whether the unlock token is authentic.

Example 9 includes the apparatus of Example 8, wherein the hardware ungate circuitry is further to compare one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and selectively gate and ungate one or more features of the first circuitry based on the comparison.

Example 10 includes the apparatus of Example 9, wherein the second circuitry is further to selectively enable and disable one or more ungated features of the first circuitry.

Example 11 includes the apparatus of any of Examples 8 to 10, further comprising hardware authentication circuitry to authenticate the unlock token for the first circuitry in response to a request from the second circuitry.

Example 12 includes the apparatus of Example 11, wherein the hardware authentication circuitry is further to generate a per-part unlock key, and provide the per-part unlock key for secure external storage.

Example 13 includes the apparatus of any of Examples 11 to 12, wherein the hardware authentication circuitry is further to generate a per-part unlock key at runtime, compute a tag based on the generated per-part unlock key, and determine whether the unlock token is authentic based on a comparison of the computed tag and an unlock tag from the unlock token.

Example 14 includes the apparatus of Example 13, wherein the hardware authentication circuitry is further to utilize symmetric key cryptography to generate the per-part unlock key at runtime to authenticate against another unlock key generated prior to runtime.

Example 15 includes an electronic device, comprising one or more technical features to be selectively locked and unlocked, memory to store firmware to process one or more tokens including an unlock token for at least one of the one or more technical features, wherein the unlock token includes a message portion and a tag portion, and hardware authentication circuitry to authenticate the message portion of the unlock token based at least in part on the tag portion of the unlock token.

Example 16 includes the electronic device of Example 15, wherein the hardware authentication circuitry is further to generate a per-device unlock key at provision time that is unique to the device, and provide the generated per-device unlock key for secure external storage.

Example 17 includes the electronic device of any of Examples 15 to 16, wherein the hardware authentication circuitry is further to generate a per-device unlock key at runtime based at least in part on the message portion of the unlock token, compute a tag based on the generated per-device unlock key, and determine whether the message portion of the unlock token is authentic based on a comparison of the computed tag and the tag portion of the unlock token.

Example 18 includes the electronic device of Example 17, wherein the hardware authentication circuitry is further to utilize symmetric key cryptography to generate the per-device unlock key at runtime to authenticate against another unlock key generated at provision time that is unique to the device.

Example 19 includes the electronic device of any of Examples 15 to 18, further comprising hardware ungate circuitry to selectively gate and ungate the one or more technical features in response to an indication of a result of the authentication.

Example 20 includes the electronic device of Example 19, wherein the hardware ungate circuitry is further to compare one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and selectively gate and ungate the one or more technical features based on the comparison.

Example 21 includes the electronic device of any of Examples 15 to 20, wherein the firmware is further to selectively enable and disable ungated features of the one or more technical features.

Example 22 includes a method, comprising providing one or more features that are to be selectively locked and unlocked, processing one or more tokens including an unlock token for the one or more features, and authenticating the unlock token in hardware.

Example 23 includes the method of Example 22, further comprising generating a per-part unlock key in hardware, and providing the per-part unlock key for secure external storage.

Example 24 includes the method of any of Examples 22 to 23, further comprising generating a per-part unlock key in hardware at runtime, computing a tag based on the generated per-part unlock key in hardware, and determining whether the unlock token is authentic based on a hardware comparison of the computed tag and an unlock tag from the unlock token.

Example 25 includes the method of Example 24, further comprising utilizing symmetric key cryptography to generate the per-part unlock key at runtime to authenticate against another unlock key generated prior to runtime.

Example 26 includes the method of any of Examples 22 to 25, further comprising selectively gating and ungating the one or more features in hardware in response to an indication of a result of the authentication.

Example 27 includes the method of Example 26, further comprising comparing one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and selectively gating and ungating the one or more features based on the comparison.

Example 28 includes the method of any of Examples 22 to 27, further comprising selectively enabling and disabling one or more ungated features in firmware.

Example 29 includes at least one non-transitory one machine readable medium comprising a plurality of instructions that, in response to being executed on a computing device, cause the computing device to perform one or more aspects of Examples 22 to 28.

Example 30 includes an apparatus, comprising means for providing one or more features that are to be selectively locked and unlocked, means for processing one or more tokens including an unlock token for the one or more features, and means for authenticating the unlock token in hardware.

Example 31 includes the apparatus of Example 30, further comprising means for generating a per-part unlock key in hardware, and means for providing the per-part unlock key for secure external storage.

Example 32 includes the apparatus of any of Examples 30 to 31, further comprising means for generating a per-part unlock key in hardware at runtime, means for computing a tag based on the generated per-part unlock key in hardware, and means for determining whether the unlock token is authentic based on a hardware comparison of the computed tag and an unlock tag from the unlock token.

Example 33 includes the apparatus of Example 32, further comprising means for utilizing symmetric key cryptography to generate the per-part unlock key at runtime to authenticate against another unlock key generated prior to runtime.

Example 34 includes the apparatus of any of Examples 30 to 33, further comprising means for selectively gating and ungating the one or more features in hardware in response to an indication of a result of the authentication.

Example 35 includes the apparatus of Example 34, further comprising means for comparing one or more of a hardware runtime state and one or more fuse settings against one or more ungate rules, and means for selectively gating and ungating the one or more features based on the comparison.

Example 36 includes the apparatus of any of Examples 30 to 35, further comprising means for selectively enabling and disabling one or more ungated features in firmware.

References to “one example,” “an example,” etc., indicate that the example described may include a particular feature, structure, or characteristic, but every example may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same example. Further, when a particular feature, structure, or characteristic is described in connection with an example, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other examples whether or not explicitly described.

Moreover, in the various examples described above, unless specifically noted otherwise, disjunctive language such as the phrase “at least one of A, B, or C” or “A, B, and/or C” is intended to be understood to mean either A, B, or C, or any combination thereof (i.e. A and B, A and C, B and C, and A, B and C).

Some portions of the detailed description herein are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the computing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the discussion herein, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Certain examples also relate to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but is not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs) such as dynamic RAM (DRAM), EPROMS, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description herein. In addition, certain examples are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of such examples as described herein.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the disclosure as set forth in the claims.

HARDWARE-BASED CRYPTOGRAPHIC PROTECTION OF TOKENS

Information

Publication Number

Date Filed

Date Published

Inventors

Original Assignees

CPC

International Classifications

Abstract

Description

Claims