From a media perspective, modern storage systems consist of heterogeneous storage media. For example, a system may include a “hot” tier memory class storage device (e.g., Optane® SSD (solid-state drive), SLC (single-level cell) Flash) to provide high performance and endurance. A “cold” or “capacity” tier may employ a capacity device (e.g., NAND Quad-level cell (QLC, 4 bits per cell) or Penta-level cell (PLC, 5 bits per cell)) to deliver capacity at low cost but with lower performance and endurance.
Historically, platforms such as servers had their own storage resources, such as one or more mass storage devices (e.g., magnetic/optical hard disk drives or SSDs). Under such platforms different classes of storage media could be detected and selective access to the different classes could be managed by an operating system (OS) or applications themselves. In contrast, today's data center environments employ disaggregated storage architectures under which one or more tiers of storage are accessed over a fabric or network. Under these environments it is common for the storage resources to be abstracted as storage volumes. This may also be the case for virtualized platforms where the Type-1 or Type-2 hypervisor or virtualization layer presents physical storage devices as abstract storage resources (e.g., volumes). While abstracting the physical storage devices provides some advantages, it hides the input-output (IO) context on the disaggregated storage side.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified:
Embodiments of methods and apparatus for target triggered IO classification using a computational storage tunnel are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
For clarity, individual components in the Figures herein may also be referred to by their labels in the Figures, rather than by a particular reference number. Additionally, reference numbers referring to a particular type of component (as opposed to a particular component) may be shown with a reference number followed by “(typ)” meaning “typical.” It will be understood that the configuration of these components will be typical of similar components that may exist but are not shown in the drawing Figures for simplicity and clarity or otherwise similar components that are not labeled with separate reference numbers. Conversely, “(typ)” is not to be construed as meaning the component, element, etc. is typically used for its disclosed function, implementation, purpose, etc.
To meet various requirements for performance, quality of service (QoS), media endurance, and reduced solution cost, intelligent data placement (IO classification) between hot and capacity tiers is required. For instance, if the hot portion of a workload can be recognized and classified, the storage service can stage it on the hot tier, which results in higher performance and saves write cycles on the capacity tier.
To perform such IO classification, the host and/or initiator cannot focus on the raw storage domain only (e.g., using an abstracted filesystem with virtual volumes). The observability features must be extended, and the system context must be considered (e.g., filesystem information, application context, operating system telemetry). For example, our research showed that when classifying a database workload (e.g., MongoDB), journal IO should be staged on the hot tier to improve performance and increase endurance (e.g., in MongoDB, each IO that belongs to files in the journal directory).
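As a minimal sketch of such a rule (illustrative only; the function name and the path-based check below are hypothetical and not part of any defined interface), a classifier could mark MongoDB journal IO as hot based on the file path:

    /* Minimal sketch: classify MongoDB journal IO as hot. The function
     * name and the path-based check are hypothetical. */
    #include <string.h>

    enum io_class { IO_CLASS_CAPACITY = 0, IO_CLASS_HOT = 1 };

    static enum io_class classify_mongodb_io(const char *file_path)
    {
        /* Journal writes are small, frequent, and latency-sensitive, so
         * staging them on the hot tier saves capacity-tier write cycles. */
        if (strstr(file_path, "/journal/") != NULL)
            return IO_CLASS_HOT;
        return IO_CLASS_CAPACITY;
    }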
From a deployment point of view, applications and storage services are separated from each other and work in different domains. For example:
Under current approaches, perception of the filesystem/system/application context is obligatory to classify IO efficiently; however, the storage disaggregation barrier (e.g., compute/target separation) makes this impossible because such context is invisible and inaccessible on the target side.
In accordance with aspects of the embodiments disclosed herein, solutions for supporting efficient IO classification are provided. In one aspect, the target storage service notifies the initiator that it can provide an IO classifier program. The initiator downloads the program and loads/runs it on the compute side. Whenever an application's IO is triggered, the IO classification program is executed. Input for the program is the IO itself plus extensions such as the application, operating system, and filesystem contexts. The program produces an IO class that is returned to the initiator's block layer and embedded in the IO (e.g., as an IO hint or stream ID). For notification of the program's availability, a computational storage protocol/tunnel is used. This solution can be perceived as reverted computational storage: the target requests execution of a remote procedure on the compute side, which is then used to direct storage data to the appropriate storage tier.
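The notify/download/load sequence might be sketched in C as follows; all type and function names here are hypothetical placeholders for the computational storage tunnel operations, since the protocol itself is abstracted in this description:

    /* Hypothetical client-side sketch of the reverted computational
     * storage handshake; none of these names denote a defined API. */
    #include <stddef.h>

    struct cs_tunnel;            /* communication channel to the target */
    struct io_classifier;        /* downloaded classifier program image */

    extern struct cs_tunnel *cs_initiate(const char *target_addr);
    extern int cs_classifier_available(struct cs_tunnel *t);
    extern struct io_classifier *cs_download_classifier(struct cs_tunnel *t);
    extern int cs_load_classifier(struct io_classifier *c); /* e.g., BPF load */

    int setup_classification(const char *target_addr, struct io_classifier **out)
    {
        struct cs_tunnel *t = cs_initiate(target_addr);   /* initiate() */
        if (t == NULL || !cs_classifier_available(t))
            return -1;           /* fall back to unclassified IO */
        *out = cs_download_classifier(t);                 /* download program */
        return (*out != NULL) ? cs_load_classifier(*out) : -1;
    }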
The solution can be used in a variety of compute/storage environments on one or more levels. The following discussion illustrates several non-limiting example use contexts.
The teachings and the principles described herein may be implemented using various types of tiered memory/storage architectures. For example,
Compute node 100 is further connected to SCM memory 110 and 112 in SCM memory nodes 114 and 116, which are coupled to compute node 100 via a high-speed, low-latency fabric 118. In the illustrated embodiment, SCM memory 110 is coupled to a CPU 120 in SCM node 114 and SCM memory 112 is coupled to a CPU 122 in SCM node 116.
Under one example, Tier 1 memory comprises DDR and/or HBM, Tier 2 memory comprises 3D crosspoint memory, and Tier 3 comprises pooled SCM memory such as but not limited to 3D crosspoint memory. In some embodiments Tier 3 comprises a cold or capacity tier. In some embodiments, the CPU may provide a memory controller that supports access to Tier 2 memory. In some embodiments, the Tier 2 memory may comprise memory devices employing a DIMM form factor.
For CXL, agent 130 or other logic in MC 132 may be provided with instructions and/or data to perform various operations on IO memory 124. For example, such instructions and/or data could be sent over CXL link 128 using a CXL protocol. For pooled SCM memory or the like, a CPU or other type of processing element (microengine, FPGA, etc.) may be provided on the SCM node and used to perform the various operations disclosed herein. Such a CPU may have a configuration with a processor having an integrated memory controller, or the memory controller may be separate.
Resource disaggregation is becoming increasingly prevalent in emerging computing scenarios such as cloud (aka hyperscaler) usages, where disaggregation provides the means to manage resources effectively and have uniform landscapes for easier management. While storage disaggregation is widely seen in several deployments (for example, Amazon S3), compute and memory disaggregation is also becoming prevalent with hyperscalers like Google Cloud.
In addition to the three configurations shown in
Generally, a compute brick may have dozens or even hundreds of cores, while memory bricks, also referred to herein as pooled memory, may have terabytes (TB) or tens of TB of memory implemented as disaggregated memory. An advantage is the ability to carve out usage-specific portions of memory from a memory brick and assign them to a compute brick (and/or compute resources in the compute brick). The amount of local memory on the compute bricks is relatively small and generally limited to bare functionality for operating system (OS) boot and other such usages.
The target requires IO classification based on application, operating system, and file system context. For this use case simple IO hinting based on raw block device domain is not sufficient. The storage infrastructure introduces separation between server and client so that it is not possible to interpret application-side context.
Prior to the message exchange, initiator 408 creates logical volume 410 on the compute side (402). In one aspect, logical volume 410 is a type of handler that is used by IO classifier program 412, as described below in further detail.
The message flow begins with initiator 408 sending an initiate( ) message 416 to target 414, which receives it and returns a response 418. The initiate( ) message is used to establish a communication channel between client 402 and target 414.
Next, target 414 sends an asynchronous event to the client to request loading the IO classifier program, as depicted by an IO classifier load request 420. It should also be possible for the client to obtain capabilities information to check whether the classifier is available. Client 402 decides to apply the IO classifier and sends a download classifier program( ) request 422 to target 414. The program is downloaded (depicted by return message 424) and loaded into the client's environment, as depicted by operation 426.
As depicted by message flow 427, one or more of an application context, system context, and filesystem context is received by logical volume 410. One or more of these contexts is obtained by the IO classification program using APIs provided by the execution environment (e.g., a BPF program has an API provided by the Linux kernel). Examples of application contexts include the application name and PID (process identifier). Examples of system contexts include the CPU core number on which the IO is issued. Examples of filesystem contexts include the file name/location, file size, file extension, offset in the file, and whether the IO is part of filesystem metadata.
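One possible in-memory representation of these inputs is sketched below; the layout is an assumption for illustration, not a format defined by this description:

    /* Illustrative (assumed) representation of the classifier's inputs. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/types.h>

    struct app_context {
        char  name[64];          /* application name             */
        pid_t pid;               /* process identifier           */
    };

    struct sys_context {
        uint32_t cpu_core;       /* core on which IO was issued  */
    };

    struct fs_context {
        char     path[256];      /* file name/location           */
        uint64_t file_size;
        char     extension[16];
        uint64_t offset;         /* offset within the file       */
        bool     is_metadata;    /* IO is filesystem metadata    */
    };

    struct io_context {
        struct app_context app;
        struct sys_context sys;
        struct fs_context  fs;
    };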
For simplicity, the application context, system context, and filesystem context are shown in
The foregoing prepares the client for implementing the IO classifier program for subsequent IO requests to access storage resources on target 414. It is noted that one or more of the application context, system context, and filesystem context may change while an application is running, such that corresponding information is updated, if applicable, during run-time operations. For example, some of these values may be obtained using telemetry data generated by the operating system or other system software components.
When the application issues an IO request, the IO classifier program is executed. The program returns an IO class based on input delivered by the client's operating system. The program is able to read and recognize the application context, system context, and filesystem context corresponding to the IO request (e.g., by looking at the source of the IO request, which in this example flow is application 406). The returned IO class (hint) is added to the IO protocol request and sent to the target side. There it can be intercepted, and data can be persisted according to the value of the IO hint.
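In sketch form, the per-IO path could look as follows, with hypothetical names for the classifier entry point and the block-layer send routine:

    /* Hypothetical per-IO path: classify, embed the hint, forward. */
    #include <stdint.h>

    struct io_request {
        uint64_t lba;
        uint32_t length;
        void    *data;
        uint32_t io_class;       /* hint filled in by the classifier */
    };

    extern uint32_t classifier_run(const struct io_request *req,
                                   const void *app_ctx,
                                   const void *sys_ctx,
                                   const void *fs_ctx);
    extern int send_to_target(struct io_request *req);

    int submit_classified_io(struct io_request *req, const void *app_ctx,
                             const void *sys_ctx, const void *fs_ctx)
    {
        req->io_class = classifier_run(req, app_ctx, sys_ctx, fs_ctx);
        return send_to_target(req);   /* hint travels with the IO */
    }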
The foregoing is depicted in
Next, logical volume 410 sends an IO request 434 to target 414 including the LBA, length, and data of the original IO request 428, plus the IO class (hint) returned by the IO classifier program. Target 414 then uses the IO class hint to determine on what tier to store the data. Upon success, target 414 returns a completion status in a message 436 to logical volume 410, which forwards the completion status via a message 438 to application 406.
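On the target side, the dispatch implied by this flow can be sketched as follows (the tier names and the persist routine are hypothetical):

    /* Hypothetical target-side dispatch on the IO class hint. */
    #include <stdint.h>

    enum tier { TIER_HOT, TIER_CAPACITY };

    extern int persist(enum tier t, uint64_t lba, uint32_t len,
                       const void *data);

    int handle_classified_write(uint32_t io_class, uint64_t lba,
                                uint32_t len, const void *data)
    {
        /* Class 1 = hot in this sketch; any other class goes to capacity. */
        enum tier t = (io_class == 1) ? TIER_HOT : TIER_CAPACITY;
        return persist(t, lba, len, data);
    }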
In
As shown in
Memory controller 510 includes three memory channels 518, each connected to a respective DRAM or SDRAM DIMM 520, 522, and 524. CXL controller 512 includes two CXL interfaces 526 connected to respective CXL memory devices 528 and 530 via respective CXL flex-busses 532 and 534. CXL memory devices 528 and 530 include DIMMs 536 and 538, which may comprise CXL DIMMs or may be implemented on respective CXL cards and comprising any of the memory technologies described above.
IO interface 514 is coupled to a host fabric interface (HFI) 540, which in turn is coupled to a fabric switch 542 via a fabric link in a low-latency fabric 544. Also coupled to fabric switch 542 are server 2 . . . server n and an SCM node 546. SCM node 546 includes an HFI 548, a plurality of SCM DIMMs 550, and a CPU 552. Generally, SCM DIMMs may comprise NVDIMMs or may comprise a combination of DRAM DIMMs and NVDIMMs. In one embodiment, SCM DIMMs comprise 3D crosspoint DIMMs.
IO interface 516 is coupled to a NIC 518 that is coupled to a remote memory server 554 via a network/fabric 556. Generally, remote memory server 554 may employ one or more types of storage devices. For example, the storage devices may comprise high performance storage implemented as a hot tier and lower performance high-capacity storage implemented as a cold or capacity tier. In some embodiments, remote memory server 554 is operated as a remote memory pool employing a single tier of storage, such as SCM.
As further shown, DRAM/SDRAM DIMMs 520, 522, and 524 are implemented in memory tier 1 (also referred to herein as local memory or near memory), while CXL devices 528 and 530 are implemented in memory/storage tier 2. Meanwhile, SCM node 546 is implemented in memory/storage tier 3, and memory in remote memory server 554 is implemented in memory/storage tier 4 or memory/storage tiers 4 and 5. In this example, the memory tiers are ordered by their respective latencies, wherein tier 1 has the lowest latency and tier 4 (or tier 5) has the highest latency.
It will be understood that not all of cloud environment 500 may be implemented, and that one or more of memory/storage tiers 2, 3, and 4 (or 4 and 5) will be used. In other words, a cloud environment may employ one local or near memory tier, and one or more memory/storage tiers.
The memory resources of an SCM node may be allocated to different servers 501 and/or operating system instances running on servers 501. Moreover, a memory node may comprise a chassis, drawer, or sled including multiple SCM cards on which SCM DIMMs are installed.
In some embodiments, SoC 602a is a multi-core processor System on a Chip with one or more integrated memory controllers, such as depicted by a memory controller 630. SoC 602a also includes a memory management unit (MMU) 632 and an IO interface (I/F) 634 coupled to NIC 610. In one embodiment, IO interface 634 comprises a Peripheral Component Interconnect Express (PCIe) interface.
Generally, DRAM devices 614-1 . . . 614-n are representative of any type of DRAM device, such as DRAM DIMMs and Synchronous DRAM (SDRAM) DIMMs. More generally, DRAM devices 614-1 . . . 614-n are representative of volatile memory, comprising local (system) memory 615.
A volatile memory is memory whose state (and therefore the data stored in it) is indeterminate if power is interrupted to the device. Dynamic volatile memory requires refreshing the data stored in the device to maintain state. One example of dynamic volatile memory includes DRAM, or some variant such as SDRAM. A memory subsystem as described herein can be compatible with a number of memory technologies, such as DDR3 (Double Data Rate version 3, original release by JEDEC (Joint Electronic Device Engineering Council) on Jun. 27, 2007), DDR4 (DDR version 4, initial specification published in September 2012 by JEDEC), DDR4E (DDR version 4), LPDDR3 (Low Power DDR version 3, JESD209-3B, August 2013 by JEDEC), LPDDR4 (LPDDR version 4, JESD209-4, originally published by JEDEC in August 2014), WIO2 (Wide Input/Output version 2, JESD229-2, originally published by JEDEC in August 2014), HBM (High Bandwidth Memory, JESD235, originally published by JEDEC in October 2013), DDR5 (DDR version 5, currently in discussion by JEDEC), LPDDR5 (currently in discussion by JEDEC), HBM2 (HBM version 2, currently in discussion by JEDEC), or others or combinations of memory technologies, and technologies based on derivatives or extensions of such specifications. The JEDEC standards are available at www.jedec.org.
Software storage device 612 comprises a nonvolatile storage device, which can be or include any conventional medium for storing data in a nonvolatile manner, such as one or more magnetic, solid state, or optical based disks, or a combination. Software storage device 612 holds code or instructions and data in a persistent state (i.e., the value is retained despite interruption of power to compute platform 600a). A nonvolatile storage device can be generically considered to be a “memory,” although local memory 615 is usually the executing or operating memory to provide instructions to the cores on SoC 602a.
Firmware storage device 611 comprises a nonvolatile memory (NVM) device. A non-volatile memory (NVM) device is a memory whose state is determinate even if power is interrupted to the device. In one embodiment, the NVM device can comprise a block addressable memory device, such as NAND technologies, or more specifically, multi-threshold level NAND flash memory (for example, Single-Level Cell (“SLC”), Multi-Level Cell (“MLC”), Quad-Level Cell (“QLC”), Tri-Level Cell (“TLC”), Penta-Level Cell (“PLC”), or some other NAND). An NVM device can also comprise a byte-addressable write-in-place three dimensional cross point memory device, or other byte addressable write-in-place NVM device (also referred to as persistent memory), such as single or multi-level Phase Change Memory (PCM) or phase change memory with a switch (PCMS), NVM devices that use chalcogenide phase change material (for example, chalcogenide glass), resistive memory including metal oxide base, oxygen vacancy base and Conductive Bridge Random Access Memory (CB-RAM), nanowire memory, ferroelectric random access memory (FeRAM, FRAM), magnetoresistive random access memory (MRAM) that incorporates memristor technology, spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a DW (Domain Wall) and SOT (Spin Orbit Transfer) based device, a thyristor based memory device, or a combination of any of the above, or other memory.
Software components in software storage device 612 are loaded into local memory 615 to be executed on one or more cores 626 on SoC 602a. The software components include an operating system 636 having a kernel 638 and applications 640. The address space of local memory 615 is partitioned into an OS/kernel space in which operating system 636 and kernel 638 are stored, and a user space in which applications 640 are stored.
The address space allocated to applications (and their processes) is a virtual address space that may be extended across multiple memory tiers, including a memory tier in remote memory pool 606. The cloud service provider (CSP) or the like may allocate portions of the memory in remote memory pool 606 to different platforms (and/or their operating systems instances).
The labeling of CXL interface or controller 658 and CXL/MC interface 652 is representative of two different configurations. In one embodiment, CXL interface or controller 658 is a CXL interface and CXL/MC interface 652 is a CXL interface with a memory controller. Alternatively, the memory controller may be coupled to the CXL interface. In another embodiment, CXL interface or controller 658 comprises a CXL controller in which the memory controller functionality is implemented, and CXL/MC interface 652 comprises a CXL interface. It is noted that memory channels 656 may represent a shared memory channel implemented as a bus to which DIMMs 654 are coupled.
Generally, DIMMs 654 may comprise DRAM DIMMs or hybrid DIMMs (e.g., 3D crosspoint DIMMs). In some embodiments, a CXL card may include a combination of DRAM DIMMs and hybrid DIMMs. In yet another alternative, all or a portion of DIMMs 654 may comprise NVDIMMs.
As further shown in
Under some embodiments, the storage disaggregation barrier comprises a virtualization layer in a virtualized platform. Non-limiting examples of virtualized platforms are shown in
In some deployments, a bare metal abstraction layer 708 comprises a Type-1 Hypervisor. Type-1 Hypervisors run directly on platform hardware and host guest operating systems running on VMs, with or without an intervening host OS (with shown in
Bare metal cloud platform architecture 700a also includes three storage tiers 716, 718, and 722, also respectively labeled Storage Tier 2, Storage Tier 3, and Storage Tier 4. Storage tier 716 is a local storage tier that is part of platform hardware 702, such as a CXL card or CXL DIMM, an NVDIMM, or a 3D crosspoint DIMM. Other form factors may be used, such as M.2 memory cards or SSDs. Storage tier 718 is coupled to platform hardware 702 over a fabric 720, while storage tier 722 is coupled to platform hardware 702 over a network 724. In some embodiments, only one of storage tiers 718 and 722 may be employed. In one embodiment, storage tier 718 employs SCM storage. In one embodiment, storage tier 722 (Storage Tier 4) is implemented with a storage server that may have one or more tiers of storage.
As further shown toward the top portion of
In one embodiment, a deployment employing a Linux operating system can be based on eBPF functionality and the NVMeOF (Non-volatile Memory Express over Fabric) protocol. eBPF (https://ebpf.io) is a mechanism for Linux applications to execute code in Linux kernel space.
With reference to Flowchart 800 in
This flow begins in a block 810 in which an application issues an IO storage request (with LBA, length, and data). Whenever an application issues an IO request, the eBPF IO classification program is executed and returns an IO class (e.g., a numeric value), as depicted in a block 812. In a block 814, this IO class value is encapsulated in an NVMe IO command using the stream ID field, in one embodiment. In a block 816, the target receives the NVMe IO command and extracts the classified IO value by inspecting the stream ID field in the received NVMe IO command. In a block 818, the target then uses the IO class to determine what storage tier to use to store the data.
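As a sketch of the encapsulation in block 814, the IO class can be carried in the stream ID of an NVMe write command. Under the NVMe Streams directive, the directive type occupies bits 23:20 of command dword 12 and the directive-specific (stream ID) value occupies bits 31:16 of command dword 13; the struct below is a reduced, illustrative layout rather than the full command format:

    /* Reduced sketch of carrying an IO class as an NVMe stream ID. */
    #include <stdint.h>

    #define NVME_DTYPE_STREAMS 0x1

    struct nvme_write_cmd {      /* illustrative subset of the command */
        uint64_t slba;           /* starting LBA                       */
        uint32_t cdw12;          /* NLB plus DTYPE in bits 23:20       */
        uint32_t cdw13;          /* DSPEC (stream ID) in bits 31:16    */
    };

    static void set_io_class(struct nvme_write_cmd *cmd, uint16_t io_class)
    {
        cmd->cdw12 |= (uint32_t)NVME_DTYPE_STREAMS << 20; /* streams directive */
        cmd->cdw13 |= (uint32_t)io_class << 16;           /* class as stream ID */
    }

    static uint16_t get_io_class(const struct nvme_write_cmd *cmd)
    {
        return (uint16_t)(cmd->cdw13 >> 16);              /* target-side extract */
    }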
The classifier can be easily exchanged. For example, the target can first perform preliminary recognition of the client environment (e.g., looking for a specific application) and then request reloading of a new program specialized for the client's environment. The client's operating system does not have to be modified, patched, updated, or restarted. In one embodiment, the target can at any time resend the asynchronous event to reload the IO classification program or to load a new IO classification program.
Some of the foregoing embodiments may be perceived as reverted computational storage. For example, under an extension, the target requests execution of a remote procedure on the compute side. The scope of the procedure does not have to be limited to IO classification; the target can schedule other procedures. For example, in one embodiment the target recognizes the client's capabilities and discovers an accelerator for compression; it then loads a program that compresses IO data before sending it over the network, reducing network load. In another embodiment, the target recognizes read workload locality and loads a program that provides read cache functionality.
While various embodiments described herein use the term System-on-a-Chip or System-on-Chip (“SoC”) to describe a device or system having a processor and associated circuitry (e.g., Input/Output (“I/O”) circuitry, power delivery circuitry, memory circuitry, etc.) integrated monolithically into a single Integrated Circuit (“IC”) die, or chip, the present disclosure is not limited in that respect. For example, in various embodiments of the present disclosure, a device or system can have one or more processors (e.g., one or more processor cores) and associated circuitry (e.g., Input/Output (“I/O”) circuitry, power delivery circuitry, etc.) arranged in a disaggregated collection of discrete dies, tiles and/or chiplets (e.g., one or more discrete processor core die arranged adjacent to one or more other die such as memory die, I/O die, etc.). In such disaggregated devices and systems the various dies, tiles and/or chiplets can be physically and electrically coupled together by a package structure including, for example, various packaging substrates, interposers, active interposers, photonic interposers, interconnect bridges and the like. The disaggregated collection of discrete dies, tiles, and/or chiplets can also be part of a System-on-Package (“SoP”).
Although some embodiments have been described in reference to particular implementations, other implementations are possible according to some embodiments. Additionally, the arrangement and/or order of elements or other features illustrated in the drawings and/or described herein need not be arranged in the particular way illustrated and described. Many other arrangements are possible according to some embodiments.
In each system shown in a figure, the elements in some cases may each have a same reference number or a different reference number to suggest that the elements represented could be different and/or similar. However, an element may be flexible enough to have different implementations and work with some or all of the systems shown or described herein. The various elements shown in the figures may be the same or different. Which one is referred to as a first element and which is called a second element is arbitrary.
In the description and claims, the terms “coupled” and “connected,” along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Rather, in particular embodiments, “connected” may be used to indicate that two or more elements are in direct physical or electrical contact with each other. “Coupled” may mean that two or more elements are in direct physical or electrical contact. However, “coupled” may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. Additionally, “communicatively coupled” means that two or more elements that may or may not be in direct contact with each other, are enabled to communicate with each other. For example, if component A is connected to component B, which in turn is connected to component C, component A may be communicatively coupled to component C using component B as an intermediary component.
An embodiment is an implementation or example of the inventions. Reference in the specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions. The various appearances “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments.
Not all components, features, structures, characteristics, etc. described and illustrated herein need be included in a particular embodiment or embodiments. If the specification states a component, feature, structure, or characteristic “may”, “might”, “can” or “could” be included, for example, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.
As discussed above, various aspects of the embodiments herein may be facilitated by corresponding software and/or firmware components and applications, such as software and/or firmware executed by an embedded processor or the like. Thus, embodiments of this invention may be used as or to support a software program, software modules, firmware, and/or distributed software executed upon some form of processor, processing core or embedded logic, a virtual machine running on a processor or core, or otherwise implemented or realized upon or within a non-transitory computer-readable or machine-readable storage medium. A non-transitory computer-readable or machine-readable storage medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a non-transitory computer-readable or machine-readable storage medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a computer or computing machine (e.g., computing device, electronic system, etc.), such as recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, etc.). The content may be directly executable (“object” or “executable” form), source code, or difference code (“delta” or “patch” code). A non-transitory computer-readable or machine-readable storage medium may also include a storage or database from which content can be downloaded. The non-transitory computer-readable or machine-readable storage medium may also include a device or product having content stored thereon at a time of sale or delivery. Thus, delivering a device with stored content, or offering content for download over a communication medium may be understood as providing an article of manufacture comprising a non-transitory computer-readable or machine-readable storage medium with such content described herein.
Various components referred to above as processes, servers, or tools described herein may be a means for performing the functions described. The operations and functions performed by various components described herein may be implemented by software running on a processing element, via embedded hardware or the like, or any combination of hardware and software. Such components may be implemented as software modules, hardware modules, special-purpose hardware (e.g., application specific hardware, ASICs, DSPs, etc.), embedded controllers, hardwired circuitry, hardware logic, etc. Software content (e.g., data, instructions, configuration information, etc.) may be provided via an article of manufacture including non-transitory computer-readable or machine-readable storage medium, which provides content that represents instructions that can be executed. The content may result in a computer performing various functions/operations described herein.
As used herein, a list of items joined by the term “at least one of” can mean any combination of the listed terms. For example, the phrase “at least one of A, B or C” can mean A; B; C; A and B; A and C; B and C; or A, B and C.
The above description of illustrated embodiments of the invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention, as those skilled in the relevant art will recognize.
These modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the drawings. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.