The field relates generally to information processing systems, and more particularly to artificial intelligence (AI) model management implemented in an information processing system.
Edge computing, considered the evolution of cloud computing, migrates the deployment of applications (e.g., applications implementing AI models) from a centralized data center down to distributed edge nodes, thereby bringing the applications closer to the data generated by consumers. Edge computing is also considered an important technology for meeting 3GPP 5G key performance indicators (especially in terms of minimized delays and increased bandwidth efficiency). The 3GPP 5G system specification allows a multi-access edge computing (MEC) system and a 5G system to cooperate in operations related to traffic direction and policy controls. The MEC system is a European Telecommunications Standards Institute (ETSI) defined architecture that offers application developers and content providers cloud-computing capabilities and an information technology service environment at the edge of a network, e.g., at the edge of a cellular network such as a 5G system. In a system architecture where a 5G system and a MEC system are deployed in an integrated manner, a data plane of a 5G core network can be implemented by a user plane function network element inside the MEC system. However, due to the mobility of system users from one edge node to another, MEC implementation can present challenges.
For example, user context (i.e., information representing one or more internal execution states of an application) migration is a basic requirement defined in a MEC system for applications running in an edge computing environment. Such migration is needed to implement an application mobility service (AMS) so that the MEC architecture can migrate the application from one edge node to another edge node to follow the geographic position of the user equipment and thereby perform computations closer to the data source. However, when an application is complex, for example, one that employs an AI model (such as, but not limited to, machine learning (ML) applications, deep learning (DL) applications, and data mining (DM) applications), user context migration is a significant challenge.
Embodiments provide techniques for user context migration of an application in an information processing system such as, but not limited to, user context migration of an artificial intelligence-based application in an edge computing environment.
According to one illustrative embodiment, in an information processing system with at least a first node and a second node separated from the first node, and each of the first node and the second node being configured to execute an application in accordance with at least one entity that moves from a proximity of the first node to a proximity of the second node, a method maintains, as part of a context at the first node, a set of status indicators for a set of computations associated with a computation graph representing at least a portion of the execution of the application at the first node. Further, the method causes the transfer of the context from the first node to the second node to enable the second node to continue execution of the application using the transferred context from the first node.
In further illustrative embodiments, the maintaining step may further comprise setting each of the set of status indicators for the set of computations to one of a plurality of statuses based on an execution state of a given one of the computations, wherein a first status of the plurality of statuses represents that the given computation is completed, a second status of the plurality of statuses represents that the given computation has started but not yet completed, and a third status of the plurality of statuses represents that the given computation has not yet started.
Advantageously, in illustrative MEC-based embodiments, a context migration solution is provided that can be integrated into any deep learning framework, to run any AI model, with any processing parallelism, for both inference and training applications.
These and other features and advantages of embodiments described herein will become more apparent from the accompanying drawings and the following detailed description.
Illustrative embodiments will now be described herein in detail with reference to the accompanying drawings. Although the drawings and accompanying descriptions illustrate some embodiments, it is to be appreciated that alternative embodiments are not to be construed as limited by the embodiments illustrated herein. Furthermore, as used herein, the term “includes” and its variants are to be read as open-ended terms that mean “includes, but is not limited to.” The term “based on” is to be read as “based at least in part on.” The terms “an embodiment” and “the embodiment” are to be read as “at least one example embodiment.” The terms “first,” “second,” and the like may refer to different or the same objects. Other definitions, either explicit or implicit, may be included below.
The growth of artificial intelligence (AI) models, such as those used in machine learning (ML) applications, deep learning (DL) applications, and/or data mining (DM) applications, has resulted in a single computing device being unable to execute an entire AI model independently. It is to be understood that AI models typically have two stages: training and inference. Training refers to the process of creating the AI model based on training data, while inference refers to the process of using the AI model (trained in the training process) to generate a prediction (decision) based on input data. The concept of parallelism, e.g., model parallelism, data parallelism, or pipeline parallelism, is employed to execute a large, complicated AI model. Data parallelism is where each computing device in the computing environment has a complete copy of the AI model and processes a subset of the training data. For model parallelism, the AI model is split (partitioned) among computing devices such that each computing device works on a part of the AI model. Pipeline parallelism is, for example, where the AI model and/or data is concurrently processed across a set of multiple computing cores (central processing units (CPUs), graphics processing units (GPUs), combinations thereof, etc.) within one or more computing devices.
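By way of a rough, framework-agnostic sketch (the helper names and the "device" abstraction below are invented for illustration), the following Python pseudimplementation contrasts what each computing device holds and computes under data parallelism versus model parallelism:

```python
# Minimal sketch (framework-agnostic) contrasting data and model parallelism.
# "devices" is just a count/placeholder here; in practice these would be GPUs/CPUs.

def data_parallel_step(model, batch, devices):
    # Each device holds a full copy of the model and processes a shard of the batch.
    shards = [batch[i::len(devices)] for i in range(len(devices))]
    partial_outputs = [model(shard) for shard in shards]   # conceptually, one call per device
    return [y for part in partial_outputs for y in part]   # gather the per-shard results

def model_parallel_step(layer_partitions, batch):
    # The model is split into partitions; each partition lives on one device and the
    # output (activations) of one partition feeds the next partition in sequence.
    activations = batch
    for partition in layer_partitions:                      # one partition per device
        activations = partition(activations)
    return activations
```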
By way of further example, in the context of model parallelism approaches, artificial (dummy) compiler techniques have been proposed for collecting resource requirements of each computing device, as well as model parallelism partition techniques based on an intermediate representation (IR) that divide the entire model into partitions which can then be computed in parallel by multiple computing devices which also exchange parameters between one another. Further, techniques have been proposed for scheduling the partitions into computing devices in a load-balanced manner based on resource requirements of the computation and other resources available on the devices. For example, techniques have been proposed for scheduling partitions for execution and balancing the computing and memory storage loads based on the resources available on the computing devices. Some of these proposed techniques are implementable for training of large models in GPUs distributed in multiple computing nodes in a cloud computing environment.
Furthermore, techniques have been proposed to provide a framework for implementing AI parallelism in an edge computing environment. As mentioned above, edge computing is a distributed computing paradigm and typically comprises one or more edge servers running one or more application programs that interact with a plurality of heterogeneous computing devices (e.g., X86_64/ARM CPUs (central processing units), FPGAs (field programmable gate arrays), ASICs (application specific integrated circuits), programmable switches, etc.) which are normally computing resource-limited (e.g., limited in terms of processing and/or storage capacities).
In addition, edge computing is an emerging technology developing together with emerging 5G (3GPP 5th Generation) telecommunication network technology (MEC system) and equipped with many deep learning inference applications for autonomous driving, mobile mixed reality, drone piloting, smart homes, Internet of Things (IoT) and virtual reality (VR) games, to name a few. Such applications typically need real-time responses or computing offload from servers, which cannot be adequately fulfilled by current cloud computing infrastructure. Thus, the emergence of edge computing is in response to the inability of centralized data centers to provide real-time or near-real-time compute capabilities to the vast (and growing) sources of decentralized data (so-called data “out in the wild”). Edge computing moves the computing workload closer to the consumer/data generator to reduce latency, bandwidth usage, and overhead for the centralized data center and intermediate switches, gateways, and servers.
Furthermore, it is realized that a deep learning program can be developed by different frameworks to run different AI models, as well as use different parallelisms such as the above-mentioned data parallelism, model parallelism, and pipeline parallelism, wherein each will manage the computations differently. Also, an AI model usually has many computations and therefore a very complex user (application internal) context, especially when accelerators (e.g., GPUs) are used in the computing environment.
Hence, it is realized that, although managing the user context migration for an inference application (i.e., an AI model in the inference stage) is critical and meaningful, an efficient implementation is very difficult to achieve in a real-time manner. By way of one example scenario to illustrate such real-time difficulty, assume a MEC system comprises an autonomous vehicle (auto-driving) system that employs an inference application running periodically on an edge node of an edge computing environment. The edge node serves multiple vehicles and each vehicle sends input data to the inference application. However, as vehicles move geographically closer to other edge nodes in the edge computing environment, it becomes necessary to migrate user context (i.e., information representing one or more internal execution states of an application) from one edge node to at least another edge node that is geographically closer to the vehicles. Existing systems are unable to efficiently handle this user context migration requirement.
Illustrative embodiments overcome the above and other drawbacks by providing solutions to efficiently migrate the user context of an application in an edge computing environment. Such solutions can be readily integrated into any frameworks to run any models with any types of parallelisms, not only for the inference stage but also for the training stage, based on the computation graph defined by an AI model. One or more embodiments can be integrated into commercially-available AI bundles (e.g., server, storage, networking platforms available from Dell Technologies Inc. of Hopkinton, Mass.), or applied to any private or public edge computing platform.
Hence, an application mobility service (AMS) is provided by the MEC system to optimize the migration process and help the applications to migrate the application instance and internal user context, as shown in the high-level information flow 200 in FIG. 2.
As shown in
As explained in the above-referenced ETSI standard, the MEC system is able to detect that a UE is going to roam away from the current RAN and to predict the destination RAN this UE will roam into by listening to the notifications sent from the 5G network. Hence, the MEC system is able to send appropriate notifications (1 to 6 in FIG. 2).
From
Runtime environments for provider-specific deep learning frameworks, for example, TensorFlow, PyTorch, or Keras, have a similar workflow, which is illustrated in FIG. 3.
More particularly, in one example, based on the vertexes in the computation graph 306, the framework compiler back-end 308 generates the implementations for all computation nodes (vertexes) by linking to third-party libraries such as cuDNN (Deep Neural Network) and cuBLAS (Basic Linear Algebra) for Nvidia GPU, Eigen library or BLAS for TensorFlow CPU, device drivers for proprietary accelerators such as TPU (Tensor Processing Unit), VTA (Versatile Tensor Accelerator) or ASICs, or directly generating the C function code for CPU or CUDA (Compute Unified Device Architecture) kernel functions. This implementation is JITed (Just-In-Time compiled) into binaries (i.e., binary representations of the vertexes of the computation graph) to be linked during the execution of the deep learning program. In a framework such as TVM (Tensor Virtual Machine), such computations can be compiled into a dynamically linked library to be deployed into computing devices in other computing nodes, with the computing devices being the same as the target when compiling the back-end binaries, i.e., cross-compilation. Based on the edges in the computation graph 306, the framework compiler back-end 308 generates scheduler code for the main CPU to schedule all kernel computations in order.
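For intuition only, the following sketch shows one way scheduler logic can order kernel computations from the edges of a computation graph (a topological sort); it is not the code emitted by any particular framework compiler back-end.

```python
from collections import defaultdict, deque

def schedule_computations(edges, num_vertexes):
    """Return one valid execution order of the computation-graph vertexes.

    edges: list of (src, dst) pairs meaning dst consumes the output of src.
    This mirrors, at a high level, what generated scheduler code must guarantee:
    every computation runs only after all of its inputs are available.
    """
    indegree = [0] * num_vertexes
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(v for v in range(num_vertexes) if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)               # in real scheduler code, launch kernel v here
        for nxt in successors[v]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order

# e.g., A=0 feeds B=1 and C=2, which both feed D=3:
# schedule_computations([(0, 1), (0, 2), (1, 3), (2, 3)], 4) -> [0, 1, 2, 3]
```

Real generated scheduler code additionally handles device placement and stream synchronization; the essential property is the dependency order shown above.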
From
Referring back to
An edge inference application in a 5G network may serve one user equipment (UE) or a plurality of UEs at the same time, and such an application may have one or multiple process instances, hosted in a single or multiple edge nodes.
For example, in scenario 500 of
Each data frame is an independent input to the inference application. For example, T1 and T2 from UE1 are independent of each other, and T1 from UE1 is independent of T1 sent from UE2. As shown, there are many inference instances running in parallel for different inputs.
For example, the same inference application manages one feed-forward iteration of all computations for input T1 from UE1 and another iteration for input T1 from UE2, so there are two inference instances for these two inputs running simultaneously in the same inference application, but each inference instance is independent of the other.
Given the illustrative
Still further, even with the same framework, the same model, and the same parallelism, an application scenario can use the model for training or inference. Differences between the training and the inference are as follows. For training, there is another associated computation graph used for back-propagation. Thus, for training, both inputs to the model (and hence the input to each layer operation) and the parameters inside the model will be changed from epoch to epoch, hence both need to be migrated during the user context migration. For inference, only the input to the model (and hence the input to each layer operation) will be changed from input instance to instance, hence only the input needs to be migrated during the user context migration.
As described above, as each inference instance for different inputs is independent of each other, there is an independent user context for each running instance for each input. Thus, during user context migration, these different states for different input instances need to be migrated independently.
Also, as described above, due to the restrictions of network bandwidth and the application real-time response, although managing the user context migration for a deep learning application is critical and meaningful, efficient implementation is very difficult especially in real-time applications such as an auto-driving system.
Illustrative embodiments overcome the above and other drawbacks with user context migration by fixing (e.g., setting, selecting, establishing, prescribing, and the like) a computation model to be used to generate an order for executing computations in response to determining the input model from a first plurality of selectable input models and the AI (e.g., deep learning) framework from a second plurality of selectable AI frameworks.
More particularly,
There are many suitable ways to obtain the computation graph from the selected deep learning framework (e.g., 620-1 as illustrated). By way of example only, the computation graph can be reconstructed from an intermediate representation (IR).
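By way of a toy example (the IR format shown is invented for illustration; real IRs, such as a framework's own IR or an ONNX protobuf, carry much more detail), the edge information needed to rebuild the computation graph can be recovered from a node/input listing along these lines:

```python
def graph_from_ir(ir_nodes):
    """Rebuild a computation graph (adjacency lists) from a toy IR listing.

    ir_nodes: list of dicts like {"name": "D", "op": "add", "inputs": ["B", "C"]}.
    The edge information needed for the graph is recovered from each node's inputs.
    """
    edges = {node["name"]: [] for node in ir_nodes}
    for node in ir_nodes:
        for producer in node["inputs"]:
            edges.setdefault(producer, []).append(node["name"])
    return edges

ir = [
    {"name": "B", "op": "matmul", "inputs": ["A"]},
    {"name": "C", "op": "matmul", "inputs": ["A"]},
    {"name": "D", "op": "add",    "inputs": ["B", "C"]},
]
# graph_from_ir(ir) -> {"B": ["D"], "C": ["D"], "D": [], "A": ["B", "C"]}
```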
Once the computation graph is fixed, different types of parallelisms can be applied to schedule the computations.
For implementation optimization, it is not necessary to use a separate computation graph or computation scheduling scheme instance for each input; rather, all (or at least multiple) instances can share the same computation graph and scheduling scheme instance, with a different set of flags on each instance. Advantageously, the runtime state for each input instance (e.g., a mini-batch for training or an input instance for inference) is defined by the flags (FINISHED, ONGOING, and NEXT) set for the computation graph and the computation scheduling scheme instance.
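As one possible data-structure sketch (the class and variable names are illustrative and not taken from any framework), all input instances can share a single computation graph and scheduling scheme while each instance keeps only its own per-computation flags:

```python
from enum import Enum

class Status(Enum):
    FINISHED = 0   # computation completed for this input instance
    ONGOING = 1    # computation started but not yet completed
    NEXT = 2       # computation not yet started

class InferenceInstance:
    """Per-input runtime state layered on top of a shared computation graph."""
    def __init__(self, instance_id, shared_schedule):
        self.instance_id = instance_id
        self.schedule = shared_schedule                        # shared, read-only
        self.flags = {c: Status.NEXT for c in shared_schedule} # per-instance only

    def start(self, computation):
        self.flags[computation] = Status.ONGOING

    def finish(self, computation):
        self.flags[computation] = Status.FINISHED

# All instances reference the same schedule; only `flags` differs per input.
shared_schedule = ["A", "B", "C", "D", "E"]
t1_ue1 = InferenceInstance("UE1-T1", shared_schedule)
t1_ue2 = InferenceInstance("UE2-T1", shared_schedule)
```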
In accordance with illustrative embodiments, migration points are defined (i.e., as migration definitions or rules) as follows:
The rationale for point (i), under which a running (ONGOING) computation is not migrated while it is executing, is that migrating the user context of such a computation is very inefficient and time-consuming, especially if it is executed in an accelerator (e.g., GPU), as doing so would require migrating all main CPU machine states, the current registers, and the function stack, and sometimes copying the parameters from the accelerator to the main CPU memory. In addition, sometimes it is not possible to resume the computation; for example, if a computation is executed inside a GPU, there is no way to resume the unfinished computation on another GPU.
After inference instance T2 from UE1 is migrated, there is no inference instance associated with UE1, so the UE1 can be migrated from the source edge node to the target edge node. It is to be understood that while migrating user context from a source edge node to a target edge node means transferring data from the source edge node to the target edge node, migrating the UE from the source edge node to the target edge node means that the UE is moving its association (e.g., communication session, security context, etc.) from the source edge node to the target edge node. One or more appropriate protocols for moving a UE association from one node to another can be employed.
A similar user context migration scenario occurs for instances T1 and T2 from UE2. Instance T1 from UE2 migrates from a source edge node to a target edge node as denoted by 1112 and 1114. Instance T2 from UE2 migrates from the source edge node to the target edge node as denoted by 1116 and 1118. After instances T1 and T2 from UE2 are migrated, the UE2 is migrated from the source edge node to the target edge node.
As shown, in step 1210, ISP component 1202 sends notification of the subject UE location change to source scheduler 1204 and target scheduler 1206. In step 1212, source scheduler 1204 obtains the device identifier (ID) of the subject UE. Target scheduler 1206 does the same in step 1214 and adds this UE to its current scheduling operations.
For each device ID that is being managed by the source edge node, source scheduler 1204 finds the UE in current structures in step 1216. Source scheduler 1204 then determines the target scheduler for this UE in step 1218. In step 1220, a communication connection is established between the respective schedulers 1204 and 1206 of the source edge node and the target edge node. In step 1222, source scheduler 1204 determines all tasks (computations) of this UE, and for each task, sets the appropriate value for its computation-status (migration) flag in step 1224.
For implementation optimization, if a certain computation would take too long to reach the FINISHED state to satisfy the real-time migration demand, the ONGOING computation can be stopped and set as a NEXT computation so that it is migrated to the target edge node and restarted there.
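A hedged sketch of the source-side handling described in steps 1216-1224 and the optimization above is shown below; the scheduler and task methods are hypothetical, the real-time budget is a placeholder, and Status is the FINISHED/ONGOING/NEXT enumeration sketched earlier.

```python
def prepare_ue_migration(source_scheduler, target_scheduler, ue_id,
                         max_wait_seconds=0.05):
    """Mark the migration point for every task (computation) of a moving UE.

    ONGOING computations are normally allowed to reach FINISHED; if waiting would
    break the real-time budget, they are stopped and reset to NEXT so that they
    can be restarted on the target edge node.
    """
    # Status is the FINISHED/ONGOING/NEXT enum from the earlier sketch.
    connection = source_scheduler.connect_to(target_scheduler)   # analogous to step 1220
    for task in source_scheduler.tasks_for(ue_id):               # analogous to step 1222
        if task.status is Status.ONGOING:
            if task.estimated_remaining_time() > max_wait_seconds:
                task.stop()
                task.status = Status.NEXT        # restart on the target instead
            else:
                task.wait_until_finished()
                task.status = Status.FINISHED
        # FINISHED and NEXT tasks keep their flags; the flags themselves
        # (a few bits per computation) are what gets transferred.
    return connection
```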
It is to be appreciated that, to this point, it is assumed that the computations in an inference instance that will be migrated to the target are known. As such, the next step is to find the parameters associated with the computations to be migrated.
From a deep learning network associated with an AI model, each layer can be expressed mathematically as:
O_{l+1} = σ(W_{l+1} × O_l + b_{l+1})   (Eq. 1)

where O_{l+1} and O_l are the outputs of layer l+1 and layer l, respectively, σ is the activation function, and W_{l+1} and b_{l+1} are the parameters of layer l+1. From Eq. 1 above, it is evident that the parameters to a given computation can include: model parameters such as W_{l+1} and b_{l+1}; and the output of other computations, e.g., the input to the activation function σ is the result of W_{l+1} × O_l + b_{l+1}. So there are two types of parameters to each computation, i.e., the output from other computations and the model parameters. An illustrative explanation of how each type of parameter is handled will now be given.
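For concreteness, a minimal NumPy rendering of Eq. 1 is shown below (the layer sizes and the choice of ReLU as the activation function are arbitrary illustrations):

```python
import numpy as np

def layer_forward(O_l, W_next, b_next):
    """Compute O_{l+1} = sigma(W_{l+1} x O_l + b_{l+1}) with sigma = ReLU."""
    pre_activation = W_next @ O_l + b_next   # depends on the model parameters W and b...
    return np.maximum(pre_activation, 0.0)   # ...and on the output of the previous layer

# Example: layer l outputs a 4-vector, layer l+1 has 3 units.
O_l = np.random.rand(4)
W_next = np.random.rand(3, 4)
b_next = np.random.rand(3)
O_next = layer_forward(O_l, W_next, b_next)  # both kinds of parameters appear here
```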
The outputs of all computations will always change with different inputs. So all outputs from other computations that serve as inputs to NEXT computations need to be migrated.
To parse the output of other computations, the following information is determined:
(i) on which computations does the current computation depend (i.e., from which computations can the current computation get its input); and
(ii) where are the outputs of the dependent computations located.
Information (i) can be determined by using a reversed computation graph. For example, to migrate the inference T2 of UE1 in
From the reversed computation graph 1304 it is evident that: the NEXT computation D depends on computations B and C, so the outputs of B and C need to be migrated; and the NEXT computation E depends on computations B and D. As the output of B has already been migrated for computation D, and D is flagged as a NEXT computation without an output yet, no additional parameters need to be migrated for computation E.
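A small sketch of how the reversed graph can be used to collect the outputs that must accompany NEXT computations is shown below; the graph and flags mirror the B/C/D/E example above, and Status is the enumeration sketched earlier.

```python
def outputs_to_migrate(forward_edges, flags):
    """Collect outputs of FINISHED computations that NEXT computations depend on.

    forward_edges: dict mapping a computation to the computations consuming its output.
    flags: dict mapping a computation to its Status (FINISHED/ONGOING/NEXT enum above).
    """
    # Reverse the graph: for each computation, which computations feed it?
    reversed_edges = {}
    for src, dsts in forward_edges.items():
        for dst in dsts:
            reversed_edges.setdefault(dst, []).append(src)

    to_migrate = set()
    for comp, status in flags.items():
        if status is Status.NEXT:
            for dependency in reversed_edges.get(comp, []):
                if flags[dependency] is Status.FINISHED:
                    to_migrate.add(dependency)   # its output is an input we must send
    return to_migrate

# Mirroring the example above: D and E are NEXT, while A, B, and C are FINISHED.
edges = {"A": ["B", "C"], "B": ["D", "E"], "C": ["D"], "D": ["E"]}
flags = {"A": Status.FINISHED, "B": Status.FINISHED, "C": Status.FINISHED,
         "D": Status.NEXT, "E": Status.NEXT}
# outputs_to_migrate(edges, flags) -> {"B", "C"}; D has no output yet, so nothing more for E.
```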
Determining information (ii), i.e., where these parameters are located, differs from one deep learning framework to another. However, all frameworks are assumed to have IRs that indicate all parameters for all computation nodes. For example, in TVM, each output, input, and computation has a unique node number, and from this node number, it is readily determined where the output and input are located. By way of another example, in ONNX, the parameters for each computation can be determined by parsing the above-mentioned protobuf file.
In inference applications, the model parameters remain unchanged once the training of the model is completed. To optimize the migration performance, the read-only model parameters can be treated as part of the application image and downloaded from the image repository. Therefore, no migration of the model parameters for inference applications is needed in such an illustrative embodiment.
For training applications, not only do the model parameters for all NEXT computations need to be migrated, but also the model parameters for all FINISHED computations as these parameters will be used in the training of the next mini-batch, otherwise all training results before the migration will be lost. Thus, in illustrative embodiments for a training application, instead of migrating model parameters computation by computation, all model parameters are migrated in one piece to improve network transportation performance. Typically, the size of the parameters of a model is very large, but on the other hand, training in an edge computing environment is not typical, and normally such applications have no real-time requirements. As such, this manner of handling the model parameters is acceptable.
Given the above description of illustrative embodiments, migration of runtime states and computation input parameters (i.e., user context migration) can be implemented by adapting the above-described information flow 200 in FIG. 2 as follows:
1. Upon receiving the “user context transfer initiation” notification (in step 2 of
2. Further, upon receiving the “user context transfer initiation” notification from MEC, a network connection is established by the source and target application instances (i.e., between S-App 204 and T-App 216).
3. Upon receiving the “user context transfer preparation” notification (in step 3 of
4. Upon receiving the “user context transfer execution” notification (in step 4 of
5. Send the message "user context transfer completion" (in step 6 of FIG. 2).
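Pulling these pieces together, a hedged sketch of what the source application instance might serialize and send as the user context is shown below; the payload fields and helper names are assumptions for illustration and are not part of the MEC-defined messages.

```python
import json

def build_user_context_payload(instance, outputs_to_send, model_parameters=None):
    """Assemble the user context for one input instance.

    instance: an InferenceInstance as sketched earlier (per-computation flags).
    outputs_to_send: dict mapping a FINISHED computation to its output (NumPy array).
    For inference, model_parameters stays None (they are fetched from the image
    repository on the target); for training, all model parameters are included
    in one piece (as JSON-serializable nested lists in this sketch).
    """
    payload = {
        "instance_id": instance.instance_id,
        # Per-computation flags: a couple of bits each, so roughly 250 bytes
        # for a graph with 1000 computations.
        "flags": {comp: status.name for comp, status in instance.flags.items()},
        # Outputs of FINISHED computations that NEXT computations consume.
        "inputs": {comp: output.tolist() for comp, output in outputs_to_send.items()},
    }
    if model_parameters is not None:          # training case only
        payload["model_parameters"] = model_parameters
    return json.dumps(payload).encode("utf-8")

# The target instance deserializes this payload, restores the flags on its own copy of
# the shared computation graph, and resumes scheduling from the NEXT computations.
```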
Further, illustrative embodiments provide a solution that can be integrated into any framework to run any model. As the solution is based on a fixed computation graph, instead of on application programming interfaces (APIs) provided by a framework, and a framework running a model is based on the computation graph, this solution can be easily integrated into any framework to run any model.
Still further, illustrative embodiments provide a solution that can be used for any type of parallelism. The difference between different parallelisms is the algorithm used inside the framework to sort the computation graph into a linear data structure. This linear data structure is the basis on which the scheduler schedules all computations. Once the computation graph and the parallelism are determined, the resultant linear data structure will not change with time and place; for example, it will not be changed during the migration from the source edge node to the target edge node. So how the scheduler schedules all computations is identical before and after the migration.
Illustrative embodiments also provide a solution that can be used for training and inference applications. The difference between migrating a training application and an inference application is how to migrate the model parameters. For the inference application, the model parameters are not migrated at all but rather downloaded directly from a repository during the application instance phase. For the training application, all model parameters are sent from the source to the target. In such a way, this solution supports user context transfer for both training and inference applications. Further, as this solution maintains the states of each inference instance independently, the solution can migrate multiple inference instances from the same or different UEs at the same time.
Illustrative embodiments are very efficient in both network transportation and execution. During the user context migration, only the states of each computation in the computation graph need to be synchronized, which normally is a very small data structure. For example, assume there are 1000 computations in a computation graph and two bits are used for the state of each computation; that results in about 250 bytes to be transferred. For the input parameters, depending on the parallelism degree, there may be four to eight computations in the NEXT state. This means that there are four to eight vectors to be transferred. Again, model parameters can be directly downloaded from a repository for which, typically, the network latency is better than that of the edge network. Also, after all data are transferred, the application running on the target node is able to use these states seamlessly without any extra operations.
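As a quick check of that estimate:

```python
computations = 1000
bits_per_state = 2                       # enough to encode FINISHED / ONGOING / NEXT
state_bytes = computations * bits_per_state / 8
print(state_bytes)                       # 250.0 bytes for the per-computation flags
```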
In summary, illustrative embodiments provide a solution that is very powerful because it can be integrated into any framework, to run any model, with any parallelism, for both inference and training applications, yet is very efficient because only a very small amount of data is transferred, without any extra processing for the user context migration.
As shown, the system 1500 includes a central processing unit (CPU) 1501 which performs various appropriate acts and processing, based on a computer program instruction stored in a read-only memory (ROM) 1502 or a computer program instruction loaded from a storage unit 1508 to a random access memory (RAM) 1503. The RAM 1503 stores therein various programs and data required for operations of the system 1500. The CPU 1501, the ROM 1502 and the RAM 1503 are connected with one another via a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
The following components in the system 1500 are connected to the I/O interface 1505: an input unit 1506 such as a keyboard, a mouse, and the like; an output unit 1507 including various kinds of displays, a loudspeaker, etc.; a storage unit 1508 including a magnetic disk, an optical disk, etc.; and a communication unit 1509 including a network card, a modem, a wireless communication transceiver, etc. The communication unit 1509 allows the system 1500 to exchange information/data with other devices through a computer network such as the Internet and/or various kinds of telecommunications networks.
Various processes and processing described above may be executed by the CPU 1501. For example, in some embodiments, methodologies described herein may be implemented as a computer software program that is tangibly included in a machine readable medium, e.g., the storage unit 1508. In some embodiments, part or all of the computer programs may be loaded and/or mounted onto the system 1500 via ROM 1502 and/or communication unit 1509. When the computer program is loaded to the RAM 1503 and executed by the CPU 1501, one or more steps of the methodologies as described above may be executed.
Illustrative embodiments may be a method, a device, a system, and/or a computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to carry out aspects of illustrative embodiments.
The computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals sent through a wire. Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of illustrative embodiments may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Various technical aspects are described herein with reference to flowchart illustrations and/or block diagrams of methods, device (systems), and computer program products according to illustrative embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor unit of a general purpose computer, special purpose computer, or other programmable data processing device to produce a machine, such that the instructions, when executed via the processing unit of the computer or other programmable data processing device, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing device, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein includes an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing device, or other devices to cause a series of operational steps to be performed on the computer, other programmable devices or other devices to produce a computer implemented process, such that the instructions which are executed on the computer, other programmable devices, or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams illustrate architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, snippet, or portion of code, which includes one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reversed order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.