Neural networks attempt to simulate the operations of the human brain. They can be incredibly complicated and usually consist of millions of parameters to classify and recognize the input they receive. Nowadays, neural networks are widely used in vision tasks, video creation, music generation, and other fields. Techniques for generating neural networks are crucial to the implementation of neural networks. However, conventional neural network generation techniques may not fulfil the needs of users due to various limitations. Therefore, improvements in neural network generation techniques are needed.
The following detailed description may be better understood when read in conjunction with the appended drawings. For the purposes of illustration, there are shown in the drawings example embodiments of various aspects of the disclosure; however, the invention is not limited to the specific methods and instrumentalities disclosed.
Knowledge Distillation (KD) plays an important role in improving neural networks. KD is a model compression method in which a small model is trained to mimic a pre-trained larger model. KD can transfer knowledge from a well-performing larger Deep Neural Network (DNN) to a given smaller network. For example, KD can transfer knowledge from a teacher model to a student model. In the past few years, KD has achieved remarkable improvements in training efficient models for image classification, image segmentation, object detection, and so on. Recently, KD has been widely implemented in model deployments on mobile devices or other low-power computing devices. Improvements to knowledge distillation can bring strong benefits in numerous applications.
Techniques for automatically finding an optimal teaching scheme of KD between a fixed teacher and a given student are desirable. The present disclosure provides techniques for automatically finding a teaching scheme for KD and efficiently learning an optimal KD scheme. For a given pair of teacher and student networks, a set of transmitting feature maps from the teacher network and receiving feature maps from the student network may be sampled and defined. Meanwhile, a set of transform blocks may be added for converting a receiving feature map to match a corresponding transmitting feature map for loss computation. For each pathway, an importance factor α may be assigned, and a differentiable meta-learning pipeline may be used to find its optimal value. In some embodiments, KD may be performed with the learnt α values.
For a given pathway of distillation, the framework LATTE (LeArning To Teach for KD) in accordance with the present disclosure may generate a weighting process. The weighting process may contain more information beyond the final learnt value. The weighting process is a learnt process for the importance factor α. The weighting process learnt by LATTE may produce better results than adopting a fixed distillation weight to balance different losses. In some embodiments, the learnt process may be adopted to reweight each pathway for KD training and for generating a distilled student model for deployment. The techniques described in the present disclosure have been validated on various vision tasks, such as image classification, image segmentation, and depth estimation. The framework in accordance with the present disclosure performs better than existing KD techniques.
The neural networks for improving knowledge distillation may be integrated into and/or utilized by a variety of systems.
A plurality of computing nodes 118 may perform various tasks, e.g., vision tasks. The plurality of computing nodes 118 may be implemented as one or more computing devices, one or more processors, one or more virtual computing instances, a combination thereof, and/or the like. The plurality of computing nodes 118 may be implemented by one or more computing devices. The one or more computing devices may comprise virtualized computing instances. The virtualized computing instances may comprise a virtual machine, such as an emulation of a computer system, operating system, server, and/or the like. A virtual machine may be loaded by a computing device based on a virtual image and/or other data defining specific software (e.g., operating systems, specialized applications, servers) for emulation. Different virtual machines may be loaded and/or terminated on the one or more computing devices as the demand for different types of processing services changes. A hypervisor may be implemented to manage the use of different virtual machines on the same computing device.
In an embodiment, the cloud network or server 102 and/or the client devices 104 may comprise one or more neural networks. The techniques described in the present disclosure may have been utilized to improve the neural networks. For example, the techniques in accordance with the present disclosure may have been utilized to improve vision task models, such as an image classification model 110a, an image segmentation model 110b, and a depth estimation model 110n. Other neural networks not depicted in the figure may also be improved using the techniques described in the present disclosure.
Feature maps F_i^t and F_j^s may come from any stage of the teacher network and the student network. Consequently, the feature maps may be in different shapes. The feature maps in different shapes may not be compared directly. Thus, additional computation may be required to transform these feature maps into a same shape for comparison. To this end, a plurality of transform blocks 206 may be added after each of the student feature maps F_j^s. The plurality of transform blocks 206 may be denoted as M_{i,j,1}, M_{i,j,2}, . . . , M_{i,j,N}. The transform blocks 206 may be any differentiable computation. For instance, the transform blocks 206 may comprise several convolution layers and an interpolation layer to transform the spatial resolution of the feature maps.
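As a non-limiting illustration, a transform block of this kind may be sketched in PyTorch as follows. The specific layer configuration (a 1×1 convolution, batch normalization, and a 3×3 convolution) and the use of bilinear interpolation are assumptions of the sketch rather than requirements of the present disclosure.

import torch.nn as nn
import torch.nn.functional as F

class TransformBlock(nn.Module):
    """Converts a student (receiving) feature map to the shape of a teacher (transmitting) feature map."""

    def __init__(self, in_channels, out_channels, out_size):
        super().__init__()
        # Convolution layers to match the channel dimension of the teacher feature map.
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False),
        )
        self.out_size = out_size  # (height, width) of the teacher feature map

    def forward(self, f_student):
        x = self.convs(f_student)
        # Interpolation layer to match the spatial resolution of the teacher feature map.
        return F.interpolate(x, size=self.out_size, mode="bilinear", align_corners=False)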
For each pair of teacher/student feature maps, a plurality of loss terms 208 may be computed to measure the difference between the teacher feature map and the student feature map. An importance factor α_{i,j,k} (e.g., α_{1,1,1}, α_{1,1,2}, . . . , α_{1,1,N}, etc.) may be assigned to each loss term. The importance factor α_{i,j,k} may be used to evaluate the importance of each pathway for knowledge distillation.
For a given pair of teacher and student networks, a set of transmitting feature maps from the teacher (e.g., teacher feature maps 204) may be sampled and defined. A set of receiving feature maps from the student (e.g., student feature maps 202) may also be sampled and defined. A set of transforms (e.g., transform blocks 206) may be proposed as well. The set of transform blocks 206 may be pre-defined. The set of transform blocks 206 may convert a receiving feature map to match with a transmitting feature map for loss computation.
A set of distillation pathways from transmitting layers in the teacher network to receiving layers in the student network may be generated. For each pathway, an importance factor may be assigned. A differentiable meta-learning pipeline may be used to find its optimal value. Optimized importance factors may be found and stored. Using the learnt importance factors, each pathway may be reweighted for KD training and generating a distilled student model for deployment.
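As one hedged sketch of how such a search space might be configured (reusing the TransformBlock sketch above), every pair of candidate teacher and student feature maps, together with each of N candidate transforms, may define one pathway with its own importance factor. The function name, the shape tuples, and the zero initialization of the unnormalized factors are illustrative assumptions.

import torch
import torch.nn as nn

def build_search_space(teacher_shapes, student_shapes, num_transforms):
    """Create a pathway (i, j, k) for each teacher/student feature-map pair and transform variant,
    together with one learnable (unnormalized) importance factor per pathway."""
    transforms = nn.ModuleDict()
    pathways = []
    for i, (t_c, t_h, t_w) in enumerate(teacher_shapes):      # transmitting layers
        for j, (s_c, s_h, s_w) in enumerate(student_shapes):  # receiving layers
            for k in range(num_transforms):
                transforms[f"{i}_{j}_{k}"] = TransformBlock(
                    in_channels=s_c, out_channels=t_c, out_size=(t_h, t_w))
                pathways.append((i, j, k))
    # Unnormalized importance factors (alpha-tilde), one per pathway; normalized before use.
    alpha_tilde = nn.Parameter(torch.zeros(len(pathways)))
    return pathways, transforms, alpha_tilde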
The example framework 300 may comprise a searching phase 302 and a retraining phase 304. The optimal KD scheme may be found during the searching phase 302. The optimal KD scheme may be found by optimizing the importance factor. The searching phase 302 may be a process of training a student network. A dataset may be split into a training dataset and a validation dataset for the process of training the student network. The student network may be trained on the training dataset with a training loss L_train encoding the supervision from both ground truth labels 306 associated with the training dataset and the teacher network (e.g., the teacher feature maps 204b). The validation dataset may be used to evaluate the performance of the student network. In the validation process, a validation loss L_val may only measure a difference between the output of the student network and ground truth labels 308 associated with the validation dataset. In an example, the importance factor 310 and parameters of the student network may be updated alternately in the searching phase. An optimized importance factor minimizing the validation loss may be found in the searching phase.
In the retraining phase 304, the student network may be retrained using the optimized importance factor obtained from the searching phase 302 and all available data. For example, all available data may comprise the training dataset and the validation dataset used during the process of training the student network. Each pathway may be reweighted based on a learned process for the importance factor. In the retraining phase, only the parameters of the student network are updated. Knowledge distillation may be performed by retraining the student network with the optimized importance factor and all available data. For example, knowledge may be transferred from the teacher feature map 204b to the student feature map 202b by retraining the student network using the entire set of data (including both the training dataset and validation dataset used in the searching phase) and the optimized importance factor 312.
In a teacher network, intermediate feature maps may contain plentiful knowledge. The knowledge may be transferred from the teacher network to a student network. For an input image, the output of the student network may be shown as follows.
S(X) := S_{L_s} ∘ . . . ∘ S_2 ∘ S_1(X),   Equation 1
wherein S denotes the student network, X denotes the input image, S_i represents the i-th layer of the student network, and L_s represents the number of layers in the student network.
In a student network, the k-th intermediate feature map of the student network may be defined as follows.
F_k^s(X) := S_k ∘ . . . ∘ S_2 ∘ S_1(X),  1 ≤ k ≤ L_s,   Equation 2
wherein F_k^s denotes the k-th intermediate feature map of the student network, X denotes the input image, S_k represents the k-th layer of the student network, and L_s represents the number of layers in the student network.
The intermediate feature maps of the teacher neural network may be denoted by F_k^t, 1 ≤ k ≤ L_t, wherein L_t represents the number of layers in the teacher neural network. The i-th feature map of the teacher neural network may be denoted by F_i^t. The j-th feature map of the student neural network may be denoted by F_j^s. As mentioned above, knowledge may be transferred from the i-th feature map of the teacher neural network (i.e., F_i^t) to the j-th feature map of the student neural network (i.e., F_j^s).
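One possible way to collect the intermediate feature maps F_k^t and F_k^s at run time is with forward hooks, as sketched below; the helper name and the use of named modules to select layers are assumptions for illustration.

import torch

def collect_feature_maps(model, layer_names, x):
    """Run a forward pass and return the model output together with the feature maps
    produced by the named layers (e.g., the layers after each down-sampling stage)."""
    feature_maps = {}
    handles = []
    modules = dict(model.named_modules())
    for name in layer_names:
        def hook(module, inputs, output, name=name):
            feature_maps[name] = output
        handles.append(modules[name].register_forward_hook(hook))
    output = model(x)  # wrap this call in torch.no_grad() when querying the frozen teacher
    for handle in handles:
        handle.remove()
    return output, feature_maps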
Feature maps F_i^t and F_j^s may come from any stage of the teacher neural network and the student neural network. Consequently, the feature maps may be in different shapes. The feature maps in different shapes may not be compared directly. Therefore, additional computation may be required to transform the feature maps, which are in different shapes, into a same shape for comparison. To implement the transformation, transform blocks (e.g., the transform blocks 206 described above) may be added after the feature maps of the student network.
The loss term may be used to measure the difference between the feature map of the teacher neural network (i.e., F_i^t) and the feature map of the student neural network (i.e., F_j^s). The loss may be computed by the following equation.
ℓ(F_j^s, F_i^t) := δ(M(F_j^s), F_i^t),   Equation 3
wherein ℓ denotes the loss term, M denotes the transform block, and δ represents a distance function, which may be the L1 distance, the L2 distance, etc.
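A minimal sketch of Equation 3 for a single pathway, assuming the L1 distance as δ and treating the teacher feature map as fixed, might read as follows.

import torch.nn.functional as F

def pathway_loss(f_student, f_teacher, transform):
    """Equation 3: l(F_j^s, F_i^t) := delta(M(F_j^s), F_i^t), with delta chosen here as the L1 distance."""
    return F.l1_loss(transform(f_student), f_teacher.detach())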
Input to the example algorithm 400 may comprise a dataset D, a pre-trained teacher model, and initialized importance factors α. N_search denotes a number of iterations in the searching phase 402. N_retrain denotes a number of iterations in the retraining phase 404. The searching phase 402 may be utilized to search for optimal importance factors α. In the searching phase 402, the dataset D may be split into a training dataset D_train and a validation dataset D_val for training the student network. For example, 80% of the dataset D may be used for training (i.e., D_train) and 20% of the dataset D may be used for validation (i.e., D_val). The student model may be trained on the training dataset D_train with a loss encoding the supervision from both the ground truth label and the teacher neural network. The validation dataset D_val may be used to evaluate the performance of the trained student on unseen inputs. During validation, the validation loss may only measure the difference between the output of the student and the ground truth label.
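For instance, the 80%/20% split mentioned above could be realized as in the following sketch; the use of torch.utils.data.random_split and the fixed random seed are assumptions.

import torch
from torch.utils.data import random_split

def split_dataset(dataset, train_fraction=0.8, seed=0):
    """Split the dataset D into D_train and D_val for the searching phase."""
    n_train = int(train_fraction * len(dataset))
    n_val = len(dataset) - n_train
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val], generator=generator)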
The training dataset may be represented by D_train := {(X_i, y_i)}_{i=1}^{|D_train|}, wherein X_i represents an input image and y_i represents the corresponding ground truth label.
The student model may be trained on the training dataset D_train with a loss encoding the supervision from both the ground truth label and the teacher neural network. The loss on the training dataset, L_train(w, α), may be defined as follows.
wherein w denotes the parameters of the student neural network, and α denotes the importance factors, with α ∈ ℝ_{≥0}^{L_t × L_s × N}, i.e., one non-negative importance factor per pathway.
The validation dataset may be used to evaluate the performance of the trained student on unseen inputs. In the process of validation, the validation loss may only measure the difference between the output of the student and the ground truth. The loss on the validation dataset, L_val(w), may be defined as follows.
wherein w denotes the parameters of the student neural network, D_val denotes the validation dataset, X represents the input image, y represents the label of image X, δ_label represents a distance function that measures the difference between labels, and S(X) represents the output of the student neural network.
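Based on the description above, the two losses may be assembled roughly as in the following sketch; the cross-entropy choice for δ_label and the list-based indexing of the feature maps are assumptions, and pathway_loss refers to the sketch above.

import torch.nn.functional as F

def training_loss(student_out, labels, student_feats, teacher_feats, pathways, transforms, alpha):
    """L_train: label supervision plus the importance-weighted feature losses over all pathways."""
    loss = F.cross_entropy(student_out, labels)  # delta_label(S(X), y)
    for idx, (i, j, k) in enumerate(pathways):
        transform = transforms[f"{i}_{j}_{k}"]
        loss = loss + alpha[idx] * pathway_loss(student_feats[j], teacher_feats[i], transform)
    return loss

def validation_loss(student_out, labels):
    """L_val: only the difference between the student output and the ground truth labels."""
    return F.cross_entropy(student_out, labels)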
The optimization problem may be formulated as follows.
wherein w*(α) denotes the parameters of the student network trained with the importance factor α, L_val(w*(α)) denotes the loss of w*(α) on the validation dataset, and L_train(w, α) denotes the loss on the training dataset.
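In LaTeX notation, the nested problem described by the surrounding text takes roughly the following form:

\min_{\alpha} \; L_{\mathrm{val}}\big(w^{*}(\alpha)\big) \quad \text{subject to} \quad w^{*}(\alpha) = \arg\min_{w} \; L_{\mathrm{train}}(w, \alpha)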
The optimal KD scheme may be found by optimizing the importance factor α. An optimal importance factor minimizing the validation loss L_val(w*(α)) may be found in the searching phase. However, this is a nested optimization problem and is difficult to solve. To address the issue, a gradient-based method may be utilized. Instead of computing the gradient at the exact optimum w*(α) of the inner optimization, the gradient with respect to α, i.e., ∇_α L_val(w*(α)), may be computed at the result of a single step of gradient descent as follows.
∇_α L_val(w*(α)) ≈ ∇_α L_val(w − ξ ∇_w L_train(w, α)),   Equation 7
wherein α represents the importance factors, w represents the current parameters of the student neural network, ξ represents the learning rate of the inner optimization, L_val represents the loss on the validation dataset, and L_train represents the loss on the training dataset.
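A hedged PyTorch-style sketch of the single unrolled gradient step in Equation 7 is given below; the plain SGD form of the inner step and the helper name are assumptions, and torch.autograd.grad is used to obtain the explicit gradients.

import torch

def virtual_step(student, train_loss, xi):
    """Compute w' = w - xi * grad_w L_train(w, alpha), i.e., one unrolled SGD step (Equation 7)."""
    params = list(student.parameters())
    grads = torch.autograd.grad(train_loss, params)
    # Detached copies of the virtually updated parameters w'.
    return [p.detach() - xi * g for p, g in zip(params, grads)]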
That is, the parameters of the student neural network trained with the importance factor α, i.e., w*(α), may be approximated with a single step of gradient descent from the current parameters w. More sophisticated gradient-based methods may be used to solve the inner optimization, e.g., gradient descent with momentum. When another gradient-based method is used, Equation 7 may be modified accordingly. The chain rule may be applied to Equation 7. The result of applying the chain rule may be shown as follows.
∇_α L_val(w − ξ ∇_w L_train(w, α)) = −ξ ∇²_{α,w} L_train(w, α) ∇_{w′} L_val(w′),   Equation 8
wherein w′ = w − ξ ∇_w L_train(w, α). In Equation 8, there are second-order derivatives, which may result in expensive computation. Therefore, the second-order derivatives may be approximated with finite differences. Consequently, the following equation may be obtained.

∇²_{α,w} L_train(w, α) ∇_{w′} L_val(w′) ≈ (∇_α L_train(w+, α) − ∇_α L_train(w−, α)) / (2ε),   Equation 9

wherein w+ denotes w + ε ∇_{w′} L_val(w′), w− denotes w − ε ∇_{w′} L_val(w′), and ε denotes a small positive scalar.
The following approximation may be obtained:

∇_α L_val(w*(α)) ≈ −ξ (∇_α L_train(w+, α) − ∇_α L_train(w−, α)) / (2ε),   Equation 10

wherein L_val denotes the loss on the validation dataset, L_train denotes the loss on the training dataset, w*(α) denotes the parameters of the student neural network trained with the importance factor α, and ξ denotes the learning rate of the inner optimization.
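The finite-difference estimate in Equations 8 through 10 may be sketched as follows; the in-place perturbation of the student parameters and the default value of ε are assumptions, and train_loss_fn is a placeholder closure that recomputes L_train at the current student weights for a given α.

import torch

def alpha_gradient(student, alpha, train_loss_fn, val_grads, xi, eps=1e-2):
    """Estimate grad_alpha L_val(w*(alpha)) via Equation 10.
    val_grads holds grad_{w'} L_val(w'), computed at the unrolled weights w'."""
    params = list(student.parameters())

    def grad_at_perturbed(sign):
        with torch.no_grad():
            for p, g in zip(params, val_grads):
                p.add_(sign * eps * g)          # move the parameters to w+ or w-
        grad = torch.autograd.grad(train_loss_fn(alpha), alpha)[0]  # grad_alpha L_train(w±, alpha)
        with torch.no_grad():
            for p, g in zip(params, val_grads):
                p.sub_(sign * eps * g)          # restore the original parameters w
        return grad

    grad_plus = grad_at_perturbed(+1.0)
    grad_minus = grad_at_perturbed(-1.0)
    return -xi * (grad_plus - grad_minus) / (2.0 * eps)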
To evaluate the expression in Equation 10, the following items may be computed. First, computing w′ may require a forward pass and a backward pass of the student and a forward pass of the teacher. Afterwards, computing w± may require a forward pass and a backward pass of the student. Finally, computing ∇_α L_train(w±, α) may require two forward passes of the student. The gradient of L_train with respect to an element of α is just the feature map loss corresponding to this element, so no further backward pass of the student is needed. In conclusion, evaluating the approximated gradient in Equation 10 entails one forward pass of the teacher, and four forward passes and two backward passes of the student.
The importance of each knowledge transfer may be adjusted by regulating α. The importance factor α may be optimized to find the optimal KD scheme. In fact, the real decision variable of the optimization is α̃ instead of α. For the purpose of numerical stability, normalization may be applied to α̃. The importance factor α may be obtained by normalizing α̃. A plurality of normalization methods may be evaluated.
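The present disclosure does not fix a particular normalization; as one hedged example, a softmax over the unnormalized factors α̃ could be used, as in the sketch below.

import torch

def normalize_alpha(alpha_tilde):
    """Map the unnormalized factors alpha-tilde to non-negative importance factors alpha.
    Softmax is only one of several normalization methods that may be evaluated."""
    return torch.softmax(alpha_tilde, dim=0)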
The importance factors α and the parameters of the student neural network w may be updated alternately in the searching phase 402. Because the gradient-based method learns the KD scheme efficiently, the importance factors may be updated by descending a gradient approximation based on Equation 10. The evolution of the importance factor in the searching phase 402 may encode much richer information than the final importance factor.
In the retraining phase 404, the optimal importance factors found in the searching phase 402 may be used for KD training and generating a distilled student model for deployment. Only the parameters of the student neural network w may be updated during the retraining phase 404. The retraining phase 404 may be configured to retrain the student neural network with the optimal importance factor and all available data. All available data D may comprise the training dataset Dtrain and the validation dataset Dval used during the process of training the student network.
There may be different ways to use the optimal importance factors. In some embodiments, the importance factor obtained at the last iteration in the searching phase may be used for each iteration of the retraining phase. The student network may be retrained using the same importance factor obtained at the last iteration for each iteration of the retraining process. The evolution of the importance factor in the searching phase may encode much richer information than the final importance factor. To make use of that information, in other embodiments, each iteration of the retraining process may use different importance factors. To this end, a new value from the stored importance factors may be loaded (e.g., as shown in Line 11 of the example algorithm 400) for each iteration of the retraining phase. Since the number of retraining iterations may differ from the number of searching iterations, linear interpolation may be used to compute the importance factors for each iteration in the retraining process.
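As an illustrative sketch, the stored importance factors from the searching phase might be resampled to the retraining schedule with linear interpolation, e.g., using numpy.interp; the array shapes and the helper name are assumptions.

import numpy as np

def resample_alpha_schedule(alpha_history, n_retrain):
    """alpha_history: array of shape (N_search, num_pathways), one row per searching iteration.
    Returns an array of shape (n_retrain, num_pathways) obtained by linear interpolation."""
    alpha_history = np.asarray(alpha_history)
    n_search, num_pathways = alpha_history.shape
    src = np.linspace(0.0, 1.0, n_search)
    dst = np.linspace(0.0, 1.0, n_retrain)
    return np.stack(
        [np.interp(dst, src, alpha_history[:, p]) for p in range(num_pathways)], axis=1)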
At 502, a search space may be configured by establishing a plurality of pathways between a teacher network and a student network and assigning an importance factor to each of the plurality of pathways. The teacher network is pre-trained. The search space may comprise the plurality of pathways and the importance factors assigned to the pathways.
In one embodiment, for a given pair of teacher and student networks, a set of transmitting feature maps of the teacher network and receiving feature maps of the student network may be sampled and defined. A plurality of pathways from transmitting layers to receiving layers may be established. In an example, a transform block may be added after each feature map of the student network. The transform block may convert a receiving feature map of the student network to match with a transmitting feature map of the teacher network for loss computation. The transform block may be any differentiable computation. In one example, a transform block may comprise several convolution layers and an interpolation layer to transform the spatial resolution of the feature map.
At 504, an optimal KD scheme may be searched by updating the importance factor and parameters of the student network during a process of training the student network. The optimal KD scheme may be searched during a process of training the student network. A dataset may be split into a training dataset and a validation dataset for the process of training the student network. The student network may be trained on the training dataset with a training loss encoding the supervision from both the teacher network and the ground truth labels associated with the training dataset. The validation dataset may be used to evaluate the performance of the student network. In the validation process, a validation loss may only measure a difference between the output of the student network and the ground truth labels associated with the validation dataset.
In one example, the importance factors and the parameters of the student neural network may be updated alternately during the process of training the student network. The training process is for searching for an optimal scheme. The importance factor α obtained in each iteration may be stored for future use. The optimized importance factor may be found in the searching phase. A learned process for the importance factor may comprise much richer information than the final importance factor value. For example, it has been found that the weights at pathways from low-level feature maps of the teacher network are relatively large at the beginning and small at the end, whereas the weights at pathways from high-level feature maps of the teacher network are relatively small at the beginning and large at the end. This information indicates that an optimal routine for KD could be that the student network learns simple knowledge at an early stage and learns difficult knowledge at a later stage.
At 506, knowledge distillation may be performed from the teacher network to the student network by retraining the student network based at least in part on the optimized importance factors. An optimal KD scheme may be identified by optimizing the importance factor. The optimized importance factor may be found during the process of training the student network. The optimized importance factor as well as all available data may be used to retrain the student network to perform KD. All the available data may comprise the training dataset and the validation dataset used during the process of training the student network. During the retraining process, only parameters of the student network are updated.
During the retraining process, there may be different ways to use the optimized importance factor obtained from the searching phase. In some embodiments, the importance factor obtained at the last iteration in the searching phase may be used for each iteration of the retraining phase. The student network may be retrained using the same importance factor obtained at the last iteration for each iteration of the retraining process. The evolution of the importance factor in the searching phase (i.e., the learnt process for the importance factor) may encode much richer information than the final importance factor value. To make use of that information, in other embodiments, each iteration of the retraining process may use different importance factors. Since a number of iterations in the retraining process may be different from a number of iterations in the training process, linear interpolation may be used to compute the different importance factors for each iteration in the retraining process.
At 602, a search space may be configured by establishing a plurality of pathways between a teacher network and a student network and assigning an importance factor to each of the plurality of pathways. The teacher network is pre-trained. The search space may comprise the plurality of pathways and the importance factors assigned to the pathways.
In one embodiment, for a given pair of teacher and student networks, a set of transmitting feature maps of the teacher network and receiving feature maps of the student network may be sampled and defined. A plurality of pathways from transmitting layers to receiving layers may be established. An importance factor may be assigned to each of the plurality of pathways. The importance factor may be used to evaluate the importance of each pathway for knowledge distillation. The optimal KD scheme may be found by optimizing the importance factor.
At 604, a transform block may be added after each feature map of the student network. Knowledge may be transferred from at least one feature map of the teacher network to the student network. The transform block may comprise convolution layers and an interpolation layer. In one embodiment, knowledge may be transferred from any feature map of the teacher network (i.e., F_i^t) to any feature map of the student network (i.e., F_j^s) by penalizing the difference between these two feature maps. Since the feature maps may come from any stage of the neural network, they might be in different shapes and thus not directly comparable. Thus, additional computation may be required to bring these two feature maps into the same shape. To this end, a transform block may be added after each feature map of the student network. The transform block may convert a receiving feature map of the student network to match with a transmitting feature map of the teacher network for loss computation. The transform block could be any differentiable computation. A transform block may comprise several convolution layers and an interpolation layer to transform the spatial resolution of feature maps.
Referring back to the example method, at 606, the student network may be trained on a training dataset with a training loss encoding the supervision from both the teacher network and ground truth label information associated with the training dataset.
In one example, the importance factors and the parameters of the student neural network may be updated alternately during the process of training the student network. The importance of each knowledge transfer (i.e., each pathway) may be adjusted by regulating the importance factor. Because the gradient-based method learns the KD scheme efficiently, the importance factors may be updated by descending a gradient approximation, such as the approximation based on Equation 10.
The validation dataset may be used to evaluate the performance of the student network. At 608, the trained student may be evaluated on a validation dataset. A validation loss may only measure a difference between an output of the student network and the ground truth label information. The ground truth label information may be associated with the validation dataset. In one embodiment, the validation dataset may comprise 20% of the entire dataset. The loss on the validation dataset (i.e., validation loss), for example, may be defined based on Equation 5. An optimal importance factor minimizing the validation loss may be found in the searching phase. An optimal KD scheme may be identified to minimize the validation loss by applying a gradient-based mechanism. The optimal importance factors may be stored and used for the retraining phase.
The retraining phase may be configured to retrain the student network with the optimized importance factor. At 610, the student network may be retrained using the optimized importance factors and an entire set of data. The entire set of data may comprise a training dataset and a validation dataset used during the process of training the student network. In one embodiment, the optimized importance factor found in the searching phase may be used for KD in the retraining phase. Each pathway may be reweighted based on the optimal importance factor obtained from the searching phase. During retraining, only the parameters of the student network may be updated. The retraining phase may be utilized to retrain the student neural network with the optimal importance factor and all the available data. All the available data may comprise the training dataset and the validation dataset used during the process of training the student network (i.e., the searching phase). Knowledge distillation may be performed from the teacher network to the student network by retraining the student network with the optimal importance factor.
There may be different ways to use the optimal importance factors found in the searching phase. For example, the retraining phase may only use the optimized importance factors obtained at the last iteration in the searching phase for each iteration of the retraining phase. The student network may be retrained using the same importance factor obtained at the last iteration in the searching phase for each iteration of the retraining process. The evolution of the importance factor in the searching phase may encode much richer information than the final importance factor value. To make use of that information, in another example, each iteration of the retraining process may use different importance factors. Since a number of retraining iterations may be different from a number of training iterations, linear interpolation may be used to compute the different importance factors for each iteration in the retraining process.
To evaluate the performance of the framework described in the present disclosure, a plurality of benchmark tasks may be adopted. The plurality of benchmark tasks may include image classification, semantic segmentation, and depth estimation. For image classification, the popularly used CIFAR-100 dataset may be used. For semantic segmentation, the CityScapes dataset may be used. For depth estimation, the NYUv2 dataset may be used. The proposed method may be compared mainly with knowledge review and the corresponding baseline models on each task. All the methods may use the same training settings and hyper-parameters to implement a fair comparison. The training settings may comprise data pre-processing, learning rate schedule, number of training epochs, batch size, and so on.
Different network architectures are adopted for performance comparison. The network architectures may comprise ResNet, WideResNet, MobileNet, and ShuffleNet. The models may be trained for 240 epochs. The learning rate may be decayed by 0.1 for every 30 epochs after the first 150 epochs. Batch size is 128 for all the models. The initial learning rate is 0.02 for ShuffleNet and 0.1 for other models. The models may be trained with the same setting five times. The mean and variance of the accuracy on the testing set may be reported.
Using the CIFAR-100 dataset, the search may be run for 40 epochs. The learning rate for w (i.e., the parameters of the student neural network) may be decayed by 0.1 at epochs 10, 20, and 30. The learning rate for α may be set to 0.05. Not all feature maps are used for knowledge distillation. Instead, only the feature maps after each down-sampling stage may be used. To make the comparison fair and meaningful, Hierarchical Context Loss (HCL) may be used. For the retraining phase, linear interpolation may be used to expand the process of α from 40 epochs to 240 epochs to match the number of epochs needed for KD retraining.
Results are average values based on 5 runs. Variances are reported in the parentheses. The results show that the LATTE scheme achieves significant improvements across the evaluated neural network architectures.
There may be two ways to use the importance factor α. The first way is adopting the final learnt importance factor α values at the end of training. The results are shown in the row “Use final α”. In the row “Use final α”, the finally converged importance factor α is used at each iteration of the retraining phase.
The second way is adopting the learnt process (i.e., the evolution of the importance factor in the searching phase). The results are shown in the row “LATTE”.
The importance factor α generated in the searching phase may be used in different ways. For example, as discussed above, adopting the learnt process of α (i.e., the LATTE approach) may produce better results than adopting only the final learnt α value.
The importance factor may be used to evaluate the importance of each pathway for knowledge distillation. For the purpose of numerical stability, normalization may be applied to the importance factor.
The computing device 1500 may include a baseboard, or “motherboard,” which is a printed circuit board to which a multitude of components or devices may be connected by way of a system bus or other electrical communication paths. One or more central processing units (CPUs) 1504 may operate in conjunction with a chipset 1506. The CPU(s) 1504 may be standard programmable processors that perform arithmetic and logical operations necessary for the operation of the computing device 1500.
The CPU(s) 1504 may perform the necessary operations by transitioning from one discrete physical state to the next through the manipulation of switching elements that differentiate between and change these states. Switching elements may generally include electronic circuits that maintain one of two binary states, such as flip-flops, and electronic circuits that provide an output state based on the logical combination of the states of one or more other switching elements, such as logic gates. These basic switching elements may be combined to create more complex logic circuits including registers, adders-subtractors, arithmetic logic units, floating-point units, and the like.
The CPU(s) 1504 may be augmented with or replaced by other processing units, such as GPU(s). The GPU(s) may comprise processing units specialized for but not necessarily limited to highly parallel computations, such as graphics and other visualization-related processing.
A user interface may be provided between the CPU(s) 1504 and the remainder of the components and devices on the baseboard. The interface may be used to access a random access memory (RAM) 1508 used as the main memory in the computing device 1500. The interface may be used to access a computer-readable storage medium, such as a read-only memory (ROM) 1520 or non-volatile RAM (NVRAM) (not shown), for storing basic routines that may help to start up the computing device 1500 and to transfer information between the various components and devices. ROM 1520 or NVRAM may also store other software components necessary for the operation of the computing device 1500 in accordance with the aspects described herein. The user interface may be provided by one or more electrical components such as the chipset 1506.
The computing device 1500 may operate in a networked environment using logical connections to remote computing nodes and computer systems through a local area network (LAN). The chipset 1506 may include functionality for providing network connectivity through a network interface controller (NIC) 1522, such as a gigabit Ethernet adapter. A NIC 1522 may be capable of connecting the computing device 1500 to other computing nodes over a network 1513. It should be appreciated that multiple NICs 1522 may be present in the computing device 1500, connecting the computing device to other types of networks and remote computer systems.
The computing device 1500 may be connected to a storage device 1528 that provides non-volatile storage for the computer. The storage device 1528 may store system programs, application programs, other program modules, and data, which have been described in greater detail herein. The storage device 1528 may be connected to the computing device 1500 through a storage controller 1524 connected to the chipset 1506. The storage device 1528 may consist of one or more physical storage units. A storage controller 1524 may interface with the physical storage units through a serial attached SCSI (SAS) interface, a serial advanced technology attachment (SATA) interface, a fiber channel (FC) interface, or other type of interface for physically connecting and transferring data between computers and physical storage units.
The computing device 1500 may store data on a storage device 1528 by transforming the physical state of the physical storage units to reflect the information being stored. The specific transformation of a physical state may depend on various factors and on different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the physical storage units and whether the storage device 1528 is characterized as primary or secondary storage and the like.
For example, the computing device 1500 may store information to the storage device 1528 by issuing instructions through a storage controller 1524 to alter the magnetic characteristics of a particular location within a magnetic disk drive unit, the reflective or refractive characteristics of a particular location in an optical storage unit, or the electrical characteristics of a particular capacitor, transistor, or other discrete component in a solid-state storage unit. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this description. The computing device 1500 may read information from the storage device 1528 by detecting the physical states or characteristics of one or more particular locations within the physical storage units.
In addition or alternatively to the storage device 1528 described herein, the computing device 1500 may have access to other computer-readable storage media to store and retrieve information, such as program modules, data structures, or other data. It should be appreciated by those skilled in the art that computer-readable storage media may be any available media that provides for the storage of non-transitory data and that may be accessed by the computing device 1500.
By way of example and not limitation, computer-readable storage media may include volatile and non-volatile, transitory computer-readable storage media and non-transitory computer-readable storage media, and removable and non-removable media implemented in any method or technology. Computer-readable storage media includes, but is not limited to, RAM, ROM, erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory or other solid-state memory technology, compact disc ROM (“CD-ROM”), digital versatile disk (“DVD”), high definition DVD (“HD-DVD”), BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, other magnetic storage devices, or any other medium that may be used to store the desired information in a non-transitory fashion.
A storage device, such as the storage device 1528 depicted in
The storage device 1528 or other computer-readable storage media may also be encoded with computer-executable instructions, which, when loaded into the computing device 1500, transform the computing device from a general-purpose computing system into a special-purpose computer capable of implementing the aspects described herein. These computer-executable instructions transform the computing device 1500 by specifying how the CPU(s) 1504 transition between states, as described herein. The computing device 1500 may have access to computer-readable storage media storing computer-executable instructions, which, when executed by the computing device 1500, may perform the methods described in the present disclosure.
A computing device, such as the computing device 1500 depicted in
As described herein, a computing device may be a physical computing device, such as the computing device 1500 of
One skilled in the art will appreciate that the systems and methods disclosed herein may be implemented via a computing device that may comprise, but is not limited to, one or more processors, a system memory, and a system bus that couples various system components including the processor to the system memory. In the case of multiple processors, the system may utilize parallel computing.
For purposes of illustration, application programs and other executable program components such as the operating system are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing device, and are executed by the data processor(s) of the computer. An implementation of service software may be stored on or transmitted across some form of computer-readable media. Any of the disclosed methods may be performed by computer-readable instructions embodied on computer-readable media. Computer-readable media may be any available media that may be accessed by a computer. By way of example and not meant to be limiting, computer-readable media may comprise “computer storage media” and “communications media.” “Computer storage media” comprise volatile and non-volatile, removable and non-removable media implemented in any methods or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Exemplary computer storage media comprises, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information, and which may be accessed by a computer. Application programs and the like and/or storage media may be implemented, at least in part, at a remote system.
As used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps, or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect.
It will be apparent to those skilled in the art that various modifications and variations may be made without departing from the scope or spirit. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims.