Data may be stored in computer-readable databases. These databases may store large volumes of data collected over time. Computers may be used to retrieve and process the data stored in databases.
Increasing volumes of data create increased complexity when storing, manipulating, and assessing the data. For example, with increases in the connectivity of devices and the number of sensors in the various components of each device making time-series measurements, the generated data is increasingly voluminous and complex.
Complexity in retrieving and manipulating datasets may arise from the complex data structures of systems, system components, and component attributes and their corresponding values. In addition, such complexity may arise from the large volumes of data generated by lengthy time-series measurements related to ensembles of numerous systems.
Memory 105 may include a non-transitory machine-readable storage medium that may be an electronic, magnetic, optical, or other physical storage device that stores executable instructions. The machine-readable storage medium may include, for example, random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), flash memory, a storage drive, an optical disc, and the like. The machine-readable storage medium may be encoded with executable instructions. In some example systems, memory 105 may include a database.
Memory 105 is to store data 115 including an entity value 120 of an entity stored in association with an attribute value 125 of an attribute of the entity. Entity value 120 and attribute value 125 may be associated with one another in a suitable manner; for example, entity value 120 and attribute value 125 may be stored in a common row of a data table stored in memory 105.
Processor 110 may store an entity value identifier 135 in association with an attribute value identifier 140 to obtain modified data 130. Entity value identifier 135 may be associated with entity value 120 and attribute value identifier 140 may be associated with attribute value 125. The values and their corresponding identifiers may be associated with one another in a suitable manner; for example, a value and its corresponding identifier may be stored in a common row of a table stored in memory 105 and/or in another data storage.
In addition, processor 110 may transform modified data 130 by applying a transformation to modified data 130 to obtain transformed data 145. Transformed data 145 may include an entity value identifier 150 stored in association with an attribute value identifier 155. In some example systems, entity value identifier 150 may be the same as entity value identifier 135, and attribute value identifier 155 may be the same as attribute value identifier 140. The transformation may be used to condition modified data 130 for subsequent use or processing. For example, if modified data 130 comprises time-series data with missing data points at one or more of the data collection time points, the transformation may comprise filling in the missing data points using imputation and/or other suitable techniques. While imputation to fill in missing time series data points is described herein, it is contemplated that other suitable transformations may be applied to modified data 130 to obtain transformed data 145.
Moreover, processor 110 may output further modified data 160 from transformed data 145. Further modified data 160 may comprise transformed data 145 with entity value identifier 150 replaced with an entity value 165 and attribute value identifier 155 replaced with an attribute value 170. Entity value 165 may be the same as entity value 120, and attribute value 170 may be the same as attribute value 125. This further modification may allow further modified data 160 to be presented in terms of entity and attribute values, similar to data 115.
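The replacement of identifiers with their values may be illustrated with a minimal sketch, assuming the value-to-identifier associations are held in plain dictionaries. The helper names invert and restore_values, and the sample rows, are hypothetical and used for illustration only.

```python
from typing import Dict, List, Tuple


def invert(mapping: Dict[str, int]) -> Dict[int, str]:
    """Turn a value -> identifier mapping into identifier -> value."""
    return {identifier: value for value, identifier in mapping.items()}


def restore_values(
    transformed_rows: List[Tuple[int, int]],
    entity_ids: Dict[str, int],
    attribute_ids: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Replace each (entity id, attribute id) pair with its original values."""
    entity_values = invert(entity_ids)
    attribute_values = invert(attribute_ids)
    return [(entity_values[e], attribute_values[a]) for e, a in transformed_rows]


# Hypothetical associations and transformed rows (illustrative only).
entity_ids = {"EntValue1": 1, "EntValue2": 2}
attribute_ids = {"AttValue1": 1, "AttValue2": 2}
transformed_rows = [(1, 1), (2, 2), (2, 2)]

further_modified_rows = restore_values(transformed_rows, entity_ids, attribute_ids)
```

In this sketch the restored rows are expressed in entity and attribute values again, which mirrors the way further modified data 160 may resemble data 115.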
To output further modified data 160, processor 110 may store further modified data 160 in memory 105 and/or another storage, send further modified data 160 to another component of system 100 or to another system, send further modified data 160 to an output terminal (not shown) of system 100, or the like.
In the example where one or more of the data types is replaced by its successor, modified data 130 may replace data 115, whereby entity value identifier 135 may replace entity value 120, and attribute value identifier 140 may replace attribute value 125. Similarly, transformed data 145 may replace modified data 130, and further modified data 160 may replace transformed data 145. These successive replacements may avoid the need to store multiple versions of the data in memory 105, which in turn may yield storage capacity savings when handling datasets. The larger the dataset, the greater the storage capacity savings may be.
The entity and attribute value identifiers may be incrementable. An incrementable identifier may be one where the next identifier may be obtained by incrementing the previous identifier. Incrementable identifiers may be those identifiers where, in order to determine the next identifier to be used, it is not necessary to consult a reference such as a look-up table. A series of incrementable identifiers may be deterministic, in that given an identifier, the next identifier is quickly obtainable. Eliminating or reducing the need to consult a reference to determine the next identifier may reduce the amount of computational resources, such as time, energy, working memory, and processing power, needed to generate modified data 130 by assigning identifiers to values in data 115. Examples of incrementable identifiers include numbers, such as natural numbers, integers, and the like.
In addition, modifying the data by replacing values with incrementable identifiers prior to the transformation may reduce the amount of memory and other computational resources used to perform the transformation. For example, replacing longer strings of values with relatively shorter natural number identifiers may allow the information to be stored using fewer characters. These fewer characters in turn may require less memory to store, and consume fewer computational resources during the transformation.
In some example systems, processor 110 may assign to an additional unique entity value a next incremented entity value identifier; and assign to an additional unique attribute value a next incremented attribute value identifier. In this manner, each additional unique entity or attribute value may be assigned an identifier by incrementing the identifier to the next incremented identifier. A unique entity value may be unique among the entity values, and need not be unique when compared to the attribute values. Similarly, a unique attribute value may be unique among the attribute values, and need not be unique when compared to the entity values.
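The assignment of next incremented identifiers may be sketched as follows. This is a minimal illustration, assuming natural number identifiers that start at 1 and increment by 1; the function name assign_identifiers and the sample rows are hypothetical. The dictionary is used only to recognize values that already have an identifier, while the next identifier itself is obtained by incrementing a counter.

```python
from typing import Dict, Hashable, Iterable


def assign_identifiers(
    values: Iterable[Hashable], start: int = 1, step: int = 1
) -> Dict[Hashable, int]:
    """Give each unique value the next incremented identifier, in first-seen order."""
    ids: Dict[Hashable, int] = {}
    next_id = start
    for value in values:
        if value not in ids:      # only previously unseen values get a new identifier
            ids[value] = next_id
            next_id += step       # the next identifier is obtained by incrementing
    return ids


# Hypothetical (entity value, attribute value) rows.
rows = [
    ("EntValue1", "AttValue1"),
    ("EntValue2", "AttValue2"),
    ("EntValue1", "AttValue3"),
]

# Entity and attribute identifiers are numbered independently of one another.
entity_ids = assign_identifiers(entity for entity, _ in rows)
attribute_ids = assign_identifiers(attribute for _, attribute in rows)

# Rows expressed with identifiers instead of values.
modified_rows = [(entity_ids[e], attribute_ids[a]) for e, a in rows]
# modified_rows == [(1, 1), (2, 2), (1, 3)]
```

Because the two identifier series are separate, EntValue1 and AttValue1 may both receive identifier 1 without conflict, consistent with uniqueness being required only within the entity values or within the attribute values.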
The size of the increment may be predetermined; for example, natural number identifiers may be incremented by 1, real number identifiers may be incremented by 0.1, and the like. Examples of assigning identifiers to additional entity and attribute values are shown in
Moreover, in some example systems, data 115 further comprises a time value stored in association with attribute value 125 and entity value 120. To obtain modified data 130, processor 110 may apply a time transformation to the time value to generate a modified time value, and store the modified time value in association with entity value identifier 135 and attribute value identifier 140.
This type of time transformation may condition the time value for further use or processing of the data. An example of such a time transformation may include truncating or removing characters from the time value, which may in turn reduce the amount of storage or other computational resources needed to handle the time values. When the time value comprises a date, the time transformation may comprise converting the date into a format having a precision of one day. Examples of time transformations are shown in
In some example systems, data 115 may comprise entity value 120 and attribute value 125 stored in association with a latest time value. Data 115 may further comprise a further entity value and a corresponding further attribute value stored in association with a further latest time value. The further latest time value may be later than the latest time value by a data collection time point. To obtain transformed data 145, processor 110 may, for the data collection time point, store an imputed entity value identifier in association with an imputed attribute value identifier. The imputed entity value identifier may be associated with an imputed entity value and the imputed attribute value identifier may be associated with an imputed attribute value corresponding to the imputed entity value.
When data 115 includes time values, i.e. when data 115 is in the form of time-series data which has missing data points, imputation may be used to fill in the missing data points in the time-series data. Examples of time imputation are shown in
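One possible form of this imputation may be sketched as follows. The sketch assumes a last observation carried forward approach, daily data collection time points, and at most one observation per entity identifier per time point; the function name impute_locf and the sample dates and identifiers are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Row = Tuple[date, int, int]  # (time point, entity value identifier, attribute value identifier)


def impute_locf(rows: List[Row], step: timedelta = timedelta(days=1)) -> List[Row]:
    """Fill missing time points per entity by carrying the last observation forward."""
    by_entity: Dict[int, List[Tuple[date, int]]] = {}
    for day, entity_id, attribute_id in rows:
        by_entity.setdefault(entity_id, []).append((day, attribute_id))

    filled: List[Row] = []
    for entity_id, observations in by_entity.items():
        observations.sort()
        # Walk consecutive observations and repeat the earlier one across any gap.
        for (day, attribute_id), (next_day, _) in zip(observations, observations[1:]):
            current = day
            while current < next_day:
                filled.append((current, entity_id, attribute_id))
                current += step
        # The final observation has nothing after it and is kept as-is.
        last_day, last_attribute_id = observations[-1]
        filled.append((last_day, entity_id, last_attribute_id))
    return sorted(filled)


# Hypothetical identifier rows with no observation for 12 November.
observed = [
    (date(2017, 11, 11), 1, 5),
    (date(2017, 11, 13), 1, 7),
]
transformed = impute_locf(observed)
```

In this sketch the row imputed for the missing time point reuses the identifiers of the preceding observation, so the transformation operates on identifiers only and does not need the underlying values.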
In some example systems, data 115 may further comprise an additional entity value of the entity stored in association with an additional attribute value. In data 115, entity value 120 and attribute value 125 may be stored in association with a time point in a row of a data table. Moreover, the additional entity value and the additional attribute value may be stored in association with the time point in another row of the data table. In further modified data 160, attribute value 170 and the additional attribute value are stored in a given row of a modified data table, the given row further containing the time point, entity value 165, and the additional entity value. Entity value 120 may be the same as entity value 165, and attribute value 125 may be the same as attribute value 170.
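The combining of rows that share a time point may be sketched as follows. This is a minimal illustration, assuming each input row holds a time point, an entity value, and an attribute value, and that the combined row is keyed by entity value; the function name combine_rows and the sample capacity figures are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def combine_rows(rows: List[Tuple[str, str, Any]]) -> List[Dict[str, Any]]:
    """Merge rows sharing a time point into one row with a column per entity value."""
    combined: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for time_point, entity_value, attribute_value in rows:
        combined[time_point][entity_value] = attribute_value
    # Emit one row per time point, ordered by time point.
    return [{"time": t, **columns} for t, columns in sorted(combined.items())]


# Hypothetical rows: two entity values observed at the same time point.
rows = [
    ("2017-11-13", "StorageDevice1", 35),
    ("2017-11-13", "StorageDevice2", 15),
]
modified_table = combine_rows(rows)
# modified_table == [{"time": "2017-11-13", "StorageDevice1": 35, "StorageDevice2": 15}]
```

The sketch assumes no entity value is literally named "time"; a different column layout may be chosen where that assumption does not hold.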
Combining multiple rows of the data table associated with the same data collection time point on one row may reduce the number of rows that need to be reviewed in order to assess and draw a conclusion from the data relating to a given time point. An example of this combining of rows is shown in
In addition, in some example systems processor 110 may, before modifying data 115 to generate modified data 130, store entity value identifier 135 in association with entity value 120, and attribute value identifier 140 in association with attribute value 125. Keeping a record of the associations between the values and their corresponding identifiers may allow replacement of the identifiers of transformed data 145 with their corresponding values to generate further modified data 160. An example of a data table storing the values in association with their identifiers is shown in
In some example systems, processor 110 may assess attribute value 170 in further modified data 160 using a predetermined criterion. This assessment may be used to draw conclusions from further modified data 160. An example of such an assessment is described in relation to
Some example systems may be implemented using Apache™ SPARK and Apache™ HADOOP™ within a custom Scala 2.x application that integrates with Amazon™ Redshift and Amazon™ EMR. In these example systems the Amazon™ Redshift database may be used to store the initial data and/or one or more of the modified, transformed, and further modified versions of the data. Amazon™ Redshift JDBC client may be used to provide communication to and from the Amazon™ Redshift database.
The removal of the time-of-day information from the time value reduces the number of characters needed to store the time value, which in turn may reduce the amount of memory and other computational resources used during the subsequent transformation of the data.
It is contemplated that in other examples, a different suitable time transformation may be applied, and that the transformed time value may have a precision other than one day.
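A time transformation to a precision of one day may be sketched as follows. The sketch assumes the time value is a string in the "YYYY-MM-DD HH:MM:SS" layout, which is an illustrative assumption rather than a format stated above; the function name truncate_to_day and the sample time value are hypothetical.

```python
from datetime import datetime


def truncate_to_day(time_value: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Drop the time-of-day portion, keeping the date with one-day precision."""
    return datetime.strptime(time_value, fmt).date().isoformat()


# Hypothetical time value; the result has 10 characters instead of 19.
modified_time_value = truncate_to_day("2017-11-13 08:15:42")
# modified_time_value == "2017-11-13"
```

Parsing the value before truncating it, rather than slicing the string, lets a malformed time value raise an error instead of being silently shortened; a different precision may be obtained by changing the output format.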
The second change to the data shown in
In
Referring to
Once the data is transformed and further modified, the attribute values in
While
Similarly, AttValue1 is assigned identifier ‘1’. The next unique attribute value AttValue2 is assigned the next incremented identifier ‘2’. Moreover, the next unique attribute value AttValue3 is assigned the next incremented identifier ‘3’. It can be seen that replacing value strings, e.g. EntValue1 and AttValue1, with shorter natural number identifiers, e.g. ‘1’, may reduce the amount of storage needed to store the modified data and the amount of computational resources needed to transform the modified data.
While not shown in the drawings, it is contemplated that
As discussed above, manipulating and/or transforming the modified data comprising identifiers may use less memory and/or other computational resources than using the original data comprising values. As such, performing the identification of missing data points using identifiers as shown in
Turning now to
Instructions to access data 605 may comprise instructions to cause the processor to access data comprising an entity value of an entity stored in association with an attribute value of an attribute of the entity. Moreover, instructions to generate modified data 610 may comprise instructions to cause the processor to generate modified data by replacing in the data the entity value with an entity value identifier and the attribute value with an attribute value identifier. The entity value identifier and the attribute value identifier may be incrementable.
Furthermore, instructions to generate transformed data 615 may comprise instructions to cause the processor to generate transformed data by applying a transformation to the modified data. Instructions to output further modified data 620, in turn, may comprise instructions to cause a processor to output further modified data by replacing in the transformed data the entity value identifier with the entity value and the attribute value identifier with the attribute value.
CRSM 600, and the instructions stored therein, may cause a processor to perform a selection of or all of the functions described herein.
In some example CRSMs, the instructions may further cause the processor to, before the modified data is generated, store the entity value identifier in association with the entity value and the attribute value identifier in association with the attribute value.
Furthermore, in some example CRSMs, to generate the modified data, the instructions may further cause the processor to replace an additional unique entity value with a next incremented entity value identifier and replace an additional unique attribute value with a next incremented attribute value identifier. The entity value identifier and the attribute value identifier may comprise natural numbers.
Moreover, in some example CRSMs, the data may comprise the entity value and the attribute value stored in association with a latest time value. The data may further comprise a further entity value and a corresponding further attribute value stored in association with a further latest time value. The further latest time value may be later than the latest time value by a data collection time point. Furthermore, to generate the transformed data, the instructions may cause the processor to, for the data collection time point, store an imputed entity value identifier in association with an imputed attribute value identifier. The imputed entity value identifier may be associated with an imputed entity value, and the imputed attribute value identifier may be associated with an imputed attribute value corresponding to the imputed entity value.
Moreover, box 720 includes generating modified data. The modified data may be generated by storing the entity value identifier in association with the attribute value identifier. Box 725, in turn, includes generating transformed data, which may be generated by applying a transformation to the modified data. Furthermore, box 730 includes outputting further modified data. The outputting the further modified data may include replacing in the transformed data the entity value identifier with the entity value and the attribute value identifier with the attribute value.
In some examples, method 700 may further include a selection of or all of the features and/or functions described herein. For example, method 700 may further include assigning to an additional unique entity value a next incremented entity value identifier and assigning to an additional unique attribute value a next incremented attribute value identifier. Examples of assigning next incremented identifiers have been discussed in relation to
Furthermore, in some examples of method 700, the data may further comprise a time value stored in association with the attribute value and the entity value. The generating the modified data of box 720 may comprise: generating a modified time value by applying a time transformation to the time value, and storing the modified time value in association with the entity value identifier and the attribute value identifier. The time value may comprise a date, and the time transformation may comprise converting the date into a format having a precision of one day. Examples of time transformations have been discussed in relation to
In addition, in some examples of method 700, the data may comprise the entity value and the attribute value stored in association with a latest time value. Moreover, the data may further comprise a further entity value and a corresponding further attribute value stored in association with a further latest time value, the further latest time value being later than the latest time value by a data collection time point. In such a case, the generating the transformed data may comprise: for the data collection time point, storing an imputed entity value identifier in association with an imputed attribute value identifier. The imputed entity value identifier may be associated with an imputed entity value and the imputed attribute value identifier may be associated with an imputed attribute value corresponding to the imputed entity value. The imputed entity value and the imputed attribute value may be generated using a last observation carried forward imputation. Examples of such imputations have been discussed in relation to
In some examples, method 700 may further comprise assessing the attribute value in the further modified data using a predetermined criterion. Examples of such assessments have been discussed in relation to
Moreover, in some examples of method 700, the data may further comprise an additional entity value of the entity stored in association with an additional attribute value. In the data, the entity value and the attribute value may be stored in association with a time point in a row of a data table, and the additional entity value and the additional attribute value may be stored in association with the time point in another row of the data table. In the further modified data, the attribute value and the additional attribute value may be stored in a given row of a modified data table. The given row may further contain the time point, the entity value, and the additional entity value. An example of storing modified data on a given row is discussed in relation to
Referring now to
Storage 820 may comprise StorageDevice1 850 and StorageDevice2 855. Furthermore, information relating to adapter 860 may be recorded in association with graphics 825.
While
The changes to the data shown in
Moreover, in
In
Furthermore, in
Combining data values for a given date on the same row may allow for assessing the system according to a predetermined criterion to obtain a conclusion about the system on the given date. For example, the given criterion may be that storage capacity for a storage device being below 20% indicates that the system is unhealthy.
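An assessment against such a criterion may be sketched as follows. The sketch assumes the combined row for a given date exposes one capacity percentage per storage device and that the smallest capacity decides the outcome; the function name assess_health, the labels "healthy" and "unhealthy", and the sample figures are hypothetical.

```python
from typing import Dict


def assess_health(capacities: Dict[str, float], threshold: float = 20.0) -> str:
    """Label the system unhealthy if any device's capacity is below the threshold."""
    # Assumes at least one capacity value is present for the date.
    smallest = min(capacities.values())
    return "unhealthy" if smallest < threshold else "healthy"


# Hypothetical capacity percentages taken from one combined row.
row_capacities = {"StorageDevice1": 35.0, "StorageDevice2": 15.0}
conclusion = assess_health(row_capacities)
# conclusion == "unhealthy", because 15.0 is below the 20% threshold
```

Having all values for the date on a single row means the criterion can be evaluated with a single pass over that row.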
For November 13, assessing the health of the system may comprise determining the smallest value in the capacity cell of the data table in
After the assessment of the further modified data is completed, the further modified data may be stored in the database for later reference and/or processing.
The systems, CRSMs, and methods described herein may include the features and/or perform the functions described herein in association with one or a combination of the other systems, CRSMs, and methods described herein.
The systems, CRSMs, and methods described herein may allow large datasets to be manipulated and/or conditioned using reduced memory and/or other computational resources. Moreover, datasets with complex data structures may be transformed, conditioned, and/or assessed while allowing the data to be reassembled into substantially its original data structure.
It should be recognized that features and aspects of the various examples provided above may be combined into further examples that also fall within the scope of the present disclosure.