The amount of data in an enterprise presents challenges. Indeed, for some enterprises the volume of data grows at an exponential rate. Such data may reside in a data store in different file formats. This data may include business information related to revenue, sales, operational data, or the like, associated with the enterprise. For instance, the sales data includes sales information represented by different attributes and associated values. Some attributes and values may be identical. Conventional data processing systems access and retrieve the data from the data store, analyze the attributes and values, generate results based on the analysis, and display the results on a user interface. However, the conventional data processing systems may not provide a mechanism to modify the attribute values in real time via a user interface. The conventional data processing systems do not provide a mechanism to determine the modified values. Hence, identifying the modified attributes and values in a large volume of data becomes challenging.
The claims set forth the embodiments with particularity. The embodiments are illustrated by way of examples and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. The embodiments, together with their advantages, may be best understood from the following detailed description taken in conjunction with the accompanying drawings.
Embodiments of techniques for determination of data modification are described herein. In the following description, numerous specific details are set forth to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail.
Reference throughout this specification to “one embodiment”, “this embodiment” and similar phrases, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one of the one or more embodiments. Thus, the appearances of these phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The volume of business data associated with an enterprise has grown exponentially over time. The business data may be represented as datasets having data fields and may reside in a data store in different file formats. To sustain growing business demands, enterprises may need systems including applications that transmute this data into meaningful information. These applications transmute the data by processing, analyzing, and structuring the data to convey useful information. These applications may be configured to implement business intelligence techniques, advanced data processing techniques, and mathematical models. Further, the applications may provide design and runtime tools for generating charts from the data, and the like, to analyze and structure the data into useful information.
The applications, herein referred to as business intelligence (BI) applications or a BI application, may be developed using different technologies and may be deployed on diverse platforms or frameworks. The BI application may provide a collaborative platform for sharing data, managing and sharing knowledge and information, and regulating the flow of the data, including information, across the enterprise. The BI application may be operable to connect to an operational data store and interpret associated data definitions. Based on the associated data definitions, the BI application may be able to identify the association of data with diverse processes of the enterprise. In an embodiment, the BI application transmutes the data into useful information, to provide assistance in making important business decisions. The BI application hence provides a standalone, consistent solution to process, analyze, and structure the data to provide fact-based support systems for the enterprises.
In an embodiment, an application, for example a BI application 110, is communicatively coupled to the data store 120. The BI application 110 includes multiple interfaces that provide a diverse set of functionalities. The set of functionalities may include selecting the dataset from the data store 120; filtering the dataset to generate consistent data format; sorting the dataset per a user's preference; displaying the selected datasets retrieved from data store 120; providing tools and models to process, analyze and structure the dataset in a user defined format; generating and rendering visualizations based on the analysis and structuring of the dataset, and the like.
In an embodiment, based on the selected dataset 130 from the data store 120, the BI application 110 retrieves dataset 130 including the data fields 140, and displays the dataset 130 on a user interface (UI) in multiple cells arranged as rows and columns. The data fields 140 include attributes, represented by the columns, and associated values, represented by the rows. A function, for example a hash function, is associated with the rows of the dataset 130. Based on the hash function, the BI application 110 generates database indices, for example a first database index corresponding to each row, and stores the first indices in a column, for example, a technical column, associated with the dataset. The technical column including the first indices is stored in the data store 120. In another embodiment, the BI application 110 may generate database indices based on an algorithm, for example, a hash algorithm associated with the rows of the dataset 130.
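As a minimal sketch of the indexing step described above, the following Python fragment hashes each row of a small in-memory dataset and stores the result in an additional technical column. The SHA-256 digest, the inclusion of a row key in the hashed payload, the dictionary layout, and the sample sales values are all assumptions made for illustration; the description does not prescribe a particular hash function or storage layout.

```python
import hashlib


def first_index(row_key, values):
    """Return a database index for one row, derived from a hash of its content.

    The row key is mixed into the hash (an assumption) so that two rows with
    identical attribute values still receive different indices.
    """
    payload = "\x1f".join([str(row_key)] + [str(v) for v in values])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def add_technical_column(dataset):
    """Return a copy of the dataset with a 'tech_index' entry for every row."""
    return {
        row_key: {"values": list(values), "tech_index": first_index(row_key, values)}
        for row_key, values in dataset.items()
    }


# Hypothetical sales rows: (region, quarter, revenue)
sales = {
    "R1": ("North", "Q1", 1200),
    "R2": ("South", "Q1", 950),
    "R3": ("North", "Q1", 1200),  # same attribute values as R1
}

table = add_technical_column(sales)
for row_key, row in table.items():
    print(row_key, row["values"], row["tech_index"])
```

Because the row key is part of the hashed payload in this sketch, rows R1 and R3 receive different indices even though their attribute values are identical.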
In an embodiment, the UI is operable to receive an input from a user in real time to modify or manipulate the dataset 130 corresponding to the row of the dataset 130. The BI application 110 detects the modification and saves the modified dataset 130 to the data store 120. Based on the hash function associated with the rows of the dataset 130, the BI application 110 generates another database index, for example a second database index, corresponding to the row including modified dataset 130. The second database index is stored in the data field 140 of the technical column corresponding to the row including the modified dataset 130. Based on the second index stored in the technical column, the row including the modified dataset can be determined.
In another embodiment, a framework is generated to determine a row with a transformed dataset. The framework includes a mechanism to retrieve tabular data from a data store. The rows associated with the tabular data displayed on a computer generated user interface are determined. A first database index corresponding to the rows of the tabular data is generated and stored in a generated technical column residing in the data store. A modification on the dataset is received on the row of the tabular data. A second database index corresponding to the row including the modified data is generated and the corresponding technical column is updated with the second database index. Based on the second database index stored in the technical column, the identification framework is generated to identify the row with the modified data.
In an embodiment, the BI system 300 includes a processor 302 and a memory device 304 communicatively coupled to a data store 316 over a network (not shown). The BI system 300 includes a business intelligence (BI) engine 306, a visualization engine 310, an indexing module 308, a forecasting module 312, and a reporting module 314 configured to work in conjunction with each other.
In an embodiment, the BI system 300 includes multiple interfaces that provide a diverse set of functionalities, as explained in the detailed description of
In an embodiment, the attribute values displayed on the second UI may be manipulated or modified in real time. The second UI of the BI system 300 can receive the user input to modify the attribute values. The BI engine 306 of the system 300 identifies or determines the modified attribute value and saves the modified dataset in the data store 316. A process or a sequence of steps, herein referred to as “transformation”, is executed by the BI system 300 to identify the modified attribute value. The process of transformation includes determining the modified attribute value corresponding to the row; saving the modified attribute value representing the modified data in the data store 316; retrieving the dataset including the modified attribute value from the data store 316; and refreshing or reloading the displayed data on the second UI to include the modified dataset. The row including the modified data may be dynamically repositioned on the second UI. For example, the second UI displays the dataset including the attribute values represented in ten rows. When the user modifies the attribute value corresponding to a third row, the BI system 300 executes the process of transformation based on this modification. The second UI then displays the dataset with the row including the modified attribute value repositioned as the eighth row.
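The four steps of the transformation process can be sketched as follows. This is an illustrative Python fragment, not the described system: the in-memory dictionary standing in for the data store 316, the `revenue` attribute, the function name `transformation`, and the assumption that the display is sorted by the edited attribute (which is what causes the row to move) are all hypothetical.

```python
def transformation(data_store, row_key, attribute, new_value, sort_by):
    """Apply one edit and return the row keys in refreshed display order."""
    row = data_store[row_key]
    # Step 1: determine whether the attribute value was actually modified.
    if row[attribute] != new_value:
        # Step 2: save the modified attribute value in the data store.
        row[attribute] = new_value
    # Step 3: retrieve the dataset, including the modified value.
    retrieved = list(data_store.items())
    # Step 4: refresh the display; sorting may reposition the modified row.
    retrieved.sort(key=lambda item: item[1][sort_by])
    return [key for key, _ in retrieved]


# Ten hypothetical rows, displayed in ascending order of revenue.
data_store = {f"row{i}": {"revenue": i * 10} for i in range(1, 11)}

before = sorted(data_store, key=lambda k: data_store[k]["revenue"])
after = transformation(data_store, "row3", "revenue", 85, sort_by="revenue")

print(before.index("row3") + 1)  # display position before the edit
print(after.index("row3") + 1)   # display position after the edit
```

With these sample values the edited row starts in the third display position and, after its revenue is changed from 30 to 85, is redisplayed in the eighth position, mirroring the ten-row example above.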
In an embodiment, the indexing module 308 implements a function, for example, a hash function to generate database indices corresponding to the rows of the dataset. The hash function is associated with the rows of the dataset and generates unique database indices, for example a first database index associated with each row of the dataset. The BI system generates a column, for example, a technical column associated with the dataset and stores the first database indices in the technical column.
In an embodiment, the hash function generates another database index, for example a second database index corresponding to the row including modified data. Each second database index is unique and provides an indication that the dataset or the attribute value in the corresponding row has been modified. The generated second database index is stored in the field of the technical column associated with the modified attribute value. For example, for the dataset displayed on the second UI, the hash function generates the first database index value, referenced as ‘13’ corresponding to the third row and stores the first database index value in the technical column in the data store. Upon modifying the attribute value corresponding to the third row, the hash function generates the second database index value, referenced as ‘131’ and updates the associated field in the technical column with the second database index value. Hence the field in the technical column corresponding to the third row will include the second database index value ‘131’, indicating that the dataset or the attribute value corresponding to the third row is modified. The row including the modified data is determined by identifying or determining the second database index stored in the technical column.
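One way the modified row might be located from the technical column is sketched below in Python. The description does not state how a second database index is told apart from a first one; this sketch assumes the application keeps a snapshot of the first indices taken when the dataset was loaded and compares the technical column against it. The helper names `row_hash`, `update_cell`, and `modified_rows`, the truncated SHA-256 digest, and the sample values are hypothetical.

```python
import hashlib


def row_hash(values):
    """Hash a row's attribute values into a short index string."""
    payload = "\x1f".join(str(v) for v in values)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def update_cell(rows, technical_column, row_number, column_number, value):
    """Modify one cell and overwrite that row's technical-column field."""
    rows[row_number][column_number] = value
    technical_column[row_number] = row_hash(rows[row_number])  # second index


def modified_rows(technical_column, first_indices):
    """Return zero-based positions whose index differs from the snapshot."""
    return [
        position
        for position, (current, first) in enumerate(zip(technical_column, first_indices))
        if current != first
    ]


rows = [
    ["North", "Q1", 1200],
    ["South", "Q1", 950],
    ["East", "Q1", 1430],
]
technical_column = [row_hash(r) for r in rows]  # first database indices
first_indices = list(technical_column)          # snapshot taken at load time (assumption)

update_cell(rows, technical_column, 2, 2, 1500)  # edit the third row
print(modified_rows(technical_column, first_indices))
```

Only the third row (zero-based position 2) is reported, because it is the only row whose technical-column field no longer matches the snapshot.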
In an embodiment, consider an instance of the user modifying more than one attribute value corresponding to a row. For each instance of the modified attribute value corresponding to the row, a new index is regenerated and the corresponding field in the technical column is updated with the new index. For example, consider a dataset displayed on the second UI including five attributes represented by the columns C1, C2, C3, C4 and C5. These five attributes include values represented by the rows R1, R2, R3, R4, R5, R6, etc. The indexing module 308 of the BI system 300 generates unique first indices corresponding to the rows R1-R6 and stores the first indices in the technical column residing in the data store 316. On the displayed dataset, consider a user modifying an attribute value corresponding to the row R4 and the column C3. The indexing module 308 of the BI system 300 generates a unique second database index corresponding to the row R4; updates the corresponding field of the technical column in the data store 316 with the second database index value; and executes the process of transformation. Subsequently, consider the user modifying the attribute value corresponding to the row R4 and the column C2. The indexing module 308 of the BI system 300 generates a unique third database index corresponding to the row R4; updates the corresponding field in the technical column in the data store 316 with the third database index value; and executes the process of transformation.
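The R4 example above can be sketched in Python as follows. To make every modification yield a new index, even one that restores an earlier value, this sketch mixes a per-row revision counter into the hashed payload; that counter, the `IndexedRow` class, the truncated SHA-256 digest, and the placeholder cell values are assumptions for illustration and are not taken from the description.

```python
import hashlib


class IndexedRow:
    """One row plus its technical-column field and a modification counter."""

    def __init__(self, key, values):
        self.key = key
        self.values = dict(values)
        self.revision = 0
        self.index_history = [self._compute_index()]  # first database index

    def _compute_index(self):
        # The revision number is part of the hashed payload (an assumption), so
        # every edit yields a new index, even one that restores an old value.
        payload = repr((self.key, self.revision, sorted(self.values.items())))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

    @property
    def tech_index(self):
        """Current content of this row's field in the technical column."""
        return self.index_history[-1]

    def modify(self, column, value):
        """Change one attribute value and regenerate the row's index."""
        self.values[column] = value
        self.revision += 1
        self.index_history.append(self._compute_index())


columns = ["C1", "C2", "C3", "C4", "C5"]
rows = {f"R{n}": IndexedRow(f"R{n}", {c: 0 for c in columns}) for n in range(1, 7)}

rows["R4"].modify("C3", 42)  # second database index for R4
rows["R4"].modify("C2", 7)   # third database index for R4

print(rows["R4"].index_history)
```

The history is kept here only to make the first, second, and third indices visible; in the arrangement described above, the technical column holds just the most recent index for each row.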
In an embodiment, for each instance of modified data corresponding to the row, the indexing module 308 of the BI system 300 generates a unique database index; updates the corresponding field of the technical column in the data store 316; and executes the process of transformation. Based on the unique database index stored in the technical column, the row including modified data is determined. In an embodiment, modifying the dataset includes updating or modifying the attribute value of the dataset in the cells corresponding to the rows, deleting the attribute values of the dataset in the rows, deleting the rows, inserting new attribute values in the dataset, inserting one or more rows, or the like.
In an embodiment, the visualization engine 310 is configured to generate visualizations including graphical illustrations based on the processing and analysis of the dataset, and to customize the row including the modified dataset with a special icon or visual indicia to indicate that the corresponding row includes modified data. The visual indicia include, for example, highlighting the row including modified data, changing the font corresponding to the row including modified data, and the like. The forecasting module 312 is configured to generate forecasting information including graphical illustrations. The forecasting module 312 includes functions, algorithms, routines, procedures, statistical models, mathematical models, or the like, related to business intelligence, artificial intelligence, etc. The forecasting module 312 generates forecasting reports based on the dataset associated with the enterprise. The reporting module 314 is configured to generate reports based on the analysis and processing of the dataset associated with the enterprise.
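The kinds of modification listed above (updates, deletions, and insertions) could be distinguished by comparing two states of the technical column. The Python sketch below assumes each row has a stable key and that the technical column can be read as a mapping from row key to index; both the function `classify_changes` and the placeholder index strings are hypothetical.

```python
def classify_changes(first_indices, current_indices):
    """Compare two {row_key: index} mappings and group row keys by change type."""
    inserted = sorted(k for k in current_indices if k not in first_indices)
    deleted = sorted(k for k in first_indices if k not in current_indices)
    updated = sorted(
        k for k in current_indices
        if k in first_indices and current_indices[k] != first_indices[k]
    )
    return {"inserted": inserted, "deleted": deleted, "updated": updated}


# Placeholder index values; real indices would come from the hash function.
first_indices = {"R1": "a1", "R2": "b2", "R3": "c3", "R4": "d4"}
current_indices = {"R1": "a1", "R2": "b9", "R4": "d4", "R5": "e5"}

print(classify_changes(first_indices, current_indices))
# {'inserted': ['R5'], 'deleted': ['R3'], 'updated': ['R2']}
```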
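As a simple text-only illustration of visual indicia, the Python sketch below prefixes each modified row with a marker when the dataset is rendered. The function `render_rows`, the asterisk marker, and the sample rows are hypothetical; an actual user interface would more likely apply highlighting or a font change as described above.

```python
def render_rows(rows, modified_keys, marker="*"):
    """Return display lines, flagging rows whose key is in modified_keys."""
    lines = []
    for key, values in rows.items():
        prefix = marker if key in modified_keys else " "
        cells = " | ".join(str(v) for v in values)
        lines.append(f"{prefix} {key}: {cells}")
    return lines


rows = {
    "R1": ("North", 1200),
    "R2": ("South", 975),
    "R3": ("East", 1430),
}

output = render_rows(rows, modified_keys={"R2"})
for line in output:
    print(line)
```

Here only the line for R2 begins with the marker, indicating that R2 is the row containing modified data.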
Some embodiments may include the above-described methods being written as one or more software components. These components, and the functionality associated with each, may be used by client, server, distributed, or peer computer systems. These components may be written in a computer language corresponding to one or more programming languages such as functional, declarative, procedural, object-oriented, lower level languages and the like. They may be linked to other components via various application programming interfaces and then compiled into one complete application for a server or a client. Alternatively, the components may be implemented in server and client applications. Further, these components may be linked together via various distributed programming protocols. Some example embodiments may include remote procedure calls being used to implement one or more of these components across a distributed programming environment. For example, a logic level may reside on a first computer system that is remotely located from a second computer system containing an interface level (e.g., a graphical user interface). These first and second computer systems can be configured in a server-client, peer-to-peer, or some other configuration. The clients can vary in complexity from mobile and handheld devices, to thin clients and on to thick clients or even other servers.
The above-illustrated software components are tangibly stored on a computer readable storage medium as instructions. The term “computer readable storage medium” should be taken to include a single medium or multiple media that stores one or more sets of instructions. The term “computer readable storage medium” should be taken to include any physical article that is capable of undergoing a set of physical changes to physically store, encode, or otherwise carry a set of instructions for execution by a computer system which causes the computer system to perform any of the methods or process steps described, represented, or illustrated herein. A computer readable storage medium may be a tangible computer readable storage medium. A computer readable storage medium may be a non-transitory computer readable storage medium. Examples of non-transitory computer readable storage media include, but are not limited to: magnetic media, such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs, DVDs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute, such as application-specific integrated circuits (“ASICs”), programmable logic devices (“PLDs”) and ROM and RAM devices. Examples of computer readable instructions include machine code, such as produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment may be implemented using Java, C++, or other object-oriented programming language and development tools. Another embodiment may be implemented in hard-wired circuitry in place of, or in combination with, machine readable software instructions.
A data source is an information resource. Data sources include sources of data that enable data storage and retrieval. Data sources may include databases, such as relational, transactional, hierarchical, multi-dimensional (e.g., OLAP), object oriented databases, and the like. Further data sources include tabular data (e.g., spreadsheets, delimited text files), data tagged with a mark-up language (e.g., XML data), transactional data, unstructured data (e.g., text files, screen scrapings), hierarchical data (e.g., data in a file system, XML data), files, a plurality of reports, and any other data source accessible through an established protocol, such as Open Database Connectivity (ODBC), produced by an underlying software system (e.g., ERP system), and the like. Data sources may also include a data source where the data is not tangibly stored or is otherwise ephemeral, such as data streams, broadcast data, and the like. These data sources can include associated data foundations, semantic layers, management systems, security systems and so on.
In the above description, numerous specific details are set forth to provide a thorough understanding of embodiments. One skilled in the relevant art will recognize, however, that the embodiments can be practiced without one or more of the specific details or with other methods, components, techniques, etc. In other instances, well-known operations or structures are not shown or described in detail.
Although the processes illustrated and described herein include a series of steps, it will be appreciated that the different embodiments are not limited by the illustrated ordering of steps, as some steps may occur in different orders, and some concurrently with other steps, apart from that shown and described herein. In addition, not all illustrated steps may be required to implement a methodology in accordance with the one or more embodiments. Moreover, it will be appreciated that the processes may be implemented in association with the apparatus and systems illustrated and described herein as well as in association with other systems not illustrated.
The above descriptions and illustrations of embodiments, including what is described in the Abstract, are not intended to be exhaustive or to limit the one or more embodiments to the precise forms disclosed. While specific embodiments of, and examples for, the one or more embodiments are described herein for illustrative purposes, various equivalent modifications are possible within the scope, as those skilled in the relevant art will recognize. These modifications can be made in light of the above detailed description. Rather, the scope is to be determined by the following claims, which are to be interpreted in accordance with established doctrines of claim construction.