A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Today, many organizations manage large amounts of data. For example, companies may have data about the customers to which the company sells goods or services. This customer information can help the company provide further services or sell additional goods. However, data generally becomes outdated. Customers move or change phone numbers, causing the data in the company's database to become incorrect. This trend of data becoming outdated over time is referred to as the data becoming stale or the data decaying.
Organizations and companies with large databases understand that stale data permeates the databases that the organization uses. However, organizations often do not know the severity of the staleness or which items of data require updating. Thus, organizations often make decisions based upon stale data and, sometimes, those decisions are incorrect because the foundations of the decisions, the stale data, are incorrect.
It is in light of these and other considerations that the present application is being presented.
Embodiments presented herein provide systems and methods for managing data decay. A system is provided for maintaining metadata about data attributes or relationships between data. A data decay engine can read the metadata and perform a decay calculation. The type of decay calculation can be associated with the type of data or be determined from user inputs. The decay engine provides a score indicating the staleness of the data. An update engine can determine specific data attributes that may require updating. The update engine may be able to update the data from external data sources.
This Summary is offered to provide a simplified description of one or more embodiments. This Summary is not meant to limit the scope of the embodiments. Rather, the possible embodiments are as defined by the claims attached herein.
Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:
Systems and methods in accordance with various embodiments overcome the aforementioned and other deficiencies in the processes and systems for managing data decay in a database. The following description includes some possible embodiments. However, one skilled in the art will recognize that the invention is not limited to the embodiments disclosed herein. Rather, the possible embodiments are defined by the claims attached hereto. Data within a database becomes outdated over time. For example, as customers move or change phone numbers, the data in a customer database becomes incorrect because the database includes the previous address or phone number. With thousands or millions of customers, the data within the database constantly becomes outdated. Data that has not been updated for a period of time is referred to as stale data.
The system provides metadata that is associated with one or more data attributes or with a relationship between one item of data and one or more other items of data. A data attribute may be a characteristic of the data or an item of data itself. For example, a data attribute may be the date the customer last provided his or her address or may be the customer's address. A decay engine in the database reads the metadata. From the metadata, the decay engine determines how much time has elapsed since the data attribute was last updated. By applying a predefined calculation to this information, the decay engine can determine whether the data is stale. The decay engine can then aggregate the per-attribute decay determinations into a database-wide determination of data staleness. For example, the decay engine can determine that any data attribute not updated in the last six months is stale. The decay engine can then count the number of database attributes that are stale. This number of stale attributes may serve as a measure of the staleness of the entire database.
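A minimal sketch of the per-attribute staleness test and the database-wide roll-up described above. It assumes each attribute's metadata carries a single last-updated time, approximates the six-month example as 182 days, and reports both the count of stale attributes and their fraction of the total; the class and function names are hypothetical, not taken from the system described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical threshold: the "six months" example approximated as 182 days.
STALE_AFTER = timedelta(days=182)


@dataclass
class AttributeMetadata:
    """Metadata kept for one data attribute (hypothetical shape)."""
    name: str
    last_updated: datetime


def is_stale(meta: AttributeMetadata, now: datetime,
             threshold: timedelta = STALE_AFTER) -> bool:
    """An attribute is stale if it has not been updated within the threshold."""
    return now - meta.last_updated > threshold


def database_staleness(metadata: list[AttributeMetadata],
                       now: datetime) -> tuple[int, float]:
    """Aggregate per-attribute results into a database-wide figure:
    the number of stale attributes and their fraction of all attributes."""
    stale = sum(1 for m in metadata if is_stale(m, now))
    fraction = stale / len(metadata) if metadata else 0.0
    return stale, fraction
```

With `now` set to 1 June 2010, an attribute updated on 1 May 2010 counts as fresh, while attributes updated on 1 June 2009 and 1 November 2009 count as stale, so two of the three attributes are reported as stale.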
The database updates stale data. For example, if a customer's address is stale, the database can access an external resource to update the address. The database, for example, can access a credit reporting agency to determine whether the agency has a more recent address. If the agency's address is more recent, the database can read the address information and replace the existing address with the address read from the credit reporting agency. Managing the decay of the data in an organization's database provides the advantage of knowing how “up-to-date” the database is. This insight can lead to changes in how the organization collects or refreshes data. Further, the organization can determine the staleness of specific data before using the data in a decision process. Thus, the organization can avoid decisions based on faulty assumptions.
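The replacement rule above, overwriting the local address only when the external source holds a more recent one, might be sketched as follows. The `AddressRecord` type and its `as_of` field are hypothetical, and retrieving the external record (for example, from a credit reporting agency) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AddressRecord:
    """Hypothetical address value plus the time it was last known to be valid."""
    value: str
    as_of: datetime


def refresh_address(local: AddressRecord,
                    external: Optional[AddressRecord]) -> tuple[AddressRecord, bool]:
    """Return the record to keep and whether a replacement happened.

    The external record wins only when it is strictly more recent; otherwise
    (including when no external record was found) the local record is kept.
    """
    if external is not None and external.as_of > local.as_of:
        return external, True
    return local, False
```

Keeping the comparison strict means an external record with the same timestamp does not overwrite local data, which avoids needless writes; a real system might prefer a different tie-breaking rule.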
A block diagram of a database system 100 that provides data decay management is shown in
The transactional server 108, 110, or 112 is a computing system, as explained in conjunction with
The one or more transactional servers 108, 110, or 112 are in communication with a Master Data Management (MDM) server 114. The MDM server 114 is a computing system, as explained in conjunction with
The database application executed by the MDM server 114 creates a data object hierarchy, as explained in conjunction with
The MDM database 116 is a database storing organizational-wide information for database users. The MDM database 116 can include any type of data stored in any type of storage configuration (e.g., hierarchical file, flat file, etc.) on a storage medium, as explained in conjunction with
The MDM database 116 can consist of one or more different layers and/or types of objects. For example, the database may consist of one or more data objects in a logic layer and one or more data files in a data layer. The objects in the logic layer provide the logic or methods that allow the database to function. The data layer provides files for storing records or instances of data. For example, one customer's data can be stored in a first data file while another customer's data can be stored in a second data file. The database may also consist of other objects and data, for example, data history objects with associated history data files and integration objects with associated integration data. Data history objects and history data can include data and/or metadata about the data. Integration objects and integration data are associated with linking the data in the MDM database 116 with the data in the one or more transactional databases 102, 104, and/or 106.
An embodiment of a database system 200 is shown in
The database system 200 comprises a decay engine 204. The decay engine 204 determines the amount of decay in the database 206. The decay engine 204 can provide information about the data decay to a user interface 202. The user interface 202 can include one or more windows rendered on a device that is in communication with the decay engine 204. The user interface 202 may be as explained in conjunction with
The database system 200 further includes an update engine 208. The update engine 208 determines whether an attribute or item of data can be updated and updates the item of data in the database 206. The update engine 208 can retrieve information from one or more external databases 212 over a network 210. An external database 212 may be a public database, for example, a state Department of Motor Vehicles database; a private database, for example, a credit agency database; or some other database that can be accessed by the database system 200. The network 210 may be a local area network (LAN), a wide area network (WAN), the Internet, or some other network. The update engine 208 accesses or receives one or more inputs from the database engine 216. For example, the update engine 208 can receive an indication of which database attributes to update. In other embodiments, the update engine 208 determines which database data can be updated by querying the database engine 216. A database administrator may also determine which database data to update and provide those determinations to the update engine 208. The update engine 208 stores and retrieves these determinations as update rules 218.
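One way to picture the update rules 218 is as a mapping from an attribute to the external database permitted to refresh it. The sketch below is illustrative only: the `UpdateRule` shape, the source labels, and the choice to leave stale attributes without a rule for manual review are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRule:
    """Hypothetical update rule: which attribute may be refreshed, and from where."""
    attribute: str
    source: str  # illustrative label, e.g. "credit_agency" or "dmv"


def plan_updates(stale_attributes: list[str],
                 rules: list[UpdateRule]) -> dict[str, str]:
    """Map each stale attribute that has an update rule to its external source.

    Stale attributes with no matching rule are omitted (left for manual review).
    If several rules name the same attribute, the last one in the list wins.
    """
    by_attribute = {rule.attribute: rule.source for rule in rules}
    return {a: by_attribute[a] for a in stale_attributes if a in by_attribute}
```

For example, with rules for `address` and `license_number`, a stale-attribute list of `address` and `phone` yields a plan that refreshes only `address` from the credit agency.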
A database engine 216 is in communication with the decay engine 204 and/or the update engine 208. The database engine 216 receives inputs from the decay engine 204 and/or the update engine 208 to determine data decay and update decayed data. The database engine 216 can retrieve data and/or update data or metadata based on the inputs from the decay engine 204 and/or the update engine 208. The database engine 216 stores the data in the database 206. The database engine 216 also retrieves data from the database 206 to determine data decay or update data.
An embodiment of a data structure 300 is shown in
The data structure 300 contains data associated with data decay. The data 300 can be associated with an item of data or a relationship between two or more items of data. For example, the decay data 300 may be related to the staleness of a customer's address. In another example, the decay data 300 may be related to the relationship between a customer's home phone and the customer's address. The decay data 300 can include one or more of, but is not limited to, a time stamp data field 302, a date stamp data field 304, an out-of-date flag data field 306, an update flag data field 308, and/or a decay metric data field 310. These fields are described hereinafter.
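The fields of the decay data 300 could be modeled as a simple record type. This sketch is a simplification under stated assumptions: it folds the separate time stamp 302 and date stamp 304 logs into a single list of `datetime` values, and all field and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DecayData:
    """Sketch of decay data 300; names are hypothetical, numerals map to the text."""
    # Time stamps 302 and date stamps 304, merged into one change log (a simplification).
    change_log: list[datetime] = field(default_factory=list)
    out_of_date: Optional[bool] = None  # out-of-date flag 306 (optional)
    updated: Optional[bool] = None      # update flag 308 (optional)
    decay_metric: float = 0.0           # decay metric 310

    def record_change(self, when: datetime) -> None:
        """Append the time of a store or update to the change log."""
        self.change_log.append(when)

    @property
    def first_stored(self) -> Optional[datetime]:
        """Earliest logged change, i.e. when the data was first stored."""
        return min(self.change_log) if self.change_log else None

    @property
    def last_updated(self) -> Optional[datetime]:
        """Most recent logged change."""
        return max(self.change_log) if self.change_log else None
```

Modeling the two optional flags as `Optional[bool]` lets "not tracked" (`None`) be distinguished from an explicit `False`.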
A time stamp data field 302 may include the time of day that a data attribute was stored or updated. The time stamp data field 302 is the hour, minute, and second that a data item was created. The time stamp data field 302 includes a time stamp for when the data was first stored and one or more time stamps for when the data was updated. Thus, the time stamp data field 302 can include a log of time stamps representing a list of changes for the data item. If the data 300 is associated with a relationship between items of data, the time stamp may be the time when any of the data was changed. In another embodiment, the time stamp may be the time for the oldest change for any of the data associated with the relationship.
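For decay data tied to a relationship, the two embodiments above differ in which change time represents the relationship. The sketch below implements both as hypothetical policies. Reading "the oldest change" as the last change of the least recently changed member is an interpretation, not something the text states explicitly.

```python
from datetime import datetime


def relationship_timestamp(member_logs: dict[str, list[datetime]],
                           policy: str = "latest") -> datetime:
    """Derive one time stamp for a relationship between data items.

    member_logs maps each related item to its log of change times.
    "latest": the most recent change to any member of the relationship.
    "oldest": the last change of the least recently changed member
              (an assumed reading of the "oldest change" embodiment).
    """
    last_changes = [max(log) for log in member_logs.values() if log]
    if not last_changes:
        raise ValueError("relationship has no recorded changes")
    if policy == "latest":
        return max(last_changes)
    if policy == "oldest":
        return min(last_changes)
    raise ValueError(f"unknown policy: {policy}")
```

The "oldest" policy is the more conservative choice for staleness, since a relationship is only as fresh as its least recently updated member.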
A date stamp data field 304 may include the day of the year that a data attribute was stored or updated. The date stamp data field 304 is expressed as the day of the year, e.g., day 125. The date stamp data field 304 includes a date stamp for when the data was first stored and one or more date stamps for when the data was updated. Thus, the date stamp data field 304 can include a log of date stamps representing a list of changes for the data item. If the data 300 is associated with a relationship between items of data, the date stamp may be the date when any of the data was changed. In another embodiment, the date stamp may be the date for the oldest change for any of the data associated with the relationship. Together with the time stamp 302, the date stamp 304 provides the time history for changes to data. The time stamp 302 and date stamp 304 can be used to determine whether the data is stale or decayed.
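Because the date stamp 304 is a day-of-year value and the time stamp 302 is an hour, minute, and second, the two must be combined before an age can be computed. The helper below does this under one explicit assumption: the year is supplied separately, since a day number alone does not identify a calendar date. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta


def stamps_to_datetime(year: int, day_of_year: int, time_of_day: time) -> datetime:
    """Combine a day-of-year date stamp and a time-of-day stamp.

    The year is an assumed extra input (hypothetical), because a
    day-of-year value on its own does not identify a calendar date.
    """
    if not 1 <= day_of_year <= 366:
        raise ValueError("day_of_year must be in 1..366")
    start = datetime(year, 1, 1, time_of_day.hour, time_of_day.minute, time_of_day.second)
    result = start + timedelta(days=day_of_year - 1)
    # Day 366 only exists in leap years; reject it when it spills into the next year.
    if result.year != year:
        raise ValueError(f"day {day_of_year} is not valid for year {year}")
    return result


def age_since(stamp: datetime, now: datetime) -> timedelta:
    """Elapsed time since the stamped change, to be compared against a staleness threshold."""
    return now - stamp
```

For instance, day 125 of 2010 at 14:30:00 maps to 5 May 2010, 14:30:00; in a leap year such as 2008, the same day number falls on 4 May.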
An out-of-date flag 306 is optional. The out-of-date flag 306 can be a binary data point where if the update engine 208 (
The update flag 308 is also optional. The update flag 308 is set by either the update engine 208 (
The data 300 also includes a decay metric 310. The decay metric 310 can represent a numeric value for the staleness or decay of the data. The data decay engine 204 (
An embodiment of a method 400 for determining data decay is shown in
A decay engine 204 (
The decay engine 204 (
The decay engine 204 (
The database administrator can create the calculation rule. The calculation rule may then be stored in the decay rules 218 (
The decay engine 204 (
The decay engine 204 (
The decay engine 204 (
An embodiment of a method 500 for updating decayed data is shown in
An update engine 208 (
The update engine 208 (
The update engine 208 (
The out-of-date flag 306 (
The update engine 208 (
The update engine 208 (
The update engine 208 (
In most embodiments, the system 600 includes some type of network 610. The network can be any type of network familiar to those skilled in the art that can support data communications using any of a variety of commercially-available protocols, including without limitation TCP/IP, SNA, IPX, AppleTalk™, and the like. Merely by way of example, the network 610 can be a LAN, such as an Ethernet network, a Token-Ring network and/or the like; a WAN; a virtual network, including without limitation a virtual private network (VPN); the Internet; an intranet; an extranet; a public switched telephone network (PSTN); an infra-red network; a wireless network (e.g., a network operating under any of the IEEE 802.11 suite of protocols, GPRS, GSM, UMTS, EDGE, 2G, 2.5G, 3G, 4G, WiMAX, WiFi, CDMA 2000, WCDMA, the Bluetooth protocol known in the art, and/or any other wireless protocol); and/or any combination of these and/or other networks.
The system may also include one or more server computers 602, 604, 606 which can be general purpose computers, specialized server computers (including, merely by way of example, PC servers, UNIX servers, mid-range servers, mainframe computers, rack-mounted servers, etc.), server farms, server clusters, or any other appropriate arrangement and/or combination. One or more of the servers (e.g., 606) may be dedicated to running applications, such as a business application, a Web server, an application server, etc. Such servers may be used to process requests from user computers 612, 614, 616, 618. The applications can also include any number of applications for controlling access to resources of the servers 602, 604, 606.
The Web server can be running an operating system including any of those discussed above, as well as any commercially-available server operating systems. The Web server can also run any of a variety of server applications and/or mid-tier applications, including HTTP servers, FTP servers, CGI servers, database servers, Java® servers, business applications, and the like. The server(s) also may be one or more computers capable of executing programs or scripts in response to requests from the user computers 612, 614, 616, 618. As one example, a server may execute one or more Web applications. The Web application may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, and/or any scripting language, such as Perl, Python, or TCL, as well as combinations of any programming/scripting languages. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase®, IBM® and the like, which can process requests from database clients running on a user computer 612, 614, 616, 618.
The system 600 may also include one or more databases 620. The database(s) 620 may reside in a variety of locations. By way of example, a database 620 may reside on a storage medium local to (and/or resident in) one or more of the computers 602, 604, 606, 612, 614, 616, 618. Alternatively, it may be remote from any or all of the computers 602, 604, 606, 612, 614, 616, 618, and/or in communication (e.g., via the network 610) with one or more of these. In a particular set of embodiments, the database 620 may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers 602, 604, 606, 612, 614, 616, 618 may be stored locally on the respective computer and/or remotely, as appropriate. In one set of embodiments, the database 620 may be a relational database, such as Oracle® 10g, that is adapted to store, update, and retrieve data in response to SQL-formatted commands.
The computer system 700 may additionally include a computer-readable storage media reader 712, a communications system 714 (e.g., a modem, a network card (wireless or wired), an infra-red communication device, etc.), and working memory 718, which may include RAM and ROM devices as described above. In some embodiments, the computer system 700 may also include a processing acceleration unit 716, which can include a digital signal processor (DSP), a special-purpose processor, and/or the like.
The computer-readable storage media reader 712 can further be connected to a computer-readable storage medium 710, together (and, optionally, in combination with storage device(s) 708) comprehensively representing remote, local, fixed, and/or removable storage devices plus storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The communications system 714 may permit data to be exchanged with the network and/or any other computer described above with respect to the system 700.
The computer system 700 may also comprise software elements, shown as being currently located within a working memory 718, including an operating system 720 and/or other code 722, such as an application program (which may be a client application, Web browser, mid-tier application, RDBMS, etc.). It should be appreciated that alternate embodiments of a computer system 700 may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets), or both. Further, connection to other computing devices such as network input/output devices may be employed.
Storage media and computer-readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information, such as, computer-readable instructions, data structures, program modules, or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, data signals, data transmissions, or any other medium which can be used to store or transmit the desired information and which can be accessed by the computer. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.
An exemplary class hierarchy 800 for an embodiment of software for effectuating the decay engine 204 (
Embodiments presented herein have several advantages. Namely, the decay engine 204 (
The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.
Number | Date | Country | |
---|---|---|---|
20100114836 A1 | May 2010 | US |