Logic operations may include operations that manipulate Boolean (i.e., true/false) values. However, analysis of complex sets of data may require stringing and nesting complicated logic operations. Moreover, a user may face difficulty generating complex operations and visualizing the data underlying these operations.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosure.
Forming logic operations on datasets and combinations of datasets can be an arduous, time-consuming and labor-intensive process. A person performing such operations is expected to have a deep understanding of Boolean algebra or other mathematical logic, as well as an extensive understanding of all data underlying the operation.
In many fields involving data arranged in diverse datasets, logic operations may be performed on large and complex arrangements of datasets. Such fields may include, but are not limited to, network management, advertisement delivery, analytics and data research, statistics, risk and actuarial management, and the like. Respective groups of data may be required to be combined and fragmented and recombined with each other in a very complicated manner.
The application of Boolean logic combined with set theory may involve both complicated arrangements of sets, and members thereof (in unions, intersections, complements or other dispositions) and complex logical formulations. Such problems can be exacerbated by a person having technical unfamiliarity with these applications and/or the data underlying intended operations. The present invention provides a Data Deduplication System to provide a user-friendly system and process of extracting, combining and/or deduplicating members of sets, including complex sets of data and convoluted combinations thereof.
For example, an advertisement campaign may require an audience selection process that is timely, accurate and precise to get the most return on the advertisement budget. A media buyer may wish to select consumer groups for targeting in an extensive but timely manner. However, the decision maker may not possess the technical or informational ability to complete this task. On the other hand, a technician proficient in Boolean logic may not have a prescient understanding of the consumer characteristics and group attributes that constitute available data.
It is increasingly difficult to reach consumers uniformly through a generalized advertisement campaign. Demand is surging for interactive and/or programmatic ways to selectively reach consumers through advertisements and content. Therefore, it is essential for advertisers, media providers and others to discern, in or near real time, trends among consumers and viewers, including behavioral targeting and viewership tendencies. However, the decision maker, technician or other person involved in the campaign encounters the above-mentioned difficulties. As a result, systems for performing such logic and assembling data may be unsatisfactory and untimely for a campaign that requires large and complicated combinations of data.
Likewise, discerning concerns or trends in network performance and stability may require management of numerous systems or devices that have varying attributes. It may be difficult to express a logic operation on large and complex combinations over such data, particularly in a manner timely enough to ensure a stable network.
The systems and methods disclosed herein provide a user with near instantaneous visualization and de-duplication of data that may potentially involve large sets and/or complex combinations. De-duplication refers to a process of avoiding or removing duplicate or redundant members from a dataset. The data de-duplication system (DDDS) permits a user to perform combinations and de-duplication without necessarily possessing either a technical understanding of logic mathematics or a deep knowledge of the underlying data attributes. As a result, the user can acquire and utilize datasets having intended attributes and potentially real-time information with minimal delay and skill.
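The notion of de-duplication defined above can be illustrated with a minimal sketch. It assumes members are represented by hashable identifiers; the function name `combine_and_deduplicate` and the sample identifiers are hypothetical and are not taken from the disclosure.

```python
def combine_and_deduplicate(*datasets):
    """Combine datasets, keeping each member once, in first-seen order."""
    seen = set()
    result = []
    for dataset in datasets:
        for member in dataset:
            if member not in seen:  # skip members already collected
                seen.add(member)
                result.append(member)
    return result


# Hypothetical member identifiers; "id-1" and "id-3" appear in both lists.
list_a = ["id-1", "id-2", "id-3"]
list_b = ["id-3", "id-4", "id-1"]
combined = combine_and_deduplicate(list_a, list_b)
print(combined)  # ['id-1', 'id-2', 'id-3', 'id-4']
```

Each identifier appears once in the combined result even though two of them are present in both input lists.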
Provided herein are system, apparatus, article of manufacture, method, and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for providing a data de-duplication system, and components and applications thereof. In some embodiments, the data de-duplication system includes a method for generating a query based on a population selected from a graphical user interface.
In some embodiments, the data de-duplication system may include the graphical user interface presented to a user on a client device. The graphical user interface may present a visualization of available data on a client device. The graphical user interface may present multiple datasets from multiple data collections and/or at least one subset of members of the multiple datasets, the multiple datasets and the at least one subset being selectable by a user of the DDDS through the graphical user interface.
In some embodiments, a graphical user interface of the DDDS may include a visualization of the available datasets in an Euler or Venn diagram. A user may select components of interest from the Euler or Venn diagram, including datasets, unions, intersections, complements and/or other parts thereof.
In some embodiments, the DDDS may include a method to provide a query results set to the user, the query results set including the query results. The query results set may be presented, for example, through the graphical user interface, as a separate delivery, or both. Alternatively or additionally, the query results set may be implemented as one dataset presented in the graphical user interface in a continued iteration of the DDDS.
In some embodiments, the query results set may alternatively or additionally be provided to a content delivery system that enables content delivery to members of one or more selected populations. In some embodiments, a content delivery system may enable delivery of advertisements to members of one or more selected populations.
In some embodiments, a query may be generated based on selected populations utilizing a logic interpreter. In a non-limiting example, a logic interpreter may translate the user's selections to a suitable means for querying data, such as a Boolean expression, SQL instruction or the like.
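One possible way a logic interpreter might translate selections into a Boolean expression is sketched below. This is an illustration under stated assumptions, not the disclosed implementation: each selected zone is assumed to be described by the set of dataset names it lies inside, and the function names and dataset names (`A`, `B`, `C`) are hypothetical.

```python
def zone_to_expression(zone, all_datasets):
    """Express one zone as a conjunction: inside some datasets, outside the rest."""
    terms = [name if name in zone else f"NOT {name}" for name in all_datasets]
    return "(" + " AND ".join(terms) + ")"


def selection_to_expression(selected_zones, all_datasets):
    """OR together the expressions of every selected zone."""
    return " OR ".join(zone_to_expression(z, all_datasets) for z in selected_zones)


# Hypothetical selection: the zone shared by A and B only, plus the zone inside C only.
expression = selection_to_expression([{"A", "B"}, {"C"}], ["A", "B", "C"])
print(expression)  # (A AND B AND NOT C) OR (NOT A AND NOT B AND C)
```

Because each zone is mutually exclusive with every other zone, a disjunction of zone expressions cannot count any member twice.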
In some embodiments, a logic interpreter may be implemented by programmatic means, such as a software module that includes a mathematical table, truth table, or the like. In some embodiments, a logic interpreter may be implemented by physical hardware, such as a microcontroller, FPGA, ASIC or other suitable device. In some embodiments, a logic interpreter may include both software and hardware.
In a non-limiting example, a data de-duplication system may deduplicate and provide data selected by a user from datasets, parts and/or combinations thereof, including audience members.
Further embodiments, features, and advantages of the disclosure, as well as the structure and operation of the various embodiments, are described in detail below with reference to accompanying drawings.
In the detailed description herein, references to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
The foregoing description of specific embodiments will so fully reveal the general nature of the disclosure that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present disclosure. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
In data analytics and processing, decision-making processes may be performed by parties that do not have the sophisticated technical knowledge needed to assemble precise datasets and/or logic operations. Therefore, the present disclosure provides DDDS 100 that can include graphical user interface 110 and logic interpreter 112 to mitigate limitations in the capabilities of a user 102 in assembling data queries potentially requiring complex logic or combinations of datasets.
More specifically, database 116 may be a single database or may be distributed across multiple sites, and may include a centralized data structure that stores data in one or more logical databases. DDDS 100 may first use a database query engine 114, which may execute an application, such as a structured query language (SQL) application, to process a query 126.
Thereby, GUI 110 can include components having datasets populated by database query engine 114 from database 116. These datasets may be available to a user 102 as components (also referred to herein as “zones”) of GUI 110. Datasets may include mutually exclusive data members, or may include data members common to more than one available dataset. GUI 110 may further include, as components or zones, subsets of the available datasets, including subsets common to more than one available dataset (e.g., an intersection of two or more available datasets). In some embodiments, described in more detail below, datasets and subsets thereof may be disposed in an Euler or Venn diagram, for example. Through GUI 110, user 102 may select dataset components of interest 122 from the available components. These dataset components of interest 122 can be input to logic interpreter 112 by the user through GUI 110.
Logic interpreter 112 can translate the selected dataset components (or zones) of interest 122 into a logic expression 123 that may be resolved by database query engine 114. For example, logic interpreter 112 may include a table, such as a lookup table, which may be used to interpret attributes of each of selected dataset components of interest as a Boolean operation or other logic expression 123. Logic interpreter 112 can be configured to assemble logic expression 123 that may perform data de-duplication. As described in more detail below, GUI 110 can present multiple datasets and zones.
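The lookup-table approach mentioned above might be sketched as follows for a two-dataset diagram. The zone numbering (1 for members only in A, 2 for the intersection, 3 for members only in B), the table name `ZONE_TABLE`, and the function `interpret` are assumptions made for illustration only.

```python
# Hypothetical lookup table: zone number -> (Boolean text, set operation).
ZONE_TABLE = {
    1: ("A AND NOT B", lambda a, b: a - b),
    2: ("A AND B", lambda a, b: a & b),
    3: ("B AND NOT A", lambda a, b: b - a),
}


def interpret(selected_zones):
    """Return a Boolean expression and an evaluator for the selected zones."""
    expression = " OR ".join(f"({ZONE_TABLE[z][0]})" for z in sorted(selected_zones))

    def evaluate(a, b):
        members = set()
        for zone in selected_zones:
            members |= ZONE_TABLE[zone][1](a, b)  # set union removes duplicates
        return members

    return expression, evaluate


expression, evaluate = interpret({1, 2})
print(expression)                    # (A AND NOT B) OR (A AND B)
print(evaluate({1, 2, 3}, {3, 4}))   # {1, 2, 3}
```

Selecting zones 1 and 2 together is equivalent to selecting all of dataset A, which the evaluator confirms for the sample data.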
Database query engine 114 can process logic expression 123 to determine terms of query 126. The query results 124 may then be provided to the user 102 through GUI 110. The query results 124 can be used to populate datasets available to user 102 as components of GUI 110, as described above. In this case, query results 124 obtained based on each logic expression 123 can be used to re-populate some or all datasets available to a user 102 as components of GUI 110. Additionally, the query results can be provided to user 102 through GUI 110, not as components but as some other media that may be accessed by user 102 (such as a retrievable file or export).
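As a hedged illustration of how a query engine could resolve such an expression in SQL, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table name `membership`, its columns, and the sample identifiers are hypothetical; the disclosure does not specify a schema.

```python
import sqlite3

# In-memory database with a hypothetical (dataset, member_id) membership table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (dataset TEXT, member_id TEXT)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?)",
    [("A", "u1"), ("A", "u2"), ("B", "u2"), ("B", "u3")],
)

# UNION (without ALL) combines the two datasets and drops duplicate members.
union_sql = (
    "SELECT member_id FROM membership WHERE dataset = 'A' "
    "UNION "
    "SELECT member_id FROM membership WHERE dataset = 'B' "
    "ORDER BY member_id"
)
union_ids = [row[0] for row in conn.execute(union_sql)]

# INTERSECT keeps only members common to both datasets.
intersect_sql = (
    "SELECT member_id FROM membership WHERE dataset = 'A' "
    "INTERSECT "
    "SELECT member_id FROM membership WHERE dataset = 'B'"
)
both_ids = [row[0] for row in conn.execute(intersect_sql)]

print(union_ids)  # ['u1', 'u2', 'u3']
print(both_ids)   # ['u2']
```

Member `u2` belongs to both datasets yet appears only once in the union, which is the de-duplicating behavior described above.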
Thus, DDDS 100 can permit user 102 to perform complex data operations, including querying, merging, and de-duplication of data, in a straightforward manner with minimal technical knowledge of dataset or logic assembly.
A user may then select components of interest among the available components (such components potentially including datasets, unions, intersections, complements and other parts thereof, as described above). The selected components of interest are input to a logic interpreter at operation 206. The logic interpreter parses or otherwise translates the input, including the selected components of interest, to generate a logic expression such as a Boolean or SQL operation, for example, at operation 208.
After a suitable logic expression is generated, a query based on the logic expression is submitted to the database to retrieve query results at operation 210. The query results are presented to the user, through the GUI, as a communication separate from the GUI, or both, in operation 212. Furthermore, upon generating query results, one or more datasets may be populated or repopulated from the query results in operation 202.
As described in further detail below, a GUI may comprise datasets disposed in the form of an Euler or Venn diagram. Additional manners of arranging datasets in a GUI may be apparent to persons having ordinary skill in the art and are within the scope of this disclosure. Such additional arrangements may include, for example, disposing datasets for visualization and selection in a Karnaugh map, Carroll diagram, Hasse diagram, or the like.
B. Visualizing and De-Duplicating Data with an Euler or Venn Diagram
For example,
As described throughout this application, a user may select from any individual zone or combination of zones among datasets 303a, 303b, 303c disposed in GUI 300.
With reference to
As described above, a user may select from the available components and submit the selection as an input. A logic interpreter, such as logic interpreter 112, can translate the user's selection into a suitable logic expression such as a Boolean operation, SQL expression, or the like. Here, the selection of zones 3, 4 and 6 may be translated by the logic interpreter as logic expression 423, “303C NOT IN (303A AND 303B AND 303C).” That is, a logic interpreter can translate the user's selection as a logic instruction to include all members of dataset 303C that are not included in the intersection of all datasets.
In some embodiments, the GUI 300 may include datasets and components thereof that are sized according to the member population, e.g., an intersection having many members may be presented larger than an intersection having fewer members to optimize data visualization.
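The logic expression quoted above can be checked with ordinary set operations. The sketch below is illustrative only: the integer members are hypothetical stand-ins for real identifiers, and the variable names `a`, `b`, and `c` stand for datasets 303A, 303B, and 303C.

```python
# Hypothetical members of datasets 303A, 303B and 303C.
a = {1, 2, 4, 5}
b = {2, 3, 5, 6}
c = {4, 5, 6, 7}

# "303C NOT IN (303A AND 303B AND 303C)": members of C outside the triple intersection.
triple_intersection = a & b & c
selected = c - triple_intersection

print(sorted(triple_intersection))  # [5]
print(sorted(selected))             # [4, 6, 7]
```

Since every member of the triple intersection is already a member of C, the same result is obtained from `c - (a & b)`.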
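One way to obtain the populations needed for such sizing is to count the members of every mutually exclusive zone. The sketch below assumes datasets are given as Python sets keyed by name; the function names `zone_populations` and `relative_sizes` and the sample data are hypothetical, and the proportional-sizing rule is only one of many possible choices.

```python
from itertools import combinations


def zone_populations(datasets):
    """Count members of each exclusive zone, keyed by the datasets it lies inside."""
    names = sorted(datasets)
    populations = {}
    for size in range(1, len(names) + 1):
        for included in combinations(names, size):
            # Members common to every included dataset...
            members = set.intersection(*(datasets[n] for n in included))
            # ...minus members of any dataset the zone lies outside of.
            for excluded in names:
                if excluded not in included:
                    members = members - datasets[excluded]
            populations[included] = len(members)
    return populations


def relative_sizes(populations):
    """Fraction of all distinct members that falls in each zone."""
    total = sum(populations.values())
    return {zone: (count / total if total else 0.0) for zone, count in populations.items()}


sample = {"a": {1, 2, 3, 4, 5}, "b": {4, 5, 6}, "c": {5, 7}}
populations = zone_populations(sample)
sizes = relative_sizes(populations)
print(populations[("a",)])          # 3
print(populations[("a", "b", "c")])  # 1
```

Because the zones partition the union of the datasets, the zone counts sum to the number of distinct members, so the fractions sum to one.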
In
However, also for ease of use and visualization, some embodiments may limit datasets available through a GUI to a maximum of three. Otherwise, the number of zones including unions and intersections may potentially increase at an exponential rate. Even where a GUI may contain a limit on available datasets (such as two or three), additional datasets may be further processed by continuing iterations of the above described systems and methods, according to some embodiments and as described in further detail below.
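The growth in the number of zones can be made concrete. Assuming a Venn diagram in which every combination of datasets is represented by its own zone (an assumption, since an Euler diagram may omit empty zones), the number of zones $Z(n)$ for $n$ datasets is:

```latex
Z(n) = \sum_{k=1}^{n} \binom{n}{k} = 2^{n} - 1,
\qquad Z(2) = 3, \quad Z(3) = 7, \quad Z(4) = 15, \quad Z(5) = 31.
```

Each additional dataset therefore roughly doubles the number of selectable zones, which is consistent with limiting a single diagram to two or three datasets.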
For example, as shown in
In a second iteration shown in
Accordingly, the user can perform operations on potentially complex combinations of data nearly instantaneously without possessing technical knowledge on assembling datasets or Boolean expressions that may otherwise be necessary.
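The iterative use described above, in which the result of one iteration becomes a dataset in the next, can be sketched as follows. The datasets, the selections made in each iteration, and the function name `run_iteration` are hypothetical and do not correspond to the specific figures.

```python
def run_iteration(datasets, selection):
    """Apply a user's selection (a function of the datasets) and return a de-duplicated set."""
    return set(selection(*datasets))


# First iteration: three hypothetical datasets; the user selects the overlap of a and b.
a, b, c = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}
first_result = run_iteration([a, b, c], lambda a, b, c: a & b)

# Second iteration: the first result is presented as a dataset next to new datasets d and e.
d, e = {3, 9}, {2}
second_result = run_iteration(
    [first_result, d, e],
    lambda r, d, e: (r | d) - e,  # combine with d, then exclude members of e
)

print(sorted(first_result))   # [2, 3]
print(sorted(second_result))  # [3, 9]
```

Five datasets are processed in total, yet no single iteration presents more than three of them at once.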
As described throughout this disclosure, DDDS 600 may present datasets on GUI 610, which may include mutually exclusive data members, or may include data members common to more than one dataset. The datasets retrieved and presented on GUI 610 may correspond with datasets 501 to 503 of
In the context of network management, datasets retrieved from database 616 may comprise unique identifiers corresponding to equipment, such as computer terminals, in a non-limiting example. Such datasets may respectively also comprise information such as system error information, operating system (OS) version, and processor speed. Thus, in GUI 610, datasets 501 to 503 may include information relating to network equipment and user 602 may be a person involved in the network management process, such as a network specialist.
In this example, dataset 501 may comprise members that have encountered a specific system error in the last thirty days. Dataset 502 may comprise members having a processor speed below some threshold value. Dataset 503 may comprise members that each have an OS missing some update or patch. User 602 may select components of interest through GUI 610. As shown in
The selected components of interest 504 can be input to a logic interpreter 612. The logic interpreter translates the selected components of interest 504 to generate a logic expression such as a Boolean or SQL operation for processing by database query engine 614. Thereafter, the de-duplicated query results can be provided to the user through the GUI 610 or as a separate delivery, such as a file export. This content may comprise the unique identifiers constituting the query results, a count of the unique identifiers, or both. In this example, the de-duplicated data may be delivered to a content delivery system 620 for delivery to targets 630 (which may be the computer terminals of interest, for example). Thus, content targeting server 622 can receive the de-duplicated list of unique identifiers. The content targeting server can then retrieve the content intended for the targets from content database 626 and deliver it to targets 630. In this example, the content may include a patch or update to be delivered to computer terminals of interest.
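The network management example can be sketched end to end under several assumptions. The `Terminal` record, its field names, the 2.0 GHz threshold, the reference date, and the particular selection (terminals missing a patch that also either had a recent error or have a slow processor) are all hypothetical; the disclosure does not fix any of them.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Terminal:
    terminal_id: str             # unique identifier
    last_error: Optional[date]   # date of the most recent system error, if any
    cpu_ghz: float               # processor speed
    patched: bool                # True if the OS has the update installed


today = date(2024, 1, 31)  # fixed reference date so the example is reproducible
terminals = [
    Terminal("t1", date(2024, 1, 20), 1.2, False),
    Terminal("t2", None, 3.0, False),
    Terminal("t3", date(2023, 11, 1), 1.5, False),
    Terminal("t4", date(2024, 1, 25), 3.2, True),
]

# Datasets analogous to 501, 502 and 503, each holding unique identifiers.
recent_error = {
    t.terminal_id
    for t in terminals
    if t.last_error is not None and today - t.last_error <= timedelta(days=30)
}
slow_cpu = {t.terminal_id for t in terminals if t.cpu_ghz < 2.0}
unpatched = {t.terminal_id for t in terminals if not t.patched}

# Hypothetical selection: unpatched terminals that had an error or have a slow processor.
targets = sorted((recent_error | slow_cpu) & unpatched)
print(targets, len(targets))  # ['t1', 't3'] 2
```

Terminal `t1` satisfies all three conditions but is listed once, so a patch delivered to the resulting list would reach each terminal a single time.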
Although DDDS 600 and content delivery system 620 are shown as separate systems, these elements may constitute integrated members of a single system and/or device. For example, components of DDDS 600 and content delivery system 620 may be disposed in a single device, separate devices or other suitable arrangement.
An essential element of any advertising campaign is targeting it accurately and with precision. A process of directing advertisements to consumers may begin with deciding what audience a brand, advertiser or other party wishes to reach. A decision maker may wish to target specific audiences based on demographics. Demographics provides information on specific segments of the populace. This information may be nearly limitless and perpetually increasing. For example, demographic information may provide insights into consumers by informing the decision maker about their age, gender, marital status, home ownership, geographic location, political affiliation, and the like. Demographic information may include media content (e.g., television content, internet content, or the like) that a populace has viewed.
For example, a beverage brand may want to target consumers that include men who love sports and also women, two mutually exclusive groups. In some cases, the intended targets may be mutually exclusive, in other cases the target groups may have common members, requiring such audience lists to be combined and de-duplicated. Thus, the available data in targeting an advertising campaign can be expansive and complex. Generally, identifying lists of consumers may require a decision maker or other individual to understand Boolean algebra and have the ability to assemble datasets directed to target groups. A person involved in the process may not possess the technical knowledge to assemble data or combine logic in a meaningful way sufficient to identify an audience.
As shown in
In an example, the datasets retrieved and presented on GUI 710 may correspond with datasets 501 to 503 of
In this example, dataset 501 may comprise members who watched the season premiere of television show X. Dataset 502 may comprise members that have watched television network Y in the last thirty days. Dataset 503 may comprise women. User 702 may select components of interest through GUI 710. As described in other embodiments, and shown in
The selected components of interest 504 can be input to and translated by a logic interpreter 712. The resulting logic expression may be processed by database query engine 714, and the combined, de-duplicated audience can be provided to user 702 through the GUI 710 or as a separate delivery. This content may comprise the unique identifiers constituting the query results, a count of the unique identifiers, or both, and may then be provided, either directly or by the user 702, to advertisement delivery system 720. Alternatively, user 702 may process components 504 in one or more additional iterations with other datasets, for further combining and de-duplication as described above.
In this example, the de-duplicated data may be delivered to an advertisement delivery system 720 for delivery to targets 730. For example, targets 730 may be content viewing devices, such as televisions or web browsers. Thus, advertisement targeting server 722 can receive the de-duplicated list of unique identifiers. The advertisement targeting server can then retrieve the advertisement intended for the targets from an advertisement database 726 and deliver it to targets 730. In this example, the content may include an advertisement for a certain product or brand to be delivered to target audiences.
Optionally, DDDS 700 may be configured to communicate with one or more audience event producers 718, which may collect and aggregate audience events to be ingested to audience database 716. Such information may be collected and aggregated by a service provider, for example. The one or more audience event producers 718 communicate audience events to audience database 716, for example, in real time, on demand, or in accordance with some other suitable timing. In some embodiments, datasets and query results sets can be populated from real-time audience event data, such as television shows or networks viewed. Therefore, applications of DDDS 700 in some embodiments can include delivery of targeted advertisements based on data ingested in real time.
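Ingestion of audience events into datasets might be sketched as below. The event shape (a dictionary with `viewer_id`, `type`, and `value` keys), the class name `AudienceStore`, and its methods are assumptions for illustration; the disclosure does not define an event format.

```python
from collections import defaultdict


class AudienceStore:
    """Hypothetical in-memory stand-in for an audience database fed by event producers."""

    def __init__(self):
        # (event type, value) -> set of viewer identifiers
        self._datasets = defaultdict(set)

    def ingest(self, event):
        """Record one audience event; repeated events for a viewer do not create duplicates."""
        key = (event["type"], event["value"])
        self._datasets[key].add(event["viewer_id"])

    def dataset(self, event_type, value):
        """Return a copy of the current members for one attribute."""
        return set(self._datasets.get((event_type, value), set()))


store = AudienceStore()
events = [
    {"viewer_id": "v1", "type": "show", "value": "X"},
    {"viewer_id": "v2", "type": "network", "value": "Y"},
    {"viewer_id": "v1", "type": "show", "value": "X"},  # repeated event
    {"viewer_id": "v1", "type": "network", "value": "Y"},
]
for event in events:
    store.ingest(event)

watched_x = store.dataset("show", "X")
watched_y = store.dataset("network", "Y")
print(sorted(watched_x))              # ['v1']
print(sorted(watched_x | watched_y))  # ['v1', 'v2']
```

Because each dataset is updated as events arrive, a query issued at any moment reflects the events ingested up to that point.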
As described above, DDDS 700, advertisement delivery system 720 and audience event producers 718 are shown as separate systems but may be disposed in a single device or system. For example, components of DDDS 700, advertisement delivery system 720 and audience event producers 718 may be disposed in a single device, separate devices or other suitable arrangement.
Computer system 800 includes one or more processors, such as processor 804. Processor 804 may comprise suitable logic, circuitry, dedicated circuits, and/or code that may enable processing data and/or controlling operations of computer system 800. Processor 804 can be a special purpose or a general purpose processor. Processor 804 may be connected to a communication infrastructure 806 (for example, a bus or network). Processor 804 may be enabled to provide control signals to the various other portions of computer system 800 via communication infrastructure 806, for example.
Computer system 800 also includes a main memory 808, and may also include a secondary memory 809. Secondary memory 809 may include, for example, a hard disk drive 812, a removable storage drive 814, and/or a memory stick. Removable storage drive 814 may comprise a floppy disk drive, a magnetic tape drive, an optical disk drive, a flash memory, or the like. The removable storage drive 814 reads from and/or writes to a removable storage unit 815 in a well-known manner. Removable storage unit 815 may comprise a floppy disk, magnetic tape, optical disk, etc. that is read by and written to by removable storage drive 814. As will be appreciated by persons skilled in the relevant art(s), removable storage unit 815 includes a computer usable storage medium having stored therein computer software and/or data.
In alternative implementations, secondary memory 809 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 800. Such means may include, for example, a removable storage unit 817 and an interface 816. Examples of such means may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 817 and interfaces 816 that allow software and data to be transferred from the removable storage unit 817 to computer system 800.
Computer system 800 may also include a communications interface 824. Communications interface 824 allows software and data to be transferred between computer system 800 and external devices (such as a server of one or more embodiments as described above) 820. Communications interface 824 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, or the like. Software and data transferred via communications interface 824 are in the form of signals that may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 824. These signals are provided to communications interface 824 via a communications path 810. Computer system 800 may be configured to enable communications between computer system 800 and external devices such as server 820 (such as an advertisement delivery server, for example).
In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as removable storage unit 815, removable storage unit 817, and a hard disk installed in hard disk drive 812. Computer program medium and computer usable medium can also refer to memories, such as main memory 808 and secondary memory 809, which can be memory semiconductors (e.g., DRAMs, etc.). These computer program products are means for providing software to computer system 800.
Computer programs (also called computer control logic) are stored in main memory 808 and/or secondary memory 809. Computer programs may also be received via communications interface 824. Such computer programs, when executed, enable computer system 800 to implement the embodiments as discussed herein. In particular, the computer programs, when executed, enable processor 804 to implement the disclosed processes, such as the steps in the method 200 of
Embodiments are also directed to computer program products including software stored on any non-transitory computer useable medium. Such software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein. Embodiments employ any computer useable or readable medium, known now or in the future. Examples of computer useable mediums include, but are not limited to, primary storage devices (e.g., any type of random access memory), secondary storage devices (e.g., hard drives, floppy disks, CD-ROMs, ZIP disks, tapes, magnetic storage devices, optical storage devices, MEMS, nanotechnological storage devices, etc.), and communication mediums (e.g., wired and wireless communications networks, local area networks, wide area networks, intranets, etc.).
It is to be appreciated that the Detailed Description section, and not the Summary and Abstract sections, is intended to be used to interpret the claims. The Summary and Abstract sections may set forth one or more but not all exemplary embodiments as contemplated by the inventor(s), and thus, are not intended to limit the embodiments and the appended claims in any way.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the embodiments and, together with the description, further serve to explain the principles of the embodiments and to enable a person skilled in the pertinent art to make and use the embodiments. The embodiments will be described with reference to the accompanying drawings. Generally, the drawing in which an element first appears is typically indicated by the leftmost digit(s) in the corresponding reference number.
The breadth and scope of the present invention should not be limited by any of the above-described embodiments, but should be defined only in accordance with the following claims and their equivalents.
Number | Date | Country
---|---|---
62697904 | Jul 2018 | US