The present disclosure relates generally to data analytics, and more particularly, but not exclusively to a method and a system for generation of context aware insights using semantic web, deep learning model and domain specific knowledge base for a business user to derive insights from visualizations such as graphs.
Traditionally, organizations employ several Business Intelligence (BI) software solutions and data visualization tools to extract insights for critical operations, using the analytics and reporting functionality integrated within such BI software solutions. There exist several kinds of BI software solutions comprising a variety of dashboards with different types of visualizations. Such dashboards may display, on a single screen, a plethora of parameters related to an organization, department, team or process, such as the status of business analytics metrics, key performance indicators (KPIs) and important data points.
However, visualizations displayed through BI software solutions are complex and in many instances difficult to interpret. The interpretation is a manual process, which leaves scope for mistakes in analyzing the underlying data. Moreover, interpretation of such visualizations may only be possible for professionals having subject matter expertise, such as data analysts, business analysts, data scientists, etc. As a result, the decision makers of an organization spend a lot of time interpreting data rather than focusing on planning strategies to improve the operations or sales of a team or an organization.
Essentially, it is important to select the most appropriate visualization method for a given data set with the right context. Often the business analysts have to work with data that come from unknown domains wherein the lack of domain knowledge is a prime reason for incorporating either inappropriate or non-optimal visualization techniques. Domain experts can easily recommend commonly used best visualization types for a given data set in that domain. However, availability of a domain expert in every data analysis project cannot be guaranteed.
Currently, several techniques exist to address the afore-mentioned problem of complex visualizations and their interpretation, and to generate context aware insights, by automatically generating a summary of the visualizations using machine learning and deep learning techniques. Systems and applications that act on or change their behavior based on perceived aspects of context are context-aware. Thus, these systems are aware of their environment and can automatically react to changes. Such context aware systems are built using rule-based, statistical and template-based methods. For example, a rule-based system uses domain dependent rules to manipulate different stores of data to generate “natural” sounding text. A statistical Natural Language Generation (NLG) system bypasses extensive rule construction by using corpus data to “learn” the set of rules; it creates alternative generations of natural language text from the statistical rules and then, governed by a decision model, chooses the best alternative at a given point in the generated discourse. On the other hand, a template-based NLG system creates a template in which empty slots are replaced by specific information.
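The template-based approach described above can be illustrated with a minimal sketch. The template wording, the slot names and the helper fill_template are hypothetical and are not taken from any of the cited systems; Python's standard string.Template is used only as a convenient slot-filling mechanism.

```python
from string import Template

# Hypothetical template with three empty slots (illustration only).
TEMPLATE = Template("$dimension_value recorded the highest $measure at $value.")


def fill_template(template: Template, slots: dict) -> str:
    """Replace every empty slot in the template with specific information."""
    # substitute() raises KeyError if any slot is left unfilled, which makes
    # incomplete narratives fail loudly instead of silently.
    return template.substitute(slots)


sentence = fill_template(
    TEMPLATE,
    {"dimension_value": "Brand A", "measure": "sales", "value": "1,200 units"},
)
print(sentence)  # Brand A recorded the highest sales at 1,200 units.
```

A rule-based or statistical system would differ mainly in how the sentence structure is chosen; the slot-filling step itself stays this simple.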
U.S. Ser. No. 10/366,167 (US'167) discloses a system and a method for generating a contextual summary of one or more charts. The system comprises a summary generating system capable of extracting chart data associated with each chart received from one or more sources and determining the context of the chart data. The summary generating system computes statistical data of each chart by analyzing the chart data based on predefined rules corresponding to the context. The form of analysis to be performed depends on the context of the chart data. Furthermore, insights of each chart are generated by mapping the statistical data with predefined narratives corresponding to the context. The summary generating system automatically generates the contextual summary of the charts corresponding to the context of the chart data in a predefined template format using the generated insights of each of the one or more charts. The contextual summary provides holistic information of the interpreted charts. However, US'167 fails to disclose a concept of a deep learning paraphrase model for generating a summary of the charts.
U.S. Pat. No. 9,529,795 (US'795) discloses a method of receiving a corpus comprising a set of pre-segmented texts. The method further includes creating a plurality of modified pre-segmented texts for the set of pre-segmented texts by extracting a set of semantic terms for each pre-segmented text within the set of pre-segmented texts and applying at least one domain tag for each pre-segmented text within the set of pre-segmented texts. The method further includes clustering the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, and wherein each of the one or more templates corresponds to one of the plurality of modified pre-segmented texts. However, US'795 fails to mention the concept of parsing graph data.
U.S. Pat. No. 9,405,448 (US'448) provides a method and a system for generating and annotating a graph. The method discloses a concept of determining one or more key patterns in a primary data channel, wherein the primary data channel is derived from raw input data in response to a constraint being satisfied. The method may further include determining one or more significant patterns in one or more related data channels. The method may further include generating a natural language annotation for at least one of the one or more key patterns or the one or more significant patterns. The method may further include generating a graph that is configured to be displayed in a user interface, the graph having at least a portion of the one or more key patterns, the one or more significant patterns and the natural language annotation. However, US'448 fails to disclose a concept of a deep learning paraphrase model for generating a summary of the charts.
U.S. Pat. No. 9,396,181 (US'181) discloses a method and a system for natural language generation and data analysis. The user context is analyzed from a user question and converted into an SQL query to pull the required data subset from a data repository. US'181 discloses a concept of analyzing the data and generating natural language sentences/phrases. Moreover, US'181 discloses a concept of an ontology comprising object concepts and relationships. However, the ontology concept disclosed in US'181 differs from that disclosed in the present invention.
However, the above mentioned state-of-the-art techniques and methods fail to generate context aware insights using a semantic web, a deep learning model and a domain specific knowledge base for the business user to derive actionable insights from visualizations. This may result in inaccurate or inefficient insights due to the complexities involved in the machine learning techniques used.
Thus, there arises a need for an automated and simpler method and system that automatically analyzes the data behind visualizations and interprets the data and graphs using business inputs, an ontology structure comprising semantic relationships, and a deep learning paraphrasing model that may comprise several domain-based actions, so as to generate narratives that precisely answer the business user's question and attempt to answer any follow-up questions. This will enable data scientists to make visualization decisions with limited domain knowledge.
One or more shortcomings of prior art are overcome, and additional advantages are provided through present disclosure. Additional features are realized through techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein and are considered a part of the present disclosure.
In one aspect of the disclosure, a method for generating a contextual narrative of one or more visualizations is disclosed. The method includes providing an input feed to a processor; processing the input feed based on a set of predefined business rules, wherein the method of generating the contextual narrative further comprises, generating a narrative of a visualization based on the processed input feed, wherein a context is provided to the generated narrative based on a plurality of semantic relationships established in an ontology file obtained from the input feed and at least one search criterion including, but not limited to, one or more filters and one or more aggregation types.
In another aspect of the disclosure, a system for generating a contextual narrative of one or more visualizations is disclosed. The system includes a data input module, wherein the data input module provides an input feed; a processor for processing the input feed based on a set of predefined business rules; and wherein the system for generating the contextual narrative further comprises a narrative generator module for generating a narrative of a visualization based on the processed input feed, wherein a context is provided to the generated narrative based on a plurality of semantic relationships established in an ontology file obtained from the input feed and at least one search criterion including, but not limited to, one or more filters and one or more aggregation types.
Foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to drawings and following detailed description.
In following detailed description of embodiments of present disclosure, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. However, it will be obvious to one skilled in art that the embodiments of the disclosure may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the disclosure.
References in the present disclosure to “one embodiment” or “an embodiment” mean that a feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the disclosure. Appearances of phrase “in one embodiment” in various places in the present disclosure are not necessarily all referring to same embodiment.
In the present disclosure, word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or implementation of present subject matter described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.
The present disclosure may take form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects that may all generally be referred to herein as a ‘system’ or a ‘module’. Further, the present disclosure may take form of a computer program product embodied in a storage device having computer readable program code embodied in a medium.
While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the forms disclosed; on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
Terms such as “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by “comprises . . . a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or apparatus.
In following detailed description of the embodiments of the disclosure, reference is made to drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in enough detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
The present disclosure discloses a system and method comprising a Natural Language Generation (NLG) module in a data visualization environment for generating a contextual narrative in natural language for a visualization, e.g., a graph. The narrative is generated using a composite system comprising business input, an ontology structure comprising semantic relationships and a deep learning paraphrase model to express and enable semantics in a personalized manner. The essential element of the disclosure is the system and method for providing context to the generated narrative using the ontology structure comprising semantic relationships and search criteria including, but not limited to, filters and types of aggregations.
Further, the output of the insight resolution module 203 is fed into a filter resolution module 207, which produces a final set of analyses and, as illustrated at 208, a final set of insights/templates 206. This final set of analyses and insights/templates 206 is passed to the NLG analytic module 209 for computation, wherein the computation uses the computation data file present within the raw data file 210. The final results of analysis and the final templates 211 are fed into the language generation module 212 for generation of the narrative 217. The language generation module 212 comprises domain level language resolution 213, a data ingestion module 215, and template order identification and language formatting 216. The predefined template structure 214 is used by the domain level language resolution 213 for analysis.
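The flow through modules 203, 207, 209 and 212 can be sketched as a chain of plain functions. This is a simplified illustration under stated assumptions, not the disclosed implementation: the function names, the two analyses ("total" and "top_category"), the filter rule, the templates and the sample rows are all hypothetical.

```python
from typing import Dict, List

# Hypothetical templates, one per analysis (stands in for structure 214).
TEMPLATES = {
    "total": "Total {measure} is {total}.",
    "top_category": "{top_category} leads in {measure}.",
}


def resolve_insights(chart: dict) -> List[str]:
    """Insight resolution (203): pick candidate analyses for the chart."""
    analyses = ["total"]
    if chart.get("dimension"):
        analyses.append("top_category")
    return analyses


def resolve_filters(analyses: List[str], chart: dict) -> List[str]:
    """Filter resolution (207): narrow the analyses using applied filters."""
    # Assumed rule: ranking categories is pointless once the chart is
    # already filtered on that same dimension.
    if chart.get("dimension") in chart.get("filters", {}):
        return [a for a in analyses if a != "top_category"]
    return analyses


def compute(analyses: List[str], chart: dict, rows: List[dict]) -> Dict[str, object]:
    """NLG analytic step (209): run each analysis on the raw data."""
    measure, dimension = chart["measure"], chart.get("dimension")
    results: Dict[str, object] = {}
    if "total" in analyses:
        results["total"] = sum(row[measure] for row in rows)
    if "top_category" in analyses:
        totals: Dict[str, float] = {}
        for row in rows:
            totals[row[dimension]] = totals.get(row[dimension], 0) + row[measure]
        results["top_category"] = max(totals, key=totals.get)
    return results


def generate(results: Dict[str, object], chart: dict) -> str:
    """Language generation (212): fill one template per computed result."""
    measure = chart["measure"].lower()
    return " ".join(
        TEMPLATES[name].format(measure=measure, **results) for name in results
    )


rows = [
    {"Brand": "A", "Sales": 100},
    {"Brand": "B", "Sales": 250},
    {"Brand": "A", "Sales": 50},
]
chart = {"measure": "Sales", "dimension": "Brand", "filters": {}}
analyses = resolve_filters(resolve_insights(chart), chart)
narrative = generate(compute(analyses, chart, rows), chart)
print(narrative)  # Total sales is 400. B leads in sales.
```

Keeping each stage a pure function of its inputs makes it straightforward to test the analysis selection separately from the wording of the narrative.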
In one exemplary embodiment
Further, the ontology store 204, which depicts the semantic relationships between the attributes in the study, will have details regarding these measures and dimensions. It may also comprise formatting details, such as how the sales numbers should look in the narrative, or alternate usages that should appear in the narrative instead of the chart labels.
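As a rough illustration of such an ontology entry, the sketch below stores a role, an alternate display name and a number format for each chart label. The labels (sales_amt, brand_nm), the field names and the helper describe are hypothetical; the disclosure does not specify the storage format of the ontology store 204.

```python
# Hypothetical ontology entries keyed by raw chart label (illustration only).
ONTOLOGY = {
    "sales_amt": {
        "role": "measure",
        "display_name": "sales",   # alternate usage shown in the narrative
        "format": "${:,.0f}",      # how the number should look
    },
    "brand_nm": {
        "role": "dimension",
        "display_name": "brand",
        "format": "{}",
    },
}


def describe(label: str, value) -> str:
    """Render a value using the display name and format from the ontology."""
    entry = ONTOLOGY[label]
    return f"{entry['display_name']} of {entry['format'].format(value)}"


print(describe("sales_amt", 1234567.8))  # sales of $1,234,568
```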
In embodiments, the natural language generation business configuration file will include the various sets of analyses that may be triggered for a chart plotting a measure across a categorical dimension. To fine tune this set of analyses further, the insight resolution module considers the domain of the data and the semantic relationships associated with the measures and dimensions. Moreover, the knowledge graph can semantically differentiate between measures like “Sales” and “Profit” and categorical dimensions like “Brands” and “Countries”. One such example is key player analysis.
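The disclosure does not define how key player analysis is computed. One plausible reading, assumed here purely for illustration, is to find the smallest set of top categories that together account for a given share of a measure. The function name key_players, the 80% default and the sample figures are hypothetical.

```python
from typing import Dict, List


def key_players(totals: Dict[str, float], share: float = 0.8) -> List[str]:
    """Return the top categories that together cover `share` of the total."""
    threshold = share * sum(totals.values())
    selected: List[str] = []
    running = 0.0
    # Walk categories from largest to smallest contribution.
    for name, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        if running >= threshold:
            break
        selected.append(name)
        running += value
    return selected


sales_by_brand = {"Brand A": 500, "Brand B": 350, "Brand C": 100, "Brand D": 50}
print(key_players(sales_by_brand))  # ['Brand A', 'Brand B']
```

Such an analysis only makes sense when the chart plots a measure (here, sales) across a categorical dimension (here, brands), which is the distinction the knowledge graph is described as providing.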
The filter resolution module 207 can further trigger benchmark analysis on previous year data. This final set of analyses results in a final set of insights/templates.
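A benchmark against previous year data could, for example, be a simple year-over-year percentage change. The sketch below assumes that interpretation; the function name benchmark, the sentence wording and the figures are hypothetical.

```python
def benchmark(current: float, previous: float, measure: str = "Sales") -> str:
    """Compare the current value of a measure with the previous year's value."""
    if previous == 0:
        # A percentage change is undefined without a baseline.
        return f"No previous-year baseline is available for {measure.lower()}."
    change = (current - previous) / previous * 100
    if change == 0:
        return f"{measure} are unchanged versus the previous year."
    direction = "up" if change > 0 else "down"
    return f"{measure} are {direction} {abs(change):.1f}% versus the previous year."


print(benchmark(1200, 1000))  # Sales are up 20.0% versus the previous year.
print(benchmark(900, 1000))   # Sales are down 10.0% versus the previous year.
```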
Another embodiment of the present invention discloses a language enrichment module. The generated narrative is fed into various models, such as plural/singular check, lexical entailment, activity, symmetry, predicate argument structure, alterations, ellipsis, quantification, grammar check, propositional structure, etc.
Another embodiment of the present invention discloses a paraphrasing module. After being refined by the various models of the language enrichment module, the final narration 217 is fed sequentially into a neural network model comprising a sequence to sequence builder (LSTM-CNN), generator module training, an evaluator module for identification and, finally, a paraphrase generation module. The paraphrasing module comprises a set of neural networks trained with a corpus of training data related to natural language generation templates.
The exemplary embodiment given below provides a detailed explanation of the different parts of generating a contextual narrative.
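Of the checks listed, the plural/singular check is the simplest to illustrate. The sketch below is a deliberately naive, rule-based stand-in (the disclosure refers to models, which may be considerably more sophisticated); the helper names pluralize, agree_verb and enrich and the sample sentence are hypothetical.

```python
from typing import Optional


def pluralize(count: int, singular: str, plural: Optional[str] = None) -> str:
    """Return the count with a noun form that agrees with it."""
    if plural is None:
        plural = singular + "s"  # naive default; pass `plural` for irregular nouns
    return f"{count} {singular if count == 1 else plural}"


def agree_verb(count: int) -> str:
    """Pick the verb form that agrees with a singular or plural subject."""
    return "accounts" if count == 1 else "account"


def enrich(count: int) -> str:
    """Build a sentence whose noun and verb both agree with the count."""
    return f"{pluralize(count, 'brand')} {agree_verb(count)} for most of the sales."


print(enrich(1))  # 1 brand accounts for most of the sales.
print(enrich(3))  # 3 brands account for most of the sales.
```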
Part 1 is input data feed (JSON structure):
Part 2 is initial set-up of natural language generation module:
Part 3 is setting up of ontology structure:
Part 4 is feeding of raw data:
Another embodiment of the present disclosure describes the process of automatically generating natural language from non-linguistic input, which comprises three steps, namely content determination, sentence planning and surface planning. The workflow for natural language generation for visualization environment is as follows:
The NLG business configuration file comprises a keyword-based knowledge graph (ontology structure) containing knowledge on various domains such as Finance, Pharma, etc. Entities stored in the knowledge graph can be domains, domain-specific attributes, sets of possible relationships between attributes, sets of possible insights, sets of possible analyses and sets of possible actions.
Insight resolution generates a fixed set of insights considering rules based on categories of entities and their relationship to arrive at the final set of insights.
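A minimal sketch of rule-based insight resolution over a keyword-based knowledge graph follows. The graph layout, the entity categories ("measure", "categorical", "temporal"), the insight names and the function resolve are assumptions made for illustration; the disclosure does not specify them.

```python
from typing import List

# Hypothetical knowledge graph: per domain, the category of each attribute
# and the insights allowed for each (measure, dimension) category pair.
KNOWLEDGE_GRAPH = {
    "Finance": {
        "attributes": {
            "Sales": "measure",
            "Profit": "measure",
            "Brands": "categorical",
            "Year": "temporal",
        },
        "insights": {
            ("measure", "categorical"): ["key_player", "contribution"],
            ("measure", "temporal"): ["trend", "benchmark"],
        },
    },
}


def resolve(domain: str, measure: str, dimension: str) -> List[str]:
    """Look up the insights permitted for the categories of two attributes."""
    graph = KNOWLEDGE_GRAPH[domain]
    categories = (graph["attributes"][measure], graph["attributes"][dimension])
    # Pairs without a rule yield no insights rather than an error.
    return graph["insights"].get(categories, [])


print(resolve("Finance", "Sales", "Brands"))  # ['key_player', 'contribution']
print(resolve("Finance", "Profit", "Year"))   # ['trend', 'benchmark']
```

Because the rules are keyed on entity categories rather than on individual attribute names, adding a new measure to a domain only requires registering its category.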
A predefined set of templates is a file storage comprising a large set of templates in natural language, categorized based on insight type and further divided into template formations containing enhanced language features based on domain. The default set of templates and the other template groups with domain-based distinctions have a hierarchical relationship between them, and this relationship is used by the NLG module to arrive at the final set of triggered insights. The actual natural language realization of the complete insight set is stored in these template files. Further, this file contains the following additional details related to each template:
Another embodiment of the present disclosure explains in detail the entire process of natural language generation being performed in the past and integration of the present disclosure to improve user experience by providing contextual narratives for visualizations. The contextual narratives are generated by using semantic web, deep learning model and domain specific knowledge base for the business user to derive insights from visualizations.
In an embodiment, the computer system may be a communication unit, which is used for pushing the plurality of messages from the first node to the second node. The computer system may include a central processing unit (“CPU” or “processor”). The processor may comprise at least one data processor for executing program components for executing user or system-generated business processes. The processor may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc.
The processor may be disposed in communication with one or more input/output (I/O) devices via an I/O interface. The I/O interface may employ communication protocols/methods such as, without limitation, audio, analog, digital, stereo, IEEE-1394, serial bus, Universal Serial Bus (USB), infrared, PS/2, BNC, coaxial, component, composite, Digital Visual Interface (DVI), high-definition multimedia interface (HDMI), Radio Frequency (RF) antennas, S-Video, Video Graphics Array (VGA), IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., Code-Division Multiple Access (CDMA), High-Speed Packet Access (HSPA+), Global System For Mobile Communications (GSM), Long-Term Evolution (LTE) or the like), etc.
Using the I/O interface, the computer system may communicate with one or more I/O devices. In some implementations, the processor may be disposed in communication with a communication network via a network interface. The network interface may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), Transmission Control Protocol/Internet Protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. Using the network interface and the communication network, the computer system may be connected to the sender server and the recipient server.
The communication network can be implemented as one of the several types of networks, such as intranet or any such wireless network interfaces. The communication network may either be a dedicated network or a shared network, which represents an association of several types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the communication network 508 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc.
In some embodiments, the processor may be disposed in communication with a memory (e.g., RAM, ROM, etc.) via a storage interface. The storage interface may connect to memory including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as Serial Advanced Technology Attachment (SATA), Integrated Drive Electronics (IDE), IEEE-1394, Universal Serial Bus (USB), fiber channel, Small Computer Systems Interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, Redundant Array of Independent Discs (RAID), solid-state memory devices, solid-state drives, etc.
The memory may store a collection of program or database components, including, without limitation, user/application, an operating system, a web browser, a mail client, a mail server, a user interface, and the like. In some embodiments, computer system may store user/application data, such as the data, variables, records, etc. as described in this invention. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle or Sybase.
The operating system may facilitate resource management and operation of the computer system. Examples of operating systems include, without limitation, Apple Macintosh™ OS X™, UNIX™, Unix-like system distributions (e.g., Berkeley Software Distribution (BSD), FreeBSD™, Net BSD™, Open BSD™, etc.), Linux distributions (e.g., Red Hat™, Ubuntu™, K-Ubuntu™, etc.), International Business Machines (IBM™) OS/2™, Microsoft Windows™ (XP™, Vista/7/8, etc.), Apple iOS™, Google Android™, Blackberry™ Operating System (OS), or the like. A user interface may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, user interfaces may provide computer interaction interface elements on a display system operatively connected to the computer system, such as cursors, icons, check boxes, menus, windows, widgets, etc. Graphical User Interfaces (GUIs) may be employed, including, without limitation, Apple™ Macintosh™ operating systems' Aqua™, IBM™ OS/2™, Microsoft™ Windows™ (e.g., Aero, Metro, etc.), Unix X-Windows™, web interface libraries (e.g., ActiveX, Java, JavaScript, AJAX, HTML, Adobe Flash, etc.), or the like.
The present computer implemented system includes a system, a network, a plurality of user devices, a database, a memory, a processor, I/O interfaces, a plurality of modules, and plurality of data.
The network interconnects the user devices and the database with the system. The network includes wired and wireless networks. Examples of the wired networks include a Wide Area Network (WAN) or a Local Area Network (LAN), a client-server network, a peer-to-peer network, and so forth. Examples of the wireless networks include Wi-Fi, a Global System for Mobile communications (GSM) network, and a General Packet Radio Service (GPRS) network, an enhanced data GSM environment (EDGE) network, 802.5 communication networks, Code Division Multiple Access (CDMA) networks, or Bluetooth networks.
In the present implementation, the database may be implemented as an enterprise database, a remote database, a local database, and the like. The database may be located within the vicinity of the system or at a different geographic location from the system. Further, where multiple databases are used, the databases may themselves be located either within the vicinity of each other or at different geographic locations. Furthermore, the database may be implemented inside the system, and may be implemented as a single database or as multiple databases.
In the present implementation, the system includes one or more processors. The processor may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor is configured to fetch and execute computer-readable instructions stored in the memory.
The memory may be coupled to the processor. The memory can include any computer-readable medium known in the art including, for example, volatile memory, such as static random-access memory (SRAM) and dynamic random-access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
Further, the system includes modules. The modules include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. In one implementation, the module includes an input module, an estimation module, a display module and other modules. The other modules may include programs or coded instructions that supplement applications and functions of the system.
As described above, the modules, amongst other things, include routines, programs, objects, components, and data structures, which perform particular tasks or implement particular abstract data types. The modules may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulates signals based on operational instructions. Further, the modules can be implemented by one or more hardware components, by computer-readable instructions executed by a processing unit, or by a combination thereof.
Furthermore, one or more computer-readable storage media may be utilized in implementing some of the embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., non-transitory. Examples include Random Access Memory (RAM), Read-Only Memory (ROM), volatile memory, non-volatile memory, hard drives, Compact Disc (CD) ROMs, Digital Video Discs (DVDs), flash drives, disks, and any other known physical storage media.
The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.
It is intended that the disclosure and examples be considered as exemplary only, with a true scope of disclosed embodiments being indicated by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202021040776 | Sep 2020 | IN | national |