The disclosure generally relates to the field of predictive modeling based on text mining (e.g., of documents), and more specifically relates to an improved user interface for displaying results of the predictive modeling (e.g., by identifying portions of a document that most heavily influenced the result of a given information request).
Related art systems rely on machine learning algorithms to hone determinations of relevance to users. However, after determining the relevance of candidate objects (e.g., documents), these algorithms output a ranked list of results without any indication of why the results were ranked as they were. For example, in a document search, machine learning models applied to the search input cause some candidate documents to be presented to the user over others. While some related art systems show that a searched term exists in a search result (e.g., a keyword of a search string may be highlighted when a search result is selected), these systems fail to highlight, at any level of granularity beyond mere keyword matching, the portions of free text that lead to a prediction (e.g., of relevance, sentiment, and the like). This problem is especially pronounced in long documents, where, for example, ten non-consecutive pages out of one thousand may have factored into the determination to output the document as a search result; simply noting that a particular search keyword was prevalent in the document does not inform the user of why the machine learning model itself selected the document, or ranked it as it did.
The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.
The Figures (FIGS.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
One embodiment of the disclosed system, method, and computer-readable storage medium outputs a user interface with a heat map that explains why a search module's machine learning model made a certain prediction. For example, an insurance company that employs defense attorneys to defend workers compensation claims may wish to determine whether a given defense attorney is performing adequately. A worker of the insurance company may run a search through textual documents that are used to predict, based on a machine learning model, whether the given defense attorney is performing adequately. The worker may be presented with a user interface that shows a document resulting from the search concurrently with a heat map that shows which areas of the document(s) informed a prediction that the given defense attorney is, or is not, performing adequately, as well as (potentially) the degree to which each portion of the document(s) contributed to that prediction. For example, the heat map may be a scroll bar that corresponds to the document and that is encoded based on a sentiment analysis of the document.
The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions 224 (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 224 to perform any one or more of the methodologies discussed herein.
The example computer system 200 includes a processor 202 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 204, and a static memory 206, which are configured to communicate with each other via a bus 208. The computer system 200 may further include a visual display interface 210. The visual interface may include a software driver that enables displaying user interfaces on a screen (or display). The visual interface may display user interfaces directly (e.g., on the screen) or indirectly on a surface, window, or the like (e.g., via a visual projection unit). For ease of discussion, the visual interface may be described as a screen. The visual interface 210 may include or may interface with a touch-enabled screen. The computer system 200 may also include an alphanumeric input device 212 (e.g., a keyboard or touch screen keyboard), a cursor control device 214 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 216, a signal generation device 218 (e.g., a speaker), and a network interface device 220, which also are configured to communicate via the bus 208.
The storage unit 216 includes a machine-readable medium 222 on which is stored instructions 224 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 224 (e.g., software) may also reside, completely or at least partially, within the main memory 204 or within the processor 202 (e.g., within a processor's cache memory) during execution thereof by the computer system 200, the main memory 204 and the processor 202 also constituting machine-readable media. The instructions 224 (e.g., software) may be transmitted or received over a network 226 via the network interface device 220.
While machine-readable medium 222 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 224). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 224) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.
Search module 302, when executed, may call on user interface module 308 to generate for display a user interface to a user (e.g., by way of visual interface 210 of client device 110). The user interface may accept search terms, such as any of, or a combination of, a selection of a defense attorney, a selection of a document, a selection of a filter (to be described in further detail with respect to
Processor 202 may generate for display search results found by way of search module 302, and may cause the search results to be displayed to a user of client device 110 by executing user interface module 308. Search results may include identifiers of topics and/or documents corresponding to a selected topic. Again following the defense attorney example mentioned above, in response to detecting selection of a given case, user interface module 308 may generate for display, as part of the search results, an indication that there is a 67% chance that a workers compensation applicant is going to win a particular case, and a 33% chance that the respondent (e.g., the insurance company) will win the case. This determination of probabilities may be made by, e.g., heat map generation module 304, or any other predictive module. In addition to search results, user interface module 308 may generate for display at client device 110 a heat map that explains what parts of a given document led to a prediction (e.g., of defense attorney performance). The given document may be selected by a user of client device 110. For example, selection of a topic from search results indicating odds of success may result in a listing of documents that were used to inform the determination of the odds of success. Again following from the defense attorney example, user interface module 308 may detect selection of a selectable option that requests more information as to why the depicted probabilities are an accurate representation of the chances of each outcome. Heat map generation module 304 may be used, as described below, to highlight the portions of one or more documents that led to the prediction of probabilities.
In order to populate the heat map, processor 202 executes heat map generation module 304. Processor 202 may execute heat map generation module 304 based on a user request, or processor 202 may execute heat map generation module 304 automatically, either when a search result is populated, or when a user of client device 110 selects a document. Heat map generation module 304 generates a heat map (e.g., a scroll bar including different emphases or colorations) based on how influential a given portion of a document that corresponds to a point on, e.g., the scroll bar, was in causing the document to be part of the search result, or in causing a prediction to be made (e.g., a defense attorney score). The heat map, and how the heat map is generated, will be described in further detail with respect to
In some embodiments, as an alternative, or as an addition, to heat map generation module 304, processor 202 may execute highlight module 306. Highlight module 306 may highlight terms of a document that influenced the prediction made by search module 302 based on machine learning model 310. The highlight may be performed in different colors, or different levels of grayscale, based on a type of influence (e.g., positive or negative sentiment), based on a degree of influence (e.g., highly positive, or marginally positive), and based on any other factor. The term highlight may also refer to any other form of emphasis, such as bolding, italicizing, or any other manipulation of text that differentiates the text being highlighted from other text in the document. Further details about highlight module 306 will be described below with respect to
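A minimal sketch of how highlight module 306 might map signed term weights to emphasis classes is shown below. It assumes, hypothetically, that machine learning model 310 yields one signed weight per token (positive for positive influence, negative for negative influence); the thresholds, the class names, and the use of HTML `<mark>` tags are illustrative choices, not part of the disclosure.

```python
from typing import List, Optional


def classify_influence(weight: float, strong_threshold: float = 0.5,
                       min_threshold: float = 0.1) -> Optional[str]:
    """Map a signed influence weight to a hypothetical highlight class.

    Returns None when the term's influence is too small to highlight.
    """
    magnitude = abs(weight)
    if magnitude < min_threshold:
        return None
    polarity = "positive" if weight > 0 else "negative"  # type of influence
    degree = "highly" if magnitude >= strong_threshold else "marginally"  # degree
    return f"{degree}-{polarity}"


def highlight_terms(tokens: List[str], weights: List[float], **thresholds) -> str:
    """Wrap each sufficiently influential token in a <mark> tag with its class."""
    out = []
    for token, weight in zip(tokens, weights):
        cls = classify_influence(weight, **thresholds)
        out.append(token if cls is None else f'<mark class="{cls}">{token}</mark>')
    return " ".join(out)


print(highlight_terms(["applicant", "is", "not", "happy"], [0.0, 0.0, -0.3, -0.6]))
# applicant is <mark class="marginally-negative">not</mark> <mark class="highly-negative">happy</mark>
```

The same classes could instead drive bolding, italics, or grayscale levels; only the rendering of the class would change.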
The machine learning model may be trained using training data including portions of documents that are annotated with a corresponding category, and optionally, a degree to which the portions correspond to the category. For example, for a defense attorney performance machine learning model, the training data may indicate whether, and to what degree, certain words, strings of words, and the like indicate that the defense attorney is likely to succeed.
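One hypothetical way to represent such annotated training data is sketched below. The `AnnotatedPortion` record, the category labels, and the assumption that the optional degree lies in [0, 1] are illustrative; the example texts paraphrase the applicant-satisfaction examples used elsewhere in this disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotatedPortion:
    """One hypothetical training example: a text span plus its human label."""
    text: str
    category: str                   # e.g., "likely_to_succeed"
    degree: Optional[float] = None  # optional strength of the label (assumed in [0, 1])

    def __post_init__(self) -> None:
        # Reject degrees outside the assumed range early, before training.
        if self.degree is not None and not 0.0 <= self.degree <= 1.0:
            raise ValueError("degree must lie in [0, 1]")


training_data = [
    AnnotatedPortion("Applicant is recovering more quickly than expected.",
                     "likely_to_succeed", 0.8),
    AnnotatedPortion("Applicant is not happy.", "unlikely_to_succeed", 0.6),
    AnnotatedPortion("Hearing continued to next month.", "neutral"),  # no degree
]
```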
Following training, the machine learning model may take one or more documents as input. Search module 302 and/or heat map generation module 304 may select documents for input by searching for documents based on parameters of the search request. In the context of the defense attorney example, the documents may be selected based on their being drafted by the defense attorney, based on their being involved in a case that the defense attorney worked on, based on their informing an issue relating to a case for which the likelihood of success of a particular defense attorney is being determined, and so on. Optionally, the machine learning model may be trained to additionally take as input one or more parameters (e.g., attorney involvement, relation to surgery, etc.).
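The document-selection step might look like the following sketch, which keeps documents drafted by the attorney or tied to one of the attorney's cases. The `Document` fields and the `select_input_documents` helper are hypothetical, and a real search would likely draw on richer metadata.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    """Hypothetical document metadata used only for selection."""
    doc_id: str
    author: str
    case_ids: Set[str] = field(default_factory=set)


def select_input_documents(documents: List[Document], attorney: str,
                           attorney_cases: Set[str]) -> List[Document]:
    """Keep documents drafted by the attorney or tied to one of the attorney's cases."""
    return [d for d in documents
            if d.author == attorney or d.case_ids & attorney_cases]


docs = [Document("d1", "Smith", {"c1"}),
        Document("d2", "Jones", {"c2"}),
        Document("d3", "Lee", {"c9"})]
print([d.doc_id for d in select_input_documents(docs, "Smith", {"c1", "c2"})])
# ['d1', 'd2']
```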
The machine learning model may output, in addition to its determination of defense attorney performance, a weight for each relevant portion of the documents, indicating how heavily that portion informed the determination. Heat map generation module 304 and/or highlight module 306 may use the weights to determine what to highlight and how to generate the heat map for a given document. Optionally, where parameters were taken as input, the machine learning model may output one or more parameter-specific weights that indicate the degree to which a part of a document that informed the determination of defense attorney performance corresponds to the particular parameter. For example, where the parameter “surgery” was taken as input, the machine learning model may indicate which of the portions that informed the determination of defense attorney performance also correspond to surgery. Optionally, multiple machine learning models may be used: one dedicated to identifying portions of documents that inform a prediction, and one or more additional models dedicated to identifying the degree of influence of each identified portion, identifying which portions correspond to given parameters, and so on. While the disclosure is primarily described with respect to a machine learning model, the term encompasses any type of model, including classifiers, predictive models, and so on.
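To illustrate what per-portion and parameter-specific weights could look like, the sketch below uses a toy linear bag-of-words model in place of machine learning model 310. This substitution is an assumption made for clarity: in a linear model each portion's weight is simply the sum of its token weights, and the prediction is a logistic function of the total. Parameter-specific weights are approximated by keyword membership; the token weights and keyword sets are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def predict_with_weights(
    portions: List[str],
    token_weights: Dict[str, float],
    parameter_terms: Dict[str, Set[str]],
) -> Tuple[float, List[float], Dict[str, List[float]]]:
    """Score portions with a toy linear bag-of-words model (stand-in for model 310).

    Returns (probability, portion_weights, parameter_weights), where
    parameter_weights[param][i] is portion i's weight if that portion contains a
    term associated with param, and 0.0 otherwise.
    """
    token_sets = [set(text.lower().split()) for text in portions]
    # Each portion's weight is the sum of its tokens' weights.
    portion_weights = [sum(token_weights.get(t, 0.0) for t in text.lower().split())
                       for text in portions]
    # The prediction is a logistic function of the summed evidence.
    probability = 1.0 / (1.0 + math.exp(-sum(portion_weights)))
    parameter_weights = {
        param: [w if terms & tokens else 0.0
                for tokens, w in zip(token_sets, portion_weights)]
        for param, terms in parameter_terms.items()
    }
    return probability, portion_weights, parameter_weights


prob, weights, by_param = predict_with_weights(
    ["Applicant recovering quickly", "Applicant unhappy after surgery"],
    {"recovering": 1.0, "quickly": 0.5, "unhappy": -2.0},
    {"surgery": {"surgery"}},
)
print(round(prob, 3), weights, by_param)
# 0.378 [1.5, -2.0] {'surgery': [0.0, -2.0]}
```

For a nonlinear model, the per-portion weights would have to come from an attribution method rather than from this direct decomposition.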
In an embodiment, processor 202 may execute heat map generation module 304 and/or highlight module 306, which may, as described above, input a document into machine learning model 310. Machine learning model 310 outputs weights that inform a determination of which aspects of the document support an inference of applicant satisfaction or applicant dissatisfaction (which in turn support an inference of defense attorney performance). For example, heat map generation module 304 may highlight section 402 in a light color, such as green, because section 402 correlates to positive performance: it indicates applicant satisfaction, given that the applicant is recovering more quickly than expected. Heat map generation module 304 highlights section 404 in a dark color, such as red, because section 404 correlates to negative performance: it indicates applicant dissatisfaction, given that the applicant is not happy. While sections 402 and 404 are highlighted in light and dark colors, heat map generation module 304 and/or highlight module 306 may use any color or other form of shading, bordering, or other accentuation to indicate positive or negative performance.
Heat map generation module 304 may determine that different sections of the document correspond to different degrees of defense attorney performance (e.g., based on the weights output by the machine learning model), and may vary the depth of coloration, shading, or bordering to indicate sections that form a strong or weak prediction. For example, if a section forms a particularly strong inference that the defense attorney is performing poorly, then the section may be highlighted in deep red, or may be bordered by a solid black line. If a section forms a particularly weak inference that the defense attorney is performing poorly, then the section may be highlighted in faint red, or may be bordered by a thin, dashed line. The manner in which degrees of inference are highlighted may be commanded by a user of client device 110 (e.g., by way of a selection of settings on user interface 400), or may follow default settings programmed for the user by user interface module 308.
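The mapping from a section's weight to its visual treatment could be sketched as follows, assuming (hypothetically) that weights are signed and normalized against the largest absolute weight in the document. The green and red hues, the alpha-channel depth, the 0.6 strength threshold, and the CSS-style property names are illustrative choices.

```python
from typing import Dict


def section_style(weight: float, max_abs_weight: float,
                  strong_threshold: float = 0.6) -> Dict[str, str]:
    """Return hypothetical CSS-style properties for one document section.

    The sign selects the hue (green = positive performance, red = negative);
    the normalized magnitude drives color depth and whether the border is
    solid (strong inference) or thin and dashed (weak inference).
    """
    if max_abs_weight <= 0:
        raise ValueError("max_abs_weight must be positive")
    strength = min(abs(weight) / max_abs_weight, 1.0)
    red, green, blue = (0, 160, 0) if weight >= 0 else (200, 0, 0)
    return {
        "background": f"rgba({red}, {green}, {blue}, {strength:.2f})",
        "border": "2px solid black" if strength >= strong_threshold
                  else "1px dashed gray",
    }


print(section_style(-0.9, 1.0))  # deep red, solid border
print(section_style(-0.2, 1.0))  # faint red, dashed border
```

User-selected settings on user interface 400 could be passed in as different thresholds or colors without changing the function's structure.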
In addition to, or as an alternative to, highlighting, heat map generation module 304 may generate heat map 406. Heat map 406 is depicted as a scroll bar, where each point along the scroll bar corresponds to progress through a given document. For example, a point twenty-five percent of the way down the scroll bar may correspond to a page of the document that is twenty-five percent of the way through the document. The scroll bar is merely exemplary; the heat map may be depicted in any form, such as a direct background of the document. Heat map generation module 304 generates each portion of heat map 406 based on the degree to which the document sections that, in the aggregate, correspond to that portion form a positive or negative correlation. For example, section 408 of heat map 406 is white. This is because heat map generation module 304 determines that the corresponding part of the document, which is depicted in user interface 400, includes section 402, which forms an inference of positive performance, and also includes section 404, which forms an inference of negative performance. Heat map generation module 304 determines that sections 402 and 404 cancel each other out, and thus section 408 of heat map 406 does not, as a whole, add to an inference of either positive or negative defense attorney performance.
In some embodiments, user interface module 308 may generate more than two heat maps (e.g., scroll bars), where some heat maps reflect inferences of attorney performance, and other heat maps reflect areas of the document where text corresponding to a selectable option occurs.
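A sketch of how the scroll-bar heat map might aggregate section weights is shown below. It assumes, hypothetically, that each section is given as character offsets with a signed weight; each bin covers an equal fraction of the document, and a section's weight is split across bins in proportion to overlap, so positive and negative sections falling in the same bin cancel, as with section 408.

```python
from typing import List, Tuple


def scrollbar_heat_map(sections: List[Tuple[int, int, float]], doc_length: int,
                       num_bins: int) -> List[float]:
    """Aggregate signed section weights into equal-size scroll-bar bins.

    Each section is (start_offset, end_offset, weight). Bin i covers the span
    [i * doc_length / num_bins, (i + 1) * doc_length / num_bins), so a point 25%
    of the way down the bar maps to text 25% of the way through the document.
    """
    bins = [0.0] * num_bins
    bin_size = doc_length / num_bins
    for start, end, weight in sections:
        length = end - start
        if length <= 0:
            continue  # skip empty or malformed sections
        for i in range(num_bins):
            lo, hi = i * bin_size, (i + 1) * bin_size
            overlap = max(0.0, min(end, hi) - max(start, lo))
            # Split the section's weight in proportion to its overlap with bin i.
            bins[i] += weight * overlap / length
    return bins


# A positive and a negative section share the first quarter and cancel out.
print(scrollbar_heat_map([(0, 10, 1.0), (10, 20, -1.0), (50, 75, 0.5)], 100, 4))
# [0.0, 0.0, 0.5, 0.0]
```

A bin whose total is 0.0 would then be rendered white, and nonzero totals could be colored with the same sign and intensity rules used for highlighting.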
Selectable option 651 corresponds to a heat map for positive sentiment, selectable option 652 corresponds to a heat map for negative sentiment, selectable option 653 corresponds to a heat map for surgery, and selectable option 654 corresponds to a heat map for attorney involvement. Selectable option 653 is not checked off (i.e., not selected by a user), and thus no heat map for surgery is displayed as part of user interface 600. Selectable options 651, 652, and 654, however, are checked off, and thus heat maps 606, 607, and 620 are displayed as part of user interface 600, showing information corresponding to the selected options, respectively.
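The per-option heat maps could be built along the lines of the following sketch. It assumes hypothetical per-section sentiment weights and per-section tag sets; sentiment options keep only weights of one sign, while parameter options (e.g., surgery, attorney involvement) simply mark where matching text occurs. Option names are used as keys in place of reference numerals 651 to 654.

```python
from typing import Dict, List, Set


def build_heat_maps(weights: List[float], tags: List[Set[str]],
                    checked: Dict[str, bool]) -> Dict[str, List[float]]:
    """Build one per-section series for each checked option.

    "positive sentiment" keeps only positive weights, "negative sentiment"
    keeps only negative weights, and any other option marks sections whose
    (hypothetical) tag set contains that option with 1.0.
    """
    maps: Dict[str, List[float]] = {}
    for option, is_checked in checked.items():
        if not is_checked:
            continue  # unchecked options, like "surgery" here, get no heat map
        if option == "positive sentiment":
            maps[option] = [max(w, 0.0) for w in weights]
        elif option == "negative sentiment":
            maps[option] = [min(w, 0.0) for w in weights]
        else:
            maps[option] = [1.0 if option in t else 0.0 for t in tags]
    return maps


checked = {"positive sentiment": True, "negative sentiment": True,
           "surgery": False, "attorney involvement": True}
maps = build_heat_maps([0.4, -0.7, 0.0],
                       [set(), {"surgery"}, {"attorney involvement"}], checked)
print(sorted(maps))  # ['attorney involvement', 'negative sentiment', 'positive sentiment']
```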
Notably, as depicted in
The selectable options of user interface 650, rather than causing an additional heat map to display when selected, may act as filters. For example, if selectable option 654 is selected, then sentiment analysis as to whether a defense attorney is performing satisfactorily may be performed only on parts of a document where an attorney is involved. For instance, if a document indicates that a surgeon made a mistake when performing surgery in connection with a workers compensation injury, and selectable option 654 is selected, then because the surgeon's mistake has nothing to do with attorney involvement, heat map generation module 304 will not include negative highlighting or negative shading in heat map 607 based on the portion of the document discussing the mistake.
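In the filter mode, one plausible approach is to zero out the weight of any section that does not match the active filters before building the heat map. The sketch below stands in for the model's parameter correspondence with simple keyword tagging; `PARAMETER_KEYWORDS`, `tag_section`, and the rule that a section must match every active filter are all assumptions.

```python
from typing import Dict, List, Set

# Hypothetical keyword lists standing in for the model's parameter correspondence.
PARAMETER_KEYWORDS: Dict[str, Set[str]] = {
    "attorney involvement": {"attorney", "counsel"},
    "surgery": {"surgery", "surgeon", "surgical"},
}


def tag_section(text: str) -> Set[str]:
    """Return the parameters whose keywords appear in the section text."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    return {param for param, keywords in PARAMETER_KEYWORDS.items() if words & keywords}


def filtered_weights(sections: List[str], weights: List[float],
                     active_filters: Set[str]) -> List[float]:
    """Zero out sections that do not match every active filter.

    With no active filters, every weight is kept unchanged.
    """
    return [w if active_filters <= tag_section(text) else 0.0
            for text, w in zip(sections, weights)]


sections = ["The surgeon made a mistake during surgery.",
            "Defense counsel missed a filing deadline."]
print(filtered_weights(sections, [-0.8, -0.5], {"attorney involvement"}))
# [0.0, -0.5]
```

Requiring every active filter (a subset test) is one design choice; matching any active filter would instead use a non-empty intersection.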
In some embodiments, color intensity on a given heat map (e.g., heat map 607) may become more or less intense to show dimensions other than the intensity of a sentiment. For example, if the machine learning model correlates some text to an emotion of “annoyed,” and other text to an emotion of “angry,” then “annoyed” may be shaded a light red, and “angry” may be shaded a deep red. Any such dimension or level of intensity may be used to drive the depth of shading of the heat map. Furthermore, color assignments depend on perspective: if the same user interface 400 were used to analyze the performance of the applicant's attorney, rather than the defense attorney, then the colors may be inverted, as good defense attorney performance may correlate inversely with good applicant attorney performance.
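The intensity and perspective rules above might be expressed as in the sketch below. The numeric intensity levels for “annoyed” and “angry” are hypothetical (the disclosure says only that “angry” is shaded more deeply), and scores are assumed to be signed from the defense attorney's point of view.

```python
from typing import Dict, Tuple

# Hypothetical intensity levels; the disclosure says only that "angry"
# should be shaded more deeply than "annoyed".
EMOTION_INTENSITY: Dict[str, float] = {"annoyed": 0.35, "angry": 0.9}


def emotion_score(emotion: str) -> float:
    """Applicant dissatisfaction counts against the defense attorney."""
    return -EMOTION_INTENSITY[emotion]


def shade(score: float, perspective: str = "defense") -> Tuple[str, float]:
    """Return (hue, depth) for a score signed from the defense attorney's view.

    Analyzing the applicant's attorney inverts the sign, because good defense
    performance may correlate inversely with good applicant attorney performance.
    """
    if perspective == "applicant":
        score = -score
    elif perspective != "defense":
        raise ValueError(f"unknown perspective: {perspective}")
    hue = "green" if score > 0 else "red" if score < 0 else "white"
    return hue, min(abs(score), 1.0)


print(shade(emotion_score("annoyed")))             # ('red', 0.35)
print(shade(emotion_score("angry")))               # ('red', 0.9)
print(shade(emotion_score("angry"), "applicant"))  # ('green', 0.9)
```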
The server inputs 708 at least the portions of the documents into a machine learning model (e.g., machine learning model 310). The server receives 710, as output from the machine learning model, the prediction. The server outputs 712 the prediction for display to the user (e.g., using user interface module 308, the output reaching client device 110 by way of network 120). The server receives 714 a request from the user to view a document that informed the prediction (e.g., the document depicted in user interface 400). The server generates for display 716 with the document a heat map (e.g., heat map 406) that indicates how parts of the document that are included in the portions of the documents that informed the prediction influenced the prediction. The document may be additionally highlighted using highlight module 306.
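Steps 708 through 716 can be sketched as a small server class, assuming a hypothetical model callable that returns a probability together with per-portion weights for each document. `PredictionServer`, its method names, the stub model, and the three-color rendering are illustrative only; in the described system the display path would go through user interface module 308.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model signature: portions per document in,
# (probability, per-portion weights per document) out.
Model = Callable[[Dict[str, List[str]]], Tuple[float, Dict[str, List[float]]]]


class PredictionServer:
    """Hypothetical server mirroring steps 708-716."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.weights: Dict[str, List[float]] = {}

    def predict(self, portions_by_doc: Dict[str, List[str]]) -> str:
        # Steps 708 and 710: input the portions and receive the prediction.
        probability, self.weights = self.model(portions_by_doc)
        # Step 712: output the prediction for display.
        return f"Predicted probability: {probability:.0%}"

    def heat_map_for(self, doc_id: str) -> List[str]:
        # Step 714: the user requests a document that informed the prediction.
        if doc_id not in self.weights:
            raise KeyError(f"{doc_id} did not inform the prediction")
        # Step 716: render each portion's weight as a heat-map cell.
        return ["green" if w > 0 else "red" if w < 0 else "white"
                for w in self.weights[doc_id]]


def stub_model(portions_by_doc: Dict[str, List[str]]) -> Tuple[float, Dict[str, List[float]]]:
    """Stand-in model returning fixed, illustrative outputs."""
    return 0.67, {"doc1": [0.5, -0.2, 0.0]}


server = PredictionServer(stub_model)
print(server.predict({"doc1": ["portion one", "portion two", "portion three"]}))
print(server.heat_map_for("doc1"))  # ['green', 'red', 'white']
```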
Advantages of the heat maps and highlighting discussed in this disclosure include an improved user interface that enables users to identify the sections of documents that influenced a determination (e.g., a determination of likelihood of success), as well as the degree of each section's influence. Moreover, users can easily isolate one or more specific parameters that influenced the determination and cause the heat map to reflect those parameters, to the exclusion of other parameters that may add noise to the analysis.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connects the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or as a server farm), while in other embodiments the processors may be distributed across a number of locations.
The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).
The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.
As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.
As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
In addition, the articles “a” and “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one, and the singular also includes the plural unless it is obvious that it is meant otherwise.
Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for generating a heat map through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
This application claims the benefit of U.S. Provisional Application No. 62/802,169, filed Feb. 6, 2019, which is incorporated by reference in its entirety.
Number | Date | Country
---|---|---
62802169 | Feb 2019 | US