The present disclosure relates generally to search functionality, and more specifically, to the integration of text analysis and searching of documents and other data objects.
Text analysis tools are often used to generate structured data (such as, for example, spreadsheets and structured business data employable in enterprise resource planning (ERP) systems) from unstructured data (such as word processing files, displayable electronic documents, and the like). While some worthwhile results from text analysis, such as the identification of key terms or phrases, often do not require any additional input beyond the document or text being analyzed, other results, such as the identification of entity instances (for example, dates, locations, names, and so on), are typically based on entity-specific rules that are made available to the text analysis function in addition to the documents being analyzed. In many cases, structured data is easier for both users and computer-based applications to utilize, given the added organization and context provided in structured data over its unstructured counterpart.
Search tools, generally speaking, facilitate the discovery and subsequent access of documents, business data objects, and other types of structured and unstructured data that are logically related to a particular search query. The use of these search tools often relieves a user of the burden of perusing each potential document or data object, one by one, in order to find data of interest. Typically, the usefulness of search tools increases as the number of potential documents and other data objects increases.
The present disclosure is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
The description that follows includes illustrative systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative embodiments. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the inventive subject matter. It will be evident, however, to those skilled in the art that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques have not been shown in detail.
At least some of the embodiments described herein provide various techniques for integrating text analysis and search functions via the use of tagging data (or, alternatively, data “tags”) associated with one or more documents or data objects of interest.
As is described in greater detail below, in one example, a plurality of documents, as well as search information comprising search terms for a search category, are accessed. As employed throughout this disclosure, documents may refer to document files or other data objects that may be the subject of a search operation. Those of the plurality of documents that include at least one of the search terms are identified. The identified documents are further analyzed (for example, by way of text analysis) to determine those of the identified documents that are logically associated with the search category. Each of the determined documents is then tagged with the search category, possibly including one or more search terms that apply to the particular document being tagged. Presuming a search request is received that indicates the search category, the documents that are tagged with the search category may then be returned in response to the search request. As a result, text analysis results may be employed to enhance the results of a search request or query. Other aspects of the embodiments discussed herein may be ascertained from the following detailed description.
Turning specifically to the enterprise application platform 112, web servers 124 and Application Program Interface (API) servers 125 are coupled to, and provide web and programmatic interfaces to, application servers 126. The application servers 126 are, in turn, shown to be coupled to one or more database servers 128 that may facilitate access to one or more databases 130. The web servers 124, Application Program Interface (API) servers 125, application servers 126, and database servers 128 may host cross-functional services 132. The application servers 126 may further host domain applications 134.
The cross-functional services 132 may provide user services and processes that utilize the enterprise application platform 112. For example, the cross-functional services 132 may provide portal services (e.g., web services), database services, and connectivity to the domain applications 134 for users that operate the client machine 116, the client/server machine 117, and the small device client machine 122. In addition, the cross-functional services 132 may provide an environment for delivering enhancements to existing applications and for integrating third party and legacy applications with existing cross-functional services 132 and domain applications 134. Further, while the system 110 shown in
The portal modules 240 may enable a single point of access to other cross-functional services 132 and domain applications 134 for the client machine 116, the small device client machine 122, and the client/server machine 117 of
The relational database modules 242 may provide support services for access to the database 130 (
The connector and messaging modules 244 may enable communication across different types of messaging systems that are utilized by the cross-functional services 132 and the domain applications 134 by providing a common messaging application processing interface. The connector and messaging modules 244 may enable asynchronous communication on the enterprise application platform 112.
The Application Program Interface (API) modules 246 may enable the development of service-based applications by exposing an interface to existing and new applications as services. Repositories may be included in the platform as a central place to find available services when building applications.
The development modules 248 may provide a development environment for the addition, integration, updating, and extension of software components on the enterprise application platform 112 without impacting existing cross-functional services 132 and domain applications 134.
Turning to the domain applications 134, the customer relationship management applications 250 may enable access to and facilitate collecting and storing of relevant personalized information from multiple data sources and business processes. Enterprise personnel that are tasked with developing a buyer into a long-term customer may utilize the customer relationship management applications 250 to provide assistance to the buyer throughout a customer engagement cycle.
Enterprise personnel may utilize the financial applications 252 and business processes to track and control financial transactions within the enterprise application platform 112. The financial applications 252 may facilitate the execution of operational, analytical, and collaborative tasks that are associated with financial management. Specifically, the financial applications 252 may enable the performance of tasks related to financial accountability, planning, forecasting, and managing the cost of finance.
The human resources applications 254 may be utilized by enterprise personnel and business processes to manage, deploy, and track enterprise personnel. Specifically, the human resources applications 254 may enable the analysis of human resource issues and facilitate human resource decisions based on real-time information.
The product life cycle management applications 256 may enable the management of a product throughout the life cycle of the product. For example, the product life cycle management applications 256 may enable collaborative engineering, custom product development, project management, asset management, and quality management among business partners.
The supply chain management applications 258 may enable monitoring of performances that are observed in supply chains. The supply chain management applications 258 may facilitate adherence to production plans and on-time delivery of products and services.
The third-party applications 260, as well as legacy applications 262, may be integrated with domain applications 134 and utilize cross-functional services 132 on the enterprise application platform 112.
The tagging module 302 may perform any of the functions related to the tagging of documents and other data objects, including the generation, storage, maintenance, and/or use of the tagging data. In some examples, the tagging module 302 may be a combination of multiple modules, each of which provides separate functionality regarding the tagging of data objects. The operations of the tagging module 302 as they pertain to the text analysis and search functions presented herein are discussed below.
The text analysis module 304 and the search module 306 provide the text analysis and search capabilities described more fully below with respect to documents and other data objects. More specifically, the text analysis module 304 may analyze the text of documents to determine whether they are logically associated with a given search category or term, and communicate with the tagging module 302 to tag the documents with information to be used in a document search. A document is logically associated with a search category or term when at least a portion of the content of the document describes or addresses at least one aspect of the search category or term. Accordingly, the search module 306 employs the tagging to perform searches based on queries provided by users or other applications.
The storage module 308 may facilitate the storage and retrieval of both the documents and the tagging data. One example of the storage module 308 is a relational database, but any other type of storage facility capable of performing the various storage and retrieval functions compatible with the various examples discussed below may also serve as the storage module 308.
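As a minimal sketch of the relational-database option mentioned above, the following Python uses the standard `sqlite3` module to hold documents and their tagging data in two tables. The table names, column names, helper functions, and sample text are hypothetical and are not taken from the disclosure; any storage facility offering equivalent storage and retrieval could be substituted.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create a hypothetical two-table store: documents and their tags."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS documents (
            doc_id TEXT PRIMARY KEY,
            body   TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS tags (
            doc_id   TEXT NOT NULL REFERENCES documents(doc_id),
            category TEXT NOT NULL,
            term     TEXT NOT NULL,
            PRIMARY KEY (doc_id, category, term)
        );
        """
    )
    return conn

def store_document(conn, doc_id, body):
    """Insert a document, replacing any earlier version with the same id."""
    conn.execute("INSERT OR REPLACE INTO documents VALUES (?, ?)", (doc_id, body))

def store_tag(conn, doc_id, category, term):
    """Record that a document is tagged with a category and a search term."""
    conn.execute("INSERT OR IGNORE INTO tags VALUES (?, ?, ?)", (doc_id, category, term))

def documents_for_category(conn, category):
    """Return the ids of all documents tagged with the given category."""
    rows = conn.execute(
        "SELECT DISTINCT doc_id FROM tags WHERE category = ? ORDER BY doc_id",
        (category,),
    )
    return [row[0] for row in rows]

conn = open_store()
store_document(conn, "512D", "Visit your local Ford dealer.")  # invented sample text
store_tag(conn, "512D", "Car", "Ford")
print(documents_for_category(conn, "Car"))
```

Keeping the tags in their own table, rather than in a column of the document row, allows one document to carry several category and term tags without changing the schema.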
The user interface module 310 may provide an end user access to the search functionality described in greater detail below. In addition, the user interface module 310 may provide other types of users, such as programmers, content managers, administrators, and the like, access to the tagging data, documents, data objects, and related information described below in other examples.
As shown in
analysis portion 401 and a search portion 411, showing generally how the two phases are integrated. In the method 400, a plurality of documents is accessed (operation 402). In some examples, a document may be any file or other data structure that includes text, including both structured and unstructured data, such as, for example, text files, word processing files, printable or displayable documents, spreadsheets, business records, and so on.
Search information is also accessed (operation 404). The search information may include or indicate a search category and associated search terms. In one example, the search category is a character string, word, term, phrase, or the like that may be subsequently used in a search request or query. In another example, the search terms may include specific examples or subcategories of the search category. For example, in examples discussed below in conjunction with
Each of the documents that include at least one of the search terms may be identified (operation 406). Continuing with the example of a “Car” search category, those documents that contain the search terms associated with the “Car” category, such as the car companies, or “makes,” mentioned above, may be identified. In an implementation, the identified documents are considered to be candidates for a text analysis phase to follow, as words or phrases in a document, while appearing to be equivalent to the search terms, may not be synonymous with the search terms when taken in context with other portions of the document. In other examples, other types of search terms, such as the country of origin of each make, may be included in the search terms and used to identify the candidate documents.
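The identification of candidate documents (operation 406) may be sketched as a plain term match, as in the following Python. The function name `find_candidates`, the sample documents, and the use of case-insensitive whole-word matching are assumptions made for illustration; the disclosure does not prescribe a particular matching technique.

```python
import re

def find_candidates(documents, search_terms):
    """Map each document id to the search terms that literally occur in it."""
    candidates = {}
    for doc_id, text in documents.items():
        hits = [
            term for term in search_terms
            # Whole-word, case-insensitive match; re.escape guards terms
            # containing characters such as "-" or ".".
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        ]
        if hits:
            candidates[doc_id] = hits
    return candidates

# Invented sample data for illustration only.
documents = {
    "doc1": "The Ford dealer opened a new showroom.",
    "doc2": "A recipe for apple pie.",
    "doc3": "Bush Furniture announced a sale.",
}
car_terms = ["Ford", "Chrysler", "Mercedes-Benz"]
print(find_candidates(documents, car_terms))  # {'doc1': ['Ford']}
```

A match at this stage only makes a document a candidate; whether the matched word actually carries the meaning of the search term is left to the later analysis.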
The identified documents may then be analyzed to determine those documents that are logically associated with the search category (operation 408). In one example, the analysis may at least include text analysis that takes as input the documents to be analyzed, as well as entity or search term candidates to direct the analysis, examples of which are provided below. Those identified documents that are found to be logically associated with the search category are then tagged with the search category (operation 410). In addition, each of the tagged documents may be tagged with the particular search term found in, or otherwise associated with, the document.
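Operations 408 and 410 may be sketched as follows, with the text analysis represented by a pluggable callable. The function `tag_documents`, the dictionary layout of the tags, and the deliberately naive `naive_analyzer` stand-in are hypothetical; an actual text analysis function would apply far richer rules.

```python
def tag_documents(candidates, documents, category, is_associated):
    """Tag the candidate documents that the analyzer confirms for a category.

    candidates:    {doc_id: [search terms matched in that document]}
    is_associated: callable(text, term, category) -> bool, standing in
                   for the text analysis function
    Returns {doc_id: {"category": ..., "terms": [...]}} for confirmed
    documents only.
    """
    tags = {}
    for doc_id, terms in candidates.items():
        confirmed = [t for t in terms if is_associated(documents[doc_id], t, category)]
        if confirmed:
            tags[doc_id] = {"category": category, "terms": confirmed}
    return tags

def naive_analyzer(text, term, category):
    """Placeholder analysis: accept a "Car" match only if "dealer" appears."""
    return category == "Car" and "dealer" in text.lower()

# Invented sample data for illustration only.
documents = {
    "doc1": "The Ford dealer opened a new showroom.",
    "doc2": "Ford was married in 1948.",
}
candidates = {"doc1": ["Ford"], "doc2": ["Ford"]}
tags = tag_documents(candidates, documents, "Car", naive_analyzer)
print(tags)  # {'doc1': {'category': 'Car', 'terms': ['Ford']}}
```

Passing the analyzer in as a parameter keeps the tagging step independent of whichever text analysis technique is ultimately used.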
As a result of the tagging and analysis functions 401, the data tags linked to, or associated with, the documents provide information that facilitates a more complete and focused search of the documents. To that end, in the search function 411, a search request including the search category may be received (operation 412). In response to the request, the tagged documents (i.e., those documents found to be logically associated with the search category) may be returned as results (operation 414).
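Once the tags exist, the search portion (operations 412 and 414) reduces to a lookup over the tagging data, which may be sketched as follows. The in-memory `tag_index` structure and the `search` function are hypothetical names chosen for this illustration.

```python
def search(tag_index, category):
    """Return the ids of documents tagged with the requested category, sorted."""
    return sorted(
        doc_id
        for doc_id, tags in tag_index.items()
        if any(tag["category"] == category for tag in tags)
    )

# Invented tagging data for illustration only.
tag_index = {
    "doc1": [{"category": "Car", "term": "Ford"}],
    "doc2": [{"category": "U.S. President", "term": "Ford"}],
    "doc3": [{"category": "Car", "term": "Chrysler"}],
}
print(search(tag_index, "Car"))             # ['doc1', 'doc3']
print(search(tag_index, "U.S. President"))  # ['doc2']
```

Because the query is answered from the tags rather than from the raw text, a document that merely contains the word "Ford" is returned only for the category with which it was found to be logically associated.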
The tagging and analysis portion 401 of the method 400 may be initiated in a number of ways. For example, the reception of a search query (operation 412) may cause the tagging and analysis portion 401 to begin, especially if the tagging and analysis portion 401 has not been performed previously for a search category referenced in the search query. In some implementations, the tagging and analysis portion 401 may also be performed on documents that have been changed, added to the system, or deleted from the system so that the tagging data associated with the current documents remains up-to-date.
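One way to realize query-triggered tagging is to run the tagging pass lazily and remember which categories have already been processed, as in the sketch below. The `TaggingCoordinator` class and its method names are hypothetical. Discarding a whole category when documents change is a simplification chosen for brevity; an implementation could instead re-analyze only the affected documents.

```python
class TaggingCoordinator:
    """Runs a tagging pass lazily, the first time a category is queried."""

    def __init__(self, run_tagging):
        # run_tagging: callable(category) -> iterable of tagged document ids
        self._run_tagging = run_tagging
        self._tagged = {}  # category -> set of tagged document ids

    def handle_query(self, category):
        """Tag on first use of a category, then answer from the stored tags."""
        if category not in self._tagged:
            self._tagged[category] = set(self._run_tagging(category))
        return sorted(self._tagged[category])

    def invalidate(self, category=None):
        """Forget stored results after documents are added, changed, or deleted."""
        if category is None:
            self._tagged.clear()
        else:
            self._tagged.pop(category, None)

calls = []  # records each time the (fake) tagging pass actually runs

def fake_tagging(category):
    calls.append(category)
    return {"doc1"} if category == "Car" else set()

coordinator = TaggingCoordinator(fake_tagging)
coordinator.handle_query("Car")   # first query: tagging pass runs
coordinator.handle_query("Car")   # answered from the stored tags
coordinator.invalidate("Car")     # e.g., a document was changed
coordinator.handle_query("Car")   # tagging pass runs again
print(calls)                      # ['Car', 'Car']
```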
While the operations of the method 400 of
In the method 500 of
Given the search object types 504A, 504B, those of the documents 502A-502H that are relevant for further text analysis are identified (operation 510 of
The resulting relevant documents 512, as described above, are depicted in
In one example, the entity instance candidates 514 may be data tags that are linked or otherwise associated with their respective relevant documents 512. Examples of the types of data tags that may be employed are provided in
The identification function 510 may be provided automatically in the tagging module 302 (
The relevant documents 512 and the entity instance candidates 514 are forwarded to a text analysis function (operation 520 of
For example, regarding the search category of “Car,” the term “Mercedes-Benz” appearing in the relevant document 512A may, in and of itself, indicate that a car is being referred to or discussed, and the presence of the words “model” and “Detroit” may provide further verification. In the relevant document 512E, the mere existence of the word “Chrysler” may be enough to indicate that a car is being discussed therein, emphasized by the inclusion of the phrase “Chrysler Corporation” in the document 512E.
As to the search category “U.S. President,” the presence of the term “Obama” in the relevant document 512B, possibly in conjunction with a reference to a crowd in Berlin, is likely sufficient to indicate that a U.S. president is being referenced. On the other hand, text analysis may determine that the appearance of the word “Bush” in conjunction with the term “Furniture” indicates that a furniture business is being discussed, as opposed to a U.S. president.
The presence of the term “Ford” in both relevant documents 512D and 512G, by contrast, is applicable at first glance to both the “Car” and “U.S. President” search categories. However, text analysis may determine that the presence of the term “dealer” adjacent to the word “Ford” in relevant document 512D indicates that “Ford” refers to the carmaker, and that relevant document 512D is thus logically associated with the “Car” search category, and not the “U.S. President” category. Conversely, the use of the term “Ford” in relation to a marriage in 1948, as the term appears in relevant document 512G, indicates that the relevant document 512G is more likely to be logically associated with the “U.S. President” category than the “Car” category.
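The kind of contextual disambiguation described for the term “Ford” may be illustrated with a simple cue-word count in a window around the term. This is only a rough stand-in for text analysis: the cue lists, the window size, and the function `best_category` are assumptions made for this sketch, it handles single-token terms only, and ties between categories are broken arbitrarily by dictionary order.

```python
import re

# Hypothetical cue words per category, invented for this sketch.
CONTEXT_CUES = {
    "Car": {"dealer", "model", "engine", "sedan"},
    "U.S. President": {"president", "election", "married", "congress"},
}

def best_category(text, term, cues=CONTEXT_CUES, window=5):
    """Pick the category whose cue words occur most often near `term`.

    Returns None when the term is absent or no cue word is found nearby.
    """
    words = re.findall(r"[\w-]+", text.lower())
    positions = [i for i, w in enumerate(words) if w == term.lower()]
    scores = {category: 0 for category in cues}
    for pos in positions:
        # Words within `window` tokens on either side of the term.
        nearby = words[max(0, pos - window): pos + window + 1]
        for category, cue_words in cues.items():
            scores[category] += sum(1 for w in nearby if w in cue_words)
    top = max(scores, key=scores.get)
    return top if scores[top] > 0 else None

# Invented sample sentences for illustration only.
print(best_category("The Ford dealer offered a discount.", "Ford"))  # Car
print(best_category("Ford was married in 1948.", "Ford"))            # U.S. President
print(best_category("Ford is a common surname.", "Ford"))            # None
```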
As a result of the text analysis operation 520, performed in at least one example by the text analysis module 304 (
In response to receiving the analyzed documents 522 and their corresponding identified entity instances 524, the tagging function 530 may tag each of the analyzed documents with the information in the identified entity instances 524, resulting in tagged documents 532A, 532B, 532D, 532E, and 532G illustrated in
As shown in
In reference to
As a result of the embodiments described above, a more accurate and focused search functionality may be provided due to the text analysis and associated tagging functions integrated with the search. For example, each of the search results 542 of
Further, as a result of the document tagging function 530 (
As discussed above, any or all of the document identification function 510, the text analysis function 520, and the document tagging function 530 may involve the tagging of one or more documents. Each of
In some examples, each of the tags 1201A, 1201B, and 1201C may be implemented as a data object separate from the one or more data objects associated with the tag 1201, as shown in
Depending on the type of tagging to be performed, more than one of the tagging formats 1200A, 1200B, and 1200C may be employed for a particular tag. For example, tagging a document file represented by a data object 1204 with the name of an author can be accomplished by any of tagging by value 1200A (by using the name of the author as a tag value 1202), tagging by type 1200B (by using the name of the author as a tag value 1205, and a tag type 1203 of “author”), and tagging by object 1200C (by using a tag 1201C to link the data object 1204 for the document with a second data object 1206 representing the author). In some implementations, the tagging module 302 (
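The three tagging formats may be sketched as small data classes, using the author-tagging example above. The class names, field names, and the author name “Jane Doe” are hypothetical; the identifiers “1204” and “1206” merely echo the reference numerals of the data objects discussed above.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class DataObject:
    object_id: str
    description: str

@dataclass(frozen=True)
class ValueTag:
    """Tagging by value: the tag carries only a value."""
    target: DataObject
    value: str

@dataclass(frozen=True)
class TypedTag:
    """Tagging by type: the tag carries a type and a value."""
    target: DataObject
    tag_type: str
    value: str

@dataclass(frozen=True)
class ObjectTag:
    """Tagging by object: the tag links the target to a second data object."""
    target: DataObject
    linked: DataObject

Tag = Union[ValueTag, TypedTag, ObjectTag]

def tag_label(tag: Tag) -> str:
    """Render a tag in a uniform, human-readable form."""
    if isinstance(tag, ValueTag):
        return tag.value
    if isinstance(tag, TypedTag):
        return f"{tag.tag_type}={tag.value}"
    return f"-> {tag.linked.object_id}"

document = DataObject("1204", "document file")
author = DataObject("1206", "author record for Jane Doe")

by_value = ValueTag(document, "Jane Doe")
by_type = TypedTag(document, "author", "Jane Doe")
by_object = ObjectTag(document, author)

for tag in (by_value, by_type, by_object):
    print(tag_label(tag))
```

The value-only form is the most compact, the typed form records what kind of value is meant, and the object form lets the author be a searchable data object in its own right.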
In the implementations described above, the tagging data is generated automatically by a computer-implemented process, such as the tagging module 302 (
In at least some embodiments discussed herein, the integration of text analysis and search functionality by way of using data tags may increase the efficiency and accuracy of a search function, as well as possibly improve the text analysis function, as discussed above with respect to the examples of
The machine is capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The example of the processing system 1300 includes a processor 1302 (for example, a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 1304 (for example, random access memory), and a static memory 1306 (for example, static random-access memory), which communicate with each other via a bus 1308. The processing system 1300 may further include a video display unit 1310 (for example, a plasma display, a liquid crystal display (LCD), or a cathode ray tube (CRT)). The processing system 1300 also includes an alphanumeric input device 1312 (for example, a keyboard), a user interface (UI) navigation device 1314 (for example, a mouse), a disk drive unit 1316, a signal generation device 1318 (for example, a speaker), and a network interface device 1320.
The disk drive unit 1316 (a type of non-volatile memory storage) includes a machine-readable medium 1322 on which is stored one or more sets of data structures and instructions 1324 (for example, software) embodying or utilized by any one or more of the methodologies or functions described herein. The data structures and instructions 1324 may also reside, completely or at least partially, within the main memory 1304, the static memory 1306, and/or within the processor 1302 during execution thereof by processing system 1300, with the main memory 1304 and processor 1302 also constituting machine-readable, tangible media.
The data structures and instructions 1324 may further be transmitted or received over a computer network 1350 via the network interface device 1320 utilizing any one of a number of well-known transfer protocols (for example, HyperText Transfer Protocol (HTTP)).
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (for example, code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (for example, the processing system 1300) or one or more hardware modules of a computer system (for example, a processor 1302 or a group of processors) may be configured by software (for example, an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may include dedicated circuitry or logic that is permanently configured (for example, as a special-purpose processor, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also include programmable logic or circuitry (for example, as encompassed within a general-purpose processor 1302 or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (for example, configured by software) may be driven by cost and time considerations.
Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (for example, hardwired) or temporarily configured (for example, programmed) to operate in a certain manner and/or to perform certain operations described herein. Considering embodiments in which hardware modules are temporarily configured (for example, programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules include a general-purpose processor 1302 that is configured using software, the general-purpose processor 1302 may be configured as respective different hardware modules at different times. Software may accordingly configure a processor 1302, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Modules can provide information to, and receive information from, other modules. For example, the described modules may be regarded as being communicatively coupled. Where multiples of such hardware modules exist contemporaneously, communications may be achieved through signal transmissions (such as, for example, over appropriate circuits and buses) that connect the modules. In embodiments in which multiple modules are configured or instantiated at different times, communications between such modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple modules have access. For example, one module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further module may then, at a later time, access the memory device to retrieve and process the stored output. Modules may also initiate communications with input or output devices, and can operate on a resource (for example, a collection of information).
The various operations of example methods described herein may be performed, at least partially, by one or more processors 1302 that are temporarily configured (for example, by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors 1302 may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, include processor-implemented modules.
Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors 1302 or processor-implemented modules. The performance of certain of the operations may be distributed among the one or more processors 1302, not only residing within a single machine but deployed across a number of machines. In some example embodiments, the processors 1302 may be located in a single location (for example, within a home environment, within an office environment, or as a server farm), while in other embodiments, the processors 1302 may be distributed across a number of locations.
While the embodiments are described with reference to various implementations and exploitations, it will be understood that these embodiments are illustrative and that the scope of claims provided below is not limited to the embodiments described herein. In general, the techniques described herein may be implemented with facilities consistent with any hardware system or hardware systems defined herein. Many variations, modifications, additions, and improvements are possible.
Plural instances may be provided for components, operations, or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the claims. In general, structures and functionality presented as separate components in the exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the claims and their equivalents.