As provided for under 35 U.S.C. §120, this patent claims benefit of the filing date of the following U.S. patent application, herein incorporated by reference in its entirety:
“Methods and Apparatus for Identification and Analysis of Temporally Differing Corpora,” filed 2013 Mar. 15 (y/m/d), having inventors Jens Erik Tellefsen and Ranjeet Singh Bhatia, and application Ser. No. 13/836,416.
This application is related to the following U.S. patent application(s), which are herein incorporated by reference in their entirety:
“Graphical Representation of Frame Instances and Co-occurrences,” filed 2012 Nov. 13 (y/m/d), having inventor Michael Jacob Osofsky, and application Ser. No. 13/676,073 (“the '073 Application”);
“Methods and Apparatuses For Sentiment Analysis,” filed 2012 May 14 (y/m/d), having inventors Lisa Joy Rosner, Jens Erik Tellefsen, Michael Jacob Osofsky, Jonathan Spier, Ranjeet Singh Bhatia, Malcolm Arthur De Leo, and Karl Long, and application Ser. No. 13/471,417 (“the '417 Application”);
“Methods and Apparatuses for Clustered Storage of Information and Query Formulation,” filed 2011 Oct. 24 (y/m/d), having inventors Mark Edward Bowles, Jens Erik Tellefsen, and Ranjeet Singh Bhatia, and application Ser. No. 13/280,294 (“the '294 Application”); and
“Method and Apparatus for HealthCare Search,” filed 2010 May 30 (y/m/d), having inventors Jens Erik Tellefsen, Michael Jacob Osofsky, and Wei Li, and application Ser. No. 12/790,837 (“the '837 Application”).
Collectively, the above-listed related applications can be referred to herein as “the Related Applications.”
The present invention relates generally to the identification and analysis of differences between corpora, and more particularly to the identification and analysis of differences between corpora representing differing time intervals.
It is well known that tracking customer satisfaction is an important technique for sustained competitive advantage.
More recently, customers have been using online tools to express their opinions about a wide range of products and services. Many such online tools can be described as falling under the general category of “Social Media” (or SM). Online tools in this category include, but are not limited to, the following:
The availability of such SM content raises the question of whether, with appropriate technology, tools can be applied to identify and analyze changes in consumer opinion over time, as expressed in such data.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate several embodiments of the invention and, together with the description, serve to explain the principles of the invention:
Reference will now be made in detail to various embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
In addition to the Related Applications being incorporated by reference in their entirety, the description presented herein specifically relies on many of their sections. Please refer to Section 3.1 (“Referring to Related Applications”) for an explanation of a systematic notation for reference to them.
Please refer to Section 4 (“Glossary of Selected Terms”) for the definition of selected terms used below.
1 Brand Tracking and Analysis
2 Temporal Analysis
2.1 Introduction
2.2 Finding Lexical Units and/or Phrases
2.3 Visualization
2.4 Periodic Analysis
3 Additional Information
3.1 Referring To Related Applications
3.2 Additional Calculations
An example area of application, within which the present invention can be utilized, is the performance, by a brand manager, of brand tracking and analysis. A brand manager is defined herein as a person responsible for the continued success of a brand. Focused upon herein, as an example area of applicability, are branded products and/or services sold to consumers, also referred to herein as “consumer brands.” A brand manager needs to monitor consumer interest in, and sentiment toward, each consumer brand that constitutes an area of his or her responsibility.
A type of search to accomplish this, described in Section 2.1 of the '294 Application (“Consumer Sentiment Search”), is called a “consumer sentiment search.” The '294 Application describes the searching of a database that includes the collection, in a large scale and comprehensive way, of postings (such as “tweets” on Twitter) to social media (or SM) web sites or services. This type of social media inclusive database is referred to as an “SM_db,” and its basic elements are called documents (even though, in social media, an individual posting may be quite short). To create a suitably fast search tool, the '294 Application describes comprehensive pre-query scanning of the documents in an SM_db.
A consumer sentiment search is predicated upon an ability to identify, within an SM_db, the instances in which the particular brand of interest is being mentioned. Techniques for accomplishing this are addressed, in detail, in the '294 and '417 Applications. The '417 Application refers to this as producing an object-specific corpus, where the example “object” selected-for is a brand. Techniques presented, for production of an object-specific corpus, include an ability to recognize instances of frames, as well as an ability to utilize “exclude terms,” used for ruling out appearances of lexical units used for meanings other than identification of the brand.
In the '294 Application, similar sentiments towards a brand are grouped together, so that the user can see a list summarizing the kinds of sentiments expressed, along with an indication of the frequency with which each sentiment occurred. While this type of summarization does provide a relatively detailed analysis of consumer sentiment, at a lexical unit and/or phrase level, it provides no specific tools for understanding how such consumer sentiment can change over time.
The '417 Application presents further techniques (e.g., Section 1.3, “Sentiment Analysis”), by which the results of a consumer sentiment search can be analyzed. An embodiment is presented where the sentiment of each statement is evaluated along one or both of the following two (relatively independent) dimensions:
Metrics are presented for summarizing, in an aggregate (or net) way, a population of statements that have been evaluated according to at least one of these two dimensions. A metric is presented for each of polarity and intensity. They are referred to as, respectively:
Section 1.5 of the '417 Application (“Temporal Dimension”) discusses the fact that a temporal dimension can be added to the analysis of a consumer sentiment search. Sentiment analysis can be applied to a time-series of corpora (also referred to herein as a corpora time-series), each collected in essentially the same way, except for the following: each corpus of the time-series represents data that is linked to a different (and often successive) time interval. For example, in the case of an SM_db, each corpus of a corpora time-series can represent those items of social media posted (e.g., TWITTER Tweets posted) during the time interval represented by the corpus.
A tool for monitoring the volume of social media about a particular brand, also referred to herein as a “Volume Monitor,” can be useful to a brand manager.
To provide an example for further discussion below, it is assumed there is a brand manager (sometimes referred to herein as “bm_1”), with responsibility for a (fictitious) brand of fast-food eating establishments called “FastEats.”
Regarding the vertical volume axis, while AXv is shown as divided into 10 units, the actual volume represented by each unit can be set at any appropriate level by choosing a suitable value for scaling factor “v.”
2.1 Introduction
For purposes of example, it is assumed that FastEats specializes in providing breakfast and lunchtime foods, with lunchtime being the more important of the two. As can be seen in
While a Volume Monitor, like that of
Regarding Use Scenario 1, let's suppose, with respect to the Volume Monitor of
Regarding Use Scenario 2, we will assume a situation that is the same as is described above, with regard to Use Scenario 1, except that:
Under Use Scenario 2, the TPI is always the rightmost hour of Volume Monitor 100. In
While a Volume Monitor is used herein as an example context in which a brand manager may wish to utilize the present invention, it should be noted that Section 1.5 of the '417 Application also discusses a similar display for monitoring, over time, variation of the Net Polarity or Net Intensity Metrics.
While the above-described Volume and Polarity Monitors can be of great value to a brand manager, they are limited to representing aggregate or net changes between corpora. The present invention focuses upon identifying differences at the lexical unit and/or phrase level, between time-varying corpora.
For purposes of presenting the example of FastEats, in greater detail, the following SM_db, called SM_db_1, is defined:
As has been discussed, in the '294 and '417 Applications, an SM_db like SM_db_1 can comprise hierarchically-organized clusters of records, each cluster structured as follows:
To SentenceObj can be added another field, called (for example) “PostTime,” indexed according to the time at which each SentenceObj is posted. Thus, a suitably encoded representation of a time interval can be input to the index of PostTime, resulting in a retrieval of those SentenceObj's that occurred during the specified time interval.
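As a minimal sketch of the PostTime indexing just described, the following Python fragment keeps SentenceObj's sorted by posting time so that a time interval can be mapped to the set of SentenceObj's posted within it. All class and method names here are illustrative assumptions, not identifiers from the Related Applications, and a production SM_db would use a database index rather than an in-memory list.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class SentenceObj:
    text: str
    post_time: float  # e.g., a Unix timestamp for the "PostTime" field

@dataclass
class PostTimeIndex:
    """Keeps SentenceObj's sorted by PostTime for interval retrieval."""
    _times: list = field(default_factory=list)
    _objs: list = field(default_factory=list)

    def add(self, obj: SentenceObj):
        # Insert while preserving sort order on post_time.
        i = bisect.bisect_right(self._times, obj.post_time)
        self._times.insert(i, obj.post_time)
        self._objs.insert(i, obj)

    def query(self, start: float, end: float):
        """Return all SentenceObj's posted during [start, end]."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._objs[lo:hi]
```

A corpus such as corpusTPI could then be assembled by calling `query` with the encoded endpoints of the Time Period of Interest.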
Using appropriate queries of SM_db_1, an object-specific corpus can be produced. For purposes of the example, appropriate queries can be used to identify those statements discussing the FastEats brand and this object-specific corpus can be referred to herein as SM_db_FastEats.
Given the above-described SM_db_1 and SM_db_FastEats, a temporal analysis, at the lexical unit or phrase level, can be accomplished as follows. Section 2.2 below describes what is also called a “Stage 1” analysis. Section 2.3 discusses ways to visualize such Stage 1 results. Section 2.4 discusses a further analysis, that can be applied to the Stage 1 result, called Periodic Analysis.
2.2 Finding Lexical Units and/or Phrases
From the object-specific corpus of data (e.g., SM_db_FastEats), upon which the analysis is to be based, can be derived two main parts:
In accordance with Use Scenario 1, it is assumed bm_1 wishes to “drill down” into the reasons why the volume of social media posts, about FastEats, rises to a total of 650 during the hour of 10:01 AM-11:00 AM. These 650 posts can comprise the above-described corpusTPI, and can be referred to as SM_db_FastEatsTPI. A corpus such as SM_db_FastEatsTPI can be produced from SM_db_FastEats, for example, by applying appropriate queries to the above-described PostTime index. The posts occurring in the preceding 24 hours comprise the above-described corpusREF, and can be referred to as SM_db_FastEatsREF. With regard to
The brand manager's interest in the hour of 10:01 AM-11:00 AM can also apply to Use Scenario 2, by simply assuming that this hour is the most recent hour for which complete data is available. As each additional hour of information becomes available, the analysis described below, for the hour of 10:01 AM-11:00 AM, is repeated (but with the beginning of the applicable time periods shifting forward by one hour).
Using any suitable algorithm known in the art, the n-grams of corpusREF and corpusTPI can be determined. Typically, when determining the n-grams, n is in the following range: 1≤n≤4. An ordered list of the n-grams for corpusTPI and corpusREF can be referred to, respectively, as ngramsTPI and ngramsREF. Each of these ordered lists includes, along with each of its n-grams “ng,” the number of times the n-gram occurs in its corpus. For an n-gram ng, this number of times can be referred to as ng_no. For an n-gram ng of ngramsTPI, its ng_no can be referred to as ng_noTPI. For an n-gram ng of ngramsREF, its ng_no can be referred to as ng_noREF.
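The n-gram counting just described can be sketched as follows. This is one possible implementation, assuming each post has already been tokenized into lexical units; the function name and tokenized-input convention are assumptions for illustration only.

```python
from collections import Counter

def ngram_counts(corpus, max_n=4):
    """Count all n-grams, 1 <= n <= max_n, across a corpus of posts.

    `corpus` is an iterable of posts, each already tokenized into a
    list of lexical units. Returns a Counter mapping each n-gram
    (a tuple of tokens) to its ng_no, the number of times it occurs.
    """
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

# ngramsTPI and ngramsREF would each be produced by one such pass:
# ngrams_tpi = ngram_counts(corpus_tpi); ngrams_ref = ngram_counts(corpus_ref)
```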
For purposes of the example, the ordered lists ngramsTPI and ngramsREF can be referred to as, respectively, FastEats_ngramsTPI and FastEats_ngramsREF.
From these two tables, we know the following about, for example, the term “salmonella”:
Next, an ordered list of n-grams for further analysis is defined. This ordered list of n-grams can be referred to herein as ngramsFA. The n-grams for ngramsFA are chosen from among the n-grams of ngramsTPI and ngramsREF. Example possibilities, for n-grams to include in ngramsFA, comprise (but are not limited to) the following:
For purposes of the example, ngramsFA can be referred to as FastEats_ngramsFA.
Because the number of n-grams in ngramsFA can be extremely large, it can be important to reduce its size. A heuristic, for decreasing the size of ngramsFA, is to keep only those n-grams having above a certain threshold number of occurrences. Another approach is to list the n-grams of ngramsFA in order of a decreasing number of occurrences. The number of occurrences used for this ordering can be based upon ng_noTPI, ng_noREF, or (ng_noTPI+ng_noREF). Only a predetermined number of n-grams, when starting from the beginning of ngramsFA and proceeding in order of a decreasing number of occurrences, can be kept for further processing. For purposes of the example FastEats_ngramsFA, it may be appropriate to keep only the first 500 n-grams for further processing. For purposes of further processing of the example, it will be assumed that the three n-grams of
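The two size-reduction heuristics described above (an occurrence threshold, and keeping only a predetermined number of the most frequent n-grams) can be sketched as one function. The function name and parameters are illustrative assumptions; the `key` parameter stands in for the choice among ng_noTPI, ng_noREF, or their sum.

```python
def reduce_ngrams(counts, min_occurrences=None, keep_top=None, key=None):
    """Reduce an n-gram list using the heuristics described above.

    `counts` maps each n-gram to a number of occurrences. `key`
    selects the value used for thresholding and ordering (defaults
    to the count itself). `min_occurrences` applies the threshold
    heuristic; `keep_top` keeps only the first `keep_top` n-grams
    in order of decreasing occurrences.
    """
    key = key or (lambda ng: counts[ng])
    kept = [ng for ng in counts
            if min_occurrences is None or key(ng) >= min_occurrences]
    kept.sort(key=key, reverse=True)
    if keep_top is not None:
        kept = kept[:keep_top]
    return kept
```

For the FastEats example, keeping the first 500 n-grams would correspond to `reduce_ngrams(counts, keep_top=500)`.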
From the above-described data for an ngramsFA, the following additional data can be determined for, and also stored with, each n-gram “ng”:
To enhance the ability to meaningfully compare different ng_diff values, as calculated for different n-grams, the differences can be normalized using any appropriate technique. Normalization can be accomplished by dividing each ng_diff value by either of the two values from which it is calculated:
Regardless of how it is calculated, each normalized ng_diff can be referred to as ng_dn, and each can be stored with its n-gram in ngramsFA. It can also be useful to express ng_dn as a percentage, obtained by multiplying ng_dn by 100. For the example,
ng_diff/ng_avgREF
Another possibility for achieving normalization, if the corpusREF for an n-gram ng comprises i corpora, is to determine the standard deviation across such corpora. This can be accomplished as follows. For each of corpusREF(1), corpusREF(2), . . . corpusREF(i), the number of occurrences of n-gram ng can be determined. Respectively, each of these number-of-occurrences values can be represented as follows: ng_noREF(1), ng_noREF(2), . . . ng_noREF(i). The standard deviation of these i values can be calculated using any standard procedure, with the result represented by ng_sd. Normalization can then be calculated as follows:
ng_diff/ng_sd
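Both normalization options just described (dividing ng_diff by ng_avgREF, or by the standard deviation ng_sd across the reference corpora) can be sketched together. This is a minimal illustration, not code from the Related Applications; note that the document's later worked examples divide the sum of squared deviations by i, so the population standard deviation (`pstdev`) is assumed here.

```python
import statistics

def normalized_difference(ng_no_tpi, ref_counts, use_sd=False):
    """Compute ng_diff and its normalized form ng_dn for one n-gram.

    `ref_counts` is the list [ng_noREF(1), ..., ng_noREF(i)] of
    occurrence counts across the i reference corpora. With
    use_sd=False the normalizer is ng_avgREF; with use_sd=True it
    is the standard deviation ng_sd of the reference counts.
    """
    ng_avg_ref = sum(ref_counts) / len(ref_counts)
    ng_diff = ng_no_tpi - ng_avg_ref
    norm = statistics.pstdev(ref_counts) if use_sd else ng_avg_ref
    return ng_diff, ng_diff / norm
```

Multiplying the returned ng_dn by 100 yields the percentage form (ng_dn %) used in the examples.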
While, with regard to absolute difference values, the term “lunch” seems much more significant than the term “salmonella,” when normalization is applied, the term “salmonella” is seen to be undergoing a much larger change, with respect to posting activity of the recent past. Specifically, while “lunch” has a normalized difference value of 106%, “salmonella” has a value of 1180%.
When normalization is produced by dividing by ng_avgREF, a normalized difference value of about 100% indicates that the change in posting activity, for the TPI, is about the same size as total average posting activity has been in the past. Viewed in this way, a brand manager may still regard “lunch” as having shown an important change in activity during the TPI, even though the activity change is not as large as that for “salmonella.” In contrast, the term “dinner,” with an activity change of only 8%, may be regarded as having no significant activity change.
At this point, it can be useful to reduce the size of ngramsFA, either for purposes of displaying results to the user or for further processing. The heuristics described above, as options for reducing the size of the initial ngramsFA, can again be applied. Either or both of the same heuristics can be used, the difference being that an n-gram is kept or removed on the basis of its normalized difference value, rather than its number of occurrences:
With this second heuristic reduction of ngramsFA, a first major stage of temporal analysis, for identifying lexical units and/or phrases of interest, has been accomplished. In many cases, a “Stage 1” level of analysis may be all that is needed.
2.3 Visualization
Once a Stage 1 level of analysis has been completed, its results can be visualized for the user using any appropriate techniques.
An example approach is to use the “Word Cloud” visualization technique. Two example approaches to using a Word Cloud are described herein, each approach illustrated with the data of
The first approach is shown in
A second approach, to using a Word Cloud, is shown in
In
Of course, the use of a Word Cloud is only an example, and any suitable graphical technique can be used, so long as it can be made to vary in either (or both) of the following two ways:
2.4 Periodic Analysis
Periodic Analysis is an additional stage of processing, beyond the above-described Stage 1 (see Section 2.2, “Finding Lexical Units and/or Phrases”). The purpose of Periodic Analysis is to filter, from among the n-grams identified by Stage 1 as being of potential interest, those whose variation is merely periodic.
For example, Stage 1 processing on the example brand of FastEats identified, within the TPI of 10:01 AM-11:00 AM, the n-gram “lunch” as being of potential interest to the brand manager. However, since we know that FastEats focuses mainly on the lunchtime market, it should not be surprising that the term “lunch” is discussed with much greater volume near lunchtime.
Periodic Analysis asks a deeper question: is there a periodic component to the variability of the TPI, and can we compare the TPI to similar portions of other cycles? Once the component of the TPI's change that is only periodic is filtered out, the user can obtain a much more accurate understanding of the extent to which the particular TPI is really unusual.
With regard to the TPI of 10:01 AM-11:00 AM, Periodic Analysis asks: how does this hour-of-interest compare with the same hour on other days? Let us further suppose that the TPI, of the FastEats example, occurs on a Wednesday. There are at least two possibilities, in terms of comparing this same hour to other hours:
The particular period or periods to apply, when performing a sampling of other times to assemble a reference corpus (i.e., the time-series corpora of corpusREF(1), corpusREF(2), . . . corpusREF(i)) to compare with corpusTPI, will depend on the particular application area. For a brand that specializes in providing fast weekday lunches, 24 hour and 7 day periods can be applicable.
This is illustrated in
Let us assume that, upon sampling the same hour of 10:01 AM-11:00 AM, on the five Wednesdays that preceded the TPI, the following data is collected for the n-gram “lunch” with respect to the brand FastEats:
An average for this data, that can be referred to as ng_avg_sREF (where an “s” is added, to the ng_avgREF discussed above for Stage 1, to indicate that this average is produced from sampled data), is determined:
ng_avg_sREF=102
From ng_noTPI can be subtracted ng_avg_sREF, to produce a non-normalized difference value:
ng_diff_s=103−102=1
Dividing this ng_diff_s by ng_avg_sREF (or ng_noTPI), and multiplying by 100, to determine a normalized difference value as a % (called ng_dn_s %), shows a dramatic change from the ng_dn % determined in Stage 1:
Alternatively, the standard deviation (called ng_sd_s) can be determined from the sampled data:
ng_sd_s = √(156/5) ≈ 5.6
Dividing ng_diff_s by ng_sd_s produces the following value for ng_dn_s:
1/5.6=0.18
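The Periodic Analysis arithmetic above can be sketched as follows. The five sampled counts for “lunch” are not given in the text, so the values below are hypothetical, chosen only to be consistent with its stated aggregates (an average of 102 and a sum of squared deviations of 156); the function and variable names are likewise illustrative.

```python
import math

def periodic_scores(ng_no_tpi, sampled_counts):
    """Periodic Analysis for one n-gram.

    `sampled_counts` holds ng_no for the same hour on prior cycles
    (e.g., the five preceding Wednesdays). Returns (ng_dn_s_pct,
    ng_dn_s): the normalized difference as a percentage of
    ng_avg_sREF, and as a multiple of the standard deviation
    ng_sd_s (computed with the divide-by-n convention used above).
    """
    n = len(sampled_counts)
    ng_avg_s_ref = sum(sampled_counts) / n
    ng_diff_s = ng_no_tpi - ng_avg_s_ref
    ng_sd_s = math.sqrt(sum((c - ng_avg_s_ref) ** 2
                            for c in sampled_counts) / n)
    return 100 * ng_diff_s / ng_avg_s_ref, ng_diff_s / ng_sd_s

# Hypothetical counts consistent with the aggregates for "lunch":
pct, dn_s = periodic_scores(103, [91, 103, 105, 105, 106])
# dn_s ≈ 0.18, matching the calculation in the text.
```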
Even if the ng_dn_s (as produced from the standard deviation) were approximately equal to 1 (ng_dn_s≈1), this would still indicate that the number of occurrences for “lunch,” within the TPI, is within one standard deviation of the sampled average, and is probably of no interest to the brand manager. However, the fact that the ng_dn_s is relatively close to zero emphasizes that, in fact, the number of occurrences of “lunch,” for the TPI, is almost certainly of no interest.
Similar calculations, for n-grams “salmonella” and “dinner,” are shown in Section 3.2 below (“Additional Calculations”). The results of Periodic Analysis, for all 3 n-grams, are summarized in
As can be seen, the biggest change, from the Stage 1 analysis of
Regardless of how ng_dn_s is determined, the same heuristics described above, for reducing the size of ngramsFA at the end of Stage 1, can be utilized, if necessary, to reduce the number of n-grams presented to the user:
For presentation to the user, any visualization discussed above, with regard to the results of Stage 1, can be used to visualize the result of applying Periodic Analysis.
The Word Clouds of
3.1 Referring To Related Applications
In addition to the Related Applications being incorporated by reference in their entirety, the description presented herein specifically relies on many of their sections. A specific Related Application can be referred to as “the '123 Application,” where '123 is the last three digits of the Application Number of a Related Application. A specific section of a Related Application can also be referenced herein by using any of the following conventions:
3.2 Additional Calculations
Let us assume that, upon sampling the same hour of 10:01 AM-11:00 AM, on the five Wednesdays that preceded the TPI, the following data is collected for the n-gram “salmonella” with respect to the brand FastEats:
ng_avg_sREF for this data is:
ng_avg_sREF=3/5=0.6
From ng_noTPI can be subtracted ng_avg_sREF, to produce a non-normalized difference value:
ng_diff_s=8−0.6=7.4
Dividing this ng_diff_s by ng_avg_sREF or ng_noTPI, and multiplying by 100, to determine a normalized difference value as a % (called ng_dn_s %), shows little change from the ng_dn % determined in Stage 1:
ng_dn % = 1180%
ng_dn_s % = 1233%
Alternatively, the standard deviation (called ng_sd_s) can be determined from the sampled data:
ng_sd_s = √(1.2/5) ≈ 0.49
Dividing ng_diff_s by ng_sd_s produces the following value for ng_dn_s:
7.4/0.49=15.1
Let us assume that, upon sampling the same hour of 10:01 AM-11:00 AM, on the five Wednesdays that preceded the TPI, the following data is collected for the n-gram “dinner” with respect to the brand FastEats:
ng_avg_sREF for this data is:
ng_avg_sREF=125/5=25.0
From ng_noTPI can be subtracted ng_avg_sREF, to produce a non-normalized difference value:
ng_diff_s=27−25=2.0
Dividing this ng_diff_s by ng_avg_sREF or ng_noTPI, and multiplying by 100, to determine a normalized difference value as a % (called ng_dn_s %), shows no change from the ng_dn % determined in Stage 1:
ng_dn % = 8%
ng_dn_s % = 8%
Alternatively, the standard deviation (called ng_sd_s) can be determined from the sampled data:
ng_sd_s = √(58/5) ≈ 3.4
Dividing ng_diff_s by ng_sd_s produces the following value for ng_dn_s:
2.0/3.4=0.59
3.3 Frame-Based Search Engines (or FBSE's)
Section 4, '837 (“FBSE”) describes a Frame-Based Search Engine (or FBSE). This FBSE is a more generic form of the kind of search described herein in Section 1.2, '417 (“Consumer Sentiment Search”).
Section 4.2, '837 discusses frames as a form of concept representation (Section 4.2.1) and the use of frame extraction rules to produce instances of frames (Section 4.2.2). A pseudo-code format for frame extraction rules is presented in Section 6.2, '837 (“Frame Extraction Rules”).
Snippets are discussed in Section 6.4, '837.
Parts of the '837 Application are repeated, in this section, for convenience to the reader.
In general, a frame is a structure for representing a concept, wherein such concept is also referred to herein as a “frame concept.” A frame specifies a concept in terms of a set of “roles.” Any type of concept can be represented by a frame, as long as the concept can be meaningfully decomposed (or modeled), for the particular application, by a set of roles.
When a frame concept is detected, for a particular UNL (see the Glossary of Selected Terms below for a definition) in a corpus of natural language, a frame “instance” is created. The instance has, for each applicable frame role, a “role value” assigned. A role value represents a particular aspect of how the frame concept is being used by the UNL where the frame concept is detected.
Detection, of whether a frame concept is applicable, can be determined by a set of linguistic rules, each rule herein called a “frame extraction rule.” A set of frame extraction rules, that all relate to a particular frame, can be called the frame's “Rule Set.” Ideally, a frame's Rule Set is able to detect whenever the frame's frame concept is being used, and thereby produce a frame instance representing each particular use of the frame concept. “Frame extraction,” as used herein, refers to the utilization of a frame extraction rule to determine whether a frame is invoked by a UNL.
If a large corpus of interest (or “C_of_I”) is to be searched, such as a significant portion of the online information available on the Internet, a practical Frame-Based Search Engine (or FBSE) typically requires a large amount of pre-query scanning and indexing of the C_of_I. An overview of this process is discussed at Section 4.1, '837 and Section 4.3.2.1, '837. The basic steps of an FBSE are:
The above-described steps can be accomplished using, for example, the computing environment described in Section 3.4. Regarding ordering of the steps, Instance Generation is performed before the steps of Instance Merging or Instance Selection. Instance Merging and Instance Selection, however, can be performed in either order (or even concurrently), depending upon the particular application.
3.4 Computing Environment
Cloud 630 represents data, such as online opinion data, available via the Internet. Computer 610 can execute a web crawling program, such as Heritrix, that finds appropriate web pages and collects them in an input database 600. An alternative, or additional, route for collecting input database 600 is to use user-supplied data 631. For example, such user-supplied data 631 can include the following: any non-volatile media (e.g., a hard drive, CD-ROM or DVD), record-oriented databases (relational or otherwise), an Intranet or a document repository. A computer 611 can be used to process (e.g., reformat) such user-supplied data 631 for input database 600.
Computer 612 can perform the indexing needed for formation of an appropriate frame-based database (FBDB). FBDB's are discussed above (Section 3.3 “Frame-Based Search Engines (or FBSE's)”) and in the Related Applications. The indexing phase scans the input database for sentences that refer to an organizing frame (such as the “Sentiment” frame), produces a snippet around each such sentence and adds the snippet to the appropriate frame-based database.
Databases 620 and 621 represent, respectively, stable “snapshots” of databases 600 and 601. Databases 620 and 621 can provide stable databases that are available for searching, about an object of interest in a first corpus, in response to queries entered by a user at computer 633. Such user queries can travel over the Internet (indicated by cloud 632) to a web interfacing computer 614 that can also run a firewall program. Computer 613 can receive the user query, collect snippet and frame instance data from the contents of the appropriate FBDB (e.g., FBDB 621), and transmit the results back to computer 633 for display to the user. The results from computer 613 can also be stored in a database 602 that is private to the individual user. When it is desired to see the snippets, on which a graphical representation is based, FBDB 621 is available. If it is further desired to see the full documents, on which snippets are based, input database 620 is also available to the user.
In accordance with what is ordinarily known by those in the art, computers 610, 611, 612, 613, 614 and 633 contain computational hardware (e.g., integrated circuits), and programmable memories (volatile and/or non-volatile), of various types.
The kind of information described herein (such as data and/or instructions) can be stored on computer-readable media and/or in programmable memories, as computer-readable code devices embodied therein. A computer-readable code device can represent that portion of a memory in which a defined unit of information (such as a bit) can be stored and/or from which a defined unit of information can be retrieved.
While the invention has been described in conjunction with specific embodiments, such as the tracking of a brand by a brand manager, it is evident that many alternatives, modifications and variations will be apparent in light of the foregoing description. For example, the invention can be used to track any kind of statement (not just opinions) regarding any kind of object (not just brands) as made by any group of relevant persons (not just consumers). Accordingly, the invention is intended to embrace all such alternatives, modifications and variations, as well as those equivalents that fall within the spirit and scope of this description and its appended claims.