This invention pertains to semantics, and more particularly to attitudes represented in text.
While human beings have an intuitive ability to understand language, getting machines to understand language remains a complicated problem. The field of Artificial Intelligence has been working on understanding language for decades. But to date, systems have either a very limited ability to understand language in general or a significant ability to understand only a very specialized subset of language. For example, voice response systems tend to have fairly limited vocabularies: outside the words the systems are designed to understand, they are lost.
Textual analysis has fared no better than the understanding of spoken language. With textual analysis, the entirety of the text is present, usually in a form that leaves the system with no uncertainty about the specific letters being used (ignoring the problems of character recognition from scanned text). But systems still have difficulty understanding what the text represents. For example, the sentence “The old man the boats” can confuse systems that assume the word “old” is an adjective and the word “man” is a noun, when in fact “old” is a noun and “man” is a verb.
Aside from the problem of discerning the intended meaning of various words (words that can serve as different parts of speech, or words that are homophones, for example), systems have difficulty discerning the underlying context of text. People, when they write no less than when they speak, convey an attitude about a subject. But systems have difficulty understanding what the writer's attitude is.
A need remains for a way to address these and other problems associated with the prior art.
Machine 105 can include category set 130, word set 135, word identifier 140, signature generator 145, and word adder 150. Category set 130 includes a set of categories along which words can be aligned. For example, as shown in
Along each dimension (category), a word can take any value along the spectrum: this value represents the word's membership in the category. For example, along “content” dimension 205, a word can be either temporal or spatial in nature, and to any extent. Similarly, along “quality” dimension 210, a word can be dynamic or static in nature, and to any extent, and along “form” dimension 215, a word can be receiving or giving in nature, and to any extent. In general, each category is independent of the other categories, and therefore can be represented by a dimension that is orthogonal to the dimensions representing the other categories.
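As a minimal sketch of the three orthogonal dimensions described above, a word's category memberships can be held as a point with one coordinate per category. The class name `CategoryPoint` and the sign convention (the first-named pole of each dimension is the negative end, the second the positive end) are illustrative assumptions; any convention works as long as it is applied consistently to all words.

```python
from dataclasses import dataclass

# Hypothetical sign convention: for each dimension the first pole is the
# negative end and the second pole the positive end.
POLES = {
    "content": ("temporal", "spatial"),
    "quality": ("static", "dynamic"),
    "form": ("receiving", "giving"),
}


@dataclass(frozen=True)
class CategoryPoint:
    """A word's membership along each of the three orthogonal categories."""

    content: float
    quality: float
    form: float

    def leanings(self) -> dict:
        """Name the pole each coordinate leans toward ("neutral" at exactly 0)."""
        result = {}
        for dimension, (negative_pole, positive_pole) in POLES.items():
            value = getattr(self, dimension)
            if value > 0:
                result[dimension] = positive_pole
            elif value < 0:
                result[dimension] = negative_pole
            else:
                result[dimension] = "neutral"
        return result


# Hypothetical membership values for an illustrative word.
point = CategoryPoint(content=0.8, quality=-0.3, form=0.0)
print(point.leanings())  # {'content': 'spatial', 'quality': 'static', 'form': 'neutral'}
```

Because each coordinate is independent, a word can lean strongly along one dimension while sitting at the neutral point of another.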
Either end of the spectrum can be associated with positive or negative numerical values, provided the association is consistent for all words. While the numerical values a word can take along any dimension can be unbounded, keeping them within the range [−1, 1] has the advantage of keeping the values normalized. But a person of ordinary skill in the art will recognize that normalization can also occur after the membership values are assigned.
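A minimal sketch of normalizing after assignment, assuming the raw values for one dimension are collected first and then divided by their largest absolute value. The function name `normalize_dimension` and the sample words and values are hypothetical, and max-absolute scaling is only one possible choice: it maps every value into [−1, 1] without moving any word to the opposite end of the spectrum.

```python
def normalize_dimension(raw: dict) -> dict:
    """Scale raw membership values for one dimension into [-1, 1].

    Dividing by the largest absolute value keeps each word on the same side
    of zero (so the pole it leans toward is unchanged) and preserves the
    relative spacing between words.
    """
    largest = max((abs(v) for v in raw.values()), default=0.0)
    if largest == 0.0:
        return dict(raw)  # all zero (or empty): nothing to scale
    return {word: value / largest for word, value in raw.items()}


# Hypothetical unbounded "content" values assigned before normalization.
raw_content = {"yesterday": -4.0, "room": 2.0, "journey": -1.0}
print(normalize_dimension(raw_content))  # {'yesterday': -1.0, 'room': 0.5, 'journey': -0.25}
```

Min-max scaling would also land in [−1, 1], but it can shift a word across zero (here "journey" would move from −1.0 to 0), changing which end of the spectrum it leans toward.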
While it is possible to generate signatures for text based on the direct category membership values for words, another way to generate signatures is to identify communication types. A communication type is a combination of categories. For example, using just the “content” and “quality” dimensions 205 and 210, there are four communication types: one for each quadrant in the graph of the two dimensions. These communication types can be identified by the ends of the respective dimensions: “temporal static”, “temporal dynamic”, “spatial static”, and “spatial dynamic”. Words can then be assigned membership in these communication types, as shown in
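One way to derive graded communication-type memberships from the two dimensions is sketched below under stated assumptions: negative content is temporal and positive content spatial, negative quality is static and positive quality dynamic, and membership in a quadrant is the product of how far the word extends toward that quadrant's two poles. Neither the product rule nor the function name `communication_type_memberships` comes from the text; they only illustrate how a point in category space can be spread over the four quadrants.

```python
def communication_type_memberships(content: float, quality: float) -> dict:
    """Spread a word's content/quality position over the four quadrants.

    Assumed convention: content < 0 is temporal, content > 0 is spatial;
    quality < 0 is static, quality > 0 is dynamic. Membership in a quadrant
    is the product of the word's extent toward that quadrant's two poles
    (an illustrative rule, not one given in the text).
    """
    temporal, spatial = max(-content, 0.0), max(content, 0.0)
    static, dynamic = max(-quality, 0.0), max(quality, 0.0)
    return {
        "temporal static": temporal * static,
        "temporal dynamic": temporal * dynamic,
        "spatial static": spatial * static,
        "spatial dynamic": spatial * dynamic,
    }


memberships = communication_type_memberships(content=0.8, quality=-0.5)
print(max(memberships, key=memberships.get))  # spatial static
```

Under this rule a word lying exactly on an axis gets zero membership in every quadrant, so a real word set might instead assign communication-type memberships to each word directly.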
Examination of the word lists in
Each communication type can be assigned a symbol, such as symbols 310, 315, 320, and 325. These symbols can be used as a shorthand for the specific communication type. For example, a reference to symbol “A” (symbol 310) can be a shorthand reference for the “spatial static” communication type. This notation provides benefits in how the signature is generated, as discussed below with reference to
While the term “word” implies a single lexicographic token, a person of ordinary skill in the art will recognize that a “word” can be anything, including what might otherwise be considered multiple separate words. For example, the term “baseball bat” consists of two lexicographic words. But while each token of the term “baseball bat” might be a separate word, the images conveyed by each word separately differ from the image conveyed by the complete term. Thus, the meaning of the term “word” is not limited to concepts that can be represented with one word in the language.
Returning to
Word identifier 140 is responsible for identifying a word in some text: be it a block of text for which a signature is to be generated, or a document that is being searched. As discussed above, a “word” might include multiple tokens that could be considered separate words: word identifier 140 is responsible for identifying “words”, regardless of how many tokens might be used to represent the concept.
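A minimal sketch of a word identifier that honors multi-token words such as “baseball bat”, assuming the word set is available as a set of lowercase strings and that greedy longest-match is acceptable. The function `identify_words` and its `max_tokens` parameter are hypothetical, and tokenization here is a plain whitespace split with no punctuation handling.

```python
def identify_words(text: str, word_set: set, max_tokens: int = 3) -> list:
    """Split text into "words", preferring the longest multi-token match.

    Tokens are lowercased and split on whitespace. At each position the
    longest run of up to max_tokens tokens found in word_set is taken as one
    word; if no multi-token run matches, the single token is used.
    """
    tokens = text.lower().split()
    words, i = [], 0
    while i < len(tokens):
        for length in range(min(max_tokens, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + length])
            if length == 1 or candidate in word_set:
                words.append(candidate)
                i += length
                break
    return words


print(identify_words("He swung the baseball bat", {"baseball bat", "baseball", "bat"}))
# ['he', 'swung', 'the', 'baseball bat']
```

Greedy longest-match is simple but not always right: it commits to the longest known phrase even when a shorter split would better fit the sentence.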
Signature generator 145 is responsible for generating a signature for a block of text. Signature generator 145 is discussed further below with reference to
Word adder 150 is responsible for adding new words to word set 135. Word adder 150 is discussed further below with reference to
As an example of how signature generator 145 can calculate a signature for a block of text, consider three words w1, w2, and w3. Word w1 is assigned a membership value of 1 in communication type A (referring back to
The above examples express the signatures as additive over the combinations of communication types. A person of ordinary skill in the art will recognize that the values for the combinations of communication types can be combined using any mathematical operation, and that the use of addition is merely exemplary.
As can be seen from the example signatures, a communication type can be combined with itself. Thus, “AA” represents a valid communication type, as do “BB”, “CC”, and “DD”.
A cursory examination of the example signatures above, along with the algorithm used to calculate the signatures, shows that combinations of communication types are commutative. That is, there is no difference between “AB” and “BA” as communication types. But a person of ordinary skill in the art will recognize that if combinations of communication types are not commutative, then both combinations can be part of a signature.
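Since the text does not spell out the full signature calculation, the sketch below assumes a hypothetical rule: each pair of adjacent words contributes, for every communication type X of the first word and Y of the second, the product of the two memberships to the entry XY. It illustrates the three properties just described: contributions are folded in with a pluggable `combine` operation that defaults to addition, a type can pair with itself (“AA”), and keys are stored in sorted order so “AB” and “BA” share one entry. The function name `generate_signature` and the single-letter symbols are assumptions.

```python
import operator
from typing import Callable, Dict, List

Memberships = Dict[str, float]  # communication-type symbol -> membership value


def generate_signature(
    words: List[Memberships],
    combine: Callable[[float, float], float] = operator.add,
) -> Dict[str, float]:
    """Build a signature keyed by combinations of communication types.

    Hypothetical rule: every adjacent pair of words contributes, for each
    type X of the first word and Y of the second, the product of their
    memberships to the entry for the combination XY. Symbols are assumed to
    be single letters.
    """
    signature: Dict[str, float] = {}
    for first, second in zip(words, words[1:]):
        for x, x_value in first.items():
            for y, y_value in second.items():
                # Sorting the pair makes "AB" and "BA" the same key (commutative);
                # x == y yields self-combinations such as "AA".
                key = "".join(sorted(x + y))
                contribution = x_value * y_value
                if key in signature:
                    signature[key] = combine(signature[key], contribution)
                else:
                    signature[key] = contribution
    return signature


# Hypothetical memberships for three words.
words = [{"A": 1.0}, {"B": 1.0}, {"A": 0.5, "B": 0.5}]
print(generate_signature(words))               # {'AB': 1.5, 'BB': 0.5}
print(generate_signature(words, combine=max))  # {'AB': 1.0, 'BB': 0.5}
```

Dropping the `sorted` call gives the non-commutative variant, in which “AB” and “BA” become separate signature entries.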
Word adder 150 can include membership assigner 155. Membership assigner 155 can be used to assign new word 505 to one or more categories in category set 130, shown as new word membership 515. In embodiments of the invention using communication types as opposed to straight category memberships, membership assigner 155 can be used to assign new word 505 to communication types.
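A minimal sketch of a word adder with a membership assigner, assuming the word set is an in-memory mapping from each word to its category memberships. The `WordSet` class, the decision to store unspecified categories as 0, and the [−1, 1] range check (which follows the normalization option described earlier) are assumptions for illustration. The same class could hold communication-type memberships by passing communication-type symbols as the categories.

```python
from typing import Dict, Tuple


class WordSet:
    """Hypothetical in-memory word set: word -> {category: membership}."""

    def __init__(self, categories: Tuple[str, ...]):
        self.categories = categories
        self.entries: Dict[str, Dict[str, float]] = {}

    def add_word(self, word: str, membership: Dict[str, float]) -> None:
        """Membership assigner: validate and store a new word's memberships.

        Unknown categories and values outside [-1, 1] are rejected; any
        category left unspecified is stored as 0 (neutral).
        """
        unknown = set(membership) - set(self.categories)
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")
        for category, value in membership.items():
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{category} membership {value} is outside [-1, 1]")
        self.entries[word.lower()] = {c: membership.get(c, 0.0) for c in self.categories}


word_set = WordSet(("content", "quality", "form"))
word_set.add_word("Baseball Bat", {"content": 0.9, "quality": 0.4})
print(word_set.entries["baseball bat"])  # {'content': 0.9, 'quality': 0.4, 'form': 0.0}
```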
Attitude search engine 605 depends on knowing the attitudes of the documents in the corpus (set) being searched. If the signatures of any documents in the corpus are not known in advance, they can be calculated as discussed above, just as a signature can be calculated for any other block of text. The signatures can be stored anywhere: with the document (as metadata), separately from the document (for example, in a signature storage database), or cached locally on machine 105, among other possibilities.
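The storage options above can be sketched as a lazily filled cache: a signature is calculated the first time a document is requested and reused afterwards. The `SignatureCache` class and its `compute` callback are hypothetical; in practice `compute` would be whatever signature generator is in use, and the in-memory dictionary could be replaced by document metadata or a signature database.

```python
from typing import Callable, Dict

Signature = Dict[str, float]


class SignatureCache:
    """Hypothetical store that calculates a document's signature only once."""

    def __init__(self, compute: Callable[[str], Signature]):
        self._compute = compute          # whatever signature generator is in use
        self._store: Dict[str, Signature] = {}
        self.computations = 0            # number of signatures actually calculated

    def get(self, doc_id: str, text: str) -> Signature:
        """Return the stored signature, calculating it from text on a miss.

        On a hit, text is ignored: the stored signature is assumed current.
        """
        if doc_id not in self._store:
            self._store[doc_id] = self._compute(text)
            self.computations += 1
        return self._store[doc_id]


# Stand-in generator for illustration only: it just counts words.
cache = SignatureCache(lambda text: {"AA": float(len(text.split()))})
cache.get("doc-1", "one two three")
cache.get("doc-1", "one two three")
print(cache.computations)  # 1
```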
Attitude search engine 605 and document search engine 610 can be combined in any sequence. For example, document search engine 610 can be used to reduce the corpus of documents to a subset; the attitudes of this subset can then be searched using attitude search engine 605 to identify documents that have a similar attitude to a given signature. Alternatively, attitude search engine 605 can be used to identify documents in the corpus that have a similar attitude to a given signature, after which those documents can be searched using document search engine 610.
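The two orderings can be sketched with simple stand-ins for both engines, assuming each document carries its text and a stored signature. `text_search`, `attitude_search`, and the closeness helper `within` are hypothetical names, and substring matching and a per-coordinate tolerance stand in for whatever real search and closeness measures are used.

```python
from typing import Callable, Dict, List

Signature = Dict[str, float]
Document = dict  # hypothetical shape: {"id": int, "text": str, "signature": Signature}


def text_search(docs: List[Document], query: str) -> List[Document]:
    """Stand-in for document search engine 610: case-insensitive substring match."""
    return [d for d in docs if query.lower() in d["text"].lower()]


def attitude_search(
    docs: List[Document], target: Signature, is_close: Callable[[Signature, Signature], bool]
) -> List[Document]:
    """Stand-in for attitude search engine 605: keep documents with a close signature."""
    return [d for d in docs if is_close(d["signature"], target)]


def within(limit: float) -> Callable[[Signature, Signature], bool]:
    """Hypothetical closeness test: every coordinate differs by at most limit."""
    def is_close(sig: Signature, target: Signature) -> bool:
        keys = set(sig) | set(target)
        return all(abs(sig.get(k, 0.0) - target.get(k, 0.0)) <= limit for k in keys)
    return is_close


docs = [
    {"id": 1, "text": "The team won the game", "signature": {"AB": 1.5}},
    {"id": 2, "text": "The team lost the game", "signature": {"AB": 0.2}},
    {"id": 3, "text": "Weather report", "signature": {"AB": 1.4}},
]
target, close = {"AB": 1.5}, within(0.3)
text_first = attitude_search(text_search(docs, "team"), target, close)
attitude_first = text_search(attitude_search(docs, target, close), "team")
print([d["id"] for d in text_first], [d["id"] for d in attitude_first])  # [1] [1]
```

Because both steps only filter, either order returns the same documents when both filters are deterministic; the order mainly affects how much work the second step does, so the cheaper or more selective filter is usually run first.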
A person of ordinary skill in the art will recognize that a “document” can be anything that includes a set of words. For example, “documents” can include objects created by word processing programs, or a website on the Internet. But “documents” can also include objects that might not otherwise be considered documents: for example an image file that includes text that can be scanned, or a file that includes text as hidden metadata. In short, a “document” can include anything that has text that can be used to generate a signature.
Once a signature has been generated, it can be used to search a corpus of documents. As shown in
Alternatively, attitudes can be determined for the documents in the corpus, as shown in block 735. The attitudes can be used to filter the documents in the corpus as shown in block 740, after which the results can be searched for text, as shown in block 745.
A person of ordinary skill in the art will recognize that the blocks shown in
The above discussion does not define what is considered to be “close” to a given signature. Any desired measure of “closeness” can be used. For example, relative to a given signature, any other signature that is no more distant than some predetermined limit (either a fixed distance or a percentage) can be considered “close”.
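As one concrete closeness measure, the sketch below assumes Euclidean distance between signatures, treating a missing combination of communication types as 0, and supports both kinds of limit named above: a fixed distance, or a percentage of the given signature's own magnitude. The function names and the choice of Euclidean distance are assumptions; cosine similarity or any other metric could be substituted.

```python
import math
from typing import Dict, Optional

Signature = Dict[str, float]


def distance(a: Signature, b: Signature) -> float:
    """Euclidean distance between signatures; a missing combination counts as 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def is_close(
    candidate: Signature,
    target: Signature,
    *,
    fixed: Optional[float] = None,
    percent: Optional[float] = None,
) -> bool:
    """Close if within a fixed distance, or within a percentage of the
    target's magnitude (its distance from the all-zero signature)."""
    if (fixed is None) == (percent is None):
        raise ValueError("specify exactly one of fixed or percent")
    limit = fixed if fixed is not None else percent / 100.0 * distance(target, {})
    return distance(candidate, target) <= limit


target = {"AA": 3.0, "AB": 4.0}      # magnitude 5.0
candidate = {"AA": 3.0, "AB": 3.5}   # distance 0.5 from target
print(is_close(candidate, target, fixed=0.4), is_close(candidate, target, percent=20))  # False True
```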
While the above discussion suggests that signatures are generated using a source text, a person of ordinary skill in the art will recognize that signatures can be generated in other manners. For example, a user can manually enter a signature to use. Even more generally, a user can input a range of signatures. For example, a user can enter several signatures that represent the corners of an m-dimensional polytope: any signature within that polytope is considered to be within the range of the signatures. Alternatively, a user can enter ranges for the individual combinations of communication types, with any signature that satisfies the various coordinate ranges considered to be within the range of the signatures. Other sources for signatures, and other limits for ranges, can also be used.
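Per-coordinate ranges can be checked directly, as in this minimal sketch; `in_ranges` and `ranges_from_corners` are hypothetical helpers. Testing whether a signature lies inside the exact region enclosed by user-entered corner signatures needs a convex-hull test, so the second helper only derives the axis-aligned bounding box of the corners, which contains that region but may admit extra signatures.

```python
from typing import Dict, List, Tuple

Signature = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]


def in_ranges(signature: Signature, ranges: Ranges) -> bool:
    """True if every listed coordinate lies in its inclusive [low, high] range.

    Coordinates missing from the signature are treated as 0; coordinates
    with no listed range are unconstrained.
    """
    return all(low <= signature.get(key, 0.0) <= high for key, (low, high) in ranges.items())


def ranges_from_corners(corners: List[Signature]) -> Ranges:
    """Axis-aligned bounding box around user-entered corner signatures.

    The box contains the region the corners enclose but may be larger,
    so it is only a coarse prefilter for that region.
    """
    keys = set().union(*corners)
    return {
        k: (min(c.get(k, 0.0) for c in corners), max(c.get(k, 0.0) for c in corners))
        for k in keys
    }


ranges = {"AA": (0.5, 2.0), "AB": (0.0, 1.0)}
print(in_ranges({"AA": 1.2, "AB": 0.4}, ranges), in_ranges({"AA": 1.2, "AB": 1.5}, ranges))  # True False
```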
The following discussion is intended to provide a brief, general description of a suitable machine in which certain aspects of the invention may be implemented. Typically, the machine includes a system bus to which are attached processors, memory (e.g., random access memory (RAM), read-only memory (ROM), or other state-preserving media), storage devices, a video interface, and input/output interface ports. The machine may be controlled, at least in part, by input from conventional input devices, such as keyboards, mice, etc., as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signal. As used herein, the term “machine” is intended to broadly encompass a single machine, or a system of communicatively coupled machines or devices operating together. Exemplary machines include computing devices such as personal computers, workstations, servers, portable computers, handheld devices, telephones, tablets, etc., as well as transportation devices, such as private or public transportation, e.g., automobiles, trains, cabs, etc.
The machine may include embedded controllers, such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits, embedded computers, smart cards, and the like. The machine may utilize one or more connections to one or more remote machines, such as through a network interface, modem, or other communicative coupling. Machines may be interconnected by way of a physical and/or logical network, such as an intranet, the Internet, local area networks, wide area networks, etc. One skilled in the art will appreciate that network communication may utilize various wired and/or wireless short range or long range carriers and protocols, including radio frequency (RF), satellite, microwave, Institute of Electrical and Electronics Engineers (IEEE) 802.11, Bluetooth, optical, infrared, cable, laser, etc.
The invention may be described by reference to or in conjunction with associated data including functions, procedures, data structures, application programs, etc., which, when accessed by a machine, result in the machine performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, the volatile and/or non-volatile memory, e.g., RAM, ROM, etc., or in other storage devices and their associated storage media, including hard-drives, floppy-disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, etc. Associated data may be delivered over transmission environments, including the physical and/or logical network, in the form of packets, serial data, parallel data, propagated signals, etc., and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment, and stored locally and/or remotely for machine access.
Having described and illustrated the principles of the invention with reference to illustrated embodiments, it will be recognized that the illustrated embodiments may be modified in arrangement and detail without departing from such principles. And, though the foregoing discussion has focused on particular embodiments, other configurations are contemplated. In particular, even though expressions such as “in one embodiment” or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the invention to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments that are combinable into other embodiments.
Consequently, in view of the wide variety of permutations to the embodiments described herein, this detailed description and accompanying material is intended to be illustrative only, and should not be taken as limiting the scope of the invention. What is claimed as the invention, therefore, is all such modifications as may come within the scope and spirit of the following claims and equivalents thereto.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/487,620, filed May 18, 2011, which is herein incorporated by reference for all purposes.
Publication Number | Date | Country |
---|---|---|
20120296636 A1 | Nov 2012 | US |

Provisional Application Number | Date | Country |
---|---|---|
61487620 | May 2011 | US |