Various automatic natural language processing and natural language understanding applications process natural language text based upon recognizing noun phrases and verb phrases.
Some noun phrases consist of only a single word, while others are each made up of multiple words. Similarly, some verb phrases are made up of a single word, while others contain multiple words. Multiple-word verb phrases and multiple-word noun phrases are referred to collectively herein as “multi-word expressions” or “MWEs.”
Table 1 below contains a number of sample sentences in which noun MWEs are identified using bold formatting, and verb MWEs are identified by italic formatting. These sample sentences are typical of those taken from authority documents processed for compliance purposes.
a system and information integrity policy.
personal data.
compliance monitoring statistics to the Board of Directors and other critical stakeholders, as necessary.
out of scope systems from in scope systems.
The inventors have recognized that, while conventional dictionaries are reasonably complete in identifying single-word nouns and verbs that can be matched to sentences being analyzed, they include only a very small percentage of MWEs. Additionally, they have recognized that the particular MWEs used vary dramatically across different domains of writing.
In response to recognizing the above disadvantages of using conventional dictionaries as a basis for recognizing noun phrases and verb phrases that include MWEs, the inventors have conceived and reduced to practice a software and/or hardware facility that automatically identifies MWEs in input sentences (“the facility”).
In particular, the facility uses two or more constituent models that each employ a different approach to generate analysis results identifying MWEs that occur in an input sentence. The facility then uses a constituent model result evaluation module to derive overall MWE identification results from the results produced by the constituent models. In some embodiments, the constituent model result evaluation module is a logical ruleset applied by the facility. In some embodiments, the constituent model result evaluation module is a machine learning model, such as a reinforcement learning model, that is constructed, trained, and applied by the facility.
In some embodiments, one of the facility's constituent models is a transformer, such as a Bidirectional Encoder Representations from Transformers (“BERT”) transformer. In various embodiments, the facility uses the results of empirical testing to (1) select one among multiple transformer types, transformer training sets, and/or transformer configurations for exclusive use by the constituent model, or to (2) weight results produced by two or more different transformer types, transformer training sets, and/or transformer configurations to obtain a weighted aggregation for output by the transformer constituent model.
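As a rough illustration of option (2), results from several transformer configurations can be combined by a weighted vote over candidate spans. The following is a minimal sketch under stated assumptions: the function `aggregate_spans`, the span representation, the example weights, and the 0.5 threshold are all hypothetical and are not specified by this disclosure.

```python
from collections import defaultdict

# Hypothetical span representation: (start_token, end_token_exclusive, mwe_type).
Span = tuple[int, int, str]


def aggregate_spans(outputs: dict[str, set[Span]],
                    weights: dict[str, float],
                    threshold: float = 0.5) -> set[Span]:
    """Keep spans whose normalized weighted support meets the threshold."""
    total = sum(weights[name] for name in outputs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    support: dict[Span, float] = defaultdict(float)
    for name, spans in outputs.items():
        for span in spans:
            support[span] += weights[name]
    return {span for span, w in support.items() if w / total >= threshold}


# Assumed example: two transformer configurations with illustrative weights.
outputs = {
    "bert": {(1, 4, "noun"), (6, 9, "noun")},
    "albert": {(1, 4, "noun")},
}
weights = {"bert": 0.4, "albert": 0.6}
result = aggregate_spans(outputs, weights)
```

Raising or lowering the threshold trades precision against recall; in this sketch the weights would come from whatever empirical testing is performed.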
In some embodiments, one or more of the facility's constituent models implement linguistic analysis algorithms, such as noun chunking algorithms and/or verb chunking algorithms. In various embodiments, the facility uses the results of empirical testing to (1) select one among multiple linguistic analysis algorithms for exclusive use by this constituent model, or to (2) weight results produced by two or more different linguistic analysis algorithms to obtain a weighted aggregation for output by the constituent model.
In some embodiments, one of the facility's constituent models accesses a dynamic dictionary to identify in the input text MWEs that have entries in the dictionary. In some embodiments, the dictionary's MWE entries have been added to it in response to input from a linguist and/or a subject matter expert certifying that they are proper MWEs. In some embodiments, this dictionary matches the longest MWEs contained by both the input text and the dictionary.
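The longest-match behavior can be sketched as a greedy scan that tries the widest window first. This is an illustrative sketch only; the function name `longest_dictionary_matches`, the in-memory `set` standing in for the dynamic dictionary, the lowercase normalization, and the `max_len` window limit are assumptions.

```python
def longest_dictionary_matches(tokens: list[str], dictionary: set[str],
                               max_len: int = 6) -> list[tuple[int, int, str]]:
    """Greedy left-to-right scan keeping the longest multi-word entry at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the widest window first so longer entries win over their sub-phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in dictionary:
                found = (i, i + n, candidate)
                break
        if found:
            matches.append(found)
            i = found[1]  # continue after the matched span
        else:
            i += 1
    return matches


# Hypothetical dictionary entries and input text.
mwe_dictionary = {
    "compliance monitoring",
    "compliance monitoring statistics",
    "board of directors",
}
tokens = "Report compliance monitoring statistics to the Board of Directors".split()
matches = longest_dictionary_matches(tokens, mwe_dictionary)
```

Here the three-word entry is preferred over the shorter "compliance monitoring" entry because it is tried first at the same starting position.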
In some embodiments, the facility uses a constituent model result evaluation module to arbitrate among different results outputted by the different constituent models recognizing MWEs occurring in particular input text, as described in greater detail below. In some embodiments, the constituent model result evaluation module also specifies whether particular MWEs identified by certain constituent models should be automatically added to the dictionary, or should be queued for review by a linguist and/or subject matter expert for discretionary addition to the dictionary. This latter determination is often referred to herein as “nomination” of the MWEs recommended for addition to the dictionary.
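Because the arbitration logic is described here only in general terms, the following sketch uses a purely hypothetical ruleset to show the shape of the three outcomes: accept, add automatically, or queue for review. The model names, the two-vote threshold, and the function `arbitrate` are assumptions made for illustration and should not be read as the facility's actual rules.

```python
def arbitrate(results: dict[str, set[str]],
              dictionary_model: str = "dictionary"
              ) -> tuple[set[str], set[str], set[str]]:
    """Return (accepted, auto_add, review_queue) under an illustrative ruleset."""
    dictionary_hits = results.get(dictionary_model, set())
    others = {name: mwes for name, mwes in results.items() if name != dictionary_model}
    candidates = set().union(*others.values())

    accepted = set(dictionary_hits)  # dictionary matches are always accepted here
    auto_add: set[str] = set()
    review_queue: set[str] = set()
    for mwe in sorted(candidates - dictionary_hits):
        votes = sum(mwe in mwes for mwes in others.values())
        if votes >= 2 and votes == len(others):
            accepted.add(mwe)       # every non-dictionary model agrees
            auto_add.add(mwe)       # nominate for automatic addition
        elif votes >= 2:
            accepted.add(mwe)       # at least two models agree
            review_queue.add(mwe)   # nominate for expert review
    return accepted, auto_add, review_queue


# Assumed constituent model outputs for one input sentence.
results = {
    "dictionary": {"board of directors"},
    "transformer": {"board of directors", "compliance monitoring statistics",
                    "critical stakeholders"},
    "noun_chunk": {"compliance monitoring statistics", "critical stakeholders"},
    "albert": {"compliance monitoring statistics"},
}
accepted, auto_add, review_queue = arbitrate(results)
```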
In some embodiments, the facility is used to identify MWEs as part of natural language analysis performed by a compliance tool. Compliance tools facilitate an organization's adherence to rules of various kinds that govern its business, and the assessment (“auditing”) of that adherence. These rules are expressed in authority documents, which can include, for example: statutes, regulations, regulatory directives or guidance, contractual obligations, standards, auditing guidelines, safe harbors, best practice guidelines, vendor documentation, and procedures established by the organization for its own operation. In some cases, a compliance process involves some or all of the following phases: selecting and obtaining copies of a group of authority documents that applies to the organization; identifying the expressions of rules (“citations”) that occur in the authority documents; performing natural language understanding analysis of the citations to determine the rules (“mandates”) that they express; deduplicating the mandates across the group of authority documents (and within individual authority documents) to obtain “controls” (or “common controls”) that each represent a set of mandates that are equivalent, and are each linked to that set of mandates; constructing an audit questionnaire from the controls that efficiently covers compliance with all of the authority documents in the group; and using the established structure of citations, mandates, controls, and audit questions and answers to establish that the answers to audit questions demonstrate compliance with the authority documents in the group. In some cases, documents, citations, mandates, and/or controls are constructed with reference to data objects called “terms” that constitute dictionary entries for words or phrases occurring in those higher-level data objects.
By performing in some or all of the ways described above, the facility identifies a much more complete list of noun phrases and/or verb phrases quickly, using limited computing resources, enabling higher-quality natural language processing results.
Also, the facility improves the functioning of computer or other hardware, such as by reducing the dynamic display area, processing, storage, and/or data transmission resources needed to perform a certain task, thereby enabling the task to be performed by less capable, capacious, and/or expensive hardware devices, and/or be performed with less latency, and/or preserving more of the conserved resources for use in performing other tasks or additional instances of the same task. For example, the facility conserves processing and user interface resources that would have been applied to facilitating a person's manual analysis of entire documents to identify the contained MWEs. As a result, cheaper, less powerful devices can be substituted to achieve the same level of performance, or the same devices can be used with excess processing capacity remaining for performing additional desirable tasks.
In various embodiments, the facility's preprocessing of the input text includes:
Those skilled in the art will appreciate that the acts shown in
The operation of the facility's constituent models is discussed in greater detail below.
In some embodiments, for the transformer constituent model, the facility uses a BERT-Base multilingual cased model, which has been trained on 104 languages and has 12 layers, a hidden size of 768, 12 attention heads, and 110 M parameters. Additional details about configuring and using BERT are available in Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, 2018, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, which is herein incorporated by reference in its entirety. In some embodiments, the facility uses an ALBERT-xxlarge-v1 model with 12 repeating layers, an embedding size of 128, a hidden size of 4096, 64 attention heads, and 223 M parameters. Additional details about configuring and using ALBERT are provided by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut, 2019, ALBERT: A lite BERT for self-supervised learning of language representations, arXiv preprint arXiv:1909.11942, which is hereby incorporated by reference in its entirety. In some embodiments, the BERT model, ALBERT model, or both are trained using a DiMSUM data set. Details about this data set are provided by DiMSUM 2016 Shared Task Data, available at github.com/dimsum16/dimsum-data, which is hereby incorporated by reference in its entirety.
In various embodiments, the facility uses a noun chunk constituent model that uses some or all of the following bases for identifying noun MWEs: NC head count (the nouns in a sentence where the nouns are the root of the sentence); NC all count (all of the tokens with a POS tag NOUN); NC spacy count (all the noun chunks identified by a built-in noun chunk iterator in a Spacy tool); NC textacy count (Spacy noun chunks omitting determiners such as “a” or “the”); and/or NC noadj count (the Spacy noun chunks omitting determiners and adjectives). Details about the Spacy tool are provided by TextCategorizer, available at spacy.io/api/textcategorizer, which is hereby incorporated by reference in its entirety.
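Several of these bases amount to filtering tokens or chunks by part-of-speech tag. The sketch below assumes the tokens have already been tagged (for example by Spacy, whose `Doc.noun_chunks` property yields the base chunks); the hand-tagged sample, the `Token` alias, and the helper names are hypothetical.

```python
# Hypothetical token representation: (text, universal POS tag).
Token = tuple[str, str]


def all_nouns(tokens: list[Token]) -> list[str]:
    """'NC all'-style basis: every token tagged NOUN."""
    return [text for text, pos in tokens if pos == "NOUN"]


def strip_chunk(chunk: list[Token], drop: set[str]) -> str:
    """Remove tokens whose POS tag is in `drop` and join what remains."""
    return " ".join(text for text, pos in chunk if pos not in drop)


# Hand-tagged noun chunk used only for illustration.
chunk = [("a", "DET"), ("formal", "ADJ"), ("integrity", "NOUN"), ("policy", "NOUN")]

nouns = all_nouns(chunk)                          # every NOUN token
no_det = strip_chunk(chunk, {"DET"})              # 'NC textacy'-style: drop determiners
no_det_adj = strip_chunk(chunk, {"DET", "ADJ"})   # 'NC noadj'-style: also drop adjectives
```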
In some embodiments, the facility uses a verb chunk constituent model that uses some or all of the following bases for identifying verb MWEs: VC head count (the verbs in a sentence where the verbs are the root of the sentence); VC all count (all the tokens with a POS tag VERB); VC consecutive count (consecutive tokens with a POS tag VERB); VC immhead count (tokens that have dependency labels that are among XCOMP, ACOMP, AUXPASS, AUX, PRT, NEG, CONJ, ROOT and for which the corresponding POS tag of the head token is among VERB, NOUN, or PRONOUN); and/or VC head left right count (head verbs whose children have dependency labels among XCOMP, ACOMP, AUXPASS, AUX, PRT, NEG, CONJ, ROOT, with the entire phrase constructed from the leftmost child to the rightmost child).
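The consecutive-verb basis, for instance, reduces to finding runs of adjacent tokens tagged VERB. A minimal sketch, assuming pre-tagged `(text, POS)` pairs; the function name, the `min_len` parameter, and the hand-tagged sample sentence are illustrative assumptions.

```python
from itertools import groupby


def consecutive_verb_runs(tokens: list[tuple[str, str]], min_len: int = 2) -> list[str]:
    """Return runs of adjacent VERB-tagged tokens that are at least `min_len` long."""
    runs = []
    # groupby splits the sentence into alternating VERB and non-VERB stretches.
    for is_verb, group in groupby(tokens, key=lambda t: t[1] == "VERB"):
        words = [text for text, _ in group]
        if is_verb and len(words) >= min_len:
            runs.append(" ".join(words))
    return runs


# Hand-tagged sample used only for illustration.
tagged = [
    ("Managers", "NOUN"), ("help", "VERB"), ("ensure", "VERB"),
    ("compliance", "NOUN"), ("and", "CCONJ"), ("report", "VERB"),
    ("results", "NOUN"),
]
verb_mwes = consecutive_verb_runs(tagged)
```

With the default `min_len` of 2, single verbs such as "report" are excluded, since they are not multi-word expressions.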
In some embodiments, the dictionary constituent model acts on a version of the input text that is pre-processed in the NLP pipeline to include parts of speech.
In some embodiments, the dictionary constituent model processes each token identified by the NLP pipeline as a verb by looking that token up in the dictionary and verifying that the dictionary contains an entry for that word having the same part of speech, i.e., verb. The facility then performs word expansion by looking up associated broader terms that include the base verb and words that are adjacent to the left or right in the pre-processed input text until these broadened verbs no longer match input terms or overlap MWEs previously detected in the input text. The facility uses the broadest match in the dictionary. If the word or part of speech does not exist in the dictionary, then the facility skips detecting verb MWEs.
In some embodiments, for tokens that the NLP pipeline identifies as nouns, the facility looks each up in the dictionary and verifies that it contains an entry for that token having the same part of speech, i.e., noun. The facility performs word expansion by looking at the associated broader terms that include the base noun and words adjacent to the left or right until the broadened nouns no longer match the input terms, or overlap MWEs previously detected in the input text. The facility uses the broadest match in the dictionary. If the word or part of speech does not exist in the dictionary, then the facility skips detecting noun MWEs.
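The word expansion described in the two preceding paragraphs might look roughly like the following. This is a sketch under several assumptions: the dictionary is modeled as a plain mapping from lowercase term to part of speech, "broader terms" are taken to be multi-word entries containing the base word, and `expand_term` and its parameters are hypothetical names.

```python
from typing import Optional


def expand_term(tokens: list[str], index: int, pos: str,
                dictionary: dict[str, str],
                taken: set[int]) -> Optional[tuple[int, int]]:
    """Return the broadest (start, end) span around tokens[index] found in the dictionary."""
    base = tokens[index].lower()
    if dictionary.get(base) != pos:
        return None  # word or part of speech missing: skip detection for this token
    best = None
    for term, term_pos in dictionary.items():
        words = term.split()
        if term_pos != pos or len(words) < 2 or base not in words:
            continue
        # Try each alignment that places one of the term's words on the base token.
        for offset, word in enumerate(words):
            start, end = index - offset, index - offset + len(words)
            if word != base or start < 0 or end > len(tokens):
                continue
            window = [t.lower() for t in tokens[start:end]]
            overlaps = any(i in taken for i in range(start, end))
            if window == words and not overlaps:
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
    return best


# Hypothetical dictionary and input text.
dictionary = {
    "statistics": "noun",
    "monitoring statistics": "noun",
    "compliance monitoring statistics": "noun",
    "report": "verb",
}
tokens = "Report compliance monitoring statistics to the Board".split()
span = expand_term(tokens, 3, "noun", dictionary, taken=set())
```

The `taken` set holds token indices already covered by previously detected MWEs, so a broader term that would overlap them is rejected in favor of a narrower one.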
[Table: MWE identification results for the input text “Report compliance monitoring statistics to the Board of Directors and other critical stakeholders,”]
The sixth row of the table contains the ultimate result determined for this input text by the facility, as described below.
In act 205, the facility applies the constituent model result evaluation module to the output of the constituent models collected in act 204. This constituent model result evaluation module 360 is shown
In some embodiments, the constituent model result evaluation module applied by the facility is the following ruleset:
In various embodiments, the facility uses various other rulesets in act 205 to arbitrate among the constituent model outputs and suggest MWEs for addition to the dictionary.
In some embodiments, the constituent model result evaluation module applied by the facility in act 205 is a machine learning model. The facility's construction and application of this machine learning model are discussed in greater detail below in connection with
In act 206, from the application of the constituent model result evaluation module in act 205, the facility determines and outputs an overall result identifying the MWEs in the input text and specifying for each whether it is a noun MWE or a verb MWE. This overall result 361 is shown in
After act 207, the facility continues in act 201 to receive the next unit of input text.
In some embodiments, training a model involves evaluating the following three reward functions for each of the MWEs identified by each of the constituent models. A first reward function is based on whether the MWE identified by the constituent model matches an MWE identified in the overall result—that is, the tagging of the input sentence in the gold standard dataset. A second reward function is based on whether the MWE identified by the constituent model matches the MWE type—noun or verb—identified for this MWE in the overall result. A third reward function is based on whether the constituent model identified all of the MWEs identified in the overall result. In some embodiments, the facility provides an identifier for each MWE identified in any of the constituent model results, so that in its training the model can associate each set of reward function values with the identity of the MWE that it corresponds to. In some embodiments, these identifiers are word vectors, such as those described in Vectors, available at spacy.io/api/vectors, which is hereby incorporated by reference in its entirety. In some cases, the use of word vectors as MWE identifiers allows the model to assess the level of similarity between pairs of MWEs as a basis for treating similar MWEs similarly.
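The three reward functions can be illustrated with binary rewards computed against a gold-standard tagging. The 0/1 reward values, the mapping from MWE text to type, and the function name `rewards` are assumptions for this sketch; the disclosure does not specify the reward magnitudes.

```python
def rewards(model_result: dict[str, str],
            gold: dict[str, str]) -> dict[str, tuple[int, int, int]]:
    """For each MWE a constituent model identified, return (match, type_match, complete)."""
    # Third reward: did this model find every MWE in the gold-standard tagging?
    complete = int(set(gold) <= set(model_result))
    out = {}
    for mwe, mwe_type in model_result.items():
        match = int(mwe in gold)                           # first reward: MWE is in the gold result
        type_match = int(match and gold[mwe] == mwe_type)  # second reward: noun/verb type agrees
        out[mwe] = (match, type_match, complete)
    return out


# Hypothetical gold-standard tagging and one constituent model's output.
gold = {
    "compliance monitoring statistics": "noun",
    "board of directors": "noun",
}
model_result = {
    "compliance monitoring statistics": "noun",
    "board of directors": "verb",
    "critical stakeholders": "noun",
}
reward_values = rewards(model_result, gold)
```

Keying the rewards by MWE mirrors the idea of associating each set of reward values with an identifier for the MWE; a word vector could replace the string key in a fuller implementation.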
Through this training process, the facility establishes a state of the machine learning model that enables it to predict dependent variable values based upon independent variable values, as discussed below.
The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary to employ concepts of the various patents, applications and publications to provide yet further embodiments.
These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
This Application claims the benefit of U.S. application Ser. No. 17/460,054, filed Aug. 27, 2021 and entitled “AUTOMATICALLY IDENTIFYING MULTI-WORD EXPRESSIONS”, which claims the benefit of U.S. Provisional Application No. 63/073,323, filed Sep. 1, 2020 and entitled “IDENTIFYING MULTI-WORD EXPRESSIONS USING TRANSFORMERS,” and U.S. Provisional Application No. 63/071,180, filed Aug. 27, 2020 and entitled “IDENTIFICATION OF MULTI-WORD EXPRESSIONS USING TRANSFORMERS,” both of which are hereby incorporated by reference in their entirety. This application is related to the following applications, each of which is hereby incorporated by reference in its entirety: U.S. Provisional Patent Application No. 61/722,759 filed on Nov. 5, 2012; U.S. patent application Ser. No. 13/723,018 filed on Dec. 20, 2012 (now U.S. Pat. No. 9,009,197); U.S. patent application Ser. No. 13/952,212 filed on Jul. 26, 2013 (now U.S. Pat. No. 8,661,059); International Application No. PCT/US2013/068341 filed on Nov. 4, 2013; U.S. patent application Ser. No. 14/685,466 filed on Apr. 13, 2015 (now U.S. Pat. No. 9,996,608); U.S. patent application Ser. No. 15/794,405 filed on Oct. 26, 2017 (now U.S. Pat. No. 10,353,933); U.S. patent application Ser. No. 17/160,175 filed on Jan. 27, 2021; U.S. patent application Ser. No. 16/026,524 filed on Jul. 3, 2018 (now U.S. Pat. No. 10,896,211); U.S. patent application Ser. No. 16/432,634 filed on Jun. 5, 2019; U.S. patent application Ser. No. 16/432,737 filed on Jun. 5, 2019; U.S. Provisional Patent Application No. 62/150,237 filed on Apr. 20, 2015; U.S. patent application Ser. No. 14/963,063 filed on Dec. 8, 2015 (now U.S. Pat. No. 9,575,954); International Application No. PCT/US2016/026787 filed on Apr. 8, 2016; U.S. patent application Ser. No. 15/404,916 filed on Jan. 12, 2017 (now U.S. Pat. No. 9,977,775); U.S. patent application Ser. No. 15/957,764 filed on Apr. 19, 2018 (now U.S. Pat. No. 10,606,945); U.S. patent application Ser. No. 
16/459,385 filed on Jul. 1, 2019; U.S. patent application Ser. No. 17/397,693 filed on Aug. 9, 2021; U.S. patent application Ser. No. 16/459,412 filed on Jul. 1, 2019 (now U.S. Pat. No. 10,824,817); U.S. patent application Ser. No. 16/459,429 filed on Jul. 1, 2019 (now U.S. Pat. No. 10,769,379); U.S. patent application Ser. No. 16/932,609 filed on Jul. 17, 2020; U.S. Provisional Patent Application No. 63/223,879 filed on Jul. 20, 2021; and U.S. patent application Ser. No. 17/389,959 filed on Jul. 30, 2021. In cases where the present patent application conflicts with an application or other document incorporated herein by reference, the present application controls.
Number | Date | Country | |
---|---|---|---|
20230075614 A1 | Mar 2023 | US |
Number | Date | Country | |
---|---|---|---|
63073323 | Sep 2020 | US | |
63071180 | Aug 2020 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 17460054 | Aug 2021 | US |
Child | 17850772 | US |