The present disclosure relates to a method and apparatus for autonomously assimilating content using a machine learning algorithm.
In general, in the descriptions that follow, we will italicize the first occurrence of each special term of art that should be familiar to those skilled in the art of computer implemented algorithms. In addition, when we first introduce a term that we believe to be new or that we will use in a context that we believe to be new, we will bold the term and provide the definition that we intend to apply to that term.
Hereinafter, when we refer to a facility we mean a circuit or an associated set of circuits adapted to perform a particular function regardless of the physical layout of an embodiment thereof. Thus, the electronic elements comprising a given facility may be instantiated in the form of a hard macro adapted to be placed as a physically contiguous module, or in the form of a soft macro the elements of which may be distributed in any appropriate way that meets speed path requirements. In general, electronic systems comprise many different types of facilities, each adapted to perform specific functions in accordance with the intended capabilities of each system. Depending on the intended system application, the several facilities comprising the hardware platform may be integrated onto a single IC, or distributed across multiple ICs. Depending on cost and other known considerations, the electronic components, including the facility-instantiating IC(s), may be embodied in one or more single- or multi-chip packages. However, unless we expressly state to the contrary, we consider the form of instantiation of any facility that practices our disclosed embodiments as being purely a matter of design choice.
Shown in
In a typical embodiment, the mobile device 12 comprises a central processing unit (“CPU”) 22 and a memory facility 24 adapted to store, inter alia: an operating system (“OS”) 26; at least one application program (“App”) 28; and data 30 relating to the operation of the OS 26 and the App 28. An input/output facility 32, comprising a combination display screen and touch panel, facilitates real-time interaction with a user of the mobile device 12. A communication facility (“Comm”) 34, internally coupled to the CPU 22, is adapted to communicate wirelessly via the wireless channel 18 using any of the known wireless communication protocols. In general, the OS 26 can be any of the known mobile operating systems, e.g., the iOS system developed by Apple Inc., or the Android system developed by Google Inc.; or, in some embodiments, any of the known general purpose operating systems, e.g., Windows developed by Microsoft Corporation, Mac OS X developed by Apple Inc., or the UNIX operating system developed by AT&T Inc., including any of the several so-called xNIX variants of the open source Linux.
In most embodiments, the mobile device 12 includes at least one sensor 36, such as a solid-state camera, but may also include one or more microphones (not shown). In some embodiments, the mobile device 12 includes one or more sensors 36 adapted to sense, in real time, ambient environmental conditions, e.g., temperature, humidity, atmospheric pressure, geo-location, and the like. Further, as is known, the camera is well adapted to facilitate measurement of ambient light intensity, and the microphone is well adapted to facilitate measurement of ambient sound intensity. In such embodiments, the OS 26 facilitates communication by the App 28 with the several available sensors 36.
Shown in
Over the years, various attempts have been made to create a machine learning algorithm (“MLA”). However, most of these approaches have met with only limited success, usually as a result of the related projects being of only limited scope. One of the more successful projects of which we are aware was the Knowledge Graph, developed by Google LLC to enhance the performance of its search engine. See, Singhal, Amit, “Introducing the Knowledge Graph: Things, Not Strings”, Google Official Blog, 16 May 2012. An even more ambitious project, also by Google LLC, was the Knowledge Vault. See, Dong, Xin Luna, et al., “Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion”, KDD′14, 24-27 Aug. 2014, New York, N.Y., USA. We believe that Google LLC is still developing this technology, but are not presently aware of its current state of functionality.
In some graph databases, each knowledge assertion comprises a single Resource Description Framework (“RDF”) semantic triple, (s,p,o), wherein s is the subject of the assertion, p is the predicate, and o is the object. By way of example, we have illustrated in
In
With respect to all of the prior art systems of which we are aware, we have found none that attempt to infer, during the process of initially assimilating content, the relative difficulty an “average” user might experience in learning particular assertions derived from that content. Further, we are not aware of any such system that thereafter uses an MLA to further refine such a difficulty metric to better fit each particular user.
Therefore, in light of the foregoing, we submit that there exists a need to address, for example to overcome, the problem of presenting content to a user that is not appropriate to that user's intellectual abilities. Further, we submit that what is needed is a content discrimination method that is at least as efficient as, but more effective than, the known art.
In accordance with our disclosed embodiments, we provide a method for autonomously assimilating Content comprising an Assertion, using a Machine Learning Algorithm (“MLA”), characterized in that the method comprises configuring an electronic data processing facility to perform the steps of: adapting the MLA to Infer from the Assertion a Difficulty Metric; and associating the Difficulty Metric with the Assertion.
In accordance with yet another embodiment of the present disclosure, a computer system may be configured to practice our Content assimilation methods.
In accordance with still another embodiment of the present disclosure, a non-transitory computer readable medium may include executable instructions which, when executed in a processing system, cause the processing system to perform the steps of our Content assimilation methods.
Our disclosed embodiments may be more fully understood by a description of certain preferred embodiments in conjunction with the attached drawings in which:
In the drawings, similar elements will be similarly numbered whenever possible. However, this practice is simply for convenience of reference and to avoid unnecessary proliferation of numbers, and is not intended to imply or suggest that our disclosed embodiments require identity in either function or structure in the several embodiments.
For convenience of reference, we shall hereafter use the following capitalized terms:
In
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although we will disclose some modes of carrying out the present invention, those skilled in the art will recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
By way of example, let us consider a particular User. From one perspective, we can train our MLA to Infer the intellectual capacity required of a User to comprehend particular Content. For the purposes of our method, we denote this as the inherent, i.e., threshold, Cognitive Skill level required of a User for effective comprehension. Clearly, it would not be especially effective to deliver to this particular User Content that is above her Cognitive Skill level. From another perspective, we can train our MLA to Infer the intellectual capability of this User: below average; average; or above average. For the purposes of our method, we denote this as the inherent, i.e., threshold, Learning Capability of this User. Again, it would not be desirable to present to this particular User Content that is above her Learning Capacity. This then is one important goal of our method: to deliver to each User only Content that satisfies at least a selected one of these threshold conditions. Accordingly, in one mode of operation, our method will select only that Content that does not require greater Cognitive Skill than this User possesses. In one other mode of operation, our method will select only that Content that is within the Learning Capability of this User.
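By way of illustration only, the two modes of operation just described might be sketched as a simple threshold filter. This is a minimal sketch under stated assumptions, not a definitive implementation: the 0-to-1 numeric scales and the names `ContentItem`, `UserProfile` and `select_content` are hypothetical and are introduced here solely for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ContentItem:
    text: str
    # Hypothetical 0..1 scales; the disclosure does not fix any numeric range.
    required_cognitive_skill: float
    required_learning_capability: float


@dataclass(frozen=True)
class UserProfile:
    cognitive_skill: float
    learning_capability: float


def select_content(items: List[ContentItem], user: UserProfile,
                   mode: str = "cognitive_skill") -> List[ContentItem]:
    """Return only the items that satisfy the selected threshold condition."""
    if mode == "cognitive_skill":
        # Mode 1: Content must not require more Cognitive Skill than the User has.
        return [i for i in items
                if i.required_cognitive_skill <= user.cognitive_skill]
    if mode == "learning_capability":
        # Mode 2: Content must lie within the User's Learning Capability.
        return [i for i in items
                if i.required_learning_capability <= user.learning_capability]
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch the two thresholds are applied independently, which mirrors the statement that Content need satisfy only a selected one of the two conditions.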
In general, our disclosed embodiments provide a method for autonomously assimilating Content comprising one or more Assertions, using an MLA implemented in a data processing facility comprising:
In particular, our method comprises configuring this data processing facility to perform the steps of:
By way of example, let us consider a first Assertion, A1: “Barack Obama was born in Nairobi”, which can be represented in triple form as follows:
Before a User will be able to understand this Assertion, that User must first possess the intellectual capacity to understand at least the following predicates:
Let us now assume that our User is a child only three (3) years of age. In this case, it is doubtful that this User will have the intellectual capacity to understand any of these predicates. Depending on the culture within which this User is being reared, the age will vary at which understanding of all of these predicates can be assumed. It is, therefore, important that we train our MLA in such a way that its Inferences with respect to Cognitive Skill will be relatively imprecise or “fuzzy”, i.e., will be scaled or normalized as a function of the expected age distribution at which Users will attain the requisite Cognitive Skill level. With respect to each User, we expect that the MLA will be able to improve the Inference as a result of active feedback indicative of the reaction of the User to presentation of the Assertion. We are aware of several such feedback facilities, both biometric and query-response based, that appear to us to be appropriate for performing this function.
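One way to picture such a “fuzzy” Inference is as a probability rather than a fixed cut-off. The sketch below assumes, purely for illustration, that the age at which Users attain the requisite Cognitive Skill is normally distributed, and that feedback nudges a per-User estimate by a fixed step. The normal model, the learning rate, and the function names `comprehension_probability` and `refine_with_feedback` are assumptions of this sketch and are not drawn from our disclosed embodiments.

```python
import math


def comprehension_probability(age_years: float, mean_age: float,
                              std_age: float) -> float:
    """Normal CDF: expected fraction of Users who have attained the
    requisite skill by the given age (assumed normal age distribution)."""
    z = (age_years - mean_age) / (std_age * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def refine_with_feedback(prior: float, understood: bool,
                         learning_rate: float = 0.2) -> float:
    """Move the per-User estimate a fraction of the way toward the
    observed outcome (1.0 if the Assertion was understood, else 0.0)."""
    target = 1.0 if understood else 0.0
    return prior + learning_rate * (target - prior)
```

Under these assumptions, a three-year-old evaluated against a skill typically attained around age ten would receive a probability close to zero, and each item of feedback from that particular User would then shift the estimate up or down.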
In general, a human teacher who is privileged to engage with a human student in a face-to-face setting has a very significant advantage over any artificial facility. The reason is that humans begin to learn body language while still in the womb. By the time an “average” human reaches adulthood, he is more than capable of detecting and, more importantly, understanding even tiny changes in the demeanor of another human. So, after working only a few minutes with a new student, our theoretical teacher will often have already “received” sufficient “information” from observing the student's responses to his presentation to be able to adapt the manner of that presentation in ways that, based on his prior experience, will tend to improve the student's reception. One significant problem that an artificial facility must overcome is to learn sufficient human body language so as to be able to make decisions based only on electronically “perceived” demeanor. Although this challenge is indeed daunting, we believe that this problem will eventually be solved, perhaps not entirely, but sufficiently well to enable artificial teachers effectively to teach humans. We recognize, however, that there are some who believe otherwise. See, e.g., Narayanan, Arvind, “How to recognize AI snake oil”, Center for Information Technology Policy, Princeton University, https://www.cs.princeton.edu/~arvindn/talks/MIT-STS-AI-snakeoil.pdf
Let us now assume that our User is a young adult already twenty-one (21) years of age. Unfortunately, despite not having the same chronological problem as the child in our first example, this particular User is generally considered to be intellectually disabled (no disrespect intended). In this case, it is more likely than not that our MLA would have developed a Cognitive Skill Metric that is wholly inappropriate for this User. It is to cope with such cases that we also train our MLA to develop a Difficulty Metric as a function of the Learning Capacity of our anticipated Users. Clearly, the ability of each User to understand all of these predicates will vary greatly, depending on the mental faculties of that User. It is, therefore, important that we train our MLA in such a way that its Inferences with respect to Learning Capacity will also be relatively “fuzzy”, i.e., will be scaled as a function of the expected “intelligence” distribution at which Users will attain the requisite Learning Capacity level. With respect to each User, we expect that the MLA will be able to improve the Inference as a result of active feedback indicative of the reaction of the User to presentation of the Assertion.
Please note that, in each of the above examples, it was not necessary for our system to solicit, ab initio, any “personal information” from any User. Of course, for the training to be effective, the training set upon which we train our MLA must be carefully selected so as to fairly represent the distribution of expected Users with respect to both learning capacity and level of cognitive skills. Various prior art approaches exist for selecting such a training set.
Let us now consider another, more difficult, Assertion, A2: “Human blood is slightly basic”, which can be represented in triple form as follows:
Before a User will be able to understand Assertion A2, that User must first possess the intellectual capacity to understand at least the following predicates:
In
By way of example, we have added a fifth column to the Assertions_Table illustrated in
In
In one embodiment, we can use this same mechanism to concatenate multiple, semantically similar, s∥p∥o (where “∥” represents the “logical OR” function) values for storage in a single s_[ ], p_[ ] or o_[ ] field. For example, let's add a third Assertion: “President Obama attended Harvard Business School”, which can be represented in triple form as follows:
Note that our first Assertion, A1, shares the same subject as this third Assertion, but expressed using different, but semantically similar, words/phrases. Using our concatenation mechanism, our MLA can, upon detecting the semantic similarity, construct a single entry in the Source_Index table to store the indices of both the first and third Assertion, wherein the value stored in the first column (or field) looks something like this:
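As a minimal sketch of this concatenation mechanism, the grouping of semantically similar subjects into a single index entry might be expressed as follows. The function name `build_source_index`, the `canonical` lookup table (a simple stand-in for the MLA's detection of semantic similarity), and the predicate strings used in the usage note are hypothetical; they are assumptions of this sketch rather than the contents of the tables described above.

```python
from typing import Dict, List, Tuple

SEP = "∥"  # the "logical OR" separator used for concatenated values

Triple = Tuple[str, str, str]  # (s, p, o)


def build_source_index(assertions: Dict[int, Triple],
                       canonical: Dict[str, str]) -> Dict[str, List[int]]:
    """Map each concatenated subject field to the indices of the
    Assertions that share that (semantically similar) subject."""
    groups: Dict[str, Tuple[List[str], List[int]]] = {}
    for idx, (s, _p, _o) in sorted(assertions.items()):
        # Subjects with the same canonical key are treated as similar;
        # an unknown subject is its own key.
        key = canonical.get(s, s)
        forms, indices = groups.setdefault(key, ([], []))
        if s not in forms:
            forms.append(s)
        indices.append(idx)
    return {SEP.join(forms): indices for forms, indices in groups.values()}
```

Given the first and third Assertions with subjects “Barack Obama” and “President Obama” mapped to one canonical key, this sketch yields a single entry whose field reads `Barack Obama∥President Obama` and whose stored indices are 1 and 3.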
So, why do we believe it important to pre-assess the relative difficulty of particular content? Because curiosity is fragile and easily bruised. Imagine that the child in our first example (see, Para [0033], above) is six (6) years of age, and now able to pose the following query to our system (perhaps with some help from her older brother): “Is broccoli good for me?” How do you think this child would react if our MLA were to deliver, in response to this very simple question, something like this:
Having answered our young user's query as appropriately as it could under the circumstances (and decidedly better than did Google's search engine), our MLA can now, again, take advantage of our disclosed embodiments by enriching its answer. Let us assume, for this example, that our MLA, using known methods, determines that the IP address of this user is allocated to a service provider located in Canada, a place where lots of broccoli is grown but where tropical fruits are relatively rare. So, leveraging this collateral information, our MLA searches the Content database seeking Assertions of comparable semantic content and that have associated therewith comparable Difficulty Metrics. It then enriches the answer with the following: “ . . . but Kiwi fruits are also good for you.” The child has received a basic answer it is likely to understand, but, not being familiar with something strangely exotic called “Kiwi fruits”, is now tempted by the supplemented response to pose follow-on queries.
In a general sense, we believe that a User will tend to respond positively when new knowledge is presented in a form that is only moderately challenging, but will tend to respond negatively if that same fundamental knowledge is presented in a form that is perceived as threatening, overwhelming or daunting. We submit that the problem is not the knowledge per se, but rather the form in which that knowledge is presented. This requires our system to maintain (or dynamically construct) Content comprising semantically redundant forms of the same base knowledge. As we have described above, our Difficulty Metric acts as a filter such that the MLA tends to select between semantically equivalent forms of Content in a way that is more likely than currently known approaches to present a User with knowledge in a form more appropriate for her learning ability. Presented with relevant Content in a non-threatening form, our User is more likely than not to internalize at least some of the Content. When this happens, we will have accomplished our most fundamental goal of imparting new knowledge to another human.
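The filtering role of the Difficulty Metric described above can be sketched as a choice among semantically equivalent forms. The sketch assumes a hypothetical 0-to-1 difficulty scale and a small “stretch” margin to capture the idea of Content that is only moderately challenging; the name `choose_form`, the margin value and the fallback rule are assumptions made for illustration only.

```python
from typing import List, Tuple

Form = Tuple[str, float]  # (text, difficulty on a hypothetical 0..1 scale)


def choose_form(forms: List[Form], user_level: float,
                stretch: float = 0.1) -> Form:
    """Pick the most challenging form that stays within the User's level
    plus a small stretch margin; fall back to the easiest form otherwise."""
    if not forms:
        raise ValueError("at least one form is required")
    ceiling = user_level + stretch
    suitable = [f for f in forms if f[1] <= ceiling]
    if suitable:
        # Moderately challenging: the hardest form that is still suitable.
        return max(suitable, key=lambda f: f[1])
    # Nothing fits: present the least daunting form available.
    return min(forms, key=lambda f: f[1])
```

Choosing the hardest suitable form, rather than simply the easiest, is one possible design choice; it reflects the view that a User tends to respond positively to knowledge that is moderately challenging but not overwhelming.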
Embodiments of the present disclosure may reduce, and in some instances eliminate, the limitations in autonomous assimilation of Content by pre-assessing the level of understanding required of the User.
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions, such as “including”, “comprising”, “incorporating”, “have” and “is”, which we have used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. Reference to the one gender is intended also to comprehend the other gender.
Although we have described our disclosed embodiments in the context of particular embodiments, one of ordinary skill in this art will readily realize that many modifications may be made in such embodiments to adapt them to specific implementations. Thus it is apparent that we have provided a method and apparatus for autonomous assimilation of Content that, during the assimilation process, Infers Difficulty Metrics for that Content. Further, we submit that our method and apparatus provide performance generally superior to the best prior art techniques.
Number | Date | Country
---|---|---
63033458 | Jun 2020 | US