The present disclosure generally relates to the field of computer-assisted education. In particular, the disclosure relates to a method and system for classifying the progress of a subject, for example, a student or a job applicant, in solving a complex problem over an extended period of time, with the possibility of submitting multiple versions of the solution to a given problem.
In recent years, a large portion of the educational process has moved from physical brick-and-mortar classrooms to online resources such as virtual classrooms. With this advancement in the field of education, an important requirement that has consequently developed is that of testing the participating subjects. As a result, testing of subjects has been automated in different areas of education. Different kinds of tests are used nowadays. Some are short quizzes that teachers use to check the subjects' understanding of the material during live lectures. The simplest form of automated testing is the multiple-choice test. Other tests are designed to be more comprehensive and may require the subject to provide more than a simple answer, for example, to input each step of solving a mathematical, chemical, or physical problem, or to write computer source code. In some tests, subjects are expected to submit multiple versions of their solutions over time before solving the problem.
Conventional testing systems collect and display results of all subjects who took a particular test. However, these conventional systems fail to provide the analysis of the solutions provided by each subject. More specifically, the fact that a subject ultimately solved a problem with a 100% score does not necessarily mean that the subject understands the subject. There are numerous scenarios in which a subject can provide a complete solution for a problem without understanding the subject of the problem, for example, by copying from other subjects, receiving assistance from instructors, or using other resources on the Internet.
Currently, only an instructor in personal communication with the subject can provide a true assessment of the subject's knowledge of a subject by reviewing the subject's notes and all the steps the subject took while solving a complex problem. This manual approach may be subjective and requires a significant amount of time, making it prohibitively expensive in many cases.
A system and method are needed to allow for automated evaluation of a subject's progress in understanding a complex topic through the analysis of the evolution of the subject's solutions to a given problem submitted over time.
The present invention discloses a method for automatic classification of study progress for multiple subjects corresponding to a particular test with an option for a subject to submit multiple solutions to the test. Although a student is the typical subject of the method, the method may also be used for other subjects who are not students.
The present invention relies on the method of two-level clustering of digital representations of subject solutions to a given problem.
The first level is clustering of abstractions of subject solutions, for example, graphs, into solution clusters.
An example of an abstraction of a solution that is a software program is a control flow graph (CFG) which is the graphical representation of control flow or computation during the execution of programs or applications. Control flow graphs represent the flow inside of a program unit.
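By way of a non-limiting illustration, the following sketch shows one way a program-like solution might be reduced to a structural abstraction. It is not a full control flow graph: it assumes the solutions are Python source code and records only the control-flow constructs and their nesting depth. The function name `control_skeleton` and the choice of constructs in `CONTROL_NODES` are hypothetical.

```python
import ast

# Constructs treated as control-flow relevant (an illustrative choice).
CONTROL_NODES = (ast.FunctionDef, ast.If, ast.For, ast.While, ast.Try, ast.Return)


def control_skeleton(source: str) -> list[tuple[int, str]]:
    """Return (nesting depth, construct name) pairs in source order."""
    skeleton: list[tuple[int, str]] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, CONTROL_NODES):
                skeleton.append((depth, type(child).__name__))
                visit(child, depth + 1)  # constructs inside are one level deeper
            else:
                visit(child, depth)  # look through expressions and plain statements

    visit(ast.parse(source), 0)
    return skeleton


example_source = (
    "def total_positive(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            total += x\n"
    "    return total\n"
)
example_skeleton = control_skeleton(example_source)
```

Two solution versions that differ only in variable names or constants yield the same skeleton, which is the property a solution abstraction is intended to provide for the subsequent clustering.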
The second level is clustering of paths of subjects through solution clusters into behavioral clusters.
In an embodiment, such behavioral clusters are assigned labels reflecting certain behavior types indicating, for example, that subjects whose path through solution clusters fell into a given behavioral cluster successfully understood the subject, or obtained an external help in developing a solution, or lack understanding of the subject or at least one of its components.
The method steps include collecting at least one solution version developed by at least one subject in response to the test; automatically abstracting each collected solution into a solution abstraction, for example, into a graph; grouping at least one solution abstraction, for example, a graph, into at least one solution cluster based on similarity of solution abstractions, for example, graphs, using at least one solution clustering criteria; building a path for each individual subject, wherein each subject's path is an ordered sequence of at least one solution cluster to which at least one of the subject's solutions belongs; and grouping at least one path of at least one subject into at least one behavioral cluster based on at least one metric and at least one path clustering criteria.
In an alternative embodiment, the solution clustering criteria is based on at least one metric defined on the space of solution abstractions, for example, graphs.
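As a non-limiting example of such a metric, the sketch below computes the Jaccard distance between two graphs represented by their sets of directed edges. The disclosure does not prescribe a particular metric; the edge-set representation, the function name `jaccard_distance`, and the sample graphs are assumptions made for illustration only.

```python
Edge = tuple[str, str]


def jaccard_distance(edges_a: set[Edge], edges_b: set[Edge]) -> float:
    """Return 1 - |A & B| / |A | B|; 0.0 means identical edge sets."""
    union = edges_a | edges_b
    if not union:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - len(edges_a & edges_b) / len(union)


# Toy graphs: a loop-based solution, the same loop with an extra guard edge,
# and a branch-based solution that shares no edges with the loop.
loop_graph = {("entry", "loop"), ("loop", "body"), ("body", "loop"), ("loop", "exit")}
guarded_loop_graph = loop_graph | {("entry", "exit")}
branch_graph = {("entry", "test"), ("test", "then"), ("test", "else"),
                ("then", "exit"), ("else", "exit")}

near = jaccard_distance(loop_graph, guarded_loop_graph)  # 4 shared edges out of 5
far = jaccard_distance(loop_graph, branch_graph)         # no shared edges
```

Under this assumed metric, a small distance indicates that two solution versions follow nearly the same control structure, so they would be candidates for the same solution cluster.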
In an alternative embodiment, the path clustering criteria is based on at least one metric defined on the space of paths through solution abstraction clusters.
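As a non-limiting example of a metric on the space of paths, the sketch below computes the edit (Levenshtein) distance between two paths, each path being an ordered sequence of solution cluster identifiers. The choice of edit distance, the function name `path_edit_distance`, and the cluster identifiers are assumptions for illustration; other sequence metrics could serve equally well.

```python
from typing import Sequence


def path_edit_distance(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Minimum number of cluster insertions, deletions, or substitutions
    needed to turn path_a into path_b."""
    previous = list(range(len(path_b) + 1))  # distances from the empty prefix
    for i, cluster_a in enumerate(path_a, start=1):
        current = [i]
        for j, cluster_b in enumerate(path_b, start=1):
            substitution = previous[j - 1] + (cluster_a != cluster_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# A subject who skipped the intermediate cluster C2 is one edit away.
gradual = ["C1", "C2", "C3"]
direct = ["C1", "C3"]
skip_distance = path_edit_distance(gradual, direct)
```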
In an embodiment, at least one behavioral cluster is assigned at least one textual, color-coded, shape-coded or otherwise descriptive label, for example, ‘a subject copied the solutions and does not understand the subject’, ‘a subject does not understand the subject and writes wrong code’, ‘a subject applied a different approach or approaches to solve the problem’, ‘a subject used one approach to solve the problem’, ‘a subject stayed in cluster for too long’, ‘a subject moves from one cluster to another often’, or ‘most solutions of the subject lay in an incorrect cluster, but the correct solution is another less numerous cluster’.
In an embodiment, clustering of solution abstractions, for example, graphs, into solution clusters is performed using a machine learning artificial intelligence system trained on previously analyzed solution abstractions, for example, graphs.
In an embodiment, clustering of paths into behavioral clusters is performed using a machine learning artificial intelligence system trained on previously analyzed paths.
In an embodiment, clustering of solution abstractions into solution clusters is performed using an expert artificial intelligence system.
In an embodiment, clustering of paths into behavioral clusters is performed using an expert artificial intelligence system.
In an embodiment, the step of identification of the clusters includes at least one of a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm or a k-nearest neighbors (KNN) algorithm.
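By way of a non-limiting illustration, the following is a minimal, dependency-free sketch of DBSCAN operating on a precomputed matrix of pairwise distances. The parameter names `eps` and `min_samples` follow common usage and are not taken from the disclosure; the toy distances are derived from positions on a line purely so that the example is easy to follow.

```python
def dbscan(distances: list[list[float]], eps: float, min_samples: int) -> list[int]:
    """Label each item with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = -2, -1
    n = len(distances)
    labels = [unvisited] * n

    def neighbors(i: int) -> list[int]:
        # Every item within eps of item i, including i itself.
        return [j for j in range(n) if distances[i][j] <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] != unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be claimed as a border item
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster_id  # border item: joins but does not expand
            if labels[j] != unvisited:
                continue
            labels[j] = cluster_id
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core item, keep expanding
                queue.extend(reachable)
    return labels


# Toy data: two dense groups and one isolated item.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
matrix = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan(matrix, eps=0.15, min_samples=2)
```

Because the algorithm only needs pairwise distances, the same sketch applies at either level: to distances between solution abstractions or to distances between paths.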
In an embodiment, the method further comprises determining outliers not belonging to other clusters and classifying the outliers as individual clusters.
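As a non-limiting sketch of this step, the function below takes cluster labels in which outliers are marked with a noise value and gives each outlier its own cluster identifier. The convention that `-1` marks an outlier and the function name `promote_outliers` are assumptions for illustration.

```python
def promote_outliers(labels: list[int], noise_label: int = -1) -> list[int]:
    """Replace every noise label with a fresh, unique cluster id."""
    next_id = max((label for label in labels if label != noise_label), default=-1) + 1
    promoted = []
    for label in labels:
        if label == noise_label:
            promoted.append(next_id)  # the outlier becomes a singleton cluster
            next_id += 1
        else:
            promoted.append(label)
    return promoted


promoted = promote_outliers([0, 0, -1, 1, -1])
```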
The present invention discloses a method and system for classification of study progress of multiple subjects in solving a complex problem with an option for each subject to submit multiple versions of the solution to a given problem. In accordance with the present disclosure, the method includes collection of multiple solution versions from a plurality of subjects throughout a predetermined period of time.
In an embodiment, the predetermined period of time is determined by the designers of the course, the educational institution, the teacher, or the subject.
For example, the predetermined period of time can be several minutes or an entire semester. The purpose of the test is to provide a comprehensive understanding of a subject's progress in a given subject by allowing the subject to submit multiple versions of the solution over the predetermined period of time.
Each solution version that is collected is then abstracted to generate a solution abstraction, for example, a graph. Solution abstractions, for example, graphs, are grouped together in different solution clusters based on similarity of solution abstractions to pre-existing solution clusters.
In an embodiment, if a solution abstraction, for example, a graph, of a particular solution version is not similar to any of the pre-existing solution clusters, a new solution cluster is formed that includes the specific solution abstraction, for example, a graph, wherein the newly formed solution cluster can be used in the future for solution clustering of the solution abstractions, for example, graphs.
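By way of a non-limiting illustration, the sketch below assigns a new abstraction to the nearest pre-existing cluster when it is close enough and otherwise starts a new cluster that remains available for later assignments. The nearest-member rule, the `threshold` parameter, and the function name `assign_to_cluster` are assumptions; the disclosure does not fix a particular similarity test.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def assign_to_cluster(
    item: T,
    clusters: list[list[T]],
    distance: Callable[[T, T], float],
    threshold: float,
) -> int:
    """Add item to the closest cluster within threshold, or open a new one.

    Returns the index of the cluster that received the item.
    """
    best_index, best_distance = None, None
    for index, members in enumerate(clusters):
        nearest = min(distance(item, member) for member in members)
        if best_distance is None or nearest < best_distance:
            best_index, best_distance = index, nearest
    if best_index is not None and best_distance <= threshold:
        clusters[best_index].append(item)
        return best_index
    clusters.append([item])  # not similar to any existing cluster
    return len(clusters) - 1


def edge_difference(a: set, b: set) -> float:
    """Toy distance: number of edges present in exactly one of the graphs."""
    return float(len(a ^ b))


clusters: list[list[set]] = []
first = assign_to_cluster({"a", "b"}, clusters, edge_difference, threshold=1)
second = assign_to_cluster({"a", "b", "c"}, clusters, edge_difference, threshold=1)
third = assign_to_cluster({"x", "y"}, clusters, edge_difference, threshold=1)
```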
Since multiple solution versions are created and submitted by the subject, it is possible to track the progress of the subject over a period of time. More specifically, the test question can be complex and may be designed such that, as new lessons are taught to the subjects, the solutions submitted by a subject may require updating in accordance with the latest lessons. As a result, the analysis of all of the solution versions submitted by one or more subjects can be used to build a path for each individual subject, wherein each subject's path is an ordered sequence of at least one solution cluster to which at least one of the subject's solutions belongs. Such a path of the subject's progress through solution clusters provides a systematic outlook of the subject's progress during the predetermined period of time. As a result, each subject is assigned a path.
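As a non-limiting sketch of path building, the function below groups cluster assignments by subject and orders them by version number. The record layout `(subject_id, version_number, cluster_id)` and the function name `build_paths` are assumptions for illustration; a timestamp could be used in place of the version number.

```python
from collections import defaultdict


def build_paths(assignments: list[tuple[str, int, str]]) -> dict[str, list[str]]:
    """Map each subject to the ordered sequence of clusters of its versions."""
    by_subject: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for subject_id, version_number, cluster_id in assignments:
        by_subject[subject_id].append((version_number, cluster_id))
    return {
        subject_id: [cluster for _, cluster in sorted(entries, key=lambda e: e[0])]
        for subject_id, entries in by_subject.items()
    }


# Records may arrive in any order; the version number restores the sequence.
paths = build_paths([
    ("s1", 2, "C2"),
    ("s1", 1, "C1"),
    ("s2", 1, "C3"),
    ("s1", 3, "C2"),
])
```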
In accordance with the present disclosure, after the paths for individual subjects have been built, the at least one path of at least one subject is grouped into at least one behavioral cluster based on at least one path clustering criteria.
In an embodiment, at least one of such behavioral clusters is assigned a label indicating the subject's progress in understanding the given subject.
Thus, the system and method, as disclosed in the present disclosure, facilitate obtaining a comprehensive understanding of the progress of multiple subjects for a given subject throughout a predetermined period of time by applying a two-level clustering process to abstractions, for example, graphs, of subjects' solutions.
The system 100 further comprises a solution version abstractor 104. The solution version abstractor 104 is configured for automatically abstracting each collected solution version into a solution abstraction, for example, a graph.
The system 100 further comprises a solution clusters identifier 106. The solution clusters identifier 106 is configured to identify a solution cluster that is similar to at least one solution abstraction, for example, a graph, over the solution abstraction space 108. The solution abstraction space 108 contains solution abstractions, for example, graphs, that are grouped into solution clusters based on a solution abstraction clustering criteria, for example, graph clustering criteria. In an embodiment, the system 100 is configured to maintain the list of solution clusters, wherein each solution cluster includes solution abstractions, for example, graphs, that are similar according to certain solution clustering criteria.
The system 100 further comprises a solution clusters tracker 110. The solution clusters tracker 110 is configured for identifying paths of individual subjects through solution clusters to which abstractions, for example, graphs, of the solution versions from one subject belong. Such identification is required to build a path for each individual subject, wherein each subject's path is an ordered sequence of at least one solution cluster to which at least one of the subject's solution abstractions, for example, graphs, belongs.
The system 100 further includes a path generator 112 for building the paths for individual subjects of abstractions, for example, graphs, of their solution versions across solution abstraction clusters.
The path of each subject generated by the path generator 112 reflects the subject's progress in understanding the subject matter, as evidenced by the development of the subject's solution versions throughout the predetermined period of time. Subjects with similar paths have made similar progress in understanding the subject matter. Accordingly, the system 100 also includes a behavior clusters identifier 114.
In an embodiment, the system 100 is configured to maintain the list of behavior clusters that are generated based on at least one path clustering criteria. The behavior clusters identifier 114 is configured to group at least one path of at least one subject into at least one behavioral cluster based on at least one path clustering criteria.
At block 202, the step 200 for identification of solution clusters includes receiving a solution version N for a subject S as a solution to the test by the solution version collector 102. In an embodiment, each solution version is assigned an identifier associated with the version and the subject who submitted it.
At block 204, the step 200 for identification of solution clusters includes generating an abstraction, for example, a graph, of the solution version N of the subject S by the solution version abstractor 104.
The solution abstraction, for example, a graph, is then analyzed for being grouped into a relevant solution cluster within the space 108 of all solution abstractions, for example, graphs, of the test by the solution clusters tracker 110 based on similarity of solution abstractions, for example, graphs, using at least one solution clustering criteria. In an embodiment, the solution clustering criteria is based on at least one solution abstraction metric, for example, a graph metric.
At block 202, the exemplary method 200 for identification of solution clusters includes receiving multiple solution versions 202-1, 202-2, . . . , 202-3 from a subject as multiple solutions to the test over a predefined period of time. The multiple solution versions 202-1, 202-2, . . . , 202-3 together form a collection of solution versions 202 created by the solution version collector 102. In an embodiment, each solution version is assigned an identifier associated with the version and the subject who submitted that version.
At blocks 204-1, 204-2, . . . , 204-3, the step 200 for identification of solution clusters includes performing an abstraction, for example, generating a graph, of the solution versions 1, 2, . . . , M of the subject by the solution version abstractor 104.
The solution abstractions, for example, graphs, are then analyzed for being grouped into a relevant solution cluster within the abstraction space 108 of the versions 1 to M based on similarity of solution abstractions, for example, graphs, using at least one solution clustering criteria by the solution clusters identifier 106 and grouped into the solution abstraction clusters C(1), . . . , C(M) within the solution abstraction space 108. In an embodiment, the solution clustering criteria is based on at least one solution abstraction metric, for example, a graph metric. In an embodiment, if the solution clusters identifier 106 fails to identify an existing solution cluster into which a particular solution abstraction, for example, a graph, can be grouped according to the clustering criteria, then that solution abstraction, for example, a graph, may be considered an outlier solution abstraction and is then used to form a new solution abstraction cluster.
In an embodiment, the clustering of solution abstractions, for example, graphs, into solution abstraction clusters is performed using either an expert artificial intelligence system or a machine learning artificial intelligence system trained on previously analyzed solution abstractions, for example, graphs. More specifically, the solution clusters identifier 106 performs the operation of clustering of solution abstractions, for example, graphs, into solution clusters and can be either an expert artificial intelligence system or a machine learning artificial intelligence system trained on previously analyzed solution abstractions, for example, graphs.
In an embodiment, the step of identification of the clusters includes the usage of at least one of a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm or a k-nearest neighbors (KNN) algorithm.
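As a non-limiting sketch of the k-nearest neighbors option, the function below assigns a new solution abstraction to the cluster that is most common among its k closest previously analyzed abstractions. The function name `knn_assign`, the default `k=3`, and the toy edge-set distance are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


def knn_assign(
    new_item: T,
    labeled_items: list[tuple[T, str]],
    distance: Callable[[T, T], float],
    k: int = 3,
) -> str:
    """Return the majority cluster label among the k nearest labeled items."""
    nearest = sorted(labeled_items, key=lambda pair: distance(new_item, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def edge_difference(a: set, b: set) -> float:
    """Toy distance: number of edges present in exactly one of the graphs."""
    return float(len(a ^ b))


# Previously analyzed abstractions with their solution cluster labels.
analyzed = [
    ({"a", "b"}, "C1"),
    ({"a", "b", "c"}, "C1"),
    ({"x", "y"}, "C2"),
    ({"x", "y", "z"}, "C2"),
]
assigned = knn_assign({"a", "c"}, analyzed, edge_difference, k=3)
```

Unlike DBSCAN, this approach requires clusters that have already been identified, which is consistent with an embodiment trained on previously analyzed solution abstractions.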
After the solution version abstractions, for example, graphs, are clustered into relevant solution clusters C(1), . . . , C(M) within the space 108, the step 300 includes, at block 310, building a path for the subject, wherein the subject's path is an ordered sequence of at least one solution abstraction cluster to which at least one of the subject's solution abstractions, for example, graphs, belongs. More specifically, the solution abstraction clusters tracker 110 is configured for identifying the solution abstraction clusters to which multiple abstractions, for example, graphs, of solution versions from one or more subjects belong. The system 100 includes a path generator 112 for building the paths for individual subjects.
The subject's progress through solution abstraction clusters reflects the subject's progress in understanding the subject through the analysis of the versions of different solutions submitted by the subject over time. In an embodiment, the method includes maintaining the list of behavior clusters that are generated based on at least one path clustering criteria. At block 312, the step 300 includes grouping at least one path of at least one subject into at least one behavioral cluster based on at least one path clustering criteria by the behavior clusters identifier 114.
In an embodiment, the behavior clusters are assigned at least one textual, color-coded, shape-coded or otherwise descriptive label as identity markers. Some examples of descriptive labels include ‘a subject copied the solutions and does not understand the subject’, ‘a subject does not understand the subject and writes wrong code’, ‘a subject applied a different approach or approaches to solve the problem’, ‘a subject used one approach to solve the problem’, ‘a subject stayed in cluster for too long’, ‘a subject moves from one cluster to another often’, ‘most solutions of the subject lay in an incorrect cluster, but the correct solution is another less numerous cluster’, or ‘subject behavior is different from previously classified behavioral clusters’.
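By way of a non-limiting illustration, the sketch below shows how two of the example labels might be derived from a path by simple rules, in the manner of a rule-based expert system. The thresholds `max_run` and `max_switch_ratio` and the function name `describe_path` are hypothetical and are not specified by the disclosure.

```python
from itertools import groupby

STAYED_TOO_LONG = "a subject stayed in cluster for too long"
MOVES_OFTEN = "a subject moves from one cluster to another often"


def describe_path(path: list[str], max_run: int = 4,
                  max_switch_ratio: float = 0.6) -> list[str]:
    """Return the descriptive labels whose rule is satisfied by the path."""
    # Lengths of consecutive runs spent in the same cluster.
    runs = [len(list(group)) for _, group in groupby(path)]
    labels = []
    if runs and max(runs) > max_run:
        labels.append(STAYED_TOO_LONG)
    switches = len(runs) - 1  # number of moves between different clusters
    if len(path) > 1 and switches / (len(path) - 1) > max_switch_ratio:
        labels.append(MOVES_OFTEN)
    return labels


stuck = describe_path(["C1"] * 6 + ["C2"])
restless = describe_path(["C1", "C2", "C1", "C3", "C2"])
```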
In an embodiment of the system 100, the clustering of paths into behavioral clusters is performed using either an expert artificial intelligence system or a machine learning artificial intelligence system trained on previously analyzed paths. More specifically, the behavior clusters identifier 114 performs the clustering of paths into behavioral clusters, and it can be either an expert artificial intelligence system or a machine learning artificial intelligence system trained on previously analyzed paths.
In an embodiment, the step of identification of the clusters includes the usage of at least one of a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm or a k-nearest neighbors (KNN) algorithm.
In accordance with an embodiment, if the behavior clusters identifier 114 fails to identify an existing behavior cluster within which a particular path may be grouped, then that path may be considered an outlier path and is then used to form a new behavior cluster.