In the past, students typically performed research for a school project by going to the library, and locating and photocopying books and magazine and newspaper articles that pertain to the topic of the school project. However, since the advent of the Internet and due to the popularity of search engines, students are now much more likely to perform such research online. A student may collect web pages that pertain to the topic of the school project, for instance, in lieu of going to the library and locating relevant books and magazine and newspaper articles.
As noted in the background, students today commonly perform research for school projects online, collecting web pages that pertain to the topics of the school projects, instead of photocopying relevant books and magazine and newspaper articles. However, while computer technology has aided students in how they perform research for school projects, it has not aided teachers as significantly in assessing how well their students have performed such research. For instance, a teacher may still have to sift through and review the web pages that a student has collected, and that the student believes pertain to the topic of a given school project, to determine how well the student has completed the project.
Embodiments of the present disclosure overcome these shortcomings. In particular, embodiments of the present disclosure permit analytical measures to be determined for the articles collected by the student, such as web pages, in relation to the topic of a school project. Such analytical measures include relevance, coverage, and uniqueness. Relevance indicates how relevant the articles collected by the student are to the topic of the school project. Coverage indicates how completely the articles collected by the student cover the topic. Uniqueness indicates how unique the articles collected by the student are in comparison to one another.
As such, embodiments of the present disclosure can at least partially relieve a teacher of what can be a painstaking process of manually sifting through and reviewing the articles collected by a student to determine how well the student has completed a school project. Automated analytical measures, such as relevance, coverage, and uniqueness, can provide a teacher with baseline values of how well a student has completed a school project. The teacher can thus spend more of his or her time on individualized attention to each student.
For instance, the analytical measures can provide for automatic evaluation as to how well the articles collected by the student satisfy the school project, so that the teacher does not have to manually evaluate the articles. As another example, the progress of the student can be tracked on a week-by-week basis, or on another periodic basis. As such, the analytical measures can provide the teacher with various metrics as to how well the student is performing in relation to the school project in question.
A school project is considered one type of educational project, and can broadly be defined as including activities as varied as due diligence, scientific research, and learning activities, among other types of activities. A student who is assigned or who completes such an educational project thus can be broadly defined as an individual or entity that effects the educational project.
The method 100 determines a number of given concepts related to a selected topic for a school project (102). The topic for the school project is typically selected by a teacher. A concept is a phrase of one or more words that pertain to the topic. The terminology “given concepts” is used herein to distinguish these concepts that are determined in relation to the topic from other concepts, which as described below are determined in relation to the articles collected by a student. As an example, a teacher may select the topic of a school project as the solar system. As such, the method 100 determines given concepts that are related to this topic. Examples of such concepts that may be determined may include the names of various planets, for instance, such as “Saturn,” “Earth,” “Venus,” and so on.
For each document that is located, the following is performed (204). First, a general corpus tagging computer program is applied to the document (206). The result of applying this program to the document is the identification, or tagging, of a first subset of the words of the document that relate to a general knowledge domain. An example of such a general corpus tagging computer program is the Penn Treebank tagging computer program, which is available and described at the Internet web site www.cis.upenn.edu/~treebank/.
A general knowledge domain is a domain of knowledge that encompasses general knowledge relevant across a large number of different topics or areas. A general knowledge domain is contrasted with a specific knowledge domain, which is particular to a given topic or area. For example, a document related to the solar system may include both specific knowledge that is particular to the specific knowledge domain of this topic, and general knowledge that pertains to a number of topics including but not limited to the solar system.
The method 200 then extracts a second subset of the words of the document that were not tagged as being part of the first subset (208). These words are presumed to relate to the specific knowledge domain particular to the topic in question. For example, as to the topic of the solar system, once all general knowledge words have been tagged in a located document, the remaining words within the document are presumed to be specific knowledge words pertaining to the solar system. As such, phrases are collected from the second subset of the words that have been extracted, where each phrase includes one or more contiguous words of the document that appear within the second subset of the words that have been extracted (210). The method 200 determines the given concepts related to the topic of the school project as the phrases that have been collected (212).
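The phrase-collection step of parts 208 through 212 can be illustrated with a minimal Python sketch. It assumes the general corpus tagging step has already run and that its result is available as a set of general knowledge words; the set `GENERAL_WORDS` and the function name `collect_phrases` are hypothetical stand-ins chosen for illustration, and no actual tagging program is invoked.

```python
from typing import Iterable, List, Set


def collect_phrases(words: Iterable[str], general_words: Set[str]) -> List[str]:
    """Collect phrases of contiguous words that were NOT tagged as general."""
    phrases: List[str] = []
    current: List[str] = []
    for word in words:
        if word.lower() in general_words:
            # A tagged (general knowledge) word ends any phrase in progress.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            # An untagged word is presumed specific to the topic.
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical tagger output: words treated as general knowledge.
GENERAL_WORDS = {"the", "is", "a", "of", "and", "has", "many"}

document = "Saturn is a gas giant and Saturn has many rings".split()
print(collect_phrases(document, GENERAL_WORDS))
# ['Saturn', 'gas giant', 'Saturn', 'rings']
```

Repeated phrases are kept in this sketch, since later steps may need to count how often a concept appears.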
Referring back to
In equation (1), $w_i$ is the weight of $\mathrm{concept}_i$. The function $\mathrm{freq}(\mathrm{concept}_x)$ is the number of times $\mathrm{concept}_x$ appears within all the documents that have been located, which number $n$.
Articles that have been collected by a student as pertaining to the topic of the school project in question are then received (106). The articles may include web pages, which the student has collected by performing searches using an Internet search engine. However, in other embodiments, the articles may include other types of textual documents that were not found using an Internet search engine and/or that are not web pages. The articles may further include multimedia files, which contain images, audio, and/or video, and which are or have been tagged with text representative of the subject matter of such images, audio, and/or video.
The method 100 determines three types of analytical measures of the articles collected by the student. First, the relevance of the articles collected by the student to the topic of the school project is determined (108). Second, the coverage of how well the articles collected by the student cover the topic of the school project is determined (110). Third, the uniqueness of the articles collected by the student in comparison to one another is determined (112). How each of these different types of analytical measures can be determined in various embodiments of the present disclosure is now described.
Determining the concepts of the article can be achieved in a number of different ways. In one approach, parts 204 and 212 of
An appearance count is then determined for each concept found in the article (306). The appearance count of a concept is equal to the number of times the concept appears in the article. The weighted appearance count is also determined for each concept found in the article (308). The weighted appearance count of a concept is equal to the appearance count determined in part 306, multiplied by the weight of the concept determined in part 104 of
The relevance value for the article is determined (310). The relevance value is equal to an average of the weighted appearance counts for the concepts in the article. Mathematically, the relevance value can be expressed as:
$$R_i = \frac{1}{C} \sum_{j=1}^{C} \mathrm{freq}(\mathrm{concept}_j) \times w_j \qquad (2)$$
In equation (2), $R_i$ is the relevance value for article $i$ and $C$ is the number of concepts found in the article, whereas $\mathrm{freq}(\mathrm{concept}_j)$ is the appearance count of $\mathrm{concept}_j$, and $w_j$ is the weight of $\mathrm{concept}_j$. As such, $\mathrm{freq}(\mathrm{concept}_j) \times w_j$ is the weighted appearance count of $\mathrm{concept}_j$.
Once part 302 has been performed for each article collected by the student for the school project, the relevance of all the articles to the topic is then determined (312). Specifically, the relevance of the articles collected by the student to the topic is determined by averaging the relevance values for the articles. Mathematically, the relevance can be expressed as:
$$R = \frac{1}{N} \sum_{i=1}^{N} R_i \qquad (3)$$
In equation (3), $R$ is the relevance for all the articles collected by the student for the school project and $R_i$ is the relevance value for article $i$, where there are $N$ total articles.
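The two averages described for equations (2) and (3) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the disclosed implementation: concept weights are supplied as a ready-made dictionary, concepts are matched by case-insensitive substring search, only concepts that have a weight are counted, and an article in which no concept is found is given a relevance value of 0.0. The function names are hypothetical.

```python
from typing import Dict, List


def relevance_value(article: str, weights: Dict[str, float]) -> float:
    """Average weighted appearance count over the concepts found in one article."""
    text = article.lower()
    weighted_counts: List[float] = []
    for concept, weight in weights.items():
        count = text.count(concept.lower())  # appearance count (naive substring match)
        if count > 0:
            weighted_counts.append(count * weight)  # weighted appearance count
    if not weighted_counts:
        return 0.0  # assumption: no concepts found means zero relevance
    return sum(weighted_counts) / len(weighted_counts)


def relevance(articles: List[str], weights: Dict[str, float]) -> float:
    """Average of the per-article relevance values."""
    if not articles:
        return 0.0
    return sum(relevance_value(a, weights) for a in articles) / len(articles)


# Hypothetical weights, chosen only to keep the arithmetic easy to follow.
weights = {"saturn": 0.5, "venus": 0.25}
articles = [
    "Saturn has rings. Saturn is large. Venus is hot.",  # (2*0.5 + 1*0.25) / 2 = 0.625
    "Earth only",                                        # no weighted concept found
]
print(relevance_value(articles[0], weights))  # 0.625
print(relevance(articles, weights))           # 0.3125
```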
The method 400 determines whether each given concept determined in part 102 of
A binary vector for the article is constructed (508). The binary vector includes a series of binary values corresponding to the given concepts determined in part 102 of
$$\mathrm{bvec}_i = \langle \mathrm{bval}_{i1}, \mathrm{bval}_{i2}, \ldots, \mathrm{bval}_{im} \rangle \qquad (4)$$
In equation (4), $\mathrm{bvec}_i$ is the binary vector for article $i$. This binary vector has binary values $\mathrm{bval}_{i1}, \mathrm{bval}_{i2}, \ldots, \mathrm{bval}_{im}$ corresponding to the $m$ given concepts, where a binary value $\mathrm{bval}_{ix}$ is equal to zero if the given concept $x$ is not found in article $i$ and is equal to one if the given concept $x$ is found in article $i$.
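A binary vector of the form in equation (4) could be built as in the following Python sketch. The function name `binary_vector` is hypothetical, and the case-insensitive substring test is a simplifying assumption about how a given concept is "found" in an article.

```python
from typing import List, Sequence


def binary_vector(article: str, given_concepts: Sequence[str]) -> List[int]:
    """Return one 0/1 entry per given concept, in the order the concepts are listed."""
    text = article.lower()
    return [1 if concept.lower() in text else 0 for concept in given_concepts]


given_concepts = ["Saturn", "Earth", "Venus"]
print(binary_vector("Venus and Earth are rocky planets.", given_concepts))
# [0, 1, 1]
```

Keeping the given concepts in a fixed order matters, because vectors for different articles are later compared position by position.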
Once part 502 has been performed for each article located by the student, for each unique pair of articles, a uniqueness value is determined (510). For example, if there are three articles a, b, and c, then there are three unique pairs of articles ab, ac, and bc. The uniqueness value is determined for a unique pair of articles by applying a cosine similarity test to the binary vectors of these articles. The uniqueness value for a unique pair of articles can be mathematically expressed as:
$$U_{ab} = \cos(\mathrm{bvec}_a, \mathrm{bvec}_b) \qquad (5)$$
In equation (5), $U_{ab}$ is the uniqueness value for the unique pair of article $a$ and article $b$ having binary vectors $\mathrm{bvec}_a$ and $\mathrm{bvec}_b$, respectively. The cosine similarity test for two binary vectors $x$ and $y$ is expressed as $\cos(x, y)$, and is equal to the dot product of the two vectors, divided by the product of the magnitudes (Euclidean norms) of the two vectors. Because the entries of a binary vector are never negative, the cosine similarity test of equation (5) results in a value between 0 and 1, where 0 indicates that the two articles do not share any concepts and 1 indicates that they share all their concepts.
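The cosine similarity test can be sketched in Python as below. The function name is hypothetical, and returning 0.0 when either vector is all zeros (an article containing none of the given concepts) is an assumption, since the cosine is undefined in that case. For vectors whose entries are only 0 or 1, the dot product cannot be negative, so the result of this sketch always lies between 0 and 1.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an article with no given concepts shares nothing
    return dot / (norm_x * norm_y)


print(cosine_similarity([1, 1, 0], [0, 0, 1]))  # 0.0, no shared concepts
print(cosine_similarity([1, 0, 0], [1, 0, 0]))  # 1.0, identical concepts
```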
The uniqueness of the articles collected by the student for the school project is determined by averaging the uniqueness values for the unique pairs of articles (512). Mathematically, the uniqueness can be expressed as:
$$U = \frac{1}{P} \sum_{i=1}^{P} U_i \qquad (6)$$
In equation (6), $U$ is the uniqueness of the articles collected by the student for the school project and $U_i$ is the uniqueness value for the unique pair of articles $i$, where there are $P$ total unique pairs of articles.
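Averaging over all unique pairs can be sketched in Python as follows. The sketch takes the uniqueness value of a pair to be the cosine similarity of the two binary vectors, as the text describes, and writes it in an equivalent set form: for 0/1 vectors the dot product is the number of shared concepts and each norm is the square root of the number of concepts found. The function names and the mapping from article name to found concepts are hypothetical, and treating an empty concept set as sharing nothing is an assumption.

```python
import math
from itertools import combinations
from typing import Dict, Set


def pair_value(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary vectors, computed from their concept sets."""
    if not a or not b:
        return 0.0  # assumption: an empty concept set shares nothing
    return len(a & b) / math.sqrt(len(a) * len(b))


def uniqueness(found: Dict[str, Set[str]]) -> float:
    """Average the pair values over every unique pair of articles."""
    pairs = list(combinations(sorted(found), 2))  # e.g. ab, ac, bc for three articles
    if not pairs:
        return 0.0
    return sum(pair_value(found[p], found[q]) for p, q in pairs) / len(pairs)


found = {"a": {"saturn"}, "b": {"saturn"}, "c": {"venus"}}
# Pair values: ab = 1.0, ac = 0.0, bc = 0.0, so the average is one third.
print(uniqueness(found))
```

Under this reading, a larger value means the articles overlap more in the concepts they contain.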
The methods that have been described can be extended to scenarios in which, besides a topic being selected by a teacher, a number of subtopics of the topic are also selected by the teacher.
Furthermore, the teacher is permitted to select one or more subtopics, from the given concepts related to the topic that have been determined in part 102 (604). For example, if the topic is the solar system, the teacher may select the given concepts “Saturn,” “Earth,” and “Venus” as the desired subtopics. Thereafter, additional given concepts related to each subtopic are determined (102′). Part 102′ is performed in the same way that part 102 is performed, as has been described above in relation to
The weight of each given concept is determined (104′). Part 104′ is performed in the same way that part 104 is performed, as has been described above in relation to
Three types of analytical measures are determined for the collected articles in relation to the topic and the subtopics, as before. First, the relevance of the collected articles to the topic and the subtopics is determined (108′). Part 108′ is performed in the same way that part 108 is performed, such as by performing the method 300 of
Second, the coverage of how well the collected articles cover the topic and the subtopics is determined (110′). Part 110′ is performed in the same way that part 110 is performed, such as by performing the method 400 of
Finally, the uniqueness of the collected articles in comparison to one another is determined (112), as has been described, such as by performing the method 500 of
In conclusion,
The system 700 includes one or more processors 702 and one or more computer-readable media 704. The computer-readable media 704 store one or more computer programs 706 that are executed by the processors 702, as indicated by a dotted line in
The system 700 includes a concept generating component 708 and an analytics determining component 710. The components 708 and 710 are each implemented by the computer programs 706 stored on the computer-readable media 704 and executed by the processors 702. The concept generating component 708 receives a teacher-selected topic 712, and responsively generates a number of given concepts 714, including their weights. For instance, the concept generating component 708 may perform parts 102, 102′, 104, and 104′ of
The analytics determining component 710 receives the given concepts 714 (and their weights) from the concept generating component 708. The analytics determining component 710 also receives student-collected articles 716. In response, the analytics determining component 710 generates one or more analytical measures 718 regarding the collected articles 716 in relation to the topic 712 selected by the teacher. These analytical measures 718 can include relevance, coverage, and uniqueness, as has been described. The analytics determining component 710 may thus perform parts 106, 108, 108′, 110, 110′, and 112 of