1. Field of the Invention
The present application is related to automated techniques for evaluating work product and, in particular, to techniques that employ feature extraction and machine learning to efficiently and consistently evaluate instances of media content that constitute, or are derived from, coursework submissions.
2. Description of the Related Art
As educational institutions seek to serve a broader range of students and student situations, on-line courses have become an increasingly important offering. Indeed, numerous instances of an increasingly popular genre of on-line courses, known as Massive Open Online Courses (MOOCs), are being created and offered by many universities, as diverse as Stanford, Princeton, Arizona State University, the Berkeley College of Music, and the California Institute for the Arts. These courses can attract tens of thousands of students each. In some cases, courses are offered free of charge. In some cases, new educational business models are being developed, including models in which students may be charged for deeper evaluation and/or credit, or in which advertising provides a revenue stream.
While some universities have created their own Learning Management Systems (LMS), a number of new companies have begun organizing and offering courses in partnership with universities or individuals. Examples of these include Coursera, Udacity, and edX. Still other companies, such as Moodle, Blackboard and Canvas, offer LMS designs and services for universities who wish to offer their own courses.
Students taking on-line courses usually watch video lectures, engage in blog/chat interactions, and submit assignments, exercises, and exams. Submissions may be evaluated (to lesser or greater degrees, depending on the type of course and nature of the material), and feedback on quality of coursework submissions can be provided. While many courses are offered that evaluate submitted assignments and exercises, the nature and mechanics of the evaluations are generally of three basic types:
Improved techniques are desired, particularly techniques that are scalable to efficiently and consistently serve large student communities and techniques that may be employed in subject matter areas, such as artistic expression, computer programming and even signal processing, that have not, to-date, proved to be particularly amenable to conventional machine grading techniques.
For courses that deal with media content, such as sound, music, photographic images, hand sketches, video (including videos of dance, acting, and other performances, computer animations, music videos, and artistic video productions), conventional techniques for automatically evaluating and grading assignments are generally ill-suited to direct evaluation of coursework submitted in media-rich form. Likewise, for courses whose subject includes programming, signal processing or other functionally-expressed designs that operate on, or are used to produce media content, conventional techniques are also ill-suited. Instead, it has been discovered that media-rich, indeed even expressive, content can be accommodated as, or as derivatives of, submissions using feature extraction and machine learning techniques. In this way, e.g., in on-line course offerings, even large numbers of students and student submissions may be accommodated in a scalable and uniform grading or scoring scheme. Likewise, large collections of coursework submissions (whether or not graded or scored) or media content more generally, may be efficiently browsed and grouped using techniques described herein.
Specifically, feature extraction and machine learning techniques may be employed to computationally process coursework submissions so that a diverse and highly complex input set of coursework submissions is presented to a human (e.g., an instructor or grader) in a form that is more easily browsable. While the input space, even after feature extraction, is typically high-dimensional (in general, k-dimensional, with k typically greater than 8), a mapped-to, human-browsable reduced-dimension space derived using techniques in accordance with the present inventions may be of comparatively low (n) dimension, typically a 2- or 3-dimensional presentation/browsing space. Notwithstanding typical dimensionalities, as a general proposition, k simply exceeds n.
It has been discovered that, by using a computationally-reduced-dimension browsing space, instructors or graders may efficiently browse a large set of coursework submissions and inspect exemplary submissions from various “neighborhoods” or regions of the reduced dimension space. Grades may be explicitly assigned to particular, inspected ones of the coursework submissions, while remaining submissions may be assigned grades by instructor established cut points in the reduced dimension space and/or by association with actually inspected ones of the submissions. In some cases, situations or embodiments, feature sets used to characterize the coursework submissions may be iteratively updated or extended by the instructor or grader, to improve discrimination between submissions that the instructor or grader views as of differing quality or caliber. In some cases, situations or embodiments, feature sets used to characterize the coursework submissions may be iteratively updated or extended by the instructor or grader, to improve clustering of submissions that the instructor or grader views as of like quality or caliber.
In some cases, situations or embodiments, the mapped-to, reduced-dimension space is computed as a self-organizing map (SOM) trained using exemplary works or submissions. Training sets may, in general, come from a current or prior administration of a course, and/or may include works of recognized masters in a particular field. Using the developed techniques, it is possible to administer courses and grade (in a computationally assisted manner) large volumes of submitted work that takes the form of media encodings of artistic expression, computer programming and even signal processing to be applied to media content. In some cases, situations or embodiments, the developed techniques may also, or alternatively, be used for plagiarism detection.
In some embodiments in accordance with the present invention(s), a method for use in connection with coursework submissions includes retrieving from storage, media content to be used in evaluating or organizing the coursework submissions, wherein at least some instances of the retrieved media content constitute, or are derived from, respective ones of the coursework submissions. For each instance of the media content, the method extracts a set of computationally defined features, wherein a set of values for the extracted computationally-defined features together constitute a k-dimensional feature vector that characterizes the corresponding instance of media content. The method initializes elements of an n-dimensional map, n less than k, with current feature vectors. Finally, the method assigns successive instances of the media content to respective elements of the map to which they most closely correspond and iteratively morphs the current feature vectors to produce a self-organized mapping wherein individual instances of the media content are distributed over the map and associated with respective elements thereof. In some illustrative cases, n=2 and k>8.
In some embodiments, the method further includes visually presenting a user with the map; and responsive to selection by the user of a given element thereof, presenting or rendering the associated media content. In some embodiments, the method further includes allowing the user to browse the media content using the map. In some cases, the user is an instructor or grader and the method further includes allowing an instructor or grader to define cut sets in the map; and assigning coursework submissions with grades or scores based on the instructor- or grader-defined cut sets.
In some embodiments, the assigning and iteratively morphing includes: (1) in successive computational cycles, (i) assigning respective instances of the media content and corresponding feature vectors to respective elements of the n-dimensional map, wherein a respective assigned-to element for a given instance of the media content is that for which the current feature vector most closely matches, based on a distance function, the feature vector that characterizes the given instance of media content, (ii) morphing the current feature vector for the assigned-to element toward the feature vector that characterizes the assigned given instance of media content and (iii) further morphing elements of the map spatially proximate the assigned-to element; and (2) further and iteratively morphing respective current feature vectors for elements of the map in accordance with decaying learning coefficients to produce self-organized n-dimensional mapping from k-dimensional feature vector space, wherein individual instances of the media content are associated with respective elements of the n-dimensional map. In some cases, the distance function calculates a Euclidean, Manhattan, Chebyshev and/or Minkowski distance between feature vectors.
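By way of illustration only, the distance functions named above may be sketched as follows. The function names are hypothetical and are not drawn from any particular implementation; the Minkowski form is shown as the general case, of which the Manhattan (p=1) and Euclidean (p=2) distances are instances.

```python
from typing import Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p between two equal-length feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (Minkowski with p=1)."""
    return minkowski(a, b, 1)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance (Minkowski with p=2)."""
    return minkowski(a, b, 2)


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))
```

Any of these may be supplied as the matching criterion when selecting the assigned-to element; which is preferable will, in general, depend on the feature set.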
In some embodiments, the method further includes identifying likely instances of plagiarism based on correspondence of feature vectors or mapped-to elements of the self-organized mapping. In some embodiments, the self-organized mapping is exclusive such that respective elements of the map are associated with only a single instance of the media content.
In some embodiments, at least some instances of the retrieved media content are, or are derived from, coursework submissions from a prior administration of a course. In some embodiments, the retrieved media content are, or are derived from, coursework submissions from a current administration of a course. In some embodiments, at least some instances of the retrieved media content are, or are derived from, exemplary works of recognized masters in a field of endeavor.
In some cases, the coursework submissions include computer readable media encodings of expressive media content selected from the set of: captured musical or vocal performance; sketches, paintings, photographic images or other artistic still visuals; and synchronized audiovisual content, computer animation or other video that is itself expressive or visually captures underlying expression such as dance, acting, or other performance.
In some embodiments, the method further includes receiving from an instructor or grader a quality score for a selected instance of media content associated with an element of the map; and propagating the quality score to additional media content associated with neighboring elements of the map, wherein the additional media content constitutes, or is derived from, a respective one of the coursework submissions. In some cases, the propagating of the quality score to additional media content is in accordance with clusterings or gradients represented in the self-organized mapping. In some cases, the quality score is, or is a component of, a grading scale for an assignment- or test question-type coursework submission.
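One possible way to propagate instructor-assigned quality scores to neighboring map elements is sketched below. It is a minimal illustration under stated assumptions: the map is a two-dimensional grid addressed by (row, column), each ungraded element simply inherits the score of the nearest graded element, and elements farther than a hypothetical `max_radius` from every graded element are left unscored. Other propagation rules, e.g., ones that follow clusterings or gradients, are equally possible.

```python
from typing import Dict, Iterable, Optional, Tuple

Pos = Tuple[int, int]


def propagate_scores(
    graded: Dict[Pos, float],
    positions: Iterable[Pos],
    max_radius: float,
) -> Dict[Pos, Optional[float]]:
    """Give each map position the score of its nearest graded position.

    Positions with no graded element within max_radius (grid units) get None,
    signalling that a human should inspect them. This rule is an assumption.
    """
    result: Dict[Pos, Optional[float]] = {}
    for pos in positions:
        best_score: Optional[float] = None
        best_dist: Optional[float] = None
        for gpos, score in graded.items():
            dist = ((pos[0] - gpos[0]) ** 2 + (pos[1] - gpos[1]) ** 2) ** 0.5
            if dist <= max_radius and (best_dist is None or dist < best_dist):
                best_dist, best_score = dist, score
        result[pos] = best_score
    return result
```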
In some cases, the coursework submissions include software code submitted in satisfaction of a programming assignment or test question, the software code executable to perform, or compilable to execute and perform, digital signal processing to produce output media content; the media content includes exemplary output media content produced using exemplary software codes; and the particular quality score assigned to a particular coursework submission is based on the self-organized mapping of computationally defined features extracted from the output media content produced by execution of the submitted software code. In some cases, the software code coursework submission is executable to perform digital signal processing on input media content to produce the output media content; and the exemplary output media content is produced from the input media content using the exemplary software codes.
In some cases, the output media content includes images or video processed or rendered by the software code coursework submission. In some cases, the media content that constitutes, or is derived from, the coursework submission includes an audio signal encoding; and at least some of the computationally defined features are selected or derived from: a root mean square energy value; a number of zero crossings per frame; a spectral flux; a spectral centroid; a spectral roll-off measure; a spectral tilt; a mel-frequency cepstrum coefficients (MFCCs) representation of short-term power spectrum; a beat histogram; and/or a multi-pitch histogram computed over at least a portion of the audio signal encoding.
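A few of the audio features listed above may be sketched as follows, purely for illustration. The sketch assumes a mono signal held as a list of floating-point samples, uses non-overlapping frames, and computes the spectral centroid with a naive discrete Fourier transform that is adequate only for short frames; all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def frames(signal: Sequence[float], size: int) -> List[Sequence[float]]:
    """Split a signal into consecutive non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def rms(frame: Sequence[float]) -> float:
    """Root mean square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossings(frame: Sequence[float]) -> int:
    """Number of sign changes between adjacent samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def spectral_centroid(frame: Sequence[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(s))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total
```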
In some cases, the media content that constitutes, or is derived from, the coursework submission includes an image or video signal encoding; and at least some of the computationally defined features are selected or derived from: color histograms; two-dimensional transforms; edge, corner or ridge detections; curve or curvature features; a visual centroid; and/or optical flow computed over at least a portion of the image or video signal encoding.
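As one illustrative instance of the image features listed above, a normalized per-channel color histogram may be computed as sketched below. The sketch assumes 8-bit RGB pixels supplied as (r, g, b) tuples; the function name and the default bin count are hypothetical.

```python
from typing import List, Sequence, Tuple


def color_histogram(pixels: Sequence[Tuple[int, int, int]], bins: int = 4) -> List[float]:
    """Concatenated R, G and B histograms, each normalized to sum to 1.

    Assumes 8-bit channel values (0..255). The result has 3 * bins entries
    and can be used directly as part of a feature vector.
    """
    hist = [0.0] * (3 * bins)
    width = 256 / bins
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(int(value / width), bins - 1)
            hist[channel * bins + index] += 1
    count = len(pixels)
    return [h / count for h in hist] if count else hist
```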
In some cases, the extracted computationally-defined features include features computed over segments of an audio, video or image encoded by the media content. In some cases, the segments are nested segments. In some cases, the k-dimensional feature vector used to characterize instances of the media content is reduced from a larger feature vector using a principal component analysis computed over at least a subset of the media content. In some embodiments, the method further includes receiving from the instructor or curriculum designer at least an initial definition of the set of computationally defined features.
In some embodiments in accordance with the present invention(s), a computational system includes one or more operative computers programmed to perform any of the foregoing methods. In some cases or embodiments, the computational system is itself embodied, at least in part, as a network deployed coursework submission system, whereby a large and scalable plurality (>50) of geographically dispersed students may individually submit their respective coursework submissions in the form of computer readable information encodings. In some cases or embodiments, the computational system includes a student authentication interface for associating a particular coursework submission with a particular one of the geographically dispersed students. In some embodiments in accordance with the present invention(s), a non-transient computer readable encoding of instructions is executable on one or more operative computers to perform any of the foregoing methods.
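The reduction of a larger feature vector by principal component analysis may be illustrated with the following minimal sketch. It is an assumption-laden illustration rather than a definitive implementation: components are found by power iteration with deflation on the sample covariance matrix, which is practical only for small feature counts, and the reduced features are then rescaled to the range 0 to 1. The function names are hypothetical.

```python
import math
from typing import List


def pca_reduce(rows: List[List[float]], n_components: int, iters: int = 200) -> List[List[float]]:
    """Project rows onto their top principal components (power iteration)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)] for a in range(k)]
    components = []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(k)] * k  # fixed start vector (an assumption)
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        eig = sum(v[a] * sum(cov[a][b] * v[b] for b in range(k)) for a in range(k))
        components.append(v)
        # Deflate: remove the variance explained by the component just found.
        cov = [[cov[a][b] - eig * v[a] * v[b] for b in range(k)] for a in range(k)]
    return [[sum(c[j] * comp[j] for j in range(k)) for comp in components] for c in centered]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns map to 0."""
    k = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(k)]
    hi = [max(r[j] for r in rows) for j in range(k)]
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(k)]
            for r in rows]
```

For example, two features that record the same height in inches and in centimeters are perfectly correlated, so a single component retains all of their information.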
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
The use of the same reference symbols in different drawings indicates similar or identical items.
The computational techniques described herein address practical challenges associated with administration of educational courses or testing, including on-line courses offered for credit to large and geographically dispersed collections of students (e.g., over the Internet), using advanced feature extraction techniques combined with machine learning (ML) algorithms. The developed techniques are also applicable to browsing of media content, such as that which may be prepared and/or presented in the context of on-line courses or exhibitions. The developed techniques are particularly well-suited to educational or testing domains in which assignments or test problems call for expressive content, such as sound, music, photographic images, hand sketches, video (including videos of dance, acting, and other performances, computer animations, music videos, and artistic video productions). The developed techniques are also well-suited to educational or testing domains in which assignments or test problems include programming, signal processing or other functionally expressed designs that operate on, or are used to produce media content, and which may be evaluated based on qualities of the resulting media content itself. In each of the foregoing cases, conventional techniques for automatically evaluating and grading assignments (typically multiple choice, true/false or simple fill-in-the-blank or short answer questions) are ill-suited to the direct evaluation of coursework that takes on media-rich forms.
Instead, it has been discovered that media-rich, indeed even artistic or expressive, content can be accommodated as, or as derivatives of, coursework submissions using feature extraction and machine learning techniques. In this way, e.g., in on-line course offerings, even large numbers of students and student submissions may be accommodated in a scalable and uniform grading or scoring scheme. Instructors or curriculum designers are provided with facilities to adaptively refine their assignments or testing based on classifier feedback. Using the developed techniques, it is possible to administer courses and automatically grade submitted work that takes the form of media encodings of artistic expression, computer programming and even signal processing to be applied to media content.
The developed techniques provide instructors/curriculum designers with systems and facilities to organize, cluster and summarize huge collections of submitted coursework. A family of machine learning-type techniques employed herein to classify coursework submissions is self-organizing maps (SOM) or self-organizing feature maps (SOFM). SOM, SOFM and like techniques or variations thereon (hereafter “SOM”) generally employ an artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from some other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data in a manner akin to multidimensional scaling. The SOM model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and at least some implementations or variants are sometimes called Kohonen maps or networks.
Like most artificial neural networks, SOMs operate in two modes: training and mapping. “Training” builds the map using input examples (a competitive process, also called vector quantization), while “mapping” automatically classifies a new input vector. In some cases, situations or embodiments, a training data set may be substantially or essentially limited to current coursework submissions themselves. In some cases, situations or embodiments, training data may include (or be “salted” with) hand-picked grading exemplars, works of recognized masters, and/or coursework submissions from prior administrations of a course. In each case, SOM techniques are used to classify elements of the data set (typically media content) based on computationally-defined features that are extracted from the elements. The resulting map, and specifically the underlying media content so classified, may then be browsed, inspected and performed (or rendered) in a manner that facilitates exploration of the data set and application of grades or scores to the coursework submissions that are, or correspond to, the underlying media content.
In some embodiments, a self-organizing map is implemented using computational elements or constructs that constitute or may be understood as nodes or artificial neural elements. Associated with each node is a weight vector of a same dimension as the input (or reduced) feature vectors and a position in the map space. A typical arrangement of nodes is a two-dimensional regular tessellation of elements or cells in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher dimensional (k-dimensional) input space to a lower dimensional (here 2-dimensional, but generally, n-dimensional) map space. The procedure for placing a vector from data space onto the map is to find the node with the closest (smallest distance metric) weight vector to the data space vector.
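The placement procedure just described may be sketched as follows. The sketch assumes a rectangular grid whose nodes are keyed by (row, column) and uses squared Euclidean distance as the distance metric; the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]


def best_matching_node(weights: Dict[Pos, List[float]], vector: Sequence[float]) -> Pos:
    """Return the map position whose weight vector is closest to `vector`.

    Squared Euclidean distance is used (an assumption); it selects the same
    node as Euclidean distance because the square root is monotonic.
    """
    def squared_distance(weight: Sequence[float]) -> float:
        return sum((w - x) ** 2 for w, x in zip(weight, vector))

    return min(weights, key=lambda pos: squared_distance(weights[pos]))
```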
Finally,
The resultant SOM state 213 is indicative of a trained-in mapping of feature vectors to positions in the map. This trained-in mapping may be employed in the classification of additional coursework submissions.
For each media file (i.e., each coursework submission or instance of media content), we analyze and compute “features”. We begin by dividing the media file into a number of smaller segments (e.g., regions of an image, or sections of a piece of music), and then analyze and extract the features for those segments. Each segment from a media file may also be segmented further as part of the process of deriving features for that segment. This nested segmentation then allows for one or more of the following statistics to be generated for each feature per media file:
1. Mean of the individual segment means
2. Standard Deviation of the individual segment means
3. Minimum of the individual segment means
4. Maximum of the individual segment means
5. Mean of the individual segment standard deviations
6. Standard Deviation of the individual segment standard deviations
7. Minimum of the individual segment standard deviations
8. Maximum of the individual segment standard deviations
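The eight statistics enumerated above may be computed, for a single feature of a single media file, as in the following sketch. It assumes that the feature has already been evaluated on every sub-segment, so that each segment is represented by a list of values, and it uses the population standard deviation; both the data layout and the dictionary keys are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def nested_statistics(segments: List[List[float]]) -> Dict[str, float]:
    """Summarize one feature over nested segments of one media file.

    segments[i][j] is the feature value for sub-segment j of segment i.
    Population standard deviation is an assumption of this sketch.
    """
    seg_means = [mean(values) for values in segments]
    seg_stds = [pstdev(values) for values in segments]
    return {
        "mean_of_means": mean(seg_means),
        "std_of_means": pstdev(seg_means),
        "min_of_means": min(seg_means),
        "max_of_means": max(seg_means),
        "mean_of_stds": mean(seg_stds),
        "std_of_stds": pstdev(seg_stds),
        "min_of_stds": min(seg_stds),
        "max_of_stds": max(seg_stds),
    }
```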
These statistics are computed for a wide variety of standard computationally-defined features extractable for media instances in the database (e.g., in data set 220, recall
At this point, the files are preprocessed using a PCA (Principal Component Analysis) technique in order to reduce the number of features that contribute similar information about the media file. For example, a height in inches and the same height in cm convey the same information regarding someone's height, so, based on analysis, they could be combined into a single feature. After this step, the features are normalized between 0 and 1. This not only evens out the impact of features that may have very large ranges, but also simplifies the mapping of feature vectors to color values in the SOM. Once this preprocessing is completed, normalized feature vectors are re-exported to ARFFs and loaded into the SOM application. Each media file from the database is tagged or linked to its corresponding feature vector (row) in the ARFF. The SOM then creates a two-dimensional map, where each node/cell of the map contains a random vector (i.e., media example/instance) from the ARFF file. The SOM then randomly grabs one of the feature vectors from the ARFF, and then finds the node in the SOM that best matches this vector using a distance function such as Euclidean distance. Note that in some situations or embodiments, it may be desirable to use or include other distance functions such as, e.g., Manhattan distance, Chebyshev distance and/or Minkowski distance. Once the best matching SOM node is found, the values in the node's vector are morphed towards the vector from the ARFF, taking into account a learning coefficient.
In this way, the SOM node comes closer to representing a media file from our database. Once this initial morphing is done, all nodes within a specified radius from the initial SOM node are also morphed; the amount of morphing decays as a function of the distance from the initial node. Both the learning rate and the radius are initial parameters defined by the user at the start of the SOM. These two parameters exponentially decay over a period of time, decreasing the amount of change that new ARFF vectors impart on the SOM map.
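The training procedure described above may be sketched as follows. The sketch is illustrative only and makes several assumptions that are not specified in the foregoing description: a rectangular grid, Euclidean matching, a Gaussian fall-off of morphing with grid distance, and a decay of the form exp(-t/iterations) applied to both the learning rate and the radius. The function and parameter names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]


def train_som(
    vectors: Sequence[Sequence[float]],
    rows: int,
    cols: int,
    iterations: int,
    learning_rate: float = 0.5,
    radius: float = 2.0,
    seed: int = 0,
) -> Dict[Pos, List[float]]:
    """Train a rectangular SOM on normalized feature vectors."""
    rng = random.Random(seed)
    # Each node starts as a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(vectors)) for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        decay = math.exp(-t / iterations)  # assumed exponential decay schedule
        rate = learning_rate * decay
        reach = radius * decay
        sample = rng.choice(vectors)
        # Best matching node: smallest squared Euclidean distance to the sample.
        best = min(weights, key=lambda p: sum((w - x) ** 2 for w, x in zip(weights[p], sample)))
        for pos, weight in weights.items():
            dist = math.hypot(pos[0] - best[0], pos[1] - best[1])
            if dist <= reach:
                # Morphing shrinks with grid distance from the best match.
                influence = math.exp(-(dist * dist) / (2 * reach * reach))
                for j in range(len(weight)):
                    weight[j] += rate * influence * (sample[j] - weight[j])
    return weights
```

Because each update moves a node part of the way toward an input vector, node vectors remain within the range spanned by the normalized inputs.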
This process is iteratively repeated (recall iteration 230,
At this point, the application finalizes associations between a single media file and its best matching node within the SOM—resulting in clusters of like media files. This can be thought of as clustering sub-genres of music, styles of painting, etc. The approach to mapping creates gradients between clusters/groups of such files.
As an example, a SOM implementation run on a collection of media content that includes recordings of early rock and roll and classic blues artists might expose a boundary along songs where the two genres overlap in musical styles or sounds. In at least some modes, situations or embodiments, the applied mapping technique is exclusive, i.e., a single media file is associated with a given map element. This exclusive-mapping technique contrasts with more traditional SOM mappings, in which the SOM may map files to nodes in a non-exclusive manner, creating a stacking approach. While this stacking approach enables hard-clusters (all best matches stack on top of each other, vs. spreading out across the map in a gradient of similarity), the exclusive-mapping technique has been found to present a browsing user with more interesting boundaries and edge relationships between the media files. This enables the SOM to act both as a classification tool and as an exploration tool, in which the subtle differences between media files are displayed and navigated on a gradient map.
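One way to obtain an exclusive mapping is sketched below. The foregoing description does not specify how exclusivity is enforced, so the greedy strategy shown here (repeatedly pairing the closest remaining file and free node) is an assumption made purely for illustration, as are the function name and data layout. The map must contain at least as many nodes as there are media files.

```python
from typing import Dict, List, Sequence, Tuple

Pos = Tuple[int, int]


def exclusive_assignment(
    weights: Dict[Pos, List[float]],
    vectors: Dict[str, Sequence[float]],
) -> Dict[str, Pos]:
    """Assign each media file to its own map node, closest pairs first."""
    # All (distance, file, node) candidates, nearest first.
    pairs = sorted(
        (sum((w - x) ** 2 for w, x in zip(weight, vec)), file_id, pos)
        for file_id, vec in vectors.items()
        for pos, weight in weights.items()
    )
    assigned: Dict[str, Pos] = {}
    used = set()
    for _, file_id, pos in pairs:
        if file_id not in assigned and pos not in used:
            assigned[file_id] = pos
            used.add(pos)
    return assigned
```

Under this rule, a file whose best matching node is already taken spreads to the next closest free node, which yields the gradient-like layout described above rather than stacking.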
Our SOM technique and application is also capable of presenting both human tagged color-coding, and color coding generated off of the raw feature data. This is useful to see ways in which the media may be similar, even when human labeling places them in separate artistic categories. Comparing these labels can give insight into new artistic genres, works, or media that students, teachers, or users may want to explore, but didn't know existed.
Lastly, the SOM can also be used to compare against exemplary homework files in order to gauge how well a student or class is completing the assignments. Additionally, the SOM could be run over a large corpus of assignments to see how different assignments are between students. If the distance between two assignments is less than some threshold, it may be a flag for the professor to look for possible plagiarism issues. This is also the case if a student has tried to submit the same assignment for multiple homework assignments.
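The threshold test described above may be sketched as follows, operating directly on feature vectors. The sketch assumes Euclidean distance and a caller-supplied threshold; the function name is hypothetical, and a flagged pair is only a prompt for human review, not a determination of plagiarism.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def flag_similar_pairs(
    vectors: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, distance) for every pair closer than `threshold`."""
    flagged = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(vectors.items()), 2):
        distance = math.dist(vec_a, vec_b)  # Euclidean distance (Python 3.8+)
        if distance < threshold:
            flagged.append((id_a, id_b, distance))
    return flagged
```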
Although embodiments of the present invention are explained in the context of certain illustrative SOM-based techniques, certain operational aspects may be more generally understood simply in the context of media content classification.
The system extracts (532) computationally-defined features from the training examples, and these extracted features are used to train (533) a computational system that implements a SOM-based classifier. Training of the SOM-based classifier to map individual coursework submissions to positions in the map (recall map 200,
In general, as part of a potentially iterative coursework evaluation design process, an operative set of feature extractors may be interactively selected or defined (531) for a particular assignment or test question, and refined based on efficacy. In some cases, SOM-based systems or methods in accordance with the present inventions may be augmented to provide (and/or guide) the instructor or curriculum designer through a menu or hierarchy of feature extractor selections and/or classification stages. In some cases, it may be desirable to allow the instructor/curriculum designer to identify (or label) what is good or bad about the training examples. For example, in an application of the developed techniques to grading of music coursework submissions, systems and methods may allow the instructor/curriculum designer to note that a training example (or even a particular coursework submission inspected) is (1) in the key of C, (2) has more than 2, but less than 10, discernible sections, (3) has a very strong beat, etc.
The classifier learns to categorize submissions in accordance with SOM-based mapping, with or without augmentation based on the instructor or curriculum designer's classifications (e.g., on a grading scale or against a rubric), and (at least during training) provides the instructor or curriculum designer with feedback as to how well submissions will be able to be classified based on the current training. In some cases, the system can make suggestions as to how to change the task or criteria so that submissions are easier to classify and thus grade. In response, the instructor or curriculum designer can modify the assignment or evaluation criteria, resubmitting the original examples or modified ones, until they (and the system) are satisfied that the system will perform well enough for grading student submissions.
“Good” examples (i) can come from historical or current masters in the field or can be examples that are representative of the style being emulated, (ii) can be generated by the curriculum designer or (iii) can include previous student submissions from prior administrations of the course (or even hand-picked grading exemplars from a current administration of the course). In some cases or educational domains, initial bad examples can be provided by the curriculum designer or drawn from student submissions (whether from prior administrations or current exemplars). In some cases, once the system is used to offer a course or evaluate an assignment once, prior training serves as a baseline and hand-selected student submissions are thereafter used to re-train the system, or refine the training, for better results.
Aside from speed and convenience, the ability to evaluate thousands rather than tens of submissions in a short time can provide significant advantages. Furthermore, in some cases or embodiments, computational capabilities of the classifier may be scaled as needed, e.g., by purchasing additional compute power, such as via cloud services or compute farms. Additional benefits include absolute objectivity and fairness. All assignments can be evaluated by exactly the same rules and trained machine settings. This is not the case with a collection of human examiners, who inevitably bring biases and grow fatigued during/between sessions, yielding inconsistent results.
Typically, once the SOM is trained, additional coursework submissions 511 are presented for automated evaluation as computer readable media encodings uploaded or directly selected by students from their respective workspaces. In some cases, a course administrator may act as an intermediary and present the coursework submissions 511 to an automated coursework evaluation system. Suitable encoding formats are dependent on the particular media content domain to which techniques of the present invention are applied and are, in general, matters of design choice. Nonetheless, persons of skill in the art, having benefit of the present disclosure will appreciate use of suitable data access methods and/or codecs to obtain and present media content in data structural forms that are, in turn, suitable or convenient for feature extraction and use as classifier inputs in any particular implementation of the techniques described herein.
As online courses become more popular and are offered for credit, a further concern arises related to verification and authentication of the student taking the course and submitting assignments. Cases of fraud, e.g., where someone is hired to do the work for someone else who will receive credit, must be avoided if possible. Some institutions that offer online courses for credit require physical attendance at proctored examinations. As online courses expand to offer credit in geographically diverse locations, and as class sizes grow, supervised exams can become impractical or impossible. The techniques implemented by systems described herein can help with this problem, using the same or similar underlying frameworks for voice, face, and gesture recognition. If required, the user can be asked to authenticate themselves, e.g., via webcam, with each assignment submission, or a number of times throughout an online exam. In some cases or embodiments, student authentication may use the same or similar features used to grade assignments to help determine or confirm identity of the coursework submitter. For example, in some cases or embodiments, computationally-defined features extracted from audio and/or video provided in response to a “Say your name into the microphone” direction or a “Turn on your webcam, place your face in the box, and say your name” requirement may be sufficient to reliably establish (or confirm) identity of an individual taking a final exam based, at least in part, on data from earlier coursework submissions or enrollment.
One illustrative example of a media-rich educational domain in which techniques of the present invention may be employed is music programming, i.e., digital signal processing as applied to audio encodings of music. In a music programming course, students may be given an assignment to develop a computer program to perform some desired form of audio processing on an audio signal encoding. For example, a student might be assigned the task of developing programming to reduce the dynamic range of an existing audio track (called compression), by computing a running average of the RMS power in the signal, then applying a dynamically varying gain to the signal in order to make louder segments softer, and softer segments louder, thus limiting the total dynamic range.
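The compression assignment described above can be sketched as follows. This is a minimal illustration rather than a reference solution: the function name `compress`, the one-pole smoothing used for the running average, and the parameters `window_s`, `target_rms`, `ratio` and `max_gain` are all assumptions introduced here, not details taken from the assignment.

```python
import math

def compress(signal, sample_rate, window_s=0.05, target_rms=0.2,
             ratio=4.0, max_gain=8.0):
    """Reduce dynamic range with a running RMS estimate and a varying gain.

    All parameter names and defaults are illustrative assumptions.
    """
    # One-pole smoothing coefficient giving a time constant of about window_s.
    alpha = math.exp(-1.0 / (window_s * sample_rate))
    mean_square = 0.0
    out = []
    for x in signal:
        # Running (exponentially weighted) average of the signal power.
        mean_square = alpha * mean_square + (1.0 - alpha) * x * x
        rms = math.sqrt(mean_square)
        if rms > 0.0:
            # Gain below 1 for segments louder than target_rms, above 1 for
            # softer segments; max_gain keeps near-silence from being boosted
            # without bound.
            gain = min(max_gain, (target_rms / rms) ** (1.0 - 1.0 / ratio))
        else:
            gain = 1.0
        out.append(gain * x)
    return out
```

With `ratio=4.0`, a level difference between a loud and a soft segment is reduced to roughly its fourth root once the running average has settled, which is the limiting of total dynamic range that the assignment asks for.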
In support of such an assignment, the systems and methods described herein may perform an initial textual and structural evaluation of the students' submitted coursework (here, computer code). Using lexical and/or syntactic analysis, it is possible to determine conformance with various elements required by the assignment, e.g., use of calling structures and required interfaces, use of particular computational primitives or techniques per the assignment, coding within storage use constraints, etc. Next, the systems and methods may compile the submitted code automatically (e.g., to see if it compiles; if not, the student must re-submit). Once compiled, the coursework submission may be executed against a data set to process audio input and/or generate audio output. Alternatively or in addition, the student's submission itself may include results (e.g., an encoded audio signal) generated by execution of the coursework submission against a data set to process audio input and/or generate audio output. In either case, the audio features are extracted from the audio signal output and supplied to the classifiers of the machine-grading system to produce a grading or score for the coursework submission.
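The initial textual and structural evaluation can be illustrated with a short sketch. It assumes, purely for illustration, that submissions are Python source files and that the assignment requires a named function with a fixed parameter list and the use of certain primitives; `check_submission`, `called_names` and their arguments are hypothetical names, not part of the described system.

```python
import ast

def called_names(tree):
    """Collect the names of all functions or methods called in the code."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                names.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names

def check_submission(source, required_function, required_params,
                     required_calls=()):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        tree = ast.parse(source)               # syntactic analysis
        compile(tree, "<submission>", "exec")  # confirm the code compiles
    except SyntaxError as err:
        # A submission that does not compile must be re-submitted.
        return [f"does not compile: {err.msg} (line {err.lineno})"]
    problems = []
    functions = {n.name: n for n in ast.walk(tree)
                 if isinstance(n, ast.FunctionDef)}
    func = functions.get(required_function)
    if func is None:
        problems.append(f"missing required function {required_function!r}")
    elif [a.arg for a in func.args.args] != list(required_params):
        problems.append(
            f"{required_function!r} must take parameters {list(required_params)}")
    for name in sorted(set(required_calls) - called_names(tree)):
        problems.append(f"required primitive {name!r} is never called")
    return problems
```

A submission that passes these checks would then be executed against a data set, and audio features would be extracted from its output for classification.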
Although the selection of particular audio features to be extracted may be, in general, assignment- or implementation-dependent (and in some cases or embodiments may be augmented or extended with instructor- or curriculum-defined feature extraction modules), an exemplary set of audio feature extraction modules may be provided for selection by the instructor or curriculum designer. For example, in some cases or embodiments, the following computationally-defined feature extractions may be provided or selected with computations over windows of various sizes (typically 20 ms, 50 ms, 100 ms, 0.5 s, or 1 s):
In general, means and standard deviations of these and/or other extracted features are computed (often over different windows) and used to characterize sound and music. In some cases, signals may be segmented, and features computationally extracted over contextually-specific segments. Using various metrics, distance functions, and systems, including artificial neural network (NN), k-nearest neighbor (KNN), Gaussian mixture model (GMM), support vector machine (SVM) and/or other statistical classification techniques, sounds/songs/segments can be compared to others from a previously scored or graded database of training examples. By classifying coursework submissions against a computational representation of the scored or graded training examples, individual coursework submissions are assigned a grade or score. In some cases or embodiments, features (or feature sets) can be compressed to yield “fingerprints,” making search/comparison faster and more efficient. Based on the description herein, persons of ordinary skill in the art will appreciate both a wide variety of computationally-defined features that may be extracted from the audio signal of, or derived from, a coursework submission and a wide variety of computational techniques for classifying such coursework submissions based on the extracted features.
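As a hedged illustration of this flow, the sketch below computes means and standard deviations of two per-window features and grades a submission by majority vote among its k nearest graded examples. The choice of RMS level and zero-crossing rate as features, the function names, and the default window size are assumptions made for the example only; a practical system would typically also normalize features before computing distances.

```python
import math
import statistics
from collections import Counter

def windows(signal, size):
    """Split a signal into consecutive, non-overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def rms(window):
    return math.sqrt(sum(x * x for x in window) / len(window))

def zero_crossing_rate(window):
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    return crossings / len(window)

def feature_vector(signal, sample_rate, window_s=0.05):
    """Means and standard deviations of per-window features.

    Assumes the signal is at least one window long.
    """
    size = max(1, int(window_s * sample_rate))
    rms_values = [rms(w) for w in windows(signal, size)]
    zcr_values = [zero_crossing_rate(w) for w in windows(signal, size)]
    return [statistics.fmean(rms_values), statistics.pstdev(rms_values),
            statistics.fmean(zcr_values), statistics.pstdev(zcr_values)]

def knn_grade(features, graded_examples, k=3):
    """Majority grade among the k nearest (feature_vector, grade) examples."""
    nearest = sorted(graded_examples,
                     key=lambda ex: math.dist(features, ex[0]))[:k]
    return Counter(grade for _, grade in nearest).most_common(1)[0][0]
```

Here `graded_examples` stands in for the previously scored or graded database of training examples, and the returned label plays the role of the grade or score assigned to the submission.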
Although music programming and extracted audio features are used as an initial descriptive context for certain exemplary embodiments in accord with the present invention(s), persons of ordinary skill in the art will (based on the present disclosure) also appreciate applications to other media-rich, indeed even expressive, content. For example, turning illustratively to images and/or video, it will be appreciated that it is possible to compute features from still images, or from a succession of images (video) using similar or at least analogous techniques applied generally as described above. Techniques in this sub-domain of signal processing are commonly referred to as “computer vision” and, as will be appreciated by persons of ordinary skill in these arts having benefit of the present disclosure, analogous features for extraction can include color histograms, 2D Transforms, edges/corners/ridges, curves and curvature, blobs, centroids, optical-flow, etc.
As before, means and standard deviations of these and/or other extracted features are computed (often over different windows) and used to characterize the images or video. Again, extracted feature sets may be used to classify coursework submissions against a graded or scored set of exemplars.
While the invention(s) is (are) described with reference to various embodiments, it will be understood that these embodiments are illustrative and that the scope of the invention(s) is not limited to them. Many variations, modifications, additions, and improvements are possible. For example, while certain illustrative signal processing and machine learning techniques have been described in the context of certain illustrative media-rich coursework and curricula, persons of ordinary skill in the art having benefit of the present disclosure will recognize that it is straightforward to modify the described techniques to accommodate other signal processing and machine learning techniques, other forms of media-rich coursework and/or other curricula. Likewise, while techniques have been described in the illustrative context of self-organizing maps (SOMs), persons of skill in the art having benefit of the present disclosure will appreciate applicability of the techniques described (and claimed) to related machine learning techniques, such as, for example, generative topographic maps (GTMs) which can be understood as a probabilistic counterpart of the SOM.
Both instructor-side and student-side portions of feature extraction and machine learning system process flows for media-rich coursework have been described herein in accordance with some embodiments of the present invention(s). In simplified, yet illustrative use cases, the instructor or curriculum designer may provide the SOM with a set of exemplars that she scores or labels as “good” and a set of exemplars that she scores or labels as “bad.” The instructor or curriculum designer may then train the illustrated computational machine by selecting/pairing features or expressing rules or other decision logic, as needed, to computationally classify the exemplars (and subsequent coursework submissions) in accord with the desired scorings or labels. As will be appreciated by persons of ordinary skill in the art having benefit of the present disclosure, scores, classes or labels of interest may be multi-level, multi-variate, and/or include multiple tiers and/or more complex categorizations. For example, classifiers may be trained to classify in accordance with instructor or curriculum provided scores (e.g., ratings from 0 to 6 on each of several factors, on a 100-point scale or, in some cases, as composite letter grades) or labels (e.g., expert/intermediate/amateur), etc.
Likewise, while audio processing, music programming and extracted audio features are used as a descriptive context for certain exemplary embodiments in accord with the present invention(s), persons of ordinary skill in the art will (based on the present disclosure) also appreciate applications to other media-rich, indeed even expressive, content. For example, turning illustratively to visual art, images and/or video, it will be appreciated that it is possible to compute features from still images, or from a succession of images (video) using similar or at least analogous techniques applied generally as described above. Techniques in this sub-domain of signal processing are commonly referred to as “computer vision” and, as will be appreciated by persons of ordinary skill in these arts having benefit of the present disclosure, analogous features for extraction can include color histograms, 2D Transforms, edges/corners/ridges, curves and curvature, blobs, centroids, optical-flow, etc. In video targeted applications, detection of sections and transitions (fades, cuts, dissolves, etc.) may be used in automated grading, particularly where a rubric asks students to use at least one each of jump-cut, fade, cross-dissolve, or to have at least three separate “scenes”. As with the audio processing examples above, decision tree logic and computationally-defined features may be employed to detect sections and transitions (here, fades, cuts, dissolves, etc.) between sections. If statistics for sections differ, grading/scoring can be based on the presence, character or structure of the sections or transitions and correspondence with a rubric. Means and standard deviations of these and/or other extracted features are computed (often over different windows) and used to characterize the images or video. Again, extracted feature sets may be used to classify coursework submissions against a graded or scored set of exemplars.
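One simple way to detect hard cuts and check the "at least three scenes" rubric item is sketched below. It assumes each frame is a flat list of integer grayscale pixel values in the range 0 to 255; the function names, bin count and threshold are illustrative assumptions. Gradual transitions such as fades and dissolves would need an additional detector that is not shown.

```python
def histogram(frame, bins=8):
    """Normalized brightness histogram of one frame (integer pixels 0..255)."""
    counts = [0] * bins
    for p in frame:
        counts[min(bins - 1, p * bins // 256)] += 1
    return [c / len(frame) for c in counts]

def find_cuts(frames, threshold=0.5):
    """Indices of frames whose histogram differs sharply from the previous one."""
    cuts = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint.
        diff = 0.5 * sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts

def meets_scene_rubric(frames, min_scenes=3):
    """True if hard cuts split the video into at least min_scenes sections."""
    return len(find_cuts(frames)) + 1 >= min_scenes
```

The frame indices returned by `find_cuts` mark section boundaries, so per-section statistics can then be computed and compared against the rubric.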
Using various metrics, distance functions, and systems including SOM-based artificial neural network (NN) systems described herein, k-nearest neighbor (KNN), Gaussian mixture model (GMM), support vector machine (SVM) and/or other statistical classification techniques, images/video segments can be compared to others from a previously scored or graded database of training examples. While illustrative techniques based on particular distance functions, learning coefficients and spatial/temporal decay have been described, it will be appreciated that any of a variety of distance functions, learning functions and decays may be employed. By classifying coursework submissions against a computational (SOM) representation of the scored or graded training examples, coursework submissions are grouped and individual and representative submissions may be assigned a grade or score that may be consistently propagated to neighboring submissions and throughout the mapped media content based on the information encoded in the SOM itself.
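Propagation of grades across a map can be sketched as follows, assuming the SOM has already been trained. The representation of the map as a dictionary from grid coordinates to weight vectors, the Euclidean distances, and the function names are assumptions made for illustration; the sketch is not the training procedure itself.

```python
import math

def best_matching_unit(weights, features):
    """Grid position of the map node whose weight vector is closest.

    weights: dict mapping (row, col) -> weight vector of a trained map.
    """
    return min(weights, key=lambda node: math.dist(weights[node], features))

def propagate_grades(weights, graded, ungraded):
    """Give each ungraded submission the grade of the nearest graded map node.

    graded:   list of (feature_vector, grade) exemplars
    ungraded: list of feature vectors for new submissions
    """
    # Attach each exemplar's grade to the node it maps to.
    node_grade = {}
    for features, grade in graded:
        node_grade[best_matching_unit(weights, features)] = grade
    results = []
    for features in ungraded:
        bmu = best_matching_unit(weights, features)
        # Nearest graded node measured on the map grid, not in feature space.
        nearest = min(node_grade, key=lambda node: math.dist(node, bmu))
        results.append(node_grade[nearest])
    return results
```

Because neighboring nodes of a trained map represent similar submissions, a grade given to one representative exemplar reaches the submissions that land on or near the same node.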
Embodiments in accordance with the present invention(s) may take the form of, and/or be provided as, a computer program product encoded in a machine-readable medium as instruction sequences and other functional constructs of software, which may in turn be executed in a computational system to perform methods described herein. In general, a machine-readable medium can include tangible articles that encode information in a form (e.g., as applications, source or object code, functionally descriptive information, etc.) readable by a machine (e.g., a computer, server, virtualized compute platform or computational facilities of a mobile device or portable computing device, etc.) as well as non-transitory storage incident to transmission of the information. A machine-readable medium may include, but is not limited to, magnetic storage medium (e.g., disks and/or tape storage); optical storage medium (e.g., CD-ROM, DVD, etc.); magneto-optical storage medium; read only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or other types of medium suitable for storing electronic instructions, operation sequences, functionally descriptive information encodings, etc.
In general, plural instances may be provided for components, operations or structures described herein as a single instance. Boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in the exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the invention(s).
The present application claims priority under 35 U.S.C. §119(e) of U.S. Provisional Application No. 61/895,917, filed Oct. 25, 2013.
Number | Date | Country
---|---|---
61895917 | Oct 2013 | US