The present disclosure, in some embodiments, concerns a system and method for assessing motor visuo-spatial gestalt and memory abilities. More specifically, but not exclusively, the disclosure is directed to a system for implementing a traditional pen-and-paper gestalt test on a digital platform, for recording metadata regarding a subject's reproduction of an image, and for combining analysis of that metadata with analysis of the reproduction in order to issue a combined evaluation of the subject's gestalt and memory abilities.
Various tests involving production or reproduction of drawings are currently in widespread use, for evaluation of motor or neuropsychological impairments. One commonly used test is the Bender Visual-Motor Gestalt Test (hereinafter, the Bender test). The Bender test is a psychological test used by mental health practitioners that assesses visual-motor functioning, developmental disorders, and neurological impairments in children ages 3 and older and adults. The test consists of a series of index cards picturing different geometric designs. The cards are presented individually and test subjects are asked to copy the design before the next card is shown. Test results are scored based on the accuracy and organization of the reproductions. The Bender test has been in use, in various versions, since 1938.
Another test that requires subjects to copy patterns from cards is the Beery-Buktenica Developmental Test of Visual-Motor Integration. The subject is presented a sequence of images and is asked to copy the images, beginning from a simple line and progressing gradually to more complex geometric shapes. The test assesses how the visual perceptual and fine motor control systems coordinate with one another, or, in other words, how well the motor system produces what the visual system is processing.
Other tests involving production or reproduction of drawings include: the Rey-Osterrieth Complex Figure Test for evaluation of visual perception and long term visual memory; the Clock Draw Test, which is a screening test for people with cognitive impairments and dementia; the Osborn, Butler, & Morris (1984) Copy Design task, which instructs children to copy eight simple geometric designs; and the Trail-Making-Test, which calls upon the subject to connect between a set of numbered dots, in numerical order, without lifting the pen from the page.
Each of the tests described above has traditionally been performed with pen and paper. An exemplary implementation of a drawing reproduction test is shown in
In recent years, image processing technology has been implemented to complement human scoring of pen-and-paper psychological tests. In one example, pen-and-paper examples of the Bender Gestalt test were scanned and uploaded to a computer for processing. The processing included techniques such as segmentation, counting number of drawing components, and computing an area of a bounding box surrounding the drawing. The computer program was programmed to evaluate the drawings on the basis of factors such as simplification, overlapping difficulty, rotation, and perseveration.
In addition, certain psychological tests that have previously been performed off-line have been transitioned to digital platforms. For example, the Corsi block-tapping test requires a subject to mimic a researcher as he or she taps a sequence of up to nine identical spatially separated blocks. The Corsi block-tapping test has been implemented successfully on a tablet computer for several years. More recently, an adaptation of the Trail-Making Test has also been implemented on a tablet.
Pen-and-paper gestalt tests, even when analyzed by a computer, are unable to capture all relevant information regarding the subject's understanding. For example, image processing is unable to evaluate factors such as time spent, pen pressure, or speed. In addition, existing digitized psychological tests, such as the Corsi or Trail-Making Tests, measure only a limited number of parameters, in particular time spent and correctness of input. Many factors relevant to gestalt understanding and memory, such as pen pressure, pen angle, and number of pen lifts, are not measured by those implementations.
Accordingly, there is a need for a computer-based gestalt test that is able to collect metadata relevant to a patient's visuo-spatial gestalt and memory abilities, and to integrate this metadata into a comprehensive scoring system including evaluation of the patient's reproduction and the metadata. There is also a need for a computer-based gestalt test that is capable not only of identifying the content of what is drawn by the subject, but also of ranking the quality of the reproduction.
The present disclosure describes a tablet-based implementation of a pattern-copying test. The tablet displays a drawing on a screen, and prompts the user to reproduce the drawing. A processor collects metadata regarding the user's execution of the copy. After the user completes the copy, the processor evaluates the quality of the reproduction using image processing techniques. The metadata and image processing evaluation are fed into separate neural networks. The neural networks output a combined quality score of the reproduction, derived both from the image processing and from the metadata. The processor further outputs the collected metadata as vectors, for separate analysis.
According to a first aspect, a method of evaluating a subject's visuospatial ability is disclosed. The method includes: displaying an original drawing on a screen; receiving input on the screen from a stylus, corresponding to a reproduction of the drawing by the subject; during the receiving step, collecting metadata regarding the process of the reproduction; comparing a resemblance of the reproduction to the original drawing; evaluating the reproduction and the metadata to infer therefrom the subject's understanding of the drawing; and issuing a combined evaluation of the subject's gestalt understanding of the drawing based on a combination of the resemblance comparison and the metadata evaluation.
In another implementation according to the first aspect, the metadata include a reaction time from displaying of the original drawing until commencement of reproduction of the drawing, and a performance time from commencement of reproduction until completion of reproduction.
In another implementation according to the first aspect, the metadata include a number of stylus strokes used to reproduce the drawing.
In another implementation according to the first aspect, the metadata include pressure exerted by the subject with the stylus onto the screen during reproduction.
In another implementation according to the first aspect, the metadata include azimuth angle of the stylus on the screen during reproduction.
In another implementation according to the first aspect, the metadata include age of the subject.
In another implementation according to the first aspect, the comparison of resemblance is based on at least: position of reproduction on the screen, scale of reproduction, orientation of reproduction, and number of elements in the reproduction.
In another implementation according to the first aspect, the method further includes repeating the method with a series of predefined original drawings. Optionally, the metadata include cumulative time of completion of all the predefined original drawings in the series.
In another implementation according to the first aspect, the method further includes: performing the method with a plurality of unique subjects; aggregating the metadata collected from each subject; and, based on the aggregated metadata, determining a kernel density function for performance in one or more measured metadata categories, and deriving a norm for standard performance from the kernel density function.
In another implementation according to the first aspect, the method further includes: comparing the resemblance with a convolutional neural network; evaluating the metadata with a feed-forward neural network; and combining the convolutional neural network and the feed forward neural network at a dense layer phase.
Optionally, the method further includes training the convolutional neural network, the training step comprising: collecting a data set comprising a plurality of sample reproductions of the original drawings; and manually assigning a similarity score to each reproduction.
Optionally, the method further includes augmenting the data set with modified sample reproductions, said augmenting comprising at least one of changing scaling, shifting pixels, and horizontal flipping, in a manner that is sufficiently subtle not to affect a scoring of a given reproduction.
In another implementation according to the first aspect, the combined evaluation is a ranking of the subject's gestalt understanding of the original drawing.
According to a second aspect, a system for evaluating a subject's visuospatial ability is disclosed. The system includes: a mobile computing device including a touch screen configured to display images and to receive input from a stylus; a processor and a non-transitory computer readable medium; and a computer program product embodied on the non-transitory computer readable medium. When executed by the processor, the computer program product causes the processor to perform the following steps: displaying an original drawing on the screen; during receipt of input on the screen from the stylus, corresponding to a reproduction of the drawing by the subject, collecting metadata regarding the process of the reproduction; comparing a resemblance of the reproduction to the original drawing; evaluating the reproduction and the metadata to infer therefrom the subject's understanding of the drawing; and issuing a combined evaluation of the subject's gestalt understanding of the drawing based on a combination of the resemblance comparison and the metadata evaluation.
In another implementation according to the second aspect, the metadata include a reaction time from displaying of the original drawing until commencement of reproduction of the drawing, and a performance time from commencement of reproduction until completion of reproduction.
In another implementation according to the second aspect, the metadata include a number of stylus strokes used to reproduce the drawing.
In another implementation according to the second aspect, the metadata include pressure exerted by the subject with the stylus onto the screen during reproduction.
In another implementation according to the second aspect, the metadata include azimuth angle of the stylus on the screen during reproduction.
In another implementation according to the second aspect, the metadata include age of the subject.
In another implementation according to the second aspect, the computer program product is configured to compare the resemblance based on at least: position of reproduction on the screen, scale of reproduction, orientation of reproduction, and number of elements in the reproduction.
In another implementation according to the second aspect, the computer program product is configured to repeat each of the steps with a predefined series of original drawings. Optionally, the metadata include cumulative time of completion of all the predefined original drawings in the series.
In another implementation according to the second aspect, the computer program product is further configured to aggregate metadata collected from a plurality of unique subjects, and, based on the aggregated metadata, determine a kernel density function for performance in one or more measured metadata categories, and derive a norm for standard performance from the kernel density function.
In another implementation according to the second aspect, the computer program product further comprises a convolutional neural network for comparing the resemblance, and a feed forward neural network for evaluating the metadata, wherein the convolutional neural network and the feed forward neural network are combined at a dense layer phase, so as to output a single combined evaluation.
In another implementation according to the second aspect, the combined evaluation is a ranking of the subject's gestalt understanding of the original drawing.
In the drawings:
Before explaining at least one embodiment in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of components and/or methods set forth in the following description and/or illustrated in the drawings. The invention is capable of other embodiments or of being practiced or carried out in various ways.
Tablet 12 includes a touch screen 14. The touch screen 14 may be any standard screen or display suitable for implementation in a mobile computing device, such as LCD, OLED, AMOLED, Super AMOLED, TFT, or IPS.
Touch screen 14 is configured to display a graphic user interface 16. The graphic user interface 16 includes a first region 18, in which an original drawing d is displayed, and a second region 20, which includes an open space for a user to reproduce the original drawing d with reproduction r. Optionally, one or more demarcating lines 22 are also displayed, in order to form a boundary between the display region 18 and the reproduction region 20. In the illustrated embodiment, the display region 18 is depicted as being above the reproduction region 20; this configuration is merely exemplary, and other configurations are also possible.
In an alternative testing scenario, the subject is required to retrieve an original drawing from memory after being shown that drawing, and to reproduce the drawing solely from memory. In such a scenario, the original drawing may be displayed in the display region 18 for a given period of time, and then erased from the display region 18. Alternatively, the original drawing may be displayed on the entire display 14, and then removed from the display 14.
System 10 also includes stylus 24. Stylus 24 is an active stylus, also known as a digital stylus. An active stylus has digital components or circuitry inside the stylus 24 that communicate with a digitizer of the tablet 12. This communication allows for measurement of features such as pressure sensitivity, tilt, number of pen raises, and timing. Examples of active styluses currently available include the Apple Pencil® and the Microsoft Surface Pen®. The Surface Pen, for example, is capable of detecting 4,096 levels of pressure and has 1,024 levels of tilt sensitivity. In experiments implementing the system and method disclosed herein, the tablet that was used was the Microsoft Surface Pro® 7, and the stylus was the Surface Pen®, version 2.
Tablet 12 also includes a processor (not shown). The processor includes a memory, and circuitry for executing computer readable program instructions stored on the memory. The memory is a non-transitory storage medium having stored thereon code instructions that, when executed by the processor, cause performance of various steps. The storage medium may be, for example, an electronic storage device, a magnetic storage device, an optical storage device, a semiconductor storage device, or any suitable combination of the foregoing. In particular, the functions described herein may be programmed in a computer program product installed on the non-transitory computer readable medium of tablet 12. In addition, the functions described herein may be performed by a cloud-based computer, or by a combination of a processor and memory stored on the tablet and a processor and memory stored on a remote device.
Mobile device 12 is further equipped with a communications module for wirelessly communicating with the cloud, for example, via a Bluetooth or wireless internet connection. This wireless connection is used, inter alia, for downloading software updates, including updates to the deep neural networks disclosed herein.
As discussed above, each reproduction is evaluated based on two sets of criteria: the objective quality of the reproduction, as measured by the elements of the reproduction that the subject produced; and metadata collected during the process of the reproduction.
The above-mentioned features may be evaluated without regard to metadata concerning how the reproductions were generated. This evaluation may be performed with both standard computer vision methods and with artificial neural networks. In exemplary embodiments, a computer vision algorithm initially extracts the reproduction's position, as determined by the center of mass, and further analyzes the image's scale, the orientation, the centering, and the skew. On the basis of such evaluations, the system is able to not only determine the content of what is drawn (for example, that there are five lines) but also provide an evaluation of the quality of the reproduction.
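By way of non-limiting illustration, the position, scale, and orientation measurements described above may be sketched as follows. The sketch assumes that a drawing is available as a binary raster (a list of rows of 0/1 values), and estimates orientation from second-order central moments; the function names `drawing_features` and `compare_features` are hypothetical and are not taken from the disclosed embodiments.

```python
import math

def drawing_features(image):
    """Position, extent, and orientation of the ink pixels in a binary raster.

    `image` is a list of rows; a truthy value marks an ink pixel (assumption).
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("the drawing contains no ink pixels")
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx = sum(xs) / n  # center of mass, x
    cy = sum(ys) / n  # center of mass, y
    # Second-order central moments of the ink distribution.
    mxx = sum((x - cx) ** 2 for x in xs) / n
    myy = sum((y - cy) ** 2 for y in ys) / n
    mxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Principal-axis angle in image coordinates (y grows downward).
    angle = 0.5 * math.atan2(2.0 * mxy, mxx - myy)
    return {
        "center": (cx, cy),
        "width": max(xs) - min(xs) + 1,
        "height": max(ys) - min(ys) + 1,
        "angle_deg": math.degrees(angle),
    }

def compare_features(original, reproduction):
    """Differences in position, scale, and orientation between two rasters."""
    a = drawing_features(original)
    b = drawing_features(reproduction)
    area_a = a["width"] * a["height"]
    area_b = b["width"] * b["height"]
    return {
        "shift": (b["center"][0] - a["center"][0],
                  b["center"][1] - a["center"][1]),
        "scale_ratio": math.sqrt(area_b / area_a),
        "rotation_deg": b["angle_deg"] - a["angle_deg"],
    }
```

Differences of this kind could serve as inputs to a quality evaluation; how they are weighted is not specified here.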
As discussed above, system 10 is also capable of collecting metadata regarding the process of formation of the reproduction. As used in the present disclosure, the term metadata encompasses all relevant information about how an image is generated. The metadata includes, but is not limited to, data collected through sensors in stylus 24. For example, in addition to the examples described below, the metadata may encompass the age of the subject, the location of the subject, and the time during which the examination was performed.
The following metadata are collected over the course of each reproduction, before, during, and after receipt of input on the screen 14 from the stylus 24:
It should be understood, of course, that number of pen strokes is not the sole criterion by which gestalt comprehension should be evaluated. There may be examples in which the number of pen strokes is as expected, but the subject exhibited a poor understanding of the drawing. Such is the case in
At step 101, the practitioner displays an original drawing on display 14 of the tablet 12. For example, the practitioner may execute a computer program stored on the memory of the tablet 12.
At step 102, the tablet 12 receives input from the subject corresponding to a reproduction of the original drawing. The input consists of one or more pen strokes of the stylus 24, as described above.
At step 103, during the reproduction of the original drawing, the tablet 12 collects metadata regarding the process of the reproduction. As discussed above, the metadata may include a reaction time from displaying of the original drawing until commencement of reproduction of the drawing; a performance time from commencement of reproduction until completion of reproduction; the number of stylus strokes used to reproduce the drawing; pressure exerted by the subject onto the screen during reproduction; and azimuth angle of the stylus on the screen during reproduction. Metadata such as age may be input into the tablet 12 by the practitioner.
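By way of non-limiting illustration, the metadata of step 103 may be derived from a time-stamped stream of stylus events, as sketched below. The event format (`StylusEvent` with `"down"`, `"move"`, and `"up"` kinds, time measured in seconds from the moment the original drawing is displayed) and the function name `summarize_metadata` are assumptions made for the example, and do not correspond to any particular digitizer interface.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class StylusEvent:
    t: float               # seconds since the original drawing was displayed
    kind: str              # "down", "move", or "up" (hypothetical event kinds)
    pressure: float = 0.0  # normalized pressure, 0.0 to 1.0 (assumption)
    azimuth: float = 0.0   # azimuth angle in degrees (assumption)

def summarize_metadata(events):
    """Reduce a stream of stylus events to per-reproduction metadata."""
    downs = [e for e in events if e.kind == "down"]
    if not downs:
        raise ValueError("no stylus contact was recorded")
    # Samples taken while the stylus touches the screen.
    contact = [e for e in events if e.kind in ("down", "move")]
    start = downs[0].t
    end = max(e.t for e in events)
    return {
        "reaction_time": start,            # display until first contact
        "performance_time": end - start,   # first contact until last event
        "stroke_count": len(downs),        # one stroke per pen-down
        "mean_pressure": mean(e.pressure for e in contact),
        # Plain arithmetic mean; valid only if the angles do not wrap around.
        "mean_azimuth": mean(e.azimuth for e in contact),
    }
```

Metadata that are not sensed by the stylus, such as age, would be merged into the resulting record separately.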
At step 104, the system compares the reproduction to the original drawing. As discussed above, this comparison may be based on one or more of the following factors: position of reproduction on screen, scale of reproduction, orientation of reproduction, and number of elements in the reproduction.
At step 105, the system evaluates elements of the drawing and the collected metadata in order to infer therefrom the subject's gestalt understanding of the drawing. For example, as discussed above, the number of drawing elements may be compared to an expected number of drawing elements, the number of pen lifts may be compared to a number of expected pen lifts, and a time of completion may be compared to an expected time of completion. Step 105 may be performed before, during, or after step 104.
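By way of non-limiting illustration, the comparison of measured values against expected values in step 105 may be sketched as a per-category difference. The category names and the expected values in the usage note are hypothetical, and the function name `deviations_from_expected` is not taken from the disclosed embodiments.

```python
def deviations_from_expected(observed, expected):
    """Return observed minus expected for every expected category."""
    missing = set(expected) - set(observed)
    if missing:
        raise KeyError(f"missing observed categories: {sorted(missing)}")
    return {name: observed[name] - expected[name] for name in expected}
```

For instance, with hypothetical expected values of five elements, four pen lifts, and ten seconds, a reproduction having five elements, seven pen lifts, and a completion time of twelve seconds would yield deviations of zero elements, three pen lifts, and two seconds.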
At step 106, the system issues a combined evaluation of the subject's gestalt understanding of the drawing, based on a combination of the resemblance comparison and the metadata evaluation. The combined evaluation may be a ranking of the subject's gestalt understanding, on a linear scale.
Convolutional neural network 201 is used for the image processing (step 104). Convolutional neural networks were originally designed to classify between different images. Their architecture follows specific rules and orders of layers, beginning with an input layer, and proceeding to one or more convolutional layers, pooling layers, and dense layers, also known as fully connected layers, as is known to those of skill in the art.
Feed-forward neural network 202 is used for evaluation of the metadata. Feedforward neural network 202 maps the different metadata onto an output function corresponding to an evaluation of the subject's understanding.
Because the system 10 is designed to produce one score for each reproduction, the convolutional neural network 201 and feedforward neural network 202 are combined at a shared dense layer phase 203. This dense layer phase issues a combined output 204. In exemplary embodiments, the combined output 204 is a ranking of the subject's gestalt understanding of the original drawing on a linear scale, such as a scale of 1 to 4, or an equivalent textual ranking. For example, the output may be one of the words mediocre (lowest level); poor (next higher level); good (next higher level); or perfect (highest level).
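By way of non-limiting illustration, the combination of an image branch and a metadata branch at a shared dense layer may be sketched as the forward pass below. The sketch is a toy written in plain Python with random, untrained weights, so its output carries no diagnostic meaning; the layer sizes, the use of global max pooling, the softmax output, the assumption that metadata are pre-scaled to comparable ranges, and the class name `FusionNet` are all assumptions made for the example.

```python
import math
import random

LABELS = ["mediocre", "poor", "good", "perfect"]  # levels 1 to 4, lowest first

def conv2d_relu(image, kernel):
    """Valid 2D convolution (no padding) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(ws, x)) + b
            for ws, b in zip(weights, bias)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class FusionNet:
    """Toy two-branch network joined at a shared dense layer (untrained)."""

    def __init__(self, n_kernels=2, n_meta=5, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.kernels = [rand_matrix(3, 3, rng) for _ in range(n_kernels)]
        self.w_meta = rand_matrix(n_hidden, n_meta, rng)
        self.b_meta = [0.0] * n_hidden
        self.w_out = rand_matrix(len(LABELS), n_kernels + n_hidden, rng)
        self.b_out = [0.0] * len(LABELS)

    def forward(self, image, metadata):
        # Image branch: convolution, ReLU, global max pooling per kernel.
        image_features = [max(max(row) for row in conv2d_relu(image, k))
                          for k in self.kernels]
        # Metadata branch: one feed-forward layer with ReLU.
        meta_features = [max(0.0, v)
                         for v in dense(metadata, self.w_meta, self.b_meta)]
        # Shared dense phase: concatenate both branches, then classify.
        fused = image_features + meta_features
        return softmax(dense(fused, self.w_out, self.b_out))
```

The index of the largest output probability would select one of the four ranking levels; a practical network would first be trained on labeled reproductions.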
In addition to the combined evaluation, the system may output individual vectors corresponding to each of the metadata that was measured, for separate analysis of each of those metadata categories. The individual vectors may be used by the practitioner in order to consider the subject's proficiency in specific measured categories. For example, a subject may draw a perfect reproduction, but may take a long time or use an inordinate number of pen lifts to do so. Optionally, the individual metadata vectors may also be plotted on a scale of normal values for those parameters. These individual metadata vectors may provide context for the overall score output by the system, and may enable the practitioner to identify specific areas of strength and weakness in the subject.
Returning to
Prior to commencing method 100, it is necessary to train the neural network.
Typically, neural networks are trained with a supervised learning process. In a supervised learning model, the network is given a set of N training examples of the form {(x1, y1), . . . , (xN, yN)}, wherein xi is the feature vector of the i-th example and yi is its label (i.e., class). A supervised learning algorithm seeks a function g:X→Y, wherein X is the input (feature) space and Y is the output (label) space. The feature space includes the drawn image itself, and the metadata parameters regarding the image, as discussed above.
For the supervised learning process to proceed, each supervised learning algorithm needs labeled samples. One exemplary manner of collecting labeled samples is to have a human specialist assign a duplication score. For example, researchers may collect a data set including a plurality of sample reproductions of the original drawings. Two independent psychologists may rate each sample reproduction based on various parameters. This rating serves as the “true” label of each sample reproduction.
Following collection and labeling of the samples, the data is augmented. In machine learning, data augmentation is the process of synthetically modifying images without changing their essence. For example, a user may zoom, crop, skew, or rotate an image of a dog without changing it to an image of a cat. Classic data augmentation engages in these synthetic modifications while retaining the same true label of the drawing.
Augmentation is performed in order to increase the number of inputs for the neural network during a supervised learning process. Suppose, for example, that data gathering is conducted by administering 100 tests for each of multiple age levels of children. Because each test includes 40 drawings, the total number of drawings is 4,000 per age group. This quantity is usually vastly insufficient for a robust machine learning algorithm, let alone a convolutional neural network (CNN) with thousands of different parameters. Therefore, it is necessary to supplement the data with data augmentation.
A challenge of data augmentation in the context of the tests described in the present disclosure is that classic methods of data augmentation distort the images in a way that would change the scoring of those images. For example, image scaling (how big the drawing is relative to the original) and image rotation (how rotated the drawing is relative to the original) are two of the features that the neural network takes into account when computing an overall score. Therefore, one cannot modify these factors while maintaining the original score of the unscaled, unrotated image.
Accordingly, in order to utilize reliable data augmentation, two approaches may be used. First, it is possible to apply “gentle” modifications. Such gentle modifications may include one or more of scaling, pixel shifting, or horizontal flipping, in a manner that is sufficiently subtle so as not to affect a scoring of a given reproduction. Examples of such changes include 1-3% scaling, a 10 pixel shift to the image center, and horizontal image flips (for some images). These changes are hardly noticeable to the naked eye, but are considered different inputs from the perspective of the algorithm. A second approach is to modify the input images more drastically, and then to submit the modified images back to the human specialists for manual scoring.
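By way of non-limiting illustration, the first, "gentle" approach may be sketched as follows for binary raster images. The scaling factors (within 1-3%) and the 10 pixel shift follow the examples given above, while the choice of shifting in four directions, the nearest-neighbor rescaling about the image center, and the function names are assumptions made for the example. Every variant retains the label of the sample from which it was derived.

```python
def shift(image, dx, dy):
    """Translate ink pixels by (dx, dy); pixels leaving the canvas are dropped."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nx, ny = x + dx, y + dy
            if image[y][x] and 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = image[y][x]
    return out

def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def scale(image, factor):
    """Nearest-neighbor rescale about the image center, same canvas size."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its source pixel.
            sx = round(cx + (x - cx) / factor)
            sy = round(cy + (y - cy) / factor)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out

def gentle_variants(image, label, flip_allowed=False, shift_px=10):
    """Return (image, label) pairs; every variant keeps the original label."""
    variants = [(image, label)]
    for factor in (0.97, 0.99, 1.01, 1.03):  # scaling within 1-3%
        variants.append((scale(image, factor), label))
    for dx, dy in ((shift_px, 0), (-shift_px, 0), (0, shift_px), (0, -shift_px)):
        variants.append((shift(image, dx, dy), label))
    if flip_allowed:  # only for drawings whose score is unaffected by mirroring
        variants.append((hflip(image), label))
    return variants
```

In this sketch each labeled sample yields nine inputs, or ten where flipping is permitted for the drawing in question.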
Continuing to refer to
At step 109, the system may determine a kernel density function for standard performance in one or more metadata categories. A kernel density function is a non-parametric way to estimate the probability density function of a random variable. As applied to the disclosed embodiments, the kernel density function may generate norms regarding the performance of one or more reproductions. The norms may include, for example, metrics regarding the mean and the standard deviation of metadata parameters, such as the average time it takes a child to duplicate a complex drawing. Such metrics do not exist for pen-and-paper tests, and, as a result, children's performance cannot be accurately evaluated as compared to their peers. The ability to generate such a kernel density function, for multiple parameters, represents a significant advance in its own right, made possible by implementation of the assessment test on a tablet.
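By way of non-limiting illustration, a kernel density function for a single metadata category may be sketched as follows. The sketch assumes a Gaussian kernel with a normal-reference rule-of-thumb bandwidth, neither of which is specified in the present disclosure; the function names `gaussian_kde` and `norm_summary` are hypothetical.

```python
import math
from statistics import mean, stdev

def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the probability density of `samples`."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not from the source).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centered on each observed sample.
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                           for s in samples)

    return density

def norm_summary(samples):
    """Mean and standard deviation of one metadata category."""
    return {"mean": mean(samples), "std": stdev(samples)}
```

Given, for example, the completion times of many subjects of one age group for one drawing, `gaussian_kde` would yield a smooth density against which a new subject's completion time could be located, and `norm_summary` would yield the corresponding mean and standard deviation.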
Exemplary kernel density functions are shown in
The software produces a norm graph for each feature of each shape. For example, in
This application is the United States National Stage of International Patent Cooperation Treaty Patent Application No. PCT/IL2022/050801, filed Jul. 25, 2022, which claims the benefit of U.S. Provisional Patent Application No. 63/226,776, filed Jul. 29, 2021, each hereby incorporated by reference herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL2022/050801 | 7/25/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63226776 | Jul 2021 | US |