For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description and the accompanying drawings, in which:
In the present invention, a software-based suite uses camera-enabled personal digital assistant (PDA) or cell phone hardware to provide enhanced imaging capabilities. For example, when individuals with low vision need to read material with low contrast or fine print, they may use their mobile camera device to assist them.
Among the many tasks where general magnification is beneficial (e.g., using appliances, reading a wrist watch, or even looking at pictures), particular benefits may be realized with respect to reading text.
By detecting and making use of the special characteristics of text, text enhancement algorithms may be tuned to improve resolution, contrast and sharpness. For example, users may set a minimal displayed text size, and text lines may be scaled appropriately. Contrast may be customized for the user and for the content. One user with one condition may read better with a given foreground/background combination, while another may prefer the colors reversed. For non-text content, edge enhancement and histogram equalization may provide increased perception, while for text, black characters on an all-white background, which maximizes contrast, may be more easily read. Varying and/or reversing the contrast may be provided as an individual choice.
The vision enhancement tool may be software based, and may be easily downloadable as an application to a device. The system may operate in different modes. First, the magnifier mode may provide direct video magnification and contrast enhancement of the content being imaged through the camera. To process images, a filtering pipeline with an effective digital zoom obtained through pixel replication may be employed, along with simple edge enhancement and contrast enhancement filters. To obtain higher definition or text enhancement, the user may capture a still image, which may be enhanced and displayed on the device. Since the content is magnified digitally, the screen may not be able to display the entire scene at the increased resolution. Therefore, different mechanisms for navigating through the enhanced content may be provided. For example, in order to navigate through the content, the user may move the device as if he/she were panning over the enlarged scene. The process of text enhancement may be partitioned into the tasks of image stabilization and acquisition, text detection, enhancement, and recognition. The system may be based on a dynamically reconfigurable component architecture, which makes the system easily extendable so that more advanced applications can be integrated. In one embodiment, the system may be configured with only the capabilities needed for a particular enhancement task, thus optimizing resources while retaining flexibility.
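The fast path of the magnifier mode can be illustrated with a short sketch. This is a minimal illustration only, not the invention's implementation; the function names and the simple linear stretch are assumptions made for the example.

```python
import numpy as np

def replicate_zoom(img, factor=2):
    """Digital zoom by pixel replication: each source pixel becomes a
    factor x factor block in the output. img may be H x W or H x W x C."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def stretch_contrast(img, lo=0, hi=255):
    """Simple linear contrast stretch over the occupied intensity range,
    standing in for the contrast enhancement filter mentioned above."""
    mn, mx = float(img.min()), float(img.max())
    scale = (hi - lo) / max(mx - mn, 1e-6)
    return np.clip((img.astype(np.float32) - mn) * scale + lo, 0, 255).astype(np.uint8)
```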
A mechanism may be provided to let the user provide the context so as to accommodate variations in lighting conditions.
Component architecture may manage considerable resources on a small device. The system may operate in standalone mode providing an integrated capability.
The software modules may include the user-interface, image acquisition and display module, text detection module, and the enhancement module.
Software reusability and component management may be supported. The component architecture may provide an easy way to develop and test new algorithms 136, and may provide a basis for moving to new devices 138, where resources may be even more limited.
The camera attached to the camera phone may be used directly as an image capture device.
A GUI may continue to display video sequences and capture single images at the same time. When text is shown at the center of the display 139, the user may hit a button to capture the image, which may be passed to detection and recognition modules.
Since captured images may be at megapixel resolutions, which may be significantly larger than the screen resolution, the user interface may allow the user to browse images within the limited screen size and resolution. Image browsers may use scroll bars to cycle through image thumbnails and locate images of interest to inspect in full resolution. Another alternative is Zoomable User Interface technology, which may allow a user to view images at progressively higher resolutions.
Additionally or alternatively, the user may navigate larger images or document images by simply moving the device. The basic concept is that after a static image of the scene is obtained and processed, to obtain an enhanced image, the camera may be retargeted to measure relative motion of the device. When the phone is panned across the scene in a given direction, the view port over the enhanced image may be moved in the same direction, giving the user the perception of scanning across the scene. The sensitivity of the motion may be controlled so that the user gets a smooth scan.
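One plausible way to drive this "pan to scroll" behavior is to estimate the global translation between consecutive preview frames and apply it, scaled by a sensitivity gain, to the view-port position. The motion estimator below (FFT phase correlation) is an assumption for this sketch; the description above does not commit to a particular estimator.

```python
import numpy as np

def estimate_shift(prev, curr):
    """Estimate global translation between two grayscale frames by phase
    correlation; the sign convention depends on which frame is conjugated."""
    c = np.fft.fft2(prev) * np.conj(np.fft.fft2(curr))
    corr = np.fft.ifft2(c / (np.abs(c) + 1e-9)).real
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    h, w = prev.shape
    return (dx - w if dx > w // 2 else dx), (dy - h if dy > h // 2 else dy)

def update_viewport(vx, vy, dx, dy, view_w, view_h, img_w, img_h, gain=1.0):
    """Pan the view port over the enhanced image in the motion direction,
    clamped to the image bounds; gain controls the scan sensitivity."""
    vx = min(max(vx + gain * dx, 0), img_w - view_w)
    vy = min(max(vy + gain * dy, 0), img_h - view_h)
    return vx, vy
```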
Text enhancement algorithms 157, 158 may be performed prior to display. The techniques may include, for example, perspective distortion correction, image stabilization, deblurring, contrast enhancement, noise removal and resolution enhancement.
The present invention implements an efficient magnification method to improve text resolution. Bilinear interpolation generally requires floating-point calculations, which make it extremely slow because most smart phones have no floating-point processor. Real-time image magnification at arbitrary scale is a fundamental requirement for many mobile imaging applications. Simple pixel replication can satisfy the real-time requirement, but its artifacts are very obvious.
The present invention uses a look-up table to accelerate bilinear interpolation and achieve real-time performance on mobile phones. In this way, the computational speed of the embedded image processing library is increased.
When zooming in/out of an image, a pixel in the new image is often projected back to a point P with non-integer coordinates in the original image. Let Q11=(x1, y1), Q21=(x2, y1), Q12=(x1, y2), and Q22=(x2, y2) denote the four integer-coordinate neighbors of P=(x, y). Interpolation in the X direction is performed first:

f(R1)=(x2−x)f(Q11)+(x−x1)f(Q21), where R1=(x, y1)

f(R2)=(x2−x)f(Q12)+(x−x1)f(Q22), where R2=(x, y2)

And then interpolation in the Y direction is performed:

f(P)=(y2−y)f(R1)+(y−y1)f(R2)=f(Q11)(x2−x)(y2−y)+f(Q21)(x−x1)(y2−y)+f(Q12)(x2−x)(y−y1)+f(Q22)(x−x1)(y−y1)

using the fact that x2−x1=1 and y2−y1=1.
From this formula one can estimate how many floating-point multiplication and addition operations are required to finish the process. Since x2−x1=1 and y2−y1=1, we need to calculate the four terms f(Q11)(x2−x)(y2−y), f(Q21)(x−x1)(y2−y), f(Q12)(x2−x)(y−y1), and f(Q22)(x−x1)(y−y1), each of which requires two floating-point subtractions and two multiplications; summing the four terms requires three additions. Therefore, each pixel in the interpolated image requires 2×4+3=11 floating-point additions (subtractions) and 2×4=8 floating-point multiplications. This means the interpolation of an image at VGA (640×480) resolution requires 640×480×11=3,379,200 floating-point additions and 640×480×8=2,457,600 floating-point multiplications. As we tested on a Dell Av50 PDA (650 MHz CPU, 64 MB memory), the interpolation of an image to VGA resolution takes almost 2 minutes. The reason is that mobile devices often use software emulation for floating-point calculation instead of the dedicated floating-point processor found in a PC.
Since many applications mainly handle text, one bit can be used to represent each pixel: black for foreground and white for background, or vice versa. This holds because the clustering-based contrast enhancement described below is performed first. Therefore, [f(Q11) f(Q12) f(Q21) f(Q22)] has at most 16 combinations. We quantize (x−x1) and (y−y1) at a small step interval t, so that the interpolated value for every combination of neighbor pattern and quantized offset can be pre-computed and stored in a look-up table.
After pre-calculating the look-up table, the image interpolation becomes a memory-indexing process without any floating-point calculation. As we tested on the Dell Av50 PDA, it takes only 10 milliseconds to interpolate an image at VGA resolution, meaning we achieved over 200 times acceleration on the PDA. The acceleration comes from (1) the elimination of all floating-point calculation and (2) the look-up table. When we move the same experimental protocol to a desktop PC, however, we observe only around 5× acceleration, since a desktop PC has a floating-point processor; that 5× acceleration is achieved mainly through the look-up table. The cost of this acceleration is 160 KB of extra memory.
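The construction of such a table can be sketched as follows, assuming a quantization of Q=16 steps per fractional offset and 8-bit output values (this toy table is 16·16·16=4096 bytes; the 160 KB figure above implies a finer quantization or a richer layout than this sketch).

```python
import numpy as np

Q = 16  # assumed quantization steps for the offsets (x - x1) and (y - y1)

def build_lut():
    """Precompute interpolated values for the 16 binary neighbor patterns and
    all quantized offsets; run-time interpolation then needs no floating point."""
    lut = np.empty((16, Q, Q), dtype=np.uint8)
    for pattern in range(16):
        # assumed bit order: f(Q11), f(Q21), f(Q12), f(Q22), each 0 or 1
        f11, f21, f12, f22 = ((pattern >> b) & 1 for b in range(4))
        for ix in range(Q):
            for iy in range(Q):
                dx, dy = ix / Q, iy / Q
                v = (f11 * (1 - dx) * (1 - dy) + f21 * dx * (1 - dy)
                     + f12 * (1 - dx) * dy + f22 * dx * dy)
                lut[pattern, ix, iy] = round(v * 255)
    return lut

def lut_interp(lut, f11, f21, f12, f22, dx, dy):
    """Run-time lookup; dx, dy are the fractional offsets in [0, 1)."""
    pattern = f11 | (f21 << 1) | (f12 << 2) | (f22 << 3)
    return lut[pattern, int(dx * Q), int(dy * Q)]
```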
Users may capture the text image from an arbitrary angle. To read the text, we need to correct the perspective distortion first. The first step is to calculate the mapping between the ideal, non-perspective image and the captured image, which can be described by a plane-to-plane homography matrix H. For any matrix entry (i, j), H maps the homogeneous coordinate x=(i, j, 1) to its image coordinate X=Hx. Suppose we know n matrix entries (xi, yi, 1) and their corresponding image points (Xi, Yi, 1), where i=1, 2, . . . , n. The classical way of computing H is the homogeneous estimation method: first reshape matrix H as a vector h=(h11, h12, h13, h21, h22, h23, h31, h32, h33)T, and then solve Mh=0.
When n=4, h is the null vector of M and we have a unique solution for h (assuming |h|=1 or h33=1). This means that we only need the coordinates of the four corners (P1, P2, P3, P4) of the text region in the captured image.
As shown in the figure, the two pairs of opposite sides of the quadrilateral P1P2P3P4 meet at two vanishing points, V1=(P1×P2)×(P4×P3) and V2=(P2×P3)×(P1×P4), and the vanishing line of the text plane is their cross product. Taking this vanishing line as the third row of the homography gives

H3=((P1×P2)×(P4×P3))×((P2×P3)×(P1×P4))  (1)
This indicates we can calculate H3 using only seven cross products. Any homography H with the third row H3 calculated by (1) maps the perspective image to an affine image. The next task is to fill the first and second rows of matrix H. The reason we calculate this homography H is that, given any matrix coordinate, we can quickly tell its pixel coordinate in the image. From the matrix coordinate (I) to the affine image (II), the transformation is linear and can be easily computed by transforming the basis of the coordinate system. In the final step we need to transform the affine image (II) to the perspective image (III) by computing H−1. We choose the first and second rows of H so that the inverse has a simple closed form. With H3 normalized as (l1, l2, 1), we have

H = [1 0 0; 0 1 0; l1 l2 1] and H−1 = [1 0 0; 0 1 0; −l1 −l2 1].
This inverse only requires reversing two signs in the third row of H. In this way it simplifies the coordinate transformation with numerical stability; a general numerical inverse often suffers from division by zero when H is nearly singular. In summary, we compute the coordinate transformation in the following way: first transform the matrix coordinate to the affine coordinate through the linear change of basis, and then use H−1 to map this affine coordinate to the image coordinate.
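For illustration, the whole construction fits in a few lines, assuming the corners P1 . . . P4 are homogeneous 3-vectors ordered around the quadrilateral so that (P1,P2)/(P4,P3) and (P2,P3)/(P1,P4) are the two pairs of opposite sides, and that the vanishing line's third element is non-zero so it can be normalized.

```python
import numpy as np

def rectifying_homography(P1, P2, P3, P4):
    """Build H whose third row is the vanishing line obtained from seven
    cross products, together with its sign-flip inverse."""
    v1 = np.cross(np.cross(P1, P2), np.cross(P4, P3))  # vanishing point of side pair 1
    v2 = np.cross(np.cross(P2, P3), np.cross(P1, P4))  # vanishing point of side pair 2
    l = np.cross(v1, v2)                               # vanishing line: 7th cross product
    l = l / l[2]                                       # normalize so H3 = (l1, l2, 1)
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [l[0], l[1], 1.0]])
    H_inv = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [-l[0], -l[1], 1.0]])            # reverse two signs in the third row
    return H, H_inv
```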
Under some adverse imaging conditions, the majority of the pixel values may lie in a narrow range, potentially making them more difficult to discriminate. One technology that may make it easier to discern subtle contrasts is contrast enhancement, which stretches the values in the range where the majority of the pixels lie. Mathematically, contrast enhancement may be described as s=T(r), where r is the original pixel value, T is the transformation, and s is the transformed value. T may be linear or non-linear, depending on the practical imaging conditions. The principle is to make light colors (or intensities) lighter and dark colors darker at the same time, so the total contrast of the image is increased.
The text and background pixels may form two clusters, and when image contrast is high, the distance between the two cluster centers is larger. Therefore, a clustering-based contrast enhancement method that exploits this unique feature of text images may be used: first the two clusters are found, and then the contrast is enhanced based on them.
Histogram stretching may be a very common and effective approach to general contrast enhancement. However, it may not be the ideal technique when the content is pseudo-binary, as illustrated in the figures.
The black block and background pixels form two clusters. When image contrast is high, the distance between the two cluster centers is larger, and vice versa. Therefore, the two clusters may be found and used to enhance the contrast. The algorithm is described as follows:
1. Initialization: Choose two initial cluster centers C1(0) and C2(0), representing the black block and background pixels, which can be random values between, for example, 0 and 255 for gray-scale images. In practice, convergence may be accelerated if C1(0) and C2(0) are selected using the minimum, maximum, and mean of the image pixel values.
2. Pixel Clustering: For each pixel I(i,j) in the text image at iteration n, calculate the distance to the nearest center: d(i,j)=min |I(i,j)−Ck(n)|, k=1, 2. Each pixel is then allocated to the cluster with the minimum distance; in this way, the pixels are partitioned into two clusters C1 and C2. The error at iteration n may be calculated as:

e(n)=(1/(M×N)) Σi Σj d(i,j)

where M×N is the size of the image. The iteration may stop when e(n) is smaller than a preset threshold.
3. Updating: Generate the new location of each center by averaging the pixel values in its cluster:

C1(n+1)=(1/NC1) Σ(i,j)∈C1 I(i,j), C2(n+1)=(1/NC2) Σ(i,j)∈C2 I(i,j)

where NC1 and NC2 are the numbers of pixels in C1 and C2, respectively. The iteration also stops when e(n) no longer decreases.
After the two cluster centers are determined, one center may be mapped to a small value (0, for example) and the other to a large value (255, for example), and the histogram may be stretched based on these two centers.
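The whole procedure, a one-dimensional two-center clustering followed by a stretch, can be sketched as below; the initialization, stopping rule, and output range are plausible choices for illustration rather than the invention's exact parameters.

```python
import numpy as np

def cluster_contrast_enhance(img, eps=0.5, max_iter=50):
    """Two-center clustering of gray values followed by histogram stretching.
    Centers are initialized at the extremes (the text above also allows
    random values or the image mean); iteration stops when centers settle."""
    g = img.astype(np.float32)
    c1, c2 = float(g.min()), float(g.max())
    for _ in range(max_iter):
        near_c1 = np.abs(g - c1) <= np.abs(g - c2)   # nearest-center assignment
        p1, p2 = c1, c2
        if near_c1.any():
            c1 = float(g[near_c1].mean())            # update each center by the
        if (~near_c1).any():                         # mean of its cluster
            c2 = float(g[~near_c1].mean())
        if abs(c1 - p1) + abs(c2 - p2) < eps:        # simplified stopping rule
            break
    # stretch: map the dark center to 0 and the light center to 255
    out = (g - min(c1, c2)) * (255.0 / max(abs(c2 - c1), 1e-6))
    return np.clip(out, 0, 255).astype(np.uint8)
```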
An example of a contrast enhancement result produced by the cluster-based approach is shown in the accompanying figure.
To make the algorithm practical on mobile devices, a novel binarization method is used which combines Niblack's approach with a block-based binarization approach. The approach consists of three steps: (i) for each pixel, determine whether binarization is required based on an N×N neighborhood, using the block-based approach; if binarization is unnecessary, all pixels inside the neighborhood are set to background and skipped; (ii) for each pixel requiring binarization, calculate the binarization threshold using Niblack's approach and binarize; (iii) post-process the binary image to remove 'ghost' objects.
A special implementation of the computation of the sample mean and standard deviation significantly improves the speed of binarization. Given a neighborhood size of 5×5, for a pixel at position (i, j) we compute the standard deviation of the pixel values in its neighborhood and then decide, based on a predefined threshold Tb, whether the whole block needs binarization. If no binarization is required for the block, we mark all pixels inside it as background and move to the next undecided pixel, which is (i, j+2) in the example. In this way, we remove all computation for pixels that do not need binarization. The implementation of this approach is described in detail as follows.
To save computation time, for each image we pre-compute the accumulated sum AS and the accumulated square sum ASQ, where p(i, j) is the pixel value at position (i, j):

AS(i, j)=Σk≤i Σl≤j p(k, l)

ASQ(i, j)=Σk≤i Σl≤j p(k, l)²

After AS and ASQ are obtained, the sample mean m and standard deviation s in a block with top-left corner (i, j) and bottom-right corner (k, l) are computed as:

m=[AS(k, l)−AS(i−1, l)−AS(k, j−1)+AS(i−1, j−1)]/K

s=√(ss−m·m)

where K is the number of pixels in this block and ss is computed as

ss=[ASQ(k, l)−ASQ(i−1, l)−ASQ(k, j−1)+ASQ(i−1, j−1)]/K
To save memory, which is critical for mobile devices, the above operations are conducted on an image strip of size N×W, where W is the image width and N is the block height. Each time, only the values for the middle-row pixels of the strip are computed. Once the calculation is done and the results are stored, the first row of the strip is discarded and a new row is computed from the previous rows and appended to the end of the strip. The process continues until the whole image is finished. This implementation saves both computation time and intermediate memory usage.
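The combined scheme might be sketched as follows. For clarity this version uses whole-image integral images rather than the strip optimization and tests every pixel instead of jumping over skipped blocks; it also assumes Niblack's usual threshold T=m+k·s (k negative for dark text) and a flatness test s&lt;Tb for the block shortcut, which are assumptions for the sketch.

```python
import numpy as np

def block_niblack_binarize(img, n=5, k=-0.2, t_b=10.0):
    """Block-based skip test plus Niblack thresholding, via integral images."""
    p = img.astype(np.float64)
    # integral images of values and squares (AS and ASQ above), zero-padded
    # so AS[i, j] is the sum over p[:i, :j]
    AS = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    ASQ = np.pad(p * p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = p.shape
    out = np.full((h, w), 255, dtype=np.uint8)   # background = white
    r = n // 2
    K = float(n * n)
    for i in range(r, h - r):                    # border pixels stay background
        for j in range(r, w - r):
            i0, j0, i1, j1 = i - r, j - r, i + r + 1, j + r + 1
            m = (AS[i1, j1] - AS[i0, j1] - AS[i1, j0] + AS[i0, j0]) / K
            ss = (ASQ[i1, j1] - ASQ[i0, j1] - ASQ[i1, j0] + ASQ[i0, j0]) / K
            s = np.sqrt(max(ss - m * m, 0.0))
            if s < t_b:                          # flat block: no binarization needed
                continue
            if p[i, j] <= m + k * s:             # Niblack threshold
                out[i, j] = 0                    # foreground = black
    return out
```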
To remove the 'ghost' objects generated by Niblack's binarization approach, the post-processing step used in Yanowitz and Bruckstein's method (see S. D. Yanowitz and A. M. Bruckstein, "A New Method for Image Segmentation", Computer Vision, Graphics and Image Processing, Vol. 46, No. 1, pp. 82-95, April 1989) is selected to improve the binarization result. In this step, the average gradient value at the edge of each foreground object is calculated, and objects having an average gradient below a threshold Tp are labeled as misclassified and removed.
After this post-processing step, most background noise introduced by Niblack's method is removed. However, the binarized text image is not smooth: especially at the edges of characters there are many small spurs, which reduce the readability of the text. This approach may also introduce broken strokes or holes in the binary image. Further post-processing steps are required to improve the text stroke quality.
To improve the text quality, a swell filter described in R. J. Schilling, "Fundamentals of Robotics Analysis and Control", Prentice-Hall, Englewood Cliffs, N.J., 1990, is selected to fill possible breaks, gaps, or holes and improve the text stroke quality. The filter scans the entire binary image with an N×N sliding window; whenever the central pixel of the window is a background pixel, the number of foreground pixels Psw1 inside the window is counted, and the central background pixel is changed to foreground if Psw1>ksw1, where ksw1=0.35N².
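A sketch of this filling pass, assuming foreground pixels are encoded as 1 and using a box filter to count the foreground pixels in each window:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def swell_fill(binary, n=3, k=0.35):
    """Change a background pixel to foreground when more than k*N^2 pixels
    in its N x N window are foreground, filling breaks, gaps, and holes."""
    fg = (binary > 0).astype(np.float64)
    counts = uniform_filter(fg, size=n) * (n * n)   # per-window foreground count
    out = fg.astype(np.uint8)
    out[(fg == 0) & (counts > k * n * n)] = 1
    return out
```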
Applying this approach to real images captured by mobile devices, we obtained promising results.
The general text resolution enhancement method does not make use of specific information about text shape.
To reduce magnification artifacts we need to make use of text shape, which provides information to maintain high fidelity even when the image is magnified by a large factor.
The present invention uses a text super-resolution enhancement approach based on text shape training. The original method was proposed in H. Y. Kim, "Binary Operator Design by k-Nearest Neighbor Learning with Applications to Image Resolution Increasing", International Journal of Imaging Systems and Technology, Vol. 11, pp. 331-339, 2000, but it is computationally very expensive. We optimize the algorithm so that it can run on mobile devices with limited resources and computation capability.
The basic idea of the method we propose to implement is as follows. Training uses a pair of binary images: a low-resolution image I1 and the corresponding double-resolution image I2. Each foreground pixel in the low-resolution image is represented by a pattern vector generated from the pixel values in an N×N neighborhood of that pixel.
Find all distinct patterns (there are 2^9=512 possible patterns for a 3×3 neighborhood) representing the foreground pixels in I1. For each pattern instance that appears, find the four corresponding pixel values in image I2. A voting vector for each foreground pixel pattern in I1 is then computed. Assume a single pattern appears M times in I1 and the corresponding magnification pixel values in I2 are P(j)=[P0(j) P1(j) P2(j) P3(j)], where j=1, . . . , M; then the voting vector C=[C0 C1 C2 C3] for this foreground pixel pattern in I1 is computed as:

Ci=Σj=1..M Pi(j), i=0, 1, 2, 3
For each pixel pattern in I1, search for the k nearest patterns measured using the Hamming distance, with corresponding voting vectors C(l), l=1, . . . , k. Based on all voting vectors of these patterns, the trained magnification output [P0 P1 P2 P3] for this pattern is defined as:

Pi=1 if Σl=1..k Ci(l)≥Ch, and Pi=0 otherwise
where Ch is half of the total number of pixels of the k patterns that participate in the voting.
The training results are put into a look-up table which has 2^9 rows and four columns. The four cells in each row store the four output pixel values in image I2, and the binary string of the row index represents the pixel pattern in I1. For example, if a foreground pixel in I1 has the pixel pattern [0 1 0 1 1 1 0 0 1] and the four pixels it corresponds to in I2 have values [0 1 1 1], then the 185th row (binary 010111001=185) of the look-up table has values [0 1 1 1].
The training phase is finished once the look-up table is created. When magnifying a given binary image Iu by a factor of 2, we only examine the foreground pixels, find the corresponding pattern, and convert it to four pixels in the magnified image. For example, if a foreground pixel at position (x,y) of Iu has pattern [0 1 0 1 1 1 0 0 1], then the pixels at positions (2x, 2y), (2x+1, 2y), (2x, 2y+1), and (2x+1, 2y+1) of the magnified image have values 0, 1, 1, and 1, respectively. Since the training can be performed offline, the magnification operation reduces to finding the pixel pattern and indexing the look-up table. With this modification, text super-resolution can be finished in a short time; experimental results obtained during the writing of this proposal show that this approach can be made to run very fast on mobile devices.
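The run-time half of the method might look like the sketch below, assuming the look-up table is a (2^9, 4) array indexed by the row-major 9-bit neighborhood pattern, most significant bit first; the bit order and the assignment of the four outputs to the 2×2 block must of course match whatever convention the training step used.

```python
import numpy as np

def magnify_2x(img, lut):
    """Trained 2x magnification of a binary image (1 = foreground): each
    foreground pixel's 3x3 pattern indexes lut (shape (512, 4)), whose row
    holds the four output pixels of the corresponding 2x2 block."""
    h, w = img.shape
    out = np.zeros((2 * h, 2 * w), dtype=np.uint8)
    padded = np.pad(img, 1)                      # zero border for edge pixels
    for y in range(h):
        for x in range(w):
            if img[y, x] == 0:
                continue                         # only foreground pixels are examined
            window = padded[y:y + 3, x:x + 3].ravel()
            idx = int("".join(str(int(v)) for v in window), 2)   # 9-bit pattern
            p0, p1, p2, p3 = lut[idx]
            out[2 * y, 2 * x] = p0
            out[2 * y, 2 * x + 1] = p1
            out[2 * y + 1, 2 * x] = p2
            out[2 * y + 1, 2 * x + 1] = p3
    return out
```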
Most super-resolution algorithms are extremely expensive and cannot be embedded in the phone. For the present invention, the memory required is the size of the look-up table addressed above. It increases exponentially with the neighborhood size and linearly with the square of the magnification factor f, i.e., f². Given a neighborhood size N×N and magnification factor f, the look-up table size is 2^(N×N)·f² bits if we use one bit to represent a pixel (which is true in the binary image case). For instance, if the neighborhood size is 3×3 and the magnification factor f=2, the look-up table holds only 2048 (2^9·4) bits, or 512 bytes of storage with half of the bits unused. If the neighborhood size is 4×4 and the magnification factor is 4, the look-up table occupies 1,048,576 (2^16·16) bits, or 131,072 bytes. In our initial experiments we found that even with N=3 the result is significantly improved, which leaves room to make the magnification factor as high as possible.
Most of the time is spent on off-line training. After training, magnification is simply a memory-access process over the look-up table.
The following applications are based on the enhancement techniques described above:
The device may also be used to read and present textual information, as illustrated in the figures.
Exemplary uses:
Text captured from camera phones may be degraded even after enhancement. For example, touching and/or broken characters may be common. Once the text region is segmented, a hidden Markov model (HMM) approach may be used to handle the touching or broken characters. In this approach a statistical language model may be created in terms of bi-gram co-occurrence probabilities of symbols and models for individual characters. This method may simultaneously segment and recognize characters based on a statistical model.
In the HMM approach, each character may be represented using a discrete HMM. An HMM is a generative model: at each discrete time the system is in a particular state, and in this state it emits one of the allowed symbols. The symbol selection process may be random and may depend on the probability of each symbol in that state. After a symbol is emitted, the system may jump to another state according to a state transition probability. The HMM parameters may be, for example: the symbol probability within each state, the bi-gram state transition probability, and the initial state probability. Model training may be performed as follows: each text line image may be broken into a left-to-right sequence of overlapping sub-images; each sub-image then may be converted into a discrete observation symbol by using a vector quantization scheme; the observation sequence and the corresponding transcription (ASCII groundtruth text) then may be used to estimate the model parameters.
The recognition process may split the text line image into a sequence of sub-images and convert the sub-image sequence into a sequence of discrete symbols. A dynamic programming algorithm may be used to segment the symbol sequence into a sequence of recognized characters.
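The dynamic programming step is, in essence, Viterbi decoding over the discrete HMM. A generic sketch follows; the state set, transition table, and emission table are assumed inputs, and the mapping of decoded states back to characters is omitted.

```python
import numpy as np

def viterbi(obs, log_init, log_trans, log_emit):
    """Most likely state sequence for a discrete HMM.
    obs: observation symbol indices; log_init: (S,) initial log-probs;
    log_trans: (S, S) transition log-probs; log_emit: (S, V) emission log-probs."""
    S, T = log_init.shape[0], len(obs)
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans   # cand[prev, cur]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    states = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):                  # trace the best path backwards
        states.append(int(back[t, states[-1]]))
    return states[::-1]
```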
To further improve OCR accuracy, knowledge in a specific domain may be used to refine the OCR result from the recognition engine. A database consisting of digitized samples of reading material for each task may be developed and used to characterize the distributions of print parameters (e.g., size, font, contrast, color, background pattern, etc.) for each task. The system parameters may be specifically selected based on these specific application domains.
The words that appear in the list of ingredients in a product may be from a very restricted vocabulary. In fact, once the generic category of a product is known, the words that may appear in the contents may be further restricted. Domain knowledge may be used to improve the recognition accuracy of the OCR subsystem. The knowledge may be represented as, for example, dictionaries and/or thesauri. Furthermore, the consumer may add words that they encounter in their daily living and create their own user dictionaries.
The system may allow users to spot keywords in large document repositories or in isolated documents in the field. At times, the consumer may be searching for the existence or absence of certain ingredients in a product. For example, an asthma patient might want to confirm that a bottle of wine does not contain sulfites. In such a scenario, words other than “sulfites” (and various orthographic renditions thereof) may not be important. A user-interface may be provided so the user can specify the word.
As described above, both audio and visual feedback of recognized text may be provided to users. For visual feedback, the enhanced text may be overlaid on the display of the camera phone.
Audio feedback may be provided by a Text-To-Speech (TTS) synthesizing technology that reads text out through speakers attached to the camera phone.
A camera equipped mobile phone having image enhancement capabilities may allow the capture and transmission of full page documents.
Image processing capabilities may dynamically enhance images captured by a camera equipped mobile phone thereby providing copy quality suitable for reproduction, storage or faxing. In addition, captured documents may be stored automatically and mirrored on a server so as to not overwhelm the limited memory available on a mobile device. After a document has been mirrored on a server, a compact signature identifying the document may remain on the mobile device to facilitate document retrieval. Complex document search capabilities may be available on the server side. For example, documents mirrored on the server may be enhanced with OCR metadata and converted to PDF format to enable complex search capabilities. Acquired images may be faxed, emailed and/or shared with other users from the mobile telephone and/or the server. The process for selecting which documents should be mirrored on a server and when documents should be mirrored on a server may be tailored according to a user's preferences.
According to this example, users may fax or email documents acquired by camera phones anywhere and anytime. The unique document enhancement capabilities may remove excess background and provide a readable document image.
The techniques and systems disclosed herein may allow a camera equipped mobile phone to be used as a mobile magnifying glass. Two modes may be provided. (1) Continuous video mode: the camera phone may be used just like an optical magnifying glass. For example, the user may move the camera phone around a document (or scene) he wants to read (or view), and the image is captured, enhanced, and magnified continuously. (2) Still image mode: the user may capture a still image, which may be enhanced at higher quality and then navigated on the device, as described above.
This description discloses processing techniques for enabling a resource constrained device (e.g., a mobile telephone equipped with a camera) to be used as a business card reader and contact management tool. For example, a smart-phone based business card reader enables a user to turn the user's camera-enabled mobile phone or PDA (Personal Digital Assistant) into a powerful contact management tool.
Smart phones are equipped with a robust business card reading capability. As a result, smart phones can be used to read business cards and manage contact information. This capability can be integrated with various devices through wireless connections. In one implementation, a user who receives a business card from a colleague at a conference may find it inconvenient to enter the information through the small keypad of a mobile phone. Instead, the user captures an image of the business card with the user's mobile phone; text reading software converts the physical card into tagged electronic contact information which can later be synchronized with the information in the user's smart phone or with the contacts in other devices and/or applications including, for example, Pocket PC, Outlook, PalmOS, Lotus Notes, and GoldMine, through Bluetooth or other wireless or wired connections.
Field Analysis Using Contextual Dictionaries
The OCR uses the technology presented above. After OCR, a contextual dictionary is used to refine the OCR result. Words that appear in business cards form a very restricted lexicon; some examples are "email", "com", "net", and "CEO". This domain knowledge can be used to improve the recognition accuracy and to conduct the field analysis.
Text extracted from a business card may include, for example, strings of digits, sequences of words, or a combination of the two. A digit string may indicate, for example, a telephone number, a fax number, a zip code, or a street address. A sequence of words may include one or more keywords. A keyword in a sequence of words may identify the particular field to which a portion of the extracted text should be associated. For example, the keyword "email" may indicate that a line of extracted text represents an email address. Similarly, the keyword "President" may indicate that a line of extracted text represents a person's title.
Extracted text may be searched for digit strings or keywords. Recognition of particular digit strings or keywords may be used to associate a portion (e.g., a line) of the extracted text with a particular field. In some instances, it may not be possible to identify one or more fields by digit string or keyword. In such cases, heuristics may be used to identify those fields. For example, a person's name is often found in the same block as the person's title, and a title field is typically easy to identify using a keyword search; therefore, the person's name may be identified after the person's title has been identified.
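For illustration, a toy version of such keyword and digit-string field analysis is sketched below; the keyword lists and regular expressions are invented stand-ins for the contextual dictionaries described above.

```python
import re

FIELD_KEYWORDS = {                       # hypothetical, deliberately tiny lexicons
    "email": ("email", "e-mail", "@"),
    "title": ("president", "ceo", "cto", "director", "manager"),
    "web": ("www.", "http://", ".com", ".net"),
}
PHONE = re.compile(r"\d{3}[\s.-]?\d{3}[\s.-]?\d{4}")
ZIP = re.compile(r"\b\d{5}(-\d{4})?\b")

def classify_line(line):
    """Assign one OCR output line from a business card to a field."""
    low = line.lower()
    for field, keys in FIELD_KEYWORDS.items():
        if any(k in low for k in keys):
            return field
    if PHONE.search(line):
        return "fax" if "fax" in low else "phone"
    if ZIP.search(line):
        return "address"                 # zip code suggests a street-address block
    return "unknown"
```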
Users may be allowed to add words that they encounter in business cards and create their own user dictionaries.
Patients may find it difficult to remember when to take which medication, and in what quantities. These problems may be addressed by a non-intrusive, compact, inexpensive, lightweight and portable solution, which may integrate multiple medication reminder services.
The medication reminder may include software and a camera enabled smart phone to provide enhanced medication reminding and verification capabilities.
Powerful image processing capabilities may be embedded into smart phones which may help improve medication adherence of patients especially if they have low vision and decreased memory. Image based barcode reading software uses cameras mounted in the smart phone to directly decode barcodes.
Some medication barcodes may include a lot/control/batch number and an expiration date to protect a patient from receiving a medication that is beyond its expiration date. 1D barcodes are symbols consisting of parallel bars and spaces and are widely used on consumer goods. In retail settings, barcodes may be used to link the product to price and other inventory-related information. Medication barcodes may be designed for tracking medication errors associated with drug products. 2D barcodes and other symbols also may be used to provide information relevant to a patient's medicine or retail products.
A smart-phone based medication reminder may include smart-phone based barcode and symbol reading technology which may enroll and verify the medication simply through scanning.
Enrolling Medications
Consider this example. A 73-year-old man takes several different medications. He uses his smart phone, preloaded with the medication reminder software, as a medication reminding device. Some of these medications are packaged in rectangular boxes, and others in plastic cylindrical bottles. To enroll all of the medications into the device, he simply scans the barcode printed on the label, whether it lies on a flat surface or on a cylinder. If he has difficulty aiming the camera phone at the barcode, he merely moves or rotates the boxes or bottles around the camera until the smart phone beeps to indicate that it has detected and recognized the barcode. He then sets the daily frequency by pressing a numeric key (for example, number 2 to indicate taking it twice a day). This completes the enrollment process. The lot number and/or expiration date also may be decoded and saved into the system if the barcode includes them.
Reminder Alarms
Depending on the patient's needs and preferences, when it is time to take medication, the medication reminder phone may ring or present some other sound or even a speech signal, vibrate, or both. If desired, this may be followed by a flashing screen, and then speech or text information providing additional detail, such as the number of pills to take.
Verification
After the reminder alarm (and perhaps the informational message), the patient may choose the medication container(s) needed. If he wants to verify that he has chosen the correct one, he may then scan the barcode, and the smart phone may compare the decoded National Drug Code (NDC) number with the one that was previously enrolled and that he is reminded to take.
Self-monitoring
Because adults of all ages often complain they can't recall whether or not they have taken their pills, a self-monitoring option may be included. For example, a graphic showing a day's pill schedule may be displayed. Once a pill is taken, the consumer may be able to indicate that on the graphic by pressing a key.
Functional Overview of Technology
The components of the medication reminder may be loosely partitioned into image acquisition, barcode detection, recognition, interface, alarm, Text-to-Speech (TTS), and the implementation of all of these on mobile devices. The system may be based on a dynamically reconfigurable component architecture so that it may be easily plugged into various mobile devices (cell phones, PDAs, etc.).
System Architecture
The component architecture manages a large number of resources on a small device. Physical storage for resources, memory for processing, and power consumption are all considerations. The system may operate in standalone mode providing an integrated capability. Dynamic management is also possible.
The software modules include the user-interface; medication enrollment; a removal and verification module; and a barcode detection, enhancement and recognition module.
Software reusability and component management may be supported. The component architecture may provide an easy way to develop and test new algorithms, and it may provide a basis for moving to new devices, where resources may be even more limited.
Interface for the Functionalities
The interface may include, for example, the following functionalities:
Drug information enrollment: The interface may allow users to enter a New Drug record. Users may type the information through the popup keyboard in the smart phone. However, many patients may not be able to do this. Therefore, a barcode reading capability may be provided which allows users to enroll the new drug through a simple scan.
Select Frequency and Time of Doses: The interface may allow users to set the frequency of dose they want to take. For example, they may select from 1 to 6 doses per day, or select hour-based dosing alarm times. After selecting the frequency, they may be able to adjust the alarm time by, for example, using the up and down arrows.
Setting supply reminders: It may be important to maintain an adequate supply of all medications at all times. Missing doses of certain types of drugs may be very serious, even life threatening. The interface may allow the users to input the total supply and count the actual number of pills remaining as doses are taken.
Deleting Drug Items: When patients want to delete an item, they may simply scan the barcode of that drug and select “delete” from the menu in the interface.
Verification: Verification may be accomplished simply through a barcode scanning.
Customization: The interface may be customizable in terms of functionality, so that users with different physical requirements can use the system effectively, allowing for different alarms, vibrations or visual displays.
Summary: The present invention provides robust algorithms for detection and rectification of barcodes on planes and generalized cylinders subject to perspective distortions, and its logging features allow adherence monitoring by the user, family, or medical personnel. The inherent connectivity of these devices (they are networked devices) allows for further remote setup and monitoring. A cross-platform software architecture allows the software-based solution to be easily embedded into smart phones with different operating systems. Further, the systems may incorporate combined visual/audio/vibration outputs for alarms.
AMA has developed a technology called Mobile IBARS, which converts users' camera phones into personal data scanners. AMA is commercializing Mobile IBARS technology in the healthcare market, with potential licensing deals with a commercial company providing nutrition/diet services for customers. Mobile IBARS simulates a general optical barcode scanner by scanning the image line by line and decoding the barcode from the generated 0-and-1 sequence. The steps can be briefly described as follows:
Image Capture: The barcode images are captured using an interface customized for a variety of applications.
Scanning: The software scans the image from a starting point and gets a waveform.
Thresholding: Binarization converts the waveform into a rectangular series of pulses.
Sequence Generation: After generating the rectangular waveform, each bar (or space) can be converted to a count of 1s or 0s by ni=wi/wb, where wi is the width of the i-th bar (or space) and wb is the module width. The module width can be estimated directly from the guard bars (a sketch of this conversion follows the Verification step below).
Decoding: After generating the binary sequence, decoding is straightforward. For example, in EAN-13 each character is encoded by seven modules and consists of two bars and two spaces. The check digit can be used to further improve the identification result.
Verification: A barcode often contains a check digit to verify that the barcode was decoded correctly. For example, the last digit of EAN-13 is a check digit which satisfies:

(c1·w1+c2·w2+ . . . +c13·w13) % 10=0
where % denotes the mod operation, c1,c2, . . . c12 is the digit sequence, c13 is the check digit, wi=1 if i % 2=1, and wi=3, if i % 2=0. If the verification passes, then the decoded character sequence is correct; otherwise, the program scans the next line until the decoding and verification pass.
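For illustration, the width-to-module conversion from the Sequence Generation step and the check-digit rule above can be written directly; the sample number below is a known valid EAN-13 used only as a test value.

```python
def widths_to_modules(widths, module_width):
    """Convert measured bar/space widths w_i into module counts
    n_i = w_i / w_b, rounded to the nearest whole module (at least 1)."""
    return [max(1, round(w / module_width)) for w in widths]

def ean13_check(digits):
    """True when 13 digits satisfy the EAN-13 check equation: the sum of
    c_i * w_i is divisible by 10, with w_i = 1 for odd i and 3 for even i."""
    assert len(digits) == 13
    total = sum(d * (1 if i % 2 == 1 else 3)
                for i, d in enumerate(digits, start=1))
    return total % 10 == 0

assert ean13_check([4, 0, 0, 6, 3, 8, 1, 3, 3, 3, 9, 3, 1])
```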
The foregoing description of the preferred embodiment of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiment was chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto, and their equivalents. The entirety of each of the aforementioned documents is incorporated by reference herein.
The present invention claims the benefit of the filing dates of the following U.S. Provisional Patent Applications: Ser. No. 60/806,081, entitled “Mobile Image Enhancement” and filed on Jun. 28, 2006 by David Doermann and Huiping Li; Ser. No. 60/746,752, entitled “Business Card Reader” and filed on May 8, 2006 by David Doermann and Huiping Li; Ser. No. 60/746,755, entitled “Medication Reminder” and filed on May 8, 2006 by David Doermann and Huiping Li; and Ser. No. 60/806,083, entitled “Symbol Acquisition and Recognition” and filed on Jun. 28, 2006 by David Doermann and Huiping Li. These prior applications are hereby incorporated by reference in their entirety.