SYSTEMS AND METHODS FOR FOOD ANALYSIS, PERSONALIZED RECOMMENDATIONS AND HEALTH MANAGEMENT

BACKGROUND

A plethora of databases and services are available to offer food and nutrition advice. Examples of such databases and services include healthcare providers, food or nutrition manufacturers, restaurants, online and offline food recipes, and scientific articles. Whether it be for recreational, cosmetic, medical, or other purposes, individuals inevitably have to rely on multiple sources of information to make food and nutrition-related decisions every day.

Numerous attempts have been made to generate a collection of data on nutrition information of commonly consumed foods, as well as their effects on human health. However, these databases usually struggle with inconsistencies, unreliability, and generally low quality because data are often collected or crowd-sourced from user inputs. In addition, because many of the attempts have targeted specific populations, geographies, or food categories within a defined period of time, the resulting databases are often fragmented in scope and time. Such fragmentation limits their applicability. Furthermore, these databases depend on different sources of data (e.g., mobile devices, glucose monitors, social media, etc.) that are often incompatible with each other. Without an alternative, individuals continue to be dependent on limited and incomplete databases and/or services to piece together decisions for food and nutrition.

Thus, there is a need for systems and methods that can continuously collect a vast amount of data (e.g., ingredients in a dish, nutrition information, glucose levels, blood pressures, temperature, etc.) from discrete sources, analyze and restructure the data into a common format, assess and predict correlations between an individual's consumed foods and biomarkers, and provide personalized nutrition recommendations based on the individual's health and metabolic state at any given time.

SUMMARY

According to an aspect of the disclosure, a method for mapping foods is provided. The method can comprise abstracting information from data relating to the foods to develop a food ontology.

In some embodiments, the foods can comprise beverages.

In some embodiments, the food-related data can be obtained from a plurality of different sources.

In some embodiments, the food-related data can be abstracted using one or more algorithms comprising at least one machine learning algorithm.

In some embodiments, one or more algorithms can be configured to classify the food-related data into one or more categories. One or more algorithms can comprise (1) a natural language processing (NLP) algorithm, (2) a computer vision algorithm, or (3) a statistical model. The computer vision algorithm can further comprise artificial intelligence (AI) deep learning or optical character recognition (OCR) capabilities.

In some embodiments, one or more categories of the food ontology can comprise (1) elementary foods, (2) packaged foods, (3) food recipes, or (4) food dishes. The food ontology can comprise inter-relations between different foods or their respective nutrients within two or more categories. The food ontology can comprise inter-relations between different foods or their respective nutrients within a same category.

In some embodiments, the food ontology can comprise a graphical representation depicting the information relating to different foods. The food ontology can further comprise displaying the graphical representation of the food ontology on an electronic display. The graphical representation of the food ontology can be two-dimensional. The graphical representation of the food ontology can be multi-dimensional comprising three or more dimensions.

In some embodiments, the information can relate to one or more of the following: (1) ingredients within one or more foods, (2) one or more categories to which foods belong, or (3) the inter-relations between different foods or types of foods.

In some embodiments, the information can relate to micronutrients, macronutrients, phytonutrients, molecular ingredients, chemicals, antioxidants, or additives within one or more foods. The additives can comprise preservatives, artificial coloring, flavors, or fillers within one or more foods.

In some embodiments, the food ontology can comprise one or more layers of abstraction for different foods. One or more layers of abstraction can comprise one or more classes or subclasses of foods. One or more layers of abstraction can comprise a plurality of metadata layers for each food.

In some embodiments, the plurality of metadata layers for each food can comprise a metadata layer for one or more of the following: (1) food name, (2) description of the food, (3) ratings for the food, (4) one or more images of the food, (5) food characteristics, (6) food ingredients, (7) food processing information, or (8) claims by food manufacturers. The claims by food manufacturers can include information as asserted by the food manufacturers on their product labels, websites, advertisements, etc.

In some embodiments, the plurality of metadata layers for each food may comprise a first metadata layer comprising nutritional information about the food, and a second metadata layer comprising non-nutritional information about the food.

In some embodiments, the food ontology can comprise displaying one or more of the plurality of metadata layers, by overlaying the metadata layer(s) onto the graphical representation of the food ontology on the electronic display.
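By way of non-limiting illustration, the metadata layers described above could be represented with a simple record per food. The following is a minimal sketch only: the class name FoodNode, its field names, and the helper visible_layers are hypothetical and are not taken from the disclosure. It shows a first (nutritional) layer and a second (non-nutritional) layer kept separately, so that a display can overlay only the layers that are requested.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FoodNode:
    """One food in a hypothetical ontology; all field names are illustrative."""
    name: str
    category: str                       # e.g. "elementary", "packaged", "recipe", "dish"
    parent: Optional[str] = None        # class or subclass the food belongs to
    nutritional: Dict[str, float] = field(default_factory=dict)       # first metadata layer
    non_nutritional: Dict[str, object] = field(default_factory=dict)  # second metadata layer
    related: List[str] = field(default_factory=list)                  # inter-relations to other foods


def visible_layers(node: FoodNode, names: List[str]) -> Dict[str, dict]:
    """Return only the requested metadata layers, e.g. for overlay on a display."""
    layers = {
        "nutritional": node.nutritional,
        "non_nutritional": node.non_nutritional,
    }
    # Unknown layer names are ignored rather than raising an error.
    return {n: layers[n] for n in names if n in layers}


apple = FoodNode(
    name="apple",
    category="elementary",
    parent="fruit",
    nutritional={"fiber_g": 2.4},
    non_nutritional={"texture": "crisp"},
)
overlay = visible_layers(apple, ["nutritional"])
```

Keeping each layer in its own mapping is one possible design choice; it lets layers be added or hidden without changing the record for the food itself.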

In some embodiments, the food-related data can be obtained from a plurality of different sources, and the plurality of different sources can comprise Internet sources and mobile devices.

In some embodiments, one or more categories of the food ontology can comprise (1) elementary foods, (2) packaged foods, (3) food recipes, or (4) food dishes. The food-related data can further comprise visual, audio or textual data of items from one or more categories. The visual data may further comprise images of the items obtained from one or more Internet sources. The visual data may further comprise images of the items captured using one or more imaging devices. The items may further comprise labels on the packaged foods containing nutrition information and a list of ingredients. The items may further comprise food menus available in physical form or in an electronic format.

In some embodiments, the food ontology can comprise a metadata layer for food characteristics. The food characteristics can comprise (1) dietary needs, (2) allergies, (3) category or categories, (4) type of cuisine, (5) flavors, (6) nutritional characteristics, (7) food textures, or (8) food geolocation and availability information.

In some embodiments, one or more algorithms of the food ontology can determine the food characteristics from the food-related data. The NLP algorithm can be used to automatically parse ingredients from the food recipes of the food-related data.

In some embodiments, one or more algorithms can determine an amount of each known ingredient in the packaged foods based on labels on the packaged foods. One or more algorithms can further estimate type(s) and amount(s) of unknown or unlisted ingredients in the packaged foods. The type(s) and amount(s) of unknown or unlisted ingredients in the packaged foods can be estimated after the amount of each known ingredient has been determined.

In some embodiments, one or more categories of the food-related data can comprise food dishes, and the food dishes can comprise restaurant dishes. The statistical model of one or more algorithms can determine a probability of one or more ingredients appearing in the restaurant dishes, and an expected distribution of an amount of each ingredient.
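By way of non-limiting illustration, one simple way to obtain such a probability and expected distribution is to count how often each ingredient appears across known recipes for the same dish. The sketch below assumes that approach; the function name ingredient_stats and the use of grams as the unit are hypothetical, and the disclosure does not state that this particular estimator is used.

```python
from statistics import mean, pstdev
from typing import Dict, List


def ingredient_stats(recipes: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Estimate, per ingredient, the probability of appearing in a dish and
    the mean / standard deviation of its amount (grams) when it does appear.

    `recipes` holds known variants of one dish, each mapping ingredient -> grams.
    """
    total = len(recipes)
    if total == 0:
        return {}
    amounts: Dict[str, List[float]] = {}
    for recipe in recipes:
        for ingredient, grams in recipe.items():
            amounts.setdefault(ingredient, []).append(grams)
    return {
        ingredient: {
            "probability": len(values) / total,  # share of variants that contain it
            "mean_g": mean(values),              # expected amount when present
            "std_g": pstdev(values),             # spread of the amount when present
        }
        for ingredient, values in amounts.items()
    }


variants = [
    {"pasta": 100, "tomato": 50},
    {"pasta": 120},
    {"pasta": 80, "tomato": 70},
    {"pasta": 100, "basil": 5},
]
stats = ingredient_stats(variants)
```

With the four hypothetical variants above, tomato appears in two of four recipes, so its estimated probability is 0.5.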

In some embodiments, the computer vision algorithm and the NLP algorithm can convert images of labels on the packaged foods into structured data. The computer vision algorithm and the NLP algorithm can convert images of restaurant menus into structured data. The images of the restaurant menus can comprise text or pictures of restaurant dishes offered on the restaurant menus.

In some embodiments, the method for mapping foods can further comprise mapping the structured data onto the food ontology. The structured data can be mapped onto the food ontology substantially in real-time. The structured data can be mapped onto the food ontology (1) as the images of the labels on the packaged foods are obtained, or (2) as the images of the restaurant menus are obtained.

In some embodiments, at least a portion of the food-related data can be obtained using one or more automated web-crawlers. One or more automated web-crawlers can be configured to search for the portion of the food-related data from Internet sources. The portion of the food-related data from the Internet sources can comprise (1) websites of restaurants with menus posted online, (2) websites of food manufacturers, and/or (3) food recipe websites.

In some embodiments, the portion of the food-related data from the Internet sources can be unstructured, fragmented or disorganized. One or more algorithms can be used to detect a structure of each of the websites. The structure can be detected by detecting an Xpath corresponding to each food name, description, price, and/or ingredients.
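By way of non-limiting illustration, the structure detection described above could proceed from one known food name on a page: the element containing that name is located, an XPath pattern is derived from its tag and class attribute, and the pattern is then applied to retrieve the remaining food names. The sketch below assumes well-formed markup and uses the limited XPath support of the Python standard library; the function name detect_xpath and the sample page are hypothetical, and real websites would typically require a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, well-formed sample of an online menu page.
PAGE = """<html><body>
<div class="item"><span class="name">Margherita Pizza</span><span class="price">12.00</span></div>
<div class="item"><span class="name">Caesar Salad</span><span class="price">9.50</span></div>
</body></html>"""


def detect_xpath(root: ET.Element, known_text: str) -> Optional[str]:
    """Find the element whose text equals `known_text` and derive an XPath
    pattern (tag plus class attribute) that matches similar elements."""
    for elem in root.iter():
        if (elem.text or "").strip() == known_text:
            css_class = elem.get("class")
            if css_class:
                return f".//{elem.tag}[@class='{css_class}']"
            return f".//{elem.tag}"
    return None  # the known text does not occur on the page


root = ET.fromstring(PAGE)
name_xpath = detect_xpath(root, "Margherita Pizza")
food_names = [e.text for e in root.findall(name_xpath)] if name_xpath else []
```

The same procedure could be repeated with a known description, price, or ingredient to derive one pattern per field.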

In some embodiments, one or more automated web-crawlers can be configured to search the Internet sources in a continuous manner and update the food ontology substantially in real-time. One or more automated web-crawlers can be configured to substantially enhance retrieval of the food-related data from the Internet sources. An amount of the food-related data obtained from the Internet sources can be increased by two or more orders of magnitude.

In some embodiments, the method for mapping foods can further comprise utilizing the food ontology for one or more of the following purposes: (1) estimate nutritional values for recipes and/or restaurant dishes; (2) provide food and health recommendations to a user, and gain an understanding of the user's taste profile; (3) construct food logs; (4) generate missing elementary foods from existing packaged foods; (5) generate more accurate labels of food characteristics; (6) analyze food costs; (7) model effects of cooking on nutritional values and estimate degree of food processing; (8) improve image classification or computer vision classification of foods; (9) improve analysis of voice-based food logs; and (10) track food consumption with the aid of a plurality of devices comprising wearable devices and/or digestible devices.

In some embodiments, the method for mapping foods can further comprise utilizing the food ontology to build one or more models that predict one or more users' eating habits based on (1) their historical food consumption data, (2) relations between different foods derived from the food ontology, and (3) the context (location and time of day). One or more models can be configured to (i) detect food consumption patterns in broad populations, (ii) detect food consumption patterns in individuals, and (iii) combine the detected food consumption patterns in (i) and (ii) by determining the type of population to which a specific user/individual belongs. A user can indicate a new food consumption, and one or more models can be configured to make predictions about the most likely foods that the user is consuming or is about to consume, by fine-tuning the food selection(s) and readjusting the predictions for the most likely next food items. One or more models can be configured to autocomplete information about an entire meal that the user is consuming, based on one or more food items indicated by the user. The auto-completion capability can allow user clicks or inputs to be reduced by over 50%.
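By way of non-limiting illustration, meal auto-completion of this kind could be approximated by counting which foods were logged together in past meals and ranking the foods that most often accompany the items already selected. The sketch below assumes that simple co-occurrence approach; the function names build_cooccurrence and suggest are hypothetical, and the sketch deliberately omits the ontology relations and the context (location and time of day) that the models described above can also use.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List


def build_cooccurrence(meals: List[List[str]]) -> Dict[str, Counter]:
    """Count, for every food, how often each other food shared a meal with it."""
    co: Dict[str, Counter] = {}
    for meal in meals:
        # set() avoids double counting a food logged twice in one meal
        for a, b in permutations(set(meal), 2):
            co.setdefault(a, Counter())[b] += 1
    return co


def suggest(co: Dict[str, Counter], selected: List[str], k: int = 3) -> List[str]:
    """Rank foods not yet selected by how often they accompanied the selected ones."""
    scores: Counter = Counter()
    for item in selected:
        for other, count in co.get(item, Counter()).items():
            if other not in selected:
                scores[other] += count
    return [food for food, _ in scores.most_common(k)]


history = [
    ["coffee", "toast", "eggs"],
    ["coffee", "toast"],
    ["coffee", "croissant"],
    ["rice", "chicken"],
]
co = build_cooccurrence(history)
first_guess = suggest(co, ["coffee"], k=1)
refined_guess = suggest(co, ["coffee", "toast"], k=1)
```

Each additional item the user confirms narrows the ranking, which mirrors the readjustment of predictions described above.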

In some embodiments, a portion of the data relating to the foods can be obtained using one or more nutrition trackers. Each of one or more nutrition trackers can be in digital communication with an application programming interface (API), which API is in digital communication with (1) a database for storing data and (2) a software and/or application comprising a graphical user interface (GUI) for receiving the data from and/or sending the data to a user. The API of each of the one or more nutrition trackers can (1) allow a user to record the user's food intake and (2) provide nutritional and/or caloric information of the user's food intake, wherein the API feeds all data into the database.

In some embodiments, the portion of the data obtained using one or more nutrition trackers can be unstructured, fragmented or disorganized. One or more algorithms can be configured to (i) convert the data into structured data and (ii) organize the structured data into multiple layers of information in the food ontology. The structured data can be standardized into a common format. The food ontology can be used as a meta object that standardizes, organizes, and provides additional layers of information on top of existing food and nutrition databases. A food item containing an abundance of information from databases of one or more nutrition trackers can be mapped into the food ontology. The food ontology can organize the information by building one or more layers.

According to another aspect of the disclosure, a system for mapping foods can comprise a receiving module configured to obtain food-related data from a plurality of different sources. The system for mapping foods can further comprise a food mapping module configured to (1) abstract the food-related data, and (2) use the abstracted data to develop, update, and/or improve a food ontology comprising inter-relations between different foods.

A further aspect of the disclosure is directed to a tangible computer readable medium. The tangible computer readable medium can store instructions that, when executed by one or more processors, cause one or more processors to perform a computer-implemented method for mapping foods. The tangible computer readable medium can be configured to obtain food-related data from a plurality of different sources. The tangible computer readable medium can be further configured to abstract the food-related data and use the abstracted data to generate, update, and/or improve the food ontology comprising inter-relations between different foods.

According to an aspect of the disclosure, a method for collecting and aggregating food, health or nutritional data is provided. The method can comprise collecting and aggregating a plurality of data sets from a plurality of APIs. The plurality of data sets can be provided in two or more different formats comprising a plurality of physiological inputs associated with a user. The method can further comprise converting the plurality of data sets to a standardized format that is individualized for the user. The plurality of data sets can be converted to the standardized format prior to input of the standardized data points into a personalized food and health platform for further analysis.

In some embodiments, the plurality of physiological inputs associated with the user can be influenced by the user's food, drinks or nutritional intake. One or more of the plurality of physiological inputs can affect a metabolism of the user.

In some embodiments, the plurality of APIs can be provided on one or more devices and/or health data services. One or more devices can comprise a mobile device, wearable device, and/or medical device. The mobile device may comprise a smartphone and the wearable device may comprise a smartwatch. The medical device may comprise one or more of the following: glucose monitors, heart rate monitors, blood pressure monitors, sweat sensors, or galvanic skin response (GSR) sensors.

In some embodiments, the plurality of physiological inputs can relate to sleep patterns, exercise, blood test(s), genetics, stress, insulin level, medication(s), menstrual cycle, and/or mood of the user.

In some embodiments, one or more of the data sets of the plurality of data sets can comprise at least one image obtained from one or more devices and/or health data services. At least one image can be captured using the mobile device and/or wearable device. The captured image can be analyzed using a convolutional network to determine whether the image is associated with food. The convolutional network can be configured to determine an association between a plurality of images with a plurality of foods.

In some embodiments, the plurality of images can be stored on a memory in the mobile device and/or wearable device, retrieved automatically, and analyzed without using any mobile application(s). The plurality of food images can be automatically aggregated and provided to the personalized food and health platform with timestamps and geolocations for each of the plurality of images, thereby enabling temporal and spatial tracking of the user's food intake.

In some embodiments, the plurality of data sets can comprise textual and/or audio descriptions of foods that are input into the one or more devices by the user. The textual and/or audio descriptions can be analyzed using the NLP algorithm to determine types of foods and amounts of foods consumed, thereby facilitating tracking of the user's food intake. The NLP algorithm can comprise speech recognition capabilities, or can be included in or with speech recognition software.

In some embodiments, the plurality of data sets can be converted to the standardized format using a canonical model.
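By way of non-limiting illustration, a canonical model can be thought of as one record type to which every source format is converted. The sketch below assumes two hypothetical glucose APIs, one reporting mg/dL with epoch timestamps and one reporting mmol/L with ISO 8601 timestamps; the payload shapes, the names Reading, from_vendor_a, and from_vendor_b, and the choice of mg/dL as the canonical unit are all assumptions made for illustration. The conversion factor is approximate.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

MGDL_PER_MMOLL = 18.016  # approximate conversion factor for glucose


@dataclass(frozen=True)
class Reading:
    """Hypothetical canonical record, tied to one user."""
    user_id: str
    kind: str        # e.g. "glucose"
    value: float
    unit: str        # canonical unit for this kind of reading
    time: datetime   # timezone-aware timestamp


def from_vendor_a(user_id: str, payload: dict) -> Reading:
    # Assumed payload shape: {"glucose_mgdl": 110, "ts": 1700000000}
    return Reading(
        user_id, "glucose", float(payload["glucose_mgdl"]), "mg/dL",
        datetime.fromtimestamp(payload["ts"], tz=timezone.utc),
    )


def from_vendor_b(user_id: str, payload: dict) -> Reading:
    # Assumed payload shape: {"mmol_l": 6.1, "time": "2023-11-14T22:13:20+00:00"}
    return Reading(
        user_id, "glucose", round(payload["mmol_l"] * MGDL_PER_MMOLL, 1), "mg/dL",
        datetime.fromisoformat(payload["time"]),
    )


a = from_vendor_a("user-1", {"glucose_mgdl": 110, "ts": 1700000000})
b = from_vendor_b("user-1", {"mmol_l": 6.1, "time": "2023-11-14T22:13:20+00:00"})
readings = sorted([b, a], key=lambda r: r.time)  # comparable once standardized
```

Once both sources are expressed as the same record type, downstream analysis does not need to know which device or service produced a given reading.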

According to another aspect of the disclosure, a system for collecting and aggregating food, health and/or nutritional data is provided. The system for collecting and aggregating food, health and/or nutritional data can comprise a device hub comprising one or more processors that are configured to execute a set of software instructions. The set of software instructions can be programmed to collect and aggregate a plurality of data sets from a plurality of APIs, wherein the plurality of data sets are provided in two or more different formats and comprise a plurality of physiological inputs associated with a user. The set of software instructions can be further programmed to convert the plurality of data sets to a standardized format that is individualized for the user.

A further aspect of the disclosure is directed to a tangible computer readable medium. The tangible computer readable medium can store instructions that, when executed by one or more processors, cause one or more processors to perform a computer-implemented method for collecting and aggregating food, health and/or nutritional data. The tangible computer readable medium can be configured to collect and aggregate a plurality of data sets from a plurality of APIs, wherein the plurality of data sets are provided in two or more different formats and comprise a plurality of physiological inputs associated with a user. The tangible computer readable medium can be further configured to convert the plurality of data sets to a standardized format that is individualized for the user.

According to an aspect of the disclosure, a method for determining effects of food consumption on a user's body is provided. The method can comprise applying a predictive model to (1) data indicative of foods consumed by the user, (2) data indicative of physiological inputs associated with the user, and (3) information about the foods consumed by the user from a food ontology, to thereby generate a plurality of personalized food and health metrics for the user.

According to another aspect of the disclosure, a method for determining effects of food consumption on a user's body is provided. The method can comprise applying a predictive model to (A) data indicative of foods consumed by the user and data indicative of physiological inputs associated with the user that are obtained from a plurality of discrete APIs, or (B) information about the foods consumed by the user from a food ontology. Applying the predictive model can generate a plurality of personalized food and health metrics for the user.

In some embodiments, the method for determining effects of food consumption on the user's body can further comprise providing the plurality of personalized food and health metrics to the user. The plurality of personalized food and health metrics can be provided to the user by displaying the metrics as a set of graphical visual objects on an electronic display of a user device. The user device can comprise a mobile device, wearable device, and/or medical device.

In some embodiments, the plurality of personalized food and health metrics can be specific to the user, and relate to the user's health or well-being. The plurality of personalized food and health metrics can comprise a predicted impact of one or more of the foods on the user's health or well-being. The plurality of personalized food and health metrics can comprise a health ranking of the foods consumed by the user.

In some embodiments, the plurality of personalized food and health metrics can comprise one or more recommended food(s) intake for the user. One or more recommended food(s) intake can be provided in the form of meal plans and/or recipe suggestions for the user. One or more recommended food(s) intake can comprise a personalized list of food items selected from one or more restaurant menus for the user.

In some embodiments, the plurality of personalized food and health metrics can comprise one or more recommended actions to improve the user's health or well-being. One or more recommended actions can include a recommendation to reduce or increase consumption of one or more selected foods. One or more recommended actions can include a recommendation to the user to consider starting consumption of one or more selected foods.

In some embodiments, the physiological inputs associated with the user can (1) be influenced by the user's food or nutritional intake and/or (2) affect metabolism of the user.

In some embodiments, the data indicative of foods consumed by the user can comprise time series data, and the predictive model can be configured to plot the time series data. The time series data can comprise measurements of changes to one or more biomarkers in the user's body over a time period. One or more biomarkers can be affected by sleep, exercise, blood test(s), genetics, stress, medication(s), menstrual cycle, and/or mood of the user. One or more biomarkers can comprise a glucose level, blood pressure, antioxidant level, cortisol level, cholesterol values, and/or body temperature of the user. The biomarkers can include blood biomarkers such as glucose, cortisol, or triglycerides.

In some embodiments, the predictive model can be configured to determine the effects of different foods on the individual's body, by analyzing the changes to one or more biomarkers. The predictive model can be configured to determine recurring patterns between food intake and the one or more biomarkers. The predictive model can be configured to determine correlation(s) between the foods and their effects on one or more biomarkers.

In some embodiments, the predictive model can be configured to determine a correlation between the foods and their effects on the glucose level of the user. The correlation between the foods and their effects on the glucose level can be used to predict the user's glucose responses and insulin responses.
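By way of non-limiting illustration, one simple measure of such a correlation is the Pearson correlation coefficient between a property of each logged meal and the glucose rise that followed it. The sketch below assumes that measure; the function names glucose_rise and pearson, the use of carbohydrate grams as the meal property, and the sample values are hypothetical, and the predictive model described above is not limited to this statistic.

```python
from math import sqrt
from typing import List, Sequence


def glucose_rise(baseline: float, post_meal: Sequence[float]) -> float:
    """Peak glucose after a meal minus the pre-meal baseline (same units)."""
    return max(post_meal) - baseline


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical log: carbohydrate grams per meal and glucose traces (mg/dL).
carbs_g = [10.0, 30.0, 50.0, 70.0]
rises = [
    glucose_rise(90, [95, 100, 96]),
    glucose_rise(92, [110, 125, 115]),
    glucose_rise(88, [120, 140, 130]),
    glucose_rise(91, [140, 165, 150]),
]
r = pearson(carbs_g, rises)
```

A coefficient near 1 for a given user would suggest that larger carbohydrate loads tend to be followed by larger glucose rises for that user; correlation alone does not establish cause.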

In some embodiments, the predictive model can be configured to determine a correlation between the foods and their effects on the antioxidant level of the user.

In some embodiments, the predictive model can be configured to determine a correlation between the foods and their effects on the blood pressure of the user. The correlation between the foods and their effects on the blood pressure of the user can be used to provide insights to hypertension or other cardiovascular conditions.

In some embodiments, the predictive model can be configured to determine correlation(s) between the foods and their effects on one or more physiological conditions of the user. The predictive model can be configured to determine a correlation between the foods and their effects on the user's digestion or digestive system. The predictive model may be configured to determine a correlation between the foods and their effects on migraines experienced by the user. The predictive model may be configured to determine a correlation between the foods and their effects on the user's sleep quality, pattern, or cycle.

In some embodiments, the predictive model can comprise machine learning models including supervised learning models, semi-supervised learning models, and/or unsupervised learning models.

According to another aspect of the disclosure, a system for determining effects of food consumption on a user's body is provided. The system for determining effects of food consumption on a user's body can comprise one or more processors that are configured to execute a set of software instructions. The set of software instructions can be programmed for applying a predictive model to (1) data indicative of foods consumed by the user, (2) data indicative of physiological inputs associated with the user, and (3) information about the foods consumed by the user from a food ontology, to thereby generate a plurality of personalized food and health metrics for the user.

A further aspect of the disclosure is directed to a tangible computer readable medium storing instructions that, when executed by one or more processors, cause one or more processors to perform a computer-implemented method for determining effects of food consumption on a user's body. The computer-implemented method can comprise applying a predictive model to (1) data indicative of foods consumed by the user, (2) data indicative of physiological inputs associated with the user, and (3) information about the foods consumed by the user from a food ontology, to thereby generate a plurality of personalized food and health metrics for the user.

According to an aspect of the disclosure, a method for generating a food baseline profile of a user is provided. The method for generating a food baseline profile of a user can comprise monitoring effects of different foods on the user's body as the user consumes one or more pre-packaged meals containing known amounts of the foods over a time period. The method can further comprise generating the food baseline profile of the user based on the monitored effects.

In some embodiments, one or more pre-packaged meals can be designed for calibration of the food baseline profile.

In some embodiments, the effects of the different foods on the user's body can be monitored using one or more devices. One or more devices may comprise a glucose monitor, a blood test device, or a genetic test monitor. One or more devices may comprise a wearable device that is worn on the user's body.

In some embodiments, the time period can range from several days to more than one week.

In some embodiments, the food baseline profile of the user can be indicative of the user's body reactions to a known set of foods.

In some embodiments, an accuracy of the food baseline profile can be enhanced by incorporating effects of regular foods consumed by the user. The regular foods can be separate from the one or more pre-packaged meals.

In some embodiments, the method for generating a food baseline profile of the user can further comprise providing a set of instructions for instructing the user on when to consume each of one or more pre-packaged meals.

In some embodiments, one or more devices can be configured to monitor the effects of each of the different foods on the user's body substantially in real-time as the foods are being digested.

According to another aspect of the disclosure, a kit for generating a food baseline profile of a user is provided. The kit can comprise one or more pre-packaged meals containing known amounts of different foods. The kit can further comprise a set of instructions for instructing the user (1) on when to consume each of the one or more pre-packaged meals, and (2) on using one or more devices to monitor effects of the different foods on the user's body, to generate the food baseline profile of the user.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the present disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the present disclosure are utilized, and the accompanying drawings of which:

FIG. 1 illustrates an ecosystem for food analysis and personalized health management, in accordance with some embodiments;

FIG. 2 illustrates examples of sources for food-related data in accordance with some embodiments;

FIG. 3 is a flow chart of a method of food analysis, in accordance with some embodiments;

FIG. 4 illustrates an exemplary table of a training set for a food analysis algorithm, in accordance with some embodiments;

FIG. 5 illustrates an exemplary table of results from testing a food analysis algorithm, in accordance with some embodiments;

FIG. 6 illustrates an exemplary vocabulary list for a food analysis algorithm, in accordance with some embodiments;

FIG. 7 illustrates an exemplary table of a training set for a different food analysis algorithm, in accordance with some embodiments;

FIG. 8 illustrates an exemplary pipeline of a food labeler in accordance with some embodiments;

FIG. 9 illustrates a second exemplary pipeline of a food labeler in accordance with some embodiments;

FIG. 10 illustrates a third exemplary pipeline of a food labeler in accordance with some embodiments;

FIG. 11 illustrates an exemplary table of problem solvers and their statistical analyses, in accordance with some embodiments;

FIGS. 12A-12C illustrate abstracting and classifying information from a consumer food package, in accordance with some embodiments;

FIG. 13 illustrates an exemplary table of abstracted and classified consumer food packages, in accordance with some embodiments;

FIG. 14 illustrates an exemplary restaurant menu for food analysis in accordance with some embodiments;

FIG. 15 illustrates an exemplary table of information abstracted from restaurant menus in accordance with some embodiments;

FIG. 16 illustrates an exemplary two-dimensional graphical representation of food ontology in accordance with some embodiments;

FIGS. 17A-17D illustrate exemplary windows of a GUI-based software interface for food image logging, in accordance with some embodiments;

FIGS. 18A-18C illustrate exemplary windows of a GUI-based software interface for voice recognition analysis, in accordance with some embodiments;

FIGS. 19A and 19B illustrate side-by-side comparisons of similar looking foods that have substantially different caloric content, in accordance with some embodiments;

FIG. 20 illustrates features of an insight and recommendation engine in accordance with some embodiments;

FIG. 21 is a graph of blood glucose levels of two individuals;

FIG. 22 illustrates potential entity partners utilizing embodiments of the present disclosure;

FIG. 23 is a graph of blood glucose level of an individual as a function of time in accordance with some embodiments;

FIGS. 24A and 24B illustrate classification of foods according to their effects on a biomarker, in accordance with some embodiments;

FIGS. 25A-25C illustrate exemplary windows of a GUI-based software interface for personal recommendations on menu items, in accordance with some embodiments;

FIG. 26 illustrates an exemplary window of a GUI-based software interface for blood glucose logging, in accordance with some embodiments;

FIG. 27 illustrates an exemplary window of a GUI-based software interface for a recommendation based on an automatic blood glucose logging, in accordance with some embodiments;

FIGS. 28A and 28B illustrate a flow chart of a method of modeling glucose and insulin interactions in the body, in accordance with some embodiments;

FIG. 29 is a graph plotting measured and estimated blood glucose levels, in accordance with some embodiments;

FIG. 30 is a flow chart of propagation of exogenous insulin into the body, in accordance with some embodiments;

FIG. 31 illustrates an exemplary fitting of glucose absorption and insulin assimilation model, in accordance with some embodiments;

FIGS. 32A-32B illustrate exemplary windows of a GUI-based software interface showing prediction of eating patterns, in accordance with some embodiments;

FIGS. 33A-33C illustrate exemplary windows of a GUI-based software interface showing multiple features, in accordance with some embodiments;

FIGS. 34A-34C illustrate exemplary windows of a GUI-based software interface showing a comprehensive report by the insights and recommendation engine, in accordance with some embodiments;

FIG. 35 shows an exemplary calibration kit in accordance with some embodiments;

FIG. 36 illustrates an exemplary window of a web portal for healthcare providers in accordance with some embodiments;

FIG. 37 (parts A through F) illustrates exemplary windows of a mobile application showing initial set up of the mobile application, in accordance with some embodiments;

FIG. 38 (parts A through C) illustrates exemplary windows of a mobile application showing components of baseline data collection, in accordance with some embodiments;

FIG. 39 (parts A through D) illustrates exemplary windows of a mobile application showing a food image logging interface, in accordance with some embodiments;

FIGS. 40A and 40B illustrate exemplary windows of a GUI-based software interface showing an analysis report of a user's data, in accordance with some embodiments;

FIG. 41 illustrates an exemplary network layout in accordance with some embodiments;

FIG. 42 shows an example of standard deviation image areas in accordance with some embodiments;

FIG. 43 shows an example of rectangle detection image areas in accordance with some embodiments;

FIG. 44 shows an example of a stalactite cave OCR text type separation in accordance with some embodiments;

FIG. 45 shows an exemplary image of a packaged food label from which nutritional information is to be extracted;

FIG. 46 illustrates a nutrients NLP score histogram based on nutritional information extraction on over 40000 food products, in accordance with some embodiments;

FIG. 47 illustrates an ingredients NLP score histogram based on nutritional information extraction on over 40000 food products, in accordance with some embodiments;

FIG. 48 illustrates an allergens NLP score histogram based on nutritional information extraction on over 40000 food products, in accordance with some embodiments;

FIG. 49 shows a graph of image logo training results in accordance with some embodiments;

FIG. 50 shows a graph of text logo recognition results in accordance with some embodiments;

FIGS. 51A and 51B illustrate the results per logo for a plurality of different logos in accordance with some embodiments;

FIG. 52 illustrates a model for classifying whether a food-item is free from soy, in accordance with some embodiments;

FIG. 53 illustrates an example of a learning curve in accordance with some embodiments;

FIG. 54 shows a graphical user interface (GUI) for managing a training set for food classification in accordance with some embodiments; and

FIG. 55 shows a histogram of food classification success by confidence in accordance with some embodiments.

DETAILED DESCRIPTION

Reference will now be made in detail to some exemplary embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and disclosure to refer to the same or like parts.

Introduction

Not only can two people respond differently to an identical food, but a single individual can respond differently to the identical food at different times. However, the current state of food and nutrition services continues to rely on previously generated, static databases to provide generic, non-customized suggestions on food and health management.

Currently available databases for food and nutrition are inconsistent due to their dependency on user input, limited in their scope and time scale, and often incompatible with each other. Thus, there is presently a lack of systems and methods that can curate the fragmented food-related information into a single format, assess the relationship between foods and biomarkers for each individual, and provide personalized nutrition recommendations that continuously evolve with the individual's lifestyle.

The disclosure herein can provide the solutions to the above by (1) creating a food ontology that is continuously being updated from various sources (e.g., from the internet, pre-existing databases, user input, etc.) to organize and analyze any obtainable information of all food types (e.g., elementary foods, packaged foods, recipes, restaurant dishes, etc.), (2) generating a personalized data network among a multitude of data collection devices and services (e.g., mobile devices, glucose sensors, healthcare provider databases, etc.) to integrate any obtainable information of biomarkers that can be affected by or can affect metabolism (e.g., sleep, exercise, blood tests, etc.), and (3) connecting the food ontology and the personalized data network to draw insights on how foods can affect each individual, and generate personalized food, health, and wellness recommendations for each individual.

Next, various embodiments of the disclosure will be described with reference to the drawings.

Platform

Embodiments of the present disclosure can be implemented as an ecosystem of devices, database(s), and a platform to provide users with personalized nutrition insights.

FIG. 1 illustrates an ecosystem 100 in accordance with some embodiments. In one aspect, the ecosystem 100 can include a platform 200. The platform 200 can include three components: a food analysis system 210, a device/data hub 220, and an insights and recommendation engine 230. The three components in the platform 200 can be standalone or interconnected to one another. The ecosystem 100 can also include devices 110. The devices 110 can include a wearable device 112 (e.g., a smart watch, fitness tracker, etc.), a mobile device 114 (e.g., a cell phone, a smart phone, a voice recorder, etc.), a medical device 116 (e.g., a glucose monitor, insulin pump, heart rate monitor, skin temperature sensor, etc.), or more. The devices 110 can be in communication with each other. The platform 200 can be in communication with the devices 110. The platform 200 can be in communication with the Internet 120 and database(s) 130 (e.g., other food, nutrition, or healthcare providers). The platform 200 can also be in communication with an additional database(s) 240 to store any data or information that is collected and generated by the platform 200. The additional database(s) 240 may be a collection of secure cloud databases.

The food analysis system 210 can create the food ontology. The food analysis system 210 can be connected to various sources of data including the devices 110 (e.g., the wearable device 112, the mobile device 114, etc.), the Internet 120, and the existing databases 130. The food analysis system 210 can serve as a content management system to continuously receive, analyze, and organize nutrition information of all food types (e.g., elementary foods, packaged foods, recipes, restaurant dishes, etc.) into the food ontology.

The device/data hub 220 can generate a user's personalized data network between the devices 110. The device/data hub 220 can automatically aggregate biomarker and health data of the user (e.g., sleep, exercise, blood tests, genetic tests, etc.) from multiple application programming interfaces (APIs) and healthcare provider databases.

The insights and recommendation engine 230 can be in communication with the food analysis system 210 and the device/data hub 220. As such, the engine 230 can create and analyze any correlation between information from the food ontology and information from the personalized data network. The engine 230 constitutes the “brains” of the platform 200, and functions as a food global positioning system (food GPS) for the user. The engine 230 can account for the user's own day-to-day nutritional needs based on how the user's biomarkers react to different foods at different times. Thus, the engine 230 can generate personalized food, health, and wellness recommendations for the user. The engine 230 can be in communication with the devices 110 directly or through communication with the device/data hub 220. In an example, the engine 230 may use the devices 110 to relay the recommendations to the user in a visible format.

The platform 200 can be implemented using one or more graphical user interfaces (GUIs; not shown in FIG. 1) to enable a user to select and employ features of the three components: the food analysis system 210, the device/data hub 220, and the insights and recommendation engine 230. Generally, a GUI can be a type of interface that allows users to interact with electronic devices through graphical icons and visual indicators such as secondary notation, as opposed to text-based interfaces, typed command labels or text navigation. The GUIs can be rendered on a display screen on a user device. Actions in the GUIs can be performed through direct manipulation of the graphical elements. In addition to computers, the GUIs can be rendered on hand-held devices, such as smartphones, portable media players, gaming devices, and office and industry equipment. The GUIs of the platform 200 can be provided in a software, software application, web browser, etc. The GUIs can be provided through a mobile application. One or more GUIs of the present disclosure can be referred to as the platform GUIs. The platform GUIs can be in communication with GUIs of other devices, such as the wearable device 112, the mobile device 114, the medical device 116, etc. End users of the platform can include a baby, a teenager, a college student, adults, healthy individuals, patients, participants of wellness programs, insurees of various providers, etc.

Food Analysis System

The food analysis system 210 can map foods by abstracting information from data relating to the foods to develop the food ontology. The food analysis system 210 can continuously receive, analyze, and organize nutrition information from data relating to all food types into the food ontology. The food ontology can otherwise be referred to as a web of foods. The food-related data can be obtained from a plurality of sources. The plurality of sources can include the devices 110 (e.g., the wearable device 112, the mobile device 114, etc.), the Internet 120, and the existing database(s) 130. Examples of sources for the food-related data are illustrated in FIG. 2. The existing database(s) 130 can be the food databases of competitors.

The food-related data can be classified into a plurality of different categories, for example 2, 3, 4, 5 or more categories. In some cases, a category may include one or more subcategories.

In some embodiments, the food-related data that is analyzed by the food analysis system 210 may be separated into four different categories, for example (1) elementary foods, (2) recipes, (3) packaged foods, and (4) restaurant dishes. The foods can include beverages (e.g. water, coffee, tea, alcohol, etc.).

Elementary foods in category (1) may include commonly eaten foods and ingredients.

Elementary foods may be further separated into two or more constituents, for example (a) elementary ingredient and (b) elementary recipe. An elementary ingredient is a food item that contains a single ingredient other than water. Examples of elementary ingredients may include banana, almond, frozen blueberries, raw blueberries, and so forth. An elementary recipe is an abstraction of commonly eaten foods that contain more than a single ingredient. Examples of elementary recipes may include Pad Thai, chicken lo mein, french fries, and so forth.

Recipes in category (2) may include foods comprising multiple ingredients together with preparation instructions. Recipes in category (2) may only include specific recipes, and may be fundamentally different from the elementary recipes in category (1). For example, a generic “Pad Thai” in category (1) is an abstraction of the notion of a dish known as Pad Thai (which generally includes noodles, oil, peanuts, etc.), and is therefore an elementary recipe. In contrast, different Pad Thai recipes (e.g. those found on various Internet sources) are specific recipes, and are realizations of the elementary recipe “Pad Thai” and are therefore (non-elementary) recipes. Other examples of recipes in category (2) may include, e.g. multiple recipes for fettuccine alfredo from different sources.

Packaged foods in category (3) may include foods that have a barcode and are sold as a package, e.g. a Chocolate Clif bar with a specific universal product code (UPC). Most packaged foods may be considered to include a recipe by the food manufacturer, since a recipe is needed to prepare the packaged foods. However, since the amounts of each ingredient and the exact preparation instructions are not usually provided or listed on packaged foods, the category of packaged foods can therefore be separate from recipes. The separate categories may be needed because packaged foods and recipes often need to be analyzed differently by the machine learning models disclosed elsewhere herein.

Restaurant dishes in category (4) may include menu items from restaurants, for example McDonald's Big Mac. In practice, these restaurant dishes are created from recipes internal to the restaurants themselves. However, in most restaurants, the exact ingredients list and the preparation instructions are not provided or known to the customers. Accordingly, restaurant dishes are distinguished from recipes since they need to be addressed (analyzed) differently from recipes.

In general, nutrition-related data on food is often partial (incomplete) as some of the information is typically missing or difficult to obtain. For example, elementary foods in category (1) may lack labels, and adding new food items and new nutrients can be challenging. Packaged foods in category (3) usually do not show the amounts of different ingredients. Also, most food labels on packaged foods may only include a limited number of nutrients (e.g. 10-14) when there could be more nutrients. Furthermore, data for the food label may be captured in an image form, and there could be discrepancies between the actual contents of the packaged food and the information on the food label. As for the recipes in category (2), there are usually no labels on recipes, and thus the nutritional information may be inaccurate. Likewise, for the restaurant dishes in category (4), restaurant menus typically do not have labels and may not include nutritional information. The ingredients in a restaurant dish are often part of the free text in the description for the restaurant dish. Furthermore, data for restaurant dishes is typically captured in an image form (e.g. of a menu), and there is a wide variation in the way dishes are being described or depicted on restaurant menus.

In view of the above, each of the above food categories (1)-(4) individually may possess certain gaps or limitations. However, those gaps or limitations can be addressed by the algorithms described herein, such that the food analysis system can generate a complete understanding of each food object by leveraging data within and between different categories. In most cases, knowledge about the ingredients and amounts of each food can allow additional data about each food to be determined.

The food analysis system 210 can leverage one or more algorithms comprising at least one machine learning algorithm to abstract information from the data relating to foods. The food analysis system can classify the abstracted data into one or more categories of the food ontology. The categories can be abstraction layers. The one or more algorithms can include natural language processing (NLP), a computer vision system, or a statistical model. The computer vision system can include artificial intelligence (AI), deep learning, or optical character recognition (OCR) capabilities. The computer vision system, in combination with NLP, can convert images of consumer packaged foods or images of restaurant menus into structured data that can be analyzed and classified into the food ontology.

The categories, or abstraction layers, of foods in the food ontology can include name, description, user ratings, images, characteristics (e.g. dietary needs, allergies, cuisine, flavors, textures, etc.), ingredients breakdown (types and amounts), nutrients breakdown (types and amounts), processing information, and food geolocation and/or availability information. The food analysis system may use at least one machine learning algorithm to generate additional abstraction layers or metadata of foods. The additional abstraction layers or metadata of foods may be incorporated into the food ontology. A food item can have one or more abstraction layers. An abstraction layer can be used to describe one or more food items.
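
As a non-limiting illustration, the many-to-many relationship between food items and abstraction layers described above could be represented in software as sketched below. The names FoodCategory, FoodItem, add_layer_value and items_with are hypothetical and are introduced here only for illustration; the disclosure does not prescribe a particular data structure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class FoodCategory(Enum):
    """The four example food categories named in the description."""
    ELEMENTARY = 1
    RECIPE = 2
    PACKAGED = 3
    RESTAURANT_DISH = 4


@dataclass
class FoodItem:
    """Hypothetical record: one food item carrying any number of layers."""
    name: str
    category: FoodCategory
    # Maps a layer name (e.g. "allergies") to the values recorded for it.
    layers: Dict[str, Set[str]] = field(default_factory=dict)

    def add_layer_value(self, layer: str, value: str) -> None:
        self.layers.setdefault(layer, set()).add(value)


def items_with(items: List[FoodItem], layer: str, value: str) -> List[str]:
    """Return the names of items whose given layer contains the value."""
    return [i.name for i in items if value in i.layers.get(layer, set())]


# Illustrative entries only; the layer values are assumptions.
pad_thai = FoodItem("Pad Thai", FoodCategory.ELEMENTARY)
pad_thai.add_layer_value("allergies", "peanut")
pad_thai.add_layer_value("cuisine", "Thai")
banana = FoodItem("banana", FoodCategory.ELEMENTARY)
banana.add_layer_value("textures", "soft")

print(items_with([pad_thai, banana], "allergies", "peanut"))
```

In this sketch a single food item can hold several layers, and a single layer value can be shared by several food items, which mirrors the last two sentences of the preceding paragraph.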

Dietary needs can be an abstraction layer of the food ontology. The dietary needs can include a vegetarian, lacto-ovo-vegetarian, pescatarian, lacto-vegetarian, ovo-vegetarian, or vegan diet. A vegetarian generally may not eat meat or fish. A lacto-ovo-vegetarian may avoid the flesh of all animals, both meat and fish, but may consume dairy products and eggs. A pescatarian may eat fish but not meat. A lacto-vegetarian may consume dairy products but no eggs. An ovo-vegetarian may consume eggs but no dairy. A vegan may avoid all animal-based foods, including honey.
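
As a non-limiting illustration, the diet definitions above could be encoded as exclusion rules and used to label a food item with the diets it is compatible with. This is a minimal rule-based sketch, not the machine learning approach of the disclosure. The table EXCLUDED and the function compatible_diets are hypothetical names, and the sketch assumes that the lacto-vegetarian and ovo-vegetarian diets also exclude meat and fish.

```python
# Animal-derived food groups excluded by each diet, per the definitions above.
# Assumption: lacto- and ovo-vegetarian diets also exclude meat and fish.
EXCLUDED = {
    "vegetarian": {"meat", "fish"},
    "lacto-ovo-vegetarian": {"meat", "fish"},
    "pescatarian": {"meat"},
    "lacto-vegetarian": {"meat", "fish", "eggs"},
    "ovo-vegetarian": {"meat", "fish", "dairy"},
    "vegan": {"meat", "fish", "eggs", "dairy", "honey"},
}


def compatible_diets(groups):
    """Return, sorted by name, the diets that exclude none of the given groups."""
    present = set(groups)
    return sorted(d for d, banned in EXCLUDED.items() if not (present & banned))


print(compatible_diets({"fish"}))
print(compatible_diets({"eggs"}))
```

A food item containing only fish is compatible with the pescatarian diet alone, while an item containing eggs remains compatible with every listed diet except the lacto-vegetarian and vegan diets.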

The dietary needs can include a gluten free diet. The gluten free diet can be important for individuals with celiac disease (celiacs), a serious autoimmune disorder in which ingestion of gluten can damage the small intestine. Celiac disease can affect 1 in 100 people worldwide. Examples of grains that cannot be used for the gluten free diet include wheat, barley, rye, and oats. Examples of common processed foods containing wheat, barley, rye, or oats can include malted products, cereals, cold cuts, gravies, seasoned rice mixes, trail mixes, and imitation fish or bacon.

The dietary needs can include a diabetic diet for individuals with type 1 or type 2 diabetes. Diabetes can be a group of diseases that result in too much sugar (e.g. glucose) in the blood. The diabetic diet can include fiber-rich foods, including fruits, vegetables, whole grains, legumes (e.g. beans and lentils) and low-fat dairy products. The diabetic diet can include fish that are rich in omega-3 fatty acids, including salmon and mackerel. The diabetic diet may not include foods that are high in sugar (carbohydrates). For example, an individual with diabetes on a 1,600 calorie diet may consume no more than about 50% of these calories from carbohydrates.

The dietary needs can include various religious diets, including Halal and Kosher diets. The Halal diet can be according to what is permissible or lawful in traditional Islamic dietary laws. In an example, the Halal diet may not include pork or pig meat products. The Kosher diet can be according to what is permissible or lawful under a set of Jewish religious dietary laws called kashrut. In an example, the Kosher diet may not include hare, hyrax, camel, or pig. The food analysis system 210 can be in communication with several food certification programs, including the Islamic Food and Nutrition Council of America (IFANCA), the Kosher Supervision of America (KSA), etc. to continuously monitor and update its algorithms for the religious diets.

The dietary needs can include a lactose free diet for individuals with lactose intolerance. Lactose intolerance can be a condition related to a decreased ability to digest lactose, a sugar found in milk products. Individuals with lactose intolerance can show symptoms including abdominal pain, bloating, diarrhea, gas, and nausea after consuming milk-containing or milk-based products without any medication (e.g. lactase). Thus, the food analysis system 210 can analyze and report whether a food item contains milk or milk-based ingredients.

The dietary needs can include an organic foods diet. The organic foods can include products that come from animals that are not given any antibiotics or growth hormones. The organic foods can include plants that do not use conventional pesticides or fertilizers that are made with synthetic ingredients. Examples of terms used in labels of commercially available organic products can include “100% Organic,” “Organic,” and “Made with Organic Ingredients.”

The dietary needs can include a non-genetically modified organisms (non-GMO) diet. A GMO ingredient can be a plant or animal that is created by means of genetic engineering in a lab environment that goes beyond traditional crossbreeding. The genetic engineering can be combining genes from different species to create a new one. Because organic foods can be prohibited from using one or more GMO ingredients, an organic food can be, generally, a non-GMO food. Examples of terms used in labels of commercially available non-GMO products can include “Non-GMO Project Verified.” The “Non-GMO Project Verified” label can be certified by the Non-GMO Project.

The dietary needs can include other diets preferred by a user, such as the Atkins diet, Zone diet, Ketogenic diet, and raw food diet. The Atkins diet may be a weight-loss program devised by Robert Atkins. The Atkins diet can be a first variation of a low-carbohydrate diet. The Zone diet can be a second variation of the low-carbohydrate diet. The Zone diet can require a specific food ratio of 40% carbohydrates, 30% fats, and 30% protein in each meal. The Zone diet can recommend eating five times a day to help prevent overeating. The Ketogenic diet can be used for children with epilepsy. The Ketogenic diet can be a third variation of the low-carbohydrate diet. The Ketogenic diet can encourage a high-fat diet. The Ketogenic diet may cause a breakdown of fat deposits in the body for fuel and create substances called ketones through a process called ketosis. The Ketogenic diet can encourage consumption of oils from avocados, coconuts, Brazil nuts, olives, and oily fish. The raw food diet can encourage consumption of foods and drinks that are not processed. The raw food diet may not include food products with artificial food preservatives, including calcium propionate, sodium nitrate, butylated hydroxyanisole (BHA), and butylated hydroxytoluene (BHT). The food analysis system 210 can leverage at least one machine learning algorithm to detect or estimate the presence of one or more artificial food preservatives in consumer packaged foods.
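
As a non-limiting illustration, the Zone diet ratio above could be checked against the macronutrient content of a meal as sketched below. The sketch assumes that the 40/30/30 ratio is measured as a share of calories, which the description does not state, and uses the commonly cited energy factors of 4 kcal per gram for carbohydrate and protein and 9 kcal per gram for fat. The names KCAL_PER_GRAM, ZONE_TARGET, calorie_shares and matches_zone, and the 5% tolerance, are hypothetical.

```python
# Commonly cited energy factors (kcal per gram of macronutrient).
KCAL_PER_GRAM = {"carbohydrate": 4, "protein": 4, "fat": 9}

# Zone diet ratio from the description; assumed here to be a calorie share.
ZONE_TARGET = {"carbohydrate": 0.40, "fat": 0.30, "protein": 0.30}


def calorie_shares(grams):
    """Convert grams of each macronutrient into its share of meal calories."""
    kcal = {m: grams.get(m, 0) * KCAL_PER_GRAM[m] for m in KCAL_PER_GRAM}
    total = sum(kcal.values())
    if total == 0:
        raise ValueError("meal has no macronutrient calories")
    return {m: k / total for m, k in kcal.items()}


def matches_zone(grams, tolerance=0.05):
    """True if every macronutrient share is within tolerance of the target."""
    shares = calorie_shares(grams)
    return all(abs(shares[m] - ZONE_TARGET[m]) <= tolerance for m in ZONE_TARGET)


# A roughly 500 kcal meal: 200 kcal carbohydrate, 150 kcal protein, 150 kcal fat.
print(matches_zone({"carbohydrate": 50, "protein": 37.5, "fat": 16.7}))
```

The same share calculation could be reused for other ratio-based diets by substituting a different target table.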

Allergies can be an abstraction layer of the food ontology. For information regarding food allergies, the food analysis system 210 can be in communication with existing databases such as the Food Allergy Research and Resource Program (FARRP) to continuously monitor and update its algorithms to abstract information from or classify allergenic foods. The allergenic foods can be broken down into one or more groups, including wheat or gluten (barley, corn, maize, oat, rice, rye, wheat, other gluten-containing grains, etc.), lactose or dairy products (cow, goat, sheep milk, etc.), eggs (hen, goose, duck), tree nuts (almond, brazil nut, cashew nut, chestnut, hazelnut, macadamia nut, pecan nut, pistachio, walnut, etc.), legumes (chickpea, lentil, lupin, peanut, pea, etc.), fish (Alaska pollock, carp, cod, dogfish, mackerel, salmon, sole, tuna, etc.), and crustacean shellfish (crab, lobster, shrimp, etc.). Additional allergenic foods can include fruits (acerola, apple, apricot, banana, cherry, coconut, date, fig, grape, mango, melon, orange, peach, pineapple, etc.) and vegetables (asparagus, avocado, carrot, celery, etc.).

Food flavors can be an abstraction layer of the food ontology. The food flavors can include sensory impressions of foods. The food flavors can be determined by the chemical senses of taste and smell. Some examples of tastes can include sweet, sour, bitter, salty, and savory (also known as umami). Some examples of smells, or odors that can be distinguished by the human olfactory system, can include fragrant (e.g., florals and perfumes), fruity (all non-citrus fruits), citrus (e.g., lemon, lime, orange), woody or resinous (e.g., pine or fresh cut grass), chemical (e.g., ammonia, bleach), sweet (e.g., chocolate, vanilla, caramel), minty (e.g., eucalyptus and camphor), toasted or nutty (e.g., popcorn, peanut butter, almonds), pungent (e.g., blue cheese, cigar smoke), and decayed (e.g., rotting meat, sour milk). Alternatively or in addition, the food flavors can be determined by a range of temperature (e.g., hot, room temperature, cold, frozen, etc.).

The food flavors can be flavorants, or flavorings, as a substance in a food. Some examples of natural or artificial flavorants for taste can include glutamic acid, glycine, guanylic acid, inosinic acid, disodium 5′-ribonucleotide, acetic acid, ascorbic acid, citric acid, fumaric acid, lactic acid, malic acid, phosphoric acid, and tartaric acid. Some examples of natural or artificial flavorants for odor can include diacetyl, acetylpropionyl, acetoin, isoamyl acetate, benzaldehyde, cinnamaldehyde, ethyl propionate, methyl anthranilate, limonene, ethyl decadienoate, allyl hexanoate, ethyl maltol, ethylvanillin, methyl salicylate, and manzanate.

The food flavors can also include color. The color of a food can affect an individual's expectations of one or more flavors of the food. In an example, adding more red color to a drink can increase the perceived sweetness of the drink. The colors can be labelled by the food analysis system 210 in one or more ways. The colors can be labelled by a standard nomenclature, including red, orange, yellow, green, blue, navy, purple, black, etc. The colors can be labelled as a position in a predefined color palette, or a color wheel. Alternatively or in addition, the colors can be labelled in terms of a food's characteristic absorption profile in at least a portion of the electromagnetic spectrum. The characteristic absorption profile can be defined by a position and intensity for at least a portion of the electromagnetic spectrum. At least a portion of the electromagnetic spectrum can be the visible spectrum. The visible spectrum can include electromagnetic radiation in a range from about 400 nanometers to about 750 nanometers. Labeling the colors as the position and intensity within the electromagnetic spectrum may avoid bias against users with or without color blindness.

Nutritional characteristics can be an abstraction layer of the food ontology. The nutritional characteristics can include one or more dietary goals or restrictions suggested to or defined by a user. The dietary goals can include low fat, high fat, and high calcium. The nutritional characteristics can include nutritional recommendations for pregnancy. The nutritional recommendations for pregnancy can include: pasteurized foods to prevent Listeria infection; foods with high folic acid, calcium, or iron; foods or drinks with low caffeine; and avoiding, or consuming no more than 6 ounces per week of, fish with a high level of mercury. The fish with the high level of mercury can include king mackerel, marlin, swordfish, tilefish, and tuna.

Textures can be an abstraction layer of the food ontology. The textures can include: soft, firm, creamy, crumbly, crunchy, crisp, brittle, tender, chewy, tough, thick, thin, sticky, airy, fluffy, greasy, gooey, moist, mushy, lumpy, pulpy, grainy, etc.

The food ontology generated by the food analysis system 210 can compete with (or can be compatible with) existing food databases. General databases of foods include the U.S. Department of Agriculture (USDA) database, Open Food Facts, and ItemMaster. Databases on packaged foods include Nutritionix, FatSecret, and MyFitnessPal. Databases on recipes include Yummly, BBC Good Food, Allrecipes, The Kitchn, EatingWell, and MyRecipes. Databases on restaurant dishes include HealthyDiningFinder, Nutritionix, MyNetDiary, FatSecret, HealthyOut, and OpenMenu. Alternatively or in addition, the food ontology generated by the food analysis system 210 can compensate for limitations of the existing food databases. The existing food databases can have low quality of data due to complete or partial dependency on user generated and reported content. The existing food databases can have a limited scope as they may not cover all food categories and/or may have targeted specific populations and/or geographies during data collection. Also, the existing food databases can be static or outdated due to dependency on data that has been collected in the past. Additionally, the existing food databases may not maintain a robust ontology of foods with detailed abstraction layers (e.g. characteristics, ingredients breakdown, etc.).

FIG. 3 is a flow chart of a method 300 of the food analysis system 210. The food analysis system 210 can connect to the Internet 310 and collect images related to foods. The foods can include consumer packaged foods. The images can include front and back pictures of Immaculate Bakery Gluten Free Chocolate Chunk Cookies 315. The images can include a nutrition facts label. The food analysis system 210 can also be configured to connect to a user's mobile device 320 and collect food-related images 325 taken or saved by the user. The food analysis system 210 can also collect food-related texts (e.g. publications, text messages, etc.) from the Internet 310, the mobile device 320, or other sources. The food analysis system 210 can use various algorithms including OCR to automatically isolate food-related textual information 330 from the images 315, 325. The food analysis system 210 can use NLP and other machine learning algorithms to parse the collected food-related texts into a structured format 340, analyze their characteristics 350, and validate that a resulting analysis is correct 360. The food analysis system 210 can use the validated characteristics to map the foods in the food ontology.

In some embodiments, the food analysis system can perform optical character recognition (OCR) to extract textual information from the images. The food analysis system can also implement a convolutional neural network to extract information from various icons that often appear on packaged foods. Examples of OCR techniques that are used in embodiments of the present disclosure are next described.

Nutritional information extraction as described herein may comprise extracting the nutritional facts, ingredients or allergens of a consumer product using an image of the product. This can be implemented by first dividing the image into multiple sub-images containing areas with text, using for example an image crop algorithm. The image crop algorithm may comprise a method for extracting parts of an image containing text. The method implemented by the image crop algorithm may comprise the following steps: (1) search and extract rectangles in the image; (2) calculate the standard deviation image and pull areas with large standard deviation; (3) remove overlapping areas and calculate the difference between standard deviation areas overlapping with rectangle areas; and (4) return a list of the areas pulled from the images. FIG. 42 shows an example of standard deviation image areas, and FIG. 43 shows an example of rectangle detection image areas. Referring to FIG. 42, the standard deviation image areas may include a plurality of free-form contours 4202 around the text and empty spaces 4204 therebetween. Referring to FIG. 43, the rectangle detection image areas include a plurality of rectangle boxes 4302 surrounding the text. Accordingly, areas (sub-images) of various shapes and sizes can be cropped from the original image.
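
As a non-limiting illustration, steps (2) through (4) of the image crop algorithm above could be sketched as follows. The sketch computes the standard deviation over fixed square tiles rather than free-form contours, takes the rectangles of step (1) as a given input, and simplifies step (3) to discarding any standard deviation area that overlaps a detected rectangle. The function names, the tile size and the threshold are hypothetical; boxes are written as (row, column, height, width).

```python
from statistics import pstdev


def high_variance_tiles(image, tile=4, threshold=20.0):
    """Return boxes of tiles whose pixel standard deviation exceeds threshold."""
    boxes = []
    rows, cols = len(image), len(image[0])
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            pixels = [image[y][x]
                      for y in range(r, min(r + tile, rows))
                      for x in range(c, min(c + tile, cols))]
            if pstdev(pixels) > threshold:
                boxes.append((r, c, min(tile, rows - r), min(tile, cols - c)))
    return boxes


def overlaps(a, b):
    """True if two (row, col, height, width) boxes share any area."""
    ar, ac, ah, aw = a
    br, bc, bh, bw = b
    return ar < br + bh and br < ar + ah and ac < bc + bw and bc < ac + aw


def merge_areas(rectangles, std_boxes):
    """Keep every rectangle plus the std boxes that overlap no rectangle."""
    extra = [b for b in std_boxes if not any(overlaps(b, r) for r in rectangles)]
    return list(rectangles) + extra


# Toy 8x8 grayscale image: blank except for a high-contrast top-left corner,
# standing in for an area of printed text.
image = [[255] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        image[y][x] = 0 if (x + y) % 2 == 0 else 255

print(high_variance_tiles(image))
```

Uniform background tiles have a standard deviation of zero and are skipped, while the high-contrast tile is returned as a candidate text area to be passed to the OCR step.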

Next, an OCR algorithm can be implemented on each of the cropped sub-images. Words pertaining to allergens, ingredients or nutritional facts can be searched in the paragraph of the extracted text.

For allergens and ingredients, their spelling may be first corrected using a specialized spell checker. The spell checker can be used to correct words that were misspelled by the OCR. The frequency of each word in a training set is used to determine its probability, for example as set by Zipf's law, which states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. For each word detected by the OCR, the food analysis system can check if the word is in the stored vocabulary. If the word is not in the stored vocabulary, the following options may be available: (1) assume there is a spelling mistake, look for a predetermined number of top matches (e.g. top eight matches) in the vocabulary, and choose the one with the highest probability, or (2) assume that the detected word is a result of two different words concatenated by the OCR, and then choose the split that offers the highest probability. After the spelling has been corrected, each extracted text can be divided into sections of ingredients and allergens by maximizing their confidence, using for example a stalactite cave algorithm. The stalactite cave algorithm can be used to detect parts of text containing an ingredients paragraph and an allergens paragraph. The stalactite cave algorithm can be implemented by training a model to classify an expression into one of three classes, for example: (i) ingredients, (ii) allergens and (iii) others. The model can include a TF-IDF transformation and a logistic regression classifier. The confidence on every three words can be checked using the model. Dynamic programming can be used to assert the parts containing each group (ingredients and allergens), such that the area under the confidence graph is maximized. FIG. 44 shows an example of a stalactite cave OCR text type separation. The x-axis of the graph of FIG. 44 shows the index of the word in the text (e.g. for a food dish).
The y-axis of the graph of FIG. 44 shows the probability that a pair of words starting on the index belongs to the various classes (e.g., ingredients, allergens, etc.). After each extracted text has been divided into sections of ingredients and allergens, the paragraph with the highest confidence is selected as being most relevant for allergens and ingredients.
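The two spelling-correction options described above can be sketched as follows. This is a minimal illustration under stated assumptions: word probabilities are taken as relative frequencies in a training list, the top matches are found with the Python standard library function `difflib.get_close_matches`, and an exact split into two vocabulary words is tried before fuzzy matching. The disclosure does not specify how the two options are prioritized, so that ordering, like the function names `build_probabilities` and `correct_word`, is an assumption.

```python
from collections import Counter
from difflib import get_close_matches


def build_probabilities(training_words):
    """Relative frequency of each word in the training set."""
    counts = Counter(training_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def correct_word(word, probs, top_n=8):
    """Return the corrected word(s) as a list (two items for a split)."""
    if word in probs:
        return [word]

    # Option (2): the OCR concatenated two vocabulary words.
    # Keep the split whose joint probability is highest.
    best_split, best_p = None, 0.0
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in probs and right in probs:
            p = probs[left] * probs[right]
            if p > best_p:
                best_split, best_p = [left, right], p
    if best_split:
        return best_split

    # Option (1): spelling mistake. Take up to top_n close matches
    # and choose the most probable one.
    candidates = get_close_matches(word, list(probs), n=top_n, cutoff=0.6)
    if candidates:
        return [max(candidates, key=probs.get)]

    # No plausible correction: leave the word unchanged.
    return [word]
```

For example, with a vocabulary containing "soy" and "lecithin", the OCR token "soylecithin" is split into two words, while "sugra" is mapped to "sugar".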

For nutritional facts, the food analysis system described herein can assert for each line whether the line contains a nutritional fact. Next, the line is divided into name (by fitting to a word bank), amount, and measurement. The paragraph with the highest confidence is then selected as being most relevant for nutritional facts.
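The division of a line into name, amount, and measurement can be sketched as below. The word bank contents, the set of recognized units, and the function name `parse_nutrition_line` are hypothetical and chosen only for illustration; the disclosure does not specify them.

```python
import re

# Hypothetical word bank of nutrient names (longest names are tried first
# so that "saturated fat" is not mistaken for a shorter entry).
WORD_BANK = (
    "total carbohydrate", "dietary fiber", "saturated fat", "trans fat",
    "total fat", "cholesterol", "calories", "protein", "sodium", "sugars",
)

# Amount followed by an optional unit; the unit list is an assumption.
AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|g|kcal)?")


def parse_nutrition_line(line):
    """Return {'name', 'amount', 'unit'} or None if the line holds no nutritional fact."""
    text = line.lower()
    name = next(
        (n for n in sorted(WORD_BANK, key=len, reverse=True) if n in text),
        None,
    )
    if name is None:
        return None
    rest = text.split(name, 1)[1]
    match = AMOUNT_RE.search(rest)
    if match is None:
        return None
    return {"name": name, "amount": float(match.group(1)), "unit": match.group(2)}
```

A line such as "Sodium 100mg 4%" yields the name "sodium", the amount 100.0, and the unit "mg", while a line with no word-bank match is rejected.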

An example of nutritional information extraction is next described with reference to FIG. 45 which shows an image of a packaged food label. The OCR output is provided as follows:

Allergens: CONTAINS WHEAT AND SOY INGREDIENTS.

Ingredients: Strawberry filling pear juice concentrate, tapioca syrup, dried cane syrup, apple powder, strawberry puree concentrate, constarch, vegetable glycerin, natural flavors, elderberry juice concentrate for color Kashi seven whole grain flour whole: cats, hard red wheat, brown rice, rye, triticale, barley, buckwheat whole wheat flour, invert cane syrup, expeller pressed canola oil, rolled cats, honey, tapioca syrup, acacia gum, vegetable glycerin, coat fiber, leavening sodium acid pyrophosphate, baking soda soy lecithin, xanthan gum, natural flavor.

Nutritional facts:

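| Name | Amount | Percentage | Unit |
| --- | --- | --- | --- |
| Dietary Fiber | 3 | 12 | g |
| Sugars | 9 | — | g |
| Iron | 2 | 2 | |
| Polyunsaturated Fat | 1 | — | g |
| Saturated Fat | 0 | 0 | g |
| Insoluble Fiber | 2 | — | g |
| Sodium | 100 | 4 | mg |
| Calories | 120 | | |
| Monounsaturated Fat | 1.5 | — | g |
| Calcium | 0 | 0 | |
| Vitamin A | 0 | 0 | |
| Soluble Fiber | 1 | — | g |
| Total Fat | 3 | 5 | g |
| Cholesterol | 0 | 0 | mg |
| Total Carbohydrate | 23 | 8 | g |
| Protein | 2 | 4 | g |
| Trans Fat | 0 | — | g |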

FIGS. 46-48 illustrate the results of nutritional information extraction on more than 40,000 food products. Specifically, FIG. 46 is illustrative of the nutritional facts accuracy, showing a nutrients NLP score histogram of 0.843. FIG. 47 is illustrative of the ingredients accuracy, showing an ingredients NLP score histogram of 0.887. FIG. 48 is illustrative of the allergens accuracy, showing an allergens NLP score histogram of 0.937. Based on the above results, it can be observed that the various algorithms utilized in the nutritional information extraction process described herein are capable of extracting highly accurate information on nutritional facts, ingredients, and allergens.

In some embodiments, the packaged foods OCR described herein may further include a logo recognition technique. The logo recognition technique may be applied alone, or in conjunction with the nutritional information extraction technique described above. Two main classifiers may be provided for logo recognition: (1) a text-based classifier, and (2) an image classifier.

The text classifier can utilize OCR to extract text from an image. A TF-IDF vectorizer can be used to transform the text to numeric vectors. Classification of the text vectors can be performed using one or more of the following: Logistic Regression, Decision Tree, or Random Forest.

The image classifier can be based on Deep Residual Learning for Image Recognition (ResNet50) pre-trained on ImageNet data. Multiple layers (e.g. 3 layers) of multilayer perceptron (MLP) can be attached to the output of ResNet50 and trained. The neural network can be fine-tuned by training all but the first n layers (e.g. first 50 layers). The training can be performed on a small set of images by including a specialized image augmentation. The image augmentation can include making the following random changes to images with logos: (a) image rotation, (b) gamma correction, (c) brightness change, and (d) conversion to a grayscale image. Conversely, images without logos can be used to enlarge the logo image training set by inserting logos into those images that do not have logos. The following steps can be performed to ensure a large variety in logo insertion: (i) warping the logo image with a homography transformation, and (ii) adding the logo to a random location in the image. FIG. 49 shows an example of image logo training results. Fine-tuning was performed after 100 epochs. FIG. 50 shows an example of text logo recognition results, specifically accuracy per mean number of samples over 1000 runs for a plurality of different logos. FIGS. 51A and 51B further illustrate the results per logo, for the plurality of different logos.

In some embodiments, the food analysis system disclosed herein may include a food image recognition engine. The food image recognition engine may be a computer vision system that is configured to classify foods from images, and analyze the content, volume and nutritional values of the foods.

Conventional commercially available food image analysis systems are typically limited to food classification alone, and lack the ability to estimate volumes of individual contents within the foods. Existing food image analysis systems may have other deficiencies as well. For example, existing food image analysis systems analyze foods based on food contents that are shown on the images, and are unable to account for ingredients that are not visually apparent from the images. Some of those ingredients (that are not visible from the images) can significantly disrupt and alter the nutritional estimation of a given food. An example of such ingredients is the oil family, where 1 tbsp of oil can contribute over 100 calories. FIGS. 19A and 19B each illustrate side-by-side comparisons of similar-looking foods that have substantially different caloric content due to differences in individual food content amounts and other ingredients (such as oil) that are not visually apparent. For example, in FIG. 19A, the two bowls contain the same ingredients but in different amounts. The bowl on the left of FIG. 19A contains: 180 g untrimmed steak cooked in 1 tsp oil, 2 cups lettuce, 4 rings red onion, 50 g avocado, 1 cup cooked rice noodles, 30 g cucumber, 2 cherry tomatoes, 2 tsp sesame oil, 2 tsp lime juice, 1 tsp soy sauce. The bowl on the right of FIG. 19A contains: 100 g trimmed steak (grilled), and has the same amount of lettuce, red onion and lime juice as the bowl on the left. However, the bowl on the right has half the avocado and rice noodles, 50 g cucumber, double the cherry tomatoes, ½ a medium carrot and half the sesame oil compared to the bowl on the left. As such, the left bowl contains 770 calories whereas the right bowl contains only 405 calories. Referring to FIG. 19B, the bowl on the left contains: 200 g chicken cooked in 2 tsp extra virgin olive oil, 30 g semi-trimmed bacon, 2 cups cooked pasta, 30 g full fat cheddar, 2 large florets broccoli, 1 medium mushroom. The bowl on the right of FIG. 19B contains: 100 g poached chicken, 1 cup cooked pasta, 1 tsp capers, 10 g low fat cheddar, ¼ large capsicum, 4 large florets broccoli, 2 medium mushrooms. As a result of the extra pasta, the left bowl contains 800 calories whereas the right bowl contains only 380 calories.

Conventional/existing food image analysis systems may be limited for the following reasons.

First, studies have shown that 50% of the meals in the U.S. are eaten out. This means that for a model to produce good real-world results, it must be capable of handling restaurant dishes with high accuracy. The majority of the leading academic papers on the topic, as well as the commercial food image analysis solutions available, expect to have a sufficient number of images for each dish of every restaurant. In the U.S. alone, there are over 700,000 restaurants, with an average of 61 dishes in each restaurant, which yields an enormous lower bound of over 42 million dishes. Unfortunately, the majority of the restaurants do not provide enough images of their dishes (even chain restaurants usually have only a few images of each dish). Furthermore, online sources of dish images (such as those found on Yelp™) do not solve the problem. This means that in order for a solution to provide very accurate results on real-world data, it must overcome the “eat out data gap”. Namely, the model has to be able to identify images of dishes even if it has never been exposed to images from a specific restaurant.

Second, as discussed with reference to FIGS. 19A and 19B, there are many foods that appear to be very similar, yet can be very different (e.g. noodle dishes such as pad thai, lo mein, and others). While deep neural networks may perform fairly well at distinguishing objects in many cases, this is not always possible. For this purpose, there is ample value in identifying both the dishes and their ingredients, and in using the former to provide more information about the latter, and vice versa, in a bidirectional or multidirectional manner. To achieve this, one will have to leverage the structure of dishes and their relationships to ingredients (namely a food ontology such as the one described herein).

Third, in many cases, existing systems may provide several suggestions for a given image, many of which may be poorly classified. This is mainly due to the fact that many foods can be deceptive when analyzed from an image, and no computer vision system can perform perfectly. Specifically in the case of digital nutrition (and other digital healthcare solutions), receiving poor results can reduce a user's confidence in the computer vision system. Users may stop using such systems. This means that there is ample value in providing only correct or sufficiently accurate results. While it is generally impossible for a computer vision system to always be accurate, there is a delicate balance between specificity of a classification (e.g. Noodles dish->Pad Thai->Pad Thai with chicken) and the system's confidence level. Current existing systems tend not to generalize results that they are uncertain about. For example, current existing systems may not be capable of generating more accurate (but more generalized) results to the users.

Lastly, a problem exists in the non-uniqueness of mapping from a food image to nutritional values. Many existing systems attempt to map food images into nutritional values, but are not configured to know the features that are lacking in order to complete the mapping. Nor are the existing systems configured to assist the users in bridging the knowledge gap.

The food image recognition engine described herein can address the above shortcomings of existing systems. The food image recognition engine disclosed herein is capable of one or more of the following: (1) Identify which foods are definitely (i.e. 100% probability) in the image; (2) Identify foods which may be in the image (probabilities), either by studying the image, the context in which it was taken, the user's history, or the likelihoods of certain foods appearing together; (3) Distinguish between dishes (e.g. Pad Thai) and ingredients (e.g. peanuts) they may contain; and (4) Leverage historical eating patterns of a user, and the context in which the user is eating to estimate volume and thus nutritional values. In particular, this can include using known menus when the user is eating out.

To achieve the above goals, the food image recognition engine can include an algorithm comprising the following. First, a complete ontology of visual cues for foods (VICUF) is constructed. This is an ontology of any visual cue that human beings (or computers) may use to identify the food that is before them, and may include: (1) Combo food items (e.g. Greek salad, or a burrito); (2) Ingredients (e.g. banana, apple, shrimps); and (3) Other cues (e.g. cup, liquid, fried, etc.). Knowing the entire ontology of VICUF can be used in conjunction with other inputs to obtain a more accurate identification of restaurant dishes. Next, a robust corpus for each label in VICUF is created. Ideally each label in the training set is to be annotated. Next, a convolutional neural network (CNN) is trained on each label in the VICUF (binary classifier), or a CNN capable of multi-labeling is trained. The latter CNN may provide better results since certain foods often appear together, while other foods do not. Next, each time a new image is supplied, the CNN will map it to a probability vector, where each component represents the probability that a specific food-related visual cue is in the image. The above steps may be sufficient to create a food logging experience. Given an image, the food image recognition engine can sort items in the food logger depending on their probability value in the output vector. A threshold can be included such that items with low probability do not appear.
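The final sorting and thresholding step described above can be sketched as follows. The sketch assumes the CNN output is already available as a mapping from VICUF label to probability; the label names, the default threshold, and the function name `rank_food_cues` are illustrative assumptions.

```python
def rank_food_cues(probabilities, threshold=0.2):
    """Drop labels below the threshold and sort the rest by descending probability.

    probabilities: dict mapping a VICUF label to the probability that the
    corresponding visual cue is present in the image.
    """
    kept = [(label, p) for label, p in probabilities.items() if p >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical output vector for a single image.
example = {"pad thai": 0.82, "peanuts": 0.40, "cup": 0.05, "fried": 0.20}
ranked = rank_food_cues(example)
```

With the example vector, the low-probability "cup" cue is dropped and the remaining items are listed with "pad thai" first.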

In some embodiments, additional sensors and inputs, for example based on spectroscopy and non-visible light (infrared), can be used to detect the subtleties between similar-looking foods and individual volumes/contents. In some embodiments, the volume/content can be estimated based on a sequence of images or video for 3D reconstruction of foods. In some embodiments, distance estimation can be performed using an infrared system that is configured to measure distances from various points on a plate of food.

The food analysis system 210 can include a labeling machine. The labeling machine can be a machine learning system to discover categories or abstraction layers (herein also referred to as labels) about foods. The labeling machine can be an automated system for textual analysis of food objects and for labeling them. This adds another layer of metadata, which the system uses to understand various characteristics of every food. These characteristics (labels) can be used in different ways by the system, for example by a personalized recommendation engine.

Examples of labels can include at least one ingredient (e.g., beef, pork, acorn, celery, etc.), at least one nutrient (e.g., vitamin A, vitamin C, calcium, iron, etc.), at least one dietary need (e.g., vegetarian, vegan, gluten free, etc.), at least one allergy (e.g., peanut free, gluten free, etc.), at least one dish type (e.g., salad, sandwich, soup, etc.), at least one cuisine (e.g. ethnic and/or religious cuisines, etc.), at least one flavor (e.g., sweet, fruity, etc.), at least one nutritional characteristic (e.g., low fat, high protein, etc.), and at least one texture (e.g., soft, firm, etc.). The labels can be generally classified into one or more of the following categories, for example: dietary needs (including diets and allergens), processing method, taste, meal, and dish. Dietary needs are food labels that allow a personalized recommendation engine to filter foods that a certain individual will never eat due to certain dietary restrictions. The labels can be generated automatically by the labeling machine. Dietary needs can be generally classified into two kinds: (1) dietary needs that are due to food allergies and (2) dietary needs that are due to specific diets that users often follow for ethical, religious or environmental reasons. Examples of dietary needs relating to food allergies may include gluten-free, dairy-free, no shellfish, no fish, no soy, no eggs, no peanuts, etc. Other examples of dietary needs may include vegetarian, pescetarian, or vegan.

The labeling machine can use a machine learning approach or a non-machine learning heuristic approach to develop a food labeler that can discover characteristics about foods. The machine learning approach can be referred to as a “label classifier” approach. The non-machine learning heuristic approach can be referred to as a “heuristic labeler” approach. The label classifier approach or the heuristic labeler approach can generate at least one algorithm for at least one food labeler to identify and label at least one particular abstraction layer of at least one food. The identification and labeling of a particular abstraction layer of food can be referred to as food labeling. The label classifier and the heuristic labeler can use one or more logical entities known as an analyzer and a problem solver for food labeling. Both the label classifier and the heuristic labeler can be evaluated using a corpus of pre-defined food data to assess how well each food labeler can perform. The labeling machine can use various statistical measures (e.g. precision, recall, etc.) to evaluate performance of the food labeler.

As a logical entity of the labeling machine, an analyzer can be a single model with a specific logic. The analyzer can be used to determine a specific attribute of a food item (e.g., whether the item is vegan or not). Thus, multiple analyzers can be required to determine multiple attributes of the food item. The analyzer can include a training set. For each analyzer, one or more training sets can be created, and each training set can have a distinguishable name (e.g., “English” or “Spanish” for different languages, or “gluten free” or “vegan” for different dietary needs). A problem solver can be a combination of two or more analyzers to determine an attribute for the food item based on the combined logics of its encompassing analyzers. The problem solver can include at least one label classifier analyzer or one heuristic labeler analyzer. The problem solver can encompass at least one additional problem solver.

For both the label classifier and the heuristic labeler approaches, labeling and classifying food data is a process with three main steps: (1) training various analyzers; (2) defining at least one combination of two or more analyzers to form a problem solver; and (3) running the problem solver to identify, label, and classify one or more features from the food data. Furthermore, the step of training the analyzers can include two sub-steps: (i) establishing at least one training set for each analyzer, and (ii) performing a training based on the at least one training set.

The labeling machine can use the label classifier approach to develop an analyzer. In the label classifier approach, a training set for an analyzer can contain at least one pair of input data and its corresponding correct answer, also known as a target. The target can be a “gold standard” for the analyzer. The analyzer's learning algorithm can find one or more patterns in the training set between the input data and the target, and generate an improved machine learning algorithm that can capture the one or more patterns. The analyzer can be trained with more than one training set, respectively, to generate more than one improved machine learning algorithm for an identical characteristic. Subsequently, a test set can be used to test the accuracies of the improved machine learning algorithms of the analyzer, and the best performing algorithm can be selected for use. Selecting the best performing algorithm can involve comparing statistical analyses from each algorithm's accuracy test, including a recall value, a precision value, and an F₁ score. The recall value can indicate how many of the items that should have been labeled were actually selected. The precision value can indicate how many of the selected items were correctly labeled. The F₁ score can be a harmonic mean of the recall and precision values as a measure of accuracy, according to the following equation:

$F_{1} = 2 \cdot \frac{recall \cdot precision}{recall + precision}$

where the F₁ score can reach its best value at 1 and its worst value at 0.

Thus, if multiple analyzer algorithms are trained to label a same food characteristic, an analyzer algorithm with the highest F₁ score can be selected as a working analyzer.
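The selection of a working analyzer by F₁ score can be sketched as follows. The sketch assumes each candidate analyzer has already produced boolean predictions on a shared test set; the function names and the sample data are illustrative assumptions.

```python
def precision_recall_f1(predicted, actual):
    """Compute (precision, recall, F1) from parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1


def select_working_analyzer(predictions_by_analyzer, actual):
    """Return the name of the analyzer whose predictions give the highest F1."""
    return max(
        predictions_by_analyzer,
        key=lambda name: precision_recall_f1(predictions_by_analyzer[name], actual)[2],
    )


# Hypothetical test set: three items carry the label, two do not.
actual = [True, True, True, False, False]
candidates = {
    "analyzer_a": [True, True, False, False, False],  # precise, misses one item
    "analyzer_b": [True, True, True, True, True],     # labels everything
}
best = select_working_analyzer(candidates, actual)
```

In this example the first analyzer has a precision of 1 and a recall of 2/3 (F₁ = 0.8), while the second has a precision of 0.6 and a recall of 1 (F₁ = 0.75), so the first is selected.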

FIG. 4 illustrates a table of a training set 400 for an analyzer based on the label classifier approach. The analyzer can be tasked to classify ingredients from an input data 410 (Raw ingredient text). The particular training set can be named “English” as the training set is designed to train the analyzer to classify ingredients written in English. Each input data is provided with a corresponding target 420 (Matched elementary food item). In this example, an input data of “1 pound 90% lean ground beef” can have a corresponding target of “90% lean ground beef.” The table of the training set 400 can also indicate when an input data-corresponding target pair is selected for use 430 (Approved). A content developer may unselect an input data-corresponding target pair from the training set. The table of the training set 400 can also display a number of samples 440 (Tot # samples), in the database(s) of the platform 200, that contain a same corresponding target. FIG. 5 illustrates a table of results 500 from testing multiple trained analyzers. Each analyzer is trained to identify and label ingredients written in English. The table indicates the recall value 510, the precision value 520, and the F₁ score 530 of each analyzer tested. In some embodiments, the label classifier can use a pipeline comprising word singularization. For example, CountVectorizer and logistic regression can be used to determine whether a food item is to be tagged with a certain label. The label classifier can use, as a training set, the same validation set used by the heuristic labeler described herein. As an example, a complete model for classifying whether a food-item is free from Soy or not can be structured as shown in FIG. 52. Thus, only food-items which are classified as Soy-free by both classifiers, and for which the combined classification confidence is over 85%, can be considered to be Soy-free.
To investigate the effect of the model composition, the precision and recall of the separate models can be examined, as well as those of the composed model (for the above example of the Soy-free labeler), as summarized in the table below. It can be observed that the model benefits greatly from the composition, which improves the precision (in this case) by around 20 percentage points:

| Model | Precision | Recall |
| --- | --- | --- |
| Heuristic | 71% | 96% |
| Classifier | 74% | 90% |
| Composed | 93% | 94% |

FIG. 53 illustrates an example of a learning curve for most of the classifiers described herein. It can be observed that the model is capable of learning the training set almost entirely, and that the model is able to generalize. The learning curve also shows that obtaining more data can likely improve the generalization further.

Alternatively or in addition, the labeling machine can use the heuristic labeler approach to develop an analyzer. The analyzer of the heuristic labeler approach can analyze food items for a specific characteristic of foods (e.g. dietary needs, such as vegetarian, vegan, gluten free, etc.). Such an analyzer can use a special vocabulary list that is pre-defined for each specific characteristic to determine whether a new food item should be labeled with the characteristic or not. Each analyzer of the heuristic labeler approach can contain three components: a vocabulary list, a labeling logic, and a training set.

The analyzer of the heuristic labeler approach can include the vocabulary list. The vocabulary list can contain groups of words that are correlated to a food characteristic (e.g. gluten free), also referred to as a food label. When a textual data from a food item is provided as an input, the heuristic labeler approach compares words found in the textual data to the groups of words in the vocabulary list. The groups of words in the vocabulary list can include negative terms, positive terms, menu positive terms, and non-negative terms. Having at least one matching negative term can indicate that the food item may not belong to the food label. Having at least one matching positive term despite having at least one matching negative term can indicate that the food item may belong to the food label. Having at least one matching menu positive term can indicate that the food item may belong to the particular food label only when the word is from a menu category title. Lastly, the non-negative terms can be words that used to be negative terms but have been verified not to be so. The non-negative terms can be kept to ensure that such terms may not be added as a negative term again.

FIG. 6 illustrates an exemplary vocabulary list for a food label “pescetarian.” A pescetarian can be a person who does not eat meat, but eats fish. The vocabulary list for the food label “pescetarian” can have negative terms, positive terms, menu positive terms, and, optionally, non-negative terms. The negative terms can include caesar, tenderloin, sirloin, brown gravy, pancetta, brisket, country gravy, steak, pork, burger, hamburger, cheeseburger, duck, chicken, beef, lamb, veal, pastrami, sausage, meatball, ham, chorizo, turkey, rabbit, bacon, pepperoni, meat, wing, filet, rib, dog, sausages, meatloaf, steakburger, lasagna, cheesesteak, lasagna, bison, sparerib, salami, capocollo, philly, prosciutto, and worcestershire. The positive terms can include veggie burger and meatless. The menu positive terms can include vegan and vegetarian. There may or may not be a non-negative term discovered.

The analyzer of the heuristic labeler approach can include the labeling logic. Based on the vocabulary list, the labeling logic can have three rules for labeling a food item. First, if there is no matching negative term, positive term, or menu positive term in the food item's name, description, or menu category title, then the food item can be tagged with the label. Second, if there is at least one matching negative term, in the absence of any other matching terms, then the food item cannot be tagged with the label. Third, if there is at least one matching positive term, or at least one matching menu positive term in the food item's menu or sub-menu, then the food item can be tagged with the label. The third rule of the labeling logic can remain valid even if there is also at least one matching negative term.
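The three rules of the labeling logic can be sketched as follows. The sketch reads the rules together with the vocabulary description above: positive terms are matched anywhere in the item's text, whereas menu positive terms count only when found in the menu or sub-menu title. That reading, the whole-word matching, the abbreviated vocabulary, and the names `Vocabulary` and `should_label` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Vocabulary:
    negative: frozenset
    positive: frozenset
    menu_positive: frozenset


def _contains(term, text):
    """Whole-word, case-insensitive match of a (possibly multi-word) term."""
    return re.search(r"\b" + re.escape(term) + r"\b", text.lower()) is not None


def should_label(vocab, name, description, menu, sub_menu):
    item_text = " ".join([name, description, menu, sub_menu])
    menu_text = " ".join([menu, sub_menu])

    has_negative = any(_contains(t, item_text) for t in vocab.negative)
    has_positive = any(_contains(t, item_text) for t in vocab.positive)
    has_menu_positive = any(_contains(t, menu_text) for t in vocab.menu_positive)

    # Rule 3: a positive or menu positive match wins, even over a negative match.
    if has_positive or has_menu_positive:
        return True
    # Rule 2: a negative match with no other matching term blocks the label.
    if has_negative:
        return False
    # Rule 1: no matching term at all, so the label applies.
    return True


# Abbreviated, illustrative subset of the "pescetarian" vocabulary list.
PESCETARIAN = Vocabulary(
    negative=frozenset({"chicken", "beef", "burger", "bacon"}),
    positive=frozenset({"veggie burger", "meatless"}),
    menu_positive=frozenset({"vegan", "vegetarian"}),
)
```

Under these assumptions, a "Veggie Burger" is labeled pescetarian despite the negative term "burger", while "Chicken Wings" is not.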

In addition to the vocabulary list and the labeling logic, the analyzer of the heuristic labeler approach can include the training set. A purpose of the training set can be to verify that the vocabulary list generated for a certain food label is sufficient. The training set can include a list of food items with known characteristics. The corresponding vocabulary list can be applied to each food item in the training set to assess if the labeling should be applied or not. Afterwards, the training set can determine whether the labeling assessment was correct, and report a precision value (%) of the vocabulary list.

FIG. 7 illustrates an exemplary table of a training set for the food label “pescetarian” based on the heuristic labeler approach. The training set can include various food items, including food items that a pescetarian can or cannot eat. Each of the various food items can be defined by the following fields: name (e.g. Strawberry Swiss Roll), description (e.g. mascarpone crème, honey oats, and strawberry sorbet), menu (desserts), and sub-menu (desserts). If a food item is listed in a menu without a sub-menu, then the sub-menu can be listed as identical to the menu. Testing the vocabulary list can involve applying the vocabulary list to analyze the various food items in the training set, and recording whether or not the vocabulary list can correctly distinguish pescetarian-friendly food items. The training set can also report a precision value (e.g., 98%) and a recall value (e.g. 92%) to determine reliability and validity of the vocabulary list. In this example, the precision value can indicate how many of the items labeled “pescetarian” were correctly labeled. The recall value can indicate how many of the items that should have been labeled “pescetarian” were actually selected.

During training or in use, an analyzer based on the heuristic labeler approach can detect one or more words, extracted from a food item, that are not found in an appropriate vocabulary list. If the one or more unknown words are assessed (e.g. by machine learning) to be related to at least one negative term, the analyzer can store the one or more unknown words in the databases 240 along with at least one related word found in the vocabulary list. The stored words can later be examined (e.g. by machine learning or a content developer) and added as new terms in the appropriate vocabulary list of the analyzer (e.g. as new negative terms). The added words can expand the capacity and efficiency of the analyzer.

As a logical entity of the labeling machine, a problem solver can be a combination of at least two analyzers to integrate the logics of at least two analyzers. In some examples, an analyzer from the machine learning approach (label classifier) for a particular food attribute and an analyzer from the heuristic approach (heuristic labeler) for the particular food attribute can be combined as a problem solver to label foods that have the particular food attribute. The resulting problem solver can be denoted as an “attribute” food labeler. For example, a problem solver for identifying and labeling gluten free foods can be called a “gluten free” food labeler.

FIG. 8 illustrates an exemplary pipeline 800 of the “gluten free” food labeler. The pipeline can be a diagram representing a combination of different analyzers. The “gluten free” food labeler 810 can be designed by combining two analyzer nodes: a label classifier 820 and a heuristic labeler 830 that are both trained to identify the “gluten free” attribute. A relationship between the two analyzer nodes to define the food labeler 810 can be described by a logic node. The logic node can be an AND node or an OR node. The AND node can combine results of two analyzers in a strict approach. As illustrated in FIG. 8, if the “gluten free” food labeler is defined by the AND node 840, the food item may then be labeled as “gluten free” only when both analyzer nodes believe that the food item is “gluten free.” On the other hand, the OR node, in use, can combine results of two analyzers in a less-strict approach. For example (not shown in FIG. 8), if a “gluten free” food labeler is defined by the OR node, the food item may then be labeled as “gluten free” when at least one of the two analyzers believes it is “gluten free.” The pipeline also includes arrows 850, 852, 854 that connect the relationship among the nodes.

A problem solver can be a combination of multiple problem solvers to integrate the logics of the multiple problem solvers to define a food attribute. The resulting problem solver can also be denoted as an “attribute” food labeler. FIG. 9 illustrates an exemplary pipeline of a “vegan” food labeler. The “vegan” food labeler can be defined by a combination of multiple nodes, including a label classifier for the “vegan” attribute, a heuristic classifier for the “vegan” attribute, and additional pre-defined food labelers for additional food attributes. A relationship between the “vegan” label classifier and the “vegan” heuristic classifier is defined by an AND logic node. Each of the additional pre-defined food labelers can be a combination of a label classifier and a heuristic classifier for one of the additional food attributes. The additional food attributes can be “dairy free,” “no eggs,” and “vegetarian,” and their relationships are defined by AND logic nodes. The resulting “vegan” food labeler can label a food item as “vegan” only when the following four requirements are true: (1) the “vegan” label classifier and the “vegan” heuristic classifier both believe that the food item is “vegan”; (2) the “dairy free” food labeler believes that the food item is “dairy free”; (3) the “no eggs” food labeler believes that the food item is “no eggs”; and (4) the “vegetarian” food labeler believes that the food item is “vegetarian.” The “vegan” food labeler in FIG. 9 will not label the food “vegan” if any of the four requirements is not true.

Every version of a problem solver (or a food labeler) can be tested using a training set, and analyzed using statistics. Statistics from a problem solver can be different from the analyzers' statistics because a problem solver is a combination of the results of its encompassing analyzers. Along with the precision, recall, and F₁ scores, a utilization value can also be reported. The utilization value can be a fraction of the training set items that can pass a confidence threshold defined in the problem solver's pipeline. A problem solver with a pre-defined confidence threshold can have high precision and recall values, but a low utilization value. The low utilization value can signify that a high portion of the problem solver's results are not accurate, and that the problem solver may not be useful as a food labeler.

In an example, a confidence threshold node of 70% can be added to the pipeline of the “vegan” food labeler illustrated in FIG. 9 to generate a new “vegan” food labeler with a pipeline illustrated in FIG. 10. The new “vegan” food labeler can be tested using a training set, and statistics including a utilization value can be reported. The utilization value of the new “vegan” food labeler can represent a fraction of the training set items that can pass the 70% confidence threshold defined in the problem solver pipeline of the new “vegan” food labeler. A table of multiple problem solvers and their respective statistical analyses are illustrated in FIG. 11.
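The statistics described above can be computed as in the following sketch. It assumes each test result is a tuple of (predicted label, confidence, actual label) and that precision, recall, and F₁ are computed only over items that pass the threshold; both are assumptions made for illustration, as is the function name `labeler_stats`.

```python
from typing import Dict, List, Tuple


def labeler_stats(results: List[Tuple[bool, float, bool]],
                  threshold: float) -> Dict[str, float]:
    """Precision, recall, F1 and utilization for one food labeler.

    results holds (predicted, confidence, actual) per training set item.
    """
    # Keep only the items whose confidence passes the threshold node.
    passed = [(p, a) for p, c, a in results if c >= threshold]
    tp = sum(1 for p, a in passed if p and a)
    fp = sum(1 for p, a in passed if p and not a)
    fn = sum(1 for p, a in passed if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Utilization: fraction of all items that passed the threshold.
    utilization = len(passed) / len(results) if results else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "utilization": utilization}


# Made-up results for a "vegan" labeler with a 70% confidence threshold.
sample = [
    (True, 0.90, True), (True, 0.80, False),
    (False, 0.95, True), (False, 0.75, False),
    (True, 0.40, True), (False, 0.30, False),
    (True, 0.50, False), (False, 0.60, True),
]
stats = labeler_stats(sample, threshold=0.7)
print(stats)
```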

The food analysis system 210 can use the labeling machine to analyze and map multiple types of data related to foods into the food ontology. The food analysis system 210 can automatically obtain data (images or texts) related to the foods (e.g. nutrition facts labels, manufacturer's product information, restaurant menu, recipes, etc.) from one or more sources (e.g., the Internet 120, grocery store websites, restaurant websites, recipe blogs, user input, etc.). The food analysis system 210 can leverage deep learning, OCR, and/or NLP capabilities to convert images of at least one consumer packaged food, at least one restaurant menu item, or at least one food recipe into structured data according to a preferred format of the system. The food analysis system 210 can also reorganize the obtained text data into structured data according to the preferred format of the system. With the newly structured data, the food analysis system 210 can, in real time, analyze, and classify features of, and map at least one consumer packaged food, at least one restaurant menu item, or at least one food recipe into the food ontology.

During analysis and classification of the foods, the food analysis system 210 can automatically parse and classify types and amounts of ingredients specified in the obtained data. During the analysis and classification of the foods, the food analysis system 210 can automatically estimate types and amounts of unknown ingredients based on the known ingredients specified in the obtained data or other similar foods in the food ontology. The food analysis system 210 can use at least a probabilistic model to estimate ingredients that must or may appear in the foods (e.g. a restaurant dish). The food analysis system also can calculate a probability (or confidence level) of the foods having the estimated ingredients, as well as the expected range of amounts of each estimated ingredient.

Food classification can comprise matching the text for a raw ingredient to an equivalent elementary food item in the food ontology. The food classification can be implemented for example in scikit-learn via the following pipeline. First, word singularization can be performed, by processing the raw text using an inflect library to transform all nouns to their singular form. The inflect library can be used to correctly generate plurals, singular nouns, ordinals, indefinite articles, and convert numbers to words. Next, a count vectorizer can be used to extract the features as word vectors. Next, the k-best features are selected, according to the chi-squared test. Lastly, classification can be performed with multinomial logistic regression. The above pipeline can be executed in a grid search with cross validation to find the best k value.

FIG. 54 shows a graphical user interface (GUI) for managing the training set for food classification. The GUI enables the following capabilities for the training set management. For example, training samples (a mapping of text to an elementary food) can be added. While adding samples, it is possible to search recipe ingredients for sample text. Conversely, training samples can be removed. The interface also allows existing training samples to be viewed, and their mapped food items to be changed. Samples can be approved. Additionally, approval of some samples can be removed if those samples were created automatically and are not very accurate. The interface can also allow the data to be filtered by text/food item/approval state. The interface can allow relevant information to be accessed easily/conveniently by users.

The food classification can include a confidence score which can be used to filter or add human verification input for results. FIG. 55 shows a histogram of food classification success by confidence in accordance with some embodiments. As an example, if a confidence threshold of 0.7 were selected on the sample dataset, there may be 971 correct ingredient matches and 29 incorrect ingredient matches above the threshold, and 353 correct ingredient matches and 55 incorrect ingredient matches below the threshold.
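The four counts in the example above can be turned into rates as follows. This is a worked calculation using only those counts; the row labels are introduced here for illustration.

```latex
\begin{align*}
\text{precision above threshold} &= \frac{971}{971 + 29} = 0.971 \\
\text{fraction of matches above threshold} &= \frac{971 + 29}{971 + 29 + 353 + 55} = \frac{1000}{1408} \approx 0.710 \\
\text{accuracy below threshold} &= \frac{353}{353 + 55} \approx 0.865
\end{align*}
```

Under these counts, roughly 71% of the matches could be accepted automatically, while the remaining matches below the threshold could be routed to human verification.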

As previously described, different food categories can present ingredients information in different ways, and therefore may need to be analyzed differently. In some embodiments, the food categories can be classified into a number of different models (e.g. 4 different models), each of which builds upon the results of the previous model.

A first model may include a food classification model. The first model may utilize the food classification techniques described elsewhere herein. Relevant food categories for the first model may include packaged foods, certain restaurant menus, and in some instances recipes (although recipes may not be commonly input to the first model). The input to the first model may comprise free text describing a food item, e.g. “cane sugar” or “brown rice,” and the output of the first model may comprise a food identity (food ID).

A second model may include a food parsing model. Relevant food categories for the second model may include recipes, and in some instances packaged foods (although packaged foods may not be commonly input to the second model), and in some instances restaurant dishes (although restaurant dishes may be rarely input to the second model). The input to the second model may comprise free text describing a food item with its amount, e.g. “1 cup of couscous,” and the output of the second model may comprise a food identity (food ID), amounts of ingredients, and measurement ID. Food parsing may comprise transforming a single raw-text ingredient (e.g., “1 cup brown sugar”) into an equivalent food item, serving and/or amount according to the food ontology. Food parsing may be performed for example using the following process. First, the elementary food item can be determined using the food classification from the first model. A list of all possible measurements can be extracted from the text and standardized using a standards unit library, whereby each unit is associated with a respective amount. The above list can be further expanded with conversions between different units. For example, if ‘tablespoon’ was found in the text, but ‘teaspoon’ was not, then a ‘teaspoon’ measurement can be added that corresponds to 3 times the amount of ‘tablespoon’. All measurement units of the matched food-item can be retrieved from the database (e.g. ‘cup’, ‘tablespoon’, etc.). The two lists are then matched to choose the most probable measurement with its amount. The food parsing functionality can be implemented as part of the food analysis system described herein.
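The measurement steps of the food parsing process above can be sketched as follows. This is a simplified illustration under stated assumptions: the regular expression, the conversion table, and the function names are hypothetical, only three units are handled, and "most probable" is approximated by the order of the food item's own unit list.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed conversion table: (source, target) -> targets per one source unit.
CONVERSIONS = {("tablespoon", "teaspoon"): 3.0, ("cup", "tablespoon"): 16.0}

# Matches an amount followed by one of the units handled in this sketch.
UNIT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(cups?|tablespoons?|teaspoons?)")


def extract_measurements(text: str) -> Dict[str, float]:
    """Extract every (unit, amount) pair found in the raw ingredient text."""
    found = {}
    for amount, unit in UNIT_PATTERN.findall(text.lower()):
        found[unit.rstrip("s")] = float(amount)  # normalize plural units
    return found


def expand(measurements: Dict[str, float]) -> Dict[str, float]:
    """Add units that were not in the text by applying known conversions."""
    expanded = dict(measurements)
    changed = True
    while changed:
        changed = False
        for (src, dst), factor in CONVERSIONS.items():
            if src in expanded and dst not in expanded:
                expanded[dst] = expanded[src] * factor
                changed = True
    return expanded


def match_measurement(text: str,
                      food_units: List[str]) -> Optional[Tuple[str, float]]:
    """Match the text's measurements against the units stored for the food."""
    candidates = expand(extract_measurements(text))
    for unit in food_units:  # food_units is assumed ordered by preference
        if unit in candidates:
            return unit, candidates[unit]
    return None


# The food item only stores 'teaspoon', so the tablespoon amount is converted.
print(match_measurement("2 tablespoons olive oil", ["teaspoon"]))
```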

A third model may include a free text analysis model. Relevant food categories for the third model may include restaurant dishes, and in some instances packaged foods (although packaged foods may be rarely input to the third model). The input to the third model may comprise a string of food description, for example: “House made deep fried chips tossed in a creamy chipotle salsa roja. Topped with cashew crema and cilantro. With chorizo and two hard-boiled eggs.” The output of the third model may comprise a list comprising a food ID, amounts of ingredients, and measurement ID. In certain cases, the amounts or measurements are not provided as part of the free text. In those cases the model can be configured to extract the ingredients ID, and mark the measurement ID and amounts as unknown.

Free text analysis may comprise using any of the NLP algorithms described herein. Free text analysis may comprise performing entity extraction on free text to extract food information provided in the text. The entities may comprise objects representing ingredients. Additionally, the entities may also comprise objects that need not necessarily represent ingredients, but that nonetheless provide insights about the labels or nutrients of the food. For example, in the case of the description of a food item named “Totopos”:

{{House made [deep fried]processing method [chips]food tossed in a creamy [chipotle salsa roja]food. Topped with [cashew crema]food and [cilantro]food. Served with [chorizo]food and [two]amount [hard-boiled eggs]food.}}

The entities may comprise for example: food items, amounts, measurement sizes, processing methods, dietary needs, restaurant names, among others. The free text analysis can associate each entity to one or more other relevant entities, for example the “two” at the end of the string is with reference to the “eggs.” Once an entity has been detected as a “food entity”, the entity can be run through the ingredients classifier to determine the exact food ID. Entity extraction can be solved using statistical methods such as conditional random fields (CRFs). In some cases, if the corpus of entities is sufficiently large, entity extraction can also be solved using deep learning techniques. In cases where a string contains food items separated by commas, entity extraction can be simplified, since the string can be split at the commas before running the food parsing.

A fourth model may include a menu item analysis model. Relevant food categories for the fourth model may include restaurant dishes. The input to the fourth model may comprise name and description of a dish. The output of the fourth model may comprise a list comprising a food ID, amounts of ingredients, measurement ID, and probability of each ingredient being present in the dish.

Under menu item analysis, a menu item is different than pure textual description, as the menu item contains both the name of the dish and the description. The description need not always contain all of the ingredients, and therefore the name of the menu item can be used to estimate potential ingredients using any of the statistical methods described herein.

For example, consider the following two items from a menu:

(1) Name: Pad thai with chicken. Description: Stir-fried steamed rice noodles with chicken, eggs, mushroom, onions, cilantro and peanuts.

(2) Name: Fried rice. Description: choice of beef, pork, or chicken.

The menu item analysis pipeline may include multiple steps. Under statistical name analysis, a complete ontology of foods (which can be represented by a tree via a taxonomy) is assumed to be present, and it is assumed that a branch in the taxonomy can be found based on a food name.

The union of all ingredients in the elements in the branch can be taken, and the probability of each ingredient appearing in the food item can be calculated. For example: “Pad thai with chicken” can be either its own branch or a part of the general branch “Pad thai”. All pad thais have rice noodles, 80% of them may have fish sauce, 23% of them may have mushrooms, etc. A probability for the occurrence of each food (calculated over the branch) can be given by: P(ingredient in food)=(number of food items containing ingredient)/(number of food items)

If a certain element in the branch has too few examples, the probability can be recalculated by going higher in the taxonomy.
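The branch-level probability and the fallback to a higher level of the taxonomy can be sketched as follows. The `Branch` class, the `min_items` cutoff, and the sample data are assumptions introduced for illustration; each branch is assumed to hold the ingredient sets of all food items in its subtree.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


class Branch:
    """A taxonomy branch holding the ingredient sets of its food items."""

    def __init__(self, name: str, parent: Optional["Branch"] = None):
        self.name = name
        self.parent = parent
        self.items: List[Set[str]] = []


def ingredient_probabilities(branch: Branch,
                             min_items: int = 3) -> Dict[str, float]:
    """P(ingredient) = items containing the ingredient / items in the branch.

    If the branch has too few examples, move up to the parent branch.
    """
    node = branch
    while len(node.items) < min_items and node.parent is not None:
        node = node.parent
    total = len(node.items)
    if total == 0:
        return {}
    counts = Counter(ing for item in node.items for ing in item)
    return {ing: n / total for ing, n in counts.items()}


# Made-up data: five pad thai items in the general branch.
pad_thai = Branch("Pad thai")
pad_thai.items = [
    {"rice noodles", "fish sauce", "mushroom"},
    {"rice noodles", "fish sauce"},
    {"rice noodles", "fish sauce"},
    {"rice noodles", "fish sauce"},
    {"rice noodles"},
]
# The specific branch has a single example, so the parent is used instead.
pad_thai_chicken = Branch("Pad thai with chicken", parent=pad_thai)
pad_thai_chicken.items = [{"rice noodles", "chicken"}]

probs = ingredient_probabilities(pad_thai_chicken)
print(probs["fish sauce"])
```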

The probability for the occurrence of each food can allow more information to be extracted from the menus. In some embodiments, the data can be further refined by crowdsourcing from users and/or restaurants about the occurrences of specific ingredients that the users are unsure about.

Free text analysis can be performed on the description, after which the probability of certain ingredients that were found in the previous step can be increased to 100% (or close to 100%). For example, in example (1) above, mushrooms may appear as an ingredient, and thus the confidence level for having mushrooms in the food item can be increased (e.g. from 23% to 100%). By estimating the ingredients amount, a list of probabilities can be generated for the occurrence of each ingredient. Next, a probabilistic model can be applied for the amount of each ingredient. Consider a threshold θ above which an ingredient is assumed to be a part of the dish. Initially this threshold can be assumed to be a value (for example 50%). In some embodiments, the threshold can be estimated by machine learning techniques to maximize the precision of the results.
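The refinement step above (raising the probability of ingredients found in the description, then applying the threshold θ) can be sketched as follows. The function name `refine` and the sample probabilities other than those quoted in the text are assumptions for illustration.

```python
from typing import Dict, Set


def refine(priors: Dict[str, float], described: Set[str],
           theta: float = 0.5) -> Dict[str, float]:
    """Boost ingredients named in the description, then apply threshold theta."""
    refined = dict(priors)
    for ingredient in described:
        refined[ingredient] = 1.0  # explicitly mentioned, so treated as certain
    # Keep only ingredients whose probability is above the threshold.
    return {ing: p for ing, p in refined.items() if p > theta}


# Priors from the name analysis; "tofu" is a made-up entry for illustration.
priors = {"rice noodles": 1.0, "fish sauce": 0.8, "mushroom": 0.23, "tofu": 0.4}
# Ingredients extracted from the menu description by free text analysis.
described = {"mushroom", "chicken"}

dish = refine(priors, described)
print(sorted(dish))
```

Here the mushroom probability rises from 23% to 100% because the description mentions it, while "tofu" stays below the 50% threshold and is dropped.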

In order to estimate the amounts, the branch in which the food item appears is again considered, and a probabilistic model is fitted for the amount of each ingredient. The probabilistic model is applied on all samples in the branch that contain the ingredient. Since the amounts are always positive, the probabilistic model can be a log-normal distribution which can provide both the expected amount as well as the standard deviation.
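A log-normal fit of the kind described above can be sketched with the standard library alone. The sketch assumes a maximum-likelihood style fit (mean and population standard deviation of the log-amounts); the function name and the sample amounts are illustrative assumptions.

```python
import math
from statistics import fmean, pstdev
from typing import List, Tuple


def fit_lognormal(amounts: List[float]) -> Tuple[float, float]:
    """Fit a log-normal to positive amounts; return (expected amount, std dev)."""
    logs = [math.log(a) for a in amounts]  # amounts must be strictly positive
    mu = fmean(logs)      # mean of the log-amounts
    sigma = pstdev(logs)  # population standard deviation of the log-amounts
    # Closed-form moments of a log-normal distribution.
    expected = math.exp(mu + sigma ** 2 / 2)
    std_dev = expected * math.sqrt(math.exp(sigma ** 2) - 1)
    return expected, std_dev


# Made-up amounts (in grams) of one ingredient across the samples in a branch.
expected, std_dev = fit_lognormal([20.0, 25.0, 30.0, 40.0])
print(round(expected, 1), round(std_dev, 1))
```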

The food analysis system 210 can estimate unknown nutrients from at least one consumer packaged food, at least one restaurant menu item, or at least one food recipe. Once the types and amounts of known and unknown ingredients are abstracted or estimated using the OCR, NLP, labeling machine, or other algorithms of the food analysis system 210, such information can be used to estimate nutrients that are not revealed by at least one consumer packaged food, at least one restaurant menu item, or at least one food recipe. The food analysis system 210 can estimate ranges of the nutrients that are not revealed. In some examples, the ranges of the nutrients can be in terms of their amounts (e.g., grams, milligrams, etc.) or their percent (%) daily value based on a recommended total daily requirement for each of the nutrients. In some examples, the nutrients that are not revealed can be macronutrients, including a breakdown of total fat (e.g. saturated fat, trans fat) or a breakdown of total carbohydrate (e.g., dietary fiber, total sugars, added sugars). In some examples, the nutrients that are not revealed can be micronutrients, including vitamins, macrominerals, and microminerals. The vitamins can include biotin, folic acid, niacin, pantothenic acid, riboflavin, thiamin, vitamin A, vitamin B₆, vitamin B₁₂, vitamin C, vitamin D, vitamin E, and vitamin K. The macrominerals can include calcium, phosphorus, magnesium, sodium, potassium, chloride, and sulfur. The microminerals can include iron, manganese, copper, iodine, zinc, cobalt, fluoride, and selenium. In some examples, the nutrients that are not revealed can be phytonutrients. The phytonutrients can be anthocyanins, ellagitannins, flavonoids, allylic sulfides, and isoflavones.

The food analysis system 210 can estimate unknown nutrients from at least one consumer packaged food. The food analysis system 210 can parse and classify data obtained from a nutrition facts label and a list of ingredients of at least one consumer packaged food. The nutrition facts label may show some but not all nutrients found in at least one consumer packaged food. Additionally, the nutrition facts label may not disclose amounts (e.g., milligrams) of such nutrients. Thus, the food analysis system 210 can extrapolate known or unknown ingredients and their amounts from the data obtained using the labeling machine and other algorithms described in the disclosure. Using a nutritional breakdown of each of the known or unknown ingredients from the database(s) 240 of the platform 200, the food analysis system 210 can calculate types and amounts of nutrients that may be found in at least one consumer packaged food. By using the revealed items in the nutrition facts label (e.g., calories per serving, total fat, total carbohydrate, protein), the food analysis system 210 can compare its calculated values for the respective revealed items to validate its estimation.

The food analysis system 210 can estimate unknown nutrients from at least one food recipe. The food analysis system 210 can parse and classify a list of ingredients and their respective amounts from at least one food recipe using the labeling machine and other algorithms described in the disclosure. The food analysis system 210 can aggregate types and amounts of nutrients found in the ingredients and provide an estimate on a resulting nutritional value of the recipe. The nutrients can include at least one macronutrient, at least one micronutrient, or both. Such estimation can yield a high reliability as at least one macronutrient and at least one micronutrient are minimally affected by cooking methods. A pipeline to estimate unknown nutrients from at least one food recipe can include: (1) receiving a free text of a food recipe including ingredients, (2) parsing and classifying the ingredients and their respective amounts, and (3) superposing estimated nutritional values of the ingredients using a nutritional breakdown of each of the ingredients from the database(s). Alternatively or in addition, the food analysis system 210 can include at least one machine learning algorithm based on at least NLP, statistical analysis, and multiple chemistry databases to estimate one or more effects of food processing (e.g., frying, boiling, microwave, cooking time, etc.) on a food's nutrients and their nutritional values. In some examples, such a machine learning algorithm can be referred to as a food processing algorithm.
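Step (3) of the recipe pipeline above, superposing the nutritional values of the parsed ingredients, can be sketched as follows. The per-100 g table is an illustrative placeholder rather than real nutrition data, and the names `NUTRIENTS_PER_100G` and `recipe_nutrients` are assumptions; effects of food processing are ignored in this sketch.

```python
from typing import Dict, List, Tuple

# Illustrative placeholder values per 100 g; NOT real nutrition data.
NUTRIENTS_PER_100G: Dict[str, Dict[str, float]] = {
    "brown rice": {"calories": 110.0, "protein_g": 2.5},
    "olive oil": {"calories": 900.0, "protein_g": 0.0},
}


def recipe_nutrients(ingredients: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum each nutrient over all (food, grams) pairs in a parsed recipe."""
    totals: Dict[str, float] = {}
    for food, grams in ingredients:
        for nutrient, per_100g in NUTRIENTS_PER_100G[food].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100.0
    return totals


# Parsed output of the earlier steps: food items with amounts in grams.
parsed = [("brown rice", 200.0), ("olive oil", 10.0)]
totals = recipe_nutrients(parsed)
print(totals)
```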

The food processing algorithm can include specific processing parameters (e.g., an absorption coefficient, evaporation coefficient, diffusion coefficient etc.) for one or more nutrients. The diffusion coefficient of each of one or more nutrients can take into consideration a size of a respective ingredient during cooking. An additional pipeline to estimate unknown nutrients from at least one food recipe can include: (1) receiving a free text of a food recipe including ingredients, cooking methods, and cooking time, (2) parsing and classifying the ingredients and their respective amounts by using at least the machine labeler and the food processing algorithm, and (3) superposing estimated nutritional values of the ingredients. During a cooking process, at least one ingredient can be used in two or more steps during cooking. Nutritional values of at least one ingredient can be calculated multiple times, respectively to the two or more steps during cooking.

The food analysis system 210 can estimate unknown nutrients from at least one restaurant dish. The food analysis system 210 can map at least one restaurant dish into the food ontology. Subsequently, the food analysis system 210 can use its algorithms described in the disclosure to detect which ingredients must or may appear in at least one restaurant dish. For each ingredient, the food analysis system 210 can calculate types and expected amounts of nutrients that may be found.

The food analysis system 210 can utilize automatic spider builders (also known as crawlers) to crawl the Internet to obtain all data related to foods, abstract and classify food-related information, and store the information in the database(s) 240. The Internet 120 can include websites of restaurants with their menus, food manufacturers, and recipe blogs. Each website can have a different structure, and can be disorganized and outdated. The automatic spider builders of the food analysis system 210 can detect the layout of each website by detecting an XPath corresponding to each food name, description, price, ingredients, etc. The XPath (or XML path language) can be a query language for selecting data points from an XML (extensible markup language) data structure, such as that of a restaurant's website. The automatic spider builders can scale up a data collection process of the food analysis system 210 by orders of magnitude. The automatic spider builders can allow a content management team without technical skills to easily add thousands of new foods into the food analysis system 210.
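Once an XPath has been detected for each field, extraction itself is straightforward. The sketch below uses the limited XPath support in Python's standard `xml.etree.ElementTree`; the page markup, class names, and XPath expressions are hypothetical, and the step of automatically detecting the XPaths is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical, well-formed menu page; real sites would differ in structure.
MENU_XML = """
<html><body>
  <ul id="menu">
    <li class="dish">
      <span class="name">Pad thai with chicken</span>
      <span class="description">Stir-fried steamed rice noodles</span>
    </li>
    <li class="dish">
      <span class="name">Fried rice</span>
      <span class="description">choice of beef, pork, or chicken</span>
    </li>
  </ul>
</body></html>
"""

# Assumed output of layout detection: one XPath per dish and per field.
DISH_XPATH = ".//li[@class='dish']"
FIELD_XPATHS = {
    "name": "span[@class='name']",
    "description": "span[@class='description']",
}


def scrape(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Select every dish node, then read each field relative to that node."""
    root = ET.fromstring(xml_text)
    return [
        {field: node.findtext(path) for field, path in FIELD_XPATHS.items()}
        for node in root.findall(DISH_XPATH)
    ]


dishes = scrape(MENU_XML)
print(dishes[0]["name"])
```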

The food analysis system can analyze a picture of a food dish that does not have any textual information about the food. In an example, a user can take a picture of a food item at a restaurant using a mobile device. The food analysis system 210 that is connected to the mobile device can automatically receive the picture of the food item. The food analysis system 210 can use at least its deep learning and OCR capabilities to abstract and classify ingredients that may appear in the food item. The food analysis system 210 can compare the food item to similar dishes that are already in the food ontology to abstract and classify the ingredients that may appear in the food item. If proven successful by internal testing models of the food analysis system 210, the food item can be mapped in the food ontology. The food analysis system 210 can compare a first characteristic absorption profile in at least a first portion of the electromagnetic spectrum of the picture of the food item of the user to a second characteristic absorption profile in at least a second portion of the electromagnetic spectrum of at least one of the similar dishes that are already in the food ontology.

An example of abstracting and classifying information from a consumer food package will be described in detail below with reference to FIGS. 12A-12C.

FIG. 12A illustrates a picture 1200 of Lemon Cheesecake 1210 from Dierbergs Bakehouse. Once imported into the platform 200, the food labeling system 210 in the platform 200 uses machine learning and OCR to automatically identify and draw a bounding box around one or more information sections 1211, 1212 of the picture 1200. A first information section 1211 contains ingredients in the Lemon Cheesecake, including milk, cream, sugar, barley flour, etc. A second information section 1212 contains warning disclaimers from the manufacturer, including other ingredients that have been utilized in the manufacturer's facility. The food labeling system 210 then uses NLP algorithms in real time to abstract every word from the information sections 1211, 1212. Subsequently, the food labeling system 210 uses the labeling machine to label and classify the information.

FIG. 12B illustrates a table 1220 of abstracted and classified information from the Lemon Cheesecake 1210. In the table 1220, the Lemon Cheesecake 1210 is identified by its product code 1230. The table 1220 presents a list of nutrients (e.g., calories, total fat, sodium, etc.) and their respective amount by weight (e.g., 130 g, 6 g, 170 mg, etc.) 1240-1250. The table 1220 presents a list of ingredients 1260, 1261. The table 1220 presents additional information from the manufacturer 1270-1275, including what the package contains, what the package may contain, what other products have been made by the same equipment from the same facility, and ingredients that have been utilized in the same facility. Importantly, the table 1220 presents one or more abstraction labels 1280, 1281 (e.g., contains added sugar, all natural ingredients, free from artificial colors, free from artificial flavors, etc.) generated by the labeling machine of the food analysis system 210 according to the abstracted and structured data in the table 1220. The table 1220 also contains a confidence score and an accuracy score for OCR and all food labelers used 1290-1297. The artificial intelligence-generated information can also be formatted in other formats, for example as illustrated in FIG. 12C. The labeling machine can work on a single picture or a batch of an unlimited number of pictures, and complete analysis in about 3 minutes to about 5 minutes total. If a picture has more than one packaged food, the food analysis system 210 can break down the picture into multiple sub-pictures. Each sub-picture can contain one packaged food.

FIG. 13 illustrates an exemplary table of a multitude of consumer packaged foods after abstraction and classification. The multitude of the consumer packaged foods can be organized by a machine generated numbering system or by their respective product codes. Each consumer packaged food can contain multiple pictures (e.g., of different years or from different angles, etc.). The table also keeps track of a number of ingredients, nutrients, allergens detected, as well as a number of food characteristic labels that have been found to be true (positive) and a number of food characteristic labels that have been found to be not applicable (negative) by the labeling machine.

FIG. 14 illustrates an exemplary restaurant menu obtained and analyzed by the food analysis system 210. The food analysis system 210 can use OCR and other algorithms to detect and draw bounding boxes around food dishes and their respective menu and sub-menu titles. As shown in FIG. 14, by way of example only, the food analysis system 210 can break down the menu 1400 into food dishes and their titles 1410 and other non-food related information 1420. FIG. 15 illustrates an exemplary table of information abstracted from restaurant menus. This is a scalable approach that can capture menus of at least about 350,000 restaurant locations across the U.S. using AI, OCR, and/or other algorithms. Additionally, the AI can continuously monitor the web for new websites or previously-analyzed websites to identify new menus and expand the database.

The food analysis system 210 of the platform 200 can compile all analyzed information into the food ontology. The food ontology can be used as a web of all foods. The food ontology can describe relationships among foods, ingredients, nutrients and other characteristics. The food ontology can be a graph database constructed of nodes and edges. A node can represent a food. A food can be either a specific food (e.g., Coca Cola, banana, etc.) or an abstract one (e.g., salad, tuna sandwich, mac and cheese, etc.) that can be a combination of one or more specific foods. An edge can represent a relationship between two foods. Several types of relationship between the two foods can be allowed. In an example, two types of relationship between the two foods can be allowed. A first type of relationship between the two foods can be “IS_A,” representing a food that may be a variety or a part of another food. Some examples can include (1) “Yellowfin tuna IS_A (variety of) Tuna” and “Tuna IS_A (variety of) Fish,” and (2) “Chicken thigh IS_A (part of) Chicken” and “Chicken IS_A (variety of) Poultry.” A second type of relationship between the two foods can be “CONTAINS,” representing a first food that may contain a second food. The first food can contain the second food either directly or after a type of processing or transformation. If such containment includes a processing of a single ingredient, the edge can include the processing method information. If such containment includes a processing of two or more ingredients, the node can include the processing method information. Some examples can include: (1) “Almond flour CONTAINS (ground) Almonds,” and (2) “Cooked white rice CONTAINS (White rice, Water).” The food ontology can have additional types of pre-defined edges to describe additional relationships between any pair of two nodes. The food analysis system 210 can generate one or more new edges for the food ontology.
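The node-and-edge structure described above can be sketched with a small in-memory graph. This is an illustration only, not the disclosed graph database: the class `FoodOntology` and its methods are hypothetical, and processing information is stored only on the edge.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# An edge is (relation, target food, optional processing method).
Edge = Tuple[str, str, Optional[str]]


class FoodOntology:
    """Minimal directed graph with IS_A and CONTAINS edges between foods."""

    RELATIONS = {"IS_A", "CONTAINS"}

    def __init__(self) -> None:
        self.edges: Dict[str, List[Edge]] = defaultdict(list)

    def add_edge(self, source: str, relation: str, target: str,
                 processing: Optional[str] = None) -> None:
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[source].append((relation, target, processing))

    def ancestors(self, food: str) -> List[str]:
        """Follow IS_A edges transitively, nearest ancestor first."""
        seen: List[str] = []
        stack = [food]
        while stack:
            current = stack.pop()
            for relation, target, _ in self.edges.get(current, []):
                if relation == "IS_A" and target not in seen:
                    seen.append(target)
                    stack.append(target)
        return seen

    def contents(self, food: str) -> List[Tuple[str, Optional[str]]]:
        """Foods directly contained in the given food, with processing info."""
        return [(t, p) for r, t, p in self.edges.get(food, []) if r == "CONTAINS"]


ontology = FoodOntology()
ontology.add_edge("Yellowfin tuna", "IS_A", "Tuna")
ontology.add_edge("Tuna", "IS_A", "Fish")
ontology.add_edge("Almond flour", "CONTAINS", "Almonds", processing="ground")
ontology.add_edge("Cooked white rice", "CONTAINS", "White rice")
ontology.add_edge("Cooked white rice", "CONTAINS", "Water")

print(ontology.ancestors("Yellowfin tuna"))
```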

Nodes in the food ontology can be specific food items. The specific food items can include a specific kind of seed, a specific kind of vegetable, a specific kind of tuber, a specific type of edible fungi, a specific type of meat, etc. The specific food items can be referred to as leaf nodes (or leaves) of the graph database of the food ontology. A leaf node may be a node that does not have a child node. For example, Solanum tuberosum (Yukon Gold potato) cannot be broken down into sub-foods and can be registered as a leaf node in the food ontology. Thus, each of the leaf nodes can be a meta object existing above each specific food item. Alternatively or in addition, the food analysis system 210 can (i) receive data of specific food items from other nutrition services (e.g., USDA), (ii) analyze and map the data in the food ontology, and (iii) use at least one machine learning algorithm to learn how to generalize a group of specific food items and create a new leaf node. For example, “long brown rice” and “short brown rice” can be generalized into “brown rice,” and the food ontology can have “long brown rice,” “short brown rice,” and “brown rice” leaf nodes. If the food analysis system 210 determines that incoming data from a third party database is already generalized (e.g., “brown rice”), the data may be plotted as a leaf node in the food ontology.

Graphical representation of the food ontology can be multi-dimensional, including two, three, or more dimensions. FIG. 16 illustrates an exemplary two-dimensional graphical representation of food ontology 1600 related to Panera Bread Tuna Salad. The food ontology 1600 contains multiple types of nodes. The food ontology contains multiple types of edges. A thick edge represents a relationship between two nodes that is directly abstracted from the raw input data. A thin edge represents a relationship between two nodes that is indirectly estimated by the food analysis system 210. A first type of node can be a consumer packaged food (e.g. Panera Bread Tuna Salad), an abstract food type (e.g., Tuna Salad, Salad, Wheat Wrap, Tuna Salad Wrap, or Wrap), or ingredients (e.g., Fish, Tuna, Pickles, Mayo, Eggs, Vinegar, or Salt). A node can be a subclass of a higher-class node, and such relationship can be represented by an IS_A edge. Some examples include “Panera Bread Tuna Salad IS_A (subclass of) Tuna salad,” “Tuna salad IS_A (subclass of) Salad,” and “Tuna salad wrap IS_A (subclass of) Wrap.” If the subclass as a whole is included in the higher-class node, such relationship can be represented by a CONTAINS edge. In an example, “Tuna Salad Wrap CONTAINS (Tuna Salad, Wheat Wrap).” Alternatively, a node can be an ingredient that is contained in a higher-ingredient node, and such relationship can be represented by the CONTAINS edge. As illustrated in FIG. 16, ingredients for Tuna Salad include Tuna, Mayo, Pickles, Eggs, Vinegar, and Salt. Since Tuna and Mayo are known ingredients, the food ontology describes Tuna and Mayo using a thick CONTAINS edge, as in “Tuna Salad CONTAINS (Tuna, Mayo).” Also, since Eggs, Vinegar, and Salt are known ingredients of Mayo, the food ontology automatically classifies these three items using a thick CONTAINS edge, as in “Mayo CONTAINS (Eggs, Vinegar, Salt).” On the other hand, since Pickles is an unknown ingredient estimated by the food analysis system 210, the food ontology describes Pickles using a thin CONTAINS edge, as in “Tuna Salad CONTAINS (Pickles).” A node can have a scroll, indicating that the database(s) 240 of the platform 200 has specific nutritional values linked to the node item.
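
As a minimal sketch of the graph structure described above, assuming a simple in-memory representation, the IS_A and CONTAINS edges of FIG. 16 and their thick (directly abstracted) or thin (estimated) status could be recorded as follows. The class names, field names, and traversal method are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str    # "IS_A" or "CONTAINS"
    target: str
    estimated: bool  # False = thick edge (from raw data), True = thin edge (estimated)


@dataclass
class FoodOntology:
    edges: list = field(default_factory=list)

    def add(self, source, relation, targets, estimated=False):
        for target in targets:
            self.edges.append(Edge(source, relation, target, estimated))

    def ingredients(self, node, include_estimated=True):
        """Return all ingredients reachable from `node` through CONTAINS edges."""
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for e in self.edges:
                if e.source != current or e.relation != "CONTAINS":
                    continue
                if e.estimated and not include_estimated:
                    continue
                if e.target not in found:
                    found.append(e.target)
                    stack.append(e.target)
        return found


# Relationships taken from the FIG. 16 description.
ontology = FoodOntology()
ontology.add("Panera Bread Tuna Salad", "IS_A", ["Tuna Salad"])
ontology.add("Tuna Salad", "IS_A", ["Salad"])
ontology.add("Tuna Salad Wrap", "IS_A", ["Wrap"])
ontology.add("Tuna Salad Wrap", "CONTAINS", ["Tuna Salad", "Wheat Wrap"])
ontology.add("Tuna Salad", "CONTAINS", ["Tuna", "Mayo"])
ontology.add("Mayo", "CONTAINS", ["Eggs", "Vinegar", "Salt"])
ontology.add("Tuna Salad", "CONTAINS", ["Pickles"], estimated=True)  # thin edge

known = ontology.ingredients("Tuna Salad", include_estimated=False)
everything = ontology.ingredients("Tuna Salad")
```

In this sketch, `known` holds only ingredients connected by thick edges, while `everything` also includes the estimated ingredient Pickles.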

Importantly, the food analysis system 210 can standardize data from other nutrition trackers. Data from such trackers can be unstructured, fragmented, and/or disorganized, and can often be incompatible with each other. The food analysis system 210 can (1) obtain data from the other nutrition trackers, (2) convert such data into structured data having one common format, and (3) organize the structured data into multiple layers of information in the food ontology. Thus, the food ontology can be used as a standardized meta object existing over existing food and/or nutrition databases. For example, the food analysis system 210 can analyze and map a “burrito” from each of multiple databases (e.g., MyFitnessPal, LoseIt, FatSecret, etc.) and track a user's food and/or beverage intake.

FIG. 41 illustrates an exemplary network layout 4100 between the food analysis system 210 of the platform 200 and one or more nutrition trackers 4110a-4110c. One or more nutrition trackers 4110a-4110c can be in digital communication with APIs 4120a-4120c, respectively. The APIs 4120a-4120c can be in digital communication with (1) database(s) 4130a-4130c for storing data and (2) software and/or applications comprising a GUI for receiving the data from and/or sending the data to a user (not shown in FIG. 41). The APIs 4120a-4120c of one or more nutrition trackers 4110a-4110c can allow the user to record the user's food intake, and provide nutritional and caloric information of the user's food intake. The APIs 4120a-4120c can feed all data into the database(s) 4130a-4130c, respectively. The data in the database(s) 4130a-4130c are unstructured, fragmented, and/or disorganized. Thus, while receiving the data relating to foods from one or more nutrition trackers 4110a-4110c, the food analysis system 210 uses one or more algorithms to (1) convert the data into structured data and (2) organize the structured data into multiple layers of information in the food ontology 4140. Such structured data are standardized into a common format of the food ontology 4140. Thus, a food item containing an abundance of information from the database(s) 4130a-4130c of each of the one or more nutrition trackers 4110a-4110c is mapped into the food ontology 4140, where the food ontology 4140 organizes the information by building one or more layers.
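
The conversion of tracker-specific records into one common format can be sketched as below. This is an illustration only: the payload field names, the tracker labels, and the output keys are assumptions made for the example and do not reflect the actual schema of any nutrition tracker or of the food ontology 4140.

```python
# Hypothetical raw payloads; field names are illustrative assumptions,
# not the real schemas of any nutrition tracker.
RAW_RECORDS = [
    {"source": "tracker_a", "food_name": "Burrito", "kcal": 450, "carbs_g": 50},
    {"source": "tracker_b", "item": "burrito ", "energy_kj": 1882.8, "carbohydrate": "50 g"},
    {"source": "tracker_c", "title": "BURRITO", "calories": "450"},
]

NAME_KEYS = ("food_name", "item", "title")
KJ_PER_KCAL = 4.184


def to_common_format(record):
    """Map one tracker-specific record onto a single common structure."""
    name = next(record[k] for k in NAME_KEYS if k in record)

    # Energy may arrive as kcal, as a string, or in kilojoules.
    if "kcal" in record:
        kcal = float(record["kcal"])
    elif "calories" in record:
        kcal = float(record["calories"])
    elif "energy_kj" in record:
        kcal = float(record["energy_kj"]) / KJ_PER_KCAL
    else:
        kcal = None

    # Carbohydrate may be numeric, a string such as "50 g", or missing.
    carbs = record.get("carbs_g", record.get("carbohydrate"))
    if isinstance(carbs, str):
        carbs = float(carbs.split()[0])

    return {
        "ontology_node": name.strip().lower(),
        "energy_kcal": None if kcal is None else round(kcal, 1),
        "carbohydrate_g": None if carbs is None else float(carbs),
        "source": record["source"],
    }


structured = [to_common_format(r) for r in RAW_RECORDS]
```

Each of the three differently shaped inputs resolves to the same `ontology_node` key, so downstream analysis can treat them as one food item while retaining the originating source.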

The food ontology, when combined with additional health-related data and/or machine learning algorithms of the platform 200, can be useful for a number of applications. Examples of such applications can include, but are not limited to: (1) estimating nutritional values for recipes and/or restaurant dishes; (2) providing food and health recommendations to a user and gaining an understanding of the user's taste profile; (3) constructing food logs; (4) generating missing elementary foods from existing packaged foods; (5) generating more accurate labels of food characteristics; (6) analyzing food costs; (7) modeling effects of cooking on nutritional values and estimating a degree of food processing; (8) improving image classification or computer vision classification of foods; and (9) improving analysis of voice-based food logs.

Device/Data Hub

The device/data hub 220 can generate a user's personalized data network between the platform 200 and the devices 110 and third-party database(s) 130. The device/data hub 220 can collect and aggregate food, health, or nutritional data. The device/data hub 220 can be a system used to collect and aggregate a plurality of data sets from a plurality of application programming interfaces (APIs). The plurality of data sets can be provided in two or more different formats. The plurality of data sets can include a plurality of physiological inputs associated with the user. The plurality of data sets can be collected from additional sources, such as the third-party database(s) 130 (e.g. healthcare providers). The device/data hub 220 can automatically aggregate food, biomarker, and health data of the user (e.g. nutrition, activity, sleep, genetics, glucose, menstrual cycle, etc.). Such data of the user can be continuously streamed by the device/data hub 220. The device/data hub 220 can be connected to one or more devices and/or one or more services, and collect one or more data points per month. The device/data hub 220 can be connected to over 100 devices and services and collect about 400 million or more data points per month. All incoming data, regardless of its source, can be fully integrated into a format of the device/data hub 220. All incoming data can also be fully integrated into software frameworks of the device/data hub 220. The software frameworks can be a web framework (WF) or a web application framework (WAF).

The device/data hub 220 can be a serverless system to store raw data from the devices 110 prior to analysis by the food analysis system 210 or the insights and recommendation engine 230. The device/data hub 220 can be configured such that one or more changes in the APIs do not affect the raw data stored in the device/data hub 220, and such that none of the raw data is lost. The devices 110 and database(s) 130, and their respective APIs, that are compatible with the device/data hub 220 can include mobile devices, wearable electronics, medical devices, point-of-care (POC) devices or kits, sensors, etc. The sensors can include a glucose sensor, GPS receiver, heart rate monitor, galvanic skin response (GSR) sensor, skin temperature sensor, capacitive sensor, and metabolic sensor. The sensors can each be a discrete device with a discrete API. The sensors can each be an integrated component or function of one or more of the devices 110.

Examples of the devices 110 and their respective data types are provided. From Abbott glucose monitors, data of blood glucose levels can be obtained. From Fitbit, data including activity, steps, weight, and sleep can be obtained. From Jawbone, data including activity, steps, weight, and sleep can be obtained. From GoogleFit, data including activity and steps can be obtained. From Moves, data on activity can be obtained. From Runkeeper, data including activity, weight and sleep can be obtained. Additional types of the devices 110 can include a smart clothing item with a heart rate monitor, a sensor-enabled mattress that tracks and adjusts to an individual's snoring, earbuds equipped with an in-ear thermometer, an artificial intelligence-embedded toothbrush that collects tooth brushing data through sensors (frequency, duration, brushed area, etc.), a smart ring that tracks an individual's biomarkers (e.g., activity, steps, sleep, heart rate, etc.), a wearable electrocardiography monitor, a portable air-quality tracker, a medication adherence hub that is designed to be placed adjacent to a user's medication pills and alert the user of scheduled medications, an e-cigarette or a vapor, smart utensils designed to offset hand tremors from Parkinson's Disease and other unsteadiness-causing conditions, a pregnancy-tracking wearable that can help women track and understand contractions, hearing aids, a sensor to measure antioxidants in the skin, implantable (e.g. by swallowing) sensors for diagnosing gastrointestinal problems or measuring food intake and/or digestive conditions, a device that sits in the mouth to detect a sound made by chewing of foods, a portable spectrometer to measure absorption spectrum of foods, and a portable mass spectrometer to provide a partial chemical composition of foods.

The platform 200, including the device/data hub 220, can be implemented in a GUI-based software interface. The GUI-based software interface can connect the platform 200, including the device/data hub 220, to a user device (e.g., a personal computer, a smartphone, etc.). The GUI-based software interface can be compatible with any operating system of the user device. When in use, the GUI-based software interface can gain unlimited access to data in the user device or data accessible through the user device. The GUI-based software interface can automatically collect the data without dependency on the user input. The data in the user device can include pictures, videos, voice recordings, texts, location services, etc. The data accessible through the user device can also include data in the user's cloud storage services. The data accessible through the user device can include data from at least one third-party application that connects the user device to one or more third-party devices (e.g., a glucose monitor, temperature sensor, etc.).

The device/data hub 220 can be a seamless food image logger. By using the GUI-based software interface that can be installed in a user device, the device/data hub 220 can gain unlimited access to a camera roll of the user device. Every time an image is taken by the user device, a convolutional network of the device/data hub 220 can analyze the image to decide whether or not the image contains at least one food or beverage. The convolutional network can analyze the image when the user is using an application other than the GUI-based software interface (e.g., a photo application of the user device, Instagram, etc.). If the convolutional network can identify at least one food or beverage in the image, the image is automatically aggregated in the database(s) 240 with a timestamp and geolocation. The stored data of the image can be used for analyses by the food analysis system 210 and the insights and recommendation engine 230. The analyses can include studying how body metrics (e.g., blood glucose level, sleep time, steps, etc.) of the user can be affected by at least one food or beverage in the image. The seamless food image logger function of the device/data hub 220 can facilitate a tedious and incessant process of food tracking required for the food analysis system 210. Exemplary windows of the GUI-based software interface for the seamless food image logger function are illustrated in FIGS. 17A-17D. During or after installation in the user device, the GUI-based software interface can ask users for access to the camera roll of the user device.

The device/data hub 220 can perform food tracking by textual and voice recognition analysis. By using the GUI-based software interface installed in the user device, the user can record and store at least one free text or at least one voice message about foods (e.g., foods the user has eaten, foods the user plans to eat, foods the user wants to learn more about, etc.) to the device/data hub 220. The device/data hub 220 can utilize a third-party service (e.g., Speech2Text) to automatically convert at least one voice message into a respective free text. The stored data of at least one free text or at least one voice message can be used for analyses by the food analysis system 210 and the insights and recommendation engine 230. In an example, if the user types a free text, “1 slice of bread with two eggs and a cup of coffee,” to the GUI-based software interface, the device/data hub 220 can save data of the free text to the database(s) 240 and instruct the food analysis system 210 to analyze the data. The food analysis system 210 can abstract and classify nutritional information (e.g., carbohydrate or nutrient intake) of foods mentioned in the free text, and map the foods into the food ontology. Subsequently, the device/data hub 220, in communication with the food analysis system 210, can inform the user of the results of the analysis. The textual and voice recognition function of the device/data hub 220 can facilitate a tedious and incessant process of food tracking required for the food analysis system 210. Exemplary windows of the GUI-based software interface for the voice recognition analysis function are illustrated in FIGS. 18A-18C. The GUI-based software interface window can display example sentence structures that users may use to record a voice message to the device/data hub 220, as shown in FIG. 18A. After automatically converting the voice message into a free text, the GUI-based software interface window can display the free text to the user, as shown in FIG. 18B. Immediately after analyzing food-related information from the free text, the GUI-based software interface window can display the results of the analysis (e.g. a food item and predicted calories and serving size), and also ask the user to validate or edit the result for accuracy prior to saving the result in the database(s) 240, as shown in FIG. 18C.
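
As a rough sketch of how a free-text entry could be broken into structured items before being mapped into the food ontology, the example sentence above can be split into quantity, unit, and food name. The rule-based approach, the small vocabularies, and the function name below are assumptions for illustration; the disclosure does not specify this parsing method, and a production system would likely rely on the NLP algorithms of the platform 200 instead.

```python
import re

# Tiny illustrative vocabularies (assumptions, not an exhaustive list).
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4}
UNITS = {"slice", "slices", "cup", "cups", "piece", "pieces"}


def parse_food_log(text):
    """Split a free-text food log into (quantity, unit, food) entries."""
    entries = []
    # Separate items on "with", "and", or commas.
    for phrase in re.split(r"\s+(?:with|and)\s+|\s*,\s*", text.lower().strip()):
        tokens = phrase.split()
        if not tokens:
            continue
        quantity, unit = 1.0, None
        first = tokens[0]
        if first in NUMBER_WORDS:
            quantity = float(NUMBER_WORDS[first])
            tokens = tokens[1:]
        elif re.fullmatch(r"\d+(?:\.\d+)?", first):
            quantity = float(first)
            tokens = tokens[1:]
        if tokens and tokens[0] in UNITS:
            unit = tokens[0]
            tokens = tokens[1:]
            if tokens and tokens[0] == "of":
                tokens = tokens[1:]
        if tokens:
            entries.append((quantity, unit, " ".join(tokens)))
    return entries


entries = parse_food_log("1 slice of bread with two eggs and a cup of coffee")
# entries == [(1.0, "slice", "bread"), (2.0, None, "eggs"), (1.0, "cup", "coffee")]
```

Each resulting food name could then be looked up in the food ontology to retrieve nutritional information for the logged quantity.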

The device/data hub 220 can be in communication with a number of medical devices and healthcare databases to continuously stream and store a user's personalized data. The stored user's personalized data can be used for analysis by the insights and recommendation engine 230. The user's personalized data can include glucose levels. The device/data hub 220 can be in communication with a glucose meter. The device/data hub 220 can be in communication with a continuous glucose monitoring (CGM) device, otherwise known as a real-time CGM (RT-CGM) device. The CGM device, in combination with its corresponding GUI on the device and/or on a user device, can determine glucose levels in the blood on a continuous basis. The CGM device can monitor glucose levels of the interstitial fluid as a close correlation to the blood glucose levels. A measurement of a glucose level of the interstitial fluid can have an error of at most about 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5%, or less, relative to a respective blood glucose level. The measurement of a glucose level of the interstitial fluid can have a shorter delay in response than a measurement of a blood glucose level. The CGM device can remain functional during a user's daily activities, including showering, exercising, sleeping, etc. The CGM device can be used for a user with type 1 diabetes or type 2 diabetes in order to assess when the user should inject insulin. The CGM device can be useful for athletes to optimize their athletic performance. The CGM device can be useful for individuals interested in tracking food intake as a way to monitor their weight loss diet. Some examples of the CGM device can include the FreeStyle Libre by Abbott, the G4, G5 and G6 devices by Dexcom, and the Enlite by Medtronic, etc.

A network according to the present disclosure may comprise the device/data hub 220, the devices 110, the third-party database(s) 130, and the database(s) 240 of the platform 200. The database(s) 240 can be a virtual private cloud (VPC) (e.g. Amazon VPC) for data storage. The network may utilize many independent software components that are collectively known as the open-source Apache Hadoop™ stack. These components may include products such as: Cassandra™, CloudStack™, HDFS, Continuum™, Cordova™, Pivot™, Spark™, Storm™, and/or ZooKeeper™. The machine learning/NLP algorithms described herein may leverage state-of-the-art machine learning libraries, including public libraries such as those built upon the Apache Spark™ and Python® systems.

Insights and Recommendation Engine

The insights and recommendation engine 230 can (1) access the food ontology in the food analysis system 210, (2) access a plethora of personal biomarkers data from the device/data hub 220, (3) analyze how foods affect a user's biomarkers, and (4) continually generate personal nutrition recommendations to the user. The relationship among the three components of the platform 200 is illustrated in FIG. 20. Upon analyzing and validating how foods may affect the user's biomarkers, the insights and recommendation engine 230 can generate one or more personalized digital signatures unique for the user. A personalized digital signature can be an algorithm to estimate the response of a specific biomarker of the user to consumption of a specific food item. For example, a digital signature for blood glucose (or sugar) level can take into consideration that blood glucose levels of two individuals may respond differently to consumption of identical food items (coffee, apple, and a sandwich) at identical times, as shown by graph in FIG. 21. One or more personalized digital signatures can be unique to a number of other factors related to the individual, including gender, age, race, genetics, microbiome, religious dietary restrictions, geography, height, weight, and time of the day, month, or year. In addition to end users of the insights and recommendation engine 230 and other functions provided in the present disclosure, additional partners that may benefit include healthcare device manufacturers, third party big data, third party databases, and insurance companies, as shown in FIG. 22.

The insights and recommendation engine 230 can determine various effects of food consumption on the user's body by applying at least one predictive model to a plurality of data sets. The plurality of data sets can include foods consumed by the user and physiological inputs associated with the user. Such data can be obtained from a plurality of sources, including discrete APIs. The plurality of data sets can also include information about the foods consumed by the user from the food ontology. Applying at least one predictive model to the plurality of data sets can generate a plurality of personalized food and health metrics for the user.

The insights and recommendation engine 230 can include a number of analytics and deep learning algorithms, including statistical analysis and artificial neural networks (ANN). The ANN can be a mathematical or computational model that is inspired by the structural or functional aspects of biological neural networks. The ANN can include a group of artificial neurons (units) that are interconnected. The ANN can be an adaptive system that is configured to change its structure (e.g., the connections among the units) based on external or internal information that flows through the network during the learning phase. The ANN can be used to model complex relationships between inputs and outputs or to find patterns in data, where the dependency between the inputs and the outputs cannot be easily attained. In some examples, the complex relationships can include how foods affect the user's body in a multitude of biomarkers.

As an alternative or in addition to the ANN, the insights and recommendation engine 230 can include biomathematical predictive models that use metric spaces, decision trees, and decision tree learning algorithms. A metric space can provide a “ruler” or an absolute measurement of how different two feature vectors are. The metric space can be used to define a “distance” between the two feature vectors. A decision tree can be a support tool that uses a tree-like graph or model of decisions and their possible consequences. The decision tree can include one or more leaf nodes (leaves) that represent final decisions. An entire path leading to one or more leaf nodes can represent a rule for arriving at one or more decisions, respectively. A decision tree learning algorithm can be an inductive machine learning mechanism that extrapolates accurate predictions about future events (unknown) from a given set of past (known) events. The decision tree learning algorithm can also provide a measure of confidence that the predictions are correct (e.g., a coverage rate, accuracy rate, and confidence interval). The minimum confidence interval for the tree learning algorithm's accuracy rate can be maintained to at least about 70, 75, 80, 85, 90, 95% or more.
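
To make the notions of a metric space and decision tree learning concrete, a minimal sketch follows. It assumes a Euclidean metric and learns only a one-split tree (a "stump") that reports its training accuracy; the feature layout, labels, and toy meal values are made up for illustration and are neither the models nor the data of the insights and recommendation engine 230.

```python
import math


def distance(u, v):
    """Euclidean metric between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_stump(samples):
    """Learn a one-split decision tree from (feature_vector, label) pairs.

    Exactly two distinct labels are assumed. The split with the highest
    training accuracy is kept, and that accuracy is returned with it.
    """
    labels = sorted({label for _, label in samples})
    best = None
    for i in range(len(samples[0][0])):
        for threshold in sorted({x[i] for x, _ in samples}):
            for low, high in (labels, labels[::-1]):
                correct = sum(
                    (low if x[i] <= threshold else high) == label
                    for x, label in samples
                )
                accuracy = correct / len(samples)
                if best is None or accuracy > best["accuracy"]:
                    best = {"feature": i, "threshold": threshold,
                            "low": low, "high": high, "accuracy": accuracy}
    return best


def predict(stump, x):
    """Follow the single decision rule down to a leaf label."""
    return stump["low"] if x[stump["feature"]] <= stump["threshold"] else stump["high"]


# Made-up past meals: (carbohydrate g, fiber g) -> observed glucose trend.
TOY_MEALS = [
    ((50.0, 2.0), "spike"),
    ((45.0, 3.0), "spike"),
    ((10.0, 8.0), "flat"),
    ((15.0, 12.0), "flat"),
    ((60.0, 1.0), "spike"),
]

stump = fit_stump(TOY_MEALS)
prediction = predict(stump, (30.0, 4.0))  # a meal not seen before
```

A full decision tree would apply the same split search recursively to each branch, and the accuracy rate would normally be estimated on held-out events rather than on the training events.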

The insights and recommendation engine 230 can use data collected and analyzed by the food analysis system 210 and the device/data hub 220 to generate at least one decision tree learning algorithm. At least one decision tree learning algorithm can be used to predict how foods previously consumed by a user may affect personalized biomarkers of the user (e.g. glucose level). At least one decision tree learning algorithm can also be used to predict how foods that the user has never consumed, or other lifestyle events, may affect the user's biomarkers (e.g. glucose level).

An example of how the insights and recommendation engine 230 can analyze a collection of data aggregated by the device/data hub 220 is described in detail. The device/data hub 220, in communication with a user's CGM device, can continually record the user's blood glucose level as a function of time, as demonstrated by the graph in FIG. 23. The device/data hub 220 can also record what foods the user has consumed and their respective timestamps. FIG. 23 shows that the user has eaten cereals with milk at 8:00 A.M., nuts at 9:00 A.M., and dates at 10:00 A.M. The data also shows different degrees of spikes in the blood sugar level plot. To parse the data and generate a “blood glucose level” digital signature of the user, the insights and recommendation engine 230 can extract known events (e.g., food consumption) and their associated blood sugar levels, and classify the known events by a degree of change in blood glucose level in response to each event. The breakdown can be performed using a biomathematical predictive model. FIGS. 24A-24B illustrate how 6 different food items can be classified into two trend groups based on their effect on an individual's blood glucose level. The graph in FIG. 24A illustrates that white bread and buns (also from white wheat) can affect the user's blood glucose level similarly and negatively. A relatively large variation of blood glucose levels can imply insulin-mediated fat storage as a response to respective foods. On the other hand, the graph in FIG. 24B illustrates that coconut ice-cream, lentils, and salads have a relatively small effect on the user's blood sugar level. Although not shown in FIGS. 24A-24B, other biomarkers can be analyzed in a similar approach.
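
The extraction and classification of known events can be sketched as follows, assuming that the response to a meal is measured as the peak rise above the last pre-meal reading within a fixed window. The window length, the 30 mg/dL threshold, and the glucose readings are made-up values for illustration and are not data from FIG. 23; only the foods and meal times follow the example above.

```python
def glucose_response(readings, meal_time, window_minutes):
    """Peak rise (mg/dL) above the last pre-meal reading within the window.

    `readings` is a list of (minutes_since_midnight, glucose_mg_dl) pairs
    sorted by time. Returns None if there is no reading before or after.
    """
    before = [g for t, g in readings if t <= meal_time]
    after = [g for t, g in readings if meal_time < t <= meal_time + window_minutes]
    if not before or not after:
        return None
    return max(after) - before[-1]


def classify_events(readings, events, threshold, window_minutes):
    """Group food events into two trend groups by size of glucose rise."""
    groups = {"large": [], "small": []}
    for food, meal_time in events:
        rise = glucose_response(readings, meal_time, window_minutes)
        if rise is None:
            continue
        groups["large" if rise >= threshold else "small"].append((food, rise))
    return groups


# Meal times from the example (8:00, 9:00, 10:00 A.M.) in minutes since midnight.
EVENTS = [("cereals with milk", 480), ("nuts", 540), ("dates", 600)]

# Made-up CGM readings, for illustration only.
READINGS = [(470, 90), (480, 92), (510, 150), (540, 100),
            (570, 108), (600, 95), (630, 140), (660, 100)]

groups = classify_events(READINGS, EVENTS, threshold=30, window_minutes=60)
```

With these illustrative readings, the events fall into a large-response group and a small-response group, analogous to the two trend groups of FIGS. 24A-24B.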

The insights and recommendation engine 230 can generate and use one or more digital signatures of a user to provide one or more recommendations to the user. One or more recommendations can be related to food, health, or wellness. In some examples, the insights and recommendation engine 230 can suggest meal plan recommendations that are tailored to an individual's body and its responses. The recommendations can include which specific foods to consume, where to find the specific foods (e.g. name and location of a restaurant), basic ingredients for the specific foods, how to prepare the specific foods (e.g., methods of cooking), when to consume the specific foods (e.g., between 4:30-5:30 P.M.), how much to consume, which activities or steps to take (or avoid) after consumption of the specific foods, etc. The insights and recommendation engine 230 can also track what the user likes to eat and does not like to eat. The insights and recommendation engine 230 can also predict what other types of foods the user will like or dislike, and use such one or more predictions to generate personalized recommendations. The personalized recommendations can yield high rate of compliance by the user.

The insights and recommendation engine 230 can send one or more personalized messages to a user while the user is using the GUI-based software interface. Additionally, one or more personalized messages can be sent as pop-up messages, voice messages, or e-mails to the user device while the user is not using the GUI-based software interface. One or more personalized messages can suggest that the user consume less (or more) of one or more food items, or stop (or start) consumption of one or more food items. The insights and recommendation engine 230 can send the user predicted effects of one or more foods on the user's body.

The insights and recommendation engine 230 can suggest recommendations that can drive a behavioral change in a user. The behavioral change can be a preference selected by the user, or a suggestion generated by the insights and recommendation engine 230. The behavioral change can include eating less carbohydrate to lose weight. For example, the user may eat pizza often for lunch, and the insights and recommendation engine 230 can detect that consumption of pizza is correlated with a spiked increase in the user's blood glucose level. The insights and recommendation engine 230 can identify other foods that, when eaten with pizza, can reduce the blood glucose level. The insights and recommendation engine 230 can also identify one or more alternative food items to replace pizza.

By using the GUI-based software interface in a user device, the insights and recommendation engine 230 can receive a menu input, track a user's geolocation using GPS of the user device, search nearby restaurants, and recommend a different menu item available in the vicinity of the user. The insights and recommendation engine 230 can provide ordering tips and reasons for the different menu item. Exemplary windows of the GUI-based software interface for personal recommendations on menu items are illustrated in FIGS. 25A-25C.

The insights and recommendation engine 230 can be useful for individuals with type 1 diabetes or type 2 diabetes. An individual's blood glucose level can be affected by foods consumed and the individual's lifestyle (e.g., physical activity, sleep, stress, etc.). If the blood glucose level is too high, then the individual's body may secrete a hormone called insulin to help regulate the blood glucose by directing fat cells to absorb glucose. The insulin can also direct other cell types to absorb the blood glucose as a source of energy. For diabetics, the blood glucose level can be higher than normal due to the body's impaired ability to produce or respond to insulin. An individual with type 1 diabetes can have an insufficient production of insulin in the individual's body. An individual with type 2 diabetes can have an insufficient production of insulin and/or insulin resistance in the body. Individuals with type 1 or type 2 diabetes can rely on insulin injections to control their blood glucose levels. Thus, the insights and recommendation engine 230 can (1) monitor a user's food intake, blood glucose levels (either continuously from a CGM device or in a discrete manner using a conventional glucose meter), as well as insulin levels for users using the insulin injection therapy, and (2) analyze relationships among specific food types, insulin injections, and blood glucose level responses. The insights and recommendation engine 230 can use the GUI-based software interface in the user device to display the recommendations to the user. From the recommendations, the user can find out specifically which food items the user's blood glucose level is most responsive to, which food items to avoid or consume more, an optimized time interval for insulin injection, etc. In an example, a recommendation may suggest that, “When you added avocado to your sandwiches your glucose response was over 30% lower [occurred 5 out of 6 times].” Other factors that may be collected and correlated to insulin and blood glucose levels by the insights and recommendation engine 230 can include exercise, stress, activity, medications, menstrual cycle, etc. A combination of foods and at least one or more of the other factors can improve the quality of recommendations generated by the insights and recommendation engine 230.
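
A recommendation of the form quoted above could be derived from logged responses roughly as follows. This is a sketch under stated assumptions: the comparison against the mean response of the meal eaten alone, the function names, and the glucose rises are all hypothetical and serve only to show how an "N out of M times" statement might be computed.

```python
from statistics import mean


def pairing_insight(baseline_rises, paired_rises, min_reduction=0.30):
    """Count how often adding a food lowered the glucose rise by more than
    `min_reduction`, relative to the mean rise of the meal eaten alone.

    `baseline_rises`: post-meal rises (mg/dL) for the meal eaten alone.
    `paired_rises`: rises for the same meal eaten with the added food.
    """
    reference = mean(baseline_rises)
    cutoff = reference * (1 - min_reduction)
    hits = sum(1 for rise in paired_rises if rise < cutoff)
    return {"reference_rise": reference, "occurrences": hits, "total": len(paired_rises)}


def format_insight(food, meal, stats, min_reduction=0.30):
    """Render the statistics as a user-facing message."""
    return (
        f"When you added {food} to your {meal} your glucose response was over "
        f"{round(min_reduction * 100)}% lower "
        f"[occurred {stats['occurrences']} out of {stats['total']} times]."
    )


# Made-up glucose rises (mg/dL), for illustration only.
sandwich_alone = [60, 64, 56]
sandwich_with_avocado = [38, 40, 35, 41, 50, 39]

stats = pairing_insight(sandwich_alone, sandwich_with_avocado)
message = format_insight("avocado", "sandwiches", stats)
```

Reporting the occurrence count alongside the effect size lets the user judge how consistently the pairing has worked in the past.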

FIG. 26 illustrates an exemplary window of the GUI-based software interface for blood glucose logging. The blood glucose logging can be performed using a conventional glucose meter or a CGM device. The window can display a food item (e.g., Avocado toast) that has been consumed by the user. The window can display a change in the user's blood glucose level at one specific time point following consumption of the food item. Additionally, the window can display a report on the user's blood glucose level profile within a time period. The time period may capture time points before and after consumption of the food (e.g. pre-meal, or 2 and 3-hour post-meal). The insights and recommendation engine 230 can alter a length of the time period and a number of time points within the time period based on a profile of the food- and blood glucose-related data. Based on a recommended range of the blood glucose level, the insights and recommendation engine 230 can assess whether each of measured blood glucose levels is within or out of the recommended range, and display the assessment on the window.
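
The in-range assessment described above can be sketched in a few lines. The recommended range of 70 to 180 mg/dL used here is an assumed example value supplied by the caller, and the readings are made up; the disclosure does not fix a particular range.

```python
def assess_readings(readings, low, high):
    """Label each (time_point, glucose_mg_dl) reading against a recommended range."""
    report = []
    for time_point, value in readings:
        if value < low:
            status = "below range"
        elif value > high:
            status = "above range"
        else:
            status = "in range"
        report.append((time_point, value, status))
    return report


# Made-up readings around one meal, for illustration only.
report = assess_readings(
    [("pre-meal", 95), ("2-hour post-meal", 190), ("3-hour post-meal", 130)],
    low=70,
    high=180,
)
```

The resulting labels could be shown next to each time point on the blood glucose logging window.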

FIG. 27 illustrates an exemplary window of a GUI-based software interface for displaying a recommendation based on an automatic blood glucose logging. The automatic blood glucose logging can use a CGM device. The CGM device may be in communication (e.g. via Bluetooth, Wi-Fi, etc.) with the GUI-based software interface in a user device. The GUI-based software interface can be connected to and leverage all features of the platform 200, including the food analysis system 210, the device/data hub 220, and the insights and recommendation engine 230. The window can display a graph of the user's blood glucose level, including a most recent measurement from the CGM device. For a food item that the user may be interested in consuming, the insights and recommendation engine 230 can generate an insight (e.g., a free text, image, graph, etc.) on how the food item has affected the user's blood glucose level in the past. The insight can be displayed to the user on the window. The insight may help the user to make an informed decision regarding consumption of the food item. If the user has a wearable insulin delivery device, the insight on the window may also inform the user about different bolus options available in the wearable insulin delivery device.

The insights and recommendation engine 230 can use one or more biomathematical models described in the present disclosure to predict a user's general biomarkers. In an example, the insights and recommendation engine 230 can predict a user's glucose metabolism. The glucose metabolism process can begin with digestion. Following digestion, glucose can be absorbed into the bloodstream upon entering the small intestine. When the blood glucose level increases, the pancreas can release a hormone called insulin to control blood sugar. Insulin can help with transfer of glucose into a number of cell types with insulin receptors. Examples of the cell types can include adipocytes (fat tissue), muscle myocytes (muscle), and hepatocytes (liver). Thus, the user's glucose metabolism can depend on one or more factors, including, but not limited to, the user's glucose and insulin production levels, blood glucose level prior to food consumption, carbohydrate content in the food, insulin level in the body, blood pressure, physical activity, the user's insulin sensitivity, time of the day, stress, illness, pregnancy, medications, etc. Thus, there may be various ways to generate biomathematical models with one or more mathematical parameters to describe and predict relationships between such factors and the user's glucose metabolism. In some examples, some of the factors may be more relevant to the glucose metabolism than the others. In addition, relevance of such factors to the user may change over time.

In an example, a Glucose Absorption and Insulin Assimilation (GAIA) model can be a biomathematical model to describe and predict the user's glucose metabolism and its interaction with insulin. The insulin may be injected insulin (exogenous) for patients who receive insulin injections or endogenous insulin. The GAIA model can use the user's historical data on food consumption as well as blood glucose and insulin levels to predict glucose responses. The GAIA model can use 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more historical meals from the user, coupled with their respective glucose and insulin levels, to predict the user's glucose and insulin responses to one or more new meals. FIG. 28 illustrates a flow chart of the GAIA model to predict glucose and insulin interactions in the body. An exemplary graph plotting measured glucose levels and GAIA-estimated blood glucose levels is shown in FIG. 29.

The GAIA model can be defined by using one or more models for glucose G(t) and insulin I(t) in the blood. In an example, the glucose model can be a differential equation for the concentration of glucose in the blood as a function of time, as shown by the following equation:

$\frac{dG}{dt} = a(t) + e(t) - u(t) - \phi(G, I)$

where a(t) is the rate of glucose absorption from food in the blood,

e(t) is the rate of endogenous glucose production by the liver,

u(t) is the rate of glucose utilization by the body, and

ϕ(G,I) represents the glucose-insulin interaction in the blood.

In healthy individuals with normal blood glucose levels, a total glucose uptake (utilization+interaction; or u(t)+ϕ(G,I)) can range from about 1.9 to about 2.2 mg·kg⁻¹·min⁻¹. FIG. 29 illustrates a four-arm blood glucose model modeled using the above equation.
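By way of illustration only, the glucose equation above can be integrated numerically once the four terms are supplied. The following is a minimal sketch using a forward Euler step. The function name `simulate_glucose`, the step size, the units (mg/dL and minutes), and every placeholder term passed in the usage example are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import Callable, List


def simulate_glucose(
    g0: float,
    insulin: Callable[[float], float],
    a: Callable[[float], float],
    e: Callable[[float], float],
    u: Callable[[float], float],
    phi: Callable[[float, float], float],
    t_end: float,
    dt: float = 1.0,
) -> List[float]:
    """Integrate dG/dt = a(t) + e(t) - u(t) - phi(G, I) with forward Euler."""
    n_steps = int(round(t_end / dt))
    g = g0
    trace = [g]
    for k in range(n_steps):
        t = k * dt
        # Right-hand side of the glucose equation at the current time and state.
        dg_dt = a(t) + e(t) - u(t) - phi(g, insulin(t))
        g = g + dt * dg_dt
        trace.append(g)
    return trace


# Hypothetical placeholder terms, chosen only so that the example runs.
trace = simulate_glucose(
    g0=100.0,
    insulin=lambda t: 10.0,                      # constant plasma insulin
    a=lambda t: 2.0 if t < 60 else 0.0,          # absorption for the first hour
    e=lambda t: 1.0,                             # constant endogenous production
    u=lambda t: 1.0,                             # constant utilization
    phi=lambda g, i: 0.001 * i * (g - 100.0),    # toy interaction term
    t_end=180.0,
)
```

In this toy run the trace rises while absorption is active and relaxes back toward the starting level afterwards. A production implementation would likely use an adaptive ODE solver rather than a fixed Euler step.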

The rate of glucose absorption from food a(t) can depend on several parameters. Such parameters may be meal-dependent or meal-independent. For example, the rate of glucose absorption from food a(t) can be described by four meal-dependent parameters (a₁, a₂, a₃, a₄) that can represent a rate at which the glucose absorption increases or decreases, and the total amount of glucose absorbed after a given meal.

A simplified model can be constructed showing an expected glucose absorption rate for meals over an immediate duration after the meal (e.g., less than 2 hours) or for non-complex meals over an extended time period (e.g., more than 2, 3, 4, 5, 6, 7, 8, 9, 10 or more hours). If a meal contains substantial amounts of fat or protein, then the glucose absorption may be slower. As such, the glucose absorption may exhibit multiple maxima that are not captured by this model. In such cases, the model may only be used for the immediate duration after the meal (e.g., less than 2 hours).

The rate of endogenous glucose production (EGP) by the liver into the bloodstream e(t) can depend on several parameters. Such parameters may be meal-dependent or meal-independent. For example, the rate of EGP can be described by two meal-dependent parameters (ε₁, ε₂) and I(t), which is a function of insulin in the blood, where ε₁ represents a baseline endogenous glucose production in the absence of insulin, and ε₂ represents the rate at which endogenous glucose production decreases with plasma insulin.

The rate of glucose utilization u(t) can depend on several parameters. Such parameters may be meal-dependent or meal-independent. For example, the rate of glucose utilization u(t) can be described as a function that grows nonlinearly for the duration of a meal, which function can depend on three meal-independent parameters, where v₁ and v₃ represent the two asymptotes of the glucose utilization rate (at the initial time and final time, respectively), and v₂ represents the rate at which the utilization rate u(t) varies. The rate of glucose utilization u(t) may be useful for breakfasts since glucose utilization is expected to surge in the mornings after an individual wakes up from sleep.
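The disclosure does not give a closed form for u(t). One curve that has two asymptotes and a single rate parameter is the logistic function, and the following minimal sketch assumes that form purely for illustration. The function name `utilization_rate` and the midpoint argument `t_mid` are hypothetical additions for this sketch.

```python
import math


def utilization_rate(t: float, v1: float, v2: float, v3: float, t_mid: float = 0.0) -> float:
    """Logistic curve that tends to v1 for early t and to v3 for late t.

    v2 sets how quickly the rate moves between the two asymptotes.
    t_mid (an assumption of this sketch) is the time of the half-way point.
    """
    return v1 + (v3 - v1) / (1.0 + math.exp(-v2 * (t - t_mid)))
```

For example, `utilization_rate(0.0, 1.0, 0.1, 2.0)` evaluates the curve at its midpoint, half-way between the two asymptotes.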

The glucose-insulin interaction in the blood ϕ(G,I) can be meal-dependent or meal-independent. For example, the glucose-insulin interaction in the blood ϕ(G,I) can be modeled as a non-linear interaction that is meal-independent. The glucose-insulin interaction in the blood ϕ(G,I) can be a function of different parameters, e.g., the blood glucose concentration G(t), the plasma insulin concentration I(t), an insulin sensitivity λ, and G₀, which represents how much insulin sensitivity decreases as blood glucose level increases for each patient. The interaction term ϕ(G,I) can be approximated using a Taylor series around different blood glucose concentrations.

The concentration of plasma insulin I(t) can vary for each patient. For example, for patients living with type 1 diabetes for over a year, all insulin in the body may be assumed to have been provided by an exogenous source S. The source S can be either an insulin pump or a manual insulin injection. The insulin model for patients with type 1 diabetes can use a three-compartment model, as described by the following differential equation:

$\frac{d}{dt} \begin{bmatrix} I_{inj} \\ I_{isf} \\ I \end{bmatrix} = M \begin{bmatrix} I_{inj} \\ I_{isf} \\ I \end{bmatrix} + \begin{bmatrix} S \\ 0 \\ 0 \end{bmatrix}$

- where I_inj is a volume of insulin at an injection site,
- I_isf is a volume of insulin in the interstitium (interstitial fluid),
- I is the plasma insulin concentration as a function of time, and
- S is the rate of insulin provided by the source (often an insulin pump).

The source S can be represented as a superposition of fixed basal rates, square boluses, and normal boluses (represented by delta functions). The rate constants of insulin transfer through the various compartments are represented by the matrix M:

$M = \begin{bmatrix} - k_{d} - k_{0} & 0 & 0 \\ k_{0} & - (k_{x} + k_{1}) & 0 \\ 0 & \frac{k_{1}}{V_{i} m} & - k_{I} \end{bmatrix}$

- where m is the mass of the patient in kilograms,
- k₀ is a rate constant for the transfer of insulin from the injection site to the interstitial fluid (1/hour),
- k₁ is a rate constant for the transfer of insulin from the interstitial fluid to the plasma (1/hour),
- k_d is a rate constant for the insulin loss at the injection site (1/hour),
- k_x is a rate constant for the transfer of insulin from the interstitial fluid outside of the blood (unused insulin; 1/hour),
- k_I is a rate constant for the disposal of plasma insulin (unused insulin; 1/hour), and
- V_i is an effective volume of plasma per body weight (liters per kilogram).

FIG. 30 illustrates a flow chart of such propagation of exogenous insulin from injection to the interstitial fluid to the plasma, as well as the respective rate constants. The flow chart also includes potential losses of insulin and their rate constants.
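By way of illustration only, the three-compartment insulin model can be stepped forward in time as sketched below. The sketch assumes that the interstitial compartment drains at the combined rate k_x + k_1, which is what mass balance between the compartments requires given the definitions of the rate constants. The function names, the forward Euler step, and all numeric values in the usage example are hypothetical and are not taken from the disclosure.

```python
from typing import Tuple

State = Tuple[float, float, float]  # (I_inj, I_isf, I)


def insulin_derivatives(
    state: State, source_rate: float, k0: float, k1: float, kd: float,
    kx: float, ki: float, vi: float, mass_kg: float,
) -> State:
    """Right-hand side of the three-compartment model (all rates in 1/hour)."""
    i_inj, i_isf, i_plasma = state
    d_inj = -(kd + k0) * i_inj + source_rate                  # injection site
    d_isf = k0 * i_inj - (kx + k1) * i_isf                    # interstitial fluid
    d_plasma = k1 / (vi * mass_kg) * i_isf - ki * i_plasma    # plasma concentration
    return (d_inj, d_isf, d_plasma)


def simulate_insulin(source_rate: float, hours: float, dt: float = 0.01, **rates: float) -> State:
    """Forward Euler integration from an empty state under a constant source."""
    state: State = (0.0, 0.0, 0.0)
    for _ in range(int(round(hours / dt))):
        deriv = insulin_derivatives(state, source_rate, **rates)
        state = (
            state[0] + dt * deriv[0],
            state[1] + dt * deriv[1],
            state[2] + dt * deriv[2],
        )
    return state


# Hypothetical rate constants and a constant basal source of 1 unit/hour.
rates = dict(k0=1.0, k1=1.0, kd=0.1, kx=0.1, ki=2.0, vi=0.1, mass_kg=70.0)
basal_state = simulate_insulin(1.0, 50.0, **rates)
```

Under a constant basal source the state settles to the steady state obtained by setting each derivative to zero, which gives a quick sanity check on the sign conventions. Boluses could be modeled by making `source_rate` a function of time.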

As illustrated above, by way of example only, the GAIA model can utilize 11 parameters, including 4 glucose absorption parameters (a₁, a₂, a₃, a₄), 2 endogenous glucose production parameters (ε₁, ε₂), 3 glucose utilization parameters (v₁, v₂, v₃), and 2 glucose-insulin interaction parameters (λ, G₀). Some of the parameters can be meal-dependent, and thus may vary from one meal to another. Some of the parameters can be meal-independent, and thus may remain fixed over the duration of multiple meals. Some parameters may switch between meal-dependent and meal-independent from meal to meal.

The GAIA model can use at least one prediction pipeline to make at least one prediction on the changes in blood glucose level of a user. In an example, the pipeline can include the following steps: (1) identify one or more historical meals that have sufficient and reliable data (e.g., a historical meal with known ingredients, with respectively tracked glucose and insulin levels for 2 hours before the meal and 4 hours after the meal); (2) fit the measured glucose and insulin levels of each of the one or more historical meals to the above models/equations, and obtain a parameter space of all physical values and the 11 parameters of the GAIA model, in which the parameter space may be a set of all possible combinations of the physical values and the 11 parameters; (3) generate a distribution function of the parameter space; (4) repeatedly sample combinations from the parameter space by (i) recalculating the meal-dependent parameters with the generated meal-independent parameters, (ii) calculating the error in the fit for each meal, (iii) calculating the total error over all meals, and (iv) repeating (i)-(iii) as long as the total error decreases; and (5) use the finalized parameters to generate a prediction model that is personalized for the user. Furthermore, the insights and recommendation engine 230 can use the finalized parameters and machine learning to study how other biomarkers of the user may have affected the meal-dependent parameters.
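The refinement loop in step (4) can be sketched with a deliberately simplified stand-in for the GAIA equations. In the sketch below, each meal response is modeled as a shared baseline (one meal-independent parameter) plus a per-meal amplitude (one meal-dependent parameter) times a fixed shape. The loop uses deterministic alternating least squares in place of sampling from a distribution function. The toy model, the function names, and the data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fit_amplitude(shape: Sequence[float], observed: Sequence[float], baseline: float) -> float:
    """Least-squares amplitude for observed ~ baseline + amp * shape."""
    num = sum(s * (y - baseline) for s, y in zip(shape, observed))
    den = sum(s * s for s in shape)
    return num / den


def meal_error(shape: Sequence[float], observed: Sequence[float], baseline: float, amp: float) -> float:
    """Sum of squared residuals for one meal."""
    return sum((baseline + amp * s - y) ** 2 for s, y in zip(shape, observed))


def fit_meals(
    shape: Sequence[float], meals: Sequence[Sequence[float]],
    tol: float = 1e-12, max_iter: int = 1000,
) -> Tuple[float, List[float], float]:
    n_points = sum(len(m) for m in meals)
    baseline = sum(sum(m) for m in meals) / n_points  # initial guess
    prev_total = float("inf")
    amps: List[float] = []
    total = prev_total
    for _ in range(max_iter):
        # (i) recalculate meal-dependent parameters for the current baseline
        amps = [fit_amplitude(shape, m, baseline) for m in meals]
        # (ii) and (iii) per-meal errors and their total
        total = sum(meal_error(shape, m, baseline, a) for m, a in zip(meals, amps))
        # (iv) stop once the total error no longer decreases
        if prev_total - total <= tol:
            break
        prev_total = total
        # update the meal-independent parameter with the amplitudes held fixed
        baseline = sum(
            y - a * s for m, a in zip(meals, amps) for s, y in zip(shape, m)
        ) / n_points
    return baseline, amps, total


shape = [0.0, 1.0, 2.0, 1.0, 0.0]
meals = [
    [100.0, 110.0, 120.0, 110.0, 100.0],  # generated with baseline 100, amplitude 10
    [100.0, 120.0, 140.0, 120.0, 100.0],  # generated with baseline 100, amplitude 20
]
baseline, amps, total = fit_meals(shape, meals)
```

On this noiseless toy data the loop recovers a baseline close to 100 and amplitudes close to 10 and 20. The same alternate-and-compare structure would apply with the full 11-parameter model, although the per-meal fit would then require a nonlinear optimizer.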

FIG. 31 illustrates an exemplary fitting 3100 of the GAIA model. Line 3110 is a measured rate of the change of the blood glucose level 3115. Line 3120 is an estimated rate of the change of the blood glucose level 3125. The estimated rate of the change of the blood glucose level is similar to the measured rate of the change of the blood glucose level. Line 3130 is the glucose-insulin interaction in the blood 3135. Line 3140 is a subtraction of the rate of glucose utilization by the body from the rate of endogenous glucose production by the liver 3145. Line 3150 is the rate of glucose absorption from food in the blood 3155.

The insights and recommendation engine 230 can be used for passive food tracking. Factors such as physical activity and sleep can be tracked passively and automatically via wearable devices (e.g., Apple Watch, Fitbit, Samsung Gear, etc.). Additionally, the blood glucose level can be tracked passively and automatically via the CGM device. On the other hand, tracking food intake can require proactive and frequent interventions by a user (e.g., manual logging via voice or text). Such food intake tracking can be a tedious and unreliable process for data collection. By using machine learning, the insights and recommendation engine 230 can combine a group of user-specific parameters to generate a prediction model for passive food tracking. The parameters can include the user's geolocation using the GPS on a user device. The parameters can include changes in the user's blood glucose level monitored by the automatic blood glucose logging function. The parameters can include the user's historical food and/or beverage consumption and blood glucose response data, as well as the food ontology of the food analysis system 210. In an example, when there is a spike in the user's blood glucose level, the insights and recommendation engine 230 can (1) search for the user's historical food items with a similar blood glucose response (e.g., intensity and duration); (2) generate a list of available foods in the vicinity of the user; (3) use the GAIA model to predict the user's blood glucose response to each of the available foods in (2); (4) find food items that are repeatedly consumed by the user; (5) find common food items with a similar glucose profile among steps (1) to (4); and (6) predict which of the common food items the user most likely consumed in the past 2-3 hours. The aforementioned steps can be used to assist in the passive food tracking. Additional parameters for generating a prediction model for passive food tracking can include sound made from chewing foods, absorption spectrum of foods, and partial chemical composition of foods.
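The candidate-ranking portion of the steps above can be sketched as follows. The sketch assumes that a predicted glucose curve is already available for each food (e.g., from a fitted model), compares it with the observed spike using a root-mean-square distance, and gives frequently eaten foods a mild boost. The function names, the scoring rule, and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence


def curve_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square difference between two equally sampled glucose curves."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def rank_candidate_foods(
    observed: Sequence[float],
    predicted_by_food: Dict[str, Sequence[float]],
    nearby_foods: Sequence[str],
    meal_counts: Dict[str, int],
    top_k: int = 3,
) -> List[str]:
    """Rank nearby foods by how well their predicted response matches the spike."""
    scored = []
    for food in nearby_foods:
        if food not in predicted_by_food:
            continue  # no prediction available for this food
        dist = curve_distance(observed, predicted_by_food[food])
        # Hypothetical weighting: repeated past consumption lowers the score.
        score = dist / (1 + meal_counts.get(food, 0))
        scored.append((score, food))
    scored.sort()
    return [food for _, food in scored[:top_k]]


observed_spike = [100.0, 140.0, 160.0, 130.0, 105.0]
predictions = {
    "bagel": [100.0, 142.0, 158.0, 131.0, 104.0],
    "salad": [100.0, 105.0, 108.0, 104.0, 100.0],
    "soda": [100.0, 170.0, 150.0, 110.0, 95.0],
}
ranked = rank_candidate_foods(
    observed_spike,
    predictions,
    nearby_foods=["bagel", "salad", "soda", "pizza"],
    meal_counts={"bagel": 5, "salad": 10},
)
```

In this example the bagel ranks first because its predicted curve nearly coincides with the observed spike. The top-ranked item could then be shown to the user for confirmation rather than logged automatically.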

The insights and recommendation engine 230 can predict a user's eating patterns or habits. As many as 78% of the meals in an individual's diet can be repeats. The insights and recommendation engine 230 may find recurring patterns in the diet based on (1) the user's historical meal or beverage consumption data, (2) relations between different foods derived from the food ontology, and (3) location and/or time of day. For example, as illustrated in FIGS. 32A-32B, the user may have a habit of eating bananas, red tomatoes, whole wheat toast, and white rice on a first day, and a broccoli rice bowl on a second day. The next time the user consumes a sub-combination or an entirety of bananas, red tomatoes, whole wheat toast, and white rice, the insights and recommendation engine 230 may predict that the next meal would be the broccoli rice bowl. The insights and recommendation engine 230 can use the GUI-based software interface to ask the user to confirm or correct the prediction prior to logging the meal. Based on the user's response, the insights and recommendation engine 230 can confirm or improve its eating pattern prediction algorithm. In another example, a user can input “omelette” in the GUI-based software interface, and the insights and recommendation engine 230 may predict that the next most likely foods to be logged can be “bread” and “coffee” and auto-complete the user's meal as “Omelette with sliced bread and a cup of coffee.” Such auto-completion capability can allow user clicks and/or inputs to be reduced by at least 30, 40, 50, 60, 70%, or more.
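The auto-completion example can be illustrated with a simple co-occurrence count over historical meals. This is a minimal sketch under the assumption that each logged meal is a list of food names. The function names and the sample history are hypothetical, and an actual engine could additionally weight candidates by food ontology relations, location, and time of day.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_cooccurrence(meals: Sequence[Sequence[str]]) -> Dict[str, Counter]:
    """Count how often each pair of foods appears in the same meal."""
    co: Dict[str, Counter] = {}
    for meal in meals:
        items = set(meal)
        for item in items:
            counter = co.setdefault(item, Counter())
            for other in items:
                if other != item:
                    counter[other] += 1
    return co


def suggest_completions(co: Dict[str, Counter], logged_item: str, top_k: int = 2) -> List[str]:
    """Return the foods most often eaten together with the logged item."""
    ranked = sorted(co.get(logged_item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


history = [
    ["omelette", "bread", "coffee"],
    ["omelette", "bread", "coffee"],
    ["omelette", "bread", "orange juice"],
    ["salad", "bread"],
]
suggestions = suggest_completions(build_cooccurrence(history), "omelette")
```

With this history, entering "omelette" suggests "bread" and "coffee", which the interface could present for one-tap confirmation.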

Thus, if the user opens the GUI-based software interface on a user device, the insights and recommendation engine 230 can (1) detect time and/or the user's geolocation using GPS; (2) search for the user's repeated historical meals at a similar time or geolocation; (3) generate a list of available meals in the vicinity of the user; (4) find common food items among (1)-(3); and (5) predict which of the common food items the user is most likely to consume soon. As shown in FIG. 32B, the insights and recommendation engine 230 may use the GUI-based software interface to ask and/or confirm if the prediction was correct. Based on the user's response, the insights and recommendation engine 230 can confirm or improve its eating pattern algorithm.

The insights and recommendation engine 230 can also find food consumption patterns of each user and of a broad population. The broad population can be a collection of 5, 10, 100, 10,000, 100,000 or more individual users. The insights and recommendation engine 230 can combine the detected food consumption patterns of the user and the detected food consumption patterns of the broad population to determine to which type of population the user may belong.

In addition to glucose and insulin, the insights and recommendation engine 230 (herein referred to as the “engine”) can be used to track and/or predict other factors that may affect and/or be affected by foods. The engine can be used to calculate antioxidant (e.g., thiols, vitamin C, etc.) levels. By using a device that can measure antioxidants at a position in the body (e.g., skin) automatically or with human intervention, the engine can generate one or more digital signatures for the user to track and predict antioxidant responses to foods. The engine can be used to calculate blood pressure levels. By using a continuous blood pressure monitoring device, the engine can provide insight on the relationships between foods and blood pressure, especially for users with hypertension or other cardiovascular conditions. The engine can be used to assess digestion problems. By using implantable (semi- or completely permanent) or swallowable (temporary) sensors, the engine can provide insight on the relationship between foods and the conditions of the digestive tract (e.g., pH, contraction intensity and/or frequency, etc.). Such function may help patients with digestion problems (e.g., gastroesophageal reflux disease (GERD), irritable bowel syndrome (IBS), etc.) eliminate foods that may hinder their quality of life. The engine can be used to predict onset of migraines. By studying associations between foods and migraines (e.g., from manual user input), the engine can help users with chronic migraines to eliminate foods that may be predicted to cause migraines. The engine can be used to help users with sleep. The quality of sleep may be measured using a wearable device (e.g., Apple Watch, Fitbit, Samsung Gear, etc.). Sleep can be affected by food intake, but, conversely, sleep may also affect the user's hunger or metabolism. Thus, the engine can find correlations between the user's food intake and quality of sleep, and provide insights and recommendations accordingly. For example, the engine can use the GUI-based software interface to inform the user that “In 92% of the times you drank coffee after 4 P.M., you slept poorly.” Alternatively or in addition, the engine may find relationships between foods and other factors including, but not limited to, lethargy/fatigue, drowsiness, or cortisol levels. Any feature of a user's daily activity or physiology that can be measured by a wearable device or medical device may be analyzed by the engine to continuously provide users a better understanding of their bodies and/or a healthier diet.
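An insight of the kind quoted above can be derived from a conditional frequency over daily logs. The following minimal sketch assumes that each day is recorded as the hour of the latest coffee (or `None` if no coffee was logged) together with a flag indicating poor sleep. The record layout, function name, and sample data are hypothetical.

```python
from typing import List, Optional, Tuple

DayRecord = Tuple[Optional[int], bool]  # (hour of latest coffee in 24h time, slept poorly)


def poor_sleep_rate(days: List[DayRecord], cutoff_hour: int = 16) -> Optional[float]:
    """Fraction of late-coffee days that were followed by poor sleep."""
    late = [poor for hour, poor in days if hour is not None and hour >= cutoff_hour]
    if not late:
        return None  # no late-coffee days, so no insight can be stated
    return sum(late) / len(late)


days = [(17, True), (18, True), (9, False), (None, False), (16, False), (20, True)]
rate = poor_sleep_rate(days)
message = f"In {rate:.0%} of the times you drank coffee after 4 P.M., you slept poorly."
```

A correlation found this way does not establish causation, so such messages may be more useful when the number of qualifying days is reported alongside the percentage.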

FIGS. 33A-33C illustrate exemplary windows of the GUI-based software interface showing multiple features. As shown in FIG. 33A, the window may display which food item the user has consumed (e.g., Pear Smoothie), which ingredients are found or predicted to be present in the food item (e.g., Mangos, Goji Berries, Green Anjou Pear, etc.), a generic or specific picture of the food item (e.g., directly imported from a respective restaurant's website), as well as a continuous tracking of the user's blood glucose level and insulin injection(s). As shown in FIG. 33B, the window may display a timeline of multiple events, including insulin injection and its dosing amount, daily activities (e.g., Bicycling), and food items or dishes (e.g., Green on Green Salad, Morning Smoothie). As shown in FIG. 33C, the window may display which food items the user has frequently consumed (e.g., Goji Berries, Mangos), and an average blood glucose response to such food items.

FIGS. 34A-34C illustrate exemplary windows of a GUI-based software interface showing a comprehensive report by the insights and recommendation engine 230. As shown in FIG. 34A, the window may display a graph showing a user's blood glucose measurements throughout the day. The window may display additional details about blood glucose, including the user's average weekly glucose level and its standard deviation value, and the percentage of glucose measurements that have been in a predetermined target range (Time in target), below the target range, and above the target range. The window may also display popular eating hours, represented by the average number of meals that have been consumed per hour throughout the day, and a distribution plot of carbohydrates consumed (e.g., in grams) per hour throughout the day. The window may also display average glucose values at 2 hours post-meal (e.g., breakfast, lunch, and dinner). The window may also indicate whether the average glucose values at 2 hours post-meal are above or below the predetermined target range of blood glucose level. As shown in FIG. 34B, the window may display more details on the analyses of meals. Meals may be grouped into breakfast, lunch, and dinner. For each group, the window may display a minute-to-minute (or other interval) change in blood glucose level during the 2 hour post-meal period, an average meal nutrition breakdown (e.g., protein, carbohydrates, fats, etc.), and an assessment of the balance of the average meal nutrition breakdown. In addition, the analyses of meals may include the top three meals with significant changes in the blood glucose level, and the top three meals with minimal changes in the blood glucose level. As shown in FIG. 34C, the window may display frequently eaten foods and their effects on the blood glucose level. In addition to foods, the window may also display correlations between the blood glucose level and other factors. The other factors can include sleep quality, sleep duration, activity type, and a number of daily steps.
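The glucose summary statistics described for FIG. 34A can be computed as sketched below. The function name and the sample readings are hypothetical, and the default bounds of 70 and 170 mg/dL simply mirror the example target range given elsewhere in this disclosure.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def glucose_summary(readings: Sequence[float], low: float = 70.0, high: float = 170.0) -> Dict[str, float]:
    """Average, standard deviation, and percent of readings in, below, and above target."""
    n = len(readings)
    below = sum(1 for r in readings if r < low)
    above = sum(1 for r in readings if r > high)
    in_target = n - below - above
    return {
        "mean": mean(readings),
        "std": pstdev(readings),  # population standard deviation of the readings
        "pct_in_target": 100.0 * in_target / n,
        "pct_below": 100.0 * below / n,
        "pct_above": 100.0 * above / n,
    }


readings = [60.0, 80.0, 100.0, 120.0, 180.0, 200.0, 90.0, 110.0, 130.0, 150.0]
summary = glucose_summary(readings)
```

For these ten sample readings, seven fall inside the target range, one falls below it, and two fall above it.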

Any of the embodiments described herein (e.g. relating to food analysis, food ontology, and personalized food/health/nutritional recommendations) are also suitable for use with the systems and methods for managing nutritional health as described in U.S. patent application Ser. No. 13/784,845 (published as US 2014/0255882) which is incorporated herein by reference in its entirety.

Calibration Kit

A calibration kit can be used to optimize the platform 200 to a user's physiological responsiveness to different foods. Optimizing the platform 200 may include optimizing the functions of the food analysis system 210, the device/data hub 220, and the insights and recommendation engine 230. As users can respond differently to the same food, and a wearable and/or medical device can have different compatibility with different users, the calibration kit may be used to set a food baseline profile for each user. Generating the food baseline profile for a user can include monitoring effects of different foods on the user's body as the user consumes one or more pre-packaged meals over a time period. The one or more pre-packaged meals can contain known amounts of the foods. The monitored effects can be used to generate the food baseline profile. The calibration kit can be a modular kit. The calibration kit can include a monitoring system (e.g., a glucose monitoring system, a blood test, a genetic test, etc.) and one or more standardized meals (also referred to as “calibration meals”). The calibration meals can include food bars, beverages, or both. The platform 200 may know and have tested all features of the calibration meals (e.g., ingredients, nutrients, processing, etc.). In some examples, the user may put the device on the body (or perform the provided monitoring test), and consume one calibration meal every morning. The user may be required to fast (e.g., for 12 hours) throughout the previous night. The device may measure the user's response to the calibration meal. The user can consume other foods throughout the day and log the foods to the platform 200 via the GUI-based software interface (e.g., food tracking by textual and voice recognition analysis, seamless food image logger, etc.). After a short period of time (e.g., a week), the platform 200 can use the data and predictions to set the baseline for the user. The baseline may be referred to as the user's unique, personalized food “fingerprint.”

FIG. 35 illustrates an exemplary calibration kit 3500. The calibration kit can contain a first box 3502 that includes a CGM device, a second box 3504 that includes a DNA collection kit for DNA testing (e.g., saliva collection kit), a third box 3506 that includes a biome collection kit for microbiome analysis (e.g., sample collected from gut, genitals, mouth, nose, and/or skin), and a fourth box 3508 that includes one or more calibration meals. The calibration kit may optionally include any of the above boxes, or different combinations of the boxes. The calibration kit can include 1, 2, 3, 4, 5, or more calibration foods. The calibration kit can include 2, 3, 4, 5, 6, or more boxes. The calibration kit can include 1, 2, 3, 4, 5 or more monitoring systems. In some embodiments, the calibration kit may include one or more other components/devices (e.g., blood testing kits, wearable devices, or other biomarker tests/devices) that aid in generating a baseline health status of a user. The calibration kit may also include a detailed list of instructions for a user to easily follow. If necessary, the calibration kit can include a container for each collection kit.

Example 1

End users of the insights and recommendation engine 230 and other features of the platform 200 can include healthcare providers. Healthcare providers can use a GUI-based software interface (e.g., a web portal) to monitor and study the effects that different foods may have on patients' bodies. The healthcare providers and the patients can share or exchange information by each connecting to the platform 200 as a hub. In some examples, the web portal can be used to monitor patients with type 2 diabetes and pre-diabetes. FIG. 36 illustrates an exemplary window 3600 of the web portal for the healthcare providers. On the web portal, as shown in the window 3600, the healthcare providers can invite new patients 3610, count a number of active patients that have accepted the invitation 3620, and track the number of meals 3630 and/or activities 3640 that have been logged by the participants. The healthcare providers can select the meals 3630 and/or activities 3640 image on the window 3600 to connect to at least one additional window (not shown in FIG. 36) and access more data and analyses.

A new user (e.g., a patient participant for a healthcare provider) can receive an invitation from the healthcare provider to install a GUI-based software interface (e.g., a mobile application) on a user device (e.g., a smart phone). FIG. 37 (parts A through F) illustrates exemplary windows 3710-3760 of a mobile application on a user device. Subsequent to initiating the application for the first time (window 3710), the mobile application can ask the user to create an account (window 3720), input basic information about the patient, including weight (window 3730), height (not shown), gender (window 3740), and diabetes therapy type, if any (window 3750), and enable access to other features on the user device (window 3760). The other features can include the notification function of the user device, or other mobile applications for health monitoring, motion detecting, photography, etc. Enabling access to the other features can automate multiple processes of the platform 200 and reduce its dependency on user input.

The user can also receive a calibration kit to initiate optimization of the platform 200 to the patient's physiological responsiveness. The user can use one or more devices and consume one or more calibration foods included in the calibration kit for an initial short program (e.g., 1 week) for baseline collection. The user can also consume and track other foods and/or beverages. The platform 200 can use data and predictions generated from the initial short program to generate a baseline for the user. The baseline can reflect the user's physiological responsiveness to foods. FIG. 38 (parts A through C) illustrates exemplary windows 3810, 3820, and 3830 of the mobile application on the user device for baseline data collection. Data recorded can include meals (e.g., breakfast, lunch, dinner, etc.; window 3810), daily activities (e.g., sleep, steps, etc.; not shown), and additional biomarkers (e.g., glucose level, insulin level, heart rate, etc.; window 3820). Data can be recorded by user input or by using trackers, such as wearable devices. During the baseline collection, the user can have access to a summary of the data collected (insights). The summary can include a number of meals, walking steps, sleeping hours, etc.

FIG. 39 (parts A through D) illustrates exemplary windows 3910-3940 of the mobile application on the user device showing a food image logging interface. The user can select a meal (e.g., breakfast, lunch, or dinner; window 3910). For accurate logging and baseline data collection, the mobile application can allow the user to record snacks. The user can be directed to the window 3920 to take a picture of a food item to be consumed (e.g., an apple). The picture of the food item can be recorded, and the mobile application can ask the user to input a description and meal time for the food item (window 3930). If the user records the food item for breakfast, then the mobile application may check off breakfast from a list of meals to be recorded (window 3940).

After completion of the baseline data collection, the user can have access to a report of analyses and insights generated by the insights and recommendation engine 230. If the user is linked to a healthcare provider system or a health-related study, then the healthcare provider or a coordinator of the health-related study may have access to a portion or the entirety of the report. FIGS. 40A and 40B illustrate exemplary windows of a GUI-based software interface showing the report on the user's data. In some examples, the report may focus on factors that can influence blood glucose level: food, activity, and sleep. The report can indicate a pre-defined target glucose level range (e.g., 70-170 mg/dL) for the user along with the user's average glucose level. The report can include an assessment of one or more meals based on how the user's glucose level responded to the one or more meals. The assessment can utilize a rating system (e.g., “A” for a balanced glucose response, “F” for a poor glucose response, etc.). The report can also show recommendations generated by the insights and recommendation engine 230. A recommendation can compare two food items consumed by the user and suggest whether one of the two food items is a healthier option than the other based on the user's physiological responses. In some examples, a recommendation can compare two types of bread (whole grain bread vs. white bread) and recommend swapping white bread for whole grain alternatives. Another recommendation can compare two types of desserts (ice cream and fruits vs. walnut-stuffed dates) and suggest that the walnut-stuffed dates may be a healthier option than the ice cream and fruits. A different recommendation can compare two types of beverages (a honey-containing drink vs. an artificial sweetener-containing drink) and recommend swapping artificial sweeteners for a tablespoon of honey. Additionally, the report can show a correlation between the number of steps the user took and the user's respective average blood glucose level. The report can inform the user that the user's blood glucose level decreased from 132 mg/dL to 125 mg/dL when the user increased the number of steps from less than 3,000 steps to more than 10,000 steps per day. Furthermore, the report can show a correlation between sleep and blood glucose level. The report can inform the user that the user's blood glucose level decreased from 129 mg/dL to 115 mg/dL when the user increased a number of hours of sleep from less than 6 hours to more than 8 hours.
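The step-count comparison in the report can be produced by grouping days into step buckets and averaging the glucose level within each bucket. The following minimal sketch assumes one record per day consisting of a step count and an average glucose value. The bucket labels, function name, and sample data are hypothetical, with the bucket edges taken from the 3,000 and 10,000 step figures above.

```python
from typing import Dict, List, Sequence, Tuple


def step_bucket(steps: int) -> str:
    """Assign a daily step count to one of three buckets."""
    if steps < 3000:
        return "<3000"
    if steps > 10000:
        return ">10000"
    return "3000-10000"


def average_glucose_by_steps(days: Sequence[Tuple[int, float]]) -> Dict[str, float]:
    """Average daily glucose (mg/dL) for each step bucket that has data."""
    grouped: Dict[str, List[float]] = {}
    for steps, glucose in days:
        grouped.setdefault(step_bucket(steps), []).append(glucose)
    return {bucket: sum(vals) / len(vals) for bucket, vals in grouped.items()}


daily_log = [(2500, 133.0), (2000, 131.0), (12000, 126.0), (11000, 124.0), (5000, 128.0)]
by_steps = average_glucose_by_steps(daily_log)
```

With this sample log the averages are 132 mg/dL for days under 3,000 steps and 125 mg/dL for days over 10,000 steps. The same grouping could be applied to hours of sleep.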

While preferred embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the present disclosure. It should be understood that various alternatives to the embodiments described herein may be employed. It is intended that the following claims define the scope of the disclosure, and that methods and structures within the scope of these claims and their equivalents be covered thereby.
