The present invention, in some embodiments thereof, relates to medicine and, more particularly, but not exclusively, to a method and system for predicting gestational diabetes.
Gestational diabetes mellitus (GDM) is a common medical complication of pregnancy, in which pregnant women without a previous diagnosis of diabetes develop glucose intolerance, posing an increased risk of short- and long-term complications for both mother and offspring. If undiagnosed, GDM may cause complications of pregnancy that can increase the risk of a number of maternal-fetal disorders, including macrosomia, shoulder dystocia or birth injury, premature delivery, and preeclampsia. In addition to the increased risk of complications associated with gestation and delivery, there are also serious postnatal complications associated with GDM. For instance, women with GDM may have diabetes immediately after pregnancy, and women who have had GDM have a higher chance of developing diabetes in subsequent years. Children of mothers with GDM have a higher risk of developing type-2 diabetes in later life. Thus, untreated GDM contributes to the overall diabetic population in both the short and long term. Currently, GDM is typically diagnosed at 24-28 weeks of gestation.
According to an aspect of some embodiments of the present invention there is provided a method of predicting likelihood for gestational diabetes. The method comprises: obtaining a plurality of parameters characterizing a female subject; accessing a computer readable medium storing a machine learning procedure trained for predicting likelihoods for gestational diabetes; feeding the procedure with the plurality of parameters; and receiving from the procedure an output indicative of a likelihood that the subject has, or is expected to develop, gestational diabetes, wherein the output is related non-linearly to the parameters.
According to some embodiments of the invention the plurality of parameters comprises at least one parameter extracted from an electronic health record associated with the subject.
According to some embodiments of the invention the method comprises presenting to the subject, by a user interface, a questionnaire and a set of questionnaire controls, receiving a set of response parameters entered by the subject using the questionnaire controls, wherein the plurality of parameters comprises the response parameters.
According to some embodiments of the invention the plurality of parameters comprises at least one parameter extracted from a body liquid test applied to the subject.
According to some embodiments of the invention the plurality of parameters comprises at least one parameter extracted from a diagnosis previously recorded for the subject.
According to some embodiments of the invention the plurality of parameters comprises at least one parameter indicative of a pharmaceutical prescribed for the subject.
According to some embodiments of the invention the female subject is pregnant.
According to some embodiments of the invention the pregnant subject is at less than 12 weeks or less than 11 weeks or less than 10 weeks or less than 9 weeks or less than 8 weeks or less than 7 weeks or less than 6 weeks or less than 5 weeks gestation.
According to some embodiments of the invention the pregnant subject is at less than 8 weeks gestation.
According to some embodiments of the invention the pregnant subject is at less than 6 weeks gestation.
According to some embodiments of the invention the plurality of parameters comprises a result of a blood glucose test applied to the subject.
According to some embodiments of the invention the plurality of parameters comprises an absolute neutrophil count (NEUT.abs) obtained from blood of the subject.
According to some embodiments of the invention the plurality of parameters comprises a white blood cell count (WBC) obtained from blood of the subject.
According to some embodiments of the invention the plurality of parameters comprises a result of a blood glucose test applied to the subject within about 1 year before the pregnancy.
According to some embodiments of the invention the female subject is not pregnant.
According to some embodiments of the invention the female subject is undergoing an assisted reproduction treatment.
According to some embodiments of the invention the assisted reproduction treatment is selected from the group consisting of in vitro fertilization (IVF), Gamete Intrafallopian Transfer Procedure (GIFT), Zygote Intrafallopian Transfer Procedure (ZIFT), Intracytoplasmic Sperm Injection (ICSI), Intrauterine Insemination (IUI), and Therapeutic Donor Insemination (TDI).
According to some embodiments of the invention the plurality of parameters comprises at least one parameter indicative of a glucose tolerance test applied to the subject.
According to some embodiments of the invention the subject has been previously pregnant, and wherein the plurality of parameters comprises a result of a glucose tolerance test applied to the subject during a previous pregnancy.
According to some embodiments of the invention the subject has been previously pregnant, and wherein the plurality of parameters comprises a result of a blood glucose test applied to the subject during a previous pregnancy.
According to some embodiments of the invention the previous pregnancy is a most recent previous pregnancy.
According to some embodiments of the invention the previous pregnancy is a next-to-most recent previous pregnancy.
According to some embodiments of the invention the plurality of parameters comprises at least 10 or at least 20 or at least 30 or at least 40 or at least 50 or at least 100 or at least 200 or at least 300 or at least 400 or at least 500 or at least 1,000 or at least 1,500 or more of the parameters listed in Table 6.1.
According to some embodiments of the invention the plurality of parameters comprises at least 10 or at least 12 or at least 14 or at least 16 of the parameters that are listed at lines 1-40 more preferably lines 1-30 more preferably lines 1-20 of Table 6.1.
According to some embodiments of the invention the plurality of parameters comprises at least 20 or at least 22 or at least 24 or at least 26 or at least 30 or at least 32 or at least 34 or at least 36 of the parameters that are listed at lines 1-50 more preferably lines 1-45 more preferably lines 1-40 of Table 6.1.
According to some embodiments of the invention the plurality of parameters comprises at least 50 or at least 60 or at least 70 or at least 80 or at least 90 of the parameters that are listed at lines 1-300 more preferably lines 1-200 more preferably lines 1-100 of Table 6.1.
According to an aspect of some embodiments of the present invention there is provided a method of predicting likelihood for gestational diabetes. The method comprises: presenting on a user interface a questionnaire and a set of questionnaire controls, and receiving from the user interface a set of response parameters entered using the questionnaire controls, wherein the set of response parameters characterizes a female subject; accessing a computer readable medium storing a machine learning procedure trained for predicting likelihoods for gestational diabetes; feeding the procedure with the set of response parameters; and receiving from the procedure an output indicative of a likelihood that the female subject has, or is expected to develop, gestational diabetes, wherein the output is related non-linearly to the parameters.
According to an aspect of some embodiments of the present invention there is provided a method of determining whether to apply a glucose tolerance test (GTT) to a female subject that has been previously pregnant. The method comprises: obtaining a plurality of parameters characterizing the female subject, wherein the plurality of parameters comprises a result of a GTT applied to the subject during a previous pregnancy; accessing a computer readable medium storing a machine learning procedure trained for predicting likelihoods for gestational diabetes; feeding the procedure with the plurality of parameters; receiving from the procedure an output indicative of a likelihood that the subject has, or is expected to develop, gestational diabetes, wherein the output is related non-linearly to the parameters; and when the likelihood is below a predetermined threshold, generating an output recommending not to apply the GTT to the subject.
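By way of non-limiting illustration only, the threshold-based recommendation of this aspect can be sketched in Python as follows. The function names `toy_procedure` and `recommend_gtt`, the parameter keys, the numeric cut-offs inside the stand-in procedure and the example threshold are all hypothetical and are not taken from the present disclosure; in particular, `toy_procedure` merely stands in for a trained machine learning procedure and has no clinical validity.

```python
import math
from typing import Callable, Mapping, Optional

Params = Mapping[str, float]


def toy_procedure(params: Params) -> float:
    """Hypothetical stand-in for a trained procedure (NOT an actual model).

    A few threshold rules are summed and passed through a logistic function,
    so the returned likelihood is non-linear in the input parameters.
    All cut-off values below are arbitrary illustrative numbers.
    """
    score = -3.0
    if params["previous_gtt_mg_dl"] > 140.0:
        score += 2.5
    if params.get("fasting_glucose_mg_dl", 0.0) > 95.0:
        score += 1.5
    if params.get("bmi", 0.0) > 30.0:
        score += 1.0
    return 1.0 / (1.0 + math.exp(-score))


def recommend_gtt(
    params: Params,
    procedure: Callable[[Params], float],
    threshold: float,
) -> Optional[str]:
    """Return a 'do not apply GTT' recommendation when the likelihood is low.

    Returning None otherwise is an assumption of this sketch; the text above
    only specifies the output for the below-threshold case.
    """
    if "previous_gtt_mg_dl" not in params:
        raise ValueError("a GTT result from a previous pregnancy is required")
    likelihood = procedure(params)
    if likelihood < threshold:
        return f"likelihood {likelihood:.3f} below {threshold}: GTT not recommended"
    return None
```

For example, under these assumptions, `recommend_gtt({"previous_gtt_mg_dl": 100.0}, toy_procedure, 0.1)` returns a recommendation string, whereas a subject whose parameters trigger all three rules yields `None`.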
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
Implementation of the method and/or system of embodiments of the invention can involve performing or completing selected tasks manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of embodiments of the method and/or system of the invention, several selected tasks could be implemented by hardware, by software or by firmware or by a combination thereof using an operating system.
For example, hardware for performing selected tasks according to embodiments of the invention could be implemented as a chip or a circuit. As software, selected tasks according to embodiments of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In an exemplary embodiment of the invention, one or more tasks according to exemplary embodiments of method and/or system as described herein are performed by a data processor, such as a computing platform for executing a plurality of instructions. Optionally, the data processor includes a volatile memory for storing instructions and/or data and/or a non-volatile storage, for example, a magnetic hard-disk and/or removable media, for storing instructions and/or data. Optionally, a network connection is provided as well. A display and/or a user input device such as a keyboard or mouse are optionally provided as well.
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced.
In the drawings:
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The processing operations of the present embodiments can be embodied in many forms. For example, they can be embodied on a tangible medium such as a computer for performing the operations. They can be embodied on a computer readable medium, comprising computer readable instructions for carrying out the method operations. They can also be embodied in an electronic device having digital computer capabilities arranged to run the computer program on the tangible medium or execute the instructions on a computer readable medium.
Computer programs implementing the method according to some embodiments of this invention can commonly be distributed to users on a distribution medium such as, but not limited to, CD-ROM, flash memory devices, flash drives, or, in some embodiments, drives accessible by means of network communication, over the internet (e.g., within a cloud environment), or over a cellular network. From the distribution medium, the computer programs can be copied to a hard disk or a similar intermediate storage medium. The computer programs can be run by loading the computer instructions either from their distribution medium or their intermediate storage medium into the execution memory of the computer, configuring the computer to act in accordance with the method of this invention. Computer programs implementing the method according to some embodiments of this invention can also be executed by one or more data processors that belong to a cloud computing environment. All these operations are well-known to those skilled in the art of computer systems. Data used and/or provided by the method of the present embodiments can be transmitted by means of network communication, over the internet, over a cellular network or over any type of network, suitable for data transmission.
The method begins at 10 and continues to 11 at which a plurality of parameters characterizing a female subject is obtained. The inventors discovered that the likelihood for gestational diabetes can be predicted both during the pregnancy and before the pregnancy.
Thus, in some embodiments of the present invention the female subject is pregnant. Preferably, the pregnant subject is at less than 12 weeks or less than 11 weeks or less than 10 weeks or less than 9 weeks or less than 8 weeks or less than 7 weeks or less than 6 weeks or less than 5 weeks gestation.
Alternatively, the female subject is not pregnant. For example, the female subject can be a female subject that desires to be pregnant, or that is expected to be pregnant. In some embodiments of the present invention the female subject is undergoing an assisted reproduction treatment, such as, but not limited to, in vitro fertilization (IVF), Gamete Intrafallopian Transfer Procedure (GIFT), Zygote Intrafallopian Transfer Procedure (ZIFT), Intracytoplasmic Sperm Injection (ICSI), Intrauterine Insemination (IUI), or Therapeutic Donor Insemination (TDI).
At least one of the parameters that are obtained at 11, more preferably more than one of these parameters, more preferably at least 10 or at least 20 or at least 30 or at least 40 or at least 50 or at least 100 or at least 200 or at least 300 or at least 400 or at least 500 or at least 1,000 or at least 1,500 or more of the parameters are extracted from an electronic health record associated with the subject. Parameters extracted from an electronic health record can include, but are not limited to, anthropometric parameters (e.g., height, weight, body mass index), blood pressure measurements, blood and urine laboratory tests, diagnoses recorded by physicians, and/or pharmaceuticals prescribed to the subject.
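By way of non-limiting illustration only, the extraction of parameters from an electronic health record can be sketched in Python as follows. The record layout (keys such as `height_cm`, `lab_tests`, `diagnoses`), the function name `extract_parameters` and the `lab:`, `dx:` and `rx:` key prefixes are hypothetical choices made for this sketch; an actual electronic health record system may be organized differently.

```python
from typing import Any, Dict


def extract_parameters(record: Dict[str, Any]) -> Dict[str, float]:
    """Flatten a (hypothetical) health record into numeric parameters."""
    params: Dict[str, float] = {}

    # Anthropometric parameters, including a derived body mass index.
    height_m = record["height_cm"] / 100.0
    params["height_cm"] = float(record["height_cm"])
    params["weight_kg"] = float(record["weight_kg"])
    params["bmi"] = record["weight_kg"] / (height_m ** 2)

    # Blood pressure measurement given as a (systolic, diastolic) pair.
    systolic, diastolic = record["blood_pressure"]
    params["bp_systolic"] = float(systolic)
    params["bp_diastolic"] = float(diastolic)

    # Blood and urine laboratory tests keep their numeric result.
    for name, value in record.get("lab_tests", {}).items():
        params[f"lab:{name}"] = float(value)

    # Recorded diagnoses and prescribed pharmaceuticals become indicators.
    for diagnosis in record.get("diagnoses", []):
        params[f"dx:{diagnosis}"] = 1.0
    for drug in record.get("prescriptions", []):
        params[f"rx:{drug}"] = 1.0

    return params
```

Encoding diagnoses and prescriptions as 0/1 indicators is one simple option among many and is assumed here only to keep the sketch short.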
A list of parameters from which the parameters can be selected is provided in Table 6.1 of the Examples section that follows. In some embodiments of the present invention at least 10 or at least 20 or at least 30 or at least 40 or at least 50 or at least 100 or at least 200 or at least 300 or at least 400 or at least 500 or at least 1,000 or at least 1,500 or more of the parameters are selected from the parameters listed in Table 6.1. Preferably, but not necessarily, at least 10 or at least 12 or at least 14 or at least 16 of the parameters are selected from the parameters that are listed at lines 1-40 more preferably lines 1-30 more preferably lines 1-20 of Table 6.1. In some embodiments, at least 20 or at least 22 or at least 24 or at least 26 or at least 30 or at least 32 or at least 34 or at least 36 of the parameters are selected from the parameters that are listed at lines 1-50 more preferably lines 1-45 more preferably lines 1-40 of Table 6.1. In some embodiments, at least 50 or at least 60 or at least 70 or at least 80 or at least 90 of the parameters are selected from the parameters that are listed at lines 1-300 more preferably lines 1-200 more preferably lines 1-100 of Table 6.1.
Also contemplated are embodiments in which the parameters are selected from a set of response parameters that are provided by the subject, or on behalf of the subject, by responding to a questionnaire presented to the subject, or to someone on behalf of the subject. These parameters can include anthropometric parameters (e.g., height, weight, body mass index), a parameter indicative of the age of the subject, one or more parameters indicative of history of diabetes for the subject and for close (e.g., first-degree) relatives of the subject, one or more parameters indicative of diagnoses the subject is aware of (e.g., high cholesterol, heart disease, polycystic ovary syndrome, GDM, high blood pressure), one or more parameters indicative of blood test results the subject is aware of (e.g., Hemoglobin A1c test), pregnancy history, and results of GTT (if taken) during previous pregnancies. A representative example of a questionnaire that can be presented to the subject is shown in
In some embodiments of the present invention the parameters include only parameters extracted from an electronic health record associated with the subject, in some embodiments of the present invention the parameters include only response parameters that are provided by the subject, or on her behalf, and in some embodiments of the present invention the parameters include both parameters extracted from an electronic health record associated with the subject, and response parameters that are provided by the subject or on her behalf.
The number of parameters that are extracted from an electronic health record associated with the subject is preferably at least 10 or at least 20 or at least 30 or at least 40 or at least 50 or at least 100 or at least 200 or at least 300 or at least 400 or at least 500 or at least 1,000 or at least 1,500 or more. The number of response parameters that are provided by the subject or on her behalf is preferably 20 or less, or 15 or less, or 10 or less. The advantage of this embodiment is that a relatively small number of parameters allows the subject to respond manually to the questionnaire in a relatively short time.
When the parameters include both parameters extracted from an electronic health record associated with the subject, and response parameters that are provided by the subject or on her behalf, the number of parameters that are extracted from an electronic health record is optionally and preferably significantly larger (e.g., at least 2 or at least 4 or at least 6 or at least 8 or at least 10 or at least 20 or at least 40 times larger) than the number of response parameters that are provided by the subject or on her behalf.
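By way of non-limiting illustration only, combining the two kinds of parameters and checking how much larger the record-derived set is can be sketched in Python as follows. The function names `combine_parameters` and `size_ratio` and the `ehr:` and `q:` key prefixes are hypothetical and serve only to keep the two sources distinguishable in this sketch.

```python
from typing import Dict, Mapping


def combine_parameters(
    ehr_params: Mapping[str, float],
    response_params: Mapping[str, float],
) -> Dict[str, float]:
    """Merge record-derived and questionnaire-derived parameters.

    Keys are prefixed so that a parameter reported by the subject (e.g. weight)
    does not overwrite the value found in the health record.
    """
    combined = {f"ehr:{name}": value for name, value in ehr_params.items()}
    combined.update({f"q:{name}": value for name, value in response_params.items()})
    return combined


def size_ratio(
    ehr_params: Mapping[str, float],
    response_params: Mapping[str, float],
) -> float:
    """Number of record-derived parameters per response parameter."""
    if not response_params:
        raise ValueError("at least one response parameter is required")
    return len(ehr_params) / len(response_params)
```

With 40 record-derived parameters and 10 response parameters, for instance, `size_ratio` evaluates to 4.0, which corresponds to the "at least 4 times larger" option mentioned above.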
In some embodiments of the present invention at least one of the parameters is extracted from a body liquid test applied to the subject. Representative examples of body liquid tests from which a parameter can be extracted according to some embodiments of the present invention include, without limitation, 17-OH-PROGESTERONE, A.B2 GLYCOPROTEI IgG, A.B2 GLYCOPROTEI IgM, ALBUMIN, ALK. PHOSPHATASE, ALY %, ALY, AMYLASE, ANISO-F, ANTI BODY SCREEN I, ANTI CARDIOLIPIN IgG, ANTI CARDIOLIPIN IgM, ANTI THYROID PEROXID, ANTINUCLEAR Ab_(ANA), ANTITHROMBIN-III, APTT-R, APTT-sec, BASO %, BASO abs, BILIRUBIN INDIRECT, BILIRUBIN TOTAL, BILIRUBIN-U STRIP, BILIRUBIN-DIRECT, BLAST-F, BLOOD TYPE, C-REACTIVE PROTEIN, CALCIUM, CH, CHLORIDE, CHOLESTEROL, CHOLESTEROL-HDL, CHOLESTEROL-LDL calc, CHOLESTEROL/HDL, CK-CREAT.KINASE(CPK), CMV AVIDITY, CMV IgG (Add VIDAS), CMV IgM(Add.-VIDAS), CMV IgM, COMPLEMENT C3, COMPLEMENT C4, CONTROL PT, CONTROL PTT, CREATININE, CREATININE-U SAMPLE, DHEA SULPHATE, DNA (ds) Ab, EBV IgG-EBNA, EBV VCA IgM, EBV VCA_IgG, EOS %, EOS.abs, EPITHELIAL-SED, ERYTHROCYTES-SED, ERYTHROCYTES-U STRIP, ESR, ESTRADIOL (E-2), FERRITIN, FIBRINOGEN CALCU, FIBRINOGEN, FOLIC ACID, FREE ANDROGEN INDEX, FSH, First isolate, GGT, GLOBULIN, GLOM.FILTR.RATE, GLUCOSE-U STRIP, GLUCOSE, GPT (ALT), HB-other, HB, HCT, HCT/HGB Ratio, HDW, HEMOGLOBIN A, HEMOGLOBIN A1C %, HEMOGLOBIN C, HEMOGLOBIN H, HEMOGLOBIN O, HEMOGLOBIN S, HEMOLYTIC, HEPATITIS Bs Ab, HEPATITIS Bs Ag, HEPATITIS C Ab, HYPER %, HbA, HbA2, HbF, ICTERIC, IRON, KETONES-U STRIP, LDH, LEFT SHIFT, LEUCOCYTES-U STRIP, LEUCOCYTES-SED, LH, LI, LIC %, LIC, LIPASE, LIPEMIC, LUC abs, LUC %, LYM %, LYMP.abs, MACRO %, MAGNESIUM, MCH, MCHC, MCV, MICRO %, MICRO %/HYPO %, MICRO-F, MICROALBUMIN-U SAMP, MICROALBUMIN/CREAT, MONO %, MONO.abs, MPXI, NEUT %, NEUT.abs, NITRITE-U STRIP, NON-HDL_CHOLESTEROL, NORMOBLAST. %, NORMOBLAST.abs, PCT, PDW, PH-U STRIP, PHOSPHORUS, PLATLATE CLUMPS, PLT, POTASSIUM, PROGESTERONE, PROLACTIN, PROT-S ANTIGEN (FREE, PROTEIN C ACTIVITY, PROTEIN-U SAMPLE, PROTEIN-URINE 24 h, PROTEIN-TOTAL, PT %, PT-INR, PT-SEC, RAPID PL.REAGIN-VDRL, RBC, RDW, RDW-CV, RDW-SD, RETICUL. COUNT abs, RETICULOCYTES COUNT %, RHEUMATOID FACTOR, RUBELLA Ab IgG, SHBG, SODIUM, SPECIFIC GRAV-U STRI, T3-FREE, TESTOSTERONE-TOTAL, THYROGLOBULIN Ab, TOXOPLASMA IgG, TOXOPLASMA IgM, TRANSFERRIN, TRIGLYCERIDES, TSH, UREA, URIC ACID, UROBILINOGEN-U STRIP, Urine culture, VARICELLA ZOSTER IgG, VARICELLA ZOSTER IgM, VITAMIN B12, VITAMIN D (25-OH), WBC, wherein the tests are applied during any of the time windows F0, F1, F2, M1, M2, M3, M4, M5, P1, P2, and P3 defined in the Examples section that follows.
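By way of non-limiting illustration only, one possible way to arrange test results by time window into an ordered list of parameters is sketched in Python below. The window labels are those named above, but their definitions are given only in the Examples section and are not reproduced here; the `TEST@WINDOW` naming scheme, the function names and the use of `None` for a test not taken in a window are assumptions of this sketch.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Window labels as named in the text; their time ranges are defined elsewhere.
TIME_WINDOWS: Tuple[str, ...] = (
    "F0", "F1", "F2", "M1", "M2", "M3", "M4", "M5", "P1", "P2", "P3",
)


def feature_names(
    tests: Sequence[str],
    windows: Sequence[str] = TIME_WINDOWS,
) -> List[str]:
    """One feature name per (test, time window) pair, e.g. 'GLUCOSE@M1'."""
    return [f"{test}@{window}" for test, window in product(tests, windows)]


def build_feature_vector(
    measurements: Dict[Tuple[str, str], float],
    tests: Sequence[str],
    windows: Sequence[str] = TIME_WINDOWS,
) -> List[Optional[float]]:
    """Ordered values matching feature_names(); None marks a missing test."""
    return [measurements.get((test, window)) for test, window in product(tests, windows)]
```

Keeping an explicit placeholder for missing entries, rather than dropping them, keeps every subject's vector the same length regardless of which tests were actually performed.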
In some embodiments of the present invention one or more of the parameters is a result of a blood glucose test applied to said subject.
In some embodiments of the present invention one or more of the parameters is an absolute neutrophil count (NEUT.abs) obtained from blood of the subject.
In some embodiments of the present invention one or more of the parameters is a white blood cell count (WBC) obtained from blood of the subject.
In some embodiments of the present invention one or more of the parameters is a result of a blood glucose test applied to the subject within about 1 year before the pregnancy.
In some embodiments of the present invention the parameters comprise at least one parameter extracted from a clinical or hospital diagnosis previously recorded for the subject. Representative examples of clinical and hospital diagnoses which can be used as parameters according to some embodiments of the present invention include, without limitation, ABDOMINAL PAIN, ABORTION INDUCED BY MEDICATION, ABSENCE OF MENSTRUATION, ACCIDENT/INJURY, ACNE, ACQUIRED HYPOTHYROIDISM, ACUTE APPENDICITIS WITHOUT MENTION OF PERITONITIS, ACUTE BRONCHITIS, ACUTE CONJUNCTIVITIS, ACUTE NASOPHARYNGITIS, ACUTE NONSUPPURATIVE OTITIS MEDIA, ACUTE PHARYNGITIS, ACUTE SINUSITIS, ACUTE TONSILLITIS, ACUTE UPPER RESPIRATORY INFECTIONS OF MULTIPLE OR UNSP.SITES, ACUTE UPPER RESPIRATORY INFECTIONS OF UNSPECIFIED SITE, AFTERCARE CHECK-UP NO DISEASE, ALLERGIC RHINITIS, ALLERGY, ALOPECIA, ANAL FISSURE, ANEMIA, ANEMIA COMPLICATING PREGNANCY, ANEMIA OTHER/UNSPECIFIED, ANKLE SPRAIN, ANTEPARTUM ANEMIA, ANTEPARTUM INFECTIONS OF GENITOURINARY TRACT, ANTEPARTUM THYROID DYSFUNCTION, ANXIETY STATES, ARTIFICIAL INSEMINATION, ASPIRATION CURETTAGE FOLLOWING DELIVERY OR ABORTION, ASPIRATION OF OVARY, ASTHMA, ASYMPTOMATIC BACTERIURIA IN PREGNANCY, BACK SYMPTOMS/COMPLAINTS, BACKACHE, BENIGN NEOPLASM OF SKIN, BLEPHARITIS, BREAST PAIN, BREAST SCREENING FOR MALIGNANT NEOPLASMS, CALCUL.OF GALLBLADDER WITHOUT CHOLECYSTITIS, CALCULUS OF GALLBLADDER WITHOUT MENTION OF CHOLECYSTITIS, CANDIDIASIS OF VULVA AND VAGINA, CARPAL TUNNEL SYNDROME, CELLULITIS AND ABSCESS OF UNSPECIFIED SITES, CERCLAGE-MC DONALD, CERCLAGE-UNSPECIFIED, CERVICAL INCOMPETENCE, CERVICAL SHORTENING COMPLICATING PREGNANCY, CERVICALGIA, CESAREAN SECTION AND REMOVAL OF FETUS, CHALAZION, CHEST PAIN, CHEST SYMPTOMS/COMPLAINTS, CHRONIC LYMPHOCYTIC THYROIDITIS, COAGULATION DEFECTS, COLITIS, CONSTIPATION, CONSULTATION, CONTACT DERMATITIS AND OTHER ECZEMA, CONTRACEPTIVE MANAGEMENT, CONTUSION OF BACK, CONTUSION OF UNSPECIFIED SITE, CORPUS LUTEUM CYST OR HEMATOMA, 
COUGH, CYSTITIS, DEBILITY, DERMATOPHYTOSIS OF FOOT, DERMATOPHYTOSIS OF NAIL, DIAGNOSTIC ULTRASOUND, DIAGNOSTIC ULTRASOUND OF ABDOMEN AND RETROPERITONEUM, DIAGNOSTIC ULTRASOUND OF GRAVID UTERUS, DIAGNOSTIC ULTRASOUND OF URINARY SYSTEM, DIARRHEA, DILATION AND CURETTAGE FOLLOWING DELIVERY OR ABORTION, DISEASES AND CONDITIONS OF THE TEETH AND SUPPORT.STRUCTURES, DISTURBANCE OF SKIN SENSATION, DIZZINESS AND GIDDINESS, DYSCHROMIA, DYSPEPSIA AND OTHER SPECIFIED DISORDERS OF FUNCTION OF STOMACH, DYSPLASIA OF CERVIX, DYSPNEA AND RESPIRATORY ABNORMALITIES, DYSURIA, E.T., ENCOUNTER FOR ASSISTED REPRODUCTIVE FERTILITY PROCEDURE CYCLE, ENCOUNTER FOR OTHER SPECIFIED PROCREATIVE MANAGEMENT, ENCOUNTERS FOR ADMINISTRATIVE PURPOSES, ENCOUNTERS FOR DISABILITY EXAMINATION, ENLARGEMENT OF LYMPH NODES, EPILEPSY, EPISTAXIS, EROSION AND ECTROPION OF CERVIX, ESOPHAGITIS, ESSENTIAL HYPERTENSION, EXCESSIVE VOMITING IN PREGNANCY, FAMILY HISTORY OF DIABETES MELLITUS, FAMILY HISTORY OF MALIGNANT NEOPLASM OF BREAST, FEVER, FILL OUT FORMS, FLANK SYMPTOMS/COMPLAINTS, FOLLICULAR CYST OF OVARY, FOLLOW-UP EXAMINATION, GENETIC COUNSELING ON PROCREATIVE MANAGEMENT, GYNECOLOGICAL EXAMINATION, HABITUAL ABORTER, HEADACHE, HEARTBURN, HEMATURIA, HEMORRHAGE OF RECTUM AND ANUS, HEMORRHOIDS, HERPES SIMPLEX WITHOUT MENTION OF COMPLICATION, HORDEOLUM AND OTHER DEEP INFLAMMATION OF EYELID, HYDRONEPHROSIS, HYPEREMESIS GRAVIDARUM WITH METABOLIC DISTURBANCE, HYSTEROSCOPY, I.V.F.,
IMPACTED CERUMEN, IN VITRO FERTILIZATION OR TEST TUBE PREGNANCY, INFECTIONS OF GENITOURINARY TRACT IN PREGNANCY, INFECTIOUS DIARRHEA, INFECTIVE OTITIS EXTERNA, INFERTILITY, INFERTILITY/SUBFERTILITY, INFLUENZA, INGROWING NAIL, INJECT.OR INFUSION OF OTHER THERAPEUTIC OR PROPHYLACTIC SUBS., INJECTION OF ANTIBIOTIC, INJECTION OF RH IMMUNE GLOBULIN, INJURIES, INSECT BITE, IRON DEFICIENCY ANEMIA, IRREGULAR MENSTRUAL CYCLE, ISSUE OF MEDICAL CERTIFICATE, ISSUE OF REPEAT PRESCRIPTION, KNEE SYMPTOMS/COMPLAINTS, LABORATORY EXAMINATION, LAPAROSCOPIC APPENDECTOMY, LAPAROSCOPIC CHOLECYSTECTOMY 1992, LAPAROSCOPY, LATE EFFECT OF COMPLICATION OF PREGNANCY, LATE EFFECT OF INJURY TO CRANIAL NERVE, LEG.INDUCED ABORTION, LEUKORRHEA, LOW BACK COMPLT.W/O RADIATION, LUMBAGO, MANUAL EXAMINATION OF BREAST, MIGRAINE, MILD HYPEREMESIS GRAVIDARUM, MISSED ABORTION, MOTHER WITH SINGLE LIVEBORN, MYALGIA AND MYOSITIS, MYOPIA, NAUSEA, NAUSEA AND VOMITING, NECK SPRAIN, NO DISEASE, NONSPECIFIC ELEVATION OF LEVELS OF TRANSAMINASE OR LDH, NONSPECIFIC FINDINGS ON EXAMINATION OF BLOOD, NORMAL PREGNANCY, NOTHING ABNORMAL FOUND, OBESITY, OBSERVATION-PATIENT UNDER OBSERVATION, OBSERVATION AND EVALUATION FOR SUSPECTED CONDITIONS, OBSERVATION FOLLOWING OTHER ACCIDENT, OBSERVATION FOR UNSPECIFIED SUSPECTED CONDITION, ORAL APHTHAE, ORAL CONTRACEPTION COUNSELING, ORAL CONTRACEPTION PRESCRIPTION, OTALGIA, OTHER, OTHER ABNORMAL PRODUCTS OF CONCEPTION, OTHER ACNE, OTHER ADVANCED MATERNAL AGE, OTHER AND UNSPEC.NONINFECTIOUS GASTROENTERITIS AND COLITIS, OTHER AND UNSPECIFIED ESCHERICHIA COLI, OTHER AND UNSPECIFIED INJURY TO UNSPECIFIED SITE, OTHER AND UNSPECIFIED OVARIAN CYST, OTHER ATOPIC DERMATITIS AND RELATED CONDITIONS, OTHER B-COMPLEX DEFICIENCIES, OTHER DIAGNOSTIC ULTRASOUND, OTHER DISORDERS OF LACRIMAL GLAND, OTHER DYSPNEA AND RESPIRATORY ABNORMALITY, OTHER FETAL CONDITIONS, OTHER GENITOURINARY INSTILLATION, OTHER ILL-DEFINED AND UNKNOWN CAUSES OF MORBIDITY AND MORTALITY, OTHER ILL-DEFINED CONDITIONS, 
OTHER MALAISE AND FATIGUE, OTHER NONTHROMBOCYTOPENIC PURPURAS, OTHER OVARIAN HYPERFUNCTION, OTHER SPECIFIED DISEASES OF HAIR AND HAIR FOLLICLES, OTHER SPECIFIED NONINFLAMMATORY DISORDERS OF VAGINA, OTHER SPECIFIED VIRAL WARTS, OTHER THALASSEMIA, PAIN IN JOINT, PAIN IN LIMB, PALPITATIONS, PERSONAL HISTORY OF ALLERGY TO ANALGESIC AGENT, PERSONAL HISTORY OF ALLERGY TO OTHER ANTIBIOTIC AGENT, PERSONAL HISTORY OF ALLERGY TO PENICILLIN, PERSONAL HISTORY OF ALLERGY TO SPECIFIED MEDICINAL AGENTS, PITYRIASIS VERSICOLOR, POLYCYSTIC OVARIES, POSTCOITAL BLEEDING, PREGNANCY EXAMINATION OR TEST, PREGNANCY HIGH RISK, PREGNANCY WITH HISTORY OF PRE-TERM LABOR, PREGNANCY: CONFIRMED, PREGNANT STATE, PROTEINURIA, PRURITUS OF GENITAL ORGANS, PYELONEPHRITIS, RASH AND OTHER NONSPECIFIC SKIN ERUPTION, REGIONAL ENTERITIS OF UNSPECIFIED SITE, RELEASE OF TORSION OF OVARY, RENAL COLIC, ROUTINE GENERAL MEDICAL EXAMINATION AT A HEALTH CARE FACILITY, SCANTY OR INFREQUENT MENSTRUATION, SCIATICA, SCREENING FOR OTHER SPECIFIED CONDITIONS, SEBACEOUS CYST, SEBORRHEIC DERMATITIS, SHOULDER SYMPTOMS/COMPLAINTS, SIGNS AND SYMPTOMS IN BREAST, SPONTANEOUS ABORTION, SUPERVISION OF HIGH RISK PREGNANCY RESULTING FROM ASSISTED REPRODUCTIVE TECHNOLOGY, SUPERVISION OF HIGH-RISK PREGN.WITH POOR OBSTETRIC HISTORY, SUPERVISION OF HIGH-RISK PREGNANCY, SUPERVISION OF HIGH-RISK PREGNANCY WITH HISTORY OF ABORTION, SUPERVISION OF NORMAL FIRST PREGNANCY, SUPERVISION OF OTHER HIGH-RISK PREGNANCY, SUPERVISION OF OTHER NORMAL PREGNANCY, SYNCOPE AND COLLAPSE, SYNOVITIS AND TENOSYNOVITIS, TEMPOROMANDIBULAR JOINT DISORDERS, THREATENED ABORTION, THREATENED PREMATURE LABOR, THYROTOXICOSIS WITH OR WITHOUT GOITER, TOBACCO USE DISORDER, TORSION OF OVARY, TWIN PREGNANCY, U.R.I., UNKNOWN CATEGORY, UNKNOWN OR UNSPECIFIED CAUSE OF MORBIDITY, UNSP.VIRAL INFECT.IN CONDITIONS CLASSIF.ELSEWHERE, UNSPECIFIED ABORTION, UNSPECIFIED ACQUIRED HYPOTHYROIDISM, UNSPECIFIED ANTEPARTUM HEMORRHAGE, UNSPECIFIED HEMORRHAGE IN EARLY PREGNANCY, 
UNSPECIFIED HYPERTROPHIC AND ATROPHIC CONDITIONS OF SKIN, UNSPECIFIED VOMITING OF PREGNANCY, URINARY TRACT INFECTION, URTICARIA, UTERINE LEIOMYOMA, UTERINE SCAR FROM PREV.SURGERY, VAGINAL HEMATOMA, VAGINITIS AND VULVOVAGINITIS, VARICOSE VEINS OF LOWER EXTREMITIES, VIRAL WARTS, VITAMIN D DEFICIENCY, VOICE DISTURBANCE, VOLUME DEPLETION DISORDER, VOMITING, VOMITING/NAUSEA OF PREGNANCY, wherein the diagnoses correspond to any of the time windows F0, F1, F2, M1, M2, M3, M4, M5, P1, P2, and P3 defined in the Examples section that follows.
In some embodiments of the present invention the parameters comprise at least one parameter indicative of a pharmaceutical prescribed for said subject. Representative examples of prescribed pharmaceuticals which can be used as parameters according to some embodiments of the present invention include, without limitation, ACAMOL CPL 500 MG 21, ACAMOLI FRUIT S/F SYR 125 mg/5 mL 100 mL, ACAMOLI STRAW. S/F SYR 125 mg/5 mL 100 mL, ACAMOLI SUP 150 MG 12, ACIDOPHILUS PROBIOTIC CAP BOX 30, ADVIL LIQUI-GELS CAP 200 MG 16, ADVIL LIQUI-GELS CAP 200 MG 40, AEROVENT SOL 0.25 mg/1 mL 20 mL, AFLUMYCIN CR CF 20 GM, AGISTEN 1 VAGINAL TAB 500 MG, AGISTEN ALOEVERA CR 1% 20 GM, AGISTEN V VAG. TAB 200 MG 3, AGISTEN VAG TAB 200 mg 3, AGISTEN VAGINAL CR 2% 20 GM, AKTIFERRIN SOFT CAP 34 mg 30, AKTIFERRIN-F SOFT CAP 30, ALCOHOL SPT 70% 100 ML, ALCOTINT SPT 70% 100 ML, ALLERGYX TAB 10 MG 10, AMOXICLAV TEVA TAB 875 mg 14, ANAESTHETIC AUR 10 mL, AUGMENTIN BID TAB 875 MG 14, AUGMENTIN TAB 500 MG 20, AVAMYS AQ. NASAL 120 INH SPR 27.5 MCG, BABY D3 DRP 2001U 10 ML, BACTROBAN OIN 2% 15 g, BEDODEKA INJ 1000 MCG/1 ML 100, BETACORTEN CR 0.1% 15 GM, BEVITEX SUBLINGUAL TAB 1000 MCG 30, CEFOVIT FORTE CAP 500 mg 20, CETROTIDE VIA 0.25 mg 7, CIPRALEX, CLARINASE REPETABS TAB 14, CLEXANE INJ 40 mg/0.4 mL 2×0.4 mL, CLEXANE SAFETY LOCK INJ 40 MG/0.4 ML 2, CLEXANE SAFETY LOCK INJ 60 MG/0.6 ML 2, CLOTREE-TEVA VAG OVL 200 MG 3, CLOTRIMAZOLE, COLCHICINE TAB 0.5 MG 30, COMAGIS CR CF 15 GM, CRINONE VAG. 
GEL 8% 15U, DALACIN VAGINAL OVL 100 MG 3, D-DROPS DRP 200 UNIT 10 mL, DECAPEPTYL INJ 0.1 mg/1 mL 7, DERMACOMBIN CR CF 15 GM, DESOREN AUR CF BOT 5 ML, DETHAMYCIN EYE, DEXAMOL COLD 30D+20N CPL, DEXAMOL CPL 500 MG 20, DEX-OTIC AUR CF BOT 5 ML, DICLECTIN DR TAB 10 mg/10 mg 100, DOXYLIN TAB 100 MG 10, D-TABS TAB 400 IU 90, DUPHASTON TAB 10 MG BOX 20, ELASTAN TREAT CR 75 ML, ELOCOM CR 0.1% 15 GM, ELTROXIN TAB 100 MCG 100, ELTROXIN TAB 50 MCG 100, EMLA CR 5% 30 GM, ENDOMETRIN VAGINAL TAB 100 MG BOX 30, ESTROFEM TAB 2 MG 28, EUTHYROX TAB 100 mcg 100, EUTHYROX TAB 50 mcg 100, FAMOTIDINE-TEVA TAB 20 MG 30, FEMINA SOAP LIQ 330 ML, FENISTIL GEL 0.1% 30 GM, FERRIFOL TAB CF BOX 30, FERRIPEL-3 SYR 50 MG/5 ML 110 ML, FERROGRAD FOLIC TAB CF BOX 30, FLIXONASE NASAL SPR 0.05%, FOLEX 400 TAB CF BOX 30, FOLI 5, FOLIC ACID, FORIC PREGNANCY TAB CF BOX 30, FOROL TAB 30, GENTLE IRON CAP 25 MG BOX 90, GESTON INJ 50 mg/1 mL 1 mL, GLYCERIN, GONAL-F-PEN INJ 3001U 0.5 ML, GONAL-F-PEN INJ 4501U 0.75 ML, GONAL-F-PEN INJ 900 IU 1.5 ML, GYNO-DAKTARIN VAG. CAP 400 mg 3, GYNO-PEVARYL VAG TAB 150 MG 3, HYDROAGISTEN CR CF 15 GM, IKACLOMIN TAB 50 MG 10, IRON BIS-GLYCINATE DH CAP 90, IRON PLUS FEMINA, KAMRHO-D I.M. INJ 300 mcg 2 mL, LANTON CAP 30 MG BOX 28, LEMOCIN CHERRY SUGAR FREE LOZ CF 24, LEMOCIN LEMON SUGAR FREE LOZ CF 24, LORATADINE, MAALOX SUS CF BOT 355 ML, MAARAZ 9M:OMEGA-3+PRENATA, MACRODANTIN 29/M CAP 100 MG 30, MAXITROL OCC CF TUB 3.5 GM, MELIANE TAB 21, MENOGON < >< > INJ 751U/751U 10, MENOPUR MULTIDOSE INJ 1200 UNIT, MENOPUR MULTIDOSE INJ 600 UNIT, MENOPUR VIA 751U 10, MICROPIRIN TAB 100 MG 28, MICROPIRIN TAB 75 MG 28, MICROPIRIN TAB 75 mg 30, MOXYPEN CAP 500 MG 10, MOXYPEN CAP 500 mg 20, MOXYVIT CAP 500 MG 20, NASOCORT AQUA NASAL SPR 64 MCG, NORMALAX PWD 240 GM, NUROFEN LIQUID CAP 200 MG BOX 20, NUROFEN LIQUID CAP 200 MG BOX 40, NUROFEN ORANGE CHILD SUS 100 MG/5 ML 100 ML, NUROFEN ORANGE CHILD SUS 100 MG/5 ML 150 ML, NUROFEN STRAWB.CHILD SUS 100 MG/5 ML 150 ML, NUROFEN STRAWB.PED. 
SUS 100 mg/5 mL 100 mL, NUSSIDEX TAB CF 20, OMEGA D3 9 MONTHS, OMEGA-3 9 MONTHS, OMEPRADEX CPL 20 mg 30, OPTALGIN CPL 500 MG 21, OPTALGIN CPL 500 mg 42, OPTALGIN DRP 500 MG/1 ML 10 ML, OPTALGIN TAB 500 MG 20, ORACORT E PAS TUB 5 g, ORGALUTRAN SYRI.< >< > INJ 0.25 MG 0.5 ML, OTHER, OTIDIN AUR CF 10 GM, OTRIMER SPR 0.9% 50 ML, OTRIVIN MENTHOL M.D SPR 0.1% 10 ML, OTRIVIN NASAL M.D. SPR 0.1% 10 ML, OTRIVONIM BABY WIP PKG 2×36U, OVITRELLE PREFILLED INJ 250 mcg, PAPAVERINE TAB 80 MG 30, PARO TOOTHBRUSH MEDIC NO. 726, PEN-RAFA VK CPL 500 mg 40, PHENIMYXIN COL 8 ML, POLYCUTAN CR 15 GM, PRAMIN INJ 10 MG/2 ML 5, PRAMIN SUP 20 MG 6, PRAMIN TAB 10 MG 30, PREDNISONE TAB 20 MG 30, PREDNISONE TAB 5 MG 30, PRE-GENTLY BADATZ, PRENATAL 9 MONTHS, PRENATAL MULTIVITAMINS D.H TAB 100, PRENATAL MULTIVITAMINS D.H TAB 30, PRENATAL PLUS CF TAB CF BOX 30, PRIMOLUT NOR TAB 5 MG 20, PROCTO-GLYVENOL CR CF TUB 30 GM, PROGESTERON RETARD 29/M INJ 500 MG/2 ML 3, PROGYNOVA TAB 2 MG 28, PUREGON INJ 300 IU 0.36 ML, PUREGON INJ 900 IU, REOLIN EFFERV. TAB 200 MG 30, SEDURAL TAB 100 MG 30, SERETIDE DISKUS INH 50/250 MCG U, STREP A TEST-QUICKVUE IN-LINE, STREPSILS HONEY-LEMON LOZ CF 24, SYMBICORT TURBUHALER INH 160 mcg/4.5 mcg, SYNTHOMYCINE OCC 5% 3.6 GM, TEEJEL GEL CF 10 GM, TELFAST TAB 180 MG 15, TEVACUTAN CR 15 g, TEVADERM CR CF 15 GM, THYMI SYR BOT 100 ML, THYMOLI SYR BOT 100 ML, TIPTIPOT FERRIPEL-3 DRP 50 mg/1 mL 15 mL, TIPTIPOT NOVIMOL SUS 100 mg/1 mL 15 mL, TRIBEMIN TAB 20, TRIDERM CR CF 15 GM, UTROGESTAN ORAL/VAG CAP 100 mg 30, UTROGESTAN ORAL/VAG CAP 200 mg 15, VAXIGRIP VAC, VENTOLIN CFC FREE INH 0.1 MG/INH, VENTOLIN RESPIR. SOL 5 mg/1 mL 20 mL, VIT B1, VIT B12, VIT.B12+FOLIC A.SUBLING. 
TAB CF 60, VITA-CAL+D, VITAMIDYNE D DRP 200 IU 10 ML, VITAMIN D3 D.H DRP 400 UNIT 15 ML, VOLTAREN EMULGEL GEL 1% 100 GM, YES OR NO DIRECT KIT 1, ZINNAT TAB 250 MG 10, ZINNAT TAB 500 MG 10, ZOFRAN TAB 4 MG 10, ZOFRAN TAB 8 MG 10, and ZOVIRAX CR 5% 2 GM, wherein the pharmaceuticals are prescribed during any of the time windows F0, F1, F2, M1, M2, M3, M4, M5, P1, P2, and P3 defined in the Examples section that follows.
In some embodiments of the present invention one or more of the parameters is indicative of a GTT applied to the subject. When the subject has been previously pregnant, one of the parameters is a result of a GTT applied to said subject during a previous pregnancy, e.g., the most recent previous pregnancy or the next-to-most recent previous pregnancy.
A list of parameters that relate to GTT and which are contemplated according to some embodiments of the present invention includes, without limitation, 100 g GTT 0 minutes at the 1st previous pregnancy, 100 g GTT 0 minutes at the 2nd previous pregnancy, 100 g GTT 0 minutes at the 3rd previous pregnancy, 100 g GTT 120 minutes at the 1st previous pregnancy, 100 g GTT 120 minutes at the 2nd previous pregnancy, 100 g GTT 120 minutes at the 3rd previous pregnancy, 100 g GTT 180 minutes at the 1st previous pregnancy, 100 g GTT 180 minutes at the 2nd previous pregnancy, 100 g GTT 180 minutes at the 3rd previous pregnancy, 100 g GTT 60 minutes at the 1st previous pregnancy, 100 g GTT 60 minutes at the 2nd previous pregnancy, 100 g GTT 60 minutes at the 3rd previous pregnancy, 50 g GTT at the 1st previous pregnancy, 50 g GTT at the 2nd previous pregnancy, and 50 g GTT at the 3rd previous pregnancy.
In some embodiments of the present invention one or more of the parameters is the Body Mass Index (BMI) of the subject, as measured during any of the time windows F0, F1, F2, M1, M2, M3, M4, M5, P1, P2, and P3 defined in the Examples section that follows.
In some embodiments of the present invention one of the parameters is the number of previous births delivered by the subject.
In some embodiments of the present invention one or more of the parameters is the diastolic blood pressure sampled during any of the time windows F0, F1, F2, M1, M2, M3, M4, M5, P1, P2, and P3 defined in the Examples section that follows, and in some embodiments of the present invention one or more of the parameters is the systolic blood pressure sampled during any of the time windows F0, F1, F2, M1, M2, M3, M4, M5, P1, P2, and P3 defined in the Examples section that follows.
Referring again to
As used herein the term “machine learning” refers to a procedure embodied as a computer program configured to induce patterns, regularities, or rules from previously collected data to develop an appropriate response to future data, or describe the data in some meaningful way.
Representative examples of machine learning procedures suitable for the present embodiments include, without limitation, clustering, association rule algorithms, feature evaluation algorithms, subset selection algorithms, support vector machines, classification rules, cost-sensitive classifiers, vote algorithms, stacking algorithms, Bayesian networks, decision trees, neural networks, instance-based algorithms, linear modeling algorithms, k-nearest neighbors (KNN) analysis, ensemble learning algorithms, probabilistic models, graphical models, logistic regression methods (including multinomial logistic regression methods), gradient ascent methods, singular value decomposition methods and principal component analysis.
Following is an overview of some machine learning procedures suitable for the present embodiments.
Support vector machines are algorithms that are based on statistical learning theory. A support vector machine (SVM) according to some embodiments of the present invention can be used for classification purposes and/or for numeric prediction. A support vector machine for classification is referred to herein as a “support vector classifier,” and a support vector machine for numeric prediction is referred to herein as a “support vector regression”.
An SVM is typically characterized by a kernel function, the selection of which determines whether the resulting SVM provides classification, regression or other functions. Through application of the kernel function, the SVM maps input vectors into high dimensional feature space, in which a decision hyper-surface (also known as a separator) can be constructed to provide classification, regression or other decision functions. In the simplest case, the surface is a hyper-plane (also known as linear separator), but more complex separators are also contemplated and can be applied using kernel functions. The data points that define the hyper-surface are referred to as support vectors.
The support vector classifier selects a separator where the distance of the separator from the closest data points is as large as possible, thereby separating feature vector points associated with objects in a given class from feature vector points associated with objects outside the class. For support vector regression, a high-dimensional tube with a radius of acceptable error is constructed which minimizes the error of the data set while also maximizing the flatness of the associated curve or function. In other words, the tube is an envelope around the fit curve, defined by a collection of data points nearest the curve or surface.
An advantage of a support vector machine is that once the support vectors have been identified, the remaining observations can be removed from the calculations, thus greatly reducing the computational complexity of the problem. An SVM typically operates in two phases: a training phase and a testing phase. During the training phase, a set of support vectors is generated for use in executing the decision rule. During the testing phase, decisions are made using the decision rule. A support vector algorithm is a method for training an SVM. By execution of the algorithm, a training set of parameters is generated, including the support vectors that characterize the SVM. A representative example of a support vector algorithm suitable for the present embodiments includes, without limitation, sequential minimal optimization.
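The role of the support vectors in the decision rule can be illustrated by the following minimal Python sketch of a kernel-based support vector classifier during the testing phase. The Gaussian kernel, the support vectors, the coefficients and the bias are hypothetical values chosen only for illustration; they were not obtained by training on any data described herein.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed width parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def svm_decision(x, support_vectors, coefficients, bias, kernel=rbf_kernel):
    """Signed decision value computed from the support vectors only."""
    total = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, coefficients))
    return total + bias

def svm_classify(x, support_vectors, coefficients, bias):
    """Return +1 or -1 depending on the side of the separator."""
    return 1 if svm_decision(x, support_vectors, coefficients, bias) >= 0 else -1

# Hypothetical trained model: one support vector per class.
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
COEFFICIENTS = [1.0, -1.0]  # signed coefficients (class label times weight)
BIAS = 0.0

print(svm_classify((0.2, 0.1), SUPPORT_VECTORS, COEFFICIENTS, BIAS))
print(svm_classify((1.9, 2.2), SUPPORT_VECTORS, COEFFICIENTS, BIAS))
```

Only the stored support vectors enter the sum, which is why the remaining training observations are not needed once training is complete.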
In KNN analysis, the affinity or closeness of objects is determined. The affinity is also known as the distance between objects in a feature space. Based on the determined distances, the objects are clustered and an outlier is detected. Thus, the KNN analysis is a technique to find distance-based outliers based on the distance of an object from its kth-nearest neighbor in the feature space. Specifically, each object is ranked on the basis of its distance to its kth-nearest neighbor. The farthest object is declared the outlier. In some cases several of the farthest objects are declared outliers. That is, an object is an outlier with respect to parameters, such as a number k of neighbors and a specified distance, if no more than k objects are at the specified distance or less from the object. The KNN analysis can also be used as a classification technique that uses supervised learning. An item is presented and compared to a training set with two or more classes. The item is assigned to the class that is most common amongst its k nearest neighbors. That is, the distances from the item to all the items in the training set are computed to find the k nearest items, and the majority class among these k items is assigned to the item.
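The classification use of KNN analysis described above can be sketched in Python as follows. The two-dimensional feature vectors and the class labels "low" and "high" are hypothetical and serve only to show the distance computation and the majority vote.

```python
import math
from collections import Counter

def knn_classify(item, training_set, k=3):
    """Assign to item the majority class among its k nearest training records.

    training_set is a list of (feature_vector, class_label) pairs.
    """
    # Sort the training records by Euclidean distance to the item.
    neighbors = sorted(training_set, key=lambda rec: math.dist(rec[0], item))[:k]
    # Majority vote among the k nearest records.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training records: (feature vector, class label).
TRAINING = [
    ((1.0, 1.0), "low"), ((1.2, 0.9), "low"), ((0.8, 1.1), "low"),
    ((3.0, 3.1), "high"), ((3.2, 2.9), "high"), ((2.9, 3.0), "high"),
]

print(knn_classify((1.1, 1.0), TRAINING))
print(knn_classify((3.0, 3.0), TRAINING))
```

An odd value of k is used here so that a two-class vote cannot end in a tie.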
An association rule algorithm is a technique for extracting meaningful association patterns among features.
The term “association”, in the context of machine learning, refers to any interrelation among features, not just ones that predict a particular class or numeric value. Association includes, but is not limited to, finding association rules, finding patterns, performing feature evaluation, performing feature subset selection, developing predictive models, and understanding interactions between features.
The term “association rules” refers to elements that co-occur frequently within the datasets. It includes, but is not limited to association patterns, discriminative patterns, frequent patterns, closed patterns, and colossal patterns.
A typical first step of an association rule algorithm is to find the sets of items or features that are most frequent among all the observations. Once the list of frequent sets is obtained, rules can be extracted from it.
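The two steps described above, finding frequent sets of features and then extracting rules from them, can be sketched in Python as follows. This is a brute-force enumeration for illustration rather than an optimized algorithm, and the feature names and observations are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(observations, min_support, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support."""
    counts = Counter()
    for observation in observations:
        items = sorted(set(observation))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    total = len(observations)
    return {combo: n / total for combo, n in counts.items() if n / total >= min_support}

def rule_confidence(supports, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from itemset supports."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return supports[both] / supports[tuple(sorted(antecedent))]

# Hypothetical observations, each a set of features present for one subject.
OBSERVATIONS = [
    {"high_bmi", "prior_gtt_high"},
    {"high_bmi", "prior_gtt_high"},
    {"high_bmi"},
    {"prior_gtt_high"},
]

SUPPORTS = frequent_itemsets(OBSERVATIONS, min_support=0.5)
print(SUPPORTS)
print(rule_confidence(SUPPORTS, ("high_bmi",), ("prior_gtt_high",)))
```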
The aforementioned self-organizing map is an unsupervised learning technique often used for visualization and analysis of high-dimensional data. Typical applications are focused on the visualization of the central dependencies within the data on the map. The map generated by the algorithm can be used to speed up the identification of association rules by other algorithms. The algorithm typically includes a grid of processing units, referred to as “neurons”. Each neuron is associated with a feature vector referred to as observation. The map attempts to represent all the available observations with optimal accuracy using a restricted set of models. At the same time the models become ordered on the grid so that similar models are close to each other and dissimilar models far from each other. This procedure enables the identification as well as the visualization of dependencies or associations between the features in the data.
Feature evaluation algorithms are directed to the ranking of features or to the ranking followed by the selection of features based on their impact.
Information gain is one of the machine learning methods suitable for feature evaluation. The definition of information gain requires the definition of entropy, which is a measure of impurity in a collection of training instances. The reduction in entropy of the target feature that occurs by knowing the values of a certain feature is called information gain. Information gain may be used as a parameter to determine the effectiveness of a feature in explaining the response to the treatment. Symmetrical uncertainty is a measure that can be used by a feature selection algorithm, according to some embodiments of the present invention. Symmetrical uncertainty compensates for information gain's bias towards features with more values by normalizing the information gain to a [0,1] range.
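The quantities named above can be computed as in the following Python sketch. It uses the common definitions of entropy H, information gain IG(target; feature) = H(target) - H(target | feature), and symmetrical uncertainty SU = 2·IG / (H(feature) + H(target)); the toy feature and target vectors are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def information_gain(feature, target):
    """Reduction in entropy of target obtained by knowing feature."""
    total = len(target)
    conditional = 0.0
    for value in set(feature):
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(target) - conditional

def symmetrical_uncertainty(feature, target):
    """Information gain normalized to the range [0, 1]."""
    denominator = entropy(feature) + entropy(target)
    if denominator == 0:
        return 0.0
    return 2.0 * information_gain(feature, target) / denominator

TARGET = [0, 0, 1, 1]
INFORMATIVE = [0, 0, 1, 1]    # fully determines the target
UNINFORMATIVE = [0, 1, 0, 1]  # independent of the target

print(symmetrical_uncertainty(INFORMATIVE, TARGET))
print(symmetrical_uncertainty(UNINFORMATIVE, TARGET))
```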
Subset selection algorithms rely on a combination of an evaluation algorithm and a search algorithm. Similarly to feature evaluation algorithms, subset selection algorithms rank subsets of features. Unlike feature evaluation algorithms, however, a subset selection algorithm suitable for the present embodiments aims at selecting the subset of features with the highest impact on predicting likelihood for gestational diabetes, while accounting for the degree of redundancy between the features included in the subset. The benefits from feature subset selection include facilitating data visualization and understanding, reducing measurement and storage requirements, reducing training and utilization times, and eliminating distracting features to improve classification.
Two basic approaches to subset selection algorithms are the process of adding features to a working subset (forward selection) and deleting from the current subset of features (backward elimination). In machine learning, forward selection is done differently than the statistical procedure with the same name. The feature to be added to the current subset in machine learning is found by evaluating the performance of the current subset augmented by one new feature using cross-validation. In forward selection, subsets are built up by adding each remaining feature in turn to the current subset while evaluating the expected performance of each new subset using cross-validation. The feature that leads to the best performance when added to the current subset is retained and the process continues. The search ends when none of the remaining available features improves the predictive ability of the current subset. This process finds a local optimum set of features.
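The forward selection loop described above can be sketched in Python as follows. The scoring function is a toy stand-in for a cross-validated performance estimate, and the feature names, gains and size penalty are hypothetical.

```python
def forward_selection(features, score):
    """Greedy forward selection.

    score(subset) is assumed to return the expected performance of a feature
    subset, e.g., a cross-validated estimate. The search stops when no
    remaining feature improves the score of the current subset.
    """
    selected = []
    best_score = score(selected)
    remaining = list(features)
    while remaining:
        # Evaluate the current subset augmented by each remaining feature.
        candidate_score, candidate = max((score(selected + [f]), f) for f in remaining)
        if candidate_score <= best_score:
            break  # local optimum reached
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Toy stand-in for cross-validation: each feature adds a fixed gain and every
# added feature costs a small penalty (all values are invented).
GAINS = {"prior_gtt": 0.20, "bmi": 0.10, "age": 0.05, "noise": 0.0}

def toy_score(subset):
    return 0.5 + sum(GAINS[f] for f in subset) - 0.02 * len(subset)

print(forward_selection(GAINS, toy_score))
```

Backward elimination can be sketched analogously by starting from the full set and removing, at each step, the feature whose removal gives the best score.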
Backward elimination is implemented in a similar fashion. With backward elimination, the search ends when further reduction in the feature set does not improve the predictive ability of the subset. The present embodiments contemplate search algorithms that search forward, backward or in both directions. Representative examples of search algorithms suitable for the present embodiments include, without limitation, exhaustive search, greedy hill-climbing, random perturbations of subsets, wrapper algorithms, probabilistic race search, schemata search, rank race search, and Bayesian classifier.
A decision tree is a decision support algorithm that forms a logical pathway of steps involved in considering the input to make a decision.
The term “decision tree” refers to any type of tree-based learning algorithms, including, but not limited to, model trees, classification trees, and regression trees.
A decision tree can be used to classify the datasets or their relation hierarchically. The decision tree has a tree structure that includes branch nodes and leaf nodes. Each branch node specifies an attribute (splitting attribute) and a test (splitting test) to be carried out on the value of the splitting attribute, and branches out to other nodes for all possible outcomes of the splitting test. The branch node that is the root of the decision tree is called the root node. Each leaf node can represent a classification (e.g., whether a particular parameter influences the likelihood for gestational diabetes) or a value (e.g., the predicted likelihood for gestational diabetes). The leaf nodes can also contain additional information about the represented classification such as a confidence score that measures a confidence level in the represented classification (i.e., the accuracy of the prediction).
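The structure described above, branch nodes with a splitting attribute and a splitting test, and leaf nodes with a classification and a confidence score, can be sketched in Python as follows. The tree, its attribute names, thresholds and confidence scores are hypothetical and are not taken from a trained model.

```python
def predict(node, subject):
    """Walk from the root node to a leaf and return (classification, confidence)."""
    while "leaf" not in node:
        # Splitting test: compare the splitting attribute with a threshold.
        outcome = "yes" if subject[node["attribute"]] >= node["threshold"] else "no"
        node = node[outcome]
    return node["leaf"], node["confidence"]

# Hypothetical tree: the root node tests a previous 50 g GTT result, and one
# of its branches tests the BMI.
TREE = {
    "attribute": "prev_gtt_50g", "threshold": 140.0,
    "yes": {"leaf": "high risk", "confidence": 0.8},
    "no": {
        "attribute": "bmi", "threshold": 30.0,
        "yes": {"leaf": "high risk", "confidence": 0.6},
        "no": {"leaf": "low risk", "confidence": 0.9},
    },
}

print(predict(TREE, {"prev_gtt_50g": 150.0, "bmi": 22.0}))
print(predict(TREE, {"prev_gtt_50g": 100.0, "bmi": 22.0}))
```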
Regression techniques which may be used in accordance with some embodiments of the present invention include, but are not limited to, linear regression, multiple regression, logistic regression, probit regression, ordinal logistic regression, ordinal probit regression, Poisson regression, negative binomial regression, multinomial logistic regression (MLR) and truncated regression.
A logistic regression or logit regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (a dependent variable that can take on a limited number of values, whose magnitudes are not meaningful but whose ordering of magnitudes may or may not be meaningful) based on one or more predictor variables. Logistic regression may also predict the probability of occurrence for each data point. Logistic regressions also include a multinomial variant. The multinomial logistic regression model is a regression model which generalizes logistic regression by allowing more than two discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.). For binary-valued variables, a cutoff between the 0 and 1 associations is typically determined using the Youden index.
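The selection of a cutoff using the Youden index, J = sensitivity + specificity - 1, can be sketched in Python as follows. The intercept and coefficient of the logistic model and the glucose values and labels are hypothetical; the sketch assumes that both classes are present in the labels.

```python
import math

def predict_probability(glucose, intercept=-13.5, coefficient=0.1):
    """Logistic model with hypothetical, untrained coefficients."""
    return 1.0 / (1.0 + math.exp(-(intercept + coefficient * glucose)))

def youden_cutoff(probabilities, labels):
    """Return the cutoff maximizing J = sensitivity + specificity - 1."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probabilities)):
        true_pos = sum(1 for p, y in zip(probabilities, labels) if p >= cutoff and y == 1)
        true_neg = sum(1 for p, y in zip(probabilities, labels) if p < cutoff and y == 0)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical glucose values (mg/dL) and binary outcomes.
GLUCOSE = [100, 110, 125, 145, 160, 180]
LABELS = [0, 0, 0, 1, 1, 1]
PROBABILITIES = [predict_probability(g) for g in GLUCOSE]

print(youden_cutoff(PROBABILITIES, LABELS))
```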
A Bayesian network is a model that represents variables and conditional interdependencies between variables. In a Bayesian network, variables are represented as nodes, and nodes may be connected to one another by one or more links. A link indicates a relationship between two nodes. Nodes typically have corresponding conditional probability tables that are used to determine the probability of a state of a node given the state of other nodes to which the node is connected. In some embodiments, a Bayes optimal classifier algorithm is employed to apply the maximum a posteriori hypothesis to a new record in order to predict the probability of its classification, as well as to calculate the probabilities from each of the other hypotheses obtained from a training set and to use these probabilities as weighting factors for future predictions of the likelihood for gestational diabetes. An algorithm suitable for a search for the best Bayesian network includes, without limitation, a global score metric-based algorithm. In an alternative approach to building the network, a Markov blanket can be employed. The Markov blanket isolates a node from being affected by any node outside its boundary, which is composed of the node's parents, its children, and the parents of its children.
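The use of a conditional probability table can be illustrated by the following Python sketch of a network with only two nodes, a parent node and a child node connected by a single link. The node names and all probabilities are hypothetical and do not reflect any data described herein.

```python
# Hypothetical two-node network: high_bmi -> gdm.
PRIOR_HIGH_BMI = 0.3                 # P(high_bmi)
CPT_GDM = {True: 0.20, False: 0.05}  # P(gdm | high_bmi)

def probability_of_gdm(prior_high_bmi, cpt):
    """Marginal P(gdm), obtained by summing over the states of the parent node."""
    return prior_high_bmi * cpt[True] + (1.0 - prior_high_bmi) * cpt[False]

def probability_high_bmi_given_gdm(prior_high_bmi, cpt):
    """Posterior P(high_bmi | gdm), obtained with Bayes' rule."""
    return prior_high_bmi * cpt[True] / probability_of_gdm(prior_high_bmi, cpt)

print(probability_of_gdm(PRIOR_HIGH_BMI, CPT_GDM))
print(probability_high_bmi_given_gdm(PRIOR_HIGH_BMI, CPT_GDM))
```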
Instance-based techniques generate a new model for each instance, instead of basing predictions on trees or networks generated (once) from a training set.
The term “instance”, in the context of machine learning, refers to an example from a dataset.
Instance-based techniques typically store the entire dataset in memory and build a model from a set of records similar to those being tested. This similarity can be evaluated, for example, through nearest-neighbor or locally weighted methods, e.g., using Euclidean distances. Once a set of records is selected, the final model may be built using several different techniques, such as naive Bayes.
Neural networks are a class of algorithms based on a concept of inter-connected “neurons.” In a typical neural network, neurons contain data values, each of which affects the value of a connected neuron according to connections with pre-defined strengths, and whether the sum of connections to each particular neuron meets a pre-defined threshold. By determining proper connection strengths and threshold values (a process also referred to as training), a neural network can achieve efficient recognition of images and characters. Oftentimes, these neurons are grouped into layers in order to make connections between groups more obvious and to ease computation of values. Each layer of the network may have differing numbers of neurons, and these may or may not be related to particular qualities of the input data.
In one implementation, called a fully-connected neural network, each of the neurons in a particular layer is connected to and provides input value to those in the next layer. These input values are then summed and this sum compared to a bias, or threshold. If the value exceeds the threshold for a particular neuron, that neuron then holds a positive value which can be used as input to neurons in the next layer of neurons. This computation continues through the various layers of the neural network, until it reaches a final layer. At this point, the output of the neural network routine can be read from the values in the final layer. Unlike fully-connected neural networks, convolutional neural networks operate by associating an array of values with each neuron, rather than a single value. The transformation of a neuron value for the subsequent layer is generalized from multiplication to convolution.
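The layer-by-layer computation of a fully-connected network described above can be sketched in Python as follows. A step activation is assumed, in which a neuron holds the value 1.0 when its weighted sum exceeds its threshold and 0.0 otherwise; the weights and thresholds are hypothetical and were chosen so that the small network computes the exclusive-or of two binary inputs.

```python
def layer_forward(inputs, weights, thresholds):
    """Compute the values of one layer from the values of the previous layer."""
    outputs = []
    for neuron_weights, threshold in zip(weights, thresholds):
        total = sum(w * x for w, x in zip(neuron_weights, inputs))
        # The neuron holds a positive value only if the sum exceeds its threshold.
        outputs.append(1.0 if total > threshold else 0.0)
    return outputs

def network_forward(inputs, layers):
    """Propagate the inputs through all layers and return the final layer."""
    values = list(inputs)
    for weights, thresholds in layers:
        values = layer_forward(values, weights, thresholds)
    return values

# Hypothetical network: a hidden layer with an OR-like and an AND-like neuron,
# followed by an output neuron that fires for "OR and not AND".
LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.5, 1.5]),
    ([[1.0, -1.0]], [0.5]),
]

for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, network_forward(pair, LAYERS))
```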
The machine learning procedure used according to some embodiments of the present invention is a trained machine learning procedure, which provides output that is related non-linearly to the parameters with which it is fed.
A machine learning procedure can be trained according to some embodiments of the present invention by feeding a machine learning training program with parameters that characterize each subject in a cohort of female subjects who have been diagnosed as either having or not having gestational diabetes. Once the data are fed, the machine learning training program generates a trained machine learning procedure which can then be used without the need to re-train it.
For example, when it is desired to employ decision trees, the machine learning training program learns the structure of each tree in a plurality of decision trees (e.g., how many nodes there are in each tree, and how these are connected to one another), and also selects the decision rules for split nodes of each tree. At least a portion of the decision rules relate to one or more of the parameters that characterize the female subject. A simple decision rule may be a threshold for the value of a particular parameter, but more complex rules, relating to more than one parameter, are also contemplated. The machine learning training program also accumulates data at the leaves of the trees. The structures of the trees, the decision rules for the split nodes, and the data at the leaves are all selected by the machine learning training program, automatically and typically without user intervention, such that the parameters at the root of the trees provide the likelihood for gestational diabetes at the leaves of the trees. The final result of the machine learning training program in this case is a set of trees, where the structures, the decision rules for split nodes, and leaf data for each tree are defined by the machine learning training program.
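How a training program may select a simple decision rule and accumulate data at the leaves can be sketched in Python for the smallest possible tree, which has a single split node. The squared-error criterion is an assumption made for this sketch, and the parameter values and outcomes are hypothetical; a practical training program would typically learn many deeper trees.

```python
def fit_stump(values, outcomes):
    """Select the threshold on one parameter that minimizes the squared error.

    Returns the decision rule (threshold) and the leaf data (mean outcome on
    each side of the threshold).
    """
    candidates = sorted(set(values))[1:]  # keeps both sides of the split non-empty
    if not candidates:
        raise ValueError("at least two distinct parameter values are required")
    best = None
    for threshold in candidates:
        below = [y for x, y in zip(values, outcomes) if x < threshold]
        above = [y for x, y in zip(values, outcomes) if x >= threshold]
        mean_below = sum(below) / len(below)
        mean_above = sum(above) / len(above)
        error = (sum((y - mean_below) ** 2 for y in below)
                 + sum((y - mean_above) ** 2 for y in above))
        if best is None or error < best[0]:
            best = (error, threshold, mean_below, mean_above)
    _, threshold, mean_below, mean_above = best
    return {"threshold": threshold, "below": mean_below, "at_or_above": mean_above}

# Hypothetical training cohort: one parameter per subject and the diagnosis
# (1 for gestational diabetes, 0 otherwise).
VALUES = [100, 110, 120, 150, 160, 170]
OUTCOMES = [0, 0, 0, 1, 1, 1]

print(fit_stump(VALUES, OUTCOMES))
```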
The method proceeds to 13 at which the trained machine learning procedure is fed with the parameters, and to 14 at which an output indicative of the likelihood that the subject has, or is expected to develop, gestational diabetes, is received from the procedure. In some embodiments of the present invention the method proceeds to 15 at which a report pertaining to the likelihood is generated. The report can be displayed on a display device or transmitted to a computer readable medium.
The method ends at 16.
The inventors found that selected operations of the method can also be used for screening. For example, the method can be used for determining whether to apply a GTT to a female subject that has been previously pregnant. In these embodiments, the parameters that are obtained at 11 comprise a result of a GTT applied to the subject during a previous pregnancy, and the likelihood that is received from the procedure is used for determining whether or not to apply the GTT to the subject. Specifically, when the likelihood is below a predetermined threshold (e.g., below 0.6 or below 0.5 or below 0.4 or below 0.3 or below 0.2 or below 0.1), the method can generate an output recommending not to apply the GTT to the subject, and when the likelihood is above the predetermined threshold, the method can generate an output recommending to apply the GTT to the subject. As demonstrated in the Examples section that follows, it was unexpectedly found by the inventors that a GTT, e.g., a 1 h 50 g GTT, performed in previous pregnancies is far more predictive than, for example, a history of GDM. The advantage of using GTT in previous pregnancies as a predictor for the likelihood is that the method of the present embodiments is more cost-effective and efficient than the GTT, and can therefore be used as a selective screening method. The inventors found that avoiding 50% of the GTTs of patients who previously underwent a GTT would result in a miss rate of only 5% when diagnosing GDM according to the traditional guidelines. Accurate selective screening is advantageous since it can reduce both costs and physical inconvenience for women at low risk for GDM development.
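The threshold rule described above, and the trade-off between avoided tests and missed diagnoses, can be sketched in Python as follows. The threshold of 0.3 is one of the example values listed above; treating a likelihood exactly equal to the threshold as "do not apply" is an assumption of this sketch. The cohort is invented and is not intended to reproduce the reported figures.

```python
def recommend_gtt(likelihood, threshold=0.3):
    """Return True when the output recommends applying the GTT."""
    return likelihood > threshold

def screening_summary(cohort, threshold=0.3):
    """Return (fraction of GTTs avoided, fraction of GDM cases missed).

    cohort is a list of (likelihood, has_gdm) pairs.
    """
    skipped = [has_gdm for likelihood, has_gdm in cohort
               if not recommend_gtt(likelihood, threshold)]
    total_gdm = sum(1 for _, has_gdm in cohort if has_gdm)
    avoided_fraction = len(skipped) / len(cohort)
    miss_rate = sum(skipped) / total_gdm if total_gdm else 0.0
    return avoided_fraction, miss_rate

# Invented cohort of (predicted likelihood, diagnosed with GDM).
COHORT = [
    (0.05, False), (0.10, False), (0.20, False), (0.25, True),
    (0.40, False), (0.60, True), (0.70, True), (0.90, True),
]

print(screening_summary(COHORT))
```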
The prediction of likelihood for gestational diabetes can be executed according to some embodiments of the present invention by a server-client configuration, as will now be explained with reference to
GUI 42 and processor 32 can be integrated together within the same housing or they can be separate units communicating with each other. GUI 42 can optionally and preferably be part of a system including a dedicated CPU and I/O circuits (not shown) to allow GUI 42 to communicate with processor 32. Processor 32 issues to GUI 42 graphical and textual output generated by CPU 36. Processor 32 also receives from GUI 42 signals pertaining to control commands generated by GUI 42 in response to user input. GUI 42 can be of any type known in the art, such as, but not limited to, a keyboard and a display, a touch screen, and the like. In preferred embodiments, GUI 42 is a GUI of a mobile device such as a smartphone, a tablet, a smartwatch and the like. When GUI 42 is a GUI of a mobile device, the CPU circuit of the mobile device can serve as processor 32 and can execute the method optionally and preferably by executing code instructions.
Client 30 and server 50 computers can further comprise one or more computer-readable storage media 44, 64, respectively. Media 44 and 64 are preferably non-transitory storage media storing computer code instructions for executing the method of the present embodiments, and processors 32 and 52 execute these code instructions. The code instructions can be run by loading the respective code instructions into the respective execution memories 38 and 58 of the respective processors 32 and 52. Storage media 64 preferably also store one or more databases including a database of psychologically annotated olfactory perception signatures as further detailed hereinabove.
In operation, processor 32 of client computer 30 displays on GUI 42 a questionnaire and a set of questionnaire controls, such as, but not limited to, a slider, a dropdown menu, a combo box, a text box and the like. A representative example of a displayed questionnaire 60 and a set of controls 62 is shown in
Processor 32 receives the response parameters from GUI 42 and typically transmits these parameters to server computer 50 over network 40. Media 64 can store a machine learning procedure trained for predicting likelihoods for gestational diabetes. Server computer 50 can access media 64, feed the stored procedure with the parameters received from client computer 30, and receive from the procedure an output indicative of the likelihood that the female subject that is characterized by the parameters has, or is expected to develop, gestational diabetes. Server computer 50 can also transmit to client computer 30 the obtained likelihood, and client computer 30 can display this information on GUI 42.
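The exchange between the client computer and the server computer can be sketched in Python as follows. The JSON message layout, the function names and the stand-in procedure are all hypothetical; the sketch omits the network transport and shows only how the server may feed the received parameters to a stored procedure and return the obtained likelihood.

```python
import json

def handle_request(request_body, procedure):
    """Server side: feed the received parameters to the procedure."""
    parameters = json.loads(request_body)["parameters"]
    likelihood = float(procedure(parameters))
    return json.dumps({"likelihood": likelihood})

def toy_procedure(parameters):
    """Stand-in for a trained machine learning procedure (not a real model)."""
    return 0.8 if parameters.get("prev_gtt_50g", 0) > 140 else 0.1

# Client side: response parameters collected from the questionnaire.
REQUEST = json.dumps({"parameters": {"prev_gtt_50g": 152, "bmi": 27.5}})
RESPONSE = handle_request(REQUEST, toy_procedure)

print(RESPONSE)
```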
As used herein the term “about” refers to ±10%.
The word “exemplary” is used herein to mean “serving as an example, instance or illustration.” Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word “optionally” is used herein to mean “is provided in some embodiments and not provided in other embodiments.” Any particular embodiment of the invention may include a plurality of “optional” features unless such features conflict.
The terms “comprises”, “comprising”, “includes”, “including”, “having” and their conjugates mean “including but not limited to”.
The term “consisting of” means “including and limited to”.
The term “consisting essentially of” means that the composition, method or structure may include additional ingredients, steps and/or parts, but only if the additional ingredients, steps and/or parts do not materially alter the basic and novel characteristics of the claimed composition, method or structure.
As used herein, the singular form “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a compound” or “at least one compound” may include a plurality of compounds, including mixtures thereof.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
As used herein the term “method” refers to manners, means, techniques and procedures for accomplishing a given task including, but not limited to, those manners, means, techniques and procedures either known to, or readily developed from known manners, means, techniques and procedures by practitioners of the chemical, pharmacological, biological, biochemical and medical arts.
As used herein, the term “treating” includes abrogating, substantially inhibiting, slowing or reversing the progression of a condition, substantially ameliorating clinical or aesthetical symptoms of a condition or substantially preventing the appearance of clinical or aesthetical symptoms of a condition.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below find experimental support in the following examples.
Reference is now made to the following examples, which together with the above descriptions illustrate some embodiments of the invention in a non-limiting fashion.
GDM is defined as glucose intolerance that is first recognized in pregnancy. GDM is a common complication of pregnancy, occurring in 3%-9% of pregnancies [1], and is typically diagnosed between 24 and 28 weeks of gestation [2]. GDM is associated with short and long term clinical outcomes, affecting both mothers and infants. Mothers with GDM have a higher chance for an operative delivery and are more likely to develop type 2 diabetes [3]. Offspring of diabetic mothers are predisposed to fetal macrosomia, respiratory difficulties and metabolic complications in the neonatal period and have a higher risk for future obesity and alteration in glucose metabolism [4-6].
The rising prevalence of GDM, reflective of the prevalence of type 2 diabetes, warrants the development of new prevention strategies [7]. Although results from randomized controlled trials aimed at prevention of GDM with nutritional and lifestyle interventions are conflicting [8], some studies demonstrated that a major reduction of the risk is possible, especially when interventions are initiated early in pregnancy, during the first or early second trimester [9-10]. Identification of women at a high risk for GDM development at an early stage of pregnancy will therefore enable the implementation of early intervention strategies, which may prevent or reduce GDM prevalence and its associated comorbidities.
In recent years, several studies have utilized electronic health records (EHRs) to construct prediction models for mortality [11,12] and disease onset [13-15]. Despite numerous research studies on risk factors for GDM development [16], no predictive model has been established in clinical practice to date. In this Example, a model for GDM prediction is constructed based on nationwide EHR data. The performance of the model is evaluated from pregnancy initiation up to 20 weeks of gestation.
Data were extracted from the database of Clalit Health Service, the largest health care provider in Israel. Nearly five million individuals, representing over 50% of Israel's adult population, are currently enrolled in Clalit [17], a non-governmental, non-profit organization included in the national health insurance law in Israel. Dating back to 2002, the database contains EHRs of over 11 million patients, with over 5.4 billion numerical and categorical entries. The data analyzed included anthropometrics (height and weight), blood pressure measurements, blood and urine lab tests, diagnoses recorded by physicians, and pharmaceuticals prescribed. Most of the data originate from community clinic records, but records from Clalit's hospitals were also included in the analysis.
In Israel, GDM is defined by a two-step oral glucose tolerance test (GTT) which is performed routinely on all pregnant women during 24-28 weeks of pregnancy according to National Institutes of Health (NIH) guidelines [18]. In the first step, a 50 g, 1-hour GTT is performed; women with glucose levels higher than 200 mg/dL receive a GDM diagnosis. Women with glucose values above 140 mg/dL are referred to the second step, in which an additional 100 g, 3-hour GTT is performed. Women with two glucose measurements above the thresholds of 95, 180, 155 and 140 mg/dL under fasting conditions, one, two and three hours after glucose intake, respectively, also receive a GDM diagnosis [2,19].
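The two-step rule above can be sketched in code. This is a minimal illustration, not the study's implementation: the function name `gdm_from_gtt` and the `None` return for undetermined cases are hypothetical, and "two glucose measurements above the thresholds" is read here as "at least two", which is an assumption.

```python
from typing import Optional, Sequence

# 100 g GTT thresholds in mg/dL: fasting, 1 h, 2 h, 3 h (values from the text above).
OGTT_THRESHOLDS = (95, 180, 155, 140)


def gdm_from_gtt(gct_50g: float,
                 ogtt_100g: Optional[Sequence[float]] = None) -> Optional[bool]:
    """Return True/False for GDM, or None when the status cannot be determined."""
    if gct_50g > 200:       # step 1: direct diagnosis
        return True
    if gct_50g <= 140:      # step 1: not referred to the second step
        return False
    if ogtt_100g is None:   # referred, but no 100 g result on record
        return None
    # Step 2: count the measurements that exceed their thresholds.
    n_above = sum(value > limit for value, limit in zip(ogtt_100g, OGTT_THRESHOLDS))
    return n_above >= 2     # assumption: "two" is read as "at least two"
```

Returning `None` for a referred woman without a 100 g result keeps such cases distinguishable from negatives, so they can be handled separately downstream.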
Accordingly, GDM status was defined based on the GTT results. In cases in which more than one GTT was available, a GDM diagnosis was defined if at least one of the tests was positive. Women who were supposed to undergo a 100 g GTT due to a high result on the 50 g GTT, but had no record of the test results, were excluded. Women with a pre-pregnancy record of diabetes, determined by a pre-pregnancy Hemoglobin A1c (HbA1c) blood test above 6.4% or a diabetes diagnosis, were also excluded. In total, 588,744 pregnancies of 368,381 women were included in the cohort (see
Prior to any analysis of the data, the study population was split into a training set and a test set. To emulate practical use, the test set was defined according to a temporal validation scheme [20]. Pregnancies that ended during 2017 or 2018 composed the test set, and pregnancies that ended before Dec. 31, 2016 composed the training set. This choice thus represents a setting in which the model may be implemented in practice. Throughout this Example, all results are reported on the test set, unless stated otherwise. Data and cohort characteristics are shown in
Currently there are no validated GDM prediction tools that are employed in clinical practice. The National Institute of Health (NIH) recommends women to rank themselves by counting the number of risk factors they possess according to a short questionnaire [21]. To create a baseline model, a close proxy to this score was calculated for every woman in the cohort, denoted here as their “Baseline Risk Score”. An analysis of the features and their predictive power is described in
2355 features were constructed from the dataset. 295 of them are available at the initiation of pregnancy, and the remaining 2060 are generated from data gathered throughout the pregnancy. The features available at the initiation of pregnancy include (i) demographics (e.g., ethnicity), (ii) basic measures (e.g., age, weight, height), and medical history gathered prior to the current pregnancy, including data on (iii) previous pregnancies and (iv) data from non-pregnancy periods. Features gathered throughout the current pregnancy include (i) blood and urine lab tests, (ii) clinic and hospital diagnoses, (iii) anthropometrics and blood-pressure measurements, and (iv) pharmaceuticals prescribed and collected. A complete list of the features, including methods for feature generation, is available in Appendix 3, below. The percentage of feature availability per category is presented in
Predictions were generated using a Gradient Boosting Machine (GBM) model [22], built with decision-tree base-learners. Cross-validation among the training set was used to set hyperparameters. Cross-validation results and exact hyperparameters values are described in Appendix 4, below.
To understand how single features relate to the model's output, Shapley values [26] were used, as they are suited for complex models such as artificial neural networks and gradient boosting machines. Shapley values partition the prediction result of every sample into the contribution of each constituent feature value, by estimating the difference between models with subsets of the feature space. By averaging over all samples, Shapley values estimate the contribution of each feature to the overall model predictions.
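To make the partitioning idea concrete, the following sketch computes exact Shapley values by brute-force enumeration of feature subsets for a toy value function. It is illustrative only: the names `exact_shapley` and `toy_value` are hypothetical, the enumeration is exponential in the number of features, and it is not the estimator used in this Example, which would need an efficient approximation for thousands of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def exact_shapley(value: Callable[[FrozenSet[int]], float],
                  n_features: int) -> List[float]:
    """Exact Shapley value of each feature for a set-valued function `value`."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a subset of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(s: FrozenSet[int]) -> float:
    """Hypothetical model output: two main effects plus one interaction."""
    return 2.0 * (0 in s) + 3.0 * (1 in s) + 1.0 * (0 in s and 1 in s)


phi = exact_shapley(toy_value, 3)  # feature 2 is unused by toy_value
```

The contributions sum to the difference between the output with all features and the output with none, which is the additivity property that the analysis of feature sets relies on.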
A baseline, termed Baseline Risk Score, was defined as the summation of seven binary variables that the NIH recommends to use as GDM risk factors. Odds ratios for these seven parameters are presented in
To evaluate whether EHR-derived information may improve GDM predictions, a set of 2355 features was compiled. These features were then used to train a gradient boosting model to predict the probability of each held-out individual (not part of the training set) to develop GDM. The predictive model evaluation is shown in
The EHR-based model of the present embodiments achieved an area under the receiver operating characteristic curve (auROC) of 0.854 and an area under the Precision-Recall curve (auPR) of 0.318, compared to an auROC of 0.682 and auPR of 0.097 by the baseline model (
The ability to predict GDM at different weeks of gestation was evaluated by constructing models based only on data collected prior to that week. The results of this analysis show that although the prediction improves by incorporating features gathered during pregnancy, predictions at pregnancy initiation still outperform the baseline model by 2-3 fold in auPR. This effect is stronger for women in their 2nd pregnancy onwards (
The Inventors next examined whether the predictions differ in accuracy for different subsets of the population, consisting of (1) Exact gestational age: the subset of women with their gestational age logged, who underwent the GTT in the recommended period; (2) First pregnancy: women with no previous record of pregnancy; (3) Has GTT: women who have a record of a GTT from a previous pregnancy; (4) Two blood tests available: women with two separate records of a fasting glucose blood test in different trimesters in pregnancy; and (5) High risk: women with a Baseline Risk Score greater than or equal to 3. Across all subgroups, the EHR-based model of the present embodiments had higher auROC and auPR values as compared to the baseline model (
Table 1.2 summarizes the evaluation results in the geographical and temporal validation sets.
The study ensured that the model predictions reflect the actual expected risk of an individual and, furthermore, demonstrated the utility of the predictor by considering its decision curve (
The additive nature of the Shapley values allows construction of a feature importance score for feature sets, by summing the Shapley values within each set. The results of this analysis (for the sets defined in Methods, above) are presented in
The Shapley values were further used to build Dependence Plots that capture the non-linear associations of every feature. Dependence plots show the Shapley value of a specific feature, representing its predicted contribution, in the form of relative risk (RR), against the feature's value (see Appendix 5, below). In this Example, the dependence plots for two well known risk factors for GDM were examined: pre-pregnancy maternal BMI [29], and the number of relatives diagnosed with diabetes mellitus (DM) [30]. For pre-pregnancy BMI, the RR for GDM starts to increase above a BMI of 21, exceeds 1 at BMI values above 24, and plateaus at 1.2 for BMI above 30 (
To further explore the importance of GCT in the previous pregnancy, a further analysis was conducted: for every patient the combined Shapley value was plotted for all glucose tests (GCT and OGTT, if applicable) during previous pregnancy versus the value of GCT in the previous pregnancy (
A simpler prediction model was established based on a reduced number of features, as opposed to a model based on many EHR features. To this end, the performance of a model based on 9 questions that a patient can answer herself was evaluated. This predictor achieves an auROC of 0.801 and auPR of 0.238, compared to 0.680 and 0.100, respectively, for the baseline model (
This study also emulated usage of the predictive model of the present embodiments as a screening tool to identify women who are less likely to develop GDM, rather than subjecting those who fall below a certain risk threshold to the usual two-step GCT plus OGTT (GCT/OGTT) diagnostic process. The trade-off of missing diagnoses when implementing such screening across varying risk group thresholds was assessed by analyzing the proportion of women who could avoid testing versus the predictor miss rate, that is, the percentage of GDM-positive women not accurately diagnosed by this approach (
The version of the Clalit Health database that was used does not include exact delivery dates for all women; however, it has the approximate (±1 month) birth date of every child. As such, the cohort was defined by collecting all birth dates of children of Clalit-insured mothers, and looking for GTTs in the relevant period around the delivery, namely from 32 weeks before the logged date of birth to 7 weeks after the logged date of birth. The fact that 50 g GTTs are only used in pregnancy in Israel, and the fact that the search was restricted to pregnancy periods to begin with, mean that the GTTs in the data are all pregnancy-related.
GTTs appear in the lab tests data under five distinct tests: one for the 1 hour 50 g result, and four for the fasting, 1 h, 2 h and 3 h 100 g results. GDM was defined in accordance with practice, regardless of the order of the tests, and without consideration of whether a relevant diagnosis was recorded. In case more than one GTT was conducted, a positive result in a single GTT was considered to be positive. A small portion of pregnancies (n=8,228 in the training set and n=1,525 in the test set, 1.6%) was excluded for women that had a 50 g result of 140 mg/dL or higher, but did not have a record of a 100 g GTT.
The cohort was defined according to the relevant date of delivery. The main cohort included pregnancies that ended between Jan. 1, 2010 and Dec. 31, 2016, and the test cohort included pregnancies that ended between Jan. 1, 2017 and Dec. 31, 2017.
The sole exclusion criterion was a pre-pregnancy indication of (non-gestational) diabetes. Normally women with DM do not take a GTT during pregnancy, but it appears that some (<0.2%) do. To address that, patients who had one of the following markers prior to pregnancy start were excluded: (1) a recorded diagnosis of DM, defined as any of the ICD9 codes in 250.x or 357.2, or (2) a recorded non-pregnancy HbA1c % blood test of 6.5 or higher. Note that although fasting glucose could also be used to diagnose diabetes, this metric is less accurate in the data as some non-fasting patients still take the test, and was therefore not used.
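A temporal split of this kind can be sketched as follows. The function name `assign_cohort` is hypothetical; the default boundaries follow the dates given in this paragraph and are exposed as parameters so that a different test period can be substituted.

```python
from datetime import date
from typing import Optional


def assign_cohort(delivery: date,
                  train_start: date = date(2010, 1, 1),
                  train_end: date = date(2016, 12, 31),
                  test_start: date = date(2017, 1, 1),
                  test_end: date = date(2017, 12, 31)) -> Optional[str]:
    """Assign a pregnancy to 'train' or 'test' by its delivery date, else None."""
    if train_start <= delivery <= train_end:
        return "train"
    if test_start <= delivery <= test_end:
        return "test"
    return None  # outside both periods


cohort = assign_cohort(date(2017, 3, 1))
```

Splitting by date rather than at random means every test pregnancy ends after every training pregnancy, which mimics deploying a model trained on past data.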
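The exclusion rule can be expressed as a short predicate. This is a sketch under assumptions: ICD9 codes are taken to be strings such as "250.01", both inputs are assumed to be already restricted to records dated before pregnancy start and outside pregnancy periods, and the function name is hypothetical.

```python
from typing import Iterable


def has_prepregnancy_diabetes(icd9_codes: Iterable[str],
                              hba1c_values: Iterable[float]) -> bool:
    """True if any pre-pregnancy record indicates non-gestational diabetes."""
    # Marker (1): a DM diagnosis, ICD9 250.x or 357.2.
    dm_diagnosis = any(code == "250" or code.startswith("250.") or code == "357.2"
                       for code in icd9_codes)
    # Marker (2): an HbA1c % result of 6.5 or higher.
    high_hba1c = any(value >= 6.5 for value in hba1c_values)
    return dm_diagnosis or high_hba1c
```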
The original Risk Score suggested by the NIH [21] includes eight parameters, of which all except for ethnicity are relevant for Israeli population. Seven of these parameters were therefore included in the score, defined according to the following binary variables enumerated as (1) through (7).
Binary variable (1): Overweight status. This binary variable was set to be true if non pregnancy BMI is higher than 25 kg/m2, and false otherwise. If there is no record of BMI prior to the pregnancy, this binary variable was set to be false.
Binary variable (2): Family history of diabetes. This binary variable was set to be true if a first degree relative (parent or sibling) has at least one diagnosis of DM, defined as any of the ICD9 codes in 250.x or 357.2, and false otherwise. Only diagnoses available at pregnancy initiation are considered.
Binary variable (3): Age. This binary variable was set to be true if the patient was 25 or more years of age at pregnancy start, and false otherwise.
Binary variable (4): History of pregnancy complication. This binary variable was set to be the logic OR operation of the following markers: (a) History of GDM according to GTTs, defined similarly to the target; (b) History of miscarriage or stillbirth, seen in the form of a diagnosis with ICD9 632, 634.x, 635.x or 637.x; and (c) History of a liveborn baby with birth weight higher than 4 kg. Note that birth weight is only logged for deliveries done in Clalit owned hospitals (about 30% of the deliveries).
Binary variable (5): History of PCOS. This binary variable was set to be true if the patient has at least one diagnosis of PCOS, ICD9 code 256.4, and false otherwise. Only diagnoses available at pregnancy start were considered.
Binary variable (6): Problems with insulin or blood sugar. This binary variable was set to be true if the patient has at least one diagnosis of prediabetes, either according to ICD9 codes 790.2x or by a HbA1c test in the range 5.7% to 6.4%, and false otherwise. Only diagnoses and tests available at pregnancy initiation were considered.
Binary variable (7): High blood pressure, high cholesterol, and/or heart disease. This binary variable was set to be the logic OR operation of the following markers: (a) History of high BP, defined as two or more BP tests with systolic BP over 140 or diastolic BP over 90 (blood pressure measurements taken during pregnancies are not included in this analysis); and (b) Recorded relevant ICD9 of 401.x, 272.x or 390.x-449.x.
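The seven binary variables defined above, and their sum, can be sketched as follows. The `BaselineInputs` container and its field names are hypothetical; each field is assumed to be pre-computed from records available at pregnancy initiation, as the definitions above require.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaselineInputs:
    """Hypothetical per-pregnancy summary of pre-pregnancy records."""
    bmi: Optional[float]                   # non-pregnancy BMI, None if unrecorded
    relative_has_dm: bool                  # first degree relative with a DM diagnosis
    age_at_start: float                    # age in years at pregnancy start
    prior_gdm: bool
    prior_miscarriage_or_stillbirth: bool
    prior_birth_weight_over_4kg: bool
    pcos_dx: bool                          # ICD9 256.4
    prediabetes_dx: bool                   # ICD9 790.2x
    hba1c_values: List[float] = field(default_factory=list)
    n_high_bp_readings: int = 0            # systolic > 140 or diastolic > 90
    cardio_dx: bool = False                # ICD9 401.x, 272.x or 390.x-449.x


def baseline_risk_score(p: BaselineInputs) -> int:
    """Count how many of binary variables (1) through (7) are true."""
    flags = [
        p.bmi is not None and p.bmi > 25,                       # (1) overweight
        p.relative_has_dm,                                      # (2) family history
        p.age_at_start >= 25,                                   # (3) age
        p.prior_gdm or p.prior_miscarriage_or_stillbirth
        or p.prior_birth_weight_over_4kg,                       # (4) complications
        p.pcos_dx,                                              # (5) PCOS
        p.prediabetes_dx
        or any(5.7 <= v <= 6.4 for v in p.hba1c_values),        # (6) prediabetes
        p.n_high_bp_readings >= 2 or p.cardio_dx,               # (7) cardiovascular
    ]
    return sum(flags)
```

A missing BMI contributes zero to the count, matching the rule that the overweight variable is false when no pre-pregnancy BMI is recorded.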
The final Baseline Risk Score is, then, the number of “true” entries in Binary variables (1) through (7), and therefore ranges from 0 to 7. An analysis of the odds ratio of the constituent variables, as well as a comparison to a logistic regression model from the above binary variables is presented in
2355 features were constructed from the dataset. The following list describes the generation mechanism for each.
(A): Features that are available at pregnancy initiation (295 features).
(A)(1): Demographics (41 features).
(A)(1)(i): Was the patient born in Israel (True/False)
(A)(1)(ii): Features describing ethnicity. 15 features breaking down the origin of the patient's ancestors, as logged in their country of origin. The world's countries were clustered into 14 categories, corresponding to Israel's major ethnic groups: North Africa, Iraq, Iran, Yemen, East Europe, West Europe, ex-USSR, North America, Latin America, Arab, Mediterranean, Ethiopia, Asia and Africa. Another feature logs the percentage of unknown origin.
(A)(2): Personal characteristics (7 features).
(A)(2)(i): Age at pregnancy initiation.
(A)(2)(ii): Weight, height and BMI. Only samples available before the current pregnancy and outside past pregnancies were considered; the median of all samples taken at age 18 and up was calculated.
(A)(2)(iii): Systolic and Diastolic blood pressure. Only samples available before the current pregnancy and outside past pregnancies were considered; the median of all samples taken at age 18 and up was calculated.
(A)(2)(iv): Number of children born in current pregnancy—1 for single born, 2 for twins etc.
(A)(3): Pregnancy history (103 features).
(A)(3)(i): History of GDM.
(A)(3)(i)(a): Any history of GDM according to past pregnancies' GTTs (True/False).
(A)(3)(i)(b): GDM status in each of the last 3 pregnancies.
(A)(3)(ii): History of miscarriage, seen in the form of a diagnosis with ICD9 632, 634.x, 635.x or 637.x.
(A)(3)(iii): Largest baby weight. Maximal birth weight recorded. Note that birth weight is only available for 25% of the cohort.
(A)(3)(iv): Number of previous births. Number of children born before the current pregnancy.
(A)(3)(v): Lab tests during last 3 pregnancies.
(A)(3)(v)(a): Median values during each pregnancy of the following tests, the 25 most common lab tests. HB, HCT, RBC, MCV, MCH, MCHC, WBC, PLT, LYM %, NEUT %, MONO %, EOS %, BASO %, LYMP (abs), NEUT (abs), MONO (abs), EOS (abs), BASO (abs), MPV, RDW, Urine culture, MICRO %, MACRO %, HYPO %, HYPER % (75 features).
(A)(3)(v)(b): Median values during each pregnancy of fasting glucose and HbA1c % (6 features).
(A)(3)(v)(c): GTT results (both 50 g and 100 g), if available (15 features).
(A)(4): Medical history outside of pregnancy (144 features).
(A)(4)(i): Number of first degree relatives (parent or sibling) with at least one diagnosis of DM, defined as any of the ICD9 codes in 250.x or 357.2. Only diagnoses available at pregnancy initiation are considered.
(A)(4)(ii): History of PCOS, according to ICD9 code 256.4. Only diagnoses available at pregnancy start are considered.
(A)(4)(iii): History of prediabetes.
(A)(4)(iii)(a): Diagnoses. true if the patient has at least one diagnosis of prediabetes according to ICD9 codes 790.2x.
(A)(4)(iii)(b): Maximal HbA1c % logged.
(A)(4)(iii)(c): Joint prediabetes definition. either according to diagnosis or by a HbA1c test in the range 5.7% to 6.4%. Only diagnoses and tests available at pregnancy initiation are considered.
(A)(4)(iv): Features related to high blood pressure, high cholesterol, and/or heart disease.
(A)(4)(iv)(a): Number of high BP tests with systolic BP over 140 or diastolic BP over 90. Blood pressure measurements taken during pregnancies are not included in this analysis.
(A)(4)(iv)(b): Recorded relevant ICD9 of 401.x (hypertension), 272.x (high cholesterol) and 390.x-449.x (heart diseases) (3 True/False features).
(A)(4)(v): Baseline Risk Score value (Appendix 1).
(A)(4)(vi): Lab tests during last 5 years (132 features). Median value in every window M1-M5 (see Time Windows ahead). The 25 most common tests were considered: HB, HCT, RBC, MCV, MCH, MCHC, WBC, PLT, LYM %, NEUT %, MONO %, EOS %, BASO %, LYMP (abs), NEUT (abs), MONO (abs), EOS (abs), BASO (abs), MPV, RDW, Urine culture, MICRO %, MACRO %, HYPO % and HYPER %, plus Glucose and HbA1c %. Only data gathered outside of pregnancy periods were considered for these features.
(A)(4)(vii): Two coefficients of a linear regression for fasting glucose vs. time (only if 3 or more measurements were available).
(B): Features gathered throughout current pregnancy (2060 features).
(B)(1): Lab tests (524 features). Median values of 250 most common lab tests during F0-F2 (see Time Windows ahead).
(B)(2): Clinic and hospital diagnoses (906 features). Counts of 300 most common diagnoses in community clinics and 100 most common diagnoses in hospitals, plus “other” count for all the non-top diagnoses, for each window in F0-F2 (see Time Windows ahead).
(B)(3): Anthropometrics and blood-pressure measurements (27 features).
(B)(3)(i): Medians of weight, height, BMI, systolic and diastolic BP and the time between the measurement to the GTT, for each window in F0-F2 (see Time Windows ahead).
(B)(3)(ii): Two coefficients of a linear regression for weight vs. time, for 10-20 weeks of gestation (only if 3 or more measurements were available).
(B)(3)(iii): 2×2 coefficients of a linear regression for systolic/diastolic blood pressure vs. time, for 0-20 weeks of gestation (only if 3 or more measurements were available).
(B)(4): Pharmaceuticals (603 features). Counts of the 300 most common medications, plus an “other” count for all the non-top medications, for each window in F0-F2 (see Time Windows ahead).
A complete list of all the 2355 features used in this study is provided in Table 6.1 below (see Appendix 6).
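Several of the features listed above are the two coefficients of a linear regression of a measurement against time, computed only when 3 or more measurements are available. A minimal ordinary least squares sketch is given below; the function name `trend_coefficients`, the (slope, intercept) ordering, and the handling of identical time points are assumptions rather than details taken from this Example.

```python
from typing import Optional, Sequence, Tuple


def trend_coefficients(times: Sequence[float],
                       values: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (slope, intercept) of a least squares line, or None if unavailable."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    n = len(times)
    if n < 3:                      # feature is only defined for 3+ measurements
        return None
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:                   # all measurements share one time point
        return None
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


coeffs = trend_coefficients([0, 1, 2], [1, 3, 5])
```

Returning `None` leaves the feature missing rather than imputing a value, which tree-based learners can typically handle directly.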
Windows during pregnancy were defined according to the medical examination pregnancy schedule for Israeli women [19]. Three relative-time windows within the pregnancy period were defined. These time-windows are named F0, F1 and F2, and are defined as follows.
Time-window F0: from 30 to 22 weeks before the GTT, representing −4 to 4 weeks of gestation.
Time-window F1: from 22 to 12 weeks before the GTT, representing 4 to 14 weeks of gestation. This window includes the period in which women attend the first blood test during pregnancy, which is recommended during 6-12 weeks of gestation.
Time-window F2: from 12 to 4 weeks before the GTT, representing 14 to 22 weeks of gestation. This window includes the period in which women attend the second blood test during pregnancy (triple test), which is recommended during 16-18 weeks of gestation.
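The mapping from a measurement's timing to one of the three windows, and the per-window median used for many features, can be sketched as follows. How the boundaries are treated (each window includes its earlier edge and excludes its later edge) is an assumption, and the function names are hypothetical.

```python
from statistics import median
from typing import Dict, Iterable, Optional, Tuple

# (start, end) in weeks before the GTT, as defined in the text above.
WINDOWS = {"F0": (30, 22), "F1": (22, 12), "F2": (12, 4)}


def pregnancy_window(weeks_before_gtt: float) -> Optional[str]:
    """Return the window name for a measurement, or None if it falls outside F0-F2."""
    for name, (start, end) in WINDOWS.items():
        # Assumption: the earlier edge is included, the later edge is excluded.
        if end < weeks_before_gtt <= start:
            return name
    return None


def window_medians(samples: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Median value per window for (weeks_before_gtt, value) samples."""
    grouped: Dict[str, list] = {}
    for weeks, value in samples:
        name = pregnancy_window(weeks)
        if name is not None:
            grouped.setdefault(name, []).append(value)
    return {name: median(values) for name, values in grouped.items()}


medians = window_medians([(25, 80), (24, 90), (10, 100), (2, 120)])
```

Windows with no samples are simply absent from the result, so the corresponding feature stays missing.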
This choice is backed by the test population in the data, as shown in
For medical history outside pregnancy periods, five one-year time-windows were defined, covering the five years prior to the approximate date of gestation. These one-year time-windows are named M1 (last year before pregnancy) to M5 (5 to 4 years before pregnancy).
Additional time windows were defined for past pregnancy periods. These are denoted by P1 (last pregnancy), P2 (two pregnancies ago) and P3 (three pregnancies ago). Pregnancies were located according to the birth date of a child, and pregnancy period was defined as 40 weeks before that date plus 2.5 months in each direction to cover randomization of birth dates.
A gradient boosting predictor trained with the LightGBM [41] python package was used. Hyperparameters were selected following a cross-validated random search, with the following settings selected: num_boost_round=603, num_leaves=20, learning_rate=0.05, feature_fraction=0.2, bagging_fraction=0.8, bagging_freq=5, min_data_in_leaf=4.
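The settings above map onto LightGBM's training API roughly as sketched below. The binary objective and the helper name `train_gdm_model` are assumptions, since the objective is not stated here; the remaining values are the ones listed in this paragraph. LightGBM is imported inside the function so that the configuration can be inspected without the package installed.

```python
# Hyperparameters reported above; "objective" is an assumption for a
# binary GDM / no-GDM target and is not stated in the text.
LGB_PARAMS = {
    "objective": "binary",
    "num_leaves": 20,
    "learning_rate": 0.05,
    "feature_fraction": 0.2,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "min_data_in_leaf": 4,
}
NUM_BOOST_ROUND = 603


def train_gdm_model(features, labels):
    """Train a gradient boosting model on a feature matrix and 0/1 labels."""
    import lightgbm as lgb  # imported lazily; requires the lightgbm package

    train_set = lgb.Dataset(features, label=labels)
    return lgb.train(LGB_PARAMS, train_set, num_boost_round=NUM_BOOST_ROUND)
```

With a binary objective, calling `predict` on the returned booster yields a probability per pregnancy, which is the kind of output the evaluation metrics in this Example operate on.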
To draw the dependence plots, the resulting Shapley values were converted to Relative Risk (RR). In Shapley analysis, the log-odds (LL) of the predicted probability is calculated according to

$$LL = \phi_0 + \phi_1 + \dots + \phi_d$$

where $\phi_0$ is the base Shapley value (the logit of the population prevalence $P_0$), and $\phi_i$ for $i \in \{1, \dots, d\}$ are the Shapley values related to features $1, \dots, d$. The predicted probability based on a single feature is then

$$P_i = \sigma(\phi_0 + \phi_i)$$

where $\sigma$ is the Sigmoid function, the inverse of the logit function. The relative risk related to a single feature and sample was therefore defined as:

$$RR_i = \frac{P_i}{P_0} = \frac{\sigma(\phi_0 + \phi_i)}{\sigma(\phi_0)}$$

For a set $D = \{i, j, \dots\}$ of features, this definition extends to

$$RR_D = \frac{\sigma\left(\phi_0 + \sum_{k \in D} \phi_k\right)}{\sigma(\phi_0)}$$
To plot the dependence plot, mean and standard deviations of the RR were calculated for each bin of feature value. This resembles a standard dependence plot, only with RR instead of Shapley values presented.
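The conversion and binning can be sketched as follows, assuming the relative risk of a feature set is the sigmoid of the base value plus the set's Shapley values, divided by the sigmoid of the base value alone (which is consistent with the base value being the logit of the population prevalence). The function names, the fixed-width binning scheme, and the use of the population standard deviation are assumptions.

```python
from math import exp, floor, log
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def logit(p: float) -> float:
    return log(p / (1.0 - p))


def relative_risk(phi0: float, phis: Sequence[float]) -> float:
    """RR of one feature (single-element phis) or of a feature set."""
    return sigmoid(phi0 + sum(phis)) / sigmoid(phi0)


def binned_rr(feature_values: Sequence[float], rrs: Sequence[float],
              bin_width: float) -> Dict[float, Tuple[float, float]]:
    """Mean and standard deviation of RR per fixed-width bin of feature value."""
    bins: Dict[float, list] = {}
    for x, rr in zip(feature_values, rrs):
        key = floor(x / bin_width) * bin_width  # lower edge of the bin
        bins.setdefault(key, []).append(rr)
    return {key: (mean(v), pstdev(v)) for key, v in bins.items()}


phi0 = logit(0.05)  # hypothetical prevalence of 5%, for illustration only
rr_up = relative_risk(phi0, [0.5])
```

A Shapley value of zero maps to an RR of exactly 1, so the plotted curve crosses 1 where the feature stops raising or lowering the predicted risk.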
Table 6.1 lists all the 2355 features used in this study. The list is sorted according to the significance of the respective feature for predicting the likelihood for GDM, in descending order, so that from the standpoint of prediction accuracy it is more preferred to select a parameter that is listed higher in Table 6.1, than a parameter that is listed lower in Table 6.1. For example, when N parameters are used, it is preferred to select those parameters from lines 1 through M of Table 6.1, where N≤M≤2355.
ESCHERICHIA COLI (E. COLI) (04149), count during F0
ESCHERICHIA COLI (E. COLI) (04149), count during F2
ESCHERICHIA COLI (E. COLI) (04149), count during F1
In the above Examples, the ability to utilize EHRs for predicting GDM in early stages of pregnancy was examined. The results show that EHRs can be used to produce accurate predictions of GDM risk, performing significantly better than a baseline model based on commonly assessed risk factors. The analysis presented in the above Examples demonstrates that accurate prediction of GDM is feasible even at pregnancy initiation, with an auROC of 0.836, close to the performance of a predictor constructed later on in pregnancy, which reaches an auROC of 0.854.
The predictor of the present embodiments may be used to identify and recruit a high risk cohort with risk of up to 70% for GDM development. The study presented in the above Examples thus allows early prediction and detection of GDM and prevention interventions.
Other than well known risk factors for GDM development such as maternal age and family history of diabetes, the analysis presented in the above Examples reveals factors that were not previously described to be highly predictive of gestational diabetes. The main risk factors identified were results of the 50 g GTT tests in previous pregnancies. While the medical system already addresses women with a history of GDM in previous pregnancies as being at increased risk for GDM in the current pregnancy, the experiments performed by the inventors according to some embodiments of the present invention show that the 50 g GTT result is far more predictive. Thus, according to some embodiments of the present invention explicit GTT value can be used as a risk assessment guideline with more weight than, and more preferably instead of, GDM diagnosis.
Although maximal prediction accuracy requires using the patient's entire EHR, the above Examples demonstrate that 9 simple questions that can be answered by the patient herself still enable accurate prediction (auROC of 0.801). This allows patients to get accurate GDM risk estimation by web- or smartphone-based self assessment tools.
Currently, in Israel, a 50 g GTT is used as a universal screening test for GDM, followed by the 100 g GTT if needed. The results presented in the above Examples show that the use of a prediction model, which is more cost-effective and efficient, allows the identification of low risk women who can therefore avoid the 50 g GTTs entirely. Thus, the present embodiments comprise a selective screening method, wherein women for whom the predicted risk is low are screened out of the 50 g GTT. Avoiding 50% of the 50 g GTTs of patients who previously did a GTT would result in only a 5% miss rate when diagnosing GDM according to the two-step approach guidelines. Accurate selective screening is highly desirable, as it can reduce both costs and physical inconvenience for women at low risk for GDM development.
Although the predictor presented in the above Examples is based on retrospective EHRs, which have inherent biases and are influenced by the interaction of the patient with the health system, these biases are reduced since the data contain information originating from a non-governmental, non-profit organization which includes the majority of the Israeli population, and since the outcome of the model is based on routine pregnancy tests. Other than the data used in the above Examples, some embodiments of the present invention contemplate use of additional types of information for predicting the likelihood for GDM. These types of information include, without limitation, information regarding lifestyle habits. Although the predictor was trained and validated on Israeli population data, the size of the dataset, the validation process, and the fact that the analysis validated the utility of established risk factors for GDM development, support the ability of the method of the present embodiments to provide prediction also for other populations.
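The trade-off behind selective screening can be sketched as follows: for a given risk threshold, compute the fraction of women who would skip testing and the fraction of GDM-positive women who would be missed. The function name `screening_tradeoff` and the example numbers are hypothetical and are not results from the Examples.

```python
from typing import Sequence, Tuple


def screening_tradeoff(risks: Sequence[float], has_gdm: Sequence[bool],
                       threshold: float) -> Tuple[float, float]:
    """Return (fraction of women skipping the test, miss rate among GDM cases)."""
    if len(risks) != len(has_gdm) or not risks:
        raise ValueError("risks and has_gdm must be non-empty and equally long")
    skipped = [risk < threshold for risk in risks]  # below threshold: no test
    n_positive = sum(has_gdm)
    missed = sum(1 for skip, gdm in zip(skipped, has_gdm) if skip and gdm)
    fraction_avoided = sum(skipped) / len(risks)
    miss_rate = missed / n_positive if n_positive else 0.0
    return fraction_avoided, miss_rate


# Hypothetical predicted risks and outcomes, for illustration only.
avoided, miss = screening_tradeoff(
    risks=[0.01, 0.02, 0.03, 0.30, 0.50, 0.04],
    has_gdm=[False, False, True, True, True, False],
    threshold=0.05,
)
```

Sweeping the threshold over a range of values traces out the curve of tests avoided against diagnoses missed.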
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting. In addition, any priority document(s) of this application is/are hereby incorporated herein by reference in its/their entirety.
This application claims the benefit of priority of U.S. Provisional Patent Application No. 62/852,317 filed on May 24, 2019, the contents of which are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL2020/050570 | 5/24/2020 | WO |
Number | Date | Country | |
---|---|---|---|
62852317 | May 2019 | US |