The progress of an epidemic or other extended-duration event can be subject to a wide variety of influences. Consequently, it can be difficult to forecast the progress of such events. Moreover, it can be difficult to determine, based on information about an event, whether or how the progress or consequences of an event can be modified.
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items. The attached drawings are for purposes of illustration and are not necessarily to scale. For brevity of illustration, in the diagrams herein, an arrow beginning with a diamond connects a first component or operation (at the diamond end) to at least one second component or operation that is or can be included in the first component or operation.
Various examples relate generally to the field of computerized analysis systems. Various examples relate to systems for analyzing events or other complex situations, such as for conducting epidemiological studies.
Various example embodiments provide systems and methods for performing epidemiological disease studies using graph simulations (e.g., social-interaction graph simulations). Example implementations provide a specialized, high-level interface to a sophisticated, population-based, synthetic information platform and may be geared toward the quantitative evaluation of the combined effects of behavior, interventions, resource management, and policy in domains such as public health and national security. The embodiments presented herein may be adapted for use with a large class of infectious diseases and may be applied to any type of population segment. Various embodiments provide the ability to carry out experiments for situation assessment, forecasting, decision support, intervention efficacy analysis, and various other purposes. For example, some implementations herein may be used to evaluate interventions that public health analysts or others can employ to address epidemics.
Various examples can permit estimating the course of an event, e.g., an extended event that goes on over a period of time, or a point-in-time event that has consequences that extend over time or occur after the event itself. For point-in-time events, the “course of the event” as used herein includes a time period after the event during which the consequences of the event occur or play out. Events can include epidemics, e.g., Ebola, influenza, SARS, Zika, or other infectious diseases. Additionally or alternatively, events can include life-changing events, such as changing jobs, relocating, adding a child to a family (e.g., by birth or adoption), marriage, or other events that significantly affect an entity over time.
Example event analysis systems herein can support running numerous simulations of an experiment, generating distributions of outcomes that characterize the time-varying state (the dynamics) of an event such as an epidemiological event. The system may support exploration of the variability of outcomes in a stochastic process. The outcome of experiments may be provided as analysis reports showing, e.g., distributions of outcomes across numerous replicates of an experiment. Such reports can be viewable in the form of plotted graphs, in some examples.
Examples herein can permit bioinformatics researchers to design experiments and create analyses for epidemiological disease studies based on social-interaction graph simulations. Examples can enable improved readiness, planning, and decision making in the domains of public safety and national security by delivering sophisticated modeling and simulation capabilities directly into the hands of the analyst. According to various implementations, the analysis system allows analysts to view results immediately and interactively, greatly speeding up the interpretation of results. It also may allow multiple interventions of a single type and/or give independent applications access to simulation results, permitting special-case analysis tools to be developed. The system may be useful in training of military, medical, rescue operation, and/or other personnel, who may have use for timely, accurate reporting of experiment results. The system may also be useful in training and coordinating activities with civilian authorities, medical personnel/infrastructure, and other teams.
The systems and methods of the present disclosure may conduct analyses through interaction with various information resources. For example, information about population characteristics, disease characteristics, intervention options, and/or various other types of information may be retrieved from one or more of a variety of information sources. In some implementations of the present disclosure, the analysis system may incorporate and/or work in coordination with a system that incorporates components designed to transmit requests for information to different information sources and retrieve the information from those sources to perform various tasks.
In some examples, an “experiment” defines and specifies an event, along with all the required parameters for simulating the event using data defined in a data library. The parameters can include the number of replicates, duration, region affected, conditions when the event occurred, effect(s) of the event, the trigger which caused the event, or intervention strategy (both type and application on sub-population(s) of the selected region). Once all the required parameters are defined, the experiment can be run to provide estimate(s) of consequences of the event, as described herein.
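As a purely illustrative sketch, an experiment definition of this kind could be represented in Python as a small data structure. The class and field names (Experiment, InterventionSpec, trigger_day, and so on) are hypothetical and are not an interface described herein; the sketch only shows checking that the required parameters are defined before a run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InterventionSpec:
    """Hypothetical intervention entry: its type and where/when it applies."""
    kind: str            # e.g. "vaccination" or "close_schools"
    subpopulation: str   # sub-population of the selected region it applies to
    trigger_day: int     # simulation day on which the intervention begins


@dataclass
class Experiment:
    """Hypothetical experiment definition holding the parameters of a run."""
    region: str
    num_replicates: int
    duration_days: int
    interventions: List[InterventionSpec] = field(default_factory=list)

    def missing_parameters(self) -> List[str]:
        """Names of required parameters that are unset or out of range."""
        missing = []
        if not self.region:
            missing.append("region")
        if self.num_replicates < 1:
            missing.append("num_replicates")
        if self.duration_days < 1:
            missing.append("duration_days")
        return missing

    def is_runnable(self) -> bool:
        """An experiment can run only once every required parameter is defined."""
        return not self.missing_parameters()


exp = Experiment(
    region="Region A",
    num_replicates=10,
    duration_days=180,
    interventions=[InterventionSpec("close_schools", "students", 14)],
)
print(exp.is_runnable())  # True
```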
The job manager 118 and the computing cluster 120 can communicate at least partly via, or can share access to, a data library 122. Data library 122 can include data of a synthetic population. For example, data library 122 can include a graph comprising nodes representing synthetic entities, such as people, plants, animals, cells in a body, or other entities capable of interacting. Data library 122 can include edges linking the nodes. The edges can include labels, e.g., indicating that two linked entities interact in certain locations or contexts, or with certain frequencies.
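A minimal in-memory sketch of such a graph, assuming Python and integer node identifiers, might store node attributes in a dictionary and keep, for each pair of linked nodes, the set of context labels on their edge. The SPGraph class and its methods are hypothetical and do not describe the storage format of data library 122. Interaction frequencies could be stored in the same way, e.g., as a mapping from label to contact hours.

```python
from collections import defaultdict
from typing import Dict, Set


class SPGraph:
    """Minimal synthetic-population graph: nodes with attributes, labeled edges.

    Illustrative in-memory structure only, not the storage format of any
    particular data library.
    """

    def __init__(self) -> None:
        self.node_attrs: Dict[int, dict] = {}
        # node -> {neighbor -> set of context labels such as "home" or "work"}
        self.adj: Dict[int, Dict[int, Set[str]]] = defaultdict(dict)

    def add_node(self, node_id: int, **attrs) -> None:
        self.node_attrs[node_id] = attrs

    def add_edge(self, u: int, v: int, label: str) -> None:
        # Interactions are treated as symmetric, so the label is stored both ways.
        # One pair of entities may interact in several contexts.
        self.adj[u].setdefault(v, set()).add(label)
        self.adj[v].setdefault(u, set()).add(label)

    def labels(self, u: int, v: int) -> Set[str]:
        """Labels on the edge between u and v (empty if they are not linked)."""
        return self.adj.get(u, {}).get(v, set())


g = SPGraph()
g.add_node(1, age=34)
g.add_node(2, age=8)
g.add_edge(1, 2, "home")
g.add_edge(2, 1, "school")
```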
As shown, in some examples, front end 114 is a client 124 of services provided by a server 126. Server 126, which can represent one or more intercommunicating computing devices, can include at least one of each of: back end 116, job manager 118, cluster 120, or data library 122. In some examples, server 126 can include a single data library 122 and multiple back ends 116, job managers 118, or clusters 120. In some examples, client 124 and server 126 are disjoint sets of one or more computing devices.
System 112 can include at least two types of functionality, illustrated as tool 128 and platform 130. Tool 128 can include front end 114 and back end 116. Platform 130 can include job manager 118, cluster 120, and data library 122. In some examples, tool 128 implements a solution for a specific use case. For example, tool 128 can provide facilities for estimating the progress of an epidemic, for estimating the progress of another type of event, or for performing other specific analyses. Platform 130 can provide services usable by various tools 128, e.g., computational resources and access to the data library 122. Although only one tool 128 is shown, multiple tools 128 can access the platform 130 sequentially or concurrently. In some examples, multiple tools 128 can interact with each other directly or via services provided by platform 130. In some examples, one tool 128 writes to the data library 122 and a different tool 128 reads from the data library 122.
In some examples, a specific tool 128, or the platform 130, can interact with a data source 132, as shown by the dashed lines. The data source can be or include, e.g., a Web server, sensor, or other source of data 134 to be loaded into data library 122. The platform 130 can load the data 134 into the data library 122.
In some examples herein, tool 128 is a tool for forecasting the progress of an event, e.g., an extended event that goes on over a period of time, or a point-in-time event with extended consequences. An example of such an event is an epidemic among human, animal, or plant populations. As discussed in more detail below, the front end 114 can receive attributes 136 of a synthetic population, e.g., a subset of the data library 122. The tool 128 can select a synthetic-population (SP) graph from the data library 122, e.g., using services provided by the job manager 118. The front end 114 can receive data of an intervention 138 designed to affect a course of the event, e.g., to counteract or mitigate the event. The tool 128 can then simulate the course of the event in the SP graph to produce an estimate 140 of the event, based at least in part on the intervention 138. The front end 114 can present the estimate 140, e.g., via a user interface such as a Web page.
The illustrated computing devices, e.g., front end 114, back end 116, job manager 118, or devices of cluster 120, can be or include any suitable computing devices configured to communicate over a wireless and/or wireline network. Examples include, without limitation, mobile devices such as a mobile phone (e.g., a smart phone), a tablet computer, a laptop computer, a portable digital assistant (PDA), a wearable computer (e.g., electronic/smart glasses, a smart watch, fitness trackers, etc.), a networked digital camera, and/or similar devices. Other examples include, without limitation, devices that are generally stationary, such as televisions, desktop computers, game consoles, set top boxes, rack-mounted servers, and the like. As used herein, a message “transmitted to” or “transmitted toward” a destination, or similar terms, can be transmitted directly to the destination, or can be transmitted via one or more intermediate network devices to the destination.
Tool 210 can be or include a wireless phone, a wired phone, a tablet computer, a laptop computer, a wristwatch, or other type of computing device as noted above. Tool 210 can include at least one processor 216, e.g., one or more processor devices such as microprocessors, microcontrollers, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), programmable logic devices (PLDs), programmable logic arrays (PLAs), programmable array logic devices (PALs), or digital signal processors (DSPs). Tool 210 can further include one or more computer readable media 218, such as memory (e.g., random access memory, RAM, solid state drives, SSDs, or the like), disk drives (e.g., platter-based hard drives), another type of computer-readable media, or any combination thereof.
The tool 210 can further include a user interface (UI) 240 configured for communication with a user 242 (shown in phantom). User 242 can represent an entity, e.g., a system, device, party, and/or other feature with which tool 210 can interact. For brevity, examples of user 242 are discussed herein with reference to users of a computing system; however, these examples are not limiting. The user interface 240 or components thereof, e.g., the electronic display device, can be part of the front end 114 (e.g., as illustrated in
User interface 240 can include one or more input devices, integral and/or peripheral to tool 210. The input devices can be user-operable, and/or can be configured for input from other computing devices of tool 210 or separate therefrom. Examples of input devices can include, e.g., a keyboard, keypad, a mouse, a trackball, a pen sensor and/or smart pen, a light pen and/or light gun, a game controller such as a joystick and/or game pad, a voice input device such as a microphone, voice-recognition device, and/or speech-recognition device, a touch input device such as a touchscreen, a gestural and/or motion input device such as a depth camera, a grip sensor, an accelerometer, another haptic input, a visual input device such as one or more cameras and/or image sensors, a pressure input such as a tube with a pressure sensor, a Braille input device, and the like. User queries can be received, e.g., from user 242, via user interface 240.
User interface 240 can include one or more result devices configured for communication to a user and/or to another computing device of or outside tool 210. Result devices can be integral and/or peripheral to tool 210. Examples of result devices can include a display, a printer, audio speakers, beepers, and/or other audio result devices, a vibration motor, linear vibrator, Braille terminal, and/or other haptic result device, and the like. Actions, e.g., presenting to user 242 information of or corresponding to a result of an analysis (e.g., estimate 140), can be taken via user interface 240.
The computing device 210 can further include one or more communications interface(s) 244 configured to selectively communicate via the network 214. For example, communications interface(s) 244 can include or operate one or more transceivers or radios to communicate via network 214. In some examples, communications interface(s) 244, or an individual communications interface 244, can include or be communicatively connected with transceivers or radio units for multiple types of access networks.
The computer readable media 218 can be used to store data or to store components that are operable by the processor 216 or instructions that are executable by the processor 216 to perform various functions as described herein. The computer readable media 218 can store various types of instructions and data, such as an operating system, device drivers, etc. Stored processor-executable instructions can be arranged in modules or components. Stored processor-executable instructions can be executed by the processor 216 to perform the various functions described herein.
The computer readable media 218 can be or include computer storage media. Computer storage media can include, but are not limited to, random-access memory (RAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), phase change memory (PRAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, compact disc read-only memory (CD-ROM), digital versatile disks (DVDs), optical cards or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage, magnetic cards or other magnetic storage devices or media, solid-state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or memories, storage devices, or any other tangible, non-transitory medium which can be used to store the desired information and which can be accessed by the processor 216. Tangible computer-readable media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. In contrast to computer storage media, computer communication media can embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer storage media does not include computer communication media.
The computer readable media 218 can include processor-executable instructions of an interaction module 246 or a selection module 248. The computer readable media 218 can additionally or alternatively include processor-executable instructions of a simulation module 280 or other modules or components. In some examples, the processor-executable instructions of the modules 246, 248, or 280 can be executed by the processor 216 to perform various functions described herein, e.g., with reference to at least one of
The platform 212 can include at least one processor 282. The platform 212 can include one or more computer readable media (CRM) 294. The computer readable media 294 can be used to store processor-executable instructions of a simulation module 296 or other modules or components. The processor-executable instructions of the module 296 or other modules can be executed by the processor 282 to perform various functions described herein, e.g., with reference to at least one of
In some examples, processor 282 and, if required, CRM 294 are referred to for brevity herein as a “processing unit.” Similarly, processor 216 and, if required, CRM 218 can be referred to as a “processing unit.” For example, a processing unit can include a CPU or DSP and instructions executable by that CPU or DSP to cause that CPU or DSP to perform functions described herein. Additionally or alternatively, a processing unit can include an ASIC, FPGA, or other logic device(s) wired (physically or via blown fuses or logic-cell configuration data) to perform functions described herein.
The platform 212 can include one or more communications interface(s) 298, e.g., of any of the types described above with reference to communications interface(s) 244. For example, platform 212 can communicate via communications interface(s) 298 with tool 210.
Operations shown in
At block 304, server 126 can receive attributes 136 of (e.g., designating or defining) a synthetic population (referred to as “synth. pop.,” “s. p.,” or “SP” throughout this description and figures). This can be performed, e.g., by the user interface or the experiment component shown in
The initial conditions can include, e.g., an option to upload or otherwise specify a set of synthetic entities to be initially marked as infected with a disease. This can provide users, e.g., researchers, greater control over simulating specific situations, e.g., travelers carrying diseases between countries. Initially-infected entities can be indicated by identifier or by the characteristics of the entities to be marked as infected before the beginning of the simulation. In some examples, the synthetic entities initially marked as infected can be selected at random (or pseudorandom, and likewise throughout this document) from an entire synthetic population or from sub-populations thereof matching specified conditions.
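The pseudorandom selection described above can be sketched as follows, assuming node attributes are held in a dictionary and a sub-population is expressed as a predicate over those attributes. The function name choose_initial_infected and its seed parameter are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional


def choose_initial_infected(
    node_attrs: Dict[int, dict],
    count: int,
    condition: Optional[Callable[[dict], bool]] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Pick `count` entities to mark as infected before the simulation starts.

    If `condition` is given, only entities whose attributes satisfy it are
    eligible (a sub-population); otherwise the whole population is eligible.
    """
    rng = random.Random(seed)
    # Sort so that the same seed always sees candidates in the same order.
    eligible = [n for n in sorted(node_attrs)
                if condition is None or condition(node_attrs[n])]
    if count > len(eligible):
        raise ValueError("not enough eligible entities")
    return sorted(rng.sample(eligible, count))


attrs = {i: {"age": 5 * i} for i in range(20)}
older = choose_initial_infected(attrs, 3, lambda a: a["age"] >= 50, seed=7)
```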
At block 306, server 126 can select a synthetic-population (SP) graph 308 from the data library 122 based at least in part on the attributes 136. This can be performed, e.g., by the subpopulation-services or subpopulation data manager components shown in
In some examples, the SP graph 308 is or comprises a social-interaction graph. In such a graph, labels can represent, e.g., locations at which the connected entities come into contact or interact, such as home, work, or school. An example SP graph 308, and components thereof, is shown in
At block 312, server 126 can receive data 314 of an intervention (or at least one intervention) designed to counteract or mitigate an epidemic. This can be performed, e.g., by the user interface or the experiment component shown in
In some examples, the data 314 can indicate intervention(s) to be applied during the experiment and/or trigger(s) to initiate the intervention(s). Such interventions may include, but are not limited to: 1) table-defined intervention; 2) vaccination; 3) adding social distance; 4) closing offices; 5) closing schools; 6) providing pharmaceutical treatment; 7) providing pharmaceutical prophylaxis; and 8) dynamic sequestration. Examples of table-defined interventions can include scaling infection risks per activity type by user-specified values in a table of parameters. Table-defined interventions can permit simulating the effects of interventions not expressly provided by the system, given values for those parameters. For example, parameters of a table-defined intervention pertinent to an epidemic, e.g., influenza, can indicate how likely infected entities are to wear face masks or to turn away from others while sneezing. In some implementations, the data 314 can specify a duration of intervention, providing control over how long the intended intervention should be applied rather than assuming the intervention is applicable for the remainder of the simulation from the time of triggering. This can permit creating more practical scenarios for some simulations. In some implementations, the data 314 can specify a rate of administration for interventions such as vaccinations. The data 314 can additionally or alternatively specify at least one of: an efficacy level indicating how effectively an intervention reduces the spread of an epidemic; or a compliance rate indicating how likely entities are to cooperate with the intervention (e.g., how many people will accept vaccination).
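Several of these intervention parameters can be illustrated with a short Python sketch. The function names, the activity labels, and the risk values are hypothetical; treating compliance and efficacy as independent per-entity Bernoulli trials is an assumption of the sketch, not a requirement of the system.

```python
import random
from typing import Dict, Iterable, Optional, Set


def scaled_risk(base_risk: Dict[str, float],
                scale_table: Dict[str, float]) -> Dict[str, float]:
    """Table-defined intervention: scale each activity's infection risk.

    Activities missing from the table keep their base risk (factor 1.0).
    """
    return {activity: risk * scale_table.get(activity, 1.0)
            for activity, risk in base_risk.items()}


def vaccinate(candidates: Iterable[int], compliance: float,
              efficacy: float, seed: int) -> Set[int]:
    """Return the candidates who end up protected.

    Each candidate accepts vaccination with probability `compliance`; a
    vaccinated candidate is protected with probability `efficacy`.
    """
    rng = random.Random(seed)
    return {node for node in candidates
            if rng.random() < compliance and rng.random() < efficacy}


def is_active(day: int, trigger_day: int, duration_days: Optional[int]) -> bool:
    """True while an intervention applies; None means until the simulation ends."""
    if day < trigger_day:
        return False
    return duration_days is None or day < trigger_day + duration_days


risk = scaled_risk({"home": 0.1, "work": 0.05, "school": 0.08},
                   {"work": 0.5, "school": 0.0})
```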
At block 316, server 126 can simulate the course of the epidemic in the SP graph 308 to produce an epidemic estimate 140, based at least in part on the intervention indicated by the data 314. This can be performed, e.g., by the experiment component or the analysis component shown in
In some examples, the epidemic estimate 140 can include, for each tested condition, at least one of the following. Estimate 140 can include a curve indicating a number of the nodes marked as infected over the course of the simulation. The number can be per time interval, as in a conventional epicurve, or cumulative. The number can represent total infections or new infections per time interval. Estimate 140 can include an R curve indicating an estimated or actual reproductive number of the epidemic over the course of the simulation. Estimate 140 can include a curve indicating a slope of any of the above-described curves as a function of simulation time. Estimate 140 can include an estimated generation time of the epidemic. Estimate 140 can include an estimated growth rate of the epidemic.
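Some of these quantities can be computed from the simulated day on which each node became infected. The sketch below, with hypothetical function names, builds a per-day epicurve, its cumulative form, a first-difference slope, and an exponential growth rate fitted by least squares to the logarithm of daily new infections; the log-linear fit is one common estimator, chosen here for illustration.

```python
import math
from itertools import accumulate
from typing import List, Sequence


def epicurve(infection_days: Sequence[int], duration: int) -> List[int]:
    """New infections per day, as in a conventional epicurve."""
    counts = [0] * duration
    for day in infection_days:
        if 0 <= day < duration:
            counts[day] += 1
    return counts


def cumulative(curve: Sequence[int]) -> List[int]:
    """Running total of infections."""
    return list(accumulate(curve))


def slope(curve: Sequence[float]) -> List[float]:
    """Day-to-day change of any curve (a discrete first difference)."""
    return [b - a for a, b in zip(curve, curve[1:])]


def growth_rate(curve: Sequence[int], start: int, end: int) -> float:
    """Exponential growth rate r per day, fitted to days [start, end).

    Ordinary least squares on log(new infections); days with zero new
    infections are skipped because log(0) is undefined.
    """
    points = [(t, math.log(curve[t])) for t in range(start, end) if curve[t] > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with infections")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den


# Infections doubling each day: 1, 2, 4, 8 new cases on days 0 to 3.
days = [0] + [1] * 2 + [2] * 4 + [3] * 8
curve = epicurve(days, 5)
```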
In some examples, block 316 can include presenting, via a user interface (e.g., running on front end 114), a display of the experiment status while the experiment is running. An experiment status indicator can be updated on the user interface as the simulation progresses to keep the user informed as to the progress of the experiment. In some implementations, the status may include a progress bar that informs the user of a current state of the experiment, how much of the experiment is completed, how much of the experiment remains, time elapsed since the beginning of the experiment, estimated time to completion of the experiment, etc.
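The progress figures mentioned above reduce to simple arithmetic. As an illustrative sketch, assuming progress is counted in discrete units such as simulated days or completed replicates and that the rate observed so far continues, the estimated time to completion is the elapsed time multiplied by the ratio of remaining to completed units. The Progress record below is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Progress:
    """Snapshot of a running experiment (hypothetical status record)."""
    completed: int     # units finished, e.g. simulated days or replicates
    total: int         # units in the whole experiment
    elapsed_s: float   # wall-clock seconds since the experiment started

    @property
    def fraction(self) -> float:
        return self.completed / self.total if self.total else 1.0

    @property
    def remaining(self) -> int:
        return self.total - self.completed

    @property
    def eta_s(self) -> Optional[float]:
        """Estimated seconds to completion, assuming the rate so far continues."""
        if self.completed == 0:
            return None
        return self.elapsed_s * self.remaining / self.completed

    def bar(self, width: int = 20) -> str:
        """Text progress bar, e.g. "[##------]"."""
        filled = int(self.fraction * width)
        return "[" + "#" * filled + "-" * (width - filled) + "]"


p = Progress(completed=25, total=100, elapsed_s=50.0)
print(p.bar(8), f"{p.fraction:.0%}", f"ETA {p.eta_s:.0f} s")  # [##------] 25% ETA 150 s
```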
In some examples, replicates of the experiment are run, e.g., as discussed herein with reference to blocks 428 and 434. In some examples, different interventions are tested, e.g., as discussed herein with reference to blocks 436 and 450. In some examples, block 316 can be followed by block 428 or block 436.
At block 428, server 126 can simulate the course of the epidemic in the synthetic-population graph to produce at least one second epidemic estimate 430. This can be performed, e.g., by the experiment component shown in
An experiment can involve one or more simulations. Each simulation can be executed, e.g., by process(es) or job(s) running on a high-performance computing (HPC) cluster, under control of a configuration file used to manage the execution of a series of jobs on a distributed HPC cluster. Job configurations can be stored, e.g., by the job manager 118. Job manager 118 can permit back end 116 to interface to the databases that hold the synthetic population data and the results of studies on those data, e.g., data library 122.
In various examples, series of random or pseudorandom numbers within specified limits are generated, with a distribution that is indistinguishable from random on the margins. These numbers can be used as randomization values 432 to influence experiment replicates. For example, the simulations that produce epidemic estimate 140 and second epidemic estimate 430 can be respective replicates. In some examples of pseudorandom number generation, a seed number is input to an iterative calculation that then produces the series of pseudorandom numbers. Some such generators produce identical sequences when started with the same seed number, permitting experiments or replicates to be retried deterministically. Some examples use the Scalable Parallel Random Number Generators Library (SPRNG) from Florida State University to provide randomization values 432.
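The SPRNG interface is not reproduced here. As a stand-in, the following sketch uses Python's standard random module to derive one seeded generator per replicate from a single experiment seed, which preserves the property described above: the same seed reproduces the same replicates. The function name replicate_streams is hypothetical, and the sketch makes no claim to the stream-independence guarantees of a parallel generator library such as SPRNG.

```python
import random
from typing import List


def replicate_streams(base_seed: int, n_replicates: int) -> List[random.Random]:
    """Derive one generator per replicate from a single experiment seed.

    A master generator seeded with `base_seed` draws a 64-bit seed for each
    replicate, so re-running with the same base seed reproduces every
    replicate exactly, while different replicates ordinarily differ.
    """
    master = random.Random(base_seed)
    return [random.Random(master.getrandbits(64)) for _ in range(n_replicates)]


first = [rng.random() for rng in replicate_streams(42, 3)]
again = [rng.random() for rng in replicate_streams(42, 3)]
print(first == again)  # True: same seed, same replicates
```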
In some examples, each experiment can involve simulating interactions among perhaps millions of nodes mediated through numerous complex networks which might themselves be time-varying. For some events, the course of these interactions over time can vary greatly with initial conditions. Running replicates as in block 428 can permit analyzing the range of variability in events and the dependence on initial conditions.
At block 434, server 126 can cause presentation of a representation via a user interface (e.g., running on front end 114). This can be performed, e.g., by the analysis component shown in
At block 436, server 126 can receive data 438 of a second intervention, e.g., different from the intervention represented by data 314. In some examples, data 438 can indicate at least one second intervention that is not indicated in data 314 (which may itself indicate at least one intervention).
At block 450, server 126 can simulate the course of the epidemic in the SP graph 308 to produce a second epidemic estimate 452 based at least in part on the second intervention(s) indicated by data 438. Block 450 can additionally or alternatively include running the simulation further based at least in part on at least one intervention indicated in data 314. That is, block 450 can simulate alternative intervention(s) not previously simulated, or can simulate additional intervention(s) used together with some intervention(s) previously simulated. In some examples, as indicated by the dashed arrow, block 450 can be followed by block 434. This can permit a representation to be provided via the user interface based on the epidemic estimate 140 and the second epidemic estimate 452. Additionally or alternatively, block 434 can include causing a representation to be presented, that representation based on at least one of, or all of, epidemic estimate 140, second epidemic estimate 430 (from replicates), and second epidemic estimate 452 (from alternative interventions). Examples are discussed herein, e.g., with reference to
As represented by the dash-dot arrow, block 428 can be used in conjunction with block 450. This can permit any number of replicates of any number of interventions to be simulated. Block 434 can then include causing representations to be presented of any of the results. Accordingly, in some examples, the system can provide representations or other outputs indicating how alterations in an intervention, e.g., changes in the rate of administration of a medicine or vaccine, affect spread of a disease in a simulation.
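A sweep over interventions and replicates can be sketched as a nested loop whose outcomes are then summarized per intervention. In this hypothetical Python sketch, toy_simulate stands in for a full simulation and its attack probabilities are made up for illustration. Reusing the same replicate seeds for every intervention (the common-random-numbers technique) is a design choice of the sketch: it makes differences between interventions less likely to be artifacts of different random draws.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# A simulator takes an intervention name and a generator, returns one outcome.
Simulate = Callable[[str, random.Random], float]


def sweep(simulate: Simulate, interventions: List[str],
          n_replicates: int, base_seed: int) -> Dict[str, List[float]]:
    """Run every intervention for the same set of replicate seeds.

    Replicate r uses seed base_seed + r for every intervention (common
    random numbers), so the replicate-to-replicate noise is shared.
    """
    return {
        name: [simulate(name, random.Random(base_seed + r))
               for r in range(n_replicates)]
        for name in interventions
    }


def summarize(results: Dict[str, List[float]]) -> Dict[str, Tuple[float, float, float]]:
    """(min, median, max) of the replicate outcomes for each intervention."""
    return {name: (min(v), statistics.median(v), max(v))
            for name, v in results.items()}


def toy_simulate(intervention: str, rng: random.Random) -> float:
    """Stand-in simulator: number infected among 1,000 entities."""
    attack_prob = {"none": 0.30, "close_schools": 0.20}[intervention]
    return sum(rng.random() < attack_prob for _ in range(1000))


results = sweep(toy_simulate, ["none", "close_schools"], n_replicates=5, base_seed=0)
print(summarize(results))
```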
Operations of processes 502 are described with reference to server 126. Additionally or alternatively, operations of blocks 512, 524, or 532 can be performed by client 124. For example, the front end 114 can be configured to perform at least some client-side filtering or processing of epidemic estimate 140. Given a sufficiently capable client 124, this can reduce the network bandwidth required to perform such filtering or processing.
At block 504, server 126 can determine a first subset 506 of nodes of the SP graph 308. The first subset of nodes can represent an initial infected population. Examples of initial conditions specifying an initial infected population are discussed herein, e.g., with reference to block 304. Block 504 can be followed by block 316 of simulating the course of the epidemic. Block 316 can include blocks 516 and 522, and can also include block 512. In some examples, blocks 516 and 522 are used without block 504. In some examples, block 504 can be followed by block 512.
At block 512, server 126 can receive an indication of a predetermined disease model 514 via a user interface. For example, client 124 can receive the indication via user interface 240, and provide the indication to server 126. In some of these examples, server 126 can also receive the data 314 of the intervention via the user interface. For example, a user 242 can specify the data 314 and the disease model 514 via a Web interface of tool 128, e.g., as discussed herein with reference to
At block 516, server 126 can modify edges of the synthetic-population graph based at least in part on the intervention indicated by the data 314 to produce a modified synthetic-population graph 518. For example, if the data 314 indicate that a particular workplace should be closed, edges to that workplace labeled “work,” or edges between entities labeled with that particular workplace, can be removed from the SP graph 308 to produce the modified SP graph 518. In the modified SP graph 518, the particular workplace, being closed, is not a factor in transmitting infections between entities.
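Edge removal of this kind can be sketched as a filter over labeled edges. The sketch assumes, hypothetically, that each edge carries a label naming its context and location, such as "work:w1"; that label format and the function name apply_closures are illustrative only. The original edge list is left unchanged so the unmodified SP graph 308 remains available for other interventions or replicates.

```python
from typing import Iterable, List, Set, Tuple

# (entity, entity, context label); the "work:w1" label style is hypothetical.
Edge = Tuple[int, int, str]


def apply_closures(edges: Iterable[Edge], closed_locations: Set[str],
                   closed_prefixes: Tuple[str, ...] = ()) -> List[Edge]:
    """Return a new edge list without contacts at closed locations.

    `closed_locations` removes specific places (e.g. one workplace);
    `closed_prefixes` removes whole categories (e.g. every "school:" label).
    The input edges are not modified.
    """
    return [(u, v, label) for u, v, label in edges
            if label not in closed_locations
            and not label.startswith(closed_prefixes)]


edges = [(1, 2, "home:h1"), (1, 3, "work:w1"), (3, 4, "work:w2"), (2, 5, "school:s1")]
print(apply_closures(edges, {"work:w1"}))
# [(1, 2, 'home:h1'), (3, 4, 'work:w2'), (2, 5, 'school:s1')]
```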
At block 522, server 126 can determine spread of the epidemic in the modified synthetic-population graph 518 based at least in part on a predetermined disease model 514. For example, server 126 can propagate the infection from nodes marked as infectious along edges connected to those nodes. Nodes can have states according to epidemiological models such as the SIR model, in which each node is Susceptible, Infected, or Recovered, or the SEIR model, in which each node is Susceptible, Exposed but not infectious, Infectious, or Recovered. Block 522 can additionally or alternatively include removing nodes from the graph, e.g., in the event of simulated travel by a synthetic entity out of the area of the simulation, or of death and burial of the synthetic entity corresponding to a node. Examples of the spread of a disease through a network are shown in
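A discrete-time SEIR propagation step of the kind described above can be sketched as follows. The per-label transmission probabilities, the fixed latent and infectious periods, and the function name simulate_seir are assumptions made for illustration; disease models commonly draw these periods from distributions instead, and node removal (travel or death) is omitted here. The SIR variant omits the Exposed state.

```python
import random
from typing import Dict, Iterable, List, Tuple

# node -> list of (neighbor, context label); assumed symmetric.
Adjacency = Dict[int, List[Tuple[int, str]]]


def simulate_seir(adj: Adjacency, risk: Dict[str, float],
                  initial_infectious: Iterable[int], days: int,
                  latent_days: int, infectious_days: int,
                  seed: int) -> Tuple[Dict[int, str], Dict[int, int]]:
    """Discrete-time SEIR spread over a labeled contact graph.

    Each day, every Infectious node infects each Susceptible neighbor with the
    per-label probability in `risk`. Exposed nodes become Infectious after
    `latent_days`; Infectious nodes recover after `infectious_days`.
    Returns (final state per node, day each node was infected).
    """
    if latent_days < 1 or infectious_days < 1:
        raise ValueError("periods must be at least one day")
    rng = random.Random(seed)
    state = {n: "S" for n in adj}
    timer: Dict[int, int] = {}          # days left in the current E or I state
    infected_on: Dict[int, int] = {}
    for n in initial_infectious:
        state[n], timer[n], infected_on[n] = "I", infectious_days, 0

    for day in range(days):
        # 1. Transmission from nodes that are infectious today.
        newly_exposed = set()
        for node in sorted(n for n, s in state.items() if s == "I"):
            for nbr, label in adj.get(node, []):
                if (state.get(nbr) == "S" and nbr not in newly_exposed
                        and rng.random() < risk.get(label, 0.0)):
                    newly_exposed.add(nbr)
        # 2. Advance nodes already in E or I.
        for node in sorted(timer):
            timer[node] -= 1
            if timer[node] == 0:
                if state[node] == "E":
                    state[node], timer[node] = "I", infectious_days
                else:
                    state[node] = "R"
                    del timer[node]
        # 3. Newly exposed nodes begin their latent period.
        for node in newly_exposed:
            state[node], timer[node], infected_on[node] = "E", latent_days, day
    return state, infected_on


chain = {0: [(1, "home")], 1: [(0, "home"), (2, "home")], 2: [(1, "home")]}
final, when = simulate_seir(chain, {"home": 1.0}, [0], days=10,
                            latent_days=1, infectious_days=2, seed=0)
```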
At block 524, server 126 can receive second attributes 526 of (e.g., designating or defining) a second synthetic population. For example, the second synthetic population indicated by the second attributes 526 can be a subset of the synthetic population represented by SP graph 308.
At block 532, server 126 can determine a second epidemic estimate 534 based at least in part on the second attributes 526 and the SP graph 308. For example, the second epidemic estimate 534 can report the progress of the epidemic among synthetic entities connected with a specific workplace or school, among synthetic entities in a certain age range, or among synthetic entities that responded to the intervention in a particular way (e.g., took medicine vs. did not take medicine). In some examples, block 532 can permit users 242 to quickly visualize experiment outputs without having to create an analysis for every experiment and validate the simulation results.
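A sub-population estimate can be derived from the same per-node infection days by filtering on node attributes. In this hypothetical sketch, the function name subpopulation_curve and the attribute names age and took_medicine are illustrative only.

```python
from typing import Callable, Dict, List


def subpopulation_curve(infected_on: Dict[int, int],
                        node_attrs: Dict[int, dict],
                        condition: Callable[[dict], bool],
                        duration: int) -> List[int]:
    """New infections per day, counted only for nodes matching `condition`."""
    counts = [0] * duration
    for node, day in infected_on.items():
        if 0 <= day < duration and condition(node_attrs[node]):
            counts[day] += 1
    return counts


infected_on = {1: 0, 2: 1, 3: 1, 4: 3}
node_attrs = {
    1: {"age": 70, "took_medicine": True},
    2: {"age": 30, "took_medicine": False},
    3: {"age": 80, "took_medicine": False},
    4: {"age": 10, "took_medicine": True},
}
print(subpopulation_curve(infected_on, node_attrs, lambda a: a["age"] >= 65, 4))
# [1, 1, 0, 0]
```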
At block 636, server 126 can receive the attributes 136 of a synthetic population. Examples are discussed herein, e.g., with reference to block 304.
At block 638, server 126 can select a synthetic-population (SP) graph 640 from the data library 122 based at least in part on the attributes 136. Examples are discussed herein, e.g., with reference to block 306. In some examples, the SP graph 640 comprises nodes, edges between at least some of the nodes, and labels associated with at least some of the edges. In some examples, the SP graph 640 includes parameters associated with at least some of the nodes. Examples are discussed herein, e.g., with reference to data library 122 or block 306. Parameters and edges can be associated with the same subsets of the nodes of the SP graph 640, although this is not required. In some examples, both node parameters and edge labels are used; in other examples, either node parameters or edge labels (but not both) are used.
At block 642, server 126 can receive data 644 of an intervention designed to affect the course of an event. As discussed above, an event can continue over a period of time, or can be a point-in-time event that results in or leads to consequences that extend over time. Examples are discussed herein, e.g., with reference to block 312.
As used herein, “consequences” are any results or outcomes of the event, or changes in event state or state of systems affected by the event. The “course of an event” refers to the progress of the event itself or its consequences, as determined via simulation as described herein. The term “consequence” is used herein without regard to whether any particular consequence may be considered by any party to be beneficial or harmful. Consequences themselves can be ongoing or point-in-time. For example, the spread of a disease can be a consequence of an epidemic, since it involves changes in the state of the epidemic (the event) itself. In another example, closure of schools and offices can be a consequence of an electric blackout (an event), since it involves changes in the state of systems (the schools or offices) affected by the event.
At block 646, server 126 can simulate the course of the event in the SP graph 640 to produce an estimate 648 of the event, based at least in part on the intervention indicated by the data 644. Examples are discussed herein, e.g., with reference to block 316. The estimate 648 can include, e.g., estimate(s) of the nature or range of one or more consequence(s) of the event.
At block 702, server 126 can selectively propagate information about consequences of the event along edges of a first subset of the edges based at least in part on at least some corresponding labels of the labels of the SP graph 640. For example, in an epidemic, server 126 can propagate information about infection along edges labeled with the homes or workplaces of infected entities. In a simulation of a power outage, server 126 can propagate information about power losses along edges labeled with particular distribution lines or particular generation plants.
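Selective propagation along labeled edges can be sketched as a breadth-first traversal that follows only edges whose label is in an allowed set. This is an illustrative sketch: the edge representation and the optional `max_hops` limit (one hop for a daily contagion step, unlimited for, e.g., a power outage cascading along distribution lines) are assumptions.

```python
from collections import deque

def propagate(edges, sources, allowed_labels, max_hops=None):
    """Return the nodes reached from sources via edges with an allowed label.

    edges: dict node -> list of (neighbor, label)
    max_hops: None for unlimited propagation, or a maximum number of edge hops
    """
    reached = set(sources)
    frontier = deque((s, 0) for s in sources)
    while frontier:
        node, hops = frontier.popleft()
        if max_hops is not None and hops >= max_hops:
            continue
        for nbr, label in edges.get(node, []):
            # Only edges carrying one of the selected labels propagate consequences.
            if label in allowed_labels and nbr not in reached:
                reached.add(nbr)
                frontier.append((nbr, hops + 1))
    return reached
```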
At block 704, server 126 can determine data 706 of consequences of the event. For example, the data 706 can indicate which nodes are infected, e.g., as in
At block 708, server 126 can selectively modify at least some of the parameters of at least some of the nodes based at least in part on the data 706 of the consequences of the event. For example, server 126 can update the parameters of the nodes to reflect the results of the simulation, as in the left-to-right progress in
At block 710, server 126 can change edge labels or node parameters in response to triggers, e.g., as discussed herein with reference to
At block 714, server 126 can selectively modify at least some of the labels of the SP graph 640 based at least in part on the intervention represented by data 644. For example, if simulation determines that a particular synthetic entity will work from home in response to the spread of disease, server 126 can alter label(s) on edge(s) from the node representing that entity to node(s) representing that entity's co-workers to reflect a reduced probability of transmission of the disease.
At block 802, server 126 can receive a query 804. Server 126 can receive the query 804, e.g., from a front end 114 or user interface 240. The query can include, e.g., attributes 136 of a synthetic population, desired outputs or result plots, or analyses or transformations to be performed on simulation results.
At block 806, server 126 can determine at least one first simulation 808 based at least in part on the query. The term “first simulation” is for clarity of identification and does not require a specific order of execution of multiple simulations. The first simulation can be of any of the types described herein, e.g., with reference to
At block 810, which can be included in block 646, server 126 can modify a first subset of nodes of the SP graph 640 at a first simulated time based at least in part on the intervention indicated by data 644 and on attributes of nodes of the first subset of nodes.
At block 812, server 126 can modify a second, different subset of nodes of the SP graph 640 at a second, different simulated time based at least in part on the intervention indicated by data 644 and on attributes of nodes of the second subset of nodes. Using blocks 810 and 812 can permit simulating differences between synthetic entities. For example, in an epidemic, different entities may have different thresholds for when they will seek care. Accordingly, the nodes representing those entities can change state at different times as the event progresses.
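Modifying different node subsets at different simulated times can be sketched by grouping nodes according to the day on which their own threshold is reached. Here each node's onset day and a hypothetical per-node delay (e.g., days of symptoms tolerated before seeking care) are assumed inputs.

```python
from collections import defaultdict

def schedule_state_changes(onset_day, delay_days):
    """Group nodes by the simulated day on which they change state.

    onset_day:  dict node -> day the event first affects the node (e.g., symptom onset)
    delay_days: dict node -> node-specific threshold in days (hypothetical attribute)
    """
    schedule = defaultdict(list)
    for node, onset in onset_day.items():
        schedule[onset + delay_days[node]].append(node)
    return dict(schedule)
```

A simulation loop could then modify `schedule.get(day, [])` on each simulated day, so that the first and second subsets change state at different times.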
Operations of processes 900 are described with reference to server 126. Additionally or alternatively, operations of blocks 902, 904, 908, or 912 can be performed by client 124. For example, the front end 114 can be configured to perform at least some client-side filtering or processing of estimate 648. Given a sufficiently capable client 124, this can reduce the network bandwidth required to perform such filtering or processing.
At block 902, server 126 can cause the estimate 648 of the event to be presented via a user interface. Examples are discussed herein, e.g., with reference to block 434.
At block 904, server 126 can receive second attributes 906 of a second synthetic population. Examples are discussed herein, e.g., with reference to block 524.
At block 908, server 126 can determine a second estimate 910 of the event based at least in part on the second attributes and on at least one of the estimate of the event or the synthetic population. This can permit, e.g., dynamic filtering of result curves or other components of estimate 648 to a particular subpopulation. Examples are discussed herein, e.g., with reference to block 532.
At block 912, server 126 can cause the second estimate 910 of the event to be presented via a user interface. Examples are discussed herein, e.g., with reference to block 434.
At block 1002, server 126 can receive input data 1004 associated with a target population. For example, the input data 1004 can include data of persons, locations, activity sequences, or social contacts, as shown in
At block 1006, server 126 can construct a synthetic data set 1008 based on the input data 1004. The synthetic data set 1008 can include data of a plurality of synthetic entities corresponding with the target population. For example, the synthetic data set 1008 can be, or be included in, a data library 122 or other structure representing synthetic entities, e.g., as nodes in a graph. Examples are discussed herein, e.g., with reference to
At block 1010, server 126 can assign entity attributes 1012 to individual entities of the plurality of synthetic entities in the synthetic data set 1008 based at least in part on the input data 1004. For example, the entity attributes 1012 can include occupation, age, or demographics for synthetic people, or genus or genotype for synthetic insects. Examples are discussed herein, e.g., with reference to population construction module 310,
At block 1014, server 126 can receive activity data 1016 associated with the target population. For example, the activity data 1016 can include data indicating what activities members of the target population undertake, and during which hours of the day. Examples are discussed herein, e.g., with reference to step 226,
At block 1018, server 126 can generate a social-contact graph 1020 by generating graph edges between individual entities of the plurality of synthetic entities based at least in part on corresponding ones of the entity attributes 1012 and on the activity data 1016. For example, server 126 can generate edges between nodes tagged with entity attributes 1012 indicating the corresponding entities have a common workplace, or are in a common location at the same times of day or at different times of day. Examples are discussed herein, e.g., with reference to network construction module 315,
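Edge generation from co-location can be sketched as follows. The sketch assumes activity data flattened into hypothetical `(entity, location, hour)` visit records and connects entities present at the same location in the same hour, labeling each edge with the shared location(s). It covers only the same-time case; contacts at a common location at different times of day would need an additional rule.

```python
from collections import defaultdict
from itertools import combinations

def build_contact_edges(visits):
    """Create undirected edges between entities at the same location in the same hour.

    visits: list of (entity, location, hour) tuples
    Returns a dict mapping frozenset({a, b}) -> set of shared locations (edge labels).
    """
    occupants = defaultdict(set)
    for entity, location, hour in visits:
        occupants[(location, hour)].add(entity)
    edges = defaultdict(set)
    for (location, _hour), people in occupants.items():
        for a, b in combinations(sorted(people), 2):
            edges[frozenset((a, b))].add(location)
    return dict(edges)
```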
At block 1022, server 126 can receive population attributes 1024 of a synthetic population. Examples are discussed herein, e.g., with reference to block 636.
At block 1026, server 126 can select a synthetic-population graph 1028 from the social-contact graph 1020 based at least in part on the population attributes 1024. For example, server 126 can select a subset of social-contact graph 1020 matching the population attributes 1024. Examples are discussed herein, e.g., with reference to block 638. In some examples, the synthetic-population graph 1028 comprises nodes, parameters associated with at least some of the nodes, and edges between at least some of the nodes, e.g., as discussed herein with reference to SP graph 640.
At block 1030, server 126 can receive data 1032 of an intervention designed to affect the course of an event. Examples are discussed herein, e.g., with reference to block 642.
At block 1034, server 126 can simulate the course of the event in the SP graph 1028 to produce an estimate 1036 of the event. The simulation can be based at least in part on the intervention indicated by the data 1032. Examples are discussed herein, e.g., with reference to block 646.
At block 1102, server 126 can determine data 1104 of consequences of the event. Examples are discussed herein, e.g., with reference to block 704 and data 706.
At block 1106, server 126 can selectively modify at least some of the parameters based at least in part on the data 1104 of the consequences of the event. Examples are discussed herein, e.g., with reference to block 708. This can permit modeling changes in entities' behavior patterns over time, e.g., in response to the event.
At block 1108, in association with (e.g., as part of or in cooperation with) at least one of the constructing (block 1006), the assigning (block 1010), the generating (block 1018), or the simulating (block 1034), server 126 can generate a request 1110 for a service. For brevity of explanation, block 1108 and block 1112 are described herein with reference to simulating (block 1034), but this is not limiting. Examples of generating requests for services are discussed herein, e.g., with reference to management module 305 and service brokers 350,
At block 1112, server 126 can fulfill, by a broker software module, the request 1110 for the service. Examples are discussed herein, e.g., with reference to management module 305, edge brokers 345, and service brokers 350,
At block 1114, server 126 can present the estimate 1036 of the event via a user interface. Examples are discussed herein, e.g., with reference to block 902.
At block 1116, server 126 can receive second population attributes 1118 of a second synthetic population. For example, the second population attributes 1118 can indicate entities composing a subset of the simulated SP graph 1028. Examples are discussed herein, e.g., with reference to block 904 and attributes 906.
At block 1120, server 126 can determine a second estimate 1122 of the event based at least in part on the second population attributes and on at least one of the estimate of the event or the synthetic population. Examples are discussed herein, e.g., with reference to block 908 and estimate 910. Block 1120 can include, or be followed by a separate block involving, causing the second estimate 1122 to be presented via a user interface, e.g., user interface 240 or UIs shown in
Experiments as described herein can be used in determining the course of events, e.g., to assist personnel working in public health epidemiology. Various examples can be used for a large class of infectious diseases, including various kinds of infectious diseases that spread via person-to-person contact, and for any population segment in the world. Various examples provide a rich class of realistic interventions that can be employed by public health analysts to carry out computational experiments. These experiments can be carried out for, e.g.: (i) situation assessment, (ii) forecasting, (iii) decision support, or (iv) determining efficacy of one or more interventions.
Various examples include a Web-based front end 114 developed for experiment design and analysis for epidemiological disease studies based on realistic social-network simulations. The front end can communicate with a back end 116. Various examples provide card previews and slider graphs. Various examples are accessible over the Internet using the web address (URL) of the server on which they are deployed.
Data and results accessible using the tool can be from previously conducted studies and analyses, or can be generated on-line as required, using high performance computing (HPC) capabilities. Datasets generated by the tool can be retained and cataloged automatically.
Various examples support running numerous simulations of experiments that generate distributions of outcomes, to gain an appreciation of the time-varying state (the dynamics, or “course”) of an event, e.g., an epidemiological event. Various examples support exploration of the variability of outcomes in stochastic processes. The outcomes of experiments can be used in analysis reports, e.g., showing the distribution of outcomes across numerous replicates of an experiment, generally viewable as plotted graphs.
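A distribution across replicates can be summarized per simulated day, for example as a median with a 5th to 95th percentile band suitable for plotting. This is a minimal sketch assuming each replicate yields a same-length daily curve and there are at least two replicates; the choice of percentiles is illustrative.

```python
import statistics

def summarize_replicates(curves):
    """Per-day (5th percentile, median, 95th percentile) across replicate curves."""
    summary = []
    for day_values in zip(*curves):
        # n=20 yields 19 cut points: the first is the 5th percentile, the last the 95th.
        q = statistics.quantiles(day_values, n=20, method="inclusive")
        summary.append((q[0], statistics.median(day_values), q[-1]))
    return summary
```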
Various aspects can facilitate both the planning and the course-of-action analysis activities of analysts. Various aspects can be used in the training of military, medical, NGO, or rescue-operation teams, or in training and coordinating activities with civilian authorities, medical personnel and infrastructure, and other required teams. Various examples can be used by public health system officials, government authorities involved in policy decision making, scientists and researchers, clinicians and epidemiologists, surveillance department officials, or students. Various examples can be used for emergency crisis planning.
Throughout the discussion of
To run an experiment: After the experiment has been created and edited, the user can cause the system to execute the experiment by selecting the experiment from the Experiment List Grid (left side) and clicking START. The status of the experiment, e.g., New, Starting, Queued, Running, Completed, or Failed, is represented below the Experiment Name. Also shown in
Also shown are My/All/Archived Filters, e.g., above the Lists for Experiments, Analyses, Initial Conditions, Disease Models, Triggers and Regimens. These toggle the display in a list grid (left side) between the MY List, which displays the objects created or owned by the currently logged-in user; the ALL List, which displays all objects available to the user, i.e., objects of all users, both active and archived; and the Archive List, which displays the archived objects owned by the currently logged-in user. A filter specified by the user remains in effect for the session or until changed by the user. Hence, each time the user returns to any of the menus, the information listed will be displayed per the most recent filter specified during this session.
The right-hand side of
To Run an Experiment, the user can select it from the list on the left and click the START link. The displayed status (included in the list) will update as the experiment progresses, e.g., as discussed herein with reference to
A “View Cells” option can be available for an experiment. Clicking View Cells causes the system to read the parameters selected for independent variables in the experiment, e.g., the combination of intervention specifications and the parameter(s) to be swept, and to generate the required cells. These cells comprise the specification of the conditions for the experiment to be run, considering combinations of all the sweep and non-sweep values. Examples are discussed herein, e.g., with reference to block 806.
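Generating cells from sweep and non-sweep values amounts to taking every combination of the parameter values. The sketch below assumes each parameter is given as a list (a one-element list for a non-sweep value); `generate_cells` and the parameter names are hypothetical.

```python
from itertools import product

def generate_cells(parameters):
    """Expand parameter values into experiment cells.

    parameters: dict name -> list of values; a non-sweep parameter is a
    one-element list, a sweep parameter lists all swept values.
    Returns one dict per cell, covering every combination.
    """
    names = list(parameters)
    return [dict(zip(names, combo)) for combo in product(*(parameters[n] for n in names))]
```

With no swept parameters (for instance, an experiment without interventions), the product has exactly one element, giving a single cell.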
The illustrated “not runnable” display in the upper right can represent a progress bar indicating the experiment definition state and run state. For each card defined (Details/Region/Initial Condition/Disease Model/Intervention), the % complete shown on the progress bar can increase by a corresponding amount, e.g., 20% per card for a five-card experimental setup. The experiment state text in the progress bar can change from Incomplete to Runnable only when all required parameters (e.g., Details, Region, Initial Condition, and Disease Model) are defined. The progress bar can change from red to yellow to green as the experiment becomes ready for simulation. For a completely defined experiment, all cards will be defined or selected in the detailed preview, and the progress bar in the top right corner will indicate the “Runnable” state in green. In some examples, the user can choose to create a new disease model or to use an existing one, and likewise for regions, initial conditions, and interventions.
Name: A unique name to identify the Experiment. A system-generated name is prepopulated. The user can retain the default name or provide a new Name.
Description: An optional text field to describe or provide additional information for the experiment.
Status: Provides the state of the experimental run. For a new experiment it will be pre-populated as “New.”
Owner: the name of the user who created the experiment. It is a pre-populated field; for a new experiment it will be pre-populated with the logged-in user's username.
Model: A selection, e.g., a drop-down list, of simulation engine types. To facilitate simple experimental designs, Epifast can be used; to facilitate complex experimental designs, EpiSimdemics can be used.
Replicates: The number of times the experiment will be run. Default value can be 25. Each run will use a different random number seed (randomization value 432), e.g., defined by the Initial Conditions Daily Seed. In some examples, each replicated experimental run is identical to all others for the parameters of the experiment—Initial Conditions, Disease Models, etc.—but varies in terms of the random number seed used to establish the initial state of the simulation based on the parameters for the Initial Conditions.
Total Cells: Total number of cells in this particular Experiment. It depends on the Intervention and Trigger values. In some examples, for Experiments without Interventions, the total cell count is always 1. The system may impose a maximum cell count.
Simulated Days: Duration of the simulation period. Default value can be 200.
The “REGION” card can show (or permit selection of) the geographical region in which the simulation will take place, i.e., the region which is affected by the simulated event. The “REGION” card can include at least one of the following.
Search region: Allows searching for a specific region, e.g., by entering the first few characters or the entire name of the region.
List: a list of all the available regions. Regions can include, e.g., cities, counties, U.S. states or other subnational entities, CDC or other statistical regions, countries, blocs (e.g., the EU or ECOWAS areas), continents, regions defined by polygons in coordinate space, or any combination of any of those.
Map: an enlarged view of the selected region, depicted in or as a map. The region can be zoomed in or out as desired. The user can select, on the map, regions to be included in the simulation area, e.g., nearby geographical areas to which an epidemic may spread.
Disease Models represent how an event affects a synthetic entity, e.g., how pathogens affect a synthetic person. The “DISEASE MODEL” card can show (or permit user entry of) details, e.g., at least one of the following.
Name: a name of the model
Transmissibility, symptomatic proportion, or other statistics or parameters that apply to the event as a whole without regard to the time at which consequences of the event begin to apply to a particular synthetic entity.
Incubation period, infectious period, and other values that depend on the onset of the event with respect to a particular synthetic entity. In
Initial conditions are a way to define the onset conditions of an epidemic or other event. The “INITIAL CONDITIONS” card can show (or permit user entry of) details, e.g., as discussed herein with reference to
Interventions permit studying the effects of different strategies, such as treatments applied to the population or distancing measures, on controlling pathogens or disease spread in the population. Simulations can be performed on realistic socio-technical networks of synthetic populations, in which each synthetic individual is represented by a node and edges (e.g., directed or undirected) represent activity between nodes. Examples of such networks are described herein with reference to, e.g.,
Various examples permit analyzing complex, high-variability scenarios on entity-interaction networks to gain evidence for proposed hypotheses or to effectively plan and prepare for events.
The “ENABLED INTERVENTIONS” card can show (or permit user entry of) details, e.g., of choices or parameters of at least one of the following types of interventions. Any number of interventions of any types (in any combination) can be used, in some examples. Types can include: Vaccinate; Social Distance; Close Work; Close School; Pharmaceutical Treatment; Pharmaceutical Prophylaxis; Table-defined Intervention; or Dynamic Sequestration. Examples are discussed herein, e.g., with reference to block 312 or data 314.
The “ENABLED INTERVENTIONS” card can show (or permit user entry of) details, e.g., of at least one of the following:
Name of the intervention: a user-defined name for convenience in referring to particular intervention data 314,
Sub-population: data indicating the interaction of demographics with the dynamics of disease propagation and the impact of disease on socio-technical systems. Subpopulations can include a type and a category. Subpopulations can be selected from the population, e.g., by age, county of residence, or occupational category (e.g., public-safety-critical worker or not). For example, for the Age type, subpopulation selection can be based on age category, and the categories can include preschool, school-age, adult, senior, etc.
Trigger: After the onset of an epidemic event, interventions may be triggered by conditions that emerge during the event. For example, the simulation can determine that intervening by Closing Work should be performed in response to, not before, the onset of the event. The set of conditions to initiate the onset of an intervention is referred to as a Trigger. Triggers can be specified for each individual intervention.
Compliance: Compliance refers to the probability that an individual might be selected for inoculation or other intervention. For example, a compliance rate of 90% means that 10% of the individuals will not be inoculated.
Efficacy: It refers to the probability of transmission of the disease (or propagation of other consequences of an event) after having been inoculated. The efficacy can be specified as the percentage of the population on which the inoculation is 100% effective, or is at least a threshold percentage effective.
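Compliance and efficacy can be combined as two independent draws per individual: one for whether the individual receives the intervention, one for whether it is fully effective for that individual. This is a sketch under that assumption; `apply_vaccination` is a hypothetical name, and the all-or-nothing efficacy reading follows the "100% effective" description above.

```python
import random

def apply_vaccination(subpopulation, compliance, efficacy, rng):
    """Return the set of nodes protected by a vaccinate intervention.

    compliance: probability (0-1) that an individual receives the intervention
    efficacy:   probability (0-1) that the intervention is fully effective
                for a vaccinated individual
    """
    vaccinated = [n for n in subpopulation if rng.random() < compliance]
    return {n for n in vaccinated if rng.random() < efficacy}
```

With 90% compliance and 80% efficacy, about 72% of the subpopulation would be protected on average; the exact count varies between replicates with the random seed.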
For any intervention, a duration can be specified as a value or sweep. The controls (e.g., sweep, value, initial, final, and increment settings) can function as described below for vaccinations. For regimen-based interventions such as pharmaceutical prophylaxis and pharmaceutical treatment, regimen duration can be used as the duration of the intervention. Duration for vaccination can be the number of simulation days for the experiment.
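The sweep controls (initial value, final value, increment) can be expanded into the list of values that each yields a cell. This is a minimal sketch with a hypothetical function name; it assumes a positive increment and integer-valued settings, since repeated floating-point addition can drift past the final value.

```python
def sweep_values(initial, final, increment):
    """Values from initial up to final (never exceeding it) in steps of increment."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    values = []
    value = initial
    while value <= final:
        values.append(value)
        value += increment
    return values
```

For example, a sweep of 20 to 50 by 10 yields four values, and therefore four cells for that parameter.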
Intervention type: Vaccination
Vaccinate represents immunizing a selected portion of the population. It is possible to specify the percentage of the population that complies with this intervention (i.e., the percentage who are vaccinated), a trigger for when the vaccination is applied during the course of the pandemic, and the efficacy of the vaccination. To add a new Vaccinate intervention, click the Create New+ button on the Vaccinate Intervention page. A vaccinate intervention form opens with a text box for the name and separate cards to specify details of sub-population, compliance, trigger, and efficacy.
Sub-population: To support simulation and analysis, pre-defined subpopulations in the geographic region selected for the experiment are available for selection in the vaccinate intervention plan.
Sub-population Type: The population of a region is logically grouped according to, e.g., age, working group, or infection-prone group. All the available sub-population Types in the example Region will be listed and available for selection.
Sub-population Categories: The population groups are further classified as categories with specific range/conditions. Example: for Age as the Type, the Categories available are, e.g., Pre-school, School-Age, Adults, or Seniors, along with the % of sub-population. A percentage of the selected sub-population categories can be selected, e.g., using a slider. The selected sub-population percentage can be displayed below the slider.
Compliance: the user can set compliance by specifying a % Value or using a Sweep. “% Value” defines compliance as a single set-point value, e.g., X %. It indicates that X % of individuals in the experiment's selected subpopulations should comply with the intervention. Specification of this value will define a single cell for the experiment. “Sweep” defines compliance as a range of values from an initial value to a final value by an increment, e.g., 20 to 50 by 10. The Initial Value is the starting value for the sweep process to generate cells. The Final Value will not be exceeded by any cell. The increment will be added to each cell value to create the next cell.
Trigger: the condition to trigger the intervention. A condition of “On Day” defines the onset of the intervention on specific days, e.g., as a percentage of the elapsed time of the event or as sweep values. A condition of “% Infectious” defines the onset of the intervention when the percent of infectious individuals in the specified subpopulation reaches the indicated value (or values, for a sweep). A trigger delay, in days after the beginning of the experiment, can be specified as a single value or a swept value. For example, for cholera, the delay can be set to 31 simulated days, after which the trigger will be applied.
Intervention Efficacy: as noted above, e.g., the percent of the population on which the intervention is 100% effective. Can be a single or swept value.
Rate of Administration: the number of doses of intervention to be delivered each day to a fraction of the subpopulation, following the trigger event. During simulation, the subpopulation can be divided into groups based on the user-specified rate. The intervention can be applied to each group on consecutive days for the entire duration of intervention specified after the specified trigger. The intervention rate can be set to unlimited for some or all interventions. If the user selects an unlimited rate of administration, the entire subpopulation can be treated as a single group, and the intervention can be applied on a single day as specified by the trigger. For example, an intervention and trigger in an epidemic analysis can be to vaccinate school-age children at a rate of 3000 per day. If there are 30000 school-age children in the population and the trigger fires on day 10, then there can be 10 vaccinate interventions, one on each of days 10 through 19, each covering a respective 3000 of the 30000 school-age children, and the specified compliance rate can be applied each day.
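Splitting a subpopulation by rate of administration can be sketched as slicing it into consecutive daily groups starting on the trigger day. The function name is hypothetical, and `rate_per_day=None` is an assumed encoding of the "unlimited" setting. Compliance would then be applied separately within each day's group.

```python
import math

def administration_schedule(subpopulation, rate_per_day, trigger_day):
    """Split a subpopulation into daily groups, one group per consecutive day.

    rate_per_day=None models an unlimited rate: everyone on trigger_day.
    Returns a list of (day, group) pairs.
    """
    members = list(subpopulation)
    if rate_per_day is None:
        return [(trigger_day, members)]
    num_groups = math.ceil(len(members) / rate_per_day)
    return [
        (trigger_day + i, members[i * rate_per_day:(i + 1) * rate_per_day])
        for i in range(num_groups)
    ]
```

For 30000 individuals at 3000 per day with the trigger firing on day 10, this yields ten groups scheduled on days 10 through 19.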
In some examples, a Trigger is the set of conditions that, when satisfied, initiates an intervention; such conditions may emerge during the event. Triggers are reusable components for the interventions of any experiment. At least the following two types of triggers can be available. On Day: Specification of an “On Day” trigger means that an intervention is applied on the day specified, that day being the number of days after the onset of the event. “% Infectious”: Specification of a “% Infectious” trigger means the intervention will be applied as soon as the percentage of infectious individuals in the subpopulation exceeds the trigger threshold on a single day. Either the On Day value or the % Infectious value can be single-point or sweep. The % Infectious trigger for an experiment can be specified with respect to a Subpopulation, the % of that subpopulation that is infected, and a Delay. Subpopulation selections as discussed herein can be used. Multiple subpopulations can be selected. The Delay can represent the number of days (≥ 0) from onset of the disease to when the trigger conditions will begin to be checked. That is, a trigger will not fire until the number of days specified in the Delay has passed. For Delay and other time ranges herein, days are used for convenience of explanation. However, other time scales can be used, e.g., minutes, hours, weeks, or months.
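Evaluating a “% Infectious” trigger with a Delay can be sketched as scanning a daily series, starting at the delay, for the first day that exceeds the threshold. The sketch assumes the simulation records the infectious percentage of the selected subpopulation for each day; the function name is hypothetical.

```python
def first_trigger_day(daily_infectious_pct, threshold_pct, delay_days=0):
    """First simulated day on which a "% Infectious" trigger fires.

    daily_infectious_pct: list where index d is the percentage of the selected
    subpopulation that is infectious on day d.
    Days before delay_days are not checked. Returns None if the trigger never fires.
    """
    for day in range(delay_days, len(daily_infectious_pct)):
        if daily_infectious_pct[day] > threshold_pct:
            return day
    return None
```

An “On Day” trigger needs no such scan: it fires on its specified day.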
Intervention type: Social Distance
This type represents limiting non-essential activities in an individual's daily schedule to reduce the probability of disease transmission. Non-essential activities are those that occur at locations in the model other than home, work, and school. The edges in a social network graph that represent these non-essential activities are probabilistically removed based on the compliance rate. Similar to the Vaccinate Intervention, Sub-population, Compliance, and Trigger can be set for Social Distance.
Duration represents the time/duration in days for which an intervention can be applied during the experiment run. This parameter is available to the user for social distance and similar interventions such as close work, close school, dynamic sequestration, or table-defined interventions.
Intervention type: Close Work
Intervention Close Work represents the closure of work places and the elimination of work activities to reduce disease transmission. All edges in the social network graph that represent work contacts are probabilistically removed based on the compliance rate.
Intervention type: Close School
Intervention Close School represents closure of schools and the elimination of school activities to reduce disease transmission. All edges in the social network graph that represent school contacts (including college) are probabilistically removed based on the compliance rate.
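The probabilistic edge removal described for Social Distance, Close Work, and Close School can be sketched as dropping each edge of the targeted activity types with probability equal to the compliance rate. This sketch applies compliance per edge, which is an assumption; a per-individual variant would instead remove all targeted edges of each complying individual. Activity labels such as `"work"` and `"school"` are illustrative.

```python
import random

def remove_activity_edges(edges, activity_types, compliance, rng):
    """Probabilistically drop edges whose activity label is in activity_types.

    edges: list of (u, v, activity) tuples. Each matching edge is removed with
    probability `compliance`; edges of other activity types are always kept.
    """
    return [
        (u, v, activity)
        for u, v, activity in edges
        if activity not in activity_types or rng.random() >= compliance
    ]
```

Close Work could use `{"work"}`, Close School `{"school", "college"}`, and Social Distance the non-essential types such as `{"shopping", "other"}`.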
Intervention type: Pharmaceutical Treatment
Intervention Pharmaceutical Treatment represents, e.g., antiviral drugs that can diminish the infection to a level sufficient for the natural immunological responses of a body to defeat it.
Diagnostic Rate represents the proportion of the infectious individuals who get diagnosed, and thus are treated. This is for treatment purposes only. For individuals who are diagnosed, the treatment starts on the first day of infectiousness and ends once the regimen is completed. The remaining controls (Sweep, Value, Initial, Final, and Increment) can function in the same way as specified for Intervention: Vaccination. However, units are calculated in percentage of the selected subpopulations for diagnosis, in some examples.
Regimen: A prescribed course of medical treatment for the restoration of normal health of an individual. Regimen parameters allow the user to choose a number of Available Doses, or Unlimited Doses, as a constraint on the total number of doses for both treatment and prophylaxis. This constraint can be related to stock available or limitations with respect to age, sex, or genetic profile of the population. For a particular regimen, the user can specify: Name (Name of the Regimen. E.g., Tamiflu Treatment); Duration (Number of days for which medication is prescribed. If used as prophylaxis, individual is considered as protected for this duration.); Units per Day (Number of pills individual consumes per day); Infection Efficacy (Reduction in the probability of infection); or Transmission Efficacy (Reduction in the probability of transmission).
Intervention type: Pharmaceutical Prophylaxis
Pharmaceutical Prophylaxis is the application of medication that specifically fights a viral infection. Pharmaceutical Prophylaxis for sequestered subpopulations can have unintended consequences: by masking the symptoms of some infected individuals, it can allow those individuals to be introduced into small sequestered groups. The result can be a higher infection rate within the protected subpopulation.
Intervention type: Table-defined Intervention
This permits a customized intervention to be applied. A table-defined intervention can allow the risk of infection through each activity type to be scaled independently. The five activity types can include home, work, school, shopping, and other. The edges in the network graph that represent contacts due to the five activity types (e.g., edges labeled with those activity types) are scaled by the user-provided values of the scaling Factor. As with social distancing interventions, the subpopulation, compliance, trigger, duration, and rate of administration can be set for table-defined interventions. The scaling Factor can be a collection of numeric values (e.g., reals) that sets the scaling factor for each of the five activity types. In a directed graph, edges can be scaled equally or differently depending on edge direction. An Infectivity multiplier can apply to the “in” edges of the network graph due to the five activity types of the individual affected by the intervention. A Susceptibility multiplier can apply to the “out” edges of the network graph due to the five activity types of the individual affected by the intervention.
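A table-defined intervention can be sketched as per-activity multipliers on directed edge weights. This is a minimal sketch that follows the text literally: the Infectivity multiplier is applied to edges coming into an affected individual and the Susceptibility multiplier to edges going out of one. What "in" and "out" mean epidemiologically depends on the underlying graph's direction convention. `DirectedEdge` and `apply_table_intervention` are hypothetical names, and `affected` stands for the compliant members of the selected subpopulation.

```python
from dataclasses import dataclass

ACTIVITY_TYPES = ("home", "work", "school", "shopping", "other")


@dataclass
class DirectedEdge:
    src: int
    dst: int
    activity: str
    weight: float


def apply_table_intervention(edges, affected, infectivity, susceptibility):
    """Return new edges with weights scaled for individuals in `affected`.

    infectivity / susceptibility map each activity type to a multiplier.
    Per the description, infectivity scales "in" edges of an affected
    individual and susceptibility scales its "out" edges.
    """
    scaled = []
    for e in edges:
        w = e.weight
        if e.dst in affected:  # "in" edge of an affected individual
            w *= infectivity.get(e.activity, 1.0)
        if e.src in affected:  # "out" edge of an affected individual
            w *= susceptibility.get(e.activity, 1.0)
        scaled.append(DirectedEdge(e.src, e.dst, e.activity, w))
    return scaled


infectivity = {**dict.fromkeys(ACTIVITY_TYPES, 1.0), "home": 0.5}
susceptibility = {**dict.fromkeys(ACTIVITY_TYPES, 1.0), "work": 0.2}
graph = [DirectedEdge(1, 2, "work", 1.0), DirectedEdge(2, 1, "home", 1.0),
         DirectedEdge(3, 4, "school", 1.0)]
result = apply_table_intervention(graph, {1}, infectivity, susceptibility)
```

Keeping the two multipliers as separate tables is what allows edges to be scaled "equally or differently depending on edge direction".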
Intervention type: Dynamic Sequestration
This intervention isolates healthy individuals from the susceptible population in an attempt to protect them from infection. A specified subpopulation is randomly sequestered into groups of a specified size on a specified day, and disease spread is then simulated. The selected Group Size indicates the number of individuals in a sequestered group. Group Size can be defined as a single value or as a sweep.
A linear sweep is specified as a range of values with an Initial value, a Final value, and an Increment value. Initial Value: Starting value for the sweep process that generates a range of cells. Final Value: Ending value for the parameter during a sweep. No cell generated by the sweep exceeds this Final Value; the final cell generated in the sweep is set at the Final Value regardless of the increment size. Increment: Size of the change, in percentage of population, used during sweep generation of cells. The increment is added to each cell value to create the next cell.
If a sweep is specified, the system generates a set of experimental cells that begin at the Initial value and represent each level of the parameter above the Initial value, incremented up to and including the Final value. For example, if the analyst were to sweep Infectious from 30% to 70% in increments of 10%, the generated cells would represent 30%, 40%, 50%, 60%, and 70%.
Customized sweeps permit entering arbitrary values (bounded within the range permitted by the simulation), e.g., as comma-separated values. The system generates a cell for each specified sweep value. For example, if the analyst were to sweep “% Infectious” as “15,38,90”, the generated cells would represent 15%, 38%, and 90%.
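Random sequestration into groups can be sketched as shuffling the subpopulation and slicing it into consecutive groups. This is a minimal sketch, assuming the last group may be smaller than Group Size when the subpopulation does not divide evenly. `sequester` is a hypothetical helper.

```python
import random


def sequester(subpopulation, group_size, rng):
    """Randomly partition a subpopulation into groups of at most group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    members = list(subpopulation)
    rng.shuffle(members)  # random assignment of individuals to groups
    return [members[i:i + group_size] for i in range(0, len(members), group_size)]


groups = sequester(range(10), group_size=4, rng=random.Random(0))
```

Each resulting group would then keep only its within-group contacts for the sequestration period.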
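Both sweep styles can be sketched as functions that return the list of cell values. This is a minimal sketch, assuming the linear sweep appends the Final value as the last cell even when it is not an exact multiple of the increment, as described above, and that customized values are percentages bounded to 0 to 100. `linear_sweep`, `custom_sweep`, and the default bounds are hypothetical.

```python
def linear_sweep(initial, final, increment):
    """Cell values from initial upward; the last cell is always `final`."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    if initial > final:
        raise ValueError("initial must not exceed final")
    values = []
    k = 0
    while True:
        v = initial + k * increment  # computed from k to avoid float drift
        if v >= final - 1e-9:
            break
        values.append(v)
        k += 1
    values.append(final)  # final cell is set at the Final Value
    return values


def custom_sweep(text, lower=0.0, upper=100.0):
    """Parse comma-separated sweep values and check them against bounds."""
    values = [float(tok) for tok in text.split(",") if tok.strip()]
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"sweep value {v} outside [{lower}, {upper}]")
    return values
```

For example, `linear_sweep(30, 70, 10)` yields the five cells 30, 40, 50, 60, and 70 from the text, and `linear_sweep(30, 75, 10)` ends with a 75 cell.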
Initial conditions can be defined in at least one of the following ways, described in the context of epidemic simulations. Similar initial conditions can be used to determine the initial populations affected by events other than epidemics. The “upload pids” option can permit selecting specific nodes or subpopulations to be initially infected. The “subpopulation” box can permit the initially infected population to be selected from a subset of the full synthetic population (e.g., selected by age, by risk level for an event or epidemic such as high influenza risk, or by criticality of profession); alternatively, typing “all” can indicate the full synthetic population. Ways of defining initial conditions can include: Day 0: Specify the number of people infected on day 0; the specific nodes are selected randomly. This number is the infected count at the beginning of the infectious disease period. Every day: Specify the number of people (or other entities) newly infected per day (e.g., 0.5 people per day). This defines the daily infected count as the disease progresses. Daily Seed: Specify the number of people infected on specific observed days. This represents the infected count reported on particular days of the infectious period. Daily-seed values can be entered or displayed graphically via the bar graph (
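The three seeding options can be reduced to a per-day count schedule plus one selection routine. This is a minimal sketch: Day 0 is `{0: n}`, Daily Seed is a mapping such as `{3: 5, 10: 2}`, and Every day is generated by `schedule_every_day`. That function realizes a fractional rate such as 0.5 per day by stochastic rounding, which is an assumption. All function names are hypothetical, and `candidates` stands for the selected subpopulation (or all ids when "all" is chosen).

```python
import math
import random


def schedule_every_day(rate_per_day, days, rng):
    """Per-day seed counts for a constant rate (fractional part realized randomly)."""
    whole = math.floor(rate_per_day)
    frac = rate_per_day - whole
    return {d: whole + (1 if rng.random() < frac else 0) for d in range(days)}


def seed_infections(candidates, counts_by_day, rng):
    """Pick distinct, randomly chosen initially infected ids for each day."""
    pool = list(candidates)
    rng.shuffle(pool)
    seeds = {}
    for day in sorted(counts_by_day):
        n = counts_by_day[day]
        if n > len(pool):
            raise ValueError(f"not enough candidates left on day {day}")
        seeds[day], pool = pool[:n], pool[n:]
    return seeds


day0 = seed_infections(range(100), {0: 10}, random.Random(0))
daily = seed_infections(range(100), {3: 5, 10: 2}, random.Random(1))
```

Drawing from a single shuffled pool guarantees that no individual is seeded twice across days.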
In some examples, animations of the data over the duration of the simulation can be provided, e.g., as computer-generated movies. For example, a movie can show a result plot, e.g., an epicurve, that fills in from left to right as (accelerated) simulation time progresses.
The Plot Configurations panel can permit user control of at least one of the following. Infection Count: Plot the actual infection count or the cumulative infection count of the selected cells for a day range. Show Proportion: Plot the actual or cumulative infection count of the selected cells for a day range as a proportion.
The Data Filter panel can permit user control of at least one of the following. Cells: Plots the infection data for the cells the user selects from the list. For each cell, the user can choose to view the infection data for the replicates of that cell, and can select a set of replicates for which the infection data should be plotted. Sub Population: The user can also view the infection data for different subpopulation categories for each selected cell.
The Download Data control permits the user to download the analysis results as, e.g., a spreadsheet file. The downloaded analysis data can include at least some of the following data for each cell of the analysis. For example, an analysis with 2 cells can produce multiple spreadsheets of infection data (e.g., a sheet of mean infection data, sheets with the infection data of the replicates of each cell, and a sheet with the infection data of all subpopulation categories for each cell). Mean infection data: The means of the infection data of all replicates of each cell over the experiment duration. Infection data for replicates: The infection data for all replicates of each cell over the experiment duration. There can be a separate sheet with the replicate infection data of each cell.
For example, suppose an analysis is run with two cells, identified as 1 and 2, for a duration of 200 days, and each cell has 25 replicates. There can be two separate spreadsheets, one for cell identifier 1 and one for cell identifier 2, named 1_reps and 2_reps. The 1_reps sheet can hold the infection data for each day of the experiment duration for all 25 replicates of cell 1.
Mean Subpopulation Infection data: The mean of the infection data for each subpopulation category for each cell over the experiment duration. There can be a separate sheet with subpopulation infection data for each cell. Continuing the example above, there can be two separate sheets, one for cell identifier 1 and one for cell identifier 2. The sheet for cell 1 can be named 1_meanSubPopInfection and hold the mean of the infection data for each subpopulation category over all 25 replicates of cell 1.
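The two plot options differ only in how a daily count series is transformed before plotting. This is a minimal sketch, assuming cumulative counts accumulate from day 0 (so a later day range still includes earlier infections) and that proportions are taken relative to a supplied population size, for example the selected subpopulation. `plot_series` is a hypothetical helper.

```python
from itertools import accumulate


def plot_series(daily_counts, day_range, cumulative=False, population=None):
    """Series to plot for an inclusive (start, end) day range."""
    start, end = day_range
    series = list(accumulate(daily_counts)) if cumulative else list(daily_counts)
    series = series[start:end + 1]
    if population is not None:
        if population <= 0:
            raise ValueError("population must be positive")
        series = [c / population for c in series]  # "Show Proportion"
    return series
```

For example, daily counts `[1, 2, 3, 4]` over days 1 to 2 plot as `[3, 6]` cumulatively.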
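The sheet layout described above can be sketched with the standard library, with CSV files standing in for spreadsheet sheets. This is a minimal sketch, assuming replicate data arrive as one list of daily counts per replicate. The input dictionary shape, the `mean_infection` sheet name, and both helpers are hypothetical, and header rows are omitted for brevity.

```python
import csv
from pathlib import Path
from statistics import mean


def build_sheets(cells):
    """cells: {cell_id: {"reps": [[day0, day1, ...], ...],
                         "subpop": {category: [[day0, ...], ...]}}}"""
    sheets = {}
    mean_rows = []
    for cell_id, data in cells.items():
        reps = data["reps"]
        days = len(reps[0])
        # Mean infection data: mean over replicates for each day.
        mean_rows.append([cell_id] + [mean(r[d] for r in reps) for d in range(days)])
        # One row per replicate, e.g. sheet "1_reps".
        sheets[f"{cell_id}_reps"] = [[i + 1] + list(r) for i, r in enumerate(reps)]
        # Mean per subpopulation category, e.g. sheet "1_meanSubPopInfection".
        sheets[f"{cell_id}_meanSubPopInfection"] = [
            [cat] + [mean(r[d] for r in cat_reps) for d in range(days)]
            for cat, cat_reps in data["subpop"].items()
        ]
    sheets["mean_infection"] = mean_rows
    return sheets


def write_sheets(sheets, out_dir):
    """Write each sheet to <out_dir>/<sheet name>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in sheets.items():
        with open(out / f"{name}.csv", "w", newline="") as fh:
            csv.writer(fh).writerows(rows)


example = build_sheets({1: {"reps": [[0, 2, 4], [2, 4, 6]],
                            "subpop": {"0-4": [[0, 1, 1], [0, 1, 3]]}}})
```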
Referring to
Various aspects provide analysis features useful for computing estimates of R. Various aspects compute Wallinga likelihood estimates of daily R values from the simulated transmission tree(s), disaggregated by demographics or geography (e.g., the average number of children infected by a single infectious child). Various aspects provide comparisons of summary features of the daily R curve, e.g., maximum slope or slope near R=1. Various aspects perform statistical analysis of the influence of input parameters on these summary features. Various examples permit comparing daily R values such as Estimated_R, calculated using the Wallinga estimate on transmission trees, with Actual_R, calculated as an instantaneous derivative of the number of new infections each day from simulation data on epicurves. Wallinga estimates can be used to infer generation times. In some examples, the generation times used in simulation can be used in deriving Estimated_R for studying disease characteristics. Various examples can permit assessing likely outcomes of disease progressions with applied interventions over large networks.
Various examples estimate the relation between R (Estimated_R) and the growth rate ρ (Actual_R) in a fully mixed population, estimating ρ with a 10-day regression window to filter background noise from the early phase of the epidemic. This can permit R to be estimated more accurately for complex, time-varying real-world epicurves.
Plots such as Actual Effective R, Estimate of Effective R, Slope of Actual Effective R, and Slope of Estimated Effective R can be provided to view these estimates. R curves can clearly show an intervention's effect, and may show that a proposed intervention may not, by itself, be sufficient to stop the outbreak. Epicurves can suggest that the intervention changes the extent of the vulnerable population (the eventual attack rate) more than might be expected from the change in R.
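Daily R can be estimated with the standard Wallinga–Teunis relative-likelihood method. In that method, each case infected on day s is attributed to each earlier case on day t with weight w(s−t) / Σ_u I(u)·w(s−u), where w is the generation-interval distribution. This is a minimal sketch that works from daily case counts, which can be read off a simulated transmission tree, together with an assumed generation-interval distribution. It is not necessarily the platform's exact implementation. Estimates for the last days are biased low because their secondary cases fall outside the series (right truncation).

```python
def wallinga_teunis(incidence, gen_interval):
    """Daily effective R via the Wallinga-Teunis relative likelihood.

    incidence[t]    -- number of cases infected on day t
    gen_interval[k] -- probability that the generation interval is k days
                       (index 0 is ignored: same-day transmission excluded)
    Returns R for cases infected on each day (NaN for days with no cases).
    """
    n = len(incidence)

    def w(k):
        return gen_interval[k] if 0 < k < len(gen_interval) else 0.0

    r = [0.0] * n
    for s in range(n):  # day of the infectee
        if incidence[s] == 0:
            continue
        denom = sum(incidence[u] * w(s - u) for u in range(s))
        if denom == 0:
            continue  # no plausible infector, e.g. index cases
        for t in range(s):  # day of the candidate infector
            # each case on day t receives w(s-t)/denom of each day-s infectee
            r[t] += incidence[s] * w(s - t) / denom
    return [r[t] if incidence[t] else float("nan") for t in range(n)]


estimated_r = wallinga_teunis([1, 2, 4], [0.0, 1.0])
```

With a one-day generation interval, the doubling series `[1, 2, 4]` gives R = 2 for the first two days and 0 for the truncated last day.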
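The growth rate ρ can be estimated by a least-squares fit of log incidence against day over a 10-day window. ρ can then be related to R through the Euler–Lotka relation for a fully mixed population, R = 1 / Σ_k g(k)·e^(−ρk), as described by Wallinga and Lipsitch. This is a minimal sketch, assuming a trailing window ending on each day and requiring strictly positive counts inside the window. The function names are hypothetical, and the R relation is one standard choice, not necessarily the one used here.

```python
import math


def growth_rate(incidence, window=10):
    """Trailing least-squares slope of log(incidence); None where undefined."""
    if window < 2:
        raise ValueError("window must be at least 2")
    xs = range(window)
    x_bar = (window - 1) / 2
    den = sum((x - x_bar) ** 2 for x in xs)
    rates = []
    for end in range(len(incidence)):
        start = end - window + 1
        seg = incidence[start:end + 1] if start >= 0 else []
        if len(seg) < window or any(c <= 0 for c in seg):
            rates.append(None)  # not enough (positive) data for a fit
            continue
        ys = [math.log(c) for c in seg]
        y_bar = sum(ys) / window
        num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        rates.append(num / den)
    return rates


def r_from_growth(rho, gen_interval):
    """Euler-Lotka: R = 1 / sum_k g(k) exp(-rho k), for k >= 1."""
    m = sum(p * math.exp(-rho * k) for k, p in enumerate(gen_interval) if k > 0)
    return 1.0 / m


curve = [10 * math.exp(0.1 * t) for t in range(20)]
rho = growth_rate(curve)
```

For a pure exponential with daily growth 0.1, the fitted slope recovers ρ = 0.1 once a full window is available.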
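The slope plots and summary features (maximum slope, slope near R=1) can be sketched with finite differences. This is a minimal sketch, assuming "maximum slope" means the slope of largest magnitude and "slope near R=1" means the slope on the first day the curve drops below 1. Both readings are interpretations, and NaN values (e.g., from right truncation) should be removed first. The helper names are hypothetical.

```python
def slopes(r_curve):
    """Central-difference slope of a daily R curve (one-sided at the ends)."""
    n = len(r_curve)
    if n < 2:
        raise ValueError("need at least two days")
    out = []
    for t in range(n):
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        out.append((r_curve[hi] - r_curve[lo]) / (hi - lo))
    return out


def summary_features(r_curve):
    s = slopes(r_curve)
    crossing = next((t for t in range(1, len(r_curve))
                     if r_curve[t - 1] >= 1.0 > r_curve[t]), None)
    return {
        "max_abs_slope": max(s, key=abs),
        "first_drop_below_1": crossing,
        "slope_near_1": s[crossing] if crossing is not None else None,
    }


features = summary_features([2.0, 1.5, 1.0, 0.5, 0.5])
```

Comparing these features across cells is one way to quantify how strongly an intervention bends the R curve.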
The illustrated example middleware (referred to without limitation as “Enterprise Middleware”) is a heterogeneous system including components built on technologies such as JMS, DROOLS, and EJB. Throughout this discussion, other components or technologies having corresponding functionality can be substituted for JAVA, DROOLS, TOMCAT, or other specifically named examples. In the Enterprise Middleware system, communication between components and parallel execution form the foundation of each component. Apart from maintaining the Job lifecycle, Enterprise Middleware also has a recovery mechanism embedded in it. Fault tolerance, job overloading, and cluster resource management are highlights of this middleware solution. Enterprise Middleware provides a simple REST-based API (REST is a nonlimiting example) that can be consumed to execute jobs on the cluster. It also provides a mechanism by which the output generated at the end of job execution can be transferred to the application server if the environment is set up accordingly. An example system discussed herein can include at least one of the following components. A. Middleware Service: a. Job Service; b. Callback Service; c. Database Service. B. Job Scheduler (Resource Manager). C. Queue Messaging System: a. Rules Queue (Drools Engine); b. Execution Queue. D. HPC Adapter. E. Cluster: a. Cluster Services; b. HPC Scheduler; c. Heartbeat Services. In some examples, the Database Service exposes a REST-based API for performing CRUD database operations, as well as APIs for heartbeat update operations. These APIs are called by different components to update Job status.
In some examples, a job-manager component can have at least one of the following characteristics or functions. Exposes a REST-based API for consumption by the UI layer. Audits logs for middleware objects. Defines the Job object and its status details in a domain package. Implements exception handling for the middleware. Is responsible for placing messages on the queue. Allocates resources for processing jobs belonging to a particular entity. Is responsible for exposing methods for consumption by the MDB and the Callback Service (e.g., to update data library 122 or other databases). Implements a task throttling system that holds tasks at the source to prevent a large number of tasks from entering the system at once and overwhelming certain components, such as the queuing system. Feeds enough tasks into the system to improve throughput while maintaining stability.
In some examples, a messaging component can have at least one of the following characteristics or functions. Listens to messages on the middleware queue. Communicates with the business layer, and with the HPC via the HPC adapter layer.
In some examples, a rules component can have at least one of the following characteristics or functions. Listens to messages on the rules queue. Fires business rules via the Rules Service API. Updates the database with job status and task, and processes messages on the Rules Queue.
In some examples, a cluster-services component can have at least one of the following characteristics or functions. Exposes REST services for processing jobs on the cluster for specific tasks. Executes business logic for a particular job according to the resources allocated to it.
In some examples, a heartbeat component can have at least one of the following characteristics or functions. A service that polls cluster statistics such as memory usage, processes, and CPU utilization.
In some examples, a scheduling component can have at least one of the following characteristics or functions. A service that polls job status and informs the callback service in the Job Manager upon job completion.
middleware.url: This property defines which Middleware instance is processing the request. It contains the base URL of the Job Service, which different components use for purposes such as database services and callback services.
update.job.endpoint: This property defines the URL endpoint for updating the Job object in the database. At runtime, various components combine this property with middleware.url to update the Job status in the database.
The service checks whether a Job object exists in the Jobs database by comparing its UUID (Universally Unique IDentifier) field. If the Job object does not exist, the service assigns it a new UUID and inserts it into the database. If the Job is an existing Job object from the database, the service also updates properties of the object before inserting it into the database, e.g., the following. resourceName: This property defines on which Cluster the Job will be executed. clusterBaseDir: This property defines the base path on the cluster where the job will create its input and output files.
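The task throttling described above can be sketched as a holding queue at the source that releases tasks downstream only while fewer than a fixed number are in flight. This is a minimal sketch, assuming a single-threaded caller and a fixed capacity. `TaskThrottle`, `max_in_flight`, and the `dispatch` callable, which stands in for placing a task on the messaging queue, are hypothetical.

```python
from collections import deque


class TaskThrottle:
    """Hold tasks at the source; release at most max_in_flight at a time."""

    def __init__(self, max_in_flight, dispatch):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.max_in_flight = max_in_flight
        self.dispatch = dispatch  # e.g., puts the task on the queue
        self.held = deque()       # tasks waiting at the source
        self.in_flight = 0

    def submit(self, task):
        self.held.append(task)
        self._release()

    def complete(self):
        """Call when a dispatched task finishes; frees a slot."""
        if self.in_flight == 0:
            raise RuntimeError("no task in flight")
        self.in_flight -= 1
        self._release()

    def _release(self):
        while self.held and self.in_flight < self.max_in_flight:
            self.in_flight += 1
            self.dispatch(self.held.popleft())


sent = []
throttle = TaskThrottle(2, sent.append)
for task_id in range(5):
    throttle.submit(task_id)
```

Only two tasks reach the queue at first; each completion admits the next held task, which keeps the queue fed without flooding it.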
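Combining a base-URL property with an endpoint property can be sketched as follows. The same pattern applies to cluster.url with check.jobStatus.endpoint. This is a minimal sketch, assuming simple `key=value` lines. Real Java `.properties` files also allow `:` separators and line continuations, which this parser ignores. The host name and endpoint path are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def endpoint_url(props, endpoint_key, base_key="middleware.url"):
    """Join a base URL property and an endpoint property with one slash."""
    return props[base_key].rstrip("/") + "/" + props[endpoint_key].lstrip("/")


props = parse_properties("""
# hypothetical values
middleware.url=http://middleware.example:8080/jobservice/
update.job.endpoint=/jobs/update
""")
update_url = endpoint_url(props, "update.job.endpoint")
```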
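The insert-or-update behavior can be sketched against an in-memory dictionary keyed by UUID. This is a minimal sketch: a real Database Service would go through its REST CRUD API. The `save_job` helper and the example values are hypothetical.

```python
import uuid


def save_job(jobs_db, job, resource_name, cluster_base_dir):
    """Insert a new Job with a fresh UUID, or update and re-save an existing one."""
    job_id = job.get("uuid")
    if job_id is None or job_id not in jobs_db:
        job = dict(job, uuid=str(uuid.uuid4()))  # new Job: assign a new UUID
    else:
        job = dict(job, resourceName=resource_name,      # cluster to run on
                   clusterBaseDir=cluster_base_dir)      # base path on cluster
    jobs_db[job["uuid"]] = job
    return job["uuid"]


db = {}
jid = save_job(db, {"applicationName": "epi"}, "clusterA", "/scratch/epi")
same = save_job(db, db[jid], "clusterA", f"/scratch/epi/{jid}")
```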
Referring to
A Rules Queue & Services component can have multiple functions. It implements a JMS Queue, a Drools Engine, and an MDB that subscribes to the JMS Queue. It is called the Rules Queue because every application using the Middleware can add its business rules (flows) for the Job object. This component acts as a decision engine that decides, based on a Job's parameters, what the next logical step for the Job should be. Every application creates a Drools file (.drl) and provides it to this Rules engine. At run time, the Rules Engine fires all the available rules on each Job object put on the Rules Queue. The logic that fires rules on a Job is placed inside the overridden onMessage( ) method of the Rules MDB. Once a rule matches a Job, the Action part of the rule changes some of the Job's properties so that the Job proceeds to the next step. The updated Job is then saved to the database and written to the Execution Queue.
An Execution Queue & Services component executes the Job based on the rules applied and sends the Job to the Cluster for processing by invoking the Cluster services API, e.g., via an API URL. The Cluster URL is provided by the HPC Adapter. Once the Job has been handed to the Cluster for further processing, the database is updated with the new Job status.
<APPLICATION_NAME>: This property defines which class inside the Jar extends the ApplicationServices.java class. At runtime, ClusterServices tries to create an object of this class: based on the “applicationName” attribute of the Job object, the corresponding jar is loaded and its class is instantiated.
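The condition/action structure of the rules can be sketched without Drools. This is a minimal sketch: rules are plain Python condition and action callables evaluated once, in list order, against the current Job state. Drools' agenda, salience, and re-evaluation semantics are not modeled. The `Rule` type, the `fire_rules` helper, and the status and task names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Job = Dict[str, object]


@dataclass
class Rule:
    name: str
    when: Callable[[Job], bool]  # condition part
    then: Callable[[Job], None]  # action part: mutates the Job


def fire_rules(job: Job, rules: List[Rule]) -> List[str]:
    """Fire every matching rule on the job; return the names that fired."""
    fired = []
    for rule in rules:
        if rule.when(job):
            rule.then(job)
            fired.append(rule.name)
    return fired


EPI_RULES = [
    Rule("submit-simulation",
         when=lambda j: j["status"] == "POSTED" and j.get("task") is None,
         then=lambda j: j.update(task="SUBMIT_SIMULATION")),
    Rule("run-analysis",
         when=lambda j: j["status"] == "COMPLETED" and j.get("task") == "SUBMIT_SIMULATION",
         then=lambda j: j.update(task="RUN_ANALYSIS")),
]

job = {"applicationName": "epi", "status": "POSTED", "task": None}
first = fire_rules(job, EPI_RULES)
```

An empty list of fired rules is the signal that the Job has no further step.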
<APPLICATION_NAME>.baseDir: This property defines the base directory where all the files related to a particular application will be created on the HPC.
At runtime, when a Job object is received by Cluster Services for the first time, Cluster Services can inject properties such as the following in the Job object.
cluster.url: This property, like “middleware.url”, defines the base URL of the cluster on which the Job was executed. It is used by the recovery mechanism if, due to an error, the Job does not complete its flow.
check.jobStatus.endpoint: This property defines the name of the API to be called after prepending the value of the property “cluster.url”.
clusterBaseDir: This property is used by application-specific properties present in the Jar. Application code creates all the files and folders it needs inside the directory named by the value of this property. This keeps the files of different Jobs of the same application, but of different instances (Same Application Multiple Instance, SAMI), separate.
An HPC Scheduler component can implement various scheduling techniques or algorithms, e.g., a Scheduled Executor algorithm that executes with a fixed delay. The HPC Scheduler can notify the Callback Services of status changes of jobs on the cluster. It polls the Cluster for the jobs of a particular, configurable user and sends an update to the Callback Services if the status of any Job has changed. The HPC Scheduler has a retry mechanism: if a Callback Service is unavailable and a Job has status ‘C’ on the cluster, the Scheduler resends the ‘C’ status of that Job after some time interval. These failed Job status communications are stored in memory and are not persisted in the database. The Scheduler can also notify more than one Callback Service.
A Heartbeat Services module can also be a Scheduled Executor or implement another scheduling technique. Heartbeat services can provide information about the Cluster, such as CPU usage, memory usage, disk usage, and I/O, at regular intervals. This information also indicates that the HPC resource is available for use. The Resource Manager uses this information when deciding which HPC resource to submit a new Job to. Heartbeat services and the HPC Scheduler are bundled together as a single WAR file and deployed in the same container as Cluster Services.
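The polling, change detection, and in-memory retry of ‘C’ notifications can be sketched as a `tick()` method driven by a fixed-delay loop. In Torque/PBS, job state C denotes a completed job. This is a minimal sketch, assuming callbacks signal unavailability by raising `ConnectionError` and that only failed ‘C’ notifications are retried, as described. `HpcScheduler`, `run_with_fixed_delay`, and the callable interfaces are hypothetical.

```python
import threading


class HpcScheduler:
    def __init__(self, poll_cluster, callbacks):
        self.poll_cluster = poll_cluster  # () -> {cluster_job_id: status}
        self.callbacks = callbacks        # callables (job_id, status); may raise
        self.last_seen = {}
        self.pending = {}                 # in memory only: (callback idx, job id) -> "C"

    def _notify(self, idx, job_id, status):
        try:
            self.callbacks[idx](job_id, status)
            return True
        except ConnectionError:
            return False                  # callback service unavailable

    def tick(self):
        for idx, job_id in list(self.pending):  # retry earlier failed "C" updates
            if self._notify(idx, job_id, "C"):
                del self.pending[(idx, job_id)]
        for job_id, status in self.poll_cluster().items():
            if self.last_seen.get(job_id) == status:
                continue                  # no change, nothing to send
            self.last_seen[job_id] = status
            for idx in range(len(self.callbacks)):
                if not self._notify(idx, job_id, status) and status == "C":
                    self.pending[(idx, job_id)] = "C"


def run_with_fixed_delay(task, delay_s, stop_event):
    """Run task, then wait delay_s after it finishes, until stop_event is set."""
    while not stop_event.is_set():
        task()
        stop_event.wait(delay_s)
```

Because `pending` lives only in memory, a scheduler restart loses queued retries, matching the non-persisted behavior described above.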
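How the Resource Manager might use heartbeat data can be sketched as follows: discard resources whose last heartbeat is stale, then pick the least-loaded one. This is a minimal sketch. The 120-second staleness cutoff and the ordering by CPU, then memory, then disk are assumptions, and the `Heartbeat` fields and `choose_resource` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    resource: str
    timestamp: float     # seconds since epoch when reported
    cpu_usage: float     # fraction in [0, 1]
    memory_usage: float  # fraction in [0, 1]
    disk_usage: float    # fraction in [0, 1]


def choose_resource(heartbeats, now, max_age_s=120.0):
    """Least-loaded resource among those with a recent heartbeat, else None."""
    alive = [hb for hb in heartbeats if now - hb.timestamp <= max_age_s]
    if not alive:
        return None  # no HPC resource currently known to be available
    best = min(alive, key=lambda hb: (hb.cpu_usage, hb.memory_usage, hb.disk_usage))
    return best.resource


beats = [Heartbeat("clusterA", 1000.0, 0.9, 0.5, 0.2),
         Heartbeat("clusterB", 1000.0, 0.3, 0.7, 0.1),
         Heartbeat("clusterC", 500.0, 0.1, 0.1, 0.1)]
```

Here clusterC is idle, but its heartbeat is too old to count as available.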
1. When any application calls the submitJob( ) API of Job services, Job services inserts the Job in Middleware database by calling insertJob( ) API and returns the Job UUID to application. The Job status is changed to POSTED.
2. Job Scheduler polls Middleware database for Jobs with status as POSTED or RESTART and puts them on Rules queue. Job Scheduler also assigns the Cluster resource to Job before submitting it on Rules Queue.
3. The Rules MDB consumes this Job object and fires the business rules defined in the .drl files of each application. The Rules service assigns a task to the Job object, updates the Job object in the database, and puts the Job on the Execution Queue for further processing. The Rules service also updates the application with the Job status if the Job specifies an application callback service.
4. Execution MDB consumes the Job object put on Execution Queue. Execution MDB calls the HPC Adapter to get appropriate Cluster URL based on the Cluster resources allocated to the Job. Once the Cluster URL is obtained, Execution Queue calls the Cluster URL with the Job object for processing. Execution Queue updates the Middleware database with the response Job object received from Cluster. It also updates the Job status to application and puts the Job back on Rules Queue.
5. Cluster services processes the Job and submits a corresponding Job on a cluster for execution. The Cluster job ID it gets in return from HPC Cluster is updated in the Job object and sent back in response.
6. HPC Plugin Services updates Callback Services about status change of Jobs of particular user. Callback services checks if a Job with such Cluster Job id exists or not. If it exists then it updates the status and puts the Job back on Rules Queue.
7. Steps 2 to 6 are executed in a loop until no rules are fired for a Job.
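The loop in steps 2 to 7 can be sketched as an in-process driver with stubbed components. This is a minimal sketch, assuming rules are condition and action pairs over a Job dictionary and that the cluster submission (steps 4 and 5) and the status callback (step 6) are supplied as callables. The `max_rounds` guard, the stubs, and all names are hypothetical. A real deployment runs these steps asynchronously through the queues.

```python
def run_job_lifecycle(job, rules, submit_to_cluster, wait_for_status,
                      resource="clusterA", max_rounds=100):
    """Repeat steps 2-6 until no rule fires for the job (step 7)."""
    job.setdefault("resourceName", resource)  # step 2: assign a cluster resource
    history = []
    for _ in range(max_rounds):
        fired = False
        for condition, action in rules:       # step 3: fire business rules
            if condition(job):
                action(job)
                fired = True
        if not fired:
            return history                    # step 7: nothing left to do
        job["clusterJobId"] = submit_to_cluster(job)  # steps 4-5
        job["status"] = wait_for_status(job)          # step 6
        history.append((job["task"], job["status"]))
    raise RuntimeError("job did not settle within max_rounds")


rules = [
    (lambda j: j["status"] == "POSTED", lambda j: j.update(task="SIMULATE")),
    (lambda j: j["status"] == "C" and j["task"] == "SIMULATE",
     lambda j: j.update(task="ANALYZE")),
]
ids = iter(range(1, 100))
job = {"status": "POSTED", "task": None}
history = run_job_lifecycle(job, rules, lambda j: f"pbs.{next(ids)}", lambda j: "C")
```

The `max_rounds` guard is a defensive addition so that a rule set which never stops firing does not loop forever.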
The “SIBEL TOMCAT” component, which can be implemented by the tool 128 running on the server 126, can provide backend services (e.g., RESTful services) for managing experiments, cells, analyses, etc. In some implementations, the system uses APACHE TOMCAT® as a container for deploying a web application. The component may deploy REST services (e.g., implemented using ORACLE Jersey) that are called by the user interface component to perform operations. The REST services call a data manager that interacts with the application database to perform operations (e.g., create/read/update/delete, CRUD, operations).
The middleware component interacts with cluster services to execute jobs on the cluster, which may be implemented by the platform 130. The middleware may be a heterogeneous system including components designed based on various technologies (e.g., JMS, DROOLS, EJB, etc.). In some implementations, the middleware component provides a REST-based API that can be utilized for executing jobs on the cluster. In some implementations, the middleware component provides a mechanism by which the output generated at the end of job execution is transferred to a user application server if the environment is configured to allow such a transfer.
The cluster component receives job requests, dispatches the jobs for execution, and extracts output data. A cluster represents a set of one or more nodes on which jobs are executed. In some implementations, the cluster component may be implemented as a Torque implementation of the Portable Batch System (PBS). In some implementations, a master node chooses one or more nodes to which a job is to be assigned, and may provide an API to check the status of a job.
Various implementations may include features designed to improve the stability and robustness of the system. For example, a task throttling system may hold tasks at the source to prevent a large number of tasks from entering the system at once and overwhelming certain components, such as the queuing system. Instead, enough tasks are fed to the system to increase throughput while maintaining stability. Some implementations may additionally or alternatively include features to improve the interactivity and flexibility of the system. For example, some implementations allow analysts to view results quickly (e.g., immediately) and interactively, greatly speeding up the interpretation of results. In some implementations, multiple epidemiology applications may be combined into an integrated system and/or the addition of new areas may be automated, which may reduce or eliminate error-prone, time-consuming manual addition of new areas.
In some implementations, an experiment component may be provided (e.g., within the SIBEL TOMCAT component) that allows the users to define various parameters of experiments and obtain information regarding the experiments. According to various implementations, the experiment component may provide features described herein. Additionally or alternatively, the experiment component may provide one or more of the below features.
A cell display may be provided in which different cells (experiment runs), e.g., defined according to different parameters, may be illustrated. The cell display may provide a view of differentiating factors for each cell up-front instead of comparing the parameters for each cell. In some such implementations, details of parameters of the cell can be seen by selecting a “view cell details” link.
In some implementations, an analysis component may be provided (e.g., within the SIBEL TOMCAT component) that defines various aspects of the performed analysis and/or generates output representative of the results of the simulation. According to various implementations, the analysis component may provide features described herein, or one or more of the following features. In some examples, the analysis may be created on the fly without having to submit analysis jobs on the cluster. This may reduce the time to visualize an analysis. In some examples, interactive analysis curves or other visual (e.g., textual, graphical, etc.) output data may be generated. Rather than, or in addition to, a static output image, users may change various filters/parameters, and the output data may be dynamically modified based on the changes. Examples are discussed herein, e.g., with reference to blocks 524, 904, or 1110. In some implementations, a subpopulation selection may be provided as one of the filter options, such that a user can selectively apply or remove particular subpopulations from the output results and dynamically view the modifications to the output. In some implementations, an analysis listing/summary page may provide a preview of what the output data looks like so the user can view the analysis list and obtain some information without selecting a particular analysis and viewing a detailed page for that analysis.
Analysis output data may be saved to a single output file, such as a spreadsheet (e.g., in CSV/TSV format), which can be used as input for further processing.
An up-front display of the experiment status may be provided such that, when the experiment is running, an experiment status indicator is updated on the user interface to keep the user informed as to the progress of the experiment. In some implementations, the status may include a progress bar that informs the user of a current state of the experiment, how much of the experiment is completed, how much of the experiment remains, time elapsed since the beginning of the experiment, estimated time to completion of the experiment, etc.
A cell display may be provided in which different cells (e.g., defined according to different parameters, such as efficacy levels, levels of compliance with treatment protocols, etc.) may be illustrated and, in some implementations, the cell display may provide a view of differentiating factors for each cell up-front instead of comparing the parameters for each cell. In some such implementations, details of parameters of the cell can be seen by selecting a “view cell details” link.
Disease models (e.g.,
Transmissibility: Transmissibility is a function of contact duration and contact frequency, calibrated to yield a specific attack rate in the population. Disease severity differs among the strains causing a disease and is marked by an increase in Transmissibility values. Temporal variations in transmissibility can also require defining different disease models for a particular disease under study. Transmissibility is often ≤0.0001, though this is not required.
Incubation Period Probability: The time between exposure to the infectious agent and detection of the first signs or symptoms in an individual in the population is defined as the Incubation Period. The period may be as short as minutes or as long as thirty years, depending on the nature of the pathogen. The Incubation Period is specific to each disease. Incubation Period probability values are defined per day, as the probability that an exposed person in the population harbors the latent contagious pathogen on that day after exposure. The number of days for which probability values are provided can also be entered as a parameter.
Infectious Period Probability: The time during which infected entities are able to transmit infection to any susceptible host or vector they come in contact with is defined as the Infectious Period. Both symptomatic and asymptomatic individuals can spread infection in the population. Infectious Period probabilities can be defined for each day of the infectious period, as the probability that an infected member of the population is capable of transmitting infection to a susceptible individual on contact. The number of days for which probability values are provided can also be entered as a parameter.
Symptomatic Proportion: The percentage of the infected population that exhibits symptoms.
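One common way to turn transmissibility and contact duration into a per-contact infection probability treats each minute of contact as an independent chance of transmission: p = 1 − (1 − τ)^d. This is a minimal sketch under that assumption. The per-minute unit, the form of the formula, and the optional infectivity and susceptibility multipliers are assumptions, not necessarily the model used here, and `infection_probability` is a hypothetical helper.

```python
def infection_probability(transmissibility, contact_minutes,
                          infectivity=1.0, susceptibility=1.0):
    """Assumed form: p = 1 - (1 - tau * infectivity * susceptibility) ** minutes."""
    rate = transmissibility * infectivity * susceptibility
    if not 0.0 <= rate <= 1.0:
        raise ValueError("scaled transmissibility must lie in [0, 1]")
    return 1.0 - (1.0 - rate) ** contact_minutes


p_hour = infection_probability(0.0001, 60)
```

With τ = 0.0001 and a one-hour contact, p is about 0.006. Calibrating τ to a target attack rate would then mean simulating and adjusting τ until the simulated attack rate matches.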
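Per-individual disease timelines can be sketched by sampling period lengths from the per-day probability lists and drawing symptomatic status from the Symptomatic Proportion. This is a minimal sketch, assuming each list is read as a discrete distribution over period length (entry k is the weight for a period of k+1 days). That is one plausible reading of the per-day probabilities, not a confirmed one. The function names and example values are hypothetical.

```python
import random


def sample_period(day_probs, rng):
    """Sample a period length in days; day_probs[k] weights a (k+1)-day period."""
    if sum(day_probs) <= 0:
        raise ValueError("at least one probability must be positive")
    return rng.choices(range(1, len(day_probs) + 1), weights=day_probs, k=1)[0]


def disease_timeline(incubation_probs, infectious_probs, symptomatic_proportion, rng):
    return {
        "incubation_days": sample_period(incubation_probs, rng),
        "infectious_days": sample_period(infectious_probs, rng),
        "symptomatic": rng.random() < symptomatic_proportion,
    }


timeline = disease_timeline([0.3, 0.5, 0.2], [0.0, 0.5, 0.5], 1.0, random.Random(42))
```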
Computer-generated models are frequently used to replicate various real-life scenarios. Such models, for example, may be used to model traffic congestion in a particular area during a particular time of day. Using these models, researchers can estimate the effect that a change in certain variables related to the models may have on the outcome of the scenarios being replicated. Example scenarios can include events as described herein, e.g., epidemics or other occurrences having consequences that may occur over the course of the event.
Computer models may be limited in their usefulness by various factors, including the availability of information with which to construct the network underlying the model. Social contact networks are a type of network representing interactions between entities within a population. Large-scale social contact networks may be particularly complicated to model because of the difficulty in collecting reliable data regarding entities and social contacts within the population. Some social contact network models have addressed this difficulty by utilizing only small data sets in constructing the social contact network. In some types of network models (e.g., the Internet, the power grid, etc.), where the real network structure is not easily available due to commercial and security concerns, methods have been developed to infer the network structure by indirect measurements. However, such methods may not apply to large-scale social contact networks (e.g., large heterogeneous urban populations) because of the variety of information sources needed to build them.
Accordingly, various examples include a complex situation analysis system that generates a social contact network, uses edge brokers and service brokers, and dynamically adds brokers. An example system for generating a representation of a situation is disclosed. The example system comprises one or more computer-readable media including computer-executable instructions that are executable by one or more processors to implement an example method of generating a representation of a situation. The example method comprises receiving input data regarding a target population. The example method further comprises constructing a synthetic data set including a synthetic population based on the input data. The synthetic population includes a plurality of synthetic entities. In some examples, each synthetic entity has a one-to-one correspondence with an entity in the target population, although this is not required. In some examples, each synthetic entity is assigned one or more attributes based on information included in the input data. The example method further comprises receiving activity data for a plurality of entities in the target population.
In some examples, the example method further comprises generating activity schedules for each synthetic entity in the synthetic population. Each synthetic entity is assigned at least one activity schedule based on the attributes assigned to the synthetic entity and information included in the activity data. An activity schedule describes the activities of the synthetic entity and includes a location associated with each activity. The example method can further comprise receiving additional data relevant to the situation being represented. The additional data is received from at least two distinct information sources. The example method can further comprise modifying the synthetic data set based on the additional data. Modifying the synthetic data set includes integrating at least a portion of the additional data received from each of the at least two distinct information sources into the synthetic data set based on one or more behavioral theories related to the synthetic population. The example method can further comprise generating a social contact network, e.g., social-interaction graph, based on the synthetic data set. The social contact network can be used to generate the representation of the situation.
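The last step, generating a social contact network from the synthetic data set, can be sketched as co-location: two synthetic entities are connected when their activity schedules place them at the same location at overlapping times, weighted by overlap duration and labeled by activity type. This is a minimal sketch under that assumption. It uses a naive pairwise comparison per location (production systems typically use sublocations and far more efficient algorithms), labels mixed-activity contacts by joining both types, and uses hypothetical names throughout.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Activity:
    person: int
    activity_type: str  # home, work, school, shopping, other
    location: int
    start: int          # minutes after midnight
    end: int


def build_contact_network(schedules):
    """Undirected co-location graph: {(a, b, label): total overlap minutes}."""
    by_location = defaultdict(list)
    for act in schedules:
        by_location[act.location].append(act)
    edges = defaultdict(int)
    for acts in by_location.values():
        for a, b in combinations(acts, 2):
            if a.person == b.person:
                continue
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap > 0:
                label = "/".join(sorted({a.activity_type, b.activity_type}))
                key = (min(a.person, b.person), max(a.person, b.person), label)
                edges[key] += overlap
    return dict(edges)


schedules = [
    Activity(1, "home", 10, 0, 480), Activity(2, "home", 10, 0, 420),
    Activity(1, "work", 20, 540, 1020), Activity(3, "work", 20, 600, 1000),
    Activity(4, "shopping", 30, 1100, 1160),
]
network = build_contact_network(schedules)
```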
Referring generally to
According to various embodiments, system 102 may be implemented as software (e.g., computer-executable instructions stored on one or more computer-readable media) that may be executed by one or more computing systems. System 102 may be implemented across one or more high-performance computing (“HPC”) systems (e.g., a group of two or more computing systems arranged or connected in a cluster to provide increased computing power). In some embodiments, system 102 may be implemented on HPC architectures including 20,000 to 100,000 or more core systems. System 102 may be implemented on wide-area network based distributed computing resources, such as the TeraGrid or the cloud. In further embodiments, one or more components of system 102 may be accessible via mobile communication devices (e.g., cellular phones, PDAs, smartphones, etc.). In such embodiments, the mobile communication devices may be location-aware, and one or more components of system 102 may utilize the location of the mobile communication device in creating the desired situation representation.
In the example embodiment of
Surveillance subsystem 106 is configured to collect and process sensor and/or surveillance information from a variety of information sources (e.g., surveillance data, simulations, expert opinions, etc.) for use in creating and/or modifying the synthetic data set. The data may be received from both proprietary (e.g., commercial databases, such as those provided by Dun & Bradstreet) and publicly available sources (e.g., government databases, such as the National Household Travel Survey provided by the Bureau of Transportation Statistics or databases provided by the National Center for Education Statistics). Surveillance subsystem 106 may be used to integrate and/or classify data received from diverse information sources (e.g., by the use of voting schemes). Standard classification schemes used in machine learning and statistics (e.g., Bayes classifiers, classification and regression trees, principal components analysis, support vector machines, clustering, etc.) may be used by surveillance subsystem 106 depending on the desired application. In some embodiments, surveillance subsystem 106 may be flexible enough to utilize new techniques developed for a specific application. The data collected and processed by surveillance subsystem 106 may be used by synthetic data set subsystem 104 and/or other subsystems of system 102 to create, modify, and/or manipulate the synthetic data set and, accordingly, the situation representation. Synthetic data set subsystem 104 may in turn provide cues to surveillance subsystem 106 for use in orienting surveillance and determining what data should be obtained and/or how the data should be processed.
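As one illustration of a voting scheme for integrating labels from diverse sources, the sketch below combines, for a single record, the label assigned by each source using an optional per-source reliability weight. It is a minimal sketch under stated assumptions: the function name `weighted_vote`, the source names, and the weighting rule are hypothetical and are not taken from the described system.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


def weighted_vote(labels: Dict[str, Hashable],
                  weights: Optional[Dict[str, float]] = None) -> Hashable:
    """Combine per-source labels for one record into a single label.

    labels maps a (hypothetical) source name to the label that source assigned;
    weights optionally maps a source name to a reliability weight (default 1.0).
    Ties are broken in favor of the label encountered first.
    """
    if not labels:
        raise ValueError("at least one source label is required")
    weights = weights or {}
    tally: Counter = Counter()
    for source, label in labels.items():
        tally[label] += weights.get(source, 1.0)
    # most_common orders by total weight; equal totals keep first-seen order
    return tally.most_common(1)[0][0]
```

For example, with three sources that each get one vote, two agreeing sources outvote the third; giving the third source a weight of 3.0 reverses the outcome.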
Decision analysis subsystem 108 is configured to analyze various possible courses of action and support context-based decision making based on the synthetic data set, social contact network and/or situation representation created by synthetic data set subsystem 104. Decision analysis subsystem 108 may be used to define a scenario and design an experiment based on various alternatives that the user wishes to study. The experiment design is utilized by the other subsystems of system 102, including synthetic data set subsystem 104, to build and/or modify the synthetic data set (including, e.g., the synthetic population) and construct the social contact network used to represent the situation. Decision analysis subsystem 108 uses information related to the synthetic data set and/or situation representation received from synthetic data set subsystem 104 to support decision making and analysis of different possible courses of action. Experiment design, decision making, analysis of alternatives, and/or other functions of decision analysis subsystem 108 may be performed in an automated fashion or based on interaction with and input from one or more users of system 102.
In some embodiments, various subsystems of system 102 may utilize one or more case-specific models provided by case modeling subsystem 110. Case modeling subsystem 110 is configured to provide models and/or algorithms based upon the scenario at issue as defined by decision analysis subsystem 108. According to various embodiments, example case models may be related to public health (e.g., epidemiology), economics (e.g., commodity markets), computing networks (e.g., packet switched telecommunication networks), civil infrastructures (e.g., transportation), and other areas. In some embodiments, portions of multiple case models may be used in combination depending on the situation the user desires to represent.
At block 204, synthetic data set subsystem 104 receives the unstructured data, provides context to the data, and creates and/or modifies a synthetic data set, including a synthetic population data set, and constructs a social contact network used to form the desired situation representation. Synthetic data set subsystem 104 may provide context to the unstructured data using various modules that may be based on, for example, properties of the individuals or entities that comprise the synthetic population, previously known goals and/or activities of the members of the synthetic population, theories regarding the expected behavior of the synthetic population members, known interactions between the synthetic population members, etc. In some embodiments, unstructured data obtained from multiple sources may be misaligned or noisy and synthetic data set subsystem 104 may be configured to use one or more behavioral or social theories to combine the unstructured data into the synthetic data set. In various embodiments, synthetic data set subsystem 104 may be configured to contextualize information from at least ten distinct information sources. Synthetic data set subsystem 104 may be configured to construct multi-theory networks, such that synthetic data set subsystem 104 includes multiple behavioral rules that may be utilized by various components of synthetic data set subsystem 104 to construct and/or modify the synthetic data set depending on the situation being represented and the types of interactions involved (e.g., driving behavior, disease manifestation behavior, wireless device use behavior, etc.). Synthetic data set subsystem 104 may also be configured to construct multi-level networks, such that separate types of social contact networks (e.g., transportation networks, communications networks) may be created that relate to distinct types of interactions but are coupled through common synthetic entities and groups. 
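The multi-level network idea can be sketched as separate adjacency layers (e.g., a transportation layer and a communications layer) that share one set of synthetic entity identifiers, so the shared identifier is what couples the layers. The class name `MultiLevelNetwork`, the layer names, and the integer IDs are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import Dict, Set


class MultiLevelNetwork:
    """Separate interaction layers that share one set of synthetic entity IDs."""

    def __init__(self) -> None:
        # layer name -> entity id -> set of neighbor ids
        self.layers: Dict[str, Dict[int, Set[int]]] = defaultdict(lambda: defaultdict(set))

    def add_edge(self, layer: str, a: int, b: int) -> None:
        """Add an undirected interaction between two entities in one layer."""
        self.layers[layer][a].add(b)
        self.layers[layer][b].add(a)

    def neighbors(self, entity: int, layer: str) -> Set[int]:
        """Neighbors of an entity within a single layer."""
        return set(self.layers[layer].get(entity, set()))

    def coupled_neighbors(self, entity: int) -> Dict[str, Set[int]]:
        """Neighbors of one entity in every layer; the shared ID couples the layers."""
        return {name: set(adj.get(entity, set())) for name, adj in self.layers.items()}
```

Because each layer is stored separately, a different behavioral rule can populate each one (e.g., driving behavior for the transportation layer), while queries about a single synthetic entity can still cross layers.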
Because context is provided to the unstructured information through the use of behavioral theories and other factors, in some embodiments synthetic data set subsystem 104 may be configured to incorporate information from new data sets into the synthetic data set as they become available for use by system 102. For example, synthetic data set subsystem 104 may be configured to incorporate usage data regarding new wireless communication devices.
Once context has been provided to the unstructured data, the relevant data is integrated into the synthetic data set, which is provided by synthetic data set subsystem 104 at block 206. According to various embodiments, the synthetic data set provided at block 206 may be modified (e.g., iteratively) to incorporate further data from surveillance subsystem 106, for example based on experiment features or decisions provided by decision analysis subsystem 108. As further questions are posed via decision analysis subsystem 108 and further data is integrated into the synthetic data set, system 102 may require fewer computing resources to produce a desired situation representation. In some embodiments, the synthetic information resource may be stored or preserved and utilized (e.g., by the same or a different user of system 102) to form representations of other (e.g., similar) situations. In such embodiments, fewer computing resources may be required to create the newly desired situation representation, because one or more types of information needed to create the representation may already be incorporated into the previously created synthetic data set.
Synthetic data set subsystem 104 uses the input data to construct a synthetic population (step 224). The synthetic population includes a plurality of interacting synthetic entities, which may be living organisms (e.g., humans, animals, insects, plants, etc.) and/or inanimate objects (e.g., vehicles, wireless communication devices, infrastructure elements, etc.). In some embodiments, the synthetic population may model all entities within an area (e.g., geographic area) of interest, such that each synthetic entity in the synthetic population represents an actual entity in the location (e.g., geographic location) of interest. The synthetic entities may be assigned characteristics based on information reflected in the input data. In the example noted above, wherein the synthetic entities represent human beings and the input data is data from the U.S. Census, the demographic data reflected in the U.S. Census (e.g., age, income level, etc.) may be used to generate the synthetic population.
The synthetic entities may also be placed in one or more blocks or groups with other synthetic entities. For example, synthetic entities representing human beings may be placed in households with other synthetic entities based on the census data. The households may be placed geographically in such a way that the synthetic population reflects the same statistical properties as the underlying census data (i.e., the synthetic population is statistically indistinguishable from the census data). Because the synthetic population is composed of synthetic entities created using census demographic data and not actual entities or individuals, the privacy and security of the actual entities within the population of interest can be protected. In other embodiments, the synthetic entities may be grouped into other types of synthetic blocks or groups based on characteristics other than household membership (e.g., genus, species, device type, infrastructure type, etc.). In some embodiments, a synthetic data set may not previously exist and synthetic data set subsystem 104 may create a new synthetic data set including the constructed synthetic population. In other embodiments, a previously existing synthetic data set may be modified to include part or all of the created synthetic population.
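The text does not specify how households are generated so that they match census statistics. One widely used technique for synthetic populations is iterative proportional fitting (IPF), which rescales a seed cross-tabulation (e.g., household size by income bracket from a sample) until its row and column sums match census marginal totals; households can then be drawn in proportion to the fitted cell weights. The sketch below shows two-dimensional IPF only, under the assumption that the row and column totals sum to the same value; the function name and example numbers are hypothetical.

```python
from typing import List


def ipf(seed: List[List[float]], row_totals: List[float], col_totals: List[float],
        tol: float = 1e-6, max_iter: int = 1000) -> List[List[float]]:
    """Scale a seed table so its row and column sums match census marginals.

    Assumes sum(row_totals) == sum(col_totals); otherwise IPF cannot converge.
    """
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        # Rescale each row to hit its marginal total
        for i, target in enumerate(row_totals):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Rescale each column to hit its marginal total
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once rows match within tolerance
        if all(abs(sum(table[i]) - t) <= tol for i, t in enumerate(row_totals)):
            break
    return table
```

For instance, a uniform 2x2 seed with row totals 30 and 70 and column totals 40 and 60 converges to [[12, 18], [28, 42]].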
System 102 may also obtain or receive a set of activity or event templates including activity data for entities or groups of entities in the target population (step 226). For example, activity templates related to a human population may include activity data for households in the geographic area of interest. The activity templates may be based on information from one or more sources, such as travel surveys collected by the government, marketing surveys (e.g., proprietary surveys conducted by marketing agencies), digital device tracking data (e.g., cellular telephone or wireless communication device usage information), and/or other sources. The activity data may be collected and processed by surveillance subsystem 106 and used by synthetic data set subsystem 104 to construct or modify a social contact network based on the synthetic population. In some embodiments, data may be collected from multiple sources, which may or may not be configured to be compatible with one another, and surveillance subsystem 106 and/or synthetic data set subsystem 104 may be configured to combine and process the data in a way that may be used by synthetic data set subsystem 104 to create and/or modify the synthetic data set. The activity templates may describe daily activities of the inhabitants of the household and may be based on one or more information sources such as activity or time-use surveys. The activity templates may also include data regarding the times at which the various daily activities are performed, priority levels of the activities, preferences regarding how the entity travels to the activity location (e.g., vehicle preference), possible locations for the activity, etc. In some embodiments, an activity template may describe the activities of each full day (i.e., 24 hours) for each inhabitant of the associated household in minute-by-minute or second-by-second detail.
Once the activity templates are received, synthetic data set subsystem 104 matches each synthetic group (e.g., household) with one of the survey groups (e.g., survey households) associated with the activity templates (step 228). The synthetic groups may be matched with survey groups (e.g., using a decision tree) based on information (e.g., demographic information) contained in the input data (e.g., census data) and information from the activity surveys (e.g., number of workers in the household, number of children in the household, ages of inhabitants, etc.). Synthetic data set subsystem 104 then assigns each synthetic group the activity template of its matching survey group.
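The matching step can be pictured as a small decision tree: split first on the number of workers, then on whether the household has children, and within the resulting leaf choose the survey household whose mean age is closest. The specific splits, the `Household` fields, and the fallback rule below are hypothetical assumptions; the source only states that a decision tree over such attributes may be used.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Household:
    hh_id: str
    workers: int
    children: int
    ages: List[int]  # assumed non-empty


def leaf_key(h: Household) -> Tuple[int, bool]:
    # Split 1: number of workers (0, 1, 2 or more); split 2: any children?
    return (min(h.workers, 2), h.children > 0)


def match_households(synthetic: List[Household], survey: List[Household]) -> Dict[str, str]:
    """Map each synthetic household ID to the ID of a matching survey household."""
    leaves: Dict[Tuple[int, bool], List[Household]] = {}
    for s in survey:
        leaves.setdefault(leaf_key(s), []).append(s)
    matches: Dict[str, str] = {}
    for h in synthetic:
        # Fall back to all survey households if the leaf is empty
        candidates = leaves.get(leaf_key(h), survey)
        mean_age = sum(h.ages) / len(h.ages)
        best = min(candidates, key=lambda s: abs(sum(s.ages) / len(s.ages) - mean_age))
        matches[h.hh_id] = best.hh_id
    return matches
```

Each synthetic household would then inherit the activity template of the survey household it maps to.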
Once activity templates have been assigned to each synthetic group, a location is assigned for each synthetic group and each activity reflected in the synthetic group's activity template (step 230). The locations may be assigned based on observed land-use patterns, tax data, employment data, and/or other types of data. Locations may be assigned in part based on an identity or purpose of the activity, which, in the example where the synthetic population represents a human population, may include home, work, school or college, shopping, and/or other identities. Locations for the activities may be chosen using data from a variety of databases, including commercial and/or public databases such as those from Dun & Bradstreet (e.g., for work, retail, and recreation locations) and the National Center for Education Statistics (e.g., for school and college locations). In some embodiments, the locations may be calibrated against observed travel-time distributions for the relevant geographic area. For example, travel time data in the National Household Travel Survey may be used to calibrate locations. Once locations for each activity have been determined, an activity schedule is generated for each synthetic entity describing the activities of the synthetic entity, including times and locations (step 232). The activity templates and/or activity schedule may be based in part on the experiment and/or desired situation representation. The synthetic data set may be modified to include the activity schedules, including locations.
In some embodiments, system 102 may be configured to receive further data based on the desired situation representation (step 234). Referring to the example above, if the desired situation representation is related to spread of an illness in Illinois, the further data may include information regarding what areas of Illinois have recorded infections, what the level of infection is in those areas, etc. The received further data may be used to modify, or add information to, the synthetic data set (step 236). In various embodiments, steps 234 and 236 may be repeated one or more times (e.g., iteratively) to integrate additional information that is relevant to the desired situation representation into the synthetic data set. At step 238, a social contact network (e.g., represented as a graph) may be created based on the entities and interactions reflected in the synthetic data set. The resultant social contact network can be used to model the desired situation representation such that appropriate decisions can be made using decision analysis subsystem 108.
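One simple way to derive a social contact network from the synthetic data set is to connect two entities whenever their scheduled visits place them at the same location at overlapping times, weighting each edge by the minutes of overlap. The sketch below assumes a flat list of visit tuples; the `Visit` layout and function name are hypothetical, and the pairwise comparison per location is quadratic, so a large-scale implementation would likely use a sweep over sorted start times instead.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# (entity_id, location_id, start_minute, end_minute)
Visit = Tuple[int, str, int, int]


def build_contact_network(visits: List[Visit]) -> Dict[Tuple[int, int], int]:
    """Return {(a, b): minutes of co-location} with a < b."""
    by_location: Dict[str, List[Visit]] = defaultdict(list)
    for v in visits:
        by_location[v[1]].append(v)
    edges: Dict[Tuple[int, int], int] = defaultdict(int)
    for loc_visits in by_location.values():
        for (a, _, s1, e1), (b, _, s2, e2) in combinations(loc_visits, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if a != b and overlap > 0:
                edges[(min(a, b), max(a, b))] += overlap
    return dict(edges)
```

Given a household in which entities 1 and 2 are both home from minute 0 to 420, and a workplace where entities 1 and 3 overlap from minute 600 to 900, the resulting graph has edges (1, 2) with weight 420 and (1, 3) with weight 300.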
In the example shown in
Activity generator 264 then uses synthetic population 260 and traveler survey data 262 to form activity schedules 266 for each of the synthetic entities in the synthetic population. Traveler survey data 262 may include surveys conducted by government entities and may include activity participation and travel data for all members of households in the target area. In other embodiments, activity generator 264 may use other data, such as marketing surveys (e.g., commercial surveys conducted by marketing firms), digital device tracking data (e.g., usage data regarding wireless communication devices), and other information to create activity schedules 266. In some embodiments, activity generator 264 may also utilize location information to construct activity schedules 266, such as locations of activities (e.g., including land use and/or employment information). The location information may be included as part of census data 256, traveler survey data 262, or one or more other data sources. In various embodiments, activity schedules 266 may be assigned to synthetic entities based on synthetic groups to which the synthetic entities belong. Activity generator 264 is also configured to assign a location to each activity in each activity schedule 266. Locations may be assigned using various methods. One method is to utilize a distance-based distribution that accounts for the decreasing likelihood that a candidate activity location is correct as its distance from an anchor location (e.g., home, work, school, etc.) increases. Locations may be assigned using an iterative process, wherein locations are assigned to activities and compared to the activity time data in the relevant activity schedule 266 to determine whether the time needed to travel between locations matches time data reflected in the activity schedule 266. If not, locations may be reassigned iteratively until the time data matches. Synthetic population 260 and activity schedules 266 may be integrated as part of a synthetic data set.
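The distance-based assignment with iterative reassignment can be sketched as rejection sampling: draw a candidate with a weight that decays exponentially with distance from the anchor, and redraw if the implied travel time does not fit the time available in the schedule. The exponential decay, the constant travel speed, straight-line distances in kilometers, and the nearest-candidate fallback are all assumptions made for illustration; the function and parameter names are hypothetical.

```python
import math
import random
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # planar coordinates in km (assumption)


def assign_location(anchor: Point, candidates: Dict[str, Point], available_minutes: float,
                    speed_km_per_min: float = 0.5, decay_km: float = 5.0,
                    max_tries: int = 100, rng: Optional[random.Random] = None) -> str:
    """Pick a location for an activity, favoring candidates near the anchor.

    Candidates are drawn with weight exp(-d / decay_km); a draw is rejected and
    redrawn if travel from the anchor would not fit in available_minutes.
    """
    rng = rng or random.Random()
    names = list(candidates)
    dists = [math.dist(anchor, candidates[n]) for n in names]
    weights = [math.exp(-d / decay_km) for d in dists]
    for _ in range(max_tries):
        choice = rng.choices(range(len(names)), weights=weights)[0]
        if dists[choice] / speed_km_per_min <= available_minutes:
            return names[choice]
    # Fall back to the nearest candidate if no sampled draw fit the schedule
    return names[min(range(len(names)), key=dists.__getitem__)]
```

Passing a seeded `random.Random` makes the assignment reproducible, which may be useful when comparing experiment runs.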
Additional modules are provided in
Traffic simulator 276 is configured to use information from vehicle data 272, traveler plans 278, transit data 268, and transportation network 274 to generate a traffic simulation 284 (e.g., a time-dependent simulation of traffic for the relevant geographic area). Traffic simulation 284 may simulate the flow of traffic over the entire range of times reflected in activity schedules 266 or over a portion of that range. In one embodiment, traffic simulator 276 may be configured to simulate traffic on a second-by-second basis. Traffic simulator 276 is configured to generate traffic simulation 284 based on the detailed travel routes reflected in traveler plans 278, which in turn are based in part on activity schedules 266, such that traffic simulation 284 simulates traffic conditions based on transit patterns related to the activities of each synthetic individual reflected in activity schedules 266. Traffic simulator 276 may be configured to check the generated traffic simulation 284 against transit information from transit data 268 and/or transportation network 274 to determine the reasonableness and/or accuracy of the simulation. For example, traffic simulator 276 may check the amount of traffic in a particular area at a particular time reflected in traffic simulation 284 against traffic count information received from transportation network 274. If the values produced using the simulation are not comparable to the corresponding traffic counts for the relevant area, route planner 270 may be configured to generate a different set of traveler plans 278. In one embodiment, the traveler plan generation and traffic simulation process may be repeated until the traffic simulation 284 corresponds to the information from transit data 268 and transportation network 274 within a given (e.g., user-specified) tolerance.
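The repeat-until-within-tolerance loop between route planning and traffic simulation can be sketched as follows. The callables `plan_routes` and `simulate` are hypothetical stand-ins for route planner 270 and traffic simulator 276, and the maximum relative error over observed links is one assumed way to measure agreement with traffic counts.

```python
from typing import Callable, Dict, Tuple

Counts = Dict[str, float]  # link id -> vehicles per hour


def max_relative_error(simulated: Counts, observed: Counts) -> float:
    """Largest relative deviation of simulated from observed counts."""
    return max((abs(simulated.get(link, 0.0) - obs) / obs
                for link, obs in observed.items() if obs > 0), default=0.0)


def calibrate(plan_routes: Callable[[int], object],
              simulate: Callable[[object], Counts],
              observed: Counts, tolerance: float = 0.1,
              max_rounds: int = 20) -> Tuple[object, Counts, int]:
    """Regenerate traveler plans until simulated link counts match observed
    traffic counts within `tolerance` (relative error), or give up."""
    for round_no in range(max_rounds):
        plans = plan_routes(round_no)   # new set of traveler plans
        counts = simulate(plans)        # simulated traffic counts per link
        if max_relative_error(counts, observed) <= tolerance:
            return plans, counts, round_no
    raise RuntimeError("traffic simulation did not converge within max_rounds")
```

Raising an error after `max_rounds` keeps a poorly specified tolerance from looping forever; a real system might instead report the best round found.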
Session generation module 287 is configured to generate a time and location-based representation of demand for spectrum. Session generation module 287 is configured to receive session input data 286 and utilize the input data, together with the synthetic data set created by the example embodiment shown in
Market simulation module 291 is configured to utilize the generated spectrum demand simulation 288 to determine a proposed spectrum license allocation 292. Market simulation module 291 may receive input data from clearing data 289. Clearing data 289 may include market clearing mechanism data describing the market clearing mechanism(s) (e.g., auction, Dutch auction, ascending bid auction, etc.) used by the supplier to allocate spectrum. Clearing data 289 may also include physical clearing mechanism data describing any physical clearing mechanisms used to address physical limitations to spectrum allocation (e.g., frequency interference between adjacent cells). Market simulation module 291 may also receive information from market rules data 290. Market rules data 290 may include information regarding requirements of one or both of the supplier(s) (e.g., the FCC) and the service provider(s) (e.g., cellular voice and data service providers, radio stations, television stations, etc.) regarding the use of the spectrum. Market simulation module 291 may utilize the spectrum demand simulation 288, clearing data 289, and market rules data 290 to generate a proposed spectrum license allocation 292 that allocates the available spectrum in an efficient manner.
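To illustrate how a market clearing rule and a physical clearing constraint can interact, the sketch below awards (cell, channel) licenses greedily to the highest bids, skipping any bid whose channel is already held in an adjacent cell. This greedy rule is a simple heuristic swapped in for illustration; it is not one of the auction mechanisms named above and is not guaranteed to find the most efficient allocation. The bid tuple layout and adjacency map are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (bid amount, provider, cell, channel)
Bid = Tuple[float, str, str, int]


def allocate(bids: List[Bid],
             adjacency: Dict[str, Set[str]]) -> Dict[Tuple[str, int], Tuple[str, float]]:
    """Greedy highest-bid-first allocation of (cell, channel) licenses.

    A bid is accepted only if its (cell, channel) is still free and no
    adjacent cell already holds the same channel (interference constraint).
    """
    awarded: Dict[Tuple[str, int], Tuple[str, float]] = {}
    for amount, provider, cell, channel in sorted(bids, key=lambda b: b[0], reverse=True):
        if (cell, channel) in awarded:
            continue  # license already awarded to a higher bid
        if any((nbr, channel) in awarded for nbr in adjacency.get(cell, set())):
            continue  # would interfere with an adjacent cell on the same channel
        awarded[(cell, channel)] = (provider, amount)
    return awarded
```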
Management module 305 is configured to manage the flow of information in synthetic data set subsystem 104 and organize the construction of a synthetic data set for use in creating a desired situation representation. In various embodiments, management module 305 and/or other components of system 102 may be based on service-oriented architectures. Service-oriented architectures provide a flexible set of services that may be used by multiple different kinds of components and applications. Service-oriented architectures allow different components of system 102 to publish their services to other components and applications. The use of service-oriented architectures may provide for improved software reuse and/or scalability of system 102.
In the illustrated example embodiment, management module 305 controls the flow of information through the use of different types of brokers. Brokers are software modules, or agents, that operate with a specific purpose or intent. In some embodiments, the brokers may be algorithmic (i.e., implemented as high level abstractions rather than as ad hoc constructions that are used in grid-based computing systems). The two primary types of brokers utilized to manage the flow of information are edge brokers 345 and service brokers 350. Edge brokers 345 mediate access to a particular resource (e.g., simulation, data, service, etc.) so that resources need not communicate directly with one another. Service brokers 350 receive high-level requests (e.g., a request for data) and spawn any edge brokers 345 needed to service the requests. If information is required to fulfill a request that is not immediately available to an edge broker 345 (e.g., results of a simulation, data from another database, etc.), a new service broker 350 may be spawned to produce the required information. Multiple service brokers 350 may collaborate to solve a larger problem requiring the utilization of a variety of resources. In some embodiments, service brokers 350 may also provide a resource discovery function, locating resources needed to fulfill a request (e.g., data, resources, models or simulations, etc.).
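The division of labor between the two broker types can be sketched with two small classes: an edge broker that wraps exactly one resource, and a service broker that accepts a request by resource name and spawns an edge broker only the first time that resource is needed. The class names mirror the text, but the registry dictionary used for resource discovery and the method names are hypothetical simplifications; spawning additional service brokers for missing information is not shown.

```python
from typing import Any, Callable, Dict


class EdgeBroker:
    """Mediates access to one resource so resources never call each other directly."""

    def __init__(self, resource_name: str, resource: Callable[..., Any]) -> None:
        self.resource_name = resource_name
        self._resource = resource

    def handle(self, **params: Any) -> Any:
        return self._resource(**params)


class ServiceBroker:
    """Accepts high-level requests and spawns edge brokers only when needed."""

    def __init__(self, registry: Dict[str, Callable[..., Any]]) -> None:
        self._registry = registry  # hypothetical resource-discovery table
        self.edge_brokers: Dict[str, EdgeBroker] = {}

    def request(self, resource_name: str, **params: Any) -> Any:
        if resource_name not in self.edge_brokers:
            if resource_name not in self._registry:
                raise LookupError(f"no resource named {resource_name!r}")
            # Dynamically add a broker for a resource not yet in use
            self.edge_brokers[resource_name] = EdgeBroker(resource_name, self._registry[resource_name])
        return self.edge_brokers[resource_name].handle(**params)
```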
In various embodiments, brokers may be used to solve a problem or access resources that span across many organizations and locations. If all communication occurs between brokers rather than directly between services, users need not have knowledge of the entire problem being addressed or be aware of or have access to all resources needed to solve the problem. In some embodiments, by using a trusted third party to host the computation, one user or organization may provide a proprietary model that uses proprietary data from a second party without either organization needing to have a trust relationship with the other.
Edge brokers 345 and service brokers 350 may have a number of components. Both edge brokers 345 and service brokers 350 may have an information exchange on which data and requests may be placed for sharing with other brokers and/or applications. An information exchange accepts requests for service and collects offers to provide the service. If a preexisting edge broker 345 is capable of fulfilling the request, that edge broker 345 may offer to fulfill the request and may be selected by the information exchange. If no preexisting edge broker 345 offers to fulfill the request, one or more new brokers may be spawned to fulfill the request. The spawned, or child, broker (e.g., an edge broker) obtains specifications for the required information from the information exchange of the parent broker (e.g., a service broker), and returns results by writing to the parent broker's information exchange. The information exchange of an edge broker 345 allows data and requests to be shared among all applications served by the edge broker 345. The information exchange of a service broker 350 may be shared among all edge brokers 345 connected to the service broker 350, such that all connected edge brokers 345 can directly share information via the information exchange of service broker 350.
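The offer, select, and spawn cycle on an information exchange can be sketched as below: the parent posts a request specification on its own exchange, an existing child that offers the task is selected, otherwise a child is created by a factory, and the child reads the specification from the parent's exchange and writes its result back there. The `factory` callable, the capability dictionaries, and all method names are hypothetical assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Optional


class InformationExchange:
    """Shared board on which request specifications and results are placed."""

    def __init__(self) -> None:
        self.requests: Dict[str, Any] = {}
        self.results: Dict[str, Any] = {}


class Broker:
    def __init__(self, name: str, capabilities: Dict[str, Callable[[Any], Any]]) -> None:
        self.name = name
        self.capabilities = capabilities
        self.exchange = InformationExchange()
        self.children: List["Broker"] = []

    def offers(self, task: str) -> bool:
        return task in self.capabilities

    def fulfill(self, task: str, parent_exchange: InformationExchange) -> None:
        # Read the specification from the parent's exchange and write the result back there
        spec = parent_exchange.requests[task]
        parent_exchange.results[task] = self.capabilities[task](spec)

    def post(self, task: str, spec: Any,
             factory: Callable[[str], Optional["Broker"]]) -> Any:
        """Post a request; select an existing child that offers the task,
        otherwise spawn a new child broker via `factory`."""
        self.exchange.requests[task] = spec
        provider = next((c for c in self.children if c.offers(task)), None)
        if provider is None:
            provider = factory(task)
            if provider is None:
                raise LookupError(f"no broker can fulfill {task!r}")
            self.children.append(provider)
        provider.fulfill(task, self.exchange)
        return self.exchange.results[task]
```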
Edge brokers 345 may also have additional components. Edge brokers 345 may have an edge broker interface that provides a universal interface for querying and using the services and/or applications that are made available through the edge brokers 345. Edge brokers 345 may also have a service wrapper that allows legacy applications to be used within the framework of management module 305 by taking requests from the information exchange, formatting them in a way that the application can understand, requesting computational resources, running the application using the resources, gathering the results of the application, and making the results available on the information exchange. Edge brokers 345 may further include a service translator that allows applications that are not able to access the information exchange to be used within the framework of management module 305 by translating requests from the information exchange into service calls and placing the results of the service calls on the information exchange. Further, edge brokers 345 may include one or more user interfaces configured to provide direct access (e.g., user access) to the applications served by the broker. The user interfaces may be specific to the purpose of the broker or associated applications. In some embodiments, user interfaces may be provided for some edge brokers 345 and not provided for others.
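A service wrapper's job, as described, is to take a request from the information exchange, reformat it for a legacy application, run the application, and publish the gathered results. The sketch below uses a hypothetical legacy function that accepts command-line-style string arguments and prints `key=value` lines; the exchange is modeled as a plain dictionary, and the step of requesting computational resources is omitted.

```python
from typing import Any, Callable, Dict, List


def legacy_app(args: List[str]) -> str:
    """Hypothetical legacy program: string arguments in, 'key=value' text out."""
    values = [float(a) for a in args]
    return f"count={len(values)}\ntotal={sum(values)}\n"


class ServiceWrapper:
    """Lets a legacy application participate through an information exchange."""

    def __init__(self, app: Callable[[List[str]], str]) -> None:
        self.app = app

    def run(self, exchange: Dict[str, Any], request_key: str, result_key: str) -> None:
        request = exchange[request_key]                   # take request from the exchange
        args = [str(v) for v in request["values"]]        # format it for the legacy app
        raw = self.app(args)                              # run the application
        parsed = dict(line.split("=", 1) for line in raw.splitlines() if line)  # gather results
        exchange[result_key] = parsed                     # publish on the exchange
```

A service translator would follow the same pattern, except that it translates exchange requests into calls to an existing service interface rather than into a legacy program's input format.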
In addition to the simulation results provided by edge broker 408, service broker 406 determines that additional data is needed to complete the request. In some embodiments, management module 305 may include coordination brokers that may spawn one or more service brokers and provide even higher-level coordination than service brokers for fulfilling requests. In the example shown in
According to different embodiments, communication can be performed in different ways, depending on the performance needed and the quantity of data to be exchanged. In one embodiment, exchange of data can be mediated completely through levels of brokers, following the interaction paths shown in the examples above. If higher performance is needed, edge brokers connected to the same service broker may be allowed to directly access the service broker's information exchange, allowing data to be placed on or retrieved from the information exchange with no intermediate steps. If still higher performance is desired, a service address may be communicated between two components and the components may use the service to directly exchange data. The service may be a web service, a communication protocol such as HTTP or FTP, a specialized protocol designed to transfer large amounts of data, or another type of service. The components may use the service to negotiate a communication protocol that they both understand.
Referring back to
Data broker 355 may include a request component that provides a user interface that can be used to interact with management module 305 data. In one embodiment, the user interface is a graphical user interface provided in a web browser that allows a user to browse, select, modify, and store data. Input may be provided via a form (e.g., an HTML form) submitted via the web browser, and output may include forms submitted back to the user via the web browser and requests submitted to a data service component of data broker 355, discussed below, via the information exchange of data broker 355.
Data broker 355 may also include a data service component that serves as a database-type-specific manager for management module 305 data. The data service component may service both database-independent and database-specific requests. Each data broker 355 may require a separate data service component for each type of database being serviced by the data broker 355. For example, if a data broker 355 is configured to service both relational databases and XML repositories, the data broker may require at least two separate data service component instances. The data service component may receive requests for data, metadata, data updates, etc. and provide response submissions, requested data, metadata, data modifications, etc. Output data may be placed in a database table, placed in a URL, provided directly to a user's web browser, or stored and/or communicated in another way.
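The one-data-service-per-database-type arrangement can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules: each service answers the same database-independent request (table, field, value) against its own storage type, and a broker routes the request by database type. The class names, the request shape, and the mapping of XML element tags to "tables" are hypothetical assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


class RelationalDataService:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def fetch(self, table: str, field: str, value: Any) -> List[Dict[str, Any]]:
        # Identifiers cannot be bound as SQL parameters; assume they were validated upstream
        cur = self.conn.execute(f"SELECT * FROM {table} WHERE {field} = ?", (value,))
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]


class XmlDataService:
    def __init__(self, root: ET.Element) -> None:
        self.root = root

    def fetch(self, table: str, field: str, value: Any) -> List[Dict[str, Any]]:
        # Treat elements with tag `table` as rows and their attributes as columns
        return [dict(el.attrib) for el in self.root.iter(table) if el.get(field) == str(value)]


class DataBroker:
    """Routes a database-independent request to the service for one database type."""

    def __init__(self) -> None:
        self.services: Dict[str, Any] = {}

    def register(self, db_type: str, service: Any) -> None:
        self.services[db_type] = service

    def fetch(self, db_type: str, table: str, field: str, value: Any) -> List[Dict[str, Any]]:
        return self.services[db_type].fetch(table, field, value)
```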
Management module 305 may also include one or more data set construction brokers 360 configured to construct and manage input data sets used by management module 305. Data set construction may include at least three phases: (1) identifying data for extraction/modification, (2) for selected data, performing data set-specific construction operations and extracting subsets of the selected data, and (3) for selected data, outputting resultant data sets. The first two phases may be generally applicable to all tasks addressed by data set construction broker 360. In some embodiments, the third phase may be application-specific and may be determined at least in part based on the needs of the desired application.
In some embodiments, data set construction broker 360 may provide interactive and automated capabilities in which new behavior can be acquired by recording and abstracting sequences of interactive operations. First, users may interactively explore available data, extract data, create or modify data operations, develop chained operation sequences, save result data subsets for future use, and/or perform other tasks. Further, scripts may be selected from a catalogued library, automating the data set creation process. Additionally, an automated template generation component may be activated whereby sequences of interactive operations are recorded, aggregated into scripts, parameterized for more general use, and catalogued in a library.
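The automated template generation idea (record interactive operations, aggregate them into a script, parameterize the script, and catalogue it) can be sketched as below. The two operations, the `Recorder` class, the parameter-binding scheme, and the `LIBRARY` dictionary are all hypothetical; they only illustrate how a recorded sequence can become a reusable, parameterized script.

```python
from typing import Any, Callable, Dict, List, Tuple

Operation = Tuple[str, Dict[str, Any]]  # (operation name, recorded arguments)

OPERATIONS: Dict[str, Callable[..., List[Dict[str, Any]]]] = {
    "filter": lambda rows, field, value: [r for r in rows if r[field] == value],
    "select": lambda rows, fields: [{f: r[f] for f in fields} for r in rows],
}


class Recorder:
    """Applies operations interactively while recording each step."""

    def __init__(self) -> None:
        self.log: List[Operation] = []

    def apply(self, rows: List[Dict[str, Any]], name: str, **kwargs: Any) -> List[Dict[str, Any]]:
        self.log.append((name, kwargs))
        return OPERATIONS[name](rows, **kwargs)


def make_script(log: List[Operation], parameters: Dict[str, Tuple[int, str]]) -> Callable[..., Any]:
    """Turn a recorded log into a reusable script.

    parameters maps a parameter name to the (step index, argument name) it overrides;
    omitted parameters keep their recorded values.
    """
    def script(rows: List[Dict[str, Any]], **values: Any) -> List[Dict[str, Any]]:
        for i, (name, kwargs) in enumerate(log):
            args = dict(kwargs)
            for pname, (step, arg) in parameters.items():
                if step == i and pname in values:
                    args[arg] = values[pname]
            rows = OPERATIONS[name](rows, **args)
        return rows
    return script


LIBRARY: Dict[str, Callable[..., Any]] = {}  # catalogued scripts
```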
Data set construction broker 360 may include a request component through which a user may interact with and/or manipulate management module 305 input data sets. The request component of data set construction broker 360 may share properties similar to that of data broker 355 (e.g., web browser interface). The request component may also include subcomponents such as a database request subcomponent, a broker-specific request subcomponent, a script request subcomponent, and a data extraction request subcomponent. The database request subcomponent is configured to provide an interface to guide a user through building database-independent requests for data and/or data updates. In some embodiments, the database request subcomponent may utilize database metadata provided through a web browser interface to build the requests. The broker-specific subcomponent is configured to provide data set-specific user interfaces for data set construction (e.g., customized based on the input data, such as transportation-related data, epidemic-related data, etc.). The script request subcomponent is configured to provide control of generation and parameterization of data set construction scripts. The data extraction request subcomponent is configured to work with other subcomponents to facilitate generation of chained sequences of database operations to construct a management module 305 input data set. Data set construction broker 360 may also include a core service component, including subcomponents (e.g., database service, broker-specific service, script service, or data extraction service) directed to processing requests received from the subcomponents of the request component of data set construction broker 360.
Management module 305 may further include one or more entity brokers 365 configured to assist in the creation and modification of the synthetic population. Entity broker 365 functions as an edge broker for accessing services of population construction module 310. Entity broker 365 has knowledge of and access to the services of population construction module 310 and publishes those services on its information exchange. Entity broker 365 includes the same components as an edge broker (e.g., information exchange, interface, service translator, service wrapper, etc.) and may also include specialized components for managing interactions between management module 305 and population construction module 310. Greater detail regarding population construction and modification is provided below with reference to the components of population construction module 310.
Management module 305 may include further specialized brokers as needed to perform various functions of management module 305. In various embodiments, management module 305 may include one or more model brokers 370 configured to provide access to models and simulations, one or more resource brokers 375 configured to manage requests for computational resources, and/or one or more security brokers 380 configured to provide security (e.g., authentication and authorization) services within management module 305.
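Routing requests among these specialized brokers can be pictured as a lookup from service kind to the broker that fulfills it. The sketch below is only an illustration of that idea. The `BrokerRegistry` class, the service-kind strings, and the toy handlers (including the 64-core cap) are hypothetical, since the text does not specify how brokers are located or what they return.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]


class BrokerRegistry:
    """Hypothetical routing table mapping a service kind to the broker that fulfills it."""

    def __init__(self) -> None:
        self._brokers: dict[str, Handler] = {}

    def register(self, kind: str, broker: Handler) -> None:
        self._brokers[kind] = broker

    def fulfill(self, kind: str, request: dict[str, Any]) -> Any:
        broker = self._brokers.get(kind)
        if broker is None:
            raise LookupError(f"no broker registered for service {kind!r}")
        return broker(request)


registry = BrokerRegistry()
# Toy stand-ins for a model broker, a resource broker, and a security broker.
registry.register("model", lambda req: f"model:{req['name']}")
registry.register("resource", lambda req: {"granted_cores": min(req["cores"], 64)})
registry.register("security", lambda req: req.get("user") in {"analyst"})
```

A request for an unregistered service fails loudly rather than silently, which keeps misrouted requests visible.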
Population construction module 310 is configured to construct and/or modify the synthetic population used by management module 305, network construction module 315, and/or other components of synthetic data set subsystem 104 to create the desired situation representation. The synthetic population includes synthetic entities that may represent entities in a real geographic area (e.g., the United States) or a virtual universe. Each synthetic entity has a set of characteristics or attributes that may be assigned based on information from one or more input data sets (e.g., the U.S. Census). Each synthetic entity may be assigned to one or more subpopulations of the synthetic population (e.g., military unit, factory workers for a specific factory, students or teachers at a specific school, etc.). Further, each synthetic entity may be associated with a sequence of actions that may define what the actions are and where and when the actions occur. The interactions between synthetic entities in the synthetic population may be based at least in part on the activity sequences of the synthetic entities. Population construction module 310 receives requests from management module 305 and responds to the requests through one or more entity brokers. Population construction module 310 may also utilize external data (e.g., received from surveillance subsystem 106) and/or information about the experiment or desired situation representation (e.g., received from management module 305 and/or decision analysis subsystem 108) in constructing and modifying the synthetic population. In one embodiment, all information required to generate the synthetic population may be collected via entity brokers.
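The entity description above (attributes, subpopulation memberships, and an activity sequence saying what, where, and when) maps naturally onto a small data structure. Interactions can then be derived from co-location in those sequences. This is a minimal sketch under stated assumptions. The class and field names, the minutes-after-midnight time unit, and the rule "same location with overlapping time means contact" are illustrative choices, not taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Activity:
    kind: str      # what the activity is, e.g. "work"
    location: str  # where it occurs (hypothetical location ID)
    start: int     # when it starts, in minutes after midnight (assumed unit)
    end: int       # when it ends, same unit


@dataclass
class SyntheticEntity:
    entity_id: int
    attributes: dict[str, object]                        # e.g. census-derived age
    subpopulations: set[str] = field(default_factory=set)
    activities: list[Activity] = field(default_factory=list)


def contact_edges(population: list[SyntheticEntity]) -> dict[tuple[int, int, str], int]:
    """Derive interactions from activity sequences.

    Two entities are linked, labeled with the shared location, when they are at the
    same location at overlapping times; the value is the total overlap in minutes.
    """
    visits = defaultdict(list)  # location -> [(entity_id, start, end)]
    for e in population:
        for a in e.activities:
            visits[a.location].append((e.entity_id, a.start, a.end))
    edges: dict[tuple[int, int, str], int] = defaultdict(int)
    for loc, vs in visits.items():
        for (u, s1, e1), (v, s2, e2) in combinations(vs, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if u != v and overlap > 0:
                edges[(min(u, v), max(u, v), loc)] += overlap
    return dict(edges)


alice = SyntheticEntity(1, {"age": 34}, {"factory_7_workers"},
                        [Activity("home", "household_1", 0, 480), Activity("work", "factory_7", 540, 1020)])
bob = SyntheticEntity(2, {"age": 51}, {"factory_7_workers"},
                      [Activity("home", "household_2", 0, 480), Activity("work", "factory_7", 600, 900)])
carol = SyntheticEntity(3, {"age": 9}, {"school_12_students"},
                        [Activity("home", "household_1", 0, 480), Activity("school", "school_12", 480, 900)])
edges = contact_edges([alice, bob, carol])
```

Grouping visits by location first means only entities that share a location are ever compared pairwise.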
Population construction module 310 may include several component modules. Population generation module 320 is configured to generate the synthetic population for use in constructing the desired situation representation. Population generation module 320 may be configured to construct the synthetic population by performing steps shown in
Population generation module 320 may also assign activity templates and generate activity schedules in a manner similar to that described above with respect to
Population editing module 325 is configured to modify and/or add information about synthetic entities in the synthetic population. Requests for modification may be made by management module 305 and conveyed to population editing module 325 by an entity broker. Based on a request, population editing module 325 may select one or more entities or groups from the synthetic population and add or modify attributes of the selected entities or groups. Population editing module 325 may utilize external data and/or scenario information in interpreting the requests and/or modifying the attributes.
Subpopulation module 330 is configured to define subpopulations from the synthetic population and apply modifications to the subpopulations. In some embodiments, synthetic entities may be members of multiple subpopulations. Subpopulation module 330 receives requests for creation or modification of subpopulations from management module 305 via an entity broker and generates a modification plan (e.g., sets of modifications to action sequences, attributes, etc.) that can be executed by management module 305, population construction module 310, and/or other modules of synthetic data set subsystem 104. Scenario information and/or external data may be used to process subpopulation requests and/or produce the modification plan.
In one embodiment, subpopulation module 330 may be configured to modify action sequences associated with one or more subpopulations of synthetic entities. The subpopulation to be modified may be based on a function of the demographics or attributes associated with the synthetic population and/or external data that is specific to the scenario being studied. Demographics may include, for example, income, home location, worker status, susceptibility to disease, etc. Examples of external data may include the probability that entities of a certain demographic class take airline trips or whether a specific plot of land has been sprayed with a pesticide. Once the subpopulation to be modified is identified, replacement activity sequences are identified for the subpopulation. The selected replacement activity sequences may be identified from a set of possible replacement activity sequences based on external data and/or information regarding the scenario being studied. Replacement activity sequences may include activities performed in a city other than a home city, military assignments, withdrawal to home during a pandemic, or other activities. In some embodiments, subpopulation module 330 may be configured to define multiple representations of one or more synthetic entities (e.g., having different attributes and/or activity sequences) and to determine which representation to select based on the external data and/or scenario information.
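The two-stage procedure above can be sketched as follows: first select a subpopulation by a function of attributes, then assign each member a replacement activity sequence. The `Entity` class, the flat list-of-locations activity sequence, and the `essential_rule` chooser are hypothetical simplifications. The chooser stands in for whatever external data or scenario logic picks among replacement sequences.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: int
    attributes: dict
    activities: list[str] = field(default_factory=list)  # simplified: ordered location types


def modify_subpopulation(population, predicate, replacements, choose):
    """Replace the activity sequence of every entity selected by `predicate`.

    predicate(attributes) -> bool identifies the subpopulation from demographics;
    choose(entity, replacements) -> key picks a replacement sequence, standing in for
    scenario- or external-data-driven selection. Returns IDs of modified entities.
    """
    modified = []
    for e in population:
        if predicate(e.attributes):
            e.activities = list(replacements[choose(e, replacements)])
            modified.append(e.entity_id)
    return modified


# Hypothetical scenario: workers withdraw to home during a pandemic unless essential.
REPLACEMENTS = {"withdraw_home": ["home"], "keep_working": ["home", "work", "home"]}
population = [
    Entity(1, {"worker": True, "essential": False}, ["home", "work", "home"]),
    Entity(2, {"worker": True, "essential": True}, ["home", "work", "home"]),
    Entity(3, {"worker": False, "essential": False}, ["home", "school", "home"]),
]


def essential_rule(entity, _replacements):
    return "keep_working" if entity.attributes["essential"] else "withdraw_home"


changed = modify_subpopulation(population, lambda a: a["worker"], REPLACEMENTS, essential_rule)
```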
If the request is an entity request, or a request for a service provided by population construction module 310, it is determined whether the synthetic population and/or synthetic entity associated with the request already exists (step 535). If not, population generation module 320 generates the synthetic population and/or synthetic entity (step 540) and proceeds to step 545. If the synthetic population and/or synthetic entity already exists, process 500 proceeds to step 545. At step 545, it is determined whether the request is to modify the synthetic population. If the request does not include modifying the synthetic population, the desired information about the population is provided and formatted (step 550) and presented to management module 305 (step 530). If the request includes modifying the synthetic population, it is determined whether the creation or modification of a subpopulation has been requested (step 555). If not, population editing module 325 makes any requested changes or additions to the attributes of one or more of the synthetic entities of the synthetic population (step 560), and the entity broker formats the results (step 550) and posts the results to management module 305 (step 530). If the request includes creating or modifying a subpopulation, subpopulation module 330 performs the requested subpopulation creation or modification (step 570), and the entity broker formats the results (step 550) and posts the results to management module 305 (step 530).
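The decision flow of steps 535 through 570 can be sketched as a single dispatch function. Only the branching mirrors the text. The `EntityRequest` fields, the in-memory `store`, and the callbacks `generate`, `edit`, and `define_subpop` are hypothetical stand-ins for population generation module 320, population editing module 325, and subpopulation module 330.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EntityRequest:
    population_id: str
    modify: bool = False         # does the request modify the population? (step 545)
    subpopulation: bool = False  # does it create/modify a subpopulation? (step 555)
    changes: dict[str, Any] = field(default_factory=dict)


def handle_entity_request(req, store, generate, edit, define_subpop):
    # Step 535: does the population already exist? If not, generate it (step 540).
    if req.population_id not in store:
        store[req.population_id] = generate(req.population_id)
    pop = store[req.population_id]
    if not req.modify:                 # step 545: read-only request
        result = pop
    elif req.subpopulation:            # step 555 -> step 570
        result = define_subpop(pop, req.changes)
    else:                              # step 555 -> step 560
        result = edit(pop, req.changes)
    # Step 550: format; the caller would post this to the management module (step 530).
    return {"population": req.population_id, "result": result}


generated = []


def generate(pid):
    generated.append(pid)
    return {"size": 3, "attrs": {}, "subpops": {}}


def edit(pop, changes):
    pop["attrs"].update(changes)
    return pop["attrs"]


def define_subpop(pop, changes):
    pop["subpops"][changes["name"]] = changes["members"]
    return sorted(pop["subpops"])


store = {}
r1 = handle_entity_request(EntityRequest("chicago"), store, generate, edit, define_subpop)
r2 = handle_entity_request(EntityRequest("chicago", modify=True, changes={"vaccinated": True}),
                           store, generate, edit, define_subpop)
r3 = handle_entity_request(EntityRequest("chicago", modify=True, subpopulation=True,
                                         changes={"name": "teachers", "members": [2]}),
                           store, generate, edit, define_subpop)
```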
Referring again to
In one example embodiment, the situation being represented may relate to determining participation in a cellular phone connection. The vertices of the resulting graph may represent people, locations, and cellular towers. Edges may connect all vertices representing people on a particular cellular phone call, locations of those people, and cellular towers involved in the call.
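One reading of this example is that every call contributes a clique over its participants, their locations, and the towers involved. The sketch below builds such a graph under that assumption. The call-record layout and the tagged vertex tuples are hypothetical.

```python
from itertools import combinations


def call_graph(calls):
    """Build an undirected edge set from call records.

    Each call record is assumed to look like
    {"people": [...], "locations": {person: location}, "towers": [...]}.
    Vertices are tagged tuples ("person", id), ("location", id), ("tower", id);
    every pair of vertices involved in the same call is connected.
    """
    edges = set()
    for call in calls:
        vertices = {("person", p) for p in call["people"]}
        vertices |= {("location", loc) for loc in call["locations"].values()}
        vertices |= {("tower", t) for t in call["towers"]}
        for u, v in combinations(sorted(vertices), 2):  # sorted: canonical edge orientation
            edges.add((u, v))
    return edges


edges = call_graph([{"people": ["p1", "p2"], "locations": {"p1": "L1", "p2": "L2"}, "towers": ["T9"]}])
```

With two callers at two locations and one tower, the call touches five vertices and therefore ten edges.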
Network analysis module 340 is configured to compute structural measurements on the graphs generated by network generation module 335. Types of measurement methods may include degree distribution, RO-distribution, shortest path distribution, shattering, expansion, betweenness, etc. The measurements performed by network analysis module 340 provide quantitative methods to compare different graphs and, accordingly, different situation representations (e.g., corresponding to different decisions and/or different action choices presented in decision analysis subsystem 108). The measurements may require less computational power than performing a complete simulation and may allow a more efficient understanding of the dynamics of the situation being represented. The measurements performed by network analysis module 340 may be used (e.g., in combination with features of other components of system 102 in some embodiments) to infer statistical and protocol level interactions, rank various (e.g., user-defined) policies in an order, and/or infer any inherent uncertainty in the output.
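Two of the listed measurements, the degree distribution and the shortest-path distribution, can be computed with plain breadth-first search on an unweighted adjacency map. This is a small sketch for illustration only. The adjacency-dictionary representation is an assumption, and the other listed measures (shattering, expansion, betweenness) are not shown.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Map degree -> number of vertices with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


def shortest_path_distribution(adj):
    """Histogram of hop distances over all ordered reachable pairs (u != v), via BFS from each vertex."""
    hist = Counter()
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        hist.update(d for node, d in dist.items() if node != src)
    return hist


# Path graph a-b-c-d, then the same graph with edge b-c removed (e.g., by an intervention).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cut = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}
```

Comparing the two histograms shows, without running any epidemic simulation, that removing one edge shatters the path into two components and eliminates all pairs at distance 2 or 3.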
Intervention field 622 allows the user to select from one or more available intervention methods to define the methods that are enabled in the experiment. Intervention tabs 624 include tabs for each selected intervention method. In one embodiment, tabs may be displayed for all available intervention methods but only the tabs selected in intervention field 622 may be active. In the displayed example embodiment, the vaccination intervention tab has been selected and a vaccination menu is displayed. The vaccination menu includes a subpopulation field 626 that may be used to select some or all of the subpopulations defined by subpopulation module 330 to receive the defined vaccination intervention. Compliance field 628 allows the user to specify parameters regarding compliance of the selected subpopulation(s) in obtaining vaccinations (e.g., percent of selected entities that obtain vaccination, initial vaccination percentage, final vaccination percentage, etc.). Trigger field 630 allows the user to specify when the vaccination intervention is triggered in the experiment (e.g., the day of the experiment on which the vaccination is provided to the selected subpopulation(s)). Efficacy field 632 permits the user to define how effective the vaccine is in fighting the disease (e.g., percent of selected population for which the vaccine is effective, initial effectiveness, final effectiveness, etc.).
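The subpopulation, compliance, trigger, and efficacy fields could drive a vaccination step along these lines. This is a sketch under simplifying assumptions. Compliance and efficacy are treated as single constant probabilities drawn independently per entity (the text also allows initial and final values), and the function and argument names are hypothetical.

```python
import random


def apply_vaccination(entities, subpops, day, trigger_day, compliance, efficacy, rng):
    """Return IDs of entities made immune on `day`.

    entities: dict of entity ID -> set of subpopulation names.
    On trigger_day, each entity in a selected subpopulation complies with probability
    `compliance`; a complying entity becomes immune with probability `efficacy`.
    """
    if day != trigger_day:
        return set()
    immune = set()
    for eid, groups in entities.items():
        if groups & subpops and rng.random() < compliance and rng.random() < efficacy:
            immune.add(eid)
    return immune


entities = {1: {"adults"}, 2: {"adults", "teachers"}, 3: {"children"}}
rng = random.Random(0)
```

With 40% compliance and 70% efficacy, roughly 0.4 × 0.7 = 28% of the selected subpopulation would be expected to end up immune.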
User interface 600 is only one possible interface that may be provided by system 102. A wide variety of options and information may be provided to the user based on the type of experiment being conducted. The user interfaces presented to the user may be modified to include different and/or additional information and options based on the models in case modeling subsystem 110. In some embodiments, users may be permitted to select the level of detail with which to specify the parameters of the experiment (e.g., permit system 102 to define certain parameters of the experiment using default values). Other example user interfaces and components thereof are described herein, e.g., with reference to
Various examples include one or more of, including any combination of any number of, the following example features. Throughout these clauses, parenthetical remarks are for example and explanation, and are not limiting. Parenthetical remarks given in this Example Clauses section with respect to specific language apply to corresponding language throughout this section, unless otherwise indicated.
A: A method comprising (e.g., under control of a processing unit): receiving attributes of a synthetic population (e.g., as or as part of a query, e.g., from a front end); selecting a synthetic-population graph from a data library based at least in part on the attributes, wherein the synthetic-population graph comprises nodes and labeled edges between the nodes (e.g., at least one labeled edge; labels can include, e.g., locations or other items shown in
B: The method according to paragraph A, wherein the epidemic estimate comprises at least one of: a curve indicating a number of the nodes marked as infected over the course of the simulation (e.g., an epicurve); an R curve indicating a reproductive number (estimated or actual) of the epidemic over the course of the simulation; a curve indicating a slope of any of the above-described curves as a function of simulation time; an estimated generation time of the epidemic; or an estimated growth rate of the epidemic.
C: The method according to paragraph B, further comprising: receiving second attributes of a second synthetic population (e.g., a subset of the synthetic population); and determining a second epidemic estimate (e.g., results for a subset of the synthetic population that was simulated) based at least in part on the second attributes and the synthetic-population graph.
D: The method according to any of paragraphs A-C, further comprising: determining the epidemic estimate further based at least in part on a first randomization value; and simulating the course of the epidemic in the synthetic-population graph to produce at least one second epidemic estimate, wherein: each second epidemic estimate is determined based at least in part on the intervention and a respective randomization value; and at least one of the respective randomization values is different from the first randomization value (e.g., running replicates).
E: The method according to paragraph D, further comprising causing presentation, via a user interface, of a representation that is based at least in part on: the epidemic estimate; and at least one of the second epidemic estimates.
F: The method according to any of paragraphs A-E, further comprising: receiving data of a second intervention; and simulating the course of the epidemic in the synthetic-population graph to produce a second epidemic estimate based at least in part on the second intervention (e.g., testing multiple interventions).
G: The method according to paragraph F, further comprising causing presentation, via a user interface, of a representation that is based at least in part on: the epidemic estimate; and the second epidemic estimate.
H: The method according to any of paragraphs A-G, wherein: the method further comprises determining a first subset of nodes of the synthetic-population graph, wherein the first subset of nodes represents an initial infected population; and the simulating further comprises: modifying edges of the synthetic-population graph based at least in part on the intervention to produce a modified synthetic-population graph; determining spread of the epidemic in the modified synthetic-population graph based at least in part on a predetermined disease model (e.g., specified by a user or loaded from a database); and determining the epidemic estimate based at least in part on the spread of the epidemic.
I: The method according to paragraph H, further comprising: receiving the data of the intervention via a user interface (e.g., a Web browser); and receiving an indication of the predetermined disease model via the user interface.
J: A method comprising (e.g., under control of a processing unit): receiving attributes of a synthetic population; selecting a synthetic-population graph from a data library based at least in part on the attributes; receiving data of an intervention designed to affect a course of an event (e.g., an extended event that goes on over a period of time, or a point-in-time event with extended consequences); and simulating the course of the event in the synthetic-population graph to produce an estimate of the event, based at least in part on the intervention.
K: The method according to paragraph J, wherein: the synthetic-population graph comprises nodes, edges between at least some of the nodes, and labels associated with at least some of the edges; and the simulating comprises selectively propagating information about consequences of the event (e.g., infection; can include outcomes, results, or changes in event state, e.g., disease progress) along edges of a first subset of the edges based at least in part on at least some corresponding labels of the labels of the synthetic-population graph.
L: The method according to paragraph K, further comprising selectively modifying at least some of the labels of the synthetic-population graph based at least in part on the intervention.
M: The method according to any of paragraphs J-L, wherein: the synthetic-population graph comprises nodes, parameters associated with at least some of the nodes, and edges between at least some of the nodes; and the simulating comprises: determining data of consequences of the event; and selectively modifying at least some of the parameters (e.g., of the entities) based at least in part on the data of the consequences of the event.
N: The method according to any of paragraphs J-M, comprising: receiving a query; and determining at least one first simulation based at least in part on the query, wherein the simulating comprises running the at least one first simulation.
O: The method according to any of paragraphs J-N, wherein simulating comprises: modifying a first subset of nodes of the synthetic-population graph at a first simulated time based at least in part on the intervention and on attributes of nodes of the first subset of nodes; and modifying a second, different subset of nodes of the synthetic-population graph at a second, different simulated time based at least in part on the intervention and on attributes of nodes of the second subset of nodes (e.g., different nodes may change state at different times).
P: The method according to any of paragraphs J-O, further comprising: causing the estimate of the event to be presented via a user interface; receiving second attributes of a second synthetic population (e.g., dynamically; can be a subset of the synthetic population); determining a second estimate of the event based at least in part on the second attributes and on at least one of the estimate of the event or the synthetic population; and causing the second estimate of the event to be presented via the user interface (e.g., dynamically, in response to user controls; see, e.g.,
Q: A method comprising (e.g., under control of a processing unit): receiving input data associated with a target population; constructing a synthetic data set based on the input data, wherein the synthetic data set includes data of a plurality of synthetic entities corresponding with the target population; assigning entity attributes to individual entities of the plurality of synthetic entities based at least in part on the input data; receiving activity data associated with the target population; generating a social-contact graph by generating graph edges between individual entities of the plurality of synthetic entities based at least in part on corresponding ones of the entity attributes and on the activity data; receiving population attributes of a synthetic population; selecting a synthetic-population graph from the social-contact graph based at least in part on the population attributes; receiving data of an intervention designed to counteract or mitigate an event; and simulating a course of the event in the synthetic-population graph to produce an estimate of the event, based at least in part on the intervention.
R: The method according to paragraph Q, wherein: the synthetic-population graph comprises nodes, parameters associated with at least some of the nodes, and edges between at least some of the nodes; and the simulating comprises: determining data of consequences of the event; and selectively modifying at least some of the parameters based at least in part on the data of the consequences of the event.
S: The method according to paragraph Q or R, further comprising: presenting the estimate of the event via a user interface; receiving second population attributes of a second synthetic population; determining a second estimate of the event based at least in part on the second population attributes and on at least one of the estimate of the event or the synthetic population.
T: The method according to any of paragraphs Q-S, further comprising: in association with at least one of the constructing, the assigning, the generating, or the simulating, generating a request for a service; and fulfilling, by a broker software module, the request for the service; wherein the broker software module is selected from the group consisting of: a data broker configured to manage data used in constructing the synthetic data set; a data set construction broker configured to manage at least one of construction and modification of one or more input data sets; and an entity broker configured to manage at least one of creation and modification of the synthetic population.
U: A computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations as any of paragraphs A-I recites.
V: A device comprising: a processing unit; and a computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution by the processing unit configuring the device to perform operations as any of paragraphs A-I recites.
W: A system comprising: means for processing; and means for storing having thereon computer-executable instructions, the computer-executable instructions including means to configure the system to carry out a method as any of paragraphs A-I recites.
X: A computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations as any of paragraphs J-P recites.
Y: A device comprising: a processing unit; and a computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution by the processing unit configuring the device to perform operations as any of paragraphs J-P recites.
Z: A system comprising: means for processing; and means for storing having thereon computer-executable instructions, the computer-executable instructions including means to configure the system to carry out a method as any of paragraphs J-P recites.
AA: A computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution configuring a computer to perform operations as any of paragraphs Q-T recites.
AB: A device comprising: a processing unit; and a computer-readable medium, e.g., a computer storage medium, having thereon computer-executable instructions, the computer-executable instructions upon execution by the processing unit configuring the device to perform operations as any of paragraphs Q-T recites.
AC: A system comprising: means for processing; and means for storing having thereon computer-executable instructions, the computer-executable instructions including means to configure the system to carry out a method as any of paragraphs Q-T recites.
AD: The device as any of paragraphs V, Y, or AB, wherein the processing unit comprises at least one of: an FPGA, an ASIC, a PLD, a GPU (or GPGPU) and accompanying program memory, or a CPU and accompanying program memory.
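Several of the clauses above (modifying edges per an intervention and simulating spread in clause H, running replicates with different randomization values in clause D, and the epidemic-estimate metrics in clause B) can be illustrated together on a toy graph. This is a minimal sketch, not the system's simulator. The discrete-time SIR model, the per-day transmission probability 1 - (1 - tau) ** w for contact duration w, and all function names are assumptions. `reproduction_number` uses the standard relation R = exp(rT), which holds for a fixed generation interval T.

```python
import math
import random
import statistics

Edge = tuple[int, int, str, int]  # (u, v, label, contact duration)


def apply_intervention(edges: list[Edge], closed_labels: set[str]) -> list[Edge]:
    """Modify the graph: drop edges whose label the intervention closes (e.g. "school")."""
    return [e for e in edges if e[2] not in closed_labels]


def simulate_sir(nodes, edges, initial, tau, infectious_days, days, rng):
    """Discrete-time SIR; returns the epicurve (new infections per simulated day)."""
    adj = {n: [] for n in nodes}
    for u, v, _label, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    state = {n: "S" for n in nodes}
    left = {}  # infectious days remaining per infectious node
    for n in initial:
        state[n], left[n] = "I", infectious_days
    curve = []
    for _day in range(days):
        newly = set()
        for u in [n for n in nodes if state[n] == "I"]:
            for v, w in adj[u]:
                if state[v] == "S" and v not in newly and rng.random() < 1 - (1 - tau) ** w:
                    newly.add(v)
        for u in list(left):  # recoveries
            left[u] -= 1
            if left[u] == 0:
                state[u] = "R"
                del left[u]
        for v in newly:
            state[v], left[v] = "I", infectious_days
        curve.append(len(newly))
    return curve


def run_replicates(nodes, edges, initial, tau, infectious_days, days, seeds):
    """One simulation per randomization value (seed)."""
    return [simulate_sir(nodes, edges, initial, tau, infectious_days, days, random.Random(s))
            for s in seeds]


def slope(curve):
    """Day-to-day change of a curve (forward difference)."""
    return [b - a for a, b in zip(curve, curve[1:])]


def growth_rate(curve, start, end):
    """Least-squares slope of log(count) vs. day over [start, end); needs >= 2 nonzero days."""
    pts = [(d, math.log(c)) for d, c in enumerate(curve) if start <= d < end and c > 0]
    mx = sum(d for d, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((d - mx) ** 2 for d, _ in pts)
    sxy = sum((d - mx) * (y - my) for d, y in pts)
    return sxy / sxx


def reproduction_number(r, generation_time):
    return math.exp(r * generation_time)


nodes = [0, 1, 2, 3]
edges = [(0, 1, "home", 480), (1, 2, "school", 360), (2, 3, "work", 480)]
baseline = simulate_sir(nodes, edges, {0}, 1.0, 1, 5, random.Random(0))
closed = simulate_sir(nodes, apply_intervention(edges, {"school"}), {0}, 1.0, 1, 5, random.Random(0))
reps = run_replicates(nodes, edges, {0}, 0.001, 3, 30, seeds=range(10))
final_sizes = [sum(c) for c in reps]
mean_final_size = statistics.mean(final_sizes)
```

With tau = 1.0 the toy run is deterministic, so closing the "school" edge visibly stops the chain after the first transmission. With a small tau, the replicate seeds produce a spread of outcomes that can be summarized, e.g. by their mean.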
Various examples permit flexible, effective simulation and analysis of events such as epidemics. A tested system according to various examples herein achieved the following performance characteristics: (i) number of concurrent logged-in users: 50; (ii) number of experiments created: 12240; (iii) number of jobs created: 73440; (iv) duration of run: 60 minutes; (v) throughput of web server: 559 requests per minute. A tested system was operated on a 76-node Linux cluster of 1 GHz-class processors. A social-interaction graph or SP graph of the United States can include, e.g., about 300 million nodes and about 10^12 to 10^13 edges.
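A few averages follow directly from the reported figures. The total request count assumes the reported throughput is an average sustained over the whole 60-minute run, which the text does not state explicitly.

```python
# Reported figures for the tested system.
experiments = 12240
jobs = 73440
users = 50
minutes = 60
requests_per_minute = 559

jobs_per_experiment = jobs / experiments        # 6.0
experiments_per_user = experiments / users      # 244.8
total_requests = requests_per_minute * minutes  # 33540, if throughput was sustained
```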
In a tested example, an experiment was run to study the progression of emerging infectious diseases through population networks in the United States. The experiment was a hepatitis simulation, using the Epifast algorithm, to analyze the efficacy of interventions. The disease model was a Hepatitis A virus strain with a transmissibility of 0.00008, indicating that the population is at a very high risk of infection. The region was Chicago (5.5 million individuals), and the duration was 120 days. Two interventions were simulated to control disease spread: vaccination and social distancing. Vaccine 1 was applied to adults with 40% compliance and 70% efficacy. Social distancing was applied with compliance swept from 70% to 90%. Experiments were run with 10 replicates of each cell.
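The experiment design can be laid out as a grid of cells times replicates. The text gives only the 70% to 90% range for social-distancing compliance, so the 10-percentage-point step below is an assumption. The dictionary keys and the `design` helper are also hypothetical.

```python
from itertools import product

BASE = {
    "disease": "Hepatitis A",
    "region": "Chicago",
    "population": 5_500_000,
    "days": 120,
    "transmissibility": 0.00008,
    "vaccination": {"subpopulation": "adults", "compliance": 0.40, "efficacy": 0.70},
}
SD_COMPLIANCE = [0.70, 0.80, 0.90]  # assumed sweep step; the text gives only the range
REPLICATES = 10


def design(base, sweep, replicates):
    """One run per (social-distancing compliance, replicate index) pair."""
    runs = []
    for compliance, rep in product(sweep, range(replicates)):
        run = dict(base)  # shallow copy; the shared vaccination settings are not mutated
        run["social_distancing"] = {"compliance": compliance}
        run["replicate"] = rep
        runs.append(run)
    return runs


runs = design(BASE, SD_COMPLIANCE, REPLICATES)
```

Under the assumed step, the sweep has three cells, so the design contains 3 × 10 = 30 simulation runs.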
Example data transmissions (parallelograms) and example blocks in the process diagrams herein represent one or more operations that can be implemented in hardware, software, or a combination thereof to transmit or receive described data or conduct described exchanges. In the context of software, the illustrated blocks and exchanges represent computer-executable instructions that, when executed by one or more processors, cause the processors to transmit or receive the recited data. Generally, computer-executable instructions, e.g., stored in program modules that define operating logic, include routines, programs, objects, modules, components, data structures, and the like that perform particular functions or implement particular abstract data types. Except as expressly set forth herein, the order in which the operations or transmissions are described is not intended to be construed as a limitation, and any number of the described operations or transmissions can be executed or performed in any order, combined in any order, subdivided into multiple sub-operations or transmissions, and/or executed or transmitted in parallel to implement the described processes.
Other architectures can be used to implement the described functionality, and are intended to be within the scope of this disclosure. Furthermore, although specific distributions of responsibilities are defined above for purposes of discussion, the various functions and responsibilities might be distributed and divided in different ways, depending on particular circumstances. Similarly, software can be stored and distributed in various ways and using different means, and the particular software storage and execution configurations described above can be varied in many different ways. Thus, software implementing the techniques described above can be distributed on various types of computer-readable media, not limited to the forms of memory that are specifically described.
Conditional language such as, among others, “can,” “could,” “might” or “may,” unless specifically stated otherwise, is understood within the context to present that certain examples include, while other examples do not include, certain features, elements or steps. Thus, such conditional language is not generally intended to imply that certain features, elements or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without user input or prompting, whether certain features, elements or steps are included or are to be performed in any particular example.
The word “or” and the phrase “and/or” are used herein in an inclusive sense unless specifically stated otherwise. Accordingly, conjunctive language such as, but not limited to, at least one of the phrases “X, Y, or Z,” “at least X, Y, or Z,” “at least one of X, Y or Z,” and/or any of those phrases with “and/or” substituted for “or,” unless specifically stated otherwise, is to be understood as signifying that an item, term, etc., can be either X, Y, or Z, or a combination of any elements thereof (e.g., a combination of XY, XZ, YZ, and/or XYZ). Any use herein of phrases such as “X, or Y, or both” is for clarity of explanation and does not imply that language such as “X or Y” excludes the possibility of both X and Y, unless such exclusion is expressly stated. As used herein, language such as “one or more Xs” shall be considered synonymous with “at least one X” unless otherwise expressly specified. Any recitation of “one or more Xs” signifies that the described steps, operations, structures, or other features may, e.g., include, or be performed with respect to, exactly one X, or a plurality of Xs, in various examples, and that the described subject matter operates regardless of the number of Xs present.
Furthermore, although the subject matter has been described in language specific to structural features or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claims. Moreover, in the claims, any reference to a group of items provided by a preceding claim clause is a reference to at least some of the items in the group of items, unless specifically stated otherwise.
As utilized herein, the terms “approximately,” “about,” “substantially,” and similar terms are intended to have a broad meaning in harmony with the common and accepted usage by those of ordinary skill in the art to which the subject matter of this disclosure pertains. It should be understood by those of skill in the art who review this disclosure that these terms are intended to allow a description of certain features described and claimed without restricting the scope of these features to the precise numerical ranges provided. Accordingly, these terms should be interpreted as indicating that insubstantial or inconsequential modifications or alterations of the subject matter described and claimed are considered to be within the scope of the disclosure.
It should be noted that the term “example” as used herein to describe various embodiments is intended to indicate that such embodiments are possible examples, representations, and/or illustrations of possible embodiments (and such term is not intended to connote that such embodiments are necessarily extraordinary or superlative examples).
It should be noted that the orientation of various elements may differ according to other example embodiments, and that such variations are intended to be encompassed by the present disclosure.
The construction and arrangement of elements shown in the various example embodiments is illustrative only. Other substitutions, modifications, changes, and omissions may also be made in the design and arrangement of the various example embodiments without departing from the scope of the present disclosure.
The present disclosure contemplates methods, systems and program products on any non-transitory (i.e., not merely signals in space) machine-readable media for accomplishing various operations. The embodiments of the present disclosure may be implemented using existing integrated circuits, computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwired system. Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor. By way of example, such machine-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of machine-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer or other machine with a processor. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a machine, the machine properly views the connection as a machine-readable medium. Thus, any such connection is properly termed a machine-readable medium. Combinations of the above are also included within the scope of machine-readable media. Machine-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.
Although figures and/or description provided herein may show a specific order of method steps, the order of the steps may differ from what is depicted. Also, two or more steps may be performed concurrently or with partial concurrence. In various embodiments, more, fewer, or different steps may be utilized with regard to a particular method without departing from the scope of the present disclosure. Such variation will depend on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the disclosure. Likewise, software implementations can be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps and decision steps.
This application is a nonprovisional application of, and claims priority to and the benefit of, U.S. Provisional Patent Application Ser. No. 62/322,791, filed Apr. 14, 2016, and entitled “Epidemic Analysis System,” the entirety of which is incorporated herein by reference.
Number | Date | Country
---|---|---
62322791 | Apr 2016 | US