The field relates generally to simulation of combinatorial processes, such as logistics processes, and more particularly, to techniques for summarizing and querying data related to such simulations.
Simulations generally encompass a set of sequential sub-processes. One example where simulations are employed is in supply chain logistics, where the goal is to move assets (e.g., equipment, materials and/or food) from a supplier to a customer, passing through one or more places, and potentially involving people and machines. The term logistics refers to the management of resources to accomplish such a goal. Useful information in supply chain logistics typically includes suppliers; features of products and services; people and machines involved; and the time to finish each activity. Such data can be obtained and manipulated directly by means of statistical analysis, as commonly done in the business intelligence area, or indirectly via simulations.
Simulations are typically used to help make decisions. In the supply chain logistics example, simulations provide the ability to observe one or more sub-processes that yield results without actually performing the related activities in the real world. Typically, the level of detail of the entire simulation process is chosen based on the target features of the simulation, e.g., specific simulation behaviors that can be quantified and are important for subsequent analysis and decision making.
Simulation applications may be very complex, and in order to capture the workings of the system, it might be necessary to run each simulation a very large number of times. The resulting computational costs can be extreme, and Big Data strategies are often required.
A need therefore exists for techniques for combining results of previous simulations of portions of a simulated process.
Illustrative embodiments of the present invention provide methods and apparatus for automatic combination of sub-process simulation results and heterogeneous data sources. In one exemplary embodiment, a method comprises the steps of obtaining, for a process comprised of a sequence of a plurality of sub-processes, an identification of one or more relevant input features and output features for each of the sub-processes; obtaining at least one execution map for each of the sub-processes, wherein each execution map stores results of at least one execution of a given sub-process originated from at least one data source, and wherein the results indicate a count of a number of times a given tuple of output features appeared given a substantially same tuple of input features; and, in response to one or more user queries regarding at least one target feature, selected among features of the sub-processes, and a user-provided initial scenario comprising values of the one or more relevant input features of a first sub-process, performing the following steps: composing a probability distribution function for the at least one target feature that represents a simulation of the process based on a sequence of the execution maps, one for each of the sub-processes, by matching the input features of each execution map with features from either the initial scenario or from the output of previous execution maps in the sequence; and processing the probability distribution function to answer the one or more user queries for the target feature.
In at least one embodiment, the execution maps for each of the plurality of sub-processes are stored as distributed tables that use the relevant input features to hash data related to multiple executions across multiple nodes, and wherein the composition process occurs in parallel across multiple nodes.
In one or more embodiments, the probability distribution function comprises a probability mass function, and, when one or more of the target features are continuous, the method further comprises the step of generating an approximation for a continuous probability density function based on the probability mass function. The probability distribution function for the at least one target feature is generated from the at least one execution map for each of the sub-processes, selected based on a confidence level of the results in each execution map.
As noted above, illustrative embodiments described herein provide significant improvements relative to conventional simulation systems by combining results of previous simulations of portions of a simulated process. These and other features and advantages of the present invention will become more readily apparent from the accompanying drawings and the following detailed description.
Illustrative embodiments of the present invention will be described herein with reference to exemplary communication, storage, and processing devices. It is to be appreciated, however, that the invention is not restricted to use with the particular illustrative configurations shown. Aspects of the present invention provide methods and apparatus for automatic combination of sub-process simulation results and heterogeneous data sources.
One or more embodiments of the invention analyze results in scenarios that have not been simulated by combining the results of previous simulations of parts of the process, for example, in an embarrassingly parallel fashion. In one or more embodiments, computer-based simulation of a sequence of one or more sub-processes is performed. While one or more embodiments are described in the context of supply chain logistics, the present invention may be employed in any environment where it is desirable to analyze results in scenarios that have not been simulated by combining the results of previous simulations of portions of the process.
In at least one embodiment, histograms originated from multiple, heterogeneous data sources, such as results for various sub-processes from different simulators and/or from real-world data, are combined to enable, improve and/or speed up simulations of the complete process. In this way, by leveraging a massively parallel approach in one or more exemplary embodiments, combined simulations can be created that extrapolate what could happen, something that is useful, for example, to obtain quick results when it is not viable to create a new unified simulation model of the entire process from scratch.
One or more embodiments of the invention provide a method for storing the results of computational simulations, datasets, user-input data or any other data source in a unified format, referred to herein as an execution map, that indexes the values of domain features by the scenarios in which they were attained. In at least one embodiment, the method relies on user-provided domain knowledge to define the features of interest in the domain that ultimately compose the maps. When the users specify such features, they define hypotheses about what can influence each sub-process. The method then provides the means to extrapolate the combination of the available data sources and to guarantee coherence of the combined model.
One or more embodiments of the invention provide a method, leveraging the execution map representation, for efficiently combining diverse sources into a probability distribution function of a target feature of interest. In at least one embodiment, the method provides a specialist user with a simulation model of the target feature over the complete process, which can be used to quickly query simulation results that have not been simulated in a single run. In addition, the exemplary method can provide results even when no feasible model is readily available. In at least one embodiment, the method applies massive parallelism to efficiently allow the user to query the probability of values of the target feature in scenarios that were not previously simulated for the entire process, but can be extrapolated with coherence guaranteed by the matching of the input and output features in the maps.
One or more embodiments of the invention provide a method, leveraging the execution map representation, for the generation of additional, aggregated, combined sources of a same sub-process with similar schemas. This allows the combination of heterogeneous data sources representing the same sub-process, and mitigates the need to run every simulator a large number of times.
In one or more embodiments, the disclosed methods are also useful when there is already a unified simulation, but simulating new situations could take a long time and demand large amounts of storage and other computational resources. The level of accuracy of such extrapolations can be substantially guaranteed by correctly specifying features that connect the different processes. Within the context of simulations that generate large amounts of data, the application of such techniques enhances decision-making capabilities.
Simulation of processes, such as supply chain logistics processes, is of broad interest to the industry. These simulations can be arbitrarily complex and demand large amounts of storage and other computational resources. In this context of heterogeneous simulations, the following problems may arise.
Running Complex Simulations
Highly detailed models are expensive to simulate in terms of computational resources. As an example, in the area of computer-based simulation, it is not rare to have simulation runs taking hours or even days to complete, requiring an enormous amount of memory and storage. Additionally, since the possibility space of a nondeterministic simulation model can be arbitrarily wide, which is often the case, large amounts of simulation traces are necessary in order to generate results that cover a relevant number of possible cases.
The cost related to the execution of a large number of simulation traces may render the usage of one big and complex simulation impractical if results are needed under time or computational resources constraints.
Building Simulation Models for Complex Domains
Building simulation models for decision making purposes over a complex process may demand prohibitive costs and resources. Although the details and level of granularity of the model contribute to the quality of the simulation results, they may be expensive to acquire. This also means that the longer the sequence of sub-processes to be simulated, the higher the cost of building a simulation model. Building a simulation model is a naturally iterative activity, depending on analyses by domain experts and statistical verifications of properties. These iterations require that the simulation model be executed several times at every intermediate modeling stage, which relates this problem to the problem of computational costs of running complex simulations.
Another factor to be considered is that the sub-processes that compose the whole system may be managed and run by different agents, using different tools and information systems. It is often the case that multiple simulation models have been built for different goals over time, and what is needed is a way to combine the results of these simulations. Hence, it is common for these sub-processes to be implemented by distinct simulator programs, which consume and generate different data types and features.
One or more embodiments of the invention mitigate the problem of modeling a complex domain by combining simpler simulations of a sequence of sub-processes into a single holistic simulation. A typical sequence of sub-processes may generate a result set that opens into a large search space for the combination of results given one or more target features. The computational costs of a naïve approach to this combination can be prohibitive, negatively impacting the response time of queries on this search space. On the other hand, in order to accurately support decisions, it is typically necessary to make sure that the combination of results from multiple sub-processes is coherent.
Combining Heterogeneous Simulations of a Same Sub-Process
Moreover, simulation applications implement vastly different simulation techniques, ranging from simple combinations of equations to complex separated elements simulated through a network of models, which include discrete-event systems, Petri nets (also known as place/transition networks), system dynamics and other hybrid variations.
One or more embodiments of the invention tackle the combination of different simulations of a same sub-process. Simulation data generated from heterogeneous applications describing the same sub-process are quite common. One problem that arises is how to create unified representations that combine the results of these simulations in a way that provides more information about the behavior of that sub-process under diverse scenarios.
Combining Results of Previous Simulations of Portions of a Simulated Process
One or more embodiments of the invention provide methods and apparatus that provide a user with ways to query simulation results that have not been simulated in a single run, faster than it would take to run the corresponding simulation, and without having to build a composite simulation model. Available results of partial simulations, or other data sources, are leveraged and extrapolated based on user-defined hypotheses to cover different scenarios. The partial results are optionally composed in an embarrassingly parallel manner so that it is possible to answer user queries in substantially real time.
One or more aspects of the invention comprise:
Assume that a target feature $t$ is a feature of the domain that drives the user queries. Let $P$ be a sequence of ordered, non-overlapping sub-processes $p_i$, $P = [p_1, p_2, \ldots, p_n]$, such that sub-process $p_i$ comes after sub-process $p_{i-1}$.
In this example, an interesting feature of the domain for the user queries could be the average global lead time, that is, the average time that the supply chain takes to deliver orders at their requested destination. Real-world cases may comprise many target features and sub-processes.
Each sub-process $p_i$ is covered by one or more alternative data sources, and each data source may cover a sequence of sub-processes. A data source covering the sequence $p_i, p_{i+1}, \ldots, p_{i+k}$ is composed of multiple executions, each of which implements $p_i, p_{i+1}, \ldots, p_{i+k}$. Data sources are typically the output of simulation programs for the sub-processes, but they can also correspond to real-world data.
One or more embodiments of the invention produce the effect of a quick holistic simulation over the complete process, taking advantage of the available data sources, and are based on the following assumptions:
In the present exemplary context, data sources generated by simulation programs that describe multiple executions are the focus, but the disclosed techniques also allow for the consideration of historical data, or even user-edit data, as applicable data sources.
A data source could in principle correspond to a sequence of sub-processes if its executions generate outputs for each sub-process in the sequence. This means that executions from data source A, in
The following discussion provides an exemplary formalization of these concepts that is used herein. Let $\mathcal{D}$ be the set of all data sources and $\mathcal{E}$ be the set of all executions. For each data source $d \in \mathcal{D}$, define $\mathcal{E}_i^d$ as the set of all executions from $d$ corresponding to sub-process $p_i$.
Consider that each execution $e \in \mathcal{E}$ consumes an $n$-tuple as input and generates an $n$-tuple as output. A tuple $q$ is an ordered sequence of $n$ values of features $T_j$, with $1 \le j \le n$, such that $q = (q_1, q_2, \ldots, q_n) \in T_1 \times T_2 \times \cdots \times T_n$. Let $\mathcal{S}(q)$ be the schema, that is, the composition of features $T_1 \times T_2 \times \cdots \times T_n$, of any given tuple $q$.
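Let $q$ be the input tuple for an execution $e \in \mathcal{E}$. $\mathcal{I}(e)$ is the input schema of $e$, defined by $\mathcal{I}(e) \stackrel{\text{def}}{=} \mathcal{S}(q)$. Similarly, let $r$ be the output tuple for $e$. $\mathcal{O}(e)$ is the output schema of $e$, defined by $\mathcal{O}(e) \stackrel{\text{def}}{=} \mathcal{S}(r)$.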
It is assumed that executions originating from the same data source will substantially always have the same input and output features. This does not impose a constraint on the disclosed method, as it is possible to consider additional data sources. That is, for any data source $d$, all executions $e \in \mathcal{E}_i^d$ have the same input and output schemas, $\mathcal{I}_i^d$ and $\mathcal{O}_i^d$, respectively. If this is not the case, the same effect can be achieved by treating the executions of $d$ that present different input and output schemas as distinct data sources $d', d'', \ldots, d^*$, that is, by splitting the set $\mathcal{E}_i^d$ into subsets $\mathcal{E}_i^{d'}, \mathcal{E}_i^{d''}, \ldots, \mathcal{E}_i^{d^*}$ for which the requirement holds.
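As an illustration only, the splitting of a data source into schema-homogeneous sources $d', d'', \ldots$ can be sketched in Python, assuming each execution is recorded as a pair of insertion-ordered dictionaries that map feature names to values. This representation and the helper name split_by_schema are hypothetical and are not part of the described method.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Assumed representation: an execution is (input tuple, output tuple), each an
# insertion-ordered dict mapping feature name -> value.
Execution = Tuple[Dict[str, Any], Dict[str, Any]]
SchemaPair = Tuple[Tuple[str, ...], Tuple[str, ...]]


def split_by_schema(executions: List[Execution]) -> Dict[SchemaPair, List[Execution]]:
    """Split the executions of one data source into groups (d', d'', ...)
    that each share a single (input schema, output schema) pair."""
    groups: Dict[SchemaPair, List[Execution]] = defaultdict(list)
    for inp, out in executions:
        # A schema is taken here as the ordered tuple of feature names.
        groups[(tuple(inp), tuple(out))].append((inp, out))
    return dict(groups)
```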
Constructing Execution Maps from Simulation Results
An important aspect of the invention is that executions of a same sub-process may originate from distinct simulators. In fact, the disclosed method applies even for executions that originate from historical data, or other sources, including a combination of sources and many executions coming from the same source.
This section describes how executions of a same sub-process pi are aggregated into an execution map, which indexes the counted results of the executions by the input features used to generate them.
One or more embodiments of the disclosed method presume that a specialist user provides the input and output features that are relevant to each sub-process pi. In the running example, it is assumed that relevant output features for process p1, representing the order generation sub-process, are:
Recall also that all executions of pi from a same data source have the same input and output schemas.
One or more embodiments of the disclosed method thus require that the user provides a way of mapping the data in the logs of the executions to at least some features relevant for the domain. In the example of
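An execution map of sub-process $p_i$ by source $d$ can be constructed, defined as:
$$M_i^d : \mathcal{I}_i^d \rightarrow (\mathbb{N}, \mathcal{O}_i^d)$$
where $\mathbb{N}$ is the set of natural numbers. Put another way, for each input tuple, an execution map $M_i^d$ contains the histogram of the results (the count of each output tuple) in executions $e \in \mathcal{E}_i^d$.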
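A minimal Python sketch of an execution map, assuming the same hypothetical dictionary representation of executions: the map is held as a dictionary from input tuples to collections.Counter histograms of output tuples. The function name build_execution_map is an illustrative assumption.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Sequence, Tuple

ExecutionMap = Dict[Tuple[Any, ...], Counter]


def build_execution_map(executions: List[Tuple[Dict[str, Any], Dict[str, Any]]],
                        input_features: Sequence[str],
                        output_features: Sequence[str]) -> ExecutionMap:
    """Index the counted output tuples of the executions by their input tuples."""
    emap: Dict[Tuple[Any, ...], Counter] = defaultdict(Counter)
    for inp, out in executions:
        q = tuple(inp[f] for f in input_features)    # input tuple (map key)
        r = tuple(out[f] for f in output_features)   # output tuple (histogram bin)
        emap[q][r] += 1
    return dict(emap)
```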
Recall now that executions of p1 from other data sources can also provide information on the month, average process time and container occupancy ratio. These executions from multiple heterogeneous sources of a same sub-process can be combined to generate additional execution maps.
In the running example, suppose that a data source X (e.g., a database table) provides information on how the sub-process p1 of order generation behaves on the months of January and February, generating the same output information.
The input and output schemas for maps M1A and M1X of
In general, for every two execution maps $M_i^a$ and $M_i^b$ where $\mathcal{I}_i^a$ is similar to $\mathcal{I}_i^b$ and $\mathcal{O}_i^a$ is similar to $\mathcal{O}_i^b$, a new execution map $M_i^{ab}$ is generated by aggregating the executions in the original maps. A schema is said to be similar to another if they are substantially identical or if there is a known function that can convert values from one to the other. For example, assume that there is a known function that converts the week of the year to the corresponding month. Assume also a data source W whose executions provide the input feature week_date and the same output features as A and X. Then, the input schema of W is similar to that of A, and likewise to those of X and AX. For the purpose of the disclosed exemplary method, superscripts like $ab$ are referred to as a data source in the same way as $a$ or $b$, even though this data source does not correspond to executions from a single simulator program or from a real-world database table. In other words, 'original' data sources and those composed from other data sources with similar schemas are not distinguished.
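The aggregation of two maps with similar schemas can be sketched as a per-input-tuple sum of histograms, with an optional conversion of one map's input tuples into the other's schema. In the sketch below, assuming maps are held as dictionaries of Counter objects, aggregate_maps and the week-to-month table are hypothetical stand-ins (a real conversion would use the calendar).

```python
from collections import Counter
from typing import Any, Callable, Dict, Tuple

Key = Tuple[Any, ...]


def aggregate_maps(map_a: Dict[Key, Counter], map_b: Dict[Key, Counter],
                   convert_b_input: Callable[[Key], Key] = lambda q: q) -> Dict[Key, Counter]:
    """Build M_i^{ab} by summing, per input tuple, the output histograms of two
    maps with similar schemas. convert_b_input translates map_b's input tuples
    into map_a's input schema (identity when the schemas are identical)."""
    merged = {q: Counter(h) for q, h in map_a.items()}  # copies; inputs untouched
    for q, hist in map_b.items():
        # Several converted keys may coincide (e.g. many weeks -> one month);
        # Counter.update adds their counts together.
        merged.setdefault(convert_b_input(q), Counter()).update(hist)
    return merged


# Usage with a hypothetical week-number -> month table.
WEEK_TO_MONTH = {1: "jan", 2: "jan", 6: "feb"}
m_a = {("jan",): Counter({(10,): 2})}
m_w = {(1,): Counter({(10,): 1}), (2,): Counter({(12,): 1}), (6,): Counter({(8,): 3})}
m_aw = aggregate_maps(m_a, m_w, lambda q: (WEEK_TO_MONTH[q[0]],))
```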
The notation $M_i^d[q]$ refers to the resulting histogram obtained from the map $M_i^d$ for a given input tuple $q$. $M_i^d[q]$ is valid if $q$ exists as an input tuple in $M_i^d$, and invalid otherwise.
It is noted that this input tuple (jan) may contain more features than there are in the input features of the execution map and still make $M_i^d[q]$ valid. For example, regardless of the value and meaning of $R$, $M_1^{AX}[(\text{jan}, R)]$ results in the same distribution as $M_1^{AX}[(\text{jan})]$. It is further noted that the output features of $M_1^{AX}$ are a subset of the possible input and output features for sub-process $p_1$. Executions of the same sub-process from other data sources may provide information on different subsets of the features of the sub-process.
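The lookup $M_i^d[q]$, including its tolerance for extra features, can be sketched as a projection of the scenario onto the map's input features followed by a dictionary lookup. The sketch assumes scenarios are feature-to-value dictionaries and signals an invalid lookup with None; both choices, and the name lookup, are assumptions for illustration.

```python
from collections import Counter
from typing import Any, Dict, Optional, Sequence, Tuple


def lookup(emap: Dict[Tuple[Any, ...], Counter], input_features: Sequence[str],
           scenario: Dict[str, Any]) -> Optional[Counter]:
    """Return M[q] for a scenario that may carry extra features, or None when
    the lookup is invalid."""
    if any(f not in scenario for f in input_features):
        return None  # the scenario does not supply every input feature
    q = tuple(scenario[f] for f in input_features)  # project onto the map's inputs
    return emap.get(q)  # None when q is not an input tuple of the map
```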
Assume, following the running example, that from the executions of two other data sources, Y and Z, the features avg_process_time and urgency can be computed, but not rt_containers. Then, the results of Y and Z can be combined with each other, but neither can be combined with A, X, or AX. Thus, a map $M_1^{YZ} : \mathcal{I}_1^{YZ} \rightarrow (\mathbb{N}, \mathcal{O}_1^{YZ})$ is ultimately generated from maps $M_1^Y$ and $M_1^Z$, since $\mathcal{I}_1^Y = \mathcal{I}_1^Z = \mathcal{I}_1^{YZ} = (\text{month})$ and $\mathcal{O}_1^Y = \mathcal{O}_1^Z = \mathcal{O}_1^{YZ} = (\text{avg\_process\_time}, \text{urgency})$.
Since executions from another data source may provide information on a different subset of output features of the sub-process, the end result is that a different execution map of sub-process $p_i$ should be generated for each unique pair of input and output schemas. The specification of the input features of a sub-process determines what the user considers as variables that influence the executions of this sub-process. On the other hand, the output features correspond to the features that can be relevant for the following sub-processes.
In order to provide efficiency for the composition of execution maps, in one or more embodiments, execution maps are stored as distributed tables that use the input features to hash the data related to a massive number of executions across multiple nodes. In this way, it is possible to minimize data movement during the composition, as described below.
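The following single-process sketch, offered only to illustrate the idea, shards an in-memory execution map by a hash of its input tuples. A production system would rely on a distributed table or engine instead; partition_of and distribute_map are hypothetical helpers, and CRC32 is chosen because Python's built-in hash of strings is salted per process and therefore not stable across nodes.

```python
import zlib
from typing import Any, Dict, List, Tuple

Key = Tuple[Any, ...]


def partition_of(key: Key, num_nodes: int) -> int:
    """Node index for an input tuple, stable across processes."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_nodes


def distribute_map(emap: Dict[Key, Any], num_nodes: int) -> List[Dict[Key, Any]]:
    """Split an execution map into per-node shards keyed by its input tuples."""
    shards: List[Dict[Key, Any]] = [{} for _ in range(num_nodes)]
    for key, hist in emap.items():
        shards[partition_of(key, num_nodes)][key] = hist
    return shards
```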
Composition of a Target Feature Probability Distribution Function
After constructing the possible execution maps for all sub-processes, one or more embodiments can, at query time, generate a probability distribution function of a target feature that reproduces the effect of a simulation of the complete process. With this function at hand, the user can query values for the target feature in a specific provided scenario in real time, without the need to build or run a complex simulation that covers the entire process.
A probability distribution function pdf(x) is a function that returns, for each possible value of x, the probability with which x occurs. Here, the value x is of the type of the target features defined by the user query. For instance, if the user query is to know how long it would take to deliver a certain type of orders in a given period of the year, the target feature is the global lead time, i.e., the sum of all sub-process times in P, and pdf(x) returns the probability for that amount of time to occur. Since the values recorded in execution maps, over which the algorithms defined below operate, are discrete, the method in fact builds a probability mass function, which gives the probabilities for discrete values of x. In case the target feature is continuous, an approximation can be generated for a continuous probability density function based on the probability mass function.
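The description does not fix how the continuous approximation is built. One simple option, shown here as an assumption rather than as the described method, is a histogram density estimate that spreads each value's probability mass uniformly over a fixed-width bin; the bin width is a user choice, and pmf_to_density is a hypothetical name.

```python
import math
from typing import Dict


def pmf_to_density(pmf: Dict[float, float], bin_width: float) -> Dict[float, float]:
    """Histogram density estimate: each value's mass is spread uniformly over
    the fixed-width bin containing it. Result keys are left bin edges, and the
    density integrates to the total mass of the pmf."""
    density: Dict[float, float] = {}
    for x, p in pmf.items():
        left = math.floor(x / bin_width) * bin_width  # left edge of x's bin
        density[left] = density.get(left, 0.0) + p / bin_width
    return density
```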
As discussed hereinafter, in one or more embodiments, a pdf composition method operates given a sequence of execution maps, one for each sub-process in P. In order to choose the best execution maps for each sub-process it is important to consider the confidence in the results of each map and the compatibility among results from the various sub-processes. It is noted that in the exemplary embodiments described herein, the execution map for each sub-process is assumed to be already selected.
This list funcs therefore allows the user to define merging strategies for the features that change throughout the process. In a basic case, where the strategies remain the same, the list would contain the same set of functions repeated once for each map in mapSeq. Target features, such as the global lead time of the running example, correspond to the accumulation of values throughout the process and merging functions are used to progressively compute them.
The exemplary compose process 900 receives these inputs and returns the desired probability distribution function, pdf, during step 6.
As shown in
As shown in
In steps 2 and 3, the input scenario is changed into an ‘initial scenario’, representing a histogram of a single instance. In the exemplary compose process 900, and the auxiliary algorithms (processes) described below, histograms are tables with rows (c, F) where c is the count and F is itself a tuple of features.
The initial_hist (step 3) thus records a single occurrence of the provided initial_scenario. The initial_scenario tuple is given by the algorithm GenerateInitialScenario, which performs the necessary transformations on the scenario tuple given as argument, if any. An implementation of GenerateInitialScenario is presumed to be provided by the user in one or more embodiments, and not described further herein.
The resulting histogram is transformed during step 5 into a probability distribution function, pdf, by a Generate PDF process 1000, as discussed hereinafter in conjunction with
For each map, the list next_feat holds the input features of the map that follows it in the sequence. In a typical massively parallel implementation, next_feat is used during step 5 to distribute intermediate results among computational nodes so that they can be efficiently combined with the following execution maps. This list is included in the definition of the algorithm because such information is important for data movement and load-balancing strategies, but such strategies are outside the scope of this invention.
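The algorithm relies on a CollectFeatures step whose implementation is not given here. One plausible reading, sketched below purely as an assumption, derives next_feat from the input features of the following map, and future_feat (whose head is current_feat) from the target features plus the inputs of all later maps; collect_features is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def collect_features(map_inputs: Sequence[Sequence[str]],
                     targets: Sequence[str]) -> Tuple[List[Tuple[str, ...]], List[Tuple[str, ...]]]:
    """Given the input features of each map in mapSeq, return
    (future_feat, next_feat), with one entry per map:
      future_feat[k]: target features plus the inputs of any later map,
                      i.e. what must survive grouping after map k;
      next_feat[k]:   the input features of map k+1 (empty for the last map)."""
    future_feat: List[Tuple[str, ...]] = []
    next_feat: List[Tuple[str, ...]] = []
    for k in range(len(map_inputs)):
        later = map_inputs[k + 1:]
        next_feat.append(tuple(later[0]) if later else ())
        needed = list(targets)
        for feats in later:
            for f in feats:
                if f not in needed:
                    needed.append(f)
        future_feat.append(tuple(needed))
    return future_feat, next_feat
```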
As previously stated, the exemplary CombineHistogram process 1100 is a recursive algorithm. At each iteration, hist represents the current histogram, with the results of applying former maps to the initial scenario. In the first call, during step 3, hist corresponds to the initial scenario itself, as no maps have been applied.
Steps 1-2 contain the termination condition for the recursive calls. When there are no more maps in mapSeq, the algorithm returns the current histogram as the final result.
If there are still maps in mapSeq, the first map is removed from the list and stored as current_map by a call to the head algorithm during step 3. The head algorithm is presumed to return the first element of a list structure, as would be apparent to a person of ordinary skill in the art. The head algorithm is also applied over the future_feat list, to yield the current_feat variable, during step 4.
In step 5, a new histogram next_hist is initialized as an empty table. The input features of the next map in the sequence are provided so that they can be used to hash the results to be generated. The loop in steps 6-10 then populates this table with the results of combining the original histogram hist with the results in current_map.
The following operations (steps 8-10) are performed for every pair of tuples $(c_i, Q_i)$ and $(c_j, Q_j)$, where the first tuple comes from the input histogram and the second tuple is obtained from the scenario map current_map given input $Q_i$. A new tuple $(c_n, Q_n)$ is generated and stored in the table next_hist through a call to append.
These operations are optionally a point of parallelism, since each pair of tuples can be independently considered. Enabling a high level of parallelism in the computation of the resulting tuples is essential for the real-time aspects of the disclosed method. As previously mentioned, the execution maps are stored in distributed tables that are hashed according to the values of the input features of the executions. Function append stores tuples in the new histogram using the input features of the next map to hash them. By using this strategy, the tuples in the histogram that will have to be joined with tuples of the next execution map will be in the same partition. In this way, this important operation can optionally occur in an embarrassingly parallel fashion.
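The co-location argument can be illustrated with a small sketch under assumed names: append routes each new row by the same hash of the next map's input features that was used to shard that map, so every row lands in the partition that holds the map entries it will be joined with. The hash function, the list-of-partitions layout and the small next map keyed by rt_containers are assumptions for illustration.

```python
import zlib
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[int, Dict[str, Any]]


def partition_of(key: Tuple[Any, ...], num_nodes: int) -> int:
    """Stable node index for a tuple of input-feature values."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_nodes


def append(next_hist: List[List[Row]], count: int, scenario: Dict[str, Any],
           next_input_features: Sequence[str]) -> None:
    """Store a row in the partition that also holds the matching entries of
    the next execution map (same hash function, same key features)."""
    key = tuple(scenario[f] for f in next_input_features)
    next_hist[partition_of(key, len(next_hist))].append((count, scenario))


# Hypothetical next map keyed by rt_containers, sharded with the same function.
NODES = 4
next_map_shard = {k: partition_of(k, NODES) for k in [(0.2,), (0.8,)]}
next_hist: List[List[Row]] = [[] for _ in range(NODES)]
for c, lt, rt in [(10, 50, 0.2), (20, 50, 0.8), (25, 100, 0.2), (20, 100, 0.8)]:
    append(next_hist, c, {"lead_time": lt, "rt_containers": rt}, ["rt_containers"])
```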
The count for each tuple of the resulting histogram is obtained by multiplying the counts of the original tuples, $c_n = c_i \cdot c_j$. This represents the fact that each of the $c_i$ occurrences of $Q_i$ in the input histogram leads to each of the $c_j$ results obtained from current_map, so the combined scenario appears $c_i \cdot c_j$ times in the resulting histogram.
The resulting scenario tuple $Q_n$ is obtained by merging the input and output tuples $Q_i$ and $Q_j$, through a call during step 9 to an auxiliary function merge. This algorithm merges two tuples into one, using the tuples' schemas $\mathcal{S}(Q_i)$ and $\mathcal{S}(Q_j)$. The resulting tuple is an expansion of $Q_i$: the items of $Q_j$ whose feature is not in $\mathcal{S}(Q_i)$ are appended to the resulting tuple.
The merge algorithm also uses the current set of merging functions, given by a call head(funcs). These merging functions can deal with features of Qi that should be updated according to values of features of Qj. For each of these features, the corresponding function generates its value in the resulting scenario Qn. Notice that this is typically useful to compute the target feature, which usually depends on contributions from various sub-processes. The current merging strategy for the target feature t determines how its value calculated so far is updated as the current execution map is combined. In the case of the running example, with global lead time as the target feature, the partial times of the sub-processes are accumulated. Other kinds of functions could be specified by users.
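A minimal sketch of merge, assuming scenarios are insertion-ordered feature-to-value dictionaries and that the merging functions for one map are given as a dictionary from feature name to a two-argument function. Keeping the value from $Q_i$ when a shared feature has no merging function is one reading of the statement that the result is an expansion of $Q_i$; all names are hypothetical.

```python
import operator
from typing import Any, Callable, Dict

MergeFuncs = Dict[str, Callable[[Any, Any], Any]]


def merge(q_i: Dict[str, Any], q_j: Dict[str, Any], funcs: MergeFuncs) -> Dict[str, Any]:
    """Expand Q_i with the features of Q_j it lacks; for a feature present in
    both, apply that feature's merging function if one is given, otherwise
    keep the value from Q_i."""
    q_n = dict(q_i)
    for feat, value in q_j.items():
        if feat not in q_n:
            q_n[feat] = value                          # new feature: append
        elif feat in funcs:
            q_n[feat] = funcs[feat](q_n[feat], value)  # e.g. accumulate lead time
    return q_n


# Merging functions for one map: lead times (in hours) are accumulated.
LEAD_TIME_FUNCS: MergeFuncs = {"lead_time": operator.add}
q_n = merge({"lead_time": 50, "rt_containers": 0.2}, {"lead_time": 10}, LEAD_TIME_FUNCS)
```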
After the input histogram has been combined with the histograms obtained from the scenario map current_map, the resulting histogram is grouped, during step 11, by all the features that are still necessary as inputs in the remainder of the sequence of execution maps mapSeq. This is achieved through a call to group_by with the second parameter bound to the structure current_feat.
The group_by algorithm called in step 11 receives two inputs: a histogram $H$ and a list of features $F$. The group_by algorithm iterates over all elements $(c, Q)$ in $H$, operating over the tuple $Q$, and discards all items in $Q$ whose features are not in $F$. Then, all elements $(c_1, Q), (c_2, Q), \ldots, (c_m, Q)$ whose tuples $Q$ are the same are grouped into a single element $(C, Q)$, where $C = \sum_{k=1}^{m} c_k$.
Notice that the execution of group_by is important to prune unnecessary tuples, and thus to reduce the combinatorial nature of the process. Additionally, this is another phase of the algorithm that enables parallelism. In one or more embodiments, the histograms are distributed according to the values of the input features of the next execution map, which are a subset of current_feat; therefore, tuples that can be grouped are always on the same node, and the operation occurs in an embarrassingly parallel fashion.
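The grouping step can be sketched as follows, under the same assumed dictionary representation; group_by here is an illustrative stand-in, not a specific library call. For illustration, the sketch is applied to the histogram given for the second-level call in the examples, keeping only lead_time (lead times in hours, occupancy ratios as fractions).

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Tuple[int, Dict[str, Any]]


def group_by(hist: List[Row], features: Sequence[str]) -> List[Row]:
    """Drop features not in `features`, then sum the counts of rows whose
    remaining tuples are equal (C = c_1 + ... + c_m)."""
    grouped: Dict[Tuple[Tuple[str, Any], ...], int] = {}
    for count, q in hist:
        key = tuple((f, q[f]) for f in features if f in q)
        grouped[key] = grouped.get(key, 0) + count
    return [(c, dict(key)) for key, c in grouped.items()]


hist = [(10, {"lead_time": 50, "rt_containers": 0.2}),
        (20, {"lead_time": 50, "rt_containers": 0.8}),
        (25, {"lead_time": 100, "rt_containers": 0.2}),
        (20, {"lead_time": 100, "rt_containers": 0.8})]
by_lead_time = group_by(hist, ["lead_time"])
# [(30, {'lead_time': 50}), (45, {'lead_time': 100})]
```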
Finally, in step 12, the function 1100 returns the result of a recursive call. The arguments passed are the new distribution next_hist, and the tail of the lists, future_feat, next_feat and funcs (i.e., the remainder of such lists after discarding the first elements of each one).
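Pulling the steps together, a compact single-node sketch of the recursion might look as follows. It is an illustration under stated assumptions, not the CombineHistogram process 1100 itself: next_feat is omitted because it only drives data distribution, maps are held as hypothetical (input features, output features, table) triples, and an input tuple with no matching entry in current_map is assumed to contribute nothing. The two maps in the usage are made up; m1 is chosen so that the intermediate histogram equals the one passed to the second-level call in the examples.

```python
import operator
from typing import Any, Callable, Dict, List, Sequence, Tuple

Scenario = Dict[str, Any]
Row = Tuple[int, Scenario]
# Assumed map layout: (input features, output features, {input tuple: {output tuple: count}})
ExecMap = Tuple[Tuple[str, ...], Tuple[str, ...], Dict[tuple, Dict[tuple, int]]]
Funcs = Dict[str, Callable[[Any, Any], Any]]


def merge(q_i: Scenario, q_j: Scenario, funcs: Funcs) -> Scenario:
    q_n = dict(q_i)
    for f, v in q_j.items():
        if f not in q_n:
            q_n[f] = v                      # append features Q_i lacks
        elif f in funcs:
            q_n[f] = funcs[f](q_n[f], v)    # merging function, e.g. add lead times
    return q_n


def group_by(hist: List[Row], features: Sequence[str]) -> List[Row]:
    grouped: Dict[tuple, int] = {}
    for c, q in hist:
        key = tuple((f, q[f]) for f in features if f in q)
        grouped[key] = grouped.get(key, 0) + c
    return [(c, dict(k)) for k, c in grouped.items()]


def combine_histogram(hist: List[Row], map_seq: List[ExecMap],
                      future_feat: List[Tuple[str, ...]], funcs: List[Funcs]) -> List[Row]:
    if not map_seq:                                # termination: no maps left
        return hist
    in_feats, out_feats, table = map_seq[0]        # head(mapSeq)
    current_feat = future_feat[0]                  # head(future_feat)
    next_hist: List[Row] = []
    for c_i, q_i in hist:
        key = tuple(q_i.get(f) for f in in_feats)  # current_map[Q_i]
        for out_vals, c_j in table.get(key, {}).items():
            q_j = dict(zip(out_feats, out_vals))
            next_hist.append((c_i * c_j, merge(q_i, q_j, funcs[0])))
    next_hist = group_by(next_hist, current_feat)  # prune unneeded features
    # Recurse on the tails of the lists.
    return combine_histogram(next_hist, map_seq[1:], future_feat[1:], funcs[1:])


add = {"lead_time": operator.add}
m1 = (("month",), ("lead_time", "rt_containers"),
      {("jan",): {(50, 0.2): 10, (50, 0.8): 20, (100, 0.2): 25, (100, 0.8): 20}})
m2 = (("rt_containers",), ("lead_time",), {(0.2,): {(10,): 10}, (0.8,): {(20,): 5}})
result = combine_histogram([(1, {"month": "jan"})], [m1, m2],
                           [("lead_time", "rt_containers"), ("lead_time",)], [add, add])
# [(100, {'lead_time': 60}), (100, {'lead_time': 70}),
#  (250, {'lead_time': 110}), (100, {'lead_time': 120})]
```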
Examples
The call to CollectFeatures (
The initial scenario is then obtained during Step 2 of
A call is made during step 4 (
In the CombineHistogram algorithm 1100 (
In step 11 of
Now, a second level call to the CombineHistogram algorithm 1100 (
This call has arguments:
[(10, (50 h, 20%)), (20, (50 h, 80%)), (25, (100 h, 20%)), (20, (100 h, 80%))],
where each element $(c, Q)$ is composed of a count value and a tuple $Q$ such that $\mathcal{S}(Q) = (\text{lead\_time}, \text{rt\_containers})$;
Recall map M2B (1200), given in
Given the above inputs, the CombineHistogram algorithm 1100 (
In order to represent that each instance of a result in sub-process p1 leads to multiple possible sequences in sub-process p2, each matching scenario has the count values multiplied. In the present example, this represents that every execution of sub-process p1 in January leads to many possible results in sub-process p2.
This means that the count value of each bar in the histogram 1400 of
For example, the multiplication 1610 of the 10 counts of <50 h, 20%> by the 10 counts of <10 h> yields a count of 100 situations in which the lead time is 60 h and the container occupancy is 20%. The tuple <60 h, 20%> is obtained by adding the values of the target feature (50 h plus 10 h), as stated above.
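Written out with the count and merge rules of the CombineHistogram description, the product singled out above is:

$$c_n = c_i \cdot c_j = 10 \cdot 10 = 100, \qquad \text{lead\_time}(Q_n) = 50\,\text{h} + 10\,\text{h} = 60\,\text{h}, \qquad Q_n = \langle 60\,\text{h}, 20\% \rangle$$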
Notice that some of the partial results in
In at least one embodiment, the method 1100 would then proceed by combining this resulting distribution with the distributions given in an execution map selected for sub-process p3. Because the tuples in this distribution still have a feature rt_containers, the values of 20% and 80% can still be used as input for that map, even if those values were not generated by the last map in the sequence. That is to say, the information on the ‘current scenario’ is propagated throughout the combination of sub-processes.
Suppose, however, that it is known that rt_containers is not required as input for any of the remaining maps to be combined. This would be the case, for example, if M2B were the last map in the sequence. The aggregated resulting distribution 1900 is shown in
The final distribution produced by the combination of all maps in the sequence is substantially always the counts of values of the target feature. After normalization (dividing each count by the total count), this distribution yields a probability distribution function of the resulting values of the target feature after the entire sequence of sub-processes has taken place.
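The aggregation performed when rt_containers is no longer required, followed by the normalization into a probability distribution function, can be sketched in Python. The sketch assumes that a histogram is a list of (count, features) pairs; the names `marginalize`, `to_pdf` and `prob_at_most` are hypothetical, and `prob_at_most` is just one example of a user query. The demonstration uses the input histogram of the second-level call listed in the Examples, only to show the mechanics; it is not the resulting distribution 1900.

```python
from collections import Counter


def marginalize(hist, target):
    """Drop every feature except target, summing counts per target value."""
    totals = Counter()
    for count, feats in hist:
        totals[feats[target]] += count
    return dict(totals)


def to_pdf(counts):
    """Normalize counts so that the probabilities sum to 1."""
    total = sum(counts.values())
    return {value: c / total for value, c in counts.items()}


def prob_at_most(pdf, threshold):
    """Example query: probability that the target is <= threshold."""
    return sum(p for value, p in pdf.items() if value <= threshold)


# Input histogram of the second-level call, used only to show the mechanics
hist = [
    (10, {"lead_time": 50, "rt_containers": 20}),
    (20, {"lead_time": 50, "rt_containers": 80}),
    (25, {"lead_time": 100, "rt_containers": 20}),
    (20, {"lead_time": 100, "rt_containers": 80}),
]
counts = marginalize(hist, "lead_time")  # {50: 30, 100: 45}
pdf = to_pdf(counts)                     # {50: 0.4, 100: 0.6}
print(prob_at_most(pdf, 60))             # 0.4
```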
Supply Chain Logistics
In the context of supply chain management for various industries, such as oil and gas exploration and production, and health care, there are usually thousands of types of materials to be delivered, taking into account a large number of sub-processes. There are multiple policies within each sub-process, and dealing with all combinations poses a huge combinatorial problem. Creation of detailed simulation models in these contexts is very time consuming. In addition, simulations that cover all the most likely scenarios might need to generate multiple terabytes of data and take days to be generated and analyzed.
By using the techniques described herein, results can be estimated for scenarios that have never been simulated orders of magnitude more quickly than by executing complete simulations. In addition, even when there is no complete simulation model, results can still be obtained by combining partial results. The resulting probability distribution function of the target variable tends to provide much better analytical and predictive insights when compared to simple reports based on historical data, such as global averages or other statistical measurements, because it is generated based on results of the specific scenario the user wants to query.
Scientific and Engineering Workflows
In scientific and engineering workflows, various simulation models are usually executed and chained as part of the same workflow, whose execution can take many weeks. By using the disclosed methods, previous executions of the same workflow in other scenarios can be used to answer queries in a timely manner when it is not viable to execute the complete workflow.
In addition, partial results of the execution of a workflow can be used to predict the final result. This can be useful, in particular, when some level of user steering is necessary to verify whether the parameters of the simulations are correct, so that predicted results make sense. If a problem is detected, processes can be interrupted earlier and restarted correctly.
Conclusion
In complex domains, such as supply chain management, industrial process optimization and many others from scientific and engineering areas, the simulation of scenarios to accurately obtain probability distributions of target features is usually necessary to support decision making. Very often, such simulations take a long time to compute and generate massive data sets to be analyzed. In addition, a unified simulation model may not be available for the whole process.
One or more embodiments of the invention generate results for the simulation of new scenarios when there is not enough time to perform complete simulations. In addition, at least one embodiment of the invention supports decisions when there is no unified simulation model available, but there are massive heterogeneous simulation results (or historical data) from different parts of a process. A massively parallel method is optionally performed for the automatic combination of large volumes of simulation results and other heterogeneous data sources based on user-defined hypotheses about the relationships between sub-processes. Such a combination allows the user to extrapolate available results in order to quickly obtain probability distributions of target variables in new scenarios, even when there is no unified simulation model available. The disclosed method substantially guarantees the coherence of the probability distributions with such hypotheses. In this way, the better the hypotheses are, the closer the obtained distributions are to what would be observed in the real world or provided by complete simulations of scenarios.
The foregoing applications and associated embodiments should be considered as illustrative only, and numerous other embodiments can be configured using the techniques disclosed herein, in a wide variety of different applications.
It should also be understood that the techniques for combining results of previous simulations of portions of a simulated process, as described herein, can be implemented at least in part in the form of one or more software programs stored in memory and executed by a processor of a processing device such as a computer. As mentioned previously, a memory or other storage device having such program code embodied therein is an example of what is more generally referred to herein as a “computer program product.”
The disclosed techniques for combining results of previous simulations of portions of a simulated process may be implemented using one or more processing platforms. One or more of the processing modules or other components may therefore each run on a computer, storage device or other processing platform element. A given such element may be viewed as an example of what is more generally referred to herein as a “processing device.”
Referring now to
Referring now to
The cloud infrastructure 2200 may encompass the entire given system or only portions of that given system, such as one or more of the clients, servers, controllers, or computing devices in the system.
Although only a single hypervisor 2204 is shown in the embodiment of
An example of a commercially available hypervisor platform that may be used to implement hypervisor 2204 and possibly other portions of the system in one or more embodiments of the invention is the VMware® vSphere™, which may have an associated virtual infrastructure management system, such as the VMware® vCenter™. The underlying physical machines may comprise one or more distributed processing platforms that include storage products, such as VNX™ and Symmetrix VMAX™, both commercially available from EMC Corporation of Hopkinton, Mass. A variety of other storage products may be utilized to implement at least a portion of the system.
In some embodiments, the cloud infrastructure additionally or alternatively comprises a plurality of containers implemented using container host devices. For example, a given container of cloud infrastructure illustratively comprises a Docker container or other type of Linux Container (LXC). The containers may be associated with respective tenants of a multi-tenant environment of the system, although in other embodiments a given tenant can have multiple containers. The containers may be utilized to implement a variety of different types of functionality within the system. For example, containers can be used to implement respective compute nodes or cloud storage nodes of a cloud computing and storage system. The compute nodes or storage nodes may be associated with respective cloud tenants of a multi-tenant environment of the system. Containers may be used in combination with other virtualization infrastructure, such as virtual machines implemented using a hypervisor.
Another example of a processing platform is processing platform 2300 shown in
The processing device 2302-1 in the processing platform 2300 comprises a processor 2310 coupled to a memory 2312. The processor 2310 may comprise a microprocessor, a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other type of processing circuitry, as well as portions or combinations of such circuitry elements. The memory 2312, which may be viewed as an example of a “computer program product” having executable computer program code embodied therein, may comprise random access memory (RAM), read only memory (ROM) or other types of memory, in any combination.
Also included in the processing device 2302-1 is network interface circuitry 2314, which is used to interface the processing device with the network 2304 and other system components, and may comprise conventional transceivers.
The other processing devices 2302 of the processing platform 2300 are assumed to be configured in a manner similar to that shown for processing device 2302-1 in the figure.
Again, the particular processing platform 2300 shown in the figure is presented by way of example only, and the given system may include additional or alternative processing platforms, as well as numerous distinct processing platforms in any combination, with each such platform comprising one or more computers, storage devices or other processing devices.
Multiple elements of system may be collectively implemented on a common processing platform of the type shown in
As is known in the art, the methods and apparatus discussed herein may be distributed as an article of manufacture that itself comprises a computer readable medium having computer readable code means embodied thereon. The computer readable program code means is operable, in conjunction with a computer system, to carry out all or some of the steps to perform the methods or create the apparatuses discussed herein. The computer readable medium may be a tangible recordable medium (e.g., floppy disks, hard drives, compact disks, memory cards, semiconductor devices, chips, application specific integrated circuits (ASICs)) or may be a transmission medium (e.g., a network comprising fiber-optics, the world-wide web, cables, or a wireless channel using time-division multiple access, code-division multiple access, or other radio-frequency channel). Any medium known or developed that can store information suitable for use with a computer system may be used. The computer-readable code means is any mechanism for allowing a computer to read instructions and data, such as magnetic variations on a magnetic medium or height variations on the surface of a compact disk.
Also, it should again be emphasized that the above-described embodiments of the invention are presented for purposes of illustration only. Many variations and other alternative embodiments may be used. For example, the disclosed techniques are applicable to a wide variety of other types of communication systems, storage systems and processing devices. Accordingly, the particular illustrative configurations of system and device elements detailed herein can be varied in other embodiments. These and numerous other alternative embodiments within the scope of the appended claims will be readily apparent to those skilled in the art.