Computing devices may generate new data based on stored data. For example, a computing device may store a database that includes sales data for a variety of products over a period of time. The computing device may generate new data by calculating an average sale price for each product.
In some cases, a database or other type of data source may be distributed across a number of computing devices. For example, a first portion of a database that stores sales at a first store location may be stored on a local storage of a first computing device and a second portion of the database that stores sales at a second store location may be stored on a local storage of a second computing device. To generate new data, the second portion of the database may be sent to the first computing device and stored on the local storage of the first computing device. The first computing device may then calculate the average sale price for each product across the database using the first portion and the second portion of the database stored on its local storage.
In one aspect, a computing device of a data zone in accordance with one or more embodiments of the invention includes a persistent storage and a processor. The persistent storage includes a data source. The processor obtains a global computation request, instantiates a global computation based on the global computation request, and instantiates an intermediate computation in a second data zone based on the instantiated global computation.
In one aspect, a method of operating a computing device of a data zone in accordance with one or more embodiments of the invention includes obtaining, by the computing device, a global computation request; instantiating, by the computing device, a global computation based on the global computation request; and instantiating, by the computing device, an intermediate computation in a second data zone based on the instantiated global computation.
In one aspect, a non-transitory computer readable medium in accordance with one or more embodiments of the invention includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for operating a computing device of a data zone, the method including obtaining, by the computing device, a global computation request; instantiating, by the computing device, a global computation based on the global computation request; and instantiating, by the computing device, an intermediate computation in a second data zone based on the instantiated global computation.
Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.
Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the invention. It will be understood by those skilled in the art that one or more embodiments of the present invention may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the invention. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description.
In the following description of the figures, any component described with regard to a figure, in various embodiments of the invention, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments of the invention, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure.
In general, embodiments of the invention relate to systems, devices, and methods for performing computations using distributed data sources. More specifically, the systems, devices, and methods may enable computations to be performed across data sources residing in any number of data zones. In one or more embodiments of the invention, a global computation may be broken down into any number of intermediate computations for execution in any number of data zones. As used herein, an intermediate computation is any computation generated to service a global computation.
In one or more embodiments of the invention, an intermediate computation may generate a computation result based on only local data of a data zone in which the intermediate computation is being performed. In one or more embodiments of the invention, an intermediate computation may generate a result based on local data of a data zone in which the intermediate computation is being performed and local data of a second data zone. To obtain the local data, or a result based on the local data in the second data zone, the intermediate computation may instantiate a second global computation in the second data zone. Thus, instantiation of a global computation may result in the recursive instantiation of any number of computations in any number of data zones.
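By way of a non-limiting illustration, the recursive instantiation described above may be sketched in Python as follows. The `Zone` class, its `depends_on` field, the `instantiate` function, and the cycle guard are hypothetical and are assumed only for this sketch; they are not structures defined by the embodiments.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical stand-in for a data zone (illustration only)."""
    name: str
    local_values: list[float]
    # Names of other zones whose data this zone's computation also needs.
    depends_on: list[str] = field(default_factory=list)


def instantiate(zone_name, zones, visited=None):
    """Return (total, count) for a zone and every zone it depends on.

    Each recursive call models instantiating a further computation in
    another zone; only the (total, count) pair crosses a zone boundary.
    """
    visited = set() if visited is None else visited
    if zone_name in visited:  # assumed guard against cyclic dependencies
        return 0.0, 0
    visited.add(zone_name)
    zone = zones[zone_name]
    total, count = sum(zone.local_values), len(zone.local_values)
    for other in zone.depends_on:
        sub_total, sub_count = instantiate(other, zones, visited)
        total += sub_total
        count += sub_count
    return total, count


zones = {
    "A": Zone("A", [10.0, 20.0], depends_on=["B"]),
    "B": Zone("B", [30.0], depends_on=["C"]),
    "C": Zone("C", [40.0, 50.0]),
}
total, count = instantiate("A", zones)
average = total / count
```

In this sketch, the computation in zone A triggers a computation in zone B, which in turn triggers one in zone C, so a single request fans out across three zones.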
In one or more embodiments of the invention, instantiated intermediate computations may generate computation results that are used by other computations to generate a second computation result. For example, the results of the intermediate computations may be aggregated in a single data zone and a global computation result may be generated using the aggregated intermediate computations.
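As one hedged example of such aggregation, the Python sketch below merges per-zone histograms into a single histogram. The function names, the bin width, and the sample values are assumptions made for illustration only.

```python
from collections import Counter


def local_histogram(values, bin_width):
    """Intermediate computation: bin the values held in one data zone."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def merge_histograms(partials):
    """Global computation: add the per-zone bin counts together."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


# Hypothetical sale prices held in two different data zones.
zone_a_prices = [3, 7, 12, 18]
zone_b_prices = [5, 11, 25]

partials = [
    local_histogram(zone_a_prices, 10),
    local_histogram(zone_b_prices, 10),
]
global_histogram = merge_histograms(partials)
```

Only the bin counts, not the underlying prices, need to be gathered in the zone that produces the global result.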
In one or more embodiments of the invention, an intermediate computation instantiated to service a global computation may be a second global computation. For example, a first global computation may use, as input, the result of a second global computation. To service the first global computation, the second global computation may be instantiated as an intermediate computation of the first global computation.
As used herein, a data zone is any collection of computing devices that are logically demarcated from all other computing devices. For example, a data zone may be a cloud computing environment. The cloud computing environment may utilize the computing resources of a number of computing devices.
As used herein, a data source refers to any data of any type stored in any format on a computing device and/or a storage device of a data zone.
In one or more embodiments of the invention, a portion of the data sources may be locked to a data zone. As used herein, a data source that is locked to a data zone means a data source, or a portion thereof, may not be transmitted to computing devices that are not members of the data zone. For example, a cloud computing environment may host a medical record on a non-transitory storage of a computing device of the cloud computing environment. Access restrictions associated with medical records may lock the data to the cloud computing environment and prevent the medical records from being sent to a computing device of a different cloud computing environment.
The clients (100) may be computing devices. The computing devices may be, for example, mobile phones, tablet computers, laptop computers, desktop computers, servers, computing clusters, or cloud computing systems. The computing devices may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application and/or the method illustrated in
The clients (100) may issue global computation requests to the data zones (110). A global computation request may request a computation result for data sources in any number of data zones (110). A global computation request may include: (i) a description of the computation to be performed and (ii) an identifier of the client so that the global computation result may be provided to the client specified by the identifier. For additional details regarding global computation requests, See
In one or more embodiments of the invention, the clients (100) may have access to a map (not shown) that specifies the data sources in the data zones (110). In one or more embodiments of the invention, the map may be a data structure that specifies the aforementioned information. The map may be stored on a non-transitory computer readable storage medium of any of the clients (100) or another computing device operably connected to the clients (100).
In one or more embodiments of the invention, the clients (100) may utilize the map to generate global computation requests. For example, the clients (100) may select a type of computation to be performed using the data sources specified by the map.
As discussed above, the clients (100) may send global computation requests to data zones (110). The data zones (110) may collaboratively perform computations to obtain computation results requested by the clients (100). More specifically, a data zone may issue intermediate computations to be performed by various data zones. The issued intermediate computations may generate results based on: (i) data stored in the data zone in which the intermediate computation is performed and/or (ii) data stored in data zones in which the intermediate computation will not be performed. In turn, the results of the intermediate computations may be used to generate a global computation result. In a case where an intermediate computation generates a result based on data stored in data zones in which the intermediate computation will not be performed, the intermediate computation may issue additional global computations to the respective zones in which the data is stored to obtain results from the respective zones that are used to compute the intermediate computation result.
In one or more embodiments of the invention, the computing resources of the data zones (110) are computing devices. The computing devices may be, for example, mobile phones, tablet computers, laptop computers, desktop computers, servers, computing clusters, or cloud computing systems. The computing devices may include one or more processors, memory (e.g., random access memory), and persistent storage (e.g., disk drives, solid state drives, etc.). The persistent storage may store computer instructions, e.g., computer code, that when executed by the processor(s) of the computing device cause the computing device to perform the functions described in this application and/or the methods illustrated in
In one or more embodiments of the invention, the computing resources of a first data zone are geographically separated from the computing resources of a second data zone. For example, a first data zone may be located in the US and the second data zone may be located in Canada.
In one or more embodiments of the invention, the computing resources of a first data zone are located adjacent to the computing resources of a second data zone. For example, the first and second data zone may include computing resources of a single computing cluster that are logically, rather than physically, separated.
Performing intermediate computations in the data zones that host the data sources on which the intermediate computations operate may reduce the computing resource cost of performing a global computation when compared to aggregating the data sources in a single data zone and performing the intermediate/global computations in that single data zone. For example, performing an intermediate computation in a data zone may produce an intermediate computation result that is much smaller than the data on which the intermediate computation result is based. Thus, sending the intermediate computation result, rather than the data on which it is based, may consume fewer computing resources. In one or more embodiments of the invention, each data zone may be a logical grouping of computing resources that stores data locked to the computing resources. The computing resources may be orchestrated to give rise to the functionality of the data zone described throughout this application.
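The size difference between a data source and an intermediate computation result may be illustrated with the following minimal Python sketch. The JSON encoding, the variable names, and the sample data are assumptions chosen for illustration; the embodiments do not prescribe any particular serialization.

```python
import json

# Hypothetical sale prices held in one data zone (illustrative values only).
local_prices = [float(p) for p in range(1, 10001)]

# Option 1: ship the data source itself to another zone.
raw_payload = json.dumps(local_prices)

# Option 2: run the intermediate computation locally and ship only its result.
intermediate_result = {"total": sum(local_prices), "count": len(local_prices)}
result_payload = json.dumps(intermediate_result)

raw_size = len(raw_payload.encode("utf-8"))
result_size = len(result_payload.encode("utf-8"))
```

Here `result_size` stays roughly constant as the data source grows, whereas `raw_size` grows with the number of stored values.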
In one or more embodiments of the invention, each data zone may store data, i.e., a data source that is locked to the data zone. As used herein, data that is locked to a data zone means data that may not be transmitted to computing resources that are not part of the logical grouping of computing resources defined by the data zone. Data may be locked to a data zone for any reason. For example, data may be locked to a data zone due to privacy concerns. In another example, data may be locked to a data zone due to the size of the data. In a further example, data may be locked to a data zone due to a restriction imposed on the data by an owner of the data. The data may be locked to a data zone due to other restrictions/reasons without departing from the invention.
In one or more embodiments of the invention, the data zones (110) may be organized as a logical network. In other words, each of the data zones may be a node of the logical network and/or computing network. To perform computations, computation requests from clients and/or data zones may be distributed via the logical network.
In one or more embodiments of the invention, each data zone may include a map of the logical network. The map may specify: (i) the topology of the network, (ii) the computing resources available to each data zone, and (iii) the data stored by each data zone. The map may include more, different, and/or less information without departing from the invention.
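One possible, non-limiting representation of such a map is sketched below in Python. The `ZoneEntry` class, its field names, the zone names, and the `reachable` helper are hypothetical and are introduced only for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneEntry:
    neighbors: tuple[str, ...]     # (i) topology: directly connected zones
    cpu_cores: int                 # (ii) computing resources (assumed unit)
    data_sources: tuple[str, ...]  # (iii) identifiers of stored data sources


network_map = {
    "zone-us": ZoneEntry(("zone-ca",), 64, ("sales_us",)),
    "zone-ca": ZoneEntry(("zone-us", "zone-eu"), 32, ("sales_ca",)),
    "zone-eu": ZoneEntry(("zone-ca",), 16, ("sales_eu", "returns_eu")),
}


def reachable(network_map, start):
    """Breadth-first walk of the topology; returns zones in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        name = queue.popleft()
        order.append(name)
        for neighbor in network_map[name].neighbors:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


route_order = reachable(network_map, "zone-us")
```

A data zone holding this map could use the visit order to decide through which zones a computation request may be distributed.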
In one or more embodiments of the invention, the data zones (110) may send global and intermediate computation requests to other data zones to service global computation requests from clients and/or intermediate computations being performed to service global computation requests from clients. As used herein, an intermediate computation request refers to a request by a data zone to service a global computation request from a client or to service a global or an intermediate computation used to service a global computation request from a client. As noted above, each data zone may host one or more data sources. A computation request from a client may require performing computations on data sources stored in different data zones. To service a global computation request, the data zones may analyze the request and send appropriate intermediate computation requests to other data zones. Additionally, to service global and intermediate computation requests from other data zones, additional intermediate computation requests may be sent to additional data zones. Thus, servicing a global computation request may cause global and intermediate computation requests to be recursively sent to any number of data zones.
To further clarify the data zones (110),
In one or more embodiments of the invention, the data zone (120) is a logical computing device that utilizes the physical computing resources of one or more computing devices to provide the functionality of the data zone (120) described throughout this application and/or to perform the methods illustrated in
In one or more embodiments of the invention, the data zone (120) includes computing resources that provide processing (e.g., computations provided by a processor), memory (e.g., transitory storage provided by RAM), and persistent storage (e.g., non-transitory storage provided by a hard disk drive) by utilizing the physical computing resources of the computing devices of the data zone (120). In one or more embodiments of the invention, the data zone (120) may include instructions stored on a persistent storage of a computing device of the data zone that, when executed by a processor of the data zone, provide the functionality of the data zone (120) described throughout this application and/or the methods illustrated in
In one or more embodiments of the invention, the computing devices utilized by the data zone (120) are operably connected to each other and/or operably connected to computing devices of other data zones. For example, each of the computing devices of the data zone (120) may include a network interface that enables packets to be sent via a network to other computing devices of the data zone (120) or other data zones.
To provide the aforementioned functionality of the data zone (120), the data zone (120) may include a computation manager (121) that instantiates/manages instances of computation frameworks (124), including computation frameworks (124A) through (124N), executing using computing resources of the data zone (120), a map (122) that specifies the locations/types of data stored in data zones, a template library (123) used to instantiate computations, data sources (128), including data sources (128A) through (128N), stored using computing resources of the data zone (120), and computing results stored as computation results (130), including global computation results (130A) and intermediate computation results (130B). Each component of the data zone (120) is discussed below.
In one or more embodiments of the invention, the computation manager (121) responds to upstream computation requests. The computation manager (121) may respond to the upstream computation requests by instantiating computation frameworks (124). The computation frameworks (124) may generate computation results (130) specified by the upstream computation requests.
As used herein, an upstream computation request refers to any computation request received from another data zone or client. In one or more embodiments of the invention, an upstream computation request is a global computation request sent from a client. In one or more embodiments of the invention, the upstream computation request is an intermediate computation request generated by a computing device of another data zone.
As used herein, an intermediate computation request refers to a computation request generated by a data zone. The intermediate computation requests may be generated by computation frameworks, as will be discussed in greater detail with respect to
As used herein, instantiating a computation framework means to start one or more processes that perform the functionality of a computation framework as will be discussed in greater detail with respect to
In one or more embodiments of the invention, the computation manager (121) is implemented as one or more processes executing using computing resources of the data zone (120) based on computer instructions stored on a non-transitory computer readable medium. The computer instructions, when executed using processing computing resources of the data zone (120), cause computing device(s) of the data zone (120) to perform the functions of the computation manager (121) and/or all or a portion of the methods illustrated in
In one or more embodiments of the invention, the computation frameworks (124) may service upstream computation requests. The computation frameworks (124) may service the upstream requests by generating computation results (130) and/or providing generated computation results (130) to the requesting entity. In one or more embodiments of the invention, the computation results (130) may be stored in a cache of the data zone (120). For additional details regarding the computation frameworks (124), See
The data sources (128) and computation results (130) may be data stored using computing resources of the data zone (120). The data zone (120) may store additional, different types, and/or less data without departing from the invention. Each type of the aforementioned data is discussed below.
In one or more embodiments of the invention, each data source (128A, 128N) of the data sources (128) is data stored in the data zone (120) that may not be transmitted to computing devices that are not a part of the data zone (120). As discussed above, the data sources (128) may not be transmitted to computing devices that are not a part of the data zone (120) for any reason without departing from the invention. For example, a data source may include private data that is restricted from being transmitted outside of the data zone (120). In another example, the data source may include data that is prohibitively expensive to transmit to another data zone. In a further example, not transmitting the data of the data source to another data zone may save bandwidth cost, reduce the likelihood of the data source being intercepted or otherwise exposed when transmitting the data source, and/or reduce the computation cost of performing a computation by having the computation be performed in a data zone that provides lower cost computations. The data sources (128) may be used, in part, by computation frameworks (124) to generate computation results (130).
In one or more embodiments of the invention, the data sources (128) have varying formats. For example, a first data source may be in a database format while a second data source may be in a table format. Some of the data sources may have the same format without departing from the invention.
In one or more embodiments of the invention, the data sources (128) may be dynamic. In other words, the content of each data source may be changing over time. For example, a source may include data from a sensor being streamed to a computing device of the data zone (120).
In one or more embodiments of the invention, each data source may have an associated identifier (not shown). The identifier may associate the data source, or a portion thereof, with one or more other data sources, or portions thereof, stored in the data zone (120) and/or other data zones. The identifier may be, for example, a time stamp, a data source, an identifier of a data zone in which the data source is stored, a data format, a data type, a size of the data source, or another characteristic. In one or more embodiments of the invention, the identifier of a data source may be stored as metadata associated with the data source.
In one or more embodiments of the invention, the computation results (130) may be result(s) of computations performed by the computation frameworks (124). In one or more embodiments of the invention, the computation results (130) may be able to be transmitted to computing devices of other data zones, in contrast to the locked data sources (128) which cannot be transmitted to computing devices of other data zones.
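The contrast between locked data sources and transmittable computation results may be sketched as follows. This is a minimal Python illustration under stated assumptions: the `DataZone` class, its methods, the `LockedDataError` exception, and the sample records are all hypothetical.

```python
class LockedDataError(Exception):
    """Raised when raw locked data would leave its data zone."""


class DataZone:
    def __init__(self, name, records):
        self.name = name
        self._records = list(records)  # locked data source

    def export_records(self, destination_zone):
        # Raw data may only be handed to a member of the same zone.
        if destination_zone != self.name:
            raise LockedDataError(f"records are locked to {self.name}")
        return list(self._records)

    def export_result(self, computation):
        # A computation result, unlike the data source, may be transmitted.
        return computation(self._records)


hospital = DataZone("cloud-1", [72, 80, 65])
mean_value = hospital.export_result(lambda rows: sum(rows) / len(rows))

# Attempting to move the raw records to another zone is refused.
try:
    hospital.export_records("cloud-2")
    transfer_blocked = False
except LockedDataError:
    transfer_blocked = True
```

The sketch does not verify that a result reveals nothing about the locked data; any such check is outside what is shown here.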
The computation results (130) may include global computation results (130A) that reflect the result of performing global computations and intermediate computation results (130B) that reflect the result of performing intermediate computations.
In one or more embodiments of the invention, each computation result may have an associated identifier. The result identifier, much like an identifier of a data source, may associate the computation result with one or more computation results stored in the data zone (120) and/or other data zones. In one or more embodiments of the invention, the result identifier may associate the computation result with one or more data sources stored in the data zone (120) and/or other data zones. The result identifier may associate the computation result with any number of data sources and/or computation results stored in the data zone (120) and/or other data zones without departing from the invention. The result identifier may be, for example, a time stamp, a data source from which the result was generated, an identifier of a data zone in which the computation result is stored, a data format of the computation result, a data type of the computation result, a size of the computation result, or another characteristic of the computation result. In one or more embodiments of the invention, the result identifier of a computation result may be stored as metadata associated with the computation result.
In one or more embodiments of the invention, the result identifier may include multiple time stamps that may specify one or more of the following: the time at which the computation result was generated, the time at which the computation result was stored in a cache, and the time at which a computation that generated the computation result was instantiated. The result identifier may include additional and/or fewer time stamps that specify other characteristics of the computational result without departing from the invention.
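A result identifier carrying several time stamps may, for example, be represented as in the following Python sketch. The `ResultIdentifier` class, its field names, and the `results_for_source` helper are assumptions made for illustration and are not part of the described embodiments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ResultIdentifier:
    instantiated_at: datetime    # when the computation was instantiated
    generated_at: datetime       # when the result was generated
    cached_at: datetime          # when the result was stored in a cache
    source_ids: tuple[str, ...]  # data sources the result was generated from
    zone_id: str                 # data zone in which the result is stored


def results_for_source(identifiers, source_id):
    """Return the identifiers of results associated with a data source."""
    return [i for i in identifiers if source_id in i.source_ids]


start = datetime(2024, 1, 1, 12, 0, 0)
ident = ResultIdentifier(
    instantiated_at=start,
    generated_at=start + timedelta(seconds=5),
    cached_at=start + timedelta(seconds=6),
    source_ids=("sales_us",),
    zone_id="zone-us",
)
compute_seconds = (ident.generated_at - ident.instantiated_at).total_seconds()
```

Stored as metadata, such an identifier lets a data zone relate a computation result to the data sources and time frame from which it was produced.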
As discussed above, the computation frameworks (124) may generate computation results (130).
In one or more embodiments of the invention, the computation manager (152) instantiates: (i) a global or intermediate computation (154), (ii) local intermediate computation managers (156), and/or (iii) remote intermediate computation managers (160). The aforementioned computations and/or managers may be instantiated by the computation manager (152) to service an upstream computation request which triggered the instantiation of the computation framework (150). Instantiating the framework (150) may include instantiating the computation manager (152).
In one or more embodiments of the invention, the computation manager (152) may instantiate a global computation or an intermediate computation based on a requested computation result specified by a computation request. For example, a global computation request received from a client may result in a global computation being instantiated. In another example, an intermediate computation request received from another data zone may result in an intermediate computation being instantiated.
In one or more embodiments of the invention, the global/intermediate computation (154) may be instantiated based on a template. The template may be part of a template library that includes any number of templates. For additional details regarding templates, See
In one or more embodiments of the invention, the global computation (154) may generate a global computation result using: (i) local intermediate computation results generated by the local intermediate computations (158) and/or (ii) global or local intermediate computation results generated by other data zones.
In one or more embodiments of the invention, the global/intermediate computation result may be stored as a computation result after being generated. In one or more embodiments of the invention, the global computation result may be sent to a requesting entity.
In one or more embodiments of the invention, the local intermediate computation managers (156) may be instantiated by the computation manager (152). The local intermediate computation managers (156) may instantiate local intermediate computations (158) to generate local intermediate computation results used by the global/intermediate computation (154) to generate a global computation result.
In one or more embodiments of the invention, the local intermediate computation managers (156) may instantiate local intermediate computations (158) based on a corresponding input to the global/intermediate computation (154). For example, when a global computation is instantiated, it may be based on a template. The resulting global computation may take, as input, one or more intermediate computation results. The template, on which the global computation is based, may also provide intermediate computation prototypes on which the intermediate computations are based.
Templates may be selected based on a computation type implicated by a global computation request. As used herein, a computation type refers to a method of processing a data set that generates a desired result. The data set may be, for example, a data source. The desired result may be, for example, an average, a standard deviation, a histogram, etc.
In one or more embodiments of the invention, the remote intermediate computation managers (160) may instantiate computation frameworks in other data zones. The instantiated computation frameworks may generate a global computation result or an intermediate computation result and provide the aforementioned computation result to the corresponding remote intermediate computation manager.
In one or more embodiments of the invention, the frameworks in other data zones may be instantiated based on a set of criteria, including: (i) the availability of a data source in the data zone and (ii) the data sources implicated by the global computation request.
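The two criteria above may be applied, for instance, as in the following Python sketch. The function name `plan_remote_frameworks`, its parameters, and the zone and data source names are hypothetical and serve only to illustrate the selection.

```python
def plan_remote_frameworks(requested_sources, sources_by_zone, local_zone):
    """Map each remote zone to the requested data sources it hosts.

    A remote zone is selected only if a data source is (i) available in
    that zone and (ii) implicated by the global computation request.
    """
    plan = {}
    for zone, hosted in sources_by_zone.items():
        if zone == local_zone:
            continue  # local data is left to local intermediate computations
        needed = sorted(set(requested_sources) & set(hosted))
        if needed:
            plan[zone] = needed
    return plan


sources_by_zone = {
    "zone-us": ["sales_us"],
    "zone-ca": ["sales_ca", "returns_ca"],
    "zone-eu": ["sales_eu"],
}
plan = plan_remote_frameworks(
    ["sales_us", "sales_ca"], sources_by_zone, local_zone="zone-us"
)
```

In this example only one remote zone hosts a requested data source, so a computation framework would be instantiated in that zone alone.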
In one or more embodiments of the invention, the computation manager (152), global computation (154), local intermediate computation managers (156), local intermediate computations (158), and remote intermediate computation managers (160) are implemented as computer instructions, e.g., computer code, stored on a non-transitory storage that is executed using processing resources of the data zone (120,
To further clarify aspects of the invention,
The client identifier (202) may be an identifier of the client to which a result of the computation specified by the global computation request (200) is to be returned. In one or more embodiments of the invention, the client identifier (202) is an identifier of the client that generated the global computation request (200). In one or more embodiments of the invention, the client identifier (202) is a media access control address of the client that generated the global computation request (200). The client identifier (202) may be a media access control address of a client that did not generate the global computation request (200) without departing from the invention.
The computation description (204) may be a description of the global computation result desired by the requesting entity. For example, the computation description (204) may indicate that an average of a number of values stored in various data zones is being requested by the requesting entity. The computation description (204) may indicate any type of computation without departing from the invention.
The data zone identifier (212) may be an identifier of the data zone to which a result of the computation specified by the intermediate computation request (210) is to be returned. In one or more embodiments of the invention, the data zone identifier (212) is an identifier of a data zone that is performing a global or intermediate computation that will use the result of the requested computation as an input. In one or more embodiments of the invention, the data zone identifier (212) is a media access control address of a computing device of the data zone that generated the intermediate computation request (210). The data zone identifier (212) may be a media access control address of a computing device of a data zone that did not generate the intermediate computation request (210) without departing from the invention.
The computation description (214) may be a description of the intermediate computation result desired by the requesting entity. For example, the computation description (214) may indicate that an average of a number of values stored in the data zone is being requested by the requesting entity. The computation description (214) may indicate any type of computation without departing from the invention.
The data source identifier(s) (216) may identify data sources of the data zone and/or other data zones to be used as input for the computation to be performed. The data source identifier (216) may be, for example, a file name, a type of data, a file type, or any other characteristic of stored data.
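The global computation request (200) and the intermediate computation request (210) described above may be modeled, as a minimal sketch, by the Python data classes below. The class names, field names, the `return_address` helper, and the sample values are assumptions for illustration; the description does not prescribe a concrete encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalComputationRequest:
    client_identifier: str        # (202) where the global result is returned
    computation_description: str  # (204) e.g. "average"


@dataclass(frozen=True)
class IntermediateComputationRequest:
    data_zone_identifier: str                 # (212) zone receiving the result
    computation_description: str              # (214) requested computation
    data_source_identifiers: tuple[str, ...]  # (216) inputs to the computation


def return_address(request):
    """Return the identifier to which the computation result is sent."""
    if isinstance(request, GlobalComputationRequest):
        return request.client_identifier
    return request.data_zone_identifier


# Hypothetical example values; the client identifier is shown as a MAC address.
global_request = GlobalComputationRequest("00:1A:2B:3C:4D:5E", "average")
intermediate_request = IntermediateComputationRequest(
    data_zone_identifier="zone-us",
    computation_description="sum_and_count",
    data_source_identifiers=("sales_ca",),
)
```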
The computation type identifier (232) may be an identifier of a computation type that may be instantiated using the template (230). The identifier may be used by a data zone to determine which template from the template library is used to instantiate a global/intermediate computation and/or local or remote global/intermediate computations. For example, a global computation request may include a description of a computation result. The computation result may be matched to an identifier of a template of the template library (220,
In one or more embodiments of the invention, the computation type identifier (232) may be a name of a type of computation. The name may be, for example, an average, a standard deviation, a histogram, etc.
The primary computation prototype (234) may be computer instructions, e.g., computer code. The computer instructions may be modifiable at runtime or at compilation to be linked to data sources to be used as input and to store results at specifiable locations. For example, the data sources may be passed by value or by reference to the computer instructions and a result may be returned by the computer instructions. The computer instructions may generate a result that reflects a specific type of computation, e.g., an average, a standard deviation, a histogram, etc., as requested by a requestor.
In one or more embodiments of the invention, instantiating a primary computation of the computation type associated with the template (230) may be accomplished by generating an instance of the primary computation prototype (234) that operates on data generated by computation(s) based on the secondary computation prototype (236).
The secondary computation prototype (236) may be computer instructions, e.g., computer code. The computer instructions may be modifiable at runtime or at compilation to be linked to data sources to be used as input and to store results at specifiable locations. For example, the data sources may be passed by value or by reference to the computer instructions and a result may be returned by the computer instructions. The computer instructions may generate a result that reflects a specific type of computation, e.g., an average, a standard deviation, a histogram, etc., that is used as an input to a computation instantiated from the primary computation prototype. In other words, the secondary computation prototype (236) may be used to instantiate the computations necessary to generate the results used by the primary computation prototype to generate a computation result.
In one or more embodiments of the invention, instantiating a secondary computation of the computation type associated with the template (230) may be accomplished by generating an instance of the secondary computation prototype (236) that operates on data sources specified by the computation request.
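The relationship between the computation type identifier (232), the primary computation prototype (234), and the secondary computation prototype (236) can be sketched as follows. This is only an illustration under stated assumptions: the `Template` class, the `instantiate` helper, and the "count" computation type are hypothetical names, and the prototypes are modeled as plain Python callables rather than generated executable code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Template:
    """Hypothetical model of a template (230)."""

    computation_type_identifier: str                       # (232)
    primary_prototype: Callable[[List[Any]], Any]          # (234)
    secondary_prototype: Callable[[Iterable[float]], Any]  # (236)


def instantiate(template: Template, data_sources: List[Iterable[float]]) -> Any:
    # One instance of the secondary prototype operates on each data source.
    intermediate_results = [template.secondary_prototype(src) for src in data_sources]
    # The primary prototype operates on the results of the secondary instances.
    return template.primary_prototype(intermediate_results)


# Hypothetical "count" template: count locally, then add up the local counts.
count_template = Template(
    computation_type_identifier="count",
    secondary_prototype=lambda values: sum(1 for _ in values),
    primary_prototype=lambda counts: sum(counts),
)

total = instantiate(count_template, [[1, 2, 3], [4, 5]])
```

In this sketch the primary prototype never touches the data sources directly; it only sees the per-source results.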
In one or more embodiments of the invention, the primary computation prototype (234) may be used to instantiate a global/intermediate computation (154,
To further clarify aspects of templates, three examples of templates are discussed below. Each of the following examples is included for explanatory purposes. The examples include templates for computing a global average, computing a global min/max value, and computing a set of min/max values. Templates may be used to implement other types of computations without departing from the invention. For example, a template may be used to calculate a standard deviation or histograms without departing from the invention.
Example Global Average Template
Intermediate Computation Prototype—
Compute the sum of all values present in a data source, referred to as Sum_i.
Count the number of items summed in the data source, referred to as Count_i.
Return a value pair, referred to as ValPair_i = <Sum_i, Count_i>.
Global Computation Prototype—
Obtain the value pairs generated by all of the intermediate computations and generate a set of value pairs, referred to as ValPairSet_n = {ValPair_1, ValPair_2, . . . , ValPair_n}, which can also be represented by ValPairSet_n = {<Sum_1, Count_1>, <Sum_2, Count_2>, . . . , <Sum_n, Count_n>}.
Calculate the sum of the sums and the sum of the counts; generate another value pair, referred to as ValPair_g = <Sum_g, Count_g>, where Sum_g = Sum_1 + Sum_2 + . . . + Sum_n and Count_g = Count_1 + Count_2 + . . . + Count_n.
Generate the global average by dividing the sum of the sums by the sum of the counts, represented as: Average_global = Sum_g / Count_g.
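The global average template above can be sketched in a few lines. The function names and sample values are assumptions made for illustration; the sketch only mirrors the steps listed in the template and is not the claimed implementation.

```python
from typing import Iterable, List, Tuple

ValPair = Tuple[float, int]  # <Sum_i, Count_i>


def intermediate_average(data_source: Iterable[float]) -> ValPair:
    """Intermediate computation: return the sum and count of one data source."""
    total, count = 0.0, 0
    for value in data_source:
        total += value
        count += 1
    return total, count


def global_average(val_pair_set: List[ValPair]) -> float:
    """Global computation: divide the sum of the sums by the sum of the counts."""
    sum_g = sum(s for s, _ in val_pair_set)
    count_g = sum(c for _, c in val_pair_set)
    if count_g == 0:
        raise ValueError("cannot average an empty set of values")
    return sum_g / count_g


# Two hypothetical data sources located in different data zones.
pairs = [intermediate_average(src) for src in ([1.0, 2.0, 3.0], [10.0])]
average = global_average(pairs)
```

Note that averaging the two local averages (2.0 and 10.0) would give 6.0, whereas the value pairs give the correct result of 4.0; this is why each intermediate computation returns a sum and a count rather than an average.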
Example Global Min/Max Template
Intermediate Computation Prototype—
Compute the minimum and the maximum of all values present in a data source, referred to as Min_i and Max_i.
Return a value pair, referred to as ValPair_i = <Min_i, Max_i>.
Global Computation Prototype—
Obtain the value pairs for all intermediate computations and generate a set of value pairs, referred to as ValPairSet_n = {ValPair_1, ValPair_2, . . . , ValPair_n}, which can also be represented by ValPairSet_n = {<Min_1, Max_1>, <Min_2, Max_2>, . . . , <Min_n, Max_n>}.
Compute the minimum of the minimums and the maximum of the maximums; generate another value pair, referred to as ValPair_g = <min_{i=1 to n} Min_i, max_{i=1 to n} Max_i> = <Min_g, Max_g>.
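A minimal sketch of the global min/max template follows. The function names and sample data are hypothetical.

```python
from typing import Iterable, List, Tuple

MinMaxPair = Tuple[float, float]  # <Min_i, Max_i>


def intermediate_min_max(data_source: Iterable[float]) -> MinMaxPair:
    """Intermediate computation: minimum and maximum of one data source."""
    values = list(data_source)
    if not values:
        raise ValueError("data source is empty")
    return min(values), max(values)


def global_min_max(val_pair_set: List[MinMaxPair]) -> MinMaxPair:
    """Global computation: minimum of the minimums, maximum of the maximums."""
    min_g = min(lo for lo, _ in val_pair_set)
    max_g = max(hi for _, hi in val_pair_set)
    return min_g, max_g


# Three hypothetical data sources located in different data zones.
pairs = [intermediate_min_max(src) for src in ([4, 9, 2], [7, 11], [3])]
result = global_min_max(pairs)
```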
Example Global Min/Max Set Template
Intermediate Computation Prototype—
Compute the set of the m minimum values and the set of the m maximum values present in a data source d_i, referred to as MinSet_i^m and MaxSet_i^m. These sets can also be represented as lists sorted in increasing order, where:
MinSet_i^m = Min_{i,1}, Min_{i,2}, . . . , Min_{i,m}, where Min_{i,k} ≤ Min_{i,j}, ∀ k < j.
MaxSet_i^m = Max_{i,1}, Max_{i,2}, . . . , Max_{i,m}, where Max_{i,k} ≤ Max_{i,j}, ∀ k < j.
Return a value pair, referred to as ValPairSets_i^m = <MinSet_i^m, MaxSet_i^m>.
Global Computation Prototype—
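Obtain the value pairs for all intermediate computations and generate a set of value pairs, referred to as ValPairSets_n^m = {ValPairSets_1^m, ValPairSets_2^m, . . . , ValPairSets_n^m}, which can also be represented by ValPairSets_n^m = {<MinSet_1^m, MaxSet_1^m>, <MinSet_2^m, MaxSet_2^m>, . . . , <MinSet_n^m, MaxSet_n^m>}.
Compute the set of minimums of the minimums and the set of maximums of the maximums, creating another value pair, referred to as ValPairSets_g^m = <MinSet_g^m, MaxSet_g^m>, where:
Let MinAllSet_g^{m×n} = the union of MinSet_1^m, MinSet_2^m, . . . , MinSet_n^m
Let MaxAllSet_g^{m×n} = the union of MaxSet_1^m, MaxSet_2^m, . . . , MaxSet_n^m
Let MinSet_g^m = the minimum m values in MinAllSet_g^{m×n}, sorted in increasing order
Let MaxSet_g^m = the maximum m values in MaxAllSet_g^{m×n}, sorted in increasing order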
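The min/max set template can be sketched with the Python standard library `heapq` module. The function names and sample data are assumptions made for illustration. The sketch pools the per-source sets (at most m×n values) and then selects the m smallest and the m largest values from the pools.

```python
import heapq
from typing import Iterable, List, Tuple

SetPair = Tuple[List[float], List[float]]  # <MinSet, MaxSet>, both sorted increasing


def intermediate_min_max_sets(data_source: Iterable[float], m: int) -> SetPair:
    """Intermediate computation: the m smallest and m largest values of one source."""
    values = list(data_source)
    min_set = heapq.nsmallest(m, values)         # already in increasing order
    max_set = sorted(heapq.nlargest(m, values))  # nlargest is decreasing; re-sort
    return min_set, max_set


def global_min_max_sets(val_pair_sets: List[SetPair], m: int) -> SetPair:
    """Global computation: m smallest of all minimums, m largest of all maximums."""
    min_all = [v for min_set, _ in val_pair_sets for v in min_set]
    max_all = [v for _, max_set in val_pair_sets for v in max_set]
    return heapq.nsmallest(m, min_all), sorted(heapq.nlargest(m, max_all))


# Two hypothetical data sources, with m = 2.
pairs = [intermediate_min_max_sets(src, 2) for src in ([5, 1, 9, 7], [3, 8, 2])]
result = global_min_max_sets(pairs, 2)
```

Each data zone contributes at most 2m values regardless of the size of its data source.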
As discussed above, the data zones (110,
While illustrated as separate methods, each of the methods illustrated in
In Step 300, a data generation request is obtained.
In one or more embodiments of the invention, the data generation request may be obtained from an application executing on a client. In one or more embodiments of the invention, the data generation request is obtained from a second client operably linked to a first client that obtained the data generation request. In one or more embodiments of the invention, the data generation request is obtained from a data zone.
In Step 302, a global computation request is generated based on the obtained data generation request.
In one or more embodiments of the invention, the generated global computation request specifies the requesting entity and a computation to be performed. In one or more embodiments of the invention, the generated global computation request may further specify grouping criteria. The grouping criteria may enable data sources of data zones on which the global computation is to be performed to be identified. In one or more embodiments of the invention, the grouping criteria is a data type. In one or more embodiments of the invention, the grouping criteria is an identifier of one or more data sources. In one or more embodiments of the invention, the request may be the same as the request shown in
In Step 304, the generated global computation request is sent to a data zone.
In Step 306, the requested data is obtained from a data zone. The data zone of Step 306 may be the same as or different from the data zone of Step 304.
The method may end following Step 306.
In Step 400, a global computation request or an intermediate computation request is obtained.
In one or more embodiments of the invention, the global computation request is obtained from a client. In one or more embodiments of the invention, the global/intermediate computation request is obtained from a data zone. In one or more embodiments of the invention, the global computation request may have a format that is the same as the global computation request shown in
In Step 402, a computation framework is instantiated in response to obtaining the global/intermediate computation request.
In one or more embodiments of the invention, instantiating the computation framework includes generating a computation manager (e.g., 152,
In Step 404, a global/intermediate computation of the instantiated computation framework is instantiated based on the obtained global/intermediate computation request.
In one or more embodiments of the invention, the global/intermediate computation is instantiated by the computation manager. To instantiate the global/intermediate computation, the computation manager may identify a computation type specified by the obtained global/intermediate computation request and instantiate the global/intermediate computation using a template that matches the identified computation type.
In one or more embodiments of the invention, the global/intermediate computation may be instantiated using the method illustrated in
In Step 406, local intermediate computation managers and local intermediate computations are instantiated based on the instantiated global/intermediate computation.
In one or more embodiments of the invention, the computation manager may instantiate the local intermediate computation managers. In turn, each of the local computation managers may instantiate a corresponding local intermediate computation based on the template matched in Step 404.
In one or more embodiments of the invention, the instantiated local intermediate computations generate computation results based on data sources of the data zone in which the local intermediate computations are being performed. In one or more embodiments of the invention, the type of local intermediate computation is selected based on the matched template in Step 404.
In one or more embodiments of the invention, the local intermediate computation managers and local intermediate computations are instantiated using the method illustrated in
In Step 408, remote intermediate computation managers are instantiated and additional global/intermediate computations are instantiated on other computing devices based on the instantiated global/intermediate computation.
In one or more embodiments of the invention, the remote intermediate computation managers instantiate the intermediate computations in other data zones. As noted above, the computation manager may instantiate the global/intermediate computation using a template. The template may provide a prototype for the global/intermediate computation and local intermediate computations. The remote intermediate computation managers may receive intermediate computation results from the intermediate computations in other data zones that were instantiated by the remote intermediate computation managers. The aforementioned results may then be used as input to a global/intermediate computation to obtain a global computation result.
In one or more embodiments of the invention, the remote intermediate computation managers and intermediate computations on other data zones may be instantiated using the method illustrated in
The method may end following Step 408.
As seen in Step 408, remote intermediate computation managers may be instantiated in response to either (i) global computation requests or (ii) intermediate computation requests. The remote intermediate computation managers may generate additional global/intermediate computation requests. Thus, the method illustrated in
In Step 410, a computation result type and data sources are identified based on the obtained global/intermediate computation request.
In one or more embodiments of the invention, the computation result type and/or data sources may be specified in the obtained global computation request.
In Step 412, the identified computation result type is matched to one or more templates.
In one or more embodiments of the invention, the one or more templates may be one or more templates of a library of templates. Each template in the library may match one or more computation result types. In one or more embodiments of the invention, each template in the library may match a computation result type that is different from the computation result types matched by the other templates in the library.
In one or more embodiments of the invention, multiple templates in the library match to a computation result type. In a scenario in which multiple templates match a computation result type, the contents of each of the matched templates may be used to generate executable code.
In Step 414, executable code may be generated based on a primary computation prototype of the matched template.
In one or more embodiments of the invention, the generated executable code operates on the data sources identified in Step 410 to generate a global/intermediate result. The data sources may be data sources of data zones or intermediate computation results.
The method may end following Step 414.
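Steps 410 through 414 can be sketched as a lookup in a template library followed by binding the matched primary computation prototype to its inputs. The request keys, the library contents, and the function name below are hypothetical; a Python closure stands in for generated executable code.

```python
from typing import Callable, Dict, List

# Hypothetical template library keyed by computation result type. Only the
# primary computation prototype of each template is modeled here.
TEMPLATE_LIBRARY: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda results: sum(results),
    "max": lambda results: max(results),
}


def instantiate_global_computation(request: dict) -> Callable[[], float]:
    # Step 410: identify the computation result type and the data sources.
    result_type = request["computation_result_type"]
    data_sources = request["data_sources"]
    # Step 412: match the computation result type to a template.
    try:
        primary_prototype = TEMPLATE_LIBRARY[result_type]
    except KeyError:
        raise LookupError(f"no template matches {result_type!r}") from None
    # Step 414: bind the prototype to its inputs (stand-in for code generation).
    return lambda: primary_prototype(data_sources)


computation = instantiate_global_computation(
    {"computation_result_type": "max", "data_sources": [3.0, 8.0, 5.0]}
)
result = computation()
```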
In Step 420, data sources of a data zone are identified based on the global/intermediate computation request.
In Step 422, a local computation manager is instantiated for each identified data source.
In Step 424, a computation type is selected for each local computation manager based on a template.
In one or more embodiments of the invention, the template may be the same template used to facilitate instantiating the global/intermediate computation discussed in
In Step 426, the local computations are instantiated for each data source based on a secondary computation prototype included in the template.
In one or more embodiments of the invention, the intermediate computation template includes executable code. The local computations are instantiated by generating separate instances of the executable code that operate on the respective data sources and each instance generates a separate intermediate computation result.
The method may end following Step 426.
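Steps 420 through 426 can be sketched as one local computation manager per data source, each running its own instance of the secondary computation prototype. The class and function names are hypothetical, and a data zone is modeled as a plain dictionary of named data sources.

```python
from typing import Any, Callable, Dict, List


class LocalComputationManager:
    """Hypothetical manager responsible for a single data source."""

    def __init__(self, data_source_name: str, values: List[float]) -> None:
        self.data_source_name = data_source_name
        self.values = values
        self.computation = None

    def instantiate(self, secondary_prototype: Callable[[List[float]], Any]) -> None:
        # Step 426: a separate instance of the prototype bound to this data source.
        self.computation = lambda: secondary_prototype(self.values)

    def run(self) -> Any:
        return self.computation()


def run_local_computations(
    data_zone: Dict[str, List[float]],
    secondary_prototype: Callable[[List[float]], Any],
) -> Dict[str, Any]:
    # Steps 420 and 422: identify the data sources and create one manager per source.
    managers = [LocalComputationManager(name, values) for name, values in data_zone.items()]
    # Steps 424 and 426: select the computation from the template and instantiate it.
    for manager in managers:
        manager.instantiate(secondary_prototype)
    # Each instance generates a separate intermediate computation result.
    return {manager.data_source_name: manager.run() for manager in managers}


def sum_and_count(values: List[float]):
    return sum(values), len(values)


results = run_local_computations({"sensor_a": [1, 2, 3], "sensor_b": [10, 20]}, sum_and_count)
```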
In Step 430, data sources in other data zones that are implicated by the global/intermediate computation request are identified.
In one or more embodiments of the invention, the data sources in other data zones are identified by matching grouping criteria specified in the global computation request.
In Step 432, a remote intermediate computation manager is instantiated for each identified data source.
In Step 434, a global/intermediate computation is instantiated in another data zone for each data source identified in Step 430. The instantiated computations may be local intermediate computations or global computations, depending on the type of data implicated by the global computation request.
In one or more embodiments of the invention, the intermediate computation is instantiated as a portion of a computation framework executing in the corresponding data zone. In one or more embodiments of the invention, each instantiated computation is based on a template having a computation type identifier that matches the computation type specified by the global/intermediate computation request. In one or more embodiments of the invention, the template is the same template matched in Step 412 of
The method may end following Step 434.
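Because a data zone that receives an intermediate computation request may itself instantiate computations in further data zones, the methods above can form a hierarchy. The following sketch illustrates that recursion for a sum-and-count computation. The `DataZone` class is hypothetical, and an in-process method call stands in for sending a request to another data zone over a network.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataZone:
    """Hypothetical data zone holding local values and links to other data zones."""

    name: str
    local_values: List[float]
    remote_zones: List["DataZone"] = field(default_factory=list)

    def handle_request(self) -> Tuple[float, int]:
        # Local intermediate computation on this zone's own data source.
        results = [(sum(self.local_values), len(self.local_values))]
        # One remote intermediate computation per implicated data zone; each
        # zone handles its request the same way, which yields a hierarchy.
        for zone in self.remote_zones:
            results.append(zone.handle_request())
        # Combine the intermediate results into this zone's own result.
        return sum(s for s, _ in results), sum(c for _, c in results)


zone_c = DataZone("C", [4.0, 6.0])
zone_b = DataZone("B", [1.0], remote_zones=[zone_c])
zone_a = DataZone("A", [2.0, 3.0], remote_zones=[zone_b])

total, count = zone_a.handle_request()
```

Only the value pair returned by each zone crosses a zone boundary in this sketch; the local values themselves are never passed upward.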
In Step 500, intermediate computation results are obtained.
In one or more embodiments of the invention, the intermediate computation results are obtained from other data zones. In one or more embodiments of the invention, elements of the data source used to generate the intermediate computation result cannot be obtained using the intermediate computation result.
In Step 502, a global computation result or an intermediate computation result is generated using the obtained intermediate computation results.
In one or more embodiments of the invention, the global/intermediate computation result is generated by a global/intermediate computation that uses the obtained intermediate computation results as input. In one or more embodiments of the invention, the elements of the data sources used to generate the obtained intermediate computation results cannot be obtained using the global computation result.
In Step 504, the global/intermediate computation result is sent to a requesting entity.
The method may end following Step 504.
To further clarify aspects of the invention, a non-limiting example is shown in
Consider a system, as illustrated in
Consider a scenario where the client (600) sends a global computation request to data zone A (610) requesting an average of all of the sensor data.
In response to the request, data zone A (610) instantiates a framework manager (not shown) associated with the request. The framework manager, in turn, instantiates a global average (612) of the sensor data in data zones A-C as a global computation. To facilitate the execution of the global computation, the framework manager instantiates a local intermediate computation (614) to obtain a sum and count of the sensor data in data zone A (610). Additionally, the framework manager instantiates remote intermediate computation managers (616) for data zones B and C, respectively, because each of data zones B and C includes sensor data.
In data zone B (620) a sum and count (622) of the sensor data computation is instantiated as an intermediate computation. Similarly, in data zone C (630), a sum and count (632) of the sensor data computation is instantiated as an additional intermediate computation.
To facilitate generating a global computation result, the remote intermediate computation managers for data zones B and C (616) obtain the intermediate results from data zones B and C, respectively. The intermediate results obtained from data zones B and C are provided, along with the intermediate result generated in data zone A, to the global computation, i.e., the global average (612). The global computation then generates the global average by summing the sum from each intermediate computation, summing the count from each intermediate computation, and dividing the summed sum by the summed count.
Consider a scenario in which the sum and count of the intermediate computation (614) in data zone A (610) are 20 and 8, respectively, the sum and count of the intermediate computation (622) in data zone B (620) are 5 and 15, respectively, and the sum and count of the intermediate computation (632) in data zone C (630) are 27 and 3, respectively. The global average (612) would be calculated by summing the sums, 20+5+27=52, and summing the counts, 8+15+3=26. The summed sum of 52 is divided by the summed count of 26, resulting in a global average of 2.
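The arithmetic of this scenario can be checked with a short sketch. The dictionary layout and variable names are illustrative only; the sum and count values are those given for data zones A, B, and C.

```python
# Intermediate results as (sum, count) pairs, one per data zone.
intermediate_results = {
    "A": (20, 8),
    "B": (5, 15),
    "C": (27, 3),
}

summed_sum = sum(s for s, _ in intermediate_results.values())    # 20 + 5 + 27
summed_count = sum(c for _, c in intermediate_results.values())  # 8 + 15 + 3
global_average = summed_sum / summed_count
```

Here `summed_sum` is 52, `summed_count` is 26, and `global_average` is 2.0.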
In this example, a global average is calculated by sending a sum and count from each data zone to a single data zone in which the global average is calculated. Doing so reduces the bandwidth requirements, because only a sum and a count are sent from each data zone, and anonymizes the data by not sending the sensor data itself. Rather, intermediate computation results are sent to compute the desired global computation result.
The example ends.
Embodiments of the invention may improve the performance of computations in a network environment by distributing intermediate calculations to be performed by various data zones. A global computation may then be obtained using the intermediate computation results. The aforementioned distribution of computations across the network improves the performance of the computations by: (i) decreasing the communications bandwidth used to perform a global computation, (ii) decreasing disk input-output by reducing the quantity of data, e.g., copying of data sources, stored to perform a global computation, and (iii) spreading the computing load across a greater number of computing devices by performing intermediate computations across multiple data zones.
Still further, embodiments of the invention address the problem of computational resource cost scaling when performing global computations. As the quantity of data implicated by a global computation increases, a centralized approach requires that a single data zone perform more computations, use more disk input-output, and use more bandwidth to move the data to a centralized location, i.e., the single data zone, and to perform computations on the data at that location. Embodiments of the invention reduce the likelihood of a single data zone becoming overwhelmed by attempting to compute a global computation.
While the above discussion highlighted features and/or uses of the invention, embodiments of the invention are not limited to similar uses and are not required to include similar features without departing from the invention. For example, some embodiments of the invention may have different, fewer, or more uses without departing from the invention.
Throughout this application, elements of figures may be labeled as A to N. As used herein, the aforementioned labeling means that the element may include any number of items and does not require that the element include the same number of items as any other element labeled as A to N. For example, a data structure may include a first element labeled as A and a second element labeled as N. This labeling convention means that the data structure may include any number of elements. A second data structure, also labeled as A to N, may also include any number of elements. The number of elements of the first data structure and the number of elements of the second data structure may be the same or different.
One or more embodiments of the invention may be implemented using instructions executed by one or more processors of the data management device. Further, such instructions may correspond to computer readable instructions that are stored on one or more non-transitory computer readable mediums.
While the invention has been described above with respect to a limited number of embodiments, those skilled in the art, having the benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
The present application is a continuation-in-part of U.S. patent application Ser. No. 14/982,341, filed Dec. 29, 2015, now U.S. Pat. No. 10,015,106 and entitled “Multi-Cluster Distributed Data Processing Platform,” and U.S. patent application Ser. No. 14/982,351, filed Dec. 29, 2015, now U.S. Pat. No. 10,270,707 and entitled “Distributed Catalog Service for Multi-Cluster Data Processing Platform,” both of which are incorporated by reference herein in their entirety, and which claim priority to U.S. Provisional Patent Application Ser. No. 62/143,404, entitled “World Wide Hadoop Platform,” and U.S. Provisional Patent Application Ser. No. 62/143,685, entitled “Bioinformatics,” both filed Apr. 6, 2015, and incorporated by reference herein in their entirety. This application also claims the benefit of U.S. Provisional Application Ser. No. 62/436,709, filed Dec. 20, 2016. In accordance with 37 CFR § 1.57(c), the content of provisional application No. 62/436,709 is expressly incorporated by reference.
U.S. Appl. No. 14/982,351 filed in the name of Patricia Gomes Soares Florissi et al., on Dec. 29, 2015 and entitled “Distributed Catalog Service for Multi-Cluster Data Processing Platform.” |
U.S. Appl. No. 15/395,340 filed in the name of Bryan Duerk et al., on Dec. 30, 2016 and entitled “Data-Driven Automation Mechanism for Analytics Workload Distribution.” |
Wikipedia, “Apache Spark,” https://en.wikipedia.org/wiki/Apache_Spark, Apr. 10, 2017, 6 pages. |
U.S. Appl. No. 15/485,843 filed in the name of Patricia Gomes Soares Florissi et al., on Apr. 12, 2017 and entitled “Scalable Distributed In-Memory Computation.” |
U.S. Appl. No. 15/582,743 filed in the name of Patricia Gomes Soares Florissi et al., on Apr. 30, 2017 and entitled “Scalable Distributed In-Memory Computation Utilizing Batch Mode Extensions.” |
M. K. Gardner et al., “Parallel Genomic Sequence-Searching on an Ad-Hoc Grid: Experiences, Lessons Learned, and Implications,” Proceedings of the 2006 ACM/IEEE SC/06 Conference, IEEE Computer Society, 2006, 14 pages. |
A.G. Craig et al., “Ordering of Cosmid Clones Covering the Herpes Simplex Virus Type I (HSV-I) Genome: A Test Case for Fingerprinting by Hybridisation,” Nucleic Acids Research, vol. 18, 1990, pp. 2653-2660. |
T.R. Golub et al., “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring,” Science, vol. 286, Oct. 15, 1999, pp. 531-537. |
D. Singh et al., “Gene Expression Correlates of Clinical Prostate Cancer Behavior,” Cancer Cell, vol. 1, Mar. 2002, pp. 203-209. |
U.S. Appl. No. 15/281,248 filed in the name of Patricia Gomes Soares Florissi et al., on Sep. 30, 2016 and entitled “Methods and Apparatus Implementing Data Model for Disease Monitoring, Characterization and Investigation.” |
P.P. Jayaraman et al., “Analytics-as-a-Service in a Multi-Cloud Environment Through Semantically-Enabled Hierarchical Data Processing,” Software: Practice and Experience, Aug. 2017, pp. 1139-1156, vol. 47, No. 8. |
J.Y.L. Lee et al., “Sufficiency Revisited: Rethinking Statistical Algorithms in the Big Data Era,” The American Statistician, Dec. 15, 2016, 22 pages. |
S. Wang et al., “Genome Privacy: Challenges, Technical Approaches to Mitigate Risk, and Ethical Considerations in the United States,” Annals of the New York Academy of Sciences, Jan. 2017, pp. 73-83, vol. 1387, No. 1. |
K. Xu et al., “Privacy-Preserving Machine Learning Algorithms for Big Data Systems,” IEEE 35th International Conference on Distributed Computing Systems (ICDCS), Jun. 29-Jul. 2, 2015, pp. 318-327. |
X. Wu et al., “Privacy Preserving Data Mining Research: Current Status and Key Issues,” Proceedings of the 7th International Conference on Computational Science, Part III: ICCS 2007, May 2007, pp. 762-772. |
A.P. Kulkarni et al., “Survey on Hadoop and Introduction to YARN,” International Journal of Emerging Technology and Advanced Engineering, May 2014, pp. 82-67, vol. 4, No. 5. |
R.R. Miller et al., “Metagenomics for Pathogen Detection in Public Health,” Genome Medicine, Sep. 20, 2013, 14 pages, vol. 5, No. 81. |
T. Thomas et al., “Metagenomics—a Guide from Sampling to Data Analysis,” Microbial Informatics and Experimentation, Oct. 13, 2012, 12 pages, vol. 2, No. 3. |
E.R. Gansner et al., “A Technique for Drawing Directed Graphs,” IEEE Transactions on Software Engineering, Mar. 1993, pp. 214-230, vol. 19, No. 3. |
J. Leskovec, “Graphs Over Time: Densification Laws, Shrinking Diameters and Possible Explanations,” Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, Aug. 21-24, 2005, pp. 177-187. |
H. Zha et al., “Bipartite Graph Partitioning and Data Clustering,” Proceedings of the Tenth International Conference on Information and Knowledge Management, Oct. 5-10, 2001, pp. 25-32. |
A. Oghabian et al., “Biclustering Methods: Biological Relevance and Application in Gene Expression Analysis,” PLOS One, Mar. 20, 2014, 10 pages, vol. 9, No. 3. |
S. Ryza, “How To: Tune Your Apache Spark Jobs,” https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/, Mar. 9, 2015, 23 pages. |
T. White, “Hadoop: The Definitive Guide,” O'Reilly Media, Inc., Fourth Edition, Sebastopol, CA, Apr. 2015, 756 pages. |
L. Shashank, “Spark on Yarn,” https://www.slideshare.net/datamantra/spark-on-yarn-54201193, Oct. 21, 2015, 47 pages. |
Dell, “Dell Boomi Platform: Connect Every Part of Your Business to Transform How You do Business,” https://marketing.boomi.com/rs/777-AVU-348/images/Boomi-Integration-Cloud.pdf, 2017, 4 pages. |
D. Ucar et al., “Combinatorial Chromatin Modification Patterns in the Human Genome Revealed by Subspace Clustering,” Nucleic Acids Research, May 1, 2011, pp. 4063-4075, vol. 39, No. 10. |
Number | Date | Country |
---|---|---|
62436709 | Dec 2016 | US |
62143685 | Apr 2015 | US |
62143404 | Apr 2015 | US |
 | Number | Date | Country |
---|---|---|---|
Parent | 14982341 | Dec 2015 | US |
Child | 15799314 | | US |
Parent | 14982351 | Dec 2015 | US |
Child | 14982341 | | US |