The subject matter disclosed herein relates to the analysis of seismic data, such as to automatically identify features of interest.
Seismic data is collected and used for evaluating underground structures and features that might otherwise not be discernible. Such seismic data may be useful in searching for minerals or materials (such as hydrocarbons, metals, water, and so forth) that are located underground and which may be difficult to localize. In practice, the seismic data is derived based on the propagation of seismic waves through the various strata forming the earth. In particular, the propagation of seismic waves may be useful in localizing the various edges and boundaries associated with different strata within the earth and with the surfaces of various formations or structures that may be present underground.
The seismic waves used to generate seismic data may be created using any number of mechanisms, including explosives, air guns, or other mechanisms capable of creating vibrations or seismic waves capable of spreading through the Earth's subsurface. The seismic waves may reflect, to various degrees, at the boundaries or transitions between strata or structures, and these reflected seismic waves are detected and used to form a set of seismic data that may be used to examine the subsurface area being investigated.
One challenge that arises in the context of these seismic investigations is in the interpretation and analysis of the large three-dimensional data sets that can be generated in a seismic survey project. In particular, analysis of such data sets may be tedious and time-consuming, potentially requiring months of manual work to analyze. Further, the complexity of the seismic data may limit the usefulness or effectiveness of automated approaches for data analysis.
In one embodiment, a method is provided for analyzing seismic data. The method comprises the act of accessing a set of seismic data comprising a plurality of features of interest. The plurality of features of interest are processed using a classification algorithm. The classification algorithm outputs a list of possible classifications for each feature of interest. The list of possible classifications is processed using a ranking algorithm. The ranking algorithm outputs a ranked list of possible classifications for each feature of interest. During operation, the ranking algorithm: generates a pool query set of respective queries related to the features of interest; submits one or more queries to a reviewer to obtain reviewer feedback; and updates the operation of the ranking algorithm based on the reviewer feedback to the queries. Each query comprises two or more ranked features of interest and the reviewer feedback to each query comprises an evaluation of the respective features of interest within the respective query.
In a further embodiment, a seismic data analysis system is provided. The seismic data analysis system comprises a memory storing one or more routines and a processing component configured to execute the one or more routines stored in the memory. The one or more routines, when executed by the processing component, cause acts to be performed comprising: processing a set of seismic data comprising a plurality of features of interest using a classification algorithm that outputs a list of possible classifications for each feature of interest and running a ranking algorithm on the list of possible classifications. The ranking algorithm outputs a ranked list of possible classifications for each feature of interest. During operation, the ranking algorithm: generates a pool query set of respective queries related to the features of interest; submits one or more queries to a reviewer to obtain reviewer feedback; and updates the operation of the ranking algorithm based on the reviewer feedback to the queries. Each query comprises two or more ranked features of interest and the reviewer feedback to each query comprises an evaluation of the respective features of interest within the respective query.
In an additional embodiment, a method is provided for analyzing seismic data. The method comprises the act of accessing a set of seismic data comprising a plurality of features of interest. The plurality of features of interest are classified as corresponding to a geological feature of interest using a classification algorithm. One or more ambiguous features of interest are selected from the plurality of features of interest. The one or more ambiguous features of interest are selected based on the degree to which a classification of a respective ambiguous feature of interest reduces the variance associated with the classification algorithm. One or more queries are submitted to a reviewer to obtain reviewer classifications of one or more of the ambiguous features of interest. The operation of the classification algorithm is updated based on the reviewer classifications.
In a further embodiment, a seismic data analysis system is provided. The seismic data analysis system comprises a memory storing one or more routines and a processing component configured to execute the one or more routines stored in the memory. The one or more routines, when executed by the processing component, cause acts to be performed comprising: accessing a set of seismic data comprising a plurality of features of interest; classifying the plurality of features of interest as corresponding to a geological feature of interest using a classification algorithm; selecting one or more ambiguous features of interest from the plurality of features of interest; submitting one or more queries to a reviewer to obtain reviewer classifications of one or more of the ambiguous features of interest; and updating the operation of the classification algorithm based on the reviewer classifications. The one or more ambiguous features of interest are selected based on the degree to which a classification of a respective ambiguous feature of interest reduces the variance associated with the classification algorithm.
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Seismic data may be analyzed and used to detect subsurface features of interest, such as geological structures or formations that may be indicative of hydrocarbon resources. For example, identification of geobodies (e.g., channels, pinchouts, progrades, gas chimneys, and so forth) from a three-dimensional (3D) seismic survey may be performed as part of prospecting for hydrocarbons (e.g., oil, natural gas, and so forth). As generally used herein, a geobody is a feature of interest contained in the seismic data or some derived (attribute) data set. Such a geobody may take the form, in a volumetric data set, of a set of contiguous, connected, or proximate voxels within the image data that may in turn, based on the characteristics of the identified voxels, correspond to an actual physical or geological feature or structure within the data, such as a geological structure, formation, or feature. Although the present discussion is generally described in the context of seismic data, it should be appreciated that the present approaches and discussion may be generally applicable in the context of geophysical data (attributes, velocities, or impedances or resistivity volumes), geologic data (geologic models, or geologic simulations), wireline data, or reservoir simulation data or any combinations thereof.
One of the challenges in hydrocarbon prospecting is the time-consuming and imprecise task of interpreting the 3D volumes generated from the acquired seismic data. For example, a single seismic volume may require months of manual work to analyze. As discussed herein, automated methods may make such time-consuming work more feasible for a reviewer to interpret. However, automated interpretation of a 3D volume generated from seismic images may be difficult to achieve in practice. For example, it may be useful for an automated analysis of seismic data to classify and, in certain instances, rank or otherwise sort various features of interest (e.g., channels, pinchouts, progrades, gas chimneys, and so forth) identified in a seismic volume according to type and/or the degree of interest or preference for certain types of features.
As will be appreciated, certain types of geological features (or features having certain characteristics) may be of more interest than other types. It would, therefore, be useful if the geological features that are of the greatest interest to the reviewer are ranked or sorted so as to make the review of these features more efficient or productive. With this in mind, the present approaches relate to various active learning techniques that may be used in the analysis of seismic data volumes. Active learning approaches are differentiated from passive approaches, in which user feedback may simply be examined or analyzed for patterns or helpful input, such as when a user selects a presented search result. In such passive approaches the selections being reviewed by the user are not actively formulated to further refine the performance of the learning algorithm and are not evaluated by the user directly as to the quality or correctness of the result.
In contrast, active learning approaches, as discussed herein, relate to those approaches where user or expert feedback may be sought, but it is sought in response to actively formulated queries or data presentations that are constructed to resolve the greatest amount of the remaining ambiguity within the data set undergoing examination. Further, such active approaches may actually involve or request a judgment or evaluation by the reviewer, such as to whether a classification or ranking is good or not, which may allow adjustment of the underlying algorithm to better resolve ambiguities in the data. As discussed herein, such active learning processes may include the automated generation and presentation of a limited set of queries or results to an expert to obtain the expert's feedback and to thereby resolve the greatest ambiguity in the data being evaluated with, optimally, the fewest number of queries. As will be appreciated, the ambiguity which may be addressed in the active approach may relate to the diversity or variance in the underlying data and, in particular, to resolving those data points that are difficult to classify or rank due to not clearly (i.e., unambiguously) falling into one classification, rank, and so forth.
With the foregoing in mind, the following discussion relates to various approaches using active learning techniques that may be used to facilitate user review of a seismic data set. By way of brief introduction, it should be appreciated that such automated approaches to analyzing seismic data may involve algorithms used to identify features of interest within a seismic volume, to classify these features into different types or by different characteristics, and, in some instances, to separately rank a set of classified features to further facilitate user review.
With this in mind, and as discussed herein, the present active learning approaches may be utilized in conjunction with a 3D seismic data set generated using any suitable seismic surveying system. Turning to
In the depicted example, a seismic generator 28 of some form (such as one or more controlled detonations, an air gun or cannon, or another suitable source of seismic waves) is part of the seismic surveying system 10. The seismic generator 28 can typically be moved to different positions on the surface of the volume 20 and can be used to generate seismic waves 30 at different positions on the surface 32 that penetrate the subsurface volume 20 under investigation. The various boundaries or transitions within the subsurface 20 (either associated with the various layers or strata 22 or with more complex geological features) cause the reflection 40 of some number of the seismic waves 30. One or more transducers 44 at the surface 32 may be used to detect the waves 40 reflected by the internal structures of the subsurface volume 20 and to generate responsive signals (i.e., electrical or data signals).
These signals, when reconstructed, represent the internal boundaries and features of the subsurface volume 20. For example, in the depicted embodiment, the signals are provided to one or more computers 50 or other suitable processor-based devices that may be used to process the signals and reconstruct a volume depicting the internal features of the subsurface volume 20. In one embodiment, the computer 50 may be a processor-based system having a non-volatile storage 52 (such as a magnetic or solid state hard drive or an optical media) suitable for storing the data or signals generated by the transducer 44 as well as one or more processor-executable routines or algorithms, as discussed herein, suitable for processing the generated data or signals in accordance with the present approaches. In addition, the computer 50 may include a volatile memory component 54 suitable for storing data and signals as well as processor-executable routines or algorithms prior to handling by the processor 56. The processor 56 may, in turn, generate new data (such as a volumetric representation of the subsurface volume 20 and/or a set of features of interest identified in such a reconstructed volume) upon executing the stored algorithms in accordance with the present approaches. The data or reconstructions generated by the processor 56 may be stored in the memory 54 or the storage device 52 or may be displayed for review, such as on an attached display 60.
Turning to
With this in mind, various approaches are discussed herein to facilitate automated inspection of such seismic volumes. In certain of these embodiments, a set of features that may be of interest (i.e., geological structures or formations of interest) is automatically generated and provided to a reviewer or displayed in conjunction with the volume, allowing the reviewer to more efficiently analyze the seismic volume for features that are of interest. As part of this process, the identified features may be classified and/or may be ranked based on likely interest to facilitate the user's examination.
As will be appreciated from the above discussion, in certain embodiments of the present approach, expert inputs are elicited that will have the most impact on the efficacy of a learning algorithm employed in the analysis, such as a classification or ranking algorithm, and which may involve eliciting a judgment or evaluation of classification or rank (e.g., right or wrong, good or bad) by the reviewer with respect to a presented query. Such inputs may be incorporated in real-time in the analysis of seismic data, either in a distributed or non-distributed computing framework. In certain implementations, queries to elicit such input are generated based on a seismic data set undergoing automated evaluation and the queries are sent to a workstation (e.g., computer 50) for an expert to review.
In one embodiment, based upon the expert feedback (such as the selection of a presented geological feature as being of interest) a ranking algorithm, as discussed herein, may update a set of displayed and ranked search results of geological features present in the seismic data. In this manner, the ranked features provided for review as potentially being of interest will gradually be improved in quality as the reviewer's selections are used to refine the performance of the ranking algorithm. For example, reviewer selection of a feature of interest (such as in response to a query posed by the system) reinforces selection and presentation of like features. In this manner, selections that are highly rated by the reviewer are used to preferentially rank similar features as being of interest, thus facilitating the review process by preferentially presenting or highlighting those features believed to be of interest.
By way of example of one such embodiment, a classification algorithm (i.e., a classifier, f) is employed to initially classify a number of regions or features of interest identified within a seismic volume 62. As part of the classification, the classifier may also assign an initial probability to the classification. In one implementation, a separate ranking algorithm (i.e., a ranker, h) processes the output of the classification algorithm and re-orders each identified and classified geological feature according to the respective probabilities that each of the classified geological features is of the assigned type (e.g., channel, pinchout, chimney, or parallel seismic facies). By deriving these respective rankings, the rankings may be improved and, thus, may be of more use to a reviewer.
In one implementation, a classifier, f, is initially trained offline using previously labeled examples of the respective classes of geological features. The trained classifier, f, may then be used to generate a ranked list of possible classifications for each feature undergoing classification during live execution of the classifier (i.e., on an unlabeled data set). In one such implementation, the ranked list for each respective identified feature would include a probabilistic score from 0 to 1 for each of the possible classes for that feature. The ranked list of classifications derived for each feature can then be combined to generate a single ranked list of geological features, with those features listed higher on the list being of greater interest or importance.
In certain embodiments, the classification algorithm f is a multi-class classifier and is based on a branching or tree-type structure (e.g., a decision tree) whereby classifying a respective feature involves assigning a ranking or probability to different classifications in accordance with the decision tree. In this manner, the classification algorithm, f, may generate a list of possible classifications for each identified feature, with each possible classification being associated with a corresponding initial rank or probability based on the operation of the classification algorithm. In one embodiment, based on the probabilistic scores, the output of the classifier is a ranked list that is ranked in order of interest or importance with respect to the possible class labels for the one or more classified features of interest.
By way of example, and turning to
The trained classifier 70 generates a list 90 containing the possible classifications for each identified feature (i.e., each feature being analyzed may give rise to a respective list of possible classifications for that feature). In one embodiment, the classifier 70 generates a probability or ranking associated with each classification on the respective lists 90. In the depicted example, the results of the classification algorithm 70 (i.e., lists 90) are provided to a separate ranking algorithm 92 (i.e., ranker, h) to generate a ranked list 94 of the identified features by type and by certainty or confidence in the classification. For example, if channels were specified to be the type of geological feature of most interest, the ranked list 94 may list those identified features determined to have the highest probability of being channels at the top of the list.
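By way of illustration only, the flow from the per-feature lists 90 to the single ranked list 94 may be sketched as follows. The dictionary layout, the feature identifiers, and the function names are hypothetical and are not drawn from the present disclosure; the sketch simply assumes that the classifier has already produced a probability for each candidate class of each feature.

```python
from typing import Dict, List, Tuple

# Hypothetical classifier output: feature id -> {class label: probability}.
ClassScores = Dict[str, Dict[str, float]]


def per_feature_ranking(scores: ClassScores) -> Dict[str, List[Tuple[str, float]]]:
    """Sort the candidate classes of each feature by descending probability."""
    return {
        fid: sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        for fid, probs in scores.items()
    }


def rank_features(scores: ClassScores, target: str) -> List[Tuple[str, float]]:
    """Order features by the probability that they belong to `target`."""
    ranked = [(fid, probs.get(target, 0.0)) for fid, probs in scores.items()]
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)


# Made-up probabilities for three features, for illustration only.
example = {
    "g1": {"channel": 0.70, "pinchout": 0.20, "chimney": 0.10},
    "g2": {"channel": 0.15, "pinchout": 0.80, "chimney": 0.05},
    "g3": {"channel": 0.55, "pinchout": 0.05, "chimney": 0.40},
}
lists_90 = per_feature_ranking(example)        # one ranked class list per feature
ranked_94 = rank_features(example, "channel")  # channels are of most interest
```

In this sketch, specifying channels as the type of most interest places the features most likely to be channels at the top of the combined list.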
In certain implementations, active ranking approaches are employed to refine performance of the ranking algorithm 92. As discussed herein, active ranking approaches are a special case or subset of active learning approaches where the active learning context pertains to a ranking context, such as where a set of previously generated classifications are ranked or sorted. In one such implementation, the ranker 92 (h) is distinct from the classifier 70 (f) such that the expert feedback affects performance of the ranking algorithm 92, but not the classification algorithm 70. That is, in such an implementation, the classification algorithm 70 can function in an unchanged manner while the ranking algorithm 92 undergoes active learning or other modification to more optimally interpret the outputs of the classification algorithm 70.
In accordance with one example, an active ranking scheme employs a pool query set, which is the set of queries for which expert determination will be sought in order to improve the performance of the ranking algorithm 92 with respect to ambiguous geological features. With respect to the pool query set, one issue that arises in the active querying process with respect to seismic data is the absence of a consistent (i.e., objective) querying framework. In particular, the potential lack of uniformity in the evaluation of seismic data may hinder the development of a global approach to formulating queries.
With this in mind, in certain implementations pairs of classified features may be presented to a reviewer as a query to elicit the reviewer's opinion as to which geological feature of the presented pair is more likely to be properly classified, i.e., is better classified. Such a pairwise comparison is in contrast with presenting customized queries related to each individual ambiguous feature classification. That is, this approach of providing paired, classified features for ranking allows for active learning with respect to ranking, by formulating and presenting those pairs of classified features for review which will provide the most useful feedback for reducing the remaining ambiguity in the ranking results. An example of this approach is depicted in
With the foregoing in mind, the pairwise comparison approach provides useful mathematical properties that may facilitate implementation of the present active ranking approach. For example, consider a feedback function $\Phi$, the domain of which contains all pairs of instances, where an instance is defined as $x_i = (g_i, l_i)$, where $g_i$ is the respective geological feature being examined and $l_i$ is the label or classification that the classification algorithm 70 (f) has assigned to the respective feature. For any pair of instances $x_0$, $x_1$, $\Phi(x_0, x_1)$ is a real number that indicates whether $x_1$ should be ranked above $x_0$. For example, if $\Phi(x_0, x_1)$ is greater than 0, the reviewer has indicated that $x_1$ should be ranked above $x_0$. Conversely, if $\Phi(x_0, x_1)$ is less than 0, the reviewer has indicated that $x_1$ should be ranked below $x_0$. If $\Phi(x_0, x_1)$ equals 0, the reviewer has indicated no preference between $x_0$ and $x_1$ with respect to ranking. Because $\Phi(x_0, x_1) = -\Phi(x_1, x_0)$, another function $D(x_0, x_1)$ may be employed where $D(x_0, x_1) = c \cdot \max\{0, \Phi(x_0, x_1)\}$, which ignores all negative entries of $\Phi$. As will be appreciated, c is a normalizing constant that ensures that:
$$\sum_{x_0, x_1} D(x_0, x_1) = 1$$
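As a minimal sketch of the feedback handling described above, the following Python fragment builds D from a table of reviewer preferences. It assumes that the constant c is chosen so that the entries of D sum to one; the function name and the instance identifiers are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (x0, x1); a positive value means x1 ranks above x0


def normalize_feedback(phi: Dict[Pair, float]) -> Dict[Pair, float]:
    """Build D(x0, x1) = c * max(0, phi(x0, x1)), with c chosen so D sums to 1."""
    positive = {pair: max(0.0, value) for pair, value in phi.items()}
    total = sum(positive.values())
    if total == 0.0:
        raise ValueError("the reviewer expressed no preference for any pair")
    c = 1.0 / total
    # Zero entries carry no information, so only positive weights are kept.
    return {pair: c * value for pair, value in positive.items() if value > 0.0}


# Made-up reviewer feedback; note that phi is antisymmetric.
phi = {
    ("a", "b"): 1.0, ("b", "a"): -1.0,  # b should rank above a
    ("a", "c"): 3.0, ("c", "a"): -3.0,  # c should rank well above a
    ("b", "c"): 0.0, ("c", "b"): 0.0,   # no preference
}
D = normalize_feedback(phi)
```

Dropping the negative entries loses nothing, since each is the mirror image of a positive entry that is retained.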
A further aspect of such a pairwise comparison implementation is the process by which candidate instances are selected for review by a reviewer for ranking. In certain implementations, choosing or generating the pool query set 100 is a matter of maximizing the diversity of the pool query set and reducing the clarity of each candidate. With respect to the issue of clarity, a suitable metric may be used that is indicative of the amount of ambiguity in a candidate instance xi.
In one implementation, the clarity of an unlabeled instance xi may be represented as:
where T is the set of instances that have been labeled by previous active ranking rounds, i.e., ranked pairs 104. Bootstrapping the active ranking may involve randomly selecting unlabeled instances until T can be established, taking the median instance in an initial ranked list, or other suitable approaches.
With respect to the diversity of the pool query set, it may be useful to have a pool query set 100 that is as diverse as possible. In one embodiment, to construct a diverse pool query set 100, two different measures may be considered, one measure that is based on angular diversity and another measure that is based on entropy. To consider the angular diversity of the pool query set, the angular diversity between two instances xi and xj can be computed as:
where xc is the mean of the relevant instances and <.> denotes the dot product. After minimizing, this gives:
$$\max_{l \in QS} \cos(\angle(x_l, x_j)) \tag{7}$$
which provides a diverse pool query set of instances. In this example, QS are the instances that have already been chosen for the pool query set and xj is the new diverse instance. In one such implementation, to construct the final pool query set the convex combination:
$$F(i) = \beta \, CL(x_i, f, T) + (1 - \beta) \max_{l \in QS} \cos(\angle(x_l, x_j)) \tag{8}$$
is computed, where β represents a pre-specified mixing parameter. Apart from the angular diversity of the pool query set, the standard entropy measure may also be tested to establish sufficient diversity.
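For illustration, one way the pool query set might be assembled from the convex combination above is sketched below. Several points are assumptions rather than details of the present disclosure: the clarity scores are taken as precomputed inputs, the cosine is computed after centering on the mean instance, candidates are added greedily, and the candidate with the smallest combined score (low clarity, low similarity to instances already chosen) is preferred. All names are hypothetical.

```python
import math
from typing import List, Sequence


def centered_cosine(a: Sequence[float], b: Sequence[float],
                    center: Sequence[float]) -> float:
    """Cosine of the angle between a and b after subtracting the mean instance."""
    u = [ai - ci for ai, ci in zip(a, center)]
    v = [bi - ci for bi, ci in zip(b, center)]
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / norm


def build_pool(vectors: Sequence[Sequence[float]], clarity: Sequence[float],
               size: int, beta: float = 0.5) -> List[int]:
    """Greedily pick indices that minimize beta*clarity + (1-beta)*max cosine."""
    n, dim = len(vectors), len(vectors[0])
    center = [sum(v[d] for v in vectors) / n for d in range(dim)]
    chosen: List[int] = []
    while len(chosen) < min(size, n):
        best, best_score = -1, float("inf")
        for j in range(n):
            if j in chosen:
                continue
            # Similarity to the closest instance already in the pool (0 if empty).
            redundancy = max(
                (centered_cosine(vectors[l], vectors[j], center) for l in chosen),
                default=0.0,
            )
            score = beta * clarity[j] + (1.0 - beta) * redundancy
            if score < best_score:
                best, best_score = j, score
        chosen.append(best)
    return chosen


# Made-up two-dimensional instances and clarity scores.
vectors = [(1.0, 0.0), (1.0, 0.1), (-1.0, 0.0), (-1.0, -0.1)]
clarity = [0.10, 0.15, 0.30, 0.90]
pool = build_pool(vectors, clarity, size=2)
```

In this example the second pick skips instance 1, which is nearly as ambiguous as instance 0 but points in almost the same direction, in favor of instance 2 on the opposite side of the mean.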
With the foregoing discussion in mind related to the construction of the pool query set 100, and as discussed herein, the pool query set 100 provides a basis for providing pairs of instances, as queries, to one or more reviewers for relative ranking. As noted above, in certain implementations where the ranker 92 and classifier 70 are separate, the obtained reviewer feedback is not used to refine the classification results (i.e., the operation of the classification algorithm 70 of
While the preceding discussion describes the construction of the pool query set 100 and the use of paired comparisons, the following discussion relates to the training of a ranking algorithm 92. With respect to such training, in one implementation a weak ranking algorithm, h, 92 is initially provided which takes the probability provided by the classification algorithm 70 for each classification and simply thresholds these initial probabilities:
where ⊥ indicates that the instance is unranked by the classification algorithm (f).
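A thresholded weak ranker of the kind described above may be sketched as follows. The sketch assumes a form in which the ranker returns 1 when the classifier probability exceeds a threshold, 0 otherwise, and a default value for instances the classifier left unranked; the use of `None` to stand in for ⊥, and the parameter names `theta` and `q_def`, are conventions adopted here for illustration.

```python
from typing import Optional


def weak_rank(score: Optional[float], theta: float, q_def: float) -> float:
    """Thresholded weak ranker; `None` stands in for the unranked symbol."""
    if score is None:
        return q_def  # default rank for instances the classifier did not rank
    return 1.0 if score > theta else 0.0


# Example: a 0.5 threshold with unranked instances placed in the middle.
ranks = [weak_rank(s, theta=0.5, q_def=0.5) for s in (0.8, 0.5, 0.2, None)]
```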
As discussed herein r may be defined as:
where $\pi(x)$ is the potential of x. In certain implementations, the quantity r is maximized since this value represents the degree to which $D(x_0, x_1)$ (i.e., the reviewer feedback) agrees with $h(x_1) - h(x_0)$, thereby allowing determination of the best $\theta$ and $q_{def}$. By further extension
Since:
$$\sum_{x:\, f(x) = \perp} \pi(x) = -\sum_{x:\, f(x) \neq \perp} \pi(x) \tag{13}$$
this equates to:
$$r = \sum_{x:\, f(x) > \theta} \pi(x) - q_{def} \sum_{x \in \chi} \pi(x) \tag{14}$$
where χ is the set of feedback instances ranked by the classification algorithm (f).
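The relationship in equation (14) can be checked numerically with a short sketch. The sketch assumes that the potential of an instance is the total feedback weight of pairs in which it is preferred minus the total weight of pairs in which it is not, and that the weak ranker is a simple threshold with a default value for unranked instances; both are assumptions made for illustration, as are all names and values.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (x0, x1) with weight D(x0, x1): x1 preferred over x0


def potentials(D: Dict[Pair, float]) -> Dict[str, float]:
    """pi(x): weight of pairs where x is preferred minus weight where it is not."""
    pi: Dict[str, float] = {}
    for (x0, x1), w in D.items():
        pi[x1] = pi.get(x1, 0.0) + w
        pi[x0] = pi.get(x0, 0.0) - w
    return pi


def r_from_potentials(pi: Dict[str, float], f: Dict[str, Optional[float]],
                      theta: float, q_def: float) -> float:
    """Equation (14): sum of pi over f(x) > theta, minus q_def * sum over ranked x."""
    above = sum(p for x, p in pi.items() if f[x] is not None and f[x] > theta)
    ranked = sum(p for x, p in pi.items() if f[x] is not None)
    return above - q_def * ranked


def r_direct(D: Dict[Pair, float], f: Dict[str, Optional[float]],
             theta: float, q_def: float) -> float:
    """Agreement between the feedback D and the weak ranker h, pair by pair."""
    def h(x: str) -> float:
        if f[x] is None:
            return q_def
        return 1.0 if f[x] > theta else 0.0
    return sum(w * (h(x1) - h(x0)) for (x0, x1), w in D.items())


# Made-up feedback weights and classifier scores; "d" is unranked.
D = {("a", "b"): 0.25, ("a", "c"): 0.5, ("d", "c"): 0.25}
f = {"a": 0.2, "b": 0.6, "c": 0.9, "d": None}
pi = potentials(D)
r14 = r_from_potentials(pi, f, theta=0.5, q_def=0.5)
r_pairs = r_direct(D, f, theta=0.5, q_def=0.5)
```

Because r depends on the data only through the potentials, candidate values of θ and qdef can be compared without revisiting every feedback pair.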
With the foregoing in mind, two algorithms can be formulated which in combination provide the active ranking functionality described herein. As will be appreciated, the two algorithms can be characterized by various control logic steps which can in turn be represented in any suitable processor-executable code for implementation on a computer or an application specific device. The first algorithm can be used to calculate θ and qdef and assumes the presence of a classification algorithm, f, and a set of feedback instances χ ranked by the classification algorithm. In accordance with this first algorithm:
The second algorithm provides updates to the ranking algorithm h (i.e., ranker boosting) in view of the reviewer feedback and assumes the presence of a classification algorithm, f, and the θ and qdef values returned by the first algorithm. In accordance with the second algorithm:
As noted above, in combination algorithm 1 and algorithm 2 can be implemented to provide an active ranking algorithm. Referring to algorithm 2, the ranking loss of H can be characterized as:
$$\mathrm{rloss}_D(H) \le \prod_{t=1}^{T} Z_t \tag{15}$$
Therefore, at each round of t, an αt may be selected so as to minimize:
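$$Z_t = \sum_{x_0, x_1} D_t(x_0, x_1) \exp\left(\alpha_t \left(h_t(x_0) - h_t(x_1)\right)\right) \tag{16}$$
For a weak ranking algorithm whose output lies in [0,1]:
$$Z \le \sqrt{1 - r^2} \tag{17}$$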
where
$$r = \sum_{x_0, x_1} D(x_0, x_1) \left(h(x_1) - h(x_0)\right) \tag{18}$$
such that the goal is maximizing r. Once r is maximized, α should be set such that:
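By way of a hedged numerical illustration, the sketch below uses the weight commonly chosen in boosting-style ranking analyses of this kind, namely one half of the logarithm of (1 + r)/(1 − r), and checks that the resulting normalization factor respects the bound of equation (17). The choice of weight, the form of the normalization factor, and all names and values are assumptions of the sketch and should not be read as the specific expression of the present disclosure.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (x0, x1) with weight D(x0, x1): x1 preferred over x0


def agreement(D: Dict[Pair, float], h: Dict[str, float]) -> float:
    """r: weighted agreement between the feedback and the weak ranker h."""
    return sum(w * (h[x1] - h[x0]) for (x0, x1), w in D.items())


def alpha_for(r: float) -> float:
    """Assumed weight choice: 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def normalizer(D: Dict[Pair, float], h: Dict[str, float], alpha: float) -> float:
    """Z: sum of D(x0, x1) * exp(alpha * (h(x0) - h(x1))) over feedback pairs."""
    return sum(w * math.exp(alpha * (h[x0] - h[x1])) for (x0, x1), w in D.items())


# Made-up feedback; the weak ranker disagrees with the last pair.
D = {("a", "b"): 0.5, ("c", "b"): 0.25, ("b", "d"): 0.25}
h = {"a": 0.0, "b": 1.0, "c": 0.0, "d": 0.0}
r = agreement(D, h)
alpha = alpha_for(r)
Z = normalizer(D, h, alpha)
bound = math.sqrt(1.0 - r * r)
```

A larger r yields a larger weight for the weak ranker and a smaller Z, which is why each round seeks the weak ranker that agrees most with the reviewer feedback.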
As will be appreciated from the preceding discussion, an active ranking application of active learning, such as described above, may utilize a trained classifier f that generates a ranked list as an output and may further employ a weak ranker h that is improved using active ranking approaches that incorporate feedback from a user. The user feedback is obtained in response to a set of queries that are generated with a goal of reducing ambiguity with respect to the generated rankings (i.e., to better discern and discriminate between what are otherwise close calls related to a ranking). In particular, in certain embodiments the query format uses a pairwise approach and generates queries based on diversity and clarity.
Other approaches, as discussed below, may also be employed that may be characterized as active learning, though which are distinct from the active ranking approach discussed above. For example, in one such active learning approach discussed below, a training data set may be employed for training an algorithm used to evaluate seismic data. With this training, user elicited feedback is incorporated to improve the function of the evaluation algorithm. In one implementation, queries are generated by selecting data points that are ambiguous from the perspective of the algorithm and obtaining user feedback to resolve the ambiguity with respect to the selected data points. For example, ambiguous data points may be selected for presentation to a reviewer which, once resolved, may be used to improve the performance of the algorithm to reduce the greatest amount of ambiguity in the remaining data points from the perspective of the active learning algorithm.
With this in mind, in a further embodiment a logistic regression algorithm (or other linear classifier or model) may be employed in the evaluation of features of interest within a seismic data set undergoing evaluation. In this approach, an active learning aspect involves soliciting reviewer feedback to reduce the variance associated with the logistic regression algorithm, thereby improving performance of the algorithm at identifying geological features of interest. For example, queries may be selected so as to elicit responses that will provide the greatest reduction in variability associated with a regression model used in the classification process. That is, a determination is made as to which geological regions or features of interest would be most helpful in improving the performance of the classification algorithm if they were to be correctly classified by a reviewer.
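As a rough sketch of variance-driven query selection for a logistic regression model, the fragment below scores each unlabeled candidate by how much labeling it would add to the Fisher information of the model and selects the highest-scoring candidate. The two-dimensional features, the small ridge term, the use of the current weight estimate w, and the scoring rule itself are assumptions made for illustration; the score p(1 − p) xᵀQ⁻¹x equals the relative increase in the determinant of the Fisher information Q when the candidate is added, which is one common way to express a reduction in estimator variance.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]
Sym2 = Tuple[float, float, float]  # symmetric 2x2 matrix as (q00, q01, q11)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fisher_information(X: Sequence[Vec], w: Vec, ridge: float = 1e-3) -> Sym2:
    """Q = sum_i p_i (1 - p_i) x_i x_i^T + ridge * I for 2-D features."""
    q00, q01, q11 = ridge, 0.0, ridge
    for x in X:
        p = sigmoid(w[0] * x[0] + w[1] * x[1])
        s = p * (1.0 - p)
        q00 += s * x[0] * x[0]
        q01 += s * x[0] * x[1]
        q11 += s * x[1] * x[1]
    return q00, q01, q11


def information_gain(x: Vec, w: Vec, Q: Sym2) -> float:
    """p(1-p) * x^T Q^{-1} x: relative growth of det(Q) if x were labeled."""
    q00, q01, q11 = Q
    det = q00 * q11 - q01 * q01
    y0 = (q11 * x[0] - q01 * x[1]) / det   # first component of Q^{-1} x
    y1 = (-q01 * x[0] + q00 * x[1]) / det  # second component of Q^{-1} x
    p = sigmoid(w[0] * x[0] + w[1] * x[1])
    return p * (1.0 - p) * (x[0] * y0 + x[1] * y1)


def pick_query(candidates: Sequence[Vec], labeled: Sequence[Vec], w: Vec) -> int:
    """Index of the candidate whose label would be most informative."""
    Q = fisher_information(labeled, w)
    return max(range(len(candidates)),
               key=lambda i: information_gain(candidates[i], w, Q))


# Made-up data: the labeled features vary almost only along the first axis.
labeled = [(1.0, 0.0), (2.0, 0.0), (1.5, 0.1)]
candidates = [(1.0, 0.0), (0.0, 1.0)]
w = (0.3, -0.2)
query_index = pick_query(candidates, labeled, w)
```

In this example the second candidate is selected because the labeled data says almost nothing about the second feature dimension, so a label there reduces the uncertainty in the model the most.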
In one embodiment, a self-monitoring system may be employed such that changes in the data properties (i.e., due to the addition of data, the evaluation of a new portion of existing data, and so forth) are detected which trigger the active-learning functionality. Such a self-monitoring active-learning approach may be useful to address issues related to within-site variances. For example, in one implementation, the system, as it evaluates a seismic data volume, continuously monitors the statistical distribution of the data subset it is currently examining. If a change in data properties (i.e., a change in the statistical distribution) is detected as part of this ongoing monitoring, the system may trigger the active learning module automatically, thereby requesting human (e.g., expert) assistance in the form of queries presented to the reviewer. Based on the feedback received from the reviewer, the analysis parameters associated with the active learning algorithm may be tuned and the evaluation of the seismic data can continue.
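The monitoring step may be sketched, purely by way of example, with a two-sample Kolmogorov-Smirnov statistic computed on a single scalar attribute. The present discussion does not specify a particular test, so the choice of statistic, the threshold value, and the function names below are all assumptions.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for v in ref + cur:
        cdf_ref = bisect_right(ref, v) / len(ref)
        cdf_cur = bisect_right(cur, v) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def needs_review(reference: Sequence[float], current: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """True if the current subset has drifted enough to trigger expert queries."""
    return ks_statistic(reference, current) > threshold


# Made-up attribute values for a reference region and two later subsets.
reference = [0.1, 0.2, 0.3, 0.4, 0.5]
similar = [0.15, 0.25, 0.35, 0.45, 0.55]
shifted = [1.1, 1.2, 1.3, 1.4, 1.5]
trigger_similar = needs_review(reference, similar)
trigger_shifted = needs_review(reference, shifted)
```

In practice the threshold would be tuned to the data, and a multivariate comparison may be more appropriate where several attributes are monitored together.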
For example, turning to
A classification algorithm f 160 attempting to model, and label, the features within the primary data set 150 may utilize the training data set of labeled features 74 to construct a model that labels the features of the primary data set 150 to the extent that the data mismatch allows. Based upon this, a query set 170 of unlabeled features within the primary seismic data 150 is determined (block 164) and presented (block 172) to a reviewer, who assigns labels 174. The query set 170 is generated so as to include requests for reviewer feedback (i.e., labels 174) for those unlabeled features of the primary data set 150 which, if labeled, would allow for the greatest reduction in the variability associated with the model. For example, in one implementation reviewer feedback in the form of labels 174 is requested for those features which, once labeled, will allow the greatest number of additional unlabeled features to be automatically classified correctly by the classification algorithm.
It should be appreciated that, though the above description suggests that some initial modeling, and labeling, of features may be performed by the classification algorithm 160, in other implementations the classification algorithm 160 does not initially attempt to label any features of interest in the primary data set 150 and instead attempts to identify which unlabeled features of the primary data set 150 would be of the most value to have labeled by a reviewer. This may also be true in implementations related to active ranking, as discussed above. Based upon the labels 174 provided by the reviewer, the primary seismic data 150 may be relabeled (block 180), so as to reduce the statistical distribution mismatch between the primary seismic data 150 and training seismic data of labeled features 74, allowing the classification algorithm 160 to better classify other features within the primary seismic data 150. In one embodiment, the approach requests labels 174 that maximize the determinant (Q^{-1}), where Q^{-1} is the inverse Fisher information and where Q^{-1} is the lower bound of the covariance matrix of the estimated classifier w, as discussed below.
With this in mind, the training data set may be characterized as:
$$D^{a} = \{x_n^{a}, y_n^{a}\}_{n=1,\ldots,N}$$
such that all features 74 of interest in the training data set are labeled. Conversely, the primary data set 150 may be characterized as:
$$D^{P} = \{D_l^{P}, D_u^{P}\} \qquad (21)$$
where
$$D_l^{P} = \{x_n^{P}, y_n^{P}\}_{n=1,\ldots,N}$$
is the labeled portion of the primary data set 150 and
$$D_u^{P} = \{x_n^{P}\}_{n=N}$$
is the unlabeled portion of the primary data set 150.
The classifier f, in accordance with one algorithm, may then be characterized such that:
and such that:
where μi is a compensation factor for the statistical distribution mismatch between the primary data set 150 and the training data set of labeled features 74.
With this in mind, the Fisher information may be characterized as:
where L is a log likelihood. In one such implementation, the selection of labels proceeds in a sequential manner in which, initially, the labeled primary set, DlP, is empty (i.e., all features of the primary data set 150 are unlabeled). In one embodiment, one at a time, an unlabeled sample is selected from the primary data set 150, is labeled by a reviewer, and is moved from DuP to DlP (i.e., from an unlabeled state to a labeled state within the primary data set 150). The classifier f (i.e., classification algorithm 160) and Fisher information Q are correspondingly updated based on the labeling of the sample feature. In this example, at each iteration, the selection of the feature to be labeled, x*, is based on:
$$x^{*} = \max_{x}$$
$$\sigma_n = \sigma(f^{T} x_n) \qquad (27)$$
such that, at each iteration, the selection of the feature to be provided to the reviewer for classification provides the maximum information for the classifier f (i.e., classification algorithm 160).
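The sequential selection described above may be illustrated, in a simplified and hedged form, by the following sketch. It assumes the standard logistic-regression Fisher information, Q = sum over labeled samples of sigma_n (1 - sigma_n) x_n x_n^T, and the common reading in which the selected sample is the one that most increases det(Q), which is equivalent to most decreasing det(Q^{-1}). It omits the compensation factor for distribution mismatch, initializes Q as a scaled identity matrix so that it is invertible while the labeled set is empty, and holds the classifier f fixed between queries rather than refitting it after each label. All function names and numeric values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mat_vec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def weight(f, x):
    """sigma_n * (1 - sigma_n), with sigma_n = sigma(f^T x_n)."""
    sigma = sigmoid(dot(f, x))
    return sigma * (1.0 - sigma)


def det_gain(q_inv, f, x):
    """Factor by which det(Q) grows if x is labeled.

    Matrix determinant lemma: det(Q + s x x^T) = det(Q) (1 + s x^T Q^-1 x).
    """
    return 1.0 + weight(f, x) * dot(x, mat_vec(q_inv, x))


def update_inverse(q_inv, f, x):
    """Sherman-Morrison update of Q^-1 after adding s x x^T to Q."""
    s = weight(f, x)
    u = mat_vec(q_inv, x)  # Q^-1 x (Q^-1 is symmetric)
    denom = 1.0 + s * dot(x, u)
    size = len(u)
    return [[q_inv[i][j] - s * u[i] * u[j] / denom for j in range(size)]
            for i in range(size)]


def select_queries(f, unlabeled, n_queries, prior_precision=1.0):
    """Greedily choose sample indices in the order they would be queried."""
    size = len(f)
    # Assumption: Q starts as prior_precision * I (labeled set is empty).
    q_inv = [[1.0 / prior_precision if i == j else 0.0 for j in range(size)]
             for i in range(size)]
    pool = list(range(len(unlabeled)))
    order = []
    for _ in range(min(n_queries, len(pool))):
        best = max(pool, key=lambda i: det_gain(q_inv, f, unlabeled[i]))
        order.append(best)
        pool.remove(best)  # moved from the unlabeled to the labeled portion
        q_inv = update_inverse(q_inv, f, unlabeled[best])
    return order, q_inv


# Hypothetical classifier and unlabeled feature vectors.
f = [1.0, 0.0]
unlabeled = [[0.1, 0.1], [0.0, 2.0], [4.0, 0.0]]
order, q_inv = select_queries(f, unlabeled, n_queries=2)
```

In this sketch the first query is the sample lying on the decision boundary and far from the origin, since it contributes the most to det(Q); a practical implementation would also refit f after each reviewer response before selecting the next sample.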
Technical effects of the invention include automatic analysis or evaluation of a seismic data set based on, in certain implementations, a separate implementation of a classification algorithm and a ranking algorithm, where active learning is used to train and update operation of the ranking algorithm. In other implementations, a linear model, such as a logistic regression algorithm, is used to classify features within a seismic data set. Active learning may be employed with respect to the classification by analyzing unlabeled features within the seismic data set and selecting, sequentially, those features or samples which, if classified by a reviewer, would reduce the variance associated with the model to the greatest extent.
This written description uses examples to disclose the invention, including the best mode, and also to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal languages of the claims.
Number | Date | Country | |
---|---|---|---|
20140188769 A1 | Jul 2014 | US |