The present invention relates to a method of producing a mass spectrum from a time-varying transient signal detected in a mass spectrometer.
One of the primary goals of Fourier Transform Mass Spectrometry (FTMS) is the identification of the ionic species, along with their relative abundances present, in a form of coherently oscillating ion packets contained by the trapping field within a mass spectrometer. The frequency of oscillation of a coherent packet of ions is a function of the mass to charge (m/z) ratio of the ionic species and is referred to herein as the “characteristic frequency” of an ionic species. The trapping field can be provided by the combination of an electrostatic field and a magnetostatic field, for example in a Fourier Transform Ion Cyclotron Resonance (FTICR) mass analyzer, or by an electrostatic field only, for example in an ORBITRAP mass analyzer. FTMS using RF fields is also known.
Typically, ions are detected by an image current S(t) (also termed a continuous transient image current and herein referred to as the “transient”) induced on detection electrodes of the mass analyzer as the oscillating ions pass nearby. Therefore, the transient comprises a superposition of one or more periodic signals. Each periodic signal corresponds to the oscillation of a respective coherent packet of ions within the mass analyzer with a respective characteristic frequency. The transient is only measured (or captured or recorded) over a finite time T, termed the “duration” of the transient.
The transient processing usually involves discrete Fourier transform (DFT), which decomposes the transient into a number of periodic functions (also termed Fourier basis functions). Each Fourier basis function is localized at a respective frequency (also termed a Fourier Transform bin). The frequencies corresponding to the Fourier basis functions form a set of frequencies (referred to as the Fourier grid). The Fourier basis functions are equally spaced in the frequency domain i.e. the separation between adjacent frequencies is a constant. In particular, the separation between adjacent frequencies in the set of frequencies (herein referred to as the “separation” of the set of frequencies) is determined by the inverse of the duration of the transient 1/T. The decomposition comprises calculating, based on the transient, individual complex amplitudes corresponding to each Fourier basis function. Thereby a set of complex amplitudes is formed. Therefore, the discrete Fourier transform (DFT) represents the transient in the frequency domain. In particular, the transient is represented as a set of complex amplitudes. Each complex amplitude of the set of complex amplitudes corresponds to a respective frequency of the set of frequencies i.e. the frequency at which the corresponding Fourier basis function is localized.
The periodic signals present in the transient (as described previously) are related to the complex amplitudes. In particular, the periodic signal will contribute to the complex amplitudes corresponding to a plurality of frequencies in the set of frequencies. The plurality of frequencies will be substantially centred on the characteristic frequency of a particular ionic species for given experimental conditions. Therefore a plot of the set of complex amplitudes against the set of frequencies (referred to as a mass spectrum) will show one or more peaks, each peak substantially centred on a respective characteristic frequency present in the transient i.e. the centroid of each peak will be substantially equal to the characteristic frequency.
As described above, the frequencies of the periodic signals present in the transient are a function of the m/z ratios of the ionic species. Therefore, the centroid of each peak can be converted (or transformed or interpreted) into a respective m/z ratio thereby identifying a respective ionic species. Furthermore the height of each peak can be converted (or transformed or interpreted) into the respective relative abundance of the respective ionic species.
Due to the spacing of frequencies in the Fourier grid, determining the centroids of the peaks, and/or the heights of the peaks can be subject to errors. These errors lead to errors in the estimation of correct m/z ratios (and therefore ionic species being identified incorrectly) along with errors in the estimation of relative abundances. These errors can be particularly significant when the difference between a characteristic frequency present in the transient and the closest frequency in the set of frequencies is large.
A number of approaches aimed at apparent “smoothing” of spectra (hence potentially reducing the estimation errors) currently employed include interpolating the complex amplitudes onto a further set of frequencies with reduced separation between frequencies i.e. interpolating the mass spectrum. The most common interpolation method is zero-padding (see Marshall A. G.; Verdun, F. R., “Fourier Transforms in NMR, Optical, and Mass Spectrometry”, Elsevier, 1990, the entire contents of which are incorporated herein by reference). Zero padding, if done explicitly, comprises of appending the transient by zero-signal of a predetermined duration resulting in an artificial increase of the transient duration and, correspondingly a decrease in the separation of the set of frequencies. Therefore, if the transient duration is increased by a factor P through the appending of a zero-signal, the separation of the set of frequencies is correspondingly reduced by a factor P. Due to the mechanics of implementation of Fast Fourier Transform (FFT), the most common algorithm for computing DFT, it is common for P to be a power of two and the interpolated mass spectrum is called log2 P-times zero-padded.
Whilst this can reduce the errors described above in relation to isolated peaks corresponding to respective characteristic frequencies, there is still a problem when a transient comprises two or more close characteristic frequencies. This causes the spectrum to comprise two or more overlapping peaks. If the separation (or difference) between two characteristic frequencies of the transient is less than a threshold value, then the two peaks will not be resolved. This error leads to errors in the converted m/z ratios (and therefore ionic species being identified incorrectly) along with errors in the converted relative abundances. Although it depends on the local spectral density, the practical threshold value for reliable resolution is twice the separation of the Fourier grid corresponding to the original transient i.e. the transient without zero padding.
Interpolation of the spectrum, for example by zero-padding as described above, neither reduces these errors nor improves the resolution. In fact, the zero-padded and optionally apodized FT amplitudes are linear combinations of the FT amplitudes and carry no extra useful information. This can be seen by the fact that the complex amplitudes sn of the Fourier transform of a signal S(t) without zero-padding obey the following relation:
whilst the complex amplitudes bm of the Fourier transform of the signal S(t) with P times zero-padding obey the following relation:
Therefore, the complex amplitudes bm (i.e. with the zero-padding) are required to be linear combinations of the complex amplitudes sn (i.e. without the zero-padding). In particular, the complex amplitudes bm must obey:
which reduces to:
Embodiments of the invention seek to address the above described problems and other of the related prior-art.
A first aspect of the invention provides a method of producing a mass spectrum from a time-varying transient signal detected in a mass spectrometer. The method comprises the following steps.
A Fourier transform of the transient signal is performed to produce a first set of complex amplitudes, where each of the complex amplitudes corresponds to a respective frequency of a first set of frequencies. The first set of frequencies may be equally spaced in frequency. A second set of complex amplitudes is generated, where each of these complex amplitudes corresponds to a respective frequency of a second set of frequencies. The second set of frequencies may be equally spaced in frequency. The second set of frequencies may have a spacing (or a minimum spacing) that is less than that of the first set of frequencies. The second set of frequencies may have a spacing (or a minimum spacing) that is less than the inverse of the duration of the transient signal. The second set of complex amplitudes may cover (or span or correspond to) the same frequency range as the first set of complex amplitudes, and so the second set may contain more complex amplitudes than the first set. Hence, the second set of complex amplitudes may provide greater resolution.
The second set of complex amplitudes is optimized to produce an improved second set of complex amplitudes. At least some of the complex amplitudes from the improved second set are used to generate and display a mass spectrum. The improved second set of complex amplitudes provides a better quality mass spectrum.
Optimizing the second set of complex amplitudes comprises varying at least one of the complex amplitudes of the second set based on (or in dependence on) an objective function. For example, the at least one complex amplitudes may be varied with the aim of obtaining a substantially extremum value of the objective function. Optionally, all of the complex amplitudes from the second set may be varied as part of the optimizing step, or a subset may be optimized as part of the optimizing step.
The optimization may be performed subject to a constraint. That is, for at least some of the complex amplitudes of the second set, a constraint is placed on the phase of each of the at least some complex amplitudes relative to one or more expected phases. The expected phases may be frequency-dependent. The objective function depends on one or more complex amplitudes of the first set of complex amplitudes and one or more complex amplitudes of the second set of complex amplitudes. The objective function may, for each frequency of the first set of frequencies, relate one or more complex amplitudes of the second set to the respective complex amplitude from the first set (such as by having the objective function a function of the one or more complex amplitudes of the second set and the respective complex amplitude from the first set). The constraint may be applied to all the complex amplitudes of the second set that are being varied as part of the optimizing step, or to a subset of those complex amplitudes.
It can be seen that by generating and optimizing a second set of complex amplitudes, the transient may be thought of as being decomposed onto a finer frequency grid. As the second set of complex amplitudes is not bound to the first set of complex amplitudes as a linear combination of these amplitudes, unlike in the interpolation method described previously, the resolution increases as the grid spacing of the second set of frequencies decreases. This leads to a much increased accuracy of the resulting mass spectrum. In other words the method may be thought of as operating with two sets of frequencies. The first set of frequencies may comprise frequencies with a minimum separation of 1/T, where T is the time duration of the transient signal. The second set of frequencies may comprise the frequencies with a minimum separation less than 1/T. The second set of frequencies may contain the first set as a subset. Since the minimum spacing of the second set is less than that of the first set of frequencies, the second set of complex amplitudes may provide greater resolution.
It will be appreciated that by “complex” is to be understood as relating to a number that can be expressed with a real and imaginary part. The imaginary part may be zero i.e. complex as used herein covers real numbers.
One advantage of the invention is the integrability of the mass spectrum produced. In other words, the intensity of all peaks, both resolved and unresolved, is conserved. As such the suppression effects of the standard Fourier transform approach, caused by the interference of adjacent peaks is avoided. Thus the invention may be of particular benefit where highly accurate intensity information is desired. Moreover, computations can be conducted on shorter transients increasing the speed and throughput of the instruments. Examples include many mass spectrometry (MS) techniques such as any of: tandem MS, hybrid MS, Liquid Chromatography MS, Ion Mobility Spectroscopy MS, Gas Chromatography MS, Capillary Electrophoresis MS, and so forth.
In some embodiments, the step of performing a Fourier transform includes windowing the Fourier-transformed transient signal in the frequency domain, wherein the first set of complex amplitudes correspond to the windowed Fourier-transformed transient signal. Said windowing may comprise applying a windowing function to the first set of complex amplitudes. Typically, applying a windowing function includes scaling each complex amplitude of the first set of complex amplitudes by the value of the windowing function at the respective frequency. Additionally, or alternatively, said windowing may comprise discarding the complex amplitudes whose respective frequency is outside of one or more pre-defined ranges. For example, complex amplitudes of the first set of complex amplitudes whose respective frequency is above the Nyquist frequency of the transient signal may be discarded, and/or set to zero.
Advantageously, this may allow an increase in processing speed and reduction of computational burden, as the subsequent processing may be limited to regions of interest only. For a sparse enough spectrum or sparse enough segments of interest, calculations can be carried only within windows of the spectra encapsulating these regions. Such an approach may be particularly beneficial when extracting a Total Ion Chromatogram (TIC), which involves monitoring the entire ion population within a mass spectrometer across a chromatographic run. Here computational efficiency may be increased by using windowing to limit the above calculations to the range of frequencies corresponding to the feasible mass range or, in some cases, regions (e.g. where the feasible mass range is not substantially contiguous). In a similar way, by employing windowing, it may be particularly beneficial when extracting a Selected Ion Chromatogram (XIC or EIC), also known as Reconstructed-Ion Chromatogram (RIC). Here the ion population of one or more ionic species of interest within a mass spectrometer is monitored across a chromatographic run. As a particular case of such XIC, Base Peak Chromatogram monitors the most abundant species in each spectrum. Here, the improvements offered by this embodiment allow for the windowing to be applied dynamically in a data dependent way.
Other mass spectrometry scanning modes which may benefit from this embodiment of the invention for similar reasons include:
In this way there is also provided the use of the method as part of any one of, or any combination of: a Selected Ion Chromatogram monitoring; a Reconstructed-Ion Chromatogram monitoring; a Selected Ion Monitoring experiment; a Base Peak Chromatogram monitoring; a tandem mass spectrometry experiment (such as a Selected Reaction Monitoring experiment); a Consecutive Reaction Monitoring experiment; a Multiple Reaction Monitoring experiment; and a Parallel Reaction Monitoring experiment.
For a number of applications, both high mass resolution and abundance fidelity are very important. An example of that would be experiments requiring isotope fine structure analysis. Due to the phenomenon known as mass defect (the difference between the mass of an atomic nuclei and the sum of the masses of its constituents caused by the binding energy), isotopic mass spectra tend to be very dense with irregular spacing. Resolving fine isotopic structure, even for relatively simple chemical compounds, requires extreme mass resolving power, typically in excess of 105-106. As chemical complexity of interrogated compounds increases, errors in isotopic abundances as calculated in the prior-art become inevitable because of interference. Therefore, it may be advantageous to use the above embodiments of the invention to improve the fidelity of isotope ratios.
In some embodiments, the constraint comprises requiring the phase of a complex amplitude to be equal to the expected phase or to be within a range around the expected phase. The expected phase may be derived from any of: the arrangement of the mass spectrometer; an ion injection process into the mass spectrometer; an ion excitation process in the mass spectrometer; a signal detection method; a measured phase of one or more harmonic spectral components in the transient; or a measured phase of one or more harmonic spectral components in any transient obtained in this mass spectrometer before or after obtaining the processed transient. This advantageously allows the optimization to be guided by some known properties of the mass spectrometer and/or data other that what is already present in the transient. This leads to an increased accuracy of the resulting mass spectrum.
In some embodiments the range is based at least in part on the jitter of the mass spectrometer. Thus, the method can take into account this possible source of error in the mass spectrometer.
In some embodiments, for each complex amplitude of the improved second set of complex amplitudes, the objective function comprises the product of that complex amplitude and the overlap of a respective Fourier basis function corresponding to a complex amplitude of the first set of complex amplitudes and a respective second basis function corresponding to that complex amplitude. Such an overlap may be seen as representing the basis function corresponding to a complex amplitude from the second set in terms of the basis function of a complex amplitude from the first set. As such, this allows the objective function to directly compare complex amplitudes from the second set with complex amplitudes from the first set. Put another way the objective function may be seen as comprising an object which is the expansion of one or more of the complex amplitudes of the second set in terms of a coarser frequency grid corresponding to the first set of complex amplitudes.
In some embodiments the respective second basis function comprises a Fourier basis function.
In some embodiments, account may be taken of harmonics that contribute to the complex amplitudes. For example, at least one complex amplitude of the second set of complex amplitudes may comprise a respective auxiliary complex amplitude corresponding to the respective frequency; and a scaled further complex amplitude. Here the scaled complex amplitude corresponds to a further frequency of the second set of frequencies. In this case the constraint on the phase of the particular complex amplitude comprises a constraint on the phase of the respective auxiliary complex amplitude relative to a frequency-dependent expected phase. The respective frequency may correspond to a harmonic of the further frequency. This allows the method to take into account the fact that the shape of each periodic signal in the transient may not be typically exactly the same as the shape of the basis functions that correspond to the set of second complex amplitudes. In particular where a periodic signal gives rise to complex amplitudes at more than one frequency (or harmonic) the optimization can use this data (via the scaling) to improve the accuracy of the improved set of complex amplitudes.
In some embodiments the further complex amplitude is a complex amplitude from the second set of complex amplitudes or an auxiliary complex amplitude of a complex amplitude from the second set of complex amplitudes. Use of an auxiliary complex amplitude in this way can advantageously allow for the fact that a harmonic frequency of a particular frequency may itself have its own harmonic frequencies. Hence, the complex amplitude corresponding to a harmonic frequency may also be decomposed into an auxiliary complex amplitude and a further complex amplitude.
In some embodiments the scaling is based at least in part on any of: (a) the arrangement of one or more electrodes in the mass spectrometer; (b) arrangement of the mass spectrometer; (c) amplitude of the ion oscillation; or (d) shape of the ion orbit.
In some embodiments the optimization comprises substantially maximizing a dual function of the objective function. As is well known in the art, a dual function may be thought of as a function which may be substantially maximized (or substantially minimized) as a proxy for substantially minimizing (or substantially maximizing), an objective function subject to constraints. Typically the dual function may be substantially maximized (or substantially minimized) across a different set of arguments than the objective function.
In some embodiments the optimization is based on any of: an iterative procedure, or Proximal Minimization.
In some embodiments the optimization is based on the Alternate Direction Method of Multipliers. This is a particularly efficient way of optimizing the complex amplitudes can comprise only component-wise operations, which may be efficiently implemented on parallel computing hardware, and Fast Fourier Transform (FFT) operations which can also be efficiently implemented on parallel computing hardware. This allows the method to scale efficiently in terms of computing power required for increasing numbers of complex amplitudes which may correspond to improved resolution.
According to another aspect of the invention there is provided an apparatus arranged to carry out a method according to the first aspect (or embodiments thereof). According to another aspect of the invention there is provided a computer program which, when executed by one or more processors, causes the one or more processors to carry out a method according to the first aspect (or embodiments thereof). The computer program may be stored on a computer-readable medium.
The present invention may be put into practice in various ways, a number of which will now be described by way of example only and with reference to the accompanying drawings in which:
As seen in
As explained in more detail in WO-A-02/078046, ions are held in the linear trap 60 in a potential well, the bottom of which may be located adjacent to an exit electrode thereof. Ions are ejected out of the linear trap 60 into a lens arrangement 70 by applying a DC pulse to the exit electrode of the linear trap 60. Ions pass through the lens arrangement 70 along a line that is curved to avoid gas carry-over, and into an electrostatic trap 80 (also known as a mass analyzer). In
In operation, a voltage pulse is applied to the exit electrode of the linear trap 60 so as to release trapped ions. The ions arrive at the entrance to the electrostatic trap 80 as a sequence of short, energetic packets, each packet comprising ions of a similar m/z ratio.
The ions enter the electrostatic trap 80 as coherent bunches and are squeezed towards the central electrode 90. The ions are then trapped in an electrostatic field such that they oscillate along the central electrode with the frequencies depending on their m/z ratios. Image currents are detected by the first outer electrode 84 and the second outer electrode 85, providing first harmonic transient signal 81 and second harmonic transient signal 82 respectively. These two signals are then processed by a differential amplifier 100 and provide a transient image current signal 101 (herein referred to as the transient).
Therefore, the transient 101 comprises a superposition of one or more periodic signals (or harmonic spectral components). Each periodic signal corresponds to the oscillation of a respective coherent packet of ions within the mass analyzer with a respective characteristic frequency determined by the m/z ratio of the ions.
It will be appreciated that the mass spectrometer 10 outlined above serves merely as an exemplar as to how the transient 101 may be generated. The embodiments of the invention presented below may use any suitable transient 101 produced by any mass spectrometer 10. In particular whilst the mass spectrometer described above is an ORBITRAP mass spectrometer, a particular example of a mass spectrometer that uses an orbital trapping electrostatic trap, the embodiments of the invention described below are not limited to such a mass spectrometer.
The storage medium 304 may be any form of non-volatile data storage device such as one or more of a hard disk drive, a magnetic disc, an optical disc, a ROM, etc. The storage medium 304 may store an operating system for the processor 308 to execute in order for the computer 302 to function. The storage medium 304 may also store one or more computer programs (or software or instructions or code).
The memory 306 may be any random access memory (storage unit or volatile storage medium) suitable for storing data and/or computer programs (or software or instructions or code).
The processor 308 may be any data processing unit suitable for executing one or more computer programs (such as those stored on the storage medium 304 and/or in the memory 306), some of which may be computer programs according to embodiments of the invention or computer programs that, when executed by the processor 308, cause the processor 308 to carry out a method according to an embodiment of the invention and configure the system 300 to be a system according to an embodiment of the invention. The processor 308 may comprise a single data processing unit or multiple data processing units operating in parallel, separately or in cooperation with each other. The processor 308, in carrying out data processing operations for embodiments of the invention, may store data to and/or read data from the storage medium 304 and/or the memory 306.
The interface 310 may be any unit for providing an interface to a device 322 external to, or removable from, the computer 302. The device 322 may be a data storage device, for example, one or more of an optical disc, a magnetic disc, a solid-state-storage device, etc. The device 322 may have processing capabilities—for example, the device may be a smart card. The interface 310 may therefore access data from, or provide data to, or interface with, the device 322 in accordance with one or more commands that it receives from the processor 308.
The user input interface 314 is arranged to receive input from a user, or operator, of the system 300. The user may provide this input via one or more input devices of the system 300, such as a mouse (or other pointing device) 326 and/or a keyboard 324, that are connected to, or in communication with, the user input interface 314. However, it will be appreciated that the user may provide input to the computer 302 via one or more additional or alternative input devices (such as a touch screen). The computer 302 may store the input received from the input devices via the user input interface 314 in the memory 306 for the processor 308 to subsequently access and process, or may pass it straight to the processor 308, so that the processor 308 can respond to the user input accordingly.
The user output interface 312 is arranged to provide a graphical/visual output to a user, or operator, of the system 300. As such, the processor 308 may be arranged to instruct the user output interface 312 to form an image/video signal representing a desired graphical output, and to provide this signal to a monitor (or screen or display unit) 320 of the system 300 that is connected to the user output interface 312.
Finally, the network interface 316 provides functionality for the computer 302 to download data from and/or upload data to one or more data communication networks.
It will be appreciated that the architecture of the system 300 illustrated in
The mass spectrum 390 comprises one or more m/z values (or mass to charge ratios) 394-n. Each m/z value corresponds to a respective ionic species and is equal to the molecular mass of the respective ionic species divided by the absolute elemental charge of the respective ionic species. The mass spectrum 390 comprises one or more intensity values 396-n with each intensity value 396-n appearing for a respective m/z value 394-n. Each intensity value 396-n correlates to the relative abundance of the ionic species corresponding to the respective m/z value 394-n. Each intensity value 396-n may be proportional to the relative abundance of the ionic species corresponding to the respective m/z value.
An experimental mass spectrum such as the mass spectrum 390 may be plotted in the form of a continuum plot, indicated by the dashed line, and a centroid plot, indicated by the vertical solid lines. The widths of peaks indicated by the dashed line represent the limit of the mass resolving power, which is the ability to distinguish two different ionic species with close m/z ratios.
However it will be appreciated that the mass spectrum 390 does not need to be plotted in the form of a graph. Indeed, the mass spectrum 390 may be represented in any suitable form. For example, the mass spectrum 390 may be represented a list comprising the one or more intensity values 396-n and the one or more m/z values 394-n.
The transient processing system 400 comprises a Fourier transform module 410 and a post processing module 480. The transient processing system 400 may be implemented on a computer system 300 as described with reference to
The transient 101 can be represented by a time varying function S(t). The transient is only measured (or captured or recorded) over a finite time T, termed the “duration” of the transient. For the purposes of discussion the time varying function S(t) representing the transient is shown as a continuous function of time, t. However it will be appreciated that the transient 101 may also, or alternatively, be sampled. In particular, the transient may be represented by a set of values St
The Fourier transform module 410 is arranged to calculate at least part 425 of a discrete Fourier transform of the transient 101. The discrete Fourier transform is described shortly below.
The post processing module 480 is arranged to calculate a mass spectrum 390 based on the at least part 425 of a discrete Fourier transform.
The set 430 of frequencies 435-n comprises a plurality of frequencies f0, f1, . . . . For the purposes of discussion an arbitrary frequency 435-n from the plurality of frequencies will be referred to herein as fn. Each frequency 435-n of the set 430 of frequencies 435-n corresponds to a respective frequency bin of the discrete Fourier transform. The separation (or arithmetic difference between) between adjacent frequencies 435-n in the set 430 of frequencies 435-n (referred to herein as the “spacing” of a set of frequencies) is determined by the duration of the transient 101. In particular, the arithmetic difference between two adjacent frequencies 435-n in the set 430 of frequencies 435-n is proportional to the inverse of the duration of the transient 101. For example, the separation may be given by the mathematical relation fn−fn-1=1/T. In particular, each frequency 435-n in the set 430 of frequencies 435-n may follow the relation fn=n/T, where n is an integer number.
It will be appreciated that the set 430 of frequencies 435-n may not be equally spaced. In other words the separation between adjacent frequencies 435-n may not be constant. In this case the separation described above may refer to the “minimum separation” i.e. the arithmetic difference between the two closest frequencies 435-n in the set 430 of frequencies 435-n.
The set 440 of basis functions 445-n comprise a plurality of basis functions 445-n, h0, h1, . . . . For the purposes of discussion an arbitrary basis function 445-n from the plurality of basis functions 445-n will be referred to herein as hn. Each basis function 445-n of the set of basis functions 450 corresponds to a respective frequency 435-n of the set 430 of frequencies 435-n. A basis function 445-n may be time-dependent. Each basis function 445-n of the set 440 of basis functions 445-n may comprise a respective Fourier basis function. The respective Fourier basis function of the basis function 445-n may correspond to the respective frequency corresponding to the basis function 445-n. For example, for a basis function 445-n hn(t), the basis 445-n may follow the relation:
h
n(t)=e2πif
The set 450 of complex amplitudes 455-n comprises a plurality of complex amplitudes 455-n, s0, s1, . . . . For the purposes of discussion an arbitrary complex amplitude 455-n from the plurality of complex amplitudes 455-n will be referred to herein as sn. Each complex amplitude 455-n of the set 450 of complex amplitudes 455-n corresponds to a respective frequency 435-n of the set 430 of frequencies 435-n. Each complex amplitude 455-n of the set 450 of complex amplitudes 455-n-n corresponds to a respective basis function 445-n of the set 440 of basis functions 445-n. In particular, each complex amplitude 455-n of the set 450 of complex amplitudes 455-n may follow the relation:
If the transient 101 S(t) is sampled, as described previously, then the above integral may be replaced with the appropriate sum. For example, each complex amplitude 455-n of the set 450 of complex amplitudes 455-n may follow the relation:
Thus it can be seen that the discrete Fourier transform of a transient 101 is the representation of the transient 101 as a superposition of the basis functions 445-n of the set 440 of basis functions 445-n, where each basis function 445-n of the set 440 of basis functions 445-n is scaled by a respective complex amplitude 455-n of the set 450 of complex amplitudes 455-n.
At a step 510 the transient 101 is obtained. The step 510 may comprise retrieving the transient 101 from the storage medium 304. Step 510 may comprise obtaining the transient 101 directly from the mass spectrometer 10. If the transient 101 is represented by a continuous time varying function S(t) the step 510 may comprise sampling the transient 101 as described previously in relation to
At a step 520 the Fourier transform module performs a Fourier transform of the transient 101. The step 520 comprises generating (or calculating) at least part of the set 450 of complex amplitudes 455-n. The at least part of the set 450 of complex amplitudes 455-n may comprise one or more complex amplitudes 455-n from the set 450 of complex amplitudes 455-n The step 520 may comprise using a fast Fourier transform (FFT) algorithm. The FFT algorithm may be any of: a Cooley-Tukey algorithm; a prime-factor algorithm; a Sande-Tukey algorithm; Rader's algorithm; etc. The step 520 comprises generating (or calculating or otherwise obtaining) at least part of the set 430 of frequencies 435-n.
At a step 530 the post processing module 480 generates a mass spectrum 390 based on the at least part of the set 450 of complex amplitudes 455-n. The step 530 may comprise generating the one or more intensity values 396-n from the at least part of the set 450 of complex amplitudes 455-n. For example, each of the one or more intensity values 396-n may be generated using the absolute value of one or more respective complex amplitudes 455-n from the at least part of the set 450 of complex amplitudes 455-n. The step 530 may comprise generating the one or more m/z values 394-n from one or more frequencies 435-n of the at least part of the set 430 of frequencies 435-n. For example, each of the one or more m/z values 394-n may be converted from one or more respective frequencies 435-n from the at least part of the set 430 of frequencies 435-n. The conversion may comprise using a calibration approach. Many such calibration approaches are known in the art (see for example, A. Makarov, “Theory and Practice of the ORBITRAP Mass Analyzer”, in Practical aspects of Trapped Ion Mass Spectrometry, Vol. 4, Ed. R. E. March and J. F. J. Todd, CRC Press 2010, the entire contents of which are incorporated herein by reference) and are therefore not further described in detail herein.
In an example, the generating step 530 may comprise partitioning the complex amplitudes into one or more groups of complex amplitudes. The one or more frequencies 435-n corresponding to the one or more complex amplitudes 455-n in a group of complex amplitudes 455-n form a contiguous part of the set 430 of frequencies 435-n. Each complex amplitude 455-n in a group of complex amplitudes may exceed a predetermined threshold value. Additionally, or alternatively, the partitioning may be based at least in part on a user selection of one or more frequencies and/or one or more complex amplitudes. In this example, each of the one or more intensity values 396-n is generated based on the one or more complex amplitudes 455-n of the respective group of complex amplitudes. In particular, an intensity value 396-n may be a function of any of: the absolute values of the one or more complex amplitudes 455-n in the respective group of complex amplitudes, the real values of the one or more complex amplitudes 455-n in the respective group of complex amplitudes; the imaginary values of the one or more complex amplitudes 455-n in the respective group of complex amplitudes; etc. For example, an intensity value 396-n may be the sum of the absolute values of the one or more complex amplitudes 455-n in the respective group of complex amplitudes. Each of the one or more m/z values 394-n is converted from the one or more frequencies 435-n from the respective group of frequencies. In particular, each of the one or more m/z values 394-n may be converted from a weighted average of the one or more frequencies 435-n from the respective group of frequencies. Generating the weighted average of the one or more frequencies of a group of frequencies may comprise scaling each frequency 435-n of the one or more frequencies 435-n of a group of frequencies by the respective complex 455-n amplitude of the respective group of complex amplitudes. Such a weighted average may be referred to as a “centroid” of a peak represented by the group of complex amplitudes. The intensity of such a centroid could be considered to be the intensity value 396-n corresponding to the respective group of complex amplitudes which may be calculated as described above.
Thus the system 400 and method 500 enable the relative abundance of ionic species present in the ion source 20 to be determined from the transient 101 produced by the mass spectrometer 10. In particular, by decomposing the transient 101 into a set 430 of frequencies 435-n and corresponding complex amplitudes 455-n, through a discrete Fourier transform 420, one or more frequencies 435-n of the set 430 of frequencies 435-n can be converted to m/z values 394-n, from which ionic species can be identified. This is because one or more frequencies 435-n (or one or more groups of frequencies) each correspond closely to the characteristic frequency of a respective periodic signal of the transient 101. However, as described previously, this is only the case when all possible pairs of characteristic frequencies have a greater separation than that of the set 430 of frequencies 435-n. When the separation of any pair of characteristic frequencies is equal to or less than the separation of the set 430 of frequencies 435-n then, due to the Fourier uncertainty principle, the pair of characteristic frequencies cannot be properly resolved in the discrete Fourier transform. Therefore, significant errors are introduced into the m/z values 394-n and/or relative abundances 396-n produced.
The second set 630 of frequencies 635-n comprises a plurality of frequencies 635-n, F0, F1, . . . . For the purposes of discussion an arbitrary frequency 635-n from the plurality of frequencies will be referred to herein as Fk. The separation between adjacent frequencies 635-n in the second set 630 of frequencies 635-n (or spacing of the second set 630 of frequencies 635-n) may be less than the separation between adjacent frequencies 435-n in the set 430 of frequencies 435-n (or spacing of the set 430 of frequencies 435-n). For example, the separation may be given by the mathematical relation Fk−Fk−1=1/PT, where T is the duration of the transient 101 and P is a positive number. P (referred to herein as the refine factor) may be an integer number. In particular, each frequency 635-n in the second set 630 of frequencies 635-n may follow the relation Fk=k/PT, where k is an integer number. In particular, the spacing of the second set of frequencies may be less than the inverse of the duration of the transient signal.
It will be appreciated that the second set 630 of frequencies 635-n may not be equally spaced. In other words the separation between adjacent frequencies 635-n may not be constant. In this case the separation described above may refer to the “minimum separation” i.e. the arithmetic difference between the two closest frequencies 635-n in the second set 630 of frequencies 635-n.
The second set of basis functions 640 is similar to the set of basis functions 440 described previously with reference to
g
k(t)=e2πiF
The second set 650 of complex amplitudes 655-n comprises a plurality of complex amplitudes 655-n, a0, a1, . . . . For the purposes of discussion an arbitrary complex amplitude 655-n from the plurality of complex amplitudes will be referred to herein as ak. Each complex amplitude 655-n of the second set 650 of complex amplitudes 655-n corresponds to a respective frequency 635-n of the second set 630 of frequencies 635-n. Each complex amplitude 655-n of the second set 650 of complex amplitudes 655-n corresponds to a respective basis function 645-n of the second set of basis functions.
The expected phase data 660 comprises one or more expected phases 665-n. For the purposes of discussion an arbitrary expected phase 665-n will be referred to herein as φl. Each expected phase 665-n corresponds to a respective frequency 635-n of the second set 630 of frequencies 635-n. Each expected phase 665-n may be generated (or calculated or determined) based on any of: the arrangement of the mass spectrometer 10; a signal detection method; a measured phase of one or more harmonic spectral components in the transient; a measured phase of one or more harmonic spectral components in any transient obtained in this mass spectrometer before or after obtaining the processed transient; or experimental conditions. In particular, each expected phase 665-n may be calculated based on any of: the method of injection of and/or excitation of the ions within the mass spectrometer 10; at least part of a time of flight of the ions in the mass spectrometer; the angular displacement between the excitation electrodes and the detection electrodes in the mass spectrometer 10. Each expected phase 665-n may correspond to a respective expected phase value at the respective frequency 635-n in the frequency domain of the transient 101. In particular, each phase value may be dependent on any of: local space-charge conditions, global space-charge conditions, etc.
It will be appreciated that it is known that the phase value of a periodic signal in a transient 101 can be dependent on m/z ratio of the ionic species of the coherent ion packet corresponding to the periodic signal. Therefore, the phase value of a periodic signal in a transient 101 can be dependent on the characteristic frequency of the periodic signal. It will, therefore be appreciated that the expected phases 665-n may be calculated based on such phase values. Many approaches to such calculation are known in the art (see for example, “Autophase: An algorithm for automated generation of absorption mode spectra for FT-ICR MS” D. P. A. Kilgour, R. Wills, Y. Qi, and P. B. O'Connor, Analytical Chemistry 2013 85 (8), pp 3903-3911 the entire contents of which are incorporated herein by reference; and “Enhanced Fourier transform for ORBITRAP mass spectrometry”, O. Lange, E. Damoc, A. Wieghaus, A. Makarov International Journal of Mass Spectrometry, Volume 369 (2014) pp 16-22, the entire contents of which are incorporated herein by reference; U.S. Pat. No. 8,853,620 the entire contents of which are incorporated herein by reference; U.S. Pat. No. 8,399,827 the entire contents of which are incorporated herein by reference; and U.S. Pat. No. 8,431,886 the entire contents of which are incorporated herein by reference). Each expected phase 665-n may be stored in the storage medium 306. Additionally, or alternatively, an expected phase 665-n may be calculated (or otherwise determined) based on a function of the frequency 635-n or the index n. Preferably, an expected phase 665-n is calculated based on a polynomial function of the frequency 635-n or the index n. The coefficients of the polynomial function may be calculated (or known or otherwise determined) from the arrangement of the mass spectrometer 10. The coefficients of the polynomial function may be calculated based on a best fitting to the arguments of one or more of the complex amplitudes 455-n. The polynomial function may take into account space-charge correction. In particular space-charge correction may be introduced in the form of additional variables based on any of: intensity values, automatic gain control (AGC) readings, etc.
It will be appreciated that whilst the expected phase data 660 above has been described as comprising one or more expected phase values 665-n, this is only one example of how the expected phase data may be represented. In particular, the expected phase data 660 may also, or alternatively, be represented as a smooth varying function of frequency φ(f).
The generation module 610 is arranged to generate (or initialize) a second set 650 of complex amplitudes 655-n. The second set 650 of complex amplitudes 655-n are as described previously with reference to
The optimization module 620 is arranged to optimize the second set 650 of complex amplitudes 655-n to produce an improved second set 650 of complex amplitudes 655-n. The optimization module may be arranged to use an objective function that relates the complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n to the complex amplitudes 455-n from the set 450 of complex amplitudes 455-n. The objective function may, for each frequency 435-n in the set 430 of frequencies 435-n, relate the complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n to the respective complex amplitude 455-n from the set 450 of complex amplitudes 455-n. The objective function may comprise a matrix (or function) Ψ(n,k) (herein referred to as the “overlap function”).
The objective function may depend on the norms of one or more vectors. Each vector may correspond to respective complex amplitude 455-n from the set 450 of complex amplitudes 455-n. Each element of a vector may comprise the difference between a respective complex amplitude 655-n of the second set 650 of complex amplitudes 655-n scaled with the overlap function and the complex amplitude 455-n corresponding to the vector. For example, the objective function may obey the relation:
A norm, ∥ . . . ∥, may be any convex norm. In particular the norm may be an Lm norm i.e. any one of an L1 norm; an L2 norm; an L3 norm; etc. If the norm is an Lp norm, then the objective function may obey the relation:
The overlap function may depend on one or more basis functions 445-n from the set of basis functions 440 and one or more basis functions 645-n from the second set 640 of basis functions 645-n. In particular, the overlap function may comprise one or more overlaps of a respective basis function 445-n from the set 440 of basis functions 445-n and a respective basis function 645-n from the second set 640 of basis functions 645-n. An overlap may comprise the inner product of the respective basis function 445-n from the set 440 of basis functions 445-n and the respective basis function 645-n from the second set 640 of basis functions 645-n, i.e. the overlap function may obey the relation Ψ(n,k)=<hn, gk>. The inner product may be taken over the duration of the transient 101. For example the overlap function may obey the relation:
which may also be represented as:
It will be appreciated that the overlap function Ψ(n,k) may be a Fourier image of the basis function (with the index k) 645-n of the second set 640 of basis functions 645-n in relation to the basis function (with the index n) 445-n of the first set 440 of basis functions 445-n. The overlap function Ψ(n,k) may be represented as a N×(NP) complex-value matrix Ψ.
A step 710 comprises the generation module 610 generating the second set 650 of complex amplitudes 655-n. The second set 650 of complex amplitudes 655-n may be generated based on one or more predetermined values. The second set 650 of complex amplitudes 655-n may be generated based on the Fourier transform of a model transient. The model transient may be generated based on user specified one or more predetermined m/z values and one or more predetermined relative abundances. The second set 650 of complex amplitudes 655-n may be generated based on the Fourier transform 420 of the transient 101. Additionally, or alternatively, one or more (or all) of the complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n may be set to zero (or values substantially close to zero).
A step 720 comprises the optimization module 620 optimizing the second set 650 of complex amplitudes 655-n to produce an improved second set 650 of complex amplitudes 655-n. The step 720 may comprise varying one or more complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n with the aim of obtaining (or achieving or generating) an extremum value of the objective function. The improved second set 650 of complex amplitudes 655-n may be set to (or comprise or otherwise be equivalent to) the resulting complex amplitudes. The extremum value of the objective function may be a value of the objective function where the rate of change of the value of the objective function with respect to one or more of the complex amplitudes 655-n is substantially zero. In this exemplary embodiment, the extremum value of the objective function is be a global minimum. However, this need not be the case. For example, the extremum value of the objective function may be a local minimum, a global maximum, or a local maximum.
The optimizing is subject to one or more constraints based on the expected phase data 660. Each of the one or more constraints may correspond to a respective expected phase 665-n of the expected phase data 660. Each of the one or more constraints may correspond to a respective complex amplitude 655-n of the improved second set 650 of complex amplitudes 655-n. In particular the optimizing may be subject to, for at least some of the complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n, a constraint on the phase of each complex amplitude 655-n relative to a respective expected phase 665-n. A constraint may require (or impose or set or otherwise enforce) the phase of the respective complex amplitude 655-n of the improved second set 650 of complex amplitudes 655-n be equal to the respective expected phase 665-n of the expected phase data 660. For example such a constraint may be represented as:
arg ak=φk
It will be appreciated that such a constraint can be imposed in many different, mathematically equivalent, ways. For example, for an arbitrary complex amplitude 655-n ak with a corresponding expected phase 665-n φk, the expected phase may be incorporated into the basis function 645-n corresponding to the complex amplitude 655-n. In particular, the phase of the basis function may be set equal to the expected phase. Thus, the constraint corresponding to the complex amplitude 655-n may require the complex amplitude 655-n to be real valued and of a particular sign.
One or more φk may be (but not necessarily are) zero; in this particular case one or more ak are non-negative real numbers. One or more φk may be (but not necessarily are) equal to 180 degrees; in this particular case one or more ak are non-positive real numbers.
Alternatively, a constraint may require (or impose or set or otherwise enforce) the phase of the respective complex amplitude 655-n of the improved second set 650 of complex amplitudes 655-n be within a predefined range around (or substantially centred on, or within, or otherwise based on) the respective expected phase 665-n of the expected phase data 660. For example, such a constraint may be represented as:
φk−Δφ≦arg ak≦φk+Δφ
The range may be any of: set by a user; based on the mass spectrometer 10; dependent on the frequency corresponding to the expected phase 665-n; based on the expected phase jitter of the mass spectrometer 10; etc. A complex amplitude 655-n ak, which is zero, may be considered as satisfying any phase constraint.
It will be appreciated that the step 720 may be mathematically equivalent to generating a further set 450 of complex amplitudes 455-n. Each complex amplitude of the set of further complex amplitudes may correspond to a respective frequency 435-n from the set 430 of frequencies 435-n. For the purposes of discussion, an arbitrary complex amplitude of the further set 450 of complex amplitudes 455-n will be referred to herein as s′n. Each complex amplitude of the further set 450 of complex amplitudes 455-n may comprise the sum of one or more complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n, where each of the one or more complex amplitudes 655-n is scaled by the overlap function. For example, s′n=ΣkakΨ(n,k). The improved second set 650 of complex amplitudes 655-n may be formed from varying one or more complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n. This is performed with the aim of minimizing the sum of the norm of each difference between a complex amplitude of the further set 450 of complex amplitudes 455-n and the corresponding complex amplitude 455-n of the set 450 of complex amplitudes 455-n. The phase of each complex amplitude 655-n of the improved second set 650 of complex amplitudes 655-n may be constrained to be substantially equal to a respective expected phase 665-n of expected phase data 660.
It will be appreciated that the step 720 may be implemented using a numerical optimization technique of which many examples are known in the art. In particular the step 720 may be implemented using (or comprise or be based on) an iterative method (or procedure). As such, the optimization described above may not actually obtain an extremum value of the objective function. The optimization described above may be complete (or successful or may terminate) when a value of the objective function is obtained that is suitably close (or estimated to be suitably close) to an extremum value (or estimated or predicted extremum value) of the objective function. If the step 720 is implemented using an iterative method the optimization described above may be complete if any of the following conditions are met:
For example, the step 720 may be implemented, in whole or in part, using any of: a finite differences method e.g. such as Newton's method; a Quasi-Newton method; a conjugate gradient method; a steepest descent method; proximal minimization etc. Any of these methods may be combined with projection onto the domain of amplitudes that satisfy the phase restrictions. Preferably, the numerical optimization technique comprises a plurality of steps, each of which can be implemented using either component-wise updates of the second set 650 of complex amplitudes 655-n and/or vector and matrix operations that can be reduced to Fast Fourier Transform operations. This, advantageously, enables the optimization to be carried out parallel computing hardware (e.g. general purpose graphical processing unit-type systems). Thus, enabling improved computational efficiency and better computational scaling with respect to the number of complex amplitudes in the second set 650 of complex amplitudes 655-n (and hence the separation of the second frequency grid). A particular embodiment of the invention uses the alternating direction method of multipliers (ADMM) method and is described in further detail below.
It will be appreciated that the second set 650 of complex amplitudes 655-n and the improved second set 650 of complex amplitudes 655-n may be the same entity. In particular, the improved second set 650 of complex amplitudes 655-n may be the values of the second set 650 of complex amplitudes 655-n after the step 720. Put another way, the step 720 may comprise varying the second set 650 of complex amplitudes 655-n directly and the improved second set 650 of complex amplitudes 655-n being the varied second set 650 of complex amplitudes 655-n.
It will be appreciated that the second set 650 of complex amplitudes 655-n generated in the step 710 may be seen as providing an initial starting point (or guess) to the optimization described in the step 720. As such there are many ways known in the art for generating starting points for optimization processes. Therefore, the skilled person would appreciate that the description of the step 710 above is exemplary and may be modified or changed.
In the step 530 the improved second set 650 of complex amplitudes 655-n are used in place of the set 450 of complex amplitudes 455-n. Additionally the second set 630 of frequencies 635-n is used in place of the set 430 of frequencies 435-n.
Thus the method 700 enables the relative abundance of ionic species present in the ion source 20 to be determined from the transient 101 produced by the mass spectrometer 10. In particular, the method 700 significantly increases the accuracy of the m/z values 394-n and relative abundances 396-n compared to those produced by method 500. This is achieved by decomposing the transient 101 onto a second set 630 of frequencies 635-n which has a P-times smaller separation than the set 430 of frequencies 435-n used in method 500 (the Fourier grid corresponding to the duration of the transient 101). The method 700 results in a true decomposition onto the second set 630 of frequencies 635-n, rather than a simple interpolation onto the second set 630 of frequencies 635-n such as that produced by the zero-padding approach described previously. Therefore, frequency resolution is improved with the method 700. In particular, the method of 700 enables pairs of characteristic frequencies whose separation is 2/PT or greater to be resolved. Thus, the method 700 enables a P-times improvement of the frequency resolving power of the method 500.
with extra condition wn=ΣkΨ(n,k)zk−sn. Here the following will be appreciated:
i.e. a plain a represents the vector of all of the complex amplitudes of the second set 650 of complex amplitudes 655-n;
where each yn may be a complex number. Vector y may be thought of as corresponding to the discrepancy between a and s;
where each vn may be a complex number. Vector v may be known as a “dual variable”. Vector v may be thought of as a vector of Lagrange multipliers;
where each uk may be a complex number. Vector u may be known as a “dual variable”. Vector u may be thought of as a vector of Lagrange multipliers;
where each wn may be a complex number;
where each zk may be a complex number;
i.e. a plain s represents the vector of all of the complex amplitudes of the set 450 of complex amplitudes 455-n; and
The regularization parameter is greater than zero. In particular, the regularization parameter may depend on the objective function. For example, if the objective function is squared L2 norm, the regularization parameter may be in the range 10−3 to 10−1. If the objective function is L1 norm, the regularization parameter may be in the range 10−3×smax to 10−1×smax, where smax is the maximal absolute value (or an estimate of the maximal absolute value) of the spectrum 420. The regularization parameter may vary from iteration to iteration.
The indicator function may follow the relation:
Put another way the indicator function may be zero within the cone in complex space defined as: arg akε[φk−Δφk+Δφ], and plus infinity outside this cone. Consistent with the ADMM a dual function may be written as:
A step 810 comprises setting the vectors a, y, z, u, and v to some initial values. These values may be zero. However, it would be appreciated that for a convex objective function and feasibility domain any choice of initial conditions may lead to convergence.
A step 820 comprises updating the complex amplitudes ak 655-n of the improved second set 650 of complex amplitudes based on the vector z; and the vector u. In particular, the step 820 may comprise updating the complex amplitudes ak 655-n of the improved second set 650 of complex amplitudes 655-n in accordance with the formula:
a
k
:=z
k
−u
k
A step 825 comprises applying the one or more constraints as described previously with reference to
A step 830 comprises minimization of the Lagrangian with respect to the variables yn. The vector y may be updated element-wise based on the elements wn and vn. Herein “updating” can refer to any process or step where a variable (such as any of; a vector; a component of a vector; a scalar etc.) is given (or set or calculated) a new value (or values). Herein, “element-wise” refers to any process or step where each component (or element) of a vector (or matrix) is updated independently of each other component of the vector (or matrix). It will be appreciated that such updates can be carried out: in serial; in parallel or as a mix of serial and parallel operations. In particular, each component yk of the vector y may be updated using a proximal operator. The proximal operator may be dependent on the regularization parameter and the objective function. For example, each component yk of the vector y may be updated using following relation:
Typically, the proximal operator depends on the choice of the objective function. For instance, if the objective function comprises the squared L2 norm, the explicit form of the proximal operator is
If the objective function comprises the L1 norm, the explicit form of the proximal operator is
A step 840 comprises updating the vector z based on one or more complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n; the vector u; the vector v and the vector y. The step 840 also comprises updating the vector w based on the complex amplitudes of the second set 650 of complex amplitudes 655-n; the vector u; the vector v and the vector y. In particular, the step 840 may comprise calculating a first intermediate vector, {tilde over (z)}, based on one or more complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n; and the vector u. The first intermediate vector, {tilde over (z)}, may follow the relation:
{tilde over (z)}=a+u
The step 840 may comprise calculating a second intermediate vector, {tilde over (w)}, based on one or more complex amplitudes of the second set 650 of complex amplitudes 655-n; and the vector u. The second intermediate vector, {tilde over (w)}, may follow the relation:
{tilde over (w)}=y+v
The step 840 may comprise an orthogonal projection the first intermediate vector and the second intermediate vector onto a hyperplane. The hyperplane may be based on the overlap function and the set 450 of complex amplitudes 455-n. In particular, the hyperplane may be defined as: w=Ψz−s. Vector z may be updated based on the orthogonal projection. Vector w may be updated based on the orthogonal projection. For example, step 840 may comprise updating vector z using the following relation:
z={tilde over (z)}−
T
r
Step 840 may comprise updating vector w may be updated using the following relation:
w:={tilde over (w)}+r
Step 840 may comprise calculating the vector r using the relation:
r=(1−P)−1(Ψ{tilde over (z)}−s−{tilde over (w)})
where P is as described previously in relation to
It will be appreciated that step 840 may be seen as comprising updating the vectors z and w with the aim of minimizing the Lagrangian. This update may be subject to the condition w=Ψz−s. The update may be given by the formulas
It will be appreciated that the operations to calculate the intermediate values {tilde over (z)}k, {tilde over (w)}n and the updates of zk and wn may be element-wise. The operations of matrix-vector multiplications by Ψ and
A step 850 comprises updating the vectors u and v (also known as “the dual variables”). The vector u is updated based on one or more complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n, the vector z and the current state of the vector u. In particular, the step 850 may comprise subtracting the vector z from the vector u and adding the vector a. The vector v is updated based on the vector y, the vector w and the current state of the vector v. In particular, the step 850 may comprise subtracting the vector w from the vector v and adding the vector y.
A step 860 comprises checking one or more convergence criteria. A convergence criterion may be any of:
The step 860 may comprise continuing the method from the step 820 if at least one of the one or more convergence criteria has been satisfied. Alternatively the optimization step 720 may be terminated. The step 860 may comprise continuing the method from the step 820 if all of the one of the one or more convergence criteria have been satisfied. Alternatively the optimization step 720 may be terminated. The step 860 may comprise returning the improved second set 650 of complex amplitudes 655-n. In this way it will be appreciated that an iteration procedure can be defined comprising the steps 820 to 870.
As can be seen the mathematical operations on the steps 820, 825, 830, and 850 are component wise, as discussed previously, this enables these steps to be carried out efficiently using parallel computing hardware. The step 840 may comprise multiplications of a vector by matrices Ψ and
A step 920 comprises the optimization module 620 optimizing the second set 650 of complex amplitudes 655-n to produce an improved second set 650 of complex amplitudes 655-n. The step 920 is the same as the step 720 of
At least one of the complex amplitudes 655-n (referred to herein as a “harmonic” complex amplitude for the purposes of discussions) of the second set 650 of complex amplitudes 655-n each comprises a respective auxiliary complex amplitude corresponding to the respective frequency and a respective “base” complex amplitude, scaled by a respective parameter (which for the purposes of discussion may be represented as ξ). A base complex amplitude may be a complex amplitude 655-n from the second set 650 of complex amplitudes 655-n. A base complex amplitude may be an auxiliary complex amplitude of a complex amplitude 655-n from the second set 650 of complex amplitudes 655-n.
For each base complex amplitude in the second set 650 of complex amplitudes 655-n the frequency 635-n corresponding to the harmonic complex amplitude may be equal to the frequency 635-n corresponding to the corresponding base complex amplitude scaled by a factor. The factor may be an integer q. Put another way, the frequency 635-n corresponding to the corresponding harmonic complex amplitude may be a q-th harmonic of the frequency 635-n corresponding to the base complex amplitude 655-n.
The parameter, ξ, may be dependent on the factor. The parameter may be dependent on the frequency 635-n corresponding to the respective base complex amplitude 655-n. The parameter ξ may be dependent on any of:
For one or more harmonic complex amplitudes 655-n in the second set 650 of complex amplitudes 655-n the optimizing of step 920 may be subject to a respective constraint on the phase of the respective auxiliary complex amplitude relative to a respective expected phase 665-n. In particular, the respective constraint may require the phase of the respective auxiliary complex amplitude be equal to the expected phase 665-n of the expected phase data 660. For example such constraints may be represented as:
arg ak*=φk
Alternatively, the respective constraint may require the phase of the respective auxiliary complex amplitude of the improved second set 650 of complex amplitudes 655-n be within a predefined range around the respective expected phase 665-n of the expected phase data 660. For example, such constraints may be represented as:
φk−Δφ≦arg ak*≦φk+Δφ
In an example, each complex amplitude 655-n, ak, of the second set 650 of complex amplitudes 655-n may obey the relation:
Put another way, for each complex amplitude 655-n in the second set 650 of complex amplitudes 655-n corresponding to the q-th harmonic of another complex amplitude 665-n in the second set 650 of complex amplitudes 655-n, that complex amplitude 655-n comprises the sum of a respective auxiliary complex amplitude and the another complex amplitude 655-n (or the corresponding auxiliary complex amplitude), scaled by a parameter.
In a further example, each complex amplitude, ak, of the second set 650 of complex amplitudes 655-n may obey the relation:
This may be regarded as distributing the q-th harmonic evenly over the complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n.
As the shape of each periodic signal in the transient 101 may not be typically exactly the same as the shape of the Fourier basis functions, a single periodic signal typically contributes to a plurality complex amplitudes 455-n of the set 450 of complex amplitudes 455-n in the method 500 described previously. Put another way, a single ionic species typically generates multiple Fourier harmonics. The phase of the complex amplitudes corresponding to the harmonics generated by a single ionic species may be substantially the same. Thus for a given harmonic produced by one ionic species the contribution to the complex amplitude at that harmonic frequency may be different to the phase of the contribution to the complex amplitude at the same frequency given by another ionic species. The method 900 may improve overall accuracy, accounting for such a difference when applying phase constraints.
In some examples, it is advantageous to set q equal to three. For example in an ORBITRAP mass analyzer, the complex amplitude corresponding to the third harmonic frequency ranges between 3% and 5% of the complex amplitude of the corresponding base frequency. The presence of complex amplitudes at such harmonic frequencies can lead to false positives and lead to spurious m/z values 494 being obtained. The method 900 enables the use of the complex amplitudes 655-n at such harmonic frequencies 635-n to improve the accuracy of the decomposition relative to the method 700 and the method 500. In particular the accuracy of the complex amplitudes 655-n of the improved second set 650 of complex amplitudes 655-n generated by the method 900 is improved over the complex amplitudes 455-n, 655-n produced by the previous methods. Therefore, the mass spectra produced by the method 900 is of an improved accuracy relative to the previous methods.
A step 1020a comprises, for each harmonic complex amplitude 655-n (as described previously with reference to
A step 1020b is the same as the step 820 of
It will be appreciated that applying a constraint for a harmonic complex amplitude 655-n, as described above in step b, may be performed instead of or in addition to, applying a constraint to that complex amplitude 655-n as described previously in step 820.
A step 1020c comprises updating one or more harmonic complex amplitudes 655-n of the second set 650 of complex amplitudes 655-n. The updating may be based on any of: the parameter, ξ; a base complex amplitude 655-n corresponding to the harmonic complex amplitude; an auxiliary complex amplitude of a base complex amplitude 655-n corresponding to the harmonic complex amplitude; a complex amplitude from the improved second set 650 of complex amplitudes 655-n. For example, the updating may use the relation:
It will be appreciated that the steps a, b and c may be performed as a block component wise on the vector a. By way of an example the steps 1020a, 1020b and 1020c may be performed on a complex amplitude 655-n ak before the steps 1020a, 1020b and 1020c are performed on a complex amplitude 655-n ak+1 and so on. The steps 1020a, 1020b and 1020c may be performed in parallel on one or more complex amplitudes 655-n. By way of an example, the step 1020a may be performed on a complex amplitude 655-n ak and a complex amplitude 655-n ak+1, before the step 1020b may be performed on the complex amplitude 655-n ak and the complex amplitude 655-n ak+1, and so on. Additionally, it will be appreciated that many variants on these two schemes would be considered by the skilled person.
It will be appreciated that in some embodiments of the invention, in line with the preceding description, part 425 of the discrete Fourier transform 420 may be used in place of the whole discrete Fourier transform 420 of the transient 101. In particular, the set 430 of frequencies 435-n may be limited to (or span) only a part 425 (or parts) of the discrete Fourier transform 420. It will be appreciated that this may be the case even if the full discrete Fourier transform 420 has been calculated (such as in the step 520). The selection of a part 425 (or parts) of the discrete Fourier transform is referred to herein as “windowing” and is described in more detail below by reference to specific examples.
The forming (or generating) of such a part 425 may be considered to be equivalent to applying a windowing function 1490 to the discrete Fourier transform 420. The windowing function 1490 is typically a real-valued function of frequency. In particular, the part 425 may be the product of the window function 1490 and the discrete Fourier transform. The example 1490-1 window function 1490 shown in
One particular advantage of such windowing is increased processing speed and reduction of computational burden. This is due to the fact the subsequent processing need not take account of the frequency spectrum in its entirety, but only to the regions of interest. For a sparse enough spectrum or for sparse enough segments of interest, calculations can be carried only within windows of the spectrum encapsulating these regions. The use of windowing in general is known in the art and the selection of the window width in this particular case could be run automatically using algorithms known to the skilled person.
The use of a windowing function whose value varies within the window, as above, has an advantage of reducing artefacts that may be introduced by the windowing relative to using a boxcar type of windowing function like that shown at 1490-1. Examples of windowing functions whose value varies within the window include any of: Gaussian functions, Hann functions, Hamming functions. It will be appreciated that a windowing function 1490 may comprise two or more functions, such as in the form of any of a sum, product, convolution and so forth.
It will be appreciated that the windowing outlined above may be done based on frequencies that are expected in some way. In particular, and as set out previously, frequency values may be mapped (or converted) to m/z values and vice versa. As such, windows may be chosen to encompass m/z values and/or ranges of interest. This may be particularly beneficial in various quantitative proteomics experiments, in particular isobaric labelling, which would be familiar to the person skilled in the art. Here, analytes are typically covalently labelled with chemical compounds. These chemical compounds are usually chosen to appear to be of the same (or substantially the same) masses. As such they may be considered to be isobaric in the m/z sense. Typically, upon fragmentation, these chemical compounds yield ions, known as reporter ions, of different m/z, which can be used for quantitative analysis of the analytes. As the m/z of these expected reporter ions are known beforehand, the windowing described above may be used advantageously. In particular, windows could be selected so as to encompass the m/z segments where reporter ions are expected to be located.
Particular examples of quantitative proteomics experiments requiring high resolution scans include any of: Isobaric Tags for Relative and Absolute Quantification (iTRAQ) experiments; Tandem Mass Tags (TMT) experiments; and Neutron-Encoded Mass Signatures for Multiplexed Proteome Quantification (NeuCode) experiments. As such, the application of the methods outlined above would be advantageous in any one or more of these experiments.
It will be appreciated that various computational parameters (such as any of: the number of iterations; convergence criteria; the refine factor; etc.) of the above methods may be different for different windows. For example, a user may vary such computational parameters to alter the accuracy vs computation balance for different windows of the same overall spectrum.
It will be appreciated that in the methods described above where the term “complex amplitude” is used, the imaginary component of said complex amplitude may be zero. In other words a complex amplitude may be real valued. In this case, the phase of such a real valued complex amplitude will be zero if the real value is positive or 7 radians (180 degrees) if the real value is negative.
It is known that, typically, it is important to keep an optimal ion population in an electrostatic trap 80 such as the one described in
In this procedure, typically, information on the number of charges from a previous scan (this scan might be a dedicated AGC pre-scan), or from the readings of the dedicated detector used to “predict” the accumulation time needed to achieve the optimal ion population. It will therefore be appreciated that any of the methods described above would be well-suited for use in AGC. In particular, any of these methods may be used for the prediction of the total ion population. This prediction may be across the entire mass range. Additionally, or alternatively the prediction may be within selected regions by using windowing as described above.
An example approach may involve a dedicated pre-scan. Typically the pre-scan would have a substantially shorter acquisition time compared the usual mass analysis. The pre-scan may then be used to estimate a current ion flux. This information may be used to control or adjust the ion optics and/or determine the optimal number of charges to be introduced in the detector for mass analysis in any following scans.
Of course, it will be appreciated that in addition to, or instead of a pre-scan, the information on the number of charges may directly be obtained from a mass scan itself by either application of any of the above methods. It should be noted that such information could be obtained in real time. For example, this may be achieved by using only the initial part of the transient (such as data from 1-10% of the entire transient duration). A combinational approach might also be feasible, in which a dedicated pre-scan(s) can be used to roughly “predict” the optimal ion optic setting, whereas current mass spectra are used for fine adjustment.
It will be appreciated that mass-spectrometers 10 such as that shown in
It will therefore be recognized that use of the embodiments of the invention outlined above to evaluate the number of charges in the reported frequency spectrum for the purposes of calibration would be beneficial. In particular, the degree of the global space charge is directly related to the entire number of charges, therefore the above methods may be used to evaluate more accurately total ion population across the entire mass range, especially for analysis of proteins. For the local space charge evaluation, windowing can be used, as set out previously, to focus on m/z values or ranges of interest.
The resolving capability (or fidelity measure) of the method 700 and/or the method 900 may depend at least in part upon the local peak density. The local peak density can be thought of as the number of peaks within a given mass window. The resolving capability of the method 700 and/or the method 900 may also depend at least in part upon noise conditions. For example the degree of fidelity with which clusters of peaks are being resolved typically drops as the complexity of clusters increases and/or the noise level increases. The resolving capability may be thought of as the minimum separation between two peaks where the two peaks can be considered to be correctly resolved (to a pre-determined confidence). In other words, peaks separated by less than the resolving capability may be considered to compose the same peak to a pre-determined confidence. The resolving capability may be estimated as a function of any of: frequency separation, signal-to-noise ratio, false detection (or discovery) rate (FDR), local peak density, and so forth. Typically the resolving capability is estimated as a function of FDR. The general concept of the FDR is well known in the art and not discussed further herein, however in this case the FDR may be understood as the proportion of peaks (or m/z values with non-zero intensity) in the calculated mass spectrum which would not match within predefined tolerances to a hypothetical completely accurate version of the same mass spectrum.
Regression tests on simulated data may be used to estimate the resolving capability as a function of FDR. For example, numerical experiments, such as the calculation of ideal mass spectra of known doublet or triplets may be used. Typically, by comparing mass spectra for such doublet or triplets obtained using the methods outlined above with the numerical experiments the resolving capability as a function of FDR may be obtained.
It will be appreciated that further processing may be carried out in relation to the one or more intensity values 396-n that may be generated in any of the post processing steps 530 described above. As discussed above, the one or more intensity values 396-n may be generated based on one or more complex amplitudes. In particular, an intensity value 396-n may be a function of any of: the absolute values of the one or more complex amplitudes, the real values of the one or more complex amplitudes; the imaginary values of the one or more complex amplitudes; etc. For example, an intensity value 396-n may be the sum of the absolute values of the one or more complex amplitudes or the sum of the one or more complex amplitudes projected onto an expected phase (such as through phase correction). This further processing is preferably carried out on an already real-valued amplitude spectrum in the frequency domain.
In particular, intensity values 396-n (or centroids) separated by less than a threshold m/z value may be formed into a single merged intensity value (or centroid), such as via a weighted average as described previously. The threshold value may be set based on the local peak density and a user specified confidence level. Preferably, the threshold may be set using an estimate of the resolving capability as a function of FDR, as described above. For example, for a given FDR (such as 1%) the estimate of the resolving capability at that FDR would give the threshold value. The intensity values 396-n may then be further processed using this threshold value as described above. In effect, this example may be thought of as using the results of regression tests on simulated data to create a new mass spectrum (which could be termed the reported mass spectrum) that reflects a “true” mass spectrum with FDR not more than the specified value.
In another example, the local peak density of the mass spectrum 390 may be determined using a sliding window approach. A minimal frequency distance, for a user specified confidence, corresponding to the local peak density may be determined, such as based on the estimated fidelity measure, above. The threshold value may then be set based on this minimal frequency distance (typically the threshold value will be substantially equal to the minimal frequency distance).
In this particular example the estimated resolving capability for the system used to generate the complex amplitudes 655-n and the user specified FDR corresponds to a threshold value 1205 (shown as a frequency width). The centroid corresponding to intensity value 396-3 and the centroid corresponding to intensity value 396-4 are separated by less than this threshold value 1205. In other words, the difference between the frequency 394-3 corresponding to intensity value 396-3 and the frequency 394-4 corresponding to intensity value 396-4 is less than the threshold value 1205. As a result, the further processing replaces (or merges) these two centroids (and therefore the corresponding intensity values 396-3; 396-4 and the corresponding frequencies 394-3; 394-4) with a merged centroid with a new intensity value 1296-1 and a new frequency 1294-1. The new (or merged) intensity value 1296-1 may be the sum of the two intensity values 396-3; 396-4. The new (or merged) frequency 1294-1 may be the average of the two frequencies 394-3; 394-4 (typically weighted with the two intensity values 396-3; 396-4). In this way a reported mass spectrum with FDR not more than the specified value has been formed. It will be appreciated that in this example the “true” mass spectrum might have indeed contained peaks at 394-3 and 394-4. Merging these peaks may still provide an advantage if the probability of erroneous “over-resolving” peaks and giving false positives is too high to be acceptable by instead presenting an under-resolved peak.
Additionally, when a mass spectrum produced by further process such as that described above, is displayed to the user visual indication of merged centroids may be provided. For example a distribution curve centred on a merged centroid (such as the frequency 1294-1 of the merged centroid) may be displayed. An example of such a distribution curve 1210 is shown in
In this way a visual representation of the estimated resolving “error” may be presented to the user to allow the user to better understand mass resolution of the mass spectrum.
It will be appreciated that the methods described have been shown as individual steps carried out in a specific order. However, the skilled person will appreciate that these steps may be combined or carried out in a different order whilst still achieving the desired result.
It will be appreciated that embodiments of the invention may be implemented using a variety of different information processing systems. In particular, although the figures and the discussion thereof provide an exemplary computing system and methods, these are presented merely to provide a useful reference in discussing various aspects of the invention. Embodiments of the invention may be carried out on any suitable data processing device, such as a personal computer, laptop, server computer, etc. Of course, the description of the systems and methods has been simplified for purposes of discussion, and they are just one of many different types of system and method that may be used for embodiments of the invention. It will be appreciated that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or elements, or may impose an alternate decomposition of functionality upon various logic blocks or elements.
It will be appreciated that the above-mentioned functionality may be implemented as one or more corresponding modules as hardware and/or software. For example, the above-mentioned functionality may be implemented as one or more software components for execution by a processor of the system. Alternatively, the above-mentioned functionality may be implemented as hardware, such as on one or more field-programmable-gate-arrays (FPGAs), and/or one or more application-specific-integrated-circuits (ASICs), and/or one or more digital-signal-processors (DSPs), and/or other hardware arrangements. Method steps implemented in flowcharts contained herein, or as described above, may each be implemented by corresponding respective modules; multiple method steps implemented in flowcharts contained herein, or as described above, may be implemented together by a single module.
It will be appreciated that, insofar as embodiments of the invention are implemented by a computer program, then a storage medium and a transmission medium carrying the computer program form aspects of the invention. The computer program may have one or more program instructions, or program code, which, when executed by a computer carries out an embodiment of the invention. The term “program” as used herein, may be a sequence of instructions designed for execution on a computer system, and may include a subroutine, a function, a procedure, a module, an object method, an object implementation, an executable application, an applet, a servlet, source code, object code, a shared library, a dynamic linked library, and/or other sequences of instructions designed for execution on a computer system. The storage medium may be a magnetic disc (such as a hard drive or a floppy disc), an optical disc (such as a CD-ROM, a DVD-ROM or a BluRay disc), or a memory (such as a ROM, a RAM, EEPROM, EPROM, Flash memory or a portable/removable memory device), etc. The transmission medium may be a communications signal, a data broadcast, a communications link between two or more computers, etc.
Number | Date | Country | Kind |
---|---|---|---|
15165127.0 | Apr 2015 | EP | regional |
16163872.1 | Apr 2016 | EP | regional |