Radiation detection with non-parametric decompounding of pulse pile-up

FIELD OF THE INVENTION

A novel method is provided for estimating the energy distribution of quanta of radiation such as photons incident upon a detector in spectroscopic systems such as in X-ray or gamma-ray spectroscopy. The method is particularly useful for count-rate regimes where pulse pile-up is an issue. A key step in the derivation of the estimator of an embodiment of the invention is the novel reformulation of the problem as a decompounding problem of a compound Poisson process. The method can be applied to any form of radiation detector detecting quanta or other corpuscles of radiation, such as x-rays, gamma rays or other photons, neutrons, atoms, molecules, or seismic pulses. Applications of spectroscopy from such detectors are well known. Such applications are described widely in the prior art, including in international patent applications PCT/AU2005/001423, PCT/AU2009/000393, PCT/AU2009/000394, PCTAU2009/000395, PCT/AU2009/001648, PCT/AU2012/000678, PCT/AU2014/050420, PCT/AU2015/050752, PCT/AU2017/050514, and PCT/AU2017/050512, each of which is incorporated herein in its entirety for the purpose of describing potential applications of the current invention and any other background material needed to understand the current invention.

1 BACKGROUND

X-RAY and gamma-ray spectroscopy underpin a wide range of scientific, industrial and commercial processes. One goal of spectroscopy is to estimate the energy distribution of photons incident upon a detector. From a signal processing perspective, the challenge is to convert the stream of pulses output by a detector into a histogram of the area under each pulse. Pulses are generated according to a Poisson distribution whose rate corresponds to the intensity of X-rays or gamma-rays used to illuminate the sample. Increasing the intensity results in more pulses per second on average and hence a shorter time before an accurate histogram is obtained. In applications such as baggage scanning at airports, this translates directly into greater throughput. Pulse pile-up occurs when two or more pulses overlap in the time domain. As the count rate (the average number of pulses per second) increases so does the incidence of pulse pile-up. This increases the difficulty in determining the number of pulses present and the area under each pulse. In the limit, the problem is ill-conditioned: if two pulses start at essentially the same time, their superposition is indistinguishable from a single pulse. The response of an X-ray or gamma-ray detector to incident photons can be modelled as the superposition of convolutions of pulse shape functions Φ_j(t) with Dirac-delta impulses,

$\begin{matrix} r (t) = \sum_{j = - \infty}^{\infty} a_{j} δ (t - τ_{j}) {★Φ}_{j} (t) . & (1) \end{matrix}$

The arrival times . . . , τ₋₁, τ₀, τ₁, . . . are unknown and form a Poisson process. Each photon arrival is modelled as a Dirac-delta at time τ_j, and with amplitude a_jthat is proportional to the photon energy and induces a detector pulse shape response Φ_j. The amplitudes a_jare realizations of identically distributed random variables A_jwhose common probability density function ƒ_A(x) is unknown. The pulse shape function Φ_jis determined by the geometry of the detector and the photon interaction. In some systems the variation in pulse shape is minimal and may be ignored, while other systems (e.g., HPGe) individual pulse shapes may differ significantly from one another ill. It is assumed all pulse shapes are causal i.e., Φ_j(t)=0 for t<0, uni-modal, of finite energy, and decay exponentially toward zero as t→∞. The integral of the pulse shape functions are normalized to unity, i.e., ∫_−∞^∞Φ_j(t) dt=1, so that the area under the pulse is given by A_j. The observed signal consists of the detector output corrupted by noise, i.e.,

$\begin{matrix} s (t) = r (t) + w (t) . & (2) \end{matrix}$

The mathematical goal of pulse pile-up correction is to estimate ƒ_A(x) given a uniformly-sampled, finite-length version of s(t). We assume throughout that the noise distribution w(t) is zero-mean Gaussian with known variance σ². We also assume the photon arrival times form a homogeneous Poisson process with a known rate. Consisting of the detector response R corrupted by a noise process W, where S_k, R_kand W_krepresent the kth elements of each series and where 0≤k<K. Let S, R and W be the uniformly sampled time-series corresponding to these signals,

$\begin{matrix} S = R + W & (3) \\ = {s (t_{k}) : 0 \leq k < K} & (4) \\ R = {r (t_{k}) : 0 \leq k < K} & (5) \\ W = {W_{t_{k}} : 0 \leq k < K} & (6) \\ where t_{0} < t_{1} < . . . < t_{K - 1} . \end{matrix}$

Summary of Pulse Processing Methodologies: Numerous approaches have been proposed over the decades to address the issue of pulse pile-up. Approaches can be broadly categorized into two types: time-domain based and energy-domain based. A popular strategy is to attempt to detect when pile-up has occurred in the time domain, and either reject or compensate for the affected pulses. Early spectroscopic systems adopted a rejection-based approach along with matched filtering. The disadvantage of this approach is that an increasing proportion of pulses are rejected as the probability of pile-up grows. The system rapidly succumbs to paralysis, placing an upper limit on the count rate [1]. Strategies that compensate or correct for pile-up have grown in number with the increase of cheap computational power. These include template fitting [2], baseline subtraction [3], adaptive filtering [4, 5], sparse regression [6, 7] and more. These approaches all attempt to identify and compensate for pile-up in the time domain, and are generally best suited to systems with low pulse shape variation. The complexity of these approaches increases significantly with increasing variability between pulse shapes Φ_j. It can be shown that any method that attempts to characterise individual pulses will suffer from pile-up. The best that these approaches can do is to reduce the onset of pile-up. Energy-based approaches attempt to address pile-up based on the statistics of an ensemble of pulses rather than individual pulses. They typically operate on histograms of estimated energy (the areas under clusters of pulses). The early work of Wielopolski and Gardner [8] and more recent extensions of their idea [9] operate primarily in the energy domain using ensemble-based strategies. Trigano et al. [10, 11] estimate the incident spectrum utilizing marginal densities from the joint distribution of the statistical properties of variable length clusters of pulses where the beginning and end of each cluster is detected. This circumvents the need to identify individual pulses, and is robust to pulse shape variation. Ilhe et al. [12] examine exponential shot-noise processes, restricting pulse shapes to a simple exponential to obtain tractable results. Further work [13] has been done to allow a wider range of pulse shapes. In both cases, knowledge is required of the pulse shape, along with estimates of the characteristic function and its derivative.

2 SUMMARY OF THE INVENTION

We chose an energy-based pile-up correction approach in order to i) avoid the limitations associated with the detection of individual pulses [14] and to ii) be able to handle pulse shape variation without undue increase in computational complexity. Rather than utilizing the joint distribution [10, 11] or shot-process [12] approaches, we recast the pile-up problem as a ‘decompounding’ problem of a compound Poisson process. A compound Poisson process is a discrete-time random process where each component consists of the sum of a random number of independent identically distributed random variables, where the number of random variables in each sum is Poisson distributed [15]. ‘Decompounding’ of a compound Poisson process is the task of using the random sums to estimate the distribution from which the random variables have been drawn. Buchmann and Grübel [16] formulated the decompounding of compound Poisson processes in the context of insurance claims and queuing theory. Decompounding of uniformly sampled compound Poisson processes has received some attention in recent times [16, 17, 18, 12, 19]. The context of these derivations frequently assume (reasonably) that each event is detectable (i.e., there is no ambiguity regarding the number of events), or that the density estimators are conditioned on at least one event occurring in each observation [20]. These assumptions are of limited value when addressing the spectroscopic pile-up problem.

The investigation of non-parametric decompounding without conditioning on event detection has received relatively little attention in the literature. Gugushvili [18] proposes a non-parametric, kernel-based estimator for the decompounding problem in the presence of Gaussian noise. In embodiments of the invention, the inventors have conceived that once a method for selecting the kernel bandwidth is obtained, along with a method for transforming the observed detector output to fit the mathematical model, this estimator can be readily extended and applied to a reformulation of the spectroscopic pile-up problem.

In accordance with a first broad aspect of the invention there is provided a method of determining a spectrum of energies of individual quanta of radiation received in a radiation detector, the method comprising the steps of: (1) obtaining a time series of digital observations from the radiation detector comprising pulses corresponding to the detection of the individual quanta; (2) computing spectrum sensitive statistics from the detector signal, the spectrum sensitive statistics defining a mapping from a density of amplitudes of the pulses to a spectrum sensitive statistics; (3) determining the spectrum by estimating the density of amplitudes of the pulses by applying an inversion of the mapping to the spectrum sensitive statistics.

In embodiments, the spectrum sensitive statistics may be based on a sum of the digital observations over a plurality of time intervals and the mapping may be defined using an approximate compound Poisson process, which may be augmented by a modelled noise. The mapping may be expressed as a relation between characteristic functions of the amplitudes, the spectrum sensitive statistics and the modelled noise. The characteristic functions of the spectrum sensitive statistics may be computed with the use of a histogram of the sum of the digital observations to which is applied an inverse Fourier transform. Computation of a characteristic function of the amplitudes may comprise the use of a low pass filter.

In a first embodiment, the plurality of time intervals are nonoverlapping and have have a constant length L and each interval is selected to encompass zero or more approximately entire clusters of the pulses. This may be accomplished by requiring a maximum value of the detector signal at a beginning and end of each time interval. In this embodiment, the compound Poisson process may be defined as a sum of the amplitudes of the pulses in each time interval. The mapping may be expressed as defined in equations (40) and (41), which may be augmented by windowing functions.

In a second embodiment, a plurality of intervals comprise a first set of nonoverlapping time intervals constant length L selected without regard to entirety of clusters of the pulses, and a 2nd set of nonoverlapping time intervals of constant length L1 less than L also selected without regard to entirety of clusters of the pulses. L is at least as long as a duration of the pulses and preferably L1 is less than the duration of the pulses. In this embodiment, the compound Poisson process may be defined as in Section 6. The mapping may be expressed as defined in Section 6. The second embodiment may utilise processes and calculations for each set of time intervals as defined for the set of time intervals in the first embodiment.

In embodiments, a data-driven strategy is used that results in a near optimal choice for a kernel parameter, which minimises the integrated-square-of-errors (ISE) of the estimated probability density function of incident photon energies.

According to a second broad aspect of the invention there is provided a method of estimating count rate of individual quanta of radiation received in a radiation detector, the method comprising the steps of: (1) obtaining a time series of digital observations from the radiation detector comprising pulses corresponding to the detection of the individual quanta; (2) computing spectrum sensitive statistics from the detector signal, the spectrum sensitive statistics using the intervals of constant length L and constant length L1 as described above in relation to the 1st broad aspect; (3) determining an estimate of a characteristic function of the compound Poisson process using formula (109); (4) estimating the count rate from the estimate of the characteristic function. Step (4) above may be achieved by using an optimisation routine or some other means to fit a curve, estimating a DC offset of a logarithm of the estimate of the characteristic function, or fitting a curve to the logarithm of the estimate of the characteristic function.

The rest of this application is organized as follows. Sections 3, 4 and 5 relate to the 1st embodiment of the 1st aspect of the invention. Sections 3 provides preliminary background, defines notation, outlines the mathematical model and gives a derivation of the estimator of the 1st embodiment, including modifications. Section 4 shows the performance of the modified estimator of the 1st embodiment for both simulated and experimental data, and discusses the results. Section 5 provides a conclusion for the 1st embodiment. Section 6 describe the 2nd embodiment, with reference to the 1st embodiment where relevant. Section 7 describes the 2nd aspect of the invention, a novel method of estimating count rate.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the relationship between realizations of the interval Y and the sampled detector response.

FIG. 2 plots a typical estimate made by the data-driven logistic shaped filter for an experiment with a parameter pair.

FIG. 3 plots a typical estimate made at the same operating point as FIG. 2, but with an estimator having a rectangular filter and a selected bandwidth.

FIG. 4 shows the distribution densities of the integrated square of the error (ISE) measures as a function of sample count using a rectangular filter and various fixed bandwidths.

FIG. 5 shows the distribution densities of the ISE measure as a function of the total number of estimates N in each histogram at a first count rate.

FIG. 6 shows the distribution densities of the ISE measure as a function of the total number of estimates N in each histogram at a second count rate.

FIG. 7 shows the distribution densities of the ISE measure as a function of the total number of estimates N in each histogram at a third count rate.

FIG. 8 shows plots of the observed and estimated probability mass vectors. FIG. 9 plots various quantities obtained during the estimation process.

FIG. 10 illustrates the observed and true probability density of input photon energies for the experiment from which FIGS. 9-13 were derived.

FIG. 11 illustrates the trajectory of the curve γ in the complex plane.

FIG. 12 illustrates internal quantities similar to FIG. 9, however there are some additional signals.

FIG. 13 relates to the 2nd broad aspect of calculating the count rate.

3 DERIVATION OF ESTIMATOR OF THE FIRST EMBODIMENT

The general approach we take to addressing pile-up is based on the following strategy; i) obtain statistics from s(t) that are sensitive to the distribution of incident photon energies, and estimate those statistics using the observed, finite-length sampled version of s(t), ii) obtain a mapping from the density of incident photon energies to the statistical properties of the observed statistics, iii) estimate the density of the incident photon energies by inverting the mapping. Section 3.1 describes our choice of statistics. Section 3.2 argues that these statistics (approximately) have the same distribution as a compound Poisson process. Section 3.3 introduces a decompounding technique for recovering the spectrum from these statistics. It is based on the decompounding algorithm in [18] but further developed to obtain near optimal performance in terms of the integrated square of error. Our approach to the pile-up problem follows the general theme of finding some statistics of s(t) that are sensitive to the underlying spectrum, estimating these statistics from a finite-time sampled version of s(t), then inverting the map that describes the statistical properties of these statistics given the underlying spectrum, thereby producing an estimate of the spectrum.

3.1 Choice of Statistic

We wish to obtain estimates of the photon energies from the observed signal given in (2). In typical modern spectroscopic systems, the detector output s(t) is uniformly sampled by an ADC. Without loss of generality, we assume the raw observations available to the algorithm are {s(k): k∈ custom character _≥0}. Since identification of individual pulses can be difficult, we look instead for intervals of fixed length L∈_>0containing zero or more clusters of pulses. Precisely, we define these intervals to be [T_j, T_j+L) where

$\begin{matrix} T_{0} = \inf {k : | s (k) | \leq ϵ, \langle s (k + L) | \leq ϵ, k \in ℤ_{\geq 0}} & (7) \\ T_{j} = \inf {k : ❘ s (k) | \leq ϵ, \langle s (k + L) | \leq ϵ, T_{j - 1} + L \leq k, k \in ℤ_{\geq 0}} . & (8) \end{matrix}$

Here, ϵ is chosen as a trade-off between errors in the energy estimate and the probability of creating an interval. The value of ϵ should be sufficiently small to ensure the error in the estimate of total photon energy arriving within each interval is acceptably low, yet sufficiently large with respect to the noise variance to ensure a large number of intervals are obtained. Although the probability of partitioning the observed data into intervals approaches zero as the count-rate goes to infinity, this approach succumbs to paralysis at higher count-rates compared to pile-up rejection strategies based on individual pulses, since multiple photons are permitted to pile-up within in each interval. Section 4.2 describes the selection of L and ϵ for real data. Each interval contains an unknown, random number of pulses and may contain zero pulses.

We estimate the total photon energy x_jin the interval [T_j, T_j+L) using the sampled raw observations. Since the area under each pulse is proportional to the photon energy A_jdefined in (1), we let

$\begin{matrix} x_{j} = \sum_{k = T_{j}}^{T_{j} + L - 1} s (k) & (9) \end{matrix}$

The number of photon arrivals, the energy of each arriving photon and the detector output noise in each interval [T_j, T_j+L) are assumed to be random and independent of other intervals. For pulse shapes with exponential decay, a small amount of the photon energy arriving in an interval may be recorded in the next interval. The amount of leakage is proportional to ϵ, and is negligible for sufficiently small ϵ. Consequently, the estimates x₁, x₂, . . . may be treated as the realization of a weakly-dependent, stationary process where each estimate is identically distributed according to the random variable X. This relationship is illustrated in FIG. 1 for the noise free case using a typical pulse shape.

3.2 Approximation with Compound Poisson Process

In this subsection we describe the distribution of X in terms of ƒ_A(x). We will then invert this in section 3.3, to obtain an estimator for the density ƒ_A(x). Using (9), (2), (1) and the fact that custom character (t) is causa we have

$\begin{matrix} x_{j} = \sum_{{ℓ : τ_{ℓ} < T_{j}}} \sum_{k = T_{j}}^{T_{j} + L - 1} a_{ℓ} Φ_{ℓ} (k - τ_{ℓ}) + \sum_{{ℓ : T_{j} \leq τ_{ℓ} < T_{j} + L}} \sum_{k = T_{j}}^{T_{j} + L - 1} a_{ℓ} Φ_{ℓ} (k - τ_{ℓ}) + \sum_{k = T_{j}}^{T_{j} + L - 1} w (k) & (10) \end{matrix}$

As justified below, this simplifies to

$\begin{matrix} x_{j} \approx y_{j} + z_{j} & (11) \\ where y_{j} = \sum_{{ℓ : T_{j} \leq τ_{ℓ} < T_{j} + L}} a_{ℓ} & (12) \\ and z_{j} = \sum_{k = T_{j}}^{T_{j} + L - 1} w (k) . & (13) \end{matrix}$

Both y_jand z_jare i.i.d. sequences of random variables. We denote their distributions by Y and Z. The distribution of Z is fully determined from the distribution of w(t), which is assumed zero-mean Gaussian with known variance σ². Moreover, Y is a compound Poisson process since the number of terms in the summation (number of photon arrivals in an interval of length L) has Poisson statistics. Equations (11)-(13) are justified as follows. The first term of (10) represents leakage from earlier intervals and is approximately zero. This is easily shown for Gaussian noise by performing a Taylor expansion about ϵ=0

$\begin{matrix} P r (| s (k) | < ϵ) = \frac{1}{2} \erf (\frac{r (k) + ϵ}{\sqrt{2} σ}) - \frac{1}{2} \erf (\frac{r (k) - ϵ}{\sqrt{2} σ}) & (14) \\ \approx 0.7 9 7 8 5 \frac{ϵ}{σ} e^{- \frac{{r (k)}^{2}}{2 σ^{2}}} + O (ϵ^{3}) . & (15) \end{matrix}$

Thus there is a finite but small probability that some energy belonging to a previous interval will be included in the current estimate. In practice, this contribution is comparable to the noise for sufficiently small E. The third term in (10) is zero since custom character (t) is causal. The second term in (10) can be written as

$\begin{matrix} \sum_{{ℓ : T_{j} \leq τ_{ℓ} < T_{j} + L}} a_{ℓ} \overset{T_{j} + L - 1}{\sum_{k = T_{j}}} Φ_{ℓ} (k - τ_{ℓ}) \approx \sum_{{ℓ : T_{j} \leq τ_{ℓ} < T_{j} + L}} a_{ℓ} & (16) \end{matrix}$

where we assume the pulse shapes custom character (t) are sufficiently smooth such that

$Σ_{k = T_{j}}^{T_{j} + L - 1} Φ_{ℓ} (k - τ_{ℓ}) \approx \int_{- \infty}^{\infty} Φ_{ℓ} (t) d t = 1 .$

It approximates the total energy of all the photons arriving in the interval [T_j, T_j+L). Let ν_jdesignate the number of photon arrivals in the interval [T_j, T_j+L). We assume ν_jis a realization of a homogeneous Poisson process with rate parameter λ, where λ is expressed in terms of the expected number of photons per interval of length L. Henceforth we shall assume that (11) holds exactly, and write

$\begin{matrix} X = Y + Z & (17) \end{matrix}$

Finally, we write x_jas

$\begin{matrix} x_{j} \approx y_{j} + z_{j} & (18) \end{matrix}$

where we assume Z has known variance σ². In this subsection we model the statistic of section 3.1 using a compound Poisson process. This allows us to derive an estimator for the density ƒ_A(x) in terms of observable quantities. The number of photons arriving in the interval [T_j, T_j+L) is a Poisson random variable which we designate ν_j. The total energy in the interval Y can be modelled as a compound Poisson process i.e.,

$\begin{matrix} y_{j} = {\begin{matrix} \sum_{k = 0}^{v_{j} - 1} a_{(ℓ_{j, 1} + k)}, & ν_{j} > 0 \\ 0 & ν_{j} = 0 \end{matrix} & (19) \\ ν_{j} \sim P n (λ) & (20) \end{matrix}$

where custom character _j,1=min{:T_j≤τ<T_j+L} is the index of the first-photon arrival time in the interval, the arrival times are assumed ordered, and representing photon energy are independent realizations of the random variable A with density function ƒ_A(x). The {ν_j} form a homogeneous Poisson process with rate parameter λ. The Poisson rate λ is expressed in terms of the expected number of photons per interval of length L.

$\begin{matrix} y_{j} \approx \sum_{k = T_{j}}^{T_{j} + L} r (k) . & (21) \end{matrix}$

The relationship between realizations of Y and the sampled detector response is illustrated in FIG. 1. The observed x_jcan be approximated in terms of the custom character by substituting (2) into (9),

$\begin{matrix} x_{j} = \sum_{k \in [{\hat{T}}_{j}, {\hat{T}}_{j} + L]} (r (k) + w (k)) & (22) \\ x_{j} = \sum_{k = T_{j}}^{T_{j} + L} r (k) + \sum_{k = T_{j}}^{T_{j} + L} w (k) & (23) \\ \approx y_{j} + z_{j} & (24) \end{matrix}$

where z_jis the realization of the unobservable random variable Y that represents the photon energy in an interval of the discrete-time detector response,

$\begin{matrix} Y_{j} = \sum_{k \in [T_{j}, T_{j} + L]} R_{k} & (25) \end{matrix}$

where z_jis a realization of Z, an independent random variable representing errors in the sampling process and estimation of T_j. We assume Z has known variance σ². With these definitions of X and Y, the number of intervals which can be found in a finite length of detector output is a random variable N. At high count-rates this approach succumbs to paralysis, as the probability of being able to partition the observed data into intervals approaches zero. The onset of paralysis occurs at higher count-rates compared to pile-up rejection based strategies, since multiple photons are permitted to pile-up within in each interval. Assume the time-series defined in (3)-(6) has been sampled uniformly. Without loss of generality, assume unit sample intervals beginning at t₀=0 i.e., t_k=k, 0≤k<K. Let R be a discrete-time random process representing the sampled detector response of (1). Let Y={Y_j:0≤j<N} be a discrete-time random process whose components Y_jrepresent the total photon energy arriving during a fixed time interval. A compound Poisson process can be used to model Y, i.e.,

$\begin{matrix} Y_{j} = {\begin{matrix} \sum_{k = 1}^{v_{j}} A_{k}, & ν_{j} > 0 \\ 0 & ν_{j} = 0 \end{matrix} & (26) \\ ν_{j} \sim P n (λ) & (27) \end{matrix}$

where ν_jis an independent Poisson random variable, and A_kare independent identically distributed random variables with density function ƒ_A(x). The {ν_j} form a homogeneous Poisson process with rate parameter λ. The process Y is not directly observable. Assume the pulse shape Φ(t) has finite support. Let custom character _A(t) be the indicator function for the set A. Let the pulse length be given by =sup({t: Φ>0})−inf({t: Φ(t)>0}). Let S=R+W={s(k):0≤k<K} be a discrete-time random process representing the observed detector output given by (2). It consists of the detector response R corrupted by a noise process W. Without loss of generality, we assume unit sample intervals. From the observations S we form the process X, where

$\begin{matrix} X_{j} \overset{△}{=} Y_{j} + Z_{j} & (28) \end{matrix}$

and where Z_jis a random variable from an independent noise process of known variance σ². A simple model for testing theory is obtained when we let the pulse shape Φ(t)= custom character _(0,1)(t) in (1), in which case we let X_j=S_j, and N is simply the sample length K. Obtaining X_jfrom S is more complicated for real data. In that case we partition the process S into non-overlapping blocks of length L, where L>. The Poisson rate λ is expressed in photons per block. The start of each block T_j∈ custom character is chosen such that the total energy of any pulse is fully contained within the block in which it arrives

$\begin{matrix} T_{j} = \min {k : R_{k} = 0, R_{k + L} = 0, T_{j - 1} + L < k < K - L} & (29) \end{matrix}$

FIG. 1 shows that

$Y_{j} = \sum_{k = T_{j}}^{T_{j} + L} R_{k} .$

We let

$X_{j} = \sum_{k = {\hat{T}}_{j}}^{{\hat{T}}_{j} + L} S_{k}$

$\begin{matrix} X_{j} = \sum_{k = {\hat{T}}_{j}}^{{\hat{T}}_{j} + L} S_{k} & (30) \end{matrix}$

where {circumflex over (T)}_jis an estimate of T_j. Section 4.2 describes the selection of L and ϵ for real data. With this definition of X_j, the number of components in Y becomes a random variable for a given sample length K. At high count-rates this approach succumbs to paralysis, as the probability of being able to create a block approaches zero. The onset of paralysis occurs at higher count-rates compared to pile-up rejection based strategies, since multiple photons are permitted to pile-up within in each block. Let Y={Y_j: 0≤j<N} be a discrete-time random process whose components Y_jare given by

$\begin{matrix} Y_{j} = \sum_{k = T_{j}}^{T_{j} + L} R_{k} & (31) \\ T_{j} = \min_{k > T_{j - 1} + L} {R_{k} < d, R_{k + L} < d} & (32) \end{matrix}$

where L∈ custom character is a constant chosen such that h and d is a small threshold value close to zero. The random variable Y_jthus represents the total photon energy arriving during a fixed time interval of length L. The value of d ensures the signal associated with photon arrivals is very small at the start and end of each interval. This is illustrated in FIG. 1. A compound Poisson process can be used to model Y, i.e.,

$\begin{matrix} Y_{j} = {\begin{matrix} \sum_{k = 1}^{v_{j}} A_{k}, & v_{j} > 0 \\ 0 & v_{j} = 0 \end{matrix} & (33) \\ v_{λ} = {v_{j} : 0 \leq j < N} & (34) \\ v_{j} \sim Pn (λ) & (35) \end{matrix}$

where ν_λ is a homogeneous Poisson process with rate parameter λ, and A_kare independent identically distributed random variables with density function ƒ_A(x). Let S=R+W be a discrete-time random process representing the sampled detector output given by (2). It consists of the detector response R corrupted by a noise process W. The process Y is not directly observable. Using (2), (25) and (32), we model observations by the process X={X_j: 0≤j<N}, i.e.,

$\begin{matrix} X_{j} = \sum_{k = T_{j}}^{T_{j} + L} S_{k} & (36) \\ = \sum_{k = T_{j}}^{T_{j} + L} (R_{k} + W_{k}) & (37) \\ = Y_{j} + \sum_{k = T_{j}}^{T_{j} + L} W_{k} & (38) \\ \overset{Δ}{=} Y_{j} + Z_{j} & (39) \end{matrix}$

- where Z is a noise process of known variance σ². All the random variables (ν_j, A₁, . . . , A_ν_j, Z_j) involved in modelling a given observation X_jare assumed independent. Let X₁, X₂, . . . , X_Nbe N independent, identically distributed observations. Let X, Y, Z, A be the collections of X_j, Y_j, Z_j, A_j: 0≤j<N. Let the corresponding characteristic functions be ϕ_X, ϕ_Y, ϕ_Z, ϕ_A.
  
  3.3 Basic Form of Estimator

We seek to invert the mapping from the distribution of photon energy A to the distribution of X. Our strategy is to first obtain the characteristic function of X in terms of ƒ_A, then invert the mapping assuming the count-rate and noise characteristics are known. Let ϕ_X, ϕ_Y, ϕ_Z, ϕ_Abe the characteristic functions of X, Y, Z, A. It is well known [15] that for the compound Poisson process Y with rate λ,

$\begin{matrix} ϕ_{Y} (u) = e^{- λ} e^{λ_{ϕ_{A} (u)}} & (40) \end{matrix}$

and since X=Y+Z then

$\begin{matrix} ϕ_{X} (u) = ϕ_{Y} (u) ϕ_{Z} (u) . & (41) \end{matrix}$

Given the observations x_jwe can form an empirical estimate {circumflex over (ϕ)}_Xof the characteristic function of X. Treating this as the true characteristic function, we can invert (40), (41) to obtain the characteristic function of A and then take the Fourier transform to find the amplitude spectrum ƒ_A. Specifically, using (40), (41) and exploiting the assumption that Z is Gaussian to ensure ϕ_Z(u) will be non-zero ∀u∈ custom character , we let γ: → be the curve described by

$\begin{matrix} γ (u) = \frac{ϕ_{X} (u)}{e^{- λ} ϕ_{Z} (u)} & (42) \\ = e^{{λϕ}_{A} (u)} & (43) \end{matrix}$

Temporarily assuming ∀u, γ(u)≠0, after taking the distinguished logarithm of (43) and rearranging we have

$\begin{matrix} ϕ_{A} (u) = \frac{1}{λ} dlog (γ) (u) . & (44) \end{matrix}$

Ideally, ƒ_Ais recovered by taking a Fourier transform

$\begin{matrix} f_{A} (x) = \int_{- \infty}^{\infty} e^{- i 2 π ux} ϕ_{A} (u) du & (45) \end{matrix}$

The basic form of our proposed estimator is given in (88) and is derived from (45) via a sequence of steps. First, ϕ_Xis estimated from the data (Step 1). Simply substituting this estimate for ϕ_Xin (42) does not produce an ISE optimal estimate of γ. The approximate ISE is obtained from an approximate estimate of the error distribution of ϕ_X(Step 2). We then determine a sensible windowing function G(u) (in Step 3) and estimate γ by

$\begin{matrix} \hat{γ} (u) = G (u) \frac{{\hat{ϕ}}_{X} (u)}{e^{- λ} ϕ_{Z} (u)} . & (46) \end{matrix}$

The windowing function G(u) is designed to minimise the approximate ISE between ƒ_Aand our estimate of ƒ_Abased on (44), (45) and (46), but with γ in (44) replaced by (46). A similar idea is used for estimating ϕ_Afrom (44): a weighting function H(u) is found (in Step 4) such that replacing ϕ_Ain (45) by

$\begin{matrix} {\hat{ϕ}}_{A} (u) = H (u) \frac{1}{λ} dlog (\hat{γ}) & (47) \end{matrix}$

produces a better estimate of ƒ_Athan using the unweighted estimate 1/λd log ({circumflex over (γ)}). Finally, the weighting function H(u) is modified (in Step 5) to account for the integral in (45) having to be replaced by a finite sum in practice. The following subsections expand on these five steps.

3.4 Estimating ϕ_X

An estimate of ϕ_X(u) is required to estimate γ(u). In this subsection we define a histogram model and describe our estimation of ϕ_X(u) based on a histogram of the x_jvalues. Assume N intervals (and corresponding x_jvalues) have been obtained from a finite length data sample. Although the empirical characteristic function

$\begin{matrix} {\hat{ϕ}}_{X emp} (u) = \frac{1}{N} \sum_{j = 0}^{N - 1} e^{i u x_{j}} & (48) \end{matrix}$

provides a consistent, asymptotically normal estimator of the characteristic function [21], it has the disadvantage of rapid growth in computational burden as the number of data points N and the required number of evaluation points u∈ custom character increases. Instead, we use a histogram based estimator that has a lower computational burden. Assume that a histogram of the observed X values is represented by the 2M×1 vector n, where the count in the mth bin is given by

$\begin{matrix} n_{m} = \sum_{k = 0}^{N - 1} [m - 0.5, m + 0.5) (x_{k}), m \in {- M, \dots, M - 1} . & (49) \end{matrix}$

All bins of the histogram have equal width. The bin-width is chosen in relation to the magnitude of the x_jvalues. Since the effect of choosing a different bin width is simply equivalent to scaling the x_jvalues, we assume the bin-width to be unity without loss of generality. The bins are apportioned equally between non-negative and negative data values. The number of histogram bins 2M influences the estimator in various ways, as discussed in later subsections. For now, it is sufficient to assume that 2M is large enough to ensure the histogram includes all x_jvalues. We estimate ϕ_X(u) by forming a histogram of scaled x_jvalues and take the inverse Discrete Fourier transform i.e.,

$\begin{matrix} ϕ_{X} (u) = \sum_{m = - M}^{M - 1} \frac{n_{m}}{N} e^{\frac{i 2 π u m}{2 M}} . & (50) \end{matrix}$

This is a close approximation of the empirical characteristic function but where x_jterms have been rounded to the nearest histogram bin centre (and u contracted by a factor of 2π). The term n_msimply counts the number of rounded terms with the same value. Clearly, this function can be efficiently evaluated at certain discrete points u∈−M, . . . , M−1 using the fast Fourier Transform (FFT).

3.5 Error Distribution of {circumflex over (ϕ)}_X

The design of the filters G(u) and H(u) in (46) and (47) rely on the statistics of the errors between {circumflex over (ϕ)}_Xand the true characteristic function. In this subsection we define and describe the characteristics of these errors. We assume the density function ƒ_Xis sufficiently smooth (i.e., |dⁿƒ_X(u)/duⁿ|≤C_n∈ custom character ∀n∈) and that the width of the histogram bins are sufficiently small (relative to the standard deviation of the additive noise Z) such that the errors introduced by rounding x_jvalues to the centre of each histogram bin are approximately uniformly distributed across each bin, have zero mean and are small relative to the peak spreading caused by Z. In other words, the source of error arising from the binning of x_jvalues is considered negligible. Due to both the statistical nature of Poisson counting and the expected count in each bin being non-integer ( custom character [n_m]∈_≥0), discrepancies exist between the observed number of counts in any given histogram bin and the expected number of counts for that bin. We combine these two sources of error in our model and refer to it as ‘histogram noise’. We emphasize that this noise is distinct from the additive noise Z modelled in (11), which causes peak spreading in the histogram. Let the probability that a realization of X falls in the m-th bin be

$\begin{matrix} {px}_{m} = \Pr (m - 0.5 \leq X < m + 0.5) & (51) \end{matrix}$

Let the normalized histogram error ϵ_min the m-th bin be the difference between the observed count n_mand the expected count custom character [n_m]=Npx_min the mth bin, relative to the total counts in the histogram N i.e.,

$\begin{matrix} ϵ_{m} = \frac{n_{m} - N p x_{m}}{N} & (52) \end{matrix}$

Using (50), (51) and (52) we have

$\begin{matrix} \begin{matrix} {\hat{ϕ}}_{X} (u) = \sum_{m = - M}^{M - 1} \frac{n_{m}}{N} e^{i \frac{2 π u m}{2 M}} \\ = \sum_{m = - M}^{M - 1} {px}_{m} e^{i \frac{2 π u m}{2 M}} + \sum_{m = - M}^{M - 1} ϵ_{m} e^{i \frac{2 π u m}{2 M}} \\ \approx ϕ_{X} (u) + ϕ_{ϵ} (u) \end{matrix} & \begin{matrix} \begin{matrix} \begin{matrix} (53) \\ (54) \end{matrix} \end{matrix} \\ (55) \end{matrix} \end{matrix}$

If the histogram is modelled as a Poisson vector, show that

$\begin{matrix} 𝔼 [ϵ_{i}] = 0 & (56) \\ 𝔼 [ϵ_{i} ϵ_{j}] = {\begin{matrix} \frac{{px}_{j}}{N} & i = j \\ 0 & i \neq j \end{matrix} & (57) \\ 𝔼 [\langle ϕ_{ϵ} |^{2}] = \frac{1}{N} . & (58) \end{matrix}$

Since the characteristics of the histogram noise can be expressed in terms of the total number of observed intervals N, the impact of using observation data of finite length may be accounted for by incorporating this information into the design of G(u) and H(u).

3.6 Estimating γ

Having obtained {circumflex over (ϕ)}_X, the next task is to estimate γ. Rather than substitute {circumflex over (ϕ)}_X(u) for ϕ_X(u) in (42), we instead use (46) as the estimator, which requires us to choose a windowing function G(u). In this subsection we attempt to find a function G(u) that is close to optimal. When the distribution of errors in {circumflex over (ϕ)}_X(u) are considered, the windowing function G(u)=G_opt(u) that results in the lowest ISE estimator of the form given in (46) is

$\begin{matrix} G_{opt} (u) = \frac{1}{\sqrt{1 + \frac{e^{- 2 λℜ {ϕ_{A} (u)}}}{N e^{- 2 λ} {\langle ϕ_{Z} (u) \rangle}^{2}}}} & (59) \end{matrix}$

where custom character {z} denotes the real component of z∈. We cannot calculate G_opt(u) since ϕ_A(u) is unknown, so instead we attempt to find an approximation. We let

$\begin{matrix} G (u) = \frac{1}{\sqrt{1 + \frac{1}{N e^{- 2 λ} {\langle ϕ_{Z} \rangle}^{2} (u)}}} . & (60) \end{matrix}$

This is justified by considering the magnitude of the relative error between the functions g_opt(u) and g₁(u) where

$\begin{matrix} g_{opt} (u) = 1 + \frac{e^{- 2 λℜ {ϕ_{A} (u)}}}{N e^{- 2 λ} {\langle ϕ_{Z} \rangle}^{2} (u)} & (61) \\ g_{1} (u) = 1 + \frac{1}{N e^{- 2 λ} | ϕ_{Z} |^{2} (u)} . & (62) \end{matrix}$

The magnitude of the relative error is given by

$\begin{matrix} \langle \frac{g_{opt} - g_{1}}{g_{1}} \rangle = \langle \frac{e^{- 2 λℜ {ϕ_{A}}} - 1}{N e^{- 2 λ} | ϕ_{Z} |^{2} + 1} \rangle . & (63) \end{matrix}$

Since custom character {ϕ_A}∈[−1,1], we see the right hand side of (63) is maximum when {ϕ_A(u)}=−1. The relative error is thus bound by

$\begin{matrix} \langle \frac{g_{opt} - g_{1}}{g_{1}} \rangle \leq \frac{e^{2 λ} - 1}{N e^{- 2 λ} | ϕ_{Z} |^{2} + 1} & (64) \end{matrix}$

which justifies the approximation when λ is small, or when N|ϕ_Z|²(u)>>e^4λ. Furthermore, we note that the above bound is quite conservative. The distribution of photon energies in spectroscopic systems can typically be modelled as a sum of K Gaussian peaks, where the kth peak has location μ_kand scale σ_ki.e.,

$\begin{matrix} f_{A} (x) = \overset{K - 1}{\sum_{k = 0}} α_{k} \frac{1}{\sqrt{2 π} σ_{k}} e^{- \frac{{(x - μ_{k})}^{2}}{2 σ_{k}^{2}}} where & (65) \\ \sum_{k = 0}^{K - 1} α_{k} = 1. & (66) \end{matrix}$

Consequently, the characteristic function will have the form

$\begin{matrix} ϕ_{A} (u) = \sum_{k = 0}^{K - 1} α_{k} e^{- 2 π^{2} σ_{k}^{2} u^{2}} e^{i 2 π μ_{k} u} . & (67) \end{matrix}$

i.e., oscillations within an envelope that decays as e^−cu²for some c>0. The upper bound given by (64) is quite conservative since | custom character {ϕ_A}|>>1 for most values of u. The approximation error will be significantly smaller at most evaluation points across the spectrum. Having chosen G(u), we can form an estimate of γ using (46). The windowing function reduces the impact of histogram noise arising from the finite number of data samples. For large values of Ne^−2λ|ϕ_Z(u)|², the impact of windowing is negligible and the estimator is essentially the same as using (42) directly. However, in the regions where

$\begin{matrix} \ln N < 2 λ + 4 π^{2} σ^{2} u^{2} & (68) \end{matrix}$

the windowing becomes significant, and acts to bound our estimate of γ i.e., Using the fact that the noise Z is Gaussian (so ϕ_Z(u)∈ custom character and hence |ϕ_Z|²=ϕ_Z²), and since e^−2λ>0 we see that

$\begin{matrix} \begin{matrix} \langle \hat{γ} (u) \rangle = | \frac{{\hat{ϕ}}_{X} (u)}{e^{- λ} ϕ_{Z} (u)} \frac{1}{\sqrt{1 + \frac{1}{N e^{- 2 λ} {\langle ϕ_{Z} \rangle}^{2} (u)}}} | \\ = \langle {\hat{ϕ}}_{X} (u) \frac{1}{\sqrt{e^{- 2 λ} ϕ_{Z}^{2} (u) + (\frac{ϕ_{Z}^{2} (u)}{N {\langle ϕ_{Z} \rangle}^{2} (u)})}} \rangle < \\ \sqrt{N} \end{matrix} . & \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} (69) \end{matrix} \end{matrix} \end{matrix} \\ (70) \end{matrix} \end{matrix} \end{matrix} \\ (71) \end{matrix} \end{matrix}$

This ensures the argument to the distinguished logarithm in (47) remains finite even though lim_u→∞ϕ_Z(u)=0.

3.7 Estimating ϕ_A

Once {circumflex over (γ)} has been obtained, we proceed to estimate ϕ_Ausing (47). This requires another windowing function H(u). In this subsection we find a function H(u) for estimating ϕ_Athat is close to ISE optimal. We begin by defining a function ψ(u) for notational convenience

$\begin{matrix} ψ (u) = \frac{1}{G (u)} e^{- λ} ϕ_{Z} (u) . & (72) \end{matrix}$

The ISE is minimized when H(u)=H_opt(u), where the optimal filter H_opt(u) is given by

$\begin{matrix} \begin{matrix} H_{opt} (u) = \frac{ϕ_{A} (u)}{{\hat{ϕ}}_{A} (u)} = \frac{\frac{1}{λ} d \log (\frac{ϕ_{X}}{e^{- λ} ϕ_{Z}}) (u)}{\frac{1}{λ} d \log (\frac{ϕ_{X} + ϕ_{ϵ}}{ψ}) (u)} \\ = \frac{ϕ_{A} (u)}{ϕ_{A} (u) + \frac{1}{λ} d \log (\frac{ϕ_{X}}{ϕ_{X}}) (u) + \frac{1}{λ} d \log (G) (u)} \end{matrix} & \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} (73) \end{matrix} \end{matrix} \end{matrix} \\ (74) \end{matrix} \end{matrix} \end{matrix}$

Again, we cannot calculate the optimal filter by using (73)-(74) since ϕ_X(u), ϕ_A(u) and ϕ_ϵ(u) are unknown. We instead make the following observations to obtain an approximation of the ISE-optimal filter.

3.7.1 Initial Observations

The optimal filter remains close to unity as long as the estimated {circumflex over (ϕ)}_A(u) remains close to the true value of ϕ_A(u). This will invariably be the case for small values of u since

$\begin{matrix} \begin{matrix} 𝔼 [\langle ϕ_{ϵ} (u) \rangle] \approx \sqrt{\frac{π}{4 N}} ⪡ \\ \langle ϕ_{X} (u) \rangle \approx 1 for small u \end{matrix} & \begin{matrix} \begin{matrix} (75) \end{matrix} \\ (76) \end{matrix} \end{matrix}$

Furthermore, equation (73) shows that if |ϕ_ϵ(u)|≤≤|ϕ_X(u)|, then {circumflex over (ϕ)}_X(u)=ϕ_X(u)+ϕ_ϵ(u)≈ϕ_X(u) so H_opt(u)≈1. For larger values of u, when the magnitude of |ϕ_X(u)| becomes comparable to or less than |ϕ_ϵ(u)|, the estimator

${\hat{ϕ}}_{A} (u) = \frac{1}{λ} d \log ((ϕ_{X} + ϕ_{ϵ}) / ψ) (u)$

is dominated by noise and no longer provides useful estimates of ϕ_A(u). In the extreme case |ϕ_X(u)|<<ϕ_ϵ(u)| so |{circumflex over (ϕ)}_X(u)|≈|ϕ_ϵ(u)| and hence

$\begin{matrix} ℛ {{\hat{ϕ}}_{A}} \approx \frac{1}{λ} \ln \langle \frac{ϕ_{ϵ}}{ψ} \rangle & (77) \end{matrix}$

The window H(u) should exclude these regions from the estimate, as the bias introduced in doing so will be less than the variance of the unfiltered noise. Unfortunately the estimate of ϕ_A(u) can be severely degraded well before this boundary condition is reached, so (77) is not particularly helpful. A more useful method for detecting when noise begins to dominate is as follows.

3.7.2 Filter Design Function

Further manipulation of (67) shows that for typical spectroscopic systems, the magnitude of ϕ_Awill have the form

$\begin{matrix} {\langle ϕ_{A} \rangle}^{2} (u) = \sum_{k = 0}^{K - 1} α_{k}^{2} e^{- 4 π^{2} σ_{k}^{2} u^{2}} + \underset{j \neq k}{\sum_{k = 0}^{K - 1} \sum_{j = 0}^{K - 1}} α_{k} α_{j} \cos (2 π (μ_{k} - μ_{j}) u) e^{- 2 π^{2} (σ_{k}^{2} + α_{j}^{2}) u^{2}} & (78) \end{matrix}$

i.e.; a mean component that decays according to the peak widths σ_k, and a more rapidly decaying oscillatory component that varies according to the location of the spectral peaks μ_k. In designing the window H(u), we are interested in attenuating the regions in |{circumflex over (ϕ)}_A| where |ϕ_A|²≲|ϕ_ϵ/ψ|², i.e., where the signal power is less than the histogram noise that has been enhanced by the removal of ϕ_Zduring the estimation of γ. To obtain an estimate of |ϕ_A|, a low-pass, Gaussian shaped filter H_lpf(u) is convolved with |{circumflex over (ϕ)}_A| to attenuate all but the slowly varying, large scale features of |{circumflex over (ϕ)}_A|. We denote this |{circumflex over (ϕ)}_Asmooth|(u)

$\begin{matrix} \langle {\hat{ϕ}}_{Asmooth} \rangle (u) = \langle \frac{1}{λ} d \log (\hat{γ}) (u) \rangle ★ H_{1 pf} (u) . & (79) \end{matrix}$

We see that |ϕ_ϵ(u)| has a Rayleigh distribution with scale parameter

$σ_{R a y} = \frac{1}{\sqrt{2 N}} .$

Consequently

$\begin{matrix} \frac{1}{λ} \frac{\langle ϕ_{ϵ} (u) \rangle}{ψ (u)} \sim Rayleigh (σ_{Ray} = \frac{1}{λψ (u) \sqrt{2 N}}) . & (80) \end{matrix}$

It is well known that the cumulative distribution function of a Rayleigh distributed random variable X_Rayis given by

$\begin{matrix} F_{Ray} (x; σ_{Ray}) = \Pr (X_{Ray} < x; σ_{Ray}) & (81) \\ = 1 - e^{- \frac{x^{2}}{2 σ_{Ray}^{2}}} . & (82) \end{matrix}$

Hence, to assist with computing the window H(u) we will make use of the function

$\begin{matrix} α_{\min} (u) = 1 - e^{- \frac{1}{2} \langle {\hat{ϕ}}_{Asmooth} \rangle {(u)}^{2} λ^{2} {ψ (u)}^{2} 2 N} & (83) \\ \approx \Pr (\frac{1}{λ} \frac{\langle ϕ_{ϵ} (u) \rangle}{ψ (u)} < \langle {\hat{ϕ}}_{Asmooth} \rangle (u)) & (84) \end{matrix}$

to control the shape of H(u). The function α_min(u) provides an indication of how confident we can be that the estimate {circumflex over (ϕ)}_A(u) contains more signal energy than noise energy. The approximation in (84) arises from the fact that |{circumflex over (ϕ)}_Asmooth| is also a random variable slightly affected by the noise ϵ. On occasion—particularly for larger values of |u|—the histogram noise may result in sufficiently large values of α_min(u) to give a false sense of confidence, and potentially allow noisy results to corrupt the estimate of ϕ_A. To overcome this problem, the function was modified to be uni-modal in u

$\begin{matrix} α_{m o d} (u) = \inf {α_{\min} (υ), \langle υ \rangle < \langle u \rangle} & (85) \end{matrix}$

This modification was justified on the assumption that Gaussian noise causes ϕ_Z(u) to be decreasing in |u|. Consequently we expect custom character [|ϕ_ϵ(u)|/ψ(u)] to be increasing in |u|. If we ignore the local oscillations in ϕ_A(u) that are due to peak locations in ƒ_A(x), the envelope approximated by the smoothed |ϕ_Asmooth|(u) will be non-increasing in |u|. Equation (74) indicates the optimal window has the form λϕ_A(u)/(λϕ_A(u)+d log({circumflex over (ϕ)}_X/ϕ_X)(u)+d log(G)(u), so the overall window shape will be decreasing in |u|. Hence, if the estimated characteristic function in the region of some u₀(where the signal to noise ratio is high) has determined that the window value should be H(u₀)<1, then it is reasonable to reject the suggestion that in the region u₁>u₀(where the signal to noise ratio will be worse) that H(u₁)>H(u₀). Using the knowledge that |H_opt(u)| should be close to unity for small |u|, close to zero for large |u|, and should ‘roll off’ as the signal-to-noise-ratio decreases—we consider two potential windowing functions as approximations of H_opt(u).

3.7.3 Rectangle Window

The indicator function provides a very simple windowing function

$\begin{matrix} H (u) = 1_{{α_{\mod} (u) > α_{0}}} (u) . & (86) \end{matrix}$

The threshold value α₀determines the point at which cut-off occurs, and can be selected manually as desired (e.g., α₀=0.95). Once the threshold is chosen, the estimator exhibits similar ISE performance regardless of peak locations in the incident spectra. Rather than requiring the user to select a window width depending on the incident spectrum¹, the width of the window is automatically selected by the data via α_mod(u). While simplicity is the primary advantage of the rectangular window, the abrupt transition region provides a poor model for the roll-off region of the optimal filter. The second filter shape attempts to improve on that.

3.7.4 Logistic Window

A window based on the logistic function attempts to model smoother roll-off. It is given by

$\begin{matrix} H (u) = \frac{1 + e^{- β_{0} (1.0 - α_{0})}}{1 + e^{- β_{0} (α_{\mod} (u) - α_{0})}} & (87) \end{matrix}$

where α₀again acts as a threshold of acceptance of the hypothesis that the signal energy is greater than the noise energy in the estimate {circumflex over (ϕ)}_A(u). The rate of filter roll off in the vicinity of the threshold region is controlled by β₀>0. This provides a smoother transition region than the rectangle window, reducing Gibbs oscillations in the final estimate of ϕ_A. Once again, although the parameters α₀, β are chosen manually, they are much less dependent on ϕ_Aand can be used to provide close to optimal filtering for a wide variety of incident spectra. Typical values used were α₀=0.95, β₀=40.0. The performance of the rectangle and logistic window functions are compared in section 4.

3.8 Estimating ƒ_A

Having designed a window function H(u) and thus an estimator {circumflex over (ϕ)}_A(u), the final task is to estimate ƒ_A(x) by inverting the Fourier transform. This sub-section describes several issues that arise with numerical implementation. Firstly, it is infeasible to evaluate {circumflex over (ϕ)}_X, {circumflex over (γ)}(u) and {circumflex over (ϕ)}_Anumerically on the whole real line. Instead we estimate it at discrete points over a finite interval. The finite interval is chosen sufficiently large such that a tolerably small error is incurred as a result of excluding signal values outside the interval. This is justified for ƒ_A(x) being a Gaussian mixture, since the magnitudes of ϕ_Xand ϕ_Awill decay as e^−cu²for some c>0. The Fast Fourier Transform (FFT) is used to evaluate {circumflex over (ϕ)}_Xat discrete points, and hence also determines the points where {circumflex over (γ)}(u) and {circumflex over (ϕ)}_Aare evaluated. Likewise, the FFT is used to evaluate the final estimate of {circumflex over (ƒ)}_Aat discrete points. In order to use the FFT, the signals outside the interval should be sufficiently small to reduce impact of aliasing. The evaluation points also need to be sufficiently dense to avoid any ‘phase wrap’ ambiguity when evaluating d log({circumflex over (γ)})(u). Both these objectives can be achieved by increasing the number of bins 2M in the histogram (zero-padding) until a sufficiently large number of bins is attained. As M increases, the sampling density of {circumflex over (γ)} increases, which allows phase wrapping to be detected and managed. A larger M also allows aliasing (caused by the Gaussian shaped tails of |ϕ_X|) to be negligible. Typically a value of M was chosen as the smallest power of two sufficiently large such that the non-zero values of the histogram were confined to the ‘lower half’ indexes i.e., M=min{M: n_m=0, |m|∈{M/2, . . . , M}, M=2^N, N∈ custom character }. Secondly, the distinguished logarithm in (47) is undefined if {circumflex over (γ)}(u)=0. In estimating γ(u) from the data, there is a small but non-zero probability that the estimate will be zero. In this case, the distinguished logarithm in (47) is undefined and the technique fails. As |u| increases, |ϕ_X|(u) decreases and may approach |ϕ_ϵ|(u). When |ϕ_X|(u) and |ϕ_ϵ|(u) have similar magnitudes, the probability of |ϕ_X+ϕ_ϵ| (and hence {circumflex over (γ)}) being close to zero can become significant. The filter H(u) should roll off faster than |ϕ_X|(u) approaches |ϕ_ϵ|(u) to reduce the impact this may have on the estimate. Ideally H(u) should be zero in regions where noise may result in |{circumflex over (γ)}|(u) being close to zero. Gugushvili has shown [18] that for a rectangular window, the probability of inversion failure approaches zero as the length of the data set increases N→∞.

3.9 Discrete Notation

We digress momentarily to introduce additional notation. Throughout the rest of the paper, bold font will be used to indicate a 2M×1 vector corresponding to a discretely sampled version of the named function, e.g., {circumflex over (ϕ)}_Arepresents a 2M×1 vector whose values are given by the characteristic function {circumflex over (ϕ)}_A(u) evaluated at the points u∈{0, 1, . . . , M−1, −M, . . . , −2, −1}. Square bracket notation [k] is used to index a particular element in the vector, e.g., {circumflex over (ϕ)}_A[M−1] has the value of {circumflex over (ϕ)}_A(M−1). We also use negative indexes for accessing elements of a vector in a manner similar to the python programming language. Negative indexes should be interpreted relative to the length of the vector, i.e., {circumflex over (ϕ)}_A[−1] refers to the last element in the vector (which is equivalent to {circumflex over (ϕ)}_A[2M−1]).

3.10 Summary of Estimator

The estimation procedure we use may be summarized in the following steps.

- 1. Partition sampled time series into intervals using (8).
- 2. Calculate x_jvalue for each interval according to (9).
- 3. Generate histogram n from the x_jvalues.
- 4. Calculate {circumflex over (ϕ)}_Xusing the inverse FFT to efficiently evaluate (50) at various sample points.
- 5. Calculate ϕ_Zand G at the appropriate points.
- 6. Calculate {circumflex over (γ)} via (46) using {circumflex over (ϕ)}_X, G and ϕ_Z.
- 7. Calculate |ϕ_Asmooth(u)|, a low-pass filtered version of

$\langle \frac{1}{λ} d \log (\hat{γ}) (u) \rangle .$

- 8. Calculate α_modvia (83) and (85).
- 9. Calculate H using α_modand either (86) or (87).
- 10. Calculate {circumflex over (ϕ)}_Avia (47) using {circumflex over (γ)} and H. If any element of {circumflex over (γ)} is zero and the corresponding element of H is non-zero, the estimation has failed as the distinguished logarithm is undefined.
- 11. Calculate {circumflex over (ƒ)}_Ausing the FFT of {circumflex over (ϕ)}_Aaccording to

$\begin{matrix} {\hat{f}}_{A} [k] = \frac{1}{2 M} \sum_{m = - M}^{M - 1} {\hat{ϕ}}_{A} [m] e^{- \frac{i 2 π mk}{2 M}} . & (88) \end{matrix}$

3.11 Performance Measures

The performance of the estimator is measured using the integrated square of the error (ISE). The ISE measures the global fit of the estimated density.

$\begin{matrix} ISE ({\hat{f}}_{A}, f_{A}) = \int_{- \infty}^{\infty} {({\hat{f}}_{A} (x) - f_{A} (x))}^{2} dx & (89) \end{matrix}$

The discrete ISE measure is given by

$\begin{matrix} ISE ({\hat{p}}_{A}, p_{A}) = \sum_{m = - M}^{M - 1} {({\hat{p}}_{A} [m] - p_{A} [m])}^{2} & (90) \end{matrix}$

where p_Ais a 2M×1 vector whose elements contain the probability mass in the region of each histogram bin i.e.,

$\begin{matrix} p_{A} [m] = \int_{m - 0.5}^{m + 0.5} f_{A} (x) dx . & (91) \end{matrix}$

The vector {circumflex over (p)}_Arepresents the corresponding estimated probability mass vector.

4 NUMERICAL RESULTS OF THE FIRST EMBODIMENT

Experiments were performed using simulated and real data.

4.1 Simulations

The ideal density used by Trigano et al. [11] was used for these simulations. It consists of a mixture of six Gaussian and one gamma distribution to simulate Compton background. The mixture density is given by

$\begin{matrix} f \propto 0.5 g + 10 𝒩 (40, 1) + 10 𝒩 (112, 1) + 1 𝒩 (50, 2) + 1 𝒩 (63, 1) + 2 𝒩 (140, 1) & (92) \end{matrix}$

where custom character (μ, σ²) is the density of a normal distribution with mean μ and variance σ². The density of the gamma distribution is given by g(x)=(0.5+x/200)e^{−(0.5+x/200)}. The density was sampled at 8192 equally spaced integer points to produce the discrete vector p_Aof probability mass. The FFT was taken to obtain ϕ_A, a sampled vector of ϕ_Avalues.

- A particular count rate λ was chosen for an experiment, corresponding to the expected number of events per observation interval. The expected pile-up density was obtained via (40). i.e., the discrete vector ϕ_Awas scaled by λ, exponentiated, then scaled by e^−λ and finally an FFT was applied

$\begin{matrix} p_{Y} [m] = FFT (e^{- λ} e^{{λϕ}_{A}}) [m] . & (93) \end{matrix}$

Equation (93) was convolved with a Gaussian to simulate the effect of noise Z smearing out the observed spectrum

$\begin{matrix} p_{X} = p_{Y} ★ \frac{1}{\sqrt{2 π} σ} e^{- \frac{m^{2}}{2 σ^{2}}} . & (94) \end{matrix}$

This represents the expected density of the observed spectrum, including pile-up and additive noise. Observation histograms were created using random variables that were distributed according to (94). Experiments were parameterized by the pair (N, λ) where N∈{10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹} and λ∈{1.0, 3.0, 5.0}. For each parameter pair (N, λ), one thousand observed histograms were made. Estimates of the probability mass vector p_Awere made using (88) with both (86) and (87) used for H(k). A threshold value of α₀=0.95 was used for both window shapes, and β₀=40.0 for the logistic shape. The discrete ISE measure of the error between each estimate {circumflex over (p)}_Aand the true vector p_Awere recorded. For comparison with asymptotic bandwidth results, estimates were made using a rectangular window whose bandwidth was selected according to the condition 1.3 specified by Gugushvili in [18] i.e., h_N=(ln N)^−βwhere β<½. We emphasize that the β of Gugushvili's filter is not to be confused with the β₀of (87). The asymptotic bandwidth criterion was implemented by using

$\begin{matrix} H [k] = 1_{{\langle k \rangle < α_{0}}} [k] & (95) \\ where α_{0} = \frac{M}{π} {(\ln N)}^{β} . & (96) \end{matrix}$

Three values for Gugushvilli's β were trialed, namely β=½, ⅓, ¼.

- Estimates were also made using a rectangular filter (95) with fixed bandwidths of various values α/M∈[0.2, 0.4, 0.6, 0.8]. Finally time-series data was created according to (1) with an idealised rectangular pulse shape and 10⁷pulses whose energies were distributed according to (92). The pulse length and count rate were chosen to give a Poisson rate Δ=1.0. The algorithm described by Trigano et al. [11] was used to estimate the underlying amplitude density from a bi-dimensional histogram containing 32×1024 (duration×energy) bins—this choice of bins reportedly giving the best accuracy and reasonable execution times. The performance and processing time of the core algorithm were recorded for comparison with our proposed algorithm. FIG. 2 plots a typical estimate {circumflex over (p)}_Amade by the data-driven logistic shaped filter for an experiment with parameter pair (N=10⁶, λ=3.0). The true vector p_A(thin solid line), and the observed histogram {circumflex over (p)}_X(lower curve containing some noise) are also plotted. Pile-up peaks can be clearly seen in the observed histogram. Although the estimated density suffers from ringing (due to the Gibbs phenomenon), it otherwise estimates the true density and corrects the pile-up that was present in the observed histogram. FIG. 3 plots a typical estimate made at the same operating point as FIG. 2, but with an estimator having a rectangular filter where the bandwidth was selected using (96) and β=¼. This corresponds to the operating region in FIG. 6 where the performance of the fixed bandwidth filter (β=¼) is approaching that of the data-driven filters. It is evident that while also correcting pile-up, the resulting estimate contains more noise. FIG. 4 shows the distribution densities of ISE measures as a function of sample count using a rectangular filter and various fixed bandwidths. Lines were plotted between distribution means (MISE) to assist visualization. The results for the data-driven rectangular filter (86) were also plotted, connected with a thicker curve. This clearly illustrates the weakness of fixed bandwidth filtering. For any fixed bandwidth, the ISE decreases as sample count increases, eventually asymptoting as the bias becomes the dominant source of error. At that point (which is noise and bandwidth dependent) the ISE remains largely constant despite increases in sample count. The fixed bandwidth excludes the use of some estimates {circumflex over (ϕ)}_A[k] in the final calculation, even when they have high signal-to-noise-ratio (SNR). FIG. 4 also shows the results given by the rectangular filter with our proposed data-driven bandwidth selection. This curve lies close to the inflection point of each fixed bandwidth curve. This indicates the bandwidth selected for the data-driven rectangular filter is close to the optimal bandwidth value (for a rectangular filter) across the range of sample counts. FIG. 5-FIG. 7 show the distribution densities of the ISE measure as a function of the total number of estimates N in each histogram at three count rates λ∈{1.0, 3.0, 5.0}. The MISE curves for the logistic and rectangular filters are lower than those obtained using the bandwidth given by (96) for much of the region of application interest. There are various regions where the non-data-driven bandwidth (β=¼) gives similar performance to the data-driven bandwidths, however this is not maintained across the whole range of sample counts. The logistic filter shape has slightly better performance than the rectangular filter shape, although the differences between the two filters appears relatively minor to the ISE measure. Table 1 compares the results between the proposed algorithm and the algorithm recently described in [11]. The ISE for both methods were similar at the operating point under test (λ=1.0, N=10⁷), however our proposed algorithm requires considerably less computation.

TABLE 1

Comparison With Algorithm Described in [11]

Algorithm
Avg. ISE
Avg. Time (sec)

Fast Trigano Algorithm
1.3 × 10⁻⁵
3.19

32 × 1024 (duration × energy) bins

Proposed Algorithm

1 × 10⁻⁵
0.019

4.2 Real Data

The estimator was applied to real data to assess its usefulness in practical applications. The threshold value ϵ found in (8) was chosen to be approximately one half the standard deviation of the additive noise w(t). This ensured a reasonably high probability of creating intervals, yet ensured errors in the estimation of interval energy were low. A value for the interval length L was chosen approximately four times the ‘length’ of a typical pulse—that is, four times the length of the interval {t: Φ(t)>ϵ}. An energy histogram was obtained from a manganese sample, with a photon flux rate of nominally 10⁵events per second. A slight negative skew was present in the shape of the main peaks of the observed histogram, suggesting a complicated noise source had influenced the system. This is barely visible in FIG. 8. The noise was modelled as a bimodal Gaussian mixture rather than a single Gaussian peak. A simple least-squares optimization routine was used to fit bimodal Gaussian parameters Z˜α₁ custom character (μ₁, σ₁²)+α₂(μ₂, σ₂²) to the noise peak located around bin index zero. A suitable value for λ was chosen manually. The logistic filter with data-driven bandwidth was used to estimate the true density. FIG. 8 shows plots of the observed and estimated probability mass vectors. The main peaks (bins 450˜600) have been enhanced while the pile-up has been attenuated though not fully removed. The first order pile-up peaks have been reduced. The peak-to-pile-up ratio (ratio of the height of main peak to that of first pile-up peak) has increased from around 6 to around 120. These improvements are comparable to other state of the art systems (e.g., [11]). There are several possible reasons the estimator fails to fully resolve pile-up. The accuracy of the estimator depends on correctly modelling the Gaussian noise peak. The bimodal Gaussian mixture modelled the noise peak such that the maximum error was less than 1% of the noise density peak. Given that the residual pile-up peaks in the estimated spectrum are below 1% of the main peak, the sensitivity of the estimator to errors in noise modelling may have contributed to this in some part. A second reason for the unresolved pile-up may be due to the uncertainty in the estimation of the observed spectrum. Several of the residual pile-up peaks are relatively close to the floor of the observed histogram. The residual peaks may simply be a noise induced artefact of the estimator. Finally, the mathematical model may be an overly simple approximation of the observed spectrum. The detection process includes numerous second-order effects that have not been included in the model (e.g., ballistic deficit, supply charge depletion, correlated noise, non-linearities, etc. . . . ). These minor effects may limit the accuracy of the pile-up correction estimator.

5 SUMMARY OF THE FIRST EMBODIMENT

We have taken the estimator proposed by Gugushvili [18] for decompounding under Gaussian noise, and adapted it for correcting pulse pile-up in X-ray spectroscopy. We have proposed a data-driven bandwidth selection mechanism that is easily implemented, and provides significant reduction in ISE/MISE across a broad range of sample counts of interest to spectroscopic applications (10⁴˜10⁹counts). The data-driven rectangular bandwidth selection is close to optimal (for rectangular filters), and over the range of interest outperforms bandwidth selection based on asymptotic results or fixed bandwidth.

- Although initial results appear promising, further work is required to improve the performance for practical implementations. The estimation still contains ‘ringing’ artefacts associated with the Gibbs phenomenon. Additional filter shape attempts to reduce this, there are other shapes that are closer to MSE optimal.

6 SECOND EMBODIMENT

This section gives a summary of the spectrum estimator of the 2nd embodiment. The 2nd embodiment solves the problem of the 1st embodiment which requires entire clusters to be approximately encompassed in each interval. In the 2nd embodiment, the entire data series if desired can be used, and the overlap is compensated by introduction of 2 different interval lengths L and L1.

- We need to include a few additional terms not mentioned in the first embodiment. In particular {circumflex over (ϕ)}_X1. The spectrum estimator is based on

$\begin{matrix} {\hat{f}}_{A} (x) = \frac{1}{2 πλ} \int_{- \infty}^{\infty} e^{- iux} H (u) d \log (γ) (u) du . & (97) \end{matrix}$

The introduction of the filter H(u; α) allows us to address several implementation issues that arise. The estimation procedure we use may be summarized in the following steps.

- 1. Partition sampled time-series into fixed length intervals [T_j, T_j+L), j∈
- 2. Calculate x_jvalue for each interval according to x_j=Σ_k∈[T_j_{, T}_j_+L)s(k).
- 3. Generate histogram n from the x_jvalues.
- 4. Calculate {circumflex over (ϕ)}_Xusing the inverse FFT of n.
- 5. Partition the sampled time series into a different set of intervals with length L₁, and following similar calculations to obtain {circumflex over (ϕ)}_X1.
- 6. Calculate ϕ_Zand G.
- 7. Calculate {circumflex over (γ)} using {circumflex over (ϕ)}_X, G, ϕ_Zand {circumflex over (ϕ)}_X1,

$\begin{matrix} \hat{γ} = \frac{{\hat{ϕ}}_{X}}{ϕ_{Z} e^{- λ} {\hat{ϕ}}_{X 1}} & (98) \end{matrix}$

- 8. Calculate |ϕ_Asmooth(u)|, a low-pass filtered version of

$\langle \frac{1}{λ} d \log (\hat{γ}) (u) \rangle .$

- 9. Calculate α_mod.
- 10. Calculate H using α_mod.
- 11. Calculate {circumflex over (ϕ)}_Ausing {circumflex over (γ)} and H. If any element of {circumflex over (γ)} is zero and the corresponding element of H is non-zero, the estimation has failed as the distinguished logarithm is undefined.
- 12. Calculate {circumflex over (ƒ)}_Ausing the FFT of {circumflex over (ϕ)}_Aaccording to

$\begin{matrix} {\hat{f}}_{A} [k] = \frac{1}{2 M} \sum_{m = - M}^{M - 1} {\hat{ϕ}}_{A} [m] e^{- \frac{i 2 π mk}{2 M}} . & (99) \end{matrix}$

6.1 Algorithm Details

Partition the detector output stream into a set of non-overlapping intervals of length L i.e., [T_j, T_j+L), T₀∈ custom character _≥0, T_j+1≥T_j+L, j∈_≥0. Let x_jbe the sum of the detector output samples in the jth interval i.e.,

$\begin{matrix} x_{j} = \sum_{k = T_{j}}^{T_{j} + L - 1} s (k) & (100) \end{matrix}$

Assuming L is greater than a pulse length, the jth interval may contain ‘complete’ pulses as well as pulses which have been truncated by the ends of the interval. It can be shown that x_jconsists of a superposition of the energy of ‘complete’ pulses which we denote custom character , the energies of truncated pulses which we denote with _1jand noise z_j

- Let the detector output stream be partitioned into a second set of non-overlapping intervals [T_1j, T_1j+L₁), T_1,0∈_≥0, T_1,j+1≥T_1,j+L, j∈_≥0where L₁<L. Let x_1jbe given by

$\begin{matrix} x_{1 j} = \sum_{k = T_{1 j}}^{T_{1 j} + L_{1} - 1} s (k) & (101) \end{matrix}$

If L₁is chosen to be slightly less than the pulse length, the x_1jterm will contain no ‘complete’ pulses, but consist of a superposition of only the energies of truncated pulses y_ljand noise z_j. The number of truncated pulses in any interval has a Poisson distribution. We have

$\begin{matrix} X_{1} = Y_{1} + Z_{1}, & (102) \\ ϕ_{X_{1}} = ϕ_{Y_{1}} ϕ_{Z_{1}} . & (103) \end{matrix}$

- We can decompose the total energy in the interval [T_j, T_j, +L) into the energy contribution Y₁from pulses that have been truncated and the energy contribution Y₀from pulses that are fully contained in the interval [T_j, T_j+L) i.e.,

$\begin{matrix} X = Y_{0} + Y_{1} + Z_{0} + Z_{1} & (104) \end{matrix}$

where Z₀represents noise in the regions where pulses are fully contained in the interval (a length of L−L₁), and Z₁represents noise in the regions where pulses are truncated (a length of L₁). Hence,

$\begin{matrix} ϕ_{X} = ϕ_{Y_{0}} ϕ_{Y_{1}} ϕ_{Z_{0}} ϕ_{Z_{1}} . & (105) \end{matrix}$

- By combining (103) with (105) we have

$\begin{matrix} ϕ_{X} = ϕ_{X_{1}} ϕ_{Y_{0}} ϕ_{Z_{0}} & (106) \end{matrix}$

Rearranging gives

$\begin{matrix} ϕ_{Y_{0}} = \frac{ϕ_{X}}{ϕ_{X_{1}} ϕ_{Z_{0}}} & (107) \\ = e^{- λ_{0}} e^{λ_{0} ϕ_{A} (u)} & (108) \end{matrix}$

We can estimate ϕ_X₁in a similar manner that we estimated ϕ_Xor some other method, e.g., via the empirical characteristic function or by performing an FFT on the normalized histogram of x_1jvalues.

- When performing the decompounding operation, the Poisson rate λ₀for the reduced interval length L−L₁is used to account for the sub-interval over which the compound Poisson process Y₀occurs.
  
  6.2 Visualization of Internal Quantities

To aid the reader's understanding, FIG. 9 plots various quantities obtained during the estimation process. The upper blue curve (with a value around 0.3 at bin zero) plots |{circumflex over (ϕ)}_X|, the estimated characteristic function of the observed spectrum. A brown curve is used to show the true value of |ϕ_ƒ|, which is distinctly visible as the lower curve with periodic nulls in the region [6000, 10000]. The quantity |ϕ_ϵ|/(λϕ_Ze^−λ) is shown in transparent red and appears as ‘noise’ whose average density peaks around bin #8000. The expected value of |ϕ_ϵ|/(λϕ_Ze^−λ) is shown with a black dashed line. This is obtained using (75), the known value of λ, and assuming Gaussian noise with known σ to obtain ϕ_Z(k). The quantity |{circumflex over (ϕ)}_ƒ| is shown with a transparent blue curve. This is barely visible as it coincides closely with |ϕ_ƒ| in the intervals [0, 4000], [12000, 16000], and closely with |ϕ_ϵ|/(λϕ_Ze^−λ) in the interval [5000, 11000]. Note that the colour of |ϕ_ϵ|/(λϕ_Ze^−λ) appears to change from red to purple in the interval [5000, 11000] as both transparent plots overlap. A solid black line shows |{circumflex over (ϕ)}_fsmooth|, a low pass filtered version of |{circumflex over (ϕ)}_ƒ|. The low pass filtering removes any local oscillations in |{circumflex over (ϕ)}_ƒ(k)| due to the peak localities, as described in paragraph on smoothing at the beginning of this section. The term |{circumflex over (ϕ)}_fsmooth(k)| serves as an estimate of custom character [|ϕ_ƒ(k)|]. It can be see that |{circumflex over (ϕ)}_ƒ| provides a reasonably good estimate of |ϕ_ƒ| in the region where |{circumflex over (ϕ)}_fsmooth|>>[|ϕ_ϵ|/(λϕ_Ze^−λ)]. As these two quantities approach each other, the quality of the estimate deteriorates until it is eventually dominated by noise. The filter H(k) should include good estimates of |ϕ_ƒ| while excluding poor estimates. To find the regions where good estimates of |ϕ_ƒ| are obtained we address the question: Given custom character [|ϕ_ϵ/(λϕ_Ze^−λ)], what is the probability that the calculated values of {circumflex over (ϕ)}_ƒin a local region arise largely from noise?

7 COUNT RATE ESTIMATION

The previous estimator assumed λ was known. An estimate of λ can be obtained without prior knowledge as follows.

- 1. Using {circumflex over (ϕ)}_X, {circumflex over (ϕ)}_X1, {circumflex over (ϕ)}_Z, G from the previous section, calculate

$\begin{matrix} {\hat{ϕ}}_{Y} = \frac{G {\hat{ϕ}}_{X}}{ϕ_{Z} ϕ_{X_{1}}} & (109) \end{matrix}$

- 2. Using {circumflex over (ϕ)}_Y, estimate the count rate. This can be done a number of ways.
- 3. One way is to use an optimization routine or some other means to fit a curve to {circumflex over (ϕ)}_Y. The fitted parameters can be used to obtain an estimate of the count rate.
- 4. Another way involves estimating the DC offset of Ψ=d log({circumflex over (ϕ)}_Y). This can be done by averaging a suitable number of points of Ψ. The points obtained by filtering by H(u) in the previous section are usually suitable, although less points may also produce an adequate estimate.
- 5. Another way involves using an optimization engine or some other means to fit a curve to Ψ=d log({circumflex over (ϕ)}_Y). A suitable parameterized curve to fit d log({circumflex over (ϕ)}_Y) is given by

$\begin{matrix} f (u; λ, α, σ, μ) = - λ + λ \sum_{k = 0}^{K - 1} α_{k} G (σ_{k} u) e^{- j 2 π u μ_{k}} & (110) \\ where α = (α_{0}, \dots, α_{K - 1}) & (111) \\ σ = (σ_{0}, \dots, σ_{K - 1}) & (112) \\ μ = (μ_{0}, \dots, μ_{K - 1}) & (113) \end{matrix}$

- - and where K∈ is chosen to allow the curve fit to sufficient accuracy. The parameter λ provides an estimate of the count rate. The optimization engine is not required to give equal weighting to each point in Ψ.

8 DESCRIPTION OF FIGURES

The following figures are to aid understanding of the process. FIG. 1 shows one possible scheme used to partition the detector output. The illustration depicts the sampled detector response to three incident photons. To aid clarity of the figure, the effects of noise have been removed. The output response has been partitioned into several regions of equal length (L). The number of pulses arriving in each region is unknown to the processing system. One pulse has arrived in the first interval. Two pulses have arrived in the second interval. No pulses have arrived in the third interval. The total photon energy arriving within each interval is calculated as the statistic of interest, being the sum of all sample values in each interval. Intervals are not temporally aligned with pulse arrivals. FIG. 2 illustrates the output of the estimation procedure. The true probability density of incident photon energy is plotted as a solid black line. The photons arrival rate is such that three photons on average arriving during any given interval [T_j, T_j+L). The standard deviation of additive noise in the detector output signal s(t) is equal to one histogram bin width. One million intervals were collected. A histogram was made of the total energy in each interval. This is plotted with the blue line. The effects of pile-up is clearly evident, particularly around bins 75, 150 and 225. The red trace plots the estimate of the true incident energy spectrum after the data has been processed by the system. Although some noise appears in the estimate, the effects of pile-up have been removed. The estimate is expected to correctly recover the true incident spectrum on average. This result was obtained using an internal filter whose bandwidth was determined automatically from the data. FIG. 3 illustrates the same quantities as FIG. 2 under the same operating conditions, however in this instance the bandwidth of the internal filter has been determined using asymptotic results from the literature. Although the estimated probability density of incident energies has been recovered, the variance is significantly greater compared to FIG. 2. FIG. 8 illustrates the operation of the system on real data. The blue trace plots the probability density of observed energy values, while the red trace plots the estimated true probability density of incident photon energies. There is no black trace as the true probability density is unknown. In this experiment, X-ray fluorescence of a manganese sample was used as a photon source. The photon arrival rate was around 10⁵photons per second. The interval length was chosen such that the average time between photons corresponded to the length of two intervals. Sufficient data was collected and partitioned to form 5.9×10⁶intervals. The standard deviation of the additive noise corresponds to 4.7 histogram bins. The estimation process has clearly reduced the pile-up peaks and enhanced the true peaks. FIG. 9 illustrates various quantities obtained during the simulation of the system described in the 2nd embodiment. It is described in section 5.1 Visualization of Internal Quantities. FIGS. 9-12 relate to the 2nd embodiment. FIG. 10 illustrates the observed and true probability density of input photon energies for the experiment from which FIGS. 9-13 were derived. The black trace plots the true probability density. The red trace plots the density expected to be observed when three photons on average arrive during a given interval length. The blue trace plots the actual observed density. Up to tenth-order pile-up can be seen in the observed density. FIG. 10 included several plots arising from a typical spectroscopic system. The actual incident photon density (‘Ideal Density’) is plotted with a solid dark line. An observed histogram obtained by partitioning the time-series data is shown in dark blue. Distortion of the spectrum caused by pulse pile-up is evident FIG. 13 plots various internal quantities using a logarithmic vertical axis. The dark blue curve that dips in the centre of the plot is |{circumflex over (ϕ)}_X|. The green quantity that crosses the plot horizontally is |{circumflex over (ϕ)}_Y|. The upper cyan curve that dips in the centre of the plot is |ϕ_Z|.1 FIG. 11 illustrates the trajectory of the curve γ in the complex plane. FIG. 12 illustrates internal quantities similar to FIG. 9, however there are some additional signals. The horizontal red trace that is largely noise, and the corresponding black dashed line represent |ϕ_ϵ|, the magnitude of the characteristic function of the histogram noise. The transparent green plot that forms the ‘noisy peak’ in the center of the figure is the estimate |ϕ_ϵ|. This quantity was plotted in blue in FIG. 9, and was barely visible as it was obscured by |{circumflex over (ϕ)}_X|/(λϕ_Ze^−λ), which is not shown in FIG. 12. The horizontal trace with an average value of −3 is a plot of |ϕ_Y|. The cyan trace that begins with a value of zero at bin zero, and dips to a minimum around bin 8000 is |ϕ_Z|, the magnitude of the characteristic function of the additive Gaussian noise. FIG. 13 relates to the 2nd broad aspect of calculating the count rate. It illustrates internal quantities used in the calculation of {circumflex over (λ)}. The cyan trace that begins with a value of zero at bin zero, and dips to a minimum around bin 8000 is |ϕ_Z|, the magnitude of the characteristic function of the additive Gaussian noise. The dark blue trace that dips to a minimum in the center of the Figure is |{circumflex over (ϕ)}_X|, the estimate of the characteristic function of the observed data. The yellow/green horizontal trace with an average value of −3 is the estimate of |ϕ_Y|.

REFERENCES

[1] G. F. Knoll, Radiation Detection and Measurement, 3rd Edition. New York: Wiley, 2000.

[2] P. A. B. Scoullar and R. J. Evans, “Maximum likelihood estimation techniques for high rate, high throughput digital pulse processing,” in 2008 IEEE Nuclear Science Symposium Conference Record, pp. 1668-1672, October 2008.

[3] M. Haselman, J. Pasko, S. Hauck, T. Lewellen, and R. Miyaoka, “FPGA-based pulse pile-up correction with energy and timing recovery,” IEEE Transactions on Nuclear Science, vol. 59, pp. 1823-1830, October 2012.

[4] T. Petrovic, M. Vencelj, M. Lipoglavsek, R. Novak, and D. Savran, “Efficient reduction of piled-up events in gamma-ray spectrometry at high count rates,” IEEE Transactions on Nuclear Science, vol. 61, pp. 584-589, February 2014.

[5] B. A. VanDevender, M. P. Dion, J. E. Fast, D. C. Rodriguez, M. S. Taubman, C. D. Wilen, L. S. Wood, and M. E. Wright, “High-purity germanium spectroscopy at rates in excess of 10⁶events/s,” IEEE Transactions on Nuclear Science, vol. 61, pp. 2619-2627, October 2014.

[6] Y. Sepulcre, T. Trigano, and Y. Ritov, “Sparse regression algorithm for activity estimation in spectrometry,” IEEE Transactions on Signal Processing, vol. 61, pp. 4347-4359, September 2013.

[7] T. Trigano, I. Gildin, and Y. Sepulcre, “Pileup correction algorithm using an iterated sparse reconstruction method,” IEEE Signal Processing Letters, vol. 22, pp. 1392-1395, September 2015.

[8] L. Wielopolski and R. P. Gardner, “Prediction of the pulse-height spectral distortion caused by the peak pile-up effect,” Nuclear Instruments and Methods, vol. 133, pp. 303-309, March 1976.

[9] N. P. Barradas and M. A. Reis, “Accurate calculation of pileup effects in PIXE spectra from first principles,” X-Ray Spectrometry, vol. 35, pp. 232-237, July 2006.

[10] T. Trigano, A. Souloumiac, T. Montagu, F. Roueff, and E. Moulines, “Statistical pileup correction method for HPGe detectors,” IEEE Transactions on Signal Processing, vol. 55, pp. 4871-4881, October 2007.

[11] T. Trigano, E. Barat, T. Dautremer, and T. Montagu, “Fast digital filtering of spectrometric data for pile-up correction,” IEEE Signal Processing Letters, vol. 22, pp. 973-977, July 2015.

[12] P. Ilhe, E. Moulines, F. Roueff, and A. Souloumiac, “Nonparametric estimation of mark's distribution of an exponential shot-noise process,” Electronic Journal of Statistics, vol. 9, no. 2, pp. 3098-3123, 2015.

[13] P. Ilhe, F. Roueff, E. Moulines, and A. Souloumiac, “Nonparametric estimation of a shot-noise process,” in 2016 IEEE Statistical Signal Processing Workshop (SSP), pp. 1-4, June 2016.

[14] C. McLean, M. Pauley, and J. H. Manton, “Limitations of decision based pile-up correction algorithms,” in 2018 IEEE Statistical Signal Processing Workshop (SSP), pp. 693-697, June 2018.

[15] D. Snyder and M. Miller, Random Point Processes In Time And Space. New York: Springer-Verlag, 2, revised ed., September 2011.

[16] B. Buchmann and R. Grübel, “Decompounding: An estimation problem for Poisson random sums,” Annals of Statistics, pp. 1054-1074, 2003.

[17] B. van Es, S. Gugushvili, and P. Spreij, “Deconvolution for an atomic distribution,” Electronic Journal of Statistics, vol. 2, pp. 265-297, 2008.

[18] S. Gugushvili, Non-Parametric Inference for Partially Observed Levy Processes. PhD, University of Amsterdam, Thomas Stieltjes Institute, 2008.

[19] S. Said, C. Lageman, N. Le Bihan, and J. H. Manton, “Decompounding on compact Lie groups,” IEEE Transactions on Information Theory, vol. 56, pp. 2766-2777, June 2010.

[20] B. van Es, S. Gugushvili, and P. Spreij, “A kernel type nonparametric density estimator for decompounding,” Bernoulli, vol. 13, pp. 672-694, August 2007.

[21] J. Yu, “Empirical characteristic function estimation and its applications,” Econometric Reviews, vol. 23, pp. 93-123, December 2004.

Number	Name	Date	Kind
3872287	Koeman	Mar 1975	A
6590957	Warburton et al.	Jul 2003	B1
20080025385	Barat et al.	Jan 2008	A1
20120041700	Scoullar et al.	Feb 2012	A1
20140029819	Zeng et al.	Jan 2014	A1
20160161390	Greiner	Jun 2016	A1
20180204356	Xia et al.	Jul 2018	A1
20200170586	Takahashi	Jun 2020	A1

Radiation detection with non-parametric decompounding of pulse pile-up

Information

Patent Number

Date Filed

Date Issued

Inventors

Original Assignees

Examiners

Agents

CPC

Field of Search

CPC

International Classifications

Term Extension

Abstract

Description

Claims

Priority Claims (1)

PCT Information

US Referenced Citations (8)

Non-Patent Literature Citations (3)

Related Publications (1)

Entry
Gugushvili, Shota. “Decompounding under Gaussian noise.” arXiv preprint arXiv:0711.0719 (2007).
International Search Report received in PCT/AU2020/050275, mailed Jun. 9, 2020.
Extended European Search Report received in EP Application No. 20777486.0, mailed Nov. 18, 2022.