The present invention relates to a system for monitoring water quality and biological contamination in water and, more specifically, to a fluorescence system based on tryptophan-like fluorescence and humic-like fluorescence. The invention includes a circulation jig that simulates the field conditions required for developing said system.
Use of treated wastewater as an irrigation water source is common practice around the globe. It reduces the demand for water from other sources (e.g., groundwater) and has gained attention in recent years because of the water shortage expected in the near future. Consequently, the use of treated wastewater for irrigation is higher than ever. In Israel, for instance, treated water constituted 38% of overall agricultural water consumption in 2015, and by 2019 it had risen to 45%. While using treated water saves water resources, it is not as safe as using fresh water from natural sources. Typical wastewater contains a variety of inorganic substances from domestic and industrial sources, organic matter (OM), and dissolved organic matter (DOM). For simplicity of terminology, hereinafter we refer to both OM and DOM as DOM. Most conventional treatment techniques, such as chlorination, have been found not only to be harmful to humans and the environment but also to be inefficient. As a result, online water monitors and early-warning contamination detectors for treated wastewater lines may play a crucial role in water systems. Fluorescence spectroscopy is a well-known technique for water analysis, and in recent years it has received additional attention owing to the availability of light emitting diodes (LEDs) as UV sources. Analysis of tryptophan-like fluorescence (TLF) at the 280/350 nm excitation/emission pair and of humic-like fluorescence (HLF) at the 320-360/400-480 nm excitation/emission pairs may reveal OM and DOM contamination, respectively. Therefore, numerous methods apply fluorescence spectroscopy to the detection of OM and DOM in different environments, such as oceanic, urban and others.
Those methods share the same goal but are mostly divided into two main categories: field-based systems that monitor the quality of a water reservoir with publicly available portable fluorimeters, and custom-built fluorimeter prototypes for measurements in a controlled environment.
While the former works deal with environmental conditions, the latter aim to build a cheap and reliable prototype for a specific task under known conditions. In most cases, wastewater plants and small-scale irrigation systems are left without attention. Moreover, in these systems monitoring relies on simple excitation/emission setups that are not properly suited to the task. Irrigation systems require special care because of the large amount of water to be processed from unknown, inconsistent and time-dependent water sources.
Consequently, the proposed method aims to fill those gaps. The presented system successfully monitors the fluorescence of treated water for irrigation by examining TLF and HLF signals that occur in the range of 300-520 nm, and provides a range of additional measured parameters (e.g., temperature) for later signal processing. While TLF/HLF signals are typical of DOM contamination, by adjusting the illumination sources and the spectrofluorometer one may adapt the proposed system to detect other contaminants.
We show a real implementation of photomultiplier-array-based sensing that measures full fluorescence spectra in a flow system, which, to the best of our knowledge, is suggested here for the first time. A break of DOM into the irrigation water line is demonstrated using milk, which is rich in proteins that are highly fluorescent under 280 nm and 340 nm light sources. Moreover, the substance flow and the sensors are under outdoor conditions; thus the measurements are affected by the weather and by additional environmental conditions encountered on a common farm or plant. Realizing outdoor conditions is essential for sensor development, testing and data collection. Since bringing the optical lab to the field is impractical, a circulation system which mimics flow in a standard irrigation system is presented, such that field conditions are simulated in every experiment. In a first embodiment, the measurement technique suits the standard flow rate of a 1″ pipe system, and may be modified for different pipe systems, i.e. it is not limited to cuvette dimensions or to a low flow rate, and may be realized in a large-diameter pipe system. The test samples for evaluation of the proposed system were taken from irrigation water reservoirs, which contain a mixture of treated water and fresh water.
Linear regression problems are very common in chemometrics for modeling the relation between the concentrations of chemicals in a material and the spectral emission measured while the material is illuminated by a known light source. For instance, such methods are used to estimate and model the pollution of water by fuel and biological sources. Notable methods are ordinary least squares (OLS), partial least squares (PLS) and support vector regression (SVR). Yet, in many cases the dependence of the spectrum on the chemical compounds is a non-linear problem, where one must find a workaround to force linearity (e.g., the kernel trick). The use of linear regression is feasible in a controlled environment, i.e. in a laboratory with precise equipment and control over the effects of the environment. Data acquired in such a setting is denoted high-quality data. Moreover, even if a model achieves desirable results, the next step of a standard application may be implementation under field conditions. Under field conditions the researcher has less control over the environment, and the measurement equipment produces less accurate samples. In this case the data is denoted low-quality data. As a result, previously trained models which worked well on high-quality data may achieve poor results in this setting.
Establishing linear regression on data collected in the field is challenging for a few reasons. First, a model developed on high-quality lab measurements is hardly usable, so training starts from the basic level. Second, it is difficult to use data measured in the lab, which is easier to measure, to enrich the field measurements, which are harder to collect. Third, when the changing outdoor environment affects the interaction of light and matter, the linear relation between spectra and concentration that worked well in the lab changes, resulting in poor prediction by the trained linear model.
An alternative estimation approach to the regression problem for measurements taken under environmental conditions is to establish a deep neural network (DNN) model. When DNNs are used, it is also possible to apply transfer learning, i.e. to train a model on high-quality data and then adapt it to low-quality data. Typically, transfer learning for DNN models is performed on well-known architectures for image classification tasks, where part of the weights are replaced by newly initialized ones and trained with new data.
DNNs are abstract mathematical models composed of an enormous number of parameters, which learn hidden properties and features of the data. Often, the complicated and abstract feature representation is not interpretable by humans. Nevertheless, this abstract representation works, even in more sophisticated tasks such as domain transfer (DT). DT models are based on DNN auto-encoders; thus samples are encoded to a compact representation, or features, with a pre-defined dimension. The DT features are also not interpretable, but it is easier to construct a training process that guides the model to map different domains simultaneously to the same feature set. Novel DT techniques even allow control over the effect of each feature set on the output. Yet, in all of the mentioned techniques the correlation between the features and the physical model is found after a series of tests, and not straightforwardly from the initial design.
Both challenges, namely knowledge transfer from one model to another and estimation of the true physical model, arise in the very important research field of water quality monitoring by chemometric techniques. Owing to the high demand for clean water in the world, which will only rise, many works are dedicated to fast and accurate systems for water quality estimation based on analysis of the emitted fluorescence spectrum (EFS). Even if the prototypes are robust and relatively precise, the result of a work is limited by the estimation technique, which is mostly a linear method with modifications based on a physical model.
It is hence one object of the invention to disclose an inline early warning system for water contamination with organic matter, or for other water quality contamination which may be detected by transmission or fluorescence spectroscopy. The aforesaid system comprises: (a) an optical chamber embeddable into a water-supply system; the optical chamber having an internal passage configured for conducting a flow of water to be tested; the optical chamber having optically transparent entrance and exit windows; (b) a UV light source configured for illuminating the flow of water via the entrance window and exciting tryptophan-like fluorescence at 280 nm and humic-like fluorescence at 320-360 nm; (c) an optical sensor arrangement configured for sensing the tryptophan-like fluorescence and humic-like fluorescence emissions at 350 nm and 400 to 480 nm via the exit window, respectively; (d) a non-optical sensor arrangement; (e) an acquisition and control unit configured for measuring tryptophan-like fluorescence and humic-like fluorescence emissions; and (f) a processor configured for processing the obtained measured data.
Another object of the invention is to disclose the optical sensor arrangement comprising, disposed along a propagation path of a fluorescence emission light beam, an optical tube, a spectral dispersing element and an optical sensor.
A further object of the invention is to disclose the exit window and optical sensor arrangement optically connected by an optical fiber.
A further object of the invention is to disclose the spectral dispersing element which is a diffraction grating.
A further object of the invention is to disclose the optical sensor which is a photoelectric multiplier tube.
A further object of the invention is to disclose the non-optical sensor which is a photoelectronic multiplier tube.
A further object of the invention is to disclose the UV light source which is a deep ultraviolet LED array.
A further object of the invention is to disclose the arrangement comprising at least one sensor selected from the group consisting of a flow meter, a conductivity meter and thermocouple and any combination thereof.
A further object of the invention is to disclose the processor preprogrammed for performing: (a) a step of single domain training with high quality data; and (b) a new transfer learning step.
A further object of the invention is to disclose the transfer learning step comprising initializing a new encoder and replacing the new encoder with a Siamese encoder.
A further object of the invention is to disclose the circulation jig, which allows simulation of the field conditions required for developing said system.
In order to understand the invention and to see how it may be implemented in practice, a plurality of embodiments will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which
The following description is provided so as to enable any person skilled in the art to make use of said invention, and sets forth the best modes contemplated by the inventor of carrying out this invention. Various modifications, however, will remain apparent to those skilled in the art, since the generic principles of the present invention have been defined specifically to provide an in-line early warning system for water contamination with organic matter.
Reference is now made to
The water circulation jig sub-system contains three key elements: water tank, pipe system and PKm60 pump (see
This sub-system is located entirely outdoors in order to simulate field conditions for experiments.
In one mode of operation the circulation is stopped. By closing a valve below the optical chamber, one can perform a low-volume calibration test by pouring a small amount of test solution through an entrance at the top of the column (water intake).
There are three sensors that serve for system control and environmental data collection. The first sensor is a DRH-1X50G6N6F300 (Kobold, Hofheim, Germany) flow meter (see
which covers the possible substance flow. It is used as an indication of pump stability and, as will be seen later, for possible substance density measurement. For temperature estimation, a standard PT-100 thermocouple with an M-345 (Michshur, Holon, Israel) transmitter (
The core of the system is an optical chamber that, on the one hand, allows UV light to pass into the substance flow and, on the other, does not interfere with the flow. Ensuring both is complicated, because large water supply systems operate at high flow rates and with wide pipe diameters; the latter cause an inner filter effect, and the signal will be faulty. Thus, our optical chamber is a pipe extension with a cuvette-like gap. During flow, a random portion of the substance fills the cuvette gap, and in this way the measured flow sample retains the properties of the flow as a whole.
Reference is now made to
Reference is now made to
For water flow irradiation, optical measuring chamber (
The interaction of OM and DOM with UV radiation at 280 nm and 340 nm results in a typical emitted fluorescence spectrum which is very weak and requires an extremely sensitive spectrofluorometer (
To measure the weak fluorescence emission emitted from the optical chamber, a spectrofluorometer prototype was built. The spectrometer is a modification of the Czerny-Turner arrangement. The optical design was performed in a few steps. First, based on an assumption about the expected concentration, a coarse evaluation of the power loss and the signal-to-noise ratio was made. We then simulated the spectrometer behavior using OSLO Pro (Lambda, Littleton, Massachusetts). The expected diffraction behavior of the spectrometer, as well as the initial values of the focal lengths, were estimated by a MATLAB (Mathworks, Natick, Massachusetts) simulation of the diffraction grating. The system was built in steps, where each subcomponent was checked separately.
Reference is now made to
Natural fluorescence of tryptophan and humic acids is characterized by a low quantum efficiency, typically of the order of a few percent. Thus, the optical signals are weak. The spectrometer losses are invariant to the power of the fluorescence; however, if the power is too low, the PMT will not be able to identify the signal. Given a PMT with a potential gain of 2×10⁶, the burden of providing a sufficient signal falls on the efficiency of illumination and light collection in the optical chamber.
Tryptophan is excited at 280 nm with emission at 335-365 nm, while for humic substances the best excitation wavelength is 340 nm with emission at 400-480 nm. Since fluorescence power is proportional to the concentration of the active material, the determination of the required power relied on testing of a real substance. These experimental results suggested that the required optical power has to be at least 25 mW. To meet that, we chose light sources from the deep ultraviolet LED array series DUVXXX-SD356LN (Roithner Lasertechnik, Vienna, Austria) with nominal optical power (Po) of 26 mW and 32 mW, respectively. The source bandwidths (FWHM) are 10±2 nm. The LEDs were integrated onto SD35-PCB printed circuit boards (PCBs) (Roithner Lasertechnik, Vienna, Austria), which were mounted in an aluminum housing consisting of a mounting plane and a cover.
The DUVXXX-SD356LN assembly includes a concentrating lens. According to the manufacturer, this results in concentration of 50% of the radiation power in a field of view (FOV) of 40°. As a result, coupling of the sources to a standard 22° FOV, 600 μm core optical fiber was poor. To cover most of the radiation, the single-core fiber was replaced by a 2 mm diameter bundle of seven 550 μm cores, which was found experimentally to be sufficient. Furthermore, to reduce parasitic radiation in unwanted bands, a bandpass filter was added to each illumination source: 35-881 (Edmund Optics, Barrington, New Jersey) for the 280 nm LED array, and 65-129 (Edmund Optics, Barrington, New Jersey) for the 340 nm LED array. To measure the emitted spectra with the two sources, the LED arrays were alternated at 1/15 Hz.
Coupling of the LED array to the optical fiber required 3D adjustment. Using SM1SMA (Thorlabs, Newton, New Jersey, USA), the fiber optic cable was mounted to the LED through SMA905. The distance between the fiber and the emitting surface was set using the SM1 thread. Parallel adjustment was done while tightening the threads. To maximize the coupling, the adjustment was done while measuring the illumination with a spectrometer. To ensure the LEDs' working temperature, each light module was supported by a heat sink (ABL, Wednesbury, UK).
There are two optical fiber cables that enter two spectrometers. One optical fiber carries the absorption optical signal and goes straight to the Flame-T-RX1 (Ocean Insight, Rostock, Germany) (
As mentioned above the PC (
The signal S(t) resulting from the fluorescence emission was written to the PC continuously and post-processed. Post-processing begins with normalization of the signal by subtraction of the clean-water fluorescence signal (Eq. 1). For clean water the result is expected to be zero at every wavelength, and therefore it suits the task of removing unwanted artifacts. Owing to the measurement range, the 340 nm excitation source produces scattering that exceeds even the maximum value of the emission produced by the 280 nm source. By exploiting this information, one may rely on a simple hypothesis test: the maximum value of each normalized signal and the mean of these maxima are calculated. The mean is used as a threshold, or separating hyperplane, that splits the fluorescence signals produced by each excitation wavelength. To reduce noise in the resulting sets of signals, a running mean is calculated over every 1500 spectral signals, without overlap. 1500 signals are collected in the course of 30 seconds, which at a typical flow rate of
results in one measurement for every 15 L that passes through the system.
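The post-processing chain described above (baseline subtraction, separation of the two excitation sources by a threshold on the per-frame maximum, and a non-overlapping block mean) may be sketched as follows. This is a minimal illustration under stated assumptions, not the implemented pipeline: the function name `split_and_average`, the dictionary keys `ex280` and `ex340`, and the array layout (one row per acquired spectrum) are hypothetical, and the threshold is read here as the mean of the per-frame maxima.

```python
import numpy as np

def split_and_average(spectra, baseline, block=1500):
    """Separate frames by excitation source and block-average them.

    spectra:  (n_frames, n_channels) raw PMT frames (assumed layout)
    baseline: (n_channels,) clean-water spectrum
    """
    spectra = np.asarray(spectra, dtype=float)
    normalized = spectra - baseline          # clean water maps to about zero
    peaks = normalized.max(axis=1)           # one maximum per frame
    threshold = peaks.mean()                 # mean of maxima as the separator
    groups = {
        "ex340": normalized[peaks > threshold],   # strong scattering frames
        "ex280": normalized[peaks <= threshold],
    }
    averaged = {}
    n_channels = spectra.shape[1]
    for name, frames in groups.items():
        n_blocks = len(frames) // block      # an incomplete tail block is dropped
        trimmed = frames[: n_blocks * block]
        averaged[name] = trimmed.reshape(n_blocks, block, n_channels).mean(axis=1)
    return averaged
```

With `block=1500` each output row corresponds to one averaged spectrum per source; a smaller block can be used for quick checks on short recordings.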
Reference is now made to
Just like the fluorescence signal, the other sensor data contained noise resulting from the harsh measurement conditions. To overcome this issue, the sensor data is smoothed by a second-order Savitzky-Golay filter.
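A second-order Savitzky-Golay smoother can be illustrated with a short sketch. The window length of 11 samples and the edge padding are assumptions made for the example, since the text specifies only the polynomial order; the function name `savgol_smooth` is hypothetical. SciPy's `scipy.signal.savgol_filter` provides an equivalent ready-made routine.

```python
import numpy as np

def savgol_smooth(values, window=11, order=2):
    """Smooth a 1-D series by a local least-squares polynomial fit."""
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix of the local polynomial. Row 0 of its pseudo-inverse holds
    # the weights that evaluate the fitted polynomial at the window centre.
    design = np.vander(offsets, order + 1, increasing=True)
    weights = np.linalg.pinv(design)[0]
    # Repeat the edge samples so the output has the same length as the input.
    padded = np.pad(np.asarray(values, dtype=float), half, mode="edge")
    # The smoothing weights are symmetric, so convolution equals correlation.
    return np.convolve(padded, weights, mode="valid")
```

Away from the edges, a signal that is itself a second-order polynomial passes through unchanged, which is a convenient sanity check for the chosen window.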
Prior to assembly, the UV sources, fiber cables and spectrometer system were tested on a temporary workbench. The test required a light source and, at one point, a natural fluorescence source which could mimic a real-world DOM break. Searching for an available and safe material, we found diluted 3% fat cow's milk (“Tnuva”, Herzeliya, Israel) suitable for the task. Apart from its availability, milk was chosen for its high concentration of proteins, which are rich in amino acids, especially tryptophan; i.e., it simulates the expected TLF peak where DOM is active. Milk was injected into the cuvette at a ratio of 10⁻⁴ milk to water. Signal acquisition from such a substance is a very easy task for a standard laboratory fluorimeter; therefore it serves as a baseline for our system setting. The expected tryptophan concentration in our dilution is 40 ppb. Reference is now made to
The system includes three non-optical sensors, each of which had to be calibrated. First, the flow meter was calibrated in the course of 12 experiments with different flow rates, which were changed manually by adjusting the pump voltage. The temperature and conductivity meters were calibrated in a single experiment that lasted nine days. The first four days were dedicated to a stability test of the system, and the last five days to testing and calibrating the conductivity meter by adding sodium chloride (NaCl) to the water tank daily. To determine the required salinity range, the expected salinity levels were established beforehand by measuring samples of irrigation water from various sources with a calibrated portable field conductivity meter CD-4303 (Lutron Electronic Enterprise, Taipei, Taiwan). The same instrument was used as a reference for sampling the salinity throughout the experiment. Comparing the measurements of the inline sensor with those of the portable sensor resulted in a correction function, Eq. 2, which is realized as a post-processing correction on an Arduino Mega2560, where each parameter was estimated empirically from the measured errors, Ĉm is the value after compensation and Cm the value before.
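The estimation of such a correction function from paired readings may be illustrated as follows. The sketch assumes a polynomial (by default linear) relation between the inline reading Cm and the reference reading, which is only one plausible form and is not taken from Eq. 2; the function name `fit_correction` and the numeric readings are hypothetical and synthetic.

```python
import numpy as np

def fit_correction(inline_readings, reference_readings, degree=1):
    """Least-squares fit of reference = poly(inline); returns the corrector."""
    coeffs = np.polyfit(inline_readings, reference_readings, degree)
    return np.poly1d(coeffs)

# Synthetic paired readings, for illustration only (not measured data).
inline = np.array([0.8, 1.1, 1.5, 2.0, 2.6])   # inline sensor, Cm
reference = 1.2 * inline + 0.1                 # portable reference meter
correct = fit_correction(inline, reference)

compensated = correct(1.0)                     # value after compensation
```

Once the coefficients are known, applying the correction is a single polynomial evaluation, which is light enough to run on a microcontroller.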
To calibrate the in-line spectrofluorometer, the measurements of the proposed system were compared to measurements taken by a lab-grade calibrated spectrometer, the Flame-T-RX1. The calibration in the required band (300-520 nm) relies on measuring an SL1-LED (StellarNet Inc., Tampa, Florida, USA) LED kit with the two spectrometers. Note that when the nm/bin scale is matched, two calibrated spectrometers should yield the same spacing when measuring a series of the same calibration LEDs, and should show the same profile for each calibration LED.
Reference is now made to
Two issues arise in such a procedure: (1) the FLAME and the PMT have different resolutions; (2) the light sources have different intensities. For the former issue the solution is relatively simple: decimation of the signal to 32 bins. The latter is solved by separate normalization of each signal by Eq. 3.
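The two preparation steps may be sketched as below. Decimation is illustrated by averaging contiguous groups of samples into 32 bins, and the normalization is shown as a min-max scaling to [0, 1]; both choices are assumptions for illustration, and the min-max form merely stands in for Eq. 3. The function names are hypothetical.

```python
import numpy as np

def decimate_to_bins(spectrum, n_bins=32):
    """Average a high-resolution spectrum into n_bins contiguous bins."""
    chunks = np.array_split(np.asarray(spectrum, dtype=float), n_bins)
    return np.array([chunk.mean() for chunk in chunks])

def normalize(spectrum):
    """Scale a spectrum to [0, 1] (assumed stand-in for Eq. 3)."""
    spectrum = np.asarray(spectrum, dtype=float)
    span = spectrum.max() - spectrum.min()
    if span == 0:
        return np.zeros_like(spectrum)     # flat signal carries no shape
    return (spectrum - spectrum.min()) / span
```

After both steps the reference and PMT signals have the same length and amplitude range, so their shapes can be compared directly.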
To evaluate the wavelength-dependent shift, the cross-correlation was calculated for every PMT-FLAME signal pair (SFLAME,i,SPMT,i). The shift δ(λ) was determined from the maximum of the cross-correlation. A wavelength-dependent formula for the shift was derived by fitting a polynomial approximation to the shift value as a function of wavelength or channel (λi) (Eq. 4); its parameters are estimated, and it is implemented for signal correction.
Yet, λdrift(λ) is not a wavelength axis correction but a step correction, as shown in Eq. (5), where Δλ is the constant step of the decimated signal (ŜFLAME,i). The correction of the full pipeline is presented in the form of Algorithm 1, which is implemented in SciPy.
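The shift estimation may be illustrated with a short sketch: the lag of the cross-correlation maximum gives the shift in bins for one calibration LED, and a polynomial is then fitted to the shifts of all LEDs. The function names, the mean removal before correlation, and the polynomial degree are assumptions made for the example and are not taken from Eq. 4.

```python
import numpy as np

def channel_shift(reference, measured):
    """Lag (in bins) at which `measured` best aligns with `reference`.

    A positive value means `measured` is displaced towards higher bins.
    """
    ref = np.asarray(reference, dtype=float)
    mea = np.asarray(measured, dtype=float)
    ref = ref - ref.mean()
    mea = mea - mea.mean()
    xcorr = np.correlate(mea, ref, mode="full")
    # Index len(ref) - 1 of the full correlation corresponds to zero lag.
    return int(np.argmax(xcorr)) - (len(ref) - 1)

def fit_shift_model(center_wavelengths, shifts, degree=2):
    """Polynomial model of the shift as a function of wavelength."""
    return np.poly1d(np.polyfit(center_wavelengths, shifts, degree))
```

For instance, two identical peaks displaced by three bins give `channel_shift(ref, mea) == 3`, and the fitted model can then be evaluated at any channel wavelength.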
Reference is now made to
For signal amplification, the PMT requires a reverse voltage bias. An effective signal form was acquired with a gain voltage of −900 V, an integration time of 500 μs and a sampling rate of 79 Hz.
To test the proposed spectrofluorometer, samples of mixed irrigation water taken from various irrigation reservoirs were measured both by the proposed system and by a standard lab-grade spectrofluorometer RF-5301PC (Shimadzu, Kyoto, Japan). A comparison of the measured fluorescence spectra of irrigation water with 280 nm excitation is presented in
Reference is now made to
The built system was tested in three scenarios. In each case milk was injected into the base substance to simulate dissolved organic matter contamination. In the first case the base substance was clean tap water under an excitation wavelength of 280 nm; in the second, the base substance was treated irrigation water under the same excitation wavelength. The third scenario summarizes the system evaluation by using both UV sources, i.e. excitation at 280 nm and 340 nm in an alternating manner, with irrigation water as the substance and milk as the DOM contamination.
The initial experiment is based on various concentrations of milk, and only one UV source, of 280 nm, is used. Milk was added to 200 l of tap water every 24 hours over the 72 hours of the experiment, simulating a break of DOM into a drinking water system. The milk injection plan is presented in Table 3. Sensor data was recorded constantly with a sampling interval of 5 minutes, while fluorescence was recorded for 30 min at intervals of a couple of hours during daytime. The expected form of the fluorescence signal was successfully measured, compared to laboratory results (
Due to the high environmental temperature (above 30° C.), one may assume that the bacterial reproduction rate increased on the second day of the experiment, which led to unpredicted fluorescence behavior: the maximum values started to climb (
Reference is now made to
The second experiment was performed in a similar manner to the previous one, but in this case treated irrigation water from a functioning water reservoir of the TsabarKama co-operative (Revadim, Israel) was used as the base substance, and both excitation wavelengths were activated. The optical data and fluorescence signal were recorded 12 hours per day, and reference samples, for fluorescence and microbial count, were taken four times a day: in the morning, twice in the afternoon (before and after milk injection) and in the evening.
Reference is now made to
Reference is now made to
Reference is now made to
During our experiments non-optical data was collected. Despite some minor changes in flow speed, which we assume result from temperature changes, the numbers are stable, and no anomalies were found. An example of such data from an experiment is in
One must be careful with leftovers of the substance on the equipment. Despite cleaning with a high concentration of Ca(ClO)2, biofilm was found on the fiber and resulted in a fluorescence signal. It was reduced after additional cleaning with fine wipes, but still remained (see
The system is built upon two excitation wavelengths (280 nm, 340 nm) emitted from UV LED sources, a custom-made optical measuring chamber that suits various pipe diameters and shapes, a common optical system configuration and a PMT. It provides a 32-channel fluorescence spectrum signal that shows TLF and HLF, with complementary sensor data: substance temperature, conductivity and flow rate. The prototype was able to successfully measure the fluorescence of irrigation water, and to detect a simulated OM breach by milk injections equivalent to 40 ppm of tryptophan by analysis of TLF. Furthermore, owing to constant recording, unexpected fluorescence spectrum behavior was revealed; thus the system also enables monitoring of the solution's biological and chemical dynamics that are reflected in the fluorescence spectrum.
Consider a physical world model xi=hw(Pi), where xi∈R^d is a single observation (e.g., a spectroscopy measurement) and Pi∈R^S is the available world state data of environmental variables (e.g., chemical concentration), of dimension S. The world state may be separated into two sets of parameters yi and pi, where yi∈R^(S−K) is composed of S−K response values to be estimated, and pi∈R^K is composed of K environmental parameters accounted for in the physical model that describes the observation-response relation, denoted ƒw. In this formulation, the estimation task is to find a function ƒθ with a set of tunable weights θ that minimizes a pre-defined loss function L such that:
The first obstacle to estimation is the measurement noise σenv, which affects our knowledge of the response variable:
The estimation process is subject to a few main challenges. The estimator of ƒw is ƒθ. Since ƒw is mostly non-linear, the estimation may be non-linear with reference to ƒw, or linear after applying a non-linear transformation to (xi,Pi). In both cases, the form of the physical model ƒw has to be known, analytically or empirically. Additional complications originate from the measurement equipment: while we are interested in the true values of the observations xi, due to the limited measuring accuracy the measured values
Upon changes in environmental conditions, the true physical world model ƒw results in different measured values for the same inputs. Measurements in an aquatic environment may be influenced by physical parameters such as turbidity, temperature and solvent pH. While these values are kept reasonably stable under lab conditions, Eq. (2), they may be subject to significant variation under outdoor environmental conditions, Eq. (3), which may affect the measured values. In addition, as shown in Eq. (3) above, changes in measuring conditions may result from a change of the measuring instrument.
A common case of the influence of equipment on the measurements is the use of outdoor measurement equipment FLQ, which is characterized by a higher measurement error and a lower scan resolution, resulting in low-quality samples
where dLQ denotes the number of sampling points in a low-quality measurement, and dHQ is the number of sampling points in a high-quality measurement.
Since the measurements of Eq. (3) are influenced by many factors, estimating the physical process is a highly complicated scenario.
One of the most important challenges in water quality monitoring is estimating the concentration of organic matter (OM) and dissolved organic matter (DOM). It is estimated by analysis of tryptophan-like fluorescence (TLF) at the 280/350 nm and humic-like fluorescence (HLF) at the 320-360/400-480 nm excitation/emission pairs, which may reveal the level of contamination. In addition, temperature has an effect on the solution, and through it on the signal, such that data collection for model training has to consider both concentration (c) and temperature (T).
There are a variety of high-grade fluorimeters that typically have high signal resolution
and high SNR values. Meanwhile, under field conditions photomultiplier tubes (PMTs) are used because of the low signal energy. They perform well in signal detection, yet their resolution is limited. Data collection is based on the specific setting described below.
Data was collected for system training and evaluation, totaling 1190 EFS of tryptophan in double distilled water (DDW) samples. Sampling was performed on laboratory equipment, an RF-5301PC (Shimadzu, Kyoto, Japan), with high resolution capabilities (resolution = 1 nm/bin). To allow a realistic evaluation of the method's robustness, the training and test sets were taken at different c-T sets (Table 1), ten samples for each couple, with a test-train ratio of 36%-64% (800 train samples and 290 test samples). Thus, the data is balanced. The EFS wavelength range was limited to [280,500] nm, i.e. dHQ=221. Samples for various c-T couples are presented in
To simulate a real-world environment, one must take into account the equipment limitations and the physical structure of a sample. The motivation for the simulation comes from our project, in which we are building an outdoor 32-channel spectrofluorometer aimed at measuring the quality of irrigation water in flow. The data acquisition of that prototype relies on the IQSP480 acquisition system (Vertilon Corporation, Westford, Massachusetts, US) and the H7260-03 photomultiplier array (Hamamatsu Photonics, Hamamatsu, Japan). To mimic the expected data, a number of synthetic data set types were created. All of them are in low resolution, namely they were matched to the output dimension of 32 channels; thus the high-quality data (dHQ=221) was reduced to 32 points (dLQ=32) by the nearest-neighbor technique. Next, another data set was built upon the low-resolution samples. This time, the signal shape was modified to match the fluorescence of a real-life solution (irrigation water); an example of high-quality measurements of irrigation water is shown in
Following the above, our intention is to use a network suitable for handling both high-quality and low-quality data sets. Assuming that the high-quality and low-quality data sets share the same feature set, the training of a model has two steps: single-domain training with high-quality data, and a transfer learning step. In single-domain training mode, an encoder (E) is trained to tie samples to a set of features of lower dimension which can support a transformation both from high-quality data and from low-quality data. Next, this set of latent features {Zi} is mapped to the concentration estimation {
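The reduction from dHQ=221 to dLQ=32 points by nearest-neighbor resampling may be sketched as follows. The even spacing of the 32 target positions over the full source range is an assumption, and the function name `nearest_neighbor_resample` is hypothetical.

```python
import numpy as np

def nearest_neighbor_resample(spectrum, n_out=32):
    """Pick, for each of n_out evenly spaced positions, the nearest source sample."""
    spectrum = np.asarray(spectrum, dtype=float)
    positions = np.linspace(0, len(spectrum) - 1, n_out)
    indices = np.rint(positions).astype(int)   # snap to the nearest sample index
    return spectrum[indices]
```

No averaging is involved, so each low-resolution value is an actual sample of the high-resolution spectrum, including its first and last points.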
Siamese networks combined with contrastive loss proposed for dimension reduction, where sample dimension is dHQ and latent is dz where (dHQ>>dz), preserving latent manifold. Let Dij be the distance between two latent variables zi and zj, which are output of Siamese encoder Eφ such that:
Where φ denoted the encoder parameters. The contrastive loss Lcont will have a following expression:
And sij denoted as the similarity factor between two (xi,xj) samples
Lcont minimizes the distance Dij when samples xi, xj belong to the same class (ci=cj), and forces the distance to reach at least a pre-defined minimal margin m when the samples belong to different classes. Eventually, training is finalized when the encoded samples are clustered by class (
Here we suggest treating regression as a classification problem, where yi=ci∈R; thus K=S=1, which implies that there are infinitely many classes. Since the relation between the values of ci is connected to the relation between the samples xi, the ci's must be reflected in the latent space. To inject this information, we suggest modifying only the margin m by tying it to (ci,cj):
Where h is an arbitrary task-dependent function that defines the relation between pairs of response values.
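The margin idea above can be illustrated with a minimal NumPy sketch. The equations of this passage are not reproduced in the text, so the sketch assumes the classic contrastive form (a squared pull term for similar pairs and a squared hinge push term for dissimilar pairs) and a hypothetical choice h(ci,cj)=|ci−cj|; the function names and the `scale` parameter are illustrative only and are not taken from the disclosure.

```python
import numpy as np

def pairwise_distance(z_i, z_j):
    """Euclidean distance D_ij between paired latent vectors (one pair per row)."""
    return np.linalg.norm(z_i - z_j, axis=1)

def margin_h(c_i, c_j, scale=1.0):
    """Hypothetical h: the margin grows with the gap between response values."""
    return scale * np.abs(c_i - c_j)

def contrastive_loss(z_i, z_j, c_i, c_j):
    """Assumed classic contrastive form with a pair-dependent margin m_ij."""
    d = pairwise_distance(z_i, z_j)
    # Similarity factor s_ij: 1 for equal responses, 0 otherwise.
    # Exact equality is rare for continuous labels; a tolerance may be preferable.
    s = (c_i == c_j).astype(float)
    m = margin_h(c_i, c_j)
    pull = s * d ** 2                                   # draw similar pairs together
    push = (1.0 - s) * np.maximum(0.0, m - d) ** 2      # keep dissimilar pairs at least m apart
    return 0.5 * float(np.mean(pull + push))

# Example pairs: the first shares a concentration, the second differs by 4 ppb.
z_a = np.array([[0.0, 0.0], [0.0, 0.0]])
z_b = np.array([[3.0, 4.0], [1.0, 0.0]])
loss = contrastive_loss(z_a, z_b, np.array([10.0, 10.0]), np.array([10.0, 14.0]))
```

With this choice of h, pairs whose concentrations differ more are pushed farther apart in the latent space, which is one way the ordering of response values could be reflected there.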
The method encodes samples into a latent representation by an encoder function Eφ and, with the help of a regression module Rϕ, outputs the desired response value. The regression and the estimator have the form of Eq. 2, such that:
In addition, we use all available prior knowledge about the measurement state in the form of a vector Pi∈RS. The dimension of the latent variable is defined by the number of known environmental values, zi∈RS. It is possible to rewrite zi as:
Note that the response value also belongs to the environmental variables, yi⊆Pi.
Substituting Eq. (7) into Eq. (5b), the modified contrastive loss is applied independently to every environmental variable and the losses are averaged:
To estimate the concentration value
To realize the contrastive loss (Eq. 5b), one trains Siamese encoders, whose outputs are two latent variables zi, zj. Each latent variable is passed through the regressor Rϕ, yielding the concentration estimates ĉi and ĉj, respectively. Thus, the overall loss function for the regressor is composed of a sum of two loss terms:
And the overall optimization loss is:
Where α and β are hyper-parameters. The overall training process is presented in
To ensure that ƒθ is robust, we used the Online Hard Example Mining (OHEM) training strategy while minimizing LR and LE. After each loss was calculated per sample in the batch, the samples were sorted by loss value in descending order. The bottom half of the losses was zeroed, so that the gradient was calculated only for the highest loss values. Using OHEM improves the robustness of a network and prevents overfitting to a specific sample behavior pattern. The proposed loss function in this process is:
This enforcement is used in all non-linear methods for a fair comparison between them.
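The OHEM selection step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `ohem_mean_loss`, the `keep_fraction` parameter, and the normalization by the number of kept samples are hypothetical choices, since the text does not specify how the surviving losses are reduced.

```python
import numpy as np

def ohem_mean_loss(per_sample_loss, keep_fraction=0.5):
    """Keep only the hardest samples of a batch and zero the rest.

    Returns the 0/1 mask and the mean loss over the kept samples
    (the normalization is an assumption, not taken from the source).
    """
    losses = np.asarray(per_sample_loss, dtype=float)
    n_keep = max(1, int(np.ceil(keep_fraction * losses.size)))
    order = np.argsort(losses)[::-1]      # sample indices, highest loss first
    mask = np.zeros_like(losses)
    mask[order[:n_keep]] = 1.0            # bottom part of the batch stays zeroed
    return mask, float(np.sum(mask * losses) / n_keep)

# Example batch of four per-sample losses.
mask, batch_loss = ohem_mean_loss([0.1, 0.9, 0.4, 0.7])
```

In an automatic-differentiation framework the same mask would be applied to the per-sample loss tensor before the backward pass, so that only the kept samples contribute to the gradient.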
Following the original intention of this work, namely to transfer prior knowledge of the physical model of the world, acquired from the high-quality data set, to another model that works on low-quality data, we suggest a new transfer learning step. Engaging the high-quality knowledge is surprisingly simple. Owing to the Siamese setting, the model is trained on two batches (or parts of batches) simultaneously. First, a new encoder ELQ,γ with weights γ is initialized; it then replaces one of the Siamese encoders. Instead of splitting a batch, we apply two separate batches, each from a different domain, as described in
For the sake of comparison, the proposed ER networks were compared to two other DNNs: a fully connected network (FC), and an empirically determined DNN architecture denoted as the Regression module (R). Both FC and R correspond to the straightforward approach described in (2.1). The custom networks (R and ER) are composed mostly of a cascade of convolutional layers and non-linear operations. The best results were achieved with the architectures presented in Table 5. GELU activation and batch normalization were applied to most of the layers. The FC parameters were chosen empirically. To date, most chemometric models are linear; thus, for the sake of completeness, the NN results were compared to three linear regression methods: OLS [XXXX], PLS [XXX], and SVR [XX].
The training was performed on standardized data. Each sample is composed of three elements (vi): spectrum, temperature, and concentration, each of which is standardized separately by Eq. 14:
Where
The training data is arranged in batches, which are shuffled before every epoch. To check for overfitting, 10% of the training data was used for validation. To prevent imbalance from interfering with the final result, the validation set was selected uniformly across the whole data set. To achieve high-quality encoding, namely to reduce the dimension and separate samples by the corresponding parameters, both margins (for temperature and concentration) are based on the same formula:
Where m is set according to Eq. (7). OHEM was implemented for each loss and for each method. The batch size was set to 64, and the training routine lasted 10,000 epochs with a learning rate of 5·10−4 and a weight decay of 10−3. The loss hyperparameters were α=7 and β=1, respectively.
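The data preparation described above (separate standardization of each sample element and a uniformly selected 10% validation set) might look like the following sketch. Two assumptions are made and should be checked against the actual equations: standardization is taken to be the usual zero-mean, unit-variance scaling computed over all values of an element, and "selected uniformly" is read as evenly spaced indices over the ordered data set. The function names are hypothetical.

```python
import numpy as np

def standardize(v):
    """Assumed z-score scaling of one sample element (spectrum, temperature or concentration)."""
    v = np.asarray(v, dtype=float)
    mu, sigma = v.mean(), v.std()
    sigma = sigma if sigma > 0 else 1.0   # guard against constant inputs
    return (v - mu) / sigma, mu, sigma

def uniform_validation_split(n_samples, val_fraction=0.1):
    """Pick evenly spaced validation indices; the rest are used for training."""
    n_val = max(1, int(round(val_fraction * n_samples)))
    val_idx = np.unique(np.rint(np.linspace(0, n_samples - 1, n_val)).astype(int))
    train_idx = np.setdiff1d(np.arange(n_samples), val_idx)
    return train_idx, val_idx

# Example: standardize a concentration column and split 100 ordered samples.
conc_std, conc_mu, conc_sigma = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
train_idx, val_idx = uniform_validation_split(100)
```

The stored mean and standard deviation are needed later to map network outputs back to ppb, and evenly spaced validation indices keep the validation set spread over the full concentration and temperature range when the data set is ordered by those parameters.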
Each of the DNN models (ER, R, and FC) was trained twenty times (M=20), i.e., twenty trained networks for each model. Evaluation was performed on N=290 test samples. The linear techniques were fitted under various parameters, and only the best results are summarized. The quality of an estimation is measured by two types of per-sample error, absolute εabs,i and relative εrel,i, as described below:
The overall quality of a model is measured by the mean over all test-set samples and iterations for both errors (Eqs. 18, 19) and by the maximum relative error (Eq. 20). While the motivation for the former is straightforward, the latter quantifies accuracy at lower concentrations.
For completeness, we also present the R2 score for each model.
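The evaluation measures named above can be sketched as follows. The exact equations are not reproduced in the text, so the sketch assumes the conventional definitions: absolute error |ĉ−c|, relative error as a percentage of the ground truth (which must be non-zero), a maximum taken over all samples and runs, and an R2 score averaged over the M runs. The function name `evaluate` and the dictionary keys are illustrative.

```python
import numpy as np

def evaluate(c_true, c_pred):
    """c_true has shape (N,); c_pred has shape (M, N) for M trained networks."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.atleast_2d(np.asarray(c_pred, dtype=float))
    eps_abs = np.abs(c_pred - c_true)            # per-sample absolute error
    eps_rel = 100.0 * eps_abs / c_true           # per-sample relative error in percent
    ss_res = np.sum((c_pred - c_true) ** 2, axis=1)
    ss_tot = np.sum((c_true - c_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                   # one R2 value per run
    return {
        "mean_abs": float(eps_abs.mean()),
        "mean_rel": float(eps_rel.mean()),
        "max_rel": float(eps_rel.max()),
        "r2": float(r2.mean()),
    }

# Example with N=3 test concentrations (ppb) and M=2 runs.
scores = evaluate([10.0, 100.0, 500.0],
                  [[12.0, 100.0, 490.0], [10.0, 110.0, 500.0]])
```

Because the relative error divides by the ground truth, the same absolute deviation weighs far more at low concentrations, which is why the maximum relative error is informative about accuracy in that range.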
Three scenarios were considered: (A) high-quality data with the original dimension (dHQ=221), (B) low-quality data (dLQ=32), and (C) low-quality data with simulated TLF and HLF peaks. Following the above, the performance of the ER, R, and FC methods, as well as of the previously suggested linear methods, was investigated. The results of the various methods in the three scenarios are summarized in Table 6. In all scenarios, the DNN methods (FC, R, ER) outperformed the linear methods (OLS, PLS, SVRpoly). While the average relative error of the DNN methods was
Regression curves of the DNNs' performance for scenario A and scenario B are found in
The interesting part is the comparison for the low-quality data in Scenarios B and C. While FC and R show lower performance due to the reduction in sample dimension from dHQ=221 to dLQ=32, the ER method shows an improvement in performance. The improvement results from using, when training the network on low-quality data, the prior knowledge gained during training on high-quality data. This is especially important in the estimation of low concentrations, where the εmax of ER is about half that of the R and FC networks, resulting in a ˜78% maximum relative error. While this error seems large, at a low ground-truth concentration it amounts to an error of only 2.5 ppb, which is a very low concentration to measure by means of fluorescence spectroscopy. Given a full scale of 500 ppb, this error corresponds to only a 0.5% full-scale error as a worst-case scenario. A zoom-in on the regression curves for low concentrations, up to 20 ppb, for the DNNs is presented in
For the regression task with both high-quality and low-quality data, the proposed method, like the trivial DNN solutions, achieves better results than the linear ones. The difference is very noticeable: the mean MAE is roughly three times lower (˜10 [ppb] versus ˜30 [ppb]) and the relative error roughly twenty times lower (˜10 [%] versus ˜200 [%]). Having established that linear methods are no match for the modern DNN approach, the comparison shifts to the nonlinear techniques, and mainly to their ability to preserve prior knowledge from the high-quality data, on which they achieve similar results. ER leads at low concentrations from the start, with a mean error of 3.46 [ppb], and surprisingly it not only preserves its knowledge but also improves to 2.5 [ppb] with low-resolution samples and to 2.36 [ppb] with modified samples, which is about twice as good as the other techniques. In addition, when the quantity of data is reduced, the error of ER rises later than that of FC and R; ER is therefore also suggested for scenarios with a small number of low-quality training samples. Yet the downside of ER, as with the other methods, is its inability to cope with unbalanced data; this area is therefore suggested for future work.
While several embodiments of the disclosure have been shown in the drawings and/or discussed herein, it is not intended that the disclosure be limited thereto, as it is intended that the disclosure be as broad in scope as the art will allow and that the specification be read likewise. Therefore, the above description should not be construed as limiting, but merely as exemplifications of particular embodiments. Those skilled in the art will envision other modifications within the scope and spirit of the claims appended hereto.
Each set was prepared from the same stock solution of DDW and tryptophan, with a concentration of 10,000 parts per billion [ppb], over the course of 3 days. The tryptophan solution was made by dissolving tryptophan powder (CAS number 73-22-3, Sigma-Aldrich, St. Louis, MO, USA). Samples were prepared by injecting small portions of the stock solution into a cuvette holding the solution of the previous tryptophan concentration, thus increasing the concentration. Fluorescence was measured using an RF-5301PC spectrofluorometer with a magnetic stirrer cell holder (Shimadzu, Kyoto, Japan). Sample stirring was done using a Multistirrer cc-301 magnetic stirrer (Scinics, Tokyo, Japan). Measurements used a standard 3.5 mL quartz cuvette (Hellma, Mullheim, Germany) with a path length of 10 mm. Samples were kept in a refrigerator at 4° C. Measurements at different temperatures were achieved by heating the samples first to room temperature (25° C.) and then to more than 40° C., using a water bath 20B with a thermostat VC/2 (Julabo, Seelbach, Germany) to warm up the cuvette. Upon measuring the fluorescence response, the sample temperature was measured using a TM-947SD thermometer with two T-type thermocouple probes (Sunwell Global, Dacun, Taiwan).
Measurements with the RF-5301PC were taken at an excitation wavelength of 280 nm, where tryptophan is excited most efficiently [35]. The illumination source was a xenon arc lamp. For monochromatic illumination (280 nm) and a detailed spectral reading of the emission in the 270-650 nm range, the monochromators used 3 nm optical slits. The auto shutter was set to “On” and the PMT (photomultiplier tube) was set to high-voltage mode. A baseline response was taken while the UV source was disabled to ensure consistency of the system throughout the experiment (
Following the above preparation step, the sampling routine was performed as follows:
Tryptophan emission typically appears in the range of [300,500] nm; therefore, the fluorescence spectrum was limited to [280,500] nm, giving 221 bins, which corresponds to a resolution of 1 [nm/bin]. One of the goals of the present work is to overcome unpredicted effects of equipment and environment on the samples; therefore, only easily accessible data from the equipment was used, which does not rely on steps or materials that are unavailable in field conditions.
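The bin count above, and the nearest-neighbor reduction of a 221-bin spectrum to the 32 channels of the prototype mentioned earlier in the description, can be checked with a short NumPy sketch. The function names are hypothetical, and the 32 output channels are assumed to be evenly spaced over the 280-500 nm range, which the text does not state.

```python
import numpy as np

def wavelength_grid(start_nm=280, stop_nm=500, step_nm=1):
    """Emission wavelengths at 1 nm per bin, both end points included."""
    return np.arange(start_nm, stop_nm + step_nm, step_nm)

def nearest_neighbor_reduce(spectrum, n_out=32):
    """For each output channel, take the value of the nearest input bin.

    Channel centers are assumed to be evenly spaced over the input range.
    """
    spectrum = np.asarray(spectrum)
    n_in = spectrum.shape[-1]
    idx = np.rint(np.linspace(0, n_in - 1, n_out)).astype(int)
    return spectrum[..., idx], idx

# Example: reduce the wavelength axis itself to see which bins are kept.
grid = wavelength_grid()                        # 221 bins from 280 nm to 500 nm
low_res, kept_idx = nearest_neighbor_reduce(grid)
```

Applying the same index vector to every high-quality spectrum yields a low-resolution data set whose channels line up across samples.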
An example of the resulting series is found in
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IL2022/050602 | 6/7/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63197525 | Jun 2021 | US |