The present disclosure relates to an information processing system, an information processing apparatus, and an information processing method.
In the fields of medicine, biochemistry, and the like, a flow cytometer is sometimes used to quickly measure the properties of a large number of microparticles. The flow cytometer is a measuring device based on an analysis method called flow cytometry; it irradiates a microparticle such as a cell flowing through a flow cell with light and detects fluorescence or the like emitted from the microparticle.
In a next-generation flow cytometer, a fluorescence signal is multicolored in order to enable detailed analysis of cells. As such a next-generation flow cytometer, a spectral-type flow cytometer has been developed. In the spectral-type flow cytometer, a spectroscopic element such as a prism or a grating is used to disperse light emitted from a microparticle such as a cell labeled with a plurality of fluorescent dyes. The dispersed light is detected by a light receiving element array in which a plurality of light receiving elements in different detection wavelength regions are arrayed. Detection values of the light receiving elements are collected, whereby a measurement spectrum of a measurement target such as a cell is acquired.
Such a spectral-type flow cytometer has an advantage that information on fluorescence can be utilized as analysis information without loss, compared with a filter scheme that separates and detects fluorescence for each wavelength region using an optical filter.
Patent Literature 1: JP 2009-104026 A
When the spectral-type flow cytometer is used, it is possible to acquire a measurement spectrum in which spectra of the plurality of fluorescent dyes are mixed and measurement data representing a measurement result for each of the fluorescent dyes. Therefore, there is an advantage that analysis of a measurement target can be finely performed using both of the measurement spectrum and the measurement data. However, in order to perform such an analysis in a local environment, it is necessary to secure sufficient calculation resources in the local environment.
Therefore, it is conceivable to transfer data obtained in the local environment to a Cloud environment and analyze the measurement target in the Cloud environment. By cloudizing an analysis application (that is, moving it to the Cloud), a detailed analysis of the measurement target can be easily performed utilizing the sufficient calculation resources of the Cloud environment, data sharing and the like can be easily performed, and convenience is improved.
However, when the number of dimensions of data acquired per sample increases due to multi-coloring of the fluorescence signal, the data amount of a sample group greatly increases. Therefore, when it is attempted to perform the analysis on the Cloud side, there is a problem in that data transfer takes a very long time.
Since the increase in the data amount directly leads to an increase in storage cost for storing the data, there is also a problem in that storage cost required for the Cloud side greatly increases due to the multi-coloring.
Therefore, the present disclosure proposes an information processing system, an information processing apparatus, and an information processing method capable of reducing the data amount.
An information processing system according to one embodiment comprises: an excitation light source that irradiates a respective plurality of samples belonging to a sample group with excitation light; a measurement unit that measures fluorescence generated by irradiation of the sample with the excitation light; and an information processing unit that generates differential data based on a difference between similar fluorescence signals among fluorescence signals based on the fluorescence measured for the respective samples.
Preferred embodiments of the present disclosure are explained in detail below with reference to the accompanying drawings. Note that, in this specification and the drawings, redundant explanation is omitted about components having substantially the same functional configurations by denoting the components with the same reference numerals and signs.
Note that the explanation is made in the following order.
1. First Embodiment
1.1 Overview of a flow cytometer
1.2 Schematic configuration example of a spectral-type flow cytometer
1.3 Schematic configuration example of an information processing system
1.4 About un-mixing
1.5 About a data structure
1.5.1 Data structure example of a measurement spectrum
1.5.2 Data structure example of fluorescent dye information
1.6 Sample data example
1.7 Problem concerning sample data
1.8 Data reducing method
1.8.1 Reduction of an unnecessary bit representation
1.8.2 Lexicography (LZ method)
1.8.3 Entropy code
1.8.4 Statistical prediction
1.9 Problem in reversibly compressing high-dimensional data
1.9.1 Case of a reduction of an unnecessary bit representation
1.9.2 Case of the Lexicography (the LZ method)
1.9.3 Case of the entropy code
1.9.4 Case of the statistical prediction
1.10 Data reducing method
1.11 Data reducing method
1.11.1 Compression/decompression of data
1.11.2 Format of differential data
1.11.3 Generation/restoration method for differential data
1.11.3.1 Determination method for a similar sample
1.11.3.1.1 First similarity determination method
1.11.3.1.2 Second similarity determination method
1.11.3.2 Selection method for a similar sample
1.11.3.2.1 First similar sample selection method
1.11.3.2.2 Second similar sample selection method
1.12 Summary
2. Second Embodiment
2.1 Mutual use of similarity information obtained in fluorescence spectrum/fluorescent dye information
3. Third Embodiment
3.1 Acceleration of Cloud transfer by division compression/decoding
A first embodiment of the present disclosure is explained in detail below with reference to the drawings.
1.1 Overview of a Flow Cytometer
A flow cytometer according to the present embodiment may be a device that individually analyzes samples using an analysis method called flow cytometry. In the flow cytometer, a sample is labeled with a fluorescent reagent, which emits light under a specific condition, and light emitted when excitation light is irradiated is collected as fluorescence information. Cells can be analyzed from this fluorescence information.
A general flow cytometer uses an optical filter to divide and extract, for each of wavelength regions, fluorescence radiated from a sample and adopts, as information concerning a fluorescent dye (equivalent to fluorescent dye information explained below), data obtained by measuring the fluorescence.
On the other hand, the spectral-type flow cytometer separates, without using an optical filter, fluorescence for each of wavelengths with a spectroscope configured from a prism or the like and measures light intensity for each of the wavelengths to acquire spectrum information (hereinafter referred to as measurement spectrum) of light radiated from a sample. The spectral-type flow cytometer separates the measurement spectrum for each of fluorescent dyes with processing called spectrum un-mixing (hereinafter simply referred to as un-mixing) using a fluorescence spectrum reference.
The un-mixing is a method of approximating the measurement spectrum obtained by the spectral-type flow cytometer with a linear sum of fluorescence spectra for each of the fluorescent dyes to obtain fluorescent dye information for each of the fluorescent dyes from the measurement spectrum. The fluorescent dye information for each of the fluorescent dyes generated by the un-mixing is used for, for example, an analysis of a sample such as a cell.
Note that a fluorescence signal in the present explanation may be defined as a concept including both of the measurement spectrum and the fluorescent dye information.
In the present explanation, the fluorescence spectrum for each of the fluorescent dyes is referred to as fluorescence spectrum reference. The fluorescence spectrum reference is a spectrum obtained from a sample labeled with a single fluorescent dye and may include an autofluorescence spectrum obtained from an unlabeled sample. Here, the fluorescence spectrum reference may be acquired by the spectral-type flow cytometer or a catalog value or the like provided from a provider of a fluorescent dye may be used.
In the present embodiment, the spectral-type flow cytometer that can acquire both the measurement spectrum and the fluorescent dye information is illustrated as the optical measuring device. However, the optical measuring device is not limited to this, and a general flow cytometer that acquires fluorescent dye information can also be used.
Here, in a flow cytometer, there are a microchip scheme, a droplet scheme, a cuvette scheme, a flow cell scheme, and the like as a scheme for supplying a sample to an observation point (hereinafter referred to as a spot) on a flow path. In the present embodiment, a flow cytometer of the microchip scheme (partially, the flow cell scheme) is illustrated. However, the flow cytometer is not limited to this and may be a flow cytometer of another supply scheme.
As the flow cytometer, there are an analyzer type intended for analysis of a sample such as a cell and a cell sorter type intended for both analysis and sorting of the sample. In the present embodiment, an analyzer-type flow cytometer is illustrated. However, the flow cytometer is not limited to this and may be a cell sorter-type flow cytometer.
Further, the present disclosure is not limited to the flow cytometer and may be various optical measuring devices that irradiate a sample with excitation light and analyze the sample based on fluorescence of the sample. For example, the present disclosure may be a microscope that acquires an image of a sample such as a tissue section on a slide.
1.2 Schematic Configuration Example of a Spectral-Type Flow Cytometer
As illustrated in
The sample is, for example, a biologically derived particle such as a cell, a microorganism, or a biologically relevant particle and includes a population of a plurality of biologically derived particles. The sample may be, for example, a biologically derived microparticle such as a cell (for example, an animal cell such as a blood cell, or a plant cell), a bacterium such as Escherichia coli, a virus such as tobacco mosaic virus, a microorganism such as a fungus (for example, yeast), a biologically related particle constituting a cell such as a chromosome, a liposome, a mitochondrion, an exosome, or various organelles, or a biologically related polymer such as a nucleic acid, a protein, a lipid, a sugar chain, or a complex thereof. Further, the sample widely includes synthetic particles such as latex particles, gel particles, and industrial particles. The industrial particles may be, for example, an organic or inorganic polymer material, a metal, or the like. Examples of the organic polymer material include polystyrene, styrene-divinylbenzene, and polymethyl methacrylate. Examples of the inorganic polymer material include glass, silica, and a magnetic material. Examples of the metal include gold colloid and aluminum. The shape of these particles is generally spherical but may be non-spherical. The size, the mass, and the like of the particles are not particularly limited.
Here, the sample is labeled (stained) with one or more fluorescent dyes. The labeling of the sample with the fluorescent dye can be performed by a known method. For example, when the sample is a cell, a fluorescently labeled antibody that selectively binds to an antigen present on the cell surface and a cell to be measured are mixed and the fluorescently labeled antibody is bound to the antigen on the cell surface, whereby the cell to be measured can be labeled with the fluorescent dye.
The fluorescently labeled antibody is an antibody to which a fluorescent dye is bound as a label. Specifically, the fluorescently labeled antibody may be obtained by binding a fluorescent dye, to which avidin is bound, to a biotin-labeled antibody by an avidin-biotin reaction. Alternatively, the fluorescently labeled antibody may be obtained by directly binding a fluorescent dye to an antibody. Note that, as the antibody, either a polyclonal antibody or a monoclonal antibody can be used. The fluorescent dye for labeling the sample is not particularly limited. One or more known dyes used for staining cells and the like can be used.
(Light Source Unit 100)
As illustrated in
In this configuration, the total reflection mirror 111, the dichroic mirrors 112 and 113, and the total reflection mirror 115 configure a waveguide optical system that guides excitation lights L1 to L3 emitted from excitation light sources 101 to 103 onto a predetermined optical path.
The objective lens 116 configures a condensing optical system that condenses the excitation lights L1 to L3, which are propagated on the predetermined optical path, on a spot 123a set on the flow path in the microchip 120. Note that the spot 123a is not limited to one spot, that is, the excitation lights L1 to L3 may be respectively condensed on different spots. The condensing positions of the respective excitation lights L1 to L3 do not need to coincide with the spot 123a and may be shifted back and forth on optical axes of the excitation lights L1 to L3.
In the example illustrated in
For example, the total reflection mirror 111 totally reflects the excitation light L1 emitted from the excitation light source 101 in a predetermined direction.
The dichroic mirror 112 is an optical element for matching or collimating the optical axis of the excitation light L1 reflected by the total reflection mirror 111 and the optical axis of the excitation light L2 emitted from the excitation light source 102. For example, the dichroic mirror 112 transmits the excitation light L1 reflected by the total reflection mirror 111 and reflects the excitation light L2 emitted from the excitation light source 102. For example, a dichroic mirror designed to transmit light having a wavelength of 637 nm and reflect light having a wavelength of 488 nm may be used as the dichroic mirror 112.
The dichroic mirror 113 is an optical element for matching or collimating the optical axes of the excitation lights L1 and L2 arriving from the dichroic mirror 112 and the optical axis of the excitation light L3 emitted from the excitation light source 103. For example, the dichroic mirror 113 transmits the excitation lights L1 and L2 arriving from the dichroic mirror 112 and reflects the excitation light L3 emitted from the excitation light source 103. For example, a dichroic mirror designed to transmit light having a wavelength of 637 nm and light having a wavelength of 488 nm and reflect light having a wavelength of 405 nm may be used as the dichroic mirror 113.
The excitation lights L1 to L3 finally collected as light traveling in the same direction by the dichroic mirror 113 are totally reflected by the total reflection mirror 115 and made incident on the objective lens 116.
Note that a beam shaping unit for converting the excitation lights L1 to L3 into parallel light may be provided on an optical path from the excitation light sources 101 to 103 to the objective lens 116. The beam shaping unit may be configured by, for example, one or more lenses or mirrors.
The objective lens 116 condenses the excitation lights L1 to L3 made incident thereon on the predetermined spot 123a on a flow path in the microchip 120 explained below. When the spot 123a is irradiated with the excitation lights L1 to L3, which are pulsed light, while the sample is passing through the spot 123a, fluorescence is emitted from the sample and the excitation lights L1 to L3 are scattered by the sample to generate scattered lights.
In the present explanation, among the scattered lights generated from the sample in all directions, a component within a predetermined angle range traveling forward in a traveling direction of the excitation lights L1 to L3 is referred to as forward scattered light L12, a component within a predetermined angle range traveling backward in the traveling direction of the excitation lights L1 to L3 is referred to as backward scattered light, and a component in a direction deviating from the optical axes of the excitation lights L1 to L3 by more than a predetermined angle is referred to as sideward scattered light.
The objective lens 116 has, for example, a numerical aperture corresponding to approximately 30° to 40° with respect to the optical axis. A component within a predetermined angle range traveling forward in the traveling direction of the excitation lights L1 to L3 (hereinafter referred to as fluorescence L13) in the fluorescence emitted from the sample and the forward scattered light L12 are input to the demultiplexing optical system 150 arranged forward in the traveling direction of the excitation lights L1 to L3.
(Demultiplexing Optical System 150)
As illustrated in
The filter 151, disposed on the downstream side of the microchip 120 on the optical path of the excitation lights L1 to L3, selectively blocks, for example, a part (for example, the excitation lights L1 and L3) of the excitation lights L1 to L3 in the light L11 traveling to the downstream side of the microchip 120. Here, the light L11 traveling to the downstream side of the microchip 120 includes the excitation lights L1 to L3 (including forward scattered lights thereof) and the fluorescence L13 radiated from the sample in the microchip 120. Therefore, the filter 151 blocks components of the excitation lights L1 and L3 and transmits a component of the excitation light L2 (referred to as forward scattered light L12) and the fluorescence L13.
Note that the filter 151 is disposed to be inclined with respect to the optical axis of light L16. Consequently, return light of the light L16 reflected by the filter 151 is prevented from being made incident on the scattered light detection unit 130 and the like via the objective lens 116 and the like.
The forward scattered light L12 and the fluorescence L13 transmitted through the filter 151 are, for example, converted into collimated light by the collimator lens 152 and then demultiplexed in the dichroic mirror 153. For example, the dichroic mirror 153 reflects the forward scattered light L12 in the incident light and transmits the fluorescence L13. The forward scattered light L12 reflected by the dichroic mirror 153 is guided to the scattered light detection unit 130. The fluorescence L13 transmitted through the dichroic mirror 153 is guided to the fluorescence detection unit 140.
(Scattered Light Detection Unit 130)
The scattered light detection unit 130 includes, for example, a plurality of lenses 131, 133, and 135 that shape the beam cross section of the forward scattered light L12 reflected by the dichroic mirror 153, a total reflection mirror 132, a diaphragm 137 that adjusts a light amount of the forward scattered light L12, a mask 134 that selectively transmits light (for example, a component of the excitation light L2) having a specific wavelength in the forward scattered light L12, and a photodetector 136 that detects the light made incident thereon after being transmitted through the mask 134 and the lens 135.
The photodetector 136 is configured by, for example, a two-dimensional image sensor or a photodiode and detects the amount and the size of the light that is transmitted through the mask 134 and the lens 135 and made incident thereon. A signal detected by the photodetector 136 is input to, for example, an information processing apparatus 2 explained below.
(Fluorescence Detection Unit 140)
The fluorescence detection unit 140 includes, for example, a spectroscopic optical system 141 that disperses the fluorescence L13 made incident thereon into dispersed light L14 for each of wavelengths and a photodetector 142 that detects a light amount of the dispersed light L14 for each of predetermined wavelength bands (also referred to as channel).
The spectroscopic optical system 141 includes, for example, one or more optical elements 141a such as a prism and a diffraction grating and disperses the fluorescence L13 made incident thereon into the dispersed light L14 emitted toward different angles for each of wavelengths.
The photodetector 142 may be configured from, for example, a plurality of light receiving units that receive light for each of channels. In that case, the plurality of light receiving units may be arrayed in one line or two or more lines in a spectroscopic direction by the spectroscopic optical system 141. For each of the light receiving units, for example, a photoelectric conversion element such as a photomultiplier tube can be used. However, a two-dimensional image sensor or the like can be used instead of the plurality of light receiving units.
A signal (a fluorescence signal) indicating a light amount of the fluorescence L13 for each of the channels detected by the photodetector 142 is input to, for example, the information processing apparatus 2 explained below.
1.3 Schematic Configuration Example of an Information Processing System
The information processing apparatus 2 is configured by, for example, a personal computer or a workstation and executes acquisition of data detected by the flow cytometer 1, partial analysis work on a sample to be analyzed, and the like. The information processing apparatus 2 can correspond to, for example, an example of the information processing unit in the claims. Note that the information processing apparatus 2 may include a transmission unit for transmitting various data via a predetermined network and a reception unit for receiving various data from the predetermined network.
The Cloud 3 is connected to the information processing apparatus 2 via a predetermined network such as a LAN (Local Area Network), the Internet, or a mobile communication network and executes a detailed analysis of a sample based on data transferred from the information processing apparatus 2.
The terminal 4 is a user-side terminal configured by, for example, a personal computer, a tablet terminal, or a smartphone. The terminal 4 is in charge of a detailed analysis of a sample and is used by the user to issue analysis instructions to the Cloud 3, to acquire and inspect analysis results obtained by the Cloud 3, and the like.
1.4 About Un-Mixing
Here, un-mixing executed in the information processing apparatus 2 and/or the Cloud 3 in the present embodiment is explained more in detail.
Usually, the number of dimensions of the fluorescent dye information is smaller than the number of dimensions of the measurement spectrum. Therefore, a data amount can be reduced by converting the measurement spectrum into the fluorescent dye information with the un-mixing. Note that the number of dimensions is a value equivalent to the number of types of data. For example, in the measurement spectrum, the number of dimensions can be equivalent to the number of channels and, in the fluorescent dye information, the number of dimensions can be equivalent to the number of colors.
For example, a spectral-type cell analyzer ID7000 (registered trademark) manufactured by Sony Corporation (registered trademark) can convert a measurement spectrum of maximum 188 channels (that is, the number of dimensions=188) into fluorescent dye information of 44 colors (that is, the number of dimensions=44). However, the number of dimensions of the fluorescent dye information may be a value that changes according to the number of fluorescent reagents for labeling the sample.
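Note that the un-mixing explained above can be sketched as a least-squares fit. The following is a minimal sketch, not the method of the present disclosure: the function names `unmix` and `solve_linear`, the two three-channel reference spectra, and the use of ordinary (unweighted) least squares solved through the normal equations are all assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the caller's lists are left unmodified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("reference spectra are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def unmix(measured, references):
    """Coefficients c minimizing |measured - sum_i c[i] * references[i]|^2."""
    n = len(references)
    # Normal equations: (R R^T) c = R y, each row of R being one reference spectrum.
    gram = [
        [sum(p * q for p, q in zip(references[i], references[j])) for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(p * q for p, q in zip(references[i], measured)) for i in range(n)]
    return solve_linear(gram, rhs)


# Hypothetical reference spectra (3 channels, 2 dyes) and a noiseless mixture.
references = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
measured = [2.0 * r0 + 3.0 * r1 for r0, r1 in zip(*references)]
coefficients = unmix(measured, references)  # one value per dye
```

In this toy case, three channel values are converted into two dye coefficients, which mirrors on a small scale the reduction in the number of dimensions explained above.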
1.5 About a Data Structure
Here, the data structure of each of the measurement spectrum and the fluorescent dye information is explained below. Note that, in the following explanation, a data structure of a measurement spectrum output from a flow cytometer 1 that generates a measurement spectrum of maximum 188 channels using seven excitation light sources (that is, seven types of excitation lights having different wavelengths; in
1.5.1 Data Structure Example of a Measurement Spectrum
Each sample data has a unit called a deck. Each deck corresponds to one excitation light source (that is, one excitation light). Therefore, in this example, one sample data has seven decks #1 to #7.
Each of the decks #1 to #7 is configured from a maximum of thirty-two channels ch1 to ch32. However, in each of the decks #1 to #7, since fluorescence does not appear in a channel equivalent to a wavelength shorter than that of the excitation light, not all the decks #1 to #7 have thirty-two channels. In this example, one sample data as a whole configures data of a maximum of 188 channels in total.
Each of the channels is configured from data of Area and Height. However, in addition to these or instead of one of these, Width may be used. Note that Area may be a value calculated as Height×Width or a value obtained by multiplying that value by a predetermined coefficient.
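Note that the deck/channel layout explained above can be modeled, for example, as follows. This is only a sketch: the class names and the per-deck channel counts in `ASSUMED_CHANNELS_PER_DECK` are hypothetical, and only the number of decks (seven), the per-deck maximum (thirty-two channels), and the total (188 channels) come from the explanation above.

```python
from dataclasses import dataclass
from typing import List

MAX_CHANNELS_PER_DECK = 32  # from the text: at most thirty-two channels per deck


@dataclass
class ChannelValue:
    area: int    # Area of the detected pulse
    height: int  # Height of the detected pulse


@dataclass
class Deck:
    excitation_index: int          # excitation light source this deck corresponds to
    channels: List[ChannelValue]

    def __post_init__(self):
        if len(self.channels) > MAX_CHANNELS_PER_DECK:
            raise ValueError("a deck holds at most 32 channels")


@dataclass
class SampleSpectrum:
    decks: List[Deck]

    def channel_count(self) -> int:
        return sum(len(deck.channels) for deck in self.decks)


# Hypothetical per-deck channel counts chosen only so that they sum to 188.
ASSUMED_CHANNELS_PER_DECK = [32, 32, 32, 32, 28, 20, 12]
sample = SampleSpectrum(
    decks=[
        Deck(i + 1, [ChannelValue(area=0, height=0) for _ in range(n)])
        for i, n in enumerate(ASSUMED_CHANNELS_PER_DECK)
    ]
)
```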
Here, assuming that Area is 28 bits, Height is 20 bits, and the number of samples is 20 million, a data amount of sample data of maximum 188 channels is an enormous data amount of approximately 23 gigabytes.
1.5.2 Data Structure Example of Fluorescent Dye Information
In the present example, each of the sample data is configured from color information of a maximum of 44 colors #1 to #44 and each of the colors #1 to #44 includes data of Area and Height. However, in addition to these or instead of one of these, Width may be used.
Here, assuming that Area is 28 bits, Height is 20 bits, and the number of samples is 20 million, a data amount of the sample data of maximum 44 colors is also an enormous data amount of approximately 5 gigabytes.
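Note that the two data amount estimates (approximately 23 gigabytes for the measurement spectrum and approximately 5 gigabytes for the fluorescent dye information) can be checked with the following short calculation. The sketch assumes bit-packed storage without padding and 1 gigabyte = 10^9 bytes; the function name `raw_size_bytes` is hypothetical.

```python
def raw_size_bytes(dimensions, samples, bits_area=28, bits_height=20):
    """Unpadded size in bytes of `samples` records, each holding `dimensions` (Area, Height) pairs."""
    bits_per_record = dimensions * (bits_area + bits_height)
    return samples * bits_per_record // 8


SAMPLES = 20_000_000
spectrum_bytes = raw_size_bytes(188, SAMPLES)  # measurement spectrum, 188 channels
dye_bytes = raw_size_bytes(44, SAMPLES)        # fluorescent dye information, 44 colors

print(f"measurement spectrum: {spectrum_bytes / 1e9:.2f} GB")  # 22.56 GB
print(f"fluorescent dye info: {dye_bytes / 1e9:.2f} GB")       # 5.28 GB
```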
Note that the data structures of the measurement spectrum and the fluorescent dye information explained above are merely examples. It is not essential that the measurement spectrum and the fluorescent dye information have the data structures explained above. That is, the present embodiment can be applied to various data if there is a group that holds a large amount of high-dimensional data as data to be transferred and/or data to be saved (in the present embodiment, the measurement spectrum and/or the fluorescent dye information) and a type of the high-dimensional data held by the group is data having a data structure smaller than the entire high-dimensional data. For example, the present embodiment can also be applied to fluorescent dye information acquired by a general flow cytometer in which an optical filter is used.
1.6 Sample Data Example
Subsequently, the sample data according to the present embodiment is explained with reference to several examples.
As illustrated in
Similarly, as illustrated in
1.7 Problem Concerning Sample Data
As explained above, in the flow cytometer 1 according to the present embodiment, the number of dimensions per one sample acquired by multi-coloring increases. Accordingly, the data of the sample group increases. In the flow cytometer 1, an analysis environment is cloudized for improvement of convenience and an advanced analysis (see
When the analysis environment is cloudized, it is necessary to transfer data to be analyzed (a fluorescence signal, that is, a measurement spectrum and/or fluorescent dye information) from the flow cytometer 1 side (the information processing apparatus 2) to the Cloud 3. However, since a data amount of the measurement spectrum and/or the fluorescent dye information is enormous as explained above, an enormous transfer time occurs if it is attempted to transfer these data to the Cloud 3.
After the data transfer, it is necessary to save the transferred data (fluorescence signal, that is, the measurement spectrum and/or the fluorescent dye information) on the Cloud 3 side. However, to save the data, it is necessary to secure an enormous capacity of a storage on the Cloud 3 side and storage cost required on the Cloud 3 side becomes enormous.
As explained above, when the flow cytometer 1 is multicolored, problems such as an increase in a data transfer time and an increase in storage cost occur because of an increase in data amount due to multidimensionalization.
Therefore, in the present embodiment, a method of reducing a data amount of data (for example, a measurement spectrum) output from the flow cytometer 1 or data (for example, fluorescent dye information) generated from the data, the data being data to be transferred or saved (a fluorescence signal, that is, the measurement spectrum and/or the fluorescent dye information), is explained with reference to several examples.
1.8 Data Reducing Method
When a data amount of the measurement spectrum or the fluorescent dye information obtained from the flow cytometer 1 is reduced, it is required to restore the data before the data amount reduction at the time of an analysis. Therefore, in the present embodiment, a data reducing method by reversible compression is proposed as a data amount reducing method. Several examples are explained below about a reversible compression method that can be used in the present embodiment.
1.8.1 Reduction of an Unnecessary Bit Representation
First, as a first reversible compression method, a method of compressing by reducing an unnecessary bit representation is illustrated. When a numerical value is represented as bits, this method reduces the number of unused bits and represents the data with a smaller number of bits. For example, in a general computer, a structure (also referred to as a type) such as System.int32 is widely used.
Here, a dynamic range that can be represented by System.int32 is in a range of ‘−2^31’ to ‘2^31−1’. However, if a numerical value to be represented is present only up to 8 bits of ‘0’ to ‘255’, the dynamic range of System.int32 is not used up. Therefore, the unused bits are wasted.
Therefore, in such a case, data can be reduced from 32 bits to 8 bits by replacing a structure to be used with System.uint8. In such a method of reducing unused bits, it is possible to restore original data by adding bits corresponding to the reduced bits.
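Note that this reduction of unused bits can be sketched as follows. The function names `pack_narrow` and `unpack_narrow` and the sample values are hypothetical; the sketch simply stores values known to lie in ‘0’ to ‘255’ in one byte each instead of a 32-bit integer and then restores them.

```python
import struct


def pack_narrow(values):
    """Store values known to fit in 0..255 with 1 byte each instead of 4."""
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("value does not fit in 8 bits")
    return bytes(values)


def unpack_narrow(blob):
    """Restore the original integers by widening each byte again."""
    return list(blob)


values = [0, 17, 255, 128]
wide = struct.pack(f"<{len(values)}i", *values)  # 32-bit signed representation
narrow = pack_narrow(values)                     # 8-bit unsigned representation

print(len(wide), len(narrow))  # 16 4
```

The restoration is exact only because every value was checked to fit in 8 bits before packing.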
1.8.2 Lexicography (LZ Method)
As a second reversible compression method, a lexicographic (that is, dictionary-based) compression method (an LZ method) is conceivable. The LZ method reduces the amount of data by replacing repeated data with references to a dictionary. An example of compression processing by the LZ method is illustrated in
In the LZ method, for example, when the input data ‘a b ab aa ba aab aaba aaba’ illustrated in
In such an LZ method, the original data (input data) can be restored by referring to the dictionary based on the output data.
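Note that one member of the LZ family, LZ78, can be sketched as follows. The explanation above does not state which LZ variant is meant, so the choice of LZ78 is an assumption, as is reading the spaces in the example input ‘a b ab aa ba aab aaba aaba’ as phrase boundaries. The function names are hypothetical.

```python
def lz78_encode(text):
    """Return (dictionary index, next character) pairs; index 0 means the empty prefix."""
    dictionary = {}
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending while the phrase is already known
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit its index with no extra character.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text by replaying the dictionary construction."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


text = "ababaabaaabaabaaaba"  # the example input with the spaces removed
pairs = lz78_encode(text)
restored = lz78_decode(pairs)
```

Under these assumptions, the encoder splits the input into eight phrases, and the decoder restores the input exactly by rebuilding the same dictionary from the output.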
1.8.3 Entropy Code
As a third reversible compression method, a compression method using an entropy code is conceivable. The compression method using the entropy code is a method of representing data having a high appearance frequency with a short bit length and representing data having a low appearance frequency with a long bit length to reduce data. An example of compression processing using an entropy code (a Huffman code) is illustrated in
As illustrated in
Such a compression method using the entropy code can also restore the data string represented by the entropy code to a data string represented by normal 2 bits based on a correspondence relation (
Note that, in the compression method using the entropy code, since the bit length is determined according to an appearance probability of data, in particular, when there is a bias in an appearance frequency, the data can be greatly reduced.
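Note that the Huffman code can be sketched as follows for data with a biased appearance frequency. The four-symbol data string is hypothetical and is not the example of the drawing; the function names are likewise assumptions made for illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f0, _, low = heapq.heappop(heap)
        f1, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f0 + f1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Decode by matching prefixes; valid because no code word prefixes another."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical biased data over four symbols (2 bits each at fixed length).
data = list("aaaaaaaabbbbccd")
code = huffman_code(data)
bits = encode(data, code)
print(len(bits), 2 * len(data))  # 25 30
```

In this sketch, the most frequent symbol receives a 1-bit code and the rarest symbols receive 3-bit codes, so the 15 symbols occupy 25 bits instead of the 30 bits of a fixed 2-bit representation.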
1.8.4 Statistical Prediction
As a fourth reversible compression method, a compression method using statistical prediction is conceivable. The compression method using statistical prediction is a method of reducing data by predicting data that appears next from observed data. For example, data in which ‘abcabc’ continues is conceived. When this data is compressed by an entropy code, since there is no bias of an appearance frequency of ‘a’, ‘b’, and ‘c’, a data reduction ratio cannot be increased. On the other hand, when the data is encoded using a probability that ‘b’ appears next to ‘a’, since a bias can be imparted, the data reduction ratio can be increased.
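Note that the effect described for ‘abcabc’ can be quantified, for example, by comparing the empirical entropy of single symbols with the empirical entropy of a symbol given the preceding symbol. The following sketch is an illustration only; the function names and the use of a simple one-symbol context are assumptions, not the prediction method of the present disclosure.

```python
import math
from collections import Counter


def order0_entropy(text):
    """Bits per symbol when each symbol is coded from its overall frequency."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def order1_entropy(text):
    """Bits per symbol when each symbol is coded given the previous symbol."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    n = len(text) - 1
    return -sum(
        c / n * math.log2(c / context_counts[prev])
        for (prev, _), c in pair_counts.items()
    )


text = "abc" * 100
h0 = order0_entropy(text)  # log2(3), about 1.585 bits: no frequency bias to exploit
h1 = order1_entropy(text)  # 0 bits: the next symbol is determined by the previous one
```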
Note that two or more of the reversible compression methods illustrated above can be used in combination. For example, in a compression method such as zip, the lexicographic compression method (the LZ method) and the compression method using the entropy code are combined to compress data. In the present embodiment, various reversible compression methods and combinations thereof can be used without being limited to the reversible compression methods explained above.
1.9 Problem in Reversibly Compressing High-Dimensional Data
Subsequently, problems in reversibly compressing high-dimensional data such as a sample group using each of the reversible compression methods illustrated above are explained.
1.9.1 Case of a Reduction of an Unnecessary Bit Representation
In the method of compressing by reducing an unnecessary bit representation illustrated as the first reversible compression method, the dynamic range of the structure that stores the data is determined from the values that the device can output. Therefore, there is a problem in that an effective reduction ratio cannot be achieved except for extreme measurement data.
1.9.2 Case of the Lexicography (the LZ Method)
In the lexicographic compression method (the LZ method) illustrated as the second reversible compression method, it is difficult for a dictionary to capture the features of a spectrum shape for data that changes for each sample, such as a fluorescence spectrum. Therefore, there is a problem in that it is difficult to effectively increase the reduction ratio. Even if one piece of sample data is registered in the dictionary, it is difficult to increase the reduction ratio because the spectral shape of other sample data rarely coincides with it completely. Similarly, even if sample data is finely divided and registered in the dictionary, it is difficult to increase the reduction ratio as expected because perfect coincidence is rare.
1.9.3 Case of the Entropy Code
In the compression method using the entropy code illustrated as the third reversible compression method, when data having a wide dynamic range, such as sample data of 28 bits or 20 bits, is targeted, the data can take many different numerical values and a bias in the appearance frequency is unlikely to occur. Therefore, there is a problem in that it is difficult to increase the reduction ratio.
1.9.4 Case of the Statistical Prediction
In the compression method using the statistical prediction illustrated as the fourth reversible compression method, it is difficult to predict the next value from an observed value for a spectral shape such as a fluorescence spectrum. Therefore, there is a problem in that it is difficult to generate a highly accurate prediction model and it is difficult to increase the reduction ratio.
As explained above, the existing reversible compression methods have a problem in that effective data reduction cannot be performed on high-dimensional data such as a sample group.
1.10 Data Reducing Method
Therefore, in the present embodiment, the data reduction ratio can be effectively increased by using characteristics of a sample group.
As illustrated in
The differential data is generated in order to increase the data reduction effect of the compression by calculating a difference between samples having similar spectral shapes. An example of the generation of the differential data executed in step S01 of
The reason for calculating the difference between the samples in the sample group relates to the characteristics of the sample group.
As illustrated in
As a first characteristic, samples of the same type have similar feature values. In the example illustrated in
In this way, when the entire sample group is observed in units of samples, a redundant portion is present. Therefore, in the present embodiment, the data reduction ratio is increased by removing the redundant portion using the difference.
1.11 Data Reducing Method
Next, a data reducing method according to the present embodiment is explained below with reference to specific examples. Note that, in the following explanation, a data reducing method in the compression and the decompression illustrated in
1.11.1 Compression/Decompression of Data
In the data compression/decompression illustrated in
1.11.2 Format of Differential Data
As illustrated in
In the data region R2, for example, a difference value for each dimension (channel) calculated by calculating a difference between sample data for each of dimensions (channels) is stored.
In the header region R1, an index for specifying the samples, the difference of which is calculated, is stored. Note that, when a method of compressing by reducing unnecessary bit representation is used as the reversible compression method, information for specifying a most significant bit (MSB) in a difference value of each of dimensions is also stored in the header region R1.
The index of the similar sample in the header region R1 is used when the sample data of the sample #1 is restored to the original data. Note that, when a sample similar to the sample #1 is not found in the sample group, a specific numerical value (for example, ‘0’) allocated in advance as a value indicating that a similar sample is absent may be stored in the header region R1 instead of the index of the similar sample.
According to such a data format, although a data amount for the header region R1 is increased compared with original data, a data amount stored in the data region R2 can be significantly reduced. Therefore, as a result, a data amount can be significantly reduced compared with the original data.
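The header region R1 and the data region R2 can be pictured as a small record type. The following is only a sketch under stated assumptions: the names `DifferentialRecord`, `make_record`, and `restore` are hypothetical, sample indices are assumed to start at 1 so that ‘0’ can serve as the value indicating that a similar sample is absent, and a single MSB is kept per record for brevity. Sign handling of negative difference values is not shown.

```python
from dataclasses import dataclass
from typing import List

NO_SIMILAR_SAMPLE = 0  # value indicating that a similar sample is absent (indices assumed 1-based)


@dataclass
class DifferentialRecord:
    similar_index: int   # header region R1: index of the similar sample, or NO_SIMILAR_SAMPLE
    msb: int             # header region R1: bits needed for the largest absolute value in R2
    values: List[int]    # data region R2: one value per dimension (channel)


def make_record(sample, similar_index=NO_SIMILAR_SAMPLE, similar=None):
    """Store per-channel differences from the similar sample, or the raw values if none exists."""
    if similar_index == NO_SIMILAR_SAMPLE:
        values = list(sample)
    else:
        values = [s - r for s, r in zip(sample, similar)]
    msb = max((abs(v).bit_length() for v in values), default=0)
    return DifferentialRecord(similar_index, msb, values)


def restore(record, samples_by_index):
    """Restore the original sample data from a record and the already restored samples."""
    if record.similar_index == NO_SIMILAR_SAMPLE:
        return list(record.values)
    reference = samples_by_index[record.similar_index]
    return [r + d for r, d in zip(reference, record.values)]


samples = {1: [1000, 52000, 300], 2: [1003, 51990, 301]}
raw_record = make_record(samples[1])                  # no similar sample: 16-bit values
diff_record = make_record(samples[2], 1, samples[1])  # differences 3, -10, 1: 4 bits suffice
```

In this example the header grows by two fields, while the magnitude of the values in the data region drops from 16 bits to 4 bits, which mirrors the trade-off described above.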
1.11.3 Generation/Restoration Method for Differential Data
Subsequently, a generation method and a restoration method for differential data according to the present embodiment are explained. Note that, in the following explanation, a similar sample determination method and a similar sample selection method are specifically described.
1.11.3.1 Determination Method for a Similar Sample
First, a method (a similarity determination method) for determining which sample is most similar to a certain sample when a plurality of samples are given is explained. As explained above, since the sample data is multidimensional data, similarity between two samples can in general be determined using a Euclidean distance, cosine similarity, or the like. However, in the present embodiment, since the difference values between samples determined as being similar are the data to be compressed, the compression efficiency changes according to the method by which similarity between two samples is determined. In other words, the compression efficiency can be controlled by appropriately selecting the similarity determination method and thereby designing the difference values. Therefore, in the present embodiment, in addition to the general similarity determination methods (a Euclidean distance, cosine similarity, or the like) explained above, the following two methods are illustrated.
1.11.3.1.1 First Similarity Determination Method
As a first similarity determination method, a method of obtaining a difference having a narrow dynamic range is illustrated.
As illustrated in
Subsequently, a most significant bit (MSB) is specified about a data set (difference values #1 to #188) of the difference values calculated for the samples. In the example illustrated in
Next, among the maximum MSBs specified for the respective data sets, the minimum is identified, and the sample whose sample data was used to calculate the data set having that minimum MSB is specified as the similar sample. In the example illustrated in
Note that, when there are a plurality of data sets having the minimum MSB, for example, the sample having the smallest index may be selected.
As explained above, by determining, as similar samples, the combination of samples for which the MSB of the difference values is the minimum, it is possible, for example, to maximize the compression efficiency of the method of compressing by reducing unnecessary bit representation.
Note that, when the differential data is compressed by the method of compressing by reducing unnecessary bit representation, information for specifying MSB of the difference values may be stored in the header region R1.
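The first similarity determination method can be sketched as follows. The sketch assumes integer sample data and uses four channels instead of 188 for brevity. The function names `max_msb` and `most_similar_by_msb` are hypothetical, and the MSB is represented here by the bit length of the absolute difference value.

```python
def max_msb(target, candidate):
    """Largest MSB over the per-channel difference values (0 if all differences are 0)."""
    return max(abs(t - c).bit_length() for t, c in zip(target, candidate))


def most_similar_by_msb(target, candidates):
    """Return (index, max MSB) of the candidate whose difference set has the smallest maximum MSB.

    `candidates` maps a sample index to its sample data. Ties go to the smallest index.
    """
    return min(
        ((idx, max_msb(target, data)) for idx, data in candidates.items()),
        key=lambda item: (item[1], item[0]),
    )


sample_a = [1200, 48000, 310, 95]
candidates = {
    2: [1190, 48100, 300, 90],   # differences 10, -100, 10, 5 -> maximum MSB 7
    3: [1201, 47998, 312, 96],   # differences -1, 2, -2, -1   -> maximum MSB 2
}
index, msb = most_similar_by_msb(sample_a, candidates)
```

Here sample 3 is selected because every difference value against it fits in 2 bits plus a sign, whereas the differences against sample 2 need up to 7 bits.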
1.11.3.1.2 Second Similarity Determination Method
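As a second similarity determination method, a method of obtaining a difference that can be compressed efficiently with an entropy code, that is, a difference whose values have high appearance frequencies, is illustrated.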
In the second similarity determination method, a method of generating a difference between samples may be similar to the first similarity determination method. Therefore, detailed explanation of the method is omitted here.
As illustrated in
In
Next, in the second similarity determination method, appearance frequencies of the difference values #1 to #188 are specified for each of the difference AB and the difference AC. A total value of the specified appearance frequencies is calculated for each of the difference AB and the difference AC. A sample of sample data used to create a data set having a larger calculated total value is specified as a similar sample. In the example illustrated in
This is explained using another example. For example, when five samples A, B, C, X, and Y are present in a sample group and a similar sample of the sample A is found from the five samples after the sample X and the sample Y are determined as being similar, an appearance frequency specified from a difference value between the sample X and the sample Y is stored in the difference value appearance frequency management database 301. When a similar sample similar to the sample A is found in this state, a sum of appearance frequencies af1 to af188 of difference values in data sets of the respective differences AB, AC, AX, and AY is calculated. A sample of a data set having a largest total value is specified as a similar sample similar to the sample A.
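The second similarity determination method can be sketched in the same style. In this sketch a `collections.Counter` named `frequency_table` is a hypothetical stand-in for the difference value appearance frequency management database 301, four channels are used instead of 188, and the tie-breaking rule (smallest index) is an assumption that the text does not state for this method.

```python
from collections import Counter


def differences(target, candidate):
    return [t - c for t, c in zip(target, candidate)]


def most_similar_by_frequency(target, candidates, frequency_table):
    """Pick the candidate whose difference values have the largest total appearance frequency."""
    def total(idx):
        return sum(frequency_table[d] for d in differences(target, candidates[idx]))
    # The larger total wins; ties go to the smallest index (an assumption of this sketch).
    return min(candidates, key=lambda idx: (-total(idx), idx))


# Appearance frequencies already registered from an earlier pair (samples X and Y):
# difference values 1, 0, 1, -1.
frequency_table = Counter(differences([500, 20, 7, 7], [499, 20, 6, 8]))

sample_a = [1200, 310, 95, 40]
candidates = {
    2: [1199, 310, 94, 41],   # differences 1, 0, 1, -1   -> total 2 + 1 + 2 + 1 = 6
    3: [1190, 300, 90, 30],   # differences 10, 10, 5, 10 -> total 0
}
best = most_similar_by_frequency(sample_a, candidates, frequency_table)
frequency_table.update(differences(sample_a, candidates[best]))  # register the new difference values
```

Selecting the candidate with the largest total concentrates the emitted difference values on a few frequent values, which is the bias that an entropy code exploits.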
The two similarity determination methods are illustrated above. However, in the present embodiment, a similar sample does not always have to be determined. When the original data gives a more preferable value of the MSB or of the total appearance frequency than a data set of difference values, the original data may be directly used as the data to be compressed without calculating a difference. In this case, information indicating that the data in the data region R2 is the original data may be stored in the header region R1 instead of an index indicating a similar sample.
1.11.3.2 Selection Method for a Similar Sample
Next, a method of selecting a similar sample is explained. As a method of selecting a similar sample, for example, a method using general clustering and a method using a dictionary can be illustrated.
1.11.3.2.1 First Similar Sample Selection Method
The method using clustering illustrated as a first similar sample selection method is a method of selecting representative samples from representative points of clusters and representing samples with differences from the representative samples.
As illustrated in
In the first similar sample selection method, samples other than the representative samples are represented by differences from the representative samples. In the example illustrated in
1.11.3.2.2 Second Similar Sample Selection Method
The method using a dictionary exemplified as a second similar sample selection method is a method of constructing a dictionary while reading a sample group from the top and generating a difference using the dictionary.
In the second similar sample selection method, a dictionary in an initial state may be in an empty state, that is, in a state where nothing is registered. In the second similar sample selection method, as illustrated in
Next, as illustrated in
Next, as illustrated in
Next, as illustrated in
Thereafter, by repeatedly executing the same operation, differential data including a reference dictionary number in a header is finally generated for all samples as illustrated in
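The dictionary-based selection can be sketched as follows. The rule for deciding when a sample is registered in the dictionary is not recoverable from the text here, so this sketch assumes one: a sample is registered as a new entry when no existing entry gives a maximum MSB at or below `max_msb_threshold`. That parameter, the function names, and the use of dictionary number 0 for a newly registered sample are all hypothetical.

```python
def encode_with_dictionary(samples, max_msb_threshold=4):
    """Read samples from the top and emit (dictionary number, values) records.

    Dictionary number 0 means that no entry was referenced: the raw sample is stored
    and registered as a new entry. The threshold rule is an assumption of this sketch.
    """
    dictionary = []                      # entry k is referenced by dictionary number k + 1
    records = []
    for sample in samples:
        best_no, best_diff, best_msb = 0, None, None
        for no, entry in enumerate(dictionary, start=1):
            diff = [s - e for s, e in zip(sample, entry)]
            msb = max(abs(d).bit_length() for d in diff)
            if best_msb is None or msb < best_msb:
                best_no, best_diff, best_msb = no, diff, msb
        if best_msb is not None and best_msb <= max_msb_threshold:
            records.append((best_no, best_diff))
        else:
            dictionary.append(list(sample))
            records.append((0, list(sample)))
    return records


def decode_with_dictionary(records):
    """Rebuild the same dictionary while reading the records in order."""
    dictionary, samples = [], []
    for no, values in records:
        if no == 0:
            dictionary.append(list(values))
            samples.append(list(values))
        else:
            samples.append([e + d for e, d in zip(dictionary[no - 1], values)])
    return samples


samples = [[100, 200, 300], [101, 198, 303], [5000, 10, 10], [4999, 12, 9], [100, 200, 300]]
records = encode_with_dictionary(samples)
restored = decode_with_dictionary(records)
```

Because the dictionary starts empty and grows only from data that has already been read, the restoring side can reconstruct it from the records alone.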
1.12 Summary
As explained above, according to the present embodiment, since data (a sample group) to be compressed can be compressed according to characteristics of the data, it is possible to reduce a data transfer time or prevent an increase in the data transfer time and reduce storage cost necessary for saving of the data or prevent an increase in the storage cost.
For example, even when a sample group acquired from the multicolored next-generation flow cytometer 1 is transferred from the information processing apparatus 2 to the Cloud 3, it is possible to reduce a transfer time of the sample group or prevent an increase in the transfer time. By applying the data reducing method explained above to a sample group to be stored in the Cloud 3, it is also possible to achieve a reduction in storage cost necessary for saving of the sample group or prevent an increase in the storage cost.
Subsequently, a second embodiment of the present disclosure is explained. Note that, since configurations and operations of a flow cytometer and an information processing system according to the present embodiment may be similar to those of the embodiment explained above, detailed explanation thereof is omitted here.
2.1 Mutual Use of Similarity Information Obtained in Fluorescence Spectrum/Fluorescent Dye Information
The data to be compressed in the first embodiment is the fluorescence spectrum and/or the fluorescent dye information. Therefore, when both the fluorescence spectrum and the fluorescent dye information are compressed, it can be necessary to execute generation of differential data (equivalent to step S01 in
However, the fluorescence spectrum and the fluorescent dye information to be compressed are a fluorescence spectrum measured from the same sample group and fluorescent dye information generated from that fluorescence spectrum. Therefore, samples determined to have high similarity in the fluorescence spectrum are extremely likely to be determined to have high similarity in the fluorescent dye information as well. This is because, although the number of dimensions differs between the fluorescence spectrum and the fluorescent dye information, the types of samples represented by the fluorescence spectrum and the fluorescent dye information are the same.
Under such conditions, it is considered that information concerning similarity (hereinafter referred to as similarity information) obtained in generation of differential data (S01) in data compression of one of the fluorescence spectrum and the fluorescent dye information can be used in data compression of the other (mutual use of similarity information).
Therefore, in the present embodiment, a result (similarity information) obtained in similar sample determination processing in the generation of one differential data (S01) in the compression processing of each of the fluorescence spectrum and the fluorescent dye information is used in the generation of the other differential data (S01) to omit similar sample determination processing in the generation of the other differential data (S01). Consequently, since the other compression processing is accelerated, the entire compression processing can be accelerated.
The mutual use of the similarity information can be realized, for example, by managing similarity information (information indicating which sample is similar) for each of samples generated in a process of one compression processing in a database or the like and referring to the similarity information managed in the database or the like in the other compression processing.
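The mutual use of the similarity information can be sketched as follows. A plain dictionary named `info` stands in for the database, and a squared Euclidean distance is used for the similarity determination. Restricting the search to earlier samples is an assumption of this sketch that keeps restoration possible, and the function names are hypothetical.

```python
def build_similarity_info(spectra):
    """Map each sample position to the position of its most similar earlier sample.

    The first sample has no earlier sample and maps to None.
    """
    info = {}
    for i, a in enumerate(spectra):
        best = None
        for j in range(i):
            dist = sum((x - y) ** 2 for x, y in zip(a, spectra[j]))
            if best is None or dist < best[0]:
                best = (dist, j)
        info[i] = None if best is None else best[1]
    return info


def differences_from_info(data, info):
    """Generate differential data for `data` by reusing similarity information from the other data."""
    out = []
    for i, row in enumerate(data):
        ref = info[i]
        out.append(list(row) if ref is None else [x - y for x, y in zip(row, data[ref])])
    return out


spectra = [[100, 200, 300, 50], [900, 20, 10, 700], [101, 199, 302, 51], [898, 22, 10, 699]]
dye_info = [[10, 1], [2, 30], [10, 2], [2, 29]]   # fewer dimensions, same samples

info = build_similarity_info(spectra)              # similar sample determination runs once
spectrum_diffs = differences_from_info(spectra, info)
dye_diffs = differences_from_info(dye_info, info)  # no second similarity search
```

The similar sample determination, which compares every sample with the preceding samples, is executed only for the fluorescence spectrum. The fluorescent dye information reuses the result with a simple lookup.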
Other configurations, operations, and effects may be similar to those in the embodiments explained above. Therefore, detailed explanation thereof is omitted here.
Subsequently, a third embodiment of the present disclosure is explained. Note that, since configurations and operations of a flow cytometer and an information processing system according to the present embodiment may be similar to those of the embodiment explained above, detailed explanation thereof is omitted here.
3.1 Acceleration of Cloud Transfer by Division Compression/Decoding
As illustrated in (a) of
On the other hand, as illustrated in (b) of
The information processing apparatus 2 executes the compression in units of blocks and transfers (transmits→receives) the compressed data to the Cloud 3 in order from the block for which the compression is completed. The Cloud 3 sequentially restores compressed data received in units of blocks from the information processing apparatus 2.
With such pipelining, compression processing (for example, compression #2 and #3) for the next block is hidden behind transfer processing (for example, transmission #1 and #2 and reception #1) of the preceding block and restoration processing (for example, restoration #1 and #2) for the preceding block is hidden behind transfer processing (for example, transmission #3 and reception #2 and #3) for the next block. Therefore, a processing time from the compression to the restoration for all the sample data can be greatly reduced.
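The overlap of the three stages can be sketched with threads and queues. In this sketch `zlib` stands in for the compression of the present disclosure, the transfer stage only passes data between queues instead of using a network, and the function name `run_pipeline` is hypothetical.

```python
import queue
import threading
import zlib


def run_pipeline(blocks):
    """Compress, transfer, and restore blocks in three overlapping stages."""
    to_transfer, to_restore = queue.Queue(), queue.Queue()
    restored = []

    def compress_stage():
        for block in blocks:
            to_transfer.put(zlib.compress(block))   # zlib is a stand-in compressor
        to_transfer.put(None)                        # end-of-stream marker

    def transfer_stage():
        while (item := to_transfer.get()) is not None:
            to_restore.put(item)                     # a real system would send this over a network
        to_restore.put(None)

    def restore_stage():
        while (item := to_restore.get()) is not None:
            restored.append(zlib.decompress(item))

    threads = [threading.Thread(target=f) for f in (compress_stage, transfer_stage, restore_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return restored


blocks = [bytes([i]) * 1000 for i in range(5)]       # five blocks of sample data
result = run_pipeline(blocks)
```

Each stage starts on a block as soon as the preceding stage hands it over, so the compression of the next block proceeds while the preceding block is being transferred and restored. The first-in, first-out queues preserve the block order.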
Note that, when data to be compressed is divided into smaller block units, there is a possibility that the data reduction ratio decreases. However, when the number of samples is several tens of thousands to twenty million or more, as in the sample data illustrated in the present embodiment, whereas the number of types of samples is approximately several hundred, a sufficient data reduction ratio can be realized in the blocks even if the sample group is divided into approximately several thousand to several hundred thousand blocks.
Other configurations, operations, and effects may be similar to those in the embodiments explained above. Therefore, detailed explanation thereof is omitted here.
The preferred embodiments of the present disclosure are explained in detail above with reference to the accompanying drawings. However, the technical scope of the present disclosure is not limited to such examples. It is evident that those having ordinary knowledge in the technical field of the present disclosure can arrive at various alterations or corrections within the scope of the technical idea described in the claims. It is understood that these alterations and corrections naturally belong to the technical scope of the present disclosure.
The effects described in this specification are only explanatory or illustrative and are not limiting. That is, the technique according to the present disclosure can achieve other effects obvious to those skilled in the art from the description of this specification together with or instead of the effects.
Note that the following configurations also belong to the technical scope of the present disclosure.
(1)
An information processing system comprising:
an excitation light source that irradiates a respective plurality of samples belonging to a sample group with excitation light;
a measurement unit that measures fluorescence generated by irradiation of the sample with the excitation light; and
an information processing unit that generates differential data based on a difference between similar fluorescence signals among fluorescence signals based on the fluorescence measured for the respective samples.
(2)
The information processing system according to (1), wherein
the information processing unit sets, as the similar fluorescence signal, a combination having a smallest calculated difference among combinations of two fluorescence signals selected from the plurality of fluorescence signals.
(3)
The information processing system according to (1) or (2), wherein
the fluorescence signal includes a plurality of dimensions, and
the information processing unit sets, as the similar fluorescence signal, a combination having a smallest maximum value of a difference calculated between corresponding dimensions among combinations of two fluorescence signals selected from the plurality of fluorescence signals.
(4)
The information processing system according to any one of (1) to (3), wherein
the information processing unit sets, as the similar fluorescence signal, a combination having a highest appearance frequency of a calculated difference among combinations of two fluorescence signals selected from the plurality of fluorescence signals.
(5)
The information processing system according to any one of (1) to (4), wherein
the fluorescence signal includes a plurality of dimensions, and
the information processing unit sets, as the similar fluorescence signal, a combination having a largest total of appearance frequencies of differences calculated between corresponding dimensions among combinations of two fluorescence signals selected from the plurality of fluorescence signals.
(6)
The information processing system according to any one of (1) to (5), wherein
the information processing unit specifies the similar fluorescence signal using at least one of a Euclidean distance and cosine similarity.
(7)
The information processing system according to any one of (1) to (6), wherein
the differential data includes first information for specifying a combination of the similar fluorescence signals used to calculate the difference.
(8)
The information processing system according to (7), wherein,
when a fluorescence signal similar to a first fluorescence signal among the plurality of fluorescence signals is absent in the sample group, the differential data includes predetermined second information instead of the first information.
(9)
The information processing system according to any one of (1) to (8), wherein
the information processing unit generates compressed data by compressing the differential data.
(10)
The information processing system according to (9), wherein
the information processing unit compresses the differential data using a reversible compression method.
(11)
The information processing system according to (9) or (10), wherein
the information processing unit compresses the differential data using at least one of a method of compressing by reducing unnecessary bit representation, a lexicographic compression method, a compression method using an entropy code, and a compression method using statistical prediction.
(12)
The information processing system according to (10), wherein
the differential data includes information for specifying a most significant bit of the difference, and
the information processing unit compresses the differential data using the reversible compression method including a method of compressing by reducing unnecessary bit representation.
(13)
The information processing system according to any one of (1) to (12), wherein
the fluorescence signal includes first spectrum information of light generated by irradiating the sample with light.
(14)
The information processing system according to any one of (1) to (13), wherein
the fluorescence signal includes fluorescent dye information of a fluorescent dye obtained from spectrum information of light generated by irradiating the sample labeled with the fluorescent dye with excitation light.
(15)
The information processing system according to any one of (1) to (14), wherein
the fluorescence signal includes spectrum information of light generated by irradiating the sample labeled with a fluorescent dye with excitation light and fluorescent dye information of the fluorescent dye obtained from the spectrum information, and
the information processing unit specifies the similar fluorescent dye information based on a combination of the samples of the respective similar spectrum information specified when a difference between the similar spectrum information is calculated and calculates a difference between the specified similar fluorescent dye information.
(16)
The information processing system according to any one of (9) to (12), further comprising
a transmission unit that transmits the compressed data generated by the information processing unit via a predetermined network.
(17)
The information processing system according to any one of (9) to (12), further comprising
a storage unit that stores the compressed data generated by the information processing unit.
(18)
The information processing system according to any one of (1) to (17), further comprising:
a decompression unit that decompresses the compressed data of the difference generated by the information processing unit; and
a restoration unit that restores the plurality of fluorescence signals based on the difference decompressed by the decompression unit.
(19)
An information processing apparatus comprising:
a difference calculation unit that calculates a difference between similar fluorescence signals among fluorescence signals based on fluorescence generated by irradiating a respective plurality of samples belonging to a sample group with excitation light; and
a compression unit that compresses the difference.
(20)
An information processing method comprising:
calculating a difference between similar fluorescence signals among fluorescence signals based on fluorescence generated by irradiating a respective plurality of samples belonging to a sample group with excitation light; and
compressing the difference.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2020-056230 | Mar 2020 | JP | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2021/006046 | 2/18/2021 | WO | |