The invention relates to a method for the storage ensuring fast retrieval and efficient transfer of interrelated, high-volume, mainly 3D, information, whereby the input information is assigned to two files, so that image data information required for histological reconstruction is stored in one file and information regarding predetermined parameters of the image data is stored in the other file. Another subject of the present invention is a computer program product containing a computer-readable storage medium including a computer-readable programme code.
Analysis of biopsy specimens is of prime importance in medical diagnostics. Histological examinations are performed by removing a tissue sample from the body of the patient and cutting thin sections from it. The sections are placed on a glass plate, the so-called (microscope) slide, and then stained and subjected to microscopic examination. It is a disadvantage of this method that, of the 3D tissue, only a layer with a thickness of 2 to 5 μm can be examined. As for the known 3D medical diagnostic instruments, examinations by computed tomography (CT) and magnetic resonance imaging (MRI) have the disadvantage of offering a much lower resolution than the microscope, and the special staining methods of histological examinations cannot be used with them either.
Instead of cutting only a few sections from the tissue sample, the entire sample or a significant part of it is subjected to histotomic sectioning. The known methods involve the fast digitalisation of all biopsy specimens and a reconstruction of the 3D image from the sections. The methods known to those skilled in the art furthermore allow the reconstructed samples to be viewed on networked workstations over an intranet or the Internet.
The useful (net) surface area of a slide is 25×50 mm. At a resolution of 0.3 μm per pixel, which could be called the standard value, this means approximately 83 000×166 000 pixels. The systems in use to date are not prepared to display, manipulate and transmit images of this size. If each pixel of such an image is represented by 3 bytes, as is standard practice, then even at a compression ratio of 1:10 the resulting file would still be 3.8 GB.
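The estimate above can be checked with a few lines of arithmetic. The following is only a sketch: it takes the figures quoted in the text (slide area, resolution, 3 bytes per pixel, 1:10 compression) as inputs, assumes that the gigabyte is counted as 2^30 bytes, and uses illustrative function names that do not come from the description.

```python
# Back-of-the-envelope check of the slide size quoted in the text.
# Function names are illustrative; GB is assumed to mean 2**30 bytes.

def pixels_across(length_mm, pixel_um=0.3):
    """Number of pixels needed to cover length_mm at pixel_um per pixel."""
    return round(length_mm * 1000 / pixel_um)

def compressed_size_gb(width_px, height_px, bytes_per_pixel=3, ratio=10):
    """Size of the image after compression at 1:ratio, in units of 2**30 bytes."""
    raw_bytes = width_px * height_px * bytes_per_pixel
    return raw_bytes / ratio / 2**30

if __name__ == "__main__":
    # 25 mm and 50 mm at 0.3 um per pixel
    print(pixels_across(25), pixels_across(50))
    # The rounded pixel counts used in the text
    print(f"{compressed_size_gb(83_000, 166_000):.1f} GB")
```

The exact pixel counts are slightly above the rounded values used in the text, so the result should be read as an order-of-magnitude figure.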
Another problem is that doctors use objectives of different magnifications to examine tissue samples. The digital simulation of the use of the microscope requires the down-scaling of images made with high-resolution lenses. For example, in order to display an image made with an objective of 20 mm as one made with an objective of 4 mm, it must be reduced in the x and y directions by a factor of five, that is, the entire system must handle a file that is 25 times bigger than the one actually needed. In Internet-based use, this may cause a slowdown that is unacceptable for the user. The digitized slide may also be used as an input for automatic measurement algorithms. Such algorithms can perform a series of measurements that are hardly feasible, or not feasible at all, by traditional methods. The relevant measurement results, however, must also be displayed on the screen, and this raises the same problems as displaying the images themselves. Even if the system displays only a small section of the slide, it must still manage the entire data set. An alternative solution is for the system to select the measurement data associated with the given segment, but this is a slow process requiring high-capacity machines.
Most image formats store a maximum of three colour channels. The known microscopes, on the other hand, allow the fitting of fluorescent filters over the lens, and one sample can be digitized at the corresponding number of wavelengths, which increases the resulting data quantity even further.
U.S. Pat. No. 6,272,235 describes a method and equipment for the purpose of creating a virtual microscope slide. The method consists of the digitalisation of the slide, that is, its reading into the memory and, subsequently, the arrangement of the input frames so that said frames can be viewed, firstly, conveniently, even without microscope, and, secondly, they can be transferred easily to one or several remote locations, to be viewed there. The method consists of making several original microscope images of the slide, digitizing them at small magnification, and then storing said digitized frames so that they be inter-linked in a standardised way and hence produce a full-scale virtual image of the slide read at small magnification. Furthermore, several original microscope images are made at higher magnifications, too, which are then digitized and stored again so that the digitized images make up a single, continuous, coherent image and hence show the high-resolution virtual macro image of the slide. In connection with these virtual, digitized, macro and micro images, the method also creates a data structure containing the mapping co-ordinates of the individual images as well. In order to be able to view the images that have been read, digitized, stored and occasionally also compiled, the solution also proposes a general browsing/viewing programme in the data structure allowing remote users to display the images compiled of the frames and to manipulate, if need be, the image displayed on their own, remote, screen. 
In order to ensure the reasonable use of the imaging and data structure outlined above away from their place of origin, by transfer through data transmission and communication channels, often characterised by low band-widths, to remote locations, the data quantity must inevitably be compressed to a large extent which is a “lossy” process in most cases, although such data loss is contrary to the demand that the transferred images be suitable for detailed analysis at remote locations if needed. The interactive general browser developed by this solution is partly meant to fulfil this task, allowing as it does the person viewing the frames to navigate up and down, left and right, and look at those areas which are of interest to him/her. One typical feature of the browser is the marker provided for the end-user in the macro image, telling the exact location of the actual micro image, and helping the user select the areas of interest to him/her from the macro image and then subjecting them to more detailed examination through higher-resolution digitized images.
Despite the advantage of reducing the storage capacity demand outlined in the introduction, this solution has the deficiency that navigation in the images created this way is still time-consuming, a phenomenon manifesting itself not only at remote sites, but occasionally even at the server used to store the data concerned.
Therefore, an objective of the present invention is to work out a method whereby, while retaining the system- and user-specific advantages of the solutions known so far, it is possible to ensure fast and reliable navigation in a very large database, whether it is used on-site or remotely, e.g. through the Internet, so that data storage, retrieval and display should not require the costly upgrading of the available instrument pool or the deployment of new equipment, a costly move in any case.
The present invention is based on the recognition that, in addition to the data sets known so far, another, auxiliary, data set must be compiled, exclusively for the purpose of promoting the tracking and identification of data segments selected at random, and offering users a search facility in a transparent and easily manageable form.
The preferred embodiments of the proposed method are described under the Claims below.
Our proposed solution to the objective set above is based on a method for the storage ensuring fast retrieval and efficient transfer of interrelated, high-volume, mainly 3D, information, whereby the input information is assigned to two files, so that image data information required for histological reconstruction is stored in one file, and information regarding predetermined parameters of the image data is stored in the other file, whereas in the course of the storing of the received information, associated information regarding the hierarchical relationship of individual image data is also stored in a further file.
According to one advantageous implementation of the proposed method, digitized slides are used as the information.
It is also advantageous according to the present proposal to store the files of each digitized slide in a storage unit linked to the digitized slide.
It is furthermore advantageous according to the present proposal that a directory is used in each case as storage unit.
According to another advantageous implementation of the proposed method the digitized slides are being stored in separate storage units.
According to another advantageous implementation of the proposed method a directory is used in each case as storage unit.
It is furthermore advantageous according to the present proposal that information on versions is also stored in the files.
Our proposed solution to the objective set above is based further on a computer program product for the storage ensuring fast retrieval and efficient transfer of interrelated, high-volume, mainly 3D, information, whereby the input information is assigned to two files, so that image data information required for histological reconstruction is stored in one file, and information regarding predetermined parameters of the image data is stored in the other file, containing a computer-readable storage medium including a computer-readable programme code, further a software tool to store associated information regarding the hierarchical relationship of individual image data, while storing input information.
In the following, the proposed solution will be described in more detail with reference to the attached drawings showing an exemplary embodiment of the data structure created by the proposed method, whereas
A data storage format developed specially for this task is used to solve the relevant problems.
The term “channel” means either a fluorescent channel, defined by the fluorescent filter block used to record it, or the channel recorded with traditional through-light illumination.
Given the fact that each digitized slide is associated with several files, each digitized slide is stored in a separate directory of the storage system.
The principle of storage is as follows. One file always contains the fields of view associated with a given channel and magnification. To speed up display, each field of view is already reduced to several images of smaller magnification in the digitalisation phase, by repeatedly halving the image in the X and Y directions. For example, if an image of 1024×768 pixels is recorded at 20-fold magnification of a given field of view, the reduced images consisting of 512×384, 256×192, 128×96, 64×48, 32×24, 16×12, 8×6 and 4×3 pixels, respectively, are generated as well. Images smaller than that would have fractional pixel dimensions, and, in any case, this minimum magnification is sufficient from the point of view of the application, too. The images of each magnification are stored in separate files. One file includes not just one image, but as many of the images of the given size belonging to the given channel as possible. Every file has a pre-defined upper size limit. If no more images fit into the file, a new file is opened and the remaining images are written into that. It is expedient to define the upper limit of the files so that they fit a data carrier of some kind. If, for example, digitized slides are to be archived on CD, and one digitized slide is bigger than the storage capacity of a CD, no problem will be encountered if the maximum limit is set at the maximum storage capacity of the CD. Certain hard-disk-based file systems also have upper file size limits, hence it will be easy to match this solution to these, too.
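The repeated halving described above can be illustrated with a short sketch. It assumes that halving stops as soon as a further step would give a fractional pixel count in either direction, as the text indicates for the 4×3 image; the function name is hypothetical.

```python
# Sketch of the reduction chain for one field of view.
# Assumption: halving stops before either dimension would become fractional.

def reduction_chain(width, height):
    """Return the list of (width, height) sizes produced by repeated halving."""
    sizes = []
    while width % 2 == 0 and height % 2 == 0:
        width, height = width // 2, height // 2
        sizes.append((width, height))
    return sizes

if __name__ == "__main__":
    for w, h in reduction_chain(1024, 768):
        print(f"{w}x{h}")
```

For a 1024×768 field of view the chain ends at 4×3, because halving the height of 3 pixels once more would no longer give a whole number of pixels.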
Faster display is again the main reason why images of different magnification are assigned to separate files. Generally, at a given moment, the end-user will view the slide on the screen using a single magnification only. Thanks to the sequential recording of the fields of view, images associated with juxtaposed fields of view will be sequential in the file, too, allowing optimum management of the files and their display. If images of all magnifications associated with a given channel were stored in one file, then, for example, to display a small magnification, instead of sequential reading, after each field of view, the operating system and a hard disk serving as storage medium would have to find in the file the next field of view. From another aspect, given the fact that the size of the fields of view diminishes exponentially, the use of an infinitesimal part of the file would require moving about in the entire file.
A natural approach adopted in similar systems is to store one image in one file, as in the case of digital photo cameras. This method has the drawback that, smaller magnifications included, tens of thousands of files are needed for storing a single slide, and the management of these is not optimal from any point of view.
The images can be stored in the file in any format whatsoever. In the current preferred embodiment of the method, JPEG and JPEG2000 formats can be used for the purpose of lossy compression, and TIFF, BMP, PNG or JPEG2000 formats for the purpose of lossless storage.
Current IT systems can hardly manage images of several gigabytes, if at all, and their operation is not optimal in the case of very small images either. For example, the header of the commonly used image formats, which describes the features of the image in the file, is usually much bigger than the image information content of an image of 4×3 pixels. For an image to be displayed on the screen, the relevant application must issue a system call. While the operating system can display an image of 1024×768 pixels in one step, in order to fill the same image area with images of 4×3 pixels each, the image display function must be called 65 536 times, which makes the process much slower. Therefore, in the course of digitalisation, the image of an optical field is not only reduced, but several small images are also joined into one image. To stick to the previous example: the original image of 1024×768 pixels is treated as one image, while for smaller magnifications, four of the 512×384 pixel ones are joined and treated as one image, and so on.
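The call count quoted above, and the number of reduced images that have to be joined at each level, follow from simple arithmetic. The sketch below assumes that joined images are always assembled back to the size of the original 1024×768 field of view; the function names are illustrative only.

```python
# Sketch: how many display calls are needed to fill a screen area with tiles,
# and how many reduced images are joined per level.
# Assumption: joined images are assembled back to the original field size.

def display_calls(area=(1024, 768), tile=(4, 3)):
    """Number of tiles (one display call each) needed to cover the area."""
    columns = area[0] // tile[0]
    rows = area[1] // tile[1]
    return columns * rows

def images_joined(level):
    """Reduced images joined into one image after `level` halvings in X and Y."""
    return 4 ** level

if __name__ == "__main__":
    print(display_calls())                  # 4x3 tiles on a 1024x768 area
    print(display_calls(tile=(1024, 768)))  # one full-size image
    print(images_joined(1))                 # 512x384 images per joined image
```

Joining the small images keeps the number of display calls per screen roughly constant across magnifications.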
The slide format stores the graphic outcomes of automatic evaluations in the form of a new channel in every respect. Graphic outcome may mean, for example, that the application automatically identifies cells and glands in the tissue sample, and creates a mask for these. A gland mask, for example, will cover the given gland completely in the image.
Numeric evaluation data are also stored by field of view. If the evaluation application, for example, measured the surface of every gland in square micrometer, this data set will be distributed so that data associated with one field of view form one group, and this data group will be entered in the file as one unit, as if it were one image. As reduction cannot be interpreted for numeric data in the same way as for images, channels containing such data will consist of a single layer.
As compressed images may differ in size and hence it is impossible to calculate in advance where exactly an image associated with a given view-field will be located in the file, to ensure fast image access, an index file is created, specifying for an image of a given magnification in a given channel of a given view-field the byte it starts from within the file and its length in bytes.
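The role of the index can be sketched as follows. This is a minimal in-memory illustration, not the file layout of the method: data files are modelled as byte arrays, the key (channel, magnification level, field of view) and all names are hypothetical, and a new "file" is opened whenever the next image would exceed an assumed size limit.

```python
# Sketch of an index mapping each image to (file number, start byte, length).
# Data files are modelled as bytearrays; names and key layout are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    file_no: int  # which data file holds the image
    offset: int   # first byte of the image within that file
    length: int   # size of the compressed image in bytes

def pack_images(images, max_file_size):
    """Append variable-length images to data files in recording order."""
    files = [bytearray()]
    index = {}
    for key, blob in images:
        if len(blob) > max_file_size:
            raise ValueError("image larger than the file size limit")
        if len(files[-1]) + len(blob) > max_file_size:
            files.append(bytearray())  # current file is full: open a new one
        index[key] = IndexEntry(len(files) - 1, len(files[-1]), len(blob))
        files[-1] += blob
    return files, index

def read_image(files, index, key):
    """Fetch one image directly, without scanning the data file."""
    entry = index[key]
    return bytes(files[entry.file_no][entry.offset:entry.offset + entry.length])

if __name__ == "__main__":
    demo = [(("ch0", 0, 0), b"aaaa"), (("ch0", 0, 1), b"bb"), (("ch0", 0, 2), b"ccccc")]
    files, index = pack_images(demo, max_file_size=6)
    print(index[("ch0", 0, 2)], read_image(files, index, ("ch0", 0, 2)))
```

Because the start byte and length are stored for every image, a viewer can seek straight to the image of a given field of view even though compressed sizes vary.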
In what follows, we shall describe a preferred embodiment of the files storing the digitized slide data created by the method according to the present invention. The method creates three types of files:
1) a central description file designed to include all information associated with the digitized slide or links pointing to the files including them;
2) a table file describing the hierarchy of the frames making up the digitized slide; and
3) a data file containing the image data of the frames making up the digitized slide.
In what follows, we shall describe exemplary structures of the above three file types in more detail.
Structure of a Data File
1) The inner structure of the header of the data file outlined in
2) The image data of the data file outlined in
Structure of a Table File
1) The inner structure of the header of the table file outlined in
2) Table of hierarchical (tree-structured) layers:
The table includes the pointers to the images in a two-level hierarchy. At the upper level, there is a multi-dimensional table (hierarchy table) including pointers pointing to the two-dimension tables (layer-level table) at the lower level.
Hence in the example shown here, the size of the entire record shown in
3) Table of non-hierarchical layers:
The table includes the pointers to the images in a two-level hierarchy. At the upper level, there is an array (hierarchy table) including pointers pointing to the arrays (layer-level tables) at the lower level.
Central Description File:
The function of this file is to store all sorts of information relating to the digitized slide. The file often includes links only to other files which actually store the data. The general rule in this respect is that this file contains exclusively a description of the physical properties of the digitized slide, whereas image data and any related other data (e.g. data of the patient whose digitized slide it is etc.) are only recorded in the file in the form of links.
As for the structure of the file, it is an INI file. Its structure includes the following sections:
1) General physical information (GENERAL section): the majority of the physical data obtained from the digitized slide is to be found here, including the following:
IMAGENUMBER_X, IMAGENUMBER_Y: express the extension of the levels, measured in frames, at each level of the digitized slide (this extension refers exclusively to the hierarchical layers; non-hierarchical layers have individually definable and alterable sizes).
2) Table file hierarchy information (HIERARCHICAL): this section contains the description of the hierarchical structure of the digitized slide.
Characteristic features of layers described by HIER . . . parameters:
These structures include the number of elements in each dimension of an array as well as the names of the physical parameters stored in the said dimensions in the layers. Currently, the following physical parameters are conceivable:
At the given parameter levels, each parameter value may be associated with a separate section including the parameters relating to the given parameter values of the given digitized slide level. A good example is the parameter SLIDE ZOOM LEVEL: every one of its possible values is associated with a set of values of the following physical features:
Characteristic features of layers described by NONHIER . . . parameters:
The layers described this way are fully independent of one another; each layer defines the number and name of levels to be found in it. The parameters of the individual levels, on the other hand, specify the extension of the levels (in the directions X and Y). Currently, the following non-hierarchical layers are conceivable:
At the given parameter levels, each parameter value may be associated with a separate section including the parameters relating to the given parameter values of the given digitized slide level. The extension of the dimensions of the hierarchy tables is defined in the table file by the hierarchy and, within that, the extension of the dimensions, recorded in this section.
3) Data file information (‘DATAFILE’): this section includes the number and name of data files containing image data:
4) Link information (‘LINKS’): this section includes links to attached information not to be detailed here:
5) Further sections, and parameters within them, the name of which is defined by the records provided in the HIERARCHICAL section, to be described in more detail there.
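Since the central description file is an INI file, its sections can be read with any standard INI parser. The sketch below is illustrative only: the section names and the two IMAGENUMBER keys are taken from the description, whereas the numeric values are invented and the contents of the other sections are deliberately left empty because they are not detailed here.

```python
# Sketch: reading a central description file with Python's configparser.
# Section names and IMAGENUMBER_* keys come from the description;
# the values 40 and 30 are made-up placeholders.
import configparser

EXAMPLE = """\
[GENERAL]
; extension of the levels, measured in frames (placeholder values)
IMAGENUMBER_X = 40
IMAGENUMBER_Y = 30

[HIERARCHICAL]

[DATAFILE]

[LINKS]
"""

def load_description(text):
    """Parse the INI text and return the parser object."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names in their original case
    parser.read_string(text)
    return parser

if __name__ == "__main__":
    description = load_description(EXAMPLE)
    print(description.sections())
    print(description.getint("GENERAL", "IMAGENUMBER_X"),
          description.getint("GENERAL", "IMAGENUMBER_Y"))
```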
The following table shows an exemplary embodiment of the addressing of the hierarchy table of the hierarchical layer levels.
The dec numeric value can be used to address the hierarchy table of the table file:
dec = idx0 + n0*idx1 + n0*n1*idx2 + . . . + n0*n1* . . . *n_(HIER_COUNT-1)*idx_HIER_COUNT
If idx_i is not used, then d_i is used instead.
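Read as a mixed-radix address, in which each index is weighted by the product of the sizes of all preceding dimensions, the calculation can be sketched as follows. This reading, the interpretation of d_i as a default index for an unused dimension, and all function and parameter names are assumptions made for illustration.

```python
# Sketch of the hierarchy table addressing as a mixed-radix number.
# Assumptions: sizes[i] is n_i, indices[i] is idx_i (None if not used),
# and defaults[i] is d_i, the value substituted for an unused index.

def hierarchy_address(indices, sizes, defaults=None):
    """Flatten a multi-dimensional index into a single table address."""
    address = 0
    weight = 1  # product of the sizes of all preceding dimensions
    for i, size in enumerate(sizes):
        idx = indices[i]
        if idx is None:
            if defaults is None:
                raise ValueError(f"no default given for unused index {i}")
            idx = defaults[i]
        if not 0 <= idx < size:
            raise IndexError(f"index {i} out of range")
        address += weight * idx
        weight *= size
    return address

if __name__ == "__main__":
    # Three dimensions with 3, 4 and 5 elements
    print(hierarchy_address((2, 1, 3), sizes=(3, 4, 5)))
    # Second index unused, so its default is taken instead
    print(hierarchy_address((2, None, 3), sizes=(3, 4, 5), defaults=(0, 1, 0)))
```

With this weighting every combination of indices maps to a distinct address between 0 and the product of all dimension sizes minus one.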
The following table on the next page shows a specific exemplary embodiment of the central description file modelled on the structure outlined above:
Number | Date | Country | Kind |
---|---|---|---|
P0401870 | Sep 2004 | HU | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2005/050344 | 1/27/2005 | WO | 00 | 11/7/2007 |