1. Field
Embodiments of the present invention are related to classifying data, for example into categories or classes. More specifically, certain embodiments relate to classifying data, such as events from biological sample analyzers including flow cytometer instruments, based on thresholds or gates. Certain embodiments apply at least partially parallel processing to large amounts of data, including captured data such as captured flow cytometry data.
2. Related Art
As hardware capabilities increase, researchers, statisticians, diagnosticians, clinicians, and others are demanding more sophisticated applications software that processes larger and larger amounts of data as quickly as possible. For example, users may interact with multidimensional graphs showing terabytes of data to aid data analysis. These users demand rapidly responding user interfaces and fast data displays because slow response times hinder data analysis speed and productivity.
In a specific example of a system which generates large amounts of data, consider a biological sample analyzer, such as a flow cytometer instrument. Flow cytometers are widely used in clinical and research settings. A biological mixture may comprise a fluid medium carrying a biological sample such as a plurality of discrete biological particles, e.g., cells, suspended therein. Biological samples can include blood samples or other types of samples having a heterogeneous population of cells. Information obtained from the biological particles is often used for clinical diagnostics and/or data analyses.
Flow cytometry is a technology used to simultaneously measure and analyze multiple parameters (e.g., physical characteristics or dimensions) of particles, such as cells. Parameters (e.g., characteristics, properties, and dimensions) measurable by flow cytometry include cellular size, granularity, internal complexity, fluorescence intensity, and other features. Some parameters may be measurable after adding a marker. For example, fluorochrome-conjugated antibodies may emit photons of light in an identifiable spectrum upon excitation of the fluorochrome. Detectors are used to detect forward scatter, side scatter, fluorescence, etc. in order to measure various cellular properties. Cellular parameters identified by flow cytometer instruments can then be used to analyze, identify, and/or sort cells.
In traditional flow cytometry systems, a flow cytometer instrument is a hardware device used to pass a plurality of cells singularly through a beam of radiation formed by a light source, such as a laser beam. A flow cytometer instrument captures light that emerges from interaction(s) with each of the plurality of cells as each cell passes through the beam of radiation.
Currently available flow cytometry systems may include three main systems, i.e., a fluidic system, an optical system, and an electronics system. The fluidic system may be used to transport the particles in a fluid stream past the laser beam. The optical system may include the laser that illuminates the individual particles in the fluid stream, optical filters that filter the light before or after interacting with the fluid stream, and detectors (e.g., having photomultiplier tubes) that detect the light beam after the light passes through the fluid stream to detect, for example, fluorescence and/or scatter. The electronics system may be used to process the signals generated by the photomultiplier tubes or other detectors, convert those signals, if necessary, into digital form, store the digital signal and/or other identification information for the cells, and generate control signals for controlling the sorting of particles. The data point having the parameters corresponding to the measurement of one cell or other particle is termed an “event.” In traditional flow cytometry systems, a computer system converts signals received from detectors such as light detectors into digital data that is analyzed.
Flow cytometry systems capture large numbers of events by passing thousands of cells per second through the laser beam. Captured flow cytometry data is stored so that statistical analysis can subsequently be performed on the data. Typically, flow cytometers operate at high speeds and collect large amounts of data. Statistical analysis of the data can be performed by a computer system running software that generates reports on the characteristics (i.e., dimensions) of the cells, such as cellular size, complexity, phenotype, and health. Polychromatic flow cytometry refers to methods to analyze and display complex multi-parameter data from a flow cytometer instrument. Polychromatic flow cytometry data may include many parameters. Many conventional flow cytometry systems depict this data as a series of graphs such as dot plots, tree plots, and/or histograms to aid operator analysis of the data.
In the case of histograms and tree plots, each event may be classified or “classed” according to certain attributes of the event. Because of the large number of events typically processed, the classification process may take a significant amount of time, slowing analysis and frustrating users.
Accordingly, what are needed are methods and systems that allow for the rapid classification of data.
Methods, systems, and computer program products for classifying data using a collision-free hash table are disclosed. In an embodiment, a respective category index is determined for each of a plurality of categories. A respective class counter is generated for each of the plurality of categories based on the respective category index. Respective event indices are determined, substantially simultaneously in parallel, for each of a plurality of events associated with captured data based on respective first event values. Selected ones of the respective class counters are incremented, substantially simultaneously in parallel, based on the respective event indices.
In another embodiment, an apparatus includes a first memory, a second memory, and a plurality of processors configured to share the second memory. In one example, the first and second memory may be partitioned portions of a single memory device. Each processor is further configured to control the display of captured data by determining a respective category index for each of a plurality of categories, generating a respective class counter for each of the plurality of categories based on the respective category index, determining, substantially simultaneously in parallel, a respective event index for each of a plurality of events associated with captured data based on respective first event values; and incrementing, substantially simultaneously in parallel, selected ones of the respective class counters based on the respective event indices.
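For illustration only, the classifying operations summarized above may be sketched in Python. The category values, events, and accessor below are hypothetical, and the serial per-event loop stands in for the substantially simultaneous parallel processing described above:

```python
def classify(events, category_values, value_of):
    """Sketch of the summarized method: determine a category index for
    each category, generate one class counter per index, then compute
    each event's index from its event value and increment the selected
    counter. The index mapping stands in for a collision-free table."""
    index_of = {v: i for i, v in enumerate(category_values)}  # category indices
    counters = [0] * len(category_values)                     # class counters
    for event in events:   # in an embodiment, one parallel thread per event
        counters[index_of[value_of(event)]] += 1
    return counters

# Hypothetical events, each carrying a precomputed gate value.
counts = classify([{"gate": "A+"}, {"gate": "A-"}, {"gate": "A+"}],
                  ["A+", "A-"], lambda e: e["gate"])
```

In this sketch the per-event increments are serialized; a parallel embodiment would instead distribute them across threads as described in the sections below.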
Further features and advantages of the present invention, as well as the structure and operation of various embodiments thereof, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the embodiments of the present invention and, together with the description, further serve to explain the principles of the invention and to allow a person skilled in the relevant art(s) to make and use the invention.
This specification discloses one or more embodiments that incorporate the features of this invention. The disclosed embodiment(s) merely exemplify the invention. The scope of the invention is not limited to the disclosed embodiment(s). The invention is defined by the claims appended hereto.
The embodiment(s) described, and references in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc., indicate that the embodiment(s) described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is understood that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Although embodiments are applicable to any system or process for classifying various types of data, for brevity and clarity a flow cytometry environment is used as an example to illustrate various features of the present invention.
Thus, flow cytometry data includes a set of values for various parameters for respective cells (or other particles). In one example, the set of values (i.e., event values) associated with each cell (or other particle of interest) is termed an “event.” Thus, event values include the measured parameter values for the event. Other event values include information associated with the event, such as event gate values as described below. For example, the measured parameters include fluorescent energy emitted at particular wavelengths and scatter (e.g., front scatter and side scatter) intensities. Each event can have a number, N, N being an integer greater than or equal to 0, of measured parameter values associated with it, and may be thought of as a point in N dimensional space. In a typical flow cytometer sample, several million events are measured and recorded for analysis. Flow cytometry data may be analyzed after the fact (e.g., read from a data file) or it may be analyzed in substantially real-time, as a sample is passing through the instrument. As used herein, the term “serial processing” means using non-parallel processing. “Parallel processing” includes partial and completely parallel processing. Embodiments of the invention may be used in parallel processing environments and/or serial processing environments. Some embodiments may be used in and/or include flow cytometry systems. Some of these embodiments may be used in and/or include parallel flow cytometry systems.
In step 210, captured data is read from a source. As discussed elsewhere herein, the source may be a file or database, or may be memory in which the data was stored immediately after being collected from a sample.
In step 220, the data is compensated. In one example, compensation removes spectral overlap introduced during data collection. In an embodiment, compensation includes solving a system of linear equations. Because the flow cytometry data can be viewed as an M×N matrix of M events and N parameters, where M and N are integers greater than or equal to 0, compensation may be performed using a matrix multiplication operation. In this example, the M×N data matrix is multiplied by an N×N compensation matrix. The N×N matrix includes coefficients defining the proportion of a corresponding parameter to be removed from other parameters. In some implementations, matrix multiplication is an O(n^3) operation.
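As a non-limiting sketch of the compensation step (the event data and compensation coefficients below are invented for illustration), a plain triple-loop matrix multiplication makes the O(n^3) cost visible:

```python
# Illustrative captured data: M = 2 events, N = 3 parameters each.
events = [
    [100.0, 52.0, 10.0],
    [200.0, 61.0, 12.0],
]

# Hypothetical N x N compensation matrix; each off-diagonal entry is
# the proportion of one parameter's signal to remove from another.
comp = [
    [1.00, -0.20, 0.00],
    [0.00,  1.00, 0.00],
    [0.00, -0.05, 1.00],
]

def compensate(data, matrix):
    """Multiply the M x N data matrix by the N x N compensation matrix."""
    m, n = len(data), len(matrix)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):            # each event
        for j in range(n):        # each output parameter
            for k in range(n):    # O(n^3) inner accumulation
                out[i][j] += data[i][k] * matrix[k][j]
    return out

compensated = compensate(events, comp)
```

A production implementation would typically delegate this multiplication to an optimized linear algebra routine rather than nested loops.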
In step 230, the data is transformed. In one example, transformation scales the data for display. When viewing a displayed graph (e.g., on a screen or printed page), the range of the data can reduce the effectiveness of the display. For example, a parameter may have a range of possible parameter values from 0 to 1,000,000, but a data set may have actual values in the range of 100 to 500. Thus, displaying the full scale axis on a 100 pixel square dot plot would force the entire data set into a single pixel row or column. Accordingly, the data needs to be transformed to provide a viewer with an accurate representation. In various examples, parameter values may be transformed to a linear scale or a logarithmic scale. Linear transformation may be performed by computing a new parameter value from the original parameter value using the equation y=a*x+b, where x is the old value, y is the new value, and a and b are constants. Logarithmic transformation may be performed by computing a new parameter value from the original parameter value using the equation y=b*log(a*x), where x is the old value, y is the new value, a and b are constants, and log is a logarithm of any base. In one example, all of the events in the data are sequentially or serially traversed for the particular parameter to be transformed, resulting in an O(n) operation, where n is the number of events.
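The two transformations above may be sketched as follows (the constants a and b are illustrative; in practice they would be chosen to fit the target display range):

```python
import math

def linear_transform(x, a=0.25, b=5.0):
    """Linear scaling: y = a*x + b."""
    return a * x + b

def log_transform(x, a=1.0, b=100.0, base=10.0):
    """Logarithmic scaling: y = b * log(a*x), here using base 10."""
    return b * math.log(a * x, base)

# Transforming one parameter across all events is a single serial pass
# over the data, i.e., an O(n) operation for n events.
scaled = [linear_transform(v) for v in [100.0, 250.0, 500.0]]
```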
In step 240, plots are generated, for example plots for a graphical representation of the data to be shown on a display through a graph or through a hard copy output. There are various types of plots that may be generated. For example, dot plots, density plots, and other plots may be generated by scanning the data set to determine the pixel corresponding to the parameter value(s) of each event being drawn. In histograms, tree plots, and certain other plots, the data set is scanned and the requisite counters are incremented. Example tree plots and user interactions with tree plots are described in more detail in U.S. Patent Appl. No. [To Be Assigned], Atty. Docket No. 2512.2340000, to Zigon, et al., which is incorporated by reference herein in its entirety. The counters may be visualized by drawing bars (e.g., “leaves” in a tree plot) of corresponding heights. Generation of some of these types of plots is described in more detail herein.
In step 250, statistics are generated. For example, a user may desire to measure various statistics, such as mean, median, mode, standard deviation etc. to describe the data. Statistics may be measured on the entire data set or on sub-populations (e.g., median value of parameter x for all the events inside gate A).
In step 260, plots and/or statistics are displayed. For example, plots and/or statistics may be displayed on any medium (e.g., a computer screen, printed paper, etc.) for the user. Although display of the data for analysis is an important use of flow cytometry systems, some embodiments of the invention herein are not concerned with the display of the data per se, but with the underlying processing, determination, decision making, and/or calculations resolving various aspects of the displaying of the data. Thus, when discussing determining a pixel or pixel value, the terms pixel and pixel value refer not only to a potential specific location on a display, but also to a corresponding memory location or other storage area. Further, an attribute such as shape may be used to convey information to the viewer. In that case, a pixel would not be a pixel in the ordinary sense of the term, but instead would be a discrete location on a display, where the location may include more than one pixel in the ordinary sense.
In step 270, gating is performed. Gating is discussed in detail elsewhere herein. In this step, the user may manipulate graphical displays of gates (e.g., click and drag or otherwise draw a gate on a displayed graph or plot) or use any other method of describing a gate to the system, including having default gates. Additionally, or alternatively, after completion of the gating process, process 200 may return to any one of steps 230, 240, and/or 250 to re-transform the data, re-generate the plots, and/or re-compute statistics. These steps may be repeated for all data or only for the data affected by the gating.
Thus, according to one or more embodiments, the flow cytometry processes described herein allow the user to iteratively analyze the data by selecting and/or modifying the types of graphs displayed and the variables, axes, and/or gates of interest.
In step 302, an event or a next event is retrieved (e.g., accessed) and/or received. For example, data corresponding to an event is received or accessed.
In step 304, a corresponding pixel is determined for the received or retrieved or accessed event. In this step, the parameter values of the event are used to determine a corresponding pixel. For example, the parameter values corresponding to the parameters associated with axes of the dot plot are examined and the corresponding location on the dot plot is determined. As discussed above, the term pixel as used throughout this application means not only a pixel on a computer screen display, but a discrete location on any display media, and also encompasses an associated memory location or storage location. If an attribute, such as shape, is used to convey information to the viewer, a pixel would not be a pixel in the ordinary sense of the term, but instead would be a set of pixels representing a discrete location or area on a display, such that it may include more than one pixel in the ordinary sense. Thus, this step may include determining the location on the graph to which the event maps and an associated memory location.
In step 306, the corresponding pixel is marked. For example, a value is assigned to the pixel based on the parameter values of the event. This step is discussed in detail below.
In step 308, a determination is made whether there are more events to be plotted. If yes, then process 300 returns to step 302. If no, then process 300 proceeds to step 310.
In step 310, plotting is complete.
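Steps 302 through 310 may be sketched serially as follows (the axis ranges and display size are hypothetical, and the two-dimensional list stands in for a display buffer or associated memory locations, consistent with the broad meaning of “pixel” above):

```python
def plot_events(events, x_max=1000.0, y_max=1000.0, width=100, height=100):
    """Serial dot-plot sketch: map each event's two displayed parameter
    values to a discrete pixel and mark it (steps 302-310)."""
    pixels = [[0] * width for _ in range(height)]
    for x, y in events:                                  # step 302: next event
        col = min(int(x / x_max * width), width - 1)     # step 304: find pixel
        row = min(int(y / y_max * height), height - 1)
        pixels[row][col] = 1                             # step 306: mark pixel
    return pixels                                        # step 310: complete

pixels = plot_events([(105.0, 250.0), (990.0, 10.0)])
```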
In step 352, an event or a next event is retrieved (e.g., accessed) and/or received. For example, data corresponding to an event is received, retrieved or accessed.
In step 354, corresponding counters are determined, for example using the parameter values of the event. For example, the parameter values corresponding to a gate are examined and a counter associated with the gate is located. In another example, a parameter value associated with a histogram variable or axis is located along with an associated counter depending on a parameter value of the event.
In step 356, the corresponding counter is incremented. For example, the counters located in step 354 are incremented depending on the parameter value(s). For example, if a gate found in step 354 is satisfied, the associated counter is incremented. Additionally, or alternatively, this step may be combined with step 354 (e.g., locate and increment the counter in one step). The performance of steps 354 and 356 may be collectively called “classifying” an event, as the events are being classified into each category. Thus, “class counter” refers to a counter that is incremented when an event is determined to belong to the associated category/class.
In step 358, a determination is made whether there are more events to be plotted. If yes, then process 350 returns to step 352. If no, then process 350 proceeds to step 360.
In step 360, plotting is complete.
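Steps 352 through 360 may be sketched serially as follows (the gates and events below are hypothetical; each gate is represented as a simple predicate over an event's parameter values):

```python
def classify_events(events, gates):
    """Serial classification sketch (steps 352-360): for each event,
    locate the class counter for each gate it satisfies and increment it."""
    counters = {name: 0 for name in gates}     # one class counter per gate
    for event in events:                       # step 352: next event
        for name, predicate in gates.items():  # step 354: locate counter
            if predicate(event):               # gate satisfied?
                counters[name] += 1            # step 356: increment
    return counters                            # step 360: complete

# Hypothetical gates over a two-parameter event (fs, ss).
gates = {
    "A": lambda e: e[0] > 200,
    "B": lambda e: e[1] < 150,
}
counts = classify_events([(300, 100), (100, 120), (250, 180)], gates)
```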
In this example, events 410 having X and Y values within the scales of X axis 404 and Y axis 402 are displayed on dot plot 400. However, events may also be excluded from display based on whether they satisfy certain gates. In one example, each event 410 may have more than two parameter values; however, only the parameter values corresponding to the parameters associated with X axis 404 and Y axis 402 determine the location or pixel where event 410 is displayed. For the sake of simplicity, the location where event 410 is displayed will be referred to as a pixel; however, this is not intended to limit the display of data such as dot plot 400 to a particular medium. For this example, pixel will be used throughout this document to describe a discrete location on a graph and an associated memory location storing a value or values associated with that discrete location on the graph.
An exemplary two dimensional gate 407 is shown on dot plot 400 of
In this example, gate 407 may be expressed as “(200<FS Area<510) AND (180>SS Area).” Thus “FS Area” and “SS Area” are gate variables, numbers “200,” “510,” and “180” are gate values, symbols “<” and “>” are gate conditionals, and “AND” is a gate operator. Events with parameter values that satisfy gate 407 may be displayed inside gate 407. Thus, an event with FS Area=300 and SS Area=100 is inside gate 407. Of course, if gate 407 were instead equivalent to the expression “NOT(200<FS Area<510) OR (180<SS Area),” the events 410 circumscribed by gate 407's boundaries, such as the example event with FS Area=300 and SS Area=100, would be outside gate 407, and the remaining events would be inside gate 407.
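The example gate 407 and its negated counterpart may be expressed directly as predicates, as sketched below (assuming FS Area and SS Area are passed as plain numbers):

```python
def inside_gate_407(fs_area, ss_area):
    """The example gate: (200 < FS Area < 510) AND (180 > SS Area)."""
    return (200 < fs_area < 510) and (180 > ss_area)

def inside_negated_gate(fs_area, ss_area):
    """The negated form: NOT(200 < FS Area < 510) OR (180 < SS Area)."""
    return not (200 < fs_area < 510) or (180 < ss_area)
```

As in the text, the event with FS Area=300 and SS Area=100 satisfies the first predicate and not the second.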
Gates may include gate variables corresponding to parameters, which are not displayed on a currently visible dot plot. For example, event 410 includes parameter values corresponding to the FS Area parameter and the SS Area parameter. It may also have parameter values corresponding to other parameters w, x, y, and z. Thus, a gate may be defined that may be expressed as “(125<w) AND (445<x<489) OR (z>500)” and event 410 may be inside (or outside) the gate even though the gate is not visible. However, for ease of description, gates are often discussed in conjunction with a display showing the gate.
Throughout this document, the notation “+” when placed next to a gate means inside the gate, and “−” when placed next to a gate means outside the gate. In tree plot 460, the inside (“+”) path is always to the right and the outside (“−”) path is always to the left. When reading a gate hierarchy, each branch follows a “+” or a “−” at each level to define the category represented by the leaf at the end of the branch. For example, branch 474 may be read as follows: at level 466, branch 474 follows the “+” path for gate B; at level 468, branch 474 follows the “−” path for gate C; and at level 470, branch 474 follows the “+” path for gate A. Thus, the category delineated by leaf 482 and defined by branch 474 may be described as “B+C−A+,” which translates to inside of gate B, outside of gate C, and inside of gate A. An event is considered to be within this category only if it meets all three of those conditions. In tree plot 460, leaf 482 indicates that approximately 70,000 events were classified in category “B+C−A+” in this example. Throughout this document, the statement that an event “belongs” to a category means that the event should be classified into that category. Similarly, an event is classified when it is determined to which category the event belongs and an associated class counter is incremented. In other words, of the events measured and classified in the sample, roughly 70,000 were inside of gate B, outside of gate C, and inside of gate A, and thus belonged to the category “B+C−A+.” Similarly, leaf 480 indicates that approximately 400,000 events belonged to category “B+C+A+” in this example. It is important to note that each event will belong to one and only one category, as the categories describe every possible inside/outside combination of the gates. The following sections describe exemplary methodologies and systems which may be used to classify and count events and generate plots such as tree plot 460.
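A category label such as “B+C−A+” may be derived by testing an event against each gate in hierarchy order, as sketched below (the gates and the event's parameter values are hypothetical, and an ASCII “-” stands in for the minus sign):

```python
def category_label(event, gates):
    """Build a category label such as "B+C-A+" by testing the event
    against each gate in the tree plot's hierarchy order."""
    return "".join(name + ("+" if predicate(event) else "-")
                   for name, predicate in gates)

# Hypothetical gates B, C, A, listed in hierarchy order.
gates = [
    ("B", lambda e: e["fs"] > 100),
    ("C", lambda e: e["ss"] > 300),
    ("A", lambda e: e["fl"] > 50),
]
label = category_label({"fs": 150, "ss": 200, "fl": 80}, gates)
```

Because every inside/outside combination yields a distinct label, each event maps to exactly one of the 2^n categories for n gates.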
In step 502, an event or a next event is retrieved, accessed, and/or received. For example, data corresponding to the event is received, accessed, or retrieved in this step.
In step 504, a gate or a next gate is retrieved, accessed, and/or received. For example, information corresponding to a gate is received, accessed, or retrieved.
In step 506, the event is compared to the gate to determine whether the event is inside the gate. For example, this can be done as discussed throughout the application.
In step 508, a determination is made whether there are more gates to be processed. If yes, then process 500 returns to step 504. If no, then process 500 proceeds to step 510.
In step 510, a determination is made whether there are more events to be processed. If yes, then process 500 returns to step 502. If no, then process 500 proceeds to step 512.
In step 512, gating is complete.
Embodiments of the invention may be used in and/or include a serial (non-parallel) processing environment or a parallel processing environment. For example, certain embodiments of the invention apply to and/or include the parallel processing architectures Single Instruction Multiple Data (SIMD), Single Program Multiple Data (SPMD), and/or Single Instruction Multiple Thread (SIMT). Flow cytometry analysis is well suited to such architectures because they excel at performing an operation or process on a large number of data points. The following description describes an example parallel processing architecture for flow cytometry. This architecture is used merely as an example to describe various features of the invention. In various examples, this may be optimized through use of a multiple-processor chip, such as a graphical processing unit, instead of or in addition to a single or dual processing chip, such as a more traditional central processing unit. For example, a graphics card as manufactured by nVIDIA of Santa Clara, Calif. or by ATI/AMD of Sunnyvale, Calif. may be used as described below as a device 650.
Example embodiments, such as those using an nVIDIA GPU having 128 Processing Elements (e.g., certain 8800 series products), using the techniques herein may process five million event-parameters of captured data (e.g., captured flow cytometry data) in less than 5 seconds, preferably less than 2 seconds and most preferably less than 1 second. One hundred million to one billion (preferably at least 500 million, most preferably at least 750 million) event-parameters may be processed in less than 30 seconds, preferably less than 15 seconds and most preferably less than 5 seconds. Event parameters are the number of events multiplied by the number of parameters in each event. As hardware technology progresses, the performance of embodiments of this invention will continue to likewise improve. Similarly, improvements to operating systems and other software that yield general performance gains will also improve the performance of embodiments of this invention.
Parallel computer system 600 includes a display interface 602. Connected to the display interface may be display 630. Display 630 may be integral with a flow cytometer system or it may be a separate component. Parallel computer system 600 includes one or more processors, such as host processor 604. Host processor 604 can be a special purpose or a general purpose processor. Host processor 604 is connected to a communication infrastructure 606 (for example, a bus, or network).
Parallel computer system 600 also includes a main memory 608, preferably random access memory (RAM), and may also include a secondary memory 610. Secondary memory 610 may include, for example, a hard disk drive 612, a removable storage drive 614, flash memory, a memory stick, and/or any similar non-volatile storage mechanism. Removable storage drive 614 may comprise a floppy disk drive, a magnetic tape drive, an optical disk drive, a flash memory, or the like. The removable storage drive 614 reads from and/or writes to a removable storage unit 618 in a well known manner. Removable storage unit 618 may comprise a floppy disk, magnetic tape, optical disk, etc. which is read by and written to by removable storage drive 614. As will be appreciated by persons skilled in the relevant art(s), removable storage unit 618 includes a computer usable storage medium having stored therein computer software and/or data.
In alternative implementations, secondary memory 610 may include other similar means for allowing computer programs or other instructions to be loaded into parallel computer system 600. Such means may include, for example, a removable storage unit 622 and an interface 620. Examples of such means may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 622 and interfaces 620 which allow software and data to be transferred from the removable storage unit 622 to parallel computer system 600.
Parallel computer system 600 may also include a communications interface 624. Communications interface 624 allows software and data to be transferred between parallel computer system 600 and external devices. Communications interface 624 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, or the like. Software and data transferred via communications interface 624 are in the form of signals which may be electronic, electromagnetic, optical, or other signals capable of being received by communications interface 624. These signals are provided to communications interface 624 via a communications path 626. Communications path 626 carries signals and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link or other communications channels.
Parallel computer system 600 also includes at least one device 650. Device 650 is coupled to the rest of parallel computer system 600, including host processor 604, via communication infrastructure 606. Device 650 comprises a plurality of multiprocessors 652a-652n, where n is an integer having a value of 0 or higher. Each multiprocessor 652 may have a SIMD architecture, as described in detail elsewhere herein. The device 650 also includes a device memory 654, coupled to each multiprocessor 652a-652n.
During operation, multiprocessor 652 may map one or more threads to each processor 704a-704m. Threads of execution, or simply threads, are simultaneous (or pseudo-simultaneous, such as in a multitasking environment) execution paths in any serial or parallel computer. Some threads may execute independently and/or cooperate with other threads. In some parallel architectures, threads may execute on different processors and/or share data (e.g., use shared memory).
For example, in the Compute Unified Device Architecture (CUDA), all threads of a thread block reside on the same processor core, but multiple thread blocks are scheduled in any order across any number of processor cores. The NVIDIA CUDA Compute Unified Device Architecture Programming Guide, Version 2.0 of Jun. 7, 2008, is incorporated by reference herein in its entirety. The number of threads per thread block is limited by the resources available to each processor core. For example, on the NVIDIA Tesla hardware implementation of CUDA, a thread block is limited to 512 threads. Thread blocks are split into warps. Each warp is a set of parallel threads (e.g., 32 threads). A half-warp is the first half or the second half of a warp. Individual threads of a warp start together at the same program address, but may branch and execute independently. Warps are executed one common instruction at a time. If threads of a warp diverge due to a conditional branch, then the threads are serially executed until the threads converge back to the same execution path.
CUDA allows a programmer to define functions, called kernels. Typically a program running on a host such as host processor 604 invokes a kernel. When invoked, a kernel may be executed on a device, such as device 650 illustrated in
Transitioning a wholly serial or sequential cytometry process to an at least partially parallel environment, such as parallel computer system 600, presents several challenges. For example, memory access speeds have not increased proportionally with processor speeds. In some parallel architectures, a memory access may require an order of magnitude more clock cycles than a floating point operation. Memory accesses in those architectures should be minimized. Also, different types of memory accesses take different amounts of time. For example, a shared memory access (e.g., accessing shared memory 702) may take one or more orders of magnitude less time than a device memory access (e.g., accessing device memory 654). Taking these challenges into account, transitioning a serial or sequential flow cytometry data processing method to a parallel environment is not a straightforward process. Many innovative techniques are required to maximize the capabilities of the specific architecture. For example, consider generating a dot plot in a parallel environment. If each thread simply reads the data it needs to process an event and finds and marks the corresponding pixel, the amount of time spent performing memory operations may be several hundred times the amount of time spent performing computations.
In another example, consider the generation of a tree plot such as exemplary tree plot 460 illustrated in
Parallel processing capabilities may be applied to reduce the total processing time. One exemplary process reduces total processing time by creating a thread for each event. Each thread compares the event against each of the 2^n categories until the category to which the event belongs is found. This approach may be faster than the non-parallel process described above, but each thread still makes a significant number of comparisons. Further, a thread might determine the category to which its event belongs before some of the other threads (i.e., on one of the first comparisons). Therefore, many threads complete their categorization task and are idle during a significant amount of the total processing time.
The following sections describe exemplary embodiments using, for example, hash table solutions that reduce the number of comparisons, thus accelerating the processing speed and reducing the time threads wait for other threads to complete classification of their events.
A discussion of exemplary data structures that allow for the implementation of this embodiment using exemplary hash table solutions is provided below. In an embodiment, each gate has a unique gate identifier associated with the gate. A gate identifier may be a string of values (e.g., bits). In a further embodiment, the gate identifier for each gate encodes an assigned priority of the gate. For example, a gate identifier may be a bit string having a binary value equal to 2^n+1, where n is the priority of the gate and n is an integer greater than zero. Thus, if gate 1 is priority 1, its gate identifier would be “0 . . . 00000011” (i.e., a plurality of “0” bits followed by two “1” bits). If gate 2 is priority 2, its gate identifier would be “0 . . . 00000101.” If gate 5 is priority 5, its gate identifier would be “0 . . . 00100001.” It is not necessary to prioritize the gates as long as each gate is assigned a unique number. Furthermore, other encoding schemes are possible. In an embodiment, a higher priority number represents a higher priority, but that need not be the case. For example, priority 5 may be either a higher or a lower priority than priority 1 depending on the priority convention used in a particular embodiment.
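As an illustrative sketch of this encoding (the function name and the use of Python are choices made here, not part of the specification), the identifier for a gate of priority n can be formed by setting bit n and bit 0:

```python
def gate_identifier(priority):
    # Encode a gate of priority n as the binary value 2**n + 1:
    # a "1" bit at position n plus a "1" bit at position 0.
    return (1 << priority) | 1
```

With this encoding, gate 1 (priority 1) yields binary 11, gate 2 yields binary 101, and gate 5 yields binary 100001, matching the bit strings above.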
In an embodiment, the order of the categories in table 820 reflects the order of the gates in the levels of a gate hierarchy of a tree plot. For example, with reference to tree plot 460 illustrated in
In one example, a purpose of the LUT is to dynamically map the category indices to the categories displayed in a tree plot, such as tree plot 460. It is to be appreciated that if the order of the gates assigned to the levels of a gate hierarchy is changed, the indices of each category do not change in value, but are reordered in the LUT. For example, consider exemplary table 840 as illustrated in
In an embodiment, not all of the gates are required to be used when classifying events. More generally, not all of the category gate values may be important to the current classification process, i.e., not all category gate values are “interesting” values, e.g., biologically significant values or values of interest in the current analysis. In the example illustrated in
As discussed above, gate identifiers are unique identifiers associated with each gate. In an exemplary embodiment, gate identifiers are unique bit strings. For example, a gate identifier may be a bit string having a binary value equal to 2^n+1, where n is the priority of the gate. Thus, if gate 1 is priority 1, its gate identifier would be “0 . . . 00000011” (i.e., a plurality of “0” bits followed by two “1” bits). If gate 2 is priority 2, its gate identifier would be “0 . . . 00000101.” If gate 5 is priority 5, its gate identifier would be “0 . . . 00100001.” Other encoding schemes are possible. Gating an event may be performed (whether in a wholly serial or sequential or at least partially parallel environment) by performing a bitwise OR of the satisfied gate identifiers and the event gate value. For example, an event gate value string may be initialized to “0 . . . 0000000.” If it is determined that the associated event satisfies gate 1, the event gate value string is bitwise ORed with gate 1's identifier. The resulting event gate value string is “0 . . . 00000011.” If the event is then determined to satisfy gate 5, the event gate value string is bitwise ORed with gate 5's identifier. The resulting event gate value string is “0 . . . 00100011.” Thus, according to this embodiment of the present invention, the gates satisfied by the event are encoded in the event gate value string. The event indices shown may be calculated in the same manner as the category indices described above. Thus, the LUT is a hash table and a hashing function is also used to map event values (e.g., event gate value strings) to the table. For example, the event gate value string, or just the interesting bits of the event gate value string, may be converted to a decimal number.
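The running example above can be sketched as follows (a minimal illustration; `gate_identifier` and the variable name are assumptions of this sketch):

```python
def gate_identifier(priority):
    # 2**n + 1 encoding: bit n and bit 0 set.
    return (1 << priority) | 1

# The event gate value string starts as all zeros.
event_gate_value = 0
# The event satisfies gate 1: OR in gate 1's identifier ("0...00000011").
event_gate_value |= gate_identifier(1)
# The event then satisfies gate 5: OR in gate 5's identifier ("0...00100001").
event_gate_value |= gate_identifier(5)
# The satisfied gates are now encoded in the string: "0...00100011".
```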
As discussed with respect to category gate values above, uninteresting event gate values in event gate value strings may be safely ignored, effectively compressing the event gate value strings and reducing the amount of processing.
The preceding exemplary data structures demonstrate a property that is used for efficiently classifying events. In one example, an event may be compared to each gate, and the results of the comparisons may be encoded into a gate value string of the event. Once the event has been compared to each gate, the event index, which depends upon the event gate value string, may be generated. The event belongs to the category having a matching index. The event index does not need to be compared to the category indices at all; rather, the event index can be used to directly (or indirectly) reference and increment a counter associated with the category. The following discussion elaborates on methods and systems for efficiently classifying data using attributes of the example data structures discussed above. Although the example data structures are used in some of the following exemplary processes and systems, it is to be appreciated that other specific data structures having similar properties could be used.
In step 1002, a set of gates is received, retrieved, and/or accessed. As described above, gates can be, for example, complex combinations of gate values, gate variables, gate conditions, gate operators, including any Boolean and/or algebraic construction involving any number of parameters (gate variables). In an embodiment, gate identifiers are assigned in this step. In another embodiment, gate identifiers are received along with the gates.
In step 1004, categories are determined. In an embodiment, 2^n categories are defined using the n gates received in step 1002, where n is a positive integer. The determined categories are defined by the possible combinations of the gates received in step 1002.
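As a sketch of this step (the function name is chosen here for illustration), the 2^n categories can be enumerated as the possible inside/outside combinations of the n gates:

```python
def enumerate_categories(n_gates):
    # Each category is one inside/outside combination of the n gates;
    # bit g of the category number records whether gate g is satisfied.
    return [tuple((c >> g) & 1 for g in range(n_gates))
            for c in range(2 ** n_gates)]
```

For three gates this produces eight categories, from (0, 0, 0) (no gate satisfied) to (1, 1, 1) (all gates satisfied).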
In optional step 1006, a shift table is generated. The purpose of this optional step is to allow for the efficient compression of event gate value strings and category gate value strings by ignoring the uninteresting category gate values and event gate values. In an embodiment where step 1006 is not performed, each value of the value strings is processed. Interesting and uninteresting values are discussed above with reference to tables 820 and 840 of
In step 1008, category indices are generated. For example, category indices according to one embodiment are described in detail above in reference to tables 800, 820, 840, 860, and 880 as shown in
In step 1010, tables are copied to memory, for example a shared memory. In embodiments using an architecture having a shared memory or equivalent, this step is performed to provide the thread or threads performing the classification of the data rapid access to the category indices. If a LUT is generated in step 1008 above, the LUT is copied to the shared memory. In an embodiment using or including a computer system 600 as described above with reference to
In step 1022, a MainMask is generated. For example, a MainMask indicates which positions of the event gate value strings and category gate value strings can be safely ignored, i.e., which positions will contain interesting values and which positions will contain uninteresting values, as discussed above. In an embodiment, a MainMask is generated by performing a bitwise OR of each gate identifier for the gates of interest in the current classifying process. Referring back to
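A sketch of MainMask generation, assuming the 2^n + 1 gate identifier encoding described earlier (the function name is illustrative):

```python
def main_mask(gate_identifiers):
    # Bitwise OR of the identifiers of the gates of interest; the "1"
    # positions in the result mark the interesting bit positions.
    mask = 0
    for gid in gate_identifiers:
        mask |= gid
    return mask
```

For gates of priority 1 and 5 (identifiers binary 11 and 100001), the MainMask is binary 100011: positions 0, 1, and 5 hold interesting values.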
In step 1024, a counter (j) and NumBitsToCheck are initialized. In an embodiment, the counter and/or NumBitsToCheck are initialized to zero.
In steps 1026-1034 described below, the MainMask generated in step 1022 is traversed, the number of interesting bits (e.g., “1” bits) is counted, and the locations of the interesting bits are recorded in a shift table.
In step 1026, a determination is made whether the value in the jth slot of the MainMask generated in step 1022 is “1”. For example, if the jth slot has a value indicating an interesting value (e.g., a “1”), then process 1020 proceeds to step 1028. If the jth slot does not have a “1” value, process 1020 proceeds to step 1032.
In step 1028, NumBitsToCheck is incremented.
In step 1030, a shift table is updated with the current position being examined in the MainMask. This step records in the shift table the position of the interesting value found in step 1026. In an embodiment, the relative position is stored in the shift table. In an embodiment, the shift table is a one dimensional array (e.g., “ShiftTable[ ]”). The relative position may be stored in the shift table by setting ShiftTable[NumBitsToCheck]=j−ShiftTable[NumBitsToCheck−1]. In another embodiment, the absolute position is stored in the shift table. The absolute position may be stored in the shift table by setting ShiftTable[NumBitsToCheck]=j.
In step 1032, a determination is made whether the MainMask has been completely traversed. For example, MaxGates may indicate a maximum number of gates and also the maximum length of a MainMask. Thus, if j<MaxGates, i.e., MainMask has not been completely traversed, process 1020 proceeds to step 1034. If j≥MaxGates, i.e., MainMask has been completely traversed, process 1020 proceeds to step 1036.
In step 1034, the counter j is incremented. Thus, when process 1020 proceeds to step 1026, the next position of MainMask is examined.
In step 1036, the final value of NumBitsToCheck is recorded. In an embodiment, the first position in the shift table (e.g., ShiftTable[0]) is set to the current value of NumBitsToCheck. This step records the total number of interesting bits, e.g., the number of “1” values in MainMask, which has been counted by NumBitsToCheck. It is to be appreciated that in an embodiment, NumBitsToCheck was incremented prior to its use as an array index for the shift table in step 1030. Thus, ShiftTable[0] remains unused until the performance of this step.
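Steps 1022-1036 can be sketched as follows. This is an illustrative reconstruction, assuming relative positions are stored (each entry records the distance from the previous interesting bit, so that cumulative right shifts land on each interesting bit in turn) and that ShiftTable[0] is filled in last with the bit count:

```python
def build_shift_table(main_mask, max_gates):
    # shift_table[0] (set last, per step 1036) holds the number of
    # interesting bits; shift_table[1..] hold each interesting bit's
    # position relative to the previous interesting bit.
    shift_table = [0]
    num_bits_to_check = 0
    prev_position = 0
    for j in range(max_gates):
        if (main_mask >> j) & 1:                   # step 1026: interesting bit?
            num_bits_to_check += 1                 # step 1028
            shift_table.append(j - prev_position)  # step 1030: relative position
            prev_position = j
    shift_table[0] = num_bits_to_check             # step 1036
    return shift_table
```

For a MainMask of binary 100011 (interesting bits at positions 0, 1, and 5), this yields [3, 0, 1, 4]: three interesting bits, at relative shifts 0, 1, and 4.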
In step 1042, a category or the next category is retrieved, accessed, or received.
In step 1044, a category gate value string is determined or calculated for the category. For example, one embodiment of category gate value strings is described above especially in reference to
In step 1046, a category index is generated. For example, category indices according to one embodiment are described above in reference to
In step 1048, an entry in a lookup table (LUT) is made. For example, a LUT as described above in the description of
In step 1050, a determination is made if there are any more categories for which category indices are to be generated. If yes, then process 1040 proceeds to step 1042. If no, process 1040 proceeds to step 1052.
In step 1052, process 1040 is done.
In step 1062, an index (L) is initialized. In an embodiment, L is an integer variable greater than or equal to 0. In a further embodiment, L is initialized to a value of 0.
In step 1064, a variable P is initialized. In an embodiment, P is an integer variable greater than or equal to 1. In a further embodiment, P is initialized to a value of 1.
In step 1066, a counter k is initialized. In an embodiment, k is an integer variable greater than or equal to 1. In a further embodiment, k is initialized to a value of 1.
In step 1068, a variable i is set to the kth value in a shift table. In an embodiment, the shift table has values indicating the absolute position of interesting gate values in category and/or event gate value strings. For example, the above descriptions of step 1006 of process 1000 and step 1030 of process 1020 may be used.
In step 1070, a determination is made whether the ith value of a gate value string (e.g., a category gate value string or an event gate value string) is set to a value (e.g., “1”). If yes, then process 1060 proceeds to step 1072. If no, then process 1060 proceeds to step 1074.
In step 1072, the index (L) is updated by adding value P to L.
In step 1074, P is updated by multiplying by two. In an embodiment, this multiplication step is performed by left shifting the value P by one bit.
In step 1076, a determination is made whether the total number of interesting values has been parsed and accounted for. In an embodiment, this step is performed by comparing k to the total number of interesting values in a gate value string (e.g., a category gate value string or an event gate value string). In a further embodiment, this step is performed by determining whether k is less than the 0th value in the shift table; in that embodiment, the first value of the shift table contains the total number of interesting values. See, for example, the description of step 1036 of process 1020 above and shown in
In step 1078, the counter k is incremented.
In step 1080, process 1060 is done.
It is to be appreciated that process 1060 illustrates an example process for determining an index. Other example processes may also be used. For example, the following pseudocode illustrates another example process for determining an index. In an embodiment, the process illustrated by the following pseudocode may be used to perform step 1046 of process 1040 shown in
In the above pseudocode, eMask represents a gate value string (e.g., category gate value string or event gate value string) sent to the function Compute_Index. ShiftTable[ ] is an array of values where the 0th value holds the total number of interesting values and the remaining values hold the relative positions of the interesting gate values in the gate value strings. The above descriptions of step 1006 of process 1000 and step 1030 of process 1020 describe relative positions in shift tables in detail. The “for loop” iterates for a number of times determined by the total number of interesting values (as stored in ShiftTable[0]). During each iteration of the for loop, the eMask is right shifted by the number (i.e., relative position) stored in the current slot in the ShiftTable. The Index is incremented by a power of two stored in pow—but only if the current interesting value in eMask is 1. It is to be appreciated that the number of previous iterations and thus the number of left shifts of pow determines the power of two stored in pow.
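Since the pseudocode itself is not reproduced here, the following is a hedged Python reconstruction consistent with the description above (names such as `compute_index` are chosen for this sketch, and the shift table is assumed to hold relative positions as described):

```python
def compute_index(e_mask, shift_table):
    # shift_table[0] holds the total number of interesting values; the
    # remaining entries hold the relative positions of the interesting bits.
    index = 0
    pow_of_two = 1
    for k in range(1, shift_table[0] + 1):
        e_mask >>= shift_table[k]   # bring the next interesting bit to position 0
        if e_mask & 1:              # only if the interesting value is 1...
            index += pow_of_two     # ...is the index incremented
        pow_of_two <<= 1            # one left shift of pow per iteration
    return index
```

For a MainMask of binary 100011 (shift table [3, 0, 1, 4]), an event gate value string of binary 100001 compresses to index 5 (binary 101), and binary 100011 compresses to index 7 (binary 111).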
In one example, the above described processes may be performed and/or the data structures instantiated prior to and/or in conjunction with the exemplary parallel flow cytometry process in the following description.
In step 1102, compensation is performed. Compensation in general is described with reference to step 220 of process 200 as illustrated by
In step 1104, compensated events are read into shared memory (e.g., shared memory 702). For example, a set of threads may be used to read the compensated events into shared memory 702 at least partially in parallel. In an embodiment, shared memory 702 is not large enough to store all of the compensated events. In this case, a subset of the compensated events is read into one or more shared memories 702. In a further embodiment, steps 1104-1126 are repeated until all events have been read into shared memory and processed. Additionally or alternatively, the data read performed in this step may be coalesced to optimize the read and avoid memory bank conflicts according to the specific system architecture.
For example, in a CUDA architecture, global memory access (e.g., access to a portion of device memory 654) by a half warp of sixteen threads may be coalesced into one or two memory transactions if it satisfies three conditions: (a) the threads access sixteen 32 bit words (one transaction of 64 bytes), sixteen 64 bit words (one transaction of 128 bytes), or sixteen 128 bit words (two transactions of 128 bytes each), (b) all sixteen words accessed lie in the same segment and that segment has the same size as the one or two transactions, and (c) the threads access the words in order (e.g., the third thread accesses the third word). Therefore, in an example flow cytometry embodiment implemented in a CUDA environment, each thread of a half warp reads a corresponding parameter of the events (e.g., thread 1 reads the values for parameter 1 of the events, thread 2 reads the values for parameter 2 of the events, etc.). If each parameter value is stored in a word, then each memory transaction may read 16 parameters. Once read, the event data may be stored in shared memory (e.g., shared memory 702) in such a manner as to avoid shared memory bank conflicts, regardless of whether the read was coalesced.
In CUDA, shared memory is divided into equally sized shared memory banks. A shared memory bank conflict occurs if multiple, simultaneous memory reads or writes are attempted to addresses in a single shared memory bank. In other words, shared memory reads or writes to several addresses can be performed simultaneously as long as each address is in a separate bank. If shared memory reads or writes attempt to access more than one address in a bank at the same time, a shared memory bank conflict results and the read or write is broken into as many reads or writes as necessary to be conflict-free. Shared memory banks in CUDA are organized such that successive 32 bit words are assigned to successive banks. Thus, memory reads or writes of multiple words to successive banks do not result in a conflict and may occur simultaneously.
Therefore, in an embodiment, event data is stored in shared memory (e.g., shared memory 702) in columns; that is, each parameter for the set of events is stored in a specific shared memory bank (e.g., parameter 1 of the events is stored in shared memory bank 1, parameter 2 in shared memory bank 2, etc.).
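The bank arithmetic behind this layout can be sketched as follows (a simplified host-side model: a bank count of 16 and one 32-bit word per parameter value are assumptions of this sketch, and all names are illustrative):

```python
NUM_BANKS = 16   # assumed bank count for this illustration

def bank_of(word_address):
    # Successive 32-bit words are assigned to successive banks.
    return word_address % NUM_BANKS

def word_address(event, param, num_params):
    # Layout with one row of words per event: when num_params equals the
    # bank count, every event's parameter p lands in bank p, so a half-warp
    # in which thread p reads parameter p touches 16 distinct banks
    # (conflict-free, simultaneous access).
    return event * num_params + param
```

With 16 parameters per event, parameter 3 of every event falls in bank 3, and the 16 threads of a half warp each touch a distinct bank.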
In step 1106, threads are synchronized. For example, each thread delays until the other threads that are executing one or more of the previous step(s) have reached the synchronization point. After all the threads have reached this synchronization point, the threads may proceed to step 1108, executing independently. For example, in CUDA, the execution of the threads in a block may be synchronized at defined synchronization points using a synchronize threads function. All threads of the block delay until all the threads of the block reach the synchronization point before proceeding.
In step 1108, a determination is made whether there are more events in shared memory to process, or whether all events in the shared memory (e.g., shared memory 702) have been processed. If there are more events in shared memory to process, process 1100 moves to step 1110. If all events in shared memory have been processed, process 1100 moves to step 1126. In one example, step 1108 allows for a set of threads to perform in parallel to process a larger number of events at a same time and/or faster than is possible when doing serial or sequential processing. For example, if there are 10 threads and 100 events, step 1108 may allow the 10 threads to process in parallel until all 100 events are processed, which can allow for a 10× increase in processing speed as compared to serial processing, since all 10 threads are processing the 100 events at the same time rather than sequentially or serially. In an embodiment, steps 1108-1126 are performed in a separate operation from the other general flow cytometry steps (e.g., compensation and statistics generation). For example, in an embodiment, the transformation, the gating, and the plotting may be combined into one CUDA kernel. Thus, each block of threads will read a portion of the event data into shared memory and perform the processing required to perform these three general flow cytometry steps on that portion of event data. This reduces the number of memory reads required to slower global memory (e.g., a portion of device memory 654).
In step 1110, each thread accesses, retrieves, or receives a next event from shared memory, such as shared memory 702. The event currently being processed by a thread is termed its current event.
In step 1112, each thread gates its current event. For example, as discussed above, gating an event includes determining which gates are satisfied by the event by comparing the event (i.e., its parameter values) to the gate. An exemplary process for parallel gating is discussed below with respect to
In step 1114, each thread gets a plot. In an embodiment, each thread accesses certain parameters regarding a plot, for example, a dot plot that is to be displayed.
In step 1116, each thread makes a determination whether its current event is to be plotted on the current plot. If yes, process 1100 proceeds to step 1118. If no, process 1100 may return to step 1108; alternatively, even if the current event is not to be plotted, process 1100 may proceed to step 1124 to determine whether more plots remain. In an embodiment, plots may be designated to show only events that are inside of (or outside of) one or more gates. For example, if the current plot is designated to display only events inside of gate G, each thread may examine its event to determine whether the event is inside of gate G. In an embodiment, each thread examines its event's gate value string to determine whether its event is inside and/or outside each of the gates designated for the current plot. In another embodiment, each thread may examine its event to ensure its event is within the scale of the plot.
In step 1118, each thread transforms its current event for plotting. Transformation is described in detail elsewhere herein (e.g., see description of step 230 of flowchart 200).
In step 1122, each thread plots its event. In an embodiment, each thread determines a counter that maps to its current event. In an embodiment, the counter is incremented. The process of finding corresponding counters is sometimes referred to as classifying data (e.g., events), as the process is analogous to segregating items based on their characteristics and placing them in distinct classes. In an embodiment, a thread examines parameter values for a current event corresponding to the parameters associated with each gate. Based on those parameter values, the thread determines which counter should be updated. In another embodiment, each thread uses an event gate value string associated with its event to determine the counter to be updated. In a further embodiment, an event index is determined. In one example, an event index may be determined by the same processes that can be used to determine a category index. For example, process 1060 as illustrated in
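The counter update in this step can be sketched as follows, reusing the index computation described earlier (the function names and list-of-counters representation are choices of this sketch, shown serially for clarity):

```python
def compute_index(e_mask, shift_table):
    # shift_table[0] holds the count of interesting bits; the remaining
    # entries hold their relative positions.
    index, pow_of_two = 0, 1
    for k in range(1, shift_table[0] + 1):
        e_mask >>= shift_table[k]
        if e_mask & 1:
            index += pow_of_two
        pow_of_two <<= 1
    return index

def classify_events(event_gate_values, shift_table):
    # One class counter per combination of interesting values; each event's
    # index directly selects the counter to increment -- no comparison of
    # the event index against category indices is needed.
    counters = [0] * (1 << shift_table[0])
    for value in event_gate_values:
        counters[compute_index(value, shift_table)] += 1
    return counters
```

In a parallel embodiment, each thread would perform one such index computation and counter increment for its current event.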
In step 1124, each thread determines whether there are any more plots to process for its current event. If yes, process 1100 returns to step 1114. If no, process 1100 returns to step 1108. In one example, step 1124 allows each thread to work through a set of plots that are being generated and update any pixels and/or counters associated with each plot that map to a current event of the thread. In embodiments where plots such as dot plots are generated, events are mapped to pixels. However, for simplicity and brevity, this example process details the generation of plots, such as tree plots, that update counters.
Again, if the determination in step 1108 is that there are no more events, process 1100 proceeds to step 1126. In step 1126, a determination is made whether there are any more events in global memory to process, or whether all events in the global memory (e.g., a portion of device memory 654) have been processed. If there are more events in global memory to process, process 1100 moves to step 1104. If all events in global memory have been processed, process 1100 moves to step 1128.
In step 1128, threads are synchronized. Threads delay until other threads executing the previous step(s) have reached this synchronization point. For example, in CUDA, the execution of the threads in a block may be synchronized at defined synchronization points using a synchronize threads function. In this example, all threads of the block delay until all the threads of the block reach the synchronization point before proceeding.
In step 1130, statistics computation is performed. Statistics generation in general is described in the discussion of step 250 of process 200 above. Statistics generation may be performed in parallel. In an embodiment, parallel statistics generation is performed as a separate function (e.g., a separate CUDA kernel) from the other main cytometry steps (e.g., compensation, transformation, plot generation, and gating). Parallel statistics generation may be performed by a plurality of threads.
In step 1132, plots and statistics may be displayed as described in the discussion of step 260 of process 200 above.
In step 1134, a change to a gate may be received. For example, a user may modify a gate using a graphical user interface (e.g., clicking and dragging or re-drawing a gate boundary) or by any other method (e.g., typing in a gate description). Additionally and/or alternatively, a user may update the categories displayed in the tree plot (e.g., change which levels are displayed). For example, tree plots and user interactions with tree plots are described in more detail in U.S. Patent Appl. No. [To Be Assigned], Atty. Docket No. 2512.2340000, to Zigon, et al., which is incorporated by reference herein in its entirety.
In step 1136, plots are updated. In this step, plots which may have events that could have been affected by the changed gate are re-determined. For example, in an embodiment, if only the categories are changed (i.e., no gates are changed), then a look up table (LUT) may be used to dynamically map categories and associated class counters to the displayed categories in a plot, such as a tree plot, as described above with reference to
As discussed above in “Example Parallel Flow Cytometry Process—Data Structures for Hash Tables” and “Example Parallel Flow Cytometry Process—Creating Data Structures,” gate identifiers are unique identifiers associated with each gate. In an embodiment, an event is gated and the gates satisfied by the event are encoded in the event gate value string, and the event indices may be calculated in the same manner as the category indices described above. Gating an event (e.g., in step 1112 above) may be performed, whether in a wholly serial or sequential or at least partially parallel environment, by performing a bitwise OR of the satisfied gate identifiers.
In step 1152, a gate or next gate is retrieved, accessed, and/or received. In an embodiment, a thread retrieves, accesses, or receives information corresponding to a gate.
In step 1154, an event is transformed for the current gate. In an embodiment, a thread transforms the event for the gate. Transformation is described in detail elsewhere herein (e.g., see the description of step 230 of flowchart 200). Transformation for a gate (as opposed to transformation for a plot) similarly scales the event for the gate. In an alternative embodiment, the gate is transformed for the event, i.e., the scale of the gate is transformed to the scale of the appropriate event parameters.
In step 1156, an event is compared to the gate to determine whether the event is inside the gate. In an embodiment, the thread makes this comparison using its current event. If the event satisfies the gate, process 1150 moves to step 1158. If the event does not satisfy the gate, the process 1150 moves to step 1160.
In step 1158, a gate value string of the event is updated. In an embodiment, the updating comprises a bitwise OR operation of a gate identifier and an event gate value string. In an embodiment, the thread performs the updating.
In step 1160, a determination is made whether there are more gates to be processed. If yes, process 1150 returns to step 1152. If no, process 1150 proceeds to step 1162. In an embodiment, the thread makes this determination.
In step 1162, gating is complete for the event.
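Process 1150 can be sketched as a loop over the gates. The `(identifier, predicate)` representation of a gate used below is a hypothetical simplification (real gates may be arbitrary Boolean and/or algebraic conditions), and the transformation of step 1154 is omitted:

```python
def gate_event(event, gates):
    # gates: list of (gate_identifier, inside) pairs, where inside(event)
    # reports whether the event satisfies the gate (step 1156).
    gate_value = 0
    for gid, inside in gates:          # step 1152: retrieve the next gate
        if inside(event):
            gate_value |= gid          # step 1158: bitwise OR update
    return gate_value                  # step 1162: gating complete

# Hypothetical example: gate 1 tests forward scatter, gate 5 side scatter.
gates = [
    ((1 << 1) | 1, lambda e: e["fsc"] > 100),
    ((1 << 5) | 1, lambda e: e["ssc"] < 50),
]
```

An event satisfying both gates yields the gate value string binary 100011; one satisfying only gate 5 yields binary 100001.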
The preceding discussion described embodiments of the present invention in a specific application. However, as discussed in the embodiments below, embodiments of the present invention can be used in many other applications.
In step 1202, thresholds are received, accessed, and/or retrieved. In an embodiment, the thresholds include algebraic and/or Boolean descriptions of conditions. A threshold may be identified by a threshold identifier which includes a threshold identifier value. In a further embodiment, the thresholds are gates and threshold identifiers are gate identifiers.
In step 1204, categories are determined based on the received thresholds.
In step 1206, category indices are determined. A category index provides a short, unique index value for referencing a particular category. A category index may be determined using category threshold values associated with the category. In an embodiment, thresholds are gates and category threshold values are category gate values. In a further embodiment, category indices are calculated by a process described herein.
In step 1208, class counters are generated. Each class counter is associated with a category and may be accessed directly or indirectly using the category's associated category index.
In step 1210, a biological mixture is received. In an embodiment, the biological mixture includes cells and markers.
In step 1212, the biological mixture is analyzed. In an embodiment, physical characteristics of each cell are measured and recorded. This recorded data is termed captured data. In an embodiment, the captured data is from a flow cytometer.
In step 1214, events are classified. In an embodiment, captured data comprises events. Classifying events includes finding and incrementing the corresponding class counters. In an embodiment, events are classified according to process 1220 described below and illustrated by
In step 1216, a tree plot is displayed. The tree plot represents at least one of the values of the class counters.
In step 1222, an event is received, accessed, and/or retrieved.
In step 1224, an event index is determined. An event index provides a short, unique value corresponding to a category to which the event belongs. In an embodiment, an event index is calculated by a process described herein.
In step 1226, a class counter corresponding to the event is incremented. The class counter is identified using the event index determined in step 1224.
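Steps 1202-1226 can be tied together in one sketch. This is an illustrative, serial reconstruction; all names, the predicate representation of thresholds, and the choice of 32 as a maximum identifier length are assumptions of this sketch:

```python
def build_shift_table(mask, max_bits):
    # shift_table[0] holds the count of interesting bits; the rest hold
    # relative positions (distance from the previous interesting bit).
    table, count, prev = [0], 0, 0
    for j in range(max_bits):
        if (mask >> j) & 1:
            count += 1
            table.append(j - prev)
            prev = j
    table[0] = count
    return table

def compute_index(e_mask, shift_table):
    index, pow_of_two = 0, 1
    for k in range(1, shift_table[0] + 1):
        e_mask >>= shift_table[k]
        if e_mask & 1:
            index += pow_of_two
        pow_of_two <<= 1
    return index

def classify(events, thresholds):
    # thresholds: list of (priority, predicate) pairs (step 1202).
    ids = [(1 << p) | 1 for p, _ in thresholds]   # threshold identifiers
    mask = 0
    for gid in ids:
        mask |= gid                               # interesting positions
    shift_table = build_shift_table(mask, 32)
    counters = [0] * (1 << shift_table[0])        # class counters (step 1208)
    for event in events:                          # steps 1222-1226
        value = 0
        for gid, (_, pred) in zip(ids, thresholds):
            if pred(event):
                value |= gid                      # encode satisfied thresholds
        counters[compute_index(value, shift_table)] += 1
    return counters
```

A tree plot (step 1216) could then be rendered from the returned counter values.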
In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as removable storage unit 618, removable storage unit 622, and a hard disk installed in hard disk drive 612. Signals carried over communications path 626 can also embody the logic described herein. Computer program medium and computer usable medium can also refer to memories, such as main memory 608 and secondary memory 610, which can be memory semiconductors (e.g. DRAMs, etc.). These computer program products are means for providing software to parallel computer system 600.
Computer programs (also called computer control logic) are stored in main memory 608 and/or secondary memory 610. Computer programs may also be received via communications interface 624. Such computer programs, when executed, enable parallel computer system 600 to implement the present invention as discussed herein. In particular, the computer programs, when executed, enable host processor 604 to implement the processes of the present invention, such as the steps in the methods illustrated by processes 200, 300, 350, 500, 800, 1000, 1020, 1040, 1060, 1100, 1150, 1200, and 1220 of
An embodiment of the invention is also directed to computer program products comprising software stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein. Embodiments of the invention employ any computer useable or readable medium, known now or in the future. Examples of computer useable media include, but are not limited to, primary storage devices (e.g., any type of random access memory), secondary storage devices (e.g., hard drives, floppy disks, CD ROMS, ZIP disks, tapes, magnetic storage devices, optical storage devices, MEMS, nanotechnological storage devices, etc.), and communication media (e.g., wired and wireless communications networks, local area networks, wide area networks, intranets, etc.).
It is to be appreciated that the Detailed Description section, and not the Summary and Abstract sections, is intended to be used to interpret the claims. The Summary and Abstract sections may set forth one or more but not all exemplary embodiments of the present invention as contemplated by the inventor(s), and thus, are not intended to limit the present invention and the appended claims in any way.
The present invention has been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.
The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.
The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.