Methods and apparatus for sorting data

Description

TECHNICAL FIELD

The present disclosure generally relates to sorting data. The disclosure relates more specifically to improving efficiency of sorting data where the sort function is non-injective with a linearly ordered, finite codomain.

BACKGROUND

The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.

Current state-of-the-art genome sequencing machines do not, as one might expect, produce one continuous output sequence of the entire genome. Rather, they generate large numbers of relatively short fragments of sequence called reads, which range from dozens to thousands of base pairs in length. Because these reads are output by the machine in no particular order, the first step in analyzing the data in prior approaches is typically to map each read to a position on the reference genome with which the read is associated. This is called alignment. The second step in prior approaches is typically to sort these reads by their mapped positions. Genome sequencing produces large quantities of data that can take hours or days to align and sort, so prior approaches can be improved by eliminating or making more efficient the steps in this analysis.

SUMMARY

The appended claims may serve as a summary of the disclosure.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1A illustrates a block diagram providing a high-level view of an example data processing system that may be used in an approach for sorting data.

FIG. 1B illustrates a flow diagram of one example method of processing using the example system of FIG. 1A.

FIG. 2 illustrates a block diagram of a computer system that may implement an example sorting algorithm described herein.

FIG. 3 illustrates an example computing architecture for implementing the subject matter described herein.

DETAILED DESCRIPTION OF THE INVENTION

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Based on the foregoing, there is a clear need for improved ways to sort data. The foregoing needs, and other needs and objectives that will become apparent from the following description, are achieved in the present invention, which comprises in one embodiment a method and apparatus for sorting data.

The present approach improves sorting efficiency by eliminating the sorting step and producing a sorted result directly from the alignment step.

While the present disclosure is motivated by the problem of genome sequence data analysis, it can be applied to any problem with the same general characteristics. The genome sequencing problem can be generalized as follows:

Consider the human genome which comprises roughly 3 billion base pairs. If L is the actual length, then the genome can be represented as a string H of length L over a finite alphabet {A,C,T,G}. Reads can then be generalized as the set of all strings over the same alphabet with any length from 1 to L inclusive. The process of mapping a read to a position on the genome is therefore a function whose domain is all possible read strings and whose codomain is positions on the reference genome, which is the linearly ordered, finite set {1 . . . L}. As reads of different lengths can be mapped to the same position, this function is non-injective. The present disclosure is therefore applicable to any problem that is equivalent to sorting elements from a finite set where the ordering function is non-injective to a codomain that is a linearly ordered, finite set.
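By way of illustration only, the non-injective character of such a mapping function can be shown with a toy example. The reference string, the function name map_read, and the use of exact substring matching are all assumptions made for this sketch; a practical aligner would typically tolerate mismatches and would not be limited to exact matches.

```python
from typing import Optional

# Toy stand-in for the genome string H (assumption: L = 8 for illustration).
REFERENCE = "ACGTACGT"


def map_read(read: str, reference: str = REFERENCE) -> Optional[int]:
    """Hypothetical mapping function: 1-based position of the first exact match.

    Returns None when the read does not occur in the reference. Handling of
    unmapped reads is an assumption of this sketch, not part of the disclosure.
    """
    index = reference.find(read)
    return index + 1 if index >= 0 else None


# Two different reads of different lengths map to the same position,
# so the function is non-injective.
pos_short = map_read("AC")
pos_long = map_read("ACG")
print(pos_short, pos_long)
```

In this sketch the reads "AC" and "ACG" are distinct elements of the domain that share one element of the codomain {1 . . . L}.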

Structural Overview of Data Sorting System

FIG. 1A is a block diagram providing a high-level view of an example data processing system that may be used in an approach for sorting data. In one embodiment, there is initially a set of strings 100 of various lengths ranging from 1 to L, over a given finite alphabet. In various embodiments, strings 100 may be stored in computer storage devices of different types such as volatile main memory, non-volatile storage such as disk or FLASH memory, or other digital electronic storage. There can also be a processing element called the mapper 200, which may be implemented in various embodiments using digital logic in a special-purpose computer, or using one or more computer programs or other software elements that are executed using a general-purpose computer. The mapper 200 can be configured, given the set of strings 100 as input, to apply a non-injective mapping function 201 to each input string and to output each string paired with a position value, which can be a member of the codomain of the mapping function 201.

The embodiment can also include a set of data storage containers 300. The number of storage containers 300 may be equal to L, the number of elements in the codomain of the mapping function. Each of the storage containers 300 is addressable by one of the elements in the codomain. In the example of genome sequencing, there would be roughly 3 billion storage containers 300, one for each base pair, and each of the storage containers is addressable by the position in the genome that it represents.

Optionally one or more compact containers 400 may be provided. The function of compact containers 400 is further described in other sections.

The mapper 200 may be configured both to address any individual one of the storage containers 300 and to add a new data element to that container in O(1) time, that is, in time that is constant with respect to the number of containers. One suitable implementation is an in-memory array of linked lists where each of the storage containers 300 is a linked list and the array is indexed by genome position. Another suitable implementation is a set of on-disk files whose filenames are genome positions and to each of which data can be appended in constant time.
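One possible in-memory arrangement is sketched below under stated assumptions. The names make_containers and append_item are hypothetical, and Python's collections.deque is used here as a stand-in for a linked list because it supports constant-time appends; the sketch is sized for a toy codomain and does not address the memory required for roughly 3 billion containers.

```python
from collections import deque
from typing import Any, List


def make_containers(codomain_size: int) -> List[deque]:
    """Create one container per position 1..L.

    Index 0 is left unused so that a position value can be used directly
    as the array index (an assumption of this sketch).
    """
    return [deque() for _ in range(codomain_size + 1)]


def append_item(containers: List[deque], position: int, item: Any) -> None:
    """Address a container by position and append to it, each in O(1)."""
    containers[position].append(item)


containers = make_containers(8)
append_item(containers, 3, "GT")
append_item(containers, 3, "GTA")
print(list(containers[3]))
```

Array indexing makes addressing independent of the number of containers, and appending to the end of a deque does not depend on how many items the container already holds.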

FIG. 1B is a flow diagram that illustrates one method of processing using the example system of FIG. 1A. In an embodiment, the process of FIG. 1B may be implemented using mapper 200.

In step 800, the mapper 200 can read a particular one of the strings in the set 100.

In step 801, the mapper can apply the mapping function 201 to the particular string, which yields a position value for that string.

In step 802, the mapper can address the data container associated with the position value that was determined at step 801.

In step 803, the mapper can append the particular string and its position value to the data container that was addressed at step 802. Step 803 may include forming a data item that includes the particular string and its position value prior to performing the appending. Data containers may contain more than one data item when multiple strings map to the same position value.

The mapper 200 can then loop back to step 800 until all strings in the set 100 have been processed. For some applications, the output as stored in the data containers 300 at this point can be sufficient as a final result. The data are already sorted and can be accessed in a linear ordered fashion by simply traversing the containers in order. No separate sorting step is required.
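The loop of steps 800 through 803 can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names bucket_by_position, traverse, and toy_mapping are hypothetical, the containers are plain Python lists, and the toy mapping function assumes that every read occurs exactly in a short reference string.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Item = Tuple[str, int]  # (string, position value)


def bucket_by_position(
    strings: Iterable[str],
    mapping_function: Callable[[str], int],
    codomain_size: int,
) -> List[List[Item]]:
    # One container per position 1..L; index 0 is unused in this sketch.
    containers: List[List[Item]] = [[] for _ in range(codomain_size + 1)]
    for s in strings:                      # step 800: read a string
        position = mapping_function(s)     # step 801: apply the mapping function
        container = containers[position]   # step 802: address the container
        container.append((s, position))    # step 803: append the data item
    return containers


def traverse(containers: List[List[Item]]) -> Iterator[Item]:
    """Visit containers in position order; no separate sort is performed."""
    for container in containers[1:]:
        yield from container


REFERENCE = "ACGTACGT"  # toy reference (assumption)


def toy_mapping(read: str) -> int:
    # Assumes the read occurs exactly in REFERENCE; returns a 1-based position.
    return REFERENCE.find(read) + 1


reads = ["GTA", "AC", "CGT", "ACG", "T"]
containers = bucket_by_position(reads, toy_mapping, len(REFERENCE))
ordered = list(traverse(containers))
print(ordered)
```

Although the reads arrive in arbitrary order, traversing the containers yields them ordered by position, with reads that share a position kept in arrival order.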

Compact Output

If the strings in set 100 are non-unique or there are fewer than L strings, then some of the data containers 300 may not contain any data items. In this situation, the final output can be made more compact, for example, by adding the following steps:

In step 900, the mapper 200 creates a new compact data container 400.

In step 901, the mapper addresses the first of the original data containers 300.

In step 902, the mapper copies any data items found in the container addressed at step 901, and appends the data items to the new compact data container 400.

The mapper then loops back to step 901 and addresses the next one of the original data containers 300, and repeats this process until all of the original data containers have been copied and appended to the new compact data container 400.

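This operation is linear: each of the L data containers 300 is visited once and each of the n data items is copied once, so the compaction runs in O(L + n) time, where n is the number of strings in set 100.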
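Steps 900 through 902 can be sketched as shown below, assuming for illustration that the original data containers are held as a Python list of lists. The function name compact is hypothetical.

```python
from typing import List, Tuple

Item = Tuple[str, int]  # (string, position value)


def compact(containers: List[List[Item]]) -> List[Item]:
    compact_container: List[Item] = []       # step 900: create compact container
    for container in containers:             # step 901: address each container in order
        compact_container.extend(container)  # step 902: copy and append its data items
    return compact_container


# Toy input in which several containers hold no data items.
containers = [[], [("AC", 1), ("ACG", 1)], [], [("GTA", 3)], []]
compacted = compact(containers)
print(compacted)
```

The loop visits every original container once, including empty ones, and copies each data item once; the position order of the items is preserved in the compact container.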

Non-Deterministic Mapping Function

In some applications the mapping function can map a string to multiple values in the codomain, each with a probability or score associated with it. Such a non-deterministic mapping function can be accommodated by altering step 803 to append the string, the position value, and the probability or score, to each of the data containers to which the mapping function maps the string. All other aspects of the processing can remain the same and the advantages of the present approach are preserved.
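A sketch of the altered step 803 follows, under stated assumptions. The names bucket_with_scores and toy_multi_mapping are hypothetical, and the scoring rule, which splits a weight of 1.0 evenly across every exact occurrence of a read in a toy reference, is chosen only to make the example concrete.

```python
from typing import Callable, Iterable, List, Tuple

ScoredItem = Tuple[str, int, float]  # (string, position value, score)

REFERENCE = "ACGTACGT"  # toy reference (assumption)


def toy_multi_mapping(read: str) -> List[Tuple[int, float]]:
    """Hypothetical non-deterministic mapping: all exact hits, equal scores."""
    last_start = len(REFERENCE) - len(read)
    hits = [i + 1 for i in range(last_start + 1) if REFERENCE.startswith(read, i)]
    return [(position, 1.0 / len(hits)) for position in hits]


def bucket_with_scores(
    strings: Iterable[str],
    mapping_function: Callable[[str], List[Tuple[int, float]]],
    codomain_size: int,
) -> List[List[ScoredItem]]:
    containers: List[List[ScoredItem]] = [[] for _ in range(codomain_size + 1)]
    for s in strings:
        # Altered step 803: append to every container the string maps to.
        for position, score in mapping_function(s):
            containers[position].append((s, position, score))
    return containers


containers = bucket_with_scores(["ACG", "GT"], toy_multi_mapping, len(REFERENCE))
print(containers[1], containers[5])
```

Each string may now appear in more than one container, but the containers are still addressed by position, so an in-order traversal remains sorted.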

Hardware Overview

According to one embodiment, the techniques described herein can be implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.

For example, FIG. 2 is a block diagram that illustrates a computer system 200 upon which an embodiment of the invention may be implemented. Computer system 200 can include a bus 202 or other communication mechanism for communicating information, and a hardware processor 204 coupled with bus 202 for processing information. Hardware processor 204 may be, for example, a general purpose microprocessor.

Computer system 200 can also include a main memory 206, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 202 for storing information and instructions to be executed by processor 204. Main memory 206 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 204. Such instructions, when stored in non-transitory storage media accessible to processor 204, can render computer system 200 into a special-purpose machine that is customized to perform the operations specified in the instructions.

Computer system 200 can further include a read only memory (ROM) 208 or other static storage device coupled to bus 202 for storing static information and instructions for processor 204. A storage device 210, such as a magnetic disk or optical disk, can be provided and coupled to bus 202 for storing information and instructions.

Computer system 200 may be coupled via bus 202 to a display 212, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 214, including alphanumeric and other keys, can be coupled to bus 202 for communicating information and command selections to processor 204. Another type of user input device is cursor control 216, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 204 and for controlling cursor movement on display 212. This input device typically can have two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that can allow the device to specify positions in a plane.

Computer system 200 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 200 to be a special-purpose machine. According to one embodiment, the techniques herein can be performed by computer system 200 in response to processor 204 executing one or more sequences of one or more instructions contained in main memory 206. Such instructions may be read into main memory 206 from another storage medium, such as storage device 210. Execution of the sequences of instructions contained in main memory 206 can cause processor 204 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 210. Volatile media includes dynamic memory, such as main memory 206. Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.

Storage media is distinct from but may be used in conjunction with transmission media. Transmission media can participate in transferring information between storage media. For example, transmission media can include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 202. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 204 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 200 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 202. Bus 202 can carry the data to main memory 206, from which processor 204 can retrieve and execute the instructions. The instructions received by main memory 206 may optionally be stored on storage device 210 either before or after execution by processor 204.

Computer system 200 can also include a communication interface 218 coupled to bus 202. Communication interface 218 can provide a two-way data communication coupling to a network link 220 that is connected to a local network 222. For example, communication interface 218 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 218 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 218 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 220 typically can provide data communication through one or more networks to other data devices. For example, network link 220 may provide a connection through local network 222 to a host computer 224 or to data equipment operated by an Internet Service Provider (ISP) 226. ISP 226 in turn can provide data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 228. Local network 222 and Internet 228 can both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 220 and through communication interface 218, which carry the digital data to and from computer system 200, are example forms of transmission media.

Computer system 200 can send messages and receive data, including program code, through the network(s), network link 220 and communication interface 218. In the Internet example, a server 230 might transmit a requested code for an application program through Internet 228, ISP 226, local network 222 and communication interface 218.

The received code may be executed by processor 204 as it is received, and/or stored in storage device 210, or other non-volatile storage for later execution.

Benefits of Certain Embodiments

In an embodiment, a solution as described herein can yield a number of benefits compared to prior solutions:

Prior approaches to aligning and sorting genome sequence reads can require a separate sort step which runs in O(n log n), or perhaps O(n log log n) time at best. The present approach improves efficiency by eliminating an explicit sort step. For some applications, no additional computation or processing is required after alignment. If a more compact output is desired, this can be accomplished with an additional linear O(n) step, which is still faster than a full sort.
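As an informal check of this benefit, the following sketch compares the order obtained by traversing position-indexed containers with the order obtained from an explicit comparison sort. The data are randomly generated placeholders, and the function name bucket_order is hypothetical; the sketch illustrates equivalence of the results only and makes no claim about measured running times.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, int]  # (string, position value)


def bucket_order(pairs: List[Pair], codomain_size: int) -> List[Pair]:
    """Place each pair in the container for its position, then traverse in order."""
    containers: List[List[Pair]] = [[] for _ in range(codomain_size + 1)]
    for item, position in pairs:
        containers[position].append((item, position))  # constant-time append
    return [entry for container in containers for entry in container]


rng = random.Random(0)  # fixed seed so the example is repeatable
L = 50
pairs = [(f"read{i}", rng.randint(1, L)) for i in range(200)]

via_buckets = bucket_order(pairs, L)
via_sort = sorted(pairs, key=lambda pair: pair[1])  # explicit comparison sort

print(via_buckets == via_sort)
```

Because Python's sorted is stable and the containers preserve arrival order, the two results are identical, while the container-based version performs only one constant-time append per item and a single pass over the containers.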

Control Systems

The present disclosure provides computer control systems that are programmed to implement methods of the disclosure. FIG. 3 shows an example computer system 1001 that is programmed or otherwise configured to sort data. The computer system 1001 can regulate various aspects of data sorting of the present disclosure, such as, for example, data alignment, mapping, and networking.

The computer system 1001 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 1005, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1001 also includes memory or memory location 1010 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1015 (e.g., hard disk), communication interface 1020 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1025, such as cache, other memory, data storage and/or electronic display adapters. The memory 1010, storage unit 1015, interface 1020 and peripheral devices 1025 are in communication with the CPU 1005 through a communication bus (solid lines), such as a motherboard. The storage unit 1015 can be a data storage unit (or data repository) for storing data. The computer system 1001 can be operatively coupled to a computer network (“network”) 1030 with the aid of the communication interface 1020. The network 1030 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1030 in some cases is a telecommunication and/or data network. The network 1030 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1030, in some cases with the aid of the computer system 1001, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1001 to behave as a client or a server.

The CPU 1005 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1010. Examples of operations performed by the CPU 1005 can include fetch, decode, execute, and writeback.

The storage unit 1015 can store files, such as drivers, libraries and saved programs. The storage unit 1015 can store user data, e.g., user preferences and user programs. The computer system 1001 in some cases can include one or more additional data storage units that are external to the computer system 1001, such as located on a remote server that is in communication with the computer system 1001 through an intranet or the Internet.

The computer system 1001 can communicate with one or more remote computer systems through the network 1030. For instance, the computer system 1001 can communicate with a remote computer system of a user (e.g., operator). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1001 via the network 1030.

Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1001, such as, for example, on the memory 1010 or electronic storage unit 1015. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1005. In some cases, the code can be retrieved from the storage unit 1015 and stored on the memory 1010 for ready access by the processor 1005. In some situations, the electronic storage unit 1015 can be precluded, and machine-executable instructions are stored on memory 1010.

The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as the computer system 1001, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.

Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 1001 can include or be in communication with an electronic display that comprises a user interface (UI) for providing, for example, sorted data to a user. Examples of UI's include, without limitation, a graphical user interface (GUI) and web-based user interface.

It should be understood from the foregoing that, while particular implementations have been illustrated and described, various modifications can be made thereto and are contemplated herein. It is also not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the preferable embodiments herein are not meant to be construed in a limiting sense. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. Various modifications in form and detail of the embodiments of the invention will be apparent to a person skilled in the art. It is therefore contemplated that the invention shall also cover any such modifications, variations and equivalents.

Claims

1. A method comprising: (a) generating a plurality of data containers, wherein each data container represents a different position on a reference genome; (b) receiving a plurality of string data values, wherein said plurality of string data values comprises a portion of a genome sequence; and (c) for each string data value in the plurality of string data values: mapping the string data value to obtain a position value for said string data value; and appending said string data value to a data container of the plurality of data containers, wherein said data container is associated with the position value for the string data value.
2. The method of claim 1, wherein mapping comprises: applying a non-deterministic mapping function to said string data value to obtain two or more position values associated with two or more data containers of said plurality of data containers, and two or more probability values, wherein each probability value represents a probability that said string data value is associated with a particular data container among said two or more data containers; and wherein appending comprises appending said string data value and an associated probability value of the two or more probability values to said two or more data containers associated with said two or more position values.
3. The method of claim 1, further comprising accessing said plurality of data containers in linear order based on position values associated with said plurality of data containers to identify a continuous sequence.
4. The method of claim 1, further comprising generating a compact output by: (d) creating a compact data container; (e) addressing a particular data container among said plurality of data containers; (f) copying each string data value that is in said particular data container to said compact data container; (g) repeating (e)-(f) for all said particular data containers among said plurality of data containers, to yield a compacted output; and (h) outputting said compacted output, wherein said compact data container does not contain any data containers that contain zero data items.
5. The method of claim 1, wherein said mapping is non-injective to said genome sequence.
6. The method of claim 1, wherein each string data value of said plurality of string data values comprises a sequencing read.
7. A method comprising: (a) generating a plurality of data containers, wherein each data container represents a different position on a reference genome; (b) receiving a plurality of string data values, wherein said plurality of string data values comprises a portion of a genome sequence; (c) for each string data value in the plurality of string data values; appending with a programmed computer processor i) a data item comprising a particular mapped string data value of a plurality of mapped string data values and ii) a particular probability value of a plurality of probability values associated with said particular mapped string data value to a particular data container of said plurality of data containers in a computer memory, wherein said particular data container is addressable by a position value, wherein said particular mapped string data value is mapped to said position value; and (d) outputting a continuous output sequence generated from (c).
8. The method of claim 7, wherein said particular mapped string data value is mapped to said position value by applying a mapping function.
9. The method of claim 8, wherein said mapping function is a non-deterministic mapping function.
10. The method of claim 7, further comprising generating a compact output by: (e) creating a compact data container; (f) addressing said particular data container among said plurality of data containers; (g) copying each string data value that is in said particular data container to said compact data container; (h) repeating (f)-(g) for all said particular data containers among said plurality of data containers, to yield a compacted output; and (i) outputting said compacted output, wherein said compact data container does not contain any data containers that contain zero data items.
11. The method of claim 8, wherein said mapping is non-injective to said genome sequence.
12. The method of claim 7, wherein said particular mapped string data value comprises a sequencing read.
13. A system comprising: a string data value database; a computing node comprising a computer readable storage medium having program instructions embodied therewith, said program instructions executable by one or more processors to cause said one or more processors to perform a method comprising: (a) generating a plurality of data containers, wherein each data container represents a different position on a reference genome; (b) receiving a plurality of string data values, wherein said plurality of string data values comprises a portion of a genome sequence; and (c) for each string data value in the plurality of string data values: mapping a string data value received from said string value database to obtain a position value for said string data value; appending said string data value to a data container of the plurality of data containers, wherein said data container is associated with the position value for the string data value.
14. The system of claim 13, wherein mapping comprises: applying a non-deterministic mapping function to said string data value to obtain two or more position values associated with two or more data containers of said plurality of data containers, and two or more probability values, wherein each probability value represents a probability that said string value is associated with a particular data container among said two or more data containers; and wherein appending comprises appending said string data value and an associated probability value of the two or more probability values to said two or more data containers associated with said two or more position values.
15. The system of claim 13, wherein said method further comprises accessing said plurality of data containers in linear order based on position values associated with said plurality of data containers to identify a continuous sequence.
16. The system of claim 13, wherein said method further comprises generating a compact output by: (d) creating a compact data container; (e) addressing a particular data container among said plurality of data containers; (f) copying each string data value that is in said particular data container to said compact data container; (g) repeating (e)-(f) for all said particular data containers among said plurality of data containers, to yield a compacted output; and (h) outputting said compacted output, wherein said compact data container does not contain any data containers that contain zero data items.
17. The system of claim 13, wherein said mapping is non-injective to said genome sequence.
18. The system of claim 13, wherein each string data value of said plurality of string data values comprises a sequencing read.
19. A system comprising: a string data value database; a computing node comprising a computer readable storage medium having program instructions embodied therewith, said program instructions executable by one or more processors to cause said one or more processors to perform a method comprising: (a) generating a plurality of data containers, wherein each data container represents a different position on a reference genome; (b) receiving a plurality of string data values, wherein said plurality of string data values comprises a portion of a genome sequence; (c) for each string data value in the plurality of string data values: appending with a programmed computer processor i) a data item comprising a particular mapped string data value of a plurality of mapped string data values and ii) a particular probability value of a plurality of probability values associated with said particular mapped string data value to a particular data container of the plurality of data containers in a computer memory, wherein said particular data container is addressable by a position value, wherein said particular mapped string data value is mapped to said position value; and (d) outputting a continuous output sequence generated from (c).
20. The system of claim 19, wherein said particular mapped string data value is mapped to said position value by applying a mapping function.
21. The system of claim 20, wherein said mapping function is a non-deterministic mapping function.
22. The system of claim 19, wherein said method further comprises generating a compact output by: (e) creating a compact data container; (f) addressing said particular data container among said plurality of data containers; (g) copying each string data value that is in said particular data container to said compact data container; (h) repeating (f)-(g) for all said particular data containers among said plurality of data containers, to yield a compacted output; and (i) outputting said compacted output, wherein said compact data container does not contain any data containers that contain zero data items.
23. The system of claim 20, wherein said mapping is non-injective to said genome sequence.
24. The system of claim 19, wherein said particular mapped string data value comprises a sequencing read.
25. A computer program product comprising a computer-readable storage medium having program instructions embodied therewith, said program instructions executable by one or more processors to cause said one or more processors to perform a method comprising: (a) generating a plurality of data containers, wherein each data container represents a different position on a reference genome; (b) receiving a plurality of string data values, wherein said plurality of string data values comprises a portion of a genome sequence; and (c) for each string data value in the plurality of string data values: mapping the string data value to obtain a position value for said string data value; and appending said string data value to a data container of the plurality of data containers, wherein said data container is associated with the position value for the string data value.
26. The computer program product of claim 25, wherein mapping comprises: applying a non-deterministic mapping function to said string data value to obtain two or more position values associated with two or more data containers of said plurality of data containers, and two or more probability values, wherein each probability value represents a probability that said string value is associated with a particular data container among said two or more data containers; and wherein appending comprises appending said string data value and an associated probability value of the two or more probability values to said two or more data containers associated with said two or more position values.
27. The computer program product of claim 25, wherein said method further comprises accessing said plurality of data containers in linear order based on position values associated with said plurality of data containers to identify a continuous sequence.
28. The computer program product of claim 25, wherein said method further comprises generating a compact output by: (d) creating a compact data container; (e) addressing a particular data container among said plurality of data containers; (f) copying each string data value that is in said particular data container to said compact data container; (g) repeating (e)-(f) for all said particular data containers among said plurality of data containers, to yield a compacted output; and (h) outputting said compacted output, wherein said compact data container does not contain any data containers that contain zero data items.
29. The computer program product of claim 25, wherein said mapping is non-injective to said genome sequence.
30. The computer program product of claim 25, wherein each string data value of said plurality of string data values comprises a sequencing read.
31. A computer program product comprising a computer-readable storage medium having program instructions embodied therewith, said program instructions executable by one or more processors to cause said one or more processors to perform a method comprising: (a) generating a plurality of data containers, wherein each data container represents a different position on a reference genome; (b) receiving a plurality of string data values, wherein said plurality of string data values comprises a portion of a genome sequence; (c) for each string data value in the plurality of string data values: appending with a programmed computer processor i) a data item comprising a particular mapped string data value of a plurality of mapped string data values and ii) a particular probability value of a plurality of probability values associated with said particular mapped string data value to a particular data container of the plurality of data containers in a computer memory, wherein said particular data container is addressable by a position value, wherein said particular mapped string data value is mapped to said position value; and (d) outputting a continuous output sequence generated from (c).
32. The computer program product of claim 31, wherein said particular mapped string data value is mapped to said position value by applying a mapping function.
33. The computer program product of claim 32, wherein said mapping function is a non-deterministic mapping function.
34. The computer program product of claim 31, wherein said method further comprises generating a compact output by: (e) creating a compact data container; (f) addressing said particular data container among said plurality of data containers; (g) copying each string data value that is in said particular data container to said compact data container; (h) repeating (f)-(g) for all said particular data containers among said plurality of data containers, to yield a compacted output; and (i) outputting said compacted output, wherein said compact data container does not contain any data containers that contain zero data items.
35. The computer program product of claim 32, wherein said mapping is non-injective to said genome sequence.
36. The computer program product of claim 31, wherein said particular mapped string data value comprises a sequencing read.
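
By way of illustration only, and not limitation, the container-based sorting and compaction steps recited in the claims above (for example, claims 13, 15 and 16) might be sketched in Python as follows. The names sort_into_containers, compact, toy_map and REFERENCE are hypothetical and do not appear in the disclosure. The toy mapping function is an exact substring search that stands in for a real aligner, and the sketch assumes every read maps to a position on the reference.

```python
from typing import Callable, List

REFERENCE = "ACGTTGCA"  # hypothetical toy reference genome


def toy_map(read: str) -> int:
    """Hypothetical stand-in for alignment: first exact match on REFERENCE."""
    return REFERENCE.find(read)


def sort_into_containers(
    reads: List[str],
    map_to_position: Callable[[str], int],
    genome_length: int,
) -> List[List[str]]:
    # One container per reference position.
    containers: List[List[str]] = [[] for _ in range(genome_length)]
    for read in reads:
        position = map_to_position(read)
        if not 0 <= position < genome_length:
            # Assumption of this sketch: unmapped reads are rejected.
            raise ValueError(f"read {read!r} did not map to the reference")
        # The mapping may be non-injective: several reads can share a container.
        containers[position].append(read)
    return containers


def compact(containers: List[List[str]]) -> List[str]:
    # Visit containers in linear position order and copy their contents.
    # Empty containers contribute nothing to the compacted output.
    compacted: List[str] = []
    for container in containers:
        compacted.extend(container)
    return compacted


reads = ["TTG", "ACG", "GCA", "CGT", "ACG"]
containers = sort_into_containers(reads, toy_map, len(REFERENCE))
sorted_reads = compact(containers)
```

In this sketch no comparison between reads is ever made. Each read is placed by a single indexed append, and the sorted order falls out of walking the containers by position.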

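As a further non-limiting illustration, the non-deterministic mapping recited in claims such as 14 and 26, in which a read is appended to two or more containers together with a probability value, might be sketched in Python as follows. The names toy_candidates and sort_with_probabilities are hypothetical. Splitting the probability evenly among all exact matches is an assumption made only for this example; the disclosure does not specify how probability values are computed.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[int, float]  # (position value, probability value)


def toy_candidates(read: str, reference: str) -> List[Candidate]:
    """Hypothetical non-deterministic mapping: every exact match position,
    each weighted equally (an assumption of this sketch)."""
    hits: List[int] = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return [(position, 1.0 / len(hits)) for position in hits]


def sort_with_probabilities(
    reads: List[str],
    map_candidates: Callable[[str], List[Candidate]],
    genome_length: int,
) -> List[List[Tuple[str, float]]]:
    # One container per reference position.
    containers: List[List[Tuple[str, float]]] = [[] for _ in range(genome_length)]
    for read in reads:
        for position, probability in map_candidates(read):
            # The read is stored in each candidate container with its probability.
            containers[position].append((read, probability))
    return containers


reference = "ACGACGTT"  # "ACG" occurs at positions 0 and 3
prob_containers = sort_with_probabilities(
    ["GTT", "ACG"],
    lambda read: toy_candidates(read, reference),
    len(reference),
)
```

A downstream step could then keep, for each read, only the container entry with the highest probability, or carry all entries forward; which is preferable depends on the application and is not decided here.
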
CROSS-REFERENCE

This application is a Continuation of U.S. application Ser. No. 15/730,119, filed Oct. 11, 2017, now U.S. Pat. No. 11,030,276, which is a Continuation of U.S. application Ser. No. 14/571,120, filed Dec. 15, 2014, now U.S. Pat. No. 9,824,068, which claims the benefit of U.S. Provisional Patent Application No. 61/916,687, filed Dec. 16, 2013, each of which is incorporated herein by reference in its entirety for all purposes.

Non-Patent Literature Citations (167)

Entry
10X Genomics, Inc. CG000153 Rev A. Chromium Single Cell DNA Reagent Kits User Guide. 2018.
10X Genomics, Inc. CG000184 Rev A. Chromium Single Cell 3' Reagent Kits v3 User Guide with Feature Barcoding Technology for CRISPR Screening. 2018.
10X Genomics, Inc. CG000185 Rev B. Chromium Single Cell 3' Reagent Kits User Guide with Feature Barcoding Technology for Cell Surface Protein. 2018.
10X Genomics, Inc. CG000208 Rev E. Chromium Next GEM Single Cell V(D)J reagent Kits v1.1 User Guide with Feature Barcode Technology for Cell Surface Protein. 2020.
10X Genomics, Inc. CG000209 Rev D. Chromium Next GEM Single Cell ATAC Reagent Kits v1.1 User Guide. 2020.
10X Genomics, Inc. CG000239 Rev B. Visium Spatial Gene Expression Reagent Kits User Guide. 2020.
10X Genomics, Inc. CG00026. Chromium Single Cell 3' Reagent Kit User Guide. 2016.
10X Genomics, Inc. LIT00003 Rev B Chromium Genome Solution Application Note. 2017.
Abate, A.R. et al. “Beating Poisson encapsulation statistics using close-packed ordering” Lab on a Chip (Sep. 21, 2009) 9(18):2628-2631.
Adamson et al., “Production of arrays of chemically distinct nanolitre plugs via repeated splitting in microfluidic devices”, Lab Chip 6(9): 1178-1186 (Sep. 2006).
Agasti, S.S. et al. “Photocleavable DNA barcode-antibody conjugates allow sensitive and multiplexed protein analysis in single cell” J Am Chem Soc (2012) 134(45):18499-18502.
Aitman, et al. Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature. Feb. 16, 2006;439(7078):851-5.
Altemos et al., “Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly,” PLOS Computational Biology, May 15, 2014, vol. 10, Issue 5, 14 pages.
Amini, S. et al. “Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing” Nature Genetics (2014) 46:1343-1349 doi:10.1038/ng.3119.
Balikova, et al. Autosomal-dominant microtia linked to five tandem copies of a copy-number-variable region at chromosome 4p16. Am J Hum Genet. Jan. 2008;82(1):181-7. doi:10.1016/j.ajhg.2007.08.001.
Bansal et al. “An MCMC algorithm for haplotype assembly from whole-genome sequence data,” (2008) Genome Res 18:1336-1346.
Bansal et al. “HapCUT: an efficient and accurate algorithm for the haplotype assembly problem,” Bioinformatics (2008) 24:i153-i159.
Baret, “Surfactants in droplet-based microfluidics” Lab Chip (12(3):422-433 (2012).
Bedtools: General Usage, http://bedtools.readthedocs.io/en/latest/content/generalusage.html; Retrieved from the Internet Jul. 8, 2016.
Beer et al. On-Chip, Real-Time, Single-Copy Polymerase Chain Reaction in Picoliter Droplets. Anal Chem 79:8471-8475 (2007).
Braeckmans et al., Scanning the Code. Modern Drug Discovery. 2003:28-32.
Bray, “The JavaScript Object Notation (JSON) Data Interchange Format,” Mar. 2014, retrieved from the Internet Feb. 15, 2015; https://tools.ietf.org/html/rfc7159.
Brenner, et al. “In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs.” Proc Natl Acad Sci U S A. Feb. 15, 2000;97(4):1665-70.
Browning, et al. Haplotype phasing: existing methods and new developments. Nat Rev Genet. Sep. 16, 2011;12(10):703-14. doi: 10.1038/nrg3054. Review.
Buchman GW, et al. Selective RNA amplification: a novel method using dUMP-containing primers and uracil DNA glycosylase. PCR Methods Appl. Aug. 1993; 3(1):28-31.
Burns, et al. An Integrated Nanoliter DNA Analysis Device. Science. Oct. 16, 1998;282(5388):484-7.
Burns, et al. Microfabricated structures for integrated DNA analysis. Proc Natl Acad Sci U S A. May 28, 1996; 93(11): 5556-5561.
Burns, et al. The intensification of rapid reactions in multiphase systems using slug flow in capillaries. Lab Chip. Sep. 2001;1(1):10-15. Epub Aug. 9, 2001.
Cappuzzo, et al. Increased HER2 gene copy number is associated with response to gefitinib therapy in epidermal growth factor receptor-positive non-small-cell lung cancer patients. J Clin Oncol. Aug. 1, 2005;23(22):5007-18.
Chen et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature Methods (2009) 6(9):677-681.
Chien et al. “Multiport flow-control system for lab-on-a-chip microfluidic devices”, Fresenius J. Anal Chem, 371:106-111 (Jul. 27, 2001).
Choi, et al. Identification of novel isoforms of the EML4-ALK transforming gene in non-small cell lung cancer. Cancer Res. Jul. 1, 2008;68(13):4971-6. doi: 10.1158/0008-5472.CAN-07-6158.
Chokkalingam, et al. Probing cellular heterogeneity in cytokine-secreting immune cells using droplet-based microfluidics. Lab Chip. Dec. 21, 2013;13(24):4740-4. doi: 10.1039/c3lc50945a.
Cleary et al. “Joint variant and de novo mutation identification on pedigrees from highthroughput sequencing data,” J Comput Biol (2014) 21:405-419.
Cook, et al. Copy-number variations associated with neuropsychiatric conditions. Nature. Oct. 16, 2008;455(7215):919-23. doi: 10.1038/nature07458.
Co-pending U.S. Appl. No. 15/985,388, inventor Schnall-Levin; Michael, filed May 21, 2018.
Co-pending U.S. Appl. No. 16/434,076, inventor Giresi; Paul, filed Jun. 6, 2019.
Co-pending U.S. Appl. No. 16/434,084, inventor Giresi; Paul, filed Jun. 6, 2019.
Co-pending U.S. Appl. No. 16/434,102, inventors Price; Andrew D. et al., filed Jun. 6, 2019.
Co-pending U.S. Appl. No. 16/708,214, inventors Wheeler; Tobias Daniel et al., filed Dec. 9, 2019.
Co-pending U.S. Appl. No. 16/737,762, inventors Price; Andrew D. et al., filed Jan. 8, 2020.
Co-pending U.S. Appl. No. 16/737,770, inventors Belhocine; Zahara Kamila et al., filed Jan. 8, 2020.
Co-pending U.S. Appl. No. 16/789,273, inventors Maheshwari; Arundhati Shamoni et al., filed Feb. 12, 2020.
Co-pending U.S. Appl. No. 16/800,450, inventor Katherine; Pfeiffer, filed Feb. 25, 2020.
Damean, et al. Simultaneous measurement of reactions in microdroplets filled by concentration gradients. Lab Chip. Jun. 21, 2009;9(12):1707-13. doi: 10.1039/b821021g. Epub Mar. 19, 2009.
Depristo et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet 43:491-498 (2011).
Duffy et al., Rapid Protyping of Microfluidic Systems and Polydimethylsiloxane, Anal Chem 70:4974-4984 (1998).
Ekblom, R. et al. “A field guide to whole-genome sequencing, assembly and annotation” Evolutionary Apps (Jun. 24, 2014) 7(9):1026-1042.
Fabi, et al. Correlation of efficacy between EGFR gene copy number and lapatinib/capecitabine therapy in HER2-positive metastatic breast cancer. J. Clin. Oncol. 2010; 28:15S. 2010 ASCO Meeting abstract Jun. 14, 2010:1059.
Fisher, et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol. 2011;12(1):R1. doi: 10.1186/GB-2011-12-1-r1. Epub Jan. 4, 2011.
Fulton, et al. Advanced multiplexed analysis with the FlowMetrix system. Clin Chem. Sep. 1997;43(9):1749-56.
Gericke, et al. Functional cellulose beads: preparation, characterization, and applications. Chemical reviews 113.7 (2013): 4812-4836.
Ghadessy, et al. Directed evolution of polymerase function by compartmentalized self-replication. Proc Natl Acad Sci USA. 2001;98:4552-4557.
Gonzalez, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science. Mar. 4, 2005;307(5714):1434-40. Epub Jan. 6, 2005.
Gordon et al. “Consed: A Graphical Tool for Sequence Finishing,” Genome Research (1998) 8:198-202.
Hiatt, et al. Parallel, tag-directed assembly of locally derived short sequence reads. Nat Methods. Feb. 2010;7(2):119-22. Epub Jan. 17, 2010.
Huang et al. EagleView: A genome assembly viewer for next-generation sequencing technologies, Genome Research (2008) 18:1538-1543.
Huebner, “Quantitative detection of protein expression in single cells using droplet microfluidics”, Chem. Commun. 1218-1220 (2007).
Hug, et al. Measurement of the number of molecules of a single mRNA species in a complex mRNA preparation. J Theor Biol. Apr. 21, 2003;221(4):615-24.
Islam, et al. Highly multiplexed and strand-specific single-cell RNA 5' end sequencing. Nat Protoc. Apr. 5, 2012;7(5):813-28. doi: 10.1038/nprot.2012.022.
Jaitin, et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science. Feb. 14, 2014;343(6172):776-9. doi: 10.1126/science.1247651.
Jarosz, M. et al. “Using 1ng of DNA to detect haplotype phasing and gene fusions from whole exome sequencing of cancer cell lines” Cancer Res (2015) 75(supp15):4742.
Kanehisa et al. “KEGG: Kyoto Encyclopedia of Genes and Genomes,” Nucleic Acids Research (2000) 28:27-30.
Kaper, et al. Supporting Information for “Whole-genome haplotyping by dilution, amplification, and sequencing.” Proc Natl Acad Sci U S A. Apr. 2, 2013;110(14):5552-7. doi: 10.1073/pnas.1218696110. Epub Mar. 18, 2013.
Kaper, et al. Whole-genome haplotyping by dilution, amplification, and sequencing. Proc Natl Acad Sci U S A. Apr. 2, 2013;110(14):5552-7. doi: 10.1073/pnas.1218696110. Epub Mar. 18, 2013.
Kenis, et al. Microfabrication Inside Capillaries Using Multiphase Laminar Flow Patterning. Science. 1999; 285:83-85.
Khomiakova et al., Analysis of perfect and mismatched DNA duplexes by a generic hexanucleotide microchip. Mol Biol(Mosk). Jul.-Aug. 2003;37(4):726-41. Russian. Abstract only.
Kim et al. “HapEdit: an accuracy assessment viewer for haplotype assembly using massively parallel DNA-sequencing technologies,” Nucleic Acids Research (2011) pp. W557-W561.
Kirkness et al. “Sequencing of isolated sperm cells for direct haplotyping of a human genome,” Genome Res (2013) 23:826-832.
Kitzman et al. “Haplotype-resolved genome sequencing of a Gujarati Indian individual.” Nat Biotechnol (2011) 29:59-63.
Kivioja, et al. Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods. Nov. 20, 2011;9(1):72-4.
Klein, et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. May 21, 2015; 161:1187-1201.
Knight, et al. Subtle chromosomal rearrangements in children with unexplained mental retardation. Lancet. Nov. 13, 1999;354(9191):1676-81.
Korlach et al., Methods in Enzymology, Real-Time DNA Sequencing from Single Polymerase Molecules, (2010) 472:431-455.
Koster et al., “Drop-based microfluidic devices for encapsulation of single cells”, Lab on a Chip The Royal Soc. of Chem. 8: 1110-1115 (2008).
Lagally, et al. Single-Molecular DNA Amplification and Analysis in an Integrated Microfluidic Device. Anal Chem. Feb. 1, 2001;73(3):565-70.
Layer et al. “LUMPY: A probabilistic framework for structural variant discovery,” Genome Biology (2014) 15(6):R84.
Lennon et al. A scalable, fully automated process for construction of sequence-ready barcoded libraries for 454. Genome Biology 11:R15 (2010).
Li, et al. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26.5 (2010): 589-595.
Lippert et al. Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem, Brief. Bioinform. (2002) 3:23-31.
Lo, et al. On the design of clone-based haplotyping. Genome Biol. 2013;14(9):R100.
Lupski. Genomic rearrangements and sporadic disease. Nat Genet. Jul. 2007;39(7 Suppl):S43-7.
Macosko, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. May 21, 2015;161(5):1202-14. doi: 10.1016/j.cell.2015.05.002.
Madl, et al. “Bioorthogonal Strategies for Engineering Extracellular matrices”, Madl, Christopher, Adv. Funct. Mater. Jan. 19, 2018, vol. 28, 1706046, pp. 1-21.
Marcus. Gene method offers diagnostic hope. The Wall Street Journal. Jul. 11, 2012.
Margulies 2005 Supplementary methods (Year: 2005).
Margulies et al. “Genome sequencing in microfabricated high-density picoliter reactors”, Nature (2005) 437:376-380.
McCoy, R. et al. “Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements” PLOS (2014) 9(9):e1016689.
McKenna, Aaron et al. “The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data.” Genome Research 20.9 (2010): 1297-1303. PMC. Web. Feb. 2, 2017.
Miller et al. “Assembly Algorithms for next-generation sequencing data,” Genomics, 95 (2010), pp. 315-327.
Mirzabekov, “DNA Sequencing by Hybridization—a Megasequencing Method and A Diagnostic Tool?” Trends in Biotechnology 12(1): 27-32 (1994).
Mouritzen et al., Single nucleotide polymorphism genotyping using locked nucleic acid (LNA). Expert Rev Mol Diagn. Jan. 2003;3(1):27-38.
Myllykangas et al. “Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing,” Nat Biotechnol, (2011) 29:1024-1027.
Navin. The first five years of single-cell cancer genomics and beyond. Genome Res. Oct. 2015;25(10):1499-507. doi: 10.1101/gr.191098.115.
Nisisako, et al. Droplet formation in a microchannel network. Lab Chip. Feb. 2002;2(1):24-6. Epub Jan. 18, 2002.
Nisisako, T. et al. Droplet Formation in a Microchannel on PMMA Plate. Micro Total Analysis Systems. 2001. Kluwer Academic Publishers. pp. 137-138.
Nisisako, T. et al., Microfluidics large-scale integration on a chip for mass production of monodisperse droplets and particles, The Royal Society of Chemistry: Lab Chip, (Nov. 23, 2007) 8:287-293.
Novak, R. et al., “Single cell multiplex gene detection and sequencing using microfluidically generated agarose emulsions” Angew. Chem. Int. Ed. Engl. (2011) 50(2):390-395.
Orakdogen, N. “Novel responsive poly(N,N-dimethylaminoethyl methacrylate) gel beads: preparation, mechanical properties and pH-dependent swelling behavior” J Polym Res (2012) 19:9914.
Perrott, Jimmy. Optimization and Improvement of Emulsion PCR for the Ion Torrent Next-Generation Sequencing Platform. (2011) Thesis.
Peters, B.A. et al. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature, 487(7406):190-195 (Jul. 11, 2012).
Pinto, et al. Functional impact of global rare copy number variation in autism spectrum disorders. Nature. Jul. 15, 2010;466(7304):368-72. doi: 10.1038/nature09146. Epub Jun. 9, 2010.
Priest, et al. Generation of Monodisperse Gel Emulsions in a Microfluidic Device, Applied Physics Letters, 88:024106 (2006).
Pushkarev et al. Single-molecule sequencing of an individual human genome, Nature Biotech (2009) 27:847-850.
Ramsey, J.M. “The burgeoning power of the shrinking laboratory” Nature Biotech (1999) 17:1061-1062.
Ramskold et al. (2012) “Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells” Nature Biotechnology 30(8):777-782.
Ritz, A. et al. “Characterization of structural variants with single molecule and hybrid sequencing approaches” Bioinformatics (2014) 30(24):3458-3466.
Roche. Using Multiplex Identifier (MID) Adaptors for the GS FLX Titanium Chemistry Basic MID Set Genome Sequencer FLX System, Technical Bulletin 004-2009, (Apr. 1, 2009) pp. 1-11. URL:http://454.com/downloads/my454/documentation/technical-bulletins/TCB-09004 Using MultiplexIdentifierAdaptorsForTheGSFLXTitaniumSeriesChemistry-BasicMIDSet.pdf.
Ropers. New perspectives for the elucidation of genetic disorders. Am J Hum Genet. Aug. 2007;81(2):199-207. Epub Jun. 29, 2007.
Rotem, A. et al., “High-throughput single-cell labeling (Hi-SCL) for RNA-Seq using drop-based microfluidics” PLOS One (May 22, 2015) 0116328 (14 pages).
Saikia, et al. Simultaneous multiplexed amplicon sequencing and transcriptome profiling in single cells. Nat Methods. Jan. 2019;16(1):59-62. doi: 10.1038/s41592-018-0259-9. Epub Dec. 17, 2018.
Schmitt, “Bead-based multiplex genotyping of human papillomaviruses”, J. Clinical Microbiol., 44:2 504-512 (2006).
Schubert, et al. Microemulsifying fluorinated oils with mixtures of fluorinated and hydrogenated surfactants. Colloids and Surfaces A; Physicochemical and Engineering Aspects, 84(1994) 97-106.
Sebat, et al. Strong association of de novo copy number mutations with autism. Science. Apr. 20, 2007;316(5823):445-9. Epub Mar. 15, 2007.
Seiffert, et al. Microfluidic fabrication of smart microgels from macromolecular precursors. Polymer. vol. 51, Issue 25, Nov. 26, 2010, pp. 5883-5889.
Seiffert, S. et al., “Smart microgel capsules from macromolecular precursors” J. Am. Chem. Soc. (2010) 132:6606-6609.
Shah, et al. “Fabrication of monodisperse thermosensitive microgels and gel capsules in microfluidic devices”, Soft Matter, 4:2303-2309 (2008).
Shendure, et al., Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 309.5741 (Sep. 2005): 1728-1732. XP002427180, ISSN: 0036-8075, DOI: 10.1126/SCIENCE.1117839.
Shlien, et al. Copy number variations and cancer. Genome Med. Jun. 16, 2009;1(6):62. doi: 10.1186/gm62.
Shlien, et al. Excessive genomic DNA copy number variation in the Li-Fraumeni cancer predisposition syndrome. Proc Natl Acad Sci U S A. Aug. 12, 2008;105(32):11264-9. doi: 10.1073/pnas.0802970105. Epub Aug. 6, 2008.
Simeonov et al., Single nucleotide polymorphism genotyping using short, fluorescently labeled locked nucleic acid (LNA) probes and fluorescence polarization detection. Nucleic Acids Res. Sep. 1, 2002;30(17):e91.
Smith, et al. Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples. Nucleic Acids Research, 38(13): e142 (2010).
Song, et al. Reactions in droplets in microfluidic channels. Angew Chem Int Ed Engl. Nov. 13, 2006;45(44):7336-56.
Sorokin et al., Discrimination between perfect and mismatched duplexes with oligonucleotide gel microchips: role of thermodynamic and kinetic effects during hybridization. J Biomol Struct Dyn. Jun. 2005;22(6):725-34.
SSH Tunnel—Local and Remote Port Forwarding Explained With Examples, Trackets Blog, http://blog.trackets.com/2014/05/17/ssh-tunnel-local-and-remote-port-forwarding-explained with- examples.html; Retrieved from the Internet Jul. 7, 2016.
Tewhey, et al. Microdroplet-based PCR amplification for large-scale targeted sequencing. Nat Biotechnol. Nov. 2009;27(11):1025-31. doi: 10.1038/nbt.1583. Epub Nov. 1, 2009.
Thaxton, C.S. et al. “A Bio-Bar-Code Assay Based Upon Dithiothreitol Oligonucleotide Release” Anal Chem (2005) 77:8174-8178.
The SAM/BAM Format Specification Working Group, “Sequence Alignment/Map Format Specification,” Sep. 6, 2016.
Thorsen, et al. Dynamic pattern formation in a vesicle-generating microfluidic device. Physical Review Letters. American Physical Society. 2001; 86(18):4163-4166.
Turchinovich, et al. “Capture and Amplification by Tailing and Switching (CATS): An Ultrasensitive Ligation—Independent Method for Generation of DNA Libraries for Deep Sequencing from Picogram Amounts of DNA and RNA.” RNA Biology 11.7 (2014): 817-828. PMC. Web. Nov. 13, 2017.
Uttamapinant, et al. Fast, cell-compatible click chemistry with copper-chelating azides for biomolecular labeling. Angew. Chem. Int. Ed. Engl., Jun. 11, 2012: 51(24) pp. 5852-5856.
Voskoboynik, A. et al. The genome sequence of the colonial chordate, Botryllus schlosseri. eLife, 2:e00569 (2013). doi: 10.7554/eLife.00569. Epub Jul. 2, 2013.
Wang, et al. Digital karyotyping. Proc Natl Acad Sci U S A. Dec. 10, 2002;99(25):16156-61. Epub Dec. 2, 2002.
Wang et al., Single nucleotide polymorphism discrimination assisted by improved base stacking hybridization using oligonucleotide microarrays. Biotechniques. 2003;35:300-08.
Weaver, “Rapid clonal growth measurements at the single-cell level: gel microdroplets and flow cytometry”, Biotechnology, 9:873-877 (1991).
Weigl, et al. Microfluidic Diffusion-Based Separation and Detection. Science. 1999; pp. 346-347.
Wheeler et al., “Database resources of the National Center for Biotechnology Information,” Nucleic Acids Res. (2007) 35 (Database issue): D5-12.
Zerbino, Daniel, “Velvet Manual—version 1.1,” Aug. 15, 2008, pp. 1-22.
Zerbino, D.R. “Using the Velvet de novo assembler for short-read sequencing technologies” Curr Protoc Bioinformatics. Sep. 2010;Chapter 11:Unit 11.5. doi: 10.1002/0471250953.bi1105s31.
Zerbino et al. “Velvet: Algorithms for de novo short read assembly using de Bruijn graphs,” Genome Research (2008) 18:821-829.
Zhang, et al. One-step fabrication of supramolecular microcapsules from microfluidic droplets. Science. Feb. 10, 2012;335(6069):690-4. doi: 10.1126/science.1215416.
Zhang, et al. Reconstruction of DNA sequencing by hybridization. Bioinformatics. Jan. 2003;19(1):14-21.
Zheng, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. Jan. 16, 2017;8:14049. doi: 10.1038/ncomms14049.
Zheng, X. SeqArray: an R/Bioconductor Package for Big Data Management of Genome-Wide Sequencing Variants, Department of Biostatistics; University of Washington-Seattle; Dec. 28, 2014.
Zheng, X.Y. et al. “Haplotyping germline and cancer genomes with high-throughput linked-read sequencing” Nature Biotech (Feb. 1, 2016) 34(3):303-311.
Zhu, et al. Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques. Apr. 2001;30(4):892-7.
Zong et al. Genome-Wide Detection of Single Nucleotide and Copy Number Variations of a Single Human Cell. Science 338(6114):1622-1626 (2012).
Co-pending U.S. Appl. No. 17/014,909, inventor Giresi; Paul, filed Sep. 8, 2020.
Co-pending U.S. Appl. No. 17/148,942, inventors McDermott; Geoffrey et al., filed Jan. 14, 2021.
Co-pending U.S. Appl. No. 17/166,982, inventors McDermott; Geoffrey et al., filed Feb. 3, 2021.
Co-pending U.S. Appl. No. 17/175,542, inventors Maheshwari; Arundhati Shamoni et al., filed Feb. 12, 2021.
Co-pending U.S. Appl. No. 17/220,303, inventor Walter; Dagmar, filed Apr. 1, 2021.
Co-pending U.S. Appl. No. 17/318,364, inventors Bava; Felice Alessio et al., filed May 12, 2021.
Co-pending U.S. Appl. No. 17/381,612, inventor Martinez; Luigi Jhon Alvarado, filed Jul. 21, 2021.
Co-pending U.S. Appl. No. 17/499,039, inventors Pfeiffer; Katherine et al., filed Oct. 12, 2021.
Co-pending U.S. Appl. No. 17/512,241, inventors Hill; Andrew John et al., filed Oct. 27, 2021.
Co-pending U.S. Appl. No. 17/517,408, inventors Salmanzadeh; Alireza et al., filed Nov. 2, 2021.
Co-pending U.S. Appl. No. 17/518,213, inventor Lund; Paul Eugene, filed Nov. 3, 2021.
Co-pending U.S. Appl. No. 17/522,741, inventors Zheng; Xinying et al., filed Nov. 9, 2021.
Co-pending U.S. Appl. No. 17/545,862, inventor Katherine; Pfeiffer, filed Dec. 8, 2021.
Co-pending U.S. Appl. No. 17/573,350, inventor Corey; M. Nemec, filed Jan. 11, 2022.
Co-pending U.S. Appl. No. 17/580,947, inventor Gibbons; Michael, filed Jan. 21, 2022.
Co-pending U.S. Appl. No. 17/831,835, inventor Martinez; Luigi Jhon Alvarado, filed Jun. 3, 2022.
Co-pending U.S. Appl. No. 17/957,781, inventor Bava; Felice Alessio, filed Sep. 30, 2022.
Co-pending U.S. Appl. No. 18/046,843, inventor Toh; Mckenzi, filed Oct. 14, 2022.
Office action dated Dec. 1, 2016 for U.S. Appl. No. 14/571,120.
Co-pending U.S. Appl. No. 18/152,650, inventor Shastry; Shankar, filed Jan. 10, 2023.

Related Publications (1)

	Number	Date	Country
	20210357479 A1	Nov 2021	US

Provisional Applications (1)

	Number	Date	Country
	61916687	Dec 2013	US

Continuations (2)

	Number	Date	Country
Parent	15730119	Oct 2017	US
Child	17306512		US
Parent	14571120	Dec 2014	US
Child	15730119		US
