The present disclosure relates to rendering two-dimensional (2D) representations of three-dimensional (3D) scenes composed of shapes using raytracing, and more particularly to techniques for accelerating computations necessary for such raytracing rendering using field programmable gate array processors.
Raytracing is a sophisticated rendering technique in the computer graphics arts used to generate photo-realistic 2D images from 3D scene descriptions with complex light interactions. Raytracing generally involves obtaining a scene description composed of geometric shapes, which describe surfaces of structures in the scene and are commonly called primitives. A common primitive shape is a triangle.
Virtual rays of light are traced into the scene from a view point (“a camera”); each ray is issued to travel through a respective pixel of the 2D representation, on which that ray can have an effect. The rays are tested for intersection with scene primitives to identify a first intersected primitive for each ray, if any.
After identifying an intersection for a given ray, a shader associated with that primitive determines what happens next. For example, if the primitive is part of a mirror, then a reflection ray is issued to determine whether light is hitting the intersected point from a luminaire; in more complicated situations, subsurface reflection and scattering can be modeled, which may cause issuance of different rays to be intersection tested. By further example, if a surface of an object were rough, not smooth, then a shader for that object may issue rays to model a diffuse reflection on that surface. As such, finding an intersection between a ray and a primitive is a first step in determining whether and what kind of light energy may reach a pixel by virtue of a given ray, since what light is hitting that primitive still has to be determined.
Thus, most conventional algorithms build a tree of rays in flight when raytracing a scene, where the tree continues along each branch until it leaves the scene or hits a luminaire that does not issue new rays. Then, for those branches that hit light emissive objects, the branches are rolled up through the primitive intersections, determining along the way what effect each primitive intersection has on the light that hits it. Finally, a color and intensity of light for the originally issued camera ray can be determined and stored in a buffer.
If raytracing is not managed well, however, it can be computationally intensive. Among the various data structures available to assist the raytracing process is the bounded (B)-KD (k dimensional) tree data structure, which is also well suited for hardware implementations. This structure is a combination of a space partitioning KD tree structure and bounding volumes that surround the primitives. By using this tree structure, the number of ray-primitive intersection tests needed to render the scene can be substantially reduced.
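By way of a non-limiting illustration only, a node of such a combined structure may be represented as in the following C++ sketch. The field names and layout below are assumptions made for explanation and do not necessarily correspond to the node format used in the disclosed embodiments.

#include <cstdio>

// Illustrative node layout for a bounded KD (B-KD) style tree: a KD-style
// split axis combined with a bounding interval on that axis (an assumed
// layout, not necessarily the disclosed node format).
struct BKDNode {
    int   axis;        // split axis: 0 = X, 1 = Y, 2 = Z
    float bound[2];    // lower and upper bound of the node on that axis
    int   child[2];    // child node indices; -1 can encode a leaf
};

int main() {
    BKDNode root{0, {2.0f, 6.0f}, {1, 2}};
    std::printf("split axis %d, bounds [%.1f, %.1f], children %d and %d\n",
                root.axis, root.bound[0], root.bound[1],
                root.child[0], root.child[1]);
    return 0;
}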
While raytracing has been implemented in both software and hardware in recent years, acceleration has been limited by the high number of computational units required and the complex nature of the traversal algorithm. Modifications to the algorithm and to the hardware architecture configuration, however, make it possible to achieve substantial performance improvements. Such modifications to the algorithm and the associated hardware architecture are described in the present disclosure.
The following description is presented to enable a person of ordinary skill in the art to make and use various aspects of the disclosed embodiments. The various embodiments disclosed in the present specification are directed generally to apparatuses, systems, and methods for accelerating raytracing computations using various techniques. Numerous specific details are set forth to provide a thorough understanding of the overall structure, function, manufacture, and use of the disclosed embodiments as described in the specification and illustrated in the accompanying drawings. It will be understood by those skilled in the art, however, that the disclosed embodiments may be practiced without the specific details disclosed herein. In other instances, well-known operations, components, and elements have not been described in detail in the interest of conciseness and clarity and so as not to obscure the disclosed embodiments. Various modifications to the embodiments described herein may be apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the scope of the disclosed embodiments. Those of ordinary skill in the art will understand that the disclosed embodiments describing specific techniques, implementations, and applications are provided for illustrative purposes only and serve as non-limiting examples. Thus, it can be appreciated that the specific structural and functional details disclosed herein are representative in nature and are not necessarily limiting. Rather, the overall scope of the embodiments is defined solely by the appended claims.
Reference throughout the specification to “various embodiments,” “some embodiments,” “one embodiment,” or “an embodiment” means that a particular feature, structure, or functional characteristic described in connection with a disclosed embodiment is included in at least one embodiment. Thus, appearances of the phrases “in various embodiments,” “in some embodiments,” “in one embodiment,” or “in an embodiment” in places throughout the specification are not necessarily all referring to the same embodiment. Furthermore, particular features, structures, and/or functional characteristics associated with one embodiment may be readily combined in any suitable manner with those of one or more other embodiments, without limitation. Thus, the particular features, structures, or functional characteristics illustrated or described in connection with one embodiment may be combined, in whole or in part, with the features, structures, or characteristics of one or more than one other embodiment, without limitation.
It will be appreciated that for the sake of conciseness and clarity in description, data for a certain type of object, e.g., a primitive (e.g., coordinates for three vertices of a triangle) usually are described simply as the object itself, rather than referring to the data for the object. For example, when referring to “a ray,” it is to be understood that data representative of that ray is referenced, as well as the concept of the ray in the scene. Similarly, for example, when referring to “a tree,” it is to be understood that data representative of the tree structure is referenced. In addition, the use of “logic” herein should be understood to mean that the logic function can be implemented as either an algorithm, digital signal processing routine, or a logic circuit (such as that implemented on a field programmable gate array).
Various embodiments disclosed in the present specification are directed generally to techniques for accelerating raytracing computations using various hardware architectures and techniques. In one embodiment, field programmable gate array (FPGA) processors are used for accelerating the computations associated with raytracing. Several implementations of such embodiments described herein include, but are not limited to, the following implementations, substantially as disclosed and described herein in any combination thereof:
Compared to linear data structures, e.g., linked lists and one dimensional arrays, which have only one logical means of traversal, tree structures can be traversed in many different ways. Starting at a root node of a binary tree, there are three main steps that can be performed, and the order in which these steps are performed defines the traversal type. These steps include, for example, performing an action on the current node (referred to as “visiting” the node), traversing to the left child node, and traversing to the right child node. Thus, the process is most easily described through recursion.
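For illustration, the following C++ sketch expresses one such traversal order (a pre-order traversal, which visits the node first) through recursion; the node layout and names are illustrative only.

#include <cstdio>

// Illustrative binary tree node (assumed layout; not the disclosed node format).
struct Node {
    int value;
    Node* left;
    Node* right;
};

// The order of the three steps -- visit, traverse left, traverse right --
// defines the traversal type; shown here as a pre-order traversal.
void preorder(const Node* n) {
    if (n == nullptr) return;
    std::printf("%d ", n->value);  // "visit" (act on) the current node
    preorder(n->left);             // traverse to the left child node
    preorder(n->right);            // traverse to the right child node
}

int main() {
    Node leftChild{2, nullptr, nullptr};
    Node rightChild{3, nullptr, nullptr};
    Node root{1, &leftChild, &rightChild};
    preorder(&root);               // prints: 1 2 3
    return 0;
}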
In one embodiment, a basic tree traversal flow comprises testing a ray for intersection with the nodes N1, N2 defined by their intersection intervals. The path of the traversal depends on the intersection test. If the intersection test is positive, then two intersection intervals are calculated. These two intersection intervals are used for comparison to the child nodes. This is a recursive traversal function that traverses through all the nodes of the entire tree 110. This is done on a per-node basis following an algorithmic flow. This algorithm does not push a node and its intersection intervals onto a stack.
Accordingly, the above raytracing algorithm is substantially different from conventional raytracing algorithms, which perform the intersection test for the child nodes when traversing the parent node and subsequently push one of the nodes and its intersection intervals onto a stack. This consumes resources and requires control logic to manage the levels. Also, the raytracing algorithm in accordance with the disclosed embodiments can be implemented irrespective of the direction of the ray being traced, which is not the case in software implementations, where the early termination test is performed before proceeding further with the traversal. In a hybrid raytracing implementation in accordance with the disclosed embodiments, the tree traversal algorithm may be implemented with an FPGA processor.
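For illustration only, the following C++ sketch is a simplified software model of the per-node flow described above, using an assumed one-axis node layout. The computed interval is handed directly to the children as they are reached, rather than testing the children at the parent and pushing a node and its intervals onto an explicit stack; it is a behavioral model, not the disclosed hardware datapath.

#include <algorithm>
#include <cstdio>
#include <vector>

// Assumed node layout: a bounding interval on one axis and two child indices.
struct Node {
    float boundMin, boundMax;   // bounding interval along one axis
    int   left, right;          // child indices; -1 denotes "no child"
};

// Test the ray against this node only, then pass the narrowed interval on.
void traverse(const std::vector<Node>& tree, int idx,
              float rayFrom, float rayInvDir, float nearA, float farA) {
    if (idx < 0) return;
    const Node& n = tree[idx];
    float t0 = (n.boundMin - rayFrom) * rayInvDir;
    float t1 = (n.boundMax - rayFrom) * rayInvDir;
    if (t0 > t1) std::swap(t0, t1);
    float newNear = std::max(nearA, t0);
    float newFar  = std::min(farA, t1);
    if (newNear > newFar) return;                  // ray misses this node
    if (n.left < 0 && n.right < 0) {
        std::printf("leaf %d hit, interval [%f, %f]\n", idx, newNear, newFar);
        return;
    }
    traverse(tree, n.left,  rayFrom, rayInvDir, newNear, newFar);   // no stack push
    traverse(tree, n.right, rayFrom, rayInvDir, newNear, newFar);
}

int main() {
    std::vector<Node> tree = {
        {0.0f, 10.0f, 1, 2},     // root
        {0.0f,  4.0f, -1, -1},   // left leaf
        {6.0f, 10.0f, -1, -1},   // right leaf
    };
    traverse(tree, 0, /*rayFrom=*/-1.0f, /*rayInvDir=*/1.0f, 0.0f, 100.0f);
    return 0;
}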
In one embodiment, the hybrid architecture 200 enables the FPGA 204 processor and the GPU 208a to fully utilize their respective computational power independently and together achieve high performance. In one embodiment, although somewhat limited by its ability to access random data, the GPU 208a is able to achieve high floating point computational density. The hybrid architecture 200 is well suited for the ray-triangle implementation, where the data stream 216 associated with the triangles is streamed to the GPU 208a, which executes the triangle intersection algorithm 206 and then streams out the triangle intersection data 218. In various embodiments, random access of data is generally not required and the processing can be based on a set of floating point computations.
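By way of a non-limiting example, a commonly used ray-triangle test of the Möller-Trumbore type is sketched below in C++ to illustrate the kind of branch-light floating point computation that such a triangle stream lends itself to; it is not a description of the triangle intersection algorithm 206 itself, and all names are assumptions.

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y * b.z - a.z * b.y,
                                             a.z * b.x - a.x * b.z,
                                             a.x * b.y - a.y * b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true and writes t when the ray (origin, dir) hits triangle (v0, v1, v2).
bool rayTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float* t) {
    const float kEps = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < kEps) return false;       // ray parallel to the triangle
    float invDet = 1.0f / det;
    Vec3 s = sub(origin, v0);
    float u = dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;
    *t = dot(e2, q) * invDet;                      // distance along the ray
    return *t > kEps;
}

int main() {
    float t = 0.0f;
    bool hit = rayTriangle({0, 0, -1}, {0, 0, 1},
                           {-1, -1, 0}, {1, -1, 0}, {0, 1, 0}, &t);
    std::printf("hit=%d t=%f\n", hit, t);          // expected: hit=1 t=1.000000
    return 0;
}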
In one embodiment, the tree traversal algorithm 202 can be better suited for the FPGA 204 processor, where the node data 214 associated with the tree structure can be stored partly in a block random access memory (BRAM) provided on the FPGA 204 processor or in a large, high speed static RAM (SRAM) interfaced with the FPGA 204, for example. By replicating multiple tree traversal cores or programming element (PE) cores on the FPGA 204 processor, processing power superior to that of a single-core or quad-core CPU running the same algorithm can be achieved.
In one embodiment, the CPU 210a is configured for performing tree building, data packing, and interfacing functions for the FPGA 204 processor and the GPU 208a. The combination of the CPU 210a, the FPGA 204, and the GPU 208a provides faster raytracing.
In one embodiment, the FPGA 204 processor can be implemented with an FPGA embedded processor such as, for example, a Xilinx Virtex or Spartan series FPGA device available from Xilinx, Inc. of San Jose, Calif.; the GPU 208a can be implemented with an Nvidia GeForce series GPU device available from Nvidia of Santa Clara, Calif.; and the CPU 210 can be implemented with an Intel Xeon multi-core processor device available from Intel Corp. of Santa Clara, Calif. Those skilled in the art will appreciate that the FPGA 204 processor, the GPU 208a, and/or the CPU 210 may be readily substituted with one or more than one functionally equivalent component(s) without limiting the scope of the disclosed embodiments. For example, functional components may be substituted without limiting the scope of the hybrid architecture 200.
Under control of the input FSM 306, the ray data 212 are transferred from the first (input) SDRAM 302 to an SDRAM transfer block 314 and to a ray data first-in-first-out (FIFO) register 316. The ray data 212 (RAY IN) are provided as input to one or more than one of the PE cores 312 for processing the tree traversal algorithm. The input node data 214 (NODE IN) associated with the tree structure are stored in one or more than one BRAM 318 and may be transferred to the PE core 312 from the BRAM 318. Similarly, output node data 215 (NODE OUT) are provided from the PE core 312 and are stored in the BRAM 318. In the illustrated embodiment, the BRAM 318 is an on-chip BRAM integrated with the FPGA 204a processor die. As used herein, on-chip refers to a BRAM provided on the FPGA 204a processor, whereas off-chip refers to a BRAM located separately from the FPGA 204a processor. The PE core 312 processes the ray data 212 and the node data 214 and outputs the processed data 320 (INT NODE OUT) via a processed output buffer 322 and an SDRAM transfer block 324 to the second (output) SDRAM 304. The PE core 312 also outputs the processed node data 215 to the BRAM 318.
The output FSM 308 controls the processed output buffer 322 and the transfer output FSM 310 controls the transfer of data from the processed output buffer 322 to the SDRAM transfer block 324. The data stream 216 representative of triangles is provided to the second (output) SDRAM 304.
A look-up table 326 (LUT) indexes the ray number in a write address buffer 328. The LUT 326 controls the transfer of the data stream 216 from the SDRAM transfer block 324 to the second (output) SDRAM 304.
Various decoders 330, 332, among other functional logic blocks, provide the interface and control logic necessary for processing the ray data 212 and for routing the node input/output data 214/215, the processed data 320, and the stream data 216.
In the various embodiments of the FPGA 204a, b, c processors shown in respective
In one embodiment, a PE core 312 may comprise one or more than one core floating point computations block 402 to execute the floating point computations required by the ray tracing algorithm. In one embodiment, a floating point computations block 402 may comprise one or more than one floating point subtractor (S1, S2), multiplier (M1, M2), and comparator (C3, C4, C5, C6) blocks. In one embodiment, one or more than one storage memory comprising a primary storage 406, a secondary storage 408, and an intermediate storage 410 for storing intermediate values can be provided in a feedback path 404. In the illustrated embodiment, the primary storage 406 is implemented as a primary queue FIFO register and the secondary storage 408 is implemented as a secondary queue stack FILO (first-in-last-out) register. In the illustrated embodiment, the intermediate storage 410 is implemented as three separate FIFO registers. The input ray data 212 is provided to an input register 412 and then to the core floating point computations block 402. A FIFO storage register 414 is provided to store the input node data 214 and a FIFO storage register 416 is provided to store the output node data 215. The operation of the PE core 312 and its blocks is explained in detail in the following sections.
In one embodiment, a first decision block 418 determines whether a ray has passed through a node of the tree. If “yes,” the output is a leaf node and a leaf node value 420 is stored in a results stack 422 (e.g., Stack 1, Stack 2). From the results stack 422, the processed data 320 (INT NODE OUT) is provided to the processed output buffer 322 (
The core floating point computations block 402 of the PE core 312 performs the intersection test on one particular node and derives the intervals for the tests on the child nodes. This test can be mathematically expressed as a sequence of computations shown below.
S1 = BOUND[RAY_SIGN[Axis]] − FROM[Axis]
S2 = BOUND[1 − RAY_SIGN[Axis]] − FROM[Axis]
M1 = S1 × RAYDIR_INV[Axis] = Lower Adist (L.A.)
M2 = S2 × RAYDIR_INV[Axis] = Upper Adist (U.A.)
Near Adist = max(Near Adist, L.A.)
Far Adist = min(Far Adist, U.A.)
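For illustration, the sequence of computations above may be modeled in software as in the following C++ sketch; the identifiers mirror the S1/S2/M1/M2 terms above, while the data layout is an assumption and the sketch is a behavioral model rather than the FPGA datapath itself.

#include <algorithm>
#include <cstdio>

// Assumed layouts: a node carries its bounds and split axis; a ray carries its
// origin (FROM), reciprocal direction (RAYDIR_INV), and per-axis sign bits.
struct NodeBounds  { float BOUND[2]; int Axis; };
struct RayAxisData {
    float FROM[3];
    float RAYDIR_INV[3];
    int   RAY_SIGN[3];      // 1 when the direction component is negative, else 0
};

void nodeTest(const NodeBounds& node, const RayAxisData& ray,
              float* nearAdist, float* farAdist) {
    int axis = node.Axis;
    float S1 = node.BOUND[ray.RAY_SIGN[axis]]     - ray.FROM[axis];
    float S2 = node.BOUND[1 - ray.RAY_SIGN[axis]] - ray.FROM[axis];
    float M1 = S1 * ray.RAYDIR_INV[axis];          // Lower Adist (L.A.)
    float M2 = S2 * ray.RAYDIR_INV[axis];          // Upper Adist (U.A.)
    *nearAdist = std::max(*nearAdist, M1);
    *farAdist  = std::min(*farAdist,  M2);
}

int main() {
    NodeBounds  node{{2.0f, 6.0f}, 0};
    RayAxisData ray{{0.0f, 0.0f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0, 0, 0}};
    float nearA = 0.0f, farA = 1e30f;
    nodeTest(node, ray, &nearA, &farA);
    std::printf("Near Adist = %f, Far Adist = %f, hit = %d\n",
                nearA, farA, nearA <= farA);       // expected: 2.0, 6.0, hit = 1
    return 0;
}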
In one embodiment, the set of floating point computations discussed above can be implemented in an FPGA processor (e.g., FPGA processor 204a-c shown in
As discussed previously, the ray data 212 are tested for intersection with scene primitives to identify a first intersected primitive for each ray. The ray-triangle intersection algorithm 206 (
In one embodiment, the performance drop due to the latency can be overcome by the novel architecture shown in
The concepts of depth-breadth search tree traversal and of tree blocking to avoid data explosion can be illustrated in conjunction with
In a conventional software implementation of the tree traversal algorithm, the nodes in the tree structure 500 shown in
Those skilled in the art will appreciate that this kind of control is best suited for software implementations and for the early termination of the tree traversal algorithm. In a hardware implementation according to the disclosed embodiments, however, this method of computing is inefficient; computation resources may remain idle and latency may be high, as mentioned previously. In various hardware implementations, for example, one ray-node intersection per clock cycle may be desired, as it provides the best possible performance. In one embodiment, this may be addressed by a two-step improvement in the pipeline architecture: (1) pipelining more nodes and (2) pipelining more rays.
In one embodiment, as many nodes as available in the pipeline are processed at a particular time instead of pushing them onto a stack for later processing. In the embodiment illustrated in
The number of nodes that can be processed in this manner increases exponentially and reaches a point where the pipeline becomes full. The maximum number of nodes that can be processed at any given point in time may be equal to the length of the pipeline. To some extent, this follows a breadth-first search traversal. When the number of nodes exceeds the length of the pipeline, the subsequent nodes are stored away in the secondary storage 408 (
Maintaining the secondary storage 408 (
Careful analysis of the timing and pipeline data reveals whether there exist bubbles of idle states. Bubbles of idle states exist because, initially, the nodes required to fill the pipeline become available only after processing the root node (e.g., Iteration 1: Node 1 along path 502 in
When compared to conventional software implementations or to hardware implementations without a multi-ray architecture, it can be shown that the PE cores 312 (
To accommodate the multiple rays, flip-flops and storage elements can be used to pipeline and feed back the ray IDs corresponding to a particular node. All other ray data can be stored in a register array and retrieved immediately when required, based on the node details (e.g., node axis). The cost of this performance improvement is small: the resources needed to store the rays.
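The combined effect of pipelining more nodes and more rays can be illustrated with the following behavioral C++ model; the primary queue, secondary stack, pipeline length, implicit tree layout, and ray-ID scheme below are simplified assumptions and are not a register-transfer-level description of the PE core 312.

#include <cstddef>
#include <cstdio>
#include <deque>
#include <stack>

// Each pipeline entry carries only a small ray ID and a node index; the full
// ray data would be held elsewhere (e.g., a register array) and fetched by ID.
struct Entry { int rayId; int nodeIndex; };

int main() {
    const std::size_t kPipelineLength = 4;   // assumed pipeline depth
    const int kTreeNodes = 15;               // implicit complete binary tree
    std::deque<Entry> primary;               // primary queue (FIFO), cf. storage 406
    std::stack<Entry> secondary;             // secondary stack (FILO), cf. storage 408

    primary.push_back({0, 0});               // ray 0 enters at the root node
    primary.push_back({1, 0});               // ray 1 shares the same pipeline

    while (!primary.empty() || !secondary.empty()) {
        // Fill any bubble in the pipeline from the secondary stack.
        while (primary.size() < kPipelineLength && !secondary.empty()) {
            primary.push_back(secondary.top());
            secondary.pop();
        }
        Entry e = primary.front();
        primary.pop_front();
        std::printf("ray %d processes node %d\n", e.rayId, e.nodeIndex);

        // Children are kept in flight while the pipeline has room; overflow is
        // parked on the secondary stack instead of stalling the pipeline.
        int children[2] = {2 * e.nodeIndex + 1, 2 * e.nodeIndex + 2};
        for (int child : children) {
            if (child >= kTreeNodes) continue;           // leaf reached
            if (primary.size() < kPipelineLength)
                primary.push_back({e.rayId, child});
            else
                secondary.push({e.rayId, child});
        }
    }
    return 0;
}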
With reference now back to
The data required for the Intersection test by the FPGA 204 (
With reference now to
An example data format for a 16 bit data storage structure 700 is shown in
A typical software renderer casts rays in packets that may range in size from a single ray to 128 rays, for example. In a large scene with many such small packets of rays, performing computations on a co-processor may not be very efficient, as it requires many software function calls to exchange small amounts of input/output data with the co-processor each time.
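By way of a non-limiting illustration, the following C++ sketch shows one host-side batching scheme in which small packets are accumulated and dispatched to a co-processor in a single large call; the class and function names are assumptions and do not describe any particular driver interface.

#include <cstddef>
#include <cstdio>
#include <vector>

struct Ray { float origin[3]; float dir[3]; };

// Accumulates small ray packets so that one large co-processor call replaces
// many small ones (illustrative only; the dispatch itself is stubbed out).
class RayBatcher {
public:
    explicit RayBatcher(std::size_t batchSize) : batchSize_(batchSize) {}

    void submit(const std::vector<Ray>& packet) {        // packet: 1..128 rays
        buffer_.insert(buffer_.end(), packet.begin(), packet.end());
        if (buffer_.size() >= batchSize_) flush();
    }

    void flush() {
        if (buffer_.empty()) return;
        // One large transfer / function call instead of many small ones.
        std::printf("dispatching %zu rays in one call\n", buffer_.size());
        buffer_.clear();
    }

private:
    std::size_t batchSize_;
    std::vector<Ray> buffer_;
};

int main() {
    RayBatcher batcher(4096);
    batcher.submit(std::vector<Ray>(128));   // small packets accumulate...
    batcher.submit(std::vector<Ray>(64));
    batcher.flush();                         // ...and are sent together
    return 0;
}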
With reference to
In a ray tracing flow, shown in
With reference now to both
When the ray-beam traversal is performed, each ray-beam will produce a list of leaf nodes hit. This leaf list applies to every ray in the beam; therefore, each ray needs to have an intersection test done with all the primitive objects in every leaf node of that output list, which can be very computationally expensive and may produce false positive leaf hits for certain rays. In one embodiment, a ray-box intersection test stage can be implemented to filter the ray-beam output. Each leaf node can be represented by a bounding-box in all three dimensions (X, Y and Z coordinates). In one embodiment, shown in
Various embodiments of a bounding-box are shown in
For maximum hardware performance, both memory accesses and the amount of output have to be reduced as far as possible. In one embodiment, an optimized ray-box hardware 1150 allows the reduction in memory accesses and output to be achieved by taking a ray-beam as the input to the ray-box filter. Since every ray-beam output has a set of rays and a set of bounding boxes, for which an “all-against-all” computation has to be done, this lends itself well to a hardware design where a group of rays is taken in, and only a single bounding-box fetch has to be done at a time for computation against all rays in a ray-beam. This significantly reduces memory accesses to fetch the bounding boxes.
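For illustration, the following C++ sketch models the filter stage in software: a single bounding box is fetched once and tested, with a standard slab test, against every ray in a beam, so that rays that cannot reach the leaf are removed before the per-primitive intersection tests. The data layout and names are assumptions, and the sketch is not the ray-box hardware 1150 itself.

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Box { float min[3]; float max[3]; };
struct Ray { float origin[3]; float invDir[3]; };   // invDir = 1 / direction

// Standard slab test: true when the ray's parametric interval overlaps the box.
bool rayHitsBox(const Ray& r, const Box& b, float tMin, float tMax) {
    for (int axis = 0; axis < 3; ++axis) {
        float t0 = (b.min[axis] - r.origin[axis]) * r.invDir[axis];
        float t1 = (b.max[axis] - r.origin[axis]) * r.invDir[axis];
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;
    }
    return true;
}

int main() {
    Box leafBox{{0, 0, 0}, {1, 1, 1}};                  // one box fetch per leaf
    std::vector<Ray> beam = {
        // 1e9f stands in for an effectively infinite 1/0 inverse component.
        {{0.5f, 0.5f, -1.0f}, {1e9f, 1e9f, 1.0f}},      // ray toward the box
        {{5.0f, 5.0f, -1.0f}, {1e9f, 1e9f, 1.0f}},      // ray that misses the box
    };
    for (std::size_t i = 0; i < beam.size(); ++i) {
        bool keep = rayHitsBox(beam[i], leafBox, 0.0f, 1e30f);
        std::printf("ray %zu %s\n", i,
                    keep ? "is tested against the leaf's primitives"
                         : "is filtered out");
    }
    return 0;
}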
For the ray-box hardware engine to be highly scalable to suit different performance requirements, it needs to be easily replicable so that more ray-box engine cores can be instantiated and used concurrently on the same FPGA device. In one embodiment, shown in
With reference to
In using ray-beam traversal, since all the rays go through traversal and ray-box intersections at once, there is no need for feedback to continue traversal. In one embodiment, shown in
In this example, the computing device 1500 comprises one or more processor circuits or processing units 1502, one or more memory circuits and/or storage circuit component(s) 1504 and one or more input/output (I/O) circuit devices 1506. Additionally, the computing device 1500 comprises a bus 1508 that allows the various circuit components and devices to communicate with one another. The bus 1508 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. The bus 1508 may comprise wired and/or wireless buses.
The processing unit 1502 may be responsible for executing various software programs such as system programs, applications programs, and/or modules to provide computing and processing operations for the computing device 1500. The processing unit 1502 may be responsible for performing various voice and data communications operations for the computing device 1500 such as transmitting and receiving voice and data information over one or more wired or wireless communications channels. Although the processing unit 1502 of the computing device 1500 includes a single processor architecture as shown, it may be appreciated that the computing device 1500 may use any suitable processor architecture and/or any suitable number of processors in accordance with the described embodiments. In one embodiment, the processing unit 1502 may be implemented using a single integrated processor.
The processing unit 1502 may be implemented as a host central processing unit (CPU) using any suitable processor circuit or logic device (circuit), such as a general purpose processor. The processing unit 1502 also may be implemented as a chip multiprocessor (CMP), dedicated processor, embedded processor, media processor, input/output (I/O) processor, co-processor, microprocessor, controller, microcontroller, application specific integrated circuit (ASIC), field programmable gate array (FPGA), programmable logic device (PLD), or other processing device, such as a DSP, in accordance with the described embodiments.
As shown, the processing unit 1502 may be coupled to the memory and/or storage component(s) 1504 through the bus 1508. The memory bus 1508 may comprise any suitable interface and/or bus architecture for allowing the processing unit 1502 to access the memory and/or storage component(s) 1504. Although the memory and/or storage component(s) 1504 may be shown as being separate from the processing unit 1502 for purposes of illustration, it is worthy to note that in various embodiments some portion or the entire memory and/or storage component(s) 1504 may be included on the same integrated circuit as the processing unit 1502. Alternatively, some portion or the entire memory and/or storage component(s) 1504 may be disposed on an integrated circuit or other medium (e.g., hard disk drive) external to the integrated circuit of the processing unit 1502. In various embodiments, the computing device 1500 may comprise an expansion slot to support a multimedia and/or memory card, for example.
The memory and/or storage component(s) 1504 represent one or more computer-readable media. The memory and/or storage component(s) 1504 may be implemented using any computer-readable media capable of storing data such as volatile or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. The memory and/or storage component(s) 1504 may comprise volatile media (e.g., random access memory (RAM)) and/or nonvolatile media (e.g., read only memory (ROM), Flash memory, optical disks, magnetic disks and the like). The memory and/or storage component(s) 1504 may comprise fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) as well as removable media (e.g., a Flash memory drive, a removable hard drive, an optical disk, etc.). Examples of computer-readable storage media may include, without limitation, RAM, dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory (e.g., NOR or NAND flash memory), content addressable memory (CAM), polymer memory (e.g., ferroelectric polymer memory), phase-change memory, ovonic memory, ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, or any other type of media suitable for storing information.
The one or more I/O devices 1506 allow a user to enter commands and information to the computing device 1500, and also allow information to be presented to the user and/or other components or devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, and the like. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, and the like. The computing device 1500 may comprise an alphanumeric keypad coupled to the processing unit 1502. The keypad may comprise, for example, a QWERTY key layout and an integrated number dial pad. The computing device 1500 may comprise a display coupled to the processing unit 1502. The display may comprise any suitable visual interface for displaying content to a user of the computing device 1500. In one embodiment, for example, the display may be implemented by a liquid crystal display (LCD) such as a touch-sensitive color (e.g., 16-bit color) thin-film transistor (TFT) LCD screen. The touch-sensitive LCD may be used with a stylus and/or a handwriting recognizer program.
The processing unit 1502 may be arranged to provide processing or computing resources to the computing device 1500. For example, the processing unit 1502 may be responsible for executing various software programs including system programs such as operating system (OS) and application programs. System programs generally may assist in the running of the computing device 1500 and may be directly responsible for controlling, integrating, and managing the individual hardware components of the computer system. The OS may be implemented, for example, using products known to those skilled in the art under the following trade designations: Microsoft Windows OS, Symbian OS™, Embedix OS, Linux OS, Binary Run-time Environment for Wireless (BREW) OS, JavaOS, Android OS, Apple OS or other suitable OS in accordance with the described embodiments. The computing device 1500 may comprise other system programs such as device drivers, programming tools, utility programs, software libraries, application programming interfaces (APIs), and so forth.
Various embodiments may be described herein in the general context of computer executable instructions, such as software, program modules, and/or engines being executed by a computer. Generally, software, program modules, and/or engines include any software element arranged to perform particular operations or implement particular abstract data types. Software, program modules, and/or engines can include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. An implementation of the software, program modules, and/or engines components and techniques may be stored on and/or transmitted across some form of computer-readable media. In this regard, computer-readable media can be any available medium or media useable to store information and accessible by a computing device. Some embodiments also may be practiced in distributed computing environments where operations are performed by one or more remote processing devices that are linked through a communications network. In a distributed computing environment, software, program modules, and/or engines may be located in both local and remote computer storage media including memory storage devices.
Although some embodiments may be illustrated and described as comprising functional components, software, engines, and/or modules performing various operations, it can be appreciated that such components or modules may be implemented by one or more hardware components, software components, and/or combination thereof. The functional components, software, engines, and/or modules may be implemented, for example, by logic (e.g., instructions, data, and/or code) to be executed by a logic device (e.g., processor). Such logic may be stored internally or externally to a logic device on one or more types of computer-readable storage media. In other embodiments, the functional components such as software, engines, and/or modules may be implemented by hardware elements that may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth.
Examples of software, engines, and/or modules may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
In some cases, various embodiments may be implemented as an article of manufacture. The article of manufacture may include a computer readable storage medium arranged to store logic, instructions and/or data for performing various operations of one or more embodiments. In various embodiments, for example, the article of manufacture may comprise a magnetic disk, optical disk, flash memory or firmware containing computer program instructions suitable for execution by a general purpose processor or application specific processor. The embodiments, however, are not limited in this context.
It also is to be appreciated that the described embodiments illustrate example implementations, and that the functional components and/or modules may be implemented in various other ways which are consistent with the described embodiments. Furthermore, the operations performed by such components or modules may be combined and/or separated for a given implementation and may be performed by a greater number or fewer number of components or modules.
It is worthy to note that any reference to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” or “in one aspect” in the specification are not necessarily all referring to the same embodiment.
Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
Unless specifically stated otherwise, it may be appreciated that terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within registers and/or memories into other data similarly represented as physical quantities within the memories, registers or other such information storage, transmission or display devices.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, representative illustrative methods and materials are now described.
All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.
It is noted that, as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
As will be apparent to those of skill in the art upon reading this disclosure, each of the individual aspects described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several aspects without departing from the scope of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.
The foregoing description is provided as illustration and clarification purposes only and is not intended to limit the scope of the appended claims to the precise forms described. Other variations and embodiments are possible in light of the above teaching, and it is thus intended that the scope of the appended claims not be limited by the detailed description provided hereinabove. Although the foregoing description may be somewhat detailed in certain aspects by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the present teachings that certain changes and modifications may be made thereto without departing from the scope of the appended claims. Furthermore, it is to be understood that the appended claims are not limited to the particular embodiments or aspects described hereinabove, and as such may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments and aspects only, and is not intended to limit the scope of the appended claims.
While certain features of the embodiments have been illustrated as described above, many modifications, substitutions, changes and equivalents will now occur to those skilled in the art. It is therefore to be understood that the appended claims are intended to cover all such modifications and changes as fall within the scope of the disclosed embodiments.
This application claims the benefit under 35 USC §119(e) of U.S. Provisional Patent Application No. 61/333,631, entitled “TECHNIQUES FOR ACCELERATING COMPUTATIONS USING FIELD PROGRAMMABLE GATE ARRAY PROCESSORS,” and filed on May 11, 2010, the contents of which are herein entirely incorporated by reference.