The present invention relates to a Multiprocessor Computing Apparatus (MCA) comprising multiple processors, shared resources such as Shared Memory Units (SMUs), interface circuits of peripheral components and inputs/outputs (I/Os), and a wireless interconnect (WLI) for communication among the processors and shared resources that are components of the MCA. Specifically, the invention relates to two components of the MCA: the SMUs and the WLI.
The switching speed of transistors and diodes improves with each technology generation as geometries shrink and integration density increases, that is, with the scaling of Complementary-Metal-Oxide-Semiconductor (CMOS) Integrated Circuits (ICs). According to the International Technology Roadmap for Semiconductors (ITRS) for the year 2010, the cut-off frequency of switching is expected to be of the order of 600 GHz in 16 nm CMOS technology, leading to the availability of hundreds of GHz of bandwidth in the near future.
ICs are networks/circuits in packages interconnecting thousands, millions, or billions of discrete electronic components such as transistors, diodes, resistors, and capacitors, depending on whether the level of integration is Small Scale Integration (SSI), Large Scale Integration (LSI), or Very Large Scale Integration (VLSI). However, as the clock frequency of ICs is increased in a push for faster computation, three effects grow, particularly in LSI and VLSI, and degrade performance: the power lost as heat in components such as transistors due to switching, the power lost as heat in Metallic Interconnects (MIs) due to the skin effect, and the propagation delays due to Resistance-Capacitance (RC) time constants in the MIs connecting the discrete electronic components. Increasing the clock frequency of digital ICs increases the self-reactance (XL = 2πfL) arising from the self-inductance (L) of the MIs. The central core of an MI experiences a greater L, and hence a greater 2πfL, than its periphery, which pushes the current to flow through the outer periphery of the MI; this is known as the skin effect. Because of the skin effect, pulsating current flowing through MIs experiences increased impedance (R + jXL) and therefore increased I²R losses as heat in the MIs.
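The frequency dependence described above can be illustrated numerically. The following is a minimal sketch, not part of the original description: it evaluates the reactance XL = 2πfL from the text and, as an added assumption, the standard textbook skin-depth formula δ = sqrt(ρ / (π f μ)). The copper resistivity value and the function names are illustrative assumptions.

```python
import math

MU_0 = 4 * math.pi * 1e-7   # permeability of free space in H/m
RHO_COPPER = 1.68e-8        # assumed textbook resistivity of copper near 20 C, in ohm*m


def inductive_reactance(frequency_hz, inductance_h):
    """Self-reactance XL = 2*pi*f*L of an interconnect, in ohms."""
    return 2 * math.pi * frequency_hz * inductance_h


def skin_depth(frequency_hz, resistivity=RHO_COPPER, relative_permeability=1.0):
    """Standard skin-depth estimate in metres (formula assumed, not from the text)."""
    mu = relative_permeability * MU_0
    return math.sqrt(resistivity / (math.pi * frequency_hz * mu))


if __name__ == "__main__":
    for f in (1e9, 6e9, 10e9):
        print(f"{f / 1e9:.0f} GHz: XL of 1 nH = {inductive_reactance(f, 1e-9):.2f} ohm, "
              f"skin depth = {skin_depth(f) * 1e6:.2f} um")
```

Under these assumptions the skin depth shrinks with the square root of frequency, so quadrupling the clock frequency halves the conducting depth of a copper trace while the reactance grows linearly.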
A modern computing apparatus is composed of various ICs mounted on a mother Printed Circuit Board (PCB) and, if required, additional add-on PCBs. PCBs are usually made of plastic material printed with copper traces connecting various discrete components and the pins of ICs; the copper traces are narrow and densely laid, and carry data, addresses, control signals, and power supply to and from the various ICs. Again, as the operating frequency of ICs is increased in a push for faster computation, the RC time constants of the long, usually copper, MIs connecting the ICs, the power lost as heat in them due to the skin effect, and the crosstalk among them due to parasitic inductances (L), called Inter Symbol Interference (ISI), all increase, degrading the performance of an apparatus, a device, or a system.
Therefore, global clock frequency and data rates within (intra) and among (inter) IC chips are limited to below about 6 GHz. Scaling of MIs along with scaling of ICs in LSI/VLSI has degraded the performance of LSI/VLSI in terms of operating clock frequency, data rates, and power consumption. With the increasing integration density and cut-off switching frequency of transistors in CMOS ICs, MI technology is emerging as a major bottleneck to the performance improvement of VLSI such as System-on-Chip (SoC), System-in-Package (SiP), and Network-on-Chip (NoC). This performance bottleneck is due to the global interconnection delays becoming significantly larger than the gate switching delays. Carbon Nano-Tubes (CNTs) and Graphene Nano-Ribbons (GNRs), both based on carbon nano-materials, are emerging as a next-generation interconnect technology, referred to as Carbon Interconnects (CIs), that has the potential to resolve most of the problems of MIs. However, according to the ITRS, material innovations such as CNTs and GNRs alone will lead to a brick wall that can only be overcome by radically different interconnect architectures based on other forms of technology scaling.
The MCA has evolved to have as many banks of shared memory as processors. This facilitates simultaneous access to different banks of shared memory by different processors, reducing latency and contention for shared memory. This approach has followed from FIG. 4 of the Best Possible Parallel Computer Architecture (BPPCA) claimed in the technologically disruptive U.S. Pat. No. 7,788,051 and Canadian Patent #2564625 titled “Method and Apparatus for Parallel Loadflow Computation for Electrical Power System”, where each processor is shown connected to a box of shared memory, leading to the idea that shared memory can be divided into as many SMUs as there are processors, with an interconnect then provided to increase shared memory bandwidth. Canadian Patent #2564625 provides figures each completely contained on a single sheet of A4 size paper, as originally provided by this inventor. So far the trend has been to put as many processors as possible, along with their Private Memories (PMs) and SMUs, on a single chip with MIs and associated switches, constituting what is called a System on Chip (SoC). However, this arrangement can introduce substantial delays when a processor at one end of the chip accesses data from an SMU located at the other end, because the data has to take several ‘hops’ through MIs and associated switches.
The Parallel Gauss-Seidel-Patel Loadflow (PGSPL), when implemented on the BPPCA claimed in U.S. Pat. No. 7,788,051 and Canadian Patent #2564625 titled “Method and Apparatus for Parallel Loadflow Computation for Electrical Power System”, was estimated, ignoring all communication delays, to speed up by a factor of 10 for the first time in the history of parallel computation, and that marked the beginning of a new era of computer technology. Historically, parallel computing produced a speed-up of at most about 3 times. Attempts to speed up by a factor greater than 3 were not successful even when the number of computers in parallel was increased. This speed-up/scaling bottleneck arose because the techniques for decomposing a big computational problem into small sub-problems and the parallel computer architecture were not well tuned, requiring a great deal of movement of computational data. The PGSPL method and the BPPCA are very well tuned for minimum communication and synchronization requirements, and have almost removed the speed-up/scaling bottleneck, bringing about the state of “NIRVANA” for parallel computing in general. The BPPCA is scalable in the sense that it can have from just two processors to thousands of processors all working in parallel. What followed was a proliferation of many/multi-core computers, supercomputers with massive number-crunching capabilities, and massively parallel cloud computing machines or data centres. The envelope of technology is being pushed towards utility computing and, ultimately, towards putting all automated cloud computing machines (MCAs) in outer space or on other planets, preferably on the Moon to begin with, as per the case made by this inventor in his Canadian patent application #2743882, titled “System of Internet for Information/Data Processing, Storage, and Retrieval”, completed on May 28, 2012.
Modern complex Electrical Power Utility System is composed of millions of tiny light bulbs to thousands of huge motors and generators all connected in parallel for operational convenience in the sense that each component from tiny light bulb to huge motor/generator can be individually turned on/off without disturbing the rest of the system. The evolution of single generator supplying single light bulb or a group of light bulbs into the modern complex Electrical Power Utility System is believed to have taken more than a century.
All automated cloud computing machines can be placed in the outer space or on the other planets preferably on the Moon to begin with for the following reasons.
High Performance Computing (HPC), or Super Computing, has found its way into the mainstream following recent advances in parallel computing technologies, particularly those influenced by the developments of U.S. Pat. No. 7,788,051. Every advance in computing technology has been followed by increased expectations and demands for enhanced computational power. Traditionally the domain of science and technology, HPC has become increasingly pervasive among industries, businesses, and governments. Wireless communication in atmospheric free space is regulated by governments and requires licensing and standardization of a range of frequencies (a spectrum) for a particular use. This invention is about a Wireless Interconnect (WLI) comprising a Transmitter-Receiver-Antenna (Transceiver-Antenna: TRA) mounted/fabricated/integrated/embedded on each of the multiprocessors and shared resources, and the electromagnetically shielded and sealed confined free space within the Metallic Enclosure (ME) housing the MCA.
As per statements in US patent application publication 2012/0331269 titled “Geodesic Massively Parallel Computer”, different modern MCAs share a similar packaging, construction, and connectivity implementation hierarchy. That is: assemble component ICs onto PCBs, PCBs into racks, racks into cabinets, and cabinets into rooms. Typical communication channels are printed circuits on boards and back-planes, with electrical and fibre optic cabling running over longer distances. Processor-cluster communication in and between cabinets of massively parallel systems typically uses cabled packet-switched networks such as Infiniband or Ethernet. So far, all arrangements have used various physical interconnect networks for multiple processors, multiple SMUs, multiple inputs/outputs (I/Os), and other shared resources in the MCA. Physical topologies of interconnect networks are typically star, ring, mesh, torus, hypercube, spherical hypercube, and other variants as per
Further, the current status of rewritable Magneto-Optical (MO) and Optical (O) memories is that they are available in the form of rewritable Compact Disks (CD-RWs) and Digital Video Disks (DVD-RWs), and they need to be rotated using CD/DVD drives in order to be read from and written to by a computer.
As said before, ICs are networks/circuits in packages interconnecting thousands, millions, or billions of discrete electronic components such as transistors, diodes, resistors, and capacitors, depending on the level of integration such as SSI, LSI, or VLSI, using MIs or the evolving next-generation CIs. Integration of discrete electronic components is carried out to create various functional blocks and storage blocks/units in VLSI such as SoC, SiP, NoC, etc. The best approach appears to be to scale MIs or emerging CIs only up to the point of forming each functional block and/or storage block, and then to provide a Wireless Interconnect (WLI) involving TRAs for communication among the various functional and/or storage blocks. That is to say, communication within a functional block (intra-functional) uses MIs or emerging CIs, and communication among functional blocks (inter-functional) uses WLIs involving TRAs. The definition of a functional block can vary from designer to designer. For example, a functional block can be further divided into sub-functional blocks, with MIs or CIs provided for communication within sub-functional blocks and WLIs involving TRAs provided for communication among sub-functional blocks as well as among functional blocks. There are two extremes to this approach: at one end no WLIs involving TRAs are used, as per the current status of interconnect technologies, and at the other, ideally, no MIs or CIs are used. Hopefully the other extreme, in which the interconnect technology does not use MIs or CIs at all, will soon be reached. WLI technology involving TRAs has the advantages of re-configurability, system scalability/expandability, and fault tolerance, which are not possible with fixed wire-line MIs or CIs. At the system level, fault tolerance can be achieved by software commands that debug and then eliminate faulty chips via reconfiguration.
Moreover, MIs or CIs using physical “wired” channels for data transport do not resolve the difficult problem of routing the interconnect, because they involve time-delaying and power-consuming switching operations, whereas WLIs involving TRAs make it possible for every functional block to communicate directly with all others at the speed of light, which is the highest possible. In particular, every processor of the MCA can access every bank of shared memory directly with WLIs involving TRAs. Therefore, while allowing for material innovations like CNTs and GNRs for intra-functional CIs, and by using WLIs involving TRAs for inter-functional communication, the present invention attempts to overcome the brick wall by radically different interconnect architectures based on other forms of technology scaling, as described in the following. While the cut-off switching speed of transistors is expected to be 600 GHz for the next 16 nm technology generation, it may be possible to raise the operational clock frequency of MCAs to 10 GHz and much higher with the technology scaling of the present invention along with the use of CIs for intra-functional communications.
It is the primary object of the present invention to introduce wireless interconnects for communication among the various components of a Multiprocessor Computing Apparatus in order to dramatically reduce latency and contention for shared resources for the purpose of parallel processing. An MCA comprises two or more processors, and sometimes of the order of thousands or millions of processors in the case of a massively parallel computing apparatus, each processor having a PM, access to shared memory divided into SMUs, and access to other shared resources including I/O devices.
For the purpose of this invention, an electromagnetically shielded and sealed Metallic Enclosure (ME), which also acts as a heat sink for the multiprocessor chips and other heat-producing chips without requiring any noise-producing cooling fans inside the enclosed space, provides the means for implementing the wireless interconnect for communication among components of the MCA. Wireless interconnects can use the whole range of radio, microwave, and optical frequencies, and use antennas along with transceivers mounted/fabricated/integrated/embedded on components of the MCA. MCAs of optimized size can be used as building blocks for constructing data centres or cloud computing centres.
The WLI comprises a Transmitter-Receiver-Antenna (Transceiver-Antenna: TRA) mounted/fabricated/integrated/embedded on each of the multiprocessors and shared resources, and the electromagnetically shielded and sealed confined free space within the Metallic Enclosure (ME) housing the MCA. The ME is made of a pure or alloyed metal that is a very good conductor of both heat and electricity. The invention is in general about wireless communication within an electromagnetically shielded and sealed confined free space that is part of any apparatus, equipment, or device, including an MCA. The WLI can use the whole range of frequencies that can be generated by the oscillators of transceivers for transmitting and receiving information to achieve communication among components of the MCA. The whole range of frequencies extends from lows of hundreds of Hz to highs of GHz and beyond. The use of a WLI involving TRAs makes it possible for each processor of the MCA to address a large number of SMUs, because the address of each SMU is the frequency to which its transceiver is permanently tuned to send and receive information/data. A significant achievement of this invention is that each processor of the MCA is capable of addressing almost unlimited shared memory. The smallest SMU could consist of a single addressable memory location. That means that, in the best possible scenario, each processor of the MCA can reach and communicate with every single addressable memory location directly.
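The idea that an SMU's address is the frequency to which its transceiver is tuned can be sketched as a simple mapping between SMU numbers and carrier frequencies. This is only an illustration under stated assumptions: the 60 GHz base frequency, the 100 MHz channel spacing, and the function names are hypothetical and do not come from the description.

```python
BASE_HZ = 60_000_000_000    # hypothetical lowest carrier frequency (60 GHz)
SPACING_HZ = 100_000_000    # hypothetical spacing between adjacent SMU carriers (100 MHz)


def smu_frequency_hz(smu_index):
    """Carrier frequency that serves as the 'address' of SMU number smu_index."""
    if smu_index < 0:
        raise ValueError("SMU index must be non-negative")
    return BASE_HZ + smu_index * SPACING_HZ


def smu_index_for(frequency_hz):
    """Inverse mapping: which SMU, if any, is tuned to frequency_hz."""
    index, remainder = divmod(frequency_hz - BASE_HZ, SPACING_HZ)
    if index < 0 or remainder != 0:
        raise ValueError("no SMU is tuned to this frequency")
    return index


if __name__ == "__main__":
    # A processor that wants SMU 3 tunes its transmitter to that SMU's carrier.
    print(smu_frequency_hz(3))             # 60300000000
    print(smu_index_for(60_300_000_000))   # 3
```

In such a scheme the number of addressable SMUs is bounded by the usable bandwidth divided by the channel spacing, which is a design choice left to the implementer.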
This invention is a synergistic extension of U.S. Pat. No. 7,788,051 and Canadian Patent No. 2564625, where this inventor has claimed a technique of decomposing a big problem into small sub-problems and the corresponding BPPCA, leading to an estimated 10-times speed-up ignoring communication delays between processors and SMUs. The invention claimed in this application is the result of constant intellectual and mental struggle to achieve the fastest possible communication between processors and SMUs. Assisted by advanced signal processing techniques such as equalization, echo/crosstalk cancellation, and error correction coding, the performance of the WLI involving TRAs is expected to continue advancing at a steady pace.
The invented WLI involving TRAs provides all-to-all direct communication links between components of the MCA regardless of their topological distances. Without packet/circuit switching, the WLI involving TRAs eliminates intermediate routing and buffering delays and makes the signal propagation delay approach the ultimate lower bound, which is set by the speed of light. WLI links involving TRAs can operate at much higher speed than the core logic, making it easy to provide high throughput. In the WLI involving TRAs, line-of-sight communication channels are built directly between communicating nodes within a network in a totally distributed fashion without arbitration. An important consequence is that packets destined for the same receiver will collide. Such collisions require detection, retransmission, and extra bandwidth margin to prevent them from becoming a significant issue. The WLI involving TRAs allows errors and collisions to be handled by the same mechanism, essentially requiring no more support than is needed to handle errors, which is necessary in any system.
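The collision handling described above can be sketched as a small time-slotted simulation. This is a minimal illustration, not the claimed mechanism: the slot model, the random backoff used to schedule retransmissions, and all names and parameter values are assumptions introduced here.

```python
import random


def deliver_with_retransmission(packets, backoff_window=4, max_slots=1000, seed=0):
    """Simulate arbitration-free delivery of (sender, receiver) packets.

    Packets aimed at the same receiver in the same slot collide; every
    collided sender retransmits after a random backoff (an assumed policy).
    Returns the list of (slot, sender, receiver) deliveries and the number
    of collision events observed.
    """
    rng = random.Random(seed)
    pending = [(0, sender, receiver) for sender, receiver in packets]
    delivered = []
    collisions = 0
    slot = 0
    while pending and slot < max_slots:
        due = [p for p in pending if p[0] == slot]
        waiting = [p for p in pending if p[0] != slot]
        by_receiver = {}
        for _, sender, receiver in due:
            by_receiver.setdefault(receiver, []).append(sender)
        for receiver, senders in by_receiver.items():
            if len(senders) == 1:
                # Exactly one transmission reached this receiver: success.
                delivered.append((slot, senders[0], receiver))
            else:
                # Two or more transmissions overlapped: detect and retransmit.
                collisions += 1
                for sender in senders:
                    retry_slot = slot + 1 + rng.randrange(backoff_window)
                    waiting.append((retry_slot, sender, receiver))
        pending = waiting
        slot += 1
    return delivered, collisions


if __name__ == "__main__":
    # Processors 0 and 1 both target SMU 9, so their first attempts collide.
    deliveries, collision_count = deliver_with_retransmission([(0, 9), (1, 9), (2, 5)])
    print(deliveries, collision_count)
```

A collided packet is treated exactly like a corrupted one: it is simply sent again, which mirrors the statement that errors and collisions can share one recovery mechanism.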
It is also a primary object of the present invention to introduce a Universal Computer Memory for information storage and retrieval. The Universal Computer Memories are Static Magneto-Optical (SMO) and Static Optical (SO) memories that can be accessed in the same manner as currently used semiconductor Random Access Memories (RAMs). The SMO and SO RAMs are Non-Volatile Random Access Memories (NVRAMs), are cheap, and consume much less power than semiconductor RAMs.
The present invention is about putting an MCA, or any other similar apparatus/equipment/device, into an electromagnetically shielded and sealed Metallic Enclosure (ME) or container, and using wireless means for communication among its components. Aluminum, copper, CNT, GNR, or any alloy metal that is a good conductor of both heat and electricity can be used in making the ME. The ME also acts as a heat sink for the component ICs of the MCA, apparatus, equipment, or device, and if required it can be corrugated and/or finned on the outside to increase the surface area for heat dissipation. The ME can also be made dust proof, sound proof, and water proof so that it can be placed under water in a sea, a lake, or a river, preferably close to the mouth of a river where the water is clean, pristine, and naturally flowing, in order to save the electricity expended in cooling the MCA, particularly when it constitutes a data centre or cloud computing centre. As per statements in US patent application publication # US 2012/0331269 titled “Geodesic Massively Parallel Computer”: High-performance computer systems consume large amounts of electrical power, some of which gets dissipated as heat. Typically, a similar amount of energy is used by refrigeration as by the computer proper. That means that putting MCAs constituting data/cloud centres under water, particularly close to the mouth of a river, can save almost 50% of the electrical power used in running a data/cloud centre.
Because the ME is electromagnetically shielded and sealed, the whole range of radio, microwave, and optical frequencies is available for wireless communication among components of the enclosed apparatus, equipment, or device. The whole range of frequencies extends from lows of hundreds of Hz to highs of GHz and beyond. A designer can use different ranges of frequencies for different purposes of communication, or the frequency ranges for different purposes of communication inside an electromagnetically sealed ME can be standardized for different products by industry associations. The inside of the ME is either a vacuum or filled with clean, purified air without any suspended particles, for efficient and reliable wireless communication. To mitigate the problem of reflections and multiple paths, the inside surface of the ME is made rough enough to cause more scattering and less reflection of impinging electromagnetic waves. All surfaces of PCBs and of the components mounted on them are likewise made rough enough to mitigate the problem of reflection and multiple paths. Also, to mitigate the problem of reflection and multiple paths, all inside surfaces of electromagnetically shielded and sealed MEs, both the ME housing the MCA and the INSIDE-ME schematically shown in
Invented WLIs involving TRAs of
The present invention provides an apparatus for massively parallel MCA implementation in which best-case and worst-case neighbour-to-neighbour distances can be short and similar, which facilitates transmission, reception, and broadcast of information/data with high performance and substantially equal timing. In every sense, the invention is as general purpose as other parallel computers and is eminently scalable in terms of size, configuration, and performance. It lends itself well to a broad variety of apparatus, equipment, or devices that can be enclosed in an electromagnetically shielded and sealed ME and use a wireless interconnect for communication among (inter) and/or within (intra) component ICs and other circuits.
Cubical, spherical, and cylindrical (with height equal to diameter) MEs allow the maximum distance travelled by wireless communication signals carrying data, instructions, and control to be approximately the same. However, MEs can be made in any shape that permits the fastest possible communication between processors and SMUs for high-bandwidth data rates. Other, slower, low-bandwidth communications such as control signals can take place over longer distances. For example, processor PCBs and SMU PCBs can be mounted on the four longer metallic sides of a rectangular ME, and control and other circuit PCBs can be mounted on its top and bottom metallic sides; there are many other such possibilities. Because of the high-bandwidth data rate requirements of communication between processors and SMUs, different possible layouts of processors and SMUs are given in the figures. However, a designer can appropriately place ICs serving other functions among the processor and memory bank layouts, or they can be placed on a separate metallic inside surface of the ME.
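The effect of enclosure shape and size on signal travel time can be estimated with simple geometry. The sketch below is illustrative only: it assumes straight-line propagation at the vacuum speed of light, takes the longest internal straight-line path of each shape as the worst case, and uses hypothetical function names and dimensions.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact SI value for vacuum


def longest_path_m(shape, size_m):
    """Longest straight-line distance inside an enclosure.

    size_m is the side of a cube, or the diameter of a sphere, or the
    diameter of a cylinder whose height equals its diameter.
    """
    if shape == "cube":
        return size_m * math.sqrt(3)   # space diagonal
    if shape == "sphere":
        return size_m                  # diameter
    if shape == "cylinder":
        return size_m * math.sqrt(2)   # rim-to-opposite-rim diagonal
    raise ValueError(f"unknown shape: {shape}")


def propagation_delay_s(distance_m):
    """Time for a signal to cover distance_m at the speed of light."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    for shape in ("cube", "sphere", "cylinder"):
        delay_ns = propagation_delay_s(longest_path_m(shape, 0.3)) * 1e9
        print(f"{shape}: worst-case delay for a 0.3 m enclosure = {delay_ns:.2f} ns")
```

For a hypothetical 0.3 m cube the worst-case delay is about 1.7 ns, which corresponds to roughly 17 clock periods at a 10 GHz clock, so enclosure dimensions directly bound the achievable memory access latency.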
Various possible arrangements or layouts of components within ICs, ICs on PCBs, and PCBs on metallic surfaces of the ME will now be described using various figures. Since communication among processors and SMUs is the major factor in the performance of MCAs, only the various possible layouts of processors and SMUs are shown in the figures described in the following.
A single IC chip can contain, say, 5, 10, 100, etc. processors along with the local private memory of each processor, depending on whether the chip is SSI, LSI, or VLSI and on the size of the MCA being built. An MCA of a few processors, say 10, could be housed in a small ME, whereas a massively parallel MCA of many thousands of processors requires a huge cubical, spherical, cylindrical, or rectangular ME. The length of all sides of a cubical ME is the same as in
This invention is about making available the whole range of radio, microwave, and optical frequencies, from lows of hundreds of Hz to highs of GHz and beyond, for wireless, preferably Line-of-Sight (LOS), one-to-all and all-to-all communication among components of an MCA, apparatus, equipment, or device by enclosing it in an electromagnetically shielded and sealed ME that also acts as a heat sink for heat-producing components such as microprocessors. In other words, the invention is an apparatus, equipment, or device enclosed in a dust-proofed and electromagnetically shielded and sealed ME that makes available the whole range of radio, microwave, and optical frequencies for direct wireless one-to-all and all-to-all communication among its components, wherein the ME also acts as an extended heat sink for heat-producing components attached to it from inside, and wherein the ME is either evacuated or filled with clean air without any suspended particles for efficient and reliable communication.
In another embodiment of this invention, an apparatus can be built that eliminates the routing apparatus/system that requires time-delaying and power-consuming buffering and switching operations in packet-switched or circuit-switched communication systems. When such an apparatus replaces each routing apparatus/system in a communication system, information/data can flow to its destinations without any hindrance. Hasn't this inventor become a great artist now that he is able to sing: let it flow, let it flow, let it flow . . . ?
This description of preferred embodiment of Static Magneto Optical (SMO) or Static Optical (SO) Non-Volatile Random Access Memory (NVRAM) and relevant figures are adapted from the description of Semiconductor Main Memory on pages 111-114 from the book titled “Computer Organization and Architecture” Fourth Edition by William Stallings published by Prentice Hall Inc. in the year 1996.
The basic element of an SMO or SO memory is a read/write head placed on an SMO or SO medium. Bit read/write heads placed on SMO or SO media, like semiconductor memory cells, share common properties:
As with a semiconductor memory Integrated Circuit (IC), bit read/write heads placed over SMO or SO media can come as a packaged chip. Each chip contains an array of bit read/write heads placed over SMO or SO recordable media.
Address lines provide the address of the word (group of bits) to be selected. A total of log₂ W lines are needed, where W is the number of words. In our example, 11 address lines are needed to select one of 2048 rows. These 11 lines are fed into a row decoder, which has 11 lines of input and 2048 lines of output. The logic of the decoder activates a signal on one of the 2048 outputs depending on the bit pattern on the 11 input lines (2¹¹ = 2048).
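The row-decoder behaviour described above can be modelled in a few lines. This is a minimal behavioural sketch, not a circuit description: the function names and the most-significant-bit-first ordering of the input lines are assumptions made for illustration.

```python
def address_lines_needed(word_count):
    """Number of address lines needed to select one of word_count rows (log2 W, rounded up)."""
    if word_count < 1:
        raise ValueError("word_count must be at least 1")
    return (word_count - 1).bit_length()


def decode_row(address_bits):
    """Behavioural row decoder.

    address_bits is a sequence of 0/1 values, most significant bit first
    (an assumed ordering). Returns a list with one entry per output line
    in which exactly one line is set to 1.
    """
    row = 0
    for bit in address_bits:
        if bit not in (0, 1):
            raise ValueError("address bits must be 0 or 1")
        row = (row << 1) | bit
    outputs = [0] * (1 << len(address_bits))
    outputs[row] = 1
    return outputs


if __name__ == "__main__":
    print(address_lines_needed(2048))     # 11
    lines = decode_row([0] * 10 + [1])    # bit pattern for row 1
    print(len(lines), lines.index(1))     # 2048 1
```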
An additional 11 address lines select one of 2048 columns of four bits per column. Four data lines are used for the input and output of four bits to and from a data buffer. On input (write), the bit driver of each bit line is activated for a 1 or 0 according to the value of the corresponding data line. On output (read), the value of each bit line is passed through a sense amplifier and presented to the data lines. The row line selects which row of cells is used for reading or writing.
Since only four bits are read from or written to this SMO or SO RAM at a time, multiple SMO or SO RAM chips must be connected to the memory controller in order to read or write a word of data on the bus.
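How several four-bit-wide chips together serve one bus word can be sketched as follows. The 16-bit word width in the usage example, the least-significant-slice-first ordering, and the function names are illustrative assumptions rather than details from the description.

```python
def chips_per_word(word_bits, bits_per_chip=4):
    """Number of chips needed side by side to supply one word (rounded up)."""
    return -(-word_bits // bits_per_chip)


def split_word(word, word_bits, bits_per_chip=4):
    """Split a word into per-chip slices, least significant slice first."""
    mask = (1 << bits_per_chip) - 1
    count = chips_per_word(word_bits, bits_per_chip)
    return [(word >> (i * bits_per_chip)) & mask for i in range(count)]


def join_word(slices, bits_per_chip=4):
    """Reassemble a word from per-chip slices read back in parallel."""
    word = 0
    for i, piece in enumerate(slices):
        word |= piece << (i * bits_per_chip)
    return word


if __name__ == "__main__":
    print(chips_per_word(16))                          # 4
    slices = split_word(0xBEEF, 16)
    print([hex(s) for s in slices])                    # ['0xf', '0xe', '0xe', '0xb']
    print(hex(join_word(slices)))                      # 0xbeef
```

Each chip receives the same row and column address; only the data slice differs from chip to chip.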
Note that there are only 11 address lines (A0-A10), half the number you would expect for a 2048×2048 array. This is done to save on the number of pins. The 22 required address lines are passed through select logic external to the chip and multiplexed onto the 11 address lines. First, 11 address signals are passed to the chip to define the row address of the array, and then the other 11 address signals are presented for the column address. These signals are accompanied by Row Address Select (RAS) and Column Address Select (CAS) signals to provide timing to the chip.
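The multiplexing of a 22-bit cell address onto 11 pins can be sketched as two successive transfers. This is an illustrative model only: the choice of the upper 11 bits as the row address and the function names are assumptions, since the description does not fix which half of the address selects the row.

```python
ADDRESS_PINS = 11  # pins A0-A10 in the example above


def split_address(cell_address, pins=ADDRESS_PINS):
    """Split a full cell address into (row, column) halves of `pins` bits each.

    Assumes the upper half selects the row and the lower half the column.
    """
    if not 0 <= cell_address < 1 << (2 * pins):
        raise ValueError("address does not fit in twice the pin count")
    row = cell_address >> pins
    column = cell_address & ((1 << pins) - 1)
    return row, column


def multiplexed_cycle(cell_address, pins=ADDRESS_PINS):
    """Sequence of (strobe, value) pairs presented on the shared address pins."""
    row, column = split_address(cell_address, pins)
    return [("RAS", row), ("CAS", column)]


if __name__ == "__main__":
    address = (5 << 11) | 7               # row 5, column 7
    print(split_address(address))         # (5, 7)
    print(multiplexed_cycle(address))     # [('RAS', 5), ('CAS', 7)]
```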
Multiplexed addressing plus the use of square arrays results in a quadrupling of memory size with each new generation of memory chips. One more pin devoted to addressing doubles the number of rows and columns, and so the size of the memory grows by a factor of 4. Note that
A typical SMO or SO RAM chip pin configuration is shown in
Specific embodiments have been used to describe the invention. However, numerous modifications are possible, as would be recognized by one skilled in the art. For instance, although the descriptions above may refer to a specific ideal layout of components of wireless interconnects, it will be appreciated that various other arrangements could be implemented using any combination of hardware and/or software.
Although the invention has been described with respect to specific embodiments, it will be appreciated that the invention is intended to cover all modifications and equivalents within the scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
2809725 | Mar 2013 | CA | national |
763/MUM/2014 | Feb 2014 | IN | national |
2901521 | Aug 2015 | CA | national |
Number | Date | Country | |
---|---|---|---|
61802398 | Mar 2013 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 14202306 | Mar 2014 | US |
Child | 14985805 | | US |