A memory controller is a digital circuit that manages the flow of data going to and from the processor's main memory. The memory controller can be a separate chip or can be integrated into another chip, such as by being placed on the same die as, or forming an integral part of, a microprocessor. The main memory is local to the processor and is thus not directly accessible by other processors. In contrast to the local memory controller, which accesses local processor memory, a node controller is a circuit or system that manages the flow of data between one or more processors and a remote memory. Thus, the node controller controls access to the remote memory by each of the one or more processors.
Circuits, systems, and methods are disclosed to control access to remote memory based on machine learning. This includes learning memory access patterns (heuristically) based on past memory accesses to the remote memory. For example, learning is based on artificial intelligence, such as via classifiers that observe memory access patterns. Weighting values can be assigned to the learned patterns based on the frequency of the memory accesses and parameters relating to conditions when the memory accesses have occurred. For example, the parameter conditions can include time of day, day of the week, day of the month, type of request, and address range of the request. Parameter values for a set of the conditions can be monitored to control subsequent memory actions, which may include retrieving (e.g., pre-fetching) data from a determined memory address location of the remote memory before it is requested by a given processor node based on evaluating the assigned weighting values and the parameter values.
In one example, content addressable memories (CAMs) can be employed to store the parameter values and utilized for high-speed lookup to determine likely future memory actions. For instance, the memory actions can include memory requests (e.g., memory reads, memory writes). The memory actions may also include other supervisory or management actions that change a state for a block of memory (e.g., modified, exclusive, shared or invalid) to facilitate coherency of the data that is stored in the remote memory. By way of example, in response to parameter values of a current memory request matching stored parameters of a CAM row line, another memory associated with the CAM automatically provides weighted values, which may be summed from multiple CAM row line matches. The value from the last CAM search can also be accumulated unless the CAM rows are assigned to clear history. In that example, the past request history can be cleared. In some examples, the total weighting from the matching primary CAM row lines can be input into a secondary CAM. The secondary CAM produces a state change to trigger the speculative memory action to be implemented based on current parameter values. The weighting values accessed in accordance with the primary CAM search can be summed to generate a summed weighting value that can be compared to a threshold. If the threshold is exceeded, the secondary CAM can be evaluated to determine likely future memory actions as well as to update the weighting values and parameters.
By way of example, it may be learned that when memory request “A” occurs, memory action “B” will follow within a few clock cycles (or some other predetermined time window). Thus, the parameter values when memory action “A” occurs can be updated to indicate a higher likelihood that memory action “B” will occur. During a current processor node request, when the address for memory action “A” matches along with the weighting values retrieved in response to current parameter values, memory action “B” can be executed before actually being requested by a given processor node. When the processor node actually requests memory action “B”, the node controller 110 can fulfill the request from local node resources (e.g., local buffer) as opposed to the slower process of having to access the remote memory 120 at the time of the request.
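The “A” then “B” pattern above can be illustrated with a minimal software sketch. The class name `SuccessorLearner`, the cycle-count timestamps, the window size, and the `min_count` cutoff are all assumptions made for illustration; the disclosure describes hardware circuitry and does not prescribe this data structure.

```python
from collections import defaultdict, deque


class SuccessorLearner:
    """Hypothetical model: counts how often one request follows another
    within a fixed window of clock cycles."""

    def __init__(self, window_cycles=4):
        self.window_cycles = window_cycles
        self.recent = deque()  # (cycle, address) pairs still inside the window
        self.follow_counts = defaultdict(lambda: defaultdict(int))

    def observe(self, cycle, address):
        # Drop requests that have fallen out of the time window.
        while self.recent and cycle - self.recent[0][0] > self.window_cycles:
            self.recent.popleft()
        # Credit each earlier request in the window with this follower.
        for _, earlier in self.recent:
            if earlier != address:
                self.follow_counts[earlier][address] += 1
        self.recent.append((cycle, address))

    def predict(self, address, min_count=2):
        # Return the most frequent follower, or None if evidence is too thin.
        followers = self.follow_counts.get(address)
        if not followers:
            return None
        best, count = max(followers.items(), key=lambda kv: kv[1])
        return best if count >= min_count else None


learner = SuccessorLearner(window_cycles=4)
for base in (0, 100, 200):
    learner.observe(base, "A")      # request "A"
    learner.observe(base + 2, "B")  # "B" follows two cycles later
learner.observe(300, "C")           # isolated request, no follower
```

After these observations, `learner.predict("A")` yields `"B"`, which a controller could use as the speculative action, while `"B"` and `"C"` have no learned follower.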
Event detection circuitry 140 stores the parameters and the weighting values for each of the parameters (as updated by the learning block 130) associated with an address for the given data block in the remote memory 120. The event detection circuitry 140 determines a subsequent memory action to execute for the prospective data block in the remote memory based on matching an address of a current request to the given data block and comparing current parameter values associated with the current request relative to the stored parameters.
The event detection circuitry 140 can include a comparator (see e.g.,
As mentioned, the future memory action can include activating the node controller 110 to execute a request, such as a read request or write request, to retrieve (e.g., pre-fetch) a predicted data block from the remote memory 120 before one of the processor nodes issues the request. This saves the processor nodes the time of accessing the remote memory 120 themselves and increases the efficiency of access to the remote memory, since processor node handshaking with the node controller 110 when accessing the remote memory can be reduced. That is, since the node controller 110 can retrieve the predicted data block from the remote memory 120 and store it in a local buffer of the node controller, the data block can be provided to the processor node in response to its request (assuming a match) without having to execute the retrieval at the time of the request.
In one example, the event detection circuitry 140 can include a content addressable memory (CAM) (see e.g.,
In an example, the learning block 130 can be implemented as a classifier that monitors the past requests to a given data block in the remote memory 120 and updates the respective weighting value for each of the parameters as a statistical probability to indicate the likelihood of the subsequent memory action. The classifier can include rule-based machine learning circuits that combine a discovery component (e.g., a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning). The classifier can identify a set of context-dependent rules (e.g., heuristics defined by the parameters described herein) that collectively store and apply knowledge in a piecewise manner in order to make predictions (e.g., behavior modeling, classification, data mining, regression, function approximation, and so forth). The predictions can be stored as the weighting values described herein, which represent probabilities that a future memory action may occur based on the address and parameter inputs described herein. One example of a classifier is a support vector machine (SVM) to perform Bayesian learning based on memory requests and parameters, but other learning block examples are also possible, including neural networks. In one example, the node controller 110 can be implemented as a state machine to perform various processes that can be implemented as a number of states that respond to various inputs to transition between states. Example processes can include learning, event detection, parameter evaluation, processing requests from processor nodes, accessing the remote memory 120, and so forth.
The CAM 210 is a type of computer memory that can be used in very-high-speed searching applications. It is also known as associative memory, associative storage, or associative array, although the last term is more often used for a programming data structure. The CAM 210 compares input search data (e.g., tag) defined by the address and parameter inputs against a table of stored data, where the weighting values are returned via memory 212. In this example, weighting values are stored, updated, and/or retrieved as the matching data. In one particular example, the CAM 210 can be implemented as a ternary CAM (TCAM) to allow “don't care” states to be specified for the parameter inputs to the content addressable memory. The TCAM allows a third matching state of “X” or “don't care” for one or more bits in the stored data word, thus adding flexibility to the search. For example, a ternary CAM may have a stored word of “10XX0” which will match any of the four search words “10000”, “10010”, “10100”, or “10110”. For each parameter that matches, the CAM provides a corresponding weighted value from memory 212, and these values are summed via a summer 214 to provide a summed weighting value 216.
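The ternary matching and summing behavior described above can be modeled in software as follows. This is only an illustrative sketch: the function names and the example table of (stored word, weighting value) pairs are assumptions, and a real TCAM compares all rows in parallel rather than in a loop.

```python
def ternary_match(stored_word, search_word):
    """Return True if search_word matches stored_word, where 'X' in the
    stored word is a don't-care bit."""
    if len(stored_word) != len(search_word):
        return False
    return all(s == "X" or s == q for s, q in zip(stored_word, search_word))


def summed_weight(rows, search_word):
    """Sum the weighting values of every row whose stored word matches."""
    return sum(w for stored, w in rows if ternary_match(stored, search_word))


# Enumerate every 5-bit search word that matches the stored word "10XX0".
matches = [format(i, "05b") for i in range(32)
           if ternary_match("10XX0", format(i, "05b"))]

# Hypothetical table of (stored word, weighting value) rows.
rows = [("10XX0", 3), ("1XXXX", 2), ("0XXXX", 5)]
total = summed_weight(rows, "10010")  # first two rows match
```

Here `matches` contains exactly the four search words listed in the text, and `total` plays the role of the summed weighting value 216.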
In one example implementation, the subsequent memory action to take based on past memory requests can be stored in a column of the CAM 210. In another example, a primary TCAM can perform an initial search for parameter matches and provide weighting values via memory 212 based on the matched parameters. The memory 212 provides weighting values based on the current parameter values matching the stored parameter values for a given data block (e.g., indexed by the address for the given data block). A comparator 220 compares the summed weighting value 216 output from the memory 212 to a threshold. If the summed weighting value 216 exceeds the threshold, the comparator 220 triggers a secondary TCAM (not shown) via an output 230 to look up the subsequent memory action, and the event detection circuitry 200 thereby determines the subsequent memory action. If the summed weighting value does not exceed the threshold, the subsequent memory action is not executed. In another example, rather than storing the weighting values in a separate memory 212, the weighting values along with the parameter values can be stored in the CAM 210.
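The threshold gating can be sketched in a few lines. The dictionary standing in for the secondary TCAM, its (address, request type) key, and the action tuples are hypothetical choices for illustration; the text does not specify how the secondary lookup is keyed.

```python
def comparator(summed_weight, threshold):
    """Model of the comparator: true only if the sum exceeds the threshold."""
    return summed_weight > threshold


def lookup_action(matched_weights, threshold, secondary_table, key):
    """Sum the matched weighting values and, only if the sum exceeds the
    threshold, consult the secondary table for the speculative action."""
    total = sum(matched_weights)
    if not comparator(total, threshold):
        return None  # threshold not exceeded: no speculative action
    return secondary_table.get(key)


# Hypothetical secondary table: (address, request type) -> speculative action.
secondary_table = {("0x1000", "read"): ("prefetch", "0x1040")}

hit = lookup_action([3, 2], 4, secondary_table, ("0x1000", "read"))
miss = lookup_action([3, 2], 5, secondary_table, ("0x1000", "read"))
```

With a summed weight of 5, a threshold of 4 triggers the lookup, while a threshold of 5 does not, since the sum must exceed the threshold rather than merely equal it.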
Example memory access conditions that are represented by the parameters for a given memory request can include a time of day, a day of the week, a day of the month, a month of the year, a type of memory request, or an address range of the request. Thus, if a particular address of the remote memory is consistently accessed on the first Tuesday of each month at a particular time, a high weighting value can be assigned via the learning block over time, which indicates that a future memory action should occur when the current parameter inputs match previous parameter conditions. For example, a given memory read to a given address that occurs at the same time each day (or under another condition or conditions) may indicate that a subsequent memory read will occur after the given memory read. If such conditions are detected, the learning block can increase the probability, in the form of the weighting value, that the subsequent memory read will occur based on the address of the given memory read and the current parameter value of the time.
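One simple way to picture a weighting value as a probability is a per-parameter frequency estimate, sketched below. The class `ParameterWeights`, the parameter names `weekday` and `hour`, and the hits-over-observations formula are assumptions for illustration, not the classifier described in this disclosure.

```python
from collections import defaultdict


class ParameterWeights:
    """Hypothetical frequency-based weights keyed by (address, name, value)."""

    def __init__(self):
        self.seen = defaultdict(int)      # times the condition was observed
        self.followed = defaultdict(int)  # times the follow-up action occurred

    def record(self, address, params, follow_up_occurred):
        for name, value in params.items():
            key = (address, name, value)
            self.seen[key] += 1
            if follow_up_occurred:
                self.followed[key] += 1

    def weight(self, address, name, value):
        # Estimated probability of the follow-up given this condition.
        key = (address, name, value)
        seen = self.seen.get(key, 0)
        return self.followed.get(key, 0) / seen if seen else 0.0


weights = ParameterWeights()
for _ in range(3):
    weights.record(0x1000, {"weekday": "Tue", "hour": 9}, True)
weights.record(0x1000, {"weekday": "Wed", "hour": 9}, False)
```

In this toy history, the Tuesday condition always preceded the follow-up read, so its weight is highest, while the hour alone is a weaker indicator because one of the four 9 o'clock accesses was not followed.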
Other example parameter values can include a previous address accessed, an access request history, a processor address region, coherency directory information (e.g., to define a particular processor's domain that is protected from other processors), a previous request history, or a running application identifier. A memory scrubber 240 can be provided to update the weighting values stored in the memory 212 depending on whether or not the future memory request is fulfilled by the subsequent memory action. For example, if the subsequent memory request which has been fulfilled (e.g., via response manager shown in
Event detection circuitry 340 stores the parameters and the weighting values for each of the parameters associated with an address for the given data block. The event detection circuitry 340 includes a content addressable memory (CAM) 350 that has separate columns to store each of the parameters and its associated weighting value, and a separate row assigned to each data block that has been previously requested by the processor nodes 1-N. The CAM 350 receives the address and the parameters as inputs to retrieve the weighting values. The event detection circuitry 340 determines the subsequent memory action for the prospective data block in the remote memory 310 based on matching an address of a current request to the given data block and comparing current parameter values associated with the current request relative to the stored parameters.
A response manager 360 executes the subsequent memory action determined by the event detection circuitry 340 and monitors for a future memory request from the processor nodes 1-N. The response manager 360 fulfills the future memory request to the processor nodes 1-N if the subsequent memory action matches the future memory request, such as a predicted request for a data block that has already been retrieved and stored in a local buffer. For example, the subsequent memory action can include a memory read, a memory write, a memory coherency directory operation, or a supervisory action that is applied to the remote memory.
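The role of the response manager for read requests can be sketched as follows. The class and method names, the dictionary standing in for the remote memory, and the `remote_reads` counter are all illustrative assumptions; writes, coherency directory operations, and supervisory actions are omitted.

```python
class ResponseManager:
    """Hypothetical model: serves reads from a local buffer when a
    speculative pre-fetch already retrieved the requested block."""

    def __init__(self, remote_memory):
        self.remote_memory = remote_memory  # dict stands in for remote memory
        self.local_buffer = {}
        self.remote_reads = 0               # counts slow remote accesses

    def _read_remote(self, address):
        self.remote_reads += 1
        return self.remote_memory[address]

    def prefetch(self, address):
        # Execute the subsequent memory action ahead of any request.
        self.local_buffer[address] = self._read_remote(address)

    def handle_read(self, address):
        # Return (data, served_from_buffer) for a processor node request.
        if address in self.local_buffer:
            return self.local_buffer.pop(address), True
        return self._read_remote(address), False


manager = ResponseManager({0x40: "block-b", 0x80: "block-c"})
manager.prefetch(0x40)                     # speculative action
predicted = manager.handle_read(0x40)      # matches the prediction
unpredicted = manager.handle_read(0x80)    # falls back to remote access
```

The predicted request is answered without a further remote access, whereas the unpredicted request pays the remote latency at the time of the request.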
Supervisory actions to the remote memory 310 can include operations to facilitate coherency of the remote memory (e.g., a state change to block or unblock a given memory location to allow one processor to read the location and write data back in a read-modify-write cycle). As used herein, the term coherency refers to the node controller's ability to manage concurrent data accesses to the remote memory 310 without one processor corrupting another processor's data access. Although not shown, the node controller 320 can also include a memory scrubber to update the weighting values stored in the CAM 350 depending on whether or not the subsequent memory action is fulfilled (e.g., if a given processor node actually requests the subsequent memory action taken by the node controller before the actual request to the remote memory).
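A memory scrubber of this kind could, for instance, nudge a weighting value up when a speculative action was later used and down when it was not. The sketch below assumes that direction of adjustment, along with a fixed step and a saturating range of 0 to 15; none of these specifics are stated in the text.

```python
def scrub_weight(weight, was_used, step=1, floor=0, ceiling=15):
    """Hypothetical update rule: reinforce a weighting value when the
    speculative action was later requested, decay it otherwise."""
    if was_used:
        return min(ceiling, weight + step)  # saturate at the ceiling
    return max(floor, weight - step)        # never drop below the floor


reinforced = scrub_weight(7, was_used=True)
decayed = scrub_weight(7, was_used=False)
```

Saturating the value keeps a long run of hits or misses from overflowing a fixed-width weight field, which is one reason a bounded update is assumed here.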
In view of the foregoing structural and functional features described above, an example method will be better appreciated with reference to
At 430, the method 400 includes detecting a current request to the given data block and values of the input parameters associated with the current request (e.g., via event detection circuitry 140 of
Although not shown, the method 400 can include summing weighting values associated with a content addressable memory in response to parameters associated with the current memory requests of the processor nodes and comparing the summed weighting values to a threshold. Along with addition, the summing process can also include subtraction, multiplication, division, and the shifting of the value associated with the other rows to the left or right to modify the weighting values. The summing process may also include passing the value associated with another row or substituting a constant value. Because of a possible circuit delay for an addition operation, an example summing operation can be to shift, pass or replace a value from the CAM row above. For example, all of the rows that miss would pass the value received from the row above, and the CAM rows that matched would shift the value they received to the left to increase it or to the right to decrease it.
Some match or miss CAM rows would thus not pass the weighting value but would replace the current value with a constant value to start the chain of shifting values in another cycle. The method can include executing the predictive memory action if the summed weighting values exceed the threshold. The method 400 can also include monitoring future requests from the processor nodes to corresponding data blocks in the remote memory and parameters associated with each of the future requests. This can include fulfilling the future request to the processor nodes based on data retrieved from the remote memory and stored locally in response to performing the subsequent memory action prior to the future request.
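The shift, pass, and replace chain can be modeled as a value flowing down the CAM rows. In this sketch each row is a dictionary with a hypothetical `on_match` behavior, an optional `on_miss` behavior (defaulting to pass), and a `constant` for replace rows; this encoding is an assumption made for illustration.

```python
def apply_row(value, matched, row):
    """Apply one row's configured operation to the value from the row above."""
    op = row["on_match"] if matched else row.get("on_miss", "pass")
    if op == "left":
        return value << 1       # shift left to increase the weighting
    if op == "right":
        return value >> 1       # shift right to decrease the weighting
    if op == "replace":
        return row["constant"]  # restart the chain with a constant
    return value                # pass the value through unchanged


def chain(rows, match_flags, seed=0):
    """Propagate a value from the top row to the bottom row."""
    value = seed
    for row, matched in zip(rows, match_flags):
        value = apply_row(value, matched, row)
    return value


rows = [
    {"on_match": "replace", "on_miss": "replace", "constant": 1},
    {"on_match": "left"},
    {"on_match": "left"},
    {"on_match": "right"},
]
weak = chain(rows, [True, True, False, True])     # 1 -> 2 -> 2 -> 1
strong = chain(rows, [False, True, True, False])  # 1 -> 2 -> 4 -> 4
```

The final value can then be compared to a threshold as in the rest of the method; with a threshold of 3, only the second pattern of matches would trigger the predictive memory action.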
What have been described above are examples. One of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, this disclosure is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Additionally, where the disclosure or claims recite “a,” “an,” “a first,” or “another” element, or the equivalent thereof, it should be interpreted to include one or more than one such element, neither requiring nor excluding two or more such elements. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.