Computing devices comprising at least one processor coupled to a memory are ubiquitous. Computing devices may include personal computing devices (PCDs) such as desktop computers, laptop computers, portable digital assistants (PDAs), portable game consoles, tablet computers, cellular telephones, smart phones, and wearable computers. In order to meet the ever-increasing processing demands of users, PCDs increasingly incorporate multiple processors or cores running instructions or threads in parallel.
Such PCDs often include multiple finite state machines (FSM) as part of various systems or subsystems of the PCD, including for example, in association with power up, autonomous charging, and other functions requiring digital command signals. Finite state machines require a valid input clock in order to sequence through the intended states and operate properly. If the clock signal is not valid or turned off for some reason during the operation of the finite state machine, the finite state machine becomes stuck or hung in a fixed, unknown state.
Finite state machines interact with a bus using an interface clock for the bus to transmit or receive commands and/or data. If the signal for the interface clock is not valid or is turned off, then the bus is stuck or hung and any further transactions on the bus are not possible. Recovering the finite state machine, and the bus, requires a hard reset or battery removal. Additionally, when a finite state machine becomes hung, resulting in the bus hang, there is very limited visibility into the internal state of the finite state machine to diagnose the issue. This causes difficulties tracing the error that caused the finite state machine and bus to hang and/or reproducing the problem for diagnosing and resolving the issue.
Accordingly, there is a need for improved methods and apparatuses to detect when a finite state machine has become hung due to loss of the clock signal resulting in a hung bus, and to recover the finite state machine, and the bus, without the need for a hard reset.
Apparatuses, systems, methods, and computer programs are disclosed for detecting and resolving bus hang in a bus controlled by an interface clock. The apparatuses, systems, methods, and/or computer programs detect that a finite state machine operating on the interface clock is in a hung or unknown state due to clock signal failure from the interface clock, and allow for recovering the finite state machine, and the bus, from the hung state without the need for a reset.
An exemplary system comprises a bus of a computing device, the bus operating in accordance with an interface clock, and a controller in communication with the bus. The exemplary controller comprises a finite state machine in communication with the bus, where the finite state machine is configured to receive a clock signal from the interface clock and a command signal originating external to the controller. The exemplary controller also comprises hang detection logic. The hang detection logic is configured to receive one or more signals that the finite state machine is active, monitor the interface clock, and generate an event notification in response to the interface clock turning off while the finite state machine is active. The exemplary controller further comprises a trap handler in communication with the hang detection logic, where the trap handler is configured to send an interrupt in response to the event notification.
In another embodiment, a method for resolving bus hang in a bus of a computing device is provided. The exemplary method comprises receiving one or more signals that a finite state machine of the computing device is active, an output of the finite state machine in communication with the bus; monitoring an interface clock of the bus, the interface clock providing an input signal to the finite state machine; generating an event notification in response to the interface clock turning off while the finite state machine is active; sending an interrupt in response to the event notification; and performing a recovery operation in response to the interrupt.
In the Figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same Figure. Letter character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all Figures.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this description, the term “application” or “image” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein, may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
In this description, the term “computing device” is used to mean any device implementing a processor (whether analog or digital) in communication with a memory, such as a desktop computer, gaming console, or server. A “computing device” may also be a “portable computing device” (PCD), such as a laptop computer, handheld computer, or tablet computer. The terms PCD, “communication device,” “wireless device,” “wireless telephone”, “wireless communication device,” and “wireless handset” are used interchangeably herein. With the advent of third generation (“3G”) wireless technology, fourth generation (“4G”), Long-Term Evolution (LTE), etc., greater bandwidth availability has enabled more portable computing devices with a greater variety of wireless capabilities. Therefore, a portable computing device may also include a cellular telephone, a pager, a smartphone, a navigation device, a personal digital assistant (PDA), a portable gaming console, a wearable computer, or any portable computing device with a wireless connection or link.
In order to meet the ever-increasing processing demands placed on PCDs, within the small form factors, PCDs increasingly incorporate multiple processors or cores (such as central processing units or “CPUs”) running various threads in parallel. Such PCDs often include multiple finite state machines (FSM) as part of various systems or subsystems of the PCD. FSMs require a valid input clock in order to sequence through the intended states of the FSM and operate properly. The input clock for the FSM may be an interface clock controlling a bus or interconnect over which the FSM communicates. Additionally, FSMs require a digital signal, such as a read/write signal, a transmit data signal, other command signal, or data signal that directs the FSM to perform the desired action.
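The dependence of an FSM on a valid input clock can be illustrated with a minimal software sketch. The sketch is a simplified model and not the disclosed hardware: the class `ClockedFsm`, the helper `run`, and the `clock_valid` flag are hypothetical names introduced for illustration, and only three of the states named in this description (Waiting for Command, Send Data, Command Complete) are modeled.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_COMMAND = auto()
    SEND_DATA = auto()
    COMMAND_COMPLETE = auto()


class ClockedFsm:
    """Toy FSM that changes state only when it sees a valid clock edge."""

    def __init__(self):
        self.state = State.WAITING_FOR_COMMAND

    def tick(self, clock_valid, command=None):
        # Without a valid clock edge the FSM cannot sequence; it stays put.
        if not clock_valid:
            return self.state
        if self.state is State.WAITING_FOR_COMMAND:
            # A command signal is needed to leave the ready state.
            if command is not None:
                self.state = State.SEND_DATA
        elif self.state is State.SEND_DATA:
            self.state = State.COMMAND_COMPLETE
        elif self.state is State.COMMAND_COMPLETE:
            self.state = State.WAITING_FOR_COMMAND
        return self.state


def run(clock_pattern, command="write"):
    """Apply one command under the given clock pattern and record the states."""
    fsm = ClockedFsm()
    return [fsm.tick(clock_valid, command).name for clock_valid in clock_pattern]
```

With a clock that stays valid, `run([True, True, True])` walks through all three states and returns to the ready state. With `run([True, False, False])` the model remains in the Send Data state, which is the stuck condition that the hang detection described herein is intended to catch.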
As mentioned, in operation a FSM sequences among various pre-defined states.
Next the FSM may sequence to a Send Data state 316a where, in accordance with the received command, data is sent to one or more recipient components, systems, applications, processes, etc. When the command has been executed, the FSM may sequence to a Command Complete state 318a where a signal may be sent and/or a bit set to indicate that the FSM has executed the received command. The FSM may then sequence back to the ready state Waiting for Command 310a. As will be understood, the state illustrated in
FSMs may be used for a wide variety of purposes. Example uses of a FSM include a controller for a power management integrated circuit (PMIC) for power management, a controller for a camera of the PCD, a controller for one or more processors of the PCD, such as a graphics processing unit (GPU), etc. The present apparatuses, systems, and methods are applicable to any such use of a FSM that uses (i.e., has as an input) an interface clock of a synchronous serial communication interface bus or interconnect.
As is understood, when the FSM is operating (e.g. executing the states illustrated in
Such rapid detection and recovery of an unknown/hung condition of an FSM provides several benefits not possible with current solutions. For example, the systems and methods of the present disclosure allow for recovery from the hung FSM and the hung bus, without reset of the PCD and/or a system-on-a-chip (SoC) of the PCD, resulting in an improved user experience. Additionally, rapid detection of the hung FSM and recovery/debug via software allows for much quicker and easier diagnosis of commands or situations that cause the interface clock to turn off, and the FSM and bus to hang, than is possible with current solutions.
Although discussed herein in relation to PCDs, the systems and methods herein—and the considerable savings made possible by the systems and methods—are applicable to any computing device.
As illustrated in the embodiment of
The illustrated controller 104 comprises various components including an AHB interface 106, which may be a bridge to allow or control access between the controller 104 (or components of the controller 104) and the first bus 135. The controller 104 also includes an additional bus interface 108 to allow or control access between the controller 104 (or components of the controller 104) and the second bus 145.
The AHB interface 106 may be coupled to various registers in some embodiments, such as configuration registers 107 that may contain configuration information for controller 104 received from one or more software masters 130. Additionally, in some embodiments AHB interface 106 may be coupled to channel registers 105 that may contain the command requests received at the controller 104 from one or more software masters 130. As illustrated in
The controller 104 also includes a FSM 110 implemented in hardware. The FSM 110 may be configured in any manner desired, and may include one or more states, such as states 310a-318a of
As will be understood, in other embodiments FSM 110 may receive an input signal, such as a command signal, from more or different sources than those illustrated in
As shown in
Hang detection logic 112 is configured to then generate an event notification 113. In an embodiment, the event notification 113 may be asynchronous and may comprise a signal regarding the clock state and/or a trap signal that is sent to another component such as a trap handler 114. Trap handler 114 may be implemented in hardware, software, or both. In an embodiment, trap handler 114 may be implemented in hardware, such as a sequence in register-transfer level (RTL) for one or more registers of the controller 104. In other embodiments, the trap handler 114 may not be located on the controller 104.
The trap handler 114 is in communication with one or more debug registers 116. On receiving an event notification 113 from hang detection logic 112, the trap handler 114 may be configured to cause information to be written to one or more of the debug registers 116. In an embodiment, trap handler 114 may write a bit to one of the debug registers 116 indicating that the interface clock has stopped. Such information may be subsequently used to recover from the FSM 110 hang and bus 145 hang. In other embodiments, the trap handler 114 may write additional information to the debug registers 116, including information about the state of the FSM 110, an identity of the command being executed by the FSM 110 when the interface clock 240 (see
As discussed more below (see
It will be understood that
Turning to
As will be understood, such PMIC arbiter 204 may be used to control the power distributed among the various components of a PCD. PMIC arbiter 204 may accomplish this control via a system power management interface (SPMI) bus 245 in communication with the PMIC arbiter 204. In the embodiment of
As illustrated in
As illustrated in Table 1, clock_state signal 226 is true/high only when the interface clock input into FSM 110 is on (clock_off signal 222 is 0) and the read/write signal 224 is being received by FSM 110 (read/write signal 224 is 1).
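The relationship stated above between clock_off signal 222, read/write signal 224, and clock_state signal 226 can be enumerated with a short sketch. The Boolean expression used here (an AND of the inverted clock_off input with the read/write input) is an assumption chosen to be consistent with that statement, and the function names `clock_state` and `truth_table` are hypothetical; the sketch is not a reproduction of Table 1.

```python
from itertools import product


def clock_state(clock_off, read_write):
    """Return 1 only when the clock is on (clock_off == 0) and read/write is 1."""
    return int((not clock_off) and read_write)


def truth_table():
    """Enumerate (clock_off, read_write, clock_state) for all input pairs."""
    return [(c, rw, clock_state(c, rw)) for c, rw in product((0, 1), repeat=2)]


if __name__ == "__main__":
    print("clock_off read_write clock_state")
    for c, rw, out in truth_table():
        print(f"{c:9d} {rw:10d} {out:11d}")
```

Only the row with clock_off equal to 0 and read/write equal to 1 yields a high clock_state in this model.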
The clock_state signal 226 in
In this configuration, the output of the NAND gate 230 will always be on/high until the interface clock turns off (clock_state signal 226 is 0 and is inverted to 1 by second NOT gate 237) while the FSM 110 is active (FSM_active signal 228 is 1). An illustrative truth table for the inputs to NAND gate 230 for
Thus, with this exemplary configuration of hang detection logic 112 illustrated in
As will be understood other implementations of the logic for hang detection logic 112 beyond those illustrated in
As discussed above for
Additionally or alternatively, trap handler 114 may write or place other information into one or more of debug registers 116, such as information about which command FSM 110 was executing when FSM 110 and SPMI bus 245 became hung. Such information may be obtained from FSM 110. Such information may also or instead be obtained from hang detection logic 112 which also receives the command received by FSM 110, illustrated as read/write signal 224 in
Trap handler 114 also causes an interrupt 118 to be sent to one or more destinations outside of the PMIC arbiter 204, providing visibility to external components and/or software that the hang event has occurred. In an embodiment, the trap handler 114 may cause PIC 117 to generate and send the interrupt 118. The interrupt 118 may contain information to allow another component and/or software to recover the hung FSM 110 and hung bus 245 without performing a hard reset of the PCD (or of an SoC). Such information may be contained within the interrupt 118. Alternatively, the interrupt 118 may direct the external component and/or software to the debug register(s) 116 where the trap handler 114 has stored information about the event.
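The path from hang detection logic 112 to trap handler 114 described above may be modeled in software as a rough sketch. The following assumes the behavior described for NAND gate 230, namely that its output stays high until clock_state is low while FSM_active is high, and treats a low output as the event notification. The register bit `CLOCK_STOPPED_BIT`, the class `TrapHandler`, and the dictionary used to represent an interrupt are hypothetical and are not taken from the disclosure.

```python
CLOCK_STOPPED_BIT = 0x1  # hypothetical bit position in a debug register


def trap_signal(clock_state, fsm_active):
    """NAND of the inverted clock_state and FSM_active; 0 flags a hang."""
    inverted_clock_state = 0 if clock_state else 1
    return 0 if (inverted_clock_state and fsm_active) else 1


class TrapHandler:
    """Toy trap handler: records the event, then queues an interrupt."""

    def __init__(self):
        self.debug_register = 0
        self.interrupts = []

    def on_trap_signal(self, signal, command=None):
        # A high signal means no hang was detected, so nothing is recorded.
        if signal == 1:
            return False
        # Record that the interface clock stopped while the FSM was active.
        self.debug_register |= CLOCK_STOPPED_BIT
        # Queue an interrupt describing the event for external software.
        self.interrupts.append(
            {"event": "interface_clock_stopped", "command": command}
        )
        return True
```

In this model a call such as `TrapHandler().on_trap_signal(trap_signal(0, 1), command="write")` sets the debug bit and queues one interrupt, while any other input combination leaves the handler untouched.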
The one or more signals indicating that FSM 110 is active in block 402 may comprise one or more signals received at the hang detection logic 112. As shown in the embodiment of
Method 400 continues in block 404 where an interface clock that provides an input to the bus interface, such as an SPMI bus interface 108, is monitored. Referring back to
In block 406 an event notification is generated if the interface clock turns off while the finite state machine is active. In an embodiment, the hang detection logic 112 may generate the event notification, such as trap_signal 232 (see
For example, as discussed above for
Turning to
Method 500 begins in block 502 where a determination is made that a clock for a bus interface has stopped. In an embodiment, this determination of block 502 may comprise a determination that an interface clock providing an input signal to the bus interface (such as bus interface 108) has stopped. For example, as discussed above, a determination may be made that the clock for SPMI bus 245 which provides an interface clock signal 240 to bus interface 108 has stopped. Such determination may comprise a component such as trap handler 114 receiving a notification, such as a change in trap_signal 232 from hang detection logic 112 as discussed above for
In block 504 information about the state of a finite state machine is recorded. Block 504 may comprise in some embodiments, a component such as trap handler 114 acting in response to an event notification from hang detection logic 112 to place or write information about the event into a memory or register such as debug registers 116 illustrated in
Method 500 continues to block 506 where an interrupt is sent. In an embodiment, in response to receiving the event notification (such as a change in trap_signal 232 of
Interrupt 118 informs or alerts the external component(s) and/or the software of the FSM 110/bus 245 hang and/or the stoppage of the bus 245 clock so that the system may recover from the hang condition. In an embodiment the interrupt 118 may include information, such as a register address, directing the software to the debug register(s) 116 containing information about the event—e.g. one or more debug registers 116 where the trap handler 114 has written or placed information indicating that the interface clock has stopped/turned off, information about the state of the FSM 110, etc.
After receiving the interrupt the external component(s) and/or software may act to recover from the hung state, generally indicated in blocks 508-510 of method 500. In some embodiments the recovery may include more actions, fewer actions, or different actions than those illustrated in the exemplary blocks 508-510. Additionally, in various embodiments the actions may be taken in different order than illustrated by blocks 508-510 of
In an embodiment the recovery may include resetting the finite state machine in block 508. Block 508 may be performed in an embodiment by one or more components or software external to the controller 104/PMIC arbiter 204 receiving the interrupt, acting in response to the interrupt, such as interrupt 118. The interrupt 118 itself may identify the finite state machine (such as FSM 110) that needs to be reset in block 508 in some embodiments.
In other embodiments, the interrupt 118 may just indicate that an event has occurred. For such embodiments, the external component(s) or software may be directed to a memory location, such as one or more of debug registers 116, that contains information such as a bit indicating the nature of the event, e.g., that a particular interface clock has stopped, that a particular FSM 110 and/or bus 245 has hung, etc. Based on this information, the external component or software may determine what the event is, where the event occurred, what bus 245 clock or FSM 110 is involved, and/or whether to reset the FSM 110 (block 508).
Additionally, or alternatively, in response to the interrupt 118, the recovery may include causing one or more commands to be resent to the finite state machine in block 510. As will be understood, when a finite state machine, such as FSM 110, becomes hung or enters an unknown state, any data being acted on by the FSM 110, such as data transmissions, is lost. As a result, the recovery in method 500 may also, or alternatively, include causing a retransmission to the FSM 110 of the digital signal that the FSM 110 was acting on when it became hung.
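One way software might interpret such an interrupt is sketched below. The bit layout (`CLOCK_STOPPED`, `FSM_HUNG`, `BUS_HUNG`), the `Interrupt` data class, the dictionary standing in for the debug registers, and the rule used to decide on a reset are all assumptions made for illustration; the disclosure does not specify a register format.

```python
from dataclasses import dataclass

# Hypothetical bit assignments within a debug register.
CLOCK_STOPPED = 1 << 0
FSM_HUNG = 1 << 1
BUS_HUNG = 1 << 2


@dataclass
class Interrupt:
    """Interrupt that only points software at a debug register."""
    debug_register_address: int


def decode_event(interrupt, register_file):
    """Read the register named by the interrupt and classify the event."""
    value = register_file[interrupt.debug_register_address]
    return {
        "clock_stopped": bool(value & CLOCK_STOPPED),
        "fsm_hung": bool(value & FSM_HUNG),
        "bus_hung": bool(value & BUS_HUNG),
        # Assumed policy: reset the FSM if its clock stopped or it hung.
        "reset_fsm": bool(value & (CLOCK_STOPPED | FSM_HUNG)),
    }
```

For example, `decode_event(Interrupt(0x40), {0x40: CLOCK_STOPPED | BUS_HUNG})` reports a stopped clock and a hung bus and, under the assumed policy, indicates that the FSM should be reset.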
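The two recovery actions of blocks 508 and 510 can be illustrated with a small sketch in which software resets a hung FSM and then resends the lost command. The class `HungFsm`, its `reset` and `submit` methods, and the command string are hypothetical; an actual recovery would go through whatever reset and command interfaces the controller exposes.

```python
class HungFsm:
    """Toy model of an FSM that starts out in an unknown (hung) state."""

    def __init__(self):
        self.state = "UNKNOWN"
        self.received = []

    def reset(self):
        # Block 508: return the FSM to its ready state.
        self.state = "WAITING_FOR_COMMAND"

    def submit(self, command):
        # A hung FSM cannot accept commands until it has been reset.
        if self.state != "WAITING_FOR_COMMAND":
            raise RuntimeError("FSM is not ready to accept a command")
        self.received.append(command)


def recover(fsm, last_command):
    """Reset the FSM, then resend the command that was lost in the hang."""
    fsm.reset()               # block 508
    fsm.submit(last_command)  # block 510
    return fsm.state
```

In this model, calling `submit` before `reset` raises an error, whereas `recover(fsm, "write 0x10")` leaves the FSM in its ready state with the command delivered, without any reset of the wider system.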
As will be understood, causing the command to be resent to FSM 110 may be accomplished in a variety of ways. For example, in the embodiment of
Systems 100 (
A display controller 628 and a touch screen controller 630 may be coupled to the CPU 602. In turn, the touch screen display 606 external to the on-chip system 102 may be coupled to the display controller 628 and the touch screen controller 630.
Also, a video port 638 is coupled to the video amplifier 636. As shown in
Further, as shown in
As further illustrated in
Referring to
As discussed above, it will be understood that one or more of the components of the PCD 600 or SoC 102 listed above, including one or more of the “controllers”, may include or implement a finite state machine such as FSM 110. The systems 100/200 and methods 400/500 may be implemented in any such FSM 110 of PCD 600 or SoC 102 that operates on a synchronous interface clock, such as from one or more busses or interconnects of the PCD 600 or SoC 102.
It should be appreciated that one or more of the method steps described herein may be stored in the memory as computer program instructions. These instructions may be executed by any suitable processor in combination or in concert with the corresponding module to perform the methods described herein. Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described.
However, the invention is not limited to the order of the steps or blocks described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps or blocks may be performed before, after, or in parallel with (substantially simultaneously with) other steps or blocks without departing from the scope and spirit of the invention. In some instances, certain steps or blocks may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example.
Therefore, disclosure of a particular set of program code instructions or detailed hardware devices is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the Figures which may illustrate various process flows.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, NAND flash, NOR flash, M-RAM, P-RAM, R-RAM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (“DSL”), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
Disk and disc, as used herein, include compact disc (“CD”), laser disc, optical disc, digital versatile disc (“DVD”), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Alternative embodiments will become apparent to one of ordinary skill in the art to which the invention pertains without departing from its spirit and scope. Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.