The present invention relates to a memory system having a writable data memory and means for recognizing and correcting an error in a data word read out from the data memory as well as an operating method for such a memory system.
A writable data memory may suffer functional interference in which one or more bits of a stored data word spontaneously change their value. If such a data memory is used in a safety-relevant application, e.g., in an engine control unit of a motor vehicle or the like, it is absolutely necessary to recognize interference of this type and take suitable countermeasures to avoid dangerous malfunctions. In the simplest case, the countermeasures may include terminating, in a predetermined way, an application which accesses the data memory as soon as an error is recognized, so that a faulty data value is no longer accessed and maloperation because of the error is precluded. The application may then not be operated again until the error in the data memory is corrected.
To avoid such an operational interruption, it has been considered to store data words in a memory together with redundant information, on the basis of which an error in the data word may not only be recognized but, under certain circumstances, also corrected. Certain conventional encoding methods, such as Reed-Solomon or Hamming codes, allow errors in the data word to be recognized and corrected. Error correction codes may therefore be assumed to be known within the scope of the present description and are not explained in detail. If an application accesses a cell of the memory and establishes on the basis of the redundant information that the data word stored in the cell is faulty, a corrected data word may be provided to the application, and the application may be operated further without the danger of a maloperation.
The number of bit errors which may be corrected in a data word or in a block of data words encoded jointly using an error correction code is a function of the bit count of the redundant information produced for this data word or block. This means, for example, that if the bit count of the redundant information is sufficient to correct a single bit error in a data word or block, the operating capability of the application may be maintained only as long as no more than one bit error occurs in the affected data word or block. As soon as a second bit error occurs, correction is no longer possible, and the application must be terminated as described above.
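The limit described above can be illustrated with a Hamming(7,4) code, in which three redundant bits protect four data bits. The following is only a sketch: the choice of this particular code and the function names are assumptions made for illustration and are not taken from the present description.

```python
# Hamming(7,4): 4 data bits plus 3 redundant (parity) bits.
# Code bits are numbered 1..7; parity bits sit at positions 1, 2 and 4.

def encode(data):
    """Turn 4 data bits into a 7-bit code word (list index 0 = position 1)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def syndrome(code):
    """Return 0 if all parity checks pass, else the position (1..7) they point to."""
    s1 = code[0] ^ code[2] ^ code[4] ^ code[6]
    s2 = code[1] ^ code[2] ^ code[5] ^ code[6]
    s4 = code[3] ^ code[4] ^ code[5] ^ code[6]
    return s1 + 2 * s2 + 4 * s4

def decode(code):
    """Flip the bit the syndrome points to and return (data bits, position flipped)."""
    code = list(code)
    pos = syndrome(code)
    if pos:
        code[pos - 1] ^= 1
    return [code[2], code[4], code[5], code[6]], pos

word = [1, 0, 1, 1]
stored = encode(word)

one_error = list(stored)
one_error[4] ^= 1                      # a single bit flips in the memory cell
assert decode(one_error) == (word, 5)  # located at position 5 and corrected

two_errors = list(stored)
two_errors[0] ^= 1
two_errors[1] ^= 1                     # a second bit flips in the same cell
assert decode(two_errors)[0] != word   # the decoder now "corrects" the wrong bit
```

With two flipped bits the syndrome points to a third, intact position, so the decoded data word is wrong. This is the situation in which the application would have to be terminated.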
However, memory errors tend to occur in groups, which means that the probability of the occurrence of an error in a memory bit is not equal everywhere, but rather is particularly high in the surroundings of an already existing error. To ensure continued usability of the memory even if a large number of bit errors occur closely adjacent to one another, a large quantity of redundant information is required, which increases the amount of memory space required and, as a result, the cost of the memory system.
An example method for operating a writable data memory or a memory system having such a data memory is provided according to the present invention, which ensures a high degree of availability of the data memory and keeps the memory space required for storing redundant information small.
This advantage may be achieved in that the redundant information assigned to a data word is read out from the data memory together with that data word, it is checked on the basis of the redundant information whether the data word is faulty, and, if it is faulty, the data word is not only corrected but additionally written to a new address in a free area of the data memory. Because a correct version of the data word is thus again located at the new address, future errors occurring at this address may be corrected up to the maximum number correctable on the basis of the redundant information. The reliability of the data memory is therefore not impaired by the occurrence of individual bit errors as long as free memory space is available into which the contents of defective memory cells may be moved. Because in most cases the new address will be far away from the original address of the data word recognized as faulty, the probability of the occurrence of further bit errors at the new address is less than at the original address, which further improves reliability.
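The read, check, correct and relocate sequence may be sketched as follows. The sketch makes two simplifying assumptions that are not part of the present description: the redundant information is simply two extra copies of each word (a repetition code corrected by bitwise majority vote), and the new address is recorded in a hypothetical `remap` table rather than by altering the read sequence in the memory itself.

```python
def majority(a, b, c):
    """Bitwise majority vote over three copies of a word."""
    return (a & b) | (a & c) | (b & c)

class Memory:
    def __init__(self, size):
        self.cells = [None] * size  # None marks a free cell
        self.remap = {}             # original address -> new address (assumption)

    def write(self, addr, word):
        self.cells[addr] = [word, word, word]  # word plus redundant copies

    def first_free(self):
        return self.cells.index(None)  # raises ValueError if no free cell is left

    def read(self, addr):
        phys = self.remap.get(addr, addr)
        a, b, c = self.cells[phys]
        word = majority(a, b, c)       # corrected value
        if not (a == b == c):          # redundancy shows the cell is faulty
            new = self.first_free()
            self.write(new, word)      # fresh, fully redundant copy
            self.remap[addr] = new     # later reads use the new address
        return word

mem = Memory(8)
mem.write(0, 0xAA)
mem.cells[0][1] ^= 0x04            # one stored bit changes spontaneously
assert mem.read(0) == 0xAA         # the application still gets the right word
assert mem.remap == {0: 1}         # and the word now lives in free cell 1
assert mem.cells[1] == [0xAA, 0xAA, 0xAA]
```

After relocation the word again carries its full redundancy, so a later error at the new address can be corrected in the same way.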
The read sequence of the data words in the data memory is expediently altered to access the new address for reading the data word. This is necessary in particular if the data word represents a program instruction which must be executed in a predefined relationship with other instructions.
To alter the read sequence, at least one data word preceding the corrected data word in the read sequence may be written to the free area of the data memory together with the corrected data word, so that a reference, e.g., a jump instruction, to its new memory location may be placed at the original memory location of the preceding data word.
After correcting the data word, a reference to a memory location, which follows the original memory location of the corrected data word, may be written to the free area.
Alternatively, the possibility exists of providing the free area in which the corrected data word is written in an address area following the address of the data word recognized as faulty, in that the contents of memory cells whose addresses follow those of the data word recognized as faulty are shifted.
Instead of shifting the memory cells following the data word recognized as faulty backward to provide the free area, the free area may, of course, also be provided by shifting forward the contents of memory cells whose addresses precede that of the data word recognized as faulty. In this case, a reference to a memory location following the original memory location of the corrected data word is written in the free area following the corrected data word.
In both cases, it may be expedient for the shift to proceed progressively, from addresses significantly distant from the address of the data word recognized as faulty to addresses immediately adjacent to it, so that data words do not have to be buffered outside the memory at a point at which a data loss is possible, for example, due to shutdown of the data processing system used by the memory system according to the present invention.
For the same purpose, shifting preferably includes copying a data word from an original address to a new address, followed by overwriting the original address using another data word after copying. It is thus ensured that every data word is present at least once in the memory at every instant.
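The copy-then-overwrite shift may be sketched as follows. The function name `open_gap`, the list model of the memory, and the assumption that all free cells lie at the end of the memory are illustrative choices only. The inner `assert` checks the property stated above, namely that every data word is present at least once at every step.

```python
def open_gap(mem, start, k):
    """Shift the words from cell `start` onward up by k cells, one at a time.

    Free cells are None and are assumed to lie at the end of `mem`.
    Returns the addresses of the k cells that may now be overwritten.
    """
    free = sum(1 for w in mem if w is None)
    used_end = len(mem) - free
    if free < k:
        raise ValueError("not enough free cells")
    words = mem[:used_end]
    # Start with the word farthest from the gap so that each copy lands in a
    # cell that is free or whose content has already been copied further up.
    for src in range(used_end - 1, start - 1, -1):
        mem[src + k] = mem[src]               # copy first ...
        assert all(w in mem for w in words)   # ... so no word is ever missing
    return list(range(start, start + k))

mem = [f"Instr{i}" for i in range(1, 11)] + [None] * 3
gap = open_gap(mem, 8, 2)
assert gap == [8, 9]
assert mem[10:12] == ["Instr9", "Instr10"]
# Cells 8 and 9 still hold stale copies and may be overwritten only now.
mem[8], mem[9] = "Instr7 (corrected)", "Instr8 (corrected)"
```

Because nothing is held outside the memory, an interruption at any point in the loop leaves every word recoverable from the memory itself.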
If the set of data words contains a reference to a data word which has been moved into the free area, i.e., a jump instruction to this data word in the case of program instructions, for example, this reference is to be ascertained and adapted to the new address of the data word.
If data words before or after the data word recognized as faulty are shifted, references to shifted data words in the non-shifted data words and relative references to non-shifted data words in the shifted data words are also to be adapted to the shift to ensure further correct execution of the program instructions.
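How references might be adapted to a shift can be sketched as follows. The instruction format is hypothetical: `("JMP", target)` stands for a jump to an absolute address and `("JREL", offset)` for a jump relative to the instruction's own address. It is assumed that all cells from address `start` onward have been shifted by `k` cells.

```python
def adapt(instr, old_addr, start, k):
    """Return `instr`, located at `old_addr` before the shift, with its
    reference adjusted for a shift of all cells >= start by k cells."""
    def moved(addr):
        return addr + k if addr >= start else addr

    op, arg = instr
    if op == "JMP":                    # absolute reference
        return (op, moved(arg))
    if op == "JREL":                   # relative reference
        old_target = old_addr + arg
        return (op, moved(old_target) - moved(old_addr))
    return instr                       # not a reference, nothing to adapt

# Non-shifted instruction referring to a shifted data word:
assert adapt(("JMP", 9), 2, 8, 2) == ("JMP", 11)
# Shifted instruction referring relatively to a non-shifted data word:
assert adapt(("JREL", -5), 9, 8, 2) == ("JREL", -7)
# Source and target shifted together: the relative offset is unchanged.
assert adapt(("JREL", 1), 8, 8, 2) == ("JREL", 1)
```

A relative reference only changes when the referring instruction and its target lie on different sides of the shift boundary.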
Because of the increased probability of the occurrence of errors in close proximity to one another, it is always expedient to check whether the data word recognized as faulty is part of a block having multiple faulty data words and, if so, to correct the entire block and write it to the free area.
Further features and advantages of the present invention result from the following description of exemplary embodiments with reference to the attached figures of the drawings.
A motor vehicle control unit is illustrated in
Memory monitoring circuit 103 is connected to an interrupt input 107 of processor 101 to trigger an interrupt of processor 101 if an error is recognized in a data word of flash memory 102. This high-priority interrupt interrupts the application program; processor 101 reads out the redundant bits for the data word recognized as faulty, decodes them to correct the faulty data word output by memory 102, and enters the address at which the faulty data word was read into a table. The application program is subsequently continued on the basis of the corrected data word.
Program instructions which are to be executed in the case of an interrupt of processor 101 triggered by monitoring circuit 103 may be stored in flash memory 102 like the application program. Because in this case the interrupt triggered by monitoring circuit 103 is no longer executable if the error or a further error is located in the program instructions of this interrupt, a further read-only memory 108 may be provided for the program instructions of the interrupt, which, in contrast to flash memory 102, does not have to be overwritable by processor 101 and in which the probability that a stored bit is faulty is less than in flash memory 102.
According to a first example embodiment of the method according to the present invention, processor 101 reads the program instructions in flash memory 102, if no jump instructions are contained, in the sequence of rising addresses. If monitoring circuit 103 does not detect any errors in the read program instructions, they are executed by processor 101 as read. If monitoring circuit 103 recognizes a program instruction as faulty, i.e., for the first time with instruction Instr7 in the case shown in
During the execution of the high-priority interrupt, a second interrupt is triggered, whose priority is lower than that of the first interrupt and also than that of the specific time-critical parts of the application program and which causes processor 101 to perform a correction of the content of flash memory 102. This correction does not have to occur immediately after detection of the error in the flash memory, because the system still remains capable of running in that it corrects the errors in real time as described above. In relation to the concrete application example of an engine control unit, this means that a correction of the content of flash memory 102 does not have to be performed immediately after recognition of the error, but rather may be delayed until an interrupt of the application program required for error correction may be performed harmlessly, e.g., when the vehicle is at a standstill, in the afterrunning of an engine controller, or in an idle task.
After processor 101 has executed corrected instruction Instr7, in the present example, it addresses instruction Instr8, which is also assumed to be faulty. The sequence described above is repeated: the error is corrected during a brief interruption of the application program on processor 101, the corrected instruction is executed, and the second interrupt is triggered, using which the faulty instruction is later to be corrected.
If a lower-priority part of the application program is executed at a later time, i.e., when the application program may be interrupted long enough to execute the second interrupt and correct the error established in flash memory 102, a list of faulty memory cells exists due to the high-priority interrupts triggered upon each occurring error. In the exemplary case considered here, this list includes memory cells 6 and 7 having instructions Instr7 and Instr8.
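The interplay of the two interrupts may be sketched as follows. The class `FaultTable`, its method names and the `repair` callback are hypothetical. The grouping of adjacent addresses into blocks is an illustrative addition, motivated by the tendency of errors to occur close to one another.

```python
def runs(addresses):
    """Group addresses into (first, last) runs of adjacent memory cells."""
    out = []
    for a in sorted(set(addresses)):
        if out and a == out[-1][1] + 1:
            out[-1] = (out[-1][0], a)   # extend the current run
        else:
            out.append((a, a))          # start a new run
    return out

class FaultTable:
    def __init__(self):
        self.faulty_cells = []

    def on_error(self, addr, corrected_word):
        """First, high-priority interrupt: only note the address and hand the
        corrected word back so that the application can continue."""
        if addr not in self.faulty_cells:
            self.faulty_cells.append(addr)
        return corrected_word

    def on_idle(self, repair):
        """Second, low-priority interrupt: repair all noted cells block by block."""
        blocks = runs(self.faulty_cells)
        for first, last in blocks:
            repair(first, last)
        self.faulty_cells = []
        return blocks

table = FaultTable()
assert table.on_error(6, "Instr7") == "Instr7"
assert table.on_error(7, "Instr8") == "Instr8"
assert table.faulty_cells == [6, 7]

repaired = []
table.on_idle(lambda first, last: repaired.append((first, last)))
assert repaired == [(6, 7)]     # cells 6 and 7 are repaired as one block
```

Keeping the first handler this short is what allows the actual repair to be postponed to a time at which the application tolerates a longer interruption.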
According to a first example embodiment of the method according to the present invention, when executing the second interrupt, processor 101 writes instruction Instr6, which immediately precedes the instructions of faulty memory cells 6, 7, at the first free memory cell of memory 102, i.e., memory cell 11 in the present case, writes corrected instructions Instr7 and Instr8 to following memory cells 12, 13, and writes a jump instruction to cell 8, which follows the faulty cell, to memory cell 14. Instruction Instr6 in cell 5 is overwritten by a jump instruction to cell 11.
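The relocation just described may be sketched with the cell numbers of this example. The list model of memory 102, the `("JMP", target)` tuple standing for a jump instruction, and the helper `trace`, which merely follows the read sequence, are assumptions made for illustration.

```python
def relocate(mem, faulty, corrected, free):
    """Move the predecessor and the corrected words to the free area and
    link both areas with jump instructions."""
    pred = faulty[0] - 1
    mem[free] = mem[pred]                          # Instr6 -> cell 11
    for i, word in enumerate(corrected, start=1):
        mem[free + i] = word                       # Instr7, Instr8 -> cells 12, 13
    mem[free + len(corrected) + 1] = ("JMP", faulty[-1] + 1)  # cell 14 -> cell 8
    mem[pred] = ("JMP", free)                      # last step: cell 5 -> cell 11

def trace(mem, last):
    """Follow the read sequence from cell 0 until cell `last` has been read."""
    out, pc = [], 0
    while True:
        instr = mem[pc]
        if isinstance(instr, tuple):               # jump instruction
            pc = instr[1]
            continue
        out.append(instr)
        if pc == last:
            return out
        pc += 1

mem = [f"Instr{i}" for i in range(1, 12)] + [None] * 5   # cells 0..10 used
mem[6], mem[7] = "Instr7 (faulty)", "Instr8 (faulty)"
relocate(mem, [6, 7], ["Instr7", "Instr8"], free=11)

assert mem[5] == ("JMP", 11)
assert mem[11:15] == ["Instr6", "Instr7", "Instr8", ("JMP", 8)]
assert trace(mem, last=10) == [f"Instr{i}" for i in range(1, 12)]
```

Writing the jump into cell 5 last means that the original sequence stays usable until the copy in the free area is complete.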
Defective memory cells 6, 7 no longer need to be accessed. Because the content of these memory cells has been corrected before the transfer into cells 12, 13, an error occurring in these new cells may also be corrected in the same way as described above, if sufficient free memory space is available for this purpose.
A second example embodiment of the method is explained on the basis of
The example method described on the basis of
In practice, an application program has a large number of jump instructions. To ensure that the jump instructions remain correctly executable, it is necessary to identify them among the instructions of the application program and correct them if necessary. In the case of the embodiment of the method explained with reference to
In the case of the embodiment explained with reference to
Because the example method according to the present invention does not require a detected error in flash memory 102 to be corrected immediately after detection, but rather allows the correction to be delayed until a suitable time, the method is well suited to real-time applications which must fulfill specific tasks within predefined time limits. A delay which results from decoding the content of a faulty memory cell may nonetheless interfere with such an application. To minimize the probability that such a correction will be necessary, it may be expedient to read the program instructions stored in flash memory 102 successively in a starting phase of the application, in which no strict real-time requirements have yet to be fulfilled, to detect possible memory errors. If no memory error is detected, the application may subsequently go into operation normally; however, if a memory error is present, it may be corrected before the real-time requirements become stringent. In regard to the exemplary embodiment of an engine control unit, this means, for example, that a test for faulty memory cells is always performed when a user expresses the wish to start the engine, for example, by turning an ignition key, and the engine control unit actually starts the engine only after any faulty memory cells have been corrected.
Performing the test in the afterrunning stage of the control unit is also expedient, i.e., in a limited time span after the engine is turned off, during which the control unit remains active.
A second example embodiment of a data processing system, which offers further increased operational reliability in relation to the embodiment from
The second interrupt may be handled in the example embodiments described above by the same processor 101 or 111 which also handled the first interrupt. However, it is also possible to have it handled by an external processor, which communicates with the data processing system of
A further possible version is to design monitoring circuit 103 in such a way that it not only recognizes an error in a data word output by memory 102, but also decodes and corrects the data word, without using processor 101 assigned to memory 102. The temporary interruption of processor 101, which is necessary to prevent it from accepting a data word output incorrectly on bus 106, may be achieved here in that monitoring circuit 103 interrupts a clock signal supplied to processor 101 for as long as it needs to correct the faulty instruction output by memory 102 and to output it correctly on bus 106. Breakdown of the decoding as a result of a faulty stored interrupt instruction in memory 102 is also precluded here. This example embodiment has the advantage of being able to correct errors not only in an instruction memory, but also in a parameter memory.
The present invention is also applicable to other types of data memories. Thus, for example, a hard drive may be used as a memory, on which useful data are stored in blocks together with redundant information assigned to each block and, in the case that an error is recognized on the basis of the redundant information, the affected block is corrected, stored again at another point of the hard drive surface, and a block which precedes the faulty block in the read sequence of a file to which the blocks belong is provided with a reference to the new memory location of the corrected block. The corrected block may in turn receive a reference to a following block in the read sequence, so that the blocks may still be read according to the sequence, even if they are not recorded in a contiguous location on the disk surface.
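The block chaining described for a hard drive may be sketched as follows. The dictionary model of the disk, the `"next"` field and the function names are assumptions. A block without a `"next"` reference is taken to be followed by the block at the next address.

```python
def relocate_block(disk, pred, faulty, corrected_data, free):
    """Store the corrected block at `free` and chain it into the read sequence."""
    follower = disk[faulty].get("next", faulty + 1)
    disk[free] = {"data": corrected_data, "next": follower}  # reference to following block
    disk[pred]["next"] = free          # predecessor now refers to the new location

def read_file(disk, first, count):
    """Read `count` blocks in sequence, following references where present."""
    out, pos = [], first
    for _ in range(count):
        block = disk[pos]
        out.append(block["data"])
        pos = block.get("next", pos + 1)
    return out

disk = {i: {"data": d} for i, d in enumerate("ABCD")}
disk[2]["data"] = "C?"                 # block 2 is recognized as faulty
relocate_block(disk, pred=1, faulty=2, corrected_data="C", free=9)

assert read_file(disk, first=0, count=4) == ["A", "B", "C", "D"]
```

The faulty block at address 2 is simply skipped in the read sequence and no longer needs to be accessed.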
Number | Date | Country | Kind |
---|---|---|---|
102005040916.4 | Aug 2005 | DE | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2006/064768 | 7/28/2006 | WO | 00 | 5/13/2009 |