1. Field of the Invention
The present invention relates to a method for data transmission among electronic control units, referred to hereinafter as ECUs, and/or measuring devices in the realm of motor vehicles, and the present invention further relates to an ECU interface module, an ECU and a measuring device.
2. Description of the Related Art
The number of electronic control units, and in particular of engine control units (ECUs), in motor vehicles and the degree of their internetworking increase continuously. For example, new powertrain technologies lead to faster control loops, and Ethernet is starting to complement or even replace traditional in-car interconnect technologies such as CAN, FlexRay, LIN and MOST.
These developments result in rapidly increasing data throughput and more challenging real-time requirements for ECUs and embedded ECU interface modules.
The next generation of ECU interface devices will therefore step from Fast Ethernet to Gigabit Ethernet within the distributed measurement, calibration and diagnostics (MCD) systems. Embedded rapid prototyping (RP) systems will utilize PCI Express technology to meet the demanding latency and jitter requirements.
Traditional software based quality of service (QoS) and protocol processing cannot handle the large variety of protocols across multiple layers at the required performance.
Traditional automotive protocol standards and the corresponding reference diagram known from the prior art are based on a software driven client-server pattern. A client running on a powerful standard personal computer hosting MCD application software or prototyping software acts as an intelligent protocol master, and a server in the embedded ECU interface module acts as a command executing protocol slave.
In the known automotive protocol standards only the link layer in the server is implemented using standard controller hardware (CAN controller, FlexRay controller, Ethernet media access controller or similar controllers). Higher protocol layers like the network layer, the transport layer and the automotive protocol service layers are all implemented in software running on top of a real-time operating system with some limited standard hardware support like Direct Memory Access (DMA).
Implementing multiple protocol layer stacks on top of different link layers within a very restricted set of central processing units (CPU) requires the serialization of the processing associated with asynchronously incoming data events (frames) using the services and software threading methods provided by the underlying operating system.
However, the serialization and the context switching overhead associated with software threading restrict the maximum event rate. This event rate restriction appears to be the main bottleneck for all software based real-time systems. It results in increased IO latency and jitter for prototyping applications, increased round trip times for transactions and a restricted measurement throughput because of the resulting network frame rate limitations. Performance optimization in software based real-time systems is difficult if not impossible to achieve, because throttling of event rates to reduce the context switching overhead conflicts with the low latency requirements of prototyping and of the control plane. Using multicore CPU technology to increase software processing power conflicts with the ECU interface module's power consumption requirements and cannot efficiently accelerate a single high bitrate data conversation (e.g. a single Transmission Control Protocol (TCP) connection), since the conversation's packet ordering forbids parallel processing of its packets.
Therefore, it is an object of the present invention to provide a method for data transmission of the above-identified kind enabling accelerated data transmission, in particular fast (low) event cycle times, low jitter and high data throughput.
This object is achieved by the method for data transmission of the above-identified kind, wherein the architecture of the data transmission is split up into a control plane implemented in software operating on configuration, calibration and/or diagnostics (CD) data and a data plane implemented in hardware transporting measurement (M) data and/or prototyping (RP) data.
The present invention suggests a component which implements a new paradigm for hardware based multilayer protocol processing and hardware based QoS in embedded ECU interface modules. This new architectural approach provides at least a tenfold performance increase compared to the current technology.
The invention suggests a new technology, which uses a new architectural view. According to the multi-service properties of automotive protocols, the architecture is split into a control plane implemented in software operating on configuration, calibration and diagnostics (CD) data, preferably transported using transactions (T), and a data plane implemented in hardware transporting measurement (M) and prototyping (RP) data, preferably transported using stateless data streams (S).
The implementation of the data plane in hardware has several major advantages over the prior art.
An especially effective optimization can be achieved, for example, by an ASIC development instead of an FPGA development.
According to the present invention the following terms have the subsequently defined meaning:
Receive = reception of data from an external line,
Transmit = transmission of data on an external line,
Forward = forwarding of data from a device to the data processor core and on to another device.
Hence, the sequence is always: Receive => Forward => Transmit.
According to a preferred embodiment, the data to be forwarded or the data stream, respectively, is segmented into multiple data segments on the receiving device's side before data forwarding. Furthermore, it is possible that the data segments are interleaved before data forwarding. Finally, it is advantageous if the data segments of various receiving devices are multiplexed before data forwarding. The multiplexing can be effected alternatively or additionally to the interleaving, before or after the interleaving process. The data segments are then forwarded sequentially. If more than one device wants to forward data, data forwarding is switched, according to the result of the interleaving and/or multiplexing process, between data segments from a first device and data segments from another device. After forwarding of the data segments the transmitting device collects the data segments into data units of the outgoing device's interface or line. Hence, seen microscopically (<1 μs), the data or the data segments, respectively, are forwarded sequentially. Seen macroscopically (>10 μs), however, the data forwarding is effected in parallel (or quasi-parallel, respectively), because the switching of the data forwarding between the data segments from the first device and the data segments from the other device is performed very fast. This is only possible because the data plane is implemented in hardware, which allows fast context switching. Context switching comprises storing and restoring the state (context) of a processing unit so that execution can be resumed from the same point at a later time. This enables multiple processes to share a single processing unit. The context switch is an essential feature of a multitasking operating system.
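The segment-interleaved forwarding described above can be illustrated with a minimal, purely software sketch in C. The 128-byte segment size follows the example given further below; the device names, unit sizes and function structure are illustrative assumptions, not the claimed hardware:

```c
#include <stdio.h>

#define SEG_SIZE    128   /* assumed fixed maximum segment size */
#define NUM_DEVICES 2

/* one pending data unit per receiving device (hypothetical model) */
typedef struct {
    const char *name;
    size_t      len;   /* total size of the data unit */
    size_t      off;   /* bytes already forwarded */
} device_t;

int main(void)
{
    device_t dev[NUM_DEVICES] = {
        { "ECU-A", 300, 0 },   /* 300-byte unit -> 3 segments */
        { "ECU-B", 200, 0 },   /* 200-byte unit -> 2 segments */
    };
    int pending = NUM_DEVICES;

    /* switch the forwarding context after every segment (round robin);
     * macroscopically both devices appear to be served in parallel */
    while (pending > 0) {
        for (int i = 0; i < NUM_DEVICES; i++) {
            if (dev[i].off >= dev[i].len)
                continue;                          /* device finished */
            size_t n = dev[i].len - dev[i].off;
            if (n > SEG_SIZE)
                n = SEG_SIZE;
            int first = (dev[i].off == 0);
            dev[i].off += n;
            int last = (dev[i].off == dev[i].len);
            if (last)
                pending--;
            printf("forward %s: %3zu bytes%s%s\n", dev[i].name, n,
                   first ? " [first]" : "", last ? " [last]" : "");
        }
    }
    return 0;
}
```

The output interleaves the segments of ECU-A and ECU-B; the transmitting side would then collect the tagged segments back into complete data units.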
In contrast thereto, in the prior art software architecture the switching of the data forwarding between the data segments from the first device and the data segments from the other device is performed much more slowly. This is due to the fact that the switching in the software architecture comprises a plurality of steps such as saving registers, storing the current context, freeing the CPU for a new context, etc. Context switching takes much longer when realized in software than when realized in hardware; one reason for this is that software based context switching consumes considerable time for each software interrupt.
The functioning of the hardware based data plane implementation according to the invention and the ECU interface module, respectively, can be compared to that of a co-processor for accelerating data transmission among ECUs and/or measuring devices. A conventional co-processor supports a CPU in processing data; however, all data which is to be processed by a conventional co-processor first passes through the CPU before it is processed in the co-processor. This is different in the hardware based data plane implementation according to the invention and the ECU interface module, respectively: all data to be transmitted passes through the hardware based data plane implementation and the ECU interface module, respectively. Thereby the ECUs and their CPUs, respectively, are significantly relieved from handling and processing data for the sake of data forwarding.
Preferably, the data plane switches the commands and responses of transactions and/or the streams to and from the ECUs and/or the measuring devices. However, it is noted that, in contrast to, for example, PCI Express switches, the switching of the data plane is not transaction aware: in PCI Express switches the switching unit remembers the commands' path and uses this knowledge for the responses along the reverse direction, whereas according to the present invention both paths must be configured beforehand.
The present invention is preferably used in the realm of motor vehicles. It can be realized as an ECU interface module located between a first ECU and a second ECU or a measuring device connected to the first ECU. Furthermore, it could be realized in a measuring device which can be connected to one or more ECUs of a motor vehicle in order to monitor and/or control their operation and functioning. Furthermore, the invention could be realized in a gateway control unit of an ECU. In other words, the invention could also be implemented in the ECU itself.
Further features and advantages of the present invention are explained in detail hereinafter with reference to a preferred embodiment of the invention and the figures. It is appreciated that the present invention does not necessarily have to comprise all of the features described below with reference to the preferred embodiment but may just as well have only some of the mentioned features alone or in any combination with selected other features.
Automotive protocol standards and the corresponding reference diagrams known from the prior art are based on a software driven client-server pattern, an example of which is shown in
Only the link layer in the server is implemented using standard controller hardware (CAN controller, FlexRay controller, Ethernet media access controller or similar controllers). Higher protocol layers like the network layer, the transport layer and the automotive protocol service layers are all implemented in software running on top of a real-time operating system with some standard hardware support like Direct Memory Access (DMA).
Implementing multiple protocol layer stacks on top of different link layers within a very restricted set of central processing units (CPU) requires the serialization of the processing associated with asynchronously incoming data events (frames) using the services and software threading methods provided by the underlying operating system.
However, the serialization and the context switching overhead associated with software threading restrict the maximum event rate. This event rate restriction appears to be the main bottleneck for all software based real-time systems. It results in increased IO latency and jitter for prototyping applications, increased round trip times for transactions and a restricted measurement throughput because of the resulting network frame rate limitations. Performance optimization is difficult if not impossible to achieve, because throttling of event rates to reduce the context switching overhead conflicts with the low latency requirements of prototyping and of the control plane. Using multicore CPU technology to increase software processing power conflicts with the ECU interface module's power consumption requirements and cannot efficiently accelerate a single high bitrate data conversation (e.g. a single TCP connection), since the conversation's packet ordering forbids parallel processing of its packets.
In contrast thereto, the present invention suggests a new technology which uses a different architectural view. According to the multi-service properties of automotive protocols, the architecture is split into a control plane implemented in software operating on configuration, calibration and diagnostics (CD) data transported using transactions (T) and a data plane implemented in hardware transporting measurement (M) and prototyping (RP) data as stateless data streams (S). The respective control and data plane implementation view is shown in
The transactions between master (client) and slave (server) can also be split into stateless data streams: a downlink stream carrying commands and an uplink stream carrying responses and events. With this transaction split in mind, the control plane appears to the data plane as an ordinary input device for uplink traffic and an ordinary output device for downlink traffic. The data plane switches these transaction streams to and from the devices (ECU sink, ECU source) responsible for the desired physical port. The transaction state itself, which associates downlink and uplink streams, is located in the control plane.
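The transaction split can be pictured with a small, hypothetical data model in C. All type and field names below are assumptions chosen for illustration; the point is that the data plane only ever sees stateless stream segments, while the transaction association lives in the control plane:

```c
#include <stdio.h>
#include <stddef.h>

typedef enum { DIR_DOWNLINK, DIR_UPLINK } direction_t;

/* what the data plane switches: stateless stream segments */
typedef struct {
    direction_t dir;      /* downlink = commands, uplink = responses */
    unsigned    port;     /* physical port of the ECU sink/source */
    const void *payload;
    size_t      len;
} stream_segment_t;

/* what only the control plane keeps: the transaction state */
typedef struct {
    unsigned txn_id;      /* associates a command with its response */
    unsigned downlink_port;
    unsigned uplink_port;
} transaction_state_t;

int main(void)
{
    transaction_state_t txn = { .txn_id = 1, .downlink_port = 3, .uplink_port = 7 };
    stream_segment_t cmd = { DIR_DOWNLINK, txn.downlink_port, "CMD", 3 };
    stream_segment_t rsp = { DIR_UPLINK,   txn.uplink_port,   "RES", 3 };
    printf("txn %u: command stream via port %u, response stream via port %u\n",
           txn.txn_id, cmd.port, rsp.port);
    return 0;
}
```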
As shown in
Another satellite function shown in
A major advantage of a common data plane is the fact that all incoming traffic can be observed, which is essential for controlling quality of service (QoS) in respect of latency, jitter, throughput, etc.
The separation of control plane and data plane suggested by the present invention enables optimized implementations. Further optimizations can be achieved by an ASIC development instead of an FPGA development.
The data plane receives data carrying events from input devices, serializes these events very quickly without any context switching overhead, performs a set of generalized processing steps and distributes the data carrying events to the output devices.
Quality of service (QoS) aware queuing and scheduling of data events is only performed within the data plane which allows a much better control of traffic priority for transaction round trip times, latency, jitter and fairness for data throughput.
Devices operate and deliver events concurrently while the data plane core or data processor processes events sequentially following a strict pipelining approach with built-in cooperative hardware multithreading as shown in
Devices operate concurrently, encapsulate physical and link layer specific behavior or stateful protocol functions and provide serialization support to the data plane core pipeline.
Data Plane Core Pipeline = Data Processor
The data processor performs the actual event processing based on a generic information model: classification, re-assembly, queuing, hierarchical scheduling, segmentation and header modification.
The pipeline consists of a set of chained, well-defined processing stages (IF, CL, IM, SW, MC, EQ, LL, DQ, SR, EM, IF).
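A minimal sketch of the chained-stage idea, using the stage mnemonics from the list above; the handler signature, the descriptor type and the stub bodies are assumptions, and plain sequential C cannot show that in hardware the stages run concurrently on different events:

```c
#include <stdio.h>

/* illustrative descriptor passed from stage to stage */
typedef struct { int queue; int flow_id; } descriptor_t;

typedef void (*stage_fn)(descriptor_t *d);

static void stage_stub(descriptor_t *d) { (void)d; /* stage work here */ }

/* the chained, well-defined processing stages named in the text */
static const struct { const char *name; stage_fn fn; } pipeline[] = {
    { "IF(in)", stage_stub }, { "CL", stage_stub }, { "IM", stage_stub },
    { "SW", stage_stub },     { "MC", stage_stub }, { "EQ", stage_stub },
    { "LL", stage_stub },     { "DQ", stage_stub }, { "SR", stage_stub },
    { "EM", stage_stub },     { "IF(out)", stage_stub },
};

int main(void)
{
    descriptor_t d = { 0, 0 };
    /* each event traverses every stage in order; in hardware the
     * stages operate concurrently on different events (pipelining) */
    for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++) {
        pipeline[i].fn(&d);
        printf("%s -> ", pipeline[i].name);
    }
    printf("done\n");
    return 0;
}
```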
The data processor operates on descriptors (pointers) of data rather than the data itself, which saves logic gates and power.
Descriptor pools are repositories of indices to data buffers located in local or remote storage.
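The descriptor-and-pool mechanism might be modeled as follows; pool depth, buffer size and all names are illustrative assumptions. The point is that the pipeline hands around small indices while the payload stays in the data storage:

```c
#include <stdio.h>

#define POOL_SIZE 8
#define BUF_SIZE  128   /* assumed segment/buffer size */

/* illustrative descriptor: an index into a buffer array rather than
 * a pointer to the data itself */
typedef struct { unsigned buf_index; unsigned len; } descriptor_t;

static unsigned char storage[POOL_SIZE][BUF_SIZE];  /* local data storage */
static unsigned free_stack[POOL_SIZE];              /* descriptor pool */
static int free_top = -1;

static void pool_init(void)
{
    for (unsigned i = 0; i < POOL_SIZE; i++)
        free_stack[++free_top] = i;      /* software-prepared indices */
}

static int pool_alloc(descriptor_t *d)
{
    if (free_top < 0)
        return -1;                       /* pool exhausted */
    d->buf_index = free_stack[free_top--];
    d->len = 0;
    return 0;
}

static void pool_free(const descriptor_t *d)
{
    free_stack[++free_top] = d->buf_index;
}

int main(void)
{
    descriptor_t d;
    pool_init();
    if (pool_alloc(&d) == 0) {
        storage[d.buf_index][0] = 0x42;  /* payload is written once */
        d.len = 1;
        /* the pipeline passes only 'd' between stages, not the payload */
        printf("descriptor -> buffer %u, %u bytes\n", d.buf_index, d.len);
        pool_free(&d);
    }
    return 0;
}
```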
There are two data storage types: local data storage and remote data storage.
Devices deliver data in the form of data segments of a fixed maximum size, e.g. 128 bytes. The size is determined by the bandwidth to and from the data storages. Multiple segments form a so-called data unit which corresponds to a protocol layer specific frame format, e.g. a data transfer object for the Universal Measurement and Calibration Protocol (XCP) layer, an Ethernet frame for the IEEE 802.3 link layer, a CAN frame from the CAN bus or a FlexRay frame.
Devices have knowledge about the segmentation and can mark the first and the last data segment of a data unit.
Data segmentation is a pre-condition for non-blocking data event processing in the data processor, where events are serialized and where the processing is distributed among multiple pipeline stages.
The data processor implementation is based on a generic information model, like the one shown in
Devices are concurrently working producers and consumers of data segments. Each data segment is associated with a device index, a channel index and a first/last tag indicating whether the data segment is the first, an intermediate or the last data segment of a data unit.
A receive channel is an independent logical input data channel within the scope of a device. Devices may have multiple receive channels. Receive channels are used to reassemble data units from timely interleaved data segments.
A flow is the description of a data conversation with respect to a specific protocol layer. It is used to control data processing operations in the data processor, e.g. insertion of a protocol specific header into the outgoing payload data stream. Flows are attached to receive channels.
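The relationships between devices, receive channels, flows and segments can be summarized in a hedged C sketch of the information model; every field name below is an assumption made for illustration:

```c
#include <stdio.h>
#include <stddef.h>

/* illustrative information model; all names are assumptions */
typedef enum { SEG_FIRST, SEG_MIDDLE, SEG_LAST } seg_tag_t;

typedef struct {
    unsigned  device;       /* index of the producing device */
    unsigned  rx_channel;   /* logical input channel within that device */
    seg_tag_t tag;          /* first / intermediate / last segment */
    size_t    len;
} data_segment_t;

typedef struct {
    unsigned flow_id;       /* data conversation for one protocol layer */
    unsigned header_rule;   /* e.g. which protocol header to insert */
} flow_t;

typedef struct {            /* flows are attached to receive channels */
    flow_t flow;
    size_t reassembled;     /* reassembly state for interleaved segments */
} receive_channel_t;

int main(void)
{
    receive_channel_t ch = { { 7, 1 }, 0 };
    data_segment_t    s  = { 0, 2, SEG_FIRST, 128 };

    ch.reassembled += s.len;   /* the queue reassembles the data unit */
    printf("device %u, channel %u, flow %u: %zu bytes reassembled\n",
           s.device, s.rx_channel, ch.flow.flow_id, ch.reassembled);
    return 0;
}
```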
Queues store descriptors of data. There are three basic types of queues: tail drop queues, triple buffers and priority lists. Tail drop queues are used to control quality of service (QoS), e.g. traffic priority and fairness. Triple buffers are used in prototyping real-time IO (RTIO) functions to provide the signal processing with a consistent sample of newest input data. Priority lists are used on bus systems with a strict priority based arbitration like the CAN bus. A secondary task of queues is to reassemble data units from incoming data segments.
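Of the three queue types, the triple buffer is the least conventional, so a minimal single-threaded model may help: the writer always finds a free buffer, the reader always obtains the newest consistent sample, and older samples are overwritten rather than queued. A real implementation would need atomic index swaps; the index scheme below is an assumption:

```c
#include <stdio.h>

typedef struct {
    int buf[3];
    int write_idx;    /* buffer the producer fills next */
    int ready_idx;    /* newest completed sample */
    int read_idx;     /* buffer the consumer currently reads */
} triple_buffer_t;

static void tb_publish(triple_buffer_t *tb, int sample)
{
    tb->buf[tb->write_idx] = sample;
    /* swap write and ready: the just-written buffer becomes 'newest' */
    int t = tb->ready_idx;
    tb->ready_idx = tb->write_idx;
    tb->write_idx = t;
}

static int tb_read_newest(triple_buffer_t *tb)
{
    /* swap read and ready: the reader takes the newest sample */
    int t = tb->read_idx;
    tb->read_idx = tb->ready_idx;
    tb->ready_idx = t;
    return tb->buf[tb->read_idx];
}

int main(void)
{
    triple_buffer_t tb = { {0, 0, 0}, 0, 1, 2 };
    tb_publish(&tb, 100);   /* overwritten before being read */
    tb_publish(&tb, 200);
    printf("newest sample: %d\n", tb_read_newest(&tb));  /* prints 200 */
    return 0;
}
```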
The transmit priority channel represents an independent data channel in egress direction. Each transmit priority channel denotes a strict priority scheduling level within the scope of its associated output device. The transmit priority channel itself is associated with one or more queues, which are scheduled in a weighted round robin manner.
Transmit priority channels may schedule data segments (changing the scheduled queue after each segment) or data units (changing the scheduled queue once the running data unit has been completely scheduled). Transmit priority channels may change the segmentation granularity by scheduling a data segment multiple times but with a lower byte count and an additional offset into a data buffer.
The channel interworking function is an optional entity and is used to provide aggregation information between the ingress and egress paths for a traffic flow distributed among multiple sequential data segments.
The data processor implements a 3-stage hierarchical scheduling: 1) weighted round robin scheduling of devices, 2) strict priority scheduling of transmit priority channels within a device and 3) weighted round robin scheduling of queues within the scope of a transmit priority channel.
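The three scheduling stages can be condensed into a short model; the device weights, table sizes and backlog numbers are arbitrary assumptions, and per-queue weights are simplified to one:

```c
#include <stdio.h>

#define DEVICES 2
#define PRIOS   2   /* transmit priority channels per device */
#define QUEUES  2   /* queues per priority channel */

/* pending segments per [device][priority][queue] (arbitrary numbers) */
static int backlog[DEVICES][PRIOS][QUEUES] = {
    { { 3, 1 }, { 2, 0 } },   /* device 0 */
    { { 0, 2 }, { 1, 1 } },   /* device 1 */
};
static const int dev_weight[DEVICES] = { 2, 1 };  /* stage 1: WRR weights */

static int pick_queue(int d)   /* stages 2 and 3 */
{
    static int rr[DEVICES][PRIOS];                /* round robin pointers */
    for (int p = 0; p < PRIOS; p++) {             /* stage 2: strict prio */
        for (int k = 0; k < QUEUES; k++) {        /* stage 3: queue WRR */
            int q = (rr[d][p] + k) % QUEUES;
            if (backlog[d][p][q] > 0) {
                rr[d][p] = (q + 1) % QUEUES;
                backlog[d][p][q]--;
                printf("schedule dev %d prio %d queue %d\n", d, p, q);
                return 1;
            }
        }
    }
    return 0;
}

int main(void)
{
    int served = 1;
    while (served) {                              /* stage 1: device WRR */
        served = 0;
        for (int d = 0; d < DEVICES; d++)
            for (int w = 0; w < dev_weight[d]; w++)
                served |= pick_queue(d);
    }
    return 0;
}
```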
The data core pipeline consists of generic processing stages as described below. The data core pipeline architecture is shown in
The input interface connects the devices to the data plane core and serializes concurrently incoming data events using a weighted round robin scheduling among the connected devices. Data associated with events can be passed “by value”, which is the default, or “by reference”. In the latter case, the data processor assumes that the data payload associated with the event has already been stored and receives only the buffer index to the data payload. The example described below illustrates the handover by reference in the context of an XCP on TCP use case. This is also the subject of another patent application (application number EP 12 188 660) filed by the same applicant, which is incorporated herein by reference.
The classifier uses the descriptor along with the first payload data to classify the data segment: it builds a key from this information and produces a result vector which contains the following information: pool index, queue index, drop tag, flow id, interworking function index and others. This information is entered into the descriptor.
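As an illustration of the key/result mechanism, the following sketch uses a linear rule table; a hardware classifier would more plausibly use a TCAM or hash lookup, and the key fields, rule values and packet identifiers shown are assumptions:

```c
#include <stdio.h>
#include <stdint.h>

/* key built from descriptor fields and first payload data (assumed) */
typedef struct {
    uint8_t device;
    uint8_t channel;
    uint8_t first_byte;   /* first payload byte, e.g. a packet id */
} class_key_t;

/* result vector entered into the descriptor (assumed fields) */
typedef struct {
    uint8_t pool_index;
    uint8_t queue_index;
    uint8_t drop;
    uint8_t flow_id;
} class_result_t;

static const struct { class_key_t key; class_result_t res; } rules[] = {
    { { 0, 1, 0xFA }, { 0, 4, 0, 7 } },   /* hypothetical rule: queue 4 */
    { { 0, 1, 0xFF }, { 0, 2, 0, 7 } },   /* hypothetical rule: queue 2 */
};

static class_result_t classify(class_key_t k)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (rules[i].key.device == k.device &&
            rules[i].key.channel == k.channel &&
            rules[i].key.first_byte == k.first_byte)
            return rules[i].res;
    return (class_result_t){ 0, 0, 1, 0 };    /* no match: drop tag set */
}

int main(void)
{
    class_result_t r = classify((class_key_t){ 0, 1, 0xFA });
    printf("queue %u, flow %u, drop %u\n", r.queue_index, r.flow_id, r.drop);
    return 0;
}
```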
The ingress modifier provides a means to modify (extend or reduce) the incoming data by either changing the data itself or by manipulating the descriptor.
The segment writer allocates a descriptor from the pool selected by the classifier and writes the payload data to its destination using the descriptor's buffer index.
The multicast stage provides a means for distributing data towards multiple destinations. It maintains a multicast table, which is addressed by a multicast group identifier contained in the received descriptor and delivers a list of queues. The received descriptor is cloned within the multicast stage to be enqueued into multiple output queues. The descriptor receives a reference count in order to manage the buffer pool.
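A compact model of the clone-with-reference-count idea; the multicast table layout and all names are assumptions:

```c
#include <stdio.h>

#define MAX_QUEUES 4

typedef struct {
    unsigned buf_index;
    int     *refcount;    /* shared between all clones of the descriptor */
} descriptor_t;

/* multicast table: group identifier -> list of queues (-1 terminated) */
static int mc_table[][MAX_QUEUES] = {
    { 1, 3, -1, -1 },     /* group 0 fans out to queues 1 and 3 */
};

static void enqueue(int queue, descriptor_t d)
{
    printf("enqueue buffer %u into queue %d (refs %d)\n",
           d.buf_index, queue, *d.refcount);
}

int main(void)
{
    int refs = 0;
    descriptor_t d = { 5, &refs };
    int group = 0;

    for (int i = 0; i < MAX_QUEUES && mc_table[group][i] >= 0; i++) {
        (*d.refcount)++;                 /* one reference per clone */
        enqueue(mc_table[group][i], d);  /* clone of the descriptor */
    }
    /* later, each dequeued clone decrements the count; the buffer
     * index returns to the pool only when the count reaches zero */
    return 0;
}
```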
The enqueue engine performs all operations required to store the descriptor into the selected queue and to make the queue visible for the scheduler.
The linked list control maintains linked lists of descriptors which represent data queues. It receives enqueue commands from the enqueue engine and dequeue commands from the scheduler or dequeue engine.
The dequeue engine is the scheduling execution unit. It determines the next queue to be serviced using the device's output FIFO status and the configured priority information in the form of transmit priority channels per output device. It is also capable of re-segmenting traffic by splitting a running data segment into smaller data segments, or of maintaining data unit consistency by scheduling all data segments of a data unit before servicing another queue for the current transmit priority channel.
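The re-segmentation behavior of the dequeue engine can be sketched as follows; the granularity value and descriptor fields are assumptions:

```c
#include <stdio.h>

/* a queued segment is scheduled several times with a smaller byte
 * count and a growing offset into its data buffer */
typedef struct {
    unsigned buf_index;
    unsigned offset;   /* additional offset into the data buffer */
    unsigned len;      /* remaining bytes of the queued segment */
} descriptor_t;

int main(void)
{
    descriptor_t seg = { 9, 0, 128 };   /* one queued 128-byte segment */
    unsigned granularity = 32;          /* assumed output granularity */

    while (seg.len > 0) {
        unsigned n = seg.len < granularity ? seg.len : granularity;
        printf("schedule buffer %u: offset %u, %u bytes\n",
               seg.buf_index, seg.offset, n);
        seg.offset += n;
        seg.len    -= n;
    }
    return 0;
}
```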
The segment reader uses the buffer index contained in the descriptor to read data from local or remote data storage.
The egress modifier uses the flow id and other tags from the descriptor to modify the payload data stream according to its configuration, e.g. by adding or removing data to or from the data stream or by manipulating the descriptor content.
The output interface distributes data segments to those output devices that have announced their readiness to receive more data (output FIFO not full). Devices are serviced using a weighted round robin algorithm with weights proportional to the desired line rate.
The local pool contains software-prepared indices to buffers in the local data storage. Those buffer indices are the core attribute of descriptors.
The remote pool contains software-prepared indices to buffers in the remote data storage. There may be multiple remote pools for multiple remote data storages.
The interfaces between the pipeline and the devices ETHIP, TCPS, TCPR, CPU-L and TEA are the same regarding their syntax but are protocol-specific regarding their semantics.
In the following, two embodiments of the present invention are described by way of example. The first embodiment shown in
In PATH 1 the data processor multiplexes command transfer objects (CTOs) and data transfer objects (DTOs) and terminates the XCP protocol's transport layer containing the XCP packet counter, which is common to CTOs and DTOs.
The stateful TCP protocol layer is terminated in the TCP sender (TCPS) device, which co-operates with the TCP receiver (TCPR) device; the latter delivers acknowledgements to the TCP sender.
In PATH 2 the data processor multiplexes TCP segments and Ethernet frames from the local CPU's software stack towards the common IP network and Ethernet link layer.
The ETHIP device finally terminates the IP network layer and the Ethernet link layer.
It is noted that the downlink TCP conversation co-exists with the uplink conversation and likewise passes through the data processor twice. Hence, for a full XCP on TCP session there are four co-existing paths inside the data processing pipeline.
The second embodiment shown in
The TEA device delivers DTOs using a dedicated receive channel for the RTIO. Since the signal processing unit (simulation node) is a remote instance (device CPU-R), the data processor allocates descriptors from the remote pool carrying buffer indices into the remote data storage of the simulation node and pushes data early into the remote memory.
The data processor queue operates as a triple buffer which re-assembles signal groups from data transfer objects split into data segments. Once a signal group sample is complete, an interrupt is asserted to the simulation node, which then reads address information from the triple buffer. The corresponding data has already arrived and may even already be available in the simulation node's cache memory.
An exemplary selection of some of the key features and key ideas of the present invention and of some of the advantages associated therewith is given hereinafter:
Foreign priority data: Application No. 12196331.8, filed Dec 2012, EP (regional).