The present disclosure relates generally to a method and apparatus for distributing digital signal processing code to one or more target devices.
Systems which perform digital signal processing (DSP) processes on an input data stream are often required to operate in a real-time context. In such a scenario, a continuous stream of input samples is required to be processed to produce a constant stream of output samples without an increase in processing delay over time.
Existing solutions for writing and distributing DSP code, for example in audio, include preparing DSP code which may be executed on one of various hardware platforms, each of which may have different device characteristics. This requires that all possible platforms are known at build time so that compatible code can be generated. It is often impossible to know whether DSP code will work on a particular future target device; ensuring compatibility with multiple target machines requires complex toolchains to be built, which inherently slows down the development process.
Examples of the present disclosure will now be explained with reference to the accompanying drawings in which:
Throughout the description and the drawings, like reference numerals refer to like parts.
A method of distributing digital signal processing (DSP) code is described herein. The method comprises providing DSP code to an application programming interface (API), translating the DSP code into an intermediate representation (IR) which is a directed graph (di-graph) representation of the DSP code comprising a plurality of nodes and connections. Each node of the di-graph contains a DSP program for performing a respective DSP task. The method further comprises detecting a target device which has advertised its ability to execute the IR, and transmitting the IR to the target device for execution.
A system is described, comprising a user device and a target device, the system being arranged to perform the method.
There are many examples of systems on which a user may wish to execute DSP source code in order to produce an output. In known approaches, the way in which the DSP code is designed and distributed depends on the requirements of the system on which the code is to be executed. The code must therefore be made as robust as possible so that errors do not occur at runtime, and this can lead to poor or unreliable performance. In this way, present methods of distributing DSP code attempt to mitigate the effects of variation in target devices which are often unknown to the producer of the code.
The processing of the DSP code can, however, be modified from the known methods by having an API create an intermediate representation (IR) of the code.
An API as employed in the systems and methods described herein communicates via a bespoke communication protocol. The bespoke communication protocol is used by the API to determine whether and what compatible target devices exist for the execution of the IR code created in the API. To be compatible with the IR code, a target device must comprise means for executing the IR code to produce the DSP output. In practice, a target device may comprise software instructions that allow the input/output and processing resources of the device to be targeted by a client application. The means for executing the IR code may be a JIT compiler. The target device must be arranged to utilise the bespoke API protocol in order to be recognised by the API and communicate with the API.
The API may detect devices by detecting connection of a target device to a host, for example via connection of a target device into a USB connection terminal, or other direct connection, on the host. In this case, a device driver on the host computer device will detect the target device directly, using standard methods. If a target device is a network device, device discovery may be via mDNS protocol.
The API may communicate the IR to a target device using the bespoke communication protocol, and may open one or more communication channels for the streaming of continuous digital sample data and control event data to the target device while the IR code is being executed at the target device.
The DSP source code which is to be processed, distributed and executed may be developed at a user application 110 provided on user device 100. The DSP source code may relate to any signal processing code, or may specifically relate to audio DSP. The user application 110 provides the DSP code to an application programming interface (API) 120 in a first programming language. The first programming language may be a special-purpose language for signal processing applications only. The API 120 comprises a compiler 122 and optionally a just-in-time (JIT) engine 124.
The compiler 122 is arranged to translate the DSP source code in the first language into an intermediate representation (IR) of the source code. The IR may be described as a second language, and may itself be DSP code.
The IR may comprise a directed graph, di-graph, representation of the DSP source code in the first language. The IR di-graph comprises a series of nodes and connections which define one or more program paths which determine how data input to the di-graph is treated. The IR represents a DSP code via a series of data threads or data flows.
Once the compiler 122 has translated the DSP code into the IR di-graph, the API attempts to detect whether a device which is compatible with the first language, or compatible with the IR, is available. That is to say, the API matches the requirements of the IR with the capabilities of the found devices. Requirements compared may include one or more of device inputs, device outputs, MIDI support, available memory and stack space requirements.
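The matching of IR requirements against device characteristics can be sketched as follows. This is a minimal illustration only: the class names, field names and the `is_compatible` and `compatible_devices` helpers are hypothetical and are not defined in the disclosure, which lists the compared characteristics but not a data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Requirements:
    """What the IR needs (hypothetical field names)."""
    inputs: int
    outputs: int
    needs_midi: bool
    memory_bytes: int
    stack_bytes: int


@dataclass(frozen=True)
class DeviceCapabilities:
    """What a discovered device has advertised (hypothetical field names)."""
    name: str
    inputs: int
    outputs: int
    has_midi: bool
    memory_bytes: int
    stack_bytes: int


def is_compatible(req: Requirements, dev: DeviceCapabilities) -> bool:
    # A device qualifies only if it meets or exceeds every requirement.
    return (
        dev.inputs >= req.inputs
        and dev.outputs >= req.outputs
        and (dev.has_midi or not req.needs_midi)
        and dev.memory_bytes >= req.memory_bytes
        and dev.stack_bytes >= req.stack_bytes
    )


def compatible_devices(req: Requirements,
                       devices: List[DeviceCapabilities]) -> List[DeviceCapabilities]:
    # Preserve discovery order so the caller can apply its own preference.
    return [dev for dev in devices if is_compatible(req, dev)]
```

If the returned list is empty, the API would fall back to in-process rendering as described in the following paragraph.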
The API may determine that no compatible device is connected. In this case, the best available option is in-process rendering, and the API calls JIT engine 124. Where such a determination is made, the compiler passes the IR to the JIT engine 124 to execute the code internally on the user device 100.
The JIT engine 124 executes the code, and is arranged to operate in the second language, that is the language of the IR. The DSP output of the JIT engine may then be passed on to any output devices 150 which are connected to the user device.
The JIT engine may be an LLVM-compatible compiler.
The first language in which the DSP code is written may support DSP-specific features, including floating-point and fixed-point calculation, accelerated fast Fourier transforms (FFTs), etc. The first language may further comprise vector primitives and multi-processing support. The first language does not require platform-specific features and extensions such as threading or synchronisation primitives, system calls, and blocking operations.
The user device 100, in
An IR-aware device driver is one that natively supports compilation and running of the IR code. When a program using the IR runs on a host whose audio driver does not support the IR, the API must compile and execute the code itself. With an IR-aware device driver, the API will pass the IR code to the driver for compilation and execution. This has the benefit of avoiding the need for a context switch into the user application to generate audio data, the driver having the means to generate the audio buffer using the provided IR code.
The API 220 is arranged to communicate with the driver 230 so as to transmit control messages along with the IR itself. The control messages are used to control and monitor the code while it is running. These will be described further below.
Driver 230 comprises a JIT engine 234 arranged to execute the IR code in real-time to produce any DSP output resulting from executing the IR, and pass it to output device 250. Driver 230 is arranged to communicate with the API to deliver performance and error messages about the running code. Since IR-aware driver 230 is specifically designed to accept the IR code, it can execute it with greater performance and lower latency than possible in the system of
API 320 may request and receive device identification information from DSP 340, and once it detects that DSP device 340 is present in the system, through one or more of the methods outlined in the foregoing, it establishes a communication channel. API 320 transmits the IR code to DSP 340, whereupon the IR code is executed by JIT engine 344. The JIT engine 344 executes the IR code on DSP device 340 to produce the DSP output, which can then be transmitted to any playout or output device 350, such as a display or speaker.
In the systems shown in
The IR may be a secure language. This secure language may be passed to the kernel without the need for “sandboxing”. By removing the sandbox, context switches into the code may be faster, reducing latency, and also introducing less overhead, which improves the power efficiency of the processor running the IR code.
The API 420 then makes use of a network connection to provide the DSP code to the external target device 450. The external target device may be provided with specific DSP processing capability arranged to operate in the first language. Target device 450 may comprise a specific audio DSP device 440, which may comprise a JIT engine 454 for compiling the DSP code at the device at run-time.
In the system shown in
In the system of
Where system 400 is employed in an audio application, the external device may effectively reduce the system's overall latency since the network does not need to carry audio stream data. The network connection is only required for control messages while the code is running on the device.
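As a rough worked example of what is saved when the network carries no audio stream, the sketch below computes the raw bitrate of an uncompressed PCM stream. The function name and the 48 kHz, stereo, 24-bit figures are illustrative assumptions and do not come from the disclosure.

```python
def pcm_bitrate(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Raw bitrate, in bits per second, of an uncompressed PCM stream."""
    return sample_rate_hz * channels * bits_per_sample


# Assumed example stream: 48 kHz, stereo, 24-bit samples.
stream_bps = pcm_bitrate(48_000, 2, 24)
```

Under these assumed figures the stream alone would occupy roughly 2.3 Mbit/s continuously, whereas control messages only need to be sent when something changes.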
It can be seen that the systems outlined in
The API 120, 220, 320, 420, monitors network and other connections to discover target devices, and sends the DSP code or the IR code to be executed at the external device, in dependence on the characteristics of the target device. The API 120, 220, 320, 420, may also comprise its own proprietary JIT engine for compiling the DSP code and executing the IR in the case that no target device is found.
A target device may be physically located in the same device (100, 200) as the API, and therefore the communication channel for communication with the API may be wired or internal to the computer device. The target device may be external and therefore the IR code and control data may be transmitted to the target device through other connection means, including over a Wi-Fi connection, a Bluetooth® connection, via a USB connector or otherwise. In an audio DSP example, the target device communicates to the API that it comprises an audio interface compatible with the IR code.
The API may therefore be effectively arranged to offload the DSP code to the best device having a compatible audio interface. This means that any application code written in the first language and received at the API may be offloaded by default, and the DSP code in the first language is therefore device agnostic. It may be compiled and run locally on the user device 100, 200, or it may be transmitted to an external device for execution there.
The above description relates to four possible alternative arrangements and it will be appreciated that the features of any one embodiment may be present in the others, where appropriate.
Communication between the API of the systems and methods described and the compatible target device is bi-directional. Examples of data communication which can flow along communication link in an audio DSP example are as follows:
(1) Streams of audio data, both inputs to and outputs from the nodes forming the graph of the executing code;
(2) Control events: these may be functions added to the DSP code which receive calls with potentially complex data messages at specific moments in time. The API message specifies not only the data, but also the exact sample at which the message is to be applied to the running code. A typical use may be to submit MIDI messages, or a parameter change (say, a volume control). Control events can be generated and emitted by the running audio DSP code as well as received.
(3) Resource requests: the audio DSP code can depend on external data, such as samples. As part of the runtime, events can be received requesting that a given resource be provided, and a response will include this data.
(4) Control events that can be submitted to control the running of the code, such as ‘start’, ‘pause’, ‘stop’.
(5) Control events that can be received indicating a problem in the runtime. A typical example would be an indication of a problem with the target device, such as a buffer underrun (when audio cannot be processed quickly enough), or CPU utilisation information indicating how loaded the device is.
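The sample-accurate control events of item (2) can be illustrated with a short sketch. The `ControlEvent` structure, the `render_block` function and the use of a gain parameter are all hypothetical choices made for illustration; the disclosure does not define a message format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ControlEvent:
    sample_time: int  # absolute sample index at which the event takes effect
    name: str         # hypothetical parameter name, e.g. "gain"
    value: float


def render_block(samples: Sequence[float], block_start: int,
                 events: Sequence[ControlEvent],
                 gain: float) -> Tuple[List[float], float]:
    """Apply a gain to one block, changing it at each event's exact sample.

    Returns the processed block and the gain in force at the end of it.
    """
    block_end = block_start + len(samples)
    pending = sorted(
        (e for e in events
         if e.name == "gain" and block_start <= e.sample_time < block_end),
        key=lambda e: e.sample_time,
    )
    out: List[float] = []
    next_event = 0
    for offset, sample in enumerate(samples):
        # Apply every event scheduled for this exact sample before processing it.
        while (next_event < len(pending)
               and pending[next_event].sample_time == block_start + offset):
            gain = pending[next_event].value
            next_event += 1
        out.append(sample * gain)
    return out, gain
```

For example, a block of four samples starting at sample 100 with a gain change to 0.5 scheduled at sample 102 leaves the first two samples untouched and halves the last two.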
The intermediate representation of the DSP source code will now be described. The IR is structured as a directed graph which comprises processing nodes and graph connections between the nodes. The IR created by the API specifies a graph, and therefore the implied data flow within the graph, but the di-graph need not specify an evaluation order or any parallelism requirements.
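Because the di-graph carries only the data flow, a runtime is free to derive an evaluation order for itself. One way to do so, shown purely as a sketch, is a topological sort using Python's standard `graphlib` module; the `evaluation_order` function and the node names are hypothetical, and graphs containing feedback loops are not handled here.

```python
from graphlib import TopologicalSorter
from typing import Iterable, List, Tuple


def evaluation_order(nodes: Iterable[str],
                     connections: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one valid order in which every node runs after its sources.

    `connections` holds (source, destination) pairs. A cycle in the graph
    raises graphlib.CycleError.
    """
    sorter = TopologicalSorter()
    for node in nodes:
        sorter.add(node)  # register nodes that have no connections
    for source, destination in connections:
        sorter.add(destination, source)  # destination depends on source
    return list(sorter.static_order())
```

Any order returned this way respects the implied data flow, and different target devices may legitimately choose different orders for the same graph.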
Each node of the di-graph may be a processor node or a graph node. A node may comprise an input and an output, i.e. sample data may be provided to a node at its input, the data may be acted upon by the node, and the node will then output the processed sample data.
A processor node may comprise a set of low level instructions. The processor node is effectively a program which, when executed on a target device, performs at least the following tasks: continuously reading from the node's inputs (if required); performing some arbitrary processing on this data, and writing data to the node's outputs.
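The read, process and write cycle of a processor node can be sketched with a simple stateful filter. The one-pole low-pass filter and the `OnePoleLowPass` class are illustrative assumptions only; the disclosure leaves the processing performed by a node arbitrary.

```python
from typing import Iterable, List


class OnePoleLowPass:
    """Hypothetical processor node: y[n] = y[n-1] + a * (x[n] - y[n-1])."""

    def __init__(self, coefficient: float) -> None:
        self.coefficient = coefficient
        self.state = 0.0  # previous output sample, kept between calls

    def process(self, inputs: Iterable[float]) -> List[float]:
        outputs: List[float] = []
        for sample in inputs:           # read from the node's input
            self.state += self.coefficient * (sample - self.state)
            outputs.append(self.state)  # write to the node's output
        return outputs
```

Because the state persists between calls, the node can be fed successive blocks of a continuous stream.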
A graph node may be a nested program also in the IR language. Graphs define a collection of constituent graphs and processors, and the connections between them. In addition, graph syntax allows additional properties to be attached to nested graphs and processors, such as oversampling rates, and stream interpolation strategies.
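A nested graph of this kind might be modelled as below. The `Processor` and `Graph` classes, the `properties` mapping and the `flatten` helper are hypothetical names chosen for this sketch, and the IR's actual syntax is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Processor:
    name: str


@dataclass
class Graph:
    name: str
    nodes: Dict[str, "Node"] = field(default_factory=dict)
    connections: List[Tuple[str, str]] = field(default_factory=list)
    # Extra properties attached to a nested node, keyed by node name,
    # e.g. {"fx": {"oversampling": 2}} (property names are assumptions).
    properties: Dict[str, Dict[str, object]] = field(default_factory=dict)


Node = Union[Processor, Graph]


def flatten(graph: Graph, prefix: str = "") -> List[str]:
    """List the path of every processor, descending into nested graphs."""
    paths: List[str] = []
    for key, node in graph.nodes.items():
        if isinstance(node, Graph):
            paths.extend(flatten(node, f"{prefix}{key}/"))
        else:
            paths.append(f"{prefix}{key}")
    return paths
```

A graph containing an oscillator processor and a nested effects graph with a single filter would flatten to two processor paths, one of them prefixed by the name of the nested graph.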
The simplest graph would be a single node; in such a case, the processor represented by the node receives some input, performs its processing on the input, and creates an output.
It will be appreciated by the skilled person that the graph shown in
The JIT engine is aware of its local context. This means that the JIT engine comprises information regarding the processor capabilities available for use at run-time. It will also be aware of the audio and MIDI hardware support on the platform, and hence what capabilities are exposed to connecting clients. This might include, for example, a number of available input and output channels, sample rates supported, and latency information.
The JIT engine may comprise a graph analysis module arranged to analyse the graph structure of the received IR, in order to determine an optimum way to execute the IR code.
The JIT engine may further comprise a parallelisation module. Where multiple processor cores are available on the target device, the parallelisation module may be arranged to control load across cores when the code is to be executed, and to control the location of the executing code itself.
The parallelisation module may be arranged to identify that a plurality of data flows, or data paths, through the graph may be subject to parallelisation. The parallelisation module may be arranged to parallelise the data flows, to effectively separate various parts of the graph and assign the task of executing the particular data flows to a respective processor core or cores.
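One conservative way to find data flows that can safely run in parallel is to look for parts of the graph that share no connections at all. The sketch below does this by finding weakly connected components; the `independent_flows` function is hypothetical, and a real parallelisation module may well use a finer-grained analysis.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def independent_flows(nodes: Iterable[str],
                      connections: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group nodes into sets that have no connections between them."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for source, destination in connections:
        # Direction is ignored: any connection ties two nodes together.
        neighbours[source].add(destination)
        neighbours[destination].add(source)

    seen: Set[str] = set()
    flows: List[List[str]] = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first walk of one component
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        flows.append(sorted(component))
    return flows
```

Each returned group could then be handed to a different processor core, since no sample data passes between groups.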
The JIT engine may further comprise a load analysis module arranged to perform load analysis on the cores which are available to it, and be adapted to modify the parallelisation of the graph data flows based upon the load analysis. In this way, the system is optimised for execution of the IR code, since it is executed at a location remote from the CPU of the user computing device and the API.
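A load-aware assignment of data flows to cores could look like the following sketch, which greedily places the most expensive flow on the currently least-loaded core. The `assign_flows` function, the cost estimates and the greedy strategy are assumptions for illustration; the disclosure does not specify how the load analysis modifies the parallelisation.

```python
import heapq
from typing import Dict, List, Tuple


def assign_flows(flow_costs: Dict[str, float],
                 core_loads: List[float]) -> Dict[str, int]:
    """Map each flow name to a core index, balancing estimated load.

    `flow_costs` holds an estimated load per flow and `core_loads` the
    load already measured on each core (both in the same arbitrary unit).
    """
    heap: List[Tuple[float, int]] = [(load, core)
                                     for core, load in enumerate(core_loads)]
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    # Place the most expensive flows first; ties are broken by name.
    for flow, cost in sorted(flow_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, core = heapq.heappop(heap)  # least-loaded core so far
        assignment[flow] = core
        heapq.heappush(heap, (load + cost, core))
    return assignment
```

Re-running the assignment with fresh load measurements is one simple way the parallelisation could be modified as conditions on the device change.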
The graph analysis module may be arranged to determine that a graph is inherently serial, and thus impossible to parallelise. Upon such a determination, the JIT engine may be arranged to split the execution of the graph across multiple cores. The splitting of the graph across multiple cores will provide a performance improvement for the running cores at the cost of a latency increase. The decision to split the graph may be automatically taken where it is determined that the resulting increase in latency is lower than a predetermined threshold. Where splitting a graph would result in an increase in latency that would exceed the predetermined threshold, the code may be implemented on a single core.
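The threshold decision can be sketched as follows. The latency model used here, in which each hand-off between cores adds one block of delay, is an assumption made for this example, as are the function name and parameters; the disclosure states only that the latency increase is compared with a predetermined threshold.

```python
from typing import Tuple


def plan_serial_graph(stages: int, cores: int, block_size: int,
                      sample_rate_hz: int,
                      max_added_latency_s: float) -> Tuple[int, float]:
    """Decide how many cores to pipeline a serial graph across.

    Returns (cores_used, added_latency_in_seconds).
    """
    usable = min(stages, cores)
    # Assumed model: one block of extra delay per hand-off between cores.
    added_latency_s = (usable - 1) * block_size / sample_rate_hz
    if usable > 1 and added_latency_s < max_added_latency_s:
        return usable, added_latency_s
    return 1, 0.0  # stay on a single core, no added latency
```

With 64-sample blocks at 48 kHz, a two-core split adds about 1.3 ms under this model, so it would be accepted against a 5 ms threshold and rejected against a 1 ms threshold.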
The fact that the target device characteristics are used at the JIT engine stage means that the DSP code received at the API need not contain any evaluation of the system on which the DSP code, or IR code, is to be executed.
Device characteristics may include any of number of processor cores, processor core speeds available, usable memory, number of inputs, number of outputs etc.
At step 610 the API transmits the IR code to the identified target device that has advertised its ability to execute the IR code, in order that the code may be run at the target device.
The target device may then execute the IR code to produce the DSP output. A communication channel may be established between the API and the target device prior to the transmission of the IR code. When the IR code is being executed at the target device, control data may be sent from the API to the target device, as described above.
It will be appreciated that any target device compatible with the IR, and adapted to execute the IR code, may be arranged to automatically or periodically advertise its presence to the API, such that step 606 may be optionally not required.
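Where devices advertise themselves periodically, the API side could keep a simple registry of recently seen devices, as in the sketch below. The `DeviceRegistry` class and the idea of expiring a device after a time-to-live without advertisements are assumptions for illustration and are not taken from the disclosure.

```python
from typing import Dict, List


class DeviceRegistry:
    """Tracks devices by the time of their most recent advertisement."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._last_seen: Dict[str, float] = {}

    def on_advertisement(self, device_id: str, now: float) -> None:
        # Called whenever an advertisement arrives from a device.
        self._last_seen[device_id] = now

    def available(self, now: float) -> List[str]:
        # A device counts as present if it advertised within the TTL.
        return sorted(device for device, seen in self._last_seen.items()
                      if now - seen <= self._ttl)
```

Time is passed in explicitly so that the registry can be exercised without a real clock; in practice a monotonic clock reading would be supplied.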
A further method is described in
At step 706 the target device receives IR code from the API. The IR code is in the form of a di-graph of nodes and connections, the nodes and connections defining one or more code paths or threads through the graph which define DSP functions which are to be performed on input data. The target device comprises a JIT engine operable to execute the IR code.
At step 708 a graph analysis module of the target device analyses the IR code to determine the data threads present in the IR graph.
Optionally, at step 710, a parallelisation module determines that one or more threads may be run in parallel across different processor cores present in the target device. Optionally, at step 712, the graph analysis module determines that a graph or a portion of the graph is inherently serial and incapable of parallelisation. The graph analysis module may then determine that the inherently serial graph or portion may be divided for execution across two or more processors. This determination includes calculating a processing overhead which results from splitting the code to be run on separate processor cores. If this processing overhead is less than a predetermined threshold, the JIT engine may execute the IR code serially, with portions being run on separate cores.
The target device executes the code at 714, and may pass the output to an output device, such as, in the case of audio DSP, a loudspeaker.
The target device may be any device adapted to process DSP code. It may form part of the user device, and be an audio driver. Alternatively, it may be a DSP processor specially adapted to operate on IR code created by the API. The DSP processor may be local or on a remote device. A remote device might be a loudspeaker having local capability to process the IR code. The remote device may be a pair of Bluetooth® enabled headphones and the communication channel may be established via a Bluetooth® connection.
The above systems describe various discrete options for an arrangement of the features of the host computer device, API arranged to receive source code, and a JIT engine shown located at various locations from within the API itself, to being located on an external hardware device. The API of the systems disclosed herein is arranged to detect the one or more devices which may be able to run the IR code it generates. It may be that a system being a combination of any or all of those shown in
The above systems and methods describe an approach for distributing DSP code which has the advantage that network bandwidth requirements may be reduced when running DSP code.
The described approach has the advantage that system latency may be reduced in a context where audio DSP is being processed.
The described approach has the advantage that DSP code may be made more versatile than in existing systems and approaches. The described approach improves particularly on existing systems for processing audio DSP code and rendering audio, providing a method which allows for a user to create a DSP code which need not be modified to suit each individual and possible hardware set-up on which it might be eventually rendered. Rather, not only can the DSP processing take place on the most appropriate device thanks to the detecting of devices and selection by the API, that processing can be optimised thanks to the nature of the IR created by the API, as described above.
The computer apparatus 800 comprises various data processing resources such as a processor 802 coupled to a central bus structure. Also connected to the bus structure are further data processing resources such as memory 804. A display adapter 806 connects a display device 808 to the bus structure. One or more user-input device adapters 810 connect a user-input device 812, such as the keys or other input mechanisms of the present disclosure, to the bus structure. One or more communications adapters 814 are also connected to the bus structure to provide connections to other computer systems 800 and other networks.
In operation, the processor 802 of computer system 800 executes a computer program comprising computer-executable instructions that may be stored in memory 804. When executed, the computer-executable instructions may cause the computer system 800 to perform one or more of the methods described herein. The results of the processing performed may be displayed to a user via the display adapter 806 and display device 808. User inputs for controlling the operation of the computer system 800 may be received via the user-input device adapters 810 from the user-input devices 812.
It will be apparent that some features of computer system 800 shown in
In the foregoing, the singular terms “a” and “an” should not be taken to mean “one and only one”. Rather, they should be taken to mean “at least one” or “one or more” unless stated otherwise. The word “comprising” and its derivatives including “comprises” and “comprise” include each of the stated features but do not exclude the inclusion of one or more further features.
Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the scope of the disclosed concepts, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the disclosed concepts.
The approaches described herein may be embodied on a computer-readable medium, which may be a non-transitory computer-readable medium. The computer-readable medium may carry computer-readable instructions arranged for execution upon a processor so as to make the processor carry out any or all of the methods described herein.
The term “computer-readable medium” as used herein refers to any medium that stores data and/or instructions for causing a processor to operate in a specific manner. Such storage medium may comprise non-volatile media and/or volatile media. Non-volatile media may include, for example, optical or magnetic disks. Volatile media may include dynamic memory. Exemplary forms of storage medium include a floppy disk, a flexible disk, a hard disk, a solid state drive, a magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with one or more patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, and any other memory chip or cartridge.