This disclosure relates generally to methods and systems for processing input signals using machine learning for neural activation.
Cochlear implants (CIs) are arguably the most successful neural implant, with nearly 40 years of innovation and over 300,000 users worldwide. CIs have been shown to significantly improve speech recognition and comprehension in child and adult users. However, they have had considerably less success in restoring the pitch of sound. This poses a major issue for CI users who speak tonal languages, such as Mandarin, and have difficulty with speech comprehension, and a lesser but still significant quality-of-life deficit in music appreciation.
In the cochlea, fine timing of cochlear neuron firing is associated with pitch perception. In other applications, fine timing depends on the particular neural system being considered. Cochlear afferent neurons fire action potentials at specific times as a stimulus is presented. Over numerous presentations of a sound, there is some variability in exactly when the neurons fire. This variability defines the degree of precision, or fine timing, that is desired in cochlear implants. Pitch perception is important for tonal languages, such as Mandarin, and for music appreciation. This is a problem cochlear implants have attempted to address for over 20 years. In other neural implants, such as vestibular implants, a lack of natural fine timing has been associated with inaccurate restoration of eye movement.
Until recently, the focus of improving cochlear implants has been on preventing current spread from distorting perceived sound. Hardware innovations minimized electrode distance from the modiolar wall to more directly target spiral ganglion neurons, and algorithms were modified to avoid electrical interference by ensuring that no two electrodes delivered current simultaneously. These improvements led to significant gains in the fidelity of targeting neurons for spatial encoding of sound, giving the percept of sound at a certain frequency. Paired with the continuous interleaved sampling (CIS) strategy, in which the amplitude of fixed-rate pulsatile stimulation is modulated by the envelope of sound, these improvements have led to highly accurate English speech comprehension in CI users.
Studies indicate that the inability to correctly convey pitch is the result of unrealistic CI-evoked timing of neural responses. Thirty years of detailed studies produced a phenomenological model that reflects the full process of sound processing, including mechanical transduction and outer hair cell connectivity, and this model can produce fine timing for a cochlear neuron at any specified location along the basilar membrane.
However, this algorithm cannot process sound in real time, even on a powerful desktop processor: it requires considerably longer to process a sound than the duration of the sound itself. Some studies have been performed on normal-hearing subjects listening to computer-generated tones using algorithms that deliver pulses with timing reflecting the fine structure of sounds. These studies indicate that including timing information leads to better perception of a tonal language (Mandarin).
While replicating the exact timing of natural spiking has not been attempted per se, high-rate pulsatile stimulation was popularized as an improvement to CIS because it leads to more desynchronized, naturalistic neural responses. This stimulation strategy led to improved speech perception in noisy environments. However, a recent study showed that halving the number of these high-rate pulses while considering natural firing principles improves speech perception. A strategy with fewer pulses but more accurate fine timing could therefore provide better perception.
According to examples of the present disclosure, a method for cochlear implant processing is disclosed. The method comprises receiving, at a receiver of a cochlear implant, an input natural sound pattern; processing, by a front-end processing algorithm, the input natural sound pattern to produce a target population firing pattern for a cochlea; and transforming, by a back-end processing algorithm, the target population firing pattern to a stimulation pattern that induces a response with naturalistic timing.
Various additional features can be included with the method for cochlear implant processing including one or more of the following features. The front-end processing algorithm comprises a trained neural network. The trained neural network is a trained recurrent neural network. The trained recurrent neural network is trained to learn a sound wave-to-spiking relationship of a phenomenological model of the cochlea. The phenomenological model of the cochlea accounts for outer hair cell and inner hair cell contributions to firing, filtering effects, and non-linearities related to synaptic and axonal activation. The trained recurrent neural network is trained on a synthetic waveform data set and a speech command dataset. The trained recurrent neural network transforms sound pressure level (SPL) into spiking and firing rate over time for an auditory nerve fiber with low, medium, or high spontaneous firing.
According to examples of the present disclosure, a system for cochlear implant processing is disclosed. The system comprises a cochlear implant comprising a receiver that receives an input natural sound pattern; a front-end processing algorithm that processes the input natural sound pattern to produce a target population firing pattern for a cochlea; and a back-end processing algorithm that transforms the target population firing pattern to a stimulation pattern that induces a response with naturalistic timing.
Various additional features can be included with the system for cochlear implant processing including one or more of the following features. The front-end processing algorithm comprises a trained neural network. The trained neural network is a trained recurrent neural network. The trained recurrent neural network is trained to learn a sound wave-to-spiking relationship of a phenomenological model of the cochlea. The phenomenological model of the cochlea accounts for outer hair cell and inner hair cell contributions to firing, filtering effects, and non-linearities related to synaptic and axonal activation. The trained recurrent neural network is trained on a synthetic waveform data set and a speech command dataset. The trained recurrent neural network transforms sound pressure level (SPL) into spiking and firing rate over time for an auditory nerve fiber with low, medium, or high spontaneous firing.
According to examples of the present disclosure, a method for neural implant processing is disclosed. The method comprises receiving, at a receiver of a neural implant, an input activation pattern; processing, by a front-end processing algorithm, the input activation pattern to produce a target population firing pattern for one or more neurons; and transforming, by a back-end processing algorithm, the target population firing pattern to a stimulation pattern that induces a response with naturalistic timing.
Various additional features can be included with the method for neural implant processing including one or more of the following features. The neural implant comprises a cochlear implant, a vestibular implant, a retinal vision prosthesis, a deep brain stimulator, or a spinal cord stimulator. The front-end processing algorithm comprises a trained neural network. The trained neural network is a trained recurrent neural network. The trained neural network is trained using clinical data, a phenomenological model, or both. The trained neural network comprises one or more convolution layers for retinal prosthesis analysis.
According to examples of the present disclosure, a system for neural implant processing is disclosed. The system comprises a neural implant comprising a receiver that receives an input activation pattern; a front-end processing algorithm that processes the input activation pattern to produce a target population firing pattern for one or more neurons; and a back-end processing algorithm that transforms the target population firing pattern to a stimulation pattern that induces a response with naturalistic timing.
Various additional features can be included with the system for neural implant processing including one or more of the following features. The neural implant comprises a cochlear implant, a vestibular implant, a retinal vision prosthesis, a deep brain stimulator, or a spinal cord stimulator. The front-end processing algorithm comprises a trained neural network. The trained neural network is a trained recurrent neural network. The trained neural network is trained using clinical data, a phenomenological model, or both. The trained neural network comprises one or more convolution layers for retinal prosthesis analysis.
Various features of the embodiments can be more fully appreciated, as the same become better understood with reference to the following detailed description of the embodiments when considered in connection with the accompanying figures, in which:
Reference will now be made in detail to example implementations, illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. In the following description, reference is made to the accompanying drawings that form a part thereof, and in which is shown by way of illustration specific exemplary embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the invention. The following description is, therefore, merely exemplary.
Generally speaking, a neural implant and a method of using the neural implant are disclosed in which a first processing component, for example a front-end processing system or algorithm, receives an input, for example a sound for a cochlear implant, and produces a target firing pattern, and a second processing component, for example a back-end processing system or algorithm, maps that target firing pattern to a stimulation pattern that produces a pulse rate delivering the desired firing pattern. The front-end system, algorithm, or method and the back-end system, algorithm, or method can be implemented in a number of neural implants for different use cases. In some examples, the front-end processing system or algorithm and the back-end processing system or algorithm can also be implemented as a single system or algorithm. A variety of neural implants are already on the market or in late-stage trials, including cochlear implants, vestibular implants, retinal vision prostheses, deep brain stimulators, and spinal cord stimulators. These devices take an input signal and transmit it to target neural tissue to overcome a deficit using pulsatile electrical stimulation. During signal processing, (1) the input signal is converted to a target neural firing pattern in the local tissue and (2) the target neural firing pattern is transformed into a stimulation paradigm that considers the number of implanted electrodes and their proximity to the tissue in that specific implant. Other input signals can include, but are not limited to, an image for retinal stimulation, an electrical signal that can mitigate pain signals, or signals from other brain regions, such as the hippocampus. The outputs of the front end that are used as inputs to the back end can include, but are not limited to, sound-to-firing rate over time for cochlear implants, spike timing of individual neurons, and local field potential oscillations for epilepsy treatment.
The electrical neuromodulation can include, but is not limited to, pulsatile stimulation, transcranial direct current stimulation, and galvanic stimulation. The neuromodulation can also take the form of magnetic stimulation.
As further disclosed herein, a neuroprosthetic is disclosed that can transform an input (e.g., sound, image, motion) into neural population firing, as used in current neural prostheses (cochlear implants, etc.) and future neural prostheses, or take an input from another neural system (e.g., neural firing from the visual cortex) and send it to another neural population (e.g., the frontal cortex). The use of a neural network, as described herein, provides faster processing capability that increases its usefulness in current and future neural prostheses.
According to examples of the present disclosure, a front-end processing system and method is disclosed that can be applied to any neural implant design, such as, but not limited to, a cochlear implant, a deep brain stimulation implant, an ocular implant, or a vestibular implant. A validated computational model is used that predicts the response of any cochlear afferent to an arbitrary sound. This model takes too long to execute to be used in real-time devices (e.g., a 0.5 s sound takes approximately 1.5 s to execute). As disclosed herein, a recurrent neural network (RNN) machine learning algorithm is trained to compute the same transformation much faster than real time (e.g., the same 0.5 s sound takes only 5 ms to execute on the same processor, 100× faster than real time). Faster than real time means that it can be used in a neural implant to process incoming information as it arrives rather than post hoc. In some examples, the RNN has 32 outputs to establish the time resolution of the neural firing. In some cochlear implant examples, the RNN provides the target firing rate as a function of time for a single “average neuron” located at the 400 Hz location of the cochlea. The “average” firing behavior is estimated by simulating the responses of 50 neurons located in that region and averaging them. In another example, firing behavior may be simulated for the three types of neurons in the cochlea (high, low, and medium spontaneous activity fibers).
As described further below for the cochlear implant, and in some examples, an existing phenomenological model (the Zilany model) is used to train the neural network. However, other models and/or clinical data can be used to train the neural network. In the retinal implant example, a neural network can be trained using test data obtained from a retina explanted from an experimental animal, where random light patterns are directed onto it and recordings are obtained from the retinal ganglion cells. Thus, in this example, the trained neural network can transform the light patterns to ganglion cell firings.
In the cochlea, fine timing of cochlear neuron firing is associated with pitch perception. The level of precision of the fine timing for the cochlea is on the order of about 5 to about 10 ms in healthy fibers. In other neural implants, such as vestibular implants, a lack of natural fine timing has been associated with inaccurate restoration of eye movement. Other existing prostheses can benefit from stimulation paradigms that induce realistic fine timing of spiking as well. For other implants, such as brain implants (e.g., Neuralink), the same front end can be used to transform brain signals from one neural population into brain signals in another neural population. In this case, the front end is part of an implant intended, for example, to improve performance on cognitive tasks. As disclosed herein, only 100 training examples with inputs and outputs were required to achieve less than 4% error in predicting neural responses. If this holds across neural systems, datasets for training the machine learning algorithm could be collected in a clinic instead of generated with a phenomenological model to perform the same front-end transformation for different neural circuits. Hundreds of trials of responses to sound can be performed in an individual patient being implanted to tune the algorithm specifically to that patient.
Because firing is stochastic, complex, and asynchronous, it is slow to compute. The phenomenological model used herein required about 1.5 seconds to compute the response of one cochlear neuron to about 0.5 seconds of sound. For N neurons, this model will then take about 1.5 × N seconds. Meanwhile, the computation time with RNNs does not depend on the number of neural responses being predicted; instead, it depends on the number of nodes the algorithm is trained to use when predicting cochlear neural responses. In this disclosure, a network with a core calculation of 500 nodes is shown to be well within real-time capabilities, and accurately predicting one neural response required only 25 nodes. Even if the computation time of the disclosed algorithm increased linearly with the number of neural responses predicted, this would allow computation of 20 distinct neural responses in real time. Cochlear implants currently use only 20 electrodes, so this is well within real-time needs for a cochlear implant. Similarly, other devices on the market use fewer than 20 electrodes, making this technique suitable for those applications as well. Additionally, because the computation time of a machine learning algorithm increases more slowly than linearly, it is possible to compute more than 20 neural responses in real time.
In some instances, there are two steps to assessing improvements with this front-end design in a cochlear implant. The first is to scale up the neural network to predict 20 neural responses across the cochlea. The cochlea performs spatial encoding of sound, and electrodes are placed throughout the cochlea with approximately equal spacing, and therefore with a corresponding spacing in frequencies: 125 Hz, 1170 Hz, 2217 Hz, …, 20,000 Hz. Stimulating across these locations presently gives the sensation of sound perception. As disclosed herein, a model is disclosed that predicts responses across neurons tuned to these frequencies.
Despite being able to restore speech perception with a 99% success rate, cochlear implants cannot successfully restore pitch perception or music appreciation. Studies suggest that if auditory neurons were activated with fine timing closer to that of natural responses, pitch would be restored. Predicting the timing of cochlear responses requires detailed biophysical models of sound transmission, inner hair cell responses, and outer hair cell responses. Performing these calculations is too computationally costly for real-time cochlear implant stimulation. Instead, implants typically modulate the pulse amplitude of fixed-pulse-rate stimulation with the band-limited envelopes of incoming sound. This method is known to produce unrealistic responses, even to simple step inputs. As disclosed herein, a machine learning algorithm is used to optimize the prediction of the desired firing patterns of the auditory afferents in response to sinusoidal and step modulation of pure tones. A trained network comprising 25 GRU nodes can reproduce fine timing with 4.4 percent error on a test set of sines and steps. This trained network can also transfer learn and capture features of natural sounds that are not captured by standard CI algorithms. Additionally, for 0.5 second test inputs, the ML algorithm completed the sound-to-spike-rate conversion in 300× less time than the phenomenological model. This calculation occurs at a real-time-compatible rate of 1 ms of computation per 1 second of spike timing prediction on an i9 microprocessor.
As disclosed herein, a front-end algorithm is used for cochlear implants that can transform any sound into the naturalistic fine timing of spikes for a fiber at real-time processing speeds using a machine learning (ML) approach, as shown in
The generation of training and test waveforms is now discussed. The data used to train and test the model were synthetically generated sine waves and steps in power of a 400 Hz sinusoid, as shown in
All inputs modulated a 400 Hz wave of the form:
s_base = sin(2π f_princ t), f_princ = 400 Hz  (2)
Sinusoidal modulation was performed with Equation 3
S_sine = A_stim (d_m (1 − A_mod) sin(2π f_mod t + ϕ_mod) + A_mod) s_base  (3)
where the depth of modulation d_m = 0.95, and the amplitude of modulation A_mod, frequency of modulation f_mod, phase of modulation ϕ_mod, and stimulus amplitude A_stim were varied over the ranges in Table 1:
Step modulation was performed with Equation 4, where the amplitude of each step, A_mod1 and A_mod2, and the shift t_mod were varied over the ranges in Table 2:
S_step = (A_mod1 + (A_mod2 − A_mod1) u(t − t_mod)) s_base  (4)
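The waveform generation described by Equations 2 through 4 can be sketched as follows. This is a minimal illustration; the sampling rate and the parameter values passed in at the bottom are hypothetical examples, not the values from Tables 1 and 2:

```python
import numpy as np

# Hypothetical sampling setup; the disclosure does not specify a rate.
fs = 100_000                      # samples per second
t = np.arange(0, 0.5, 1 / fs)    # 0.5 s of time samples

# Equation (2): the 400 Hz carrier.
f_princ = 400.0
s_base = np.sin(2 * np.pi * f_princ * t)

def sine_modulated(A_stim, A_mod, f_mod, phi_mod, d_m=0.95):
    """Equation (3): sinusoidally amplitude-modulated carrier."""
    env = d_m * (1 - A_mod) * np.sin(2 * np.pi * f_mod * t + phi_mod) + A_mod
    return A_stim * env * s_base

def step_modulated(A_mod1, A_mod2, t_mod):
    """Equation (4): step change in carrier amplitude at time t_mod."""
    u = (t >= t_mod).astype(float)          # unit step u(t - t_mod)
    return (A_mod1 + (A_mod2 - A_mod1) * u) * s_base

# Example parameter values (illustrative only).
s_sine = sine_modulated(A_stim=1.0, A_mod=0.5, f_mod=10.0, phi_mod=0.0)
s_step = step_modulated(A_mod1=0.2, A_mod2=1.0, t_mod=0.25)
```

Sweeping these functions over the tabulated parameter ranges would produce a training corpus of modulated 400 Hz waveforms of the kind described above.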
Additionally, performance of the model was tested on individual spoken word recordings from the training set of the Speech Commands dataset. During testing, the waveforms were converted into power by using the spectrogram function in MATLAB with a Hamming window of length 512, as shown in
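The waveform-to-power conversion can be sketched with a minimal short-time Fourier transform, analogous to (but not identical to) MATLAB's spectrogram function with a Hamming window of length 512; the sampling rate, hop size, and test tone here are assumptions for illustration:

```python
import numpy as np

def power_spectrogram(x, win_len=512, hop=256):
    """Short-time power spectrum of signal x: a Hamming window of
    length win_len is slid along x in steps of hop samples, and the
    squared magnitude of the one-sided FFT is kept for each frame.
    Returns an array of shape (n_frames, win_len // 2 + 1)."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

# 0.5 s of a 400 Hz test tone at an assumed 16 kHz sampling rate.
fs = 16_000
x = np.sin(2 * np.pi * 400 * np.arange(0, 0.5, 1 / fs))
P = power_spectrogram(x)
```

For the 400 Hz tone, the power concentrates near bin 400 / (16000 / 512) ≈ 12.8, as expected for the carrier frequency used throughout this disclosure.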
A phenomenological model of the human auditory periphery (referred to here as the Zilany model) was developed over the last 30 years to replicate healthy auditory response to perception of any sound. This model accounts for outer hair cell and inner hair cell contributions to firing, filtering effects, and non-linearities related to synaptic and axonal activation. The model transforms sound pressure level (SPL) into spiking and firing rate over time for an auditory nerve fiber with low, medium, or high spontaneous firing, as shown in
Because this is an inherently “forwards-only” problem, with signal processing progressing from the eardrum to the ganglion cell firing pattern, a gated recurrent unit (GRU), which incorporates the memory of past network states while processing new inputs, can be used as the core of the network design for the task. This can account for effects of history, such as past spikes affecting subsequent spikes due to refractoriness. There were 32 firing rate values for every spectrogram time bin, so a fully connected layer was used to transform the outputs of the GRU layer into 32 outputs. This also allowed additional calculations to adjust firing rate predictions that occur within several milliseconds of one another inside one spectral bin. During this study, model size was assessed using GRUs with 25, 50, 100, 200, 500, and 1000 nodes. A fully connected layer reduces the GRU nodes to 32 outputs, as shown in
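A minimal sketch of the GRU-plus-fully-connected architecture described above, using random (untrained) weights purely to show the data flow from spectrogram frames to 32 outputs per time bin; the frame size and sequence length are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_layer(x_seq, n_hidden=25):
    """One GRU layer run over a sequence of input frames. The weights
    here are random stand-ins; in the disclosure they are learned so
    that the network maps sound to firing rate over time."""
    n_in = x_seq.shape[1]
    Wz, Wr, Wh = (rng.normal(0, 0.1, (n_hidden, n_in)) for _ in range(3))
    Uz, Ur, Uh = (rng.normal(0, 0.1, (n_hidden, n_hidden)) for _ in range(3))
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    h = np.zeros(n_hidden)
    states = []
    for x in x_seq:
        z = sigmoid(Wz @ x + Uz @ h)              # update gate
        r = sigmoid(Wr @ x + Ur @ h)              # reset gate
        h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
        h = (1 - z) * h + z * h_cand              # mix old state and candidate
        states.append(h)
    return np.stack(states)

def fully_connected(states, n_out=32):
    """Fully connected readout reducing the GRU state to 32 firing
    rate values per spectrogram time bin."""
    W = rng.normal(0, 0.1, (n_out, states.shape[1]))
    return states @ W.T

frames = rng.normal(size=(100, 257))   # 100 hypothetical spectrogram bins
rates = fully_connected(gru_layer(frames, n_hidden=25))
```

The reset gate r is what lets the network suppress or retain history, which is how effects such as refractoriness can be captured in principle.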
To assess performance on the test and training data, the root-mean-square (RMS) error was used as the measure of error. During testing, the model was assessed on 100 waveforms (49 sine waves). The model was also assessed for transfer learning on 100 speech command recordings. The RMS error was compared to the RMS error between 10 predictions of the same response to sound with the Zilany model, and to the RMS of the firing rate over time; the latter comparison was transformed into a percent error.
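The RMS-based percent-error metric can be sketched as follows; the target firing rates below are made-up illustrative values rather than Zilany model output:

```python
import numpy as np

def rms(x):
    """Root-mean-square of a signal."""
    return float(np.sqrt(np.mean(np.asarray(x, float) ** 2)))

def percent_error(pred, target):
    """RMS of the prediction error expressed as a percentage of the
    RMS of the target firing rate signal."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    return 100.0 * rms(pred - target) / rms(target)

target = np.array([100.0, 120.0, 80.0, 100.0])       # sps, illustrative
pred = target + np.array([4.0, -4.0, 4.0, -4.0])     # prediction off by 4 sps
```

An error of 4 sps on a roughly 100 sps signal yields a percent error near 4%, comparable in scale to the test errors reported below.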
Statistical testing between models and performance was computed with a paired t-test when comparing model size performance and a two-sided t-test when comparing performance on sinusoidal versus step modulation. Several performance metrics were used in addition to the prediction error. In addition to determining the minimum number of nodes necessary to predict responses to sinusoidal and step modulation, the computation time was assessed for the Zilany 2014 model and the neural network when simulating a 50-fiber response on the same computer in Python. Additionally, models of all sizes were trained with between 500 and 250,000 epochs to determine the number of epochs required to properly train each model and whether the loss was limited by network size. Run time was evaluated on one CPU of a 2.4 GHz 8-core Intel Core i9 processor in a 2019 15-inch MacBook Pro while the trained RNN and the Zilany model each performed a prediction in response to the same 0.5 second sound 10 times. The ratio of speeds was used as a metric in the results.
Based on previously published studies, cochlear implants successfully restore speech perception in non-tonal languages. However, implant users have difficulty with tonal languages and music appreciation due to a lack of accurate pitch perception. Studies indicate that restoration of pitch perception requires fine timing of cochlear neuron firing. Standard implant algorithms, such as the CIS algorithm, do not attempt to replicate this fine timing because it is unclear how to do so properly without a detailed computational model such as the Zilany model, which is too computationally intensive to run in real time.
The performance of the model was tested on synthetic sounds. The training time and network size required to create an RNN that can perform this task was examined. The RNN contains a GRU layer and fully connected layer. The networks can be trained with as few as 25 GRU nodes and as many as 1000 GRU nodes for up to 250,000 epochs. The smaller models trained and reached the lowest training errors after fewer epochs (
This was compared to the minimum achievable error, the rms between multiple simulations of the natural response to a sound with the Zilany model, which reaches a minimum of 3.0±0.2 (SEM) sps. Compared to the rms of the signal, the models on average have a test error of 4.20±0.03%.
The models were trained to infer responses to both sinusoidal and step modulation (
The performance of the model was also tested on natural sounds. The relative difficulty of learning responses to natural sounds was evaluated by testing the performance of the same model on the natural response of the same 400 Hz auditory fiber bundle to audio recordings from the Speech Commands dataset.
It was found that RMS error increases to about 80 sps across models, and the smaller models significantly outperform the largest, 1000-node RNNs. The 50-node RNN significantly outperforms all other models but has performance similar to the 500-node RNN. The minimum percent error across models averaged 46.1±0.76%. It is speculated that the largest model has not converged and has not learned the rules as accurately as the smaller models, which have fewer weights and biases (
The potential of these RNNs to be used in a real-time implementation was evaluated. The Zilany model required 1.47±0.01 seconds to predict the neural response of a single fiber to a 0.5 second sound. The 25-node network required 4.73±0.02 milliseconds. This improvement is plotted as the ratio of the time to perform the task with the Zilany model over the time to perform the task with the RNN. The RNN was 335.4±4.54 times faster with a 25-node network trained with 50,000 epochs. The performance was evaluated with a minimum of 50,000 epochs, because the performance of the RNNs converged by 50,000 epochs across models on the synthetic data, so models of these sizes produced reasonable predictions of responses. The number of training epochs did not significantly influence run time for most models, as shown in
The computation speed was evaluated on a 2019 MacBook Pro with a 2.4 GHz Intel Core i9 processor (i9-9980HK). This processor is rated at 478 GFLOPS. The modern microcontroller in an Apple iPhone XS performs at 487.5 GFLOPS, indicating this RNN should be able to perform at a similar speed with modern cell phone microprocessor technology. These results show the creation of a real-time front end for a cochlear implant that is used to generate realistic target responses. To use this disclosed front end to the benefit of patients, algorithms must also accurately transform a predicted firing pattern into a stimulation pattern that can evoke this firing pattern in actual neurons. These algorithms must also incorporate the complexities of how stimulation parameters, such as pulse amplitude and rate, affect induced firing rate.
The back-end algorithm and/or system can be modeled based on one or more of the following considerations.
Experimental Stimulation Paradigms
Pulses were applied using an isolated pulse stimulator (A-M Systems), which delivered biphasic pulses with 150 μs stimulation phases and no interphase gap. The same single-unit stimulation paradigm used in the physiology experiments was replicated in silico with the biophysical model. During each pulse block, 1 s pulse trains were delivered at rates from 25 to 300 pps. These rates were chosen because they fall within the physiological range of vestibular afferent firing. Fixed-rate pulse blocks were delivered at a fixed pulse amplitude in order from lowest to highest pulse rate. There were typically four repetitions of each pulse rate block before the pulse rate was increased; for several blocks there were only three repetitions. For most pulse blocks there was a 500 ms or 1 s window between blocks. The maximum current was set to 80% of the minimum value that caused visible facial muscle activation in response to pulses delivered at 300 pps. Pulse rate experiments were performed with a fixed current amplitude and repeated at amplitudes of 25%, 50%, 75%, 87.5%, and 100% of the maximum amplitude.
Data Analysis
The assay of neural responses is the firing rate in response to blocks of pulses. Therefore, induced firing rate was measured as the number of APs that occurred from the onset of the first pulse in a pulse block to the offset of the last pulse in the block, divided by the length of that time window. There were noticeable experimental differences in spontaneous activity before and after pulse blocks. Therefore, spontaneous activity was measured as the firing rate in the window preceding a pulse train, excluding the first 50 ms if the window occurred after another pulse block; this was done to avoid lingering effects from the previous pulse train. Many stimulation paradigms assume a linear relation between pulse rate and firing rate. To test this hypothesis, data from all repetitions of pulse rate blocks at a single current amplitude were fit with the best-fit line with a y-intercept equal to the average spontaneous rate (S): fr = m pr + S. The slopes of best fit are compared to the unity line, the slope that would result if each pulse produced one AP.
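The best-fit slope with the y-intercept pinned at S has the closed form m = Σ pr·(fr − S) / Σ pr², sketched below with synthetic data (the pulse rates match the tested range, but the firing rates are illustrative):

```python
import numpy as np

def slope_fixed_intercept(pr, fr, S):
    """Least-squares slope m of fr = m * pr + S with the intercept
    pinned to the average spontaneous rate S:
    m = sum(pr * (fr - S)) / sum(pr ** 2)."""
    pr, fr = np.asarray(pr, float), np.asarray(fr, float)
    return float(np.sum(pr * (fr - S)) / np.sum(pr ** 2))

pr = np.array([25, 50, 75, 100, 150, 200, 250, 300], float)  # pps
S = 30.0                                                     # sps
fr = S + 0.5 * pr                # synthetic data lying on a half-unity line
m = slope_fixed_intercept(pr, fr, S)
```

A fitted slope below 1 (the unity line) indicates fewer than one AP produced per pulse.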
Data Fitting with Equation fr=ϕ(pr,I,S)
Simulations revealed predictable, smooth transitions between effects of pulses with the change in pulse amplitude and spontaneous rate. These effects were captured through an equation fr=ϕ(pr,I,S) (Eq.5) explained further below that takes measurable and controllable values as inputs: pulse rate pr, pulse amplitude I, and spontaneous rate S. The equation captures effects from facilitation to blocking at 0 to 350 μA as fitted to a single irregular afferent simulation. Only the magnitude of parameters changes as a function of I and S to produce these effects.
The experimental data were fit by using the measured values of pr and S and optimizing over the predicted current, Ipred. Ipred was chosen as the value that minimized the two-dimensional rms error between the data at 25 to 300 pps and the predictions. The two-dimensional error was taken between a prediction frpred=ϕ([1:300] pps, Ipred, S) and the 8 experimental values. The (x,y) error for each of the 8 sampled pulse rates (k) was taken as the minimum distance between the closest point on frpred and the kth pulse rate-firing combination (ρk, frk):

errk = min over points (pr, fr) on frpred of √((pr − ρk)² + (fr − frk)²)
The total error for Ipred was the sum of the error at each of the 8 points.
The only restrictions on Ipred were that it minimized the rms error described above and that the candidate value was weighted by the size of Ipred:
The second rule was enforced because many pr-fr combinations are the same at high and low I values. Also, between 100 and 200 μA, the relationship is approximately static. With the sparse sampling of pulse rates, this weighting helped ensure Ipred was not overestimated. Note that all model errors reported in the figure are standard rms errors measured at the pulse rates tested in the experimental data.
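The two-dimensional nearest-point error described above can be sketched as follows; the prediction curve used here is a made-up stand-in for ϕ([1:300] pps, Ipred, S):

```python
import numpy as np

def ipred_error(pr_data, fr_data, fr_pred):
    """Sum over sampled pulse rates of the minimum Euclidean distance
    from each (pulse rate, firing rate) data point to the prediction
    curve, where fr_pred[k] is the predicted firing rate at k + 1 pps."""
    curve_pr = np.arange(1, len(fr_pred) + 1, dtype=float)
    total = 0.0
    for p, f in zip(pr_data, fr_data):
        d = np.sqrt((curve_pr - p) ** 2 + (fr_pred - f) ** 2)
        total += float(d.min())
    return total

# Stand-in prediction curve: firing rate equal to half the pulse rate.
curve = 0.5 * np.arange(1, 301, dtype=float)
on_curve = ipred_error([100.0, 200.0], [50.0, 100.0], curve)   # points on curve
off_curve = ipred_error([100.0], [60.0], curve)                # point off curve
```

Taking the perpendicular distance to the curve, rather than the vertical distance at each sampled pulse rate, is what makes the error two-dimensional.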
Biophysical Modeling of Vestibular Afferents
Vestibular afferents were simulated using a biophysical model to study the effects of electrical stimulation on vestibular afferents. Past work from the lab showed this model can replicate experimental firing rates and changes in firing rate with pulsatile and direct current stimulation.
An adapted version of the Hight and Kalluri model can be used for the modeling. In brief, Hight & Kalluri showed that vestibular firing can be simulated accurately by assuming cells have the same shape and size. Type I and Type II vestibular afferents are modeled as differing only in channel expression and EPSC magnitude (K). Spontaneous rate can be set by changing the average inter-EPSC arrival interval (m).
The membrane potential (V) varies as:
where, in addition to the current from each channel, membrane potential is influenced by the EPSCs arriving at the axon (Iepsc) and the injected current (Istim). The system of equations represents each cell as a single node with overall surface area S=1.1·10−5 cm2 and capacitance Cm=0.9 mF/cm2. Each channel is voltage-gated and dependent on a conductance, an open-state variable, and a closed-state variable: Na (gNa, m, h), KH (gKH, n, p), KL (gKL, w, z). The electrode was simulated at 2 mm from the afferent, which produces a firing threshold of approximately 56 μA for a typical neuron.
The simulations used to parameterize the equations were performed on a single model of an irregular vestibular afferent. It was found that conductance values of gNa=13 mS/cm2, gKH=2.8 mS/cm2, and gKL=1 mS/cm2 and EPSCs with K=1 and m=1.3 ms matched previously published experimental findings at pulse rates from 25 to 300 pps. These conductance values were used for all irregular afferent simulations.
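The single-node membrane update can be illustrated with a forward-Euler loop. The sketch below includes only a leak conductance (the full model adds the Na, KH, and KL gating variables, updated the same way at each step) and assumes conventional units of mS/cm2, μF/cm2, μA/cm2, mV, and ms; the leak parameters are placeholders, not the fitted conductances above.

```python
import numpy as np

def simulate_membrane(i_stim_uA_cm2, t_ms=50.0, dt_ms=0.01,
                      g_leak=0.3, e_leak=-65.0, cm=0.9):
    """Forward-Euler update of a single-node membrane potential.

    Units: conductance mS/cm^2, capacitance uF/cm^2, current uA/cm^2,
    voltage mV, time ms (so dV = dt * I / Cm is dimensionally consistent).
    Only a leak channel is included here; the full model adds the Na,
    KH, and KL currents and their gating variables to i_ion each step.
    """
    n = int(t_ms / dt_ms)
    v = e_leak
    trace = np.empty(n)
    for i in range(n):
        i_ion = g_leak * (v - e_leak)        # leak current
        dv = (i_stim_uA_cm2 - i_ion) / cm    # membrane equation
        v += dt_ms * dv                      # Euler step
        trace[i] = v
    return trace
```

With a constant injected current, the trace relaxes toward the steady state e_leak + i_stim/g_leak with time constant cm/g_leak, which is an easy sanity check on the integration.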
For studies of the effects of spontaneous rate on firing, the channel conductance values were kept the same, but m was set to 0.25, 0.5, 1, 2, 4, and 8 ms. To model an axon with no spontaneous activity, EPSCs were not added.
Additionally, the effect of firing regularity on induced firing rate was assessed. The irregular neuron (fr=36.6±0.9 sps, CV=0.57, where CV is the coefficient of variation) was modeled with K=1 and m=1.65 ms. A conductance-matched regular neuron (fr=33.8±0.4 sps, CV=0.09) was also modeled with gNa=13 mS/cm2, gKH=2.8 mS/cm2, gKL=0 mS/cm2, K=0.025, and m=0.09 ms.
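Firing regularity is quantified here by the coefficient of variation of the inter-spike intervals, which can be computed with a minimal helper:

```python
import numpy as np

def cv_of_isi(spike_times_ms):
    """Coefficient of variation (CV) of inter-spike intervals:
    std(ISI) / mean(ISI). CV near 0 indicates regular firing;
    larger CV (e.g. ~0.57 for the irregular afferent) indicates
    irregular firing."""
    isi = np.diff(np.sort(np.asarray(spike_times_ms, float)))
    return isi.std() / isi.mean()
```

A perfectly periodic train gives CV = 0, while jittering the spike times raises the CV toward the irregular regime.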
The effect of conductance values on the pr-fr mapping with current amplitude was also tested. Conductance values were used that produced firing rates similar to those observed in a previous in vitro experiment with and without exposure to DC current: gNa=7.8 mS/cm2, gKH=11.2 mS/cm2, gKL=1.1 mS/cm2, and K=1. m was again varied from 0.25 to 8 ms.
No evidence of pulsatile stimulation affecting the hair cell was found, so all DC hair cell-related effects (adaptation, the non-quantal effect, etc.) were not activated in these simulations. The simulation was run using the Euler method to update the membrane potential and all channel state variables at each time step.
Simulated Pulsatile Stimulation Experiments
The experiment was repeated in silico with finer sampling of current amplitudes and pulse rates. Electrodes were simulated as being placed 2 mm from the vestibular afferent axon. In addition to the pulse rates used experimentally, pulse rates from 1 to 300 pps in steps of 1 pps were delivered for 1 second. Five repetitions were performed for each combination of current amplitude, spontaneous rate, and pulse rate. Pulse amplitude was varied from 0 to 360 μA in steps of 12 μA, and the results were used to parameterize the equation variables. Interpolation between these values was used to create a smooth function for predicting induced firing rates.
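The in silico sweep amounts to a nested loop over the amplitude and pulse-rate grids, averaging repetitions. In the sketch below, `simulate` is a stand-in for a run of the biophysical model; the grids mirror those described above.

```python
import numpy as np

def sweep_pr_fr(simulate, amplitudes=np.arange(0, 361, 12),
                pulse_rates=np.arange(1, 301), n_reps=5):
    """Map induced firing rate over the (amplitude, pulse rate) grid:
    one run of `simulate(pr, I)` per repetition (1 s of stimulation in
    the full model), averaged over repetitions.

    Returns an array of shape (len(amplitudes), len(pulse_rates))."""
    fr = np.zeros((len(amplitudes), len(pulse_rates)))
    for i, I in enumerate(amplitudes):
        for j, pr in enumerate(pulse_rates):
            fr[i, j] = np.mean([simulate(pr, I) for _ in range(n_reps)])
    return fr
```

Interpolating over the resulting grid (e.g. with `np.interp` along each axis) then gives the smooth predictive function described above.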
This combination of experiments was repeated on the irregular neuron, the regular neuron, and the low-conductance/in vitro neuron. It was also repeated for all values of m to map how these effects change with different levels of spontaneous activity.
Jitter Experiment
To assess the effect of jittered pulse delivery time on induced firing rate, the same simulation was performed. Instead of delivering perfectly timed pulses, a Gaussian noise term with a standard deviation of 1 ms or 2 ms was added to the exact pulse timing to simulate delay or advancement in the delivery of regularly scheduled pulses (
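Jittered pulse trains of this kind can be generated by perturbing a regularly scheduled train with Gaussian noise; the clipping to the stimulus window in the sketch below is an assumption for illustration.

```python
import numpy as np

def jittered_pulse_times(pr_pps, duration_s=1.0, sd_ms=1.0, seed=0):
    """Regularly scheduled pulse times (ms) with Gaussian timing jitter.
    sd_ms = 0 recovers the perfectly timed train; 1 or 2 ms matches the
    jitter conditions described in the text."""
    rng = np.random.default_rng(seed)
    scheduled = np.arange(0.0, duration_s * 1e3, 1e3 / pr_pps)  # ms
    jittered = scheduled + rng.normal(0.0, sd_ms, size=scheduled.size)
    # keep pulses inside the stimulus window and in delivery order
    return np.sort(np.clip(jittered, 0.0, duration_s * 1e3))
```

The jittered train preserves the mean pulse rate while randomizing individual inter-pulse intervals, which is what probes the timing sensitivity of the induced firing rate.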
Pulse Rate and Amplitude Modulation
To test how these rules apply to sinusoidal modulation, as used in various prosthetic algorithms, pulse rate modulation (PRM) and pulse amplitude modulation (PAM) were simulated within a common range for vestibular prostheses. Pulse rates were modulated by steps of 20 or 50 pps/μA around values where non-linearities or linearities were observed in the one-second fixed pulse rate experiments (
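A PRM train can be generated by integrating the instantaneous (sinusoidally modulated) pulse rate and emitting a pulse each time the integral crosses a whole count. This integrate-and-emit construction is one common approach, offered here as a sketch rather than the exact scheme used in the simulations.

```python
import numpy as np

def prm_pulse_times(base_pps, depth_pps, mod_hz, duration_s=1.0, dt=1e-5):
    """Pulse-rate-modulated pulse times (s): emit a pulse whenever the
    integral of the instantaneous rate crosses the next whole count.
    depth_pps = 0 recovers a fixed-rate train at base_pps."""
    t = np.arange(0.0, duration_s, dt)
    rate = base_pps + depth_pps * np.sin(2 * np.pi * mod_hz * t)
    counts = np.cumsum(rate) * dt                 # expected pulse count
    targets = np.arange(1, int(counts[-1]) + 1)   # whole-count crossings
    pulse_idx = np.searchsorted(counts, targets)
    pulse_idx = np.minimum(pulse_idx, t.size - 1)
    return t[pulse_idx]
```

PAM is simpler: the pulse times stay fixed at the carrier rate and the same sinusoid instead scales each pulse's amplitude.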
Predictive Equation
The observed effects at the axon were transformed into equations that depend on measurable or controllable variables: the pulse amplitude (I) delivered from the electrode, the pulse rate, expressed as the inter-pulse interval (ρ), and the spontaneous rate (S). The equations were not optimized for real-time usage; the goal was to create equations that captured the observed effects and their mechanisms without dependence on time. The equations were made to reflect mechanism. Variables were shown to vary with I and S and were best fit to each I-S combination of the simulated afferent. All best-fit variables are marked with “′” in the following equations.
Pulse-Pulse Interactions
Pulse-Pulse Block/Partial Block (PPB)
The most substantial effect of pulses in the absence of EPSCs is pulses blocking subsequent pulses from becoming APs. The visible refractory period after a pulse is driven by an underlying change in channel dynamics that leads to a zone in which all pulses are blocked, followed by a zone of exponentially decreasing blocking effects until pulses have zero probability of being blocked (
where the length of the full block zone tb(I) changes with I (
After tb until tpb(I), the subsequent pulse is not blocked with certainty, but the probability of being blocked decreases with time. This zone is called the partial elimination (PE) zone. tpb changes with I similarly to tb (
As ρ decreases such that n>1, p′pb becomes increasingly large (
PE is implemented such that between
and tpb(n), the term takes values that linearly decrease from 1 to 0 as the pulse rate approaches
This term adds to the ceil( ) term so that the transition is not abrupt from fr=pr/n to fr=pr/(n+1) but instead has an exponential decay from one harmonic line to the next (
At midrange I, the scaling effect with I causes lower prs to transition to the next subharmonic (fr=pr/(n+1)), but it never causes a harmonic to skip past n+1. As I increases, such as at I=192 μA, this leads to a sharper transition from one line to the next (
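A simplified version of the pulse-pulse block rule can be written down directly: with a full-block window t_b after each evoked AP, regularly timed pulses give fr = pr/n with n = floor(t_b/ρ) + 1, and a partial-block ramp between t_b and t_pb smooths the jump between harmonic lines. The linear ramp below is a simplification of the exponential decay described above, and for brevity it handles only the n=1 to n=2 transition.

```python
import numpy as np

def blocked_fr(pr_pps, t_b_ms, t_pb_ms=None):
    """Firing rate under pulse-pulse block: after each evoked AP,
    pulses within the full-block window t_b are eliminated, giving
    fr = pr / n with n = floor(t_b / rho) + 1 for inter-pulse interval
    rho = 1000 / pr ms. A linear partial-block ramp between t_b and
    t_pb approximates the smoothed transition between harmonics."""
    rho = 1000.0 / pr_pps
    n = np.floor(t_b_ms / rho) + 1
    fr = pr_pps / n
    if t_pb_ms is not None and t_b_ms < rho < t_pb_ms and n == 1:
        # partial elimination: blocking probability decays toward t_pb
        p_block = (t_pb_ms - rho) / (t_pb_ms - t_b_ms)
        fr = pr_pps * (1 - p_block) + (pr_pps / 2) * p_block
    return fr
```

At low pulse rates every pulse fires (fr = pr); once the inter-pulse interval falls inside the block window, the rate drops to the pr/2 harmonic, and the ramp interpolates between the two lines.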
At very high I (I>204 μA in the mapping) suppression effects occurred that caused axonal firing to shut down in longer, unnatural dynamic loops. Two main effects at high amplitudes (Pulse Dynamic Loop (PDL) and Suppression of Future Pulses (SFP)) were implemented as variations of the PE function.
Pulse Dynamic Loop (PDL)
At the transition from n=1 to n=2, with no spontaneous activity, it can be seen that the transition overshoots and then returns to n=2 during the PE time (
Suppression of Future Pulses (SFP)
Additionally, instead of the pulses within the immediate refractory period being blocked, the delivery of an additional pulse pushes axonal dynamics to a state of full suppression after n=2. Again, it can be seen that this transition is an exponential decay to fr=0 as opposed to a sharp drop (
When spontaneous activity is included, it can be seen that, with larger spontaneous activity, the slope of this decay becomes increasingly shallow. This effect is enforced by β′(S), which increases from 1 to 250 as S increases to 130 sps.
With these terms only, the induced firing rate simplifies to:
The min term ensures a smooth transition to fr=0 without the bends that would typically occur at pulse rates that are multiples of 1/t′b.
Pulse-Spontaneous Interactions
Pulse-Spontaneous Additive (PSA), Pulse-Spontaneous Block (PSB), and Spontaneous-Pulse Block (SPB).
Regularly timed pulses break time into windows of length ρ in which the relative effects of pulses on spontaneous activity, and vice versa, can be approximated. This effect is implemented numerically with mod(ts, ρ), which shows that spontaneous spikes are distributed uniformly between 0 and ρ ms after a pulse. Thus, pulses affect some fraction of the evenly distributed spontaneous activity, and all pulses are affected to some degree by the ongoing spontaneous activity. As pr increases, the time between pulses is smaller, but a spontaneous AP remains equally likely, so a greater portion of spontaneous activity is affected per pulse. The probability of spontaneous activity being affected by pulses is called pPS. The probability of pulses being affected by spontaneous activity and producing APs is pSP. A simple approximation of these interactions would be:
where tPS is the time after a pulse when spontaneous APs are blocked.
where tSP is the time after a spontaneous AP when pulses are blocked, and T is the total length of the time window. tSP, and therefore pSP, depends only on I, as the spontaneous rate remains the same for a given neuron. With this approximation, tPS will depend only on I, and pPS will increase linearly with the pulse rate until reaching 1. To simplify fitting, p′PS and p′SP are fitted directly as functions of I. Spontaneous rate did not have large effects on these variables, so the implementation would likely not be greatly affected if they were kept the same for any spontaneous rate.
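The simple approximation can be evaluated numerically as below. The combination rule (pulses surviving spontaneous block plus spontaneous activity surviving pulse block) is an assumed first-order form for illustration; in practice the fitted p′PS and p′SP replace these closed-form probabilities.

```python
def pulse_spont_fr(pr_pps, S_sps, t_ps_ms, t_sp_ms):
    """First-order pulse-spontaneous interaction: within each
    inter-pulse window of length rho = 1000/pr ms, a fraction
    p_PS = min(1, t_PS / rho) of the uniformly distributed spontaneous
    APs is blocked by the pulse, and each pulse is blocked with
    probability p_SP = min(1, S * t_SP / 1000) of landing within t_SP
    of a spontaneous AP."""
    rho = 1000.0 / pr_pps                       # inter-pulse interval, ms
    p_ps = min(1.0, t_ps_ms / rho)              # spont APs blocked by pulses
    p_sp = min(1.0, S_sps * t_sp_ms / 1000.0)   # pulses blocked by spont APs
    return pr_pps * (1.0 - p_sp) + S_sps * (1.0 - p_ps)
```

As the text notes, p_PS grows linearly with pulse rate (rho shrinks) until saturating at 1, while p_SP is fixed by S and t_SP alone.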
These pulse-spontaneous interactions took two forms. At low amplitudes, they worked together to produce facilitation of pulses at amplitudes at which a silent axon could not produce activity (
Spontaneous-Pulse Full Block Effects
Between about 50 and 290 μA, spontaneous activity blocks pulses to a more limited degree. These effects are likely due to spontaneous APs and not EPSCs, as the quantities directly relate to S. The largest effect is that, as S increases, the same pr-fr relationships occur as in silent neurons but reduced by S (
Facilitation ends around 50 μA, but this threshold current varies with S, so it is called Ifac(S), and facilitation is included as:
The max term ensures that each term does not go negative if blocking effects exceed S or pr.
In summary, a real-time front-end for a CI could be created using an RNN. The RNN could be as small as a GRU layer of fewer than 100 nodes plus a fully connected layer and perform the transformation of natural sound to sinusoidal and step neural response predictions with less than 5 percent error. Additionally, the relationships between sound and predicted firing pattern learned on this simplified task transfer to natural sound and capture a number of non-linearities in the transformation of sound into firing rate encoding by the cochlea. These RNNs can run over 300 times faster than the only existing phenomenological model that can accurately produce natural cochlear responses to sound, and they can run at real-time speeds on the microprocessors found in modern cell phone technology. This approach produces neural implants that generate neural responses with the fine timing of natural population responses in the body and can be used to help restore pitch perception in CI users.
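The proposed front-end, a sub-100-node GRU layer feeding a fully connected readout, can be sketched framework-free in NumPy. The weights below are random placeholders (in practice they would be trained against the phenomenological model's responses), and the hidden size of 64 is an assumed example within the stated sub-100-node budget.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUFrontEnd:
    """Minimal NumPy sketch of the proposed CI front-end: one GRU layer
    (hidden size < 100) followed by a fully connected readout mapping a
    sound-feature sequence to predicted firing rates."""
    def __init__(self, n_in, n_hidden=64, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # placeholder weight scale
        self.Wz = rng.normal(0, s, (n_hidden, n_in + n_hidden))  # update gate
        self.Wr = rng.normal(0, s, (n_hidden, n_in + n_hidden))  # reset gate
        self.Wh = rng.normal(0, s, (n_hidden, n_in + n_hidden))  # candidate
        self.Wo = rng.normal(0, s, (n_out, n_hidden))            # readout
        self.n_hidden = n_hidden

    def forward(self, x_seq):
        h = np.zeros(self.n_hidden)
        out = []
        for x in x_seq:
            xh = np.concatenate([x, h])
            z = sigmoid(self.Wz @ xh)                  # update gate
            r = sigmoid(self.Wr @ xh)                  # reset gate
            h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
            h = (1 - z) * h + z * h_tilde              # GRU state update
            out.append(self.Wo @ h)                    # fully connected readout
        return np.array(out)
```

The recurrence is a handful of small matrix-vector products per audio frame, which is what makes real-time execution on phone-class microprocessors plausible.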
The computer device 1200 can also include one or more network interfaces 1208 for communicating via one or more networks, such as Ethernet adapters, wireless transceivers, or serial network components, for communicating over wired or wireless media using protocols. Computer device 1200 can also include one or more storage devices 1210 of varying physical dimensions and storage capacities, such as flash drives, hard drives, random access memory, etc., for storing data, such as images, files, and program instructions for execution by one or more processors 1202.
Additionally, computer device 1200 can include one or more software programs 1212 that enable the functionality described above. One or more software programs 1212 can include instructions that cause one or more processors 1202 to perform the processes, functions, and operations described herein, for example, with respect to the processes of
In implementations, computer device 1200 can communicate with other devices via network 1214. The other devices can be any types of devices as described above. Network 1214 can be any type of network, such as a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, and any combination thereof. Network 1214 can support communications using any of a variety of commercially-available protocols, such as TCP/IP, UDP, OSI, FTP, UPnP, NFS, CIFS, AppleTalk, and the like.
Computer device 1200 can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In some implementations, information can reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers, or other network devices may be stored locally and/or remotely, as appropriate.
In implementations, the components of computer device 1200 as described above need not be enclosed within a single enclosure or even located in close proximity to one another. Those skilled in the art will appreciate that the above-described components are examples only, as computer device 1200 can include any type of hardware componentry, including any necessary accompanying firmware or software, for performing the disclosed implementations. Computer device 1200 can also be implemented in part or in whole by electronic circuit components or processors, such as application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). Some of these microprocessors may include dedicated neural network hardware capability specifically designed to implement the artificial neural networks and RNNs described herein.
If implemented in software, the functions can be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media includes both tangible, non-transitory computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media can be any available tangible, non-transitory media that can be accessed by a computer. By way of example, and not limitation, such tangible, non-transitory computer-readable media can comprise RAM, ROM, flash memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes CD, laser disc, optical disc, DVD, floppy disk and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing description is illustrative, and variations in configuration and implementation can occur to persons skilled in the art. For instance, the various illustrative logics, logical blocks, modules, and circuits described in connection with examples of the present disclosure disclosed herein can be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), cryptographic co-processor, or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor can be a microprocessor, but, in the alternative, the processor can be any conventional processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
In one or more examples, the functions described can be implemented in hardware, software, firmware, or any combination thereof. For a software implementation, the techniques described herein can be implemented with modules (e.g., procedures, functions, subprograms, programs, routines, subroutines, modules, software packages, classes, and so on) that perform the functions described herein. A module can be coupled to another module or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, or the like can be passed, forwarded, or transmitted using any suitable means including memory sharing, message passing, token passing, network transmission, and the like. The software codes can be stored in memory units and executed by processors. The memory unit can be implemented within the processor or external to the processor, in which case it can be communicatively coupled to the processor via various means as is known in the art.
While the preferred embodiments have been shown and described, it will be understood that there is no intent to limit the invention by such disclosure, but rather, is intended to cover all modifications and alternate constructions falling within the spirit and scope of the invention.
This application is the national stage entry of International Patent Application No. PCT/US2022/017087, filed on Feb. 18, 2022, and published as WO 2022/178316 A1 on Aug. 25, 2022, which claims the benefit of U.S. Provisional Patent Application Ser. No. 63/150,829, filed Feb. 18, 2021, both of which are hereby incorporated by reference herein in their entireties.
This invention was made with government support under grant NS092726 awarded by the National Institutes of Health and grant DGE1746891 awarded by the National Science Foundation. The Government has certain rights in the invention.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/017087 | 2/18/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63150829 | Feb 2021 | US |