AS024 » C. Elegans Neural Network Hardware Implementation
Caenorhabditis elegans (C. elegans) is one of the simplest organisms with a nervous system. In the hermaphrodite, this system comprises 302 neurons, whose connections have been comprehensively mapped in what is known as a connectome.
Currently, most work on simulating this neural network is done in software. What little work exists in hardware emulation takes a per-neuron approach, implementing each neuron inside a separate FPGA IC. That works for studying this particular nervous system, which has a very low neuron count, but it is not feasible for bigger nervous systems made up of millions (mice) or billions (humans) of neurons. For those, we need to integrate many neurons in one device and optimize the neuron implementation to use as few resources as possible.
Why? Right now, we have the full connectome of C. elegans (meaning all neurons and their synapses have been mapped), which was completed in 1986.
There are already active projects, like the CIC's Mouse Connectome Project and the Human Connectome Project, which aim to fully map the mouse and human brains, respectively.
Once these connectomes have been mapped, a suite like the one proposed here will be very useful for simulating these networks with low-latency, real-time results, which would bring huge benefits to neuroscience and psychology.
The aim of this project is twofold:
First, to implement the C. elegans neural network efficiently, using as few resources as possible, and to write the corresponding software (or, if available, integrate with an open source solution) to visualize the results on a computer, with the data transmitted via a high-speed interface like PCIe.
Second, to lay the groundwork for a complete suite for simulating biological neural networks in hardware and visualizing the results (be it neuron status or, in the case of a simple organism like C. elegans, the actual organism moving, driven not by a software simulation but by the hardware implementation).
The whole C. Elegans neural network will be implemented inside the Cyclone V FPGA from the OpenVINO Starter Kit. Alongside the network, an interface will be implemented to transmit the necessary data to a computer via the PCIe connector.
There are several options for the neuron model, the most common ones listed here, each with its own pros and cons. All of them will be tested, and the one that offers the best performance-area tradeoff will be selected for the final design. If size allows, multiple models will be implemented inside the same Cyclone V and their measurements compared in real time, drawing conclusions from the performance per area of each model.
On the computer side, data will be received and stored in RAM, to be read later and displayed to the user with different measurement options.
For data visualization, it would be ideal if current C. elegans visualization software could be used, simply replacing its input data with the data from the hardware implementation. If that is not possible, a simpler neuron status visualization tool will be designed.
* The actual neural model implementation will be selected based upon which gives the best area-performance tradeoff.
The C. elegans neural network has been simulated in software many times before, but an FPGA device brings many advantages to this application.
Real Time: Because the implementation is in hardware rather than software, we can simulate the organism's response in real time. This is needed for accurate results in more complex organisms, where implementing fixed delays for all neurons would be extremely computationally expensive.
Latency: The low latency of an FPGA allows for a highly accurate simulation. A well-implemented neural network in an FPGA, with no unneeded delays, provides an excellent environment for simulation.
Scalability: This is one of the most important aspects when thinking about the future of this field. The work done on a simple 302-neuron nematode can be scaled up to a million-neuron rodent (which is believed to be the next step), given only the configuration parameters of each neuron, the number of distinct neuron types in the organism, and its connections. With an FPGA implementation, if a bigger connectome were mapped in the near future, the only requirement for implementing it would be to connect together the needed number of FPGAs (this project aims to prove that this number can be vastly reduced). The result would run in real time, with a reconfigurable interface, the ability to probe each neuron individually, and the ability to study the effects of changes in these networks (e.g. the effect that damaged neuron pathways can have on the system).
Although software simulations of this nematode's nervous system are available, their performance is poor: producing 1 second of simulated data takes 92 seconds of computing on average on a low-end laptop (tested with an i3-7100U processor, running the latest C. elegans software simulation, c302, emulating all neurons).
Neuroscience, and more specifically, computational neuroscience, is a fast-growing field of science, which can benefit greatly from FPGA technology. In this particular case, it would be desirable if real-time performance could be achieved, inside a single FPGA, which would open doors to more interesting projects, as well as cut down on computation time, and cost, needed for simulations.
This is what this project achieves: it outputs the potential of each individual neuron in real time, thus achieving ~90 times* more data throughput than the software simulation.
*(Based on tests performed on a low-end laptop. On higher-end processors the results could be closer; the performance increase of the FPGA might then be ~45 times instead of ~90, but this remains to be tested on more machines.)
The leaky integrate-and-fire model was used for each neuron, with parameters where applicable. What follows is an electrical representation of the model.
And here is the equation that describes it.
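For reference, the standard textbook form of the leaky integrate-and-fire equation, as presented by Gerstner and Kistler, is reproduced below. The symbol names are the conventional ones and are assumed here rather than taken from this project's design files: u is the membrane potential, u_rest the resting potential, I(t) the input current, R and C the membrane resistance and capacitance, and \vartheta the firing threshold.

```latex
\tau_m \frac{du}{dt} = -\bigl[u(t) - u_{\mathrm{rest}}\bigr] + R\,I(t),
\qquad \tau_m = R\,C

% Firing rule: when u reaches the threshold, a spike is emitted and u is reset.
u(t) \ge \vartheta \;\Rightarrow\; \text{spike at } t, \quad u \leftarrow u_{\mathrm{reset}}
```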
Further reading on the model can be found in Gerstner and Kistler's "Spiking Neuron Models", which is freely available online via Cambridge University Press; a link can be found in the references.
In a neural network, you have to worry not only about the neurons themselves, but about their connections as well.
In fact, most of the hardware in the implemented design ended up being utilized by the axons.
Synapses and Axons are the other two components needed for this design.
Synapses are simply delays, which represent the time an ion signal takes to travel from the axon to the neuron. (Strictly speaking, in reality the delay occurs in the axon itself, but since we are modeling a digital system, delaying every input of the axon is interchangeable with delaying its single output. We chose the latter because it takes fewer registers to implement the same amount of delay.)
Axons represent the junction of all the dendrites, which is nothing more than an adder in which every element being added is first multiplied by its own weight. This represents the fact that each dendrite carrying the signal from one neuron to the axon has a different physical size, which changes its resistance and therefore the "weight" of the effect that the voltage has on generating current.
Real-time performance has been achieved, with no perceivable error in neuron firing rate. The voltage of each neuron does show a small error compared to the simulation, but this is to be expected from the "optimizations" that replace multipliers with arithmetic shifts, which introduce some error. The important parameter is the actual firing time, which shows no perceivable error or drift over time.
Resource utilization is quite high, but the design fits in the OpenVINO Starter Kit's Cyclone V with more than enough space, as seen below.
Below, the Leaky Integrate and Fire neuron model implementation can be seen.
Some optimizations have been made. The original LIF equation requires a division by the time constant; here, the time constant has been approximated by a power of two, so the division can be replaced by a simple arithmetic right shift. The same goes for the multiplication by the time delta required by Euler's method, which implements the integration. And since the input current is scaled up by one of the factors of the time constant (R), that multiplication can be replaced by an arithmetic left shift. This not only reduces the hardware required per neuron, but also cancels out some of the error introduced by rounding to a power of two.
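The shift-based update described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the project's Verilog: the function name, the integer scaling, and every constant (threshold, reset value, shift amounts) are hypothetical values chosen for illustration. Python's `>>` on negative integers rounds toward negative infinity, which matches an arithmetic right shift on a signed value.

```python
def lif_step(v, i_in, *, v_rest=0, v_thresh=1000, v_reset=0,
             r_shift=2, tau_dt_shift=4):
    """One Euler step of a leaky integrate-and-fire neuron using adds and shifts.

    All parameters are illustrative assumptions:
      r_shift      -- R approximated as 2**r_shift (left shift instead of multiply)
      tau_dt_shift -- dt / tau approximated as 2**-tau_dt_shift (right shift
                      instead of a multiply and a divide)
    Returns (new_voltage, fired).
    """
    # Leak term plus input drive: -(v - v_rest) + R * I
    drive = (v_rest - v) + (i_in << r_shift)
    # Euler update: v += (dt / tau) * drive, done as an arithmetic right shift
    v_next = v + (drive >> tau_dt_shift)
    if v_next >= v_thresh:
        return v_reset, True   # spike and reset
    return v_next, False


def run(i_in, steps):
    """Drive one neuron with a constant input; return the steps at which it fired."""
    v, spikes = 0, []
    for n in range(steps):
        v, fired = lif_step(v, i_in)
        if fired:
            spikes.append(n)
    return spikes
```

With these assumed constants, a constant input whose scaled value stays below the threshold never makes the neuron fire, while a larger input produces evenly spaced spikes, since the integer update is deterministic and every spike resets to the same voltage.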
In some neurons, current flows chemically via ions, so transmission has a measurable delay. We simulate this delay with a shift register of configurable size. In the next example, a synapse with a delay of 6 is shown.
It is worth noting that a small modification has been made to the synapse delay. Biologically, the delay would occur before the axon, so every connection would need its own delay, all with the same value. Instead, we place the synapse delay after the axon, applying it after the voltages have been added, so that only one delay is needed per axon rather than one per synapse.
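The equivalence behind this choice can be checked with a small software model. The sketch below is an illustration in Python, with hypothetical class and function names, of a shift-register delay line and of the two arrangements: one delay per input before the weighted sum, and a single delay after it. Because the weighted sum is linear and all delays are equal, both arrangements give the same output sequence, while the second needs only `delay` register stages instead of `delay` stages per input.

```python
from collections import deque


class DelayLine:
    """Fixed-length shift register: a value fed at step n comes out at step n + delay."""

    def __init__(self, delay):
        if delay < 1:
            raise ValueError("delay must be at least 1")
        # One slot per register stage, all cleared at reset.
        self.stages = deque([0] * delay, maxlen=delay)

    def step(self, value):
        out = self.stages[0]        # oldest sample leaves the register
        self.stages.append(value)   # newest enters; maxlen drops the oldest
        return out


def delay_then_sum(inputs, weights, delay):
    """One delay line per input, then the weighted sum (per-synapse arrangement)."""
    lines = [DelayLine(delay) for _ in weights]
    return [
        sum(w * line.step(v) for w, line, v in zip(weights, lines, sample))
        for sample in inputs
    ]


def sum_then_delay(inputs, weights, delay):
    """Weighted sum first, then a single delay line (one delay per axon)."""
    line = DelayLine(delay)
    return [
        line.step(sum(w * v for w, v in zip(weights, sample)))
        for sample in inputs
    ]
```

Here `inputs` is a list with one tuple of input voltages per clock step, and a delay of 6 corresponds to the synapse example above.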
Each input to the neuron has a "weight" by which it is multiplied before being added to the other inputs on its way to the neuron; this is the axon block. It takes a parameter that indicates how many voltages go in, multiplies each voltage by its corresponding weight, and adds them all up to obtain the final voltage.
Next, an axon with 2 input voltages is shown, but this number ranges from 1 to 146 in the final model.
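As a behavioural illustration of the axon block, the weighted adder can be modelled in a few lines of Python. The class name and interface are hypothetical; the number of inputs is fixed when the object is built, playing the role of the block's input-count parameter.

```python
class Axon:
    """Weighted adder with a fixed number of inputs (illustrative model only)."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.n_inputs = len(self.weights)   # plays the role of the input-count parameter

    def __call__(self, voltages):
        if len(voltages) != self.n_inputs:
            raise ValueError(
                f"expected {self.n_inputs} voltages, got {len(voltages)}"
            )
        # Multiply each voltage by its weight and add everything up.
        return sum(w * v for w, v in zip(self.weights, voltages))


# A 2-input axon, as in the example above; the weights are made-up values.
two_input_axon = Axon([3, -1])
```

Calling `two_input_axon([10, 20])` multiplies 10 by 3 and 20 by -1 and adds the products.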
A simple example of 3 connected neurons is shown below to demonstrate how the neurons connect with each other, since this is not possible to see in the full design. All three are instances of the same neuron, ADAL, with neurons 1 and 2 excited and neuron 3 fed with the outputs of neurons 1 and 2.
From this simple 3-neuron test it becomes evident that it would be impossible to instantiate the full C. elegans neural system by hand. These 3 neurons alone, with only 3 synapses, take 90 lines of Verilog for instantiation alone; if this scaled linearly (it doesn't), it would mean writing over 9000 lines of Verilog just for instantiation. For this reason, a bash script was developed to auto-instantiate every module from the CSV data of the C. elegans connectome.
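To give an idea of what such a generator does, here is a minimal sketch written in Python rather than bash. Everything specific in it is an assumption made for illustration: the CSV column names, the `axon` module name, its ports and parameter, and the 16-bit weight literals are hypothetical and are not taken from the project's script or Verilog sources.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout (one row per connection); the real file may differ.
SAMPLE_CSV = """pre,post,weight
N1,N3,2
N2,N3,1
"""


def load_inputs(csv_text):
    """Group connections by target neuron: {post: [(pre, weight), ...]}."""
    inputs = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        inputs[row["post"]].append((row["pre"], int(row["weight"])))
    return inputs


def emit_axon(post, sources):
    """Emit one instantiation of a hypothetical `axon` module feeding `post`."""
    v_in = ", ".join(f"v_{pre}" for pre, _ in sources)
    weights = ", ".join(f"16'd{w}" for _, w in sources)  # assumes non-negative weights
    return (
        f"axon #(.N_INPUTS({len(sources)})) axon_{post} (\n"
        f"    .clk(clk),\n"
        f"    .v_in({{{v_in}}}),\n"
        f"    .weights({{{weights}}}),\n"
        f"    .v_out(axon_out_{post})\n"
        f");"
    )


def generate(csv_text):
    """Return the instantiation text for every neuron that has inputs."""
    inputs = load_inputs(csv_text)
    return "\n\n".join(emit_axon(post, inputs[post]) for post in sorted(inputs))
```

Calling `generate(SAMPLE_CSV)` produces a single instantiation for the neuron that receives both connections; a full generator would emit the neuron and synapse instances in the same way.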
Finally, we see the fully instantiated C. elegans neural system, with 302 neurons, their corresponding 302 axons, and more than 6000 synapses between them. To give a sense of scale, it is worth noting that the tiny blocks shown inside the design are not primitives but sub-blocks: neurons, synapses and axons.