My motivation comes from the fact that animal brains, and in particular the human brain, the subject of this course, are themselves genetically evolved neural networks. Artificial neural networks trained by genetic algorithms are therefore a rudimentary but reasonable starting model for understanding the hardware of the brain. This sentiment is echoed in my primary reference, Evolutionary Algorithms for Neural Network Design and Training, Branke et al. (1995). However, the paper discusses the idea mostly qualitatively, and since the algorithm is relatively simple to implement, I decided my understanding would benefit far more from implementing it myself on some simple networks.
Background
McCulloch-Pitts Neurons
Artificial neural networks
Training artificial neural networks
| Input 1 | Input 2 | Desired output |
|---------|---------|-----------------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
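For concreteness, a McCulloch-Pitts style threshold unit of the kind described above can be sketched in a couple of lines of R (an illustration of mine, not part of the code used later; the function name and example values are arbitrary):

```r
# A McCulloch-Pitts style unit: it fires (outputs 1) exactly when the
# weighted sum of its inputs exceeds its threshold.
neuron <- function(inputs, weights, threshold){
  as.numeric(sum(inputs * weights) > threshold)
}
neuron(c(1, 0), c(0.7, 0.7), 0.5)   # 1: the weighted sum 0.7 exceeds the threshold 0.5
```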
Limitations of learning-rule-based weight training
- The number of nodes in the network.
- Their connectivity, i.e. the adjacency matrix (whose $(i,j)$-th element is 0 or 1 depending on whether the $i$-th and $j$-th nodes are connected).
- The weights of the connections. The connectivity and the weights may be combined into finding a single weighted adjacency matrix, where a zero matrix element implies that two nodes are disconnected and a non-zero element specifies the weight of the connection (a small sketch of this representation follows the list).
- The thresholds of firing of the neurons.
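To make the weighted-adjacency-matrix representation concrete, here is a small illustrative example (mine, not taken from the reference or from the code below; the weights are arbitrary):

```r
# A weighted adjacency matrix for a 3-node network: a zero entry means the two
# nodes are disconnected, a non-zero entry is the weight of their connection.
W <- matrix(c( 0.0,  0.8,  0.0,
               0.8,  0.0, -1.2,
               0.0, -1.2,  0.0), nrow = 3, byrow = TRUE)
W[1, 2]         # weight of the connection between nodes 1 and 2
W[1, 3] == 0    # TRUE: nodes 1 and 3 are not connected
```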
Genetic Algorithms
Implementation
- For the sake of simplicity, and owing to limited time, the only floating variables were the weights and thresholds, i.e. the number and connectivity of the neurons were fixed. It is not hard to imagine extending this implementation to float those parameters as well.
- The fitness function is chosen to be the sum of the absolute differences between the correct answers and the answers given by the network. For example, consider the situation when we are trying to model an OR gate and our network gives us the following answers:
| Input 1 | Input 2 | Actual answer | Network's answer |
|---------|---------|----------------|-------------------|
| 0 | 0 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 |
Then the fitness of the network is going to be $|0-1| + |1-0| + |1-0| + |1-1| = 3$. By this definition, then, the lower the fitness, the better the network, and a fitness of zero means that the network achieves the desired behaviour for all inputs. As long as the fitness measure ranks the individuals accurately based on their performance, its exact form is irrelevant to the working of the algorithm. (A small numerical sketch of this computation is given after this list.)
- The reproduction is asexual, i.e. there is no mating or crossover. In each iteration the fittest individual is selected and a new generation is created by mutating each of its parameters.
- To speed up convergence and make it monotonic, the fittest parent is carried over into the new generation, so that if all of the offspring turn out to be less fit than the parent, the parent is simply selected again. This ensures there is no retrograde evolution.
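As an illustration of the fitness definition (my own sketch, not part of the original program), the fitness of the hypothetical OR-gate network in the table above can be computed directly from its answers:

```r
# Illustration of the fitness definition on the OR-gate example above.
# "actual" are the correct OR outputs, "network" the (hypothetical) network's answers.
actual  <- c(0, 1, 1, 1)
network <- c(1, 0, 0, 1)
sum(abs(actual - network))   # = 3; lower is better, 0 means perfect behaviour
```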
Code
```r
# A two-layer threshold network for the XOR gate:
# w[1..4] are input-to-hidden weights, w[5..6] hidden-to-output weights,
# w[7..9] the firing thresholds of the two hidden neurons and the output neuron.
xornet <- function(i1, i2, w){
  m1 <- i1*w[1] + i2*w[3] > w[7]
  m2 <- i1*w[2] + i2*w[4] > w[8]
  return(m1*w[5] + m2*w[6] > w[9])
}

# Calculates the fitness of a network defined by a set of parameters:
# the sum of absolute differences between the network's answers and the correct answers.
fitness <- function(w){
  s <- 0
  for (i1 in c(0,1)){
    for (i2 in c(0,1)){
      s <- s + abs(xornet(i1, i2, w) - xor(i1, i2))
    }
  }
  return(s)
}

# Initialize a random population of weight vectors
pop <- NULL
for (i in 1:10) {
  pop <- rbind(pop, rnorm(9))
}

ftraj  <- NULL   # fitness of the fittest individual, generation by generation
wspace <- NULL   # parameter vectors that solve the problem (fitness zero)

for (gen in 1:1000){
  # Sort the population according to fitness
  fitnesslist <- NULL
  for (i in 1:dim(pop)[1]){
    fitnesslist <- c(fitnesslist, fitness(pop[i,]))
  }
  pop <- cbind(pop, fitnesslist)
  pop <- pop[order(pop[,10]),]

  # Record the parameter vectors that work (here only the first two parameters).
  # These can be plotted to visualize the solution region in parameter space.
  for (i in 1:dim(pop)[1]){
    if ((pop[,10] == 0)[i]){
      wspace <- rbind(wspace, c(pop[i,1], pop[i,2]))
    }
  }

  # Record the best fitness of this generation
  ftraj <- rbind(ftraj, fitness(pop[1,]))

  # Generate the new generation by mutating the fittest individual;
  # the fittest individual itself is carried over so that evolution is monotonic.
  fittest <- pop[1,]
  pop <- NULL
  for (i in 1:9){
    pop <- cbind(pop, rnorm(10, mean = fittest[i], sd = 1))
  }
  pop <- rbind(fittest[1:9], pop)
}

plot(ftraj, pch = 19, xlab = "Generation", ylab = "Fitness of the fittest")
lines(ftraj)
```
This code implements a neural network for an XOR gate; the XOR-specific parts are the xornet function and the call to xor inside fitness, and those are the lines that have to be changed accordingly for other networks.
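For instance, the ornet function used for the OR-gate experiments below is not reproduced in this write-up; a sketch consistent with the description there (a single neuron with two weights and the threshold fixed at 1) might look like the following. The function name fitness_or and the exact form are mine, and the rest of the program would also need its weight-vector length and fitness column index adjusted:

```r
# Hypothetical sketch of an OR-gate network with the threshold fixed at 1,
# as described in the OR-gate section; the author's actual ornet is not shown here.
ornet <- function(i1, i2, w){
  return(i1*w[1] + i2*w[2] > 1)
}

# Corresponding fitness: compare against the OR truth table instead of XOR
fitness_or <- function(w){
  s <- 0
  for (i1 in c(0,1)){
    for (i2 in c(0,1)){
      s <- s + abs(ornet(i1, i2, w) - (i1 | i2))
    }
  }
  return(s)
}
```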
2-input OR Gate
Analytical observations
For a single neuron with input weights $w_1$ and $w_2$ and firing threshold $t$, the output is 1 exactly when $i_1 w_1 + i_2 w_2 > t$. For the neuron to behave as an OR gate:
- For (0,0): the output must be 0, i.e. $0 \cdot w_1 + 0 \cdot w_2 \le t$, i.e. $t \ge 0$.
- For (1,0): the output must be 1, i.e. $w_1 > t$.
- Similarly for (0,1), we have $w_2 > t$.
- For (1,1): the output must be 1, i.e. $w_1 + w_2 > t$, which is automatically satisfied as long as the previous three conditions are satisfied (since $w_1 > t$ and $w_2 > t$ give $w_1 + w_2 > 2t \ge t$ whenever $t \ge 0$).
In my simplified implementation I further fixed the threshold at $t = 1$ (as can be seen in the definition of the ornet function in the code), so that the search space of parameters reduces to the 2D space of $(w_1, w_2)$ and the solution region is defined by the area $w_1 > 1$, $w_2 > 1$, illustrated below:
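One way such an illustration can be produced from a run is sketched below (my own snippet, not from the original listing): it plots the zero-fitness parameter vectors collected in the wspace variable of the code above, together with the analytically derived boundaries.

```r
# Sketch: scatter of the zero-fitness (w1, w2) vectors from an OR-gate run,
# with dashed lines marking the analytically derived boundaries w1 = 1 and w2 = 1.
plot(wspace, pch = 19, xlab = "w1", ylab = "w2")
abline(v = 1, lty = 2)   # solutions should satisfy w1 > 1
abline(h = 1, lty = 2)   # ... and w2 > 1
```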
Results
2-input AND gate
Analytical observations
With the same single-neuron model, the conditions for an AND gate are:
- For (0,0): the output must be 0, i.e. $0 \cdot w_1 + 0 \cdot w_2 \le t$, i.e. $t \ge 0$.
- For (1,0): the output must be 0, i.e. $w_1 \le t$.
- Similarly for (0,1), we have $w_2 \le t$.
- For (1,1): the output must be 1, i.e. $w_1 + w_2 > t$, which is no longer automatically satisfied. (A small numerical check of the resulting region is sketched below.)
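If the threshold is again fixed at 1 (an assumption of mine, mirroring the OR-gate case; the write-up does not state this for the AND gate), the conditions above describe the region $w_1 \le 1$, $w_2 \le 1$, $w_1 + w_2 > 1$. A quick numerical check of that region can be sketched as follows; andnet and is_and are hypothetical helpers, not part of the original code:

```r
# Hypothetical check, assuming a single neuron with its threshold fixed at 1.
andnet <- function(i1, i2, w) i1*w[1] + i2*w[2] > 1

# A weight vector realizes AND when it is silent on (0,0), (0,1), (1,0) and fires on (1,1)
is_and <- function(w){
  !andnet(0, 0, w) && !andnet(0, 1, w) && !andnet(1, 0, w) && andnet(1, 1, w)
}

# Sample the (w1, w2) plane and plot the points that realize AND;
# they should fill the region w1 <= 1, w2 <= 1, w1 + w2 > 1.
grid <- expand.grid(w1 = seq(-2, 2, by = 0.05), w2 = seq(-2, 2, by = 0.05))
ok <- apply(grid, 1, is_and)
plot(grid$w1[ok], grid$w2[ok], pch = ".", xlab = "w1", ylab = "w2")
```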
Results
2-input XOR gate
Analytical observations
Results
Conclusions
- Compared with learning-rule-based weight training, the genetic algorithm is more general: it can be used to find the number of nodes, the connectivity and the firing thresholds as well, and is therefore easily scalable to large and complex networks.
- It is an entirely unguided method and requires little design and foresight.
- As a side-effect, it also provides a way to visualize the solution region in the search space, by plotting the solution vectors reached from different starting conditions.