# Lyapunov exponent of the logistic map (Mathematica Code)

In a previous post I’d shown a way to get the Lyapunov exponent from the time series data of any map. In this quick tutorial, I’ll show you a cleaner way to get the Lyapunov exponent for the specific case of the logistic map, and then using a really short script in Mathematica, plot it against r.

First the mathematical analysis that simplifies the expression for the Lyapunov exponent for a map, and particularly the logistic map. This discussion follows this article.

Suppose the initial infinitesimal perturbation is $\delta x_{0}$. Then we have, for $n\rightarrow\infty$:

$\left|\delta x_{n}\right|=\left|\delta x_{0}\right|e^{\lambda n}\Rightarrow e^{\lambda n}=\underset{\delta x_{0}\rightarrow0}{\lim}\left|\frac{\delta x_{n}}{\delta x_{0}}\right|=\left|\frac{dx_{n}}{dx_{0}}\right|$.

But we can write:

$\frac{dx_{n}}{dx_{0}}=\frac{dx_{n}}{dx_{n-1}}\frac{dx_{n-1}}{dx_{n-2}}\ldots\frac{dx_{1}}{dx_{0}},$

where each $x_{i}=f(x_{i-1})$. So we have:

$\frac{dx_{n}}{dx_{0}}=f'(x_{n-1})f'(x_{n-2})\ldots f'(x_{0}).$

Therefore,

$\;e^{\lambda n}=\left|f'(x_{n-1})f'(x_{n-2})\ldots f'(x_{0})\right|$

$\Rightarrow\lambda(r;x_{0})=\underset{n\rightarrow\infty}{\lim}\frac{1}{n}\ln\left|f'(x_{n-1})f'(x_{n-2})\ldots f'(x_{0})\right|$

$=\underset{n\rightarrow\infty}{\lim}\frac{1}{n}\sum\limits_{k=0}^{n-1}\ln\left|f'(x_{k})\right|.$

For the logistic map, $f'(x)=r(1-2x)$.
So we have:

$\lambda(r;x_{0})=\underset{n\rightarrow\infty}{\lim}\frac{1}{n}\sum\limits_{k=0}^{n-1}\ln\left|r(1-2x_{k})\right|$.

We can put the above formula in a short Mathematica script to obtain $\lambda$ as a function of $r$ and plot it. The following is the code:

\[Lambda][r_] := Module[{f, l},
f[x_] := r x (1 - x);
l[x_] := Log[Abs[r (1 - 2 x)]];
Mean[l[NestList[f, 0.1, 1*^2]]]];
Plot[\[Lambda][r], {r, 0, 4}, PlotStyle -> Thickness[.0001],
AxesLabel -> {"r", "\[Lambda](r)"}]


And the following is the output:
In the line that uses Nestlist, we specify the starting point of the trajectories. However, I noticed that the output does not depend on the starting point.

# Training Neural Networks with Genetic Algorithms

In this blog post I present my findings of an independent analytical and computational study of using genetic algorithms to train neural networks. This was my final project for an Introduction to Cognitive Science course that I took at The University of Texas at Austin, under Dr. David Beaver.
My motivation comes from the fact that animal brains, and particularly the human brain which is the topic of this entire course, are themselves genetically evolved neural networks. Therefore, artificial neural networks trained by genetic algorithms are a good starting rudimentary model of understanding the hardware of the brain. This sentiment is echoed in my primary reference, Evolutionary Algorithms for Neural Network Design and Training, Branke et al (1995). However, the paper mostly discusses the idea qualitatively, and I decided that since it is a relatively simple algorithm to implement, I would benefit my understanding to a much greater extent by implementing it myself on some simple networks.

## Background

### McCulloch-Pitt Neurons

A McCulloch-Pitt neuron is a simple decision-making unit that forms the building block for the kind of artificial neural networks I have constructed and studied. It has a set of inputs (each of which is a 0/1 bit) connected to it via weighted links. It checks if the weighted sum of the inputs exceeds a specified threshold $t$, and passes an output 1 if it does, and 0 if not.

### Artificial neural networks

Neurons of the above kind can be connected and their weights and thresholds set in ways to have their action emulate desired computational operations. This is called an artificial neural network. The following are two simple networks that act as 2-input OR and AND gates:
Each network has weights of 1 each. The OR gate has a threshold of 0.5 whereas the AND gate has a threshold of 1.5.
It should already become clear that there isn’t a unique set of weights and thresholds to have a neural network perform a certain task. We shall delve more on this later.

### Training artificial neural networks

What if we wish our network to perform a certain kind of operation but we do not know what weights and thresholds will do the work? First, we need to define in a simple way what is meant by a certain operation. It means mapping the set of possible input tuples to their right answers. This is called a truth table. The following is the truth table for a two-input XOR gate for example:
 $\boldsymbol{i_{1}}$ $\boldsymbol{i_{2}}$ $\boldsymbol{o}$ 0 0 0 0 1 1 1 0 1 1 1 0
A standard way of converging on the right set of weights for a neural network prescribed to perform a certain task is by the use of learning rules. These rules update the weights based on the difference between the correct answer for a set of inputs and the actual answer.

### Limitations of learning-rule based weight training

However, this simple method often fails to converge because the number of nodes in the network and the connections between them may be wrong to start with, such that no set of weights between them would provide the desired operation. We can see this when we use the simple connectivity of the OR and AND gates illustrated above and try to emulate a XOR gate and the process does not converge.
Therefore, in a more general case the problem might be extended to include the structure of the network as well, and now the question extends to finding:
1. The number of nodes in the network.
2. Their connectivity, i.e. the adjacency matrix (the $i,j$ th element of this is 0 or 1 depending on whether the $i$ th and $j$ th nodes are connected).
3. The weights of the connections. 2 and 3 may be combined into finding a weighted adjacency matrix, where a zero matrix element implies that two nodes are disconnected, and a non-zero element specifies the weight of the connection.
4. The thresholds of firing of the neurons.
Some of these variables are discrete and others are continuous, and they are interrelated in a complex way. All of this means that learning rules cannot be simply generalized to find all the variables using an updating scheme. Herein lies the utility of genetic algorithms.

### Genetic Algorithms

Genetic algorithms are commonly used where the problem can be formulated as a search in several dimensional parameter space, and a fitness function can be attached to any choice of parameters (a measure that mirrors how close the solution represented by those parameters is to the desired solution). The following schematic illustrates a typical genetic algorithm.
In the context of constructing a neural network, genetic algorithms provide a natural method of solution in the general case where all of the mentioned variables are floating and can be concatenated into a string.

## Implementation

There are some important points to note about my particular implementation of the method:
1. For the sake of simplicity and owing to limited time, the only floating variables were the weights and thresholds, i.e. the number and connectivity of neurons was fixed. It is not hard to imagine extending this implementation to float those parameters as well.
2. The fitness function is chosen to be the sum of the absolute differences between the correct answers and the answers given by the network. For example, consider the situation when we are trying to model an OR gate and our network gives us the following answers:
 $\boldsymbol{i_{1}}$ $\boldsymbol{i_{2}}$ Actual answer Network’s answer 0 0 0 1 0 1 1 0 1 0 1 0 1 1 1 1

Then the fitness of the network is going to be $|0-1|+|1-0|+|1-0|+|1-1|=3$. By this definition, then, the lower the fitness, the better the network, and a fitness of zero means that the network achieves the desired behaviour for all inputs. As long as the fitness measure ranks the individuals accurately based on their performance, the exact form of the fitness is irrelevant to the working of the algorithm.

1. The reproduction is sexless, i.e. there is no mating. This means that in each iteration, the fittest individual is selected and a new generation is created by mutating each of its parameters.
2. To speed up convergence and make it monotonic, if in an iteration all of the offspring are less fit than the parent, the parent is selected again. This ensures there is no retrograde evolution.

### Code

The following is the program in R:
xornet<-function(i1,i2,w){
m1=i1*w[1]+i2*w[3]>w[7]
m2=i1*w[2]+i2*w[4]>w[8]
return(m1*w[5]+m2*w[6]>w[9])
}

#calculates the fitness of a network defined by a set of parameters
fitness<-function(w){
s=0
for (i1 in c(0,1)){
for (i2 in c(0,1)){
s=s+abs(xornet(i1,i2,w)-xor(i1,i2))
}
}
return(s)
}

#initialize random population of weight vectors
pop<-NULL
for (i in 1:10)
{
pop<-rbind(pop, rnorm(9))
}
ftraj<-NULL
wspace<-NULL
for (gen in 1:1000){
#sort according to fitness
fitnesslist<-NULL
for (i in 1:dim(pop)[1]){
fitnesslist<-c(fitnesslist, fitness(pop[i,]))
}
pop<-cbind(pop,fitnesslist)
pop<-pop[order(pop[,10]),]
#list the parameter vectors that work. Can be plotted to visualize the solution region in parameter space
for (i in 1:10){
if ((pop[,10]==0)[i]){
wspace<-rbind(wspace,c(pop[i,1],pop[i,2]))
}
}
#list highest fitness against generations
ftraj<-rbind(ftraj,fitness(pop[1,]))

#generate new generation by mutations from fittest
fittest=pop[1,]
pop<-NULL
for (i in 1:9){
pop<-cbind(pop, rnorm(10, mean = fittest[i], sd = 1))
}
pop<-rbind(fittest[1:9],pop)
}
plot(ftraj, pch=19, xlab="Generation", ylab="Fitness of the fittest")
lines(ftraj)


This code is implementing a neural network for a XOR gate, which corresponds to the highlighted lines. Those are the lines that have to be changed accordingly for other networks.

### 2-input OR Gate

#### Analytical observations

In order for the above network to reproduce the truth table of an OR gate, we must have the following:
1. For (0,0): $0.w_{1}+0.w_{2} \leq t$, i.e. $t \geq 0$.
2. For (1,0): $1.w_{1}+0.w_{2}>t$, i.e. $w_{1}>t$.
3. Similarly for (0,1), we have $w_{2}>t$.
4. For (1,1): $1.w_{1}+1.w_{2}>t$, i.e. $w_{1}+w_{2}>t$, which is automatically satisfied as long as 2 and 3 are satisfied.
Thus, in the parameter space there is a region of solutions, not a unique set of weights and threshold. These solutions occupy the unbounded region defined by criteria 1, 2 and 3. The surfaces of such regions are called decision boundaries of the perceptrons (neural networks).

In my simplified implementation I further specified the threshold to 1 (as can be seen in the definition of the ornet function in the code), so that the search space of parameters reduces to the 2D $w_{1},w_{2}$ space and the solution region is defined by the area $w_{1},w_{2}>1$, illustrated below:

#### Results

One may consider the evolution of the parameter vector to be a Markov chain. If we plot the fittest of each generation, the Markov chain looks like this for an example run:
Notice how we start outside the solution region and wander into it.
If we plot the fitness of the fittest individual of each generation for the markov chain, it converges to 0. Here is a plot for an example run:

This is a scatter plot of $w_{1},w_{2}$ values after convergence for a sample run:

This is in accordance with the solution region schematically depicted previously.

### 2-input AND gate

#### Analytical observations

Following similar arguments as for the OR gate, we can conclude the following about the weights and thresholds if the network is supposed to work as an AND gate instead:
1. For (0,0): $0.w_{1}+0.w_{2} \leq t$,i.e. $t \geq 0$.
2. For (1,0): $1.w_{1}+0.w_{2}\leq t$, i.e. $w_{1}\leq t$.
3. Similarly for (0,1), we have $w_{2}\leq t$.
4. For (1,1): $1.w_{1}+1.w_{2}>t$, i.e. $w_{1}+w_{2}>t$, which is no longer automatically satisfied.
For a fixed threshold (say 1), the solutions in the parameter space now occupy a bounded triangular region:

#### Results

All the results are analogous to the case of the OR gate. The only plot I will show here is the space of solutions for an example run, which is in accordance with the solution region found analytically above:

### 2-input XOR gate

#### Analytical observations

In an assignment we saw how a network connectivity like the one I’ve used so far fails to converge to the operation for a XOR gate for any choice of weights. This is unsurprising because the response of a XOR gate is not a linear function of its inputs. This requires the network to be more complex. In particular, here is a network constructed by observation that functions as a XOR gate:
The intermediate OR gate ensures that at least one of the inputs is on, while the NOT AND gate ensures that both are not on at the same time. The outputs from these pass through a final AND gate that makes sure that both these conditions are met.

#### Results

As we can see, there is some variability introduced in the weights as well as the thresholds. This is an opportunity to use the power of the genetic algorithm to easily extend the method to include the thresholds as floating parameters as well, which is what I have done with the code. The number of nodes and the connectivity is still fixed.
The parameter space is now 9-dimensional, and the solution region occupies a much tinier fraction of it. Therefore convergence takes longer. Here is a plot of the convergence of fitness for a sample run:
The Markov chain or the solution parameter vectors cannot themselves be plotted because it is a 9-dimensional space.
The weight and threshold values that resulted from this run have been plugged into the schematic. As we can see, the logic of this network is quite different from the one constructed before by observation, yet it manages to produce the same truth table as a XOR gate. This is because the structure, nature, connectivity and operation of the hidden units of two networks may be different, yet their overall behaviour may be the same.
The emergence of such alternate solutions allows one to see the problem anew in retrospect. This is indeed the beauty of genetic algorithms. It is an unguided method for obtaining seemingly designed solutions.

## Conclusions

As we saw, we can use genetic algorithms to train artificial neural networks to perform desired computations. The advantages of the algorithm over standard learning-rule based updating are several.
• It is more general, i.e. it can be used to find out the number of nodes, connectivity and firing thresholds as well, and is therefore easily scalable for large and complex networks.
• It is an entirely unguided method and requires little design and foresight.
• As a side-effect, it also provides a way to visualize the solution region in the search space by plotting solution vectors starting from different starting conditions.
A disadvantage is that since it is unguided by conscious design, it is often much slower to converge than learning-rule based updating schemes.

# Jumping to an Outer Self

I had a strange and frightening dream experience today. Then I had a theory of the operation of the mind that I thought of when I lay in bed for a while afterward thinking about the experience. I shall first describe to you the dream, and then the theory.

The dream was that I am photographing a huge concert in Austin. It is nighttime and anything is hardly visible, but I am on a raised rim around a huge rectangular concert ground teeming with a dense population of dark human heads numbering in the tens of thousands. The ground centers on a deep pit of some sort from which a faint red glow emanates, presumably the source of the music, but I cannot see it clearly for the forest of black heads around. Everything is blurry and unclear and dark.

Apparently some others are there to photograph it as well, and with alarm I watch them step off the raised rim onto the sea of people, and the surface of this dark granular sea begins to move swiftly  to the central pit as they step on it, like a crowd of small dark pebbles rolling to create a moving surface. They move along with it until they drop into the central pit.

I do not want to do this so I step away and into a covered area beside the rim. I think there is a person standing there, leaning against an opening, looking out to the concert ground. Maybe not a person. I had a feeling it could be a individual of an alien species of sorts, and this gathering that I am at is not something entirely human.

There is a box or something on the ground that I stumble on, and a jacket or something I was carrying drops to the ground, along with my phone, which opens up as it hits the ground to spill its battery (it’s a Nokia E5 that does this every time it hits the ground in real life).

I stoop to pick it up, and as I do so, the frightening part of the dream begins.

I feel a very sudden onset of a heavy, heavy drowsiness land on me. My eyes get immediately heavy and my body is hard to pick up off the stooping position. Everything in me  wants to lie down and drift away into the unobserving nothingness of sleep, and my mind launches a vigorous and alarmed fight against this. It is a very frightening feeling, because sleep never arrives like this, so it must be something else, and it is my own body that is betraying me. Everything is getting dark, and it is my own instincts that are bullying me to let go and fall asleep, while my self feels like a small person trapped inside this fast-darkening human body machine, taken by sudden fear and shock but not knowing how long it can keep up its small, fragile, important fight.

At this point I look to the floor and notice that the battery that spilled out of my phone is blue. My battery is actually not blue. Which means, I tell myself, I am dreaming. I have been taken already. The battle has been lost. This is not waking life any more.

I think I fall gently to the floor on my back, and the panic inside me soars to unbearable heights. I cannot go like this, I tell myself. I remember the person/alien standing there. I do not know who it is, but surely they shall not be so unfriendly as to not offer me help in such a crisis.

‘I need help!’ I shout out. There is no response.

At this point I half-open my eyes with a lot of effort. And I see bits of my room that I am sleeping in, in Austin. I see a beam on the roof, and the Tintin poster I put on the wall. This part is not a dream. I do really see this with my half-open eyes.

But I am not able to wake up.

This really, really frightens me. This has never happened to me before. I know that I have breached the layer into final wakefulness, but something is still keeping my mind from being able to fully wake up, try as I might. It is also strange because it is not that I am too tired and sleepy to fully wake up, and that I am not trying hard enough out of my drowsiness. There is a full battle going on inside my head, but this highly increased activity has no effect on being able to finally wake up. It is indeed a very, very strange thing to find oneself in.

I put all my resources together into one concerted effort, and then I feel something. I feel a shift in perspective, as if I am now a new person who is outside this experience, looking at the struggle I was going through as a dream that needs to be woken out of. I feel myself as being on the outer loop of a nesting, no longer the character struggling in this story, but a real-life person waking up from a dispensable dream in which this character resided.

And that’s when I actually woke up. I fully opened my eyes and looked through the crack of my blanket at the morning sunlight pouring into my room, illuminating the white ceiling. I checked to see if this is what it felt to be fully awake. My faculties were returning as they always do each morning, and I was assured that I was awake.

An epilogue is that I lay in bed for some more and drifted off into drowsiness again, and somehow managed with my sleepy antics to land a desk lamp on myself that shook my entire half-asleep world and jolted me finally into wakefulness enough to get me out of bed.

Now comes the theory, and the theory is about that final part where I stepped out of a loop and into a surrounding perspective that helped me wake up.

As I was lying in bed after this experience, still in a half-asleep state, I was thinking in my head how such a shift of character was possible. How in my mind I could both be a person, and then in a moment be another person regarding the first person as their dream.

And I had the following ‘insight’. I do not claim this in any way to be a well-founded theory of any sort, but I thought about it later and it seemed to link to some other ideas about the operation of the mind in an interesting way, so I thought it would be good to preserve it.

If you consider the brain as a very complicated computer, which I almost certainly believe it is, only using neural circuits instead of logic gate circuits, one can draw analogies between the working of the brain and that of a computer, although computer architecture today is at an infant stage in many ways compared to the complexity of the human brain.

A computer runs many processes at any given time, and they are interrelated in increasingly complex ways. Without going into the hard problem of consciousness, if the brain is the hardware, the mind, our thoughts, and our sense of self doing things must be some complex fallout of the processes that go on all the time in this vast and complex neural circuit.

It is important now to consider that like a computer, the brain is supporting many processes at once in its network. The sense of self doing things is a fraction of these. It has many subconscious processes, some which we may choose to become conscious of if we direct our attention to, and may let them recede back into the subconscious at our will (think of the exercise of trying to isolate all the noises in a noisy environment, or listening for a particular instrument in a piece of music). Some processes are forever in the subconscious and cannot be brought into attention. Similarly, there is data that can either be consciously pondered or packed away as memories that recede from the consciousness until retrieved perhaps many years later. (There is data in your mind now that you cannot think of unless someone produces a very specific cue, when it jumps right up.) The sense of self and conscious thought is only a group of processes in the brain amidst this sea of processes, illuminating a fraction of the other processes and data by shining its small light on them as and when ‘we want’.

Now this was my theory, that the collection of processes in this neural hardware that embodies the feeling of the self is not a concrete, unchanging one. In other words, ‘we’ inhabit different collections of processes in the hardware at different times.

At this point I think it relates in a way that I’ve been hearing said a lot in the context of mindfulness meditation or vipassana, for example by Sam Harris, that the self is only one of the incessant stream of thoughts arising in this one unchanging background of consciousness. ‘Pure consciousness’ is the substrate on which thoughts evolve, such as hunger or boredom or the feeling of self and having a body and doing things. The final objective of mindfulness meditation as I understand it is to dissolve the state of the mind into this pure consciousness, undisrupted by the arising of thoughts. In the context that I am talking about, the potential in this complex circuit for processes to go on that sense the outside world, take decisions and reflect internally, is the one unchanging substrate. The processes themselves that arise, operate transiently and quit to make way for others are only temporary, and our (so-perceived) continuous ‘sense of self’ is a constant real time transition of inhabiting different collections of processes in the neural circuitry. Note that I do not aim at all to explain how a collection of neural processes can assume a ‘sense of self’. This is related to the hard problem of consciousness that I do not even want to hint that I can solve.

Now, just as in a computer we can have encompassing processes running, observing or controlling subjugate processes, so can happen in the computer inside our head as well. In fact, in a computer of such staggering complexity, it must happen. For example, the rush of anxiety as you face a crowd of people on stage is a process in your mind. You can choose to observe it as it happens (then you’ll be taking the first steps to mindfulness). At that point, the self is a process that is observing this other process, both occurring in this complex neural circuit.

What happens then when the process that ‘we’ inhabit shifts from a nested process to another that observes the first? Would the experience be somewhat like what I went through? That could explain how I was both the actual person struggling to wake up, and then switching to be the other person who felt like they were dreaming of the first and could control its termination.

As I was lying in bed, still half-asleep, these are the thoughts that went through my mind. As I said, I am not claiming any of these to be founded in anything at all. They just seemed to be interesting ideas, and may in future spring new ideas and connections that can actually be placed on firmer logical grounds, so I decided to blog about it.

Let me know what you think.

The following is a truncated clip from one of Sam Harris’ lectures talking about mindfulness meditation in the context in which I referred to him.

# Calculating the Lyapunov Exponent of a Time Series (with python code)

(In a later post I discuss a cleaner way to calculate the Lyapunov exponent for maps and particularly the logistic map, along with Mathematica code.)
I found this method during my Masters while recreating the results of an interesting paper on how some standard tests for chaos fail to distinguish chaos from stochasticity (Stochastic neural network model for spontaneous bursting in hippocampal slices, Biswal and Dasgupta, 2002).
Procedure
The Lyapunov exponent is a measure of sensitive dependence on initial conditions, i.e. how quickly two nearby states diverge.
Now consider two points in the time-series, ti and tj, whose values are very close. That means the system reached near the same state at the ith and jth iterations. Now consider the two sequences  ti,  ti+1,  ti+2 … and  tj,  tj+1,  tj+2 … We wish to know how these two sequences diverge from each other. For this, consider the distance between the two sequences after k steps: d(k) = | ti+ktj+k |. (This is for a 1D time series. For higher dimensions, you can define this to be the Euclidean distance and modify the code accordingly.) If the system is chaotic, d(k) will initially rise exponentially with k. For this, one can plot ln d(k) vs k and apply a linear fit. The slope will be an estimate for the Lyapunov exponent.
(Since the system is bounded, the two nearby states will not diverge indefinitely though. Their exponential divergence will stop after some length. We must fit the straight line only within this region.)
Now, this was for a single pair of initial states. The Lyapunov exponent is an average of this divergence exponent over all nearby initial pairs. So for this, define d(k)>, where is averaging over all starting pairs  ti,  tj, such that the initial distance d(0) = | t– tj | is less than some fixed small value. The program finds all such initial pairs, calculates d(k)>, plots it against k, and the slope of the initial linear part gives us the Lyapunov exponent.
Python Code
The following code takes a text file with the time series, ‘timeseries.txt’, as the argument. The text file must contain only the time series values in a single column, no serial numbers or any other text before or after. It asks for the starting diameter within which to limit the initial pairs. It displays how many such pairs it is finding in the time series, so you can vary the diameter based on this.
It outputs a text file, ‘lyapunov.txt’ with two columns, k and d(k)>, which you can then plot and fit in the correct region by visual inspection.
from math import log

def d(series,i,j):
return abs(series[i]-series[j])

f=open('timeseries.txt', 'r')
f.close()
N=len(series)
eps=input('Initial diameter bound: ')
dlist=[[] for i in range(N)]
n=0 #number of nearby pairs found
for i in range(N):
for j in range(i+1,N):
if d(series,i,j) < eps:
n+=1
print n
for k in range(min(N-i,N-j)):
dlist[k].append(log(d(series,i+k,j+k)))
f=open('lyapunov.txt','w')
for i in range(len(dlist)):
if len(dlist[i]):
print>>f, i, sum(dlist[i])/len(dlist[i])
f.close()


The following is the plot and fit of the resulting data from a logistic map series with an appropriately chosen initial diameter.

I deliberately did not automate the plotting and fitting part, because a. it’s tedious and hard to write the code in a way that runs on most installations, and b. human eyes will do a much more reliable job of identifying where the linear portion ends.

# R code for multivariate random-walk Metropolis sampling

I couldn’t find a simple R code for random-walk Metropolis sampling (the symmetric proposal version of Metropolis Hastings sampling) from a multivariate target distribution in arbitrary dimensions, so I wrote one. This is also my first R code.
It requires the package MASS to sample from the multivariate normal proposal distribution using the mvrnorm function. If you are using R on Windows, you can download the package zip for Windows from the link, and use Packages > Install package(s) from local zip files… from the GUI to install the package.
The reason I couldn’t write the code for a general Metropolis algorithm (i.e. for any arbitrary symmetric proposal distribution, not just normal) or a more general Metropolis-Hastings algorithm (with any arbitrary proposal distribution, symmetric or not) is that generating the proposal point would then require sampling from an arbitrary proposal distribution. This is only easy for a few standard distributions, but hard in general (which is the point of using such algorithms in the first place).

I. Function

The following is the function that does the Random Walk Metropolis-Hastings sampling when supplied with the required arguments. Notes about the arguments follow the code.

rwmetro <- function(target,N,x,VCOV,burnin=0)
{
require(MASS)   #requires package MASS for normal sampling
samples <- x
for (i in 2:(burnin+N))
{
prop <- mvrnorm(n = 1, x, VCOV)
if (runif(1) < min(1, target(prop)/target(x)))
x <- prop
samples <- rbind(samples,x)
}
samples[(burnin+1):(N+burnin),]
}


II. Arguments

1. target function
The function defining the multivariate target distribution, written as a function of an n-vector, where n is the number of variables on which the distribution is defined. The different variables of your distribution must be written as x[1], x[2] etc.
Example
An example is the following, defining the function f(x,y)=exp(-5*abs(x^2+y^2-1)) in two dimensions:

ring2D <- function(x)	# x is a vector
{
exp(-5*abs(x[1]^2+x[2]^2-1))
}

2. N integer
The final sample size (i.e., excluding the burn-in length).
Example
100
3. x numeric vector
The starting point (vector) of your Metropolis-Hastings chain. This vector needs to be the same length as the dimension of the target distribution. Your target function must be able to accept this vector as an argument.
Example
c(-0.1,20.5)
4. VCOV numeric matrix
The variance-covariance matrix for the multivariate normal that is used as the proposal distribution for random-walk Metropolis-Hastings. The length of this matrix also has to be the same as the dimension of the target distribution, i.e. the length of vectors acceptable by target and the length of x. You can vary the entries of this matrix and observe your results to see what works better for sampling your target distribution.
Example
The following line defines a matrix in two dimensions with .01 variance for each variable and no covariance between them.

vcov2D <- .01*diag(2)

5. burnin (optional) integer
The ‘burn-in’ length for the chain. The number specified will be the number of initial samples chucked. If nothing is specified, it’s taken to be 0.
Example
20

III. Output
numeric matrix
The output is a matrix where each row is a sample from your target distribution, excluding the initial burn-in samples. The number of rows is thus the sample size, and the number of columns is equal to the dimension of the target distribution. You can use this matrix however you want, to save, visualize or analyze your sample.
Example
An example output of sampling a 2D distribution with sample size 5. Each row is an x,y sample.
 [1,] 0.12961923 0.03708061 [2,] 0.10765053 -0.02798036 [3,] 0.01112930 -0.07255766 [4,] 0.06049625 -0.04546265 [5,] 0.1832164 -0.1244354 

IV. Usage Example
Let’s take the target distribution we used as an example above, f(x,y)=exp(-5*abs(x^2+y^2-1)). This looks like a ring of radius 1 rising from the x-y plane and centered at the origin. Here is a gnuplot surface plot of the distribution (because I found it frustratingly hard to figure out a level plot in R):

Let’s generate a sample of size 40,000 from this distribution with the starting point (0,0) and without any burn-in length, and with the variance-covariance matrix we defined before. This is done by calling the function with the correct arguments:

ringsample2D <- rwmetro(ring2D,40000,c(0,0), vcov2D)


This assumes that you have already defined the target ring2D and the matrix vcov2D in the way explained, preceding this function call.
ringsample now contains a random sample from the distribution.
With the following one-line R code, I made a scatter plot of the sample from the ringsample matrix.

plot(ringsample2D[,1], ringsample2D[,2], xlim=c(-1.5,1.5),ylim=c(-1.5,1.5), main='Metropolis-Hastings Sample',xlab='x', ylab='y', pch='.')


The following is the plot:

Putting all the code together in sequence, this is what the full code for defining the arguments, drawing the sample, and making the plot for this example looks like:

# Define arguments
ring2D <- function(x)    # x is a vector
{
exp(-5*abs(x[1]^2+x[2]^2-1))
}

vcov2D <- .01*diag(2)

#Define the sampling function
rwmetro <- function(target,N,x,VCOV,burnin=0)
{
require(MASS)   #requires package MASS for normal sampling
samples <- x
for (i in 2:(burnin+N))
{
prop <- mvrnorm(n = 1, x, VCOV)
if (runif(1) < min(1, target(prop)/target(x)))
x <- prop
samples <- rbind(samples,x)
}
samples[(burnin+1):(N+burnin),]
}

# Call the sampling function with the arguments
ringsample2D <- rwmetro(ring2D,40000,c(0,0),vcov2D)

# Use the sample
plot(ringsample2D[,1], ringsample2D[,2], xlim=c(-1.5,1.5),ylim=c(-1.5,1.5), main='Metropolis-Hastings Sample',xlab='x', ylab='y', pch='.')


The following is another sample, this time for a 3-variable function, f(x,y,z)=exp(-5*abs(x^2+y^2+z^2-1)). This is just the 3D version of the previous function, and its density is mostly concentrated on a spherical shell of radius 1 and of some thickness, centered at (0,0,0). Notice how we never have to spare a thought about normalizing these functions to get the sample, which is one of the advantages of the Metropolis-Hastings algorithm.
The target function is now:

ring3D <- function(x)	# x is a vector
{
exp(-5*abs(x[1]^2+x[2]^2+x[3]^2-1))
}


and the variance-covariance matrix is, say, .01 down the diagonal again:

vcov3D <- .01*diag(3)


Let’s go for a sample of 20000 points after a burn-in of 20000, starting from (0,0,0). The function is then called as:

ringsample3D <- rwmetro(ring3D,20000,c(0,0,0),vcov3D,20000)


ringsample3D is now a 3×1000 matrix, and I use the following to get a rotatable 3D plot of it (requires the rgl package):

require(rgl)
plot3d(ringsample3D[,1], ringsample3D[,2], ringsample3D[,3], xlab='x',ylab='y', zlab='z',col='red', size=1)


I would have preferred a spinning 3D gif, but I’m too tired to work that out right now.
Share the knowledge.