R code for multivariate random-walk Metropolis sampling

I couldn’t find a simple R code for random-walk Metropolis sampling (the symmetric proposal version of Metropolis Hastings sampling) from a multivariate target distribution in arbitrary dimensions, so I wrote one. This is also my first R code.
It requires the package MASS to sample from the multivariate normal proposal distribution using the mvrnorm function. If you are using R on Windows, you can download the package zip for Windows from the link, and use Packages > Install package(s) from local zip files… from the GUI to install the package.
The reason I couldn’t write the code for a general Metropolis algorithm (i.e. for any arbitrary symmetric proposal distribution, not just normal) or a more general Metropolis-Hastings algorithm (with any arbitrary proposal distribution, symmetric or not) is that generating the proposal point would then require sampling from an arbitrary proposal distribution. This is only easy for a few standard distributions, but hard in general (which is the point of using such algorithms in the first place).

I. Function

The following is the function that does the Random Walk Metropolis-Hastings sampling when supplied with the required arguments. Notes about the arguments follow the code.

rwmetro <- function(target,N,x,VCOV,burnin=0)
{
    require(MASS)   #requires package MASS for normal sampling
    samples <- x
    for (i in 2:(burnin+N))
    {   
        prop <- mvrnorm(n = 1, x, VCOV)
        if (runif(1) < min(1, target(prop)/target(x))) 
            x <- prop
        samples <- rbind(samples,x)
    }
    samples[(burnin+1):(N+burnin),]
}

II. Arguments

  1. target function
    The function defining the multivariate target distribution, written as a function of an n-vector, where n is the number of variables on which the distribution is defined. The different variables of your distribution must be written as x[1], x[2] etc.
    Example
    An example is the following, defining the function f(x,y)=exp(-5*abs(x^2+y^2-1)) in two dimensions:

    ring2D <- function(x)	# x is a vector
    {
    	exp(-5*abs(x[1]^2+x[2]^2-1))
    }
    
  2. N integer
    The final sample size (i.e., excluding the burn-in length).
    Example
    100
  3. x numeric vector
    The starting point (vector) of your Metropolis-Hastings chain. This vector needs to be the same length as the dimension of the target distribution. Your target function must be able to accept this vector as an argument.
    Example
    c(-0.1,20.5)
  4. VCOV numeric matrix
    The variance-covariance matrix for the multivariate normal that is used as the proposal distribution for random-walk Metropolis-Hastings. The length of this matrix also has to be the same as the dimension of the target distribution, i.e. the length of vectors acceptable by target and the length of x. You can vary the entries of this matrix and observe your results to see what works better for sampling your target distribution.
    Example
    The following line defines a matrix in two dimensions with .01 variance for each variable and no covariance between them.

    vcov2D <- .01*diag(2)
    
  5. burnin (optional) integer
    The ‘burn-in’ length for the chain. The number specified will be the number of initial samples chucked. If nothing is specified, it’s taken to be 0.
    Example
    20

III. Output
numeric matrix
The output is a matrix where each row is a sample from your target distribution, excluding the initial burn-in samples. The number of rows is thus the sample size, and the number of columns is equal to the dimension of the target distribution. You can use this matrix however you want, to save, visualize or analyze your sample.
Example
An example output of sampling a 2D distribution with sample size 5. Each row is an x,y sample.

[1,] 0.12961923 0.03708061
[2,] 0.10765053 -0.02798036
[3,] 0.01112930 -0.07255766
[4,] 0.06049625 -0.04546265
[5,] 0.1832164 -0.1244354

IV. Usage Example
Let’s take the target distribution we used as an example above, f(x,y)=exp(-5*abs(x^2+y^2-1)). This looks like a ring of radius 1 rising from the x-y plane and centered at the origin. Here is a gnuplot surface plot of the distribution (because I found it frustratingly hard to figure out a level plot in R):

Target Distribution in Gnuplot
Let’s generate a sample of size 40,000 from this distribution with the starting point (0,0) and without any burn-in length, and with the variance-covariance matrix we defined before. This is done by calling the function with the correct arguments:

ringsample2D <- rwmetro(ring2D,40000,c(0,0), vcov2D)

This assumes that you have already defined the target ring2D and the matrix vcov2D in the way explained, preceding this function call.
ringsample now contains a random sample from the distribution.
With the following one-line R code, I made a scatter plot of the sample from the ringsample matrix.

plot(ringsample2D[,1], ringsample2D[,2], xlim=c(-1.5,1.5),ylim=c(-1.5,1.5), main='Metropolis-Hastings Sample',xlab='x', ylab='y', pch='.')

The following is the plot:
Metropolis-Hastings Sample, scatter plot

Putting all the code together in sequence, this is what the full code for defining the arguments, drawing the sample, and making the plot for this example looks like:

# Define arguments
ring2D <- function(x)    # x is a vector
{
    exp(-5*abs(x[1]^2+x[2]^2-1))
}

vcov2D <- .01*diag(2)

#Define the sampling function
rwmetro <- function(target,N,x,VCOV,burnin=0)
{
    require(MASS)   #requires package MASS for normal sampling
    samples <- x
    for (i in 2:(burnin+N))
    {   
        prop <- mvrnorm(n = 1, x, VCOV)
        if (runif(1) < min(1, target(prop)/target(x))) 
            x <- prop
        samples <- rbind(samples,x)
    }
    samples[(burnin+1):(N+burnin),]
}

# Call the sampling function with the arguments
ringsample2D <- rwmetro(ring2D,40000,c(0,0),vcov2D)

# Use the sample
plot(ringsample2D[,1], ringsample2D[,2], xlim=c(-1.5,1.5),ylim=c(-1.5,1.5), main='Metropolis-Hastings Sample',xlab='x', ylab='y', pch='.')

The following is another sample, this time for a 3-variable function, f(x,y,z)=exp(-5*abs(x^2+y^2+z^2-1)). This is just the 3D version of the previous function, and its density is mostly concentrated on a spherical shell of radius 1 and of some thickness, centered at (0,0,0). Notice how we never have to spare a thought about normalizing these functions to get the sample, which is one of the advantages of the Metropolis-Hastings algorithm.
The target function is now:

ring3D <- function(x)	# x is a vector
{
	exp(-5*abs(x[1]^2+x[2]^2+x[3]^2-1))
}

and the variance-covariance matrix is, say, .01 down the diagonal again:

vcov3D <- .01*diag(3)

Let’s go for a sample of 20000 points after a burn-in of 20000, starting from (0,0,0). The function is then called as:

ringsample3D <- rwmetro(ring3D,20000,c(0,0,0),vcov3D,20000)

ringsample3D is now a 3×1000 matrix, and I use the following to get a rotatable 3D plot of it (requires the rgl package):

require(rgl)
plot3d(ringsample3D[,1], ringsample3D[,2], ringsample3D[,3], xlab='x',ylab='y', zlab='z',col='red', size=1)

Metropolis Sample

I would have preferred a spinning 3D gif, but I’m too tired to work that out right now.
Share the knowledge.

Partners meet halfway: a simple correlation study of an undergrad lab class

Last semester I taught two classes of an Introductory Physics (Electrodynamics) for Engineers lab course at the University of Texas at Austin. The first day that my students came in, most of them did not know each other. They just came in and sat at tables of two, and this pretty much became their permanent seats in the room for the rest of the course (that’s human habit). They were supposed to work with their partner on the experiments in the first part of the class, but individually on their worksheets in the second part. So choosing their seats also fixed the partners. Thus, it is a good assumption to say that the pairing had been pretty random, at least as far as academic ability of the students was concerned.

I noticed though that both partners in a group tended to move towards a common average in their performance over the course. If one partner was diligent, it would motivate (and sometimes force) the other to work harder, while the sloppiness and uncaring attitude of a partner would also negatively affect the performance of the other. I was happy to notice though that the ‘uplifting’ effect was usually more dominant than the ‘down-dragging’. In any case, this caused the performance of partners to pull closer, irrespective of their starting academic ability.

At the end of the semester, after I had the final scores that I handed out to them, I decided to do a simple correlation test. The following are two scatter plots of pairs of student scores for each of my classes, each pair ordered as (better score, worse score) (which is why all points are below the bottom-left to top-right diagonal of the plot). The linear fits through them have also been shown. Although I had suspected that there would be a significant correlation between partners’ scores, I was surprised at how high the correlation coefficients turned out to be. (Note that if you pair an even number of numbers randomly, on the average you should expect a linear correlation coefficient of zero.)

 

image

 

image

I had felt this trend even by the middle of the semester, and I had this crazy idea of using this to try and pull up the performance of the students who were not doing too good. I thought of sorting the students by their score so far into the course, and re-assigning partners so as to pair the best student with the worst, the second-best to the second-worst and so on. However, I know that habit is a deadly force, and if I suggest my students in the middle of the course that they have to say goodbye to their partner with whom they already have some steady chemistry going and have new partners again, ones that I have chosen for them, all hell would break loose. So I had to rest it in my head as a theoretical result with possible practical applications, to be taken up by a braver soul than me. But hey, here’s the data, so you can quote it if you want to try it and someone doesn’t take the idea too well.

A/B and Rh Antigens in Blood Types: A Statistical Test of Independence among IISER Kolkata Students

A couple of days ago one of my juniors in college (Indian Institute of Science Education and Research, Kolkata) found unguarded in the guest account of our computing system a spreadsheet.

This spreadsheet contained the blood types of the Masters students across five batches (‘07-‘13). With this information he made a nice little bar diagram of the frequency distribution of blood types, which I lift here:

blood-group

When I looked at this graph, what I noticed first was that this contains the distribution across both the antigen type (A/B/AB/O) and the Rh factor (+/-). It struck me that it may be possible by constructing a contingency table to check whether these two properties are independent of each other.

For this I first constructed from the graph the two-way contingency table of the joint distribution of antigen and Rh factor:

 

Rh\Ag→ A B AB O Marginal totals
+ 76 116 28 122 342
1 4 1 6 12
Marginal totals 77 120 29 128 354 (total)

 

Consider now the marginal distributions, i.e. the subtotals along the last row and column. If we divide these numbers through by the total number of data points (354), we obtain the relative marginal frequencies. For a large sample, this may be identified with the probabilities of occurrence of the individual properties irrespective of that of the other.

Now comes the important part. If the occurrence of these two properties are independent of each other, then the joint probability in any cell shall be the product of the marginal probabilities for that row and column (P(A∩B)=P(A)P(B)). In terms of frequencies, this means that the number in any cell should be the product of the subtotals for that row and column, divided by the total (354). You can think of it this way: the frequency in any cell is the product of one of its marginal frequencies with the other relative marginal frequency (signifying the independent conditional probability). Example:

n(A+) = n(A).P(+) = n(A).n(+)/N.

Taking this assumption of independence then, it is possible to construct a contingency table inwards starting from just the outer marginal frequencies. This has been constructed below with the marginal frequencies of the actual data, and rounded to the nearest integers.

Rh\Ag→ A B AB O Marginal totals
+ 74 116 28 124 342
3 4 1 4 12
Marginal totals 77 120 29 128 354
(total)

If you compare the joint frequencies in the two tables, you can see immediately that they are very close, lending support to the assumption that these factors are independent.

For a more graphical idea, I decided to plot cluster bars of the actual and the computed frequencies assuming independence.

image

The average error between the actual and computed frequencies was 0 over the eight blood types. That doesn’t say much because errors of opposite signs reduce the net effect. The root mean squared error was 1.16, which too is tiny in comparison to the frequencies themselves.

Thus, it is pretty safe to suppose that the antigen and the Rh factor are uncorrelated properties.

There are stronger, more explicit tests of association than what has been done here. The usual correlation coefficient, though, cannot be calculated here because the variable (blood type) is not a quantitative one. However, there are others that you can read about here.

If you want to test this method or any of the other methods on other datasets of blood types, this page provides quite a bit of data for various countries (although last I checked, they seemed to not be completely factually correct).

A Statistical Problem on Laptop Uptimes

Suppose you are in a large university campus. Most students here use laptops, and if you look around, you’d see most of them either working, listening to music or doing something else on their laptops. Suppose now you think of a quick project, of listing the uptimes of the laptops (how long they’ve been running). In Windows this is quite simple. Under the ‘Performance’ tab in Task Manager, you’ll notice that ‘Uptime’ gives the duration for which the laptop has been running. This timer, however, keeps counting from where it left off if you resume from hibernation, as it should, but we shall assume that no such cases happen in our campus.

Now, the campus is huge, and many students are using their laptops. You figure out some methodical way of visiting each student so that there are no over- or under-counting errors. But it takes you quite a while to visit all of them and note down their uptimes. So that if, for example, it took you six hours to collect all the data, then you took the last uptime reading six hours after the first.

If we assume a very large campus where the uptimes of the laptops of different students are completely independent of each other, the questions are the following:

1. On an average, is it going to make any difference to the statistics you collected, if instead of taking a long time to go around the campus, you could somehow acquire all of the uptimes at one instant of time?

2. Is the data you collected going to be distributed differently from that of the maximum uptimes of student laptops (duration before they shut it down)?

Unless I find myself without the time and effort, I plan to return and solve it in this blog post. (I haven’t solved it yet.)