Latin Hypercube Sampling Radiocarbon Ages with Python

Radiocarbon dating is an important technique that offers insights into the ages of geological/archaeological records. One essential procedure for radiocarbon dating is the calibration process, which adjusts the conventional 14C ages to accurate calendar ages accounting for changes in the radiocarbon level in the atmosphere/ocean through time owing to processes like Earth carbon cycle.

The calibrated radiocarbon age often present a complex probability distribution which requires a large number of sampling process (using standard Monte Carlo method) to get a reasonable representation. Here’s an example of calibrated radiocarbon age distribution generated using the Oxcal 4.4 program:

import numpy as np
import matplotlib.pyplot as plt

#load output from Oxcal 
sample_dis = pd.read_csv('Oxcal_output', sep='\t', lineterminator='\n',header=None)

#----------Display the probability density function----------------
pdf_x,pdf_y = sample_dis.iloc[:,0]-1950,sample_dis.iloc[:,1]  #-1950 for converting AC to BP
plt.xlabel('Calibrated Radiocarbon Age (BP)')

Using 50 Monte Carlo simulations to sample this probability density function (PDF), we get:

# Define a function to do Monte Carlo Sampling

def MC_sample_pdf(pdf_x,pdf_y,n_samples=50):
    from scipy import interpolate
    pdf_x = np.array(pdf_x) 
    pdf_y = np.array(pdf_y)
    bins = np.mean(np.diff(pdf_x))
    cum_values = np.cumsum(pdf_y*bins)
    inv_cdf = interpolate.interp1d(cum_values, pdf_x)
    r = np.random.rand(n_samples)
    return inv_cdf(r)

MC_samples = MC_sample_pdf(pdf_x,pdf_y)

plt.xlabel('Calibrated Radiocarbon Age (BP)')

It can be seen that, for this random seed, the original PDF is somehow misrepresented because there should be more samples with age ~-13700. This is because Monte Carlo simulation generates each point independently from the input PDF, it requires a larger number of sampling to get a reasonable representation of the PDF.

For this 14C age, we can easily increase the sample size to perhaps 5,000 and we can get a decent representation of the input PDF. However, for some complicated physical process modelling, it is time-consuming to increase the sample size. In this case, we need a sampling process with higher efficiency. A nice and simple method for achieving this is the Latin Hypercube Sampling (LHS), which generates samples more evenly across all possible values by randomly sampling within each defined sub-interval. Here’s a comparison of using Monte Carlo and Latin Hypercube to generate 100 samples for a normal distribution N(0,1).

n = 100
MC_samples = np.random.normal(0,1,100) #MC sampling

#latin hypercube 
#define intervals
min_prob = 0.0001 
max_prob = 0.9999
lower_limit= np.linspace(min_prob,max_prob,n+1)[:n]
upper_limit = np.linspace(min_prob,max_prob,n+1)[1:n+1]
#generate samples from each interval
sample = np.random.uniform(low = lower_limit,high = upper_limit,size=[1,n])

#project points into cumulative density function
from scipy.stats import norm

cdf_x = np.linspace(norm.ppf(min_prob),norm.ppf(max_prob), 100)
cdf = norm.cdf(np.linspace(norm.ppf(min_prob),norm.ppf(max_prob),100))
inv_cdf = interpolate.interp1d(cdf,cdf_x)
Latin_samples = inv_cdf(sample)[0]

t_pdf = norm.pdf(cdf_x) #true pdf

plt.plot(np.linspace(norm.ppf(min_prob),norm.ppf(max_prob),100),t_pdf,label='True PDF')
sns.kdeplot(MC_samples,label='MC') #kernel density estimate
sns.kdeplot(Latin_samples,label='LHS') #kernel density estimate

After the sampling process, each 100 samples were used to generate its PDF using the Gaussian kernel density estimate function. It is evident that with 100 samples, the LHS method fits better to the true PDF.

We can then use LHS to sample 14C age:

# Define a function to do Monte Carlo Sampling

def LHS_sample_pdf(pdf_x,pdf_y,n_samples=50):
    from scipy import interpolate
    pdf_x = np.array(pdf_x) 
    pdf_y = np.array(pdf_y)
    bins = np.mean(np.diff(pdf_x))
    cum_values = np.cumsum(pdf_y*bins)
    inv_cdf = interpolate.interp1d(cum_values, pdf_x)
    lower_limit= np.arange(0,n_samples)/n_samples
    upper_limit = np.arange(1,n_samples+1)/n_samples
    points = np.random.uniform(low = lower_limit,high = upper_limit,size=[1,n_samples]).T
    return inv_cdf(points).T[0]

LHS_sample = LHS_sample_pdf(pdf_x,pdf_y,50)

plt.xlabel('Calibrated Radiocarbon Age (BP)')

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s