In [1]:
from datetime import datetime as dt

print('Last accessed on: {}'.format(dt.now()))
Last accessed on: 2016-12-19 00:28:44.890000
In [2]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
Out[2]:

Optimization

Optimization consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. Defined this way, it is a wide field with equally diverse applications.

In this notebook, we will look at two of the most common optimization methods: Nelder-Mead and Levenberg-Marquardt. Although these methods can optimize general functions, we will restrict ourselves to a sinusoidal model.
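
As a minimal illustration of the definition above (a sketch, not part of the fits below; the toy function and its minimum at x = 2 are just an assumed example), scipy can numerically minimize a simple one-dimensional function:

import scipy.optimize as opt

f = lambda x: (x - 2)**2          # a simple convex function with its minimum at x = 2
result = opt.minimize_scalar(f)   # systematically evaluates f to locate the minimum
print(result.x)                   # ~2.0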

In [3]:
from matplotlib import pyplot as plt
import numpy as np
%pylab inline
Populating the interactive namespace from numpy and matplotlib

Let's define a sine function and plot it for a given set of parameters.

In [4]:
true = [1, 1, 0]

x = np.linspace(0, 10, 1000)

def simple_sine(theta, x):
    a, b, c = theta
    return a * np.sin(b * x + c)

plt.plot(x, simple_sine(true, x))
plt.ylim([-4,4]);

Let's artificially add noise to mimic natural data, then superpose the noisy data on the perfect sine curve.

In [5]:
y_noisy = simple_sine(true, x) + np.random.randn(x.size)

plt.plot(x, y_noisy, 'o')
plt.plot(x, simple_sine(true, x),'r-', lw=3)
plt.ylim([-4,4]);

Let's see the histogram of the residuals (i.e. observed minus true).

In [6]:
res = y_noisy - simple_sine(true, x)  # residuals: observed minus the true model
plt.hist(res, bins=30);

The histogram is fairly normally distributed, i.e. bell-shaped, as expected since we added Gaussian noise.
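
As a quick quantitative check (a sketch; scipy.stats is an assumption here and is not used elsewhere in this notebook), a normality test on the residuals should yield a large p-value:

from scipy import stats

# D'Agostino-Pearson test of the null hypothesis that res is normally distributed;
# a large p-value means we cannot reject normality.
stat, pval = stats.normaltest(res)
print('normality test p-value: {}'.format(pval))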

Nelder-Mead Method

Let's start with a general-purpose optimizer: the Nelder-Mead method.

In [7]:
import scipy.optimize as opt
opt.minimize?
In [8]:
# Nelder-Mead <optimize.minimize-neldermead> uses the Simplex algorithm. 
# This algorithm is robust in many applications. However, if numerical 
# computation of derivative can be trusted, other algorithms using the 
# first and/or second derivatives information might be preferred for 
# their better performance in general.

The minimization routine scipy.optimize.minimize requires an objective function as input, called objective below. To fit the data, we minimize the squared error between the theoretical function and the data, i.e. we search for the parameter values that make the sum of squared residuals as small as possible.
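
In symbols, for the model f(x; θ) = a sin(bx + c) and data points (x_i, y_i), this objective is the sum of squared residuals:

$$\mathrm{SSE}(\theta) = \sum_i \left[ f(x_i; \theta) - y_i \right]^2$$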

In [9]:
def objective(theta, xi, yi):
    model = simple_sine(theta, xi)
    return np.sum((model - yi)**2) #res**2

Let's see the squared error of the objective given our noisy data.

In [10]:
objective(true, x, y_noisy)
Out[10]:
1059.6891964823992

This is just a number, but it serves as a baseline for the error at the true parameters. We can now use scipy.optimize.minimize to search for the parameter tuple theta that minimizes the objective; the resulting value should come out close to (or slightly below) the number above.

In [11]:
import scipy.optimize as opt

init_guess = [0.5] * 3  #[0.5,0.5,0.5]
optimize = opt.minimize(objective, init_guess, args=(x,y_noisy), method='nelder-mead')
In [12]:
optimize
Out[12]:
 final_simplex: (array([[ 1.02389139,  1.0010725 ,  0.02068413],
       [ 1.02387403,  1.00106853,  0.02066714],
       [ 1.0238398 ,  1.00106392,  0.02072249],
       [ 1.02392367,  1.00106232,  0.02075102]]), array([ 1059.03278596,  1059.03278631,  1059.03278642,  1059.03278685]))
           fun: 1059.0327859578683
       message: 'Optimization terminated successfully.'
          nfev: 136
           nit: 74
        status: 0
       success: True
             x: array([ 1.02389139,  1.0010725 ,  0.02068413])

optimize has a lot of outputs, but we only need x in the results: it is the best estimate of the parameter tuple theta, i.e. of true.

In [13]:
print('Success={}'.format(optimize.success))
for i in optimize.x:  # x holds the optimized parameters
    print("parameter optimum: {}".format(i))
Success=True
parameter optimum: 1.0238913853
parameter optimum: 1.00107250403
parameter optimum: 0.0206841300478

Let's see if the squared error is indeed close to the ~1059.69 computed earlier.

In [14]:
objective(optimize.x,x, y_noisy)
Out[14]:
1059.0327859578683

Let's see the % reduction in squared error relative to the evaluation at the true parameters.

In [15]:
(1 - objective(optimize.x,x, y_noisy)/objective(true,x, y_noisy))*100
Out[15]:
0.061943683743292954

Let's see the difference between the results (computed optimum values) and the true values.

In [16]:
(true-optimize.x) #difference from true values
Out[16]:
array([-0.02389139, -0.0010725 , -0.02068413])

Levenberg-Marquardt Algorithm

The next optimization method we will discuss, the Levenberg-Marquardt algorithm, is also known as LMA.

We will use scipy.optimize.curve_fit, which defaults to the Levenberg-Marquardt algorithm for unbounded problems, to find the parameters that minimize the squared error between the theoretical function (h, a rewrapped simple_sine) and the data (x, y_noisy). It returns p_opt, the best estimate of true = [1, 1, 0] = (a, b, c), and p_cov, the covariance matrix from which the variances and standard deviations of the parameters can be computed.

In [17]:
opt.curve_fit?
In [18]:
#curve_fit expects the model signature f(x, a, b, c),
#so rewrap simple_sine accordingly
h = lambda x, a, b, c: a * np.sin(b * x + c)

p_opt, p_cov = opt.curve_fit(h, x, y_noisy, p0=init_guess)
var = np.diag(p_cov)
std = np.sqrt(np.diag(p_cov)) #a.k.a. sigma

for i,j in zip(p_opt, std):
    print ("parameter optimum: {} +/- {}".format(i, j))

#above is similar to
#for i in range(len(p_opt)):
#    print("parameter optimum: {} +/- {}".format(p_opt[i], std[i]))
parameter optimum: -17.1692401204 +/- 393322.752729
parameter optimum: -0.00117550908862 +/- 27.540873208
parameter optimum: 3.1566168848 +/- 351.989690551
In [19]:
p_opt #is a tuple
Out[19]:
array([ -1.71692401e+01,  -1.17550909e-03,   3.15661688e+00])
In [20]:
plt.plot(x, y_noisy, 'o')
plt.plot(x, h(x,*p_opt), 'r-', lw=3); # *mu; asterisk in variable is required for lambda func 

Levenberg-Marquardt is apparently not as robust to bad initial guesses. So let's try an initial guess that is close to the true values. In real-life situations, however, such a good starting point is quite difficult to come by.
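
One possible workaround (a rough sketch, not from the original notebook; guess_from_data is a hypothetical name) is to derive a starting point from the data itself, e.g. the angular frequency from the dominant FFT peak and the amplitude from the spread of y_noisy:

# Rough, data-driven starting values (assumes a single dominant sinusoid).
dx = x[1] - x[0]
freqs = np.fft.rfftfreq(x.size, d=dx)
power = np.abs(np.fft.rfft(y_noisy - y_noisy.mean()))
b0 = 2 * np.pi * freqs[np.argmax(power[1:]) + 1]  # skip the zero-frequency bin
a0 = np.sqrt(2) * y_noisy.std()                   # crude; inflated by the noise
guess_from_data = [a0, b0, 0.0]

Such a guess could be passed to curve_fit as p0. Below, we simply pick values near the true ones.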

In [21]:
better_guess = 0.9, 0.9, 0.1

p_opt, p_cov = opt.curve_fit(h, x, y_noisy, p0=better_guess)
std = np.sqrt(np.diag(p_cov))

plt.plot(x, y_noisy, 'o')
plt.plot(x, h(x, *p_opt),'r-', lw=3);
In [22]:
p_opt
Out[22]:
array([ 1.02388082,  1.00106603,  0.02070902])
In [23]:
std
Out[23]:
array([ 0.04745942,  0.01449838,  0.08765835])
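
Since we injected noise with unit standard deviation ourselves, a hedged aside (it assumes that known noise level, and is not part of the original flow): passing it to curve_fit via sigma with absolute_sigma=True makes p_cov an absolute rather than relative uncertainty estimate.

# Assumes the unit Gaussian noise we added above.
p_opt_abs, p_cov_abs = opt.curve_fit(h, x, y_noisy, p0=better_guess,
                                     sigma=np.ones_like(y_noisy), absolute_sigma=True)
print(np.sqrt(np.diag(p_cov_abs)))  # parameter standard deviations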
In [24]:
# Superpose the probability density function (pdf) of the noise on the residual histogram.
# Note: sigma here is the noise level (~1, from np.random.randn), not one of the 3 parameter std values.
# mu, sigma = 0, 1
# count, bins, ignore = plt.hist(h(x, *p_opt) - y_noisy, 30, normed=True)
# plt.plot(bins, 1/(sigma*np.sqrt(2*np.pi))*np.exp(-(bins - mu)**2/(2*sigma**2)), lw=2, c='r')
# plt.show()
Finally, let's visualize the parameter uncertainty: draw 100 parameter sets from a normal distribution centered on p_opt with widths std, and overplot the resulting curves on the noisy data.

In [25]:
plt.plot(x, y_noisy, 'ko', ms=2)
for i in range(100):
    theta = p_opt + np.random.randn(p_opt.size) * std  # perturb each parameter by its std
    plt.plot(x, h(x, *theta),'r-', lw=3, alpha=0.1)
In [ ]: