Today you will learn how to fit non-linear functions to data using least squares in order to obtain estimates of the function's coefficients.


InstrumentWe will use the same kind of photosynthesis data used as an example in lecture. Net photosynthesis can be measured at varying light levels by a portable photosynthesis system like the one pictured to the left. A leaf is placed in the machine's chamber, and light levels are set by the machine. The rate of depletion of CO2 from the chamber is a measure of carbon fixation through photosynthesis, so the data produced by the system is a set of light levels and photosynthesis rates. Thus, both light level and net photosynthesis are known quantities.

CurveLight level is set by the machine, making it the independent variable (x-axis), and net photosynthesis is measured in response (y-axis). However, the shape of the response between these two variables is not a straight line, as you can see from the graph of a typical photosynthesis curve shown to the left. Photosynthesis rises quickly at very low light levels because the leaf can't photosynthesize at all without it. Photosynthesis asymptotically approach a maximum when the photosynthetically active tissue is fixing carbon as quickly as it possibly can. Photosynthesis is thus limited by light availability at low light levels, which causes the curve to drop rapidly below a value of about 500.

To better understand why the curve is shaped this way a model of how plants convert light into fixed carbon has been developed:

Equation

You'll see that two of the terms in the equation are data values:

The other three terms are biophysical properties of the leaf that allow it to convert light energy into fixed carbon:

We will fit data taken from leaves at the bottom of a tree to this curve today.

Fit the curve to the data to obtain estimates of Φ, θ, and Pmarea

Step 1 - Download this file, and save it to OneDrive.

Or wherever you're saving files for this class, that is.

You'll see that you have two worksheets, one with data from a leaf at the bottom of a tree that has only light level (column A, which is Q in the photosynthesis equation) and net photosynthesis (column B, Pnet in the photosynthesis equation). The other has data from a leaf at the top of the same tree, which we won't use until part 2.

Step 2 - Set up the "Bottom" sheet for curve fitting.

In cell A15 type "Parameter", and in B15 type "Values".

Select A16, switch to the "Insert" tab, and click on "Symbol". On the Symbol selector that pops up, select "Greek and Coptic" as the "Subset", and find capital Phi - select it, and click "Insert".

Select cell A17 and type "Pmarea".

Select cell A18 and insert lower case theta, using the "Insert" "Symbol" method you used for Phi.

We will need to provide starting values for these parameters, but for now just type in 1's for each parameter (1 for B16, B17, and B18).

We will be entering the rather complicated equation, above, as a cell formula next, so to make this easier we are going to assign names to each of the parameters we can refer to in the formula.

We can now use these descriptive names instead of cell references in our formulas, which will make it easier to spot mistakes. Named cells are absolute references, so we won't need to worry about using $ in our formulas either.

Step 3. Calculate predicted Pnet.

In cell C1 type "Pnet predicted". We will want to enter the formula that predicts Pnet in C2, but the equation is pretty hairy - hard to translate into a cell formula, and easy to make entry errors while doing it. We can make this easier by breaking the equation down into parts that are easy to enter, and then making the final calculation at the end using the parts.

I've color coded the parts for you here:

Color coded

If we have a cell that calculates each of these four color coded parts of the calculation then the final calculation will just be:

Simplified

That's not so hard, right?

Let's do these one step at a time...

Now we can put them all together

Okay, that got us the first Pnet calculated, but we need it for every light level, and we don't want to have to do an A, B, C, and D calculation for every row in the data. Now that we have the pieces entered correctly, though, we can assemble them into a single cell in C2. The process we'll use is to set each of the A, B, C, and D cells into edit mode one at a time, copy the formula inside of them, and then paste them into C2 to assemble the full formula in all its glory. The steps you'll complete look like this:

First step

Note that to make the animated image easier to see I'm hiding some rows - row 10 through 13 and row 19 through 25 are hidden from view, your sheet will have some blank rows showing in those ranges.

The steps are:

Now we'll get the B component into the formula - to do this we'll need to copy what we have entered for B, then subtract the square root of B from what we just entered for A, like so:

Step 2

The steps are:

Next we need to subtract C from B inside of the sqrt():

Step 3

The steps are:

And finally, we need to divide everything currently in C2 by the D part of the formula

Step 4

The steps are:

You final formula will look like this:

Formula

Hopefully you got all the pieces assembled correctly, but make any changes needed to make the cell in C2 match this formula.

Copy and paste the formula from C2 to the rest of the data rows, C3 to C9. The predicted Pnet values should be 1 for the rest of the cells.

Step 4. Graph the data.

At this point we want to pick better starting values for our parameters so that the predicted values come as close as we can get them to the observed values before starting Solver - the closer we are to the correct values when we start the more likely that Excel will find the correct values for us.

Select cells A1 through C9, and insert a scatter plot with only markers. Then, select the predicted Pnet series on the graph, right-click, and select "Format data series" from the popup menu. In the "Fill and Line" options (the paint bucket), choose "Solid line" as the line style, and then switch to "Marker" and use in "Marker options" use "None" - this will make the predicted values show only as a line on the graph.

Add axis labels - use "Light" as the x-axis label, and "Pnet" as the y-axis label.

Your graph should look like this.

Now we can use the graph to help us pick some better starting parameters.

Once you have the line as close to the data as you can get it, you can move on to the next step.

Step 5 - Calculate the sum of squared deviations between Pnet and predicted Pnet.

We can calculate the sum of squared differences between observed Pnet and predicted Pnet in a single cell if we use an array formula. In cell B11 type "SumSq", and in cell C11 type the array formula:

=sum((b2:b9-c2:c9)^2)

Don't forget to CTRL-SHIFT-ENTER. You should see curly braces around the formula. If you used my suggested starting values this will be a small number, 0.206831 (we're starting pretty close to the final solution).

Step 6 - Use Solver to find the parameter estimates.

Turn on the Solver if it isn't already present in the Data tab. Click on Solver, and have it set cell C11 (the sum of squared deviations) to Min by changing cells B16:B18 (the parameters). Click "Solve".

You'll see that the parameters are now a little different than before, and the curve goes nicely through the data in the graph.

If you have trouble getting Solver to give you the right solutions bear in mind that these non-linear fits are a little fussy... it's really important to start as close to the final solution as possible. Try using 0.01 for phi and 0.6 for theta and run Solver again.

Now is a good time to save your work!

Assignment

That's all for today - save your Excel file, we'll use it next time to get the standard errors.