Analysis of field trip data - accuracy assessment

Today we will compare the data we collected in the field to the cover type maps we produced from Landsat data (unsupervised classification map) or downloaded (landuse_2016_simple.shp).

There are a lot of steps needed to get this task done, but the task is conceptually simple. The basic road map for the day is:

There are multiple steps involved for each of these, which are given below.

Import GPS points into ArcMap

Calculating positions of points at a distance from the observer

The data from our field outing is in this file. Download it into your S: drive.

If you open the file in Excel, you'll see that there are several columns:

I did some checking, and a few points had data entry errors that placed them in wildly incorrect locations - those have been dropped from the data set. We have 45 points that were accurate enough to use.

Before we can plot the points on the map, we need to calculate the locations from the GPS locations, distances, and directions.

Location from dist. and direction.


In the illustration to the left, we have a GPS location of 492570 E, 3656770 N, and from that point we are recording the rock outcrop that is 56.57 m away, at 45°. We need to calculate the number (x) to add to our known east coordinate, and the number (y) to add to our known north coordinate to get the coordinate of the rock outcrop. You can see that the red line with our recorded distance forms a right triangle with lines running due north and due east from the point (in blue), so we can use trigonometric functions to figure out what we want to know.

Remember from your trigonometry class that the sin of an angle (A) is the opposite side of a right triangle divided by the hypotenuse - the hypotenuse is the red line with our measured distance, which we'll call D, so:

sin(A) = x/D

Solving for x we get:

x = D sin(A) = 56.57 sin(45°) ≈ 40

Likewise, the cos of A is the adjacent side over the hypotenuse, so:

cos(A) = y/D

y = D cos(A) = 56.57 cos(45°) ≈ 40

Once we have these x and y values, we just add x to East and y to North to get our new coordinates - the new coordinates are thus E + D sin(A) and N + D cos(A). You can see from the UTM grid that the rock outcrop is indeed at 492610 E, 3656810 N, which is 40 m north and 40 m east of the original point.
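As a quick check of the worked example, the same calculation can be sketched in Python. This is only an illustration and not part of the lab workflow; the function name offset_point is made up here. Note that the sine and cosine functions in Python (as in Excel) expect angles in radians, so the compass direction in degrees has to be converted first.

```python
import math

def offset_point(east, north, distance, bearing_deg):
    """Return (east, north) of a feature seen from a GPS position.

    bearing_deg is measured clockwise from north, as with a compass.
    offset_point is a hypothetical helper, not an ArcMap or Excel function.
    """
    bearing = math.radians(bearing_deg)  # sin/cos expect radians
    x = distance * math.sin(bearing)     # displacement to the east
    y = distance * math.cos(bearing)     # displacement to the north
    return east + x, north + y

new_east, new_north = offset_point(492570, 3656770, 56.57, 45)
print(round(new_east), round(new_north))  # → 492610 3656810
```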

To do these calculations on the data in Excel:

Some points were recorded by standing at the feature and recording its location, and didn't require any displacements to be added to them. We are dealing with this by using a 0 distance and direction for those points - that way we can enter the calculation once, copy/paste for all the other rows, and get the correct result regardless of whether the point is at a distance from the recorded GPS coordinate or not.
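To see why a 0 distance and direction works, here is a small Python sketch that applies one formula to every row. The column names (East, North, Distance, Direction) and the second row's values are assumptions made up for illustration; use whatever names are in your spreadsheet.

```python
import math

# Hypothetical rows: the first is seen from a distance, the second was
# recorded while standing at the feature (distance and direction are 0).
rows = [
    {"East": 492570.0, "North": 3656770.0, "Distance": 56.57, "Direction": 45.0},
    {"East": 492600.0, "North": 3656800.0, "Distance": 0.0, "Direction": 0.0},
]

def feature_position(row):
    """Apply the same displacement formula to any row."""
    angle = math.radians(row["Direction"])
    east = row["East"] + row["Distance"] * math.sin(angle)
    north = row["North"] + row["Distance"] * math.cos(angle)
    return east, north

positions = [feature_position(row) for row in rows]
print(positions[1])  # → (492600.0, 3656800.0)
```

Because the distance is 0, both displacement terms are 0 and the second point keeps its recorded GPS coordinates.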

Converting field descriptions to cover types

When we overlay these points with cover type maps, we will only want to use the points that recorded a cover type, and not the ones that recorded the un-mappable habitat features. We will use a column in Excel that we can filter on later, so that we can just focus on the cover type points.

We will also want to use the same cover type categories as are in our land use vector maps, and in your unsupervised classification cover type maps. There are two different systems that we used, so we'll need a column for each:

As you do this, note that spelling counts - if you use different labels for the same thing it will make your work harder later. Once you enter a cover type or land use category once, copy it and paste it wherever else you need it to ensure that you enter things the same way every time.
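If you want to check a column of labels for inconsistencies, one possible approach is sketched below in Python. It only catches differences in capitalization and stray spaces, not misspellings, and the function name and example labels are made up for illustration.

```python
from collections import defaultdict

def find_inconsistent_labels(labels):
    """Group labels that match once case and surrounding spaces are ignored."""
    groups = defaultdict(set)
    for label in labels:
        groups[label.strip().lower()].add(label)
    # Keep only the groups that were typed more than one way
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}

# Hypothetical labels; note the trailing space in the fourth entry
labels = ["Grassland", "grassland", "Developed", "Developed ", "Water"]
problems = find_inconsistent_labels(labels)
print(sorted(problems))  # → ['developed', 'grassland']
```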

Save the Excel file onto your S: drive, and quit Excel. MAKE SURE YOU QUIT EXCEL or ArcMap will refuse to open the file.

Importing GPS data into ArcMap

Now, to import the GPS data from the Excel file into a point layer. Start up ArcMap with a blank map, and do the following:

If you switch to "List by Drawing Order" mode in the table of contents, you can drag the points above the imagery layer. You can also label the points by their descriptions - double-click on "field_points", and switch to the "Labels" tab. Check the box to "Label features in this layer", and under "Text String" set "Label Field:" to "Description". Set the color of the label text to a nice bright yellow, and click "OK". You'll now see the labels for each point on the map.

You can right-click and remove "FieldPts$Events" and "FieldPts$" - you don't need them now that you have a map layer of that data.

Now that we have field points and their attributes, we need to overlay them with our cover type maps to see how well the cover type maps reflect what we find on the ground.

Overlay GPS points with cover type maps

Add the landuse_2016_simple.shp file from the lab7 folder, and your unsupervised classification map that you re-classified into our nine cover types (which should be called "cover_spring11" in your biol420.mdb file). You can also add the "World Imagery.lyr" file as background.

Extract cover types from your unsupervised classification cover type map

Now we can extract cover type categories from the cover_spring11 map, using the field points to identify which pixels to record - for each point, we will get a row in an output table that records the pixel value it touches.

The Sample tool makes a table, but does not make a new point layer, so nothing new will be on the map. You will see a table called "field_unsup" in your table of contents, though. If you open field_unsup, you'll see there is a column called "FIELD_POINTS" that gives the point number from the field_points layer, and a column called "SP11_COVER" that gives the cover types. We need to add these columns back into the field_points file, which we can do by "joining" the tables.

Joining database tables is done based on matching entries in each table. Once they are joined, the columns from both tables can be displayed and used as though they were a single table. To join "field_unsup" to the attribute table of "field_points", do the following:

If you right-click on field_points and open the attribute table, you'll see that there are now columns at the end that come from field_unsup, including the one that gives the cover type code from SP11_COVER.
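Conceptually, a join looks up each row of one table in the other by a shared key value. The Python sketch below illustrates the idea with made-up rows; it is not what ArcMap runs internally, and the key name point_id on the field_points side, the descriptions, and the cover codes are assumptions for illustration.

```python
# Made-up rows standing in for the two tables
field_points = [
    {"point_id": 1, "Description": "rock outcrop"},
    {"point_id": 2, "Description": "mowed lawn"},
]
field_unsup = [
    {"FIELD_POINTS": 1, "SP11_COVER": 4},
    {"FIELD_POINTS": 2, "SP11_COVER": 7},
]

def join_tables(left, right, left_key, right_key):
    """Append the columns of the matching right-hand row to each left-hand row."""
    lookup = {row[right_key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[left_key], {})  # no match: nothing is added
        joined.append({**row, **match})
    return joined

joined = join_tables(field_points, field_unsup, "point_id", "FIELD_POINTS")
print(joined[0]["Description"], joined[0]["SP11_COVER"])  # → rock outcrop 4
```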

Extract land uses from landuse_2016_simple

Joins don't permanently add columns to an attribute table, they just link files for temporary use. But the join does persist if we use the joined file in an operation that makes a new output file. We'll take advantage of that now to overlay field_points (and its joined field_unsup table) with landuse_2016_simple to make a new point layer with an attribute table that has the cover types we observed in the field (from field_points), the cover types in the cover_spring11 map (from the joined field_unsup table), and the land uses from the 2016 land use map (from landuse_2016_simple). That attribute table will have everything we need to compare what we observed in the field to either of the maps.

To do the overlay:

You will get a new point layer added, field_vs_maps, in your table of contents. If you right-click and open the attribute table, you'll see that you now have the LU_current field appended to the end.

While you have the attribute table open, we can simplify the next step if we "turn off" columns that we don't need for making our confusion matrices. For example, we don't need to know the feature ID numbers from various files we used along the way - position your pointer over the column label "FID_field_points_field_unsup", right-click, and select "Turn field off". This hides the field from view, and will also prevent it from being exported. You can repeat this with every column except:

If you make a mistake and turn off the wrong thing, you'll need to drop down the menu in the upper-left corner of the Table window and select "Turn All Fields On" and start again.

We are going to convert the field_vs_maps file to an Excel spreadsheet so we can do our confusion matrix calculations using PivotTables.

By the way, you may have noticed we worked a little differently today than in previous labs. So far today we have only used the ArcGIS table format or generated point layers within our biol420.mdb database, and have avoided the dbf file format, both as an output table in Sample and indirectly as a shapefile output layer in our intersection operation (remember, shapefiles use dbf as the format for attribute tables). We needed to avoid dbf files because our field names are long and descriptive, but the dbf format only allows 10 characters for field names. If we used a dbf file as an output table, or a shapefile as an output point layer, our field names would all have been shortened to fit in 10 characters, no matter how unintelligible the new name is. If we go straight from an ArcGIS table to an Excel file, all the field names will be preserved.

Now, to export the field_vs_maps attribute table directly to an Excel spreadsheet file, do the following:

Make a "confusion matrix" that compares what we observed in the field to what the map says would be there

Open field_vs_maps.xls in Excel. You'll see that the only columns in the spreadsheet are the ones we left turned on in the attribute table for field_vs_maps.

We are going to make a confusion matrix for each of the maps - we'll start with the unsupervised classification map.

Make a new worksheet for your results - click on the plus in the circle to make a new worksheet. Re-name the sheet to "Confusion". Switch back to your pivot table in Sheet1.

You can drop down the list in cell B1 and just show "Cover type" rows. Copy the cells in your pivot table (starting with cell A4, and including enough cells to get the row and column totals), and paste-special as values into cell A3 in your new worksheet. In cell A1 type "Confusion matrix - unsupervised classification". You can now replace the row labels with the cover type names as you assigned them (you should have saved an Excel file that gave the numbers you assigned, and the cover type names associated with them).

Now we'll do the land use categories from landuse_2016_simple.shp:

You should now have a confusion matrix showing the mapped land uses in landuse_2016_simple.shp (rows) compared with what we saw in the field (columns). Copy and paste-special to cell A20 of the Confusion sheet, and label cell A18 "Confusion matrix - land use".
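Behind the scenes, a confusion matrix is just a count of points for each combination of mapped and observed category. The Python sketch below shows that counting step with made-up labels for five points; it is an illustration of the idea, not a replacement for the PivotTable.

```python
from collections import Counter

def confusion_counts(mapped, observed):
    """Count points for each (mapped, observed) pair of labels."""
    return Counter(zip(mapped, observed))

# Hypothetical labels for five points, in the same point order
mapped = ["Forest", "Forest", "Developed", "Water", "Developed"]
observed = ["Forest", "Developed", "Developed", "Water", "Developed"]

counts = confusion_counts(mapped, observed)
print(counts[("Developed", "Developed")])  # → 2 (mapped and observed agree)
print(counts[("Forest", "Developed")])     # → 1 (mapped Forest, found Developed)
```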

MAKE SURE YOUR ROW AND COLUMN LABELS ARE THE SAME ON YOUR CONFUSION MATRIX

It's possible that there might be some cover types on your map that we didn't see in the field, or cover types we saw in the field that weren't mapped. The accuracy statistics instructions assume that you have the same set of cover types in your rows and columns, and if that isn't true make it so before you move on.

So, for example, if you have something like this...

...add blank columns for cover types that are in the rows but not the columns...

Added columns

...and blank rows for cover types that are in the columns but not the rows. The labels should match when you're done.

Added rows

As a last step before going on to the accuracy statistics, update the row and column totals. In this example, to sum the Agriculture column, in cell B10 I would enter =sum(b2:b9), and to update the Agriculture row I would enter =sum(a2:i2). These could then be copied and pasted to the rest of the rows and columns.
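The same clean-up can be sketched in Python: collect every label that appears in either the rows or the columns, fill the missing combinations with 0, and then recompute the totals. The counts and the function name square_matrix are made up for illustration.

```python
def square_matrix(counts):
    """Build a square table from {(mapped, observed): count} with matching labels."""
    # Every label that appears as either a row (mapped) or a column (observed)
    labels = sorted({label for pair in counts for label in pair})
    matrix = [[counts.get((row, col), 0) for col in labels] for row in labels]
    return labels, matrix

# Hypothetical counts: "Water" was seen in the field but never mapped,
# and "Developed" was mapped but never seen in the field
counts = {("Forest", "Forest"): 3, ("Forest", "Water"): 1, ("Developed", "Forest"): 2}

labels, matrix = square_matrix(counts)
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]
print(labels)      # → ['Developed', 'Forest', 'Water']
print(row_totals)  # → [2, 4, 0]
print(col_totals)  # → [0, 5, 1]
```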

Accuracy statistics

Now to calculate the accuracy statistics for the maps. For each of your confusion matrices, make sure the rows and columns are sorted in the same order so that the matches are on the main diagonal. Then do the following:

Do this for each confusion matrix. It's possible that students will have different numbers of rows and columns, depending on how they assigned the point descriptions to cover types, so I can't give you specific instructions for which cells to use, or what the formulas will be. Try coming up with the formulas yourself, and I'll help if you run into trouble.
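If you want to check your spreadsheet formulas, the Python sketch below computes three commonly used accuracy statistics: overall accuracy, user's accuracy, and producer's accuracy. It assumes mapped categories are in the rows and field observations are in the columns, with both in the same order; the statistics asked for in this lab may differ, and the numbers in the example matrix are made up.

```python
def accuracy_stats(matrix):
    """matrix[i][j] = points mapped as class i and observed in the field as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]  # mapped and observed agree
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    overall = sum(diagonal) / total
    # User's accuracy: of the points mapped as a class, the share that was right
    users = [d / t if t else None for d, t in zip(diagonal, row_totals)]
    # Producer's accuracy: of the points observed as a class, the share mapped correctly
    producers = [d / t if t else None for d, t in zip(diagonal, col_totals)]
    return overall, users, producers

# Hypothetical 3 x 3 confusion matrix (30 points)
matrix = [[8, 2, 0],
          [1, 6, 3],
          [0, 1, 9]]
overall, users, producers = accuracy_stats(matrix)
print(round(overall, 3))  # → 0.767
print(users)              # → [0.8, 0.6, 0.9]
```

A class with no mapped points (or no observed points) gets None rather than a division-by-zero error.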

Look over the mis-classifications for each map - look down the columns to see which cover types were hard to map accurately, and to see what cover types were incorrectly assigned to each cover type. Look across the rows to see what the mapped cover types proved to be when we checked in the field - what mistakes did each method make? And, think about the mis-classifications in terms of two possible causes: 1) changes over time, such that the map was accurate when it was made but it now no longer reflects conditions on the ground, and 2) mapping errors. It may not always be obvious, but a point that was mapped as developed land and found in the field to be undeveloped is probably a mapping error - it is very unlikely that houses were converted to undeveloped land. The opposite (land that is mapped as undeveloped, but was found to be developed) is more likely to be an actual change on the ground - we would have to find out when the development was done to be certain, but it's much more plausible from first principles.

That's it! Save your confusion matrix spreadsheet, you'll be using it in your project write-ups.