KCLCCHMinor programmeAV1000Numerical and graphical analysis


AV1000
Fundamentals of the digital humanities
Pollution in Hackney 1998

  1. Statement of the problem
  2. Analysis
  3. Argument

I. Statement of the problem

Graphing nitrogen oxide in London Hackney during 1998, for example, produces an intriguing pattern that leads one to suspect a direct relationship between the seasonal behaviour of Londoners and pollution levels:

Nitrogen oxide in Hackney
1998

Notice the flat inverted U-shape, peaking at the beginning and end of the year, with the trough more or less directly in the middle and the sloping off at the end, i.e. in the approach to Christmas. An immediately tempting hypothesis is that during the cold weather, when schools are in session and most people at work, Londoners tend to use their automobiles more and so produce more pollution—Christmas, of course, excepted. (Therefore putting them all on the Underground would solve a very serious problem for Londoners themselves and their buildings.) Is this so, however?

Let us not ignore the fact that the sloping off at the end of the year begins to occur much earlier than most working people start their holidays. For the moment, however, let's work with the gross pattern just noted.

The research problem posed here is to test the hypothetical link between pollution and automobile traffic graphically, by comparing the average levels of all 5 pollutants measured by the Air Quality Information Archive for 1998. Do they all match this shape? If not, which ones deviate, and why? Is the sloping off at year's end a clue to a better, more inclusive hypothesis?

II. Analysis

Acquiring the data

To investigate this problem, you obviously need the 1998 data for the pollutants. These data are provided for you in a single Excel workbook containing the following 5 sheets:

  1. carbon monoxide (CO)
  2. nitric oxide (NO)
  3. nitrogen dioxide (NO2)
  4. oxides of nitrogen (NOs)
  5. ozone

You will first need to save a copy of the workbook. Click here to get the file.

Producing the chart

Begin by thinking through what it is that you need in order to see the relationships among the 6 pollutant levels and how to achieve it.

  1. The objective in mind is to generate a single chart that will demonstrate whether the shapes of all the lines, each representing the levels across time of a single pollutant, match approximately. (Small deviations are of no consequence; a successful chart will show the answer immediately.) This objective implies two things about all the "data series" (one per pollutant):

    1. they are on a single sheet;
    2. they have approximately the same range of magnitudes or "scale"
  2. Constructing a single sheet with a representative data series from each of the sheets you have been given is not at all difficult. Consider that since you are after a rough, quick answer to the question, probably all you need to see is the average level for each pollutant over time, given at the extreme right-hand of its chart. Thus what you need to do is to cut the Averages column of data from each of the sheets in the workbook supplied and paste it into the new sheet; the column of dates is most conveniently taken from the first of these sheets. (You may need to return to the discussion on Copying, insering, deleting, moving for a review.) Then you need to alter the heading at the top of each pasted column to remind you which pollutant it represents. When you finish constructing your new sheet, it should look something like the one here:
  3. Achieving roughly equal scales is more of an impediment. If you glance briefly at the values in the sheet above you will immediately see that the ranges of magnitude vary considerably: CO tends to be in the range of .5 to 2.5 ppb; nitric oxide from 0 to more than 200 ppb; nitrogen dioxide 1 to 100; and so forth. This variation means that the overall shapes will be more difficult to compare than they should be. Note here, in the slice from an uncorrected chart, what you would get if you did not do anything about the problem. The blue and purplish-red lines are simple enough to compare, the yellow slightly difficult, but the others are simply too flat.

    Thus what you need to do is to amplify all lines with less range of magnitude than the one with the maximum. Note that although such an operation will change the data, and so falsify them, it will not change the relative shapes of the graphs. All we are interested in here is the shapes. Thus by altering the data we reveal a truth about them that is otherwise hidden.

    The essential idea is that if you multiply all the values of a series by the same factor you will scale up the series without distorting it. The problem, then, is to determine what the factor should be for each series. This is done by calculating for the average value of each series the ratio of the maximum average to it. Applying the ratio for a data series to each of its component values then scales the series appropriately.

    To get these ratios, do the following:

    1. At the bottom of each column of data in your newly created chart, in row 368, enter the formula to calculate the average value, rounded off to one decimal place, e.g. =ROUND(AVERAGE(B4:B367),1). Remember, you only need type it in once, then propagate the formula. Notice again the large variation in magnitude, from the smallest value to the largest. The ratio of the largest value to each of the smaller ones will give you the multiplying factor you need—we will return to this ratio shortly.
    2. Insert a new column to the right of each existing column of data into which to put the new scaled data. After inserting the new columns, your existing data should be in columns B, D, F, H, J, with empty columns C, E, G, I and K on.
    3. Now enter into the row following your row of averages a formula to calculate the ratio described above, namely =ROUND($H368/B368,2), and copy the formula to the cells beneath each of the columns with data in them. (If you do not understand why the dollar sign appears in the immediately previous formula, review the idea of referencing.)
    4. To save scrolling somewhat, begin at the bottom of the chart by putting into the cell in column C opposite 31/12/98 the first scaling formula. Think about this: it should multiply each of the values for CO in column B by the ratio of the maximum average to the average for CO, then round off the whole to one decimal place. Since you know what the averages are, you could just type in these numbers, but it is better to use the absolute addresses of the respective cells. (Do you understand why? If not, ask!) So, your formula should look something like this: =ROUND($B$369*B367,1). Study this a moment to make sure you understand it.
    5. Propagate the above formula into all the cells of column C up to the first data value. Repeat the same operation for columns E, G, I, K. For each column make sure the absolute address of the average is correct. Once you have completed this step, your spreadsheet is finished. The bottom of your sheet should look something like this (this image is from a spreadsheet for the previous year):

      All that remains is to make the chart. Before doing that, however, save the new sheet as LHAV1998.XLS.
    6. Now select the data. This is not difficult but requires care. Start with column A. Hold down the control key, click on the first data cell (at the bottom of the sheet or the top, whichever is easier), then drag the cursor (up or down, depending on where you start) until every relevant cell is highlighted. Keep the control key pressed, click on the first cell of column C, drag the cursor to the last cell, and go on in identical fashion to highlight the remaining columns E, G, I, K. When you conclude you should have highlighted only the relevant cells (header rows and all data rows but not the calculated averages) in columns A, C, E, G, I, K.
    7. Now start the chart wizard. In step 1 of 4 specify a Line chart. Step 2 of 4 should specify the cells as follows:
      =$A$1:$A$367,$C$1:$C$367,$E$1:$E$367,$G$1:$G$367,$I$1:$I$367,$K$1:$K$367
      If you don't see exactly this (making allowances for the fact that the number 367 may be slightly different in your case), then correct the specification using the mouse cursor, NOT the cursor keys. In addition, click on the Series tab, then one by one enter the name of the pollutant for each series in the Name box. In step 3 of 4 enter a chart title and titles for the axes. In step 4 simply click on Finish.

Analysing the result

  1. Once you have your chart, save the spreadsheet file under its existing name, which will save the newly produced on-sheet chart as well. Then stretch the right hand side of the chart out, say approximately to column AZ or so—until you can clearly see the separation of the various lines; adjust the height as well. It should then look something like this section from the beginning:
  2. Now scan across the length of the chart looking for any anomalous patterns, i.e. any consistent difference in the behaviour of one or more of the pollutants from the others, anything that stands out and cannot simply be explained as an artefact of the equipment, fluke in the weather or the like because it keeps happening across time.
  3. For any consistent difference you spot, determine how consistent it is—exactly or roughly? You can expect that at best it will be roughly consistent because there are likely to be many factors influencing the readings. From your real-world knowledge, what might some of these factors be? (Writing down a list of them will prove helpful.)
  4. Look at the characteristics of the pollutants described in the Air Quality Information Archive in the section on "The Chemistry of Atmospheric Pollutants". Is the anomalous pattern completely explained by these characteristics? If not completely, then look to your list of other factors; what else might you need to know to evaluate these? (Again, making notes will prove useful.)

III. Argument

The pattern with an example, biochemical explanation (from the Archive) and idea of what else you might need to investigate form the rudiments of an argument: here, you can say, is something interesting about the pollutants in the atmosphere and this is the direction of further research.

The following is what you need to do actually to make this argument.

  1. Find a telling example. With the chart stretched out horizontally as required, scan along the length of it for a particularly illustrative case of the pattern you have identified. Make sure you know which line belongs to which pollutant; adjust the colours of the lines if required to make them visually distinct. (Click directly on the line whose colour you wish to change, then select a new colour for it.)
  2. Capture the image. Use the screen-capture function in PaintShop Pro to select the relevant portion of what you see on screen and save it as an image-file.

  3. Sketch out the steps of the argument:

  4. Write out the argument in the form of a Web-page, using the image produced with Paintshop Pro as illustration. Consider including one or more links to the Air Quality Information Archive. What else might be helpful to include?

revised January 2008