G'day all,

Please don't bother to respond to this post, Mr Morgan; it will be ignored unless it actually has some science in it.

I'd be interested in any views on sampling techniques for a data set of 7,000-odd weather stations. My suggestion is to start with all weather station data, eliminate all stations within a set distance of a population centre, and then divide the remainder into regions, weighting each region by its area as a fraction of the Earth's total surface. I think the raw data averages, not weighted by region, should also be included for transparency. I also think a comparison average that includes the urban effect stations should be part of any set made.
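
To make the weighting idea concrete, here is a rough Python sketch. The station values, region names and areas are made up for illustration, and the "is urban" flag is assumed to have been set beforehand by whatever distance rule is chosen. Note one assumption I've had to make: the weights are normalised over the regions that actually have stations, so a region with no data simply drops out rather than counting as zero.

```python
from statistics import mean

# Hypothetical station records: (region, is_urban, long-term mean temp in deg C)
stations = [
    ("A", False, 10.0),
    ("A", True, 12.0),
    ("B", False, 20.0),
    ("B", False, 22.0),
]

# Hypothetical region areas, in any consistent unit
region_area = {"A": 1.0, "B": 3.0}


def raw_average(records):
    """Plain average over stations, with no regional weighting."""
    return mean(temp for _, _, temp in records)


def area_weighted_average(records, areas):
    """Average the stations within each region, then weight each
    regional mean by that region's share of the covered area."""
    by_region = {}
    for region, _, temp in records:
        by_region.setdefault(region, []).append(temp)
    total_area = sum(areas[r] for r in by_region)
    return sum(mean(temps) * areas[r] for r, temps in by_region.items()) / total_area


# Rural-only subset, plus the all-station comparison set
rural = [s for s in stations if not s[1]]

results = {
    "raw_all": raw_average(stations),
    "raw_rural": raw_average(rural),
    "weighted_all": area_weighted_average(stations, region_area),
    "weighted_rural": area_weighted_average(rural, region_area),
}
```

Publishing all four figures side by side is the point: anyone can see how much the weighting and the urban exclusion each move the result.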

It is interesting that the averages of all stations in the Southern Hemisphere do not show any trend at all. I'd like to divide that up further, into continents at least.

Actually, using the raw data, there are a few areas of the earth that show strange trends. The US stations taken in total, including all cities, show a cooling trend. I'm still trying to work out how that can be rationalised, since the urban effect has to be pretty pronounced in the US data.

The Antarctic is also getting slightly colder, but of course this is not a very big sample, being a limited number of stations dating only from the 50s. The Southern Hemisphere obviously has fewer stations than the Northern Hemisphere, but it also has some very accurate data, including as it does Australia, New Zealand and, yes, even parts of Africa that have kept good records.

Averaging all of a region obviously isn't random sampling at all. Excluding urban areas still isn't sampling, and it also leaves some large areas of the earth with no data or very few locations.

Any suggestions for different sets and ranges would be appreciated. I'd like to include a wide variety of methods and certainly do not want to be accused of bias because of some set that someone thinks should have been included but that I didn't think of, or that has not been done in other analyses.

The biggest problem comes down to what to do with missing data. My thought is to assume that the missing data is actually random in nature and to average each site used, provided the missing data is not extensive. So if there are 8 years missing in 90 years, simply average the 82 years without attempting to "fill in" the gaps. Does anyone see any flaw in that method? I have thought that I'd have to present a table of the missing years and how often they are missing overall. Obviously, if the 1920s are missing in a region far more frequently than any other period, that could distort the average, and anyone looking at the data should be able to easily look up how often particular periods were missing.
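
Roughly what I have in mind, as a Python sketch. The data layout (a dictionary of year to annual mean per station, with missing years simply absent), the station names and the `max_missing` cut-off are all assumptions for illustration, not anything taken from an existing data set.

```python
from collections import Counter
from statistics import mean


def station_average(series, first_year, last_year, max_missing):
    """Average the years that are present, with no gap filling.
    Returns None if more than max_missing years are absent."""
    years = range(first_year, last_year + 1)
    present = [series[y] for y in years if y in series]
    missing = len(years) - len(present)
    if not present or missing > max_missing:
        return None
    return mean(present)


def missing_by_decade(all_series, first_year, last_year):
    """Count, per decade, how many station-years are missing."""
    counts = Counter()
    for series in all_series.values():
        for year in range(first_year, last_year + 1):
            if year not in series:
                counts[(year // 10) * 10] += 1
    return dict(sorted(counts.items()))


# Hypothetical data for two stations over 1928 to 1931
data = {
    "station_1": {1928: 10.0, 1929: 11.0, 1931: 12.0},  # 1930 missing
    "station_2": {1930: 5.0},                           # three years missing
}

averages = {
    name: station_average(series, 1928, 1931, max_missing=1)
    for name, series in data.items()
}
gaps = missing_by_decade(data, 1928, 1931)
```

The decade table is what would let a reader check whether the gaps really are spread randomly, or whether one period is badly under-represented.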

I believe this is the area where previous averages have failed or not been clear. For instance, the NASA system does fill in the blanks by averaging the surrounding stations for the missing years and adding them in. It also "calculates" urban effect by obtaining the deviation of an urban effect area from the closest non-urban effect stations based on night light emissions and subtracting that deviation from the urban effect stations. The potential for manipulation of the data or inadvertent bias is very large indeed when formulas such as this are used, and I wish to eliminate any such need for modification of the data other than a simple decision to exclude or include the station depending on rigid selection criteria.
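
As an example of a rigid include/exclude rule, the distance-from-population-centre test I suggested earlier could be sketched like this in Python. The coordinates and the distance threshold are placeholders; the great-circle distance uses the standard haversine formula with a mean Earth radius of 6371 km, which is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is not a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_rural(station, centres, min_km):
    """Include a station only if every population centre is at least
    min_km away. No adjustment of the readings is made either way."""
    return all(
        haversine_km(station[0], station[1], c[0], c[1]) >= min_km
        for c in centres
    )


# Hypothetical station and population centre, one degree of longitude apart
station = (0.0, 0.0)
centres = [(0.0, 1.0)]

keep_at_50km = is_rural(station, centres, min_km=50)
keep_at_150km = is_rural(station, centres, min_km=150)
```

The station is either in or out; nothing about its readings is altered, which is the whole idea.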

It would be easier to confine the data to only those stations that were not missing more than, say, a couple of years here and there, but if you want to go back more than 80 years, that cuts out over 97% of the stations. Fewer than 300 stations is not nearly enough to work with, especially if sub-sets such as non-urban effect stations are also to be analysed.

It is far easier to consider this if you look at a few stations' data and start to see just what the difficulties are. And of course, this becomes a much more difficult task when the raw daily data has to be obtained after the selection criteria are established.


Regards


Richard


Sane=fits in. Unreasonable=world needs to fit to him. All Progress requires unreasonableness