# Spatial sampling design

I have some plots of land spread across the landscape. Sometimes a parcel is subdivided into sectors. For each parcel (or, where a parcel is subdivided, for each sector), the quality of agricultural yield is known, binary-coded as good or bad.

My ultimate goal is to assess which topographic/environmental independent variables (IVs; say, slope, elevation, distance from water reservoirs, etc.) may have influenced the binary dependent variable (DV), using logistic regression. I am wondering what a sound sampling strategy would be.

The starting point would be to draw a number of points within the parcels and, for each point, record the value of the DV (whether the quality is good or bad) and the values of the IVs.

Questions:

1. I am unsure which is more correct: (a) drawing equally spaced points, or (b) drawing randomly distributed points.
2. In either approach, since the total area of good-quality land is not equal to the total area of bad-quality land, should the number of (random?) points be equal across the two qualities, or proportional to the size of each area?

I don't know that either would be 'more correct'. I would probably lean toward random points, since regular spacing can introduce bias if it happens to line up with a repeating pattern in the landscape.
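As a rough illustration of the random option, here is a minimal Python sketch that draws uniformly random points inside a single parcel by rejection sampling. The parcel coordinates, the function names (`point_in_polygon`, `random_points_in_polygon`) and the fixed seed are all assumptions made up for the example; in practice a GIS package would normally do this step for you.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon.

    `polygon` is a list of (x, y) vertices in order, without repeating the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, n, rng=None):
    """Draw n uniformly random points inside the polygon by rejection sampling."""
    if rng is None:
        rng = random.Random()
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < n:
        # Propose a point in the bounding box; keep it only if it is inside.
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical L-shaped parcel, coordinates in metres (illustrative only).
parcel = [(0, 0), (100, 0), (100, 40), (40, 40), (40, 100), (0, 100)]
samples = random_points_in_polygon(parcel, 25, random.Random(42))
```

Each sampled point would then be tagged with the parcel's good/bad value and the IV values extracted at that location.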

You do not want an equal number of points; see the modifiable areal unit problem (http://en.wikipedia.org/wiki/Modifiable_areal_unit_problem). Three samples from a large, variable polygon might happen to fall on the same values as those from a small, uniform polygon, even though 80% of the larger polygon could have a completely different value. The number of sample points should be proportional to the area covered, so that the sample represents the variability accurately.
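To make the proportional rule concrete, here is a small Python sketch that splits a fixed budget of sample points across areas according to their size. The function name `allocate_points`, the areas in hectares and the budget of 50 points are hypothetical; the rounding uses the largest-remainder method, which is just one reasonable way to keep the counts whole and the total exact.

```python
def allocate_points(areas, total_points):
    """Split total_points across areas in proportion to their size.

    `areas` maps a label to its area (any consistent unit).
    Fractional quotas are rounded with the largest-remainder method,
    so the returned counts always sum to total_points.
    """
    total_area = sum(areas.values())
    quotas = {k: total_points * a / total_area for k, a in areas.items()}
    # Start from the whole-number part of each quota.
    counts = {k: int(q) for k, q in quotas.items()}
    leftover = total_points - sum(counts.values())
    # Give the remaining points to the areas with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical totals: 120 ha of good land, 30 ha of bad land, 50 points overall.
areas_ha = {"good": 120.0, "bad": 30.0}
allocation = allocate_points(areas_ha, 50)
```

The same function can be applied per parcel or per sector rather than per quality class, by passing each polygon's area instead of the class totals.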