PUMAs - Public Use Microdata Areas of the Census

2017 April 8
by Daniel Lakeland

The Census ACS and similar microdata datasets use PUMAs, which are regions containing about 100k people each. In 2010 the PUMAs were revised to be simpler and to lie completely within a single state. But I gotta say, the PUMA is the wrong way to do things.

For the ACS and similar microdata, each household record should have a latitude and longitude associated with it. These values should equal the actual lat/long of the household plus a random, uniformly distributed perturbation. The record should give the size of this perturbation in each dimension, and that size should be determined by the local population density so that the patch it covers contains approximately 100k people. The uniform distribution is the right choice because it's bounded: you can be sure the reported point is never more than a known distance from the true location.
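To make that concrete, here is a minimal Python sketch of the fuzzing step. It is only an illustration under assumptions of my own, not anything the Census does: the function names (`half_widths_deg`, `fuzz_location`), the square patch, the flat-earth conversion of about 111.32 km per degree of latitude, and the idea that a local density in people per square km is available for each household are all hypothetical.

```python
import math
import random

# Assumption: rough conversion factor, good enough for a sketch.
KM_PER_DEG_LAT = 111.32


def half_widths_deg(lat, density_per_km2, target_pop=100_000):
    """Half-widths (degrees lat, degrees lon) of a square patch that
    holds about target_pop people at the given local density."""
    side_km = math.sqrt(target_pop / density_per_km2)
    half_km = side_km / 2.0
    dlat = half_km / KM_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude).
    dlon = half_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return dlat, dlon


def fuzz_location(lat, lon, density_per_km2, rng=random):
    """Return the perturbed location plus the perturbation sizes,
    which is what the published record would carry."""
    dlat, dlon = half_widths_deg(lat, density_per_km2)
    return {
        "lat": lat + rng.uniform(-dlat, dlat),
        "lon": lon + rng.uniform(-dlon, dlon),
        "half_width_lat": dlat,
        "half_width_lon": dlon,
    }


# Hypothetical household at 1000 people per square km.
record = fuzz_location(34.0, -118.0, 1000.0, rng=random.Random(0))
```

Because the perturbation is bounded, anyone using the record knows the true household lies inside the box `lat ± half_width_lat`, `lon ± half_width_lon`, and that box can go straight into a spatial model as a uniform prior on the true location.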

The record still has the State, so a perturbation that carries a household across a state boundary would raise some issues, which I'm sure the Census is capable of working out. Other than that, it makes way more sense to fuzz the location with a continuous perturbation than to deal with the complexity of redrawing boundary lines every few years, completely redoing your identification scheme, and screwing up any chance you had of working with spatial time-series models.

Just sayin.

