# Hurricanes vs Himmicanes: a little modeling goes a long way

Many people in the blogosphere have been commenting on the PNAS paper about male- vs female-named hurricanes. Thanks to Bob O’Hara, who posted some code doing his own analysis, I decided to grab the data and do a little modeling myself. My main concern was that I was seeing a lot of statistical analysis, but very little modeling involving causal and physical factors. The following model actually considers the causality and the physics, and produces results that match the intuitions that go into the model. And the femininity turns out to be meaningless. Note also that I found one outlying point and corrected the number of deaths in Hurricane Isaac to be 34, as shown on Wikipedia, instead of the 5 in the dataset. I’m sure there are more data-cleaning issues too, but I don’t get paid to clean other people’s data and I don’t enjoy it, so feel free to scrutinize their data more carefully if you like.

So let’s cut to the punchline of a couple of graphs, and then hear the analysis:

And the residuals from the fit:

The first thing I decided was to analyze only the period after 1975, since the use of male names comes in shortly after (1979), and this is the “modern” era of forecasting, which makes it more relevant to any future effect of gender than storms in 1950 or 1960.

### The Basic Model:

So I decided to think about how damage and deaths work. In both cases, the number of deaths and the amount of damage are tied to the population in the area affected. In particular, I used the following basic idea:

\[D = K N (C/N) (v/v_1)^2\]

In other words, Damage $$D$$ is related linearly via a constant $$K$$ (to first order expansion) to the number of people in the area affected, times the capital stock per person $$(C/N)$$ times a measure of the destructive force of the winds (the fraction of the capital stock that will be damaged). Since the pressure exerted on a building is proportional to the velocity of the wind squared, I used $$(v/v_1)^2$$ to measure destructive force. I get the velocity of the winds from the “Category” variable in the dataset, and transform it according to wikipedia’s description of the velocity (I transform to m/s, but then divide by $$v_1$$ the velocity of cat 1 hurricanes, which makes the result independent of units).
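To make the destructive-force factor concrete, here is a quick numerical sketch (in Python, for illustration only; the post’s actual analysis is in R below). The category speeds are the midpoints of Wikipedia’s Saffir-Simpson ranges in m/s, and the cat 5 value is nominal since no cat 5 storms appear in the dataset:

```python
# Typical sustained wind speed (m/s) per Saffir-Simpson category,
# taken as midpoints of the Wikipedia ranges; cat 5 value is nominal.
CAT_SPEED_MS = [(33 + 42) / 2, (43 + 49) / 2, (50 + 58) / 2, (58 + 70) / 2, 80.0]

def destructive_force(category):
    """Wind-pressure proxy (v / v_1)^2: pressure on a structure scales
    with the square of wind speed, normalized so category 1 equals 1."""
    v = CAT_SPEED_MS[category - 1]
    v1 = CAT_SPEED_MS[0]
    return (v / v1) ** 2

# A category 3 storm exerts roughly twice the wind pressure of a category 1:
print(destructive_force(1))  # 1.0
print(destructive_force(3))  # ~2.07
```

The normalization by $$v_1$$ is what makes the factor dimensionless, so any consistent wind-speed unit gives the same answer.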

So if we’re interested in how many people were involved, we can get a measure of that by inverting: $$N \sim D/(K (C/N) (v/v_1)^2)$$.

### Time Trends:

We might expect a time trend in capital per person. This time trend comes from two sources: the first is inflation of prices, which is removed by their “normalized damage” variable, and the second is actual growth in the real value of the capital stock, so that capital normalized to inflation could still be written as $$C/N (1 + \epsilon (Y-1990)/10)$$.

Now let’s solve for N to get a number of people involved. We’ll assume K is relatively constant in time and for different values of v^2, and choose a K which gives values of N which are O(1), namely $$(KC/N)\approx 40000$$. Think of this as a units transformation which makes N take on values between 0 and about 1. Also note that I’ve spatially homogenized $$C/N$$ to an overall average. Hurricanes are pretty big, but they do affect areas that vary in how rich they are. This is one of the things that will explain the “residuals” in the model.

\[N \sim \frac{D}{K (C/N (1 + \epsilon (Y-1990)/10)) (v/v_1)^2}\]
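Putting the pieces together, the people-involved proxy can be sketched numerically (again a Python illustration of the formula above; the scale $$KC/N = 40000$$ comes from the post, but the growth rate `eps = 0.2` and the example storm values are hypothetical placeholders, not fitted numbers):

```python
# Saffir-Simpson midpoint speeds (m/s), cat 5 value nominal.
CAT_SPEED_MS = [(33 + 42) / 2, (43 + 49) / 2, (50 + 58) / 2, (58 + 70) / 2, 80.0]

def people_involved(ndam, category, year, kc_over_n=40000.0, eps=0.2):
    """Proxy N ~ D / (K (C/N) (1 + eps (Y-1990)/10) (v/v1)^2).
    kc_over_n rescales so N is O(1); eps is a hypothetical growth rate
    for real capital stock per person per decade."""
    v_ratio_sq = (CAT_SPEED_MS[category - 1] / CAT_SPEED_MS[0]) ** 2
    growth = 1 + eps * (year - 1990) / 10
    return ndam / (kc_over_n * growth * v_ratio_sq)

# Hypothetical category 2 storm with NDAM = 20000 hitting in 2000:
print(people_involved(20000, 2, 2000))  # ~0.28, i.e. O(1) as intended
```

Note how the same damage figure implies fewer people involved when the winds are stronger or the capital stock is richer, which is exactly the causal story the model encodes.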

Now, what about deaths? Well, basically we’re thinking that the number of deaths is proportional to the number of people involved in destructive events, or:

\[M \propto N\]

Which is to say that the mortality $$M$$ (# of people who die) is basically a constant fraction of the number of people involved. I rescale $$M$$ to $$M/150$$ because 150 is about how many people die in a big hurricane in this dataset, and it makes the values O(1). However, we’re not done yet: N has the year appearing linearly in the denominator because capital stock per person grows in time. So let’s use a Taylor-series approximation to move it to the numerator; capital stock per person changes slowly on the scale of a decade, so this is probably reasonable.

\[M \sim \frac{D}{K (C/N) (v/v_1)^2} (1 - \epsilon (Y-1990)/10) + \ldots\]
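The step used here is just the first-order geometric-series expansion $$1/(1+x) \approx 1 - x$$ with $$x = \epsilon (Y-1990)/10$$, valid because $$x$$ stays small over the decades in question:

\[\frac{1}{1 + \epsilon (Y-1990)/10} = 1 - \epsilon (Y-1990)/10 + O\!\left(\epsilon^2 (Y-1990)^2/100\right)\]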

### Estimating The Coefficients:

Now we can let R estimate our coefficients by least squares, because we’re too lazy to do a Bayesian analysis. Furthermore, we’ll allow a term for MasFem (the femininity index) and even allow it to interact with $$N$$.

```r
## Read the data and normalize the format, code copied from:
## http://rpubs.com/oharar/20012

library(gdata)
library(MASS)
library(ggplot2)  # needed for qplot below
#library(knitr)

# Read in the data
Data=read.xls("http://www.pnas.org/content/suppl/2014/05/30/1402786111.DCSupplemental/pnas.1402786111.sd01.xlsx",
              nrows=92, as.is=TRUE)
# Data=read.xls("~/git-repos/Hurricanes/pnas.1402786111.sd01.xlsx", nrows=92, as.is=TRUE)

## various plots show Hurricane Isaac as an outlier; there seems to be
## something wrong with the fatalities in Hurricane Isaac. According
## to Wikipedia there were 34 direct deaths, not 5 as in the original dataset
Data$alldeaths[Data$Name == "Isaac"] <- 34

Data$Category=factor(Data$Category)
Data$Gender_MF=factor(Data$Gender_MF)
#Data$ColourMF=c("lightblue", "pink")[as.numeric(Data$Gender_MF)]
Data$ColourMF=c("blue", "red")[as.numeric(Data$Gender_MF)]
BigH=which(Data$alldeaths>100) # Select hurricanes with > 100 deaths

# scale the covariates
Data$Minpressure.2014.sc=scale(Data$Minpressure_Updated.2014)
Data$NDAM.sc=scale(Data$NDAM)
Data$MasFem.sc=scale(Data$MasFem)

# typical velocity in meters per second for each category, taken as the
# midpoint of Wikipedia's Saffir-Simpson ranges; there were no cat 5
# hurricanes in this dataset, so the cat 5 number doesn't matter
vfromC <- function(C){
  catms <- c(mean(c(33,42)), mean(c(43,49)), mean(c(50,58)), mean(c(58,70)), 80)
  return(catms[as.numeric(C)])
}

# fit an exponential growth model to the post-1970 US census population
# and interpolate to each storm's year
USPopdat <- read.table("UScensuswiki.txt", header=TRUE)
popmodel <- lm(log(Pop) ~ Census, data=USPopdat[USPopdat$Census >= 1970,])
appxuspop <- exp(predict(popmodel, list(Census=Data$Year)))

Data <- cbind(Data, USpop=appxuspop)

# ggtitle() replaces the deprecated opts(title=...)
qplot((NDAM/40000)/(vfromC(Category)/vfromC(1))^2, alldeaths/150,
      color=MasFem, data=Data[Data$Year > 1975,]) +
  stat_smooth(method="lm") +
  ggtitle("Deadliness vs a measure of People Involved") +
  labs(x="NDAM/40000/v^2")

regmodel <- lm(I(alldeaths/150) ~
                 I((NDAM/40000)/(vfromC(Category)/vfromC(1))^2)*MasFem +
                 I((Year-1990)/10):I((NDAM/40000)/(vfromC(Category)/vfromC(1))^2),
               data=Data)

summary(regmodel)

plot(residuals(regmodel))
```

### Analysis of Resulting Coefficient Estimates:

What about the model coefficients? If you parse through the output of the following R summary, you’ll see that the intercept is essentially zero, indicating that when there’s no hurricane, there are no deaths. This is good. The coefficient of our measure of N is O(1) (it is about 0.84) meaning that basically for every unit of our measure of the number of people involved, the number of people who die goes up by about 1 unit. We expect this because of the rescaling we’ve applied to make both N and M O(1). This is good and expected. The coefficient on the time term is negative and O(0.1) which is reasonable because the capital stock per person increases slowly with time (implying negative coefficient, and small on this scale, as seen and expected). Finally, the effect of Femininity is meaninglessly small by itself, even when interacting with our proxy for the number of people involved, and is not anywhere close to statistically significant. **The final interpretation: When destructive events involve more people because they hit more populated areas, or they destroy a larger fraction of the structures… more people die.**
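To see how small the femininity effect is, one can plug the fitted coefficients (from the raw summary below) into the model for a storm whose destructive reach is essentially zero (our N proxy near 0). This back-of-envelope evaluation is my own illustration, not part of the original analysis:

```python
# Fitted coefficients from the regression summary (M = alldeaths/150,
# N proxy = (NDAM/40000)/(v/v1)^2):
INTERCEPT = 0.0299784   # baseline: essentially zero deaths
B_N       = 0.8358198   # people-involved proxy
B_MASFEM  = 0.0008459   # femininity main effect
B_INTER   = 0.0457350   # femininity x N interaction (not significant)
B_YEAR    = -0.1577365  # N x (Year - 1990)/10 time trend

def predicted_deaths(n_proxy, masfem, year):
    m = (INTERCEPT + B_N * n_proxy + B_MASFEM * masfem
         + B_INTER * n_proxy * masfem + B_YEAR * n_proxy * (year - 1990) / 10)
    return 150 * m  # undo the M/150 rescaling

# With no one in harm's way (n_proxy = 0), sweeping femininity across
# roughly its whole 1-11 range moves the prediction by about one death:
print(predicted_deaths(0, 1, 1990))   # ~4.6
print(predicted_deaths(0, 11, 1990))  # ~5.9
```

The point estimate of the interaction term is also swamped by its standard error (0.046 vs 0.050), so even the apparent per-unit slope is statistically indistinguishable from zero.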

### Raw R Summary:

```
Call:
lm(formula = I(alldeaths/150) ~ I((NDAM/40000)/(vfromC(Category)/vfromC(1))^2) *
    MasFem + I((NDAM/40000)/(vfromC(Category)/vfromC(1))^2):I((Year -
    1990)/10), data = Data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.45177 -0.05030 -0.02915  0.00788  1.51405

Coefficients:
                                                                     Estimate Std. Error t value Pr(>|t|)
(Intercept)                                                         0.0299784  0.0621153   0.483   0.6306
I((NDAM/40000)/(vfromC(Category)/vfromC(1))^2)                      0.8358198  0.4158438   2.010   0.0475 *
MasFem                                                              0.0008459  0.0080992   0.104   0.9171
I((NDAM/40000)/(vfromC(Category)/vfromC(1))^2):MasFem               0.0457350  0.0500918   0.913   0.3638
I((NDAM/40000)/(vfromC(Category)/vfromC(1))^2):I((Year - 1990)/10) -0.1577365  0.0605482  -2.605   0.0108 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2108 on 87 degrees of freedom
Multiple R-squared:  0.4282,    Adjusted R-squared:  0.4019
F-statistic: 16.29 on 4 and 87 DF,  p-value: 5.4e-10
```

I’ve gotten ahold of the paywalled PNAS article and see that they only count US deaths, so my change for Hurricane Isaac might not be correct. Whatever; it seems they probably have problems with their counts anyway, and a proper Bayesian model would certainly have an errors-in-variables component.

The biggest issue, though, is that they seem to have created their NDAM variable by converting the damage to a prediction of the 2013 damage that would have occurred if the hurricane hit the same spot at that time. That’s pretty broken, since it depends on the 2013 population and building stock, which are irrelevant to the actual hurricane.

Unfortunately, they don’t include an un-normalized damage estimate, which we could simply divide by CPI to get a constant-dollar amount. But I would expect that if they had, my model would fit even better.