According to this opinion piece in The Telegraph by a knighted peer, and inventor of the bagless vacuum, the UK needs to "double the number of engineering graduates".

Ok, let's see here. There must be at least 1 engineering school in the UK, and it must graduate at least 100 students because otherwise how would it stay in business?

So let's take the claim to its logical extreme: suppose the graduating class doubles every year for 20 years, starting from that base of 100. The total of all those extra graduates is $\sum_{i=1}^{20} 100 \times 2^i$. We could use the closed form for a geometric series, or we could just ask Maxima for the answer:

```
(%i1) sum(100*2^i,i,1,20);
(%o1) 209715000
```
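If you don't have Maxima handy, the same total falls out of the closed form for a geometric series; a quick sketch in Python:

```python
# sum_{i=1}^{20} 100 * 2^i has the closed form 100 * (2^21 - 2)
brute_force = sum(100 * 2**i for i in range(1, 21))
closed_form = 100 * (2**21 - 2)
print(brute_force, closed_form)  # 209715000 209715000
```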

Yep, so at a minimum the UK needs to graduate another 209 million engineers in total, in a country that currently has only 63 million people and grew by just 0.6% last year?

And what if we assume that more than 100 people graduated from engineering school last year? Maybe it was 1,000, or even 10,000? The whole thing is linear in that factor of 100, so we might need to graduate 2 billion or even 20 billion engineers over the next 20 years. Perhaps they're planning to puree them into Soylent Green to feed the masses?

Monty Python had something useful to say at this point:

Let's just hope these engineers have better numeracy than the peerage.

This video was posted to my FB news feed by a friend who works in technology. It gives voice to a lot of thoughts I've been having recently on the future of the economy:

Here are some things I've been thinking about related to this:

It's easy for me to imagine a coming world, in my lifetime, in which the vast majority of the frequently repeated tasks currently performed by humans are performed by computers and robots. This includes:

• Transporting things (by truck, train, within warehouses, picking, packing, shipping, etc)
• Selling things (from what Amazon already does, to cars, houses, industrial supplies...)
• Making decisions about which things are needed where, or how much of a thing to make or acquire (much of store management, middle management).
• Cleaning things (janitors, maids, etc)
• Diagnosing illness and treating common illness (let's say 80 to 90% of general practice medicine)
• Manufacturing things, even or especially customized things (3D printers, general purpose CAD/CAM, robotic garment construction)
• Basically, anything you do more than 1 day a week...

In that context, first of all, vast numbers of people will be unemployed or looking for alternative employment; second, the only alternative employment will be "one-off" work, where the cost of teaching a machine to do the task doesn't pay off. So humans will need to find a series of relatively random things to do: things you won't typically repeat more than a few times a month or year.

Furthermore, it now becomes impossible to rely on working at a "regular job" to provide a regular level of income to feed, clothe, house, educate, medicate, and otherwise sustain the basic needs of a human being. So, all else equal, the average wages humans will earn will go steadily down.

At the same time, the cost of producing things will go down too. A decent pair of glasses might be something you can 3D print the frame of and order the lenses for online, assembling it all yourself in less time than it takes to get a pair from a current LensCrafters, at a price of say $3.00 instead of $50 or $250 (for designer frames), and choosing from a vast selection of designs. So the bottom might drop out of the price of everyday goods. Note that you can already buy cheap eyeglasses online for around $10 if you don't care about relatively high-quality lens materials and coatings.

The question is: what will be the value of the ratio $\bar P/\bar S$, the average price $P$ (in dollars per year) of a basket of important common goods that you consume in a typical year, divided by the average salary $S$ (in dollars per year)? This is a dimensionless ratio that describes whether you are breaking even (>1 = losing, <1 = winning). Both quantities are, in theory, going to decrease. But what matters is not just some new asymptotic value 100 years from now, but the dynamics of these changes during the next 10, 20, 50, or 100 years. It is entirely plausible that the bottom drops out of the market for labor faster than it drops out of the market for goods, for example. The result is potentially worse than the Great Depression, during a period in which we nevertheless have vast potential growth in real wealth through automation!
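To make the dynamics concrete, here's a toy sketch in Python. The decay rates are invented purely for illustration, not estimates: the point is that if wages fall faster than prices, $\bar P/\bar S$ climbs above 1 even though both quantities are falling.

```python
# Toy dynamics of the ratio P/S when both prices and wages decline.
# All numbers here are made up purely for illustration.
def ratio_over_time(p0, s0, price_decay, wage_decay, years):
    """P/S for each year, with exponential decay in both prices and wages."""
    return [(p0 * (1 - price_decay) ** t) / (s0 * (1 - wage_decay) ** t)
            for t in range(years + 1)]

# Start at break-even; prices fall 3%/year, wages fall 5%/year:
r = ratio_over_time(30000.0, 30000.0, 0.03, 0.05, 20)
print(round(r[0], 2), round(r[10], 2), round(r[20], 2))  # 1.0 1.23 1.52
```

So in this made-up scenario the average person is losing ground every year even while goods get cheaper in absolute terms.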

One problem area seems to be social and political technology. Conservative ideas about how the world should be ("work hard and get ahead" sorts of thoughts) could very well be highly counterproductive during these changes. The future may well be "find stuff no one has thought to automate yet, and do that for a little while until it isn't needed anymore", where "a little while" might be anywhere from an hour to a couple of months or a year, but probably won't be ten or twenty years.

We already see some of this in things like Etsy, where individuals produce one-off or short batches of custom goods. Not that Etsy by itself is changing the economy dramatically. But even the world of people buying used books from libraries, marking them up by a few pennies, selling them on Amazon with shipping and handling fees, and pocketing a few dollars per book is an example of humans doing one-off tasks (namely, combing through boxes of books for likely candidates). Even that, with its regular nature, is fairly automatable, and it only exists because we don't legally allow technology to scan and reproduce those books electronically (copyright, anyone?).

One political advance I see being needed is relatively simple and efficient redistribution of wealth. We already redistribute a lot of wealth through tax systems, welfare systems, etc. But we could set some baseline standard and create a Universal Basic Income, or build it into our taxation system (see my previous post on optimal taxation with redistribution). The idea is to give everyone a simple cushion that makes the risky, entrepreneurial business of doing a wide variety of non-repeated things workable, and to let people improve on that basic level of income through the entrepreneurial process, with people essentially running a vast number of small businesses utilizing the tools that a bot-based production system creates.

Like it or not, I just don't see 9-5 jobs having very much future beyond my generation, but we should probably embrace that idea and make it work for us as a society, not rebel against the inevitable. Doing so will require new social structures, new identities, new laws, and new ideas about work.

## Oh well, I got a laugh out of it at least.

Boobies. There I had to say it. This is a post about boobies, and math, and consulting with experts before making too many claims.

In this clickbait article, which I found somehow while searching Google News for unrelated topics, I see that some "medical anthropologists" are claiming that bras seem to cause breast cancer (not a new claim; their book came out in 1995, but I guess their push against the scientific establishment has been reignited). At least part of this conclusion seems to be based on this observation from their PDF:

Dressed To Kill described our 1991-93 Bra and Breast Cancer Study, examining the bra wearing habits and attitudes of about 4,700 American women, nearly half of whom had had breast cancer. The study results showed that wearing a bra over 12 hours daily dramatically increases breast cancer incidence. Bra-free women were shown to have about the same incidence of breast cancer as men, while those who wear a bra 18-24 hours daily have over 100 times greater incidence of breast cancer than do bra-free women. This link was 3-4 times greater than that between cigarettes and lung cancer!

They further claim "bras are the leading cause of breast cancer."

That's pretty shocking data! I mean, really? Now, according to http://seer.cancer.gov/statfacts/html/breast.html there are about 2 million women in the US living with breast cancer, and about 12% will be diagnosed at some point in their lives. There are around 150 million women in the US overall. So $P(BC) = 2/150 \approx 0.013 = O(0.01)$.

However, in our sample $P(BC) = 0.5$. That's 50 times the background rate (OK, 37.5 if you do the math precisely).
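A one-liner makes the comparison explicit, using the rough SEER-derived numbers above:

```python
# Background rate vs the study sample's rate of breast cancer.
background = 2 / 150   # ~2M living cases among ~150M US women, about 1.3%
in_sample = 0.5        # "nearly half" of the ~4,700 women studied
print(in_sample / background)  # ~ 37.5
```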

Doesn't it seem plausible that in winnowing through the ~1% of women who are living with breast cancer, or even the 5 or 6 percent who have been diagnosed in the past and are still alive (figure that half of the women alive today who will at some point be diagnosed have already been diagnosed), they could, just maybe, have introduced a bias in whether or not their sample wears bras?

So "looking for cancer patients causes us to find bra-wearing women" is maybe the more likely story here? Or perhaps "cancer patients who didn't wear bras were overwhelmingly more likely to have died from their breast cancer, so we couldn't find any of them"? That's somehow not as reassuring to the non-bra-wearers in the audience, I think.

Symbolically, the study's 100:1 observation among survivors says

$P(\mathrm{Alive} \,\&\, BC \,\&\, \mathrm{NoBra}) = \frac{1}{100} P(\mathrm{Alive} \,\&\, BC \,\&\, \mathrm{Bra}).$

Expanding both sides,

$P(\mathrm{Alive} \mid BC, \mathrm{NoBra})\, P(BC \mid \mathrm{NoBra})\, P(\mathrm{NoBra}) = \frac{1}{100}\, P(\mathrm{Alive} \mid BC, \mathrm{Bra})\, P(BC \mid \mathrm{Bra})\, P(\mathrm{Bra}).$

Pretend BC and bra wearing are independent, so the incidence terms cancel, and we conclude

$P(\mathrm{Alive} \mid BC, \mathrm{NoBra}) = \frac{1}{100}\, P(\mathrm{Alive} \mid BC, \mathrm{Bra})\, \frac{P(\mathrm{Bra})}{1 - P(\mathrm{Bra})}.$

If $P(\mathrm{Bra}) \approx 0.9$, not wearing a bra reduces your chance of surviving by a factor of 10 or so. Put on those bras, ladies! The exact opposite of their conclusion!
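Here's the same algebra as a quick sketch in Python, with an assumed $P(\mathrm{Bra}) = 0.9$ (that number is my guess, not from the study):

```python
# The study's 100:1 ratio is among *survivors*. If breast cancer incidence is
# independent of bra wearing, that ratio constrains survival, not incidence:
# P(Alive|BC,NoBra) / P(Alive|BC,Bra) = (1/100) * P(Bra) / (1 - P(Bra))
p_bra = 0.9           # assumed fraction of women who wear bras (a guess)
observed_ratio = 100  # survivors with BC: bra wearers vs non-wearers
survival_ratio = (1 / observed_ratio) * p_bra / (1 - p_bra)
print(survival_ratio)  # ~ 0.09: non-wearers survive about 1/10 as often
```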

I personally suspect something else spurious in their research. But nothing in their PDF convinces me that they know what they are doing.

Note that wikipedia has some discussion of their book.

A friend of mine posted a link to news articles about a recent Surgeon General warning about sunscreen

http://www.cnn.com/2014/07/29/health/surgeon-general-skin-cancer/

Quoting from that article:

Skin cancer is on the rise, according to the American Cancer Society, with more cases diagnosed annually than breast, prostate, lung and colon cancer cases combined.

On Tuesday, the United States surgeon general issued a call to action to prevent the disease, calling it a major public health problem that requires immediate action. Nearly 5 million people are treated for skin cancer each year.

But let's dissect this a little bit. First of all, most skin cancer is unlikely to really hurt you. It's melanoma that is the real concern.

cancer.gov gives melanoma rates according to the following graph:

Melanoma diagnosis and death rates through time from seer.cancer.gov

As for melanoma itself, clearly the diagnosed cases are on the rise, but look at the deaths per year: totally flat. This is consistent with improved diagnosis procedures without any change in actual incidence in the population. Furthermore, looking deeper on the SEER melanoma site, we see that 5-year survival rates have increased from 82% in 1975 to 93% in 2006. This too is consistent with earlier diagnosis (so that the 5-year window starts from an earlier point relative to the initiation of the melanoma).

How about melanoma by state? Climate should be involved right? More sun exposure should mean more melanoma?

Melanoma rates by state from seer.cancer.gov

As you can see, states in the southwest have lower rates, and states in the northwest and northeast have higher rates. The cluster of southeastern states with high rates is interesting too.

Vitamin D is suspected to be involved in several health effects related to cancer, so overall exposure to sun may be beneficial. However, I think the data are also clear that intense exposure from tanning beds, intentional tanning, and repeated sunburn is bad for your skin.

Sun exposure, like other exposures such as alcohol, may have nonlinear risk effects. At moderate exposures you are better off than at zero exposure (in Oregon rain or Maine winters) or heavy exposure (leading to repeated sunburn and high melanoma risk).
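Here's a made-up example of what such a nonlinear, U-shaped dose-response could look like; the functional form and numbers are purely illustrative, not fitted to any data:

```python
# Toy U-shaped risk curve: minimum risk at a moderate exposure level.
def relative_risk(dose, optimum=1.0, curvature=0.4):
    """Risk relative to the minimum; rises on either side of the optimum."""
    return 1.0 + curvature * (dose - optimum) ** 2

for d in (0.0, 1.0, 3.0):
    print(d, relative_risk(d))
# Zero exposure is worse than moderate exposure; heavy exposure is worst of all.
```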

So, is the advice to slather on sunscreen good? I don't think the conclusion is so clear cut, but I don't have any in-depth study data to go on either. All I can tell you is that I'll continue to avoid getting sunburn by covering up and using zinc based sunblock when I'm outside for long periods, but I'll continue to get regular sun exposure without sunblock in mornings, evenings, and mid day during non-summer months.

A serious error in refereeing occurred in the 2014 World Cup final at the beginning of the second half, when German goalkeeper Neuer challenged Gonzalo Higuain in a reckless and dangerous manner, resulting in a collision that is absolutely shocking to watch. I suspect that Higuain only survived this collision because he was unaware it was coming and did not tense up (his eyes are facing away from Neuer and on the ball the whole time). Videos abound of this event, such as this one (EDIT: takedown requests have made getting good video more difficult; much of what remains is people glorifying the event by replaying it over and over set to aggressive music). After the collision, a foul was awarded to Neuer (i.e., supposedly committed by Higuain??) and Germany took possession of the ball. The charitable interpretation is that the referee simply didn't see what happened and applied a heuristic that the goalkeeper gets the benefit of the doubt. The alternative is actual misconduct by the referee. Uninformed claptrap abounds on the internet, with people claiming that this was not a foul by Neuer against Higuain, or that it was not worthy of a red card against Neuer. Fortunately, there are rules to the game which can be looked up, such as the FIFA website's interpretation of the rules on fouls and penalties. On page 122 of this commentary, under "Serious Foul Play":

"A player is guilty of serious foul play if he uses excessive force or brutality against an opponent when challenging for the ball when it is in play.

A tackle that endangers the safety of an opponent must be sanctioned as serious foul play. Any player who lunges at an opponent in challenging for the ball from the front, from the side or from behind using one or both legs, with excessive force and endangering the safety of an opponent is guilty of serious foul play. ....

A player who is guilty of serious foul play should be sent off"

Neuer lunges at Higuain, using one leg, from the side (or behind Higuain's direction of view), using excessive force (at full tilt with knee raised) and certainly endangering the safety of Higuain. Some have said this was ruled as a foul on Higuain because he was impeding the progress of Neuer, which is nonsense. On page 118 of the above pdf:

"Impeding the progress of an opponent means moving into the path of the opponent to obstruct, block, slow down or force a change of direction by an opponent when the ball is not within playing distance of either player.

All players have a right to their position on the field of play, being in the way of an opponent is not the same as moving into the way of an opponent.

Shielding the ball is permitted. A player who places himself between an opponent and the ball for tactical reasons has not committed an offence as long as the ball is kept within playing distance and the player does not hold off the opponent with his arms or body. If the ball is within playing distance, the player may be fairly charged by an opponent."

Higuain was within playing distance of the ball at the time the foul was committed; Higuain had a right to his position; and Neuer could have avoided the collision. That is nothing like impeding progress (stepping into an opponent's path to intentionally trip him), which is the situation the commentary above actually addresses.

Was this reckless and endangering the safety of Higuain? Anyone who watches this collision has to see that Higuain's safety was endangered. It's actually shocking that he survived it without being paralyzed, and I suspect that this is entirely because Higuain was unaware of the brutal challenge that was coming, his eyes being clearly on the ball, away from Neuer, the entire time. If he had tensed up at impact, his neck vertebrae would have been subjected to much higher forces (as opposed to simply bending), and that might have resulted in permanent paralysis. And that is to say nothing of the possibility of concussion from the impact.

My prediction is that this World Cup will result in rule changes regarding the treatment of head injuries, it was constantly on the lips of commentators, fans, and others throughout the tournament.

As far as I am concerned, without their primary goalkeeper (i.e., playing with 10 men plus a substitute keeper), Germany would have lost this match badly, and since the rules clearly show that the keeper should have left the game at around 56 minutes, Argentina are the proper world champions.

The article purports to show that alcohol played a role in about 1/10 of all deaths among "working age" people (ages 20-64). This is a pretty big percentage, but we should remember that most people don't die until after age 64. In fact, according to the CDC life tables, about 85% of people survive to 65, and 50% of people live past 84. So if alcohol is attributable to 10% of the 15% of deaths that occur before age 65, then alcohol is attributable to about 1.5% of all deaths in the US.
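The arithmetic in that paragraph, spelled out (the 85%/15% split is from the CDC life tables, the 1/10 from the report):

```python
# Alcohol's implied share of *all* US deaths, not just working-age deaths.
frac_deaths_before_65 = 0.15     # ~85% of people survive to age 65
alcohol_share_in_group = 0.10    # the report's ~1/10 figure for ages 20-64
overall_share = frac_deaths_before_65 * alcohol_share_in_group
print(overall_share)  # ~ 0.015, i.e. about 1.5% of all deaths
```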

Since most people are living so long, the kinds of things that do kill younger people are likely to be things that are not very age specific, or which are generally quite harmful, like car accidents, and crime. It's pretty clear that excessive use of alcohol is problematic, but just because it's involved in 10% of all deaths in this population doesn't mean it's involved in 10% of all deaths in the US, most of which occur in people over 64 years of age, and are related to things like cancer, heart disease, pneumonia, etc.

How do they estimate the role that alcohol plays? They plug things into a model developed by people who study the harmful effects of alcohol. The model has an online application available. Assumptions in this model are going to play a big role in the outcomes. For example, they assume that 42% of fire injuries and 47% of homicides are "attributable to alcohol" in this population. I don't know where they get those kinds of numbers, but even if those are decent point estimates, the uncertainty in the attributable fraction would seem to be significant. Similarly suicide is somehow 23% attributable to alcohol.

This kind of result is pretty questionable. Clearly a causal attribution is being made: implicitly, "without alcohol" these deaths wouldn't have occurred. But suppose you have, for example, a depressed Iraq war veteran with post-traumatic stress disorder, a small child (a big stress), and a wife who is considering leaving him (my sister, a psychiatric NP, used to see about 10 cases a day like this at a VA hospital). Suppose this person, who has been threatening suicide for a few years, one day drinks 500 ml of 40% spirits (about 10 or 12 drinks) over 4 or 5 hours, and then shoots himself. Is this "attributable" to alcohol, or to traumatic combat stress, social-safety-net failures, and perhaps genetic risk factors which affect personality and make this person both more likely to join the military and more likely to contemplate suicide when faced with serious life problems? It's a pretty difficult and complex thing to tease out.

If you want to make "causal" claims, it has to somehow be that with alcohol the death occurs, and without alcohol the death wouldn't have occurred. Pretty clearly, when a 19-year-old fraternity pledge drinks until he passes out and dies of alcohol poisoning, the cause is plainly attributable to alcohol (as well as to poor social norms that encourage excessive drinking in that context). But what is the "intervention" involving alcohol that would have prevented these events? If it's not possible to intervene to control the variable, and thereby control the outcome, then we can't really have "causality".

It's also clear that interventions don't leave "all else equal". Clearly, in the US, when we tried Prohibition, violence went way up. We couldn't just "eliminate alcohol" while keeping everything else about society the same. So in that context you might say that thanks to the fact that we have alcohol, overall deaths are much down vs what they would be with legally enforced prohibition. No, I wouldn't say that either, because the whole concept is just stupid. Prohibition was predictably a bad idea from the start, but it was motivated by the "all else equal" fallacy.

I personally think of these CDC-type reports as largely politically motivated. Some people study the harmful effects of alcohol (or guns, another political topic the CDC has published on), and by publishing this kind of "attribution" article they can justify further funding and continue their research, and also bring publicity to what is undoubtedly a real problem. But typically the way these things are stated winds up over-generalizing and turning a politically charged issue into a ruckus.

To contrast with that approach, consider this report from a group that studies alcohol issues in Canada and has published "low risk" guidelines. The estimate is that if everyone were to adopt the "low risk" guidelines, deaths in Canada would decrease by 4,600 per year. This is a concrete, causal sort of intervention, and most likely achievable in large part without major changes to "all else." How big is that decrease? Well, this website lists about 250,000 deaths in Canada per year, so the idea is that universal adoption of the low-risk guidelines would reduce overall deaths by about 2%. It's interesting that my 2% calculation here is the same order of magnitude as the 1.5% calculated back in the first paragraph. Implicitly, then, the real issue is heavy or binge drinking, since that is what the "low risk" guidelines rule out.
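The 2% figure, and its comparison with the 1.5% from the first paragraph, in a few lines:

```python
# Canadian low-risk-guidelines estimate as a share of all Canadian deaths.
deaths_avoided_per_year = 4600
total_deaths_per_year = 250_000
canadian_share = deaths_avoided_per_year / total_deaths_per_year
us_share = 0.15 * 0.10  # the ~1.5% US figure from the first paragraph
print(canadian_share, us_share)  # ~ 0.018 and ~ 0.015: same order of magnitude
```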

The Canadian report takes the point of view that their guidelines are set so that if you max out the guidelines (for men, 15 drinks per week, no more than 3 in 24 hours most days, no more than 4 in 24 hours ever, consumed in a safe environment that rules out drunk driving etc) the net effect on deaths vs complete abstinence from alcohol would be zero. How can that be? Wouldn't the least deaths happen from zero alcohol consumption? It's been known for a long time that alcohol consumption mortality has a sort of U shaped profile. Those who drink a little overall live longer than either heavy drinkers, or complete abstainers (on average). Mostly the effect is related to reduced cardiovascular disease related problems, but there may be other effects.

So, to that, I say if you are not in a high risk category for alcohol dependence (such as the offspring of an alcoholic or someone who has suffered from alcohol related problems in the past) raise one or maybe two glasses of wine, beer, or well made cocktails tonight with some friends, and be thankful for a substance that is beneficial in moderation. Keep to those low risk guidelines, and don't drive anywhere until the effects are fully worn off.

Oh, and don't believe everything you read.

Stanislaw Ulam was a mathematician who more or less invented the use of Monte Carlo computation for physical problems, and the Stan Bayesian modeling system is named after him.

Following up on my previous analysis of Himmicanes vs Hurricanes, I decided to go fully Bayesian on this problem (imagine the voice of Marsellus Wallace from Pulp Fiction). Hurricane damage is of course a traditional Civil Engineering interest, and if the name of the storm has a lot to do with how deadly it is, for social reasons, then this seems like information we really want to know.

The big issue in the Hurricane analysis is measurement uncertainty. For example, in examining Hurricane Isaac in my previous analysis (before I had gotten access to the PNAS paper), I decided that their estimate of 5 deaths was somehow wrong, and put in the 34 deaths from Wikipedia. Then, after reading the PNAS article, I realized they were only counting US deaths, so I looked more carefully at the Wikipedia article and counted 9 deaths mentioned in news stories. So clearly even the direct deaths have errors and uncertainty, to say nothing of indirect deaths, such as increased car accidents in regions outside the evacuation, heart attacks or strokes caused by stress, or even potential reductions in overall deaths due to lower traffic levels during hurricanes. It's a confusing and difficult-to-measure thing. And if it's hard to measure deaths, think how hard it is to measure dollar damages. First, there is some kind of normalization procedure used by the original authors; second, a car crushed by a fallen tree is pretty clearly hurricane damage, but what about a car that gets "lightly" flooded, suffers electrical damage, and is scrapped 6 months later because the repair is too costly? We should treat measures such as damage and deaths as carrying significant measurement error.

With that in mind, I decided to re-analyze the data using a Bayesian errors-in-variables model on top of my basic model of "involvement" using $D/(C v^2)$ as a measure of number of people potentially affected by the storm (where D is damage, C is capitalization of the region in dollars per person, and v is the velocity of the storm winds).

To accomplish this I set up parameters representing the actual values of $D, C, v$ for each storm, and linked them to the measured values through probability distributions. I then linked these unobserved "actual" values together through a regression line in which masculinity and femininity were allowed to play a role after 1979 (when both male and female names were in use). Before 1979 we simply treat the data as a separate era, with its own slope and intercept terms.

The model is perhaps best described by the code itself:
```r
stanmodl <- "
data {
  int Nhur;
  real<lower=0> NDAM[Nhur];
  real alldeaths[Nhur];
  real<lower=0> Category[Nhur];
  real Year[Nhur];
  real MasFem[Nhur];
}
parameters {
  real<lower=0> Damage[Nhur];
  real actualdeathsnorm[Nhur]; /* actual deaths, normalized by 150, unobserved */
  real<lower=0> v[Nhur];
  real<lower=0> Cappercapita[Nhur];
  real<lower=0> ndsig;
  real<lower=0> involvement[Nhur];
  real<lower=0> invsd;
  real<lower=0> slope;
  real<lower=0> intercept;
  real MFslope;
  real MFint;
  real MasFemAct[Nhur];
  real<lower=0> predsd; /* a standard deviation, so it needs the lower=0 bound */
  real Pre1979Int;
  real Pre1979Slope;
  real<lower=0> avgcappercap;
}
/* dollar amounts are converted consistently to billions */
model {
  ndsig ~ exponential(1/.1);
  invsd ~ exponential(1/.5);
  MFslope ~ normal(0.05, 10.0);
  MFint ~ normal(0.05, 1.0);
  intercept ~ normal(0, 1);
  slope ~ normal(0, 10);
  predsd ~ gamma(1.3, 1/4.0);
  Pre1979Int ~ normal(0, 1);
  Pre1979Slope ~ normal(0, 10);
  avgcappercap ~ lognormal(log(5.0/1000.0), log(1.5));
  for (i in 1:Nhur) {
    /* measured values linked to the unobserved "actual" values with error */
    Damage[i] ~ normal(NDAM[i]/1000.0, NDAM[i]/1000.0 * ndsig);
    Cappercapita[i] ~ normal(avgcappercap, .25*avgcappercap);
    v[i] ~ normal((25.5+8.2*Category[i])/(25.5+8.2), 0.15); /* normalized wind velocity, with error */
    MasFemAct[i] ~ normal((MasFem[i]-6.0)/5.0, .15);
    involvement[i] ~ normal(Damage[i]/(Cappercapita[i]*v[i]*v[i])/2000.0, invsd);
    actualdeathsnorm[i] ~ normal(alldeaths[i]/150, .1*alldeaths[i]/150 + 5.0/150.0);
    /* regression: separate era before 1979; masculinity/femininity effects after */
    if (Year[i] < 1979) {
      actualdeathsnorm[i] ~ normal(intercept + Pre1979Int + (slope + Pre1979Slope) * involvement[i], predsd);
    } else {
      actualdeathsnorm[i] ~ normal(intercept + MFint*MasFemAct[i] + (slope + MasFemAct[i]*MFslope) * involvement[i], predsd);
    }
  }
}"
```

```r
out <- stan(model_code=stanmodl,
            data=list(Nhur=NROW(Data), NDAM=Data$NDAM, alldeaths=Data$alldeaths,
                      Category=as.numeric(Data$Category), Year=Data$Year,
                      MasFem=Data$MasFem),
            chains=1, iter=800)
```

I run 800 samples (only 400 of which are post-warm-up) on 1 chain because the model has a lot of parameters (one for each measurement, so a couple hundred) and it takes a while to run. This run takes about 5 minutes on my desktop machine. If I were going for publication I'd probably do 3000 or more samples in two separate chains in parallel (I have two cores). But the basic message is already visible in this smaller sample set:

Here we're plotting the posterior mean values of the "actual deaths" vs the "actual involvement" $D/(C v^2)$. As you can see, after 1979 the posterior values line up along the same line regardless of the masculinity or femininity of the name. Before 1979, the slope of the line is larger, indicating that a higher fraction of the people involved tended to be killed. This can most likely be attributed to much poorer forecasting and communications in the 1950s and 1960s. For example, the Wikipedia article on Hurricane Camille mentions many people refusing to evacuate, including even some jail prisoners who thought this Category 5 hurricane would be no big deal.

Why do these data fall right on the line? In the model we have "wiggle room" for all of the actual measurements. This includes some room for variability in capitalization per capita (think "wealth of the area affected") as well as the actual number of deaths and the actual wind velocity at landfall. Put all together, there is enough prior uncertainty in the measurements that we can find the posterior average values of the estimates directly along the regression line that we've specified. The prior we specified for the model error (predsd) is relatively wide, but the Bayesian machinery is telling us the more important source of uncertainty is in the measurements, somehow it seems to pick a very small predictive sd O(0.01) even under alternative priors. (this is actually somewhat suspicious I suppose, and would be worth following up if you were trying to publish this non-news).  So all the estimates of the actual deaths and actual involvement are pulled towards the regression line we've specified. We've plotted with low alpha some grey dots that show alternative possibilities from our samples.

How about the masculinity and femininity effects?:

The effect of femininity on slope and intercept.

As you can see, these values are essentially 0 though there is plenty of uncertainty left to allow for the possibility of small effects. We don't know if there is some small effect, but we sure as heck shouldn't publish a result claiming an important effect based on this information!

Many people in the blogosphere have been commenting on the PNAS paper about male- vs female-named hurricanes. Thanks to Bob O'Hara, who posted some code doing his own analysis, I decided to grab the data and do a little modeling myself. My main concern was that I was seeing a lot of statistical analysis but very little modeling of causal and physical factors. The following model actually considers the causality and the physics, and produces results that match the intuitions that go into the model. The femininity turns out to be meaningless. Note also that I found one outlying point and corrected the number of deaths in Hurricane Isaac to 34, as shown on Wikipedia, instead of the 5 in the dataset. I'm sure there are more data-cleaning issues too, but I don't get paid to clean other people's data and I don't enjoy it, so feel free to scrutinize their data more carefully if you like.

So let's cut to the punchline of a couple of graphs, and then hear the analysis:

The basic model

And the residuals from the fit:

The first thing I decided was to analyze only the period after 1975, since the use of male names begins shortly after (1979), and this "modern" era of forecasting is more relevant to any future effect of gender than storms from the 1950s or 1960s.

### The Basic Model

So I decided to think about how damage and deaths work. In both cases, the toll is tied to the population in the area affected. In particular, I used the following basic idea:
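In symbols (my transcription of the idea described below, with $N$ the number of people affected, $C/N$ the capital stock per person, and $v$ the wind velocity at landfall):

$$D = K \, N \, \frac{C}{N} \left(\frac{v}{v_1}\right)^2$$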

In other words, damage $D$ is related linearly, via a constant $K$ (a first-order expansion), to the number of people in the area affected, times the capital stock per person $(C/N)$, times a measure of the destructive force of the winds (the fraction of the capital stock that will be damaged). Since the pressure exerted on a building is proportional to the wind velocity squared, I used $(v/v_1)^2$ to measure destructive force. I get the wind velocity from the "Category" variable in the dataset and transform it according to Wikipedia's description of the velocity for each category (I convert to m/s, then divide by $v_1$, the velocity of category-1 hurricanes, which makes the result independent of units).
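A minimal sketch of that transformation, using representative Saffir–Simpson wind speeds (my own midpoint values taken from the Wikipedia ranges; the exact mapping used in the original analysis isn't shown here):

```python
# Representative sustained wind speed per Saffir-Simpson category, in m/s
# (midpoints of the Wikipedia ranges; illustrative values only).
CATEGORY_WIND_MS = {1: 38.0, 2: 46.0, 3: 54.0, 4: 63.0, 5: 73.0}

V1 = CATEGORY_WIND_MS[1]  # reference: category-1 wind speed

def destructive_force(category):
    """Dimensionless destructive-force measure (v/v1)^2."""
    v = CATEGORY_WIND_MS[category]
    return (v / V1) ** 2

print(destructive_force(1))  # 1.0 by construction
print(destructive_force(5))  # several times the category-1 force
```

Dividing by $v_1$ makes the measure unitless, so the same numbers come out whether the table is in m/s or mph.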

Last week I went swimming and met a couple who were very helpful and friendly; both are professors at the Art Center College of Design here in Pasadena. One of them pointed out that the pool has a lap-timer clock I hadn't bothered to notice before, so I timed myself swimming 50- and 100-yard sets (two and four laps of the pool). My time for 50 was about 40 seconds (1.14 m/s), and for 100 it was 1:40 (0.914 m/s); these were fairly consistent after the first set or two. I consider that not bad for a father of two small boys who hasn't been exercising enough for the last 4 years or so. I also know I swim 25 yards in about 13 or 14 strokes, sometimes 15, which together with my lap times gives a stroke rate of around 36 strokes per minute. Typical advocated stroke rates are around 60 to 70 per minute, which suggests that somehow I should increase my stroke rate. Intuitively it seemed to me that this wouldn't work for me, but I thought I'd analyze the math of it. Thinking about how this would affect power output led me to the following analysis:
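A quick sketch checking the arithmetic (only the yard-to-meter conversion is assumed):

```python
YARDS_TO_M = 0.9144

v50 = 50 * YARDS_TO_M / 40.0     # 50 yd in 40 s
v100 = 100 * YARDS_TO_M / 100.0  # 100 yd in 1:40
print(round(v50, 2), round(v100, 3))  # 1.14 0.914

# 14 strokes per 25-yd lap, at each of the two paces:
lap_fast = 25 * YARDS_TO_M / v50   # ~20 s per lap
lap_slow = 25 * YARDS_TO_M / v100  # ~25 s per lap
rate_fast = 14 / lap_fast * 60     # strokes per minute at the 50-yd pace
rate_slow = 14 / lap_slow * 60     # strokes per minute at the 100-yd pace
print(round(rate_slow), round(rate_fast))  # 34 42
```

The two paces bracket the "around 36 strokes per minute" figure quoted above.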

Let's add the energy cost of stroking to the previous analysis. For the energy cost, consider that your muscles "waste" the kinetic energy of your arm at each end of the stroke. So the power you put out both overcomes drag and accelerates and decelerates your arm every stroke.

Where $a$ is the length of the arm stroke, which is different from $sL$, so $m(2ar)^2r/2$ has the dimensions of power (kinetic energy per stroke, times the stroke rate). Also note that $s = v/(Lr)$, which I substitute in. We're going to try to control $v$ and $r$, so having things in terms of those variables is more helpful.

Another change I'm going to make here is to switch from using $h$ and $d$ to $h$ and $t$, where $h$ is still the height of the wake above the water, but $t$ is the full cross-sectional depth of the body, so $t = h + d$. I do this because, for people with a fairly flat profile in the water, $t$ is fixed by body shape, whereas $h$ can be controlled by things like head position. The previous analysis normalized the drag using $d$, but normalizing by the full-submersion drag using $t$ makes a little more sense, because it's a single fixed reference rather than something that changes when $h$ changes. Substituting $d = t - h$ and normalizing by $K\rho t wv^3$ gives the following dimensionless power equation (as in the previous analysis, I've dropped the lateral viscous drag term, since we've already seen it's multiplied by a trivially small constant).

This is a slightly different form than before; in this analysis we have three dimensionless groups: $h/t$, $gh^2/(v^2t)$, and $ma^2r^3/(\rho t v^3 w)$.

From this analysis, if we take a very low-in-the-water body position, then $h/t = 0$, and we see that we can decrease our drag by riding up a little so that $h > 0$. Since the wave drag grows like $h^2$ when $h$ is small, the decrease in piston drag $(-h/t)$ exceeds the increase in wave drag. This probably accounts for the fact that elite swimmers often have a relatively forward-looking head position: it lets them reduce their piston drag significantly while suffering only a small wave-drag penalty. It probably also helps produce the trough that allows for effective breathing.

Plugging in typical values for the mass of the arm, the density of the fluid, and so forth, the order-of-magnitude estimates for the dimensionless groups are about 0.2 for the wave group and $0.02r^3$ for the arm group. So, although the equation predicts that doubling my stroke rate wastes 8 times more arm power, this still only brings it into the same range as the wave and piston drag power. But why would I want to do that? The only reason I can see is that it would give me significantly more opportunities to breathe. One thing I find at a stroke rate of 30 or so per minute is that I pretty much have to breathe on one side only, because otherwise I build up carbon dioxide; but that gives me an asymmetric stroke, which might increase the wave and piston drag coefficients, and it makes it hard to empty my lungs fast enough. A faster stroke rate would have to be combined with a shorter stroke length, but it would give me more opportunities to breathe and might allow a better sustained power output, actually making me faster even though my power output increases.
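A rough numeric sketch of those magnitudes. Every input here (swim speed, body depth and width, wake height, arm mass, stroke length) is my own guessed typical value, not the author's, so only the orders of magnitude are meaningful:

```python
g = 9.81      # gravitational acceleration, m/s^2
rho = 1000.0  # water density, kg/m^3
v = 1.1       # swim speed, m/s (assumed)
t = 0.25      # full cross-sectional body depth, m (assumed)
h = 0.08      # wake height above the water, m (assumed)
w = 0.5       # body width, m (assumed)
m = 3.5       # effective arm mass, kg (assumed)
a = 0.6       # arm stroke length, m (assumed)

wave_group = g * h**2 / (v**2 * t)           # on the order of 0.2
arm_coeff = m * a**2 / (rho * t * v**3 * w)  # coefficient on r^3 (r in strokes/s)
print(wave_group, arm_coeff)

# The r^3 scaling: doubling the stroke rate multiplies wasted arm power by 8.
r = 0.6  # strokes per second, roughly 36 per minute
print(round(arm_coeff * (2 * r) ** 3 / (arm_coeff * r**3), 6))  # 8.0
```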

At least, that's how I interpret the math. I plan to get a little metronome that beeps regularly to help me time my stroke and see how it works for me.