You should be able to access models.street-artists.org now via IPv6, thanks to my web host getting with the modern times. Now if only I could get IPv6 natively at home...
If you have small children, especially in daycare, eventually you will be exposed to a wide variety of illnesses. One of the least fun is viral gastroenteritis caused by norovirus.
Norovirus is not susceptible to alcohol hand sanitizers due to its durable protein coat, and the main method for sanitizing surfaces that have been exposed is the use of bleach.
Sources I read online suggest that 1000 ppm bleach (1:50 dilution) is effective for durable surfaces like countertops and toilets, and that 200 ppm (1:250 dilution) is recommended for things that come in contact with food or the mouth (like baby toys etc). These dilutions are MUCH less concentrated than the average person might think. If you have a typical, say, 500 ml bottle, you're talking about putting in about 10 ml (two teaspoons) of bleach and filling the rest with water for the strong solution, and 2 ml (a bit less than half a teaspoon) of bleach with the rest water for the weaker solution. Bleach solutions lose their effectiveness in a matter of days due to offgassing and reactions caused by light. One source said 30 days for complete ineffectiveness. If you're in an institutional setting, they recommend pouring out old bleach and making a fresh solution each morning.
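As a quick check on the arithmetic, here is a small Python sketch of the dilution calculation. It assumes "1:N" means one part bleach in N parts of finished solution (which is how the 500 ml bottle numbers above work out); the function name and the teaspoon conversion are just for illustration.

```python
ML_PER_TEASPOON = 4.93  # one US teaspoon in millilitres


def bleach_volume_ml(total_ml, dilution):
    """Millilitres of stock bleach in total_ml of a 1:dilution solution.

    Assumes 1:N means one part bleach per N parts of finished solution.
    """
    return total_ml / dilution


for label, dilution in [("strong (1:50)", 50), ("weak (1:250)", 250)]:
    ml = bleach_volume_ml(500, dilution)
    print(f"{label}: {ml:.0f} ml bleach, about {ml / ML_PER_TEASPOON:.1f} tsp")
```

The same function works for any bottle size, e.g. `bleach_volume_ml(1000, 50)` for a one litre bottle of the strong solution.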
The best way to apply it is via a wash bottle which does not aerosolize the bleach and therefore keeps it out of your mucous membranes and lungs etc. Do NOT use a normal squeeze spray bottle due to the possibility of breathing in irritating bleach droplets from the air which then makes you irritated and more susceptible to illness.
How much pollution is caused by lawn care in LA county? If you live around here outside of the downtown urban areas, you know that you will constantly hear the sounds of leaf blowers, lawn mowers, edgers, trimmers, and the like. Ignoring the horrible noise pollution for the moment, can we approximate the effect of all this obnoxious activity on air quality? We'll make the following approximations and assumptions:
- There are about 10 Million people in LA county.
- Based on population I estimate there are perhaps 4 Million residences with lawns.
- Typical lawn care occurs once per week per residence.
- A lawn takes about 30 minutes to "mow and blow"
- A crew can "mow and blow" for about 6 hours a day (the other time is spent traveling around).
- Crews work 5 days a week and are at around 60% capacity
- There are therefore about: 400000 hours of operation of small engines each week day, or 10^8 hours per year.
- There are around 100000 crews working on a given day each with one lawnmower.
- In a year one crew's lawnmower therefore receives about 1000 hours of operation (not to mention edgers and blowers).
- A lawnmower engine lasts perhaps 7000 hours before needing replacement (7 years).
- A lawnmower used for an hour produces about 8 times the noxious pollutants as a car used for the same amount of time. (Based on some supposed EPA estimate I read online, but this seems reasonable if you've ever walked past a guy mowing a lawn compared to walking down a busy street).
- A typical household commutes 2 hours per day in 2 cars, and there are 4 Million households.
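The assumptions in the list above can be multiplied out in a few lines of Python. This is only a sketch of the estimate: in particular, I am assuming that "2 hours per day in 2 cars" means each of the two cars is driven 2 hours a day, and that commuting happens on about 260 weekdays a year. Both of those are my reading, not something stated above.

```python
# Lawn care side (numbers taken from the list above)
residences = 4_000_000
hours_per_lawn = 0.5            # 30 minutes to "mow and blow", once a week
workdays_per_week = 5
weeks_per_year = 52

mower_hours_per_weekday = residences * hours_per_lawn / workdays_per_week
mower_hours_per_year = mower_hours_per_weekday * workdays_per_week * weeks_per_year

# 6 working hours a day at about 60% capacity
crews = mower_hours_per_weekday / (6 * 0.6)
hours_per_crew_per_year = mower_hours_per_year / crews

# Commuting side (ASSUMPTION: 2 cars x 2 hours each, 260 commuting days)
households = 4_000_000
car_hours_per_year = households * 2 * 2 * 260

# A mower-hour counts as 8 car-hours of noxious pollutants
pollution_ratio = 8 * mower_hours_per_year / car_hours_per_year

print(f"mower hours per weekday: {mower_hours_per_weekday:,.0f}")
print(f"mower hours per year:    {mower_hours_per_year:,.0f}")
print(f"crews:                   {crews:,.0f}")
print(f"hours per crew per year: {hours_per_crew_per_year:,.0f}")
print(f"lawn care / commuting:   {pollution_ratio:.0%}")
```

Changing the commuting assumption (for example 2 car-hours per household instead of 4) moves the final ratio by a factor of two, so the result is an order-of-magnitude figure at best.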
Putting these together, in LA we are adding roughly an additional 20% on top of our commuting pollution from lawn-care-related causes.
I've been thinking about this problem for a long time, especially with respect to problems involving physical models with uncertainty. Let me try to give you an example, and then maybe you'll have some insight that I don't.
Suppose you have a model involving a deterministic dynamical system, but with unknown parameters. To be concrete, suppose it's something like a model for the population of bacteria in an infected organ (I'm making this up, I don't actually have such a model). So you have something like

$$\frac{dP}{dt} = D_P(P, C, C_B, I, t), \quad \frac{dC}{dt} = D_C(P, C, C_B, I, t), \quad \frac{dC_B}{dt} = D_{CB}(P, C, C_B, I, t), \quad \frac{dI}{dt} = D_I(P, C, C_B, I, t)$$

The meaning of these symbols is: $P$ is the population of bacteria, $C$ is the concentration of some drug in the affected organ, $C_B$ is the concentration in the blood, and $I$ is the immune system response intensity. The functions $D_P$, $D_C$, $D_{CB}$, $D_I$ are something we have hypothesized and are specified as our model, and they have some parameters, like rate coefficients and so forth, which are unknown. Now we have some observations, say a small number of cases where we track through time some laboratory indicators of $P$, $C$, $C_B$, $I$ for a few patients. Then perhaps for a lot more patients we have an initial treatment time (where $C$ and $C_B$ are 0), an initial estimate of $P$ and $I$, a treatment protocol ($D_{CB}$ as a function of time, basically the dosing protocol), and some final time and final resolution (i.e. at final time $T_i$ we either observed complete recovery for patient $i$ or we switched to some other more aggressive treatment method). We'd like to estimate the coefficients of the model based on the observations, so we write down Bayes' theorem:

$$p(\text{Coefficients} \mid \text{Observations}) = \frac{p(\text{Observations} \mid \text{Coefficients}) \, p(\text{Coefficients})}{Z}$$

where $Z$ is the normalizing constant, which we won't compute directly, relying instead on some kind of sampling procedure. Now because I've built the model I may have some kind of idea about the prior on the coefficients, $p(\text{Coefficients})$: perhaps one rate is expected to be large, another one might be near zero, and so on, so I could come up with some mildly informative priors. This actually seems like the easy part.
What I am really confused about is how to come up with a likelihood $p(\text{Observations} \mid \text{Coefficients})$, and the reason is that the observations are a fairly complicated thing. Let's consider the various classes of observations:
- Time series: we could hypothesize some kind of measurement error on our instruments, and we could then put a likelihood on the time series data by taking the given coefficients, running the model to the various time points, looking at the differences between the measurements and the model predictions, and using the measurement error model (perhaps Gaussian, or perhaps some kind of skewed, discretized, or censored error model depending on how we do our measurements). So here we're saying that the likelihood of a particular model outcome is related primarily to the measurement device.
- Time to outcome data: we take the dynamic model, with the given coefficients, and we run it forward in time to the final observation point, and we ask what is the probability that given those coefficients we would observe the given outcome (recovery, defined as $P \approx 0$, or not recovery, defined as $P$ significantly bigger than 0). However, it seems that given the deterministic nature of the model, the outcome is deterministic, so the probability is either 1 or 0. If we have likelihood either 1 or 0 we will wind up with our prior multiplied by a square wave type function. I suppose we could also come up with a measurement error model here: if we say that they've recovered, then the actual population $P$ is somewhere near zero with, say, an exponential distribution whose average value is a hyperparameter with a prior concentrated on small values; and if they haven't recovered, then there's, say, a lognormal distribution for the population $P$ with unknown hyperparameters again. We'd need to supply informative priors over the measurement error models given the observation (because it's not plausible for our estimation procedure to give us estimates that $P$ is large if we gave the patient an "all clear" at the final time).
- Time to outcome data with dynamic noise: We could also hypothesize some dynamic noise driving the model so that the outcome is not deterministic (there could also be uncertainty in the initial conditions, but the initial conditions are perhaps hyperparameters themselves, so that in a given sample run they are given). Still, even if initial conditions are given as coefficients, there can be dynamic variability with a fixed sample of hyperparameters. Perhaps one hyperparameter describes the distribution of drug quantity in pills, so that there are slightly different drug doses at the 10% fluctuation level at each time point where a pill is taken. Perhaps growth also depends on temperature, and the fever depends on the variable I and some other random factors which we could hypothesize, so that we have a random forcing in the growth condition. Now even given all the coefficient values, the outcome at the final time is dependent on the entire random path, so to get a likelihood for the observation given the parameters using a sampling type approach, we need to run the model many times and estimate the probability distribution for the outcomes at the final measurement time. How good do our estimates from a finite number of sample paths have to be in order to get convergence of some kind of MCMC procedure? There doesn't seem to be an obvious way to know how to get these "numerical likelihoods" except in the limit of very large amounts of computing time.
- Random forcing and random measurement error: If we run the model with random forcing as above, and also our measurement device has a random error, how do we determine the likelihood $p(\text{Observations} \mid \text{Coefficients})$? Do we need to compare the distribution of model outcomes to the distribution of errors given the measurement? So this is basically a goodness of fit test? Some kind of Bayesian model selection procedure? I am confused. To be clear what I mean here, suppose that we have the random temperatures and pill concentrations, so that for a given set of coefficients we could run the dynamic model say 100 times and come up with 100 final populations from which we can estimate a final population density function. But in addition to that we have the final measurement data, and a measurement error model. So we want a probability that essentially is a function of two distributions. EDIT: I thought about this, and essentially it looks like we want

$$p(\text{Data} \mid \text{Coefficients}) \approx \sum_{i=1}^{N} w_i \, p(\text{Data} \mid \text{ModelOutcome}_i)$$

where here $w_i = 1/N$ if we are doing evenly weighted samples from Model Outcomes, or something else if we are doing an importance sampling type of deal.
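To make the EDIT concrete, here is a minimal Python sketch of one way to read it: run the randomly forced model many times, evaluate the measurement error density of the observed data around each simulated outcome, and average with equal weights. Everything specific here is made up for illustration: the toy decay model, the Gaussian measurement error, and all names and parameter values are assumptions, not an actual infection model.

```python
import math
import random


def simulate_final_population(rate, rng, p0=100.0, steps=50, dt=0.1, noise_sd=0.05):
    """Hypothetical toy model: exponential decay with a randomly forced rate."""
    p = p0
    for _ in range(steps):
        p *= math.exp(-(rate + noise_sd * rng.gauss(0.0, 1.0)) * dt)
    return p


def normal_pdf(x, mean, sd):
    """Density of the (assumed Gaussian) measurement error model."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mc_likelihood(measured, rate, n_paths=200, meas_sd=2.0, seed=0):
    """Equal-weight Monte Carlo estimate of p(measured | rate).

    Averages the measurement density over n_paths simulated outcomes,
    i.e. w_i = 1 / n_paths for every path.
    """
    rng = random.Random(seed)  # fixed seed: same random forcing for every rate
    outcomes = [simulate_final_population(rate, rng) for _ in range(n_paths)]
    return sum(normal_pdf(measured, o, meas_sd) for o in outcomes) / n_paths


for rate in (0.3, 0.5, 1.0):
    print(rate, mc_likelihood(measured=8.0, rate=rate))
```

Reusing the same seed for every candidate coefficient value is a design choice that keeps the comparison between nearby values from being swamped by simulation noise; it does not answer the question of how many paths are enough for an MCMC procedure to converge.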
It gets even more complicated when the likelihood is related to say a partial differential equation involving 3D fluid flow or something, say a model for the dispersal of radioactive pollution from the Fukushima power plant, where our data are pretty much a randomly sampled set of measurements in a small set of locations but the ocean is big and turbulent! How many times do I have to run an entire ocean fluid flow model in order to get an estimate of the likelihood for one MCMC jump on the parameter estimation procedure?
The 45 minute Turkey recipe proposed by Mark Bittman doesn't work. Or rather, if it does work there are some tricks beyond simply splaying out the turkey. Here are my experiences:
We used a 10 pound Turkey in a disposable aluminum roasting pan. These aluminum roasting pans require support on the bottom, so we placed a cookie sheet underneath to ease inserting and removing the roast. This seems to have insulated the bottom and prevented the two-sided heating that improves the cooking time. So while he mentions the roasting pan in passing, in fact a high quality roasting pan may be an essential part of the equation.
Also, although he mentions a 165F temperature in the deep thigh, and we got that within about 45 mins to 1 hr, the deep portion of the breast was still at around 110F. It took more than two hours to get the deep breast cooked, and by then the outer breast was scorched. In other words, the techniques I've tried to develop over the last few years of adjusting the temperature and covering in foil and fooling around to get the perfect tender breast and thigh meat took not significantly longer, and produced much tastier results.
On the other hand, the thick thigh did turn out quite tasty. I can imagine a modification to this recipe that would produce excellent results, but not in 45 minutes.
The first thing I would try would be splaying the turkey and starting at 475F for 20 mins to get the browned skin and the initial transient of heat. Then I would cover the turkey and drop the temperature to say 350F for an hour, followed by monitoring every 15 mins and reducing the temperature. When the deep breast comes up to 160F or so, I'd finish with a 15 minute covered rest outside the oven, letting the final equilibrium be achieved at 165F throughout.
Ultimately, the splaying process probably reduces the overall time, but it doesn't seem to significantly change the relative time differences of the different portions of the meat. The deep breast near the thigh is always the trickiest part.
Professor Flannery at Georgia Tech has solved the problem of reconciling Gauss' principle, which is directly applicable to non-holonomic systems, with the Lagrange-D'Alembert principle whose application to non-holonomic systems is tricky. There are sort of two versions of the paper, the full version and the easier to read simpler version.
I've just skimmed each of these, but it seems that the key bit is the traditional assumption that

$$\delta \dot{q} = \frac{d}{dt} \delta q$$

where $\delta q$ represents a perturbation of the coordinate path of a system, and $\delta \dot{q}$ represents a perturbation of the velocities implied by the perturbed coordinate path. In other words, whether we perturb a path and then take a time derivative to get the velocity, or we perturb the velocity directly, we should wind up with the same perturbation. This does not obviously need to hold, but it is assumed in the traditional development.
Instead, Flannery proposes that for a system with a general velocity constraint
the relationship
is the one that needs to hold.
In other words, there are subtleties, but he has worked out these subtleties and showed that they produce equations equivalent to those gotten from Gauss' principle. I need some time to digest these papers, but they are a welcome advance, because previously I had only read papers that suggested the same equations of motion that Flannery derives should hold for all nonlinear nonholonomic constraints but did not effectively justify why.
My wife sent me a link to a NY Times Video about cooking a Turkey in 45 minutes (actually they take 35 mins in the video).
So of course, given my obsession with modeling the cooking time of Turkeys, and the inevitable need to come up with the physical part of the model, now is as good a time as ever to discuss why this method works. Mark Bittman implies that it's by increasing the exposed surface area. Although this explanation is intuitive, the actual change in the surface area is quite small, only related to the new surfaces he makes with the knife. There seems to be something wrong.
To see what is going on, let's use the Feynman-Kac particle diffusion representation of the heat equation. The temperature at a point x inside the Turkey is related through the heat capacity to the concentration of thermal energy. The thermal energy comes from a large number of packets of heat which diffuse in from the boundary of the Turkey, the surface area that Bittman mentions. No other sources of heat (such as chemical reactions within the Turkey) are considered, and we'll ignore the flow and evaporation of fluids for the moment, an assumption which is not valid, but probably sufficiently accurate for our purposes.
Now a Gaussian random walk in 3 dimensions takes some time to go a certain distance. In particular, at a time $t$, the distance between the origin point and some point where a particle of heat has gone to (in backwards time) is $r(t) = \sqrt{x(t)^2 + y(t)^2 + z(t)^2}$. Now consider the value of $x(t)$. Using a nonstandard analysis approach, $x(t)$ is the sum of $N$ iid random variables $dx_i$ whose standard deviation is related to the thermal conductivity of the material and is proportional to $\sqrt{dt}$, where $N \, dt = t$, the current time.
Why the $\sqrt{dt}$? The answer lies in the fact that the sum of the variances of $N$ iid random variables is the variance of the sum of the random variables, $\mathrm{Var}\left(\sum_i dx_i\right) = \sum_i \mathrm{Var}(dx_i)$. This follows from the linearity of expectation, the independence of the random variables, and the fact that the variance is the expected value of the square of the (mean zero) random variable. Basically if we square $\sum_i dx_i$ we wind up with $N$ terms $dx_i^2$, and $N(N-1)$ terms $dx_i \, dx_j$ that in expectation are the product of two independent random variables, which go to zero due to independence.
We are at time $t$ and we want the variance of the sum of all our individual infinitesimal random variables after this amount of time to be independent of the size of our infinitesimal $dt$. This means if we use $N$ random steps so that $N \, dt = t$, we should scale the variance of the individual random variables so that we do not change the variance in our sum. If we are going to do this, and keep the total variance after a certain time constant (independent of the choice of $dt$), we need the variance in each random variable to be linear in $dt$. Since the standard deviation is the square root of the variance, the standard deviation should scale like $\sqrt{dt}$. We've got three directions, x, y, and z, so the total 3D Euclidean distance is the square root of a scaled chi-squared variable with 3 degrees of freedom.
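The scaling argument can be checked numerically. The sketch below (plain Python, one dimension only, with hypothetical function and parameter names) builds a walk out of steps whose standard deviation is proportional to the square root of the time step, and estimates the variance of the endpoint for two different step sizes. Under that scaling the endpoint variance should come out close to `sigma**2 * t` in both cases.

```python
import random
import statistics


def endpoint_variance(t, dt, sigma=1.0, n_walks=4000, seed=1):
    """Estimate Var[x(t)] for a walk made of steps with sd sigma * sqrt(dt)."""
    rng = random.Random(seed)
    n_steps = round(t / dt)          # N steps so that N * dt = t
    step_sd = sigma * dt ** 0.5      # standard deviation scales like sqrt(dt)
    ends = [
        sum(rng.gauss(0.0, step_sd) for _ in range(n_steps))
        for _ in range(n_walks)
    ]
    return statistics.pvariance(ends)


for dt in (0.1, 0.01):
    print(f"dt = {dt}: endpoint variance at t = 1 is about "
          f"{endpoint_variance(1.0, dt):.3f}")
```

If the step standard deviation were instead made proportional to `dt` itself, the endpoint variance would shrink as the steps got finer, which is exactly what the argument rules out.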
How does this help us? It tells us that the time it takes on average for a packet of energy to come from the boundary and get to some point deep within the turkey goes like $R^2$, where $R$ is the distance of the point from the boundary. Now the heat doesn't just come from the closest point on the boundary, it comes from a mixture of points on the boundary, and in general what we want is some kind of average. But the point is that when we can make the distance between the deepest part of the Turkey and the surface of the Turkey smaller, we should expect to decrease the time it takes to cook the inner part of the Turkey quadratically.
Let's see how this should work, supposing that Bittman's Turkey would normally take say 3 hours to cook, and now takes 1/2 hour to cook. How much did he decrease the distance from the inner part of the Turkey to the boundary? Let $R_s$ be the distance after splaying the Turkey, and $R_0$ be the distance in an un-modified turkey, then $R_s / R_0 = \sqrt{0.5 / 3} \approx 0.41$. It seems very reasonable to me based on just watching him splay the Turkey that after breaking it up, the distance between the deep parts of the Turkey and its surface is cut to say 40% of the original value, and this is the type of change we predict we would need to cut our time to 1/2 an hour. In particular by breaking the bird in a few thick places he puts new surface area near the thickest portions of the breast, and he eliminates the interior cavity which brings the high oven heat directly to the interior surface of the Turkey, making the "farthest part" of the breast more like 1/2 the thickness of the breast, not the full thickness of the breast (which it would be if you ignore heating from the inside of the cavity).
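The quadratic scaling is easy to play with in code. This is just a sketch of the arithmetic under the assumption that cooking time is proportional to the square of the deepest distance to the surface; the function names are made up for illustration.

```python
import math


def depth_ratio(new_time, old_time):
    """Ratio of deepest distances implied by two cooking times (time ~ R^2)."""
    return math.sqrt(new_time / old_time)


def scaled_time(old_time, ratio):
    """Cooking time after shrinking the deepest distance by the given ratio."""
    return old_time * ratio ** 2


print(f"3 h -> 0.5 h needs the depth cut to {depth_ratio(0.5, 3.0):.0%}")
print(f"halving the depth of a 3 h bird gives {scaled_time(3.0, 0.5):.2f} h")
```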
Voila, magic, or science?
Each year for the past few years I've been cooking a Turkey for the American holiday of Thanksgiving, and I've been measuring the temperature through time. My goal is to build up a dataset which eventually I can use in an educational setting to give students an idea of the range of mathematical and statistical models from purely data driven, to dynamic and mechanistic.
This year I'm inviting the entire internet to submit their similar data (and I'm writing this invitation early enough to hopefully disseminate the request and let people consider it and prepare in advance).
The idea is this, you buy a Turkey, and you cook it in an oven. You use a meat thermometer to measure the temperature of the turkey meat at three locations (shallow, medium depth, and deep) at some regular time intervals of perhaps around 30 minutes starting at time t=0 right before you put the turkey in the oven. Along with this you provide an oven temperature at each time point (either the set point or if you have an oven thermometer, the actual temperature). Finally, you give some stats about the turkey (weight, whether it's covered, etc).
I'm providing a Turkey Template for you to download and fill out with your data. Once I have at least say 10 Turkeys I will upload a full dataset for all to use.
NOTE: I'm not quite sure how to have you upload the data yet. Most likely I will have to take submissions via email or some such thing.
I have an intuitive argument that says that every function of many finite-variance random variables can be expressed to good approximation as a different function of O(6) random variables, where O(6) means "on the order of" or "a small multiple of" 6.
The argument is related to a recent article in American Scientist about high dimensional geometry.
According to Wikipedia, the formulas for the (hyper) volume V of an N sphere and the (hyper) surface area S of an N sphere are:

$$V_n(R) = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)} R^n$$

and

$$S_{n-1}(R) = \frac{2 \pi^{n/2}}{\Gamma\left(\frac{n}{2}\right)} R^{n-1}$$

Notice that for a unit sphere both quantities go to zero quickly with large n because they are divided by a gamma function that grows factorially with n. So not only does the volume of a unit sphere go to zero as a fraction of the containing cube, but also it just goes to zero period.
Now to random variables. Imagine you have N random variables $x_1, \ldots, x_N$, each independent and with mean zero and finite variance which we normalize to 1. You are using these random variables to model the uncertainty in some system, so you have a function $F(x_1, \ldots, x_N)$ which depends on these random variables (the function F also absorbs the scale and shift parameters, which is why we can assume mean 0 and variance 1 for our random variables). We assume independence because we want the minimum set of random variables needed to model the randomness in our process, and it's easy to create further dependent random variables from the independent ones by combinations.
The squared radius of the vector $x = (x_1, \ldots, x_N)$, $r^2 = \sum_i x_i^2$, is on average $N$ because each $x_i^2$ is on average 1 independently. The variance of $r^2$ is the variance of a sum of N independent random variables, $\mathrm{Var}(r^2) = \sum_i \mathrm{Var}(x_i^2)$, where the variance of $x_i^2$ is related to the 4th moment of the $x_i$. Therefore the standard deviation of $r^2$ grows like $\sqrt{N}$.
Now with a deviation of order $\sqrt{N}$ around a squared radius of $N$ (a relative spread of order $1/\sqrt{N}$), the sample points of this process will all lie in a small band essentially on the surface area of an N-1 sphere. By reparameterizing our vector into a radius r and N-1 other random variables with some dependence we can treat r as if it were a constant and ignore its randomness to a good approximation, reducing our problem to an N-1 dimensional one and introducing some dependence among the remaining variables. However if N is large the dependence is small, and we can repeat the process to get N-2 variables. We can continue to repeat the process until our assumption that the dependence is small and the variance in the radius is small (as a fraction of the radius) is violated at a scale that we care about.
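The concentration of the radius is easy to see in a simulation. The sketch below uses standard normal variables purely as a convenient example of mean 0, variance 1 random variables (the argument above does not require normality), and the function name is hypothetical. It reports the spread of the radius as a fraction of the mean radius for a few dimensions.

```python
import random
import statistics


def relative_radius_spread(n_dims, n_samples=1000, seed=0):
    """Std of the radius divided by its mean, for n_dims iid N(0, 1) variables."""
    rng = random.Random(seed)
    radii = [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_dims)) ** 0.5
        for _ in range(n_samples)
    ]
    return statistics.pstdev(radii) / statistics.mean(radii)


for n in (4, 40, 400):
    print(f"N = {n:3d}: relative spread of the radius is about "
          f"{relative_radius_spread(n):.3f}")
```

The relative spread should fall roughly like one over the square root of N, which is the sense in which the samples sit in a thin band around a sphere when N is large, and the sense in which that stops being true when N is small.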
Now back to the hyper-geometry problem. The formula for the surface area of a sphere, and for the volume of a sphere each has a maximum in the vicinity of 5 to 6 dimensions. This suggests strongly the existence of a distinguished scale for the number of variables where our dimension reduction process can no longer take place, after all we are using spherical geometry to argue for our dimension reduction, and when we get near this critical dimension we can no longer argue that the variance in the radial direction doesn't matter, the volume of the n-1 dimensional sphere is on the same order as the volume of the confidence band radially around the n sphere.
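For anyone who wants to see where the maxima fall, here is a short Python sketch using the standard closed forms for the volume of the unit n-ball and the surface area of its boundary. Note the indexing convention, which is my choice here: `sphere_area(n)` is the area of the boundary of the ball in n-dimensional space, which Wikipedia would call the (n-1)-sphere.

```python
import math


def ball_volume(n, radius=1.0):
    """Volume of the n-dimensional ball: pi^(n/2) / Gamma(n/2 + 1) * R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * radius ** n


def sphere_area(n, radius=1.0):
    """Surface area of the boundary of the n-dimensional ball."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2) * radius ** (n - 1)


dims = range(1, 21)
best_volume_dim = max(dims, key=ball_volume)
best_area_dim = max(dims, key=sphere_area)

for n in dims:
    print(f"n = {n:2d}  volume = {ball_volume(n):8.4f}  area = {sphere_area(n):8.4f}")
print("unit ball volume peaks at n =", best_volume_dim)
print("unit sphere area peaks at n =", best_area_dim)
```

With this indexing the volume peaks at n = 5 and the area at n = 7 (the 6-sphere in the other convention), and both are already tiny by n = 20.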
So if we're going to model a random process we need some small multiple of say 6 dimensions no matter how many random dimensions the problem really has (i.e. $10^{23}$ molecules in statistical mechanics). So let's say that this constant is on the order of about 10, which means that 64 independent random variables ought to be enough for anyone (with apologies to Bill Gates who never actually said that 640k RAM would be enough for anyone).
But seriously, this is not a rigorous argument, but it is certainly suggestive that we will often be successful using something like 12 to 24 independent random variables as the "sources of uncertainty" in our model, even if our problem has in fact many many random variables.
In my post from yesterday I talked about how progressive taxation hurts people we probably don't want to hurt, people who are entrepreneurial or who delay their income by increasing their education and similar situations. One of the reasons the progressive tax codes are so bad for these groups is that they are memory free, they treat two people with high incomes this year the same regardless of what their historical situation was. For example a person who spent many years in graduate school to become an expert at something that pays very well and is in their first year of reaping the rewards is taxed the same as someone who had both an enormous trust fund income for many years as well as a high paying job, and recently lost their job so their income is down to the level that the grad school graduate is so happy to be at.
One way to deal with this issue is to tax wealth rather than income. This is in general a more reasonable way to tax people as wealth is associated with ability to consume and with political power, whereas income is more associated with how your wealth and consumption are changing this year. But there are practical difficulties in taxing wealth: it's easier to hide wealth than income, and it's harder to measure wealth when that wealth includes things which are illiquid (such as real estate, art collections, non-publicly-traded stock or whatever).
One way to deal with this problem is to make some kind of approximation to taxation of wealth. For example we could tax the integral of income minus some "minimal consumption" which represents what we think people need to consume to live without serious hardship. For example

$$W(t) = \int_0^t \left( I(\tau) - C_{min}(\tau) \right) d\tau$$

where $W$ is taxable wealth, $I$ is income, and $C_{min} = F + H + H_c + E$, where F is food, H is housing and utilities, Hc is health care, and E is education. In this scenario we could all argue about what goes into $C_{min}$ and to a great degree we could also argue about $I$.
For example, currently we have capital gains income which is taxed at a lower rate, and is only taxed upon the sale and "realization" of the capital gain. This is a pretty stupid way to tax capital gains in my opinion. First, it discourages people from allocating their wealth in a more optimal way, since each transaction that might improve the allocation is taxed. Second, although the goal of taxing capital gains at a lower rate is to encourage savings, in fact it is a highly regressive (as opposed to progressive) tax, since people who are quite rich get most of their income from some form of capital gains. And since more of the tax burden is then shifted to "earned" income, there is less money available for the less wealthy people to save.
My initial guess is that we would do quite well to organize taxation as follows:
- Mark-to-market all major liquid capital assets annually. This means stocks, bonds, derivatives, and the book value of non-publicly traded stocks, as well as some standard measure of the equity change for real estate (perhaps treat a primary residence somewhat specially). For the uber-wealthy with big art collections and unique homes whose value is hard to determine perhaps there could be a separate law dealing with these kinds of capital assets.
- Annual earned income would be calculated at the end of the year as it currently is from W2 etc.
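- Calculate the current year's (year n) taxable income as

$$A_n = s A_{n-1} + (1 - s) I_n$$

where $A_n$ is the averaged (taxable) income for year n, $I_n$ is that year's income from the two items above, and $s$ is a weight between 0 and 1 set by a characteristic time T. A good choice of characteristic time might be around 1/10 of a typical lifetime earnings duration (say age 65 you retire, and age 20 you typically start earning, so T = 45/10 = 4.5 years).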
The above method is what's called an exponentially weighted moving average. You take the moving average from the last period, weight it by some amount s, and then add the new increment weighted by (1-s). It averages your income over all of your income history. Its effect can be seen by using exponentially weighted smoothing on a stock price: it's a very common technique, and you can see an example for the exchange traded fund "SPY" with 100, 300, and 450 day exponentially weighted moving averages over a 5 year period. The longer the period, the smoother the curve. That example is an exponentially weighted moving average of the total value (think wealth), so its derivative would be something like an exponentially weighted moving average of income.
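Here is a small Python sketch of the smoothing rule described above, applied to a made-up income history. Two things in it are my assumptions for illustration and not part of the proposal as stated: the weight is taken to be `s = exp(-1/T)` with T in years, and the average starts from zero (no income history). The function name and the income figures are hypothetical.

```python
import math


def smoothed_income(incomes, T=4.5, initial=0.0):
    """Exponentially weighted moving average of yearly income.

    ASSUMPTION: the weight on last year's average is s = exp(-1 / T).
    """
    s = math.exp(-1.0 / T)
    average = initial
    history = []
    for income in incomes:
        average = s * average + (1.0 - s) * income
        history.append(average)
    return history


# Hypothetical grad student: six lean years, then four well-paid ones.
incomes = [20_000] * 6 + [150_000] * 4
for year, (income, taxable) in enumerate(zip(incomes, smoothed_income(incomes)), 1):
    print(f"year {year:2d}: income {income:7,d}  smoothed {taxable:10,.0f}")
```

In this example the smoothed figure lags the jump in income by several years, which is the history dependence the scheme is after; the same lag works in the other direction after a drop in income.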
This method has several advantages:
- It is history dependent, so it taxes people based on not their current income, but a measure similar to the integral of income I mentioned before.
- It is very easy to calculate, and easy to explain on a tax form.
- It allows us to tax people who have significant capital gains based on their trend in capital accumulation, and eliminates much of the noise caused by year to year fluctuations. (at the same time, if you lose a lot in the market it will take time before you stop being taxed on your previous earnings, so there is some need to deal with severe market downturns, where people may not have the ability to pay. That's nothing new though, there have always been people who get rich, owe a lot of taxes, and then go bankrupt.).
- It eliminates barriers to capital reallocation. People can simply buy and sell their assets as they like, the tax is not associated with transactions.
- The time scale for exponential weighting is matched to the time scale of our lives, and the time delay for growing income to be seen in the tax burden gives an incentive to improve one's income.
- By including capital gains in income it taxes people in a similar manner regardless of the source of their wealth accumulation, but by averaging over history it also offers an incentive to save and optimize allocation since growth is not taxed immediately but rather over time.
It's not perfect I'm sure, but it is a realistic and simple way to modify the basic structure of income tax to distribute the tax in a standard way over all people regardless of how they earn their wealth, which also improves the incentive to invest in the future and reap the rewards of education and entrepreneurship.