I'm getting ready to run an in-class experiment with some 5th graders from the local elementary school. It's going to be about fundamental aspects of doing science, and it's going to be both very simple and very deep at the same time. I hope it's successful. The idea goes like this:
We'll talk about measurement, and about doing science. We'll talk about the science facts that they learn (there are electrons and protons, the heart is a muscle, it pumps blood, clouds are made of water that evaporates from the ocean, etc.). We'll also talk about tools to do science with, measurement, how to connect measurements to facts we're interested in (models), how we sometimes can't directly measure the thing we're interested in, and how we can't measure things perfectly accurately (and the difference between accurate and precise).
The activity we're going to do is to figure out how much a piece of soda straw weighs by measuring something that we can measure easily and accurately (its length).
Now, ask the kids: if you took 1mm off the end of the straw and weighed it super-accurately, how would you calculate how much 205mm of straw would weigh? Get the idea into their heads that 205mm of straw has the same weight as 205 1mm pieces, and that no matter where you slice the straw, a 1mm piece will weigh basically the same (except for the accordion part).
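In symbols, the model we're building toward is just a line through the origin:

$$W = mL$$

where $W$ is the weight in cg, $L$ is the length of the straw in mm, and $m$ is the slope (weight per mm) that the whole activity estimates.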
We'll cut some soda straws. I've found it probably makes better sense to give very specific tasks to specific people, so ask individual kids to cut straws about 10, 25, 40, 60, 80, and 100 mm long, then measure the actual lengths. If you just ask them to cut a variety of lengths you'll wind up with a lot of 45, 50, 55mm type straws; you want a range from the mid 10's up to 90 or 100mm. We also want about 20 pieces that are around 5-10mm, just short pieces we'll be choosing randomly later.
Ask the kids for their guess as to how much 100mm of straw weighs. Get an order of magnitude estimate at least (1g? 100g? 0.1g? Give them a point of reference by measuring a pencil or a penny on the scale). Come up with an "upper end" estimate $u$, and in your head think of uniform$(0,u)$ as the prior for the slope of the line in cg/100mm.
Then, using a scale that measures down to 0.01g (available on Amazon for around $10), we'll weigh the soda straws. Use a small cup as a weigh boat, and tare the scale to the cup's weight. I used a medicine measuring cup from children's Advil, but a Dixie cup or a lightweight plastic or styrofoam cup would be fine too.
Each time you weigh a straw, select at random 5 of the 20 small pieces of soda straw and add those to the cup. Write down the weights in cg (hundredths of a gram) next to the length in mm.
Now we've got a table of numbers. Plot the numbers on graph paper. I just ran this experiment today, and I strongly recommend printing out customized graph paper with the scales already in place; the kids had trouble setting up scales that fit on the page and covered the range of the data.
Once you've got the data points on the paper, discuss with them the 5 extra pieces added to each weighing, and how this means all the data points are "too high". Also talk with them about how the line you're looking for has to go through (0,0), because zero length of straw has zero weight. Consider also that because we always added 5 pieces, the errors should all be similar in size to the weight of 5 bits of "average length".
Choose candidate lines and have each child graph a different candidate line on the graph paper. Have the children calculate the error for each data point: the distance in cg between the data point and the line's prediction for that data point. Write those down in your table. Also calculate the slope of each line (in cg/100mm, for example: read off the prediction for a 100mm length).
Now, calculate $3m\bar{L}$ and $7m\bar{L}$, where $m$ is the slope of your line and $\bar{L}$ is the guess at the "average length" of the small bits you polluted the measurements with. If any of the individual errors is outside that range, call it an unacceptable line. Then calculate the total error: if the total error is not between $4Nm\bar{L}$ and $6Nm\bar{L}$ (where $N$ is the number of data points), it's not an acceptable line. Otherwise call it an acceptable line.
Put all the acceptable lines on one graph paper. Choose the one in the middle of the range. This is our estimate of the median of the posterior distribution of line slopes under a declarative model:
$$p(m) \sim \mathrm{uniform}(0,u)$$
$$p(\mathrm{error} \mid m, \bar{L}) \sim \mathrm{uniform}(3m\bar{L},\, 7m\bar{L})$$
$$p\Big(\sum \mathrm{error} \,\Big|\, m, \bar{L}\Big) \sim \mathrm{uniform}(4Nm\bar{L},\, 6Nm\bar{L})$$
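Here's a minimal Python sketch of the whole procedure, just to show the accept/reject logic. The true slope, the bit lengths, the prior upper bound, and the grid of candidate slopes are all made-up assumptions for illustration:

```python
# Simulate the classroom procedure: generate fake straw data polluted
# with 5 random small bits per weighing, then accept/reject candidate
# slopes exactly as described above. All numbers are made up.
import random

random.seed(1)

m_true = 0.04        # "true" straw weight in cg per mm (assumed)
L_avg  = 7.0         # guessed average length of the small bits, mm
lengths = [10, 25, 40, 60, 80, 100]   # straw lengths, mm

def weigh(L):
    """Each weighing includes the straw plus 5 random 5-10mm bits,
    so every recorded weight is 'too high'."""
    bits = sum(random.uniform(5, 10) for _ in range(5))
    return m_true * (L + bits)        # weight in cg

data = [(L, weigh(L)) for L in lengths]
N = len(data)

def acceptable(m):
    """The declarative model: each error must land in
    uniform(3*m*L_avg, 7*m*L_avg), and the total error in
    uniform(4*N*m*L_avg, 6*N*m*L_avg)."""
    errors = [w - m * L for L, w in data]
    if any(not (3 * m * L_avg <= e <= 7 * m * L_avg) for e in errors):
        return False
    return 4 * N * m * L_avg <= sum(errors) <= 6 * N * m * L_avg

# Scan candidate slopes over a uniform(0, u) prior; here u = 0.2 cg/mm,
# i.e. 20 cg per 100mm of straw.
candidates = [i * 0.0005 for i in range(1, 400)]
accepted = [m for m in candidates if acceptable(m)]
if accepted:
    print("acceptable slopes (cg/mm):", accepted[0], "to", accepted[-1])
    print("middle of the range (our median estimate):",
          accepted[len(accepted) // 2])
else:
    print("no acceptable lines: check for outliers!")
```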
Ask them to predict the weight of one of the longer straws, say near 100mm, and then weigh just that straw without the polluting pieces, and see how well we predict it using the median posterior value obtained above.
Watch out for outliers, where kids transpose digits or switch the weights between straws, etc. This model won't handle those at all; there may be no acceptable lines if you have outliers.
For reasons discussed previously, I believe that every scientific measurement lives on a finite sample set. But it is tiresome to work with enormous explicit finite sample sets, like, for example, the actual values that a 64 bit IEEE floating point number can take on... They're not actually evenly spaced, for example. What we tend to do is deal with discrete sample spaces with explicit values when the set is small enough (2 or 10 or 256 or something like that) and deal with "continuous" distributions as approximations when there are lots of values and the finite set of values are close enough together (for example a voltage measured by a 24 bit A/D converter in which the range 0-1V is represented by the numbers 0-16777215, so that the interval between sample values is about 0.06 micro-volts, which corresponds to 0.06 micro-amps for a microsecond into a microfarad capacitor, or around 374000 electrons).
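Spelling out that last bit of arithmetic:

$$\Delta V = \frac{1\,\mathrm{V}}{2^{24}} \approx 5.96\times10^{-8}\,\mathrm{V} \approx 0.06\,\mu\mathrm{V}, \qquad \Delta Q = C\,\Delta V \approx 10^{-6}\,\mathrm{F} \times 5.96\times10^{-8}\,\mathrm{V} \approx 6\times10^{-14}\,\mathrm{C} \approx \frac{6\times10^{-14}}{1.6\times10^{-19}} \approx 3.7\times10^{5}\ \mathrm{electrons}.$$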
Because of this, the nonstandard number system of IST corresponds pretty well to what we're doing typically. Suppose for example x ~ normal(0,1) in a statistical model. We can pick a large enough number, like 10, and a small enough number, like $10^{-6}$, and grid out all the individual values between -10 and +10 in steps of 0.000001, and very rarely is anyone going to have a problem with this discrete distribution instead of the normal one. Anyone who does have a problem should remember that we're free to choose a smaller grid, and their normal RNG might be giving them single precision floating point numbers that have 24 bit mantissas anyway... IST formalizes this with machinery (axioms, lemmas, etc.) that proves the existence, in IST, of an infinitesimal number that is so small no "standard" math could distinguish it from zero, and yet it isn't zero.
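A quick sketch of the grid picture, using a coarser step than $10^{-6}$ so it runs fast (this is my illustration, not part of the original argument):

```python
# Discretize normal(0,1) on a grid over [-10, 10] and verify that the
# grid probabilities behave like the continuous distribution.
import math

dx = 0.001
xs = [i * dx for i in range(-10_000, 10_001)]    # grid covering [-10, 10]
ps = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) * dx for x in xs]

print(sum(ps))                                   # ~1.0 (total probability)
print(sum(p * x * x for p, x in zip(ps, xs)))    # ~1.0 (variance)
```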
So, now we could say we have the problem of picking a distribution to represent some data, and we know only that the data has mean 0 and standard deviation 1. We appeal to the idea that we'd like to maximize a measure of uncertainty conditional on mean 0 and standard deviation 1. For discrete outcomes, there's an obvious choice of uncertainty metric; it's one of the entropies:

$$H = -\sum_i p_i \log p_i$$
Here the free choice of logarithm base is equivalent to a free choice of a scale constant, which is why I say "entropies" above. Informally, since the log of a number between 0 and 1 (a probability) is always negative, the negative of the log is positive. The smaller you make each of the $p_i$ values, the bigger you make each of the $-\log p_i$ values. So maximizing the entropy is like pushing down on all the probabilities. The fact that total probability stays equal to 1 limits how hard you can push down, so in the end the total probability is spread out over more and more of the possible outcomes. If there are no constraints, all the probabilities become equal (the uniform distribution). Other constraints limit how hard you can push down in certain areas (i.e. if you want a mean of 0 you probably can't push the whole range around 0 down too hard), so you wind up with more "lumpy" distributions or whatever, depending on your constraints.
The procedure for maximizing this sum subject to the constraints is detailed elsewhere. The basic technique is to take a derivative with respect to each of the $p_i$ values and set all the derivatives equal to 0. To add the constraints, you use the method of Lagrange multipliers. The result would be each $p_i = e^{\lambda_0 + \lambda_1 x_i + \lambda_2 x_i^2}$, where the $\lambda_1, \lambda_2$ depend on the constraints (mean 0 and standard deviation 1 in our case), and $\lambda_0$ is chosen to normalize the total probability to 1.
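For anyone who wants the step spelled out, here is a sketch of that calculation (my notation, with $\lambda_0$ absorbing the normalization):

$$\mathcal{L} = -\sum_i p_i \log p_i + \lambda_0\Big(\sum_i p_i - 1\Big) + \lambda_1 \sum_i p_i x_i + \lambda_2\Big(\sum_i p_i x_i^2 - 1\Big)$$

$$\frac{\partial \mathcal{L}}{\partial p_i} = -\log p_i - 1 + \lambda_0 + \lambda_1 x_i + \lambda_2 x_i^2 = 0 \quad\Longrightarrow\quad p_i = e^{(\lambda_0 - 1) + \lambda_1 x_i + \lambda_2 x_i^2}$$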
Now, suppose you want to work with a "continuous" variable. In nonstandard analysis we can say that our model is that the possible outcomes are on an infinitesimal grid with grid size $dx$, constrained to be between the values $\pm N\,dx$ for $N$ a nonstandard integer. So the possible values are $x_i = -N\,dx + i\,dx$ for all the $i$ values between $0$ and $2N$. We define a nonstandard probability density function $p(x)$ to be constant over each interval of length $dx$, and the probability to land at the grid point in the center (or left side, or some fixed part) of the interval is $p(x_i)\,dx$.
Now we calculate the nonstandard entropy:

$$H = -\sum_{i=0}^{2N} p(x_i)\,dx \,\log\!\big(p(x_i)\,dx\big)$$
Now clearly the argument to the logarithm is infinitesimal, since $p(x_i)$ is limited and $dx$ is infinitesimal, so $-\log(p(x_i)\,dx)$ is nonstandard (very very large and positive). But it's a perfectly good number. There is a finite number of terms in the sum, so the sum is well defined. The value of the sum is of course a nonstandard number, but we could ask: how do we set the $p(x_i)$ values so that the sum achieves its largest (nonstandard) value? Clearly the answer is going to be the same kind of expression as before, because we're doing the same calculation (hand waving goes here, feel free to formalize this in the comments), so we're going to wind up with:

$$p(x_i) = e^{\lambda_0 + \lambda_1 x_i + \lambda_2 x_i^2}$$

where $p(x)$ refers to the nonstandard density function which is constant over each interval; the standardization of this is going to be the usual normal distribution.
The point is, just because the entropy is nonstandard doesn't mean it doesn't have a maximum. So long as the maximum occurs for some function of $x$ whose standardization exists, we can take that standard probability density as the maximum entropy result we should use, and this procedure is justified in large part because the continuous function is being used to approximate a grid of points anyway!
If you don't like this result, you could always use the relative entropy (i.e. replace $\log(p(x_i)\,dx)$ with $\log(p(x_i)/q(x_i))$) relative to a nonstandard uniform distribution $q$ whose height is $1/(2N\,dx)$ across the whole domain $[-N\,dx, +N\,dx]$. This seems to be the concept referred to by Jaynes as the limiting density of discrete points. Then the $dx$ values in the logarithm cancel, and the entropy value itself isn't nonstandard, but the distribution $q$ is, so it's still a nonstandard construct. Since $q$ is just a constant anyway, it's basically just saying that by rescaling the original entropy via a nonstandard constant, we can recover a standard entropy to be maximized. But... and this is key, we are never USING the numerical entropy value itself, except as a means to pick out a probability density, which turns out to have a perfectly well defined standardization, namely the normal distribution.
If you have a wireless router with both 2.4GHz and 5GHz bands, the question arises as to whether you should allocate different ESSIDs (network names) to the two radios.
Up to now, I've been an advocate for a single ESSID, letting the clients decide which band to get on. But I've found that there is a case for forcing some clients onto one band or the other.
In particular, the FireTV Stick that we have is in a fixed location where the 5GHz band reception is good, and there are no other 5GHz devices nearby. Yet, it will sometimes flop back and forth between the 2.4 and 5GHz bands, generally resulting in stuttering video and/or problems with other 2.4GHz only devices.
Solution: Run OpenWRT or another high quality free-software router distro, and run an EXTRA ESSID on the 5GHz band (this only works on hardware that supports more than one network name per radio). This gives you the best of both worlds. Both 2.4 and 5GHz are available to mobile clients with dual radios, but if you want to force something, especially a fixed-location device, onto the 5GHz band, you can connect it to the extra 5GHz-only ESSID. By bridging that extra ESSID onto your LAN, it just acts as a different access point to the same network.
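On OpenWRT this is just a few extra lines in /etc/config/wireless. A sketch, assuming 'radio1' is your 5GHz radio and borrowing the "foo5" name from below; adjust for your own hardware and passphrase:

```
# /etc/config/wireless (excerpt): a second AP interface on the 5GHz
# radio, bridged onto the same 'lan' network as the main ESSID.
config wifi-iface 'ap5only'
        option device 'radio1'      # assumption: radio1 is the 5GHz radio
        option mode 'ap'
        option network 'lan'        # bridge onto the existing LAN
        option ssid 'foo5'          # the extra 5GHz-only network name
        option encryption 'psk2'
        option key 'your-passphrase-here'
```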
So, with the FireTV forced onto foo5 and your cellphone on foo, can you use the FireTV remote app and connect? Yes: thanks to the network bridging, they are on the same broadcast network, so they see each other.
Ok, so I've advocated with some of my friends for a Universal Basic Income (UBI). The basic idea is this: if you are an adult citizen of the US, you have a social security number, you register a bank account, and you get a monthly pre-tax direct deposit from the government, a flat amount that everyone receives just for being a citizen. The goal here is to vastly simplify the requirements to provide a basic social safety net, as well as to eliminate the complexity of programs like the progressive income tax with millions of specialty deductions, etc.
The UBI eliminates the fear of being without income on the very short term (days, weeks, months), and lets people take risks, be entrepreneurial, take care of families, weather bad events better, etc. It also takes care of pretty much everything that a progressive tax rate structure is supposed to do (help poorer people, who spend a lot of their income on basic necessities). So once a UBI is in place, you can VASTLY simplify the administration of an income tax, and you can eliminate all sorts of specialized subsidies that currently require a lot of administrative overhead (checking that people qualify, running housing projects, providing specialty healthcare programs, etc.).
The UBI doesn't work for the mentally ill, so they will continue to need specialty help in addition, but for everyone else, it's a very efficient way to do what we're currently doing in very inefficient ways.
But, this isn't a post about the merits, it's a post about order of magnitude estimates for the quantities of money involved.
According to Google there are about $3.2\times10^8$ people in the US.
The federal budget is currently about $3.7\times10^{12}$ dollars, with about $0.6\times10^{12}$ in defense, the rest in various social services and interest on debt.
Let's take as an order of magnitude estimate of a good choice for UBI the 2015 federal poverty guideline. That's about $1.2\times10^4$ dollars per year, or about $1k/mo.
So, if we just started shipping out cash to everyone at the rate of 12k/yr how big is that as a fraction of the federal after-defense budget?
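Taking roughly $2.4\times10^8$ adults (my rough guess at the adult share of $3.2\times10^8$ people):

$$2.4\times10^8 \ \text{adults} \times 1.2\times10^4\ \$/\text{yr} \approx 2.9\times10^{12}\ \$/\text{yr}$$

versus a non-defense budget of about $3.7\times10^{12} - 0.6\times10^{12} = 3.1\times10^{12}$ dollars: a ratio of roughly 0.9.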
So, to first order, the entire non-defense budget is about the same as the amount of money you'd need to spend on a UBI. But the UBI can replace a *lot* of other government programs: Social Security, Medicare, housing and human services. A big majority of what we're spending this budget on is basically doing an inefficient job of helping people.
I don't advocate gutting all of the government programs and replacing them with a UBI, but I imagine I could easily get on board with gutting 60 or 70% of them and replacing with a UBI.
Besides reducing the overhead of government, you'd need to increase revenue. The UBI would drive sales, and a flat federal sales tax would be a very simple way to take care of this extra need for income. A sales tax is also a consumption-based tax, which has good economic consequences (it encourages saving and investing, whereas income taxes discourage earning and encourage consumption!).
So, our order of magnitude estimates show this is a feasible plan. It's not something that would be easy to transition to in a blink, but it could be done a lot more easily than setting up a universal medicare system, for example. A UBI accomplishes things that both the liberal and conservative groups in politics want: helping people, while being efficient, and encouraging growth and entrepreneurialism. It's an idea whose time has come:
(see this WaPo article on how a UBI-like program helped Native American populations, for some empirical information)