Further examples of monopoly problems on the internet

2012 May 2
by Daniel Lakeland

See this Slashdot article which mentions that Sony has decided to put a video streaming service on hold because it will be hard to compete with Comcast when Comcast puts a bandwidth cap on all data except data from their own video service. The article mentions that 20% of broadband in the US comes through Comcast.

 

A big problem with the internet

2012 April 27
by Daniel Lakeland

In the early 90's I was a student at Iowa State University. There they had a computer system called Project Vincent, which was based on the MIT Project Athena. I could securely log into any of perhaps 10000 UNIX computers on campus and it looked just like home, and by that I mean all my settings were preserved, I had all my files available, and I could reach any of the files hosted by the University with all permissions and security settings applying across the entire network.

Today this technology is potentially available to everyone if you have an OpenLDAP directory and a Kerberos server and some sort of file server. Setting those up takes some knowledge, but could easily be automated the way that setting up a blog is. Windows, MacOS and Linux all can in theory join into these big "realms" and provide uniformity across all your computers. For example your extended family could all join up and then your laptops, media PCs, desktop computers and other devices would all be symmetrical. Everyone in the family could log into any computer and it would look like "their computer" even if you were visiting grandma or hanging out at your sister's house across the country. Oh sure, we now have laptops that let you take your computer with you, but then what happens to your files when someone steals it or drops it out of a luggage rack?

What is holding this back? I believe the big problem is the network. When I graduated in the mid 90's I got a DSL connection to feed me my daily internet. It was about 1.5 Mbps down and 0.8 Mbps up. It was "SOHO class" for small businesses so I could run my own server out of my house and at least have a little remote access to my computers.

Today in 2012 I have cable based internet with 15 Mbps downstream and ... wait for it ... 0.8 Mbps upstream. That's an improvement of 10x downstream, and exactly the same speed I had 15 years ago upstream. Oh and it costs a little less, but has an absolutely-no-servers policy built in including some kind of port filtering I believe. So yeah, suck it.

Furthermore, thanks to the slow adoption of IPv6 there are a multitude of NAT (Network Address Translation) problems. The IP address of anything at my home is of the form 192.168.x.x, which only allows these computers to be reached directly from within my home.

Let me offer you a simple vision of an alternative world whose technology is available today and actually much of it has been available for years.

Infrastructure in the Alternative World

  1. In my house there is a high end desktop grade computer sitting under my desk. It cost about $600 plus the cost of the 4 x 2TB hard drives in it. It acts as both a metadata server and storage server for the Ceph distributed filesystem. It also provides a net-boot image for other computers in my house.
  2. On my wife's desk is a large screen monitor and a small box containing an Intel Atom Processor and no hard drive. It boots off the network and has access to files on the Ceph filesystem.
  3. On a cabinet in the living room is a large screen monitor and another Atom based workstation. It lets us watch Netflix and Hulu, or any of the DVDs we've ripped to the Ceph filesystem like Winged Migration, one of my 2 year old son's favorites.
  4. In my mother's house, 400 miles away, is another heavy duty desktop grade machine that I have set up with another four 2TB HDs. It is also a Ceph file server and provides redundancy. This machine also acts as my mother's primary computer, on which she casually watches Hulu and looks at pictures of her grandchildren. Both heavy duty desktop machines have a UPS that will last 15 minutes under normal load.
  5. Both my mother's house and mine have cable based internet with 50 Mbps symmetric connections and IPv6 addresses (note that this technology is available TODAY but not rolled out). A small router box at each location provides WiFi and acts as an IPv6 firewall, and the two router boxes maintain a point-to-point encrypted IPsec tunnel so that machines at my mother's house and at my house reach each other on IPv6 address space under my control. They are also reachable from anywhere in the world if you have an authorized computer that can authenticate itself using strong cryptography (such as the laptops that my wife and I use away from home).
  6. A VPS server costing $15 a month sitting in a data center in Los Angeles provides OpenLDAP which gives a list of users allowed to log into any of my machines, and an Email Address book shared by my entire family. Kerberos on the VPS provides secure authentication and password management preventing unauthorized login. IMAP email, Jabber instant messenger, and SIP based telephone services are also provided by the VPS.
  7. At my house the desktop machine provides redundant secondary services for the LDAP, Kerberos, Email, and SIP services. Since power or internet connection is more likely to go out at my house than the data center I don't want to have the box under my desk be the primary source of these services, but if the data center has a problem perhaps my house can back it up.

Use Cases, or how and why we would want this set up

My laptop is plugged into a high speed network port on campus, and I am working on a draft of a paper. Certain directories on the laptop are synchronized manually to my fileserver so that I can work on them even without a network connection; for this I use unison. I suddenly realize that a different, non-synchronized directory containing a project I worked on two years ago would provide me with an excellent template for a presentation I want to make. But oh no! It's at home! Haha, I say, with evil cackling and rubbing of my hands together (this makes my officemates nervous). With a few clicks I have mounted the distributed filesystem and am easily able to pull the presentation template off my fileserver. Even though it's 25 MB of data, and it's on my home computer, it takes around 10 seconds to download due to my 50 Mbps symmetric network connection. (Today's alternative: I set up a bunch of OpenVPN hackery and use my existing 0.8 Mbps upstream connection to grab things via ssh, and it takes 5 minutes to get this one file.)
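The arithmetic behind those two numbers can be checked with a small sketch. The function name is made up for illustration, and the calculation assumes an ideal link with no protocol overhead or latency, so the real times would be somewhat longer than what it prints (which is why I round up in the text).

```python
def transfer_seconds(size_megabytes: float, link_megabits_per_s: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead and latency."""
    size_megabits = size_megabytes * 8.0  # 1 byte = 8 bits
    return size_megabits / link_megabits_per_s


# 25 MB presentation template over the two upstream links discussed in the post.
for label, mbps in [("50 Mbps symmetric", 50.0), ("0.8 Mbps upstream", 0.8)]:
    print(f"{label}: {transfer_seconds(25, mbps):.0f} s for a 25 MB file")
```

The ideal figures are about 4 seconds on the symmetric link and about 250 seconds (a bit over 4 minutes) on the 0.8 Mbps upstream, a ratio of 62.5 that is simply the ratio of the two link speeds.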

My mother is visiting us and wants to upload the pictures she's taken of her grandchildren to "her computer". She logs into the media PC and voila, it looks just like the computer she has at home! Amazing. She is distracted, checking her email, and then plugs in her camera and uploads 400 MB of image files. This takes a minute or so. Later when she goes home she can see these just as if she'd been at home when she uploaded them.

Today the internet may seem amazing to you, but it seems to me like a series of big heavily sucking monopolistic Cable TV oriented service providers with the motion picture industry and major league baseball and recording industry all massaging their shoulders. They seem to believe that the purpose of the internet is to feed you a neverending stream of video advertising interspersed with enough humorous television and videos of kittens and people getting hit in the crotch to keep you coming back for more. In that world if they could simply shut off your upstream connection they would, because it would let them use that bandwidth for more downstream connections.

If you want something like what I described in this post, you are welcome to beg those same ISPs to string a fiber optic cable to your building and deign to allow you to pay the $3000 per month necessary for a business class connection which most likely still doesn't have IPv6.

There's something really wrong in this industry and it stinks of monopoly.

 

Non replicable research

2012 April 6
by Daniel Lakeland

An article in Nature (paywalled, or read a Reuters summary) suggests that many scientific "discoveries" are nothing of the kind, most likely being the product of too little skepticism on the part of the researchers who did the work. It's easy to fool yourself into thinking you've made a discovery if you don't take into account the inevitably large amount of filtering that is done during the course of discovery.

Getting a result with a small p value should be the first step, followed by a rigorous test of alternative causal hypotheses and replication in alternative hands. But people who are that careful will never get funded compared to those who throw up a flashy paper claiming a novel and truly groundbreaking discovery.
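To see how much the filtering matters, here is a minimal sketch under strong simplifying assumptions that are mine, not the Nature article's: every hypothesis tested is actually false (the null is true), the tests are independent, and anything with p below 0.05 gets reported. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null tests comes out 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_tests


# How quickly a "discovery" becomes nearly guaranteed as more things are tried.
for n in (1, 10, 50, 100):
    p = prob_at_least_one_false_positive(n)
    print(f"{n:3d} tests tried: P(at least one p < 0.05) = {p:.3f}")
```

With a single test the chance of a spurious result is the nominal 5%, but after trying 100 things it is above 99%, which is why the small p value can only be the first step.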

Probability theory: like it or not we're stuck with it

2012 April 5
by Daniel Lakeland

This article in the Christian Science Monitor about the Fukushima meltdown is unfortunate because it's sort of right, but for all the wrong reasons. The assertion that somehow it's a terrible idea to use probability theory to quantify reactor safety is just wrongheaded. If you don't use probability theory then what do you use? If you need to take into account all the possibilities then each nuclear reactor must be equipped with machinery capable of moving the earth into a wider orbit around the sun just in case the sun explodes before you can safely shut down your reactor. Or defense mechanisms against alien spaceship attacks. Accounting for all the possibilities is simply not possible.

At the same time, he's right, the use of probability theory in these cases seems to have been very naive. The kind of thing you expect from pre 1970's, pre-computer probability calculations based on simplifying assumptions so you can get to an answer, regardless of how well that answer actually models reality.

For example, assuming independence of the events of losing each of the 4 or 5 interconnects to the power grid is a poor model. Sure, if you're dealing with corroded attachments, or a truck running into a power pole, then they might occur nearly independently, but if a tsunami comes, an event they did plan for, the independence of the various power lines is not a very good assumption.
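A minimal sketch shows how badly the independence assumption can mislead. All the numbers here are invented for illustration (a 1% yearly chance of losing any one line to local causes, a 0.1% yearly chance of a common-cause event that takes out every line at once); they are not estimates for Fukushima or any real plant.

```python
def p_lose_all_independent(p_line: float, n_lines: int) -> float:
    """Probability that all lines fail when each fails independently."""
    return p_line ** n_lines


def p_lose_all_with_common_cause(p_line: float, n_lines: int, p_common: float) -> float:
    """Same, but a common-cause event (e.g. a tsunami) can take out every line at once."""
    return p_common + (1.0 - p_common) * p_line ** n_lines


# Hypothetical yearly probabilities, chosen only to make the point.
independent = p_lose_all_independent(p_line=0.01, n_lines=5)
dependent = p_lose_all_with_common_cause(p_line=0.01, n_lines=5, p_common=0.001)
print(f"independent model:  {independent:.1e} per year")
print(f"with common cause:  {dependent:.1e} per year")
```

Under these made-up inputs the independent model says losing all five lines is a one-in-ten-billion-year event, while the model with a common cause says one in a thousand years. The probability calculus is the same in both cases; only the model changed.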

It wasn't a failure of a fundamental theoretical tool (probability) but rather a failure in modeling which led to Fukushima being without a high enough sea wall or enough high-elevation diesel backup generation or a contract to helicopter in diesel generators in shipping containers in case of serious failures. I've even read reports that at the time there was paleoseismic evidence for 15 m tsunamis which was discounted as too unreliable.

 

An illustration of the difference between the average and the median

2012 April 3
by Daniel Lakeland

Also, what happens when you use the average as a measure of location with a power law distribution:

In humorous comic form.
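For a numerical version of the same point, here is a small sketch using a Pareto distribution as the power law. The exponent 1.1 and the sample size are arbitrary choices of mine; the closed-form median and mean are the standard ones for a Pareto distribution with minimum value x_min.

```python
import random
import statistics


def pareto_median(alpha: float, x_min: float = 1.0) -> float:
    """Median of a Pareto distribution with shape alpha and minimum x_min."""
    return x_min * 2.0 ** (1.0 / alpha)


def pareto_mean(alpha: float, x_min: float = 1.0) -> float:
    """Mean of the same distribution; infinite when alpha <= 1."""
    if alpha <= 1.0:
        return float("inf")
    return alpha * x_min / (alpha - 1.0)


random.seed(0)  # fixed seed so the run is repeatable
alpha = 1.1     # arbitrary heavy-tailed example
sample = [random.paretovariate(alpha) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
sample_median = statistics.median(sample)

print(f"theory: median = {pareto_median(alpha):.2f}, mean = {pareto_mean(alpha):.2f}")
print(f"sample: median = {sample_median:.2f}, mean = {sample_mean:.2f}")
```

The median sits near 1.9, where most of the data actually is, while the theoretical mean is 11 and the sample mean is dragged around by a handful of enormous values. Change the seed and the sample median barely moves, but the sample mean can change a lot.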

 

A simple, potentially low controversy proposal for tax reform (expanded standardized deductions)

2012 March 16
by Daniel Lakeland

Tax reform is one of my typical hobby horses, and I've been thinking a lot about it of course because not only is Tax Day on its way (the day after my youngest son's birthday!) but also the timeline just ran out on our family's Flexible Spending Account (FSA) so we were thinking of what things we could do to spend some extra money in a useful way.

The Problem:

The FSA is an example of a well-meaning government program gone wrong. It's an account where you can put pre-tax money, but you have to spend it on a restricted set of allowable medical-related expenses, you have to document the validity of those expenses, and you lose any unspent money at the end of the year (it is forfeited to the employer's plan). This reduces the cost of healthcare, but encourages people to over-spend on these specifically allowable items vs other things that might be a better use of their money (for example an improved healthcare plan, since you cannot use the money for premiums, or a gym membership or whatever).

A complicated solution:

I've previously advocated for a single account, like a combination FSA/HSA/IRA/Education/401k account. My name for it was HCSA I think, the "Human Capital Savings Account". You'd be allowed to put some large fraction of GDP/capita into this account each year. You could invest the money in investment vehicles like stocks, bonds, real estate trusts, mutual funds, and exchange traded funds (just like an IRA), with no capital gains taxes, and you could spend the money on a whole host of things that we consider to be good for people to spend money on (things that improve the public good). Examples would be education, health insurance premiums, healthcare expenses, maintenance of a primary home, childcare, and after "retirement age" you could simply take the money out for living expenses, like an IRA.

A simpler proposal:

Now, I still think that the HCSA is a great idea, but it does involve a fair amount of administrative costs. Here's a simpler proposal that should work well and could complement reform of the many-different-tax-advantaged-accounts system we have.

Let's get some of the Census Bureau and other governmental statistical groups together, and have them estimate what it costs for a family of N adults and K children to achieve a "basic level of healthy welfare" and measure this as a fraction of GDP/capita each year. This "basic level of healthy welfare" means that these people in this hypothetical average family have:

  1. A healthy diet, including fruits, vegetables, fish, meat, grains, the whole "balanced diet". And not luxury organic produce necessarily, but certainly fresh ingredients, not just a series of frozen meals and starchy sugary snacks.
  2. Coverage of transportation costs to and from work for a primary earner, either in terms of public transportation or in terms of a typical commuter vehicle and associated fuel and insurance.
  3. A healthy but minimal level of housing, with no pest infestations, leaky plumbing, mold, or over-crowded bedrooms, but also not luxury penthouse condos, just a simple standard that includes electricity, heating and cooling to avoid health effects of extreme weather, and sufficient square footage and access to bathroom facilities that the conditions are conducive to human dignity and health for both the adults and children living in the household.
  4. Some typical tradeoff between a second income and the cost of childcare. Then depending on whether the household has one or two incomes there would be an allowance for childcare expenses. In this case I think we should use the cost of organized pre-schools since they are well known to provide a good public benefit (pre-school children do better in school for at least the first several years).
  5. The cost of some sort of basic health insurance providing emergency hospitalization care, immunizations and pediatric care, and access to a primary care physician through at least local "urgent care" type clinics, including a family co-insurance deductible of say 10% of GDP/capita per year (currently around $5k or so).
  6. Education expenses including the cost of paying property taxes and other taxes used to fund public schools, as well as the cost of a public university education divided over say a 40 year savings and repayment period.

Ok, so based on the above general description, and a bunch of political wrangling, we come up with some measure of the cost of "basic healthy living" in the US. This is substantially more than the "poverty level" currently calculated, and yet should still be an absolute measure based on a set of important goods and services, not simply a certain quantile of income or the like.

Now let's say for a family of 2 adults and 2 children, this comes out currently to (I'm guessing) say $40,000 per year. We simply eliminate all specialized tax deductions, and we allow for this family of 4 to simply deduct $40k from their income and pay taxes only on the amount that exceeds this basic level. Obviously for families of different sizes we need different deductions, so we'd have a formula or table depending on your household size. There would be regional differences of course, but I wouldn't argue in favor of regional scaling. We should simply average over those, under the assumption that if it's more expensive to live where you live then it's probably in part due to greater amenities in that area than average for example.

But this deduction is likely to be substantially more than the current standard deduction. The standard deduction for a family of two working parents with two children in 2011 is $11,600 for married filing jointly. The federal poverty level for a family of 4 is $22,350, so my guess of $40k is for this improved "healthy living" standard (which includes healthcare and savings for education and all sorts of things that the poverty level is not intended to measure).

For income above and beyond this "basic level" we simply set a flat tax rate that achieves the needed tax income (say it's around 15% of total GDP or so for the US budget, something like that) so perhaps we need to have a marginal tax rate of 25-35% for income above the threshold.

A flat tax above a threshold provides an easy calculation, avoids a lot of administrative hassle, doesn't significantly skew behavior in strange ways (such as the mortgage interest deduction being partially to blame for the housing bubble) and treats everyone similarly, and yet does not have the "regressive" problem of taxing those who earn very little so much that taxes eat significantly into their ability to provide for a healthy family. Most of the tax revenue would come from income that was going to be used by households to consume "non-essentials", exactly the kind of taxation that economic theory says we should prefer.

An even further step, negative taxation:

The next step, potentially more controversial, would be to take this threshold and add a "negative taxation" which would give you as a payment from the government some fraction of the amount that you fell below the threshold. So for example for every dollar over the threshold you might be taxed say 30% but for every dollar below the threshold maybe you receive a payment of 30%. I personally think that would be a better way of dealing with poverty than our current system of crazy specialized welfare programs, but let's take things one step at a time.
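The whole scheme fits in a few lines, which is part of its appeal. This is a minimal sketch using my guessed $40k threshold for a family of four and the example 30% rate; the function name and the flag for switching negative taxation on and off are just for illustration.

```python
def net_tax(income: float, threshold: float = 40_000.0, rate: float = 0.30,
            negative_taxation: bool = True) -> float:
    """Flat tax on income above the threshold.

    With negative_taxation, income below the threshold yields a negative
    result, i.e. a payment from the government at the same rate.
    """
    excess = income - threshold
    if excess < 0 and not negative_taxation:
        return 0.0
    return rate * excess


for income in (20_000, 40_000, 60_000, 100_000, 500_000):
    tax = net_tax(income)
    print(f"income ${income:>7,}: tax ${tax:>10,.0f} ({tax / income:6.1%} of income)")
```

Even though the marginal rate is flat, the tax as a share of total income rises with income: it is negative at $20k, zero at $40k, 10% at $60k, 18% at $100k, and approaches 30% only for very large incomes.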

This reform could be combined with the previously mentioned HCSA concept by allowing an additional smaller amount (say 10 to 20% of GDP/capita or currently $5 to $10k) to be placed into these HCSA accounts, encouraging savings for retirement and unexpected expenses above and beyond the "basic expenses" that our Census Bureau and so forth had determined. Those taking advantage of the HCSA would of course need to deal with the additional administrative costs of such a program, but the core concept of ensuring that people have enough income after taxes to maintain a healthy and basic standard of living would be simplified through the new standard deduction procedure.

 

The molecular pump in the middle ear?

2012 March 14
by Daniel Lakeland

My previous post on Otitis media explains how the cause of ear pain is usually reduced pressure in the middle ear resulting in the external pressure pushing the eardrum inward and causing pain.

The question then arose in my mind, "what causes the reduced air pressure?". In the context of an inanimate pipe or tube, the pressure in the interior stays constant unless you have a pump that sucks out the contents. I noticed as I was clearing my ears multiple times per day that it didn't take long before a cleared ear became a painful one. This suggests that something is pumping down the pressure in my ear at a relatively rapid rate (a significant fraction of an atmosphere per hour or so perhaps).

So far, I don't have the answer, but I can imagine some hypotheses. Perhaps some blog reader knows the right answer?

Candidate Hypotheses:

  1. The middle ear epithelia respire through direct absorption of gas from the middle ear cavity rather than primarily through oxygen absorbed from capillary beds. This uses up the oxygen, and could cause up to a 20% decrease in pressure (from the roughly 20% oxygen content of air).
  2. There are capillary beds in the middle ear, but they are actually absorbing oxygen onto the hemoglobin and carrying it away (again up to 20% reduction in pressure).
  3. Everyday actions such as chewing, swallowing, drinking, sniffing and so forth both pressurize and depressurize the ear, but when the eustachian tube is swollen, the depressurization direction becomes like a ratchet as lowered pressure squeezes the tube closed, causing progressive depressurization.
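The "up to 20%" figure in the first two hypotheses is just Dalton's law applied to a sealed cavity. Here is a minimal sketch, assuming a rigid cavity at constant temperature where the only change is oxygen being removed; it ignores any carbon dioxide or water vapor coming back the other way, so treat it as an upper bound rather than a physiological model. The names are made up for illustration.

```python
ATM_IN_KPA = 101.325  # one standard atmosphere in kilopascals


def pressure_after_absorption(p0_atm: float = 1.0, o2_fraction: float = 0.20,
                              fraction_absorbed: float = 1.0) -> float:
    """Pressure (atm) left in a sealed rigid cavity after removing part of the oxygen."""
    return p0_atm * (1.0 - o2_fraction * fraction_absorbed)


for absorbed in (0.25, 0.5, 1.0):
    p = pressure_after_absorption(fraction_absorbed=absorbed)
    deficit_kpa = (1.0 - p) * ATM_IN_KPA
    print(f"{absorbed:4.0%} of O2 absorbed: {p:.2f} atm inside, "
          f"{deficit_kpa:.1f} kPa below ambient")
```

Absorbing all the oxygen would leave about 0.8 atm inside, a deficit of roughly 20 kPa, which is comparable to the pressure change from diving about 2 m under water without equalizing.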

Of course, it could be a combination of these, but I have a tendency to think that something is actually chemically pumping out the gas molecules and that this mechanism contributes nontrivially to the depressurization.

 

What I should have known about Otitis media but didn't

2012 March 12
by Daniel Lakeland

Otitis media is the most common cause of "earache". It's a condition in which the middle ear becomes inflamed, often from an infection. Following the inflammation you have a lot of pain in the ears, and you may have pus build up inside the ear, or a variety of other unpleasant things.

In the US, this was a much more common and painful condition when I was a child than for my children (I'm told) and the main reasons are the existence of the Pneumococcal vaccine (PCV) and the significant reduction in secondhand smoke exposure for children.

My understanding of this condition has always been along the lines of the following statement (from the Wikipedia article on Otitis media):

"When the middle ear becomes acutely infected, pressure builds up behind the eardrum (tympanic membrane), frequently causing intense pain."

This statement makes it seem like the middle ear is filled with a high pressure and it causes your eardrum to bulge outward and then sometimes rupture. But further along in the Wikipedia article they correct this misconception:

"At an anatomic level, the typical progression of acute otitis media occurs as follows: the tissues surrounding the Eustachian tube swell due to an upper respiratory infection, allergies, or dysfunction of the tubes. The Eustachian tube remains blocked most of the time. The air present in the middle ear is slowly absorbed into the surrounding tissues. A strong negative pressure creates a vacuum in the middle ear, and eventually the vacuum reaches a point where fluid from the surrounding tissues accumulates in the middle ear."

This interpretation is consistent with other sites that I've read as well.

So the "pressure" in the middle ear is actually a reduced pressure vs the surrounding air. This reduced pressure causes the flow of fluids from the surrounding tissue into the middle ear, and then the fluid provides a medium for bacterial infection.

Since this is a blog about modeling, let's examine a simple mechanical model for what is going on.

Imagine a tube, open at both ends, with a membrane separating the left and right halves. Pressure outside the tube is the atmospheric pressure. Now let's call the left half of the tube your middle ear, and the right half your outer ear. The outer ear, under most conditions, communicates with very low impedance with the atmosphere (unless you have perhaps a very serious earwax buildup), so the pressure on the right side of the membrane is 1 atm ± whatever the barometer and your altitude are doing today.

On the other hand, the left half of the pipe is thinner, and more flexible, it's your Eustachian tube and it goes from your middle ear down into your throat. Suppose that we inflame the eustachian tube so that the walls become thicker and the inner diameter of the tube becomes smaller. Now the flow of air into the space behind the membrane is impeded. With enough inflammation, and enough fluid droplets in the tube, we can stop the flow of air entirely. As soon as we do this, the air trapped in the middle ear ceases to be resupplied and processes going on in the middle ear which absorb the air cause the pressure to decrease. As the pressure in the middle ear decreases, the eustachian tube's pressure decreases and the external atmospheric pressure tends to clamp this tube further shut. In other words, we have a feedback mechanism. The opposite case, of high pressure inside the tube, would tend to cause stretching of the tube, and widening of the diameter. It is possible for this high pressure blockage to occur, and it's called a reverse block by divers because it occurs when ascending to the surface, but the reverse block is a less stable process, it requires that the tube be blocked by a relatively mechanically strong blockage. The vacuum condition on the other hand would seem to feed back on itself, so that a small blockage can become worse and worse.
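The feedback loop described above can be turned into a toy simulation. This is only a sketch: the linear "openness" law, the absorption rate of 0.1 atm per hour, the closing pressure, the 0.2 atm ceiling on the vacuum, and the two conductance values are all numbers I made up to illustrate the qualitative behavior, not measured physiology.

```python
def simulate_middle_ear(tube_conductance: float, hours: float = 5.0, dt: float = 0.001,
                        absorption_rate: float = 0.1, closing_pressure: float = 0.1,
                        max_vacuum: float = 0.2) -> float:
    """Euler-integrate the vacuum u = p_atmosphere - p_middle_ear (in atm).

    tube_conductance is in 1/hour, absorption_rate in atm/hour.
    Returns the vacuum at the end of the simulated period.
    """
    u = 0.0  # start with a freshly cleared ear
    for _ in range(int(round(hours / dt))):
        # The tube is squeezed shut as the vacuum grows: fully closed at closing_pressure.
        openness = max(0.0, 1.0 - u / closing_pressure)
        inflow = tube_conductance * openness * u  # air leaking back in through the tube
        u += dt * (absorption_rate - inflow)
        # Vacuum cannot go negative here, and absorption stops once the absorbable gas is gone.
        u = min(max(u, 0.0), max_vacuum)
    return u


healthy = simulate_middle_ear(tube_conductance=100.0)  # wide open tube
inflamed = simulate_middle_ear(tube_conductance=2.0)   # swollen, narrow tube
print(f"healthy tube:  vacuum settles near {healthy:.4f} atm")
print(f"inflamed tube: vacuum runs away to {inflamed:.2f} atm")
```

In this toy model the healthy tube balances absorption at a tiny vacuum of about 0.001 atm, while the inflamed tube can never pass enough air to keep up, clamps itself shut, and the vacuum runs all the way to the ceiling. The switch between the two regimes happens when the absorption rate exceeds the most the partly squeezed tube can deliver, which is the feedback in the text.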

Also, once we have reduced pressure inside the ear, the flow of any fluid into that space is energetically favorable (fluids flow from high pressure to low pressure) so in this case liquid is extracted from the tissue surrounding your middle ear and fills the middle ear cavity. This liquid contains proteins and things that form a broth that bacteria can grow in.

So when your ear hurts like crazy, it's most likely because the external pressure in the atmosphere is higher than the pressure inside your ear, and your ear drum is being forced deep into your ear. If you know this, and you detect the blockage early enough, gentle Valsalva maneuvers such as the ones that SCUBA divers routinely use to "clear their ears" while diving might be able to help avoid the feedback loop and a lot of pain and suffering. Of course, you might do some damage if you over pressurize so you really must be gentle, and as every SCUBA diver knows, it's better to equalize early and often.

Another thing that we can remember is that inflammation of the eustachian tube is the root of all this evil. Anti-inflammatory drugs such as ibuprofen, and membrane shrinking drugs like pseudoephedrine, used in combination with things like saline gargling to flush out irritating mucus from your throat, might help you get air back into your middle ear. Also, taking these drugs not only treats the symptoms (pain) but may also prevent or reduce the fluid buildup that leads to a bacterial infection broth.

Anyway, when I was feeling under water yesterday and could barely communicate with my wife I finally looked up some info on all of this, and the resulting theory helped me adjust my method of treatment so that by this morning I was feeling much better. I hope it helps some of you and I wish this was more commonly understood, as it might have led to a lot less suffering through earaches when I was a child.

 

The new chic

2012 March 8
by Daniel Lakeland

I'm sick with one of those upper respiratory infections you inevitably get when you have kids in pre-school. And when I'm sick I have no tolerance for brainy activities, so I tend to gravitate toward cheap television. Today after watching one of my favorite sick-day shows I tried out a few episodes of Castle on Hulu, a lowbrow murder mystery drama with all the latest television clichés. I was amused to find in Season 4 Ep. 15 that the latest cliché is the beautiful 30 year old female PhD in Applied Mathematics who was working on "statistical models of the effects of climate change" and turns up dead in her house due to her involvement in secret multinational spy hijinks.

 

Follow up on building envelope problem from LBL

2012 February 16
by Daniel Lakeland

Over at Andrew Gelman's blog, Phil and I have been having a conversation about issues related to the building envelope problem I blogged about recently.

He rightly points out that we can't reduce the problem to one equation. The second equation, which I didn't mention explicitly, is the conservation of mass equation 0 = \epsilon_{hg} (p_{hg}/p_0)^{n_{hg}} - \epsilon_{go} (p_{go}/p_0)^{n_{go}}. This equation is theoretically exact in the sense that conservation of mass is most likely exactly true, down to the last molecule, but in fact it carries some assumptions. First, it would only be exact if there were absolutely no compression of the air in the building; otherwise you need a differential equation for the rate of compression and so forth, though this is probably a small and unimportant effect. A further problem is that the equation is supposed to be for flows, but we are predicting the flows as power laws of the pressure, and that could be not quite right. However, whatever error in prediction we have can be thought of as combined in this equation, so that for example the left hand side might be modeled as a normal random variable averaging around zero. On the other hand, perhaps the errors are systematically different at different flow rates. In essence that is saying that the coefficients or the exponents are not really a single constant over the entire range of conditions. That might be another form of modeling error.

In addition to a modeling error induced by the simplified model for flow, the measured values of pressures still have measurement error. In that sense, there is an error term in the equation when the p values that appear are measured rather than some theoretical exact values. This produces something like:

err_{\mathrm{model}} = \epsilon_{hg} ((p_{hg}+err_{hg})/p_0)^{n_{hg}} - \epsilon_{go} ((p_{go}+err_{go})/p_0)^{n_{go}}

where now the pressure values are measured and the err_{hg} and err_{go} are the measurement errors for the pressures.
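As a rough illustration of how measurement error shows up as a nonzero residual in the mass conservation equation, here is a small sketch. Every number in it is hypothetical: the coefficients, the exponents of 0.65, the 50 Pa reference pressure p_0, and the 1 Pa error are picked only to have something to evaluate, and the function names are mine. It also assumes all pressure differences stay positive, since a fractional power of a negative number is not meaningful here.

```python
def mass_balance_residual(p_hg: float, p_go: float, eps_hg: float, eps_go: float,
                          n_hg: float, n_go: float, p0: float = 50.0,
                          err_hg: float = 0.0, err_go: float = 0.0) -> float:
    """House-to-garage flow minus garage-to-outdoors flow, both as power laws."""
    flow_hg = eps_hg * ((p_hg + err_hg) / p0) ** n_hg
    flow_go = eps_go * ((p_go + err_go) / p0) ** n_go
    return flow_hg - flow_go


def consistent_p_go(p_hg: float, eps_hg: float, eps_go: float,
                    n_hg: float, n_go: float, p0: float = 50.0) -> float:
    """Garage-to-outdoors pressure that makes the mass balance exactly zero."""
    flow_hg = eps_hg * (p_hg / p0) ** n_hg
    return p0 * (flow_hg / eps_go) ** (1.0 / n_go)


# Hypothetical leakage parameters, for illustration only.
params = dict(eps_hg=0.1, eps_go=0.3, n_hg=0.65, n_go=0.65)
p_hg = 40.0  # Pa, house to garage
p_go = consistent_p_go(p_hg, **params)

exact = mass_balance_residual(p_hg, p_go, **params)
perturbed = mass_balance_residual(p_hg, p_go, err_hg=1.0, **params)
print(f"consistent p_go = {p_go:.2f} Pa")
print(f"residual with exact pressures:       {exact:.2e}")
print(f"residual with 1 Pa error on p_hg:    {perturbed:.2e}")
```

With pressures that exactly satisfy the power laws the residual is zero to rounding error, and a 1 Pa error on the house-garage pressure alone produces a small positive residual. In a real fit the residual would mix this measurement effect with whatever error the power law model itself contributes.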

 

Since we're allowing a variety of errors, perhaps it's best to separate them. Looking at the two complete equations we have (with P values being measured):

\frac{(Q_{ho}+err_Q)}{Q_0} = (1+\epsilon_{ho})(\frac{(P_{ho}+errP_{ho})}{P_0})^{n_{ho}} + \epsilon_{hg}(\frac{(P_{hg}+errP_{hg})}{P_0})^{n_{hg}} + err_{\mathrm{model\,1}}

 and

err_{\mathrm{model\,2}} = \epsilon_{hg} (\frac{(P_{hg}+errP_{hg})}{P_0})^{n_{hg}} - \epsilon_{go} (\frac{(P_{go}+errP_{go})}{P_0})^{n_{go}}

Perhaps this form helps us put some priors on the size of the modeling errors and focus on them. There may also be some bias in these errors. For example, if \epsilon_{hg} is small then there is a large pressure difference between the house and the garage but only a small flow, and since the flow is small, there is a small pressure difference between the garage and outdoors. So we're operating in different ranges of the two power laws, and we might expect the modeling errors not to cancel out but rather to have a bias!