One of the themes of this blog is the relationships between mathematical models and the actual things in the world they are intended to represent. A theme of the Entsophy blog on the other hand is the difference between probability in the Bayesian sense (as a measure of how credible a particular value of an unknown physical quantity is) and frequencies in random trials. For the most part, I agree with Joseph about what a Bayesian probability is, and I agree with him that the world is not a random process that spits out independent and identical trials. But, there is a sense in which sequences of "random numbers" are meaningful to probability. What is that sense?
First, let's consider how we'd study the mathematics of probability? Could we talk about the "credibility of scientific claims" in a mathematical definition? Certainly not. Whether a claim is credible or not is a scientific question, a question that requires observation of the actual facts of the world. Mathematical objects need to be purely logical things. We can't have the truth of a theorem dependent on whether or not we observe a particular object somewhere in the universe.
So probability theory (ie. the algebra of "random numbers") must be about something else. In particular there are a variety of different definitions. But I'm going to try out one here using nonstandard analysis (internal set theory) and ideas from Per Martin-Löf's paper which I've blogged previously. I'm not going to be too concerned about strange mathematical objects such as probability measures on function spaces etc but we'll talk about some very basic ideas, like univariate continuous random variables.
A probability density is a function which assigns to each x value a non-negative density and whose integral over all of is equal to 1. Any function which satisfies this is a valid density, and we can define some non-standard densities like delta function densities etc as well thanks to our use of nonstandard analysis.
But functions are difficult to compute with. Sure a few functions are easy to specify symbolically and/or evaluate numerically, and we can approximate functions using things like Chebyshev series or other orthogonal or even non-orthogonal bases, but ultimately they are not very easily computable things. It's useful then to have an alternative representation for the density .
A sequence where is a nonstandard integer, is an unlimited sample from the density iff whenever is a standard interval . A nonstandard sample is a representation of the density in the sense that we can compute the probability of any standard region of space by counting the number of samples in the region and dividing by the total number of samples (to within an infinitesimal). Determining whether a number is in a region, counting and dividing are all very simple computational tasks. There is then a very natural mathematical correspondence between a density, and a very big bag of numbers. But this mathematical correspondence is not a scientific fact, it's a mathematical definition.
So far we haven't talked about "independence." From Per Martin-Löf´s paper (which I've blogged previously) essentially "random is as random does". That is, we define random sequences by reference to whether a test based on basic standard probability theory concepts passes the sequence.
The mathematical definition of independence says that variables and are independent if . We can take two disjoint subsequences of our big sequence and ask if they define independent random variables by essentially demanding that the counting version of the above equation holds.
where we choose half of the to represent and the others represent and then and are any arbitrary standard regions of the domain of .
Finally, we could say that are independent and identically distributed random samples from the distribution if almost every standard set of disjoint subsequences have this property. Obviously, we won't get this property if say A is all the positive values and B is all the negative values, but there are different ways to make the sequences and (we divide by 2 because of the possibility of simple label-switching). Since is nonstandard, that's a really LOT of sequences, so if we require that the vast majority of them satisfy the independence criterion, then we're "safe". And what we consider the "vast majority" we could set to be an level for our test. But beyond that we want it to be true for all standard integer number of subsequences, so this would include tests for mutual independence of 1,2,3,4,5,6,7,8,... random subsequences selected in almost any way.
I hope that I haven't overstepped. I mean, I haven't actually done any proofs that these things are equivalent to other more standard definitions, but I'm working on intuition of what this means here and I suspect it will work out fine. The point I want to make though, is that independence isn't a physical thing, it's more a property of "coming from all the different parts of the space". When things are independent, they won't often come together in small regions of space (notice how the right hand side of the above equation is divided by ). When they're independent, any old subsequence will do a good job of summarizing the density. When things are independent we get good coverage and exploration of the space, and that's it... it's not a physical thing independence.
All this is to say, big bags of numbers are nothing more than an equivalent but computationally more useful way to represent density functions. They aren't a philosophical way to define probability, they're a mathematical isometry.
So when it comes to philosophy, we want to distinguish between this mathematical convenience, and the use of probability in answering scientific questions. I guess this post is damn long now, so I'll have to get to the Bayesian philosophy later.
Joseph over at Entsophy has posted about some theoretical physics, namely the problem of what constitutes a Newtonian inertial reference frame. I have some comments over there that aren't terribly well developed, but I thought I'd ruminate here on this topic with a little bit of Nonstandard Analysis (NSA) flavor as well.
Suppose that we are in a spaceship somewhere in the solar system. Do we experience centrifugal forces? The answer seems to depend on our state of rotation, but rotation with respect to what? Newton more or less says to consider the distant stars as fixed, create a coordinate system relative to them, and determine if we are rotating relative to that coordinate system. In practice this produces a frame in which if we are not rotating, we experience essentially no centrifugal force (or so small as to be undetectable).
But this is a global procedure, it requires us to see the photons from extremely far off stars and by using those photons determine where those stars appear to be relative to us, and then construct our coordinate system. Is that essential to the procedure, that the apparent position of far off masses is involved?
Suppose instead we want to construct a coordinate system that is local to us, that doesn't involve masses that are enormous distances away. Can we do it? How?
Imagine we are conducting an experiment to observe some masses and determine whether they experience centrifugal (and/or coriolis) forces relative to our coordinate system. Let's first define a time-scale for our experiment is the time it takes by our lab clock to conduct the whole experiment. To make this concrete, suppose we're interested in human scale objects and processes, so is somewhere between say 1 second and 1 day. Or in any case, it's isn't something like a billion years over which we're supposed to observe galactic events and in which the far off stars actually clearly move.
Next we will define a length scale. So, there is one obvious length scale, which is . This defines a communication horizon, motions we carry out after the start of the experiment can not be felt by objects farther away than this until after the experiment is done. But we may be able to define another length scale which could be of use. Define the force applied by any object on our objects of interest as . Now define a length scale by where ranges over objects farther away than and is any subset of objects beyond distance . Ok, in reality physical measurements are always representable as standard numbers, the use of infinitesimal nonstandard numbers must be interpreted in terms of quantities too small to measure, the threshold is different for different measuring instruments, but we take it for granted that for any given laboratory there is some unspecified level below which forces will be so small as to be unmeasurable. For example good luck measuring the gravitational attraction of two electrons in a lab here on earth.
So we have a length scale that defines a horizon beyond which no individual object produces any appreciable force, and the sum of any group of objects produces no appreciable net force. This is I think a stronger horizon. Why? Well, for example I might have some delicate apparatus here on earth, and it measures the motion of some tiny slightly charged droplets of water in still air for example. The whole experiment might take say 1 minute, but I could imagine being able to just barely detect the tidal effect of the sun in this experiment even though the sun is 8 light minutes away. That is, the retarded field of the sun is still relevant even if the sun doesn't feel the wiggle of my droplet's motion until another 7 minutes go by.
Nevertheless, this radius is distinctly local, in that it's much smaller than the distance to the nearest galaxy for example. In fact, if we can define some cosmic radius as encompassing essentially all the visible mass in the universe, then for our purposes.
Now, we can observe our objects doing their dynamic dance, whatever they are, perhaps we're in a spaceship and we are spinning a bucket of water, or watching charged particles interact, or playing catch the stick with a quadrocopter (see link for YouTube example). Overall we'd like to determine from what point of view do the dynamics occur without any superfluous centrifugal or coriolis accelerations? To figure this out, we can make reference to the positions of any masses within our horizon (but not to the "distant fixed stars"). The important thing here is that we need a notion of "superfluous". Clearly, if we're here on earth and want to study a spinning bucket of water we will say that there are centrifugal forces involved. But if we're doing it while seated on a merry go round there will be additional rotational effects involved. It would be simpler to view the whole thing from the ground rather than the merry go round itself. If we're watching the weather, it might be simpler to describe the forces from the point of view of a coordinate system that doesn't rotate with the earth around it's axis of spin. For example a coordinate system centered at the center of mass of the earth and with one axis pointed between the center of mass of the earth and the center of mass of the sun, the other being the axis of the earth's rotation, and the third being perpendicular to the other two. (yes, I know the earth also orbits the sun, but for the sake of argument perhaps that effect is infinitesimal on our time scale)
So, clearly the notion of a "preferred" or "inertial" reference frame is related to the complexity of the forces involved. So we can fix a coordinate system in our lab, and then determine a rotating coordinate system relative to the lab coordinate system by specifying a vector around which our inertial coordinate system is rotating and the length of this vector is the rate of rotation. If we orient our measurement instrument along that axis and rotate it at the appropriate angular velocity all the dynamics will be "simplest" in terms of rotational forces. Perhaps we can define the net rotational force as something like where are the individual centrifugal forces.
The next question is, in this coordinate system, what is the distribution of angular velocities of all masses within our local shell? If this procedure essentially constructs a zero mean angular velocity coordinate system for the objects that have appreciable affects on our observed experiment, then when you expand your shell to the universe it's not surprising that the distant stars (which make up the vast majority of the universe) are "fixed".
For the moment, that's as much time as I have to devote to this topic, but perhaps I'll come back or get some interesting comments here.
The vast majority of middle class residences have at least one WiFi router these days. Usually connected to cable or DSL internet services. These internet services work a lot better and cheaper than doing everything over your phone's data connection, but they are inconvenient without wireless distribution throughout your house. Many people get a WiFi router with their cable/DSL set modem up, and a few buy an after-market one and set it up themselves. Most of these devices come with some default settings, enough people just keep those defaults that you will frequently see network IDs (ESSID) that look like "linksys" or "NETGEAR" or some other default name. (In the following I assume mostly US based rules and regulations)
Unfortunately, WiFi doesn't have much in the way of non-interference standards. So for example routers out of the box don't periodically search for the "best" channel and switch to it. Doing so would cause a brief interruption of service, but it wouldn't be that hard to take samples throughout the day of the channel usage, and whenever a very low traffic moment arrives, switching to the "best" channel. Few people would notice a 1-3 second downtime at 2am for example. There's the possibility of some interesting and strange iterative dynamics with such a scheme I suppose. As it is, I see routers tuned to all sorts of channels. For 2.4Ghz there are channels 1-11, and I can see 1, 3,4,6,7,9,11 nearby my house. Few everyday non-tech-geeks know that there are actually only 3 independent channels. These are (if you're a geek, say it with me) ... 1, 6, and 11 (this is different for the newer 5Ghz band where channels range from 36 up to 165 and some are forbidden). Channel 1 interferes with 1,2,3,4,5 but not 6-11, channel 6 interferes with 2,3,4,5,6,7,8,9,10 but not 11, and channel 11 interferes with 7,8,9,10,11. In Europe and some other countries, there may be an extra set of channels 12-14 so that you can get 4 non-interfering ones, 1, 6, 11, 14. In some circumstances where there's sufficient separation between the stations, 1,5,9,11 can be used with only minimal interference but added channel capacity that makes up for it. That kind of deployment requires that you have control over all the stations, like inside a hotel or shopping center, since other people are not likely to choose the right channels.
Another thing access points don't usually do is automatically scale back their power output. In fact, manufacturers tout their high power outputs, and they typically come off the shelf configured to max out the power. High power is useful in a point-to-point link where two wireless devices have directional antennas pointed at each other and are trying to send data over long distances, from a few hundred meters to kilometers in some cases. But in a typical single family home or apartment complex, high power output is a nightmare that doesn't improve the range, but does dramatically increase interference. In one example, a wireless networking conference found that their hotel WiFi was terrible, and the biggest fix was to go around turning off access points and turning down power output until the interference was eliminated.
As an analogy, imagine the following scenario, you are in a parking lot with a megaphone. You are trying to have a conversation with your wife (husband) about what items to buy in the store. Your partner has no megaphone. When you're sitting across from each other in your car, the fact that you are pointing a megaphone at your wife and blasting her in the face with 100dB of sound is probably not going to improve your chances of communicating clearly (try it some time!). And, when she's all the way across the parking lot at the entrance to the store, blasting in your megaphone "Did you get the eggs?" doesn't really help, because without a megaphone on her end, you just can't hear her reply. In the mean time, everyone else in the neighboring cars suffers through hearing every detail of your shopping list when you could be quietly talking inside your car without the megaphone.
The same thing goes for WiFi. I was recently trying to improve my WiFi performance and found that I could turn my access point down to 13dBm and still get just as good performance throughout my house as when I had it cranked to the default of 18 dBm. That's about 1/3 of max power output. Sure, it meant that my signal wasn't FULL BARS at the street in front of my house, but that's actually a huge advantage for everyone. I don't need full bar streaming at my curb, and across the street from me, they don't need my WiFi router to be interfering with their laptop. Whereas in fact my across the street neighbors have their power cranked up in such a way that on my front porch their signal is roughly 83 dBm which is a respectable signal strength. They might be able to sit on my porch and stream their netflix if they had a high enough power laptop wifi.
So what is the upshot of it all? Please go to your wireless router and select a channel from 1,6,11 and turn down your power output to about 12 to 17 dBm (for most people this will work). You can check your wireless signal strength using Wifi Analyzer, the Android phone app. Walk around and make sure your coverage works well inside your living area, and in important outside areas (on your deck or whatever) but not across the street or in the apartment 4 doors down the hall. Finally, if you need to extend your range, consider rather than turning up your power output, to add another station connected via wired ethernet to your main router, on a different channel, with low power, and placed in the middle of the "worst" signal area. This is the same as bringing your wife closer to you in the parking lot instead of turning on the megaphone, your neighbors will thank you. Of course sometimes running the wired line is a pain, in which case, try repositioning your main router, and if that doesn't work, turn the signal up slowly until you get coverage.
Perhaps consider discussing this article politely with your closest neighbors if they aren't using channels 1,6,11 or are using a default high power output and leaking a lot of signal into your dwelling.
I submitted my paper Grain Settlement and Fluid Flow Cause the Earthquake Liquefaction of Sand to Proceedings of the Royal Society A and just received reviewer comments today. This preprint summarizes what I've found in my PhD studies with regards to how soil liquefaction works. (Also some more details on the derivation).
The key finding is that contrary to every textbook definition of liquefaction, fluid flow is not ignorable, in fact fluid flow dynamics is a key component of how liquefaction occurs. The standard description of liquefaction involves "undrained" contraction of soil grains during shaking causing an increase of fluid pressure, and a transfer of total stress from the grain-grain contacts to the water pressure. If the water pressure gets high enough, the grains carry no stress and hence can not transfer shear stresses. At this point the slurry flows like a liquid.
Although it's been known that "void rearrangement" could occur (in which grains move one way and water moves the other), it was treated as a special case of soil liquefaction behavior. However, centrifuge scale-experiments and specialized tabletop experiments performed in the last decade have called this in to question, and the standard triaxial tabletop experiments involving 10 to 20cm sized samples of soil require enormous deformations that are unrealizable in-situ before water pressure exceeds total vertical stress. Why is soil so hard to liquefy in a triaxial machine?
Tabletop triaxial experiments involve sand and water wrapped in a completely impermeable rubber membrane, so that the total mass of water inside the membrane remains constant. In the soil, water can conduct from one place to another through the inter-grain spaces. The standard description assumes that such conduction is slow compared to the duration of an earthquake, and this assumption is so strong that some researchers have actually tried to compensate for the bulging of the membrane in these tabletop experiments under the assumption that even the slightest conduction of water causing the membrane to bulge would be more than the actual conduction in-situ.
The truth may be hard for the Geotech community to swallow. My analysis shows that for typical loose sands, water pressure transmission via fluid flow occurs over tens of meters on timescales of substantially less than a single earthquake loading cycle. In the paper I derive a dimensionless equation that describes the phenomenon in the top ~ 10 meters of surface soil and I show that the equation can be solved for equilibrium conditions to determine approximately the water pressure field. We can use an equilibrium solution rather than a dynamic time-varying one because equilibriation occurs so quickly compared to the earthquake duration. I then solve the equation for a variety of special cases that are relevant to actual physical experiments and show how the equation predicts the qualitative results of those experiments.
Suppose we'd like to design some system to perform in an "optimal" way. One intuition I've often had is that "Optimal" is rarely worthwhile in real situations. It is difficult to specify a criterion in an exact enough way which makes the "optimal point" under your criterion actually "optimal" in any sense. Instead, it's better to settle for "Good" or if you're lucky "Pretty Darn Good". In other words, once you've climbed most of the way up the mountain, it really doesn't make much difference if you are standing on the very top, especially if your surveys can't really tell you which of the little peaks really is the highest one.
With that in mind, suppose we have a Bayesian model for the probability that some particular tradeoff function is truly our "favorite" tradeoff. To make it semi-concrete, suppose we have to choose amounts to consume between two quantities, say Beer (B) and Nachos (N). We have a certain amount of money, all of which we are going to spend. We can spend it in such a way that so that there is a line in space that defines the reachable region.
Now in simple micro-economics, the solution is to specify a utility function and maximize subject to the constraint on .
But, in reality, we don't know how to trade-off , there is no utility function. At best, we can sort of guess at how happy we'd be with any given position on the line.
Instead, suppose we specify some probability distribution over different , . This is a distribution over functions that specifies the region of space (say coordinates in a Hilbert space) where we think our true preferences lie. But it's now impossible to say with certainty what exact values of are best. We could try to maximize expected utility, we could maximize the utility under the most-probable , we could do a lot of things, but another way to think about this is that we could choose a range of which are in the high probability region, and find a range of points that are all "good" relative to all the in that region, and then any of those points are going to be good choices.
My intuition is, generally in real problems, this version of the problem is going to be easier to solve than finding the one-true-optimum point of a well specified . There are two issues.
Often in applied mathematics, the utility function is chosen to make things relatively easily solvable. Such a utility function doesn't necessarily incorporate all the things we know about the real problem. Once we do incorporate all those things, we'll often have a non-smooth, weird utility function (think of making investment decisions under accurate tax-code rules, where there are weird break-points and eligibility requirements, and soforth).
On the other hand, putting probability in the mix, we can't be sure of any "one true value" we can now compare "good values" across "good models". There are two scales for error, one is the difference between a "good" value and "the optimal" value for a given utility function, and then there is the difference between "the optimal" value under one utility function and "the optimal" value under another. It makes sense to stop trying to find "the optimum" of a given utility once the scale of your search-progress gets down below the scale of the differences between different utility functions. I think often solving a bunch of related optimizations in a very approximate way will give you a lot more information about what the good values are than misguiding yourself into believing you've specified a real utility function and then solving that (wrong) optimization to perfection.
Whoo, the title is a mouthful! Again, in discussions with Joseph over at the Entsophy blog some interesting issues in the philosophy of statistics were raised. It has me thinking about the following issue:
Suppose that you have some data and some measured covariates and some brainpower. You would like to understand the process that produces in terms of things you know about the state of the world and things you assume about the state of the world, your model (I'm using G because we'll want to distinguish between probabilities and frequencies in ensembles ). So you are going to use the following formulation:
and has a Bayesian probability distribution associated to it. Actually, technically there's a joint distribution over all the epsilons . For the moment pretend we haven't thought about it yet.
So, you build your functional form for and it has some unknown parameters about which you have some general knowledge, say order of magnitude estimates, from which you can build priors that approximate that knowledge . Now you need to do some inference on to find out what reasonable values are for those unknown parameters. Using Bayes Theorem, you write down:
[ Edited: changed this to condition on the observed covariates , and will let the measurement error in enter as with the parameters including the unobservable measurement errors in the measurements of ]
And you're off to the races doing MCMC right? Not so fast! what the heck is ? The usual jargon is "The Likelihood", and well, the predictions from given are well known, so the only thing we're not sure about is how likely we are to see a certain differences between the data and the predictions . In other words we need to specify the Bayesian probability distribution on the values.
Well, first let's consider the "usual" answer. The usual answer is something like "the data are treated as IID draws from distribution 'foo'" (often normal, or binomial, or poisson, or something else fairly convenient). The next thing after that is often some kind of heteroskedastic version, where are all IID and is some kind of known measurement error scale that comes from instrument calibration data.
But, from a Bayesian perspective the joint distribution over the values is really a belief about how closely the prediction equation will match the data in each circumstance, and it could be pretty much any probability distribution. The "frequentist" notion that there is a "data generating process" like IID normal draws is backwards, the actual direction of implication seems to be (Knowledge about model performance and measurement properties) (Belief about errors when values are "reasonable", ie a mathematical specification of the likelihood) (Belief about numerical values for "reasonable values" of parameters , typically through MCMC procedures) (Ensemble of measured errors under high probability values, this ensemble will be constrained to look like draws from some single distribution, typically mean 0 and maybe having other Maximum Entropy like constraints such as standard deviation, quantiles, or other structure).
The "frequentist influenced" Bayesian method goes more around in a circle: first assume a data generating mechanism (the final distribution above) then use that to produce an "automatic" likelihood (back to the likelihood at the beginning of the above chain), then use the prior and the likelihood to get the high probability parameter values via MCMC or whatever (you could do the "full frequentist" version of this by leaving off the priors and simply doing maximum likelihood for example).
In other words, in the fully Bayesian world, depending on what we think the model and measurement devices ought to do (how we specify the likelihood), we will get different distributions on the parameters, and then when we treat the parameters as random variables and observe the discrepancies, we will get an ensemble of discrepancies which look like IID draws (or dependent draws if you want to specify some dependency structure) from some distribution, but the distribution they look like draws from need not be the one we use to define the likelihood. It is more or less the case that we can often use a simple IID "frequency" model as the Bayesian probability model for defining the likelihood and we will get in the end that the ensemble of errors look like they came from that IID distribution we assumed. This is in essence a special case, but that special case is exactly the one that Joseph is complaining about in his "jumped the shark" post. In other words, there is no "Correct" likelihood that corresponds to actual facts about the world. The "Data Generating Process" doesn't really exist, so we can't be right or wrong about it, and confidence intervals derived from assumptions about the "Data Generating Process" need not have real coverage properties because the world need not behave as if that "Data Generating Process" assumed by a frequentist statistician is actually correct.
I was reading and commenting on the Entsophy Blog today and I thought up a little question about the Gaussian distribution in relation to his theory of what a probability distribution means.
Suppose we have some measurement device, and we know it has a resolution. Say a wooden ruler whose smallest mark is 1mm. If we read 32 mm as the length of something then we know that the real length is more or less mm. We also have an intuition about that it should be around say 1.
We could model the real length as uniformly distributed between 31 and 33 mm, or maybe between 30 and 34 mm or maybe... We don't really want to completely exclude any given distance. Since we don't want to exclude any distance, we maybe choose a gaussian distribution with .
Let's interpret the unit gaussian distribution as a mixture of different uniform distributions centered on 0. So we have a half width and the height of a uniform distribution on is . Then if we're using a gaussian to discuss the idea that we'd like to put a fixed bound on how far from zero something can be, but we just don't know what exactly that fixed bound should be, then we're talking about a mixture of uniforms. At any given value the density of the mixture should be . Since uniforms only contribute if their half width is at least x, we have (let's assume x > 0)
So we want to find out what this means about the mixture distribution. Playing around in maxima results in the fact that , which when normalized happens to be a chi-distribution with 3 degrees of freedom. A Chi distributed random variable with 3 degrees of freedom is always positive, and has high probability region in the range of around or so. In other words, we don't know exactly how wide our finite uniform interval should be, but if we let it be an unknown positive (chi-distributed) quantity which won't often be much bigger than 2 or 3 we can recover the idea of the normal distribution as a model for measurement uncertainty. In this sense, the normal distribution has a kind of "bounded and O(1)" quality to it even though it's not actually bounded.
I've been meaning to update my blog several times now in the last month, but I've been having mysterious connectivity issues. I have a mixed ipv4/ipv6 network at home, and by default it connects via ipv6. The blog has an ipv6 address, and so far most of the time it just doesn't respond properly to requests originating from my home network. I can see my blog when I'm away, but from home I just can't see it. This has been causing problems for me, hopefully not for you. Suddenly today I have access again. I am going to try something in the next few days to see if it affects anything. After that if I have solved the problem hopefully I can get back to regularly scheduled blogging.
So, I'm trying to fit my statistical model. I have real data, and I have an ODE that predicts the results of the real data. Both have strong periodic components. The ODE is fixed via dimensional analysis to have period 2, or maybe for some small epsilon, I haven't thought about whether the perturbation effects perturb the frequency, but in any case, it's very close to 2. There's a statistical parameter that's a timescale for each data series where I rescale the real data's time points to try to match this fixed period, this tells me how long a certain meaningful process takes in my real data. In MCMC inevitably the period will not be dead spot on. Whenever you have two periodic signals subtracted from each other, you wind up with a signal that's close to the average period modulated by a signal that is close to 1/difference in periods (or frequency is the difference in frequencies).
I've been trying to model the errors as gaussian processes to form a likelihood, but I've been ignoring this periodic component. If the periods are exactly aligned, the signal goes away, but even a tiny fraction of a percent different and the signal shows up. I should really allow the errors to have this periodic component but constrain its size to be small. I'm working on that now.
One thing that's important is that near the "peaks" the model should fit much better, whereas near the "troughs" I expect it to fit less well, but it's not very important. So I need to anchor the gaussian processes to have small total variance near times 0,2,4... while maintaining the periodic signal and some high frequency noise. Gaussian processes are pretty flexible fortunately.