## Saturday, June 2, 2012

### (In)sanity: 300 Datapoints

According to the Stanford Encyclopedia of Philosophy, "a credence function is actually calibrated at a particular possible world if the credence it assigns to a proposition matches the relative frequencies with which propositions of that kind are true at that world". Unless you are really into philosophy or statistics, this probably won't mean much to you. Luckily, Yvain has an easy-to-grasp explanation of the concept:

> The rationality literature has especially focused on one particular subjective mental estimate: our feelings of probability. For example, someone may say they feel 80% certain that Germany is larger than France. However, if they consistently answer questions like this with 80% confidence, and only get 60% right, then we say they are mis-calibrated: their subjective mental estimate of probability has a consistent mismatch with a more normatively correct probability. Calibration means revising your subjective mental estimate until it matches the objective value it tries to estimate; so that when you estimate something with 80% confidence, you get it right 80% of the time.

Now, it seems to me that much of the time when we call someone insane or crazy, we are, in effect, saying that the person is extremely poorly calibrated (at least on that particular topic). For instance, imagine a person named Alice. Alice has trouble going out at night because she assigns a 95% chance to the proposition that she will be abducted by aliens and transported to Alpha Centauri. Yet each time she does go out, no abduction occurs. If she keeps assigning the same probability the next time, she is failing to calibrate her beliefs and is likely to be considered insane because of it.
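
To see why repeating the same estimate is a calibration failure, here is a minimal Python sketch. It assumes (the post does not say this) that each night out is independent, and asks how likely a run of uneventful nights would be if Alice's 95% figure were correct. After just three quiet outings, her own belief says that outcome had only a 0.05³ ≈ 0.0125% chance of happening.

```python
def prob_no_abductions(p_abduction: float, outings: int) -> float:
    """Chance of zero abductions over `outings` trips, assuming each
    trip independently carries abduction probability `p_abduction`."""
    return (1.0 - p_abduction) ** outings


# Alice's stated belief: a 95% chance of abduction on any night out.
for n in (1, 2, 3, 5):
    chance = prob_no_abductions(0.95, n)
    print(f"{n} quiet outing(s): {chance:.6%} likely under her belief")
```

Every quiet night that her belief says should almost never happen is evidence that the 95% estimate needs revising downward.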

As it turns out, calibration can actually be measured by having someone make predictions about verifiable or falsifiable events based on their beliefs, then scoring those predictions as the events do or do not occur. In a very real sense, this is a way to measure someone's degree of sanity. Ever since I read Gwern's article about PredictionBook, I have been running such an experiment on myself. I even went so far as to publish a short article encouraging others to do the same. So far, over 300 of my predictions have been scored, out of probably more than 1,000 in total.
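
A minimal Python sketch of how scored predictions can be turned into the points of a calibration chart. The `(confidence, came_true)` data format, the 10-point bins, and the folding of sub-50% predictions onto the 50-100% range are all assumptions for illustration, not necessarily how PredictionBook computes its graph; `calibration_table` is a hypothetical helper.

```python
from collections import defaultdict


def calibration_table(predictions, bin_width=10):
    """Group (confidence_percent, came_true) pairs into bins of
    `bin_width` percentage points and return, for each bin, the
    number of predictions and the fraction that came true."""
    bins = defaultdict(list)
    for confidence, came_true in predictions:
        # Fold predictions below 50% onto the 50-100% range: saying
        # "30% that X" is the same as saying "70% that not-X".
        if confidence < 50:
            confidence, came_true = 100 - confidence, not came_true
        # Clamp 100 into the top bin so the last bin is [90, 100].
        start = min(confidence // bin_width * bin_width, 100 - bin_width)
        bins[start].append(came_true)
    return {
        start: (len(outcomes), sum(outcomes) / len(outcomes))
        for start, outcomes in sorted(bins.items())
    }


# Hypothetical predictions, not real data.
sample = [(90, True), (90, True), (90, False), (60, True), (40, False), (75, True)]
for start, (count, hit_rate) in calibration_table(sample).items():
    print(f"{start}% bin: {count} predictions, {hit_rate:.0%} came true")
```

Each bin's hit rate is what gets plotted against that bin's confidence level; the closer those points sit to the diagonal, the better calibrated the predictor.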

My predictions have ranged from the extremely silly to the serious, including whether the new season of My Little Pony will start with a 2-part episode, whether I will obtain a particular IT certification within a certain timeframe, who the next POTUS will be, and whether or not George Zimmerman will be convicted of murder. The following is a graphical representation of the current state of my (mis)calibration:
The cyan line indicates how a perfectly calibrated agent (there almost certainly aren't any humans who meet this standard) would have assigned its probabilities, and the other line shows how my predictions actually turned out at the probabilities I assigned, across the 300+ datapoints I have collected so far. A very sane person's line (in a given domain) should more or less track the straight line running from 50% to 100%. I'll let readers decide for themselves whether I am meeting this standard, but I think I did pretty well for someone without much formal training.

On the other hand, if we had Alice make predictions about the behavior of extraterrestrials and plotted her data, we should expect her line to deviate wildly from the line of perfect calibration (perhaps even sloping downward). To be fair to Alice, though, she is most likely pretty sane when it comes to beliefs about what is edible or poisonous (otherwise she would no longer be with us). And if we tracked the implicit predictions that political pundits on CNN make about politics, they would probably come out looking quite insane (because politics is the mind-killer).
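
Whether a line "more or less tracks" the diagonal can be summarized with a single number. One common choice (usually called expected calibration error, and not something used in this post) is the average gap between stated confidence and observed hit rate, weighted by how many predictions sit at each point. A minimal Python sketch, where `calibration_error` is a hypothetical helper and both charts are made-up numbers:

```python
def calibration_error(points):
    """Weighted mean distance of a calibration chart from the diagonal.

    `points` holds one (stated_confidence, count, hit_rate) triple per
    point on the chart, with confidence and hit rate as fractions in
    [0, 1]. Each point is weighted by how many predictions it covers.
    Returns 0.0 when every point lies exactly on the diagonal.
    """
    total = sum(count for _, count, _ in points)
    if total == 0:
        raise ValueError("no scored predictions")
    return sum(count * abs(conf - hit) for conf, count, hit in points) / total


# Made-up charts, not real data.
near_diagonal = [(0.6, 40, 0.6), (0.8, 50, 0.78), (0.95, 10, 0.9)]
alice_aliens = [(0.95, 20, 0.0)]  # 95% confident, never abducted
print(f"{calibration_error(near_diagonal):.3f}")
print(f"{calibration_error(alice_aliens):.3f}")
```

The near-diagonal chart scores about 0.015, while Alice's alien predictions score 0.95, the worst possible result for someone who always says 95%.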

I plan to continue this experiment to see whether my calibration improves over time, and I will post on this topic again once I reach approximately 600 datapoints. Until next time... stay sane.