# To do: Control group: Up your game!

As the Control Group for the Superforecaster Team that GJ Inc is marketing, those of us forecasting on GJOpen need to up our game. Let’s not make it too easy! In particular, on the following questions our forecast has been flat-out wrong:

1. NATO Montenegro. Answer: 100%. Control Group: 13%.
2. S-300 Delivery. Answer: 100%. Control Group: 15%.
3. Russia food embargo 1. Answer: 100%. Control Group: 28%.
4. Argentina President. Answer: MM, 100%. Control Group: 49%.
5. China IPO. Answer: 100%. Control Group: 55%.
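To put numbers on how wrong, here is a rough sketch that turns the percentages above into Brier scores. It assumes the two-sided form of the score (squared error summed over both outcomes, so 0 is perfect, 0.5 is a coin-flip forecast and 2 is the worst possible), and it treats every question as binary. That second assumption is shaky for the Argentina question, which had several candidates, and whose split among the losing candidates isn't listed here. The names `control_group` and `brier_two_sided` are made up for this example.

```python
# Probability the Control Group put on the outcome that actually happened,
# copied from the list above.
control_group = {
    "NATO Montenegro": 0.13,
    "S-300 Delivery": 0.15,
    "Russia food embargo 1": 0.28,
    "Argentina President": 0.49,  # assumption: treated as a yes/no question
    "China IPO": 0.55,
}


def brier_two_sided(p_correct):
    """Two-sided Brier score for a binary question.

    p_correct is the probability assigned to the outcome that occurred.
    The error on the outcome that happened is (1 - p_correct), and the
    error on the one that did not is the same size, hence the factor 2.
    """
    return 2 * (1 - p_correct) ** 2


scores = {name: brier_two_sided(p) for name, p in control_group.items()}
coin_flip = brier_two_sided(0.5)  # baseline: no information at all

for name, score in scores.items():
    verdict = "worse than a coin flip" if score > coin_flip else "better than a coin flip"
    print(f"{name}: {score:.3f} ({verdict})")
```

Under these assumptions, any question where the crowd put less than 50% on the eventual answer scores worse than not forecasting at all.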

The Brier score can be broken down into three components:

• Inherent uncertainty of the event.  A fair coin toss is most uncertain.
• Reliability of the forecast.  If the forecast says an event will occur 3 times out of 10 in a repeatable experiment, and the event does occur 3 times in 10 trials, then the forecast is perfectly reliable.
• Resolution is some measure of how well the bins separate distinct events.  I haven’t really grasped the concept.  I suppose if you have a bimodal population (the sum of two populations), where most people have either \$1000 or \$1MM, then income bins of [0, \$2000] and (\$2000, \$2MM] will show higher resolution than bins of [0, \$1MM) and [\$1MM, \$2MM], because the second pair carries a higher likelihood of misclassifying thousandaires and millionaires.  Or something like that.  I don’t get it.  Yet.  Maybe someday.  There’s a clear explanation here.
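For what it's worth, the three components are easier to see in code than in words. Below is a minimal sketch of the standard decomposition (usually credited to Murphy): Brier = reliability - resolution + uncertainty. It uses the one-sided 0-to-1 form of the score and bins forecasts by their exact stated probability. The forecasts and outcomes are invented purely for illustration, and the function names are my own.

```python
from collections import defaultdict


def brier(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Forecasts are grouped by their exact probability value, so the identity
    brier = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in bins.items():
        observed_freq = sum(obs) / len(obs)
        # Reliability: gap between what was said and what happened in this bin.
        reliability += len(obs) * (f - observed_freq) ** 2
        # Resolution: how far this bin's outcomes sit from the overall base rate.
        resolution += len(obs) * (observed_freq - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Invented data: five 20% forecasts (one came true), five 80% forecasts (four came true).
example_forecasts = [0.2] * 5 + [0.8] * 5
example_outcomes = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]

rel, res, unc = murphy_decomposition(example_forecasts, example_outcomes)
print(f"reliability={rel:.3f} resolution={res:.3f} uncertainty={unc:.3f}")
print(f"brier={brier(example_forecasts, example_outcomes):.3f}")
```

In this toy data the forecasts are perfectly reliable (the reliability term is zero), and the resolution term is what pulls the score below the uncertainty of the base rate. A forecaster who always said 50% would be equally reliable here but would have zero resolution.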

However, this breakdown is only relevant when the same question is forecast repeatedly.  The canonical example: what is the weather going to be like tomorrow, for many tomorrows.  GJOpen questions, with the exception of the South China Sea ADIZ question, are generally about unique situations.  So it is not really possible to use the above breakdown to gain further insight into the quality of our forecasts.

We can also be critical of the questions, but beggars can’t be choosers.

On that note, however, some work has been done on IFP (Individual Forecasting Problem) inter-item reliability.  I didn’t know what to make of it, but maybe there’s something to think about there.