# To do: Categorize questions and, for open questions, plot an aggregate weighted by accuracy score

Assign categories to the 80 or so questions.

For the 10 or so closed questions, measure Brier and accuracy scores for each forecaster involved in that question.
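As a starting point, here is a minimal sketch of a binary Brier score for one forecaster on one closed question: the mean squared error between the forecast probabilities and the 0/1 outcome. The function name and numbers are illustrative, and GJOpen's actual scoring (which handles multiple bins and daily forecasts) may differ in detail.

```python
# Minimal binary Brier score sketch: mean squared error between a
# forecaster's probability forecasts and the 0/1 outcome. Lower is better.
def brier_score(forecasts, outcome):
    """forecasts: probabilities in [0, 1]; outcome: 0 or 1."""
    return sum((f - outcome) ** 2 for f in forecasts) / len(forecasts)

# A forecaster who leaned toward the right answer scores lower (better):
print(brier_score([0.8, 0.9], 1))  # approx 0.025
print(brier_score([0.2, 0.4], 1))  # approx 0.5
```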

For this exercise, ignore correlation between categories, because it is hard to define with very few data points; i.e., how do you measure the impact of knowledge in one domain on the quality of forecasts in a different domain?

For each open question, take the forecasters betting on the question who have previously been ranked in that category and who have negative accuracy scores (i.e., they beat the crowd), and plot the crowd bet weighted by accuracy score. (This does not include the influence of correlated accuracy.)

The theoretical range of negative accuracy is [-2, 0], where -2 is best. Suppose the accuracy of forecaster i is a(i). Then let the weight be w(i) = 0.5 - 0.25*a(i). Let the forecast be f(i). Sum f(i)*w(i) and divide by the sum of w(i) to get the crowd bet. Please note: I am pulling this out of my drainpipe. This will be the first of possibly many experiments (depending on my current Coursera load) on effective crowd aggregation.
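The aggregation above can be sketched in a few lines. This assumes the caller has already filtered to forecasters with negative accuracy scores, as described; the function name and example numbers are illustrative.

```python
# Sketch of the accuracy-weighted crowd bet described above.
# accuracies[i] is forecaster i's accuracy score in [-2, 0] (-2 is best);
# forecasts[i] is forecaster i's probability forecast on the open question.
# Assumes only negative-accuracy (crowd-beating) forecasters are passed in.
def weighted_crowd_bet(accuracies, forecasts):
    weights = [0.5 - 0.25 * a for a in accuracies]      # w(i) = 0.5 - 0.25*a(i)
    total = sum(w * f for w, f in zip(weights, forecasts))
    return total / sum(weights)

# The best possible forecaster (a = -2, so w = 1) counts twice as much
# as a merely crowd-level one (a = 0, so w = 0.5):
print(weighted_crowd_bet([-2.0, 0.0], [0.7, 0.4]))  # approx 0.6
```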

I will be using these smart-crowd aggregate scores to influence my own play in this game.  To be fair, I will also publish a few examples here as I go along.

# Retrospective Brier and Accuracy scores

For closed questions, for my forecasts, knowing the final outcome, plot my running Brier and Accuracy scores from start to finish of the question.

I’m trying to come up with a measure of foresight. A low initial Brier line would say that I guessed right in the beginning: basically, how well do I guess up front. I’m also thinking about weighting the Brier score so that later observations are less important than initial observations in the quality measure. My initial lines tend to be on top, floating above the crowd.
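One hedged sketch of such a weighting: discount each day's squared error by an exponential decay, so day 0 counts most. The decay factor is my assumption for illustration, not an established scoring rule.

```python
# "Foresight-weighted" Brier score sketch: earlier daily forecasts get
# larger weights via exponential decay. The decay value is an assumption.
def weighted_brier(daily_forecasts, outcome, decay=0.95):
    # weight the forecast on day t by decay**t, so day 0 counts most
    weights = [decay ** t for t in range(len(daily_forecasts))]
    num = sum(w * (f - outcome) ** 2 for w, f in zip(weights, daily_forecasts))
    return num / sum(weights)

# Early conviction on the right answer scores better (lower) than
# arriving at the same conviction late:
print(weighted_brier([0.9, 0.6, 0.5], 1))
print(weighted_brier([0.5, 0.6, 0.9], 1))
```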

As a matter of practice going forward, I’m going to stay closer to the crowd and express my personal view as a spread on the crowd; I’m not going to go 0/1, which was my initial instinct. However, there may be some players in the population who are successful with a 0/1 strategy.


# If you’ve never heard of Kolmogorov

If you’ve never heard of Kolmogorov, you may have heard of Claude Shannon. This paper gives a connection between the work of Kolmogorov and Shannon:

Wald Lecture I: Counting Bits with Kolmogorov and Shannon, by David L. Donoho, Stanford University.

Shannon’s magnum opus was

A Mathematical Theory of Communication (worrydream.com), reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July and October 1948.

Shannon’s main idea was the bit rate of a channel: how many bits you can reliably pass in a second. Kolmogorov’s main idea was how compressible the bits of an object are, in the sense of how short a program you can write to generate the bits of the object:

Kolmogorov complexity, Wikipedia: in algorithmic information theory, the Kolmogorov complexity of an object is also known as its descriptive complexity.

# Breaking out multipart questions as separate questions

Consider an election question in which there are 3 candidates, Bush, Trump and Rubio, who might win. Suppose people forecast the outcome of this question by assigning probability X to Bush, Y to Trump and Z to Rubio, where probabilities are between 0 and 100. Only one person can win, so X = 100 - (Y + Z) = 100 - Y - Z. Intuitively, looking at this expression, the correlation between X and Y ought to be -1, and similarly corr(X, Z) should be -1. However, note that while Y is uniform on [0, 100], Z is uniform on [0, 100 - Y]. If Y and Z were independent, their sum would follow an Irwin-Hall distribution. But given Z uniform on [0, 100 - Y], what distribution does Y + Z follow?

I did a simulation experiment and found the correlation of X and Y to be about -65% and of X and Z to be about -14%, for 10,000 paths, and consistently so as the number of paths increased. This is the Python code for the experiment:

```
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

N = 10000
# Y ~ uniform integers on {0..100}; np.random.randint excludes the high end
Y = np.random.randint(0, 101, N)
# Z ~ uniform integers on {0..100-Y}
Z = np.vectorize(lambda y: np.random.randint(0, 100 - y + 1))(Y)
X = 100 - (Y + Z)

print("rho(x,y)", pearsonr(X, Y))
print("rho(x,z)", pearsonr(X, Z))

plt.clf()
plt.scatter(X, Y)
plt.title("X vs Y, X=100-(Y+Z)")
plt.xlabel("X")
plt.ylabel("Y")
plt.savefig("corrxy.png")

plt.clf()
plt.scatter(X, Z)
plt.title("X vs Z, X=100-(Y+Z)")
plt.xlabel("X")
plt.ylabel("Z")
plt.savefig("corrxz.png")
```

Here is a scatterplot of X and Y:

Here is a scatterplot of X and Z:

These plots are virtually identical for 10,000 paths, though Y is denser towards the top of the triangle and Z is denser below.

I don’t know how to explain this difference in correlations, and I don’t know the PDF of 100 - (Y + Z). The former may have to do with the quality of the NumPy uniform random number generator, or it could be due to the Pythagorean Theorem. The latter is math that’s a little bit above my head. (Even the PDF of the uncorrelated sum is a bit hairy.)
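The random-number-generator hypothesis can at least be checked: rerun the experiment with NumPy's newer Generator API and several seeds. If rho(X, Y) and rho(X, Z) come out stable across seeds and close to the values above, the gap between them is structural rather than a generator artifact. This is a sketch; the function name is mine.

```python
import numpy as np
from scipy.stats import pearsonr

# Rerun the correlation experiment with NumPy's newer Generator API.
def corr_check(seed, n=10_000):
    rng = np.random.default_rng(seed)
    Y = rng.integers(0, 101, n)        # Y ~ uniform on {0..100}, high exclusive
    Z = rng.integers(0, 101 - Y)       # Z ~ uniform on {0..100-Y}, vectorized
    X = 100 - (Y + Z)
    return pearsonr(X, Y)[0], pearsonr(X, Z)[0]

# Stable values across seeds would rule out an RNG artifact:
for seed in (0, 1, 2):
    print(seed, corr_check(seed))
```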

The reason this is interesting is that I am forecasting questions in an open forecasting tournament, GJOpen.com. We are assigned accuracy scores based on the final outcome. However, it is possible for parts of a question to become determined before all parts of the question are determined. For example, consider a question asking whether rainfall by December 31st will be 0-10 inches/year, 10-15 inches/year, or >15 inches/year. If it rains at least 10 inches, then the first bin is closed but the other two are still live. For such questions, it would be helpful to be scored separately on each bin. However, the bins are correlated, as above. The problem is then how to accommodate correlated bins when scoring accuracy.

# How to assess multi-bin questions

Question:  Will A, B or C win?

Answer: These are 3 different, correlated questions. For purposes of assessing my own position versus the crowd, each alternative should be assessed individually. This includes (and now that I think of it, this is very helpful) isolating the references and arguments by subquestion.
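Assessing each alternative individually can be sketched by scoring every candidate as its own binary question once the winner is known. The function name, the candidate probabilities, and the winner below are all illustrative numbers, not real forecasts.

```python
# Hedged sketch: score a 3-way question as three binary questions,
# one per candidate, comparing my Brier score to the crowd's per bin.
def per_bin_briers(my_probs, crowd_probs, winner):
    scores = {}
    for name in my_probs:
        outcome = 1 if name == winner else 0
        mine = (my_probs[name] - outcome) ** 2
        crowd = (crowd_probs[name] - outcome) ** 2
        scores[name] = (mine, crowd)   # lower is better
    return scores

me    = {"Bush": 0.2, "Trump": 0.5, "Rubio": 0.3}   # illustrative forecasts
crowd = {"Bush": 0.4, "Trump": 0.35, "Rubio": 0.25}
for name, (mine, theirs) in per_bin_briers(me, crowd, "Trump").items():
    print(name, mine, theirs)
```

This makes it easy to see where I beat the crowd (here, on the Trump bin) even when my overall multi-bin score is mixed.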