Assign categories to the 80 or so questions.
For the 10 or so closed questions, measure Brier and accuracy scores for each forecaster involved in that question.
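A minimal sketch of the scoring step above, under assumptions the post does not spell out: the Brier score is taken as the sum of squared errors over all outcomes (range 0 to 2), and the accuracy score is taken as a forecaster's Brier score minus the crowd's median Brier score, so that negative means beating the crowd. The function names and the sample forecasters are hypothetical.

```python
from statistics import median

def brier_score(forecast, outcome_index):
    """Sum of squared errors over all outcomes: 0 is perfect, 2 is worst."""
    return sum((p - (1.0 if k == outcome_index else 0.0)) ** 2
               for k, p in enumerate(forecast))

def accuracy_scores(forecasts, outcome_index):
    """ASSUMED definition: own Brier score minus the crowd's median Brier score."""
    briers = {name: brier_score(f, outcome_index) for name, f in forecasts.items()}
    crowd = median(briers.values())
    return {name: b - crowd for name, b in briers.items()}

# Hypothetical closed binary question that resolved to outcome 0 ("yes").
forecasts = {"alice": [0.9, 0.1], "bob": [0.6, 0.4], "carol": [0.2, 0.8]}
scores = accuracy_scores(forecasts, 0)
for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

Per-category scores would then be averages of these per-question scores over the closed questions in each category.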
For this exercise, ignore correlation between categories, because it is hard to define with very few data points; i.e. how do you measure the impact of knowledge in one domain on the quality of forecasts in a different domain?
For each open question, take the forecasters betting on the question who have previously been ranked in that category and have negative accuracy scores (i.e. beat the crowd), and plot their crowd bet, weighted by accuracy score. (This does not include the influence of accuracy in correlated categories.)
The theoretical range of a negative accuracy score is [-2, 0], where -2 is best. Suppose the accuracy of forecaster i is a(i). Then let the weight be w(i) = 0.5 - 0.25*a(i), so weights run from 0.5 (at a(i) = 0) to 1.0 (at a(i) = -2). Let forecast i be f(i). Sum f(i)*w(i) and divide by the sum of w(i) to get the crowd bet. Please note: I am pulling this out of my drainpipe. This will be the first of possibly many experiments (depending on my current Coursera load) on effective crowd aggregation.
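The weighting can be sketched roughly as follows. This is only an illustration of the formula as stated: the function names, the dictionary layout, and the sample numbers are made up, and each forecast is assumed to be a single probability for a binary question.

```python
def weight(a):
    """w(i) = 0.5 - 0.25 * a(i): maps a = 0 to 0.5 and a = -2 to 1.0."""
    return 0.5 - 0.25 * a

def smart_crowd_bet(forecasts, accuracy):
    """Weighted mean of forecasts from forecasters with a negative accuracy
    score in the question's category; returns None if nobody qualifies."""
    picked = [(f, weight(accuracy[name]))
              for name, f in forecasts.items()
              if name in accuracy and accuracy[name] < 0]
    if not picked:
        return None
    total_weight = sum(w for _, w in picked)
    return sum(f * w for f, w in picked) / total_weight

# Hypothetical open question: current probability each forecaster gives to "yes".
forecasts = {"alice": 0.80, "bob": 0.60, "carol": 0.30, "dave": 0.50}
# Hypothetical category accuracy scores; dave has never been ranked here.
accuracy = {"alice": -0.4, "bob": -0.1, "carol": 0.5}

bet = smart_crowd_bet(forecasts, accuracy)
print(f"smart-crowd bet: {bet:.3f}")
```

Here carol is dropped for having a positive accuracy score and dave for being unranked in the category, so only alice (weight 0.6) and bob (weight 0.525) contribute.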
I will be using these smart-crowd aggregate scores to influence my own play in this game. To be fair, I will also publish a few examples here as I go along.