I have been taking for granted the category assignments of questions on GJOpen: International, Politics, Sports and so on. I have been using these categories to find the best forecasters for related questions.
That’s not using the old noggin. There should be a better way of correlating live questions with resolved ones.
I’m going to try this idea: plot the crowd forecast of one question over time against the crowd forecast of another question over the same period. If the levels correlate, I’ll use that correlation as my measure of how related the two questions are.
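A minimal sketch of that pairwise measure, assuming each question is binary and its crowd forecast history is available as a dict mapping a date to the crowd probability (the names `pearson` and `forecast_correlation` are hypothetical, not anything GJOpen provides). Only dates where both questions have a forecast are compared.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no co-movement information
    return cov / sqrt(vx * vy)

def forecast_correlation(a, b):
    """Correlate two crowd-forecast histories on the dates they share.

    a, b: dicts mapping a date (any sortable key) to a crowd probability.
    Hypothetical data layout, assumed for illustration.
    """
    shared = sorted(set(a) & set(b))
    return pearson([a[d] for d in shared], [b[d] for d in shared])

# Made-up histories for two questions; only the first three dates overlap.
q1 = {"2016-01-01": 0.60, "2016-02-01": 0.65, "2016-03-01": 0.70}
q2 = {"2016-01-01": 0.30, "2016-02-01": 0.35, "2016-03-01": 0.40,
      "2016-04-01": 0.50}
print(forecast_correlation(q1, q2))
```

One caution with correlating levels: two questions whose forecasts both drift steadily over the same months will correlate even if they are unrelated, so correlating day-to-day changes instead might be worth testing alongside.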
That tells me how resolved questions relate to each other, and gives a model (to test) of how accuracy on one resolved question factors into accuracy on another. Doing this for every pair of resolved questions gives a correlation matrix R.
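Building R could look something like the sketch below. It assumes the same hypothetical layout as before (question id mapped to a dict of date to crowd probability), and adds a `min_overlap` parameter of my own choosing: pairs of questions that were open together for too few dates get `None` instead of a number.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is flat."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def correlation_matrix(histories, min_overlap=3):
    """Build R from crowd-forecast histories.

    histories: dict of question id -> {date: crowd probability}.
    Returns (ids, R), where R[i][j] correlates questions ids[i] and ids[j].
    Pairs sharing fewer than min_overlap dates are left as None.
    """
    ids = sorted(histories)
    n = len(ids)
    R = [[None] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        a, b = histories[ids[i]], histories[ids[j]]
        shared = sorted(set(a) & set(b))
        if len(shared) >= min_overlap:
            r = pearson([a[d] for d in shared], [b[d] for d in shared])
            R[i][j] = R[j][i] = r  # R is symmetric
    return ids, R

# Made-up data: A and B overlap in time, C ran in a different period.
histories = {
    "A": {1: 0.1, 2: 0.2, 3: 0.3},
    "B": {1: 0.9, 2: 0.8, 3: 0.7},
    "C": {5: 0.5, 6: 0.6, 7: 0.7},
}
ids, R = correlation_matrix(histories)
print(ids, R)
```

Because each entry is computed on a different set of shared dates, the resulting R is not guaranteed to be a valid (positive semidefinite) correlation matrix, which matters if it is later fed into anything that assumes one.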
But then, given a brand new question with no forecasting history, how do I relate it to prior questions?
To make that connection I need to bring back a notion of categories, but synthetic ones, based somehow on the correlations. A category like “Politics” doesn’t tell me much: knowing how the Hillary Clinton race is going tells me little about the Spanish election.
OK so I’m foundering here a little bit. Music can be classified and clustered in terms of low-level measures so that you can automatically tell a piano sound from a violin or a drum. What do our questions “sound like”?
To be continued. The problem is: I know how resolved questions are correlated. Maybe I can cluster resolved questions separately, in a different way. This clustering should preserve the correlation… OK, so cluster based on closeness of correlation. That’s a start, but it’s a single dimension. I need some other dimensions that describe the kind of question, and that then tell me how a new question with a similar description will correlate with resolved questions. That’s the to-do list.
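As a first pass at “cluster based on closeness of correlation”, here is a sketch of plain average-linkage agglomerative clustering on R, written out by hand rather than with a clustering library. Two choices here are my assumptions, not settled: distance is 1 − |r|, so a strongly negative correlation counts as just as related as a strongly positive one (use 1 − r if opposite-moving questions should land apart), and a missing correlation is treated as r = 0. The `threshold` value is arbitrary.

```python
def cluster_by_correlation(ids, R, threshold=0.5):
    """Average-linkage agglomerative clustering with distance 1 - |r|.

    ids: question ids; R: square matrix of correlations (None = unknown).
    Merging stops once the closest pair of clusters is farther apart
    than threshold. Returns a list of clusters of question ids.
    """
    def dist(i, j):
        r = R[i][j]
        return 1.0 - abs(0.0 if r is None else r)

    clusters = [[i] for i in range(len(ids))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all cross-cluster pairs.
                pairs = [dist(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [[ids[i] for i in c] for c in clusters]

# Made-up correlations: A and B move together, C and D move opposite.
ids = ["A", "B", "C", "D"]
R = [
    [1.0, 0.9, 0.1, None],
    [0.9, 1.0, 0.0, 0.2],
    [0.1, 0.0, 1.0, -0.95],
    [None, 0.2, -0.95, 1.0],
]
print(cluster_by_correlation(ids, R))
```

This only regroups the resolved questions; it still says nothing about where a brand new question belongs, which is what the descriptive dimensions on the to-do list would have to supply.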