Collaboration at NCTC and in prediction markets and forecasting tournaments

I am reading Bridget Nolan’s 2013 UPenn thesis on workplace collaboration at the NCTC, and thinking about how that compares to workflow and collaboration in play-money prediction markets such as Almanis and Hypermind and AlphaCast, real-money markets such as PredictIt, and accuracy-score forecasting tournaments such as GJOpen.

The workflow at NCTC is described as follows:

  • Tasking. Analysts receive taskings.
  • Research. Analysts write, as quickly as possible, a usually short analysis related to the tasking. This could be either a forecast or an interpretation of a past event, but most likely a forecast.
  • Coordination. The analyst coordinates by sharing the analysis with all other analysts with a stake in the topic.  Analysts are indexed by region, functional area and home agency, so multiple analysts could have a stake in a topic.
    • Analysts must converge on a commonly acceptable text with coordinating analysts.
    • Analysts can game the coordination phase by
      • Limiting the review period to a short or zero-length (“Flash”) window
      • Inventing an exclusive “compartment” that coordinating analysts don’t have access to, and stealing the analysis by placing it in the compartment
  • Review. Once coordination converges, the analysis must then be approved and re-edited by all the hierarchical superiors of each participating analyst.  The claim is that there are 14 layers of management, which seems unlikely, but you never know. The Government pay grades go from GS-1 (the lowest) to GS-15, so maybe analysts come in at GS-1 and there are people at every pay grade above them.  GS-1 pays $18,343 per year, however, which would imply that a lot of analysts are subsisting well below the poverty level for the DC Metro Area, which also seems unlikely, but, again, you never know.
  • Publishing.  The analysis is delivered to the original client of the tasking.
  • Compensation.
    • Client Feedback. Some analysts are notified if the client likes or reads the product.
    • Performance Review.  Analysts are compensated by the number of pieces they are involved in that get published.
    • Work Time Away.  Time spent on foreign field visits, training and interagency meetings is not considered for performance and is effectively a form of compensation for writing analysis pieces.

For each published piece, this process can take from 10 minutes to several years, and it is a major source of stress and dissatisfaction for analysts.

Now let’s consider the case of prediction markets and forecasting tournaments.  First of all, there are two major areas of tradecraft which are outside the analyst’s control, but which determine the quality of the overall process:

  • Question formation.  How to formulate a “tasking” which has an unambiguous answer, has a reasonable forward time period, covers all the possible outcomes, and is not overly specific to the extent that the actual answer to the question becomes disconnected from the tasking client’s original intent.
  • Question resolution.  How to decide when the conditions of the question have been met, and score the question accurately with unimpeachable sources, so that all analysts agree that the question has been closed and scored fairly and correctly.

Let’s assume that all the markets I mentioned up front are equally competent at question formation and resolution.  Then what distinguishes them are mainly

  • Scoring and compensation model.
    • GJOpen uses Brier Score.  Compensation is reputational: you can say you won a challenge.
    • Almanis and Hypermind use play money and the site pays cash to analysts.  This is called “creating a market for expertise”.
    • AlphaCast uses play money but also reports Brier Score.  It is a demo site for aficionados, some of whom have accumulated extremely large play money scores in a fairly small crowd.   GJOpen uses AlphaCast software underneath, so the pitch here is just to sell the software as an OEM to other vendors.
    • PredictIt uses real cash, supplied by the analysts, in a zero sum market.  Each question is a futures contract.  The site takes an 18% rake.
    • All sites have leaderboards for bins of related questions and overall cash or quality of forecasts.  Almanis has leaderboards for commenting and question posing (analysts can self-task).
  • Effect of compensation on forecasting style.  All sites penalize analysts for participating in questions they aren’t good at.  Accuracy sites, however, allow the analyst to forecast in as many questions as they want.  Cash sites limit the analyst to forecasting in questions they have a remaining cash balance for.  On cash sites, analysts can also lose all credibility by putting all of their cash on a single question with the wrong forecast.  It is impossible to lose all credibility in an accuracy-based site. Nevertheless, other analysts can see your general credibility by looking at your overall accuracy score.  (But of more relevance is looking at credibility by topic.) Play money compensation sites with thin participation can quickly become dominated by a few obsessive players accumulating very large balances.
  • Socialness.  The extent to which analysts on a site share information depends more on the collaboration features provided than on the scoring model. Reddit-like tree-formatted dialogues, notifications when others have responded, the ability to notify/call out particular analysts, and the ability to see other analysts’ scores, forecasts, comments and personal profiles all have a strong impact on how much sharing happens.
  • Teaming.  In a big site with good socialness you will find that analysts form into cliques naturally and tend to have patterns of association over time.
  • Information sources.  Most analysts in public sites just use Google and are limited to what Google Search digs up.  Not much source analysis is done.  The quality of the market is dependent on the prior expertise of the participants.  Most participants simply regurgitate the most recent news as a forecast, thus acting as a Mechanical Turk news digesting machine.
  • Tools.  Public sites do not provide any kind of question domain-specific modelling tools or advanced tools for filtering and visualizing news or publicly available data.
  • Workflow.  Questions get published.  Analysts make predictions.  Questions get closed.
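
To make the difference between the scoring models concrete, here is a minimal sketch in Python. It assumes the single-outcome binary form of the Brier score (squared error between the forecast probability and the 0/1 outcome); sites may use a different variant. The function names, the topics and the sample forecasts are all made up for illustration and are not taken from any of the sites above.

```python
from collections import defaultdict


def brier_score(forecast: float, outcome: int) -> float:
    """Binary Brier score: squared error between a probability and a 0/1 outcome.

    0.0 is a perfect forecast; 1.0 is the worst possible score on one question.
    """
    return (forecast - outcome) ** 2


def mean_brier_by_topic(records):
    """Average Brier score per topic.

    records: iterable of (topic, forecast_probability, outcome) tuples.
    """
    by_topic = defaultdict(list)
    for topic, forecast, outcome in records:
        by_topic[topic].append(brier_score(forecast, outcome))
    return {topic: sum(scores) / len(scores) for topic, scores in by_topic.items()}


def all_in_balance(balance: float, price: float, outcome: int) -> float:
    """Balance after staking everything on YES contracts bought at `price`.

    Assumes each contract pays 1.0 if the question resolves YES and 0.0 otherwise.
    """
    contracts = balance / price
    return contracts * 1.0 if outcome == 1 else 0.0


# Hypothetical forecasts: (topic, probability given to YES, actual outcome)
records = [
    ("elections", 0.9, 0),
    ("elections", 0.7, 1),
    ("economics", 0.2, 0),
    ("economics", 0.6, 1),
]

scores = mean_brier_by_topic(records)
for topic, score in scores.items():
    print(f"{topic}: mean Brier {score:.2f}")

# One confident miss costs at most 1.0 on an accuracy site,
# but an all-in bet on a cash site wipes out the whole balance.
print(all_in_balance(1000.0, 0.5, outcome=0))
```

The per-topic average is the “credibility by topic” view: the same analyst can look poor on one topic and good on another. The last line shows why ruin is only possible under the cash model.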

OK, why am I lining up NCTC workflow and forecasting workflow?  Well, the question is, what if you unplugged the Task/Coordinate/Edit/Publish model and replaced it with the Question Posing/Forecasting/Scoring model, would you get a better result?   This is the question being asked by the IARPA ACE, CREATE and HFC competitions.  However, it’s not clear how much of a gap there is between those competitions and current workflow at NCTC and similar places.  That is, I don’t know whether any of the work IARPA is doing has gained traction or been applied in the actual analyst workplace.  A few observations are relevant though:

  • Seemingly large crowds in public sites boil down to a much smaller number of fanatics.  Say you have 20,000 registered users.  150 of those will forecast a lot of questions.  50 of those will be consistently accurate, and it’s not clear whether those 50 are accurate through skill or just through survivorship bias.
  • Prediction markets are sometimes terrible, especially on binary election questions. Think Scottish Referendum, Brexit, Trump.  These questions were all called wrong.
  • The particularities of intelligence analysis are not reproduced in other occupations. Forecasting markets as a work paradigm are not a toy for intelligence analysis; they are a real solution. They are not a real solution for other occupations such as banking.
  • Analysts at NCTC are selected maybe 20% on accuracy and 80% on other factors such as
    • Commitment to training to be a professional analyst
    • Willingness and ability to pass a security clearance
    • Commitment to the occupation of being a professional analyst
  • The pre-existing workplace for analysts, with its particularities, will not go away.
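
The survivorship worry in the first observation can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the 150 active forecasters come from the example above, but the 20 binary questions, the 70% hit-rate threshold for looking “accurate” and the pure coin-flip guessing are illustrative choices, not data from any site.

```python
import random


def lucky_forecasters(n_forecasters=150, n_questions=20, threshold=0.7, seed=0):
    """Count forecasters with no skill who still reach the hit-rate threshold.

    Every forecaster guesses each binary question correctly with probability 0.5,
    so anyone at or above `threshold` got there by luck alone.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lucky = 0
    for _ in range(n_forecasters):
        hits = sum(rng.random() < 0.5 for _ in range(n_questions))
        if hits / n_questions >= threshold:
            lucky += 1
    return lucky


count = lucky_forecasters()
print(f"{count} of 150 coin-flippers look accurate on 20 questions")
```

With few questions per forecaster, some coin-flippers will clear the bar by chance. Separating skill from luck therefore needs many scored questions per analyst, which a small, stable workplace population can accumulate over time.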

The last point is most important: Analysts are a static, small population.  They can’t easily leave their jobs, and their jobs are relatively stable.  They are qualified by many factors other than accuracy.  They are a pre-existing population.  There has been much talk of “Superforecasters” a/k/a unicorns in the prediction market arena.  To adopt forecasting market methodology in intelligence analysis in a pre-existing workplace, we need to think in terms other than the search for these unicorns.  We have to ask: Will adopting this technology and implementing a different workflow and compensation model improve collaboration in this workplace with these people?  I think it will, at least in the sense that what Nolan describes is clearly not working well, and in the sense that the prediction market model provides a much more objective standard for scoring both the analysts and the quality of question posing and resolution.
