15 July 2010

Skill in Prediction, Part I

Over the past month I organized a competition for predictions of the outcome of the World Cup, using ESPN's Bracket Predictor. The competition, fun on its own terms, also provides a useful window into the art and science of prediction and how such efforts might be evaluated. This post summarizes some of the relevant lessons.

A key concept to understand in the evaluation of prediction is the concept of skill. A prediction is said to have skill if it improves upon a naive baseline. A naive baseline is the predictive performance that you could achieve without really having any expertise in the subject. For instance, in weather forecasting a naive baseline might just be the climatological weather for a particular day. In the mutual fund industry, it might be the performance of the S&P 500 index (or some other index).
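The idea can be made concrete with a standard skill-score formula, which compares a forecast's error against the baseline's. A minimal sketch -- the function name and the error numbers here are hypothetical illustrations, not from any forecasting agency:

```python
def skill_score(forecast_error, baseline_error):
    """Generic skill score: 1.0 is a perfect forecast, 0.0 merely
    matches the naive baseline, and a negative value means the
    forecast is worse than the baseline -- it "lacks skill"."""
    return 1.0 - forecast_error / baseline_error

# Hypothetical mean errors: a forecaster versus climatology.
print(skill_score(2.0, 4.0))   # 0.5: beats the baseline
print(skill_score(5.0, 4.0))   # -0.25: worse than the naive baseline
```

The useful feature of this framing is that zero, not perfection, is the bar: a forecaster who merely ties the naive baseline has added nothing.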

For the World Cup predictions, one naive baseline is the expected outcomes based on the FIFA World rankings. FIFA publishes a widely available ranking of teams, with the expectation that the higher ranked teams are better than lower ranked teams. So even if you know nothing about world football, you could just predict outcomes based on this simple information.

This is exactly what I did in RogersBlogGroup. The naive FIFA World Ranking prediction outperformed 64.8% of the more than one million entries across the ESPN Competition. Only 33 of the 84 entries in RogersBlogGroup outperformed this naive baseline. The majority of predictions thus can be said to "lack skill."

I also created entries for "expert" predictions from financial powerhouses UBS, JP Morgan and Goldman Sachs, each of which applied their sophisticated analytical techniques to predicting the World Cup. As it turns out, none of the sophisticated approaches demonstrated skill, achieving results in the 35th, 61st and 54th percentiles respectively -- a pretty poor showing given the obvious time and expense put into developing their forecasts. Imagine that these "experts" were instead predicting stock market behavior, hurricane landfalls or some other variable of interest, and you learned that you could have done better with 10 minutes and Google -- you probably would not think that you received good value for money!

It gets tricky when naive strategies become a bit more sophisticated. I also created a second naive forecast based on the estimated transfer market value of each team, assuming that higher valued teams will beat lower valued teams. This approach outperformed 90% of the million ESPN entries and all but 11 of the 84 entries in RogersBlogGroup.
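Both naive strategies reduce to the same one-line rule: pick whichever team ranks higher on the chosen index. A minimal sketch of the TeamWorth version, with invented squad values standing in for the real transfer-market data:

```python
# Invented squad values (millions); real figures would come from
# transfer-market estimates such as those published by transfermarkt.
team_value = {"Spain": 650, "Netherlands": 400,
              "Germany": 550, "Uruguay": 120}

def pick(home, away):
    """Naive 'TeamWorth' rule: the more valuable squad advances."""
    return home if team_value[home] >= team_value[away] else away

# Fill out a mini-bracket by applying the rule round by round.
finalists = [pick("Spain", "Germany"), pick("Netherlands", "Uruguay")]
print(pick(*finalists))   # the rule advances Spain
```

Swapping the value table for FIFA ranking positions (and reversing the comparison) gives the FIFA-ranking baseline; the method itself requires no football judgment at all.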

It would be fair to say that the TeamWorth approach is not really a naive forecast as it requires some knowledge of the worth of each team and some effort to collect that data. On the other hand, data shows that the market value of players is correlated with their performance on the pitch, and it is pretty simple to fill out a bracket based on football economics. This exact debate has taken place in the context of El Niño predictions, where evidence suggests that simple methods can outperform far more sophisticated approaches. Similar debates take place in the financial services industry, with respect to active management versus market indices.

One dynamic of forecast evaluation is that the notion of the naive forecast can get "defined up" over time as we learn more. If I were to run another world football prediction now, there would be no excuse for any participant to underperform a TeamWorth Index -- except perhaps a failed effort to outperform it. Merely matching the index, after all, adds no value beyond the index itself. In one of my own predictions I tried explicitly to out-predict the TeamWorth Index by tweaking just a few selections, and I fell short. Adding value to sophisticated naive strategies is extremely difficult.

There are obviously incentives at play in forecast evaluation. If you are a forecaster, you might prefer a lower rather than higher threshold for skill. Who wants to be told that their efforts add no value (or even subtract it)?

The situation gets even more complex when there are many, many predictions being issued for events and the statistics of such situations means that chance alone will mean that some proportion of predictions will demonstrate skill by chance alone. How does one evaluate skill over time while avoiding being fooled by randomness?
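The point can be checked with a quick simulation (all of the numbers below are invented for illustration): hand thousands of entrants pure coin-flip picks and count how many still finish ahead of a fixed naive baseline.

```python
import random

random.seed(1)

N_MATCHES = 63          # hypothetical slate of picks per entrant
BASELINE_CORRECT = 40   # what the naive baseline gets right
N_ENTRANTS = 10_000     # coin-flipping entrants

# Each entrant guesses every match at random with a 50% hit rate;
# count how many nevertheless beat the baseline.
lucky = sum(
    sum(random.random() < 0.5 for _ in range(N_MATCHES)) > BASELINE_CORRECT
    for _ in range(N_ENTRANTS)
)
print(lucky)   # some coin-flippers beat the baseline on luck alone
```

With enough entrants, a few "skillful" records are guaranteed to appear even though no entrant knows anything -- which is exactly why a single good result proves little.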

That is the topic that I'll take up in Part II, where I'll discuss Paul the Octopus and his relationship to catastrophe modeling.


  1. Fascinating that it is so difficult to beat hard cash.

    It might be an interesting exercise to set up a similar predictive system for the Champions League, which is probably easier to forecast.

    Four groups. Managers, players, journalists and mathematical forecasters like banks.

    As a non expert, I might try averaged bookmaker's odds (to eliminate country bias).

  2. Soccer is a difficult sport to work with because of the draws and the low scoring. Reasonable and simple models have been developed for easier sports such as baseball -- "A Two-Stage Bayesian Model for Predicting Winners in Major League Baseball" by Tae Young Yang and Tim Swartz is worth reading.

  3. This is a question to the more quantitatively inclined, but how sensitive are the results to the scoring system used by ESPN? One gets 32 points for predicting the World Cup winner alone, and only 4 points for getting a quarter finalist right. So getting everything wrong except the winner already gets you a long way to decent skill.

    In my case, the results of two single games (Germany vs Argentina and the final) made the difference between ending in the top 30 of ESPN's overall leaderboard and being # 273,525.

    I'm not suggesting the scoring system should have been different; after all this is about predicting the winner. I'm just curious to know how much the scoring system matters.
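The comment's intuition about top-heavy scoring can be checked under an assumed points table. Only the 4-points-per-quarter-finalist and 32-points-for-the-winner figures come from the comment above; the 8- and 16-point intermediate rounds, and the doubling scheme itself, are guesses at ESPN's actual rules:

```python
# Assumed round values: only 4 (quarter-finalist) and 32 (champion)
# are stated above; 8 and 16 are guesses at the intermediate rounds.
ROUND_POINTS = {"quarter": 4, "semi": 8, "final": 16, "champion": 32}
SLOTS = {"quarter": 8, "semi": 4, "final": 2, "champion": 1}

max_total = sum(ROUND_POINTS[r] * SLOTS[r] for r in SLOTS)
# A bracket that names only the eventual champion still scores that
# one team in every round it advances through.
winner_only = sum(ROUND_POINTS.values())

print(max_total, winner_only)   # 128 60 under these assumptions
```

Under these assumed values, naming the champion alone earns nearly half the maximum score, which would support the commenter's sensitivity concern.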

  4. All interesting stuff. One possible tweak to your 'team worth' model would be to include a factor that reflects the difference in market value between the different major European leagues (where the majority of the players in all the major teams play). My guess, being English, is that (for example) England under-performed their simple 'team worth' whereas Germany over-performed theirs. However, if you factor in the huge amounts of money that are available in the EPL (and hence inflated 'player worth') compared with the more carefully run Bundesliga, you can probably come up with an improved model. Getting the weightings right may be rather difficult though, given that the other league at a comparable level (both financially and in terms of achievement in the European club competitions) to the EPL is the Spanish league that produced the winning team.

    Alternatively, perhaps some combination of rankings and team worth combined might be helpful, as the team worth should increase to reflect recent improved performances.

  5. I might try averaged bookmaker's odds

    My experience is that bookmakers are exceptionally good at picking results, once a season has got underway a bit.

    The only time they are substantially off is early in the season, when form is hard to read.

    Remember though that bookies are not laying the odds according to what they think will happen. They lay them to beat the punter, which sometimes is quite different.

  6. "They lay them to beat the punter, which sometimes is quite different."

    Very true.

    However, I am sure they have agents, advisers, confederates and what have you everywhere though. Some of their operations were completely dishonest. I worked for William Hill as a student so I became more aware than most of things that happened in that world.

    I haven't bet since then. Football forecasting seems like a good game for a fan to get into, but it isn't easy to win. It works because people generally lose small amounts of money they don't care about and it makes football more interesting to watch.

    Unexpected results are more common than you imagine.

  7. Roger,

    Interesting post. I'm a little hesitant, though, about whether there's enough data here to draw many conclusions. A better predictive method won't out-perform an inferior method in every instance. So you can't really conclude with much confidence that the financial firms didn't have superior predictive methods--can you?

  8. "...soccer through the 20th century has been the sport of despots and dictators..."


    Thank you eric144: how true,
    "...people generally lose small amounst of money they don't care about and it makes football more interesting..."

    If I wagered, maybe I would stay awake during the games.

  9. "The situation gets even more complex when there are many, many predictions being issued for events and the statistics of such situations means that chance alone will mean that some proportion of predictions will demonstrate skill by chance alone."

    Which suggests to me that 'skill' needs to be defined by repeated success, not one lucky hit. If a chimpanzee throwing darts at a board picks stocks better than Wall St. advisors, would you say that it had 'skill?'

  10. John. It's what you are used to. I abhor all forms of exercise that do not involve a large round ball. Especially if it was created in North America.

    A controversial research project in the 1970s revealed that 4 out of 10 Scottish boys would chase a ball over a 200 foot cliff. It also goes some way to explain the military success of the British Empire and the globalisation of soccer.

    The connection between $250,000 a week footballers and the ongoing commie plot to take over the world is not a strong one.

  11. Here's a nice interactive graphic showing where players in the last five World Cups got paid when they weren't doing the World Cup thing.


    The relevance to what Roger is writing about? Er...

    Did I say that it's a nice interactive graphic?

  12. William Hill finds World Cup glory

    William Hill says World Cup match upsets proved a winner for bookies – but City forecasters were left sick as a parrot

    "While it was our worst ever Royal Ascot, with a loss on the meeting, the World Cup proved to be one of the best for bookmakers in 40 years," said Ralph Topping, William Hill's chief executive.

    Topping was reluctant to reveal more details about the World Cup, ahead of William Hill's full interim results next month. But it is clear that England's failure to beat the USA or Algeria, and the 4-1 hammering inflicted by Germany in Bloemfontein, was excellent news for William Hill.

    Other unlikely results included the failure of both France and Italy to win a single game, Holland's toppling of Brazil, and eventual winner Spain's early defeat to Switzerland. The growing popularity of taking bets during a game, such as on the next player to score, also raised turnover from the tournament.