27 May 2010

The Significance of Climate Model Agreement: A Guest Post by Ryan Meyer

Ryan Meyer is a PhD student at ASU's School of Life Sciences, where he is writing a dissertation about US federal climate science research and its relationship to policy making. He is spending a year in Australia on a Fulbright and he blogs at Adapt Already.

If four out of five global climate models (GCMs) agree on a result, should we feel more confident about that result? Does agreement among models constitute increased certainty about the models’ basis in reality? My colleagues and I wondered about this a few years ago when we started noticing that many climate scientists seem to adopt this logic without any explanation or justification. They claim, for example, that we should be more worried about drought in the southwestern US because 18 out of 19 models predict a transition to a more arid climate over the next 50 years. Or they pick a subset of models to represent a particular process such as the Asian monsoon, and then point to agreement among those models as significant.

If 18 models get the same result, is that better than just one? Why? Climate science should provide a thorough explanation for this, especially if climate models are to begin informing policy decisions.

We argue in a paper now available in Environmental Science and Policy (PDF here) that agreement is only significant if the models are sufficiently independent from one another. The climate science community has mostly ignored the crucial problem of model independence while taking advantage of a tacit belief in the force of model agreement. To quote from our introduction:
GCMs are commonly treated as independent from one another, when in fact there are many reasons to believe otherwise. The assumption of independence leads to increased confidence in the "robustness" of model results when multiple models agree. But GCM independence has not been evaluated by model builders and others in the climate science community. Until now the climate science literature has given only passing attention to this problem, and the field has not developed systematic approaches for assessing model independence.
To some these arguments may seem like nitpicking. Or they might seem better suited to the pages of some technical journal where modelers work these things out for themselves. But we strongly believe that this extends beyond methodology, and is in fact a policy question. It relates to the kind of investments we can and should be making in climate science.

The question of independence is one small piece of a much needed broad discussion about climate science policy. What kinds of knowledge are most helpful in crafting a response to climate change? What institutions, disciplines, and tools are best suited to deliver such knowledge? Such crucial questions of science policy tend to be ignored. We argue about what "the science" says, rather than how it works and how it could work better for the needs of decision makers.

In our paper, we take it as a given that governments will continue to fund large and complex models of the climate and related systems. (A broader discussion about the merits of this investment is important, but we do not directly address it). But how should they be funded? Who should decide the most important questions to pursue? In the past, we have tended to let climate scientists sort that one out. They are, after all, the experts. But they are certainly not unbiased participants in this discussion. Are they asking the most important questions, or just the ones they find most interesting?

It is important to recognize that there are many possible trajectories for our climate science knowledge. We may not know exactly where each one leads, but we can still make wise, informed choices. This is why the independence problem is important, not just for climate modelers, but for science policy makers, potential users of climate science, and advocates for climate change adaptation.

We have three basic recommendations related to the independence problem:
  1. Climate modelers should be wary of overselling results based on model agreement.

  2. The climate science community should begin to better address the independence problem.

  3. Science policy decision makers should take this problem into account when building strategies for climate modeling, and climate science more broadly.


  1. The various projections of future climate in IPCC 2007 (AR4) are actually based on different climate models.
    The various scenario storylines in the SRES (2000) required a "quantification" based on a specific model. Several such models were available. Some 40 outcomes (specific models applied to specific scenarios) were run.
    In fact, at least one model (the so-called MESSAGE quantification) provided projections for all scenarios, but it was not designated as the common quantification of the SRES scenarios. Instead, each marker scenario (A1B, A2, B1, and B2) was quantified with a different model. As these models differ in many ways, the different outcomes of each scenario are due not only to the scenario itself but also to the peculiarities of the model used for each. The IPCC database of climate projections contains all the variants for each scenario, but only one is usually discussed. The issue is seldom if ever discussed, and is not touched upon in AR4. The SRES reports on the quantification of storylines without discussing the reasons for choosing different quantifications for different scenarios, or the implications of doing so.

    Now a new vintage of scenarios is being prepared. I hope this matter is considered in a more explicit and consistent way.

  2. I wonder how you envision this working in terms of changing what kinds of things are modeled for. Are you saying that either academics or politicians should be involved in setting the parameters of what scientists model for? On the academic side, perhaps you mean that economists and political scientists or biologists should help determine what kinds of events (drought, flood, ecosystem collapse, disease) would most greatly affect human beings and then these considerations should be brought to bear on climate modeling. Or perhaps you mean that the politicians who hold the purse strings should exercise more of a role in determining how climate science works (this seems a recipe for disaster).

    I do think that climate science is more of an interdisciplinary field than is commonly appreciated.

  3. Excellent. The fallacy that CAGW promoters have fallen for is the same one that Wall St. fell for.
    Wall Streeters pretty much all used the same models, and were comforted by the consensus reached.
    CAGW promoters rely on the same sort of circular rationalization.
    Pointing this out is a good thing. Thanks.

  4. In section 3.3 they talk about doing uncertainty quantification studies by parameter variation, and mention that "However, as Tebaldi and Knutti recognize, this cannot fully address the problem, as it ignores structural uncertainties stemming from the somewhat subjective art of selecting parameters for inclusion in the model." Bayesian model averaging is an approach that allows one to include the structural uncertainties in a coherent way. The naive model ensemble strikes me as an undesigned computer experiment where many changes (structural and parametric) are confounded across the change in model.
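A minimal sketch of the Bayesian model averaging idea in Python; the observations, model predictions, and error scale below are all invented for illustration, and the "models" are just constant predictions:

```python
import math

def gaussian_loglik(obs, pred, sigma):
    """Log-likelihood of observations under a model prediction with Gaussian error."""
    return sum(-0.5 * ((o - pred) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi))
               for o in obs)

# Hypothetical data and two hypothetical "models" predicting a single quantity.
observations = [2.9, 3.1, 3.0]
model_preds = {"model_a": 3.0, "model_b": 4.0}

# Posterior model weights under equal priors: softmax of the log-likelihoods.
logliks = {m: gaussian_loglik(observations, p, sigma=0.5) for m, p in model_preds.items()}
max_ll = max(logliks.values())
unnorm = {m: math.exp(ll - max_ll) for m, ll in logliks.items()}
total = sum(unnorm.values())
weights = {m: w / total for m, w in unnorm.items()}

# The BMA prediction mixes the models by weight, so structural uncertainty is
# carried in the weights rather than hidden by a naive unweighted average.
bma_prediction = sum(weights[m] * model_preds[m] for m in model_preds)
print(weights, bma_prediction)
```

Here the poorly fitting model receives a near-zero weight instead of contributing equally, which is the coherence the comment is pointing at.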

  5. The problem of independence is a multi-faceted one. The problem isn't just that different teams may talk to each other, and become dependent on each other's opinions. There is also the GIGO issue: if everyone is independently using the same parameter values, based on published estimates, and those estimates are wrong, then they are working "independently" but all sharing the same biases. Beyond even that, if the people in the field all share general expectations, they are likely to produce the same general results.

  6. We don't actually care about the independence of models per se. In fact, if we had an ensemble of perfect models, they'd necessarily be identical. What we really want is for the models to be right. To the extent that we can't be right, we'd at least like to have independent systematic errors, so that (a) there's some chance that mistakes average out and (b) there's an opportunity to diagnose the differences.

    It's interesting that this critique comes up with reference to GCMs, because it's actually not GCMs we should worry most about. For climate models, there are vague worries about systematic errors in cloud parameterization and other phenomena, but there's no strong a priori reason, other than Murphy's Law, to think that they are a problem. Economic models in the climate policy space, on the other hand, nearly all embody notions of economic equilibrium and foresight which we can be pretty certain are wrong, perhaps spectacularly so. That's what we should be worrying about.
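Tom's point (a) above, that independent systematic errors have some chance of averaging out while a shared error does not, can be seen in a toy simulation (Python, with all numbers invented):

```python
import random

random.seed(0)
TRUTH = 3.0  # the (hypothetical) true value the models try to estimate

def ensemble_mean(shared_bias, n_models=1000, noise=0.5):
    """Mean of n_models estimates, each carrying a common (shared) bias
    plus its own independent random error."""
    return sum(TRUTH + shared_bias + random.gauss(0, noise)
               for _ in range(n_models)) / n_models

# Independent errors only: the multi-model mean converges toward the truth.
independent = ensemble_mean(shared_bias=0.0)

# A shared systematic error (e.g. everyone using the same wrong published
# parameter value): averaging the "independent" models cannot remove it.
biased = ensemble_mean(shared_bias=1.0)

print(abs(independent - TRUTH))  # small: independent errors average out
print(abs(biased - TRUTH))       # near 1.0: the shared bias survives averaging
```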


  7. Mark B. said... 5

    "There is also the GIGO issue"

    Some choice quotes from the 2005 Annual Energy Outlook
    "In the AEO2005 reference case, the annual average world oil price (IRAC) increases from $27.73 per barrel(2003 dollars) in 2003 to $35.00 per barrel in 2004 and then declines to $25.00 per barrel in 2010 as new supplies enter the market."
    "By 2025,the average minemouth price is projected to be $18.26per ton, which is higher than the AEO2004 projection of $16.82 per ton."

    Natural Gas
    "Average wellhead prices for natural gas in the United States are projected generally to decrease, from $4.98 per thousand cubic feet (2003 dollars) in 2003 to $3.64 per thousand cubic feet in 2010"

  8. I'd like to ask the authors, through this blog, to define clearly what they actually mean by "independent".

  9. I was reading the history of climate modeling here: http://www.aip.org/history/climate/GCM.htm when this paragraph caught my eye:

    The IPCC pressed the modelers to work out a consensus on a specific range of possibilities to be published in the 2007 report. The work was grueling. After a group had invested so much of their time, energy, and careers in their model, they could become reluctant to admit its shortcomings to outsiders and perhaps even to themselves. A frequent result was "prolonged and acrimonious fights in which model developers defended their models and engaged in serious conflicts with colleagues" over whose approach was best.(111) Yet in the end they found common ground, working out a few numbers that all agreed were plausible.

    It kind of explains why all the IPCC models get similar results.

  10. -2- ndsmith
    Thanks for your comment. I am saying that the independence problem deserves attention because it is relevant to how we understand model output, and the extent to which it is useful. The decision to direct attention to this problem does not rest entirely with scientists; it is also a science policy problem. In other words, the institutions that fund GCMs with our tax dollars (NSF, NOAA, NASA, DOE) can help to ensure that we get useful information out of these very expensive tools.

  11. -6- Tom
    You say:
    "We don't actually care about the independence of models per se. In fact, if we had an ensemble of perfect models, they'd necessarily be identical. What we really want is for the models to be right. To the extent that we can't be right, we'd at least like to have independent systematic errors, so that (a) there's some chance that mistakes average out and (b) there's an opportunity to diagnose the differences."

    I think this sums up the argument nicely. There's no such thing as a perfect model, and we can't ever be sure if we're "right." The Levins robustness framework (described in our paper) is one way of dealing with this, and it relies on independence.

  12. -8- James

    Two models are independent if they do not share all the same assumptions. No two GCMs are identical, so in that sense they are all independent to some degree.

    The trickier question, then, is how to know whether two models are sufficiently independent from one another to justify increased confidence based on their agreement.

    We have not answered that question conclusively. We recognize that it's a hard problem, but we think the community should address it.

  13. Ryan,

    That seems like a rather silly and vacuous definition, since as you acknowledge it is guaranteed to be false. Where did you get it from, or did you just make it up yourself? You don't appear to state it in your manuscript. I do not accept your claim that the use of ensembles (in general, not just climate science) relies on such a condition. In fact this claim is contrary to all the well-established theory of ensemble prediction.

    It is rather bizarre that your call for the community to address this issue actually cites a number of climate science papers where this specific question is considered (and you could have cited several more).

    You might find our recent GRL paper of some relevance, but perhaps I'll need to write the implications out more clearly for readers without statistical knowledge.


  14. -12-

    Ryan, just to do some introductions ... James Annan is a climate modeler in Japan. He typically argues that everyone who he disagrees with in any way is either stupid or malign. Eventually he'll write a blog post saying as much.

    But perhaps you'll have better luck engaging him;-)

    Since he does not like your answer, perhaps James might explain what he thinks independence in climate models is defined as, and then actually engage your paper, rather than playing the old academic game of saying that you didn't cite this or that ... snore!

  15. Indeed Roger I will write it out in longer form. I think it will be worth publishing formally, as I don't think the question of "independence" has been dealt with very satisfactorily so far, and the fault is certainly not entirely with this paper. So it will take longer than a blog post.

    I believe I did "engage the paper" by stating clearly that I objected to the fundamental claim that "independence" (being interpreted as meaning that models share no assumptions) is generally a condition for the use of ensemble methods in prediction. I would be interested in any substantive response on this point.

  16. -13- James
    Thanks for the link to your paper.

    I think you are confusing the "agreement problem" with the discussion of ensembles in our paper. They are not the same thing.

    We found many examples in which climate scientists show a number of models all saying the same thing, and then point to that agreement as a reason for increased confidence that the prediction is true. But the models must be independent for this to be the case.

    We discuss ensembles in our paper because that is one of the only places in the literature on climate models where the concept of independence is mentioned at all. Our quotation of Hagedorn 2005 is an example of this.

    But the result of an ensemble is an average across models, so it doesn't even matter whether the various ensemble members agree. Ensembles are not an example of the problem we are addressing, which is the assertion of increased confidence due to agreement.

    So, yes, we cite discussions of independence in the climate literature, but none of those deals with the agreement problem. That's why we are calling for the community to address it.

  17. Adaptalready..
    You said #11

    "There's no such thing as a perfect model, and we can't ever be sure if we're "right.""

    I thought that the way you would be sure that you are "right" is to run the model and see if the world continues to behave the way it predicts? If for some reason we have given up on that for climate modeling, and the way the world behaves is not related, then I think people should be more open about that.

    Your statement implies that modeling is an expensive self-therapeutic exercise.

  18. Another point:
    "independence (being interpreted as meaning that models share no assumptions)"

    This is not the definition I offered. The idea of GCMs that do not make ANY of the same assumptions doesn't make much sense. The question is how different should they be, and in what ways?

  19. Ryan,

    "But the models must be independent for this to be the case."

    Where did you get this idea that models must be "independent" (in the sense you defined) in order for agreement to be a useful/important diagnostic? It is contrary to all established theory and practice in ensemble prediction.

  20. OK,

    I'll try to be a little less cryptic.

    "Agreement" across an ensemble is the standard diagnostic of confidence in just about every ensemble prediction system I know of (where there are exceptions, they are rather esoteric). This diagnostic follows directly from the basic concepts of probability theory and the purpose of the ensemble in the first place (ie, sampling the predictive pdf of the event(s) in question).

    However, I don't know of any system where "independence" (in the sense Ryan has defined the term) has any bearing on either the construction or use of ensembles. I simply don't believe it has any relevance, and we all seem to agree that it is (by definition) vacuously false in all cases.

    There is indeed a sense in which ensemble members (in well-designed systems) should be "independent". However the term here is used in its standard statistical interpretation. It appears to me that Ryan, and probably many others, have fallen into the trap of interpreting technical statistical language in a sufficiently vague and woolly way that it obliterates the well-established basis of the theory.

    BTW there are two papers - both by Jun, Knutti and Nychka - which directly address this issue. I don't necessarily endorse them, but it's not correct to claim that people are not actively considering this area. Indeed it was a big topic at a recent IPCC expert meeting earlier this year in Boulder.
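For what it's worth, the diagnostic James describes can be sketched as follows; the "ensemble" here is just 19 random draws standing in for model projections, so every number is purely illustrative:

```python
import random

random.seed(1)

# Each "member" stands in for one model's projected precipitation change
# (hypothetical units); drawing all members from a common distribution
# mimics sampling the predictive pdf of the event in question.
members = [random.gauss(-0.3, 0.2) for _ in range(19)]

# The standard ensemble diagnostic: the fraction of members agreeing on an
# outcome (here, drying) is read as a probability estimate for that outcome.
agreement = sum(1 for m in members if m < 0) / len(members)
print(f"{agreement:.0%} of members project drying")
```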


  21. I don't think the lack of independence is the major problem. They shouldn't be that independent anyway, because they should be using similar maths and physics. My bugbear is the assignment of frequentist statistics to model outputs. This was the major error in Santer's supposed rebuttal to Douglass et al. (yes, Douglass et al. likely made a similar mistake in averaging them, though a lesser one). The plain fact is that model outputs cannot be considered as random, so combining them in any way is intrinsically subjective (or even Bayesian).

    Moreover, there were clearly many more possible model outputs that the IPCC might have chosen, so clearly they chose an ensemble that, when combined, would roughly resemble reality, and then declared to the world that the ensemble method works best: pure circular reasoning - we know what we want to see, and this ensemble produces it. But since you can do exactly the same wiggle matching with a combination of graphs from any source, it is by no means any kind of validation of climate models and hence of no real value whatsoever.

    Boasting about predicting past temperature wiggles is also, of course, extremely iffy, because when you know the answer you can easily get there with totally false parameters.

    The sensible way is to use percentage error as the test of a model. This would have worked ok for the Douglass et al. paper, as one model was apparently better than the others (the Russian one, I believe). Then you must identify which parameters that particular model run used.

    The only other useful validation I can think of would be spatial validation rather than temporal, but then we all know they are useless at that. When they can reliably predict local climates in the past over a range of time periods, then the models will have arrived. This idea that they are good for predicting climates 100 years hence, based apparently on the fact that they are clearly so bad at everything else, is quite a triumph of salesmanship over common sense.

  22. Having said that, the example of drought prediction is a perfect example of the misuse of models: not one model can predict the current conditions even remotely correctly, but 19 models produce the same result at some future point, so the result is "robust." Eh? The word robust has clearly been redefined to mean wrong, but consistently wrong.

    Crucially you won't find any climate modeler who pretends models can be used for local climate prediction. This fact doesn't seem to stop the climate model users from doing it anyway. And the funny thing is that the results from these invalid exercises are always pessimistic.

  23. Ryan and co, that's a nice paper and you raise some good questions.

    The difficulty is that it is very hard to answer the question, because the technical details of the models and their assumptions are not usually given. For example, in your paper you have a tick box confirming that all the models 'include' CO2 as a 'forcing' and most of them 'include' 'solar'. But the key question is how they include it, what assumptions they make, and what the parameters are.

    James, of course, is miffed that you didn't cite his paper, probably because it wasn't out when you were preparing yours. His paper is a triumph of spin over substance ("we find that the CMIP3 ensemble generally provides a rather good sample under the statistically indistinguishable paradigm") and fails to answer your concerns, as you point out in 16.

    To answer your questions properly would require a huge amount of technical detail which the climate scientists would be reluctant to provide, and a great deal of work for whoever did the job.

  24. adaptalready: I think you are confusing the "agreement problem" with the discussion of ensembles in our paper. They are not the same thing.

    We found many examples in which climate scientists show a number of models all saying the same thing, and then point to that agreement as a reason for increased confidence that the prediction is true.

    So, they were assuming a spread-skill relationship that you are saying needs to be validated?

    James Annan:
    However, I don't know of any system where "independence" (in the sense Ryan has defined the term) has any bearing on either the construction and use of ensembles.
    If I understand Ryan's definition, then I think it has bearing in a BMA approach. You wouldn't want to include two dependent models (models with the exact same assumptions / structure) because you'd be giving twice the probability to a certain set of assumptions for no good reason (there may be good reason, but we shouldn't do it 'by accident').

  25. Well, I have read Jun, Knutti and Nychka, and that doesn't deal with the question of independence either; it just does a similar statistical mangle to James's paper. James seems to be trying to change the question to one of statistics, which it is not. To summarise, the question is: "Do the climate models agree with each other because they are all accurate, or because they all make the same or very similar assumptions?"

    It is amusing that both JKN and James use periods up to the 1990s for their tests on temperature trends. They don't consider the period since 2000. Now why would that be, I wonder?!

  26. -17- Sharon
    "I thought that the way you would be sure that you are "right" is to run the model and see if the world continues to behave the way it predicts?"

    When that happens, it's encouraging, but it does not confirm the model. There are multiple ways to arrive at the same result (and of course there are many ways to evaluate model output), so a match between observations and model output does not mean the underlying causal processes in the model are correct. See

    Oreskes, N., Shrader-Frechette, K. and Belitz, K.: 1994, 'Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences', Science 263, 641.

  27. -20- James
    I'm not sure how else to explain to you that our paper is not about ensembles. The primary problem we are highlighting - AGREEMENT CLAIMS - is not the same as what you are talking about.

    I fully understand the concept of statistical independence you have referenced, and recognize that in the case of ensembles, in which you average across the ensemble members, that is the relevant kind of independence.

    But that's just not what our paper is about. Go back and look at our three examples: the papers by Seager et al.; Wang and Overland; and Kripalani et al. You will see that all three of them are invoking AGREEMENT across models to inspire more confidence. There is no averaging involved. I.e., they are saying, "all of our models say the same thing, so we should be more confident that the prediction is true."

    I've spoken with a number of modelers about this, and most have readily acknowledged that this is a problem, especially in cases involving a subset of GCMs, such as in our second and third examples.

    I'm genuinely interested in your thoughts on the validity of agreement claims that are not supported by a demonstration of model independence.

  28. -17- Sharon
    "Your statement implies that modeling is an expensive self-therapeutic exercise."

    There are many reasons to build a model. They can be very useful tools for integrating and assessing scientific understanding of a system. As the Oreskes paper I just cited argues, they can serve as heuristics for informing our ideas about what might happen in the future. In situations involving a short time horizon, such as weather forecasting, they can become very accurate and reliable.

    So I am not implying that models are useless. Our paper is focused on long term deterministic predictions of climate change, which is a very specific case.

  29. -24- jstults
    "So, they were assuming a spread-skill relationship that you are saying needs to be validated?"

    That's an interesting question. Perhaps that is an implicit assumption. I have not gone back to check, but I do not believe the concept was used explicitly in any of our main three examples.

    In the Seager example it's not even as sophisticated as spread-skill. They simply note that 18 out of 19 models (and 46 out of 49 total projections) indicate that precipitation will fall below a certain value by a certain time, so the spread is not even part of the argument.

  30. The concept of "agreement between models" is directly derived from the standard use of ensembles in probabilistic prediction, which (contrary to your apparent belief) is not just a matter of calculating the ensemble mean, but more generally considers other statistics of the full distribution. It is quite possible that some climate scientists are unaware of the theoretical underpinnings of common practice, but that doesn't mean that such underpinnings don't exist.

    I'm not saying that independence isn't a relevant issue, rather that your discussion of it seems to be based on several misconceptions and is so vague as to be meaningless. Further, your claim that it is not being actively considered by climate scientists is demonstrably false. [Contrary to what others seem to have assumed, I'm certainly not in the least put out that you didn't cite our GRL paper which is tangentially relevant, the other stuff I mentioned is from a couple of years ago and there has been ongoing discussion of this issue.]

    To restate more directly: the general use of the basic diagnostic of agreement between ensemble members has a sound statistical basis which does invoke a principle of "independence". However, it simply does not use the term in the way you define it. It is challenging to apply this (well-defined) concept of independence to the multi-model ensemble, but talking about "sharing assumptions" certainly doesn't help.

  31. Ryan, thanks for the citation. I hear that you are saying the model could be predictive but not correct. I agree with that.

    However, I do think the converse holds: if a model is not predictive, it is not, ultimately, correct.

    Particularly I think that models of unknown predictive value should not be used in policy setting. Do you agree?

  32. -30- James Annan

    I think that you and Ryan are talking past each other. It seems that you are using the term "independence" in the sense of "statistical independence", such that each sequential roll of a die is independent of the next. Ryan is talking about "methodological independence", in the sense that no matter how many times you roll a die, you are drawing from the same set of conditions -- a six-sided die.

    Let me try to illustrate ...

    What if you use a die to estimate some property, such as the odds of England winning the World Cup? We could assign a relationship between the number on the die and the probability of winning, say a 1 = 1/6 chance and a 6 = 6/6 chance.

    We can then ask, do multiple rolls increase the accuracy, quality, skill (or other dimensions of forecast "goodness")?

    Clearly, multiple rolls will help to provide the statistics necessary to converge on an answer (the ensemble).

    Let me then say that I want to diversify my methodological approach, and I introduce a little roulette wheel with only six numbers. I can then calculate the statistics of multiple spins with the little roulette wheel. And because I have made this example up in this way, the statistics will be the same as with the die.

    Does the addition of another model add confidence to my forecast?

    The answer is no, because the die and the wheel are not methodologically independent. In fact, they are identical, and using the roulette wheel gives me no added value over just rolling the die a few more times.

    If I were to say that the confidence in my forecast of England winning the World Cup is increased because I have used a multiple model ensemble, I would be saying something incorrect.

    The question that Ryan et al. are asking is how do we know if (or more precisely, the degree to which) comparisons across climate models are akin to comparing the die and the wheel.

    This is an important question.
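Roger's die-and-roulette illustration can be run as a quick simulation (Python; a toy sketch, not anyone's actual method):

```python
import random
import statistics

random.seed(42)

def die():
    """Model 1: a fair six-sided die, mapped to a 'win probability' of k/6."""
    return random.randint(1, 6) / 6

def roulette():
    """Model 2: a six-slot roulette wheel. Same state space, same probabilities,
    so it is structurally identical to the die -- NOT methodologically independent."""
    return random.randint(1, 6) / 6

N = 50_000
two_model = [(die() + roulette()) / 2 for _ in range(N)]   # "multi-model ensemble"
extra_rolls = [(die() + die()) / 2 for _ in range(N)]      # just more die rolls

# The "two-model ensemble" behaves exactly like rolling the single die more
# often: same mean, same spread.
print(statistics.mean(two_model), statistics.stdev(two_model))
print(statistics.mean(extra_rolls), statistics.stdev(extra_rolls))
```

The second "model" only supplies more draws from the same distribution, which is exactly the point: agreement between the die and the wheel tells you nothing you couldn't get by rolling the die a few more times.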

  33. adaptalready #27

    "You will see that all three of them are invoking AGREEMENT across models to inspire more confidence"

    But, but, but ... isn't "AGREEMENT across all models" (not to mention the lack of "relevance" of "independence" - according to James Annan #20) the very essence of the fundamental tenet of "robust" climate science?!

    IOW, those who worked with the models are right, and any questions you (or anyone else for that matter!) might have are "spurious" or "irrelevant" - or both ;-)

  34. adaptalready said... 28

    "In situations involving a short time horizon, such as weather forecasting, they can become very accurate and reliable."

    If I go to weather.com, which uses 'computer modeling' for forecasts, my current local (Greater Seattle) conditions are overcast with a 10% chance of rain rising to 90% by 3 PM.

    It's been raining since at least 5 AM and was raining last night at 10 PM.

    I know what the problem is: the weather station at the airport is 433 feet above sea level, too high to pick up low stratus clouds.

    The geniuses at NOAA think they can extrapolate conditions at the airport to the broader region.
    They can't, since they are unaware of a low cloud layer. The current conditions reported at the airport are a cloud layer at 1,000 ft, 10 miles of visibility, and dry. The current conditions at my house, just 15 miles from the airport, are a cloud layer at about 300 feet, drizzling, with visibility of maybe 3 miles.

    If I go to weather.com and look at the radar there is no rain. If I go to the radar provided by my local TV station it's raining everywhere in the greater Seattle region except the airport.

  35. Harrywr2--

    Up in Snohomish County it has been raining fairly consistently for the past 2 days. I forecast there is a 0% chance that the model forecasts will catch up.

  36. Roger, yes, I agree it seems clear that the authors do not understand the linkage between these concepts. The frequentist textbook toy examples generally have a very simple solution, and doing similar analysis of complex real-world cases is very much harder, but the same fundamental principles and mathematical framework apply.

    When people muddy the water with vague and useless definitions (such as "independent" in the above) then IMO it hinders, rather than helps, progress towards the goal, which is surely to provide credible quantifiable answers to these questions.

  37. -36-James Annan

    I do not think that the authors lack understanding of these concepts at all. Nor do I think that they have muddied the waters. They are asking important questions worth engaging.

    If you really want to criticize their use of the word "independent," you probably should go back to Levins (1996), whose article uses the term as Pirtle et al. do. You'll have a lot of work to do, because that paper has been cited 760 times according to Google Scholar and has been very influential (in biology, not atmospheric science).

    But better yet, rather than complaining about terminology from other disciplines, why not just engage the arguments as made?

    Do you think that two results from different climate models are more robust than from one model? Why or why not?

  38. Seems Ryan and James are both promoting better schemes than those currently used, but since their schemes are different, they are in conflict over which is best.

    There's another way, though, where independence isn't an issue and we don't get into arguments about spread, priors, probability, significance, etc.: we just eliminate the clearly bad model runs by binary classification (true, false, positive, negative). This is used very successfully for ensemble model testing in non-climate areas as the ROC method. Some are trying to use it in climate ensembles too:

    So if we can use this to eliminate the clearly crap models, and hence give us a smaller, bounded domain of better-performing models, we can properly identify (a) whether any "signatures" are there, with no bogus statistical arguments, and also, hopefully, (b) obtain realistic regional climate predictions even from a combination of inadequate models.

    The main concern, though, is the eagerness to "correct" supposedly erroneous data according to model output, or even to ignore historical data in favour of "robust" model agreement (done in too many impacts studies). Any validation procedure must put the emphasis on true/false versus the best data we have, rather than assuming a priori that model outputs are somehow a substitute for real data.
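    [Editor's note: a minimal sketch of the screening idea in comment 38, in Python. All data here are synthetic and all names are hypothetical; this is a single-threshold analogue of a ROC criterion, not the method from any cited work.]

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical hindcast: 40 past years of a binary event (say, "dry year"),
    # plus yes/no predictions from five synthetic ensemble members whose skill
    # varies from good (0.9) to none (0.0).
    obs = rng.random(40) < 0.3
    models = [(rng.random(40) < 0.1) | (obs & (rng.random(40) < skill))
              for skill in (0.9, 0.7, 0.5, 0.2, 0.0)]

    def rates(pred, obs):
        """Hit rate (true-positive rate) and false-alarm rate (false-positive rate)."""
        hit = np.sum(pred & obs) / max(np.sum(obs), 1)
        false_alarm = np.sum(pred & ~obs) / max(np.sum(~obs), 1)
        return hit, false_alarm

    # Screen: retain only members whose hit rate clearly beats their
    # false-alarm rate, discarding the "clearly bad" runs up front.
    kept = []
    for m in models:
        hit, fa = rates(m, obs)
        if hit - fa > 0.2:  # crude skill cut-off; the 0.2 margin is arbitrary
            kept.append(m)

    print(f"retained {len(kept)} of {len(models)} members")
    ```

    Any subsequent statistics (signatures, regional projections) would then be computed over `kept` only, the smaller bounded domain the commenter describes.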

  39. Roger, my first criticism is that they seem to be arguing that these issues are not already being addressed by a significant group of researchers; as a refutation I refer you to the Boulder IPCC expert meeting and the papers I mentioned previously. The authors' brief mention that this question "has not gone completely unnoticed" hardly seems a fair assessment of the state of play. It's been a hot topic for several years.

    I see that the Levins paper has been widely cited, but also widely criticised. I didn't find his definition of "independent" but my pdf is a scan and so not easily searchable (I did see the one phrase using the word that is repeatedly quoted in Ryan's paper, but it doesn't seem clear to me what it actually means).

    Of course in answer to your final question, the obvious answer is that two results are likely to be more robust, and the robustness of an ensemble depends not on whether their underlying inputs and assumptions are "independent" in the sense used here, but rather whether their range of inputs and assumptions is commensurate with the uncertainty in our understanding of reality.

    As a trivial example, let's assume our models use different values for pi. Other things being equal, an ensemble using values ranging from 3 to 4 is likely to be robust (if there are no other major errors); one using values ranging from 15 to 150 may well not be. I don't see how "independence of assumptions" is in any way a useful metric for discussing the question. The issue is the quality of the input assumptions and whether their range is commensurate with the (mean) bias. Of course it's non-trivial to answer this, but we can at least start by asking the right questions!

    My final point is that there is a body of mathematical theory relevant to this issue, and vague exhortations from social scientists, entirely divorced from any hint of quantitative theory or rigorous analysis, are pretty much useless in actually addressing the questions.
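    [Editor's note: the pi example in comment 39 can be made concrete with a toy calculation. The numbers below are hypothetical and chosen only to mirror the two ranges in the comment; the "model" is just a circle-area formula.]

    ```python
    TRUE_PI = 3.14159265

    def circle_area(radius, pi_value):
        # A stand-in "model" whose only uncertain input is its value of pi.
        return pi_value * radius ** 2

    # Ensemble A's assumed pi values bracket the true value (its input range is
    # commensurate with the uncertainty); ensemble B shares a gross common bias.
    ensembles = {"A (3 to 4)": [3.0, 3.2, 3.5, 4.0],
                 "B (15 to 150)": [15.0, 50.0, 100.0, 150.0]}

    brackets_truth = {}
    for name, pi_values in ensembles.items():
        areas = [circle_area(1.0, p) for p in pi_values]
        truth = circle_area(1.0, TRUE_PI)
        brackets_truth[name] = min(areas) <= truth <= max(areas)
        print(f"{name}: spread [{min(areas):.2f}, {max(areas):.2f}], "
              f"brackets truth: {brackets_truth[name]}")
    ```

    Ensemble A's spread contains the true answer while ensemble B's does not, even though neither ensemble's members are "independent" in any meaningful sense; what matters is whether the input range covers the real uncertainty.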

  40. Tom fiddaman said...6

    "For climate models, there are vague worries about systematic errors in cloud parameterization and other phenomena, but there's no strong a priori reason, other than Murphy's Law, to think that they are a problem."

    It may be a lot more than "vague worries" with respect to cloud parameterization. The handling of clouds may be a significant flaw in the current climate models, one which may well invalidate the results obtained from them. Another factor which may not be well modelled is the behaviour of the oceans and their associated cyclic behaviours (PDO, NAO, etc.).

    Roy Spencer discusses some of these matters in his recent book "The Great Global Warming Blunder".

    The point here in relation to the debate about independence is that the models as a whole may all be missing significant physics in their representation of the climate system and in that sense, they are not "independent".

  41. Peter Huybers has a paper in press on this. He suggests model independence has been overestimated.


  42. -41- Edwin
    Thanks for pointing out that very interesting paper. I think the recommendations at the end are very much in line with the kinds of analysis we are advocating as a means of exploring independence:

    "Convergence between model results, if not truly driven by a decrease in model uncertainty or clearly understood as a result of calibration, could have the unfortunate consequence of lulling us into too great a confidence in model predictions or inferences of too narrow a range of future climates. To the extent that it occurs, tuning the models based upon expectation or convention renders the model partially sujbective exercises from which it becomes very complicated to derive a statistical interpretation.... It may also be sensible to push the most sophisticated models towards generating realizations of future climate that are as inconsistent as possible with current predictions, while still being physically sound."

  43. -31- Sharon
    "Particularly I think that models of unknown predictive value should not be used in policy setting. Do you agree?"

    I think this is a tough question, and the answer depends very much on the specifics of the decision context.

    Roger's paper on the Red River Flood provides a good example of how decision makers can come to rely on a flawed model to their detriment.

    On the other hand, models can be useful tools for exploring plausible scenarios, and clarifying available options under conditions of deep uncertainty. Some folks at RAND have been using models in this way as part of their "robust decision making" framework (note: not the same definition of robustness that we are using!).

    So I think we should be wary of using models of complex systems as crutches. They do not represent an easy way out of difficult decisions, and they're not necessarily the best option. But they can be very helpful if used in an appropriate manner.

  44. James actually has a good post on his blog regarding the traditional ensemble approach, how it's wrong, and why it became so popular:

    It seems he thinks the need for "consensus" forces a lot of clever people to use speculative paradigms rather than hard maths or science, even using the dreaded "groupthink" word. Of course skeptics/lukewarmers had noticed that long ago. A little rapprochement between camps would be nice (if he could ignore the wacky wingnuts like the rest of us have to). I see some brave people reaching across the aisle already.

    Nice that he noticed there is a vast body of established maths techniques that climate scientists ignore in favour of re-inventing their own (often square) wheels. I noticed this behaviour in my own field too. However, after initial resistance you can bring people around to copying the best of the rest. You never get any credit mind you :)

    At some future point in this incremental IPCC tanker pull, I look forward to him finding out that the probability distribution of sensitivity should not be a Gaussian around 3K (which has about the same mathematical credibility as a show of hands) but is more likely skewed, with a mode at 1K.

  45. "It may also be sensible to push the most sophisticated models towards generating realizations of future climate that are as inconsistent as possible with current predictions, while still being physically sound."
    A maximum entropy distribution, if you will...

  46. Climate models are like subprime loans. Models shouldn't be used in forecasting unless they've been verified and validated. No climate models have made the grade.

    So this question is similar to the one posed on Wall Street a few years back: can we bundle together a whole lot of subprime junk and somehow produce some AAA-rated prime as a result?

    Can we bundle together a whole lot of poor forecasting tools and produce a quality forecast?

  47. Multi-model ensembles as used by the IPCC are best described as junk science, or as a smoke screen to justify a predetermined conclusion. The widely disseminated Figure 9.5 from AR4 was generated using tuned and cherry-picked multi-model ensembles. See my poster on this subject: