I recently read something interesting in Nate Silver's book 'The Signal and the Noise: The Art and Science of Prediction' about elite chess players who consult three different computer models and take the best of them to win the game. For these players, the task was less about playing and more about coaching the best contributions out of the models, which they could then use.

Moving from chess to university research, is this a glimpse of how university research will be done in future: develop multiple models that analyse the same vast data set, then select from them individually or in combination? One implication is that demand for big data analysis in this sector will explode as models proliferate like virus mutations.

We're already living in the age of 'Big Data', with researchers crunching massive data sets to uncover relationships and test their theories. At the same time, statistical theory continues to remind us that correlation isn't the same thing as causation.

So although the historical data is real (or as real as our best available technology can capture it), how much of the resulting output is 'real' because of the equations alone, and how much depends on the quality of the programming code that implements them? To put it another way, even if a researcher (unknowingly) formulates the perfect, lengthy set of equations to model something observable, how much is inadvertently 'lost in translation' by the coders who carry out the data analysis? On a related note, perhaps our rate of innovation throughout history has been faster than we realised, and it is only our rate of proof of concept that has been slow, because people lacked good tools to test their theories.

Finally, should we take the view that, although correlation isn't causation, various clusters of correlations could be modelled, with the best cluster acting as a proxy for causation?

Why does this matter? Apparently a number of published academic research results are 'false positives', i.e. findings that an independent team of researchers struggles to repeat with the same results. The more this happens, the more it debases all research findings, at least in the eyes of research-grant funders, who grow more sceptical about what they are really funding. Furthermore, where those grant funders raise their own money from others (charities fundraising, biotech companies issuing shares, research councils asking central government for more funding), the upstream donors also become increasingly sceptical.

If instead leading researchers were more honest in their published articles (supported by an incentive structure from the university establishment) in declaring the best correlation-cluster model they found as a proxy for real causation, would society's expectations of researchers (as coaches coaxing out approximations, not as lab boffins uncovering ultimate scientific truth) become more realistic?