The Petabyte Age Deconstructs the Scientific Method

The scientific method was recently called into question by Peter Norvig, Google’s research director, at the O’Reilly Emerging Technology Conference in March 2008, when he offered an update to George Box’s maxim: “All models are wrong, and increasingly you can succeed without them.” Chris Anderson of Wired reported on this potential shift in the scientific method in the article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.”

Anderson identifies Google’s success during the “Petabyte Age” as an indicator of this shift. The availability of massive amounts of data that can be synthesized into meaningful statistics could very well change the future of research. “It forces us to view data mathematically first and establish context later,” he wrote.

The idea that you need a model of how things happen before you can draw meaningful correlations from data may be on its way out. With access to enough data, the statistics themselves become significant. “Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity,” wrote Anderson.
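To make the idea concrete, here is a minimal sketch of correlation-first analysis. It is a hypothetical illustration, not drawn from Anderson's article: the signals, numbers, and variable names are invented, and the point is only that the relationship is quantified directly, with no model of why the two series move together.

```python
import numpy as np

# Hypothetical example: two behavioral signals tracked at scale,
# e.g. daily search volume for a term and daily sales of a product.
rng = np.random.default_rng(0)
searches = rng.normal(1000, 50, size=365)
sales = 0.8 * searches + rng.normal(0, 30, size=365)

# Correlation-first analysis: measure how strongly the signals
# co-vary, without any causal or mechanistic model behind them.
r = np.corrcoef(searches, sales)[0, 1]
print(f"Pearson correlation: {r:.2f}")
```

In this data-first view, a strong correlation is itself the finding; explaining the mechanism behind it is left as a separate, later question.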

This use of data without context has huge implications for research. If the statistics can show “this is what is happening” before a study is fully under way, getting people on board to find out how and why might be easier. Demonstrating correlation up front could make it considerably easier to find support for research into the underlying mechanisms.

A program called Cluster Exploratory has been developed to fund research designed to run on a large-scale computing platform. It could be the first of many funding programs for research based on findings derived from this kind of data, and it could lead to substantial scientific results. Anderson wrote, “Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.”