The Significance of Managing Research Data

This post initially appeared on the UC3 Blog.

Some of the most influential research tools of the last century were created to ensure the quality of beer and extrapolate the results of agriculture experiments conducted in the English countryside. Though ostensibly about the placement of a decimal point, an ongoing debate about the application of these tools also provides a window for understanding what it actually means to manage research data.

The p-value: A very quick introduction

Though now ubiquitous in experiment-based research, statistical techniques for extending inferences from small samples (e.g. the participants in a research study) to larger populations are actually a relatively recent invention. The t-test, an early and still widely used example of “small sample” statistics, was developed by William Sealy Gosset in the early 20th century as an economical way of ensuring the quality of stout. Several years later, while assisting with long-term experiments on wheat and grass at Rothamsted Experimental Station, Ronald Fisher would build on the work of Gosset and others to develop a statistical framework based around the idea of comparing observations to the null hypothesis: the position that there is no significant difference between two or more specified sets of observations.

In Fisher’s significance testing framework, devices like t-tests are tests of the null hypothesis. The results of these tests indicate the likelihood of observing a result at least as extreme as the one obtained if the null hypothesis were true. The logic is a little tricky, but the core idea is that these tests give researchers a way of gauging whether their data could plausibly be the result of sampling or experimental error alone. In quantitative terms, this likelihood is known as a p-value. In his highly influential 1925 book, Statistical Methods for Research Workers, Fisher would introduce an informal threshold for rejecting the null hypothesis: p < 0.05.

In one of the most influential sentences in modern research methodology, Ronald Fisher describes p = 0.05 as a convenient point for judging the significance of a statistical test. From: Fisher, R.A. (1925). Statistical Methods for Research Workers.
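
To make the mechanics concrete, here is a minimal sketch (not taken from the original post) of how a two-sample t-test yields a p-value, written in Python with NumPy and SciPy; the groups and simulated values are purely illustrative.

# Illustrative example only: two simulated samples are compared with
# Student's t-test, and the resulting p-value is checked against
# Fisher's conventional 0.05 threshold. All values here are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical control and treatment measurements; under the null
# hypothesis the two groups share the same underlying mean.
control = rng.normal(loc=10.0, scale=2.0, size=30)
treatment = rng.normal(loc=11.5, scale=2.0, size=30)

# The p-value is the probability of a t-statistic at least this extreme
# if the null hypothesis were true.
t_statistic, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_statistic:.2f}, p = {p_value:.4f}")
print("p < 0.05?", p_value < 0.05)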

Despite the vehement objections of all three men, Fisher’s work would later be synthesized with that of the statisticians Jerzy Neyman and Egon Pearson into a suite of tools that is still widely used in many fields of research. In practice, p < 0.05 has since become a one-size-fits-all indicator of success. It has been acknowledged for decades that work meeting this criterion is generally more likely to be reported in the scholarly literature, while work that does not is generally relegated to the proverbial file drawer.

Beyond p < 0.05

The p < 0.05 threshold has become a flashpoint in the ongoing conversation about research practices, reproducibility, and replicability. Heated arguments about the use and misuse of p-values have persisted for decades, but over the summer a group of 72 influential researchers proposed a seemingly simple step forward: change the threshold from 0.05 to 0.005. According to the authors, “Reducing the p-value threshold for claims of new discoveries to 0.005 is an actionable step that will immediately improve reproducibility.”

As of this writing, two responses have been published. Both weigh the pros and cons of p < 0.005 and argue that the placement of a decimal point is less of a problem than the uncritical use of a single one-size-fits-all threshold across many different circumstances and fields of research. Both end with calls for greater transparency and stronger justifications for how decisions related to research design and statistical practice are made. If the initial paper proposed changing the answer from p < 0.05 to p < 0.005, both responses highlight the necessity of changing the question from one that is focused on statistics to one that incorporates research data management (RDM).
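
One way to see why a single threshold carries different costs in different settings is to look at how sample-size requirements shift when the threshold is tightened. The sketch below is a hypothetical illustration, not drawn from either response; it uses the statsmodels power module and assumes a two-sample t-test with a medium standardized effect size and 80% power.

# Hypothetical illustration: how many participants per group would a
# two-sample t-test need to detect a medium effect (d = 0.5) with 80%
# power at each threshold? These assumptions are for illustration only.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.005):
    n_per_group = analysis.solve_power(effect_size=0.5, alpha=alpha, power=0.8)
    print(f"alpha = {alpha}: about {n_per_group:.0f} participants per group")

Holding everything else constant, the stricter threshold demands noticeably larger samples, which is one reason the responses call for justifying design and analysis decisions in context rather than adopting another single fixed value.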

Ensuring that data can be used and evaluated in the future is one of the primary goals of RDM. For example, the RDM guide we’re developing does not have a space for assessing p-values. Instead, its focus is assessing and advancing practices related to planning for, saving, and documenting data and other research products. Such practices come with their own nuances, learning curves, and jargon, but they are important elements of any effort to ensure that research decisions are transparent and justified.

Resources and Additional Reading

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., … & Cesarini, D. (2017). Redefine statistical significance. Nature Human Behaviour. doi: 10.1038/s41562-017-0189-z

Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A. J., Argamon, S. E., … Zwaan, R. A. (2017). Justify your alpha: A response to “Redefine statistical significance”. PsyArXiv preprint. doi: 10.17605/OSF.IO/9S3Y6

McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2017). Abandon statistical significance. arXiv preprint. arXiv: 1709.07588.

Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. Journal of the American Statistical Association, 54(285), 30-34. doi: 10.1080/01621459.1959.10501497

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638-641. doi: 10.1037/0033-2909.86.3.638

Characterizing Data Management Practices in Neuroscience

The following proposal was submitted as part of an application for an RDA Early Career Data Share Fellowship.

Research that provides neuroscience-based explanations for human behavior has historically enjoyed a high level of credibility and interest among both the scientific community and non-expert audiences. Unfortunately, the integrity of the methods underlying this research has recently been called into question due to the application of sub-optimal statistical practices and the discovery of long-standing software errors. As researchers in neuroscience, like those in psychology and medicine, deal with a crisis of reproducibility, the field has begun to cohere around a set of standards regarding how data should be collected, analyzed, and reported. While these standards address a range of issues related to experimental design, collaboration, and openness, they largely neglect a practical issue central to all three: how researchers manage their data and code.

Because of the size and complexity of the datasets involved, research data management (RDM) is vitally important in neuroscience. For example, projects involving neuroimaging methods like functional magnetic resonance imaging (fMRI) involve the acquisition and analysis of data in the form of brain images, questionnaire responses, behavioral measures, and sensitive medical history information. In terms of both the financial investment and researcher hours involved, fMRI research is quite expensive. Open data sharing has been proposed as a method for maximizing the value of individual fMRI datasets and increasing reproducibility for the field as a whole. But, because there has been little effort to standardize RDM practices within or between labs, neuroscience data sharing initiatives have been hampered by the fact that researchers often organize, document, and save their data and code in very different ways.

The value of RDM can be established using capability maturity models, which describe the extent to which practices are defined, standardized, and implemented. However, at present, there is a lack of empirical information about the maturity of RDM practices within neuroscience and how such practices relate to either reproducibility or open data sharing. Therefore, I propose a twelve-month project with the following three specific aims:

  1. Characterize the breadth and maturity of RDM practices in neuroscience, a community which has a strong incentive to define data-related standards and practices, but also a complex and diverse set of data-related requirements.
  2. Examine the relationship between RDM maturity and data sharing practices in neuroscience, especially practices related to the assessment, citation, and publication of data and code.
  3. Develop a research instrument that can be used to characterize the breadth and maturity of data sharing practices within and between cognate research areas (e.g. psychology, biology, and medicine).

The critical difference between this project and related efforts like the Digital Curation Profiles and DMVitals is that I aim to engage researchers on their own terms. To address my first two specific aims, I will draw upon my previous work in neuroscience and my current work in software and data curation to create a survey instrument that will frame questions about RDM maturity, reproducibility, and data sharing within a discipline-specific context. I will distribute my survey as widely as possible so that I can collect and share data in a form that will be recognized and accepted by members of the research community. I will then apply my experience collecting, analyzing, and disseminating data from the neuroscience community to develop the instrument described by my third specific aim.

An RDA/US Data Share Fellowship would benefit me both in my current role as a postdoctoral fellow at the California Digital Library (CDL) and in my long-term career trajectory as a digital curation specialist invested in ensuring the integrity of the research process through open data sharing. At CDL, I am currently working on two efforts that serve as precursors to this project: a rubric allowing researchers to assess the maturity of their own RDM practices and a survey of how researchers across the academy use, share, and value software and computer code. The proposed project would not only serve as a complement to my current cross-disciplinary work, but would also allow me to directly engage with the practices, perceptions, and concerns of the research community.

Successfully addressing my three specific aims will require me to engage with a wide variety of research data stakeholders, including many whose standards, practices, and motivations for managing research data and code differ significantly from my own. This fellowship would provide a launchpad for me to both raise my profile and extend my reach within the neuroscience and digital curation communities. Perhaps more significantly, this fellowship would also allow me to cement my connection to the international community of individuals, organizations and policy makers invested in supporting the open sharing of data.

As an RDA/US Data Share Fellow, I propose to work directly with the Reproducibility Interest Group and the Libraries for Research Data Interest Group. I would also be interested in engaging with other groups, such as the Education and Training on Handling of Research Data Interest Group and the Active Data Management Plans Interest Group, that are also working to develop best practices, policies, and standards related to the management of data and code.

Timeline

Because researchers in neuroscience have only just begun to define data-related standards and practices, now is the ideal time to gather data on discipline-specific RDM, reproducibility, and data-sharing practices. Twelve months would give me sufficient time to develop my initial survey instrument, solicit feedback from the neuroscience and digital curation communities, and begin the preliminary data collection needed to inform a more broadly focused survey instrument.