Characterizing Data Management Practices in Neuroscience

The following proposal was submitted as part of an application for an RDA Early Career Data Share Fellowship.

Research that provides neuroscience-based explanations for human behavior has historically enjoyed a high level of credibility and interest among both the scientific community and non-expert audiences. Unfortunately, the integrity of the methods underlying this research has recently been called into question due to the use of sub-optimal statistical practices and the discovery of long-standing software errors. As researchers in neuroscience, like those in psychology and medicine, deal with a crisis of reproducibility, the field has begun to cohere around a set of standards regarding how data should be collected, analyzed, and reported. While these standards address a range of issues related to experimental design, collaboration, and openness, they largely neglect a practical issue central to all three: how researchers manage their data and code.

Because of the size and complexity of the datasets involved, research data management (RDM) is vitally important in neuroscience. For example, projects using neuroimaging methods like functional magnetic resonance imaging (fMRI) involve the acquisition and analysis of data in the form of brain images, questionnaire responses, behavioral measures, and sensitive medical history information. In terms of both financial investment and researcher hours, fMRI research is quite expensive. Open data sharing has been proposed as a method for maximizing the value of individual fMRI datasets and increasing reproducibility for the field as a whole. However, because there has been little effort to standardize RDM practices within or between labs, neuroscience data sharing initiatives have been hampered by the fact that researchers often organize, document, and save their data and code in very different ways.

The state of RDM within a research community can be assessed using capability maturity models, which describe the extent to which practices are defined, standardized, and implemented. However, at present, there is a lack of empirical information about the maturity of RDM practices within neuroscience and how such practices relate to either reproducibility or open data sharing. Therefore, I propose a twelve-month project with the following three specific aims:

  1. Characterize the breadth and maturity of RDM practices in neuroscience, a community that has a strong incentive to define data-related standards and practices, but also a complex and diverse set of data-related requirements.
  2. Examine the relationship between RDM maturity and data sharing practices in neuroscience, especially practices related to the assessment, citation, and publication of data and code.
  3. Develop a research instrument that can be used to characterize the breadth and maturity of data sharing practices within and between cognate research areas (e.g., psychology, biology, and medicine).

The critical difference between this project and related efforts like the Digital Curation Profiles and DMVitals is that I aim to engage researchers on their own terms. To address my first two specific aims, I will draw upon my previous work in neuroscience and my current work in software and data curation to create a survey instrument that frames questions about RDM maturity, reproducibility, and data sharing within a discipline-specific context. I will distribute my survey as widely as possible so that I can collect and share data in a form that will be recognized and accepted by members of the research community. I will then apply my experience collecting, analyzing, and disseminating data from the neuroscience community to develop the instrument described by my third specific aim.

An RDA/US Data Share Fellowship would benefit me both in my current role as a postdoctoral fellow at the California Digital Library (CDL) and in my long-term career trajectory as a digital curation specialist invested in ensuring the integrity of the research process through open data sharing. At CDL, I am currently working on two efforts that serve as precursors to this project: a rubric allowing researchers to assess the maturity of their own RDM practices and a survey of how researchers across the academy use, share, and value software and computer code. The proposed project would not only complement my current cross-disciplinary work, but would also allow me to engage directly with the practices, perceptions, and concerns of the research community.

Successfully addressing my three specific aims will require me to engage with a wide variety of research data stakeholders, including many whose standards, practices, and motivations for managing research data and code differ significantly from my own. This fellowship would provide a launchpad for me to both raise my profile and extend my reach within the neuroscience and digital curation communities. Perhaps more significantly, this fellowship would also allow me to cement my connection to the international community of individuals, organizations, and policymakers invested in supporting the open sharing of data.

As an RDA/US Data Share Fellow, I propose to work directly with the Reproducibility Interest Group and the Libraries for Research Data Interest Group. I would also be interested in engaging with other groups, such as the Education and Training on Handling of Research Data Interest Group and the Active Data Management Plans Interest Group, that are also working to develop best practices, policies, and standards related to the management of data and code.

Timeline

Because researchers in neuroscience have only just begun to define data-related standards and practices, now is the ideal time to gather data on discipline-specific RDM, reproducibility, and data-sharing practices. Twelve months would give me sufficient time to develop my initial survey instrument, solicit feedback from the neuroscience and digital curation communities, and begin the preliminary data collection needed to inform a more broadly focused survey instrument.