What’s the Real Impact of Poor Scientific Data Management?

The amount of research data being generated is currently increasing by 30% every year. Worryingly, one study found that the odds of being able to source a dataset decline by 17% each year, and that a massive 80% of scientific data are lost within two decades (Vines et al., 2013). It’s stats like these that are making research data management a pressing issue for the scientific community, not just for lab management teams, but for every individual researcher.

In our discussions with scientists about the challenges they face in their work, the difficulty of managing and accessing data is one of the issues cited most often. In one extreme case, biologist Billy Hinchen told us, “I lost 400GB of data and close to 4 years of work after my laptop was stolen. As a result I ended up getting an M.Phil rather than a PhD.”

A growing challenge

The concern is that as data output grows, effective data organization is only going to get more difficult. And if data continue to be managed poorly, then science will ultimately suffer. At best, experiments will be hard to replicate and findings called into question. At worst, papers will be retracted and careers impacted.

We’ve carried out some investigation into the statistics around data management, and have produced an infographic (https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/) that tells the story of the impact of poor scientific data management.

The infographic sets out the key reasons to protect your scientific data, along with the statistics to back them up. Despite rapidly growing data output and significant investment in scientific research and development, data are still not being managed effectively. Much of the data remain unverifiable, and both time and money are being wasted, to the detriment of science and society. In response, funders now require data management and sharing policies, and 34 countries have signed up to the “Declaration on Access to Research Data from Public Funding.”

Our view is that it’s time to start protecting and managing scientific data better. But how can this be done?

Tools for data management

There is a growing effort among start-ups to resolve this data management problem, and Digital Science is one such business leading the way with software designed to make research more efficient. Two of our tools are already proving popular by helping researchers better manage their data. figshare is an open data tool that allows researchers to store their outputs securely in the cloud, share them privately with lab-mates and collaborators, or make them public in the name of open research with a permanent Digital Object Identifier (DOI). Projects is a new application that helps scientists stay on top of all their research with a simple, safe, and structured way to manage and organize data files on the desktop.

Other options for data management involve trying to make the raft of generic tools fit into scientific workflows. Examples used by the scientific community include organizational apps like Evernote, cloud storage services like Dropbox and Google Drive, and code hosting sites like GitHub. However, because they were not designed specifically for scientists, they have their drawbacks. There are also various electronic laboratory notebooks, such as Labguru, that help scientists share data easily as a team and collect notes and metadata about their research and protocols.

Have you been affected by data availability issues? We’re keen to hear your stories.

Nathan Westgarth is Product Manager for Research Tools, Digital Science; www.digital-science.com.