Research Data Management – a new area for the library

During spring 2017, the library took part in a training course on managing research data. The course was led by the Swedish National Data Service (SND) and covered several aspects of managing research data: creating data management plans, describing data, file management, archiving, and making data available to others. For three full days, the library team for research support, along with archivist and legal experts at the university, studied and discussed these issues. The course was very rewarding and gave a deeper insight into how complex these issues are, not least the legal aspects of data management. This becomes especially clear when research involves people and personal information is handled.

After the course at SND, a training package for researchers was planned and then tested in late spring. Over two half days, four researchers at the university took part in lectures and workshops on the management of research data, focusing on their own data. Being able to deepen the discussions with researchers who are experts on their own data was rewarding for everyone involved, including the researchers. The researchers who took part in the training were Daniel Ekwall, Helena Francke, Katarina Karlsson and Laura Darcy.

The first half day was about data management plans. Data management plans are not really new to the research process. What’s new is that the data management plan is a coherent document answering all questions about why and how data is collected, how it is preserved, and who has access to it. This document needs to be updated continuously during the research process. Previously, similar issues may have been raised in research applications, but not at the same level of detail. Some tools that could facilitate the work on data management plans were demonstrated.

The second half day was used to talk about legal aspects of data management and archiving of research data. The focus was on the new data protection regulation, which will come into force in May 2018. The four researchers had many questions regarding the handling of personal data in the light of the new regulation.

The course at SND will be the basis for establishing a working group at the University of Borås, whose task is to assist researchers with data management plans, archiving research data and making research data accessible. Currently, the prospective group is called the Data Access Unit (DAU). Similar work is ongoing at most Swedish universities, since archiving and open research data are high on the EU agenda (Horizon 2020, for example, requires open research data). In Sweden it is assumed that many research funders will in the future require both a data management plan in the funding application and open access to research data.

Do you want us to come to your research group for a conversation or workshop about research data and data management plans? Please contact us!

Read previous posts about research data in Forskningsrelaterat.

What is a data management plan good for?

"Data management plan" (DMP) is a term that we hear quite often nowadays. Outside Sweden, it is not uncommon that researchers have to submit a DMP to be able to apply for research funds. This will probably be the case in Sweden as well in the future, so it is important that researchers in Sweden know what a DMP is and how they can use it for their own benefit.

A DMP is pretty much what it sounds like – it’s a document describing how you plan to manage your data during and after a research project. The document describes things like how the data will be stored, whether and how you are going to make the data freely available, and what kind of data you are working with. But a DMP is more than just an administrative document. Above all, it is a document that helps the researcher simplify the research process. If you use it, many tasks related to your research will benefit.

The most important aspect of the DMP is that it makes it significantly easier for the researcher to return to a research project at a later point in time. A well-structured and well-documented DMP gives you an overview of what data you have used before, what role they play in the research, and why you made certain decisions.

What does a data management plan contain?

There are several guides to what a DMP should contain. The Digital Curation Centre, for example, has a checklist that lists questions regarding administrative aspects, data collection, metadata and documentation, and more. By going through the checklist and answering the questions, you will take a stance on several important issues regarding your research.

A living document

A DMP is not supposed to be just an administrative task when applying for research funds. You should update the document regularly as you make new decisions in your research. If, for example, you make changes to your data, such as removing or adding a column or changing a definition, you should record this in your DMP. This way you can always go back and check the exact process of your work.
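
As a minimal sketch of what such a record could look like, the Python snippet below keeps a dated change log alongside a DMP. The structure and the names (`DmpChange`, `record_change`, `format_log`, and the example column names) are assumptions made for illustration; they are not part of any DMP standard or tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DmpChange:
    """One dated decision about the data (hypothetical structure)."""
    when: date
    what: str  # what was changed in the data
    why: str   # the reasoning behind the decision


def record_change(log, when, what, why):
    """Append a change to the log and return the new entry."""
    entry = DmpChange(when=when, what=what, why=why)
    log.append(entry)
    return entry


def format_log(log):
    """Render the log as plain text, oldest change first."""
    ordered = sorted(log, key=lambda change: change.when)
    return "\n".join(
        f"{change.when.isoformat()}: {change.what} ({change.why})"
        for change in ordered
    )


# Illustrative entries only; the column names are made up.
change_log = []
record_change(change_log, date(2017, 5, 2),
              "Removed column 'postcode'", "not needed for the analysis")
record_change(change_log, date(2017, 3, 14),
              "Added column 'age_group'", "derived from 'age' for reporting")
print(format_log(change_log))
```

Keeping the reason next to each change is the point of the sketch: the log then answers not only what was done to the data but also why.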

Why should you write a data management plan?

There are several reasons to write a DMP, apart from the obvious one that it might be needed to obtain funding. A DMP is a good way to structure the research process and to reflect ahead of time on several important decisions about the research. If you work in a research team, a DMP can help distribute areas of responsibility between the team members. The DMP also makes it easier to describe and plan for your research data, both for making them freely available and for making sure you can reuse them yourself at a later time. If someone questions your research, you have a document where every decision is recorded. This makes it much easier to defend the choices you made during the research, even several years after the project has ended. Lastly, you may want to return to a research project a couple of years later. A DMP makes it easy to read up on everything that you did, and makes sure that you don’t forget anything important concerning the project or its data.

Kristoffer Karlsson

Who owns the research data?

Who actually owns the data that scientists collect? Who has the right to request to see collected research data? Can researchers take their collected data with them if they start working at another institution? When it comes to research data, there are many questions. In this post I hope to clear up some of them.

Who owns the research data that researchers collect? The institution where the researcher works is responsible for the research conducted there. That means that the institution has ownership of the research data collected by its researchers. According to the Freedom of the Press Act (1949:105) and the Public Access and Secrecy Act (2009:400), the institution is responsible for archiving the research data and giving access to it, as well as for protecting it from unauthorized access.

Can scientists take their research data with them if they start working at another university? Since the institution has ownership of the research data, the researcher is not entitled to take the data to another institution without the approval of the institution where the data were collected. The researcher can request to have the research data released under the Public Access and Secrecy Act.

Who has the right to request research data? Thanks to the Public Access and Secrecy Act, all Swedish citizens have the right to request to see collected research data. The institution shall promptly make the requested research data available to the person who has requested it. The exception is if the requested data are classified. In such cases, the university may deny the request.

When is research data classified? Research data concerning, for example, study participants’ health or sex life, psychological studies, or health condition and personal circumstances may be classified. That a document is classified means that not everyone has the right to access the data, and that the institution may deny a request to access it.

What is a public document? Any text or image, as well as any recording that can be read, listened to, or otherwise comprehended only with technical aids, is defined as a document (Freedom of the Press Act (1949:105)). Any document that is held by a public authority, having been received or prepared by that authority, counts as a public document. Virtually all research documents, in the form of surveys and questionnaires, video and audio recordings and more, are considered public documents, and as long as they are not classified, anyone can make a request to access them.

Kristoffer Karlsson

Data quality & documentation

It is international Love Your Data Week. The aim of the week is to draw attention to research data and help researchers improve their data management. This blog post is about data quality and about the documentation and description of research data. Data management is often assumed to concern only quantitative data, but qualitative data must also be documented and described to ensure data quality. However, researchers with qualitative data seem to be more concerned with ethical issues such as anonymization, confidentiality and the risk of someone using qualitative data to answer research questions the data were not collected for.

Data quality is about the quality of the content, that is, the values in a dataset. This means that data have to be complete (all data must be there), accurate and current. Validity and consistency are further dimensions of data quality. Good data quality also means that data are useful, documented and reproducible/verifiable.
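
To make a few of these dimensions concrete, here is a minimal Python sketch that checks a small made-up dataset for completeness, validity and consistency. The records, the field names and the rules (for example, that an age must lie between 0 and 120) are assumptions chosen for illustration; real checks depend on the dataset and its documentation.

```python
# Made-up survey records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "birth_year": 1983, "survey_year": 2017},
    {"id": 2, "age": None, "birth_year": 1990, "survey_year": 2017},
    {"id": 3, "age": 150, "birth_year": 1867, "survey_year": 2017},
    {"id": 4, "age": 40, "birth_year": 1990, "survey_year": 2017},
]


def is_complete(rec):
    """Completeness: every field has a value."""
    return all(value is not None for value in rec.values())


def is_valid(rec):
    """Validity: age lies in an assumed plausible range (0 to 120).

    A missing age cannot be judged here; completeness covers that case.
    """
    age = rec["age"]
    return age is None or 0 <= age <= 120


def is_consistent(rec):
    """Consistency: age agrees with the two year fields, within one year."""
    if rec["age"] is None:
        return True
    expected_age = rec["survey_year"] - rec["birth_year"]
    return abs(expected_age - rec["age"]) <= 1


def quality_report(recs):
    """Return the ids of the records that fail each check."""
    report = {"incomplete": [], "invalid": [], "inconsistent": []}
    for rec in recs:
        if not is_complete(rec):
            report["incomplete"].append(rec["id"])
        if not is_valid(rec):
            report["invalid"].append(rec["id"])
        if not is_consistent(rec):
            report["inconsistent"].append(rec["id"])
    return report


print(quality_report(records))
```

Checks like these only flag suspicious records. Deciding whether a flagged value is actually wrong still requires the documentation and the judgement of someone who knows the data.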

Collecting data, storage/access and formatting are activities that affect data quality. Responsibility for these activities lies with both the data provider and the data curator. Data curation is often taken care of by archivists and librarians. Archivists make sure datasets are preserved and librarians add metadata to datasets. It is also often librarians who make data available to others. Just remember that a dataset is not automatically available even if it is archived and preserved.

Documentation of data is about increasing transparency and trust in the research process. It has to do with validation, reproducibility and reusability. Documenting data contributes to data quality and to the usability of the data for the researcher him/herself, colleagues, students and others. The write-up process can become more efficient and less stressful when the data is well described and has a well-thought-out structure. The work of a research group also becomes easier when the data is thoroughly described and structured, and possible questions during the peer review process are easier to answer.

Research today is measured with different indicators, one of which is the number of citations. If datasets are citable, this might give a researcher an advantage in funding applications and promotion processes. Documentation also increases the integrity of research because the process becomes more transparent. Public trust in research might increase as well. Even if public trust does not increase, documentation might at least increase trust among colleagues in research.

Harvard Business Review (HBR) has published an article in which IBM estimates the yearly cost of bad data at up to $3.1 trillion in the USA alone in 2016. So there is a lot to be done when it comes to data quality. The cost estimate is based on the time and costs that arise when decision makers, managers, knowledge workers and data scientists have to correct bad data in order to be able to do their work. This cost relates to organizations where, for example, a sales department gets an order wrong and the faulty data is then inherited by another department, not to data produced in a research context. Nevertheless, it is important to consider costs (not necessarily money) in research.

Retraction Watch, a blog tracking retracted publications, reports on a case where a researcher noticed problems in the database he was using to investigate trends in extinction patterns. The problem the researcher noticed had an impact on two of his publications. One of them is now retracted. In this case there were problems in the data collection and in the database used, which affected the conclusions. When the problems were corrected, the results of the study changed.

Here you can find examples of bad data. Click on the image to see an explanation of what the problem with the dataset is.

Pieta Eklund