Publish trait data together with metadata

(based on Rule 8 of the article doi:10.1111/2041-210X.14033)

Openly publish trait data to help answer questions that the original study did not anticipate, lay the groundwork for understanding ecological processes beyond clear-cut niches (Elton, 1927; Schneider et al., 2019), and democratise access to valuable trait datasets (Soranno et al., 2015). Each trait measurement has considerable value for the scientific community and for future generations working on trait-related research questions.

Consider the stakeholders:

As our scholarly processes evolve to better find, access, integrate, and reuse scientific data, we face the communal task of treating trait datasets as first-class research citizens. Doing so is not easy, however, because it involves different stakeholders: publishers have to make their publications open and FAIR (Wilkinson et al., 2016), scientists have to improve their skills in publishing, reusing, and correctly citing datasets, and funding agencies have to find ways to reward exemplary projects. A welcome development is that many publishers now consider trait data papers (e.g., Falster et al., 2021; Guerrero-Ramírez et al., 2021; Tobias et al., 2022; Vandvik et al., 2020), which allow for a detailed description of methods and context, open access, and, at the same time, credit for trait data collectors through citations.

Accept the additional responsibility:

Erroneous data might bias not only a current project but also the future work of others. Currently, there are no established common practices for extending peer review to trait data. One way to ensure that a dataset conforms to community standards is to submit it to an established curated database (e.g. TRY (Kattge et al., 2020) for plant traits; Coral Traits (Madin et al., 2016) for anthozoans). Further, consider publicly depositing both raw and processed data and clearly differentiating between the two. This makes it possible to trace errors introduced during processing and gives future users access to the original values.

Aim for redundancy:

Public trait data suffer from the same generic issues as other data, e.g. hardware failures, link rot (URLs that stop resolving), or content drift (the content changes while the URL does not; Koehler, 1999). To mitigate such issues and reliably preserve data in the long term, data can be submitted to multiple repositories: besides trait databases, also to general storage platforms such as FigShare (https://figshare.com) or Zenodo (https://zenodo.org). This procedure, however, requires systematic methods to track changes and to provide separately citable versions, e.g. through unique DOIs.
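One simple way to notice content drift between redundant copies is to compare cryptographic checksums of the deposited files. The following is a minimal sketch rather than an established workflow: the function names and the repository labels passed in are hypothetical, and it assumes each copy has already been downloaded as raw bytes.

```python
import hashlib
from typing import List, Mapping


def sha256_digest(payload: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(payload).hexdigest()


def mismatching_copies(reference: bytes, copies: Mapping[str, bytes]) -> List[str]:
    """List the (hypothetical) repository labels whose copy differs from the reference."""
    expected = sha256_digest(reference)
    return sorted(
        name for name, payload in copies.items()
        if sha256_digest(payload) != expected
    )
```

Publishing the digest alongside each citable version would let future users check that the file they retrieved is the one that was deposited.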

Make data accessible for machines and humans:

To facilitate trait data reuse in general, machine-readable and non-proprietary data formats should be preferred (e.g. plain CSV over Excel or PDF). The license under which data are released should also be chosen carefully (e.g., CC0, Creative Commons, 2009). When already published data are republished, future studies run the risk of drawing the same trait measurement from seemingly independent sources, which results in pseudo-replication of measurements. This makes it important to keep data traceable throughout their life cycle, especially because trait data collections often carry large numbers of references and republished original data. Data and reference tracing thus particularly calls for systematic, reproducible and automated methods (Elliott et al., 2020) that rely on machine-readable data.
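When every record carries a machine-readable reference to its original source, potential pseudo-replicates can be flagged automatically. The sketch below illustrates the idea under stated assumptions: the column names (species, trait, value, original_source, compiled_in) are hypothetical and not taken from any particular trait database, and matching on identical text values is a deliberate simplification.

```python
import csv
import io
from collections import defaultdict


def find_pseudo_replicates(csv_text: str) -> dict:
    """Flag measurements that reach the user through more than one compilation.

    Rows are grouped by (species, trait, value, original_source); a group that
    appears in several compilations is the same measurement counted repeatedly.
    """
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["species"], row["trait"], row["value"], row["original_source"])
        groups[key].add(row["compiled_in"])
    # Keep only measurements found in two or more compilations.
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}
```

A check of this kind only works if the original reference survives each republication, which is one reason to keep it as a dedicated column in plain CSV rather than in free-text notes.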

Register trait data:

Independent of where the data are actually deposited, it is important that datasets are registered in a trait data registry (e.g., https://opentraits.org) to allow fellow scientists to find the data quickly.