Consult existing data

(based on Rule 2 of the article doi:10.1111/2041-210X.14033)

Build on existing trait resources to reduce redundancy and ensure compatibility with current data. Whether to collect new trait data generally depends on the research question, the scope of the analysis (e.g. local, global), and the availability of existing data. Financial and geographic constraints may also favour using current trait data over embarking on a measurement campaign. However, existing trait data must be ‘fit for purpose’ to avoid compromising the capacity to answer the research question, and in many cases new trait measurements will still be needed.

Check public data sources:

Most trait data probably exist in decentralised form, as individual datasets attached to publications as raw data, published as data papers, or uploaded to general-purpose public repositories (e.g. Zenodo https://zenodo.org, Dryad https://datadryad.org). However, these datasets can be challenging to find if they are not registered at central hubs (e.g. https://opentraits.org). To counter this challenge, dedicated centralised trait databases have been and continue to be developed (e.g. TRY (Kattge et al., 2020), Encyclopedia of Life (EOL) TraitBank (Parr et al., 2015), Marine Traits Portal of the World Register of Marine Species (WoRMS, Marine Species Traits editorial board), AusTraits (Falster et al., 2021)). Common to these efforts is that they contain values that have already been harmonised, error-checked, and standardised. These resources usually provide user-friendly search interfaces and dynamic, up-to-date aggregations of data. Particularly for larger-scale studies (e.g. many taxa, many bioregions), it often makes sense to consult these large databases and data registries first.

Identify and cite data origins:

Trait data are not always raw or first-hand: they can be created and possibly aggregated from original observations and measurements (e.g. Kattge et al., 2020), mobilised from literature or undigitised legacy trait data (e.g. Parr et al., 2015), synthesised as imputed trait data (e.g. Penone et al., 2014), reused from data publications (e.g. Kattge et al., 2020), or mined from texts or other contexts with automated algorithms (Thessen et al., 2018). Thus, when reusing trait data, it is essential to check where each value comes from and to carry that information through to downstream analyses and subsequent publications (i.e. data provenance). Importantly, providing this information also gives credit to the original trait data collectors.
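As an illustration of carrying provenance alongside trait values, the following minimal Python sketch stores an origin label and a source with every record, flags records whose provenance is incomplete, and tallies the sources that would need to be cited. The `TraitRecord` fields, the origin labels, and the example records are all hypothetical placeholders chosen for this sketch; they are not a published standard or the schema of any particular database.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical origin labels, loosely following the categories named in the text.
ALLOWED_ORIGINS = {"measured", "literature", "imputed", "reused", "text-mined"}


@dataclass(frozen=True)
class TraitRecord:
    species: str
    trait: str
    value: float
    unit: str
    origin: str  # how the value was produced (one of ALLOWED_ORIGINS)
    source: str  # citation or identifier of the contributing dataset


def missing_provenance(records):
    """Return records with an empty source or an unrecognised origin label."""
    return [
        r for r in records
        if not r.source.strip() or r.origin not in ALLOWED_ORIGINS
    ]


def sources_to_cite(records):
    """Count records per source so that every contributing dataset can be credited."""
    return Counter(r.source for r in records if r.source.strip())


# Made-up example records; species, values, and sources are placeholders.
records = [
    TraitRecord("Species A", "plant_height", 1.2, "m", "measured", "Dataset X"),
    TraitRecord("Species B", "plant_height", 0.8, "m", "imputed", "Dataset X"),
    TraitRecord("Species C", "seed_mass", 3.1, "mg", "literature", "Dataset Y"),
    TraitRecord("Species D", "seed_mass", 2.4, "mg", "unknown", ""),
]

flagged = missing_provenance(records)   # records to fix before analysis
citations = sources_to_cite(records)    # datasets to credit in the publication
```

Keeping the origin label separate from the source makes it possible to, for example, exclude imputed values from an analysis while still crediting the dataset they came from.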

Fill the gaps:

Existing databases are taxonomically and biogeographically biased and ‘gappy’, and traits assigned to the same species are rarely collected in the same locations or under the same conditions (Etard et al., 2020; Penone et al., 2014). Despite the presence of large trait databases, new trait collections therefore remain valuable. When collecting new data, we encourage researchers to first check available trait databases, identify such gaps, and contribute to the broader trait community by filling them, even if this collection goes beyond the current project. Additional traits can often be collected with little extra effort, yet help to close gaps in trait coverage. Filling gaps may be especially valuable in biodiverse but hard-to-access regions (Etard et al., 2020), for rare but functionally important species, which may be less likely to have traits documented (Leitão et al., 2016), or for threatened species, which will benefit from functional approaches to their conservation (Gallagher et al., 2021).
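One simple way to identify such gaps before a measurement campaign is to compare the species-by-trait combinations needed for a study against those already available. The sketch below is a minimal illustration under stated assumptions: the function name `coverage_gaps`, the species and trait names, and the set of observed pairs are hypothetical, and in practice the observed pairs would come from a database export rather than being typed in by hand.

```python
def coverage_gaps(observed, species, traits):
    """Summarise trait coverage for a target species list.

    observed: iterable of (species, trait) pairs that already have data.
    Returns (coverage, missing), where coverage maps each trait to the
    fraction of target species with data, and missing lists the
    (species, trait) pairs that still need to be measured.
    """
    if not species:
        raise ValueError("species list must not be empty")
    observed = set(observed)
    missing = [(s, t) for s in species for t in traits if (s, t) not in observed]
    coverage = {
        t: sum((s, t) in observed for s in species) / len(species)
        for t in traits
    }
    return coverage, missing


# Hypothetical target species and traits for a study.
species = ["sp_a", "sp_b", "sp_c"]
traits = ["plant_height", "seed_mass"]

# Hypothetical pairs already available in existing databases.
observed = {
    ("sp_a", "plant_height"),
    ("sp_b", "plant_height"),
    ("sp_a", "seed_mass"),
}

coverage, missing = coverage_gaps(observed, species, traits)
```

Here `missing` serves as a checklist for new measurements, and `coverage` shows which traits are least well covered across the target species.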