Data quality can be defined as an essential and necessary characteristic for data to be “fit for use” or have “potential valuable use” (Chapman, 2005).
When data are used without a critical view of the potential errors they contain, the results of their analysis can lead to erroneous conclusions and unwise decisions based on unreliable evidence. Data will have quality when the information derived from it correctly represents the real world (the facts).
In order for the data generated to influence decision making, it is important to consider the concept of data quality throughout the entire information chain. Listed below are several useful documents and tools for validating, structuring and cleaning biodiversity data.
Tools
Data cleaning and structuring
Name | Description |
---|---|
Darwin Core - Excel Template Generator | This tool, developed by the GBIF Norway node, makes it possible to generate a Darwin Core (DwC) template from checkboxes for each of the elements of the standard. |
Online UUID Generator | In case you do not have unique identifiers for each record or event in your dataset (occurrenceID or eventID), this tool allows you to generate up to 500 Universally Unique Identifiers (UUID). |
EML generator | This tool, developed by the GBIF Norway node, makes it possible to generate an EML file by filling in the standard fields, so that they can later be loaded into an IPT resource. |
Cross-table to list converter | When data are organized in two-dimensional tables, this simple tool, developed by the GBIF Norway node, makes it possible to convert them to a one-dimensional list. |
OpenRefine | It allows you to clean, transform and format data, use web services, mass field correction, among many others. Learn more |
OpenRefine - Scripts for biodiversity Data Quality | Repository of data quality routines implemented in open source software. OpenRefine, based on free, free and easy-to-use software tools. Learn more |
Data Validator | It detects possible problems in the structure and content of the datasets, improving the quality of these to be published through SIBUy, GBIF and OBIS. Learn more |
R Project | It allows to clean and transform data through packets that are uploaded to the software. |
R Studio | Facilitates the visualization of the R Project tool and integrates different functional windows. |
LifeWatch - Data Services | Through the connection with different web services, the tool allows the validation of formats, DwC elements for publication in OBIS, taxonomy and geography. |
IPT - Integrated Publishing Toolkit | The GBIF Publication Tool (IPT) is an open source web application, available free of charge, that facilitates the publication of biodiversity data. During the process of accompanying the data publication, the SIB Uruguay Team will provide you with a user and password for metadata documentation. Learn more |
Dates
Name | Description |
---|---|
Canadensys - Date parsing | Performs massive date conversion to ISO8601 format: YYYYY-MM-DD. Learn more |
Name and taxonomy validation
Name | Description |
---|---|
Species Matching | Normalize species names from a CSV file according to the GBIF taxonomic tree. The file to be submitted must contain a column named ‘scientificName’ and optionally the column ‘kingdom’ (for the Kingdom) and ‘id’ (for an identifier). Learn more |
WoRMS Taxon match | Automatically check a species list or taxon list against the World Register of Marine Species - WoRMS. After matching, the tool will return your file with AphiaIDs, valid names, authorities, WoRMS classification and/or any other output you have selected. Validates max. 1500 records. https://biodiversidad.co/ |
TNRS | The TNRS (Taxonomic Name Resolution Service) tool allows the standardization of botanical scientific names from taxonomic sources, such as Tropicos, USDA y TPL with the dynamic list of the Catalogue of Life. |
Global Names Resolver | Resolves lists of scientific names against known sources. This service separates scientific names, identifies exact or ambiguous matches, and displays a matching tip. |
Regi0 | It is a Python package developed by the Biodiversity Assessment and Monitoring Program of the Alexander von Humboldt Biological Resources Research Institute, with useful functions to complement and verify biological records. These functions are divided into 2 main modules (geographic and taxonomic) and are based on both user data and various web APIs (e.g. GNR, IUCN and Species+). |
GBIF - Name parser | Separates scientific names into their various components from the name entered. It allows to interpret most scientific names and to atomize them independently of their nomenclatural code. |
Global Names Index | It allows correcting and/or linking information about any taxon through a process of “reconciliation” between names, since it contains examples of scientific names written with some variation. |
Geographical cleanliness
Name | Description |
---|---|
Canadensys - Coordinate conversion | Performs mass conversion of geographic coordinates (degrees, minutes and seconds) to decimal degrees. Learn more |
MarineRegions | It is a standard list of globally geo-referenced marine names and areas. It integrates and provides geographic information from the VLIMAR geographic index and the MARBOUND database, and proposes a standard of marine locations, boundaries and georeferenced regions. |
ispecies | It allows to visualize on a map the biological records of a specific species. The records are linked to the GBIF Data Portal, where specimen-specific information can be consulted. |
GEOLocate | Allows georeferencing and confirmation of locations. A desktop application is also available. |
GPS Visualizer | Permite crear mapas y perfiles a partir de datos geográficos. La entrada de los datos puede ser en forma de datos de GPS, rutas, direcciones de calles o coordenadas simples. |
GeoNames | Allows you to create maps and profiles from geographic data. Data input can be in the form of GPS data, routes, street addresses or simple coordinates. |
OBIS map tool | It can be used to geocode locations to match coordinate pairs or coordinate strings in WKT format. WKT strings are textual representations of geometries such as points, polygons and lines. |
Calculadora Geodésica | Allows the conversion or transformation of coordinates in up to 18 different systems. |
OBIS Plotter | It is a very simple tool to quickly check points on a map. It requires input a delimited text format (e.g. CSV or Excel paste) and that the data have a decimal longitude column: ‘decimalLongitude’; and decimal latitude: ‘decimalLatitude’ for the coordinates. Thus, it is possible to select a field of interest from the original table to change the color of the points and the label displayed when a specific point is clicked. Learn more |
geo:truc | Allows to obtain the coordinates of a selected point on the map through google maps. |
CartoDB | Allows you to import and visualize geospatial data by creating dynamic maps. |
infoXY | When entering decimal coordinates the tool returns information about each point, such as the name of the country, department and other political-administrative divisions. If the point falls on the sea the tool calculates the nearest distance to the coast, indicating the name of the country. |
GeoPick | GeoPick is a complementary open source online tool based on the Georeferencing Best Practices (Chapman A.D. and Wieczorek J.R., 2020) that follows its recommendations and practices. Their idea arose from work done at the Museu de Ciències Naturals de Barcelona (MCNB) and the MOBILISE Cost Action. It aims to provide georeferencers with a simple, easy-to-use, yet powerful tool to help them follow georeferencing best practices and data standards (i.e., Darwin Core). The guiding principle of its design is to remain as simple and easy to use as possible. |
Courses
Name | Year | Description |
---|---|---|
Introduction to GBIF | 2021 | This course provides an introduction to GBIF, the data available on the GBIF portal, how to access that data, and information on how to participate in GBIF and its community of practice. |
Biodiversity Data Mobilization Course | 2021 | This course will enable participants to effectively plan and implement biodiversity data mobilization efforts using community accepted standards. It aims to increase the volume, richness and quality of data published through the GBIF network. |
Virtual training cycle SiB Colombia | 2021 | The data labs designed and instructed by EC-SiB will help you to strengthen your skills in data management and publication through SiB Colombia. |
Documents
Data cleaning and structuring
Name | Year | Description |
---|---|---|
Principles of Data Quality | 2005 | Data quality principles and best practices applicable to primary biodiversity data in their taxonomic, temporal and geographic components. |
Guide to cleaning biodiversity data with OpenRefine | 2021 | Guide to openrefine use OpenRefine for validation and cleaning of biodiversity data. |
OpenRefine - Basic Guide | 2020 | Guide to openrefine use OpenRefine for validation and cleaning of biodiversity data. |
OpenRefine - Guide to validation and cleaning of biodiversity data | 2020 | Guide to the use of data quality routines implemented in openrefine open source software environment OpenRefine, and allows to create specific workflows for each dataset (Records, Lists, Events). |
Data Quality - A toolkit for improving primary biodiversity data | 2015 | This document is a compilation of the various tools and practices that attempt to facilitate the process of providing quality primary biodiversity data through different methodologies. |
Validation of geographic information
Name | Year | Description |
---|---|---|
Georeferencing Best Practices | 2020 | The Guide to Good Georeferencing Practices provides guidelines for proper georeferencing. Although it is aimed specifically at biological records, the concepts and methods presented here can be equally useful in other disciplines. |
Georeferencing Quick Reference Guide | 2020 | This document provides guidance on how to georeference using the radius point method. It also provides methods for determining the boundaries of geographic entities, which are the basis of the geometric shape method in georeferencing. |
Georeferencing Calculator Manual | 2020 | The Georeferencing Calculator (Wieczorek & Wieczorek 2020) described in this paper is a tool created to assist in georeferencing descriptive localities. |
Protocol for georeferencing localities | 2016 | This document, developed by SiB Colombia with the support of the Humboldt Institute, defines the methodology for assigning coordinates to primary biodiversity data. |
Good publication practices
Name | Year | Description |
---|---|---|
Current Best Practices for Generalizing Sensitive Species Occurrence Data | 2021 | The objective of this document is to provide good practice (or current good practice) for dealing with sensitive species occurrence data, and to provide guidance on how to make as much data available as possible without exposing the species by the fact that the data have been placed in the public domain. |
Best Practices for Publishing Biodiversity Data from Environmental Impact Assessments | 2021 | This guide aims to help practitioners, consultants and other “interested and affected parties” (I&APs) working with environmental impact assessments to improve the curation, storage and management of primary biodiversity data obtained during environmental impact assessment (EIA) processes and to share the data freely and openly in standardized, accessible and interoperable formats through the Global Biodiversity Information Facility (GBIF). |