Published December 17, 2021 | Version v1
Dataset Open

DeepOrchidSeries: A Sentinel-2 Dataset to inform convolutional SDMs with twelve-month Sentinel-2 image time-series, Orchid family

  • 1. Inria, Montpellier, France; LIRMM, Univ Montpellier, CNRS, Montpellier, France
  • 2. LIRMM, Univ Montpellier, CNRS, Montpellier, France; AMIS, Université Paul Valéry Montpellier, CNRS, Montpellier, France
  • 3. AMAP, Univ Montpellier, CIRAD, CNRS, INRAE, IRD, Montpellier, France; CIRAD, UMR AMAP, Montpellier, France
  • 4. LIPHY, Université Grenoble Alpes, Grenoble, France

Description

Deep Species Distribution Modelling from Sentinel-2 Image Time-series: a Global Scale Analysis on the Orchid Family 

  • The DeepOrchidSeries dataset gathers Sentinel-2 image time-series around geolocated orchid occurrences. Seasonal changes in the habitats are captured in the twelve-month RGB/IR time-series with 640x640m spatial resolution. It allows novel Species Distribution Models (SDMs) coupled with convolutional networks to take advantage of both spatial and temporal information.
  • Our associated article describes the modeling choices made to shape this dataset. It has been submitted to https://www.frontiersin.org/research-topics/18336/plant-biodiversity-science-in-the-era-of-artificial-intelligence. We believe such global data, methods and scripts are valuable to the conservation ecology community, and especially to deep-SDM users. To our knowledge, no similar ready-to-use dataset is available. In the article, the dataset's temporal dimension is shown to significantly improve SDM performance.
  • sen2patch is the GitLab project gathering the code used to create such a dataset. It is available at https://gitlab.inria.fr/jestopin/sen2patch.
  • DeepOrchidSeries.csv contains all occurrence-level information.
    • We advise loading it with:
      import pandas as pd
      df = pd.read_csv("path/to/DeepOrchidSeries.csv", sep=';')
      
      df.columns
      ['gbifid', 'canonical_name', 'decimallatitude', 'decimallongitude', 'speciesKey', 'cell_index', 'bot_country', 'bot_code', 'lvl2_code', 'continent_code']
      • 'gbifid' is the occurrence's GBIF ID
      • 'canonical_name' is the species canonical name
      • 'decimallatitude', 'decimallongitude' are the occurrence coordinates in decimal degrees
      • 'speciesKey' is the species GBIF unique identifier
      • 'cell_index' is a unique cell ID in a 0.0025° lon/lat grid partitioning the Earth (used to stratify train/val/test set by geographic blocks)
      • 'bot_country', 'bot_code', 'lvl2_code', 'continent_code' are geographic subdivisions defined in https://github.com/tdwg/wgsrpd (code and string for WGSRPD level 1, the botanical countries)
  • Initial GBIF query DOI is https://doi.org/10.15468/dl.4bijtu (26 August 2019).
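
The 'cell_index' column is described above as the key for stratifying train/val/test sets by geographic blocks. A minimal sketch of such a split follows: every occurrence of a given cell ends up in the same subset. The function name, the 'subset' column, the fractions and the seed are hypothetical choices made for illustration; they are not the proportions or the procedure used in the article.

```python
import numpy as np
import pandas as pd


def split_by_cell(df, val_frac=0.1, test_frac=0.1, seed=0):
    """Return a copy of df with a 'subset' column (train/val/test).

    All occurrences sharing a cell_index receive the same label, so a
    grid cell never spans two subsets. Fractions apply to the number of
    cells, not to the number of occurrences (illustrative defaults).
    """
    rng = np.random.default_rng(seed)
    # Shuffle the distinct cell IDs reproducibly.
    cells = rng.permutation(df["cell_index"].unique())
    n_test = int(round(test_frac * len(cells)))
    n_val = int(round(val_frac * len(cells)))
    test_cells = cells[:n_test]
    val_cells = cells[n_test:n_test + n_val]
    # Label each occurrence from the subset of its cell.
    subset = np.where(
        df["cell_index"].isin(test_cells), "test",
        np.where(df["cell_index"].isin(val_cells), "val", "train"),
    )
    return df.assign(subset=subset)
```

With the CSV loaded as shown earlier, `split_by_cell(df)["subset"].value_counts()` gives the number of occurrences per subset.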

  • DeepOrchidSeries.tar file contains the satellite image time-series and is available at https://lab.plantnet.org/deeporchidseries/
    • The .tar archive is 286 GB and expands to 432 GB once extracted.
    • Image time-series relative tree paths are constructed from the occurrences' unique GBIF IDs.
    • For a given occurrence gbifid, matching patches are located in: final_dataset_by_gbifid/gbifid[-2:]/gbifid[-4:-2], i.e. in a first folder named after the last two digits of the gbifid and a subfolder named after the two preceding digits. Example: the time-series files matching occurrence 2236837714 are located at final_dataset_by_gbifid/14/77/
    • Each image time-series is composed of twelve 16-bit RGB .png files and twelve 16-bit IR .png files containing data identical to the original L1C products; no lossy compression was applied. There is one RGB and one IR .png file per month.
    • Patches from month MM/YYYY of occurrence gbifid are named RGB_YYYY_MM_gbifid_.png and IR0_YYYY_MM_gbifid_.png.
  • models.zip is the archive containing the weights of the four PyTorch models described in our article, and inception_env.py contains the Inception V3 architecture that was used. index.json contains the dictionary linking the models' class indexes, from 0 to 14128, with our speciesKey labels: {"class_index":speciesKey}.
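
The folder layout, the patch naming scheme and the index.json mapping described above can be sketched as small helpers. This is an illustration under stated assumptions, not code from sen2patch: the function names are hypothetical, MM is assumed to be a zero-padded two-digit month, and index.json is assumed to be a flat JSON object whose keys are class indexes stored as strings.

```python
import json
from pathlib import Path


def series_dir(gbifid, root="final_dataset_by_gbifid"):
    """Folder holding the patches of one occurrence: root/gbifid[-2:]/gbifid[-4:-2]."""
    gid = str(gbifid)
    return Path(root) / gid[-2:] / gid[-4:-2]


def patch_paths(gbifid, year, month, root="final_dataset_by_gbifid"):
    """Expected RGB and IR patch paths for one month of one occurrence."""
    folder = series_dir(gbifid, root)
    stamp = f"{int(year):04d}_{int(month):02d}"  # assumes zero-padded MM
    return (
        folder / f"RGB_{stamp}_{gbifid}_.png",
        folder / f"IR0_{stamp}_{gbifid}_.png",
    )


def list_series(gbifid, root="final_dataset_by_gbifid"):
    """Patches found on disk for one occurrence, sorted by file name.

    A folder is shared by every occurrence with the same last four
    digits, so the gbifid is part of the glob pattern. Sorting by name
    is chronological only if YYYY_MM is zero-padded (assumption).
    """
    folder = series_dir(gbifid, root)
    rgb = sorted(folder.glob(f"RGB_*_{gbifid}_.png"))
    ir = sorted(folder.glob(f"IR0_*_{gbifid}_.png"))
    return rgb, ir


def load_class_to_species(index_json):
    """Read index.json and return {class_index (int): speciesKey}."""
    with open(index_json) as fh:
        raw = json.load(fh)
    # JSON object keys are always strings; convert them to integers.
    return {int(k): v for k, v in raw.items()}
```

For the occurrence used as an example above, `series_dir(2236837714)` points to final_dataset_by_gbifid/14/77, and `list_series(2236837714)` should return twelve RGB and twelve IR paths once the archive has been extracted into the working directory.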

 

  • ACKNOWLEDGMENTS: We warmly thank Alexander Zizka et al. for providing us with the geographically and taxonomically curated set of orchid occurrences. This dataset contains modified Copernicus Sentinel data and Copernicus Service information (2018). The Sentinel-2 MSI data used were available at no cost from the ESA Sentinels Scientific Data Hub.

Files (2.0 GB)

  • md5:b53bfb49a646bbcacc5b44558513d55a (82.3 MB)
  • md5:5a2e00bdd7b2e7af7c2e98fc2d2394e9 (12.0 kB)
  • md5:c44bd093166ac332787a51e9b696c11b (243.2 kB)
  • md5:3e626aef8ba5e73fac53c3a4d2a39439 (1.9 GB)
  • md5:26b357332e4abf2d4c41b45fd22e5e0e (3.9 kB)

Additional details

Related works

Cites
Journal article: 10.1111/cobi.13616 (DOI)

References

  • Zizka, Alexander, et al. "Automated conservation assessment of the orchid family with deep learning." Conservation Biology 35.3 (2021): 897-908.