5 AI models that help achieve the global nature-positive goal
Artificial intelligence models have become powerful tools for organizations and conservation practitioners working toward the global nature-positive goal. Here are five models that have been adopted and show great potential.
1. Using AI to Map Global Tree Canopies
Recently, Meta and the World Resources Institute (WRI) released a global tree canopy height map at 1-meter resolution, detailed enough to detect individual trees worldwide. Both the AI model and the full canopy height dataset are free and publicly available.
Nature-based carbon removal is essential for achieving the goals of the Paris Agreement. To manage carbon reduction at the scale that climate mitigation requires, monitoring and verification of forest carbon credits must be strengthened globally, particularly by improving the spatial resolution of forest structural data. The dataset jointly released by Meta and WRI establishes a global baseline for canopy height, supporting highly detailed accounting of global forest resources. It was built from available satellite imagery spanning 2009 to 2020 and uses the DINOv2 model for global robustness and rapid inference. The model was trained on 18 million satellite images from around the world, covering over one trillion pixels, and the development team applied powerful self-supervised learning (SSL) methods to obtain a high-resolution foundation model. For more details about this model, you can click the link below.
https://github.com/facebookresearch/HighResCanopyHeight/blob/main/README.md
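Once a canopy height raster tile is in hand, typical downstream accounting boils down to simple per-pixel statistics. The following is a minimal sketch under assumptions of this article (the 4×4 tile, function name, and 5 m tree threshold are all illustrative, not part of the Meta/WRI release):

```python
import numpy as np

def canopy_metrics(height_m: np.ndarray, tree_threshold_m: float = 5.0):
    """Summarize a canopy height raster tile.

    height_m: 2D array of per-pixel canopy heights in meters
              (e.g. one tile of a canopy height map).
    Returns mean tree height and canopy cover fraction.
    """
    trees = height_m >= tree_threshold_m      # pixels counted as tree canopy
    cover_fraction = float(trees.mean())      # share of the tile under canopy
    mean_height = float(height_m[trees].mean()) if trees.any() else 0.0
    return mean_height, cover_fraction

# Hypothetical 4x4 tile of heights in meters
tile = np.array([[0, 0, 6, 12],
                 [0, 3, 8, 10],
                 [1, 5, 9,  0],
                 [0, 0, 7,  0]], dtype=float)
mean_h, cover = canopy_metrics(tile)
```

Aggregating such per-tile metrics over a region is how a 1-meter map translates into forest-resource accounts.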
2. Pytorch-Wildlife: A Deep Learning Platform Revolutionizing Wildlife Conservation
The rapid decline of global biodiversity highlights the urgent need for large-scale wildlife monitoring. Pytorch-Wildlife emerged in response to this challenge. It is an open-source deep learning platform built on PyTorch, designed for creating, modifying, and sharing powerful AI models across a range of application scenarios, including camera trap images, aerial images, underwater images, and bioacoustic data. In projects in the Amazon rainforest and the Galapagos Islands, Pytorch-Wildlife was used to train animal classification models; the Amazon model, which distinguishes 36 animal categories, achieved 92% recognition accuracy. These results demonstrate the platform's efficient performance and broad applicability across environments. For more details about this model, please visit the following link.
https://github.com/microsoft/CameraTraps/blob/main/README.md
Figure: the Pytorch-Wildlife model
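Camera-trap workflows of this kind typically run in two stages: a detector flags frames containing animals, then a classifier names the species. The sketch below illustrates the first-stage filtering step in plain Python; the record format, field names, and confidence threshold are hypothetical and are not the Pytorch-Wildlife API:

```python
# Illustrative confidence cutoff for keeping a detection (assumed value)
DETECTION_THRESHOLD = 0.2

def filter_detections(detections):
    """Keep only confident animal boxes; empty frames are dropped early."""
    return [d for d in detections
            if d["category"] == "animal" and d["conf"] >= DETECTION_THRESHOLD]

frames = [
    {"file": "img_001.jpg", "detections": [{"category": "animal", "conf": 0.91}]},
    {"file": "img_002.jpg", "detections": []},                        # empty frame
    {"file": "img_003.jpg", "detections": [{"category": "animal", "conf": 0.05}]},
]

# Only frames with confident detections proceed to species classification
to_classify = [f["file"] for f in frames if filter_detections(f["detections"])]
```

Discarding empty frames up front is what makes large-scale camera-trap monitoring tractable, since most frames contain no animals at all.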
3. BioCLIP: A Large-Scale Multimodal Model for Biological Image Analysis
Images from nature are an important source of biological information; however, existing computational methods are mostly custom models that are difficult to adapt to new problems and datasets. To address this challenge, BioCLIP emerged as the first large-scale multimodal model focused on general questions in biological imaging. BioCLIP is based on OpenAI's CLIP model and was trained on TreeOfLife-10M, a new dataset of 10 million biological images with fine-grained taxonomic labels. BioCLIP outperforms traditional baselines across a wide range of biology-related tasks, including zero-shot and few-shot classification. Trained with the standard CLIP objective, BioCLIP not only identifies different species but also captures the hierarchical relationships among species in the tree of life, helping biologists discover new and related organisms. For more details about this model, please visit the following link.
https://github.com/Imageomics/bioclip/blob/main/README.md
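The zero-shot classification that CLIP-style models perform reduces to a cosine-similarity lookup: embed the image and each candidate taxon label in a shared space, then pick the label closest to the image. Here is a toy numpy sketch of that mechanism; the vectors are made up stand-ins for what BioCLIP's image and text encoders would actually produce:

```python
import numpy as np

def zero_shot_classify(image_emb, label_embs, labels):
    """Return the label whose embedding has the highest cosine
    similarity with the image embedding (CLIP-style zero-shot)."""
    def normalize(v):
        return v / np.linalg.norm(v)
    sims = [float(normalize(image_emb) @ normalize(e)) for e in label_embs]
    return labels[int(np.argmax(sims))]

# Hypothetical 2-D embeddings; real encoders produce high-dimensional vectors
labels = ["Quercus robur", "Danaus plexippus", "Bufo bufo"]
label_embs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
image_emb = np.array([0.1, 0.9])

pred = zero_shot_classify(image_emb, label_embs, labels)
```

Because the label set is supplied at inference time, the same model can classify species it was never explicitly trained to name.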
4. DeepAqua: A Deep Learning Model for Water Detection
Deep learning and remote sensing technologies have significantly improved the monitoring of water surfaces; however, the demand for annotated data remains a challenge. This is particularly acute in wetland detection, because the extent of wetlands varies over time and space, requiring the same area to be annotated repeatedly. DeepAqua was developed to address this challenge. It is a deep learning model inspired by knowledge distillation that generates labeled data automatically, eliminating manual annotation during the training phase. Experimental results indicate that DeepAqua improves accuracy by 3%, outperforming other unsupervised methods. The approach offers a practical way to monitor changes in wetland water extent without ground truth data, making it highly adaptable and scalable for wetland monitoring. For more details about this model, please visit the following link.
https://github.com/melqkiades/deep-wetlands?tab=readme-ov-file
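The core idea of such automatic labeling is to let a cheap "teacher" rule produce water masks that supervise a learned "student" model. A common teacher for open water is the Normalized Difference Water Index, NDWI = (green − NIR) / (green + NIR); the sketch below uses it for illustration, and whether DeepAqua uses exactly this formulation and threshold is an assumption here:

```python
import numpy as np

def ndwi_pseudo_labels(green, nir, threshold=0.0):
    """Boolean water mask from optical bands: NDWI above the
    threshold marks open water. Used as automatic training labels
    so no hand annotation is needed."""
    ndwi = (green - nir) / (green + nir + 1e-9)   # epsilon avoids division by zero
    return ndwi > threshold

# Toy reflectance values for a 2x2 patch
green = np.array([[0.30, 0.05], [0.25, 0.04]])
nir   = np.array([[0.10, 0.40], [0.05, 0.35]])
mask = ndwi_pseudo_labels(green, nir)
```

Once masks like this exist for enough scenes, a segmentation network can be trained on them directly, which is the distillation-style shortcut that removes the annotation bottleneck.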
5. Sen2Fire: A Satellite Remote Sensing Dataset Enhancing Wildfire Detection Efficiency
As a global natural disaster, wildfires pose a significant threat to ecological and socio-economic stability. It has been reported that the 2018 California wildfires alone caused $148.5 billion in economic losses, about 0.7% of the country's GDP. However, existing remote sensing wildfire detection methods face challenges such as a lack of benchmark datasets, differences in band sensitivity, and poor model transferability. To address these issues, the Sen2Fire dataset was created: a challenging satellite remote sensing dataset tailored for wildfire detection. It integrates Sentinel-2 multispectral data with Sentinel-5P aerosol products and contains 2,466 image patches, each 512×512 pixels across 13 bands. Sen2Fire provides strong technical support for global wildfire monitoring. For more details about this dataset, please visit the following link.
https://zenodo.org/records/10881058
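Datasets like Sen2Fire are produced by cutting large satellite scenes into fixed-size training patches. The sketch below shows the basic non-overlapping tiling step for a multi-band array; the scene dimensions are toy values, and a real Sentinel-2 scene would first be read from GeoTIFF files:

```python
import numpy as np

def tile_scene(scene, patch=512):
    """Split a (bands, H, W) array into non-overlapping patch x patch
    tiles, dropping partial tiles at the right and bottom edges."""
    bands, h, w = scene.shape
    tiles = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            tiles.append(scene[:, top:top + patch, left:left + patch])
    return tiles

scene = np.zeros((13, 1100, 1600))   # toy 13-band scene (bands, height, width)
tiles = tile_scene(scene)            # 2 rows x 3 columns of full tiles
```

Each resulting tile has shape (13, 512, 512), matching the patch layout the dataset describes; overlapping or padded tiling schemes are common variants of the same step.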