
Agri-SigLIP

Fine-tuning Vision-Language Models for agricultural understanding — domain-adapting Google's SigLIP to accurately identify crop diseases, achieving a 32% improvement in zero-shot retrieval accuracy.

The Problem

Agriculture relies heavily on visual diagnosis. General-purpose Vision-Language Models like CLIP lack specific "agronomic literacy" — they might identify an image as "a green leaf," but fail to detect specific pathologies such as Cercospora Leaf Spot or distinguish Early Blight from Late Blight.

The Solution

We fine-tune google/siglip-base-patch16-224 on curated agricultural image-text pairs. Unlike standard CLIP, whose softmax-based contrastive loss benefits from massive batch sizes, SigLIP uses a pairwise sigmoid loss, enabling memory-efficient training and better performance on smaller, high-quality scientific datasets.
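The distinction matters in practice. A minimal PyTorch sketch of the pairwise sigmoid loss (following the formulation in the SigLIP paper; the function name `siglip_loss` and the toy embeddings are illustrative, not the project's actual code):

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t, b):
    """Pairwise sigmoid loss: every image-text pair in the batch is scored
    independently with a sigmoid, so there is no batch-wide softmax
    normalization and no dependence on very large batches."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() * t + b       # (N, N) pair scores
    n = logits.size(0)
    labels = 2 * torch.eye(n) - 1                # +1 on matched pairs, -1 elsewhere
    return -F.logsigmoid(labels * logits).mean()

# Toy batch: 4 random 512-d image/text embeddings, scalar temperature and bias.
torch.manual_seed(0)
img = torch.randn(4, 512)
txt = torch.randn(4, 512)
loss = siglip_loss(img, txt, t=torch.tensor(10.0), b=torch.tensor(-10.0))
```

Because each pair is an independent binary classification, gradient quality degrades gracefully as batch size shrinks, which is exactly what a smaller curated agricultural dataset needs.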

Methodology

Three-step pipeline from data preparation to evaluation.

1. Data Preparation

Aggregated image-text pairs of healthy vs. diseased crops. Images resized to 224×224 and normalized. Dataset split 80/10/10 for train, validation, and test.
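The split step can be sketched in plain Python (the `split_pairs` helper and file names are hypothetical; resizing and normalization are handled separately by the image processor):

```python
import random

def split_pairs(pairs, seed=0):
    """Shuffle (image_path, caption) pairs and split 80/10/10
    into train, validation, and test sets."""
    rng = random.Random(seed)            # fixed seed for a reproducible split
    pairs = pairs[:]                     # copy so the caller's list is untouched
    rng.shuffle(pairs)
    n_train = int(0.8 * len(pairs))
    n_val = int(0.1 * len(pairs))
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

# Placeholder dataset of 100 image-caption pairs.
pairs = [(f"img_{i}.jpg", f"caption {i}") for i in range(100)]
train, val, test = split_pairs(pairs)    # 80 / 10 / 10 items
```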

2. Fine-Tuning

The vision encoder's lower layers are frozen, while attention heads and projection layers remain trainable. Optimization uses AdamW with a learning rate of 5e-6, cosine decay, and mixed-precision training (BF16/FP16).
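The freezing and optimizer setup can be sketched as follows. A tiny stack of linear layers stands in for SigLIP's vision tower (an assumption for self-containment; the real project freezes the lower transformer blocks of the pretrained encoder):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Stand-in for the vision tower: 6 layers, of which the lower 3 are frozen.
encoder = torch.nn.Sequential(*[torch.nn.Linear(64, 64) for _ in range(6)])
for layer in list(encoder)[:3]:
    for p in layer.parameters():
        p.requires_grad = False          # lower layers keep pretrained features

# Only trainable parameters go to the optimizer.
trainable = [p for p in encoder.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=5e-6, weight_decay=0.01)
sched = CosineAnnealingLR(opt, T_max=1000)   # cosine decay over 1000 steps

# Mixed precision: wrap the forward/backward pass on GPU, e.g.
#   with torch.autocast(device_type="cuda", dtype=torch.bfloat16): ...
```

The very low learning rate is what guards against catastrophic forgetting: updates are small enough that the general visual features survive while domain-specific signal accumulates.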

3. Evaluation

Evaluated using Recall@K (R@1, R@5, R@10) for image-text retrieval, compared against the zero-shot baseline of the original Google checkpoint.
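Recall@K is straightforward to compute from a similarity matrix; a minimal NumPy sketch (the `recall_at_k` helper is illustrative, assuming the matching image for text i sits at column i):

```python
import numpy as np

def recall_at_k(sim, k):
    """Text-to-image retrieval Recall@K for an (N, N) similarity matrix
    where sim[i, j] scores text i against image j, and the ground-truth
    image for text i is at column i."""
    ranks = np.argsort(-sim, axis=1)     # best-scoring images first
    hits = (ranks[:, :k] == np.arange(len(sim))[:, None]).any(axis=1)
    return hits.mean()

# Toy 3x3 similarity matrix: only text 0 ranks its own image first.
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.4, 0.8],
                [0.1, 0.7, 0.6]])
r1 = recall_at_k(sim, 1)   # 1/3: texts 1 and 2 miss at rank 1
r2 = recall_at_k(sim, 2)   # 1.0: every ground-truth image is in the top 2
```

The same function, run once on the original Google checkpoint and once on the fine-tuned model, gives the before/after comparison reported below.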

Key Features

Designed for scientific rigor and real-world agricultural use.

🔬 Domain Adaptation

Injects agronomic knowledge into a general-purpose vision-language model.

🧠 Sigmoid Loss

SigLIP's pairwise loss enables efficient training on smaller scientific datasets.

🛡️ No Catastrophic Forgetting

Low learning rate preserves general visual features while learning domain specifics.

Mixed Precision

BF16/FP16 training for VRAM efficiency on consumer-grade GPUs.

Results

Significant improvement in zero-shot retrieval accuracy after fine-tuning.

Retrieval Accuracy Gain: +32% over the baseline pre-trained SigLIP model
Base Model: SigLIP (google/siglip-base-patch16-224)
Input Resolution: 224×224 (standardized image preprocessing)

Current Limitations

Known constraints guiding future research directions.

☀️ Lighting Conditions

Better performance on daylight field images; reduced accuracy under low-light or night conditions.

⚖️ Class Imbalance

Rare diseases with fewer than 50 samples show lower retrieval scores compared to common diseases like Corn Rust.

🍃 Leaves Only (for now)

Current implementation works best with leaf crops rather than fruits, due to the training data composition.

Get Started

Agri-SigLIP is open-source under the MIT License. Clone the repository, prepare your dataset, and start training or running inference on crop disease images.

Tech stack: Python 3.8+, PyTorch 2.0+, HuggingFace Transformers, SigLIP, AdamW, mixed precision.