The ocean covers more than 70% of Earth’s surface, yet much of it remains unexplored. From marine biodiversity and underwater archaeology to offshore energy exploration and national security, understanding underwater environments has become increasingly important. However, the underwater world presents unique challenges for artificial intelligence systems due to low visibility, light absorption, scattering, turbidity, and rapidly changing environmental conditions.
To address these challenges, researchers from Huazhong University of Science and Technology and National University of Defense Technology introduced Nautilus, a Large Multimodal Model (LMM) designed specifically for underwater scene understanding.
Nautilus combines multimodal learning, underwater imaging physics, and a large-scale instruction-following dataset in a single framework that can interpret underwater environments at the image, region, and object levels.
Why Underwater Scene Understanding Matters
The underwater world is one of the most complex and visually challenging environments for computer vision systems. Unlike terrestrial images, underwater imagery suffers from:
- Severe color distortion
- Reduced visibility
- Low contrast
- Light absorption
- Backscattering effects
- Dynamic lighting conditions
- Turbidity and suspended particles
These issues significantly reduce the performance of traditional AI models trained on normal “in-air” images.
Yet underwater scene understanding is crucial for applications such as:
- Autonomous underwater vehicles (AUVs)
- Marine biodiversity monitoring
- Coral reef protection
- Underwater infrastructure inspection
- Defense and surveillance
- Deep-sea exploration
- Fisheries management
- Environmental conservation
Traditional underwater AI systems were typically designed for only one task at a time, such as object detection or image classification. Nautilus instead handles many tasks within one model.
What Is Nautilus?
Nautilus is the first comprehensive underwater Large Multimodal Model capable of performing eight different underwater scene understanding tasks within a single unified framework.
The model combines:
- Visual understanding
- Language reasoning
- Physical underwater imaging priors
- Feature restoration mechanisms
- Multi-granular perception
Instead of handling only object recognition or captioning, Nautilus understands underwater scenes holistically.
Its supported tasks include:
- Coarse-grained classification
- Fine-grained classification
- Object detection
- Grounding
- Visual Question Answering (VQA)
- Counting
- Region captioning
- Image captioning
This multi-task capability lets a single model answer questions that previously required several task-specific systems.
The Challenge of Underwater AI
Most existing multimodal AI systems, such as LLaVA-1.5, Qwen2.5-VL, and InternVL, were trained primarily on terrestrial images and internet-scale datasets. As a result, they struggle underwater because underwater imagery differs dramatically from standard photographs.
The two major challenges are:
1. Domain Shift
Underwater scenes look fundamentally different from everyday visual data.
For example:
- Fish appear distorted due to lighting
- Coral colors fade with depth
- Visibility changes rapidly
- Water particles introduce visual noise
General-purpose models cannot easily adapt to these conditions.
2. Underwater Image Degradation
Underwater images degrade because water absorbs and scatters light.
Red wavelengths are absorbed first, so scenes appear blue or green. Suspended particles create a haze-like veil, and brightness and clarity fall as light travels farther through the water.
These degradations make underwater perception extremely difficult for standard AI systems.
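To make the color shift concrete, here is a minimal sketch of per-channel exponential attenuation. The coefficients in `ATTENUATION_PER_M` are illustrative assumptions chosen so that red fades fastest; they are not measurements and do not come from the Nautilus work.

```python
import math

# Illustrative attenuation coefficients in 1/m (assumed values, not measurements).
# Red is attenuated fastest and blue slowest, as in clear ocean water.
ATTENUATION_PER_M = {"red": 0.40, "green": 0.07, "blue": 0.03}


def remaining_fraction(channel: str, path_length_m: float) -> float:
    """Fraction of light left after travelling path_length_m metres through water."""
    return math.exp(-ATTENUATION_PER_M[channel] * path_length_m)


def attenuate(rgb, path_length_m):
    """Apply per-channel exponential attenuation to an (r, g, b) colour in [0, 1]."""
    r, g, b = rgb
    return (
        r * remaining_fraction("red", path_length_m),
        g * remaining_fraction("green", path_length_m),
        b * remaining_fraction("blue", path_length_m),
    )


# A white surface seen through increasing amounts of water turns blue-green.
for distance_m in (1, 5, 10):
    r, g, b = attenuate((1.0, 1.0, 1.0), distance_m)
    print(f"{distance_m:>2} m: R={r:.2f} G={g:.2f} B={b:.2f}")
```

With these assumed coefficients the red channel is almost gone after 10 m while most of the blue remains, which is the blue-green cast described above.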
Introducing NautData: A Massive Underwater Dataset
One of the biggest breakthroughs behind Nautilus is the creation of NautData, a massive underwater instruction-following dataset.
NautData contains:
- 1.45 million image-text pairs
- 158,000 underwater images
- Eight different task annotations
- Multi-granular understanding data
This makes it one of the most comprehensive underwater vision-language datasets ever developed.
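The two headline numbers imply roughly nine instruction pairs per image. The sketch below works that out and shows what a single instruction-following sample might look like. The `InstructionSample` layout, the task identifiers in `TASKS`, and the example path are hypothetical, since the article does not describe the actual NautData file format.

```python
from dataclasses import dataclass

# Hypothetical task identifiers for the eight tasks named in the article.
TASKS = (
    "coarse_classification",
    "fine_classification",
    "detection",
    "grounding",
    "vqa",
    "counting",
    "region_captioning",
    "image_captioning",
)


@dataclass
class InstructionSample:
    """Hypothetical layout of one image-text pair (not the real NautData schema)."""
    image_path: str
    task: str
    instruction: str
    response: str


sample = InstructionSample(
    image_path="images/example_0001.jpg",  # placeholder path
    task="counting",
    instruction="How many fish are visible?",
    response="3",
)

# Figures quoted in the article.
NUM_PAIRS = 1_450_000
NUM_IMAGES = 158_000

pairs_per_image = NUM_PAIRS / NUM_IMAGES
print(round(pairs_per_image, 1))  # about 9.2 pairs per image
```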
What Makes NautData Unique?
Unlike previous underwater datasets that focused on only one or two tasks, NautData supports:
| Task | Supported |
|---|---|
| Classification (coarse- and fine-grained) | Yes |
| Detection | Yes |
| Captioning (region and image) | Yes |
| Grounding | Yes |
| Counting | Yes |
| VQA | Yes |
The dataset operates across:
- Image-level understanding
- Region-level understanding
- Object-level understanding
This hierarchical structure allows Nautilus to reason about underwater scenes in much greater detail.
Eight Core Underwater Tasks
1. Coarse-Grained Classification
The model identifies broad underwater categories such as:
- Fish
- Coral reefs
- Turtles
- Sharks
- Sea plants
2. Fine-Grained Classification
Nautilus can distinguish detailed taxonomic categories and species-level differences.
For example:
- Different fish species
- Coral subtypes
- Marine invertebrates
This is especially valuable for marine biology research.
3. Object Detection
The model detects underwater objects and provides precise bounding boxes.
It can localize:
- Fish schools
- Marine organisms
- Underwater debris
- Coral structures
4. Grounding
Grounding links textual descriptions to specific image regions.
Example:
“Locate the yellow fish near the coral reef.”
The model identifies the exact object corresponding to the text.
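Predicted regions for detection and grounding are commonly scored by how much they overlap a reference box. The following sketch computes intersection over union (IoU) for two boxes. The `(x_min, y_min, x_max, y_max)` box format is an assumption, and the article does not state which metric or threshold the Nautilus evaluation uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (10, 10, 50, 50)   # hypothetical model output for "the yellow fish"
reference = (30, 30, 70, 70)   # hypothetical annotated box
print(round(iou(predicted, reference), 3))
```

A prediction is usually counted as correct when its IoU with the reference box exceeds a chosen threshold.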
5. Visual Question Answering (VQA)
Users can ask natural-language questions about underwater scenes.
Examples include:
- “How many fish are visible?”
- “What type of coral is shown?”
- “Is the water turbid or clear?”
6. Counting
Nautilus estimates object quantities in dense underwater scenes.
This is particularly useful for:
- Fisheries monitoring
- Population estimation
- Ecosystem analysis
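One simple way to check counting quality is the mean absolute error between predicted and annotated counts. This is a generic sketch: the article does not say which counting metric the authors report, and the numbers below are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true object counts."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one count is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up counts for three images: errors are 2, 0 and 6 fish.
predicted_counts = [12, 7, 30]
annotated_counts = [10, 7, 36]
print(round(mean_absolute_error(predicted_counts, annotated_counts), 2))
```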
7. Region Captioning
The system describes specific areas within underwater images.
Instead of describing the entire scene, it focuses on local regions and behaviors.
8. Image Captioning
Nautilus generates complete natural-language descriptions of underwater scenes.
This includes:
- Environmental conditions
- Marine species
- Water clarity
- Lighting
- Spatial relationships
The Vision Feature Enhancement (VFE) Module
A key innovation of Nautilus is the Vision Feature Enhancement (VFE) module.
Rather than simply enhancing underwater images before processing, Nautilus enhances visual representations directly within the feature space.
This matters because traditional image enhancement methods often introduce artifacts or remove important ecological details.
Understanding Underwater Imaging Physics
The Nautilus architecture is inspired by real underwater imaging physics.
Underwater images can be modeled as:
$I_c = D_c + B_c$
Where:
- $I_c$ is the observed underwater image
- $D_c$ is the direct reflected signal
- $B_c$ is backscattering noise
Backscattering occurs when suspended particles scatter light back toward the camera before it reaches the scene, adding a haze-like signal that lowers contrast and image quality.
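For readers who want the two terms spelled out, a common expansion in the underwater imaging literature writes the direct signal as the unattenuated scene scaled by an exponential factor, and the backscatter as a veiling light that saturates with distance. This is a sketch under assumptions: the article does not say which exact parameterisation Nautilus adopts, and the symbols $B_c^{\infty}$ (veiling light) and $\beta_c^{B}$ (backscatter coefficient) are introduced here only for illustration.

```latex
\begin{aligned}
D_c &= J_c \, e^{-\beta_c(z)\, z} \\
B_c &= B_c^{\infty} \left(1 - e^{-\beta_c^{B} z}\right) \\
I_c &= J_c \, e^{-\beta_c(z)\, z} + B_c^{\infty} \left(1 - e^{-\beta_c^{B} z}\right)
\end{aligned}
```

Here $J_c$ is the scene as it would appear without water, and $z$ is the distance between camera and scene. As $z$ grows, the first term shrinks and the second approaches $B_c^{\infty}$, which is why distant objects dissolve into a uniform haze.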
Removing Backscattering
Nautilus identifies “dark pixels” in underwater images because, in regions that should be nearly black, most of the recorded signal comes from scattering rather than from the scene itself.
The model then removes these unwanted scattering responses from feature representations.
This process significantly improves:
- Object detection
- Scene grounding
- Classification accuracy
especially under:
- Low-light conditions
- Turbid water
- Green-tinted scenes
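The dark-pixel idea can be illustrated on raw pixel values. The sketch below estimates backscatter as the average of the darkest pixels and subtracts it. It is a simplified pixel-space illustration, not the VFE module, which the article says works on feature representations; the function names and the `dark_fraction` parameter are assumptions.

```python
def estimate_backscatter(pixels, dark_fraction=0.01):
    """Estimate per-channel backscatter as the mean of the darkest pixels.

    pixels is a list of (r, g, b) tuples with values in [0, 1].
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    # Rank pixels by total intensity and keep the darkest ones (at least one).
    ranked = sorted(pixels, key=sum)
    k = max(1, int(len(ranked) * dark_fraction))
    darkest = ranked[:k]
    return tuple(sum(p[c] for p in darkest) / k for c in range(3))


def remove_backscatter(pixels, backscatter):
    """Subtract the backscatter estimate from every pixel, clamping at zero."""
    return [
        tuple(max(0.0, p[c] - backscatter[c]) for c in range(3))
        for p in pixels
    ]


# Tiny made-up image with a green-blue haze over every pixel.
image = [
    (0.05, 0.20, 0.25),
    (0.30, 0.60, 0.70),
    (0.10, 0.35, 0.45),
    (0.06, 0.22, 0.27),
]
haze = estimate_backscatter(image, dark_fraction=0.5)
cleaned = remove_backscatter(image, haze)
print(haze)
print(cleaned[1])
```

Subtracting the estimate removes the shared haze while leaving the differences between pixels intact, which is the property that helps downstream detection and grounding.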
Restoring Light Absorption
Light is attenuated exponentially with the distance it travels through water.
The restoration process inverts this attenuation, so the backscatter-free signal is amplified by a positive exponential, following the underwater imaging equation:
$J_c = (I_c - B_c)\, e^{\beta_c(z) \cdot z}$
Here:
- $J_c$ represents restored visual information
- $\beta_c(z)$ represents depth-dependent attenuation
- $z$ is imaging depth
Nautilus uses depth estimation to compensate for lost scene information.
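As a worked example, the sketch below restores a single channel value by subtracting backscatter and then multiplying by $e^{+\beta z}$, the inverse of the attenuation factor $e^{-\beta z}$. It assumes a constant attenuation coefficient `beta` rather than a depth-dependent one and operates on pixel values, so it illustrates the equation and is not the feature-space procedure used inside Nautilus.

```python
import math


def restore_channel(observed, backscatter, beta, z):
    """Restore one channel value: J = (I - B) * exp(beta * z), clipped to [0, 1].

    beta is an assumed constant attenuation coefficient (1/m) and z the
    camera-to-scene distance (m).
    """
    direct = max(0.0, observed - backscatter)  # remove the scattering term
    return min(1.0, direct * math.exp(beta * z))  # undo the attenuation


# Round trip: degrade a known value with the forward model, then restore it.
true_value = 0.6
beta, z, backscatter = 0.4, 2.0, 0.1
observed = true_value * math.exp(-beta * z) + backscatter
restored = restore_channel(observed, backscatter, beta, z)
print(round(observed, 3), round(restored, 3))
```

The round trip recovers the original value, which is a quick sanity check that the sign of the exponent is right: restoration has to amplify what the water attenuated.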
Depth-Aware Feature Restoration
The model integrates:
- Vision encoder
- Depth encoder
- Multimodal projector
- Vision Feature Enhancement module
- Large Language Model
By incorporating depth information, Nautilus understands how image degradation changes with underwater distance.
This depth-aware reasoning enables more accurate underwater perception.
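The component list can be read as a data flow. The sketch below wires the five parts together with toy stand-in functions. The order of operations (enhance visual features using depth, then project, then pass to the language model) is an assumption, and every function name here is hypothetical rather than taken from the Nautilus code.

```python
from typing import Callable, List

Features = List[float]


def answer_question(
    image: List[float],
    question: str,
    vision_encoder: Callable[[List[float]], Features],
    depth_encoder: Callable[[List[float]], Features],
    enhance: Callable[[Features, Features], Features],
    project: Callable[[Features], Features],
    language_model: Callable[[Features, str], str],
) -> str:
    """Run one image and question through the assumed five-stage pipeline."""
    visual = vision_encoder(image)      # visual features
    depth = depth_encoder(image)        # depth features for the same image
    enhanced = enhance(visual, depth)   # depth-aware feature enhancement
    tokens = project(enhanced)          # map into the language model's input space
    return language_model(tokens, question)


# Toy stand-ins so the sketch runs; real components would be neural networks.
def toy_vision_encoder(image):
    return [float(v) for v in image]


def toy_depth_encoder(image):
    return [1.0 for _ in image]


def toy_enhance(visual, depth):
    return [v * (1.0 + 0.1 * d) for v, d in zip(visual, depth)]


def toy_project(features):
    return [round(f, 3) for f in features]


def toy_language_model(tokens, question):
    return f"{question} [{len(tokens)} visual tokens]"


answer = answer_question(
    [0.2, 0.5, 0.8],
    "What is shown?",
    toy_vision_encoder,
    toy_depth_encoder,
    toy_enhance,
    toy_project,
    toy_language_model,
)
print(answer)
```

Passing the stages in as callables keeps the sketch independent of any particular encoder or language model.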
Why Feature Enhancement Is Better Than Image Enhancement
Traditional underwater image enhancement methods attempt to restore images before feeding them into AI models.
However, this often causes:
- Loss of ecological details
- Over-smoothing
- Artificial color distortion
- Reduced semantic accuracy
Nautilus instead enhances features internally.
Experiments showed that feature-space enhancement consistently outperformed standard image enhancement approaches.
Experimental Results
The researchers evaluated Nautilus on multiple underwater benchmarks and compared it with state-of-the-art multimodal models.
The results demonstrated major improvements in:
- Fine-grained classification
- Grounding
- Detection
- Captioning
- VQA
- Counting
Superior Underwater Performance
Compared with powerful models like GPT-4o, Gemini 2.0 Flash, and Qwen2.5-VL, Nautilus achieved superior underwater scene understanding because it was specifically designed for marine environments.
Strong Performance Under Difficult Conditions
One of the most impressive findings was Nautilus’ robustness under degraded underwater conditions.
It performed exceptionally well in:
- Low-light environments
- Green-tinted water
- Blue-tinted scenes
- Turbid conditions
- High-scattering environments
This robustness is essential for real-world underwater deployment.
Generalization Across Datasets
Nautilus also demonstrated strong zero-shot generalization on external underwater datasets.
This means the model can be applied to underwater datasets it was not trained on and still perform well, without being retrained for each new environment.
Such flexibility is vital for:
- Oceanographic missions
- Autonomous robotics
- Marine research operations
Potential Applications of Nautilus
Marine Biology
Researchers can automate:
- Species identification
- Population counting
- Habitat analysis
- Behavioral studies
Underwater Robotics
Autonomous underwater vehicles can use Nautilus for:
- Navigation
- Scene understanding
- Obstacle detection
- Mission planning
Coral Reef Monitoring
The model can help detect:
- Coral bleaching
- Reef degradation
- Biodiversity changes
Offshore Infrastructure Inspection
Nautilus can assist in inspecting:
- Oil pipelines
- Underwater cables
- Ship hulls
- Offshore platforms
Defense and Security
Applications include:
- Harbor monitoring
- Subsea surveillance
- Threat detection
- Autonomous reconnaissance
Why Nautilus Is Important for AI Research
Nautilus represents a major milestone because it demonstrates how domain-specific multimodal models can outperform general-purpose AI systems in specialized environments.
Its contributions include:
- Large-scale underwater datasets
- Physics-guided multimodal learning
- Feature-space enhancement
- Multi-task underwater reasoning
- Robust degraded-environment perception
This research may inspire future AI systems for:
- Space exploration
- Medical imaging
- Remote sensing
- Industrial robotics
where domain-specific degradation also exists.
The Future of Underwater Multimodal AI
The development of Nautilus signals the beginning of a new era in underwater AI.
Future improvements may include:
- Real-time underwater dialogue systems
- Autonomous marine exploration assistants
- 3D underwater scene reconstruction
- Swarm robotics coordination
- Underwater digital twins
- Long-term ecological monitoring
As underwater datasets continue growing and multimodal architectures become more advanced, AI-powered ocean exploration may soon become routine.
Final Thoughts
Nautilus is far more than another multimodal model. It is a specialized underwater intelligence system designed to tackle one of the most visually challenging environments on Earth.
By combining:
- Massive underwater datasets
- Physical imaging priors
- Depth-aware enhancement
- Multi-task reasoning
- Large multimodal architectures
the researchers created a powerful framework capable of understanding underwater scenes at unprecedented depth and accuracy.
Its strong performance across classification, detection, grounding, captioning, counting, and VQA tasks establishes Nautilus as a major breakthrough in underwater scene understanding.
As marine exploration becomes increasingly important for science, environmental protection, and global industries, systems like Nautilus could play a transformative role in helping humanity better understand the hidden world beneath the oceans.
