NerfFormer

by Aakash Kumar, Amanjyoti Mridha, Sooraj Vydyanathan, March 21, 2023

What is NerfFormer?

In the last five years, over six million acres of forest land in California have burned, a trend that shows no sign of stopping. Although there has been work on fire prediction via machine learning, there is no comprehensive method to survey large swaths of forest land or to make predictions from vast amounts of unstructured data. We introduce a pipeline, NerfFormer: Multi-Modal Perception with Iterative Attention, that aims to change this.

We take drone footage of the surveyed area and apply pose estimation algorithms to recover the camera's transformation matrices, which are then used to train a neural radiance field (NeRF). The NeRF learns a 3D representation of the scene, which is queried using a perceiver-based visual language transformer model. The NeRF data and the text prompt provided by the user are converted into a byte array representation and cross-attended with a learned latent array parameter internal to the model, allowing multi-modal input at a fraction of the memory cost.

Through NerfFormer, the user can query for areas of forest at high risk of fire and track a probable path for a hypothetical forest fire in a scene based on humidity, foliage, and other atmospheric conditions. NerfFormer synthesizes this information to aid first responders in disaster management and in combating the effects of climate change. Firefighters will be able to locate high-risk areas and predict the paths of fires without human intervention, allowing them to guide their operations accordingly.

Why?

A key cause of the prevalence of forest fires is that monitoring vast forests requires substantial time and effort, which makes regular monitoring impractical. As a result, the probability of a wildfire gradually increases as unnoticed damage accumulates over time. In recent years, forest fires have caused immeasurable harm. Preventing these disasters is a vital step in mitigating and adapting to climate change.

What are the current methods?

Current forest fire mitigation strategies include those of the U.S. Forest Service, which uses weather and historical fire data to forecast fire risk, and NASA, which uses satellite imagery for the same purpose. However, these satellites operate at high altitudes and therefore cannot support location-specific predictions. In contrast, our approach combines drone footage and neural radiance fields with visual language transformers to provide more specific and reliable data.

What is the NerfFormer Methodology?

Overview

NerfFormer has four major components. First, we collect drone footage and use a pose estimation algorithm to approximate a three-dimensional camera position for each image. Second, we use these posed images to construct a NeRF model of the environment. Third, this NeRF model serves as the knowledge base for the NerfFormer model, which is a visual language model. Finally, a positional information query is passed into the NeRF model, and a heatmap is generated from the NeRF model outputs.
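To make the link between pose estimation and the NeRF concrete, here is a minimal sketch of how a camera transformation matrix can be turned into a ray and a set of 3D query points. This is an illustration, not the project's code: the function names `pixel_to_ray` and `sample_along_ray`, the pinhole camera model, and the convention that the camera looks down its negative z-axis are all assumptions made for the example.

```python
import math


def pixel_to_ray(c2w, px, py, width, height, focal):
    """Return (origin, direction) in world coordinates for pixel (px, py).

    c2w is a hypothetical 4x4 camera-to-world matrix given as a list of rows.
    Assumes a pinhole camera that looks down its own -z axis.
    """
    # Ray direction in camera coordinates (image y grows downward).
    d_cam = ((px - width / 2) / focal, -(py - height / 2) / focal, -1.0)
    # Rotate into world coordinates with the upper-left 3x3 block.
    d_world = [sum(c2w[r][k] * d_cam[k] for k in range(3)) for r in range(3)]
    norm = math.sqrt(sum(v * v for v in d_world))
    direction = [v / norm for v in d_world]
    # The camera position is the translation column of the matrix.
    origin = [c2w[r][3] for r in range(3)]
    return origin, direction


def sample_along_ray(origin, direction, near, far, n_samples):
    """Return n_samples evenly spaced 3D points between near and far."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    step = (far - near) / (n_samples - 1)
    return [
        [o + (near + i * step) * d for o, d in zip(origin, direction)]
        for i in range(n_samples)
    ]


# Example pose: no rotation, camera placed at (1, 2, 3).
pose = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
origin, direction = pixel_to_ray(pose, 50, 50, 100, 100, focal=80.0)
points = sample_along_ray(origin, direction, near=1.0, far=3.0, n_samples=3)
```

Each sampled point is a 3D location at which a trained NeRF could be evaluated. How NerfFormer aggregates those outputs into a heatmap is not specified here, so that step is left out of the sketch.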

NerfFormer architecture

More specifically, the NerfFormer model is a transformer that applies a series of cross-attention and self-attention blocks to embed the visual information from the NeRF model into a latent space.
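The following sketch shows the general perceiver-style pattern of a cross-attention step followed by a self-attention step, written in plain Python so that it runs without dependencies. It is a simplified illustration under stated assumptions, not the NerfFormer implementation: the helper names are hypothetical, the learned query/key/value projections, residual connections, and normalization layers of a real transformer are omitted, and random numbers stand in for the input byte array and the learned latent array.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)


def perceiver_block(latents, inputs):
    """One cross-attention step followed by one self-attention step."""
    # Cross-attention: the small latent array queries the large input array.
    latents = attention(latents, inputs, inputs)
    # Self-attention: the latents attend only to each other.
    return attention(latents, latents, latents)


random.seed(0)
n_inputs, n_latents, dim = 50, 3, 4
# Stand-ins for the embedded input array and the learned latent array.
inputs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_inputs)]
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_latents)]

out = perceiver_block(latents, inputs)
```

In this sketch the cross-attention score matrix has `n_latents` x `n_inputs` entries rather than `n_inputs` x `n_inputs`, and the output keeps the shape of the latent array however long the input is. That is the usual reason a latent bottleneck reduces memory use for long inputs.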
