Evaluation of the effectiveness of two approaches to building damage detection with satellite imagery

Authors

DOI:

https://doi.org/10.20535/2786-8729.7.2025.341475

Keywords:

satellite image analysis, damage detection, semantic segmentation, U-Net, large vision-language model

Abstract

This study examines approaches to satellite image analysis for assessing infrastructure damage. Its main aim is a comparative analysis of the effectiveness of two machine learning approaches: specialized semantic segmentation based on the U-Net architecture and generalized visual analysis using large vision-language models. The object of the research is the process of quantitatively benchmarking these two approaches to determine their practical applicability for multi-class damage classification.

The research material is the publicly available xView2 dataset. The methods comprised two parallel experiments. For the semantic segmentation approach, a U-Net model with an EfficientNet-B4 encoder was implemented and trained on 6-channel input data ("before" and "after" images) using a combined Dice and Focal loss function. For the vision-language model approach, the open-source LLaVA-1.5-7B model was evaluated in zero-shot mode using advanced prompt engineering for an aggregate counting task. To enable a direct comparison, the standard Jaccard index was calculated from the aggregated object counts for each damage class.
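As a rough illustration of the segmentation setup described above, the sketch below stacks a "before" and an "after" RGB image into one 6-channel array and evaluates a weighted sum of a soft Dice loss and a focal loss. It uses NumPy only for readability. The function names, the equal 0.5/0.5 weighting, `gamma=2.0`, and the omission of per-class focal weights are all assumptions made for this example; the abstract does not state the values or the framework used in the study.

```python
import numpy as np


def stack_pre_post(pre_img, post_img):
    """Concatenate two H x W x 3 images into one H x W x 6 array."""
    if pre_img.shape != post_img.shape:
        raise ValueError("pre and post images must share the same shape")
    return np.concatenate([pre_img, post_img], axis=-1)


def dice_loss(probs, onehot, eps=1e-6):
    """Soft Dice loss averaged over classes; both inputs are H x W x C."""
    intersection = (probs * onehot).sum(axis=(0, 1))
    denom = probs.sum(axis=(0, 1)) + onehot.sum(axis=(0, 1))
    dice = (2.0 * intersection + eps) / (denom + eps)
    return float(1.0 - dice.mean())


def focal_loss(probs, onehot, gamma=2.0, eps=1e-6):
    """Multi-class focal loss averaged over pixels (no per-class weights)."""
    # Probability assigned to the true class of each pixel.
    p_t = np.clip((probs * onehot).sum(axis=-1), eps, 1.0)
    return float((-((1.0 - p_t) ** gamma) * np.log(p_t)).mean())


def combined_loss(probs, onehot, dice_weight=0.5, focal_weight=0.5):
    """Weighted sum of Dice and focal loss (weights are assumed values)."""
    return (dice_weight * dice_loss(probs, onehot)
            + focal_weight * focal_loss(probs, onehot))


# Toy usage: a 4 x 4 tile with 4 hypothetical damage classes.
pre = np.zeros((4, 4, 3))
post = np.ones((4, 4, 3))
model_input = stack_pre_post(pre, post)      # shape (4, 4, 6)
labels = np.arange(16).reshape(4, 4) % 4     # synthetic class map
target = np.eye(4)[labels]                   # one-hot, shape (4, 4, 4)
uniform_pred = np.full((4, 4, 4), 0.25)      # uninformative prediction
loss_value = combined_loss(uniform_pred, target)
```

Dice responds to region overlap per class, while the focal term down-weights pixels that are already classified confidently, which is a common reason for pairing the two on class-imbalanced masks.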

The experiments revealed a significant performance disparity. The specialized U-Net model demonstrated high effectiveness, achieving an intersection over union (IoU) score of 0.6141 on the test set. In contrast, the LLaVA model proved unsuitable for accurate quantitative analysis, yielding an extremely low Jaccard index of approximately 0.063, primarily due to its systematic failure to correctly identify and count objects (Recall ≈ 0.07). The scientific novelty lies in being the first study to quantitatively document this order-of-magnitude capability gap, confirming that specialized segmentation models remain indispensable for tasks requiring high-precision mapping.
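One plausible way to compute a Jaccard index from aggregated per-class object counts is sketched below. It assumes that, for each class, the overlap between predicted and true counts is treated as true positives, any surplus as false positives, and any shortfall as false negatives. This matching rule, the function name, and the example counts are assumptions for illustration, not the authors' documented procedure or data. Under any such definition the Jaccard index TP / (TP + FP + FN) cannot exceed recall TP / (TP + FN), which is consistent with the reported values of about 0.063 and 0.07.

```python
def count_based_scores(true_counts, pred_counts):
    """Precision, recall, and Jaccard index from per-class object counts.

    Assumed matching rule: per class, min(true, predicted) objects count
    as true positives, surplus predictions as false positives, and
    missing objects as false negatives.
    """
    tp = fp = fn = 0
    for cls in set(true_counts) | set(pred_counts):
        t = true_counts.get(cls, 0)
        p = pred_counts.get(cls, 0)
        tp += min(t, p)
        fp += max(p - t, 0)
        fn += max(t - p, 0)

    def ratio(num, den):
        # Avoid division by zero when a denominator is empty.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
    }


# Made-up counts for one scene, using the xBD damage class names.
true_counts = {"no-damage": 60, "minor-damage": 20,
               "major-damage": 15, "destroyed": 5}
pred_counts = {"no-damage": 4, "minor-damage": 2,
               "major-damage": 1, "destroyed": 9}
scores = count_based_scores(true_counts, pred_counts)
```

In this toy case most true objects are missed, so recall is low and the Jaccard index is lower still, because the over-counted "destroyed" class adds false positives to the denominator.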

Author Biographies

Oleksii Rumiantsev, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

PhD student at the Department of Computer Science and Software Engineering of the Faculty of Informatics and Computer Technique

Yurii Oliinyk, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv

Associate Professor at the Department of Computer Science and Software Engineering of the Faculty of Informatics and Computer Technique, Candidate of Technical Sciences

Published

2025-12-27

How to Cite

[1]
O. Rumiantsev and Y. Oliinyk, “Evaluation of the effectiveness of two approaches to building damage detection with satellite imagery”, Inf. Comput. and Intell. syst. j., no. 7, pp. 61–71, Dec. 2025.