Scientific article summarization model with unbounded input length
DOI: https://doi.org/10.20535/2786-8729.5.2024.314724
Keywords: neural networks, transformers, text summarization, long document summarization, natural language processing, attention
Abstract
In recent years, the exponential growth of scientific literature has made it increasingly difficult for researchers and practitioners to keep up with new discoveries and developments in their fields. As a result, text summarization has become one of the primary tasks of natural language processing. Abstractive summarization of long documents, such as scientific articles, requires large neural networks with high memory and computation requirements. It is therefore all the more important to find ways to increase the efficiency of long document summarization models.
The objects of this research are long document summarization transformer models and the Unlimiformer cross-attention modification. The article reviews the basic principles of transformer attention, which constitutes the primary computational expense in transformer models. More efficient self-attention approaches used in long document summarization models are described, such as the global+sliding window attention used by Longformer. The cross-attention mechanism of Unlimiformer, which allows a model to accept input of unbounded length, is described in detail. The objective of the study is the development and evaluation of a long document summarization model using the Unlimiformer modification. To achieve this goal, a Longformer Encoder-Decoder (LED) model pretrained on the arXiv dataset is modified with Unlimiformer cross-attention. This modification can be applied without additional fine-tuning, avoiding the cost of further training a model with a large input length.
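To make the retrieval-based cross-attention concrete, the sketch below illustrates the core idea for a single attention head: instead of attending over every encoder hidden state, the decoder query retrieves only the top-k states from an index and runs softmax attention over that subset. This is a minimal sketch under simplifying assumptions (one head, exact top-k search over a plain tensor, no shared FAISS-style datastore or query reformulation as in the actual Unlimiformer implementation); the function name and shapes are hypothetical.

import torch
import torch.nn.functional as F

def knn_cross_attention(query, encoder_states, k=16):
    """Simplified Unlimiformer-style cross-attention for one head.

    query:          (d,)   decoder query vector at the current decoding step
    encoder_states: (n, d) hidden states of the full, arbitrarily long input
    k:              number of encoder states retrieved per step

    Instead of attending over all n encoder states, the query retrieves the
    top-k states by dot-product score and attends only over that subset,
    so the per-step cost depends on k rather than on the input length n.
    """
    d = query.shape[-1]
    scores = encoder_states @ query                    # (n,) similarity to every state
    top_scores, top_idx = scores.topk(min(k, scores.numel()))
    weights = F.softmax(top_scores / d ** 0.5, dim=-1) # scaled softmax over retrieved keys
    return weights @ encoder_states[top_idx]           # (d,) attention output

# Toy usage: 100k "encoder states", but attention is computed over only 16 of them.
states = torch.randn(100_000, 64)
q = torch.randn(64)
out = knn_cross_attention(q, states)
print(out.shape)  # torch.Size([64])

In the full method, a single index of encoder hidden states is shared across all heads and layers by folding the query and key projections into the retrieval query, which is what allows the effective input length to be unbounded.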
The developed model was evaluated on the arXiv dataset using the ROUGE-1, ROUGE-2, and ROUGE-L metrics. It showed improved results compared to the baseline model, demonstrating the viability of this approach for improving long document summarization models.
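For reference, ROUGE-1, ROUGE-2, and ROUGE-L scores of this kind are typically computed with a standard library such as the Hugging Face evaluate package; the snippet below is a generic illustration with placeholder strings, not the article's evaluation script.

# pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

# Placeholder examples: generated summaries vs. reference abstracts.
predictions = ["the modified model summarizes long scientific articles"]
references = ["the model produces summaries of long scientific articles"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"], scores["rouge2"], scores["rougeL"])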
References
D. Khurana, A. Koli, K. Khatter, and S. Singh, “Natural language processing: state of the art, current trends and challenges,” Multimedia Tools and Applications, 2022, vol. 82, no. 3, pp. 3713–3744. https://doi.org/10.1007/s11042-022-13428-4.
J. Sawicki, M. Ganzha, and M. Paprzycki, “The State of the Art of Natural Language Processing – A Systematic Automated Review of NLP Literature Using NLP Techniques,” Data Intelligence, 2023, vol. 5, no. 3, pp. 707–749. https://doi.org/10.1162/dint_a_00213.
S. Kusal, S. Patil, J. Choudrie, K. Kotecha, D. Vora, and I. Pappas, “A systematic review of applications of natural language processing and future challenges with special emphasis in text-based emotion detection,” Artificial Intelligence Review, 2023, vol. 56, no. 12, pp. 15129–15215. https://doi.org/10.1007/s10462-023-10509-0.
M. Gambhir and V. Gupta, “Recent automatic text summarization techniques: a survey,” Artificial Intelligence Review, 2016, vol. 47, no. 1, pp. 1–66. https://doi.org/10.1007/s10462-016-9475-9.
S. Gupta and S. K. Gupta, “Abstractive summarization: An overview of the state of the art,” Expert Systems with Applications, 2019, vol. 121, pp. 49–65. https://doi.org/10.1016/j.eswa.2018.12.011.
A. P. Widyassari et al., “Review of automatic text summarization techniques & methods,” Journal of King Saud University - Computer and Information Sciences, 2022, vol. 34, no. 4, pp. 1029–1046. https://doi.org/10.1016/j.jksuci.2020.05.006.
M. M. Saiyyad and N. N. Patil, “Text Summarization Using Deep Learning Techniques: A Review,” RAiSE-2023. MDPI, 2024, p. 194. https://doi.org/10.3390/engproc2023059194.
D. O. Cajueiro et al., “A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding,” 2023, arXiv. https://doi.org/10.48550/ARXIV.2301.03403.
O. Klymenko, D. Braun, and F. Matthes, “Automatic Text Summarization: A State-of-the-Art Review,” Proceedings of the 22nd International Conference on Enterprise Information Systems. SCITEPRESS - Science and Technology Publications, 2020. https://doi.org/10.5220/0009723306480655.
V. Patel and N. Tabrizi, “An Automatic Text Summarization: A Systematic Review,” Computación y Sistemas, vol. 26, no. 3. Instituto Politecnico Nacional/Centro de Investigacion en Computacion, Sep. 05, 2022. https://doi.org/10.13053/cys-26-3-4347.
A. Gidiotis and G. Tsoumakas, “A Divide-and-Conquer Approach to the Summarization of Long Documents,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28. Institute of Electrical and Electronics Engineers (IEEE), pp. 3029–3040, 2020. https://doi.org/10.1109/taslp.2020.3037401.
I. Beltagy, M. E. Peters, and A. Cohan, “Longformer: The Long-Document Transformer,” 2020, arXiv. https://doi.org/10.48550/arXiv.2004.05150.
M. Zaheer et al., “Big Bird: Transformers for Longer Sequences,” 2020, arXiv. https://doi.org/10.48550/arXiv.2007.14062.
A. Bertsch, U. Alon, G. Neubig, and M. R. Gormley, “Unlimiformer: Long-Range Transformers with Unlimited Length Input,” 2023, arXiv. https://doi.org/10.48550/arXiv.2305.01625.
A. Cohan et al., “A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents,” Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, 2018. https://doi.org/10.18653/v1/n18-2097.
M. Guo et al., “LongT5: Efficient Text-To-Text Transformer for Long Sequences,” Findings of the Association for Computational Linguistics: NAACL 2022. Association for Computational Linguistics, pp. 724–736, 2022. https://doi.org/10.18653/v1/2022.findings-naacl.55.
S. Sotudeh, A. Cohan, and N. Goharian, “On Generating Extended Summaries of Long Documents,” 2020, arXiv. https://doi.org/10.48550/arXiv.2012.14136.
M. Yasunaga et al., “ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01. Association for the Advancement of Artificial Intelligence (AAAI), pp. 7386–7393, Jul. 17, 2019. https://doi.org/10.1609/aaai.v33i01.33017386.
C. An, M. Zhong, Y. Chen, D. Wang, X. Qiu, and X. Huang, “Enhancing Scientific Papers Summarization with Citation Graph,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14. Association for the Advancement of Artificial Intelligence (AAAI), pp. 12498–12506, May 18, 2021. https://doi.org/10.1609/aaai.v35i14.17482.
A. Vaswani et al., “Attention Is All You Need,” 2017, arXiv. https://doi.org/10.48550/arXiv.1706.03762.
Huggingface, “allenai/led-large-16384-arxiv.” Accessed: Nov. 1, 2024. [Online]. Available: https://huggingface.co/allenai/led-large-16384-arxiv.
License
Copyright (c) 2024 Information, Computing and Intelligent systems
This work is licensed under a Creative Commons Attribution 4.0 International License.