Software for collecting and analyzing metrics in highly loaded applications based on the Prometheus monitoring system
DOI:
https://doi.org/10.20535/2786-8729.5.2024.316366
Keywords:
metrics, quantile, highly loaded applications, Prometheus monitoring system, Python
Abstract
This paper emphasizes the importance of collecting metrics during application operation for early detection of potential problems. The leading tool in this area is the Prometheus monitoring system, which, combined with Grafana (a platform for visualizing collected data as graphs), becomes an indispensable tool for programmers and site reliability engineers. However, the average value of a metric is often unrepresentative because it hides the shape of the underlying distribution. Collecting the metric at several quantiles over a long period is more useful, since it makes even isolated instabilities visible. Still, the standard tools in the Python ecosystem may require substantial server resources and lengthy preliminary analysis, which can be costly for businesses. That is why the development of a new approach to collecting and analyzing metrics in highly loaded applications based on the Prometheus monitoring system is relevant.
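To make the point about averages concrete, the following minimal sketch compares the mean with two quantiles on a made-up latency sample. The data, the function name `summarize`, and the choice of the 50th and 99th percentiles are assumptions for illustration only; they do not come from the paper.

```python
import statistics


def summarize(latencies_ms):
    """Return the mean and two quantiles of a list of latency samples."""
    # quantiles(n=100) returns 99 cut points; index k - 1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p99": cuts[98],
    }


# Hypothetical data: 980 fast requests (10 ms) and 20 slow ones (2000 ms).
samples = [10.0] * 980 + [2000.0] * 20
summary = summarize(samples)
print(summary)
```

In this sample the mean falls between the two groups of requests and matches neither, while the median describes the typical request and the 99th percentile exposes the slow tail.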
The research aims to improve the efficiency of storing metrics across different quantiles, which will create additional opportunities for further analysis.
A review of existing approaches for calculating quantile values on large data sets was conducted, and their speed and memory usage were compared. The chosen method was adapted for use with a real-time data stream and implemented as a Python extension for the official Prometheus library. The extension opens up opportunities for comprehensive monitoring of highly loaded systems with respect to both server resource usage and the quantity and quality of the useful data collected. The solution can be readily adopted in large projects that require continuous tracking of various metrics to ensure stable and uninterrupted service operation.
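The abstract does not name the method that was chosen, so the sketch below is not the paper's algorithm. It uses reservoir sampling (Algorithm R) as a simple stand-in to illustrate the general idea of estimating quantiles over a real-time stream with bounded memory. The class `ReservoirQuantiles`, its methods, and the parameters `capacity` and `seed` are hypothetical names and are not part of any Prometheus library.

```python
import random


class ReservoirQuantiles:
    """Bounded-memory quantile estimates over a stream via reservoir sampling.

    Illustrative stand-in only; not the method described in the paper.
    """

    def __init__(self, capacity=1000, seed=0):
        self.capacity = capacity      # maximum number of stored observations
        self.count = 0                # total observations seen so far
        self.sample = []              # uniform random sample of the stream
        self._rng = random.Random(seed)

    def observe(self, value):
        """Record one observation from the stream."""
        self.count += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
            return
        # Algorithm R: the new value replaces a stored one with
        # probability capacity / count, keeping the sample uniform.
        j = self._rng.randrange(self.count)
        if j < self.capacity:
            self.sample[j] = value

    def quantile(self, q):
        """Estimate the q-quantile (0 <= q <= 1) from the stored sample."""
        if not self.sample:
            raise ValueError("no observations yet")
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        ordered = sorted(self.sample)
        index = min(len(ordered) - 1, int(q * len(ordered)))
        return ordered[index]


# Hypothetical stream: 100,000 values drawn uniformly from [0, 1).
stream = random.Random(42)
estimator = ReservoirQuantiles(capacity=1000, seed=0)
for _ in range(100_000):
    estimator.observe(stream.random())

p50 = estimator.quantile(0.5)
p99 = estimator.quantile(0.99)
print(f"p50={p50:.3f} p99={p99:.3f} stored={len(estimator.sample)}")
```

Memory stays fixed at `capacity` values regardless of stream length, at the cost of sampling error in the estimates. Dedicated quantile summaries, such as those surveyed in the paper, are designed to give tighter accuracy guarantees for the same memory.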
License
Copyright (c) 2024 Information, Computing and Intelligent systems
This work is licensed under a Creative Commons Attribution 4.0 International License.