
Prometheus High Memory and CPU Usage in PMM

Database Administrators Asked by user5594148 on December 15, 2021

We are running PMM v1.17.0 and Prometheus is causing very high CPU and memory usage (200% CPU and 100% RAM); PMM went down because of this. We are running PMM on a VM with 2 vCPUs and 7.5 GB RAM and are monitoring about 25 servers. PMM is started with the following command:

docker run -d -it --volumes-from pmm-data --name pmm-server \
  -e QUERIES_RETENTION=1095 \
  -e METRICS_RESOLUTION=3s \
  -p 80:80 \
  --restart always \
  percona/pmm-server:1
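
As a quick sanity check that Prometheus, rather than another PMM component, is the process consuming the resources, something like the following can be run on the PMM host (this assumes the container is named pmm-server as above):

# Overall container resource usage
docker stats --no-stream pmm-server

# Per-process view inside the container; the prometheus process should stand out
docker exec -it pmm-server top -b -n 1 | head -n 20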

The prometheus.log is filled with entries like the following:

level=warn ts=2020-01-30T10:27:12.8156514Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"
level=warn ts=2020-01-30T10:27:26.464361371Z caller=scrape.go:945 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.223:42002/metrics-mr msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
level=warn ts=2020-01-30T10:27:27.81316996Z caller=scrape.go:942 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="Error on ingesting out-of-order samples" num_dropped=2
level=warn ts=2020-01-30T10:27:27.813257165Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"
level=warn ts=2020-01-30T10:27:41.462420708Z caller=scrape.go:945 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.223:42002/metrics-mr msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
level=warn ts=2020-01-30T10:27:42.813356387Z caller=scrape.go:942 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="Error on ingesting out-of-order samples" num_dropped=2
level=warn ts=2020-01-30T10:27:42.813441108Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"
level=warn ts=2020-01-30T10:27:56.463798729Z caller=scrape.go:945 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.223:42002/metrics-mr msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
level=warn ts=2020-01-30T10:27:57.82083775Z caller=scrape.go:942 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="Error on ingesting out-of-order samples" num_dropped=2
level=warn ts=2020-01-30T10:27:57.820912309Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"

Can someone please let me know why Prometheus is causing this issue? Are there any parameters we need to add or change?

2 Answers

How many servers are you monitoring? A PMM server of that spec can handle maybe 4-8 monitored servers if they are not too busy, and nearer 4 if they are busy and send a lot of queries to PMM for QAN. It also depends on your data retention: if you increase retention from the defaults, you will need to add more RAM and CPU to the host.
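
If adding CPU and RAM is not an option, the load can usually be reduced by scraping less often and keeping less history. Below is a rough sketch of restarting PMM Server with a coarser metrics resolution, a shorter metrics retention, and an explicit memory allowance for Prometheus (METRICS_MEMORY is given in kilobytes in PMM 1.x); the values here are placeholders to adjust for your workload, and the existing data is preserved in the pmm-data volume:

# Stop and remove the server container only; data lives in pmm-data
docker stop pmm-server && docker rm pmm-server

docker run -d --volumes-from pmm-data --name pmm-server \
  -e QUERIES_RETENTION=1095 \
  -e METRICS_RESOLUTION=5s \
  -e METRICS_RETENTION=360h \
  -e METRICS_MEMORY=3145728 \
  -p 80:80 \
  --restart always \
  percona/pmm-server:1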

Answered by Gordan Bobić on December 15, 2021

100% RAM -- You are probably swapping, which is terrible for performance. Lower innodb_buffer_pool_size a little to avoid swapping.

200% CPU -- Poor indexes and/or poor formulation of queries. Please provide some of the queries and SHOW CREATE TABLE; there may be a quick fix.

"out of order" and "different value" -- Either a bug with the collection mechanism or a bug in Percona.

Answered by Rick James on December 15, 2021
