
Logging & Monitoring on multiple servers

Asked by AivanF. on November 12, 2021

I lead a small project with 2 groups of servers (servers within a group are clones running the same microservices, like pods in k8s), a load balancer, and a DB. Docker & k8s aren't used because of the small scale of the project (but there are plans to migrate). Instead, our cloud provider lets us set up dynamic server creation/removal based on load, using a filesystem image. This is fine for now, but logging and monitoring are lacking. I tried to structure the requirements:

  1. A dashboard with server parameters such as CPU, RAM, disk space, and network usage for each server. Our provider offers some dashboards with this technical information, but a provider-independent solution is desired.
  2. A dashboard with the number of responses by HTTP status code, for each server group in total and for each server separately. We use Nginx, and it provides a status page, but that only works per server (I sketch the kind of per-status counting I mean right after this list).
  3. A log analysis & digging window, with support for multiline messages (like exception tracebacks).
  4. Programming-language independence. Our server apps write logs to several different files with a date in the filename; this will probably have to change somehow.
  5. The ability to set up email notifications (or even launch a shell script?) with custom triggers.
  6. Maybe there are some other important features?
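To make requirement 2 concrete, this is roughly the counting I have in mind, as a minimal Python sketch. It assumes Nginx's default "combined" access-log format and the usual /var/log/nginx/access.log path; both are assumptions about a particular setup:

    # Tally HTTP status codes from an Nginx access log in the default
    # "combined" format; the path and format are assumptions.
    import re
    from collections import Counter

    # In the combined format the status code follows the quoted request:
    # ... "GET / HTTP/1.1" 200 612 ...
    STATUS_RE = re.compile(r'" (\d{3}) ')

    def count_statuses(path):
        counts = Counter()
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                match = STATUS_RE.search(line)
                if match:
                    counts[match.group(1)] += 1
        return counts

    if __name__ == "__main__":
        for status, n in sorted(count_statuses("/var/log/nginx/access.log").items()):
            print(status, n)

Of course, I'd rather have a tool aggregate this across all servers than maintain such scripts myself.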

Additional wishes:

  1. The logging/monitoring setup should be free to use.
  2. Simplicity is preferred. I hope a single solution can be flexible enough to cover all the needs of logging & monitoring.
  3. Solution maturity/popularity and the availability of docs and examples are important.
  4. The solution should survive the eventual migration to containers and orchestration.

I imagine the final result as 1) a server that accumulates all the logs and provides dashboards and a log-digging panel, and 2) a script installed on each server, run by a systemd timer, that periodically sends the local logs to that server. But how would the script know which exact log files are present on a given server install? Via some config file? A rough sketch of what I mean is below, though I may be wrong about this whole approach.
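Here is that sketch of a shipper script, where a JSON config lists log-file globs and the script POSTs file contents to a central collector. Everything specific in it (the /ingest URL, the config path, the payload shape) is made up for illustration:

    # Minimal sketch of a log shipper run by a systemd timer; the config
    # file, endpoint, and payload format are all hypothetical.
    import glob
    import json
    import socket
    import urllib.request

    CONFIG_PATH = "/etc/logship.json"  # e.g. {"patterns": ["/var/log/myapp/*.log"]}
    INGEST_URL = "http://logs.internal:8080/ingest"  # hypothetical collector

    def ship():
        with open(CONFIG_PATH) as f:
            config = json.load(f)
        hostname = socket.gethostname()
        for pattern in config["patterns"]:
            # The config, not the script, decides which log files matter here.
            for path in glob.glob(pattern):
                with open(path, "rb") as f:
                    content = f.read().decode("utf-8", "replace")
                body = json.dumps(
                    {"host": hostname, "file": path, "lines": content}
                ).encode()
                request = urllib.request.Request(
                    INGEST_URL, data=body,
                    headers={"Content-Type": "application/json"},
                )
                urllib.request.urlopen(request, timeout=10)

    if __name__ == "__main__":
        ship()

A real shipper would also need to remember read offsets between runs, handle log rotation, and so on, which is exactly why I suspect an existing tool fits better than a homegrown script.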

I've heard about Logstash, Elasticsearch, Grafana, Kibana, Zabbix, Loki, Prometheus, etc., and tried to google everything, but found a total mess of recommendations. I'd like to see modern explanations of the different technologies and stacks, what software can be used together or is an analog of another, and a comparison against the needs above.
I also hope the answers will be helpful to many other developers and beginning project leads, since the topic is highly important.

One Answer

Well, I spent hours reading lots of articles and documentation, and finally figured everything out. See the comparative table:

[Image: comparative table "Logging & Monitoring technologies" by AivanF]


Brief description

Zabbix is a full-fledged system that can handle both metrics and log processing, and it can be extended. It also supports automatic host detection & network map building.

Prometheus is a metrics monitoring system. It is Kubernetes-native, so it can automatically discover hosts to scrape data from. It also has myriad adapters available (docs: Exporters and Integrations). To visualize the data, Grafana is used. As for data storage, Prometheus has its own local storage, but there are integrations with other technologies for sharding (docs: Storage).
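To make the pull model concrete, here is a minimal sketch using the official prometheus_client Python library: the app exposes a /metrics endpoint and the Prometheus server scrapes it. The metric names and the port are made up for illustration.

    # The app exposes metrics over HTTP; Prometheus scrapes them.
    import random
    import time

    from prometheus_client import Counter, Gauge, start_http_server

    REQUESTS = Counter("myapp_http_requests_total", "Total HTTP requests handled")
    QUEUE_SIZE = Gauge("myapp_queue_size", "Current size of the internal queue")

    if __name__ == "__main__":
        start_http_server(8000)  # serves http://localhost:8000/metrics
        while True:
            REQUESTS.inc()                         # pretend we handled a request
            QUEUE_SIZE.set(random.randint(0, 50))  # pretend queue depth
            time.sleep(1)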

PLG is a stack for log aggregation. Logs are collected with Promtail, Loki is the core processing engine, and Grafana serves as the interface. Loki has no storage system of its own, so some other solution is required: it stores data as chunks and indexes, which are usually kept separately (docs: Storage). As I understand it, PLG has no trigger/notification system, but the collecting part can also parse logs into metrics and export them to Prometheus, which gives us alerts. I also came across a note that Promtail doesn't support multiline logs, but Fluentd can be used instead, as it is more flexible.
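One app-side workaround for the multiline problem is to emit each log record as a single JSON line, folding the traceback into one escaped field, so any line-oriented collector keeps the event intact. A stdlib-only Python sketch (the field names are arbitrary):

    # Emit one JSON object per line; newlines in tracebacks get escaped.
    import json
    import logging

    class JsonLineFormatter(logging.Formatter):
        def format(self, record):
            payload = {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            if record.exc_info:
                # formatException returns a multiline string; json.dumps
                # escapes the newlines so the event stays on one line.
                payload["traceback"] = self.formatException(record.exc_info)
            return json.dumps(payload)

    handler = logging.StreamHandler()
    handler.setFormatter(JsonLineFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])

    try:
        1 / 0
    except ZeroDivisionError:
        logging.getLogger("demo").exception("division failed")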

ELK / EFK is a technology stack for parsing, indexing, and searching logs, so it can handle complex queries. Logstash was originally used for log collection, but Fluentd is becoming popular.
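As an illustration of the querying side, here is a small sketch using the official elasticsearch Python client (8.x-style keyword arguments); the index name and field names are assumptions about how the logs were indexed:

    # Search indexed logs: tracebacks from one host, newest-style query API.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(
        index="app-logs-*",  # assumed index pattern
        query={
            "bool": {
                "must": [{"match": {"message": "Traceback"}}],
                "filter": [{"term": {"host.keyword": "web-1"}}],
            }
        },
        size=20,
    )
    for hit in resp["hits"]["hits"]:
        print(hit["_source"].get("message", ""))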

Answered by AivanF. on November 12, 2021
