Bluedesk Consulting

Building a Highly Available Monitoring Infrastructure with Zabbix 7.2

At Bluedesk Consulting, we believe that monitoring is a mission-critical layer in modern IT environments. Ensuring it is always available, scalable, and resilient is not optional—it's essential. Below is a reference architecture we consider a best practice for implementing a high availability (HA) monitoring infrastructure using Zabbix 7.2.

What is Zabbix?

Zabbix is an open-source enterprise-level monitoring platform designed to monitor and track the status of various network services, servers, virtual machines, and other IT resources. It supports multiple monitoring methods such as SNMP, IPMI, JMX, and agent-based or agentless checks, making it a versatile solution for organizations of any size.

Key Components of the Infrastructure

This HA design ensures no single point of failure by introducing redundancy at each critical layer:

1. HAProxy + Keepalived with VIP

At the entry point, we use two HAProxy instances configured for load balancing and high availability. These are managed with Keepalived, which ensures that a Virtual IP (VIP) is always active—even if one HAProxy node fails. Web browsers and clients access Zabbix through this shared VIP.

Purpose: High availability and load balancing for incoming HTTPS connections to the Zabbix frontends.
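
The VIP behaviour can be pictured as a priority election between the two load balancers. The sketch below is a simplified model of that idea, not Keepalived itself: the node names and priority values are hypothetical, and real VRRP adds advertisement timers and preemption settings that are left out here.

```python
from dataclasses import dataclass


@dataclass
class LbNode:
    name: str          # hypothetical node name
    priority: int      # higher priority wins the election, as in VRRP
    healthy: bool = True


def vip_holder(nodes):
    """Return the name of the healthy node with the highest priority, or None."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority).name


# Illustrative pair: haproxy-a is preferred, haproxy-b is the backup.
nodes = [LbNode("haproxy-a", 150), LbNode("haproxy-b", 100)]
```

With both nodes healthy the VIP stays on haproxy-a; marking it unhealthy moves the VIP to haproxy-b, which is the failover users never see.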

2. Zabbix Frontends

Behind HAProxy, two Zabbix frontend servers provide the web interface for users. These are stateless and can scale horizontally, which improves performance and lets nodes be taken out of rotation one at a time for maintenance.

Purpose: Web UI layer—can be accessed through the VIP.
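
A quick way to confirm that the frontends answer through the VIP is to call the Zabbix API method apiinfo.version, which needs no authentication. The sketch below is a minimal example under assumptions: the hostname zabbix.example.com is a placeholder for the VIP, and the API is assumed to live at /api_jsonrpc.php directly under the base URL (some installations use a /zabbix prefix).

```python
import json
import urllib.request


def build_version_request(base_url):
    """Build the JSON-RPC request for apiinfo.version without sending it."""
    payload = {"jsonrpc": "2.0", "method": "apiinfo.version", "params": [], "id": 1}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api_jsonrpc.php",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
        method="POST",
    )


def frontend_version(base_url, timeout=5.0):
    """Send the request and return the version string reported by the frontend."""
    request = build_version_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["result"]


# Example call (placeholder URL, requires network access):
# frontend_version("https://zabbix.example.com")
```

Running the same call against each frontend's own address, as well as the VIP, helps tell a load balancer problem apart from a frontend problem.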

3. Zabbix Servers (A & B)

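These are the core of the monitoring platform. Two Zabbix server nodes run as an active/standby pair: only one node is active at a time, and the standby takes over if the active node fails. Since Zabbix 6.0 this is available as a native HA cluster coordinated through the shared database, so external failover scripts are no longer required. The active node processes data from monitored hosts and proxies.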

Purpose: Central monitoring engine—collects, processes, and stores metrics and events.
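
Zabbix 6.0 and later include a native HA mode in which one server node is active and the others stand by, with heartbeats recorded in the database. The following is a simplified model of that active/standby logic, not the server's actual implementation: the node names are hypothetical, and the 60-second failover delay is an assumption chosen to match the documented default.

```python
from dataclasses import dataclass

FAILOVER_DELAY = 60  # seconds; assumed value for this sketch


@dataclass
class ServerNode:
    name: str              # hypothetical node name
    status: str            # "active", "standby" or "unavailable"
    last_heartbeat: float  # timestamp of the node's last heartbeat


def elect_active(nodes, now, failover_delay=FAILOVER_DELAY):
    """Return the active node's name, promoting a standby if the active one is stale."""
    active = next((n for n in nodes if n.status == "active"), None)
    if active and now - active.last_heartbeat <= failover_delay:
        return active.name
    if active:
        # The active node missed its heartbeats for longer than the delay.
        active.status = "unavailable"
    for node in nodes:
        if node.status == "standby" and now - node.last_heartbeat <= failover_delay:
            node.status = "active"
            return node.name
    return None
```

The delay is a trade-off: a shorter value shortens the monitoring gap after a failure, while a longer one avoids failovers caused by brief database or network hiccups.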

4. ProxySQL with Keepalived (VIP)

To ensure fault-tolerant database access, we use two ProxySQL nodes (with Keepalived for VIP failover). They sit between the Zabbix servers and the backend database cluster.

Purpose: SQL load balancing and failover between Zabbix servers and the database cluster.

5. Percona XtraDB Cluster (PXC)

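This is the highly available MySQL cluster backend. It consists of three nodes using Galera-based, virtually synchronous multi-primary replication, so every node holds the same data and the cluster keeps quorum if one node is lost.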

Purpose: Highly available and consistent storage for all Zabbix data.
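
The choice of three nodes follows from majority quorum arithmetic. The short sketch below works through it, assuming equally weighted nodes and unplanned failures; the function names are illustrative only.

```python
def quorum_size(total_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes):
    """How many nodes can be lost while a majority remains."""
    return total_nodes - quorum_size(total_nodes)


for n in (2, 3, 5):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A two-node cluster tolerates no failures, three nodes tolerate one, and five tolerate two, which is why three is the usual minimum for this layer.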

6. Zabbix Proxies (per datacenter)

Each datacenter has a dedicated Zabbix proxy, which handles data collection from local agents, SNMP devices, and other monitored resources. Each proxy stores data temporarily and sends it to the central Zabbix server.

Purpose: Decentralized data collection—ideal for multi-datacenter or segmented networks.
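
The store-and-forward role of a proxy can be illustrated with a small buffer model. This is a conceptual sketch, not how the Zabbix proxy is implemented: the class, the item key, and the send callback are all hypothetical.

```python
from collections import deque


class ProxyBuffer:
    """Holds collected values until they are delivered to the central server."""

    def __init__(self):
        self._pending = deque()

    def collect(self, item_key, value, clock):
        """Queue one collected value with its timestamp."""
        self._pending.append((item_key, value, clock))

    def flush(self, send):
        """Forward values in order; stop and keep the rest if a send fails."""
        sent = 0
        while self._pending:
            if not send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

If the link to the central server is down, flush returns 0 and nothing is lost; once the link returns, the backlog is delivered in the order it was collected.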

7. Zabbix Agents + SNMP, JMX, IPMI

Each monitored VM or device has the Zabbix Agent installed, or is monitored using SNMP, JMX, or IPMI—depending on the type of system or application.

Purpose: Direct metric collection from infrastructure components.

Monitoring Flow Explained

Here’s how data and access flow through the system:

Users access the monitoring dashboard through a Web Browser, routed via HAProxy VIP, which balances traffic to the Zabbix Frontends.
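Zabbix Frontends read and write configuration and monitoring data through the database, and contact the active Zabbix Server node for runtime operations such as running scripts or testing items.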
The Zabbix Servers collect monitoring data either directly from Zabbix Agents, or indirectly via Zabbix Proxies deployed in each datacenter.
Each Zabbix Proxy aggregates data from local devices (using SNMP, JMX, IPMI, or agents) and forwards it to the central servers.
The Zabbix Servers query the MySQL cluster via ProxySQL (again behind a VIP for HA), storing and retrieving monitoring data.
The HAProxy and ProxySQL layers sit behind Keepalived-managed VIPs, so failover between their redundant nodes is automatic and requires no client reconfiguration.

Additional Notes

The use of Percona XtraDB Cluster provides data consistency and failover capabilities not achievable with a single MySQL instance.
The Zabbix Proxy approach offloads the central servers, helps with scalability, and reduces latency for remote or large sites.
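The Keepalived + HAProxy combination lets frontend nodes be patched one at a time: a node is drained from the load balancer, updated, and returned to service while the other keeps serving users.
Each central layer (load balancers, frontends, servers, ProxySQL, and database) is redundant, so any single node in those layers can fail without interrupting overall monitoring.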
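
The single-failure claim for the central layers can be checked with a small model. The sketch below is illustrative: the node names are hypothetical, it covers only the central layers (not the per-datacenter proxies), and it assumes every layer works with one surviving node except PXC, which needs two of three to keep quorum.

```python
# Hypothetical node names per central layer; adjust to the real inventory.
LAYERS = {
    "haproxy": ["haproxy-a", "haproxy-b"],
    "frontend": ["frontend-a", "frontend-b"],
    "zabbix-server": ["zabbix-a", "zabbix-b"],
    "proxysql": ["proxysql-a", "proxysql-b"],
    "pxc": ["pxc-1", "pxc-2", "pxc-3"],
}

# Minimum healthy nodes per layer; layers not listed need only one.
MIN_HEALTHY = {"pxc": 2}


def survives(failed):
    """Return True if every layer still has enough healthy nodes."""
    failed = set(failed)
    for layer, nodes in LAYERS.items():
        healthy = [n for n in nodes if n not in failed]
        if len(healthy) < MIN_HEALTHY.get(layer, 1):
            return False
    return True


def single_points_of_failure():
    """List every node whose failure alone would break a layer."""
    return [n for nodes in LAYERS.values() for n in nodes if not survives({n})]
```

For this layout single_points_of_failure() returns an empty list, while losing two PXC nodes at once does break the model, which matches the quorum requirement of the database layer.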

Final Thoughts

This HA setup for Zabbix ensures your monitoring is always up—even when parts of the system are down. With growing complexity in IT environments, such architectures are not a luxury—they’re a necessity. At Bluedesk Consulting, we help our clients implement resilient infrastructures that support operational excellence and proactive monitoring.

It’s important to note that while this architecture represents a high availability (HA) best practice, it is also adaptable to businesses of various sizes and needs.

Smaller environments or those just starting with Zabbix may opt for a non-HA deployment with a single server and database instance to simplify setup and reduce overhead.
However, for production-critical systems, financial services, cloud platforms, or any organization with 24/7 operations, we strongly recommend implementing HA from the beginning.
This design is modular, meaning components like proxies, HAProxy, or PXC can be gradually added as your infrastructure grows.
Even if full HA is not feasible at first, planning for scalability from day one helps avoid costly re-architectures later.

While a non-HA Zabbix installation is supported and can function well for limited use cases, it lacks failover capabilities, load balancing, and redundancy—which are essential for business continuity in most modern IT landscapes.

Whether you’re exploring new solutions, need expert guidance, or just have a few questions—we’re here for you. Feel free to reach out and let’s start a conversation.

Email: [email protected]
Phone: 0765 846 395
Or use our contact form: Contact Us
