.. _rosi-collector-architecture:

Architecture
============

.. index::
   pair: ROSI Collector; architecture
   single: Loki
   single: Prometheus
   single: Traefik

ROSI Collector combines several components into a cohesive logging and
monitoring stack. This page describes each component, how the components
interact, and how data flows through the system.

.. figure:: rosi-architecture.svg
   :alt: ROSI Collector Architecture Diagram
   :align: center
   :width: 100%
   
   ROSI Collector architecture showing data flows between components

Component Overview
------------------

rsyslog (Log Receiver)
^^^^^^^^^^^^^^^^^^^^^^

The rsyslog container receives logs from client hosts on TCP port 10514
(plain syslog over UDP port 514 is also accepted). It processes incoming
messages and forwards them directly to Loki using the omhttp output module.
rsyslog provides:

- High-performance log reception (thousands of messages per second)
- Message parsing and normalization
- Queue-based reliability (messages are buffered through brief outages)
- Direct Loki integration via omhttp module
- Optional JSON file output for backup/debugging

Grafana Loki (Log Storage)
^^^^^^^^^^^^^^^^^^^^^^^^^^

Loki stores logs as compressed chunks and is queried with LogQL. Unlike
traditional log management systems, Loki indexes only labels (metadata)
rather than the full text of each message, which keeps the index small and
storage efficient:

- 30-day default retention
- Label-based indexing for fast queries
- Efficient compression reduces storage needs
- LogQL query language (similar to PromQL)
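
The 30-day retention is a Loki setting. The fragment below is a sketch of
how such a limit is typically expressed in a Loki configuration file; it
assumes Loki 3.x with the compactor using local filesystem storage inside
the ``/loki`` volume, and the compactor directory name is illustrative
rather than taken from the ROSI Collector files.

.. code-block:: yaml

   # Sketch of a Loki 3.x retention setup (assumed, not the shipped config)
   limits_config:
     retention_period: 720h               # 30 days x 24 hours

   compactor:
     working_directory: /loki/compactor   # hypothetical path inside loki-data
     retention_enabled: true              # compactor deletes expired chunks
     delete_request_store: filesystem     # required with retention in Loki 3.x

On Loki 2.x the last key is ``shared_store`` instead of
``delete_request_store``.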

Prometheus (Metrics)
^^^^^^^^^^^^^^^^^^^^

Prometheus scrapes metrics from node_exporter (and, optionally, an impstats
sidecar) running on client hosts. It stores time-series data and provides
alerting capabilities:

- File-based service discovery from targets files (``nodes.yml``, ``impstats.yml``)
- 15-day metrics retention
- PromQL for metric queries and alerts
- Recording rules for dashboard performance
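
A minimal ``prometheus.yml`` sketch of file-based discovery and rule loading
follows. The job names, file paths, and scrape interval are assumptions for
illustration. The 15-day retention is not set in this file; Prometheus takes
it from the ``--storage.tsdb.retention.time=15d`` command-line flag.

.. code-block:: yaml

   # Sketch of prometheus.yml (paths and job names are hypothetical)
   global:
     scrape_interval: 30s

   rule_files:
     - /etc/prometheus/rules/*.yml        # recording and alerting rules

   scrape_configs:
     - job_name: node                     # node_exporter on client hosts
       file_sd_configs:
         - files:
             - /etc/prometheus/targets/nodes.yml
           refresh_interval: 1m           # re-read the targets file periodically
     - job_name: impstats                 # optional rsyslog impstats sidecar
       file_sd_configs:
         - files:
             - /etc/prometheus/targets/impstats.yml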

Grafana (Visualization)
^^^^^^^^^^^^^^^^^^^^^^^

Grafana provides the web interface for exploring logs and viewing
dashboards. It connects to both Loki (logs) and Prometheus (metrics):

- Pre-provisioned dashboards
- Explore interface for ad-hoc queries
- Alerting integration
- User authentication
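
Grafana can be connected to both backends through a datasource provisioning
file placed in its standard provisioning directory
(``/etc/grafana/provisioning/datasources/``). This sketch assumes the Compose
service names ``loki`` and ``prometheus`` resolve on the internal Docker
network and uses the internal ports from the port table below; the
datasource names are illustrative.

.. code-block:: yaml

   apiVersion: 1

   datasources:
     - name: Loki
       type: loki
       access: proxy                 # Grafana's backend sends the queries
       url: http://loki:3100
     - name: Prometheus
       type: prometheus
       access: proxy
       url: http://prometheus:9090
       isDefault: true               # preselected in Explore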

Traefik (Reverse Proxy)
^^^^^^^^^^^^^^^^^^^^^^^

Traefik handles external access to the stack, providing:

- Automatic TLS certificates via Let's Encrypt
- Basic authentication for Prometheus/Alertmanager
- Request routing to internal services
- HTTP to HTTPS redirect
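
The features above map onto a handful of Traefik settings. The following
Compose fragment is a sketch assuming Traefik v2 or v3 with the Docker
provider. The entrypoint names (``web``, ``websecure``), the resolver name
``letsencrypt``, the Prometheus hostname, and the ``ACME_EMAIL`` and
``PROMETHEUS_BASIC_AUTH`` variables are hypothetical.

.. code-block:: yaml

   services:
     traefik:
       command:
         - --providers.docker=true
         - --providers.docker.exposedbydefault=false
         - --entrypoints.web.address=:80
         - --entrypoints.websecure.address=:443
         # Redirect every HTTP request to HTTPS
         - --entrypoints.web.http.redirections.entrypoint.to=websecure
         - --entrypoints.web.http.redirections.entrypoint.scheme=https
         # Let's Encrypt via TLS-ALPN; certificates persist under /letsencrypt
         - --certificatesresolvers.letsencrypt.acme.tlschallenge=true
         - --certificatesresolvers.letsencrypt.acme.email=${ACME_EMAIL}
         - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
     prometheus:
       labels:
         - traefik.enable=true
         - traefik.http.routers.prometheus.rule=Host(`prometheus.${TRAEFIK_DOMAIN}`)
         - traefik.http.routers.prometheus.entrypoints=websecure
         - traefik.http.routers.prometheus.tls.certresolver=letsencrypt
         # htpasswd-style "user:hash" entries guard the Prometheus UI
         - traefik.http.middlewares.prom-auth.basicauth.users=${PROMETHEUS_BASIC_AUTH}
         - traefik.http.routers.prometheus.middlewares=prom-auth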

Downloads (File Server)
^^^^^^^^^^^^^^^^^^^^^^^

A lightweight nginx container serves client setup files:

- Installation scripts (rsyslog client, node_exporter)
- Configuration templates
- CA certificates (when TLS is enabled)
- Client certificate packages (for mTLS)

Files are accessible at ``https://grafana.TRAEFIK_DOMAIN/downloads/``.
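
Serving the downloads under the Grafana hostname implies a path-based route
in Traefik. A sketch of labels that could achieve this, assuming the nginx
container listens on port 80 and serves the files under ``/downloads/``
itself; the router name, the entrypoint ``websecure``, and the resolver
``letsencrypt`` are hypothetical names.

.. code-block:: yaml

   services:
     downloads:
       labels:
         - traefik.enable=true
         # Longer rule, so it gets a higher default priority than Grafana's Host() rule
         - traefik.http.routers.downloads.rule=Host(`grafana.${TRAEFIK_DOMAIN}`) && PathPrefix(`/downloads`)
         - traefik.http.routers.downloads.entrypoints=websecure
         - traefik.http.routers.downloads.tls.certresolver=letsencrypt
         - traefik.http.services.downloads.loadbalancer.server.port=80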

Data Flow
---------

Log Data Flow
^^^^^^^^^^^^^

1. **Client hosts** run rsyslog configured to forward logs to the collector
2. **Collector rsyslog** receives logs on TCP 10514
3. **rsyslog omhttp** sends logs directly to Loki with labels
4. **Loki** stores labeled log entries in compressed format
5. **Grafana** queries Loki to display logs in dashboards and Explore

Metrics Data Flow
^^^^^^^^^^^^^^^^^

1. **Client hosts** run node_exporter, which exposes system metrics on port
   9100. Optionally, an impstats sidecar on port 9898 exposes rsyslog internal
   metrics for the Syslog Health dashboard.
2. **Prometheus** scrapes the targets listed in ``nodes.yml`` and
   ``impstats.yml``
3. **Grafana** queries Prometheus to display metrics in dashboards (Host
   Metrics Overview and Syslog Health)
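
Each targets file uses Prometheus's file-based service discovery format: a
list of target groups, each with ``host:port`` entries and optional labels.
A sketch of ``nodes.yml`` with hypothetical hostnames and a hypothetical
``env`` label; ``impstats.yml`` would follow the same shape with port 9898.

.. code-block:: yaml

   # nodes.yml (hostnames and labels are hypothetical)
   - targets:
       - web01.example.com:9100
       - db01.example.com:9100
     labels:
       env: production               # attached to every series from these hosts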

Network Ports
-------------

+-----------+--------+-----------+------------------------------------------+
| Service   | Port   | Protocol  | Description                              |
+===========+========+===========+==========================================+
| Traefik   | 80     | TCP       | HTTP (redirects to HTTPS)                |
+-----------+--------+-----------+------------------------------------------+
| Traefik   | 443    | TCP       | HTTPS - external access                  |
+-----------+--------+-----------+------------------------------------------+
| rsyslog   | 514    | UDP       | Syslog reception (UDP)                   |
+-----------+--------+-----------+------------------------------------------+
| rsyslog   | 10514  | TCP       | Log reception from clients (TCP)         |
+-----------+--------+-----------+------------------------------------------+
| rsyslog   | 6514   | TCP       | TLS-encrypted syslog (optional profile)  |
+-----------+--------+-----------+------------------------------------------+
| Grafana   | 3000   | TCP       | Web UI (internal, proxied by Traefik)    |
+-----------+--------+-----------+------------------------------------------+
| Loki      | 3100   | TCP       | Log API (internal)                       |
+-----------+--------+-----------+------------------------------------------+
| Prometheus| 9090   | TCP       | Metrics API (internal, proxied)          |
+-----------+--------+-----------+------------------------------------------+

Container Services
------------------

The Docker Compose stack defines these services:

.. code-block:: yaml

   services:
     traefik:      # Reverse proxy with TLS
     rsyslog:      # Log receiver with omhttp output (TCP/UDP)
     rsyslog-tls:  # TLS-encrypted log receiver (profile: tls)
     loki:         # Log storage
     prometheus:   # Metrics collection
     grafana:      # Visualization
     downloads:    # Client setup file server

The ``rsyslog-tls`` service starts automatically when
``SYSLOG_TLS_ENABLED=true`` is set in your ``.env`` file; the usual command
then brings up the whole stack, including the TLS receiver:

.. code-block:: bash

   docker compose up -d

All containers communicate on an internal Docker network. Only Traefik
(80/443), rsyslog (10514/TCP and 514/UDP), and rsyslog-tls (6514/TCP, when
enabled) are exposed to external traffic.
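
In Compose terms, a port is exposed to external traffic by a ``ports:``
mapping on the host. The fragment below sketches how the published ports and
the ``tls`` profile might be declared; only the port numbers and profile name
come from this page, the rest is an assumption. How ``SYSLOG_TLS_ENABLED`` is
translated into an active profile is not shown here; plain Compose activates
a profile through ``--profile tls`` or the ``COMPOSE_PROFILES`` variable.

.. code-block:: yaml

   services:
     rsyslog:
       ports:
         - "10514:10514/tcp"   # log reception from clients
         - "514:514/udp"       # classic syslog over UDP
     rsyslog-tls:
       profiles: ["tls"]       # started only when the tls profile is active
       ports:
         - "6514:6514/tcp"     # TLS-encrypted syslog
     loki:
       expose:
         - "3100"              # documented for other containers, not published on the host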

Storage Volumes
---------------

The stack uses Docker volumes for persistent data:

+--------------------+---------------------------+---------------------------+
| Volume             | Mount Point               | Purpose                   |
+====================+===========================+===========================+
| loki-data          | /loki                     | Log storage               |
+--------------------+---------------------------+---------------------------+
| prometheus-data    | /prometheus               | Metrics storage           |
+--------------------+---------------------------+---------------------------+
| grafana-data       | /var/lib/grafana          | Dashboards, preferences   |
+--------------------+---------------------------+---------------------------+
| rsyslog-logs       | /var/log/remote           | Received log files        |
+--------------------+---------------------------+---------------------------+
| traefik-certs      | /letsencrypt              | TLS certificates          |
+--------------------+---------------------------+---------------------------+
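
A sketch of how the named volumes in the table could be declared and mounted
in the Compose file; the volume names and mount points come from the table,
and all other service settings are omitted.

.. code-block:: yaml

   services:
     loki:
       volumes:
         - loki-data:/loki
     prometheus:
       volumes:
         - prometheus-data:/prometheus
     grafana:
       volumes:
         - grafana-data:/var/lib/grafana
     rsyslog:
       volumes:
         - rsyslog-logs:/var/log/remote
     traefik:
       volumes:
         - traefik-certs:/letsencrypt

   # Top-level declaration: Docker creates these named volumes on first start
   volumes:
     loki-data:
     prometheus-data:
     grafana-data:
     rsyslog-logs:
     traefik-certs: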

Resource Requirements
---------------------

Minimum requirements for a small deployment (10-50 clients):

- **CPU**: 2 cores
- **RAM**: 4 GB (Loki is the primary consumer)
- **Disk**: 50 GB (depends on log volume and retention)

For larger deployments, scale based on:

- **Log volume**: ~1 GB storage per 10 million log lines
- **Retention period**: Total storage is roughly daily volume multiplied by the number of retention days
- **Query load**: Additional RAM improves query performance
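
As a worked example with hypothetical numbers: 50 clients each producing
about 20,000 lines per day yield 1 million lines per day. At the ratio above
and the default 30-day Loki retention, the estimate is:

.. math::

   \text{storage} \approx \frac{50 \times 20\,000\ \text{lines/day}}{10^{7}\ \text{lines/GB}} \times 30\ \text{days} = 0.1\ \text{GB/day} \times 30\ \text{days} = 3\ \text{GB}

This is well within the 50 GB minimum disk size listed above.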

Scaling Considerations
----------------------

ROSI Collector is designed for single-server deployments. For larger
environments, consider:

- **Horizontal scaling**: Deploy multiple collectors with load balancing
- **Loki clustering**: Run Loki in distributed mode
- **External storage**: Use S3-compatible storage for Loki chunks
- **Prometheus federation**: Aggregate metrics from multiple Prometheus instances
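
For federation, a central Prometheus scrapes the ``/federate`` endpoint of
each collector's Prometheus. A sketch of that scrape job, assuming two
hypothetical collector hostnames reached through Traefik over HTTPS with
basic authentication; the ``match[]`` selector, username, and password file
path are also assumptions.

.. code-block:: yaml

   scrape_configs:
     - job_name: federate-collectors
       honor_labels: true              # keep the original job/instance labels
       metrics_path: /federate
       params:
         'match[]':
           - '{job="node"}'            # series to pull from each collector
       scheme: https
       basic_auth:
         username: admin
         password_file: /etc/prometheus/federate-password
       static_configs:
         - targets:
             - prometheus.collector-a.example.com:443
             - prometheus.collector-b.example.com:443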
