Host Monitor
A monitoring appliance for the Waveshare ESP32-S3-Touch-LCD-4.3. It watches a fleet of hosts with six pluggable checks, shows health on the on-device LCD, and serves a web dashboard embedded in the firmware — over plain HTTP behind Basic Auth — with webhook alerting and pause / acknowledge governance.
00 Overview
Host Monitor is an Arduino IDE project that turns a Waveshare ESP32-S3 touch display into a standalone uptime monitor. It loads a host list from hosts.csv in on-board flash (LittleFS), runs six checks against each host on independent schedules, and reports fleet status three ways: on the LCD, over an embedded web dashboard, and through alert delivery.
On-device LCD
800×480 capacitive touch home screens — a health donut, a host grid, and an Alerts·Setup card, on a 3-tab nav with live refresh.
Embedded dashboard
The full web UI + JSON REST API are compiled into the firmware (gzip) and served over plain HTTP, gated by Basic Auth.
Six checks
Ping, DNS, Port, HTTP, SSL and Trace — each with its own interval and per-host enable. SSL reads cert-expiry; HTTP has a per-host HTTP/HTTPS scheme.
Alerting
Webhook delivery (JSON POST/PUT) routed per host on down / warn / recovered, with re-notify. SMTP email was removed — route a webhook to a relay.
Governance
Pause and acknowledge with a required reason — kept on the host and surfaced in the UI.
Timekeeping
Time comes from NTP, falling back to a runtime/uptime clock. (The on-board PCF85063 RTC is unused in this build — see Hardware.)
01 Hardware
- Waveshare ESP32-S3-Touch-LCD-4.3 — ESP32-S3, 8 MB OPI PSRAM, 8 MB flash (some revisions ship 16 MB — match Flash Size + partitions to your chip)
- No microSD card and no RTC battery needed — the host list and dashboard both live in on-board flash
The build stays Wire-free, so the on-board PCF85063 RTC is unused. The GT911 touch + CH422G expander run on the panel library's legacy IDF I²C driver; Arduino's Wire pulls in the new driver_ng, and linking both aborts at boot (check_i2c_driver_conflict). The RTC shares that bus, so it stays disabled; time comes from NTP with a runtime/uptime fallback.
The board library owns the I²C bus and brings up the expander → RGB bus → GT911 touch automatically once you select the Waveshare board in its config (covered in Installation).
02 Installation
This is an Arduino IDE project. Set up the board package, libraries, and Tools-menu settings in order.
- Install the ESP32 board package (pinned)
Add the boards-manager URL, then install esp32 by Espressif → version 3.3.8 — the version this code is built against (it relies on the IDF v5 / core 3.x APIs).
https://espressif.github.io/arduino-esp32/package_esp32_index.json
- Install libraries
From Library Manager unless noted. Versions matter — LVGL v9 and on-board TLS won't work against this code.
| Library | Version | Notes |
|---|---|---|
| lvgl | 8.4.x | Graphics engine. v9 will not compile. |
| ESP32_Display_Panel | 1.0.4 | Drives the CH422G expander + RGB bus + GT911 touch. |
| ESP32_IO_Expander | 1.1.1 | Dependency of ESP32_Display_Panel. |
| esp-lib-utils | 0.3.0 | Dependency of the panel stack. |
| ESP32_IDF5_HTTPS_Server | 1.1.1 | Used for its HTTP server + Basic Auth only (IDF5 fork of fhessel's server). Its TLS classes are not used; on-device HTTPS isn't viable here. |
| ArduinoJson | 7.4.x | JSON API (code still uses the DynamicJsonDocument alias). |

LittleFS, WiFi, HTTPClient, WiFiClientSecure, ESPmDNS, Preferences, the esp_ping/lwIP stack and mbedTLS ship with the ESP32 core. (mbedTLS is pulled in by WiFiClientSecure for the SSL check, HTTPS host checks and webhooks, all unverified.)
ESP Mail Client is no longer required. SMTP / email alerting was removed (its TLS footprint exceeded this board's free internal RAM). Webhook is the only delivery channel now, so you can uninstall it.
- Place lv_conf.h
Copy lv_conf.h from the sketch folder into your Arduino libraries/ folder; it must sit next to the lvgl folder. This build sets LV_MEM_CUSTOM=1 and allocates the LVGL pool in PSRAM, which frees internal RAM for Wi-Fi and the checks, so don't skip it.
…/Arduino/libraries/lv_conf.h # next to the lvgl/ folder
- Select the Waveshare board in ESP32_Display_Panel
This is the single most important step. Place a file named esp_panel_board_supported_conf.h at your Arduino libraries/ root (next to lv_conf.h). The library finds it via __has_include("../../../…"), so it must live there, not inside the library folder. Without the board selected, Board::init() fails and the screen stays black (backlight on).
#define ESP_PANEL_BOARD_DEFAULT_USE_SUPPORTED (1)
#define BOARD_WAVESHARE_ESP32_S3_TOUCH_LCD_4_3
- Arduino IDE board settings (Tools menu)
PSRAM must be OPI PSRAM. The app build is large (LVGL + HTTP server + mbedTLS + Display Panel), so use the bundled custom partition table; a default scheme won't fit.
| Setting | Value |
|---|---|
| Board | ESP32S3 Dev Module |
| PSRAM | OPI PSRAM (mandatory) |
| Flash Size | 8MB (64Mb), or 16MB if your board has it |
| Partition Scheme | Custom → bundled partitions.csv (5 MB app / ~2.9 MB LittleFS / coredump) |
| Upload Speed | 921600 |
| USB CDC On Boot | Enabled (serial log; unreliable once RGB + Wi-Fi are live, so the LCD surfaces diagnostics) |
03 Dashboard assets
There is no separate filesystem upload. The data/ folder (HTML / CSS / JS + setup.html) is gzip-compressed into web_assets.h and compiled straight into the binary, then served by the API's default route. Flashing the sketch ships the dashboard with it.
Edit the dashboard (optional)
Change the files under data/, then regenerate web_assets.h from them so the embedded copy matches.
Upload the sketch
Sketch → Upload. That's it: the dashboard travels inside the firmware image. No LittleFS data-upload tool required.
(Optional) seed a host list
Fresh flash starts with no hosts. Add them in the dashboard, or pre-seed by writing a hosts.csv to LittleFS. Either way it's persisted to and reloaded from flash.
hosts.csv, auto-created on first boot.
config.h, model.h…) or web_assets.h can leave a stale object in Arduino's build cache. If a change “doesn't take”, clear …/arduino/sketches/* and rebuild; the Alerts·Setup card shows a built HH:MM:SS marker so you can confirm the running binary is current.
hosts.csv, and generates a random per-device web password (shown on the LCD Alerts·Setup card). There's no on-device TLS key generation (the dashboard is plain HTTP), so the screen comes up in seconds.
04 First run
Get the device onto your Wi-Fi one of two ways. Method 1 bakes the network name and password into the firmware for a fixed deployment; Method 2 is the interactive AP wizard. Either way you finish at the same place — open the dashboard.
Method 1 — hardcode the Wi-Fi name & password in config.h
For a fixed deployment, set the SSID and password in config.h before flashing and skip the wizard entirely. If WIFI_SSID_BUILTIN is non-empty the device joins that network on every boot and ignores saved / AP credentials.
#define WIFI_SSID_BUILTIN "my-ssid"
#define WIFI_PASS_BUILTIN "my-pass" // leave both "" for the AP wizard
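The precedence described for Method 1 can be sketched as a small pure function. This is a minimal illustration, not the firmware's code: WifiMode, WifiPlan and chooseWifi are hypothetical names, and only WIFI_SSID_BUILTIN / WIFI_PASS_BUILTIN and the HostMon AP name come from the text.

```cpp
#include <cassert>
#include <string>

// Placeholder values, as they would appear in config.h.
#define WIFI_SSID_BUILTIN "my-ssid"
#define WIFI_PASS_BUILTIN "my-pass"

// Hypothetical types for illustration; the firmware's own names may differ.
enum class WifiMode { Builtin, Saved, AccessPoint };

struct WifiPlan {
    WifiMode mode;
    std::string ssid;
    std::string pass;
};

// Decide which credentials to use at boot:
// a non-empty built-in SSID always wins, then saved credentials,
// and with neither the device falls back to the open setup AP.
WifiPlan chooseWifi(const std::string& builtinSsid, const std::string& builtinPass,
                    const std::string& savedSsid, const std::string& savedPass) {
    if (!builtinSsid.empty()) return {WifiMode::Builtin, builtinSsid, builtinPass};
    if (!savedSsid.empty())   return {WifiMode::Saved, savedSsid, savedPass};
    return {WifiMode::AccessPoint, "HostMon", ""};
}
```

Calling chooseWifi(WIFI_SSID_BUILTIN, WIFI_PASS_BUILTIN, saved.ssid, saved.pass) with the defines left as "" would fall through to the saved credentials or the AP wizard.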
Method 2 — AP setup wizard
Join the AP
With no saved Wi-Fi, the device raises an open access point named HostMon. Join it from a phone, then browse to http://192.168.4.1 by hand; there's no captive-portal auto-popup.
Choose a network
Pick your network and enter the password; the device saves them and reboots onto your LAN. There's no “stay an access point” mode; AP mode is only the no-credentials fallback.
http://192.168.4.1 manually after joining.
Open the dashboard
Once the device is on your LAN, browse to http://monitor.local/ (or http://<ip>/, shown on the LCD). It's HTTP only; read the auto-generated login off the screen (see below).
Transport & TLS
The dashboard + API serve over plain HTTP only. On-device HTTPS was removed — it isn't viable on this board: an mbedTLS connection needs ~16–32 KB of contiguous internal RAM, which the device can't reliably spare alongside Wi-Fi + the check engine + the web server. With it enabled the TLS listener came up but handshakes reset (ERR_CONNECTION_RESET) regardless of cert type, so the TLS server, cert provisioning, and embedded cert/key were all deleted.
http://<device-ip>/. Or keep the device on a trusted / isolated VLAN and accept plain HTTP there.
MAX_HOSTS was reduced to reclaim internal RAM, a single mbedTLS session (~44 KB peak) fits, as confirmed by a boot-time self-test. The firmware uses it for the SSL cert-expiry check, insecure HTTPS host checks, and the webhook notifier. A global gate (tls_gate.*) serialises every outbound TLS site across the check and web tasks, so two sessions can never allocate at once. Verified TLS (CA-bundle chain validation) still doesn't fit; that stays reverse-proxy territory.
TLS_MIN_FREE_BLOCK (20 KB largest contiguous internal block, for the ~16 KB session buffers) and TLS_MIN_FREE_TOTAL (48 KB total free internal, for the ~44 KB whole session). If either is below its floor the session is skipped for that cycle and reported as low mem; the check / webhook simply defers (re-notify retries later) instead of crashing. A distinct case: if a TLS site can't get the single-session slot within ~15 s it reports tls busy rather than low mem. It's sized as insurance: it only blocks a session that couldn't have fit anyway, so checks that currently succeed are unaffected. Watch the RAM gauge on the LCD Alerts·Setup card (free · largest · min) as you add hosts. If min trends toward single digits, the guard is what keeps a depleted moment from faulting. Raise the floors (e.g. 54–60 KB total) for a higher guaranteed margin, accepting that TLS checks defer sooner under pressure.
Login: auto-generated password
On first boot the device generates a random per-device web password (12 chars, unambiguous alphabet) and shows it on the LCD Alerts·Setup card as Web login: admin / …. There is no shared default password — read it off the screen to log in, then set your own in Settings (8–39 chars); the LCD line disappears once you do.
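A password of this shape could be generated roughly as follows. This is a host-side sketch under assumptions: the exact alphabet is not given in the text, so kAlphabet (letters and digits minus the look-alikes 0/O and 1/l/I) is a guess, makePassword is a hypothetical name, and std::random_device stands in for the ESP32 hardware RNG (esp_random() on the device).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Assumed alphabet: letters and digits without the look-alikes 0/O and 1/l/I.
static const std::string kAlphabet =
    "ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz23456789";

// Build a password of `len` characters drawn uniformly from kAlphabet.
// `rng` stands in for the hardware RNG used on the device.
template <typename Rng>
std::string makePassword(Rng& rng, std::size_t len = 12) {
    std::uniform_int_distribution<std::size_t> pick(0, kAlphabet.size() - 1);
    std::string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) out += kAlphabet[pick(rng)];
    return out;
}
```

Using a distribution rather than `value % size` avoids the slight bias that a plain modulo would introduce when the alphabet size does not divide the RNG range.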
POST /api/settings/auth.
Timekeeping
Time comes from NTP once the network is up. Offline, or before NTP completes, the device falls back to an uptime-based runtime clock. The active source is reported as device.time (ntp / runtime) in GET /api/summary.
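The source selection can be pictured as a single function that either extrapolates from the last NTP sync or reports seconds since boot. The struct and function names here are hypothetical; only the ntp / runtime labels come from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical snapshot of the clock state; field names are illustrative.
struct ClockState {
    bool          ntpSynced;       // true once an NTP reply has set the wall clock
    std::int64_t  ntpEpochAtSync;  // Unix seconds delivered by NTP
    std::uint64_t uptimeAtSyncMs;  // uptime at the moment of that sync
};

struct TimeReading {
    std::int64_t seconds;  // Unix time when source is "ntp", else seconds since boot
    std::string  source;   // the value reported as device.time
};

// Prefer the NTP-derived wall clock; otherwise fall back to uptime.
TimeReading readClock(const ClockState& c, std::uint64_t uptimeMs) {
    if (c.ntpSynced) {
        const std::int64_t elapsed =
            static_cast<std::int64_t>((uptimeMs - c.uptimeAtSyncMs) / 1000);
        return {c.ntpEpochAtSync + elapsed, "ntp"};
    }
    return {static_cast<std::int64_t>(uptimeMs / 1000), "runtime"};
}
```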
Wire-free to avoid an I²C driver conflict at boot (see Hardware). So time does not survive a power loss before NTP re-syncs.
05 Security
This is a LAN appliance, not an internet-facing service. Every data endpoint is gated by auth and every write is validated server-side, but it deliberately stops short of protections the hardware can't support (on-device TLS). The web-layer checks live in api.cpp / validate.cpp and run on every request.
Basic Auth, random password
Every /api/* endpoint sits behind HTTP Basic Auth with a random per-device password. It's drawn from the hardware RNG after the RF subsystem is live — true entropy, not the weaker pre-RF PRNG — with no shared default. Only the static shell is public.
Constant-time compare
Credentials are checked byte-for-byte in constant time, so response timing can't leak how much of the password matched.
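A constant-time comparison generally looks like the sketch below: no early return, with every difference folded into one accumulator. The function name is hypothetical and this is not taken from api.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Compare two strings without an early exit. The loop always runs over the
// full length of `expected`, and mismatches are OR-ed into `diff`.
bool constantTimeEquals(const std::string& supplied, const std::string& expected) {
    // A length mismatch is recorded up front but does not shorten the loop.
    unsigned char diff = static_cast<unsigned char>(supplied.size() != expected.size());
    for (std::size_t i = 0; i < expected.size(); ++i) {
        // Positions past the end of `supplied` compare against 0.
        const unsigned char s =
            i < supplied.size() ? static_cast<unsigned char>(supplied[i]) : 0;
        diff |= static_cast<unsigned char>(s ^ static_cast<unsigned char>(expected[i]));
    }
    return diff == 0;
}
```

The running time depends on the length of the stored secret rather than on how many leading bytes of the guess were right. An optimising compiler can in principle still reshape such a loop.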
CSRF / Origin guard
A cross-origin POST carrying a mismatched Origin / Host is rejected with 403 before it touches state.
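One way to express such a guard is to strip the scheme from Origin and compare what remains with the Host header. The helper names are hypothetical, and letting requests without an Origin header through (as sent by non-browser clients) is an assumption, not documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Strip the scheme from an Origin value: "http://monitor.local" -> "monitor.local".
static std::string originAuthority(const std::string& origin) {
    const std::size_t pos = origin.find("://");
    return pos == std::string::npos ? origin : origin.substr(pos + 3);
}

// Returns the HTTP status to answer with: 0 means "continue", 403 means reject.
int originGuard(const std::string& method, const std::string& origin,
                const std::string& host) {
    if (method != "POST") return 0;   // only state-changing requests are guarded
    if (origin.empty()) return 0;     // assumption: no Origin header, nothing to compare
    return originAuthority(origin) == host ? 0 : 403;
}
```

This sketch compares the authority verbatim, so it does not normalise case or default ports.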
Request body cap
Bodies are capped at 8 KB and over-cap reads truncate, so a flood can't exhaust RAM on a constrained device.
Header-injection guard
The webhook custom header rejects CR / LF, so a crafted value can't smuggle extra HTTP headers downstream.
Server-side validation
Every POST + CSV row is validated in validate.cpp (IPv4/RFC-1123, interval whitelist, printable-ASCII, length caps); invalid input returns 400 and never mutates the store.
CSV / formula-injection guard
CSV-injection characters are rejected, and a name or group may not start with = + - @ — so a value can't turn into a spreadsheet formula if hosts.csv is opened in Excel.
Concurrency-safe scans
The engine snapshots a host under lock, runs the seconds-long checks against the copy, and re-finds the live host by id for each write-back — so deleting a host mid-scan can't misattribute a result or touch a stale slot.
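The snapshot-then-re-find pattern can be sketched with a mutex-guarded store. Host, HostStore and the method names are hypothetical, and std::mutex stands in for whatever lock the firmware uses.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Host {
    std::string id;
    std::string address;
    bool up;
};

class HostStore {
public:
    void add(Host h) {
        std::lock_guard<std::mutex> g(m_);
        hosts_.push_back(std::move(h));
    }
    void remove(const std::string& id) {
        std::lock_guard<std::mutex> g(m_);
        for (auto it = hosts_.begin(); it != hosts_.end(); ++it)
            if (it->id == id) { hosts_.erase(it); return; }
    }
    // Copy the host out under the lock so the slow check never touches live storage.
    bool snapshot(std::size_t index, Host& out) {
        std::lock_guard<std::mutex> g(m_);
        if (index >= hosts_.size()) return false;
        out = hosts_[index];
        return true;
    }
    // Re-find by id at write-back time; false means the host was deleted meanwhile.
    bool writeBack(const std::string& id, bool up) {
        std::lock_guard<std::mutex> g(m_);
        for (auto& h : hosts_)
            if (h.id == id) { h.up = up; return true; }
        return false;
    }
private:
    std::mutex m_;
    std::vector<Host> hosts_;
};
```

Because the write-back looks the host up by id instead of reusing an index or pointer, a result for a deleted host is simply dropped and the host that moved into its slot is left alone.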
TLS heap / crash guard
Before any outbound TLS session the gate checks free internal RAM (≥20 KB contiguous & ≥48 KB total). Under that floor the session is skipped as low mem rather than risking an out-of-memory fault.
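The floor check itself reduces to two comparisons. In this sketch the heap figures are passed in so the decision can be exercised off-target. On the device they would plausibly come from the ESP-IDF calls heap_caps_get_largest_free_block(MALLOC_CAP_INTERNAL) and heap_caps_get_free_size(MALLOC_CAP_INTERNAL), which is an assumption about the implementation. TlsGate and tlsHeapCheck are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Floors as quoted in the text.
constexpr std::size_t TLS_MIN_FREE_BLOCK = 20 * 1024;  // largest contiguous internal block
constexpr std::size_t TLS_MIN_FREE_TOTAL = 48 * 1024;  // total free internal RAM

enum class TlsGate { Proceed, LowMem };

// Decide whether a TLS session may start, given the current heap figures.
// Both floors must be met; otherwise the session is skipped as "low mem".
TlsGate tlsHeapCheck(std::size_t largestFreeBlock, std::size_t totalFree) {
    if (largestFreeBlock < TLS_MIN_FREE_BLOCK) return TlsGate::LowMem;
    if (totalFree < TLS_MIN_FREE_TOTAL) return TlsGate::LowMem;
    return TlsGate::Proceed;
}
```

Checking the largest block separately matters because a heap can have plenty of total free space while being too fragmented to hold the session buffers.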
06 hosts.csv format
A header row is required, then one host per line. Every row is validated on load — malformed rows are skipped and logged over serial. Edits made in the dashboard are written back to this file in flash. New hosts default to ping only (enable other checks per host in the dashboard).
name,address,group,checks,intervals,alerts
nas-01,192.168.1.10,Storage,ping|dns|port|http,,down|recovered
pihole,192.168.1.12,Network,ping|dns|port|http,http:30,down|warn|recovered
home-asst,192.168.1.20,Apps,ping|dns|port|https,ping:10|port:10,down|warn|recovered
example-blog,example.com,External,ping|dns|port|https|ssl|trace,ping:60,down|warn|recovered
| Field | Meaning |
|---|---|
| name | Display name for the host. |
| address | IP address or hostname to monitor. |
| group | Free-text group used to organise the fleet. |
| checks | Pipe list of enabled checks: ping \| dns \| port \| http \| https \| ssl \| trace. http = HTTP check over plaintext; https = over TLS, always insecure (accepts any cert); ssl = cert-expiry check (insecure handshake, reads notAfter). |
| intervals | Optional key:seconds overrides, e.g. ping:10\|http:30. Seconds must be one of 10, 30, 60, 120, 300, 900, 3600, 21600, 43200, 86400. |
| alerts | Optional pipe list from down \| warn \| recovered (default down\|recovered). |
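Parsing the intervals field against the whitelist might look like the sketch below. It is illustrative rather than a copy of validate.cpp: the function names are hypothetical, and it checks only the item shape and the seconds value, not whether the key names a known check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iterator>
#include <map>
#include <sstream>
#include <string>

// Allowed interval values in seconds, as listed in the field table.
static const int kAllowedSeconds[] = {10, 30, 60, 120, 300, 900, 3600, 21600, 43200, 86400};

bool isAllowedInterval(int s) {
    return std::find(std::begin(kAllowedSeconds), std::end(kAllowedSeconds), s)
           != std::end(kAllowedSeconds);
}

// Parse "ping:10|http:30" into {ping: 10, http: 30}.
// Returns false on a malformed item or a value outside the whitelist.
// An empty field is valid and yields an empty map (no overrides).
bool parseIntervals(const std::string& field, std::map<std::string, int>& out) {
    out.clear();
    std::istringstream items(field);
    std::string item;
    while (std::getline(items, item, '|')) {
        const std::size_t colon = item.find(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == item.size()) return false;
        const std::string digits = item.substr(colon + 1);
        // Digits only, and short enough that atoi cannot overflow.
        if (digits.size() > 5 || digits.find_first_not_of("0123456789") != std::string::npos)
            return false;
        const int seconds = std::atoi(digits.c_str());
        if (!isAllowedInterval(seconds)) return false;
        out[item.substr(0, colon)] = seconds;
    }
    return true;
}
```

For the home-asst row in the example file, the field ping:10|port:10 would parse to two overrides, while a value such as http:45 would be rejected because 45 is not in the whitelist.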
httpsv → https and sslv → ssl (verified TLS modes aren't supported). On the next save the file is rewritten with the current keys.
07 The six checks
Each host enables any subset of these. Every check runs on its own interval and reports its own state — the host's overall status is the worst of its active checks. The engine runs on a statically-allocated task pinned to core 0, away from the LVGL display loop on core 1.
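The "worst of its active checks" rule can be sketched as a maximum over a severity-ordered enum. The State values, their ordering, and the type names are assumptions for illustration. Paused or acknowledged hosts are not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed severity ordering: a larger value is worse.
enum class State { Up = 0, Warn = 1, Down = 2 };

struct CheckResult {
    bool  enabled;  // per-host enable flag for this check
    State state;    // the check's own most recent state
};

// Overall host status is the worst state among enabled checks.
// A host with no enabled checks is reported as Up in this sketch.
State hostStatus(const std::vector<CheckResult>& checks) {
    State worst = State::Up;
    for (const CheckResult& c : checks)
        if (c.enabled) worst = std::max(worst, c.state);
    return worst;
}
```

A disabled check never contributes, so a host whose only failing check is switched off still reports Up.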
Ping
Default interval: 30 s. ICMP reachability plus packet-loss % via esp_ping.
DNS
Default interval: 5 min. Resolves the hostname and times the lookup.
Port
Default interval: 60 s. TCP connect to a port (per host; default 80), bounded by CONNECT_TIMEOUT_MS.
HTTP
Default interval: 60 s. GETs the host and expects a 2xx / 3xx. Per-host http / https scheme; HTTPS is always insecure (accepts any cert; reports reachability + status, not cert trust) and works on non-standard ports.
SSL
Default interval: 12 h. Insecure TLS handshake that reads the peer cert's notAfter and reports days-to-expiry: warns under SSL_WARN_DAYS (14), goes down once expired. Port per host (default 443). Use the hostname, not an IP, because the cert is selected by SNI.
Trace
Default interval: 5 min. Reachability plus a hop-count estimate from the reply TTL.
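A common heuristic for estimating hops from a reply TTL is to assume the sender started from the nearest common default (64, 128 or 255) and subtract. Whether the firmware uses exactly this rule is not stated, so treat the function below as an illustrative assumption with a hypothetical name.

```cpp
#include <cassert>

// Estimate hop count from the TTL seen in a reply.
// The initial TTL is guessed as the smallest common default (64, 128, 255)
// that is >= the observed value. Returns -1 for an out-of-range TTL.
int estimateHops(int replyTtl) {
    if (replyTtl <= 0 || replyTtl > 255) return -1;
    const int initial = replyTtl <= 64 ? 64 : (replyTtl <= 128 ? 128 : 255);
    return initial - replyTtl;
}
```

The estimate is only as good as the guessed starting value: a host that sends with an unusual initial TTL will yield a misleading count.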
MAX_HOSTS was reduced enough to free the internal RAM for a single mbedTLS session (see Transport). It reads the certificate's expiry date only; it does not validate the chain. Because only one outbound TLS session fits, the tls_gate serialises the SSL check, HTTPS host checks, and webhook deliveries, and a heap guard defers any of them (as low mem) if free internal RAM is below the TLS floor, rather than risking an OOM crash.
08 Web / JSON API
Everything is served over plain HTTP behind Basic Auth. Cross-origin POSTs are blocked, and every POST body is validated server-side; invalid input returns 400 {ok:false, error:"…"}.
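For a client written in C++, the Basic Auth header is the Base64 encoding of user:pass. The sketch below builds that header with a small standard (RFC 4648) encoder. The credentials shown are placeholders, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Standard Base64 (RFC 4648) with '=' padding.
std::string base64Encode(const std::string& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    while (i + 2 < in.size()) {  // full 3-byte groups
        const std::uint32_t n = (std::uint32_t(std::uint8_t(in[i])) << 16) |
                                (std::uint32_t(std::uint8_t(in[i + 1])) << 8) |
                                std::uint32_t(std::uint8_t(in[i + 2]));
        out += tbl[(n >> 18) & 63]; out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];  out += tbl[n & 63];
        i += 3;
    }
    if (i + 1 == in.size()) {         // one byte left: two chars plus "=="
        const std::uint32_t n = std::uint32_t(std::uint8_t(in[i])) << 16;
        out += tbl[(n >> 18) & 63]; out += tbl[(n >> 12) & 63]; out += "==";
    } else if (i + 2 == in.size()) {  // two bytes left: three chars plus "="
        const std::uint32_t n = (std::uint32_t(std::uint8_t(in[i])) << 16) |
                                (std::uint32_t(std::uint8_t(in[i + 1])) << 8);
        out += tbl[(n >> 18) & 63]; out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];  out += '=';
    }
    return out;
}

// Build the request header line for HTTP Basic Auth.
std::string basicAuthHeader(const std::string& user, const std::string& pass) {
    return "Authorization: Basic " + base64Encode(user + ":" + pass);
}
```

Since the transport is plain HTTP, this header is readable by anyone on the path, which is why the text recommends a trusted LAN or a reverse proxy.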
Read
| Method | Endpoint | Returns |
|---|---|---|
| GET | /api/summary | Fleet counts + device / net + clock |
| GET | /api/hosts | All hosts |
| GET | /api/host?id=h1 | One host |
| GET | /api/alerts | Recent alerts |
| GET | /api/settings | Webhook / defaults / device |
| GET | /api/status | {ap, online, ip, ssid} |
| GET | /api/wifi/scan | Nearby networks |
Host governance & editing
| Method | Endpoint | Body |
|---|---|---|
| POST | /api/host/ack | {id, reason, who?} — reason required |
| POST | /api/host/pause | {id, reason, until?, who?} — reason required |
| POST | /api/host/resume | {id} |
| POST | /api/host/clear | {id} |
| POST | /api/host/interval | {id, key, every} |
| POST | /api/host | {id?, name, addr, group, checks[{key,enabled,every,port,secure}], alerts{}} — writes CSV |
| POST | /api/host/delete | {id} |
Settings & system
| Method | Endpoint | Body |
|---|---|---|
| POST | /api/settings/webhook | {url, method, header, enabled, when[]} |
| POST | /api/settings/defaults | {interval[6], fails, lcdHome, renotify, renotifyEvery} |
| POST | /api/settings/auth | {user, pass} — pass 8–39 chars |
| POST | /api/test/webhook | {} |
| POST | /api/sd/reload | {} — reloads hosts.csv from flash |
| POST | /api/wifi/join | {ssid, pass} — saves + reboots |
09 Source layout
10 Limitations
Capabilities that were attempted and then removed (or never shipped) because of hardware or platform constraints on this board. Recorded here so the gaps — and why they exist — are explicit.
Wire-free build).
On-device TLS: one insecure session, no verification
An mbedTLS session needs ~16–44 KB of contiguous internal RAM. After MAX_HOSTS was reduced (and the old cert / HTTPS-server cruft removed) to reclaim RAM, the board now runs one insecure session at a time — confirmed by a boot self-test reporting ~23 KB still free at the session's peak. That re-enabled the cert-expiry check plus insecure HTTPS checks / webhooks, serialised by the TLS gate — which also enforces a pre-session heap floor (≥20 KB contiguous & ≥48 KB total free internal), skipping a session as low mem if RAM is too tight rather than faulting. What still doesn't fit is multiple concurrent sessions or verified TLS (CA-bundle chain validation).
- HTTPS for the dashboard — removed; a page load opens several concurrent TLS sessions, which don't fit. TLS server, cert provisioning and embedded cert/key all deleted; the dashboard is plain HTTP only.
- Self-signed cert + trust-on-first-use — wired up as a lighter alternative, then abandoned; handshakes reset regardless of cert type (tested RSA and ECDSA).
- SSL/TLS certificate-expiry check — removed in 1.4.3, restored (insecure) in 1.4.8 once the self-test confirmed a single session fits. Reads the cert notAfter; does not validate the chain.
- HTTPS host-check “verify” option — removed; HTTPS host checks are insecure-only (reachability + status, not cert trust).
- Email / SMTP alerting (ESP Mail Client) — removed in 1.4.9. It was the heaviest TLS user; dropping it raises the worst-case internal-RAM floor. Webhook is the only delivery channel now — route it to a relay if you need email.
- Notifier outbound TLS verification — removed; HTTPS webhooks are encrypted but the server identity is not checked.
- Verified HTTPS via setCACertBundle(): abandoned earlier, because the core CA bundle couldn't be referenced by symbol on this core (undefined reference _binary_…_crt_bundle_bin).
Hardware / driver conflicts
- On-board PCF85063 RTC: disabled. The build stays Wire-free to avoid the legacy/driver_ng I²C clash, so there's no wall clock across a power-cycle until NTP re-syncs.
- SD-card storage: dropped. SD chip-select falls back to GPIO10, an active RGB data line (DATA4); mounting SD while the panel runs corrupts the parallel bus. Persistence uses LittleFS instead.
- Captive-portal auto-redirect: dropped. The bundled DNSServer calls udp_new() without the lwIP core lock and asserts under core-locking. The AP still works; browse to its IP manually.
- Reliable runtime USB-serial logging: given up as the primary diagnostic channel; serial is unreliable once RGB + Wi-Fi are both live, so diagnostics surface on the LCD Alerts·Setup card.
Tried & reverted (memory pressure)
- Internal-RAM LVGL draw buffers: attempted to fix the green RGB flicker, but consumed enough internal heap that the check task failed to create. Reverted to PSRAM draw buffers; the flicker was fixed instead with conditional markDirty.
Accepted platform pins
- LVGL stays on v8.4 — v9 won't compile against this code.
- A custom partitions.csv is mandatory: the app won't fit a default partition scheme.
- The build must stay Wire-free; see the RTC note above.