As in, when I watched YouTube tutorials, I often see YouTubers have a small widget on their desktop giving them an overview of their ram usage, security level, etc. What apps do you all use to track this?
I thought the same thing but it's not bad actually, there are some pre build dashboards you can import for common metrics from Linux, windows, firewalls etc .....
Checkmk (Raw - free version.) Some setup aspects are a bit annoying (wants to monitor every last ZFS dataset and takes too long to 'ignore' them one by one.) It does alert me to things that could cause issues, like the boot partition almost full. I run it in a Docker container on my (primarily) file server.
Grafana. Have alerts set up and get data with node exporter and cadvisor with some other containers giving some metrics.
I have alerts setup and they just ping me on a discord server I setup. High cpu and temps low disk space memory things like that. Mostly get high CPU or temp alerts and that's usually when plex does its automated things at 4am.
I just check the proxmox dashboard every now and then. Honestly if everything is working I'm not too worried about exact ram levels at any given moment
I use Telegraf + InfluxDB + Grafana for monitoring my home network and systems. Grafana has a learning curve for building panels and dashboards, but is incredibly flexible. I use it for more than server performance. I have a dual-monitor "kiosk" (old Mac mini) in my office displaying two Grafana dashboards. These are:
Thanks, and yes, Type "static" are DHCP reservations.
Where do you do DHCP, on the PFSense or somewhere else?
Yes, on pfSense. I use the Python function written by pletch/scrape_pfsense_dhcp_leases.py (on Github) that scrapes the pfSense status_dhcp_leases.php page. Then added my own function for querying my TP-Link APs using SNMP to determine which AP a wireless DHCP client is connected to.
I can throw the script up on Dropbox if you are interested. I am mediocre at writing Python, so it is pretty specific to my environment.
We use zabbix here. Zabbix is amazing and we put it in all of our templates so any new servers and hosts pop up on zabbix dashboard preconfigured just like that. For logs and security we use an Elastik "ELK stack" which gives us a heads up if anything is wrong in the logs, and zabbix gives us a head up of the systems health all together. Between the two, our health monitor panel combines the two windows so we can see full server health and any problems right there as a todo list for the IT team
CheckMK for general monitoring, Grafana/Prometheus for Proxmox-cluster, Wazuh for IDS-purposes and UptimeKuma for general uptime on services. It's not like it's necessary, but it's nice to tinker in my homelab before implementing the same services on a "professional level" at work.
My HomeAssistant is stable, so wifey is not being used as a monitor ;-)
I run music bots game servers mostly so even if something fails it‘s nothing really that critical.
When I‘m at home I usually ssh into my main host machine and have btop running on my second monitor. It shows me the processes, ram , cpu, network and disk space. Oh yeah and load averages. It also looks super pretty and supports skins :)
I don't track their performance, I just track if they're up or down.
I use uptimekuma running on a free tier of fly.io so I can tell if my cluster had a catastrophic failure. There's no point in the alerting system running on the same system.
Nagios for service/QOS, Grafana for dashboarding for some items more specific.
Planning on eventually switching to zabbix but nagios is so simple that i feel having a hard time justifying moving over 400 monitored services to it
Its not well liked but I use nagios core for alerts and jump to grafana which has data in prometheus, influxdb, and mysql backend for trends like cpu usage hard drive Temps etc.
Oh lord, I have so much info to give ! For the setup, it's running on kubernetes 1.28.2, so YMMV. My monitoring stack is :
Grafana -- Dashboards
Alertmanager -- Alerting
Prometheus -- Time series Database
Loki -- Logs database
Promtail -- Log collector
Mimir -- Long term metrics&logs storage
Tempo -- Datadog APM, but with Grafana, allows you to track requests through a network of services, invaluable to link your reverse proxy, to your apps, to your SSO to your database...
SMTP Relay -- A homemade SMTP relay that eases setting up mail alerts, allows me to push mail through mailjet using my domain
Node-exporter -- exports metrics for the server
Exportarr -- exports metrics for sonarr/radarr etc
pihole-exporter -- exports pihole metrics for prometheus scraping
ntfy -- for notifications to my phone (other than mail)
The rest is pretty much the same, if the service exports prometheus metrics by default, I use that, and write a ServiceMonitor and a Service manifest for that, it usually looks like that
If the app doesn't include a prometheus endpoint, I just find an existing exporter for that app, most popular ones have that, and ready made grafana dashboards.
For alerting, I create PrometheusRule object with the prometheus query and the message to alert me (depending on the severity, it's either a mail for med-low severity incidents, phone notification for high sev). I try to keep mails / notifications to a minimum, just alerts on load, CPU, RAM, and potential SMART errors as well give me alerts.
Alerts are much more important than fancy dashboards. You won't be staring at your dashboard 24/7 and you probably won't be staring at it when bad things happen.
Creating your alert set not easy. Ideally, every problem you encounter should be preceded by corresponding alert, and no alert should be false positive (require no action). So if you either have a problem without being alerted from your monitoring, or get an alert which requires no action - you should sit down and think carefully what should be changed in your alerts.
As for tools - I recommend Prometheus+Grafana. No need for separate AletrManager, as many guides recommend, recent versions of Grafana have excellent built-in alerting. Don't use those ready-to-use dashboards, start from scratch, you need to understand PromQL to set everything up efficiently. Start with a simple dashboard (and alerts!) just for generic server health (node exporter), then add exporters for your specific services, network devices (snmp), remote hosts (blackbox), SSL certs etc. etc. Then write your own exporters for what you haven't found :)
When you've got a lot of variables, especially when dealing with a distributed system, that importance leans the other way. Visualization and analytics are practically required to debug and tune large systems
Alerts are much more important than fancy dashboards.
It depends, If you have to install lot of stuff or manage a lot of thing it's a good idea to have one but if you mainly do maintenance and you want to have something reliable yes you should have an alerts, for exemple I don't have a lot of thing install and doesn't rly care about reliability so I do everything in terminal, I use arch btw
One thing about using Prometheus alerting is that it’s one less link in the chain that can break, and you can also keep your alerting configs in source control. So it’s a little less “click-ops,” but easier to reproduce if you need to rebuild it at a later date.
When you have several Prometheus instances (HA or in different datacenters), setting up separate AlertManagers for each of them is a good idea. But as OP is only beginning his journey to monitoring, I guess he will be setting up a single server with both Prometheus and Grafana on it. In this scenario a separate AlertManager doesn't add reliability, but adds complexity.
As for source control, you can write a simple script using Grafana API to export alert rules (and dashboards as well) and push them to git. Not ideal, sure, but it will work.
Anyway, it's never too late to go further and add AlertManager, Loki, Mimir and whatever else. But to flatten the learning curve I'd recommend starting with Grafana alerts that are much more user-friendly.
Good luck, if you get into it, you'll be unable to stop. Perfecting your monitoring system is a kind of mania :)
One more advice for another kind of monitoring. When you are installing / configuring something on your server - it's handy if you can monitor it's resource usage in real time. And that's why I use MobaXterm as my terminal program. It has many drawbacks, and competitors such as XShell, RoyalTS or Tabby look better in many ways... but it has one killer feature. It shows a status bar with current server load (CPU, RAM, disk usage, traffic) right below your SSH session, so that you don't have to switch to another window to see the effect of your actions. Saved me a lot of potential headache.
Honestly my load is so light I don't bother monitoring performance. Uptime kuma for uptime, I used to use prtg and uptime robot when I ran a heavier stack before I switched to an all docker workload.
Influx/telegraf/grafana stack. I have all 3 on one server and then I put just telegraf on the others to send data into influx. Works great for monitoring things like usage. You can also bring in sysstat.
I have some custom apps as well where each time they run I record the execution time and peak memory in a database. This lets me go back over time and see where something improved or got worse. I can get a time stamp and go look at gitea commits to see what I was messing with.
I use sar for historical, my own scripts running under cron on the hosts for specific things I'm interested in keeping an eye on and my on scripts under cron on my monitoring machines for alerting me when something's wrong. I don't use a dashboard.
I'm running a tick stack with a couple of thousands of servers - way less CPU usage than checkmk/nagios or anything else from the previous millennium ...