Help! How do I troubleshoot a NAS that becomes unavailable and needs a hard reboot? ZFS pools on Debian 12 shared via SMB/NFS, LGA1366 Intel Xeon X5670, 18 GB RAM.
What should I monitor or log, and how, to determine why my headless NAS so often becomes unavailable?
The problem:
Another machine that depends on the NAS routinely has its services go down because its NFS mounts are no longer mounted.
When that happens, running sudo mount -a sometimes recovers them.
Other times the NAS is not pingable, so I go to the physical host, plug in a monitor and keyboard, and find that I can't log in. The login screen is frozen, and only a hard reboot brings it back.
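One way to tell these two failure modes apart after the fact would be a small logger on the client machine that records when the NAS stops answering ping versus when only the NFS port stops answering. The following is a minimal Python sketch, not a tested monitoring setup. The hostname nas.example.lan, the 30-second interval, and the function names are all assumptions made up for this example, and ping_ok assumes the Linux iputils ping (-c and -W options).

```python
import socket
import subprocess
import time
from datetime import datetime

NAS_HOST = "nas.example.lan"  # hypothetical name, replace with the NAS address
NFS_PORT = 2049               # standard NFS port
INTERVAL_SECONDS = 30         # arbitrary polling interval


def ping_ok(host, timeout=2):
    """Return True if one ICMP echo gets a reply (calls the system ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tcp_ok(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(ping, nfs):
    """Turn the two probe results into a short state label."""
    if nfs:
        # The NFS port answered, so the host is up even if ICMP is filtered.
        return "ok"
    if ping:
        return "host-up-nfs-down"
    return "host-unreachable"


def monitor(host=NAS_HOST, interval=INTERVAL_SECONDS):
    """Poll forever and print a timestamped line on every state change."""
    last_state = None
    while True:
        state = classify(ping_ok(host), tcp_ok(host, NFS_PORT))
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {state}", flush=True)
            last_state = state
        time.sleep(interval)
```

Calling monitor() and redirecting its output to a file on the client would give timestamps to line up against the NAS's own logs from the previous boot.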
Often when I leave a monitor attached (VGA), I come back to a screen that says:
critical medium error, dev sda, sector 163776752 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2
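To make that console message easier to act on, here is a minimal Python sketch that pulls the device, sector, and operation out of such a line and converts the sector number to a byte offset. It assumes the kernel reports these sector numbers in 512-byte units. parse_medium_error and SECTOR_BYTES are names invented for this example.

```python
import re

# Assumption: the kernel block layer counts sectors in 512-byte units,
# regardless of the drive's physical sector size.
SECTOR_BYTES = 512

# Matches lines such as:
#   critical medium error, dev sda, sector 163776752 op 0x0:(READ) ...
_ERROR_RE = re.compile(
    r"(?P<kind>[\w/ ]+ error), dev (?P<dev>\S+), sector (?P<sector>\d+)"
    r" op 0x[0-9a-fA-F]+:\((?P<op>\w+)\)"
)


def parse_medium_error(line):
    """Return a dict describing a kernel block I/O error line, or None."""
    match = _ERROR_RE.search(line)
    if match is None:
        return None
    sector = int(match.group("sector"))
    return {
        "kind": match.group("kind").strip(),
        "device": match.group("dev"),
        "sector": sector,
        "operation": match.group("op"),
        "byte_offset": sector * SECTOR_BYTES,
    }


if __name__ == "__main__":
    sample = ("critical medium error, dev sda, sector 163776752 "
              "op 0x0:(READ) flags 0x700 phys_seg 1 prio class 2")
    print(parse_medium_error(sample))
```

A medium error on a READ generally means the drive itself reported that it could not read that sector, which would point at the sda disk rather than at the network or NFS.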
I started sudo smartctl -t long /dev/sda a few hours ago, and at some point since then the server that depends on the NAS lost its NFS mounts again. A simple sudo mount -a resolved it.
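Alongside the long self-test (its result log can be read with sudo smartctl -l selftest /dev/sda), the SMART attributes that usually track bad media could be logged over time. Below is a minimal Python sketch that reads them from smartctl's JSON output. It assumes smartmontools 7.0 or newer (for --json), an ATA/SATA drive, and the JSON layout as I understand it (ata_smart_attributes, then table, then id and raw.value), which should be checked against real output. The function names and the choice of attributes 5, 197, and 198 are assumptions made for this example.

```python
import json
import subprocess

# SMART attribute IDs commonly associated with failing media on ATA drives.
WATCHED_IDS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def read_smart_json(device):
    """Run smartctl and return its JSON report as a dict (needs root)."""
    result = subprocess.run(
        ["smartctl", "--json", "-A", device],
        capture_output=True,
        text=True,
    )
    # smartctl's exit status is a bitmask, so a non-zero code does not
    # necessarily mean it failed to run; parse whatever JSON was printed.
    return json.loads(result.stdout)


def watched_raw_values(report):
    """Extract raw values of the watched attributes from a smartctl report."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {
        WATCHED_IDS[row["id"]]: row["raw"]["value"]
        for row in table
        if row.get("id") in WATCHED_IDS
    }
```

Running watched_raw_values(read_smart_json("/dev/sda")) as root from a daily cron job and appending the result to a file would show whether those counts are growing. Non-zero, rising values would be consistent with the medium error above, though how to read raw values varies by drive vendor.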
What the NAS was also doing when it had the network blip:
rclone was backing up to Backblaze B2
Acting as NFS server for Plex/*arr media server
Acting as NFS storage for Proxmox machine (but no VMs or CTs running)
Pasted some zpool output below. Details about the machine:
Repurposed old hardware; I built this Debian 12 NAS just a couple of months ago
Operates as backup destination for other machines
Operates as the media location for my Plex machine, the other server that mounts the NAS via NFS.