Why ask in selfhosted? That is exactly where you will get generic arm64 suggestions rather than Pi-specific ones. As you said, you want to stay fairly specific to the Pi itself, so look up some Pi-specific projects.
WireGuard is the fastest option since it runs in the kernel. Headscale is slower because the Tailscale clients it coordinates use the userspace Go implementation (wireguard-go). Unless you need ACLs, WireGuard is your best bet.