Hey everyone. As I'm sure you're all aware, Lemmy.World has experienced several outages over the last few days.
We are still investigating the root cause, but we believe the outages are related to our current hosting provider (Hetzner) blocking access from Cloudflare. We think they believe our CDN is a DDoS'er, and the blocking is causing these disconnects to our backend server. Problematic for sure.
We've opened support tickets with our current provider and are awaiting a response. We have no issue with being as transparent as possible about downtime. Anyone who is curious can check out https://status.lemmy.world and https://dash.lemmy.world for up-to-the-minute outage information. We are also looking into other fediverse-friendly methods of posting status and outage updates.
In the meantime, we are evaluating alternative hosting options and solutions to provide a high level of reliability to you, our users.
Really, we want to say thanks to everyone for soldiering through all our technical growing pains.
That's mostly how I redditted for years. It was mostly for those moments in between things: on the toilet, lying in bed at night. Not something I did for long periods of time.
I don't blame you for this, but the uptime records are incomplete at best. I've experienced the site being down (and confirmed with Down for Everyone or Just Me), yet status.lemmy.world showed all systems operational. As I'm writing this, status.lemmy.world is missing most data up to yesterday and dash.lemmy.world shows 16 days uptime.
I have lots of respect for you for even having these. I also remember status.lemmy.world working mostly fine some time ago. But as of right now, both uptime monitors fail to serve their purpose.
You need to hover over the status bar to see if there was any downtime for that day. We can enable it to log an incident every time there is a burp, but we are still tuning alerts, so for now it only creates an incident when we ACK it in PagerDuty.
You can always check the dashboard for up-to-the-minute stats, as well as https://lemmy-status.org/endpoints/_lemmy-world
We'll add this info to make things clearer <3
EDIT:
Added more info to our status page, thanks for the feedback Machefi!
EDIT2:
Also, the missing data is due to us removing the old monitors and adding more specific ones for the different infra services.
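For anyone wondering how per-day downtime can exist without a logged incident, here is a rough sketch of the idea. It is not how status.lemmy.world actually works; the `downtime_per_day` function, the probe format, and the one-minute interval are all assumptions made up for illustration. It simply adds up failed probes per calendar day, whether or not anyone ACKed an alert.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downtime_per_day(probes, interval=timedelta(minutes=1)):
    """Sum downtime per calendar day from (timestamp, ok) probe results.

    Hypothetical helper: each failed probe is counted as one `interval`
    of downtime on the day it was taken, independent of any incident log.
    """
    totals = defaultdict(timedelta)
    for timestamp, ok in probes:
        if not ok:
            totals[timestamp.date()] += interval
    return dict(totals)


if __name__ == "__main__":
    # Made-up probe results, one per minute.
    probes = [
        (datetime(2023, 7, 10, 12, 0), True),
        (datetime(2023, 7, 10, 12, 1), False),
        (datetime(2023, 7, 10, 12, 2), False),
        (datetime(2023, 7, 11, 0, 0), False),
    ]
    for day, total in sorted(downtime_per_day(probes).items()):
        print(day, total)
```

A day with a few failed probes would show downtime on hover even though no incident was ever created for it.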
On your Cloudflare account, if there was a change in the CNAME/A record being proxied vs. DNS only, that could cause an issue, as Cloudflare would then strip headers off the request that your Apache/Nginx would be looking for.
If you enabled HTTP DDoS protection in your Security -> WAF tab (I think that’s where it is) that could do this too. Might be worth disabling.
Also check for any headers your HTTP load balancer might be expecting, that Cloudflare could be stripping.
Might be worth tailing the webserver logs to see what happens to requests coming in from Cloudflare.
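If it helps, here is a minimal sketch of what that log check could look like in Python. It assumes the access log uses the common "combined" layout (client IP first, status code right after the quoted request) and that the logged IP is the connecting peer rather than a restored visitor IP. The ranges listed are only an illustrative subset of Cloudflare's published IPv4 ranges, so check the current list before trusting the split. The function names are made up for this example.

```python
import ipaddress
import re
from collections import Counter

# Illustrative subset of Cloudflare's published IPv4 ranges (assumption:
# verify against the current official list before relying on this).
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "103.21.244.0/22",
        "104.16.0.0/13",
        "172.64.0.0/13",
        "162.158.0.0/15",
    )
]

# Assumed "combined" log layout: IP, two fields, [time], "request", status.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')


def is_cloudflare(ip_text):
    """Return True if ip_text parses as an IP inside one of the listed ranges."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False
    return any(ip in network for network in CLOUDFLARE_RANGES)


def tally_statuses(lines):
    """Count status codes, split by whether the client IP looks like Cloudflare."""
    counts = {"cloudflare": Counter(), "other": Counter()}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        ip_text, status = match.groups()
        bucket = "cloudflare" if is_cloudflare(ip_text) else "other"
        counts[bucket][status] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '173.245.48.10 - - [10/Jul/2023:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ua"',
        '173.245.48.11 - - [10/Jul/2023:10:00:01 +0000] "GET /api HTTP/1.1" 502 0 "-" "ua"',
        '8.8.8.8 - - [10/Jul/2023:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "ua"',
    ]
    for bucket, counter in tally_statuses(sample).items():
        print(bucket, dict(counter))
```

To run it against a real file, pass the open file object to `tally_statuses`. A pile of 4xx/5xx codes in the Cloudflare bucket while the direct bucket looks healthy would point at the proxy path rather than the app.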
Let me be real: I never noticed the outages stopping. It feels like they're daily. I'm used to it, but I think it happens so often that lemmy.world has lost its growth opportunity and we've alienated the normies. I'm still going to stay on Lemmy, and I believe you're doing the best you can, but we've lost for the time being; the migration from Reddit to Lemmy is stunted.
It didn't help that almost every other general purpose instance blocked sign-ups in June and early July either, or required an essay on the application. Lemmy.world was the only one that was even trying at all, and I will commend them for that.
Hopefully things will get better by the next time spez screws up. Because there will be a next time.
Next time, the Lemmy join page needs to be improved so people spread out instead of centralizing into a single instance, which defeats the purpose of Lemmy in the first place.
Partly due to the fact that Lemmy itself has basically no moderation or administration features at all.
So the only way to help with that issue is stricter enforcement up front.
Besides, if someone doesn't wanna take the time to have a verified email and literally type 49 when registering an application, I really don't wanna take the time to worry about them potentially being spammers, etc.
We shouldn't be trying to grow a single instance. That defeats the whole point of Lemmy. I started on Lemmy world and switched once I got fed up with the constant connection issues. Plus, Lemmy world blocked piracy communities so fuck that. I'm happy that I am able to quickly create an account on another instance.
I am in the same boat. If it were not for these posts, I'd have never noticed lemmy world was down.
If I post to, or put a comment on something from lemmy world, it will just federate over when it's back online.
If someone posts to something on lemmy world, it will eventually federate over my way.
But, hey, everyone is gonna fuss when they've put all of their eggs in one basket and that one basket gets targeted. (The basket being lemmy world.)
Could look into Dacentec if you need more cheap servers. I use them for my stuff. YMMV since you're getting a hell of a lot more traffic than I am but they haven't blocked Cloudflare on mine yet so that's a plus. :)
I imagine spreading out across instances, and having accounts in instances beyond the biggest one helps, as it reduces traffic and strain on lemmy.world and its servers.
lemmy.world still deserves support, such as donations, but helping reduce the strain is something anyone can do. It's a small contribution, but it's free of charge.
I think this is about what's been happening the last few days? Lemmy World has been a lot more stable compared to even a week ago, but it still has some daily downtime. Just not for hours every day anymore.
I mean, I'd help if I didn't have to have the educational background of a rich upper class white man. Teach me what I need to know and I'll sysop. But it looked like you had lots of people already come forward with experience "serving at the edge", etc. All I've done is play with Kubernetes and Proxmox at home on a residential IP with a domain pointed at my residence. Unfortunately, with no college education, I've had to teach myself all I know and it probably isn't up to what's wanted. I've heard "A bad sysop can be worse than just having nobody", so :(
I quit high school and was force-fed some kind of formal education in the Navy. I think I have an associate's degree of some kind? I dunno, and I personally don't care if I do.
However, I built my career doing what I love, and that was doing anything in IT... I settled on IT Security and just kept getting real-world experience. Proving your skills and abilities can be a challenge at first, but it's no different for anyone else fresh out of college.
Don't sell yourself short because you didn't go to college. I likely saved thousands of dollars and years of my life by actually learning and applying useful skills. Many people in this world have done the same.
College is not for everyone. As a matter of fact, in interviews I tend to steer away from candidates who have just memorized everything and don't know how to apply it. Case in point, I spent a week trying to teach someone how to troubleshoot a problem, only to find out he had no idea how TCP and UDP actually worked. (I am a hands-off teacher. I explain a problem and the concepts needed to fix it, but it is up to that person to do the legwork.) Sure, he could quote what he was told to memorize at school, but for many tasks he is basically useless. Ugh.
If someone reading this has gone to college and enjoyed it, awesome! Good on ya! You learn differently than I do.
When one mentions "Hetzner", it doesn't immediately evoke cloud services in the same vein as AWS, GCP, or Azure. For instance, while SAP offers "cloud" solutions, it might equate to a single server in Hannover. I hope this clarifies the context of my question :)