Why are (rational) databases preferred over keeping the data in memory?
Why do so many programs use rational databases instead of loading the data during startup and keeping it in memory? Especially for smaller datasets I would think, that a database adds unnecessary complexity and overhead. Also, a lot of data can be saved using modern RAM and when using an in-memory approach, optimized data structures can be utilised to further improve the performance
Why do so many programs use rational databases instead of loading the data during startup and keeping it in memory?
I presume you're referring to relational databases instead of rational.
The responsibility of a RDBMS is to implement a set of performance-focused data structures that help clients reliably get the data that they need in the fastest possible way, without having to reinvent the wheel.
More often than not, this data does not fit in the heap.
Also, in many usecases there is more than a single client.
I think you answered your own question in a way. For smaller datasets and workloads which only require one machine, storing your data in memory or on disk locally is probably fine.
When you start getting larger datasets or need to have a dataset that's consistent across multiple servers, you need something more than just the ram of the server.
At that point it's best not to reinvent the wheel and just use some kind of pre existing datastore who's tradeoffs make sense for your use case. Some of these datastores do use RAM under the hood for performance but there are pros and cons as with anything.
In-memory structures are usually faster to work with, but harder to coordinate multiple updates from multiple sources (different applications, services, etc).
Databases have all sort of failsafe mechanisms to ensure data integrity and recovery options, in most times there is no need to reinvent it all over again.
Persistent - do you need to access the data again once your program was finished? How often does the data change by other programs/tasks once you read it? How big is your data and how complex are the connections between your data objects?
Many times the implementation is a mixed approach. It is better to know and calculate the needs before you start your project, but as it usually happen, once you get performance issues, you start optimizing adding in-memory cache or scale to a bigger database.
Good luck keeping an entire corporate database in RAM only. It's not a 1:1 ratio of space on disk to space on RAM, pointers to the appropriate addresses also have to be stored in memory.
Even for a relatively small database, say, 100MB, keeping it in RAM only is a safety hazard. Even if you fully isolate the memory, make it completely "impregnable" from any outside application, memory corruption remains a problem. If you don't isolate, memory leaking into or out of the database is also a problem, the former being much more damaging.
The fact that you still have to dump it into a hard drive also shows the main problem of keeping it in RAM: if the database service goes down, you "lose" the entire database
Put it another way, your suggestion is to have Notepad open, reading/writing the data and only saving at certain intervals.
Relational database/RDBMS? It's because the added complexity was necessary or desirable for some reason - relational databases are pretty good at managing data fairly quickly, often with features to deal with timing issues, concurrency, transactions, security, auditing, replication... They have theory going back decades and are still highly relevant. Same as any other software offering, you sometimes expect it to provide features you can't, won't or shouldn't implement yourself.
The overhead is likely negligible, or was considered a fair tradeoff, or the database is actually better at its job in the given scenario. Hopefully. Sometimes people really do add shitloads of very unnecessary overhead by mistake, or overengineer their solutions terribly (cause it's fun)
First, persistency. You data lifecycle may not be directly proportional to your applications lifecycle. You may need it even after the app is shut down.
Second, RDBM systems provide a well defined memory/storage structure and API - "structured query language". This enables you to easily query your data and acquire suitable views that are beneficial for your applications purposes.
Third, It's always better to outsource your data layer to a battle tested, and trustworty database then trying to reinvent the wheel.
So this paves a road for you to focus on your business logic than splitting that focus for the data layer and business logic.