Why did I decide to invest my time in building a self-hosted cluster, and why would I suggest that someone else dig into this world? These are the questions this article answers. It is the first in a series about how I built my self-hosted cluster at home, and how you could do the same.

So, what led me on this journey?

A playground you actually own

The first big win of self-hosting is having your own playground. A place where you can experiment, try things out, break stuff, and learn, all on your own terms.

The self-hosting ecosystem today is massive. The awesome-selfhosted list on GitHub has over 270,000 stars, making it one of the most popular repositories on the entire platform. The r/selfhosted subreddit has grown past 300,000 subscribers. There is clearly something that attracts people to this world.

And once you start exploring, you quickly see why. Here are some tools that got me excited:

Immich, a self-hosted Google Photos replacement with automatic phone backup, facial recognition, and map views. It is one of the fastest-growing open-source projects right now, with over 90,000 GitHub stars.
Jellyfin, your own Netflix. Stream your movies, TV shows, and music to any device, no subscription, no ads, no content disappearing.
Home Assistant, a smart home automation hub with 2,000+ integrations. Because it runs locally, your locally controlled devices and automations keep working even if the internet goes down.
Pi-hole, a network-wide ad blocker. Install it once, point your network's DNS at it, and every device gets ad blocking with no per-device configuration.
Paperless-ngx, a document archive that OCRs, tags, and indexes your scanned paper documents. It turns shoeboxes of receipts and tax papers into a searchable digital archive.
Nextcloud, essentially your private cloud. File sync, calendar, contacts, office suite, all in one place. Replaces Google Drive, Google Calendar, and Google Docs.

You see a new open-source tool that looks interesting? You can spin it up on your cluster in minutes. You built a small project and want to share it? Put it on your server. You want to sharpen your sysadmin skills? This is the perfect gym for that.

Your own mini enterprise

Running a self-hosted cluster is essentially managing a small-scale production environment. You have to think about everything: networking, storage, backups, security, monitoring, updates. These are the same kinds of problems you face in an enterprise setup, but you own the entire stack and you make all the decisions.

Want to try a new container orchestrator? Do it. Want to experiment with a different backup strategy, or see how your services behave when a node goes down? Go ahead. You need to understand your network in depth, segment interfaces, set up VLANs and a DMZ? Nobody is stopping you. Want to set up a proper CI/CD pipeline, play with GitOps, or migrate everything to a new architecture? Go for it.

And then you start thinking about resilience. What happens if the power goes out? You need a UPS. How do you back up your data? You learn about the 3-2-1 backup strategy. How do you keep your services secure? You set up Fail2ban, SSH key authentication, firewalls.
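As a rough illustration of the 3-2-1 rule mentioned above (three copies of your data, on two different kinds of media, with one copy off-site), here is a minimal Python sketch that checks a backup plan against it. The class, the field names, and the example plan are all hypothetical, made up for this illustration and not taken from any real backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. All fields are illustrative assumptions."""
    name: str      # free-form label, e.g. "live data on the server"
    medium: str    # kind of storage, e.g. "ssd", "hdd", "cloud"
    offsite: bool  # True if the copy lives outside your home


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Return True if the plan has >= 3 copies, >= 2 media, >= 1 off-site copy."""
    distinct_media = {copy.medium for copy in copies}
    offsite_copies = [copy for copy in copies if copy.offsite]
    return len(copies) >= 3 and len(distinct_media) >= 2 and len(offsite_copies) >= 1


# Hypothetical plan: the live data counts as the first copy.
plan = [
    BackupCopy("live data on the server", "ssd", offsite=False),
    BackupCopy("nightly snapshot on a second machine", "hdd", offsite=False),
    BackupCopy("encrypted copy stored elsewhere", "hdd", offsite=True),
]

print(satisfies_3_2_1(plan))      # three copies, two media, one off-site
print(satisfies_3_2_1(plan[:2]))  # drops the off-site copy
```

The check is deliberately naive: it says nothing about whether the backups can actually be restored, which is worth testing separately.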

It is a space where you can break things, rebuild them, and actually understand how everything fits together. The kind of experience that is hard to get in a corporate environment where you only see a slice of the infrastructure.

The way things are

Now, here is the other part that made me start this journey. It is a bit less fun, but it is something I found myself thinking about more and more, and I wanted to free myself from it.

Today we are used to relying on third-party companies for the vast majority of our digital data: photos, messages, documents, passwords. We do so because it is easier: we do not have to worry about availability, storage, or access. You have an app, you use it, and someone else takes care of the rest.

But consider how concentrated this really is. Google, Apple, Microsoft, and Meta collectively control your search engine, your phone’s operating system, your email, your cloud storage, your browser, and increasingly your home devices. Amazon, Microsoft, and Google together control roughly two-thirds of the global cloud market. For most people, opting out of all of them simultaneously is practically impossible.

A Vanderbilt University study found that a dormant Android phone with Chrome running in the background sent location data to Google 340 times in a single 24-hour period, and that two-thirds of Google’s data collection happens passively, without any direct user interaction.

Data sovereignty

Of course, these companies need to follow strict rules, especially in the European Union, to handle sensitive content. Yet, there have been cases where this didn’t happen, or where these rules have been circumvented.

Here are some real examples:

Google Photos, a father took photos of his toddler’s infection for a telehealth consultation. Google’s automated scanning flagged the images, locked his entire account, and reported him to the police. He was cleared, but Google permanently refused to reinstate his account. Over a decade of emails, contacts, and photos, gone.
Amazon Alexa, Bloomberg revealed that thousands of Amazon employees were listening to audio recordings captured by Echo devices in people’s homes. The FTC later found that up to 30,000 employees had access to this data, and Amazon had been keeping children’s voice recordings indefinitely, even after parents requested deletion.
AI training without consent, Meta confirmed that all public text and photos posted by adult Facebook and Instagram users since 2007 have been used to train its AI models. European users were given a convoluted opt-out process, while US users were given no opt-out at all.
Changing Terms of Service, Adobe updated its Creative Cloud terms with language granting the right to “access your content through both automated and manual methods,” which users interpreted as permission to use their work for AI training. The backlash was so severe that Adobe was forced to rewrite the terms. The FTC itself warned that companies quietly changing their ToS could constitute unfair or deceptive practices.

The point is you do not really know how things work behind the scenes, and sometimes the reality is not what you would expect.

Usually the first reaction I get when talking about this is: “yeah, this could be dangerous, but not for me, I don’t have anything to hide.” And yeah, maybe you don’t. But that is not the point.

As Edward Snowden put it: “Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say.”

Privacy is not about secrecy, it is about autonomy. It is the right to choose what you share, with whom, and on what terms. And it is something you have to care about day by day, because things change over the years. Even a country you trust could one day decide it wants to read your messages.

And regulation alone does not solve this. GDPR gives EU residents stronger rights on paper, but if your email runs through Gmail, your files live on OneDrive, and your photos sync to iCloud, your data still resides on infrastructure controlled by American corporations, subject to American jurisdiction. Regulation helps, but it is not a substitute for control.

Self-hosting your core services removes that dependency: the data sits on hardware you own, under rules you set.

Risk of dependency

Another risk worth mentioning is the dependency on third-party services. You never know when a service you rely on might be discontinued, or when a company decides to raise prices to a point where it is no longer viable for you.

This is not hypothetical, it happens all the time:

Google Photos (2021), Google ended free unlimited storage and started counting all uploads against a 15 GB quota, pushing hundreds of millions of users toward paid plans. A self-hosted alternative like Immich gives you a similar experience with no subscription and no storage limits beyond your own disks.
Google Reader (2013), Google killed its beloved RSS reader despite millions of devoted users, simply because it no longer aligned with their business priorities. Today you can self-host FreshRSS or Miniflux and never worry about it.
Evernote (2023), after being acquired by Bending Spoons, Evernote laid off most of its staff, slashed the free plan to just 50 notes, and hiked prices by 67%. Self-hosted alternatives like Joplin let you own your notes with end-to-end encryption.
GitHub Actions (2026), GitHub announced a per-minute charge for Actions workflows on private repos, including those on self-hosted runners. Even if you run your own hardware, you would pay GitHub for the orchestration. With a self-hosted Gitea instance and its built-in CI/CD, your pipelines run entirely on your infrastructure with no per-minute fees.

It is not about conspiracy

I know this might sound conspiratorial, and maybe it is, a little. But I am not telling you to retire to the mountains or to stop trusting every product out there.

As is always the case in life and in tech, it is a trade-off. I think everyone should look at the cloud services they rely on and ask: which ones hold my most sensitive data? Which ones should always be available to me, no matter what? Those are the services I would try to self-host.

Conclusion

To wrap up this first article: the goal was to plant a few questions in your mind, so that you rethink your digital life, your online presence, and the tools you use every day.

Maybe you will find the inspiration to start this journey. I think it is really fun, and you grow a lot as an engineer.

Start small. Self-host one service. Build your own space and own your digital footprint.

Resources & Inspiration

Some references that helped me along the way and might help you too:

r/selfhosted, the subreddit where it all starts. Great for ideas, troubleshooting, and seeing what other people run.
awesome-selfhosted, a curated list of self-hosted software. One of the most starred repos on GitHub.
Morrolinux, if you speak Italian, one of the best channels about Linux, open source, and digital freedom.
Proton Blog, especially their pieces on what data privacy and data sovereignty are.
EFF on Apple CSAM scanning, a great breakdown of why client-side scanning is dangerous.