September 8, 2024 | 13:30
Reading time: approx. 6 min

Enterprise Backup Solution

Why is the ransomware business model so successful? How do criminals manage to steal data, encrypt it and often destroy the backups as well? According to a representative BITKOM survey, 60% of companies in Germany have been affected within the past 12 months1.

A brief excursion into this topic, my work, and how I was able to help a company save EUR 17,000. As always, no claim to universality or completeness. Your mileage may vary.

Broken data backup concepts

The answer lies in the data backup concepts - embedded in the ISMS and linked to emergency and continuity plans. The BITKOM survey cited above shows that for 60% of those affected, these concepts fail completely:

Four out of ten (40 percent) of the affected companies were able to recover their data themselves, while 10 percent were able to get it back from the perpetrators without paying a ransom.

Without delving into the depths of the various certification standards, a concept always answers the following questions:

  • What RPO (Recovery Point Objective) and RTO (Recovery Time Objective)2 are defined?
  • Can these be achieved with the existing technology and personnel in the event of a total failure?
  • Are there separate offline or offsite backups from regular operations?
  • Are dependencies and sequences taken into account?
  • Is a methodical review carried out on a regular basis (see the sketch after this list)?
  • Are changes transparent and traceable?
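
What such a review can look like in practice is, roughly, a script that compares the age of the newest backup artefact against the defined RPO per system. The following is a minimal sketch, assuming the RPO values live in a versioned file in the repo and that each backup target exposes its latest artefact as a file - all names and paths are hypothetical:

```powershell
# Minimal sketch: warn when the newest backup artefact of a system is
# older than its defined RPO. Names, paths and values are examples only.
$systems = @(
    @{ Name = 'ERP-DB'; RpoHours = 24;  BackupPath = '\\backup01\erp\*.vhdx' }
    @{ Name = 'DNS-01'; RpoHours = 168; BackupPath = '\\backup01\dns01\*.vhdx' }
)

foreach ($s in $systems) {
    $latest = Get-ChildItem -Path $s.BackupPath -ErrorAction SilentlyContinue |
              Sort-Object LastWriteTime -Descending |
              Select-Object -First 1

    if (-not $latest) {
        Write-Warning "$($s.Name): no backup artefact found at all"
        continue
    }

    $ageHours = ((Get-Date) - $latest.LastWriteTime).TotalHours
    if ($ageHours -gt $s.RpoHours) {
        Write-Warning "$($s.Name): newest backup is $([int]$ageHours) h old, RPO is $($s.RpoHours) h"
    }
}
```

Run from a scheduled task, such a check turns the review question from a yearly audit item into a daily, automated answer.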

Poor Backup Software

The last two questions at the latest make it clear that no off-the-shelf software can provide the answers.

On the one hand, software delivery is becoming increasingly automated with CI/CD3 pipelines and with deploy and rollout workflows onto the systems. When it comes to backups and recovery, however, experience shows that administrators still click their way manually through more or less miserable dialogues.

Dependencies and sequences are determined solely by the administrator’s ‘logic’. What do I mean by that? It would be stupid, for example, if all DNS servers were taken offline for backup at the same time and therefore became unavailable. It is even more stupid if, two months later, another admin asks why the second DNS server is missing and thinks he can quickly push it into the backup job.
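
How such a sequencing constraint can at least be made explicit in code, rather than living in one admin’s head, might look like the following sketch. The grouping, the VM names and the placeholder backup call are all assumptions; the point is only that members of the same redundancy group are never taken offline at the same time:

```powershell
# Sketch: VMs providing the same redundant service go into one group.
# Groups run in parallel as jobs, but within a group the VMs are backed
# up strictly one after another, so a service never loses all members at once.
$redundancyGroups = @{
    'dns'   = @('DNS-01', 'DNS-02')
    'dc'    = @('DC-01', 'DC-02')
    'other' = @('ERP-DB', 'FILESRV')
}

foreach ($group in $redundancyGroups.Keys) {
    Start-Job -ArgumentList (,$redundancyGroups[$group]) -ScriptBlock {
        param($vms)
        foreach ($vm in $vms) {
            # sequential within the group: the second DNS server stays up
            # while the first one is being exported
            Write-Output "backing up $vm"   # placeholder for the real backup call
        }
    } | Out-Null
}

Get-Job | Wait-Job | Receive-Job
```

Because the grouping lives in a script under version control, the constraint is documented and reviewable instead of implicit.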

Very few people use the APIs and scripting integrations - if the backup software offers them at all. How is a methodical, regular review supposed to happen without automation? And don’t even get me started on scaling.

It is worth noting that backup jobs themselves have no audit or change logs. Which administrator did what, when and where remains completely opaque in Veeam & Co. So how is a responsible non-admin supposed to know whether a particular VM is still part of a backup or has been swapped for something else in the meantime?

For instance, a company I know personally did not back up the VM of its critical ERP system for almost a year, but the development system of the external service provider instead. The mix-up was caused by an admin during a ‘clean-up’ and nobody noticed. Only when the development VM was finally removed did everyone wonder why the daily status emails were no longer green.
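
Such a drift is easy to catch once the expected job contents are themselves under version control. A hedged sketch, assuming the Veeam.Backup.PowerShell module with its Get-VBRJob and Get-VBRJobObject cmdlets is available and that the expected VM list is a plain text file in the repo (job name and file path are examples):

```powershell
# Sketch: compare what a Veeam job actually contains with the list kept in Git.
# Assumes the Veeam.Backup.PowerShell module; job name and path are examples.
Import-Module Veeam.Backup.PowerShell

$expected = Get-Content '.\expected-vms\daily-vms.txt'      # versioned in the repo
$job      = Get-VBRJob -Name 'Daily VM Backup'
$actual   = Get-VBRJobObject -Job $job | Select-Object -ExpandProperty Name

$missing = $expected | Where-Object { $_ -notin $actual }
$extra   = $actual   | Where-Object { $_ -notin $expected }

if ($missing) { Write-Warning "expected but not in the job: $($missing -join ', ')" }
if ($extra)   { Write-Warning "in the job but not expected: $($extra -join ', ')" }
```

Any change to the expected list then shows up as a commit - with author, date and reason.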

Backup software is increasingly becoming a problem itself. Proprietary black boxes with unknown modes of operation are set up in central locations and can affect critical areas across intentionally created segments and separations.

At the same time, this single point of failure4 collects credentials that can easily be read out with PowerShell scripts.5 Increasing online requirements and non-transparent data outflows in the form of telemetry are additional challenges.

Reading all Veeam credentials via PowerShell

Challenges not understood

Administrators often work with methods from the 80s and ignore the progress of the past decades. Even though the number of servers is increasing, I still see manual logins on servers. Regardless of ideological discussions about ‘Linux or Windows’, ‘console or GUI’, this is utterly wrong!

Scaling does not mean hiring more staff; it means automating more and, above all, better. Nine years ago, Kristian Köhntopp showed in his talk ‘Go Away Or I Will Replace You With A Very Little Shell Script’ how unproductive and dangerous manually ‘climbing onto’ servers is:6

If you have to climb onto a computer to check something, the monitoring is obviously broken. If you have to climb onto a computer to change something, the automation is obviously broken and hopefully not just one box is broken, but all the others too, and hopefully in the same way.

Wherever availability has to be guaranteed because companies and people’s livelihoods depend on it economically, automation is a mandatory requirement.

‘We use Veeam’, I often hear. Fine, but unfortunately that does not mean the problem has been understood. I fear the business models with their multiple ways of charging are not even noticed: expensive licence subscriptions for proprietary software on the one hand, ransoms on the other. ‘But the support at Veeam is so good.’ Which support? No call-centre agent or non-admin at the other end of a phone line, chat or email will help you when everything is down and the backups are gone. This is an extreme case of Stockholm syndrome.7

GitOps Solution

Originating from DevOps,8 the term GitOps refers to operating an infrastructure with the help of Git9 version control. A repository becomes the ‘single source of truth’10 for infrastructure operation, server setup, changes to software packages and automated processes with your own scripts - transparent and traceable.

Last year, a medium-sized company was faced with the question of whether it was still willing to spend large sums of money on backup licences for a small Hyper-V cluster of just two nodes. The price tag: EUR 17,000.

Screenshot: Veeam cost calculation

No proprietary software is required for complete daily backups of the VMs, supplemented by mid-week and weekly swaps to other storage targets and external USB media.
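
What the daily part can look like on a Hyper-V node is, roughly, a plain Export-VM run whose target switches to the second storage or USB medium on certain weekdays. A minimal sketch; paths and the rotation rule are illustrative only:

```powershell
# Sketch of a nightly export on one Hyper-V node: every VM goes to the
# regular target, on Wednesdays and Sundays to the swap target instead.
# Paths and the rotation rule are examples.
$regularTarget = 'D:\Backup\Daily'
$swapTarget    = 'E:\Backup\Offsite'     # e.g. a rotating USB disk

$target = switch ((Get-Date).DayOfWeek.ToString()) {
    'Wednesday' { $swapTarget }
    'Sunday'    { $swapTarget }
    default     { $regularTarget }
}

$stamp = Get-Date -Format 'yyyy-MM-dd'
foreach ($vm in Get-VM) {
    $dest = Join-Path $target "$stamp\$($vm.Name)"
    New-Item -ItemType Directory -Path $dest -Force | Out-Null
    Export-VM -Name $vm.Name -Path $dest
}
```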

At a fraction of the cost, all backups are now controlled by a Forgejo server that is only accessible internally.11 Each node in the cluster automatically pulls the repo with its backup scripts and executes them.
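
The pull-and-run part on each node can be as small as a scheduled task executing something like this; the Forgejo URL, the repo layout and the per-node naming convention are assumptions:

```powershell
# Sketch: pull the backup repo from the internal Forgejo instance and run
# the script that matches this host. URL, paths and naming are assumptions.
$repoDir = 'C:\ops\backup-scripts'
$repoUrl = 'https://git.internal.example/ops/backup-scripts.git'

if (-not (Test-Path $repoDir)) {
    git clone $repoUrl $repoDir
}

Set-Location $repoDir
git pull --ff-only

# each node only executes "its" script, e.g. nodes\HV01.ps1
$script = Join-Path $repoDir "nodes\$env:COMPUTERNAME.ps1"
if (Test-Path $script) {
    & $script
} else {
    Write-Warning "no backup script for $env:COMPUTERNAME in the repo"
}
```

Registered as a scheduled task, this is the entire ‘agent’: no proprietary service, just Git and the scripts it delivers.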

A positive psychological side effect for every managing director: it feels good to have the whole company in your own hands and to be able to restart everything on any given computer at any time.

Screenshot: 2 nodes and Git repo

Since then, more repositories have been added: from the ‘digital twins’ presented here for testing Windows updates12 to twice-daily database dumps of all specialised applications. From Bash, PowerShell, batch files and Ansible scripts to small tools and AutoIt programmes, everything can be found here - your mileage may vary.
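
The database dumps follow the same pattern. A hedged sketch for one SQL Server based application, assuming sqlcmd and integrated authentication; instance, database, path and retention are examples:

```powershell
# Sketch: twice-daily dump of one application database via sqlcmd.
# Instance name, database, target path and retention are examples.
$stamp  = Get-Date -Format 'yyyy-MM-dd_HHmm'
$target = "D:\Backup\Dumps\ERP_$stamp.bak"

sqlcmd -S 'localhost\ERP' -E -Q "BACKUP DATABASE [ERP] TO DISK = N'$target' WITH COMPRESSION, CHECKSUM"

# keep the last 14 dumps, delete older ones
Get-ChildItem 'D:\Backup\Dumps\ERP_*.bak' |
    Sort-Object LastWriteTime -Descending |
    Select-Object -Skip 14 |
    Remove-Item
```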

Conclusion

Most data backup concepts do not withstand a reality check. Administrators are getting lost in the wilderness. The lack of automation drives up IT costs, creates errors and builds technical debt.

Standardised GitOps workflows are the solution. The technology is free, and the concept is neither complicated nor difficult to implement.

It should be mentioned that GitOps for data backups only works if some homework has been done first. That homework includes dealing with unmigrated Exchange black boxes, missing mail gateways, missing mail archiving, non-isolated network segments and, unfortunately, the frequent lack of separation between the hypervisors and the very AD they are supposed to protect. This is where Microsoft’s misconception that a Hyper-V cluster has to be a member of a domain becomes very apparent. Combined with convenience, it leads to the ultimate cluster fuck when the last DC is ravaged by ransomware and the whole cluster no longer boots.

However, the most important prerequisite for GitOps is the right mindset, the culture. Administrators must be able to look at problems systematically and solve them programmatically, free of any ideology.

With this in mind,
Your Tomas Jakobs
