Backup data system

3-2-1 Rule

The age-old 3-2-1 rule of digital storage states that you should have:

  • 3 copies of your data: keep at least three copies of your data, one on-site (in the canonical location) and two backups.
  • 2 backup copies on different types of storage media: diversify the types of devices storing your data, reducing the likelihood of simultaneous failure or shared failure modes.
  • 1 backup copy offsite: at least one of the backups should be offsite, ensuring availability in case your primary location is compromised (e.g. by a weather disaster).

Simply put, the rule encourages redundancy across storage type and location. So a proper local RAID array, an external drive, and an off-site copy (e.g. in the cloud) would together satisfy the rule.

S.M.A.R.T. Test

SMART disk checks give some indication of a drive's health status. This Wikipedia article has a good intro, and this SuperUser question has a good way to interpret results.
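
For what it's worth, these checks are easy to script. A minimal sketch, assuming smartmontools is installed and the device path is adjusted to your system:

```python
import subprocess

DEVICE = "/dev/sda"  # assumption: point this at the drive you want to check

def smart_health(device: str) -> str:
    """Return smartctl's overall health self-assessment for a device."""
    # 'smartctl -H' prints the drive's overall PASSED/FAILED assessment.
    # check=False because smartctl uses nonzero exit codes as a status bitmask.
    result = subprocess.run(
        ["sudo", "smartctl", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return result.stdout

def run_short_self_test(device: str) -> None:
    """Kick off a short (roughly two-minute) SMART self-test."""
    subprocess.run(["sudo", "smartctl", "-t", "short", device], check=True)

print(smart_health(DEVICE))
run_short_self_test(DEVICE)
# Full attributes and self-test results can be read later with:
#   sudo smartctl -a /dev/sda
```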

Backup vs redundancy

Note that data backup is different from data redundancy. Having some sort of RAID setup is insurance against a single drive failure. A backup, by contrast, is a good copy of the data from a previous point in time that one can restore. This StackOverflow answer states it well:

Backup and hardware redundancy are two completely separate disciplines with different purposes. Comparing RAID with Backup is like comparing dual tires on a truck with a spare tire.

I will say the analogy does seem a bit off to me, though; with a RAID 1 setup your data are mirrored across both disks, so in many ways it feels very much like a spare tire to me.

Backup vs archive

You can also draw a distinction between the terms “archive” and “backup”. Archiving files typically refers to storing older files, no longer in use, for long periods of time. If needed at some point in the future, these files will be accessible from the archive, while almost certainly no longer being present on any current/main system. Backups, on the other hand, generally refer to full copies of relevant, current data, made frequently (daily/weekly) to protect against near-term data loss. That is, backups ensure you have one or more recent copies of your primary data that can be used to quickly restore corrupted/lost files on your main systems.

All in all, backups provide you with one or more copies of your primary data to help protect against data loss. Archiving data is the process of moving older data into long-term storage for infrequent use.

An even more detailed description of these two terms, along with useful examples, can be found in this article, with further Reddit discussion here.

Personal approach

Current storage breakdown

  • Drives:
    • Data partition (of 1TB HDD): full @ 250GB
    • Old Windows partition (of 1TB HDD): full @ 750GB
    • linuxgames partition (of 500GB SSD): full @ 85GB
    • media partition (of 500GB SSD): full @ 413GB
    • OS partitions (of 250GB SSD): mostly full with ~10GB of OS space to spare
  • Aggregated use:
    • Games: ~220GB (some installs, like RL and GTA, take up space on the main Windows partition)
    • Media: ~650GB (across partitions)
    • Data: ~350GB (across partitions)

New storage breakdown

With the newly installed HDDs, we have a total of 5.75TB of storage on the main PC, along with 4TB of external backup space and as much cloud backup storage as desired. The breakdown of PC storage is as follows:

  • 250GB SSD: main OS drive, holding core Windows and Ubuntu files. This is the disk we boot to, and it can also hold some immediately useful files for everyday OS use.
  • 500GB SSD: primary game drive, holding BF5, Rocket League, Fall Guys, etc. For now, the cumulative total of games installed is about 210GB. With something like the free COD, this could be pushed up to 310GB.
  • 1TB HDD: currently split into two partitions, one a 250GB data partition and the other 750GB of arbitrary storage for stray media files and the old Windows filesystem. A lot of used storage can be cleared here by removing the old filesystem (keeping relevant documents) and large apps. The drive could then be repurposed as an on-PC archival drive, with its contents temporarily pushed to the 4TB external drive and backed up to the cloud.
  • 4TB HDD: new drive, intended to provide the primary data and media partitions. With about 500GB of media floating around and about 260GB in data, both partitions will have plenty of room to grow.

So the total new uses are as follows:

  • 250GB SSD: operating systems, local files
  • 500GB SSD: games
  • 1TB (old) HDD: archive of key files, backups
  • 4TB (new) HDD:
    • 1.35TB partition: data (datasets, large files like ISOs, location for big databases)
    • 2.35TB partition: media (movies, TV shows, personal pictures/videos, etc)

iPhone data

Data from the iPhone to fold into the backup plan:

  • Pictures/videos
  • Messages
  • Notes
  • Contacts

New HDD setup (Linux)

  • Open GParted and select the drive after the scan completes.
  • Create a partition table for the device (GPT) under Device > Create partition table. The drive should now show as unallocated space.
  • Create new partitions as desired. Here you mostly just select the size of the partition, choose a name, select ext4 for most use cases within Linux, and apply. A command-line equivalent of these steps is sketched below.
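
The same steps can also be done from the command line via parted and mkfs.ext4 instead of the GParted GUI. A minimal sketch, with a hypothetical device path you must replace (these commands are destructive, so double-check it):

```python
import subprocess

DEVICE = "/dev/sdX"       # assumption: replace with the new drive (DESTRUCTIVE)
PARTITION = DEVICE + "1"  # first partition on the fresh GPT table
LABEL = "media"           # hypothetical filesystem label

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Equivalent of GParted's Device > Create partition table (GPT)
run(["sudo", "parted", "--script", DEVICE, "mklabel", "gpt"])
# One partition spanning the whole drive, optimally aligned
run(["sudo", "parted", "--script", "-a", "optimal", DEVICE,
     "mkpart", "primary", "ext4", "0%", "100%"])
# Format as ext4 with a label, mirroring the GParted steps above
run(["sudo", "mkfs.ext4", "-L", LABEL, PARTITION])
```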

Data backup process

With everything here in mind, the actual data storage practice aims to redundantly implement proper backups and archives, using the aforementioned physical, local drive capacity as well as BackBlaze B2 cloud storage. Following the 3-2-1 rule, we are trying the following backup strategy:

  • Primary Documents data copy is available across three devices by default (desktop, laptop, cloud server), synced continuously via Nextcloud. Should I lose any one copy of my data on any one device, it’s very likely I will be able to recover using a copy available on another.
  • A copy of my Documents data is zipped and placed on my external HDD, which remains unplugged and in a safe location away from my primary desktop operating space. This serves as a copy of my primary files through time, in case I desperately need an old file version or lose a primary copy.
  • The same backup files copied to my external HDD are also copied to my BackBlaze B2 bucket. This just provides the extra redundancy of a completely off-site storage option. A sketch of this zip-and-upload step follows below.
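
As a rough sketch of the zip-and-upload step: assuming the external drive mount point and bucket name shown here (both hypothetical) and Backblaze's official b2sdk package (pip install b2sdk), something like the following covers both bullets:

```python
import shutil
from datetime import date
from pathlib import Path

from b2sdk.v2 import B2Api, InMemoryAccountInfo

DOCUMENTS = Path.home() / "Documents"       # primary working copy
EXTERNAL = Path("/media/external/backups")  # assumption: external HDD mount point
BUCKET_NAME = "my-backups"                  # hypothetical B2 bucket name

# 1. Zip the Documents tree onto the external HDD, stamped with today's date
#    so older snapshots are kept through time rather than overwritten.
archive_base = EXTERNAL / f"documents-{date.today().isoformat()}"
archive_path = shutil.make_archive(str(archive_base), "zip", root_dir=DOCUMENTS)

# 2. Upload the same zip to the B2 bucket for the off-site copy
info = InMemoryAccountInfo()
api = B2Api(info)
api.authorize_account("production", "<applicationKeyId>", "<applicationKey>")
bucket = api.get_bucket_by_name(BUCKET_NAME)
bucket.upload_local_file(local_file=archive_path, file_name=Path(archive_path).name)
```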

This strategy is likely overkill: my data is consistently available in five different locations, three of which are primary/working copies and the other two regular backups. This should more than cover my risk tolerance. However, this only addresses my main Documents data, and not some of my larger media files or datasets (which I could live without, i.e. they are not nearly as critical as the other documents). That said, I’d still like to have a plan for these files, although it will have to be slightly different since I cannot reasonably copy these files across my primary devices. For these (much larger) files, we are employing a pared-down version of the strategy above:

  • Store a single copy on the external HDD. This can be done monthly (or less frequently), overwriting the existing copy on the external disk. These data don’t need backup copies through time since they rarely change and are not part of my regular working system. A sketch of this overwrite-style copy follows below.
  • Optionally, these data can be copied in a similar fashion up to BackBlaze, providing an off-site copy as well. Since these files are much larger, it needs to be decided whether this extra backup is worth paying monthly for possibly multiple terabytes of data on the cloud.
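
A sketch of the overwrite-style copy for the large media files, assuming hypothetical mount points (rsync -a would do the same job, but this keeps to Python for consistency):

```python
import shutil
from pathlib import Path

MEDIA = Path("/mnt/media")                # assumption: media partition mount point
EXTERNAL = Path("/media/external/media")  # assumption: destination on external HDD

# Mirror the media partition onto the external HDD, overwriting in place.
# dirs_exist_ok=True (Python 3.8+) lets repeated monthly runs update the
# existing copy instead of failing because the destination already exists.
# Note: files deleted from the source are NOT removed from the destination.
shutil.copytree(MEDIA, EXTERNAL, dirs_exist_ok=True)
```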

In terms of an archival process, within the scope of a personal system the backup copies will likely do a fine job of serving as an archive. That is, if I need long-term access to a set of files that once existed in my primary file system but that I’ve since deleted as irrelevant, I can recover them by looking far enough back into my backup copies, effectively retrieving an “archived” copy. I know there is a distinct difference between these two things, but the distinction matters much more in an industrial setting with vastly greater amounts of data.

Alternatively, I’m also going to use the following procedure for directly addressing an archive setup:

  • Maintain a single, global Documents folder that holds all files where they belong (i.e. in their respective subfolders) across time. As files come in and out of the primary working copy of my Documents data, they are copied to the archive Documents but never deleted from it. This practice hopes to avoid the confusion of searching an “archive” via old backups, where you may not know the date you’re looking for or in what time period a file existed. The archive holds a single copy (perhaps the most recent one meaningfully made within the working directory) of each file, in the expected location, for easy recovery across time periods. This copy lives on the external HDD; a sketch of this one-way copy appears below.
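
A minimal sketch of this copy-but-never-delete sync, assuming a hypothetical archive location on the external HDD:

```python
import shutil
from pathlib import Path

WORKING = Path.home() / "Documents"                  # primary working copy
ARCHIVE = Path("/media/external/archive/Documents")  # assumption: archive location

# One-way sync: copy new or updated files into the archive, never delete.
for src in WORKING.rglob("*"):
    if not src.is_file():
        continue
    dest = ARCHIVE / src.relative_to(WORKING)
    # Copy when the file is missing from the archive or the working copy is newer
    if not dest.exists() or src.stat().st_mtime > dest.stat().st_mtime:
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 preserves timestamps alongside contents
```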

This is the current backup and archive process for the time being. After some use, I will try to iterate, improve on areas of weakness, and add things as I see fit. I’m sure in the long term my immediate storage capacity will change drastically, and re-organization will need to happen at that time.