Storage that lasts: ten-year archive strategy
Most storage strategies optimize for the next two or three years. Data you need to keep for a decade is a different problem. Format obsolescence, vendor changes, and access drift all conspire against long-term retrievability.
What changes over a decade
- The application that wrote the data may not exist anymore.
- The file format may not be readable by current tools.
- The encryption keys may have rotated several times. Where are the old ones?
- The storage vendor may have been acquired, may have retired your pricing tier, or may have exited the market.
- The access mechanism may have changed. Authentication, APIs, all of it.
Strategy components
- Open formats. PDF/A for documents, Parquet or CSV for structured data, standard image and video codecs.
- Multiple copies, spread across regions and ideally across vendors. One vendor failing is not an extinction event.
- A retrieval test, annually. If you cannot retrieve a file at year three, you cannot retrieve it at year ten.
- Documentation alongside the data. What it is, who wrote it, what it depends on, what tools read it.
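The last two components, the annual retrieval test and documentation stored alongside the data, can be sketched together. The example below is one possible approach, not a prescribed layout: the sidecar name (`.manifest.json`), the field names, and the function names `write_manifest` and `verify_retrieval` are all assumptions made for illustration. It uses only the Python standard library, so the check does not depend on a tool that may itself be gone in ten years.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_file: Path, description: str, author: str,
                   file_format: str, readers: list[str]) -> Path:
    """Write a JSON sidecar next to the data file (hypothetical layout)."""
    manifest = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "size_bytes": data_file.stat().st_size,
        "description": description,      # what it is
        "author": author,                # who wrote it
        "format": file_format,           # what it depends on
        "readable_with": readers,        # what tools read it
    }
    sidecar = data_file.with_name(data_file.name + ".manifest.json")
    sidecar.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return sidecar


def verify_retrieval(retrieved_file: Path, sidecar: Path) -> bool:
    """Retrieval test: does the retrieved copy match the recorded size and hash?"""
    manifest = json.loads(sidecar.read_text(encoding="utf-8"))
    return (retrieved_file.stat().st_size == manifest["size_bytes"]
            and sha256_of(retrieved_file) == manifest["sha256"])
```

At archive time you would call `write_manifest(Path("report.csv"), ...)` once and store the sidecar with every copy. Each year you would pull a sample of files back from each location and run `verify_retrieval` on them. A matching hash only proves the bytes survived. Opening the file with one of the tools listed under `readable_with` is still a separate, manual step.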