Storage that lasts: ten-year archive strategy

Most storage strategies optimize for the next two or three years. Data you need to keep for a decade is a different problem. Format obsolescence, vendor changes, and access drift all conspire against long-term retrievability.

What changes over a decade

  • The application that wrote the data may not exist anymore.
  • The file format may not be readable by current tools.
  • The encryption keys may have rotated several times. Where are the old ones?
  • The storage vendor may have been acquired, retired the pricing tier you rely on, or exited the market.
  • The access mechanism may have changed. Authentication, APIs, all of it.

Strategy components

  1. Open formats. PDF/A for documents, Parquet or CSV for structured data, standard image and video codecs.
  2. Multiple copies, spread across regions and vendors. One region or one vendor failing is not an extinction event.
  3. A retrieval test, annually. If you cannot retrieve a file at year three, you cannot retrieve it at year ten.
  4. Documentation alongside the data. What it is, who wrote it, what it depends on, what tools read it.
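
As a rough illustration of components 3 and 4, here is a minimal sketch of a manifest that travels with the data and a check you could run each year after pulling a copy back. It assumes the retrieved archive is available as a local directory. The file name MANIFEST.json, the field names, and the function names are all hypothetical choices for this example, not an established standard.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "MANIFEST.json"  # hypothetical name chosen for this sketch


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(archive_dir: Path, description: str, author: str,
                   depends_on: list[str], readable_with: list[str]) -> Path:
    """Store the documentation and a checksum per file next to the data."""
    manifest_path = archive_dir / MANIFEST_NAME
    files = {
        p.relative_to(archive_dir).as_posix(): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file() and p != manifest_path
    }
    manifest = {
        "description": description,      # what it is
        "author": author,                # who wrote it
        "depends_on": depends_on,        # what it depends on
        "readable_with": readable_with,  # what tools read it
        "files": files,
    }
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path


def verify_manifest(archive_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every file came back intact."""
    manifest = json.loads((archive_dir / MANIFEST_NAME).read_text(encoding="utf-8"))
    problems = []
    for rel_path, expected in manifest["files"].items():
        target = archive_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

The idea would be to call write_manifest once when the archive is created, then run verify_manifest against a freshly retrieved copy each year. A checksum match only shows the bytes survived. Opening a sample of files with the tools listed under readable_with is still the real test of whether the format is readable.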
