Best Practices

Key Hygiene

Rotation cadence, backend separation, post-quantum transition timing.

Key management is the quiet part of a Lattix deployment. It rarely causes an incident, but when it does, recovery is rarely graceful. Hygiene here is about preventing the scenarios where recovery is painful, not about daily operational vigilance.

Rotation cadence

The platform supports any cadence; what matters is choosing one that matches your threat model and sticking with it.

Annual rotation is the floor. Even for low-sensitivity data, an annual KEK rotation is worth the operational cost — it bounds the impact of a theoretical KEK compromise to at most one year of wrapping.

Quarterly rotation is a good default for business-critical data. It aligns with most organizations' planning cycles and is frequent enough to make a compromise of historical keys materially less useful to an adversary.

Monthly rotation is appropriate for your most sensitive tier — data where a known compromise would be a notifiable event. Monthly limits the blast radius to a narrow window.

Event-triggered rotation complements scheduled rotation. A rotation triggered by a specific event (an employee with key-access privileges leaves, a suspected compromise is reported, a regulatory review starts) is independent of the schedule and additive to it.
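As a minimal sketch of how scheduled and event-triggered rotation combine, the following computes whether a KEK is due. The tier names, the `CADENCE_DAYS` table, and both function names are hypothetical and not part of the platform; the day counts only approximate annual, quarterly, and monthly.

```python
from datetime import date, timedelta

# Hypothetical mapping from sensitivity tier to cadence in days (assumed values).
CADENCE_DAYS = {"low": 365, "business_critical": 90, "restricted": 30}


def next_rotation_due(tier: str, last_rotated: date) -> date:
    """Return the scheduled date of the next rotation for a tier."""
    return last_rotated + timedelta(days=CADENCE_DAYS[tier])


def should_rotate(tier: str, last_rotated: date, today: date,
                  event_pending: bool = False) -> bool:
    """Rotate when the schedule is due OR a triggering event has occurred.

    An event never postpones the schedule; it only adds a rotation.
    """
    return event_pending or today >= next_rotation_due(tier, last_rotated)
```

Keeping the event flag as a separate input, rather than folding it into the schedule, mirrors the point above: an event-triggered rotation is additive to the cadence, not a replacement for it.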

Overlap windows

The overlap window — the time between a KEK becoming deprecated and being retired — balances operational continuity against exposure.

Short overlaps (days to a week) are right for high-sensitivity data where you want revocation to propagate quickly.

Long overlaps (months) are right for low-sensitivity data with slow-moving consumers where you don't want to risk an unavailable decryption.

Typical middle ground: 14–30 days. Long enough that most active data flows through the deprecated key before retirement; short enough that a compromise in the deprecated key has a bounded active window.

Monitor the object count still wrapped under a deprecated KEK as retirement approaches. If the count is non-zero at retirement, some objects will become permanently inaccessible — decide whether that's acceptable or whether to extend the overlap.
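The pre-retirement check described above could be sketched as follows. This assumes you can obtain the count of objects still wrapped under a deprecated KEK from your own inventory; `DeprecatedKek`, `retirement_decision`, and the seven-day warning threshold are illustrative names and values, not platform features.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DeprecatedKek:
    kek_id: str
    retire_on: date
    objects_still_wrapped: int  # assumed to come from your own inventory query


def retirement_decision(kek: DeprecatedKek, today: date, warn_days: int = 7) -> str:
    """Classify a deprecated KEK as 'ok', 'warn', or 'block' ahead of retirement."""
    days_left = (kek.retire_on - today).days
    if kek.objects_still_wrapped == 0:
        return "ok"      # nothing would be stranded
    if days_left <= 0:
        return "block"   # retiring now would strand objects
    if days_left <= warn_days:
        return "warn"    # re-wrap the remainder or extend the overlap
    return "ok"          # still time for normal traffic to drain the key
```

A "block" result is a prompt for a human decision (accept the loss or extend the overlap), not something to resolve automatically.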

Backend separation

Most tenants start with a single KMS backend covering all classifications. This is fine for early rollout. As the deployment matures, consider separating:

High-sensitivity data to a dedicated backend. A separate KMS key (or a dedicated HSM) for your Restricted tier limits the blast radius of a backend compromise.

Regional data to region-scoped backends. For jurisdictional residency requirements, a region-scoped backend is often not just a best practice but a regulatory requirement.

Development and staging to a separate backend. Non-production workloads should not share the production KEK. Use a separate KMS key for development, and keep staging and production in entirely separate tenants.

The cost of separation is operational complexity: more backends to monitor, more access to grant to the KAS, more rotation schedules to track. The value is blast-radius reduction. Organizations that have experienced a backend incident tend to recommend more separation than they originally deployed.
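One way to reason about the three separation axes (environment, classification, region) is as an ordered rule table. The sketch below is only a planning aid under assumed names: every backend identifier and the `select_backend` function are hypothetical, and the actual backend assignment is done in the platform's own configuration.

```python
# Hypothetical backend identifiers; none of these names come from the platform.
# Each rule is ((environment, classification, region), backend); None matches anything.
BACKEND_RULES = [
    (("dev", None, None), "kms-dev"),
    (("staging", None, None), "kms-staging"),
    (("prod", "restricted", "eu"), "hsm-restricted-eu"),
    (("prod", "restricted", None), "hsm-restricted"),
    (("prod", None, "eu"), "kms-eu"),
    (("prod", None, None), "kms-default"),
]


def select_backend(environment: str, classification: str, region: str) -> str:
    """Return the first backend whose rule matches; rule order encodes priority."""
    request = (environment, classification, region)
    for pattern, backend in BACKEND_RULES:
        if all(p is None or p == r for p, r in zip(pattern, request)):
            return backend
    raise LookupError(f"no backend rule matches {request}")
```

Rule order matters: the region-specific Restricted rule comes before the general Restricted rule so that a residency requirement is not silently overridden by a sensitivity rule. Each added row is also one more backend to monitor and rotate, which is the complexity cost noted above.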

Post-quantum transition

Post-quantum migration is the single largest key hygiene initiative of the next decade. The specifics are covered in Configuration → Encryption Profiles; the strategy is summarized here.

Inventory your long-lived data now. Any data you wrap today under classical algorithms is a candidate for eventual post-quantum migration. The cost is proportional to the volume, not the sensitivity — a petabyte of Internal-tier data is a much larger migration than a terabyte of Restricted-tier data.

Migrate by classification, high-sensitivity first. The data most at risk from harvest-now-decrypt-later is the most sensitive, longest-lived data. Move it first; move lower tiers on a follow-on schedule.
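The inventory and ordering steps can be sketched together. The `Dataset` record, the `TIER_PRIORITY` ranking, and the "confidential" tier name are assumptions made for illustration; substitute your own classification scheme.

```python
from dataclasses import dataclass

# Hypothetical tier ranking: lower number migrates first.
TIER_PRIORITY = {"restricted": 0, "confidential": 1, "internal": 2}


@dataclass
class Dataset:
    name: str
    tier: str
    retention_years: int
    size_tb: float


def migration_order(datasets: list[Dataset]) -> list[Dataset]:
    """Order by sensitivity first, then longest retention first."""
    return sorted(datasets, key=lambda d: (TIER_PRIORITY[d.tier], -d.retention_years))


def volume_by_tier(datasets: list[Dataset]) -> dict[str, float]:
    """Total terabytes per tier, as a rough proxy for re-wrap cost."""
    totals: dict[str, float] = {}
    for d in datasets:
        totals[d.tier] = totals.get(d.tier, 0.0) + d.size_tb
    return totals
```

The two functions answer different questions on purpose: ordering follows risk, while the volume totals show where the migration effort will actually be spent.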

Use hybrid operation for the transition. Hybrid wrapping keeps consumers on the classical stack working while new objects get post-quantum wrapping. The transition doesn't have to be an atomic cutover.

Plan for re-wrapping existing data. The volume of historical data to re-wrap is often larger than the volume of new data being produced. Build the re-wrap operation into your maintenance cycle rather than trying to do it all in a migration window.

Don't wait for the certainty date. A cryptographically relevant quantum computer may arrive in 2030, 2035, or later. The date by which you need to have finished the migration is the earliest plausible arrival date minus the retention window of your longest-lived sensitive data. For most organizations that means starting now.
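The deadline arithmetic is simple enough to write down. In this sketch the function names are illustrative and the inputs are your own planning assumptions, not predictions.

```python
def migration_deadline_year(earliest_crqc_year: int, retention_years: int) -> int:
    """Year by which data must already be under post-quantum wrapping."""
    return earliest_crqc_year - retention_years


def years_remaining(earliest_crqc_year: int, retention_years: int,
                    current_year: int) -> int:
    """Years left before the deadline; negative means it has already passed."""
    return migration_deadline_year(earliest_crqc_year, retention_years) - current_year
```

For example, with an assumed earliest arrival of 2035 and a 7-year retention window, the migration needs to be complete by 2028. With an assumed arrival of 2030 and 10-year retention, the deadline was 2020, which is why long-retention data is the first to move.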

Emergency revocation readiness

You want the first time you execute an emergency revocation to be a rehearsal, not a real incident.

Rehearse quarterly. Pick a non-production classification, execute a full emergency revocation, confirm the ledger events, confirm the re-wrap flow, and confirm that the notification pool received the alerts. Document the time each stage took.
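To record how long each rehearsal stage takes, a small timing helper is enough. This is a generic sketch using only the Python standard library; `timed_stage` and the stage names are hypothetical, and the actual revocation steps would run inside each `with` block.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_stage(name: str, log: dict):
    """Record the wall-clock duration of a rehearsal stage in seconds."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Recorded even if the stage raises, so failed rehearsals are timed too.
        log[name] = time.monotonic() - start


durations = {}
with timed_stage("revoke", durations):
    pass  # placeholder: execute the revocation here
with timed_stage("rewrap", durations):
    pass  # placeholder: run the re-wrap flow here
```

Comparing the recorded durations between quarterly rehearsals shows whether any stage is getting slower as the object count grows.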

Know the blast radius in advance. For each KEK, maintain a current count of objects wrapped under it. During an incident you'll want to answer the question "if I revoke this, how many objects become inaccessible, and how many can be re-wrapped from an alternate copy?"
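A per-KEK blast-radius count could be derived from an object inventory along these lines. The `WrappedObject` record and its `has_alternate_copy` flag are assumptions about what your inventory tracks, not a platform data model.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class WrappedObject(NamedTuple):
    object_id: str
    kek_id: str
    has_alternate_copy: bool  # True if another copy is wrapped under a different KEK


def blast_radius(objects: Iterable[WrappedObject], kek_id: str) -> dict[str, int]:
    """Count objects affected by revoking kek_id, split by recoverability."""
    counts = Counter()
    for obj in objects:
        if obj.kek_id != kek_id:
            continue
        counts["affected"] += 1
        counts["recoverable" if obj.has_alternate_copy else "stranded"] += 1
    return {k: counts[k] for k in ("affected", "recoverable", "stranded")}
```

Running this on a schedule and storing the result means the numbers are already at hand when an incident starts.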

Have the re-wrap playbook ready. If a revocation strands a significant object count, the re-wrap operation needs to happen fast. A Mesh Node batch re-wrap against a successor KEK is the standard recovery; have the procedure documented, the principal authorized, and the backend capacity reserved.

Know your contact tree. An emergency key rotation is an event your security operations, leadership, and (potentially) legal teams all need to know about. The contact tree shouldn't be reconstructed during the incident.

What doesn't need frequent attention

Some things are worth doing right once and then leaving alone:

  • The algorithm choice. If your encryption profile specifies hybrid or post-quantum, don't change it frequently. Every change is a migration.
  • The KAS deployment topology. Changes here affect latency and availability for the entire tenant. Treat it as platform work, not routine administration.
  • The rotation cadence itself. Changing from quarterly to monthly should be a deliberate policy shift, not a casual adjustment.

Stability in the cadence and topology makes the rest of the operational work easier.