How I Migrated 159 Legacy VMs to GCP with Near-Zero Downtime

Jul 22, 2022

I was tasked with one of the most daunting projects of my career: migrating 159 legacy virtual machines off aging hardware and onto Google Cloud. These weren’t just any VMs; they were running critical business applications on unsupported operating systems like Windows Server 2008 R2, leaving us exposed to major security vulnerabilities and compliance risks. I led a project that not only migrated all 159 VMs with an average of just 28 seconds of downtime per machine but also paved the way for modernization that cut our costs by 46%. Here’s how I did it.

The Challenge I Faced

The state of our legacy infrastructure was a ticking time bomb. We had 159 VMs running end-of-life operating systems, making them ineligible for security patches. Our data center costs were high, we couldn’t scale, and we had no real disaster recovery plan. My challenge was to move everything to the cloud without disrupting the business, which relied on these legacy applications.

My Migration Strategy: RackWare CDC

A manual “lift and shift” was out of the question; the downtime would have been measured in weeks. After evaluating several options, I chose RackWare’s RMM (RackWare Management Module). Its key feature for me was live Change Data Capture (CDC): I could create a replica of a running VM in GCP and keep it continuously in sync, with a lag of only a few minutes. That meant the final cutover could happen with near-zero downtime.

graph LR
    OnPrem["On-Prem Legacy VMs"] -->|Rackware CDC| GCP[Compute Engine Replicas]
    GCP -->|Cutover| Modernized["Modernized & Live"]

    style OnPrem fill:#ff6b6b,color:#fff
    style GCP fill:#fff4e6
    style Modernized fill:#51cf66,color:#fff

How I Executed the Migration

I broke the project down into four distinct phases.

Phase 1: Discovery and Planning

My first two weeks were spent using RackWare’s discovery tools to create a complete inventory of all 159 VMs and, more importantly, to map their dependencies. That dependency map turned out to be the single most critical document for planning the migration. Based on it, I planned a 4-wave migration.

graph TD
    Wave1["Wave 1: Identity (AD, DNS)"]
    Wave2["Wave 2: Databases (SQL Server)"]
    Wave3["Wave 3: Applications (Web & Business Apps)"]
    Wave4["Wave 4: Supporting Services (File Servers, etc.)"]

    Wave1 --> Wave2 --> Wave3 --> Wave4
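
To turn the dependency map into this wave plan, I essentially needed a topological ordering of the VMs: nothing migrates before the things it depends on. The sketch below is a minimal Python illustration of that idea; the vm_dependencies data is a made-up example, not the real inventory, and in practice the final wave assignments also reflected business risk (supporting services went last even where dependencies alone would have allowed them earlier).

from collections import defaultdict

# Hypothetical example data: each VM maps to the VMs it depends on.
# The real inventory came out of the discovery tooling.
vm_dependencies = {
    "dc01": [],                      # identity
    "sql01": ["dc01"],               # database
    "webapp01": ["sql01", "dc01"],   # application
    "fileserver01": ["dc01"],        # supporting service
}

def plan_waves(deps):
    """Group VMs into waves so every VM lands in a later wave
    than all of the VMs it depends on."""
    waves = defaultdict(list)
    assigned = {}

    def wave_of(vm):
        if vm in assigned:
            return assigned[vm]
        # No dependencies -> wave 1; otherwise one wave after the
        # latest wave among its dependencies.
        wave = 1 + max((wave_of(dep) for dep in deps[vm]), default=0)
        assigned[vm] = wave
        waves[wave].append(vm)
        return wave

    for vm in deps:
        wave_of(vm)
    return dict(sorted(waves.items()))

for wave, vms in plan_waves(vm_dependencies).items():
    print(f"Wave {wave}: {', '.join(vms)}")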

Phase 2: The Pilot Migration

I started with a pilot migration of 5 non-critical VMs. This allowed me to test my entire process, from replication to cutover, and identify issues (like firewall rules and DNS settings) in a low-risk environment.
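
Most of what the pilot caught showed up as connectivity and name-resolution problems, so a quick smoke test after each pilot cutover flushed them out fast. Here is a minimal sketch of that kind of check using only the Python standard library; the hostnames and ports are hypothetical.

import socket

# Hypothetical checks: (hostname, TCP port) pairs that should answer
# after cutover, e.g. HTTPS on a web app or 1433 on SQL Server.
CHECKS = [
    ("webapp01.example.internal", 443),
    ("sql01.example.internal", 1433),
]

def smoke_test(host, port, timeout=5):
    """Confirm the name resolves and the port accepts a connection."""
    try:
        addr = socket.gethostbyname(host)              # DNS resolution
        with socket.create_connection((addr, port), timeout=timeout):
            return True, addr
    except OSError as exc:
        return False, str(exc)

for host, port in CHECKS:
    ok, detail = smoke_test(host, port)
    print(f"{'OK  ' if ok else 'FAIL'} {host}:{port} -> {detail}")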

Phase 3: The Migration Waves

With the pilot proven out, we began the 7-week migration, wave by wave. For each VM, RackWare performed the initial replication and then kept the GCP replica in sync with the on-prem source using its CDC technology.
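
During each wave I kept an eye on per-VM replication lag so a replica that fell behind didn’t surprise me at cutover time. Below is a minimal sketch of that kind of polling loop; the get_replication_lag_seconds call is a hypothetical placeholder, since the real query goes through RackWare’s own interface.

import time

WAVE_VMS = ["sql01", "sql02", "sql03"]   # hypothetical wave membership
MAX_LAG_SECONDS = 300                    # alert if a replica is >5 min behind

def get_replication_lag_seconds(vm_name):
    """Placeholder: ask the RMM how far behind the GCP replica is."""
    raise NotImplementedError

def watch_wave(vms, interval=60):
    """Poll each replica's lag and flag any VM that falls behind."""
    while True:
        for vm in vms:
            lag = get_replication_lag_seconds(vm)
            flag = "  <-- falling behind" if lag > MAX_LAG_SECONDS else ""
            print(f"{vm}: lag {lag}s{flag}")
        time.sleep(interval)

watch_wave(WAVE_VMS)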

Phase 4: The Automated Cutover

For each VM, the final cutover was managed by an automated script I wrote. The script would:

  1. Verify the CDC replication lag was near zero.
  2. Gracefully shut down the source VM on-premises.
  3. Perform one final, delta sync.
  4. Power on the new VM in GCP.
  5. Run a series of health checks on the new VM.
  6. Update the DNS records to point to the new GCP IP address.

This automated process is what allowed us to achieve an average cutover downtime of just 28 seconds per VM.
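
For illustration, here is a stripped-down sketch of that cutover flow. The gcloud commands are real, but the RackWare lag check, final delta sync, and source shutdown are hypothetical placeholders (the actual integration went through the RMM and our hypervisor tooling), and the project, zone, and DNS details are made up.

import subprocess
import sys

PROJECT = "my-gcp-project"        # hypothetical project ID
ZONE = "us-central1-a"            # hypothetical zone
DNS_ZONE = "internal-zone"        # hypothetical Cloud DNS managed zone

def run(cmd):
    """Run a command and fail loudly if it errors."""
    print("+ " + " ".join(cmd))
    subprocess.run(cmd, check=True)

def get_replication_lag_seconds(vm_name):
    """Placeholder: query the RMM for the current CDC lag."""
    raise NotImplementedError

def final_delta_sync(vm_name):
    """Placeholder: trigger one last delta sync via the RMM."""
    raise NotImplementedError

def shut_down_source(vm_name):
    """Placeholder: gracefully stop the on-prem source VM
    (e.g. via the hypervisor's API or WinRM)."""
    raise NotImplementedError

def cutover(vm_name, new_ip):
    # 1. Only proceed if replication is effectively caught up.
    if get_replication_lag_seconds(vm_name) > 60:
        sys.exit(f"{vm_name}: replication lag too high, aborting")

    # 2. Gracefully shut down the source VM on-premises.
    shut_down_source(vm_name)

    # 3. One final delta sync so the replica captures the last writes.
    final_delta_sync(vm_name)

    # 4. Power on the new VM in GCP.
    run(["gcloud", "compute", "instances", "start", vm_name,
         "--project", PROJECT, "--zone", ZONE])

    # 5. Health checks would go here (the pilot smoke test, reused).

    # 6. Point DNS at the new GCP address.
    run(["gcloud", "dns", "record-sets", "update",
         f"{vm_name}.example.internal.", "--type=A",
         f"--rrdatas={new_ip}", "--ttl=60",
         f"--zone={DNS_ZONE}", "--project", PROJECT])

In a setup like this, a short DNS TTL helps clients pick up the new address quickly once step 6 runs, which is part of what keeps the visible downtime to seconds.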

The Results

The project was a huge success. We met our tight schedule, migrated all 159 VMs with an average cutover downtime of 28 seconds, and got off the legacy hardware with minimal disruption to the business. Just as importantly, the move paved the way for the modernization work that cut our costs by 46%.

My Key Lessons Learned

  1. Map dependencies before you plan anything else. The dependency map drove every wave decision and was the single most valuable document of the project.
  2. Pilot with non-critical VMs. The pilot surfaced firewall and DNS issues in a low-risk setting, long before they could hurt a production cutover.
  3. Automate the cutover. Scripting the final steps is what kept downtime to seconds per VM instead of hours.
