I still remember the exact moment I ran docker images and saw it: 1.2GB for our main Node.js application. I stared at the terminal, thinking maybe I’d misread the units. Nope. One point two gigabytes. For a relatively simple API service that basically just shuffled data between a database and some frontend clients.
My first thought was denial. Maybe this was normal? A quick Slack message to a friend at another company confirmed what I already knew deep down: no, this was absolutely not normal. Their similar service clocked in at around 150MB. I felt that particular flavor of embarrassment that comes from realizing you’ve been doing something wrong for months and nobody noticed.
The wake-up call came during a production deployment that Friday afternoon (because of course it was Friday). Our CI/CD pipeline was crawling. Developers were complaining about slow builds. Worse, our cloud hosting bill had crept up by nearly 40% over the past quarter, and a good chunk of that was container registry storage costs. When our DevOps lead asked me to “take a look at optimizing our Docker setup,” I knew I had some serious homework ahead of me.
What I Got Wrong (And Why)
Here’s the Dockerfile I’d been using, which seemed perfectly reasonable to past me:
# The old, bloated Dockerfile
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/server.js"]
The problem wasn’t any one terrible decision. It was a bunch of small oversights that compounded. I was using the full node:18 image instead of the Alpine variant, which meant I was shipping an entire Debian installation just to run Node. I was installing all dependencies, including dev tools like TypeScript, testing frameworks, and linters, even though none of that stuff was needed at runtime. And because I’d just done COPY . . at the top, I was also bundling our entire .git directory, documentation, test files, and who knows what else.
The final image was basically my entire development environment, frozen in carbonite and shipped to production. It was wasteful, slow, and honestly a bit of a security nightmare. When I ran a vulnerability scan, it found 150 issues. Most were in dev dependencies that had no business being there in the first place.
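If you want to run the same kind of diagnosis on your own images, two commands get you most of the way there. The image name below is just a placeholder, and Trivy is only one of several scanners that will do the job:
# Show each layer and how much it contributes to the total image size
docker history my-api:latest
# Scan a local image for known vulnerabilities (Trivy is one option among many)
trivy image my-api:latest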
The Learning Process (Trial and Error)
I won’t pretend I got this right on the first try. My initial attempt at a multi-stage build was overly complicated. I created like five stages because I’d convinced myself that more stages meant more optimization. It didn’t. It just meant more confusion. I also initially forgot to use npm ci instead of npm install, which meant I wasn’t getting reproducible builds. That led to a fun incident where the image built fine on my laptop but failed in CI. Classic.
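If the npm ci versus npm install distinction is new to you, the short version: npm ci installs exactly what package-lock.json specifies and fails fast if the lockfile and package.json disagree, while npm install is allowed to resolve new versions and rewrite the lockfile. Roughly:
# Reproducible: installs exactly what package-lock.json specifies, errors if it's out of sync
npm ci
# Not guaranteed reproducible: may resolve newer versions and update the lockfile
npm install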
The breakthrough came when I stopped trying to be clever and focused on a simple principle: separate what you need to build from what you need to run. Building requires dev dependencies, TypeScript, bundlers, all that stuff. Running requires only the compiled output and production dependencies. Once I internalized that, the solution became obvious.
Here’s what I eventually landed on after a few iterations:
# My new, optimized multi-stage Dockerfile
# Stage 1: Install production dependencies
FROM node:18-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Stage 2: Build the application
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 3: Create the final, lean image
FROM node:18-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]
The first stage, deps, installs only production dependencies. The second stage, builder, does the full build with all dev dependencies included. The third stage, runner, is the actual image we ship. It starts fresh from a clean Alpine base and cherry-picks only what’s needed from the previous stages: the production node_modules from deps and the compiled dist folder from builder. Everything else gets thrown away.
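If you want to sanity-check the result yourself, building and comparing sizes is a one-minute job. The tag here is just a placeholder:
# Build the multi-stage image and check the size of the final stage
docker build -t my-api:optimized .
docker images my-api:optimized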
The Details That Made a Difference
Switching to Alpine was the easiest win. I literally just changed FROM node:18 to FROM node:18-alpine and saved 860MB instantly. Alpine is a minimal Linux distribution designed for containers, and it’s perfect for this use case. The only gotcha I ran into was that some native dependencies didn’t compile cleanly on Alpine at first, but adding python3 and make to the builder stage solved that.
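For reference, the fix was a single extra line near the top of the builder stage. The exact package list depends on which native modules you're compiling; some also want g++:
# In the builder stage, before npm ci: toolchain for compiling native modules on Alpine
RUN apk add --no-cache python3 make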
The .dockerignore file was another quick win I’d completely overlooked. I created one and added the obvious stuff: node_modules, .git, *.md, test files, and our .env files. This meant those files never even made it into the build context, which sped up the initial docker build command significantly.
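Mine ended up looking roughly like this; the test-file patterns are examples and will vary by project:
# .dockerignore (test patterns will differ depending on your project layout)
node_modules
.git
*.md
.env*
**/*.test.ts
**/*.spec.ts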
I also learned to be strategic about layer caching. By copying package*.json first and running npm ci before copying the rest of the source code, I ensured that the dependency installation step would be cached unless the dependencies actually changed. Since we modified code way more often than we modified dependencies, this meant faster builds in practice.
The Moment of Truth
I still remember pushing the optimized image to our staging environment and watching the deployment logs. The image pull that used to take over two minutes finished in 14 seconds. Fourteen. Seconds. I actually ran it again because I thought something was broken.
The final stats were almost unbelievable. Our production image went from 1.2GB to 120MB. That’s a 90% reduction. Deployments were faster. Our hosting costs dropped noticeably. The security scan went from 150 vulnerabilities to just 12, and those were all in the base Alpine image, not our code.
When I showed the team, our DevOps lead just laughed and said, “Well, that’s one way to fix the hosting bill.” The best part was watching deployment times during our next sprint. What used to be this tedious wait while images pulled and containers spun up became almost instantaneous. Developers could iterate faster. Rollbacks were quicker. Everything just felt snappier.
Looking back, I’m a little embarrassed it took me so long to figure this out, but I’m glad I did. Multi-stage builds aren’t some advanced Docker wizardry. They’re just a smarter way to think about what actually needs to be in your production images. If you’re still shipping monolithic single-stage builds, trust me, it’s worth an afternoon to fix it. Your CI/CD pipeline, your wallet, and your security team will thank you.