Deploying Node.js to production takes more than a git push. This checklist covers the critical steps that teams skip and that later cause outages, data loss, and security incidents in the first 90 days after launch.
1. Environment Configuration
Never hardcode secrets. Use environment variables for every credential, API key, connection string, and feature flag.
# .env (never commit this)
DATABASE_URL=postgresql://user:password@host:5432/dbname
SESSION_SECRET=a-long-random-string-at-least-32-chars
STRIPE_SECRET_KEY=sk_live_...
Use a validation library at startup to fail fast on missing config:
import { z } from "zod";
const env = z.object({
NODE_ENV: z.enum(["development", "test", "production"]),
PORT: z.coerce.number().default(3000),
DATABASE_URL: z.string().url(),
SESSION_SECRET: z.string().min(32),
}).parse(process.env);
export default env;
If the app starts without required env vars, you get a clear error at boot time rather than a cryptic runtime failure.
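To make the fail-fast idea concrete without any library, here is a minimal dependency-free sketch. The `AppConfig` shape and the `loadConfig` name are hypothetical, and the `postgres://` scheme check is an assumption that is stricter than the generic URL check in the zod schema.

```typescript
// Hypothetical shape of the validated config; mirrors the fields validated above.
interface AppConfig {
  nodeEnv: "development" | "test" | "production";
  port: number;
  databaseUrl: string;
  sessionSecret: string;
}

const NODE_ENVS = ["development", "test", "production"] as const;

// Validates a raw environment map and collects every problem before throwing,
// so a misconfigured deploy fails at boot with the complete list.
function loadConfig(raw: Record<string, string | undefined>): AppConfig {
  const problems: string[] = [];

  const nodeEnv = NODE_ENVS.find((value) => value === raw.NODE_ENV);
  if (!nodeEnv) problems.push("NODE_ENV must be development, test, or production");

  const port = Number(raw.PORT ?? "3000"); // default matches the schema above
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    problems.push("PORT must be an integer between 1 and 65535");
  }

  const databaseUrl = raw.DATABASE_URL ?? "";
  // Assumption: only Postgres URLs are accepted in this sketch.
  if (!/^postgres(ql)?:\/\//.test(databaseUrl)) {
    problems.push("DATABASE_URL must be a postgres:// or postgresql:// URL");
  }

  const sessionSecret = raw.SESSION_SECRET ?? "";
  if (sessionSecret.length < 32) {
    problems.push("SESSION_SECRET must be at least 32 characters");
  }

  if (problems.length > 0 || !nodeEnv) {
    throw new Error(`Invalid configuration:\n- ${problems.join("\n- ")}`);
  }
  return { nodeEnv, port, databaseUrl, sessionSecret };
}
```

In an application entry point this would be called once, as `loadConfig(process.env)`, before anything starts listening.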
2. Process Management
Use a process manager — never run node server.js directly in production. Raw node processes die on uncaught exceptions and don’t restart automatically.
PM2 (VPS/bare metal)
pnpm add -g pm2
// ecosystem.config.cjs
module.exports = {
apps: [{
name: "api",
script: "dist/server.js",
instances: "max", // cluster mode: one worker per CPU
exec_mode: "cluster",
max_memory_restart: "512M",
env_production: {
NODE_ENV: "production",
PORT: 3000
}
}]
};
pm2 start ecosystem.config.cjs --env production
pm2 save
pm2 startup # generate OS startup command
Docker + container orchestration
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
COPY . .
RUN pnpm build
# Drop devDependencies so the runtime image copies only production packages
RUN pnpm prune --prod
FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json .
EXPOSE 3000
USER node
CMD ["node", "dist/server.js"]
3. Health Checks
Every Node.js service needs a /health endpoint that load balancers and orchestrators can poll:
app.get("/health", async (req, res) => {
try {
await db.raw("SELECT 1"); // verify DB connectivity
res.json({ status: "ok", uptime: process.uptime() });
} catch (err) {
res.status(503).json({ status: "error", message: "database unreachable" });
}
});
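Configure your load balancer health check or Kubernetes readiness probe to call this endpoint every 10-30 seconds so that unhealthy instances are taken out of rotation automatically. Because the endpoint checks the database, be careful about reusing it as a liveness probe: a failing liveness probe restarts the container, and restarting every pod during a database outage does not bring the database back.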
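Once a service depends on more than the database, the endpoint has several results to combine. A minimal sketch of folding individual check results into the status code and body the endpoint returns; the `CheckResult` and `buildHealthReport` names and the `cache` check are hypothetical and not part of the example above.

```typescript
// Outcome of one dependency check, for example the SELECT 1 query above.
type CheckResult = { name: string; healthy: boolean };

interface HealthReport {
  httpStatus: 200 | 503;
  body: { status: "ok" | "error"; failed: string[] };
}

// Any failed dependency turns the whole report into a 503, which is the
// signal load balancers and orchestrators act on.
function buildHealthReport(results: CheckResult[]): HealthReport {
  const failed = results.filter((r) => !r.healthy).map((r) => r.name);
  return failed.length === 0
    ? { httpStatus: 200, body: { status: "ok", failed } }
    : { httpStatus: 503, body: { status: "error", failed } };
}

const report = buildHealthReport([
  { name: "database", healthy: true },
  { name: "cache", healthy: false }, // hypothetical second dependency
]);
console.log(report.httpStatus, report.body.failed);
```

In the route handler, the results would come from awaiting each dependency check, and the response would be sent with `res.status(report.httpStatus).json(report.body)`.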
4. Structured Logging
Avoid console.log in production. Use a structured logger that outputs JSON for easy querying in log aggregation tools.
pnpm add pino pino-pretty
import pino from "pino";
export const logger = pino({
level: process.env.LOG_LEVEL ?? "info",
...(process.env.NODE_ENV !== "production" && {
transport: { target: "pino-pretty" }
})
});
// Usage
logger.info({ userId: req.user.id, action: "login" }, "User authenticated");
logger.error({ err, requestId }, "Unhandled error in payment webhook");
Ship logs to a centralized service (Datadog, Logtail, Papertrail, CloudWatch). Never rely on disk logs on ephemeral containers.
5. Error Handling
Handle errors at every boundary:
// Async route handler wrapper — prevents uncaught rejections
const asyncHandler = (fn: RequestHandler): RequestHandler =>
(req, res, next) => Promise.resolve(fn(req, res, next)).catch(next);
// Global error handler
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
logger.error({ err, url: req.url, method: req.method }, "Request failed");
if (res.headersSent) return next(err);
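const status = err instanceof HttpError ? err.status : 500;
// Hide details only for unexpected (5xx) errors in production; 4xx messages are meant for the client
const expose = status < 500 || process.env.NODE_ENV !== "production";
res.status(status).json({
error: expose ? err.message : "Internal server error"
});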
});
// Process-level safety net
process.on("uncaughtException", (err) => {
logger.fatal({ err }, "Uncaught exception — shutting down");
process.exit(1);
});
process.on("unhandledRejection", (reason) => {
logger.fatal({ reason }, "Unhandled rejection — shutting down");
process.exit(1);
});
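The global error handler above refers to an `HttpError` class that the snippet does not define. A minimal sketch of what such a class might look like; the constructor signature and the `statusFor` helper are assumptions for illustration, not the article's own code.

```typescript
// Hypothetical HttpError: an Error subclass that carries the HTTP status to send.
class HttpError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

// Same rule the global handler applies: known HTTP errors keep their status,
// anything else is treated as a 500.
function statusFor(err: unknown): number {
  return err instanceof HttpError ? err.status : 500;
}

const notFound = new HttpError(404, "Order not found");
const crash = new TypeError("Cannot read properties of undefined");

console.log(statusFor(notFound)); // 404
console.log(statusFor(crash)); // 500
```

A route wrapped in `asyncHandler` can then `throw new HttpError(404, "Order not found")` and let the global handler turn it into a response.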
6. Graceful Shutdown
Handle SIGTERM so in-flight requests complete before the process exits:
const server = app.listen(env.PORT);
const shutdown = async (signal: string) => {
logger.info({ signal }, "Shutdown signal received");
server.close(async () => {
await db.destroy(); // close DB pool
logger.info("Server closed cleanly");
process.exit(0);
});
// Force exit if cleanup hangs; keep this shorter than the orchestrator's kill timeout
setTimeout(() => process.exit(1), 25_000);
};
process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));
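Kubernetes sends SIGTERM when it stops a pod and follows up with SIGKILL once the termination grace period (30 seconds by default) runs out. Without a handler like this, in-flight requests are cut off mid-response.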
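Because both SIGTERM and SIGINT are wired to the same routine, a second signal arriving during cleanup would start the shutdown sequence again. One possible guard, sketched here with a hypothetical `once` helper and a stand-in for the real shutdown function:

```typescript
// Wraps a function so that only the first call runs it.
function once<Args extends unknown[]>(fn: (...args: Args) => void): (...args: Args) => void {
  let called = false;
  return (...args: Args) => {
    if (called) return; // later signals are ignored while cleanup is running
    called = true;
    fn(...args);
  };
}

// Stand-in for the real shutdown routine; it only records which signals ran.
const handled: string[] = [];
const shutdownOnce = once((signal: string) => {
  handled.push(signal);
});

shutdownOnce("SIGTERM");
shutdownOnce("SIGINT"); // ignored: shutdown is already in progress

console.log(handled); // [ 'SIGTERM' ]
```

In the real server, the wrapped function would be the shutdown routine itself, registered as `process.on("SIGTERM", () => shutdownOnce("SIGTERM"))` and likewise for SIGINT.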
7. Security Hardening
pnpm add helmet compression express-rate-limit
import helmet from "helmet";
import compression from "compression";
import rateLimit from "express-rate-limit";
app.use(helmet()); // sets secure HTTP headers
app.use(compression()); // gzip responses
app.use("/api", rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100,
standardHeaders: true,
legacyHeaders: false
}));
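To show what `windowMs` and `max` mean in practice, here is a simplified fixed-window counter. It is a sketch of the general technique, not express-rate-limit's actual implementation; the `FixedWindowLimiter` name is hypothetical, and the sketch never evicts old keys.

```typescript
// Minimal fixed-window counter, keyed by client (for example an IP address).
class FixedWindowLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly windowMs: number;
  private readonly max: number;

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs;
    this.max = max;
  }

  // Returns true if the request is allowed. `now` is injectable so the
  // behavior can be tested without waiting for real time to pass.
  allow(key: string, now: number = Date.now()): boolean {
    let entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      entry = { count: 0, windowStart: now };
      this.hits.set(key, entry);
    }
    entry.count += 1;
    return entry.count <= this.max;
  }
}

// Same numbers as the config above: 100 requests per 15 minutes per client.
const limiter = new FixedWindowLimiter(15 * 60 * 1000, 100);
console.log(limiter.allow("203.0.113.7")); // true: first request in the window
```

An in-memory counter like this lives in a single process. Under PM2 cluster mode or with several containers, each instance counts separately, so the effective limit grows with the number of instances unless the counters live in a shared store.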
Run pnpm audit in CI and block on high/critical vulnerabilities:
# .github/workflows/security.yml
- name: Audit dependencies
run: pnpm audit --audit-level=high
8. Database Connection Pooling
Never create a new DB connection per request. Configure your connection pool for your instance size:
// Postgres with pg (node-postgres)
import { Pool } from "pg";
export const pool = new Pool({
connectionString: env.DATABASE_URL,
max: 20, // max connections in pool
idleTimeoutMillis: 30_000,
connectionTimeoutMillis: 2_000,
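// Keep certificate verification on in production; rejectUnauthorized: false accepts any certificate.
// If the provider's CA is not publicly trusted, pass it via the `ca` option instead.
ssl: env.NODE_ENV === "production" ? { rejectUnauthorized: true } : false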
});
// Prisma: set connection_limit and pool_timeout in DATABASE_URL
// postgresql://user:pass@host/db?connection_limit=10&pool_timeout=10
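Each Node.js process opens its own pool, so the `max` setting is multiplied by the number of PM2 workers or containers. A rough sizing sketch; the function name and the example numbers (a database limit of 100 connections, 10 held in reserve, 2 servers with 4 workers each) are assumptions for illustration.

```typescript
// Largest per-process pool size that keeps the whole fleet under the
// database's connection limit.
function maxPoolSizePerProcess(
  dbMaxConnections: number,
  reservedConnections: number, // kept back for migrations, admin sessions, etc.
  processCount: number
): number {
  if (processCount < 1) throw new RangeError("processCount must be at least 1");
  const available = dbMaxConnections - reservedConnections;
  return Math.max(0, Math.floor(available / processCount));
}

// (100 - 10) connections shared by 2 servers x 4 workers = 8 processes.
const perProcess = maxPoolSizePerProcess(100, 10, 2 * 4);
console.log(perProcess); // 11
```

During a rolling deploy, old and new processes run side by side for a short time, so it can be worth counting the surge instances in `processCount` as well.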
9. Zero-Downtime Deployment
With PM2 cluster mode
pm2 reload ecosystem.config.cjs --env production
PM2 restarts workers one at a time while keeping others alive.
With Docker / Kubernetes
Use rolling updates — Kubernetes default:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0 # never reduce capacity below desired
Blue-green on Cloudflare / Railway / Fly.io
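These platforms handle zero-downtime deploys for you on git push, usually with little or no extra config. Where the platform gates traffic on a health check, point it at the /health endpoint from step 3 so the new version only receives requests once it is ready.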
10. Pre-deploy Checklist
Before every production release:
- pnpm audit passes (no high/critical)
- TypeScript compiles without errors (tsc --noEmit)
- All tests pass (pnpm test)
- Database migrations have been reviewed and are reversible
- New env vars are documented and set in production
- NODE_ENV=production is set
- Health endpoint returns 200 after deploy
- Error tracking (Sentry) shows no new issues after deploy
- Key metrics (request latency, error rate) look normal in dashboards
Running through this list before every deploy catches the most common causes of post-deploy incidents.