Should I use PM2 or Docker for Node.js in production?

Use Docker if you need portable, reproducible environments and are deploying to Kubernetes or another container platform. Use PM2 if you're on a raw VPS and want lightweight process management with clustering. Some teams run PM2 inside a container (via pm2-runtime), but an orchestrator already restarts and scales containers, so one node process per container is usually enough.

How many Node.js cluster workers should I spin up?

Start with os.cpus().length - 1 workers to leave one core for the OS and monitoring agents. For I/O-bound workloads (most HTTP APIs), even 2-4 workers significantly improve throughput. Profile under realistic load before over-provisioning.
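
As a rough illustration of that starting point, here is a minimal sketch. The helper name workerCount and its reserved parameter are made up for this example; they are not part of Node.js or PM2.

```typescript
import { cpus } from "node:os";

/**
 * Number of cluster workers to start: every core except `reserved`,
 * but never fewer than one (hypothetical helper for illustration).
 */
function workerCount(cpuCount: number, reserved = 1): number {
  return Math.max(1, Math.floor(cpuCount) - reserved);
}

// Decide the worker count for the current machine.
const workers = workerCount(cpus().length);
console.log(`Starting ${workers} worker(s) on ${cpus().length} core(s)`);
```

Note that inside a container with a CPU limit, os.cpus() still reports the host's cores, so it is safer to pass the limit explicitly there. Treat the result as a first guess to validate under load, not a rule.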

What is the best way to handle uncaught exceptions in Node.js production?

Log the error with full stack trace, then exit the process and let your process manager restart it. Trying to continue after an uncaught exception leaves the process in an unknown state. Use process.on('uncaughtException') only to log, then call process.exit(1).

Node.js Production Deployment Checklist (2025)

Deploying Node.js to production takes more than a git push. This checklist covers the critical steps that teams often skip, and whose absence causes outages, data loss, and security incidents in the first 90 days after launch.

1. Environment Configuration

Never hardcode secrets. Use environment variables for every credential, API key, connection string, and feature flag.

# .env (never commit this)
DATABASE_URL=postgresql://user:password@host:5432/dbname
SESSION_SECRET=a-long-random-string-at-least-32-chars
STRIPE_SECRET_KEY=sk_live_...

Use a validation library at startup to fail fast on missing config:

import { z } from "zod";

const env = z.object({
  NODE_ENV: z.enum(["development", "test", "production"]),
  PORT: z.coerce.number().default(3000),
  DATABASE_URL: z.string().url(),
  SESSION_SECRET: z.string().min(32),
}).parse(process.env);

export default env;

If the app starts without required env vars, you get a clear error at boot time rather than a cryptic runtime failure.

2. Process Management

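Use a process supervisor: never run node server.js in production without something that restarts it. A bare node process dies on an uncaught exception and stays down. On a VPS the supervisor is PM2 (or systemd); in containers it is the orchestrator's restart policy.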

PM2 (VPS/bare metal)

pnpm add -g pm2

// ecosystem.config.cjs
module.exports = {
  apps: [{
    name: "api",
    script: "dist/server.js",
    instances: "max",        // cluster mode: one worker per CPU
    exec_mode: "cluster",
    max_memory_restart: "512M",
    env_production: {
      NODE_ENV: "production",
      PORT: 3000
    }
  }]
};

pm2 start ecosystem.config.cjs --env production
pm2 save
pm2 startup  # generate OS startup command

Docker + container orchestration

FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json pnpm-lock.yaml ./
RUN corepack enable && pnpm install --frozen-lockfile
COPY . .
RUN pnpm build
# Remove devDependencies so the runner stage copies only production packages
RUN pnpm prune --prod

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json .
EXPOSE 3000
USER node
CMD ["node", "dist/server.js"]

3. Health Checks

Every Node.js service needs a /health endpoint that load balancers and orchestrators can poll:

app.get("/health", async (req, res) => {
  try {
    await db.raw("SELECT 1"); // verify DB connectivity
    res.json({ status: "ok", uptime: process.uptime() });
  } catch (err) {
    res.status(503).json({ status: "error", message: "database unreachable" });
  }
});

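Point your load balancer health check or Kubernetes readiness probe at this endpoint every 10-30 seconds so unhealthy instances are taken out of rotation automatically. Because the handler queries the database, avoid using it as a liveness probe: a database outage would then restart every pod even though the Node.js process itself is fine.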
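
One way to keep a slow dependency from hanging the probe is to give each check a deadline and to separate "process is alive" from "dependencies are reachable". The sketch below is framework-agnostic. The names liveness, readiness, and withTimeout, plus the /livez and /readyz paths in the usage comment, are assumptions for illustration and not part of Express or Kubernetes.

```typescript
type Check = () => Promise<unknown>;

interface ProbeResult {
  statusCode: 200 | 503;
  body: { status: "ok" | "error"; failed: string[] };
}

/** Reject if `check` has not settled within `ms` milliseconds. */
function withTimeout(check: Check, ms: number): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    Promise.resolve()
      .then(check)
      .then(resolve, reject)
      .finally(() => clearTimeout(timer));
  });
}

/** Liveness: touches no dependencies, only shows the event loop responds. */
function liveness(): ProbeResult {
  return { statusCode: 200, body: { status: "ok", failed: [] } };
}

/** Readiness: run every dependency check in parallel, each with a deadline. */
async function readiness(checks: Record<string, Check>, timeoutMs = 2_000): Promise<ProbeResult> {
  const outcomes = await Promise.all(
    Object.entries(checks).map(([name, check]) =>
      // Resolve to null on success, or to the check's name on failure
      withTimeout(check, timeoutMs).then(() => null, () => name),
    ),
  );
  const failed = outcomes.filter((name): name is string => name !== null);
  return failed.length === 0
    ? { statusCode: 200, body: { status: "ok", failed } }
    : { statusCode: 503, body: { status: "error", failed } };
}

// Possible wiring with Express (hypothetical routes, reusing `db` from the handler above):
// app.get("/livez", (_req, res) => res.json(liveness().body));
// app.get("/readyz", async (_req, res) => {
//   const result = await readiness({ database: () => db.raw("SELECT 1") });
//   res.status(result.statusCode).json(result.body);
// });
```

Returning the names of the failed checks makes a 503 easier to diagnose from the probe response alone.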

4. Structured Logging

Avoid console.log in production. Use a structured logger that outputs JSON for easy querying in log aggregation tools.

pnpm add pino
pnpm add -D pino-pretty   # only loaded outside production

import pino from "pino";

export const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  ...(process.env.NODE_ENV !== "production" && {
    transport: { target: "pino-pretty" }
  })
});

// Usage
logger.info({ userId: req.user.id, action: "login" }, "User authenticated");
logger.error({ err, requestId }, "Unhandled error in payment webhook");

Ship logs to a centralized service (Datadog, Logtail, Papertrail, CloudWatch). Never rely on disk logs on ephemeral containers.

5. Error Handling

Handle errors at every boundary:

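import type { NextFunction, Request, RequestHandler, Response } from "express";

// Async route handler wrapper: forwards rejected promises to the error middleware
// (needed on Express 4; Express 5 forwards them automatically)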
const asyncHandler = (fn: RequestHandler): RequestHandler =>
  (req, res, next) => Promise.resolve(fn(req, res, next)).catch(next);

// Global error handler
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
  logger.error({ err, url: req.url, method: req.method }, "Request failed");

  if (res.headersSent) return next(err);

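  // HttpError is assumed to be the application's own error class with a numeric `status`
  const status = err instanceof HttpError ? err.status : 500;
  // Hide details only for unexpected (5xx) errors in production; client errors keep their message
  const hideDetails = status >= 500 && process.env.NODE_ENV === "production";
  res.status(status).json({
    error: hideDetails ? "Internal server error" : err.message
  });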
});

// Process-level safety net
process.on("uncaughtException", (err) => {
  logger.fatal({ err }, "Uncaught exception — shutting down");
  process.exit(1);
});

process.on("unhandledRejection", (reason) => {
  logger.fatal({ reason }, "Unhandled rejection — shutting down");
  process.exit(1);
});

6. Graceful Shutdown

Handle SIGTERM so in-flight requests complete before the process exits:

const server = app.listen(env.PORT);

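let shuttingDown = false;

const shutdown = (signal: string) => {
  if (shuttingDown) return;  // ignore repeated signals
  shuttingDown = true;
  logger.info({ signal }, "Shutdown signal received");

  // Force exit if cleanup hangs; keep this below the supervisor's kill deadline.
  // unref() stops the timer itself from holding the process open.
  setTimeout(() => process.exit(1), 25_000).unref();

  // Stop accepting new connections, then wait for in-flight requests to finish
  server.close(async () => {
    try {
      await db.destroy();  // close DB pool
      logger.info("Server closed cleanly");
      process.exit(0);
    } catch (err) {
      logger.error({ err }, "Cleanup failed during shutdown");
      process.exit(1);
    }
  });
};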

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));

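Kubernetes sends SIGTERM, waits for terminationGracePeriodSeconds (30 seconds by default), and then sends SIGKILL. Without a SIGTERM handler, connections are dropped mid-request; with one, keep the force-exit timeout shorter than the grace period so your own cleanup path runs first.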
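
One way to choose the force-exit delay is to derive it from the pod's termination budget rather than hardcoding it. The sketch below assumes the grace period countdown also covers any preStop hook. The helper name forceExitDelayMs and the 5-second default margin are illustrative choices, not Kubernetes settings.

```typescript
/**
 * Milliseconds the app may spend draining before it must force-exit,
 * leaving `marginMs` spare before the orchestrator's SIGKILL
 * (hypothetical helper for illustration).
 */
function forceExitDelayMs(gracePeriodSeconds: number, preStopSeconds = 0, marginMs = 5_000): number {
  const budget = (gracePeriodSeconds - preStopSeconds) * 1_000 - marginMs;
  if (budget <= 0) {
    throw new RangeError("Grace period is too short for the preStop hook plus safety margin");
  }
  return budget;
}

// Kubernetes default grace period (30 s), no preStop hook, 5 s margin
console.log(forceExitDelayMs(30));      // 25000
// 60 s grace period with a 10 s preStop sleep
console.log(forceExitDelayMs(60, 10));  // 45000
```

If requests can legitimately run longer than the resulting budget, raising terminationGracePeriodSeconds is usually a better fix than shrinking the margin.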

7. Security Hardening

pnpm add helmet compression express-rate-limit

import helmet from "helmet";
import compression from "compression";
import rateLimit from "express-rate-limit";

app.use(helmet());           // sets secure HTTP headers
app.use(compression());      // gzip responses

app.use("/api", rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100,
  standardHeaders: true,
  legacyHeaders: false
}));
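
To make windowMs and max concrete, here is a simplified fixed-window counter. It is a teaching sketch only, not how express-rate-limit is implemented: the class name FixedWindowLimiter is invented for this example, and it never evicts old keys.

```typescript
interface WindowState {
  count: number;
  resetAt: number; // timestamp (ms) at which this client's window ends
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly windowMs: number;
  private readonly max: number;
  private readonly now: () => number;

  constructor(windowMs: number, max: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.max = max;
    this.now = now;
  }

  /** Returns true if the request from `key` is allowed, false if it should get a 429. */
  allow(key: string): boolean {
    const t = this.now();
    let state = this.windows.get(key);
    if (!state || t >= state.resetAt) {
      // First request from this client, or its previous window has expired
      state = { count: 0, resetAt: t + this.windowMs };
      this.windows.set(key, state);
    }
    state.count += 1;
    return state.count <= this.max;
  }
}

// 100 requests per 15 minutes per client, keyed by IP address
const limiter = new FixedWindowLimiter(15 * 60 * 1000, 100);
console.log(limiter.allow("203.0.113.7")); // true
```

Counters like this live in one process's memory. With PM2 cluster mode or several replicas, each process counts separately, so the effective limit multiplies unless the limiter uses a shared store such as Redis.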

Run pnpm audit in CI and block on high/critical vulnerabilities:

# .github/workflows/security.yml
- name: Audit dependencies
  run: pnpm audit --audit-level=high

8. Database Connection Pooling

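Never create a new DB connection per request. Size the pool against the database's connection limit, not just your instance: every Node.js process (each PM2 cluster worker, each container replica) opens its own pool, so total connections = processes × max.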

// Postgres with pg (node-postgres)
import { Pool } from "pg";

export const pool = new Pool({
  connectionString: env.DATABASE_URL,
  max: 20,              // max connections in pool
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000,
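  // Keep certificate verification on; pass the provider's CA via `ssl.ca` if it is not publicly trusted
  ssl: env.NODE_ENV === "production" ? { rejectUnauthorized: true } : false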
});

// Prisma: set connection_limit and pool_timeout in DATABASE_URL
// postgresql://user:pass@host/db?connection_limit=10&pool_timeout=10
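
Because each Node.js process opens its own pool, the per-pool maximum has to be divided across the whole fleet. A back-of-the-envelope sketch follows. The helper name poolSizePerProcess and the 10 connections held back for migrations and admin sessions are assumptions for illustration.

```typescript
/**
 * Largest per-process pool size that keeps the whole fleet under the
 * database's connection limit (hypothetical helper for illustration).
 */
function poolSizePerProcess(dbMaxConnections: number, processes: number, reserved = 10): number {
  if (processes < 1) throw new RangeError("processes must be at least 1");
  const size = Math.floor((dbMaxConnections - reserved) / processes);
  if (size < 1) throw new RangeError("Not enough database connections for this many processes");
  return size;
}

// PostgreSQL's default max_connections is 100.
// 2 servers × 4 PM2 workers = 8 processes, 10 connections held in reserve:
console.log(poolSizePerProcess(100, 8)); // 11

// For comparison, max: 20 on the same fleet could open 8 × 20 = 160 connections.
```

When the result is too small to be useful, a connection pooler such as PgBouncer in front of the database is the usual next step.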

9. Zero-Downtime Deployment

With PM2 cluster mode

pm2 reload ecosystem.config.cjs --env production

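PM2 restarts workers one at a time while keeping the others alive. Each old worker receives SIGINT and is force-killed after kill_timeout (1600 ms by default), so raise kill_timeout in the ecosystem file if in-flight requests need longer to drain.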

With Docker / Kubernetes

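RollingUpdate is the default Deployment strategy in Kubernetes, but maxSurge and maxUnavailable both default to 25%. Set them explicitly so capacity never drops during a rollout: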

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0  # never reduce capacity below desired

Blue-green on Cloudflare / Railway / Fly.io

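These platforms can replace instances without downtime on each deploy, but the exact behavior depends on the platform and plan: most need a working health check before they shift traffic, and some default to rolling rather than blue-green releases. Confirm the deploy strategy in the platform's documentation.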

10. Pre-deploy Checklist

Before every production release (the last three items are verified right after the deploy):

pnpm audit passes (no high/critical)
TypeScript compiles without errors (tsc --noEmit)
All tests pass (pnpm test)
Database migrations have been reviewed and are reversible
New env vars are documented and set in production
NODE_ENV=production is set
Health endpoint returns 200 after deploy
Error tracking (Sentry) shows no new issues after deploy
Key metrics (request latency, error rate) look normal in dashboards

Running through this list on every deploy catches the most common causes of post-deploy incidents.
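
The "health endpoint returns 200 after deploy" item is easy to automate in a pipeline. Below is a minimal polling sketch. The function name waitForHealthy, the default attempt count and delay, and the URL in the usage comment are all assumptions for illustration.

```typescript
type Probe = () => Promise<number>; // resolves to an HTTP status code

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Poll until the probe returns 200 or the attempts run out. */
async function waitForHealthy(probe: Probe, attempts = 10, delayMs = 3_000): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    try {
      if ((await probe()) === 200) return true;
    } catch {
      // Connection refused or timed out: the new instance may still be starting
    }
    if (i < attempts) await sleep(delayMs);
  }
  return false;
}

// Possible usage in a deploy script (hypothetical URL; global fetch needs Node 18+):
// const ok = await waitForHealthy(
//   async () => (await fetch("https://api.example.com/health")).status,
// );
// if (!ok) process.exit(1); // fail the pipeline so the release can be rolled back
```

Passing the probe in as a function keeps the retry logic testable without a network.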