Docker Interview Prep

A comprehensive collection of Docker interview questions and answers, ranging from fundamentals to production-grade best practices.

Topics covered:

  • Core concepts (containers vs VMs, architecture)
  • Dockerfile and image building
  • Container lifecycle and management
  • Networking (bridge, host, overlay)
  • Storage (volumes, bind mounts)
  • Docker Compose
  • Security best practices
  • Orchestration and production patterns

Q: What is the difference between Containers and Virtual Machines?

Answer:

This is the most common Docker interview opener. Both are technologies for isolating applications, but they work at fundamentally different levels.

Virtual Machines (VMs)

A VM runs a complete guest Operating System on top of a hypervisor (e.g., VMware, VirtualBox, KVM). Each VM includes its own kernel, system libraries, and binaries.

Containers

A container shares the host machine's OS kernel and isolates only the application's user-space processes using Linux kernel features like namespaces (process isolation) and cgroups (resource limits).
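A quick, hedged illustration of those two mechanisms in action (the container name limited is hypothetical; the flags are standard docker run options):

docker run -d --name limited --memory=256m --cpus=0.5 nginx
docker inspect -f '{{.HostConfig.Memory}}' limited   # 268435456 (the cgroup memory limit, in bytes)
docker top limited                                   # the container's processes, running in their own PID namespace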

Key Differences

Feature         | Container                             | Virtual Machine
Isolation level | Process-level (shares host kernel)    | Hardware-level (full guest OS)
Startup time    | Milliseconds                          | Minutes
Size            | Megabytes (just the app + deps)       | Gigabytes (full OS image)
Performance     | Near-native (no hypervisor overhead)  | Slower (hardware emulation layer)
Density         | Run hundreds on a single host         | Run tens on a single host
OS support      | Linux containers on Linux host only*  | Any OS on any host
Security        | Weaker isolation (shared kernel)      | Stronger isolation (separate kernels)

[!NOTE] *Docker Desktop on macOS/Windows actually runs a lightweight Linux VM under the hood (via the Apple Virtualization framework or HyperKit on macOS, and WSL 2 or Hyper-V on Windows) to provide the Linux kernel that containers need.

When to Use Which?

  • Containers: Microservices, CI/CD pipelines, dev environments, anything where speed and density matter.
  • VMs: When you need full OS-level isolation (e.g., running Windows apps alongside Linux), or when security boundaries are critical (multi-tenant hosting).

Q: Explain the Docker Architecture.

Answer:

Docker uses a client-server architecture with three main components:

1. Docker Client (docker CLI)

The command-line interface that you interact with. When you run a command like docker run, the client sends it as an API request to the Docker daemon. The client can communicate with the daemon locally (via a Unix socket) or remotely (via TCP).

2. Docker Daemon (dockerd)

The background service (server) that does all the heavy lifting. It manages:

  • Building images
  • Running containers
  • Pulling/pushing images from registries
  • Managing networks and volumes

The daemon exposes a REST API that the client talks to.
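As a small illustration, you can talk to that API directly over the local Unix socket (assuming a curl build with --unix-socket support); these are the same endpoints the CLI uses:

curl --unix-socket /var/run/docker.sock http://localhost/version
curl --unix-socket /var/run/docker.sock http://localhost/containers/json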

3. Docker Registry (e.g., Docker Hub)

A storage and distribution system for Docker images. When you docker pull nginx, the daemon fetches the image from Docker Hub (the default public registry). Companies also run private registries (e.g., AWS ECR, GCR, Harbor).

How They Work Together

┌──────────────┐       REST API       ┌──────────────────┐
│ Docker Client │ ──────────────────▶  │  Docker Daemon   │
│  (docker CLI) │                      │   (dockerd)      │
└──────────────┘                      │                  │
                                       │  ┌────────────┐ │
                                       │  │ Containers  │ │
                                       │  ├────────────┤ │
                                       │  │   Images    │ │
                                       │  ├────────────┤ │
                                       │  │  Volumes    │ │
                                       │  ├────────────┤ │
                                       │  │  Networks   │ │
                                       │  └────────────┘ │
                                       └────────┬─────────┘
                                                │
                                       ┌────────▼─────────┐
                                       │  Docker Registry  │
                                       │  (Docker Hub,     │
                                       │   ECR, GCR, etc.) │
                                       └──────────────────┘

Under the Hood: containerd & runc

The Docker daemon doesn't actually run containers directly. It delegates to:

  1. containerd: A high-level container runtime that manages the full container lifecycle (image transfer, storage, execution).
  2. runc: A low-level OCI-compliant runtime that actually creates and runs containers using Linux kernel features (namespaces, cgroups).
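On a Linux host you can observe this delegation yourself; a rough sketch (process names can vary slightly by installation):

ps -e | grep -E 'dockerd|containerd'          # dockerd and containerd run as separate daemons
docker info --format '{{.DefaultRuntime}}'    # prints "runc" on a default install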

[!TIP] When discussing Docker architecture in interviews, mentioning containerd and runc shows a deeper understanding. Kubernetes, for example, talks directly to containerd (not Docker) since dockershim support was removed in Kubernetes v1.24.

Q: What is the difference between a Docker Image and a Container?

Answer:

This is a deceptively simple but critical distinction:

Docker Image

An image is a read-only template that contains the application code, runtime, libraries, environment variables, and configuration files needed to run an application. Think of it as a class in OOP or a blueprint for a house.

Images are built in layers. Each instruction in a Dockerfile (e.g., RUN, COPY, ADD) creates a new layer. Layers are stacked on top of each other and are cached, which makes rebuilds extremely fast.
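To see an image's layers and the instruction that produced each one, docker history is the quickest tool; a small sketch using the public nginx image:

docker image history nginx:alpine                                      # one row per layer, with the creating instruction
docker image inspect nginx:alpine --format '{{json .RootFS.Layers}}'   # the layer digests themselves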

Docker Container

A container is a running instance of an image. Think of it as an object instantiated from a class, or a house built from a blueprint. You can create multiple containers from the same image.

When a container starts, Docker adds a thin writable layer on top of the read-only image layers. All file changes (new files, modifications, deletions) happen in this writable layer.
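You can watch the writable layer change with docker diff; a minimal sketch (the container name web is hypothetical):

docker run -d --name web nginx
docker exec web touch /tmp/hello
docker diff web
# C /tmp
# A /tmp/hello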

Analogy

Image  = Class definition (immutable blueprint)
Container = Object/Instance (running process with mutable state)

One Image → Many Containers (just like one Class → many Objects)

Key Differences

Feature     | Image                             | Container
State       | Immutable (read-only)             | Mutable (has a writable layer)
Stored as   | Layers on disk                    | Running process + writable layer
Created by  | docker build or docker pull       | docker run or docker create
Persistence | Persists until explicitly deleted | Ephemeral by default (data lost on removal)
Sharing     | Pushed to registries              | Cannot be pushed (must be docker commit'd into an image first)

Common Follow-Up: What is docker commit?

You can take a running container's writable layer and freeze it into a new image:

docker commit <container_id> my-custom-image:v1

[!CAUTION] Using docker commit in production is considered bad practice. Always use a Dockerfile for reproducible, version-controlled image builds.

Q: What is the difference between CMD and ENTRYPOINT in a Dockerfile?

Answer:

Both CMD and ENTRYPOINT define what command runs when a container starts, but they behave very differently when users pass arguments at runtime.

CMD — The Default Command (Easily Overridden)

CMD sets the default command and/or arguments for the container. However, it is completely replaced if the user provides a command when running the container.

FROM ubuntu
CMD ["echo", "Hello from CMD"]
docker run myimage
# Output: Hello from CMD

docker run myimage echo "I replaced CMD"
# Output: I replaced CMD  (CMD was completely overridden)

ENTRYPOINT — The Fixed Executable (Not Easily Overridden)

ENTRYPOINT sets the main executable for the container. User-provided arguments are appended to the entrypoint, not used to replace it.

FROM ubuntu
ENTRYPOINT ["echo", "Hello from"]
docker run myimage
# Output: Hello from

docker run myimage "Docker World"
# Output: Hello from Docker World  (argument was appended)

The Power Combo: ENTRYPOINT + CMD

The most common production pattern is using them together. ENTRYPOINT defines the fixed executable, and CMD provides default arguments that can be overridden.

FROM python:3.11-slim
ENTRYPOINT ["python"]
CMD ["app.py"]
docker run myimage
# Runs: python app.py (default)

docker run myimage test.py
# Runs: python test.py (CMD overridden, ENTRYPOINT kept)

Summary

Feature           | CMD                                     | ENTRYPOINT
Purpose           | Default command/args                    | Fixed executable
Override behavior | Completely replaced by docker run args  | Args are appended to it
Best for          | Default arguments                       | The main process

[!TIP] To override ENTRYPOINT at runtime, you must explicitly use the --entrypoint flag: docker run --entrypoint /bin/bash myimage

Q: What is the difference between COPY and ADD in a Dockerfile?

Answer:

Both instructions copy files from the build context into the image, but ADD has extra (often unwanted) functionality.

COPY — Simple File Copy

Does exactly one thing: copies files or directories from the build context into the image filesystem. It's transparent and predictable.

COPY requirements.txt /app/
COPY src/ /app/src/

ADD — Copy with Extras

ADD does everything COPY does, plus two additional features:

  1. Auto-extracts compressed archives (.tar, .tar.gz, .tgz, .bz2, .xz) into the destination directory.
  2. Fetches files from remote URLs (like wget).
# Auto-extracts the tarball into /app/
ADD app.tar.gz /app/

# Downloads a file from the internet
ADD https://example.com/config.json /etc/app/config.json

Why You Should Almost Always Use COPY

[!WARNING] The Docker official best practices guide explicitly recommends using COPY over ADD in almost all cases.

Reasons:

  1. Predictability: COPY has no hidden side effects. With ADD, a developer might not realize their .tar.gz file will be auto-extracted. If you want the archive as-is (e.g., to extract it manually later), ADD will silently break your intent.
  2. Security and control: ADD from a URL gives you no authentication, no retry handling, and no way to clean up the downloaded file in the same layer. Use RUN curl or RUN wget instead for better control.
  3. Cache invalidation: Remote URL fetches with ADD can cause unpredictable cache behavior since Docker cannot know if the remote file has changed.

Rule of Thumb

  • Use COPY for all local file copies (99% of cases).
  • Use ADD only when you explicitly need tar auto-extraction.
  • Use RUN curl or RUN wget for downloading remote files.

Q: What are Multi-Stage Builds and why are they important?

Answer:

Multi-stage builds allow you to use multiple FROM statements in a single Dockerfile. Each FROM starts a new "stage" of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.

The Problem Without Multi-Stage Builds

In a typical build, you need compilers, build tools, and dev dependencies to compile your application. If you use a single stage, all of those tools end up in your final production image, making it bloated and insecure.

# ❌ Single-stage: Final image includes Go compiler, source code, build tools
FROM golang:1.21
WORKDIR /app
COPY . .
RUN go build -o myapp
CMD ["./myapp"]
# Final image size: ~800MB (includes entire Go toolchain!)

The Solution: Multi-Stage Build

# Stage 1: Build (named "builder")
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp

# Stage 2: Production (tiny final image)
FROM alpine:3.18
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["./myapp"]
# Final image size: ~15MB (just the binary + Alpine!)

How It Works

  1. Stage 1 ("builder"): Uses the full golang image (800MB+) to compile the Go binary.
  2. Stage 2: Starts fresh from a tiny alpine image (5MB) and copies only the compiled binary from the builder stage using COPY --from=builder.
  3. The final image contains nothing from stage 1 except the single file you explicitly copied.

Real-World Node.js Example

# Stage 1: Install dependencies and build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: Production
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]

Benefits

  • Dramatically smaller images (often 10-50x reduction).
  • Better security (no compilers, source code, or dev tools in production).
  • Faster deployments (smaller images push/pull faster).
  • Single Dockerfile (no need for separate Dockerfile.dev and Dockerfile.prod).

[!TIP] You can also copy from external images without defining them as a stage: COPY --from=nginx:latest /etc/nginx/nginx.conf /etc/nginx/

Q: How does Docker Layer Caching work? How do you optimize a Dockerfile?

Answer:

Understanding layer caching is essential for building images fast and keeping them small.

How Layers Work

Every instruction in a Dockerfile (FROM, RUN, COPY, ADD, ENV, etc.) creates a new layer. Layers are stacked, read-only, and cached. When you rebuild an image, Docker checks each instruction:

  • If the instruction and its inputs haven't changed, Docker reuses the cached layer (instant).
  • If anything has changed, Docker invalidates that layer and all layers after it (the cache "busts").

The Cache Busting Problem

# ❌ Bad order: Cache busts on EVERY code change
FROM node:20-alpine
WORKDIR /app
COPY . .                    # Any code change invalidates THIS layer
RUN npm install             # ...which forces this to re-run (slow!)
CMD ["node", "index.js"]

Every time you change a single line of code, COPY . . changes, which invalidates the cache for npm install. You end up reinstalling all dependencies from scratch on every build.

The Fix: Order by Change Frequency

# ✅ Good order: Dependencies cached separately from code
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./   # Changes rarely
RUN npm ci                               # Cached unless package.json changes
COPY . .                                 # Code changes only bust THIS layer
CMD ["node", "index.js"]

Now, if you only change application code, Docker reuses the cached npm ci layer and only re-runs COPY . . — saving minutes on every build.

Optimization Best Practices

1. Use .dockerignore Just like .gitignore, a .dockerignore file prevents unnecessary files from being sent to the Docker build context.

node_modules
.git
*.md
.env
dist

2. Combine RUN commands Each RUN instruction creates a new layer. Combine related commands to reduce layer count and image size.

# ❌ Bad: 3 layers (including cached apt lists)
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# ✅ Good: 1 layer, cleanup in same step
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

3. Use specific base image tags

# ❌ Bad: `latest` changes unpredictably, breaks cache
FROM node:latest

# ✅ Good: Pinned version, reproducible
FROM node:20.11-alpine3.18

4. Use multi-stage builds (see the previous chapter).

5. Prefer Alpine or Distroless base images

  • node:20 → ~900MB
  • node:20-alpine → ~130MB
  • node:20-slim → ~200MB

Q: What are the different Container States and the Container Lifecycle?

Answer:

A Docker container goes through several states during its lifecycle. Understanding these is important for debugging and orchestration.

Container States

docker create     docker start     (process exits)     docker rm
    │                  │                  │                │
    ▼                  ▼                  ▼                ▼
 CREATED ──────▶ RUNNING ──────▶ EXITED ──────▶ REMOVED
                    │    ▲
          docker    │    │  docker
          pause     │    │  unpause
                    ▼    │
                  PAUSED
  1. Created: Container exists but hasn't started yet (docker create).
  2. Running: The container's main process (PID 1) is actively executing (docker start or docker run).
  3. Paused: The container's processes are frozen (using cgroup freezer). Memory state is preserved (docker pause).
  4. Exited: The main process has finished or crashed. The writable layer is still on disk.
  5. Removed: The container and its writable layer are deleted (docker rm).

Key Commands

# Create + Start in one command
docker run -d --name myapp nginx

# View running containers
docker ps

# View ALL containers (including stopped ones)
docker ps -a

# Stop gracefully (sends SIGTERM, then SIGKILL after grace period)
docker stop myapp

# Stop immediately (sends SIGKILL)
docker kill myapp

# Remove a stopped container
docker rm myapp

# Force remove a running container
docker rm -f myapp

# Remove ALL stopped containers
docker container prune
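
To confirm which state a container is in (or to exercise the pause/unpause transition), docker inspect exposes the state directly; a hedged sketch using the myapp container from above:

docker inspect -f '{{.State.Status}}' myapp    # running, paused, exited, ...
docker pause myapp
docker inspect -f '{{.State.Status}}' myapp    # paused
docker unpause myapp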

The PID 1 Problem

The container's main process is always PID 1. When PID 1 exits, the entire container stops, regardless of whether other processes are still running inside it.

[!IMPORTANT] This is why docker run should always run the main application process as the foreground command (not as a background daemon). If your entrypoint script runs the app with & (background) and then exits, the container will immediately stop.

Q: What is the difference between docker exec and docker attach?

Answer:

Both commands let you interact with a running container, but they connect to very different things.

docker attach

Attaches your terminal's stdin/stdout/stderr to the container's main process (PID 1). You are essentially watching and interacting with the same process that docker run started.

docker run -d --name myapp python app.py
docker attach myapp
# You are now connected to the stdout of `python app.py`

Danger: If you press Ctrl+C while attached, it sends SIGINT to PID 1, which stops the container entirely.

[!WARNING] Use Ctrl+P then Ctrl+Q to detach from a container without killing it. This is the "detach sequence."

docker exec

Starts a brand new, separate process inside the running container. The new process runs alongside PID 1 without affecting it.

docker run -d --name myapp nginx
docker exec -it myapp /bin/bash
# Opens a new bash shell inside the container
# Exiting this shell does NOT stop the container

Key Differences

Feature            | docker attach                 | docker exec
Connects to        | PID 1 (main process)          | A new, separate process
Use case           | Viewing main process output   | Debugging, running ad-hoc commands
Ctrl+C             | Stops the container           | Only kills the exec'd process
Multiple terminals | All see the same PID 1 output | Each gets an independent process

When to Use Which?

  • docker exec (99% of the time): Debugging, inspecting files, running one-off commands, opening a shell.
  • docker attach: Rare. Useful when you need to interact with the stdin of an interactive main process (e.g., a REPL).

Q: What are the different Docker Restart Policies?

Answer:

Restart policies control whether a container is automatically restarted when it exits or when the Docker daemon restarts.

Available Policies

docker run --restart <policy> myimage

Policy                   | Behavior
no                       | Never restart the container (default).
on-failure[:max-retries] | Restart only if the container exits with a non-zero exit code. Optionally limit the number of retries.
always                   | Always restart the container, regardless of exit code. Also restarts when the Docker daemon starts.
unless-stopped           | Same as always, but does not restart if the container was manually stopped before the daemon restart.

Examples

# Restart up to 5 times on failure
docker run --restart on-failure:5 myapp

# Always keep the container running (survives daemon restarts)
docker run --restart always nginx

# Same as always, but respects manual stops
docker run --restart unless-stopped nginx

always vs unless-stopped

The subtle but critical difference:

  1. You run a container with --restart always.
  2. You manually docker stop it.
  3. The Docker daemon restarts (e.g., server reboot).
  4. Result: The container starts again (because the policy is always).

With unless-stopped:

  1. You run a container with --restart unless-stopped.
  2. You manually docker stop it.
  3. The Docker daemon restarts.
  4. Result: The container stays stopped (it respects your manual stop).

[!TIP] For production services, prefer unless-stopped. It auto-recovers from crashes while still respecting your intent when you explicitly stop a container for maintenance.
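
You can also change the policy of an already-running container without recreating it (a small sketch; myapp is a placeholder name):

docker update --restart unless-stopped myapp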

In Docker Compose

services:
  web:
    image: nginx
    restart: unless-stopped

Q: What are the different Docker Network Types?

Answer:

Docker provides several built-in network drivers. Understanding them is crucial for designing multi-container applications.

1. Bridge Network (Default)

The default network type for standalone containers. Docker creates a virtual bridge (docker0) on the host and assigns each container a private IP address within that bridge's subnet.

# Containers on the default bridge can communicate via IP, 
# but NOT by container name (no automatic DNS).
docker run -d --name app1 nginx
docker run -d --name app2 nginx
# app2 cannot reach app1 via http://app1 (only by IP)

# Custom bridge networks DO support DNS resolution:
docker network create mynet
docker run -d --name app1 --network mynet nginx
docker run -d --name app2 --network mynet nginx
# Now app2 CAN reach app1 via http://app1 ✅

[!IMPORTANT] Always use custom bridge networks instead of the default bridge. Custom bridges provide automatic DNS resolution, better isolation, and the ability to connect/disconnect containers dynamically.

2. Host Network

Removes network isolation entirely. The container shares the host's network stack directly. No port mapping is needed — the container's ports are the host's ports.

docker run --network host nginx
# nginx is now accessible on the host's port 80 directly

Pros: Best network performance (no NAT overhead). Cons: Port conflicts if multiple containers use the same port, and on Docker Desktop (macOS/Windows) the container shares the network stack of the hidden Linux VM rather than the host's.

3. Overlay Network

Enables communication between containers running on different Docker hosts (across machines). Used in Docker Swarm and Kubernetes environments.

docker network create -d overlay my-overlay

Uses VXLAN tunneling under the hood to encapsulate container traffic across physical network boundaries.

4. None Network

Completely disables networking for the container. The container only has a loopback interface.

docker run --network none myapp
# No external network access at all

Use case: Security-sensitive batch processing where no network communication should be possible.

Summary

Driver  | Scope       | DNS         | Use Case
bridge  | Single host | Custom only | Default for standalone containers
host    | Single host | N/A         | Max performance, no isolation needed
overlay | Multi-host  | Yes         | Swarm/K8s clusters
none    | N/A         | N/A         | Security, isolated batch jobs

Q: What is the difference between -p (publish) and EXPOSE in Docker?

Answer:

This is a commonly misunderstood distinction.

EXPOSE (Dockerfile instruction)

EXPOSE is purely documentation. It tells other developers and tools (like Docker Compose) which ports the application inside the container listens on. It does NOT actually publish or open any ports.

FROM node:20-alpine
WORKDIR /app
COPY . .
EXPOSE 3000
CMD ["node", "index.js"]

Even with EXPOSE 3000, you cannot access the container on port 3000 from the host unless you explicitly publish it with -p.

-p / --publish (Runtime flag)

This is what actually creates a port mapping between the host machine and the container. It sets up iptables rules to forward traffic.

# Map host port 8080 to container port 3000
docker run -p 8080:3000 myapp

# Map to all interfaces on a random host port
docker run -p 3000 myapp   # Docker picks a random host port

# Bind to a specific host interface
docker run -p 127.0.0.1:8080:3000 myapp  # Only accessible from localhost

-P (Publish All)

The uppercase -P flag automatically publishes all ports that were declared with EXPOSE, mapping each to a random high port on the host.

docker run -P myapp
# If EXPOSE 3000 was in the Dockerfile, 
# Docker maps a random host port → container 3000

docker port myapp
# 3000/tcp -> 0.0.0.0:32768

Summary

Feature   | EXPOSE             | -p / --publish
Where     | Dockerfile         | docker run command
Purpose   | Documentation only | Actually opens/maps the port
Effect    | None on networking | Creates host↔container port forwarding
Required? | No                 | Yes, for external access

Q: How does Container DNS and Service Discovery work in Docker?

Answer:

Docker has a built-in DNS server that enables containers to discover and communicate with each other by name instead of IP address.

How It Works

When you create a custom bridge network, Docker runs an embedded DNS server at 127.0.0.11. Every container on that network automatically registers its container name as a DNS hostname.

docker network create backend
docker run -d --name api --network backend myapi
docker run -d --name db --network backend postgres

# Inside the "api" container:
ping db           # Resolves to the postgres container's IP ✅
curl http://db:5432  # The hostname "db" resolves to the postgres container ✅ (Postgres won't answer HTTP, but DNS works)
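
To verify that the embedded DNS server is in play, check the resolver configuration inside a container on the custom network (a sketch; it assumes the image has a shell and the usual base utilities):

docker exec api cat /etc/resolv.conf    # nameserver 127.0.0.11
docker exec api getent hosts db         # prints the IP the embedded DNS returns for "db"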

Default Bridge vs Custom Bridge

Feature                | Default Bridge                      | Custom Bridge
DNS resolution by name | ❌ No                               | ✅ Yes
Container isolation    | Shared with all default containers  | Isolated per network
Legacy --link needed?  | Yes (deprecated)                    | No

[!WARNING] The --link flag is deprecated. Always use custom bridge networks for container-to-container communication.

Network Aliases

You can give a container multiple DNS names using --network-alias:

docker run -d --name postgres-primary \
    --network backend \
    --network-alias db \
    --network-alias database \
    postgres

# Other containers can reach it via "postgres-primary", "db", OR "database"

Docker Compose — Automatic Service Discovery

In Docker Compose, each service name automatically becomes a DNS hostname on the shared network.

services:
  api:
    build: ./api
    depends_on:
      - db
  db:
    image: postgres:16

Inside the api container, db resolves to the Postgres container. No manual network configuration needed.

Round-Robin DNS (Load Balancing)

If multiple containers share the same network alias, Docker's DNS returns all their IPs in a round-robin fashion:

docker run -d --network backend --network-alias worker myworker
docker run -d --network backend --network-alias worker myworker
docker run -d --network backend --network-alias worker myworker

# Resolving "worker" returns all 3 IPs, rotating order each time

Q: What is the difference between Volumes, Bind Mounts, and tmpfs?

Answer:

Docker provides three mechanisms for persisting data or sharing files between the host and containers.

1. Volumes (Managed by Docker)

Volumes are the preferred mechanism for persisting data. Docker fully manages them — they're stored in a dedicated directory on the host (/var/lib/docker/volumes/) and are completely abstracted from the host filesystem.

# Create a named volume
docker volume create mydata

# Use it
docker run -v mydata:/app/data myapp
# or (more explicit long syntax):
docker run --mount type=volume,source=mydata,target=/app/data myapp
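
If you ever need to see where a volume physically lives, Docker can tell you (a small sketch for the mydata volume created above):

docker volume inspect mydata --format '{{.Mountpoint}}'
# /var/lib/docker/volumes/mydata/_data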

2. Bind Mounts (Host Path → Container Path)

A bind mount maps a specific file or directory on the host directly into the container. The host and container see the exact same files in real-time.

# Mount current directory into the container
docker run -v $(pwd):/app myapp
# or:
docker run --mount type=bind,source=$(pwd),target=/app myapp

3. tmpfs Mounts (In-Memory Only)

Data is stored in the host's RAM only. It is never written to disk and is lost when the container stops. Useful for sensitive data that should not persist.

docker run --tmpfs /app/secrets myapp
# or:
docker run --mount type=tmpfs,target=/app/secrets myapp

Comparison

Feature             | Volume                          | Bind Mount                    | tmpfs
Stored on           | Docker-managed area on disk     | Any host path                 | Host RAM
Managed by          | Docker CLI (docker volume)      | You (host filesystem)         | Kernel
Portable            | Yes (works across environments) | No (depends on host path)     | No
Performance         | Excellent                       | Excellent                     | Fastest (RAM)
Persists after stop | ✅ Yes                          | ✅ Yes (on host)              | ❌ No
Pre-populated       | ✅ Yes (from image)             | ❌ No (overwrites)            | ❌ No
Use case            | Database storage, app data      | Dev hot-reload, config files  | Secrets, temp caches

When to Use What?

  • Volumes: Production data (databases, uploads). Portable and manageable.
  • Bind Mounts: Local development (mount source code for hot-reload).
  • tmpfs: Storing sensitive info (tokens, keys) that should never hit disk.

[!IMPORTANT] Bind mounts can be dangerous in production because they give containers direct access to the host filesystem. A container running as root with a bind mount to / could access or modify any file on the host.

Q: How do you handle Data Persistence in Docker?

Answer:

Containers are ephemeral by default — all data written inside a container is lost when the container is removed. This is a key interview topic because production systems obviously need persistent data.

The Problem

docker run -d --name mydb postgres
# Write data to the database...
docker rm -f mydb
# 💀 All data is gone forever

Strategy 1: Named Volumes

Named volumes persist independently of the container lifecycle. Even if the container is removed, the volume survives.

docker volume create postgres_data
docker run -d --name mydb \
    -v postgres_data:/var/lib/postgresql/data \
    postgres

# Remove the container
docker rm -f mydb

# Data is still safe in the volume!
docker run -d --name mydb-new \
    -v postgres_data:/var/lib/postgresql/data \
    postgres
# New container picks up right where the old one left off

Strategy 2: Docker Compose with Volumes

services:
  db:
    image: postgres:16
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: secret

volumes:
  postgres_data:  # Declared as a named volume

Strategy 3: Backup & Restore Volumes

# Backup a volume to a tar file
docker run --rm \
    -v postgres_data:/data \
    -v $(pwd):/backup \
    alpine tar czf /backup/db-backup.tar.gz -C /data .

# Restore from backup
docker run --rm \
    -v postgres_data:/data \
    -v $(pwd):/backup \
    alpine tar xzf /backup/db-backup.tar.gz -C /data

Strategy 4: External Storage Drivers

For production clusters, volume drivers allow Docker to store data on external systems:

  • AWS EBS volumes
  • NFS shares
  • GlusterFS / Ceph distributed filesystems
docker volume create --driver rexray/ebs myvolume

[!TIP] In interviews, mentioning backup strategies and external volume drivers shows production-level thinking beyond basic docker run -v.

Q: What is Docker Compose and when would you use it?

Answer:

Docker Compose is a tool for defining and running multi-container applications using a single YAML configuration file (docker-compose.yml or compose.yaml). Instead of running multiple docker run commands with complex flags, you declare everything in one file and spin up the entire stack with a single command.

Without Compose (Painful)

docker network create myapp
docker volume create db_data
docker run -d --name db --network myapp -v db_data:/var/lib/postgresql/data \
    -e POSTGRES_PASSWORD=secret postgres:16
docker run -d --name redis --network myapp redis:7
docker run -d --name api --network myapp -p 3000:3000 \
    -e DATABASE_URL=postgres://db:5432 \
    -e REDIS_URL=redis://redis:6379 myapi

With Compose (Clean)

# compose.yaml
services:
  api:
    build: ./api
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://db:5432/mydb
      REDIS_URL: redis://redis:6379
    depends_on:
      - db
      - redis

  db:
    image: postgres:16
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: secret

  redis:
    image: redis:7-alpine

volumes:
  db_data:
# Start everything
docker compose up -d

# View logs
docker compose logs -f api

# Stop and remove everything
docker compose down

# Stop, remove, AND delete volumes (nuclear option)
docker compose down -v

Key Features

  • Automatic networking: All services in a compose.yaml automatically join a shared network and can reach each other by service name.
  • Volume management: Volumes are declared and managed alongside services.
  • Build integration: You can specify build: context directly instead of pre-building images.
  • Profiles: Group services into profiles for conditional startup (see the sketch after this list).
  • Override files: Use compose.override.yaml for environment-specific config.
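
A minimal sketch of profiles (the pgadmin service and the debug profile name are hypothetical): services tagged with a profile start only when that profile is requested.

services:
  api:
    build: ./api
  pgadmin:
    image: dpage/pgadmin4
    profiles: ["debug"]

docker compose up -d                    # starts api only
docker compose --profile debug up -d    # starts api and pgadmin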

When to Use Compose

  • Local development: Spin up your full stack (API + DB + cache + queue) in one command.
  • CI/CD: Run integration tests against real services.
  • Single-host production: Small deployments that don't need Kubernetes.

[!NOTE] Docker Compose is not an orchestration tool. It runs containers on a single host. For multi-host orchestration, use Docker Swarm or Kubernetes.

Q: What is the difference between depends_on and health checks in Docker Compose?

Answer:

This is a subtle but extremely important question. depends_on controls startup order but does NOT wait for a service to be ready.

depends_on (Startup Order Only)

By default, depends_on only guarantees that the dependent container has started (i.e., docker run has been called). It does NOT wait for the application inside to be fully initialized and accepting connections.

services:
  api:
    build: ./api
    depends_on:
      - db   # db container STARTS first, but may not be ready yet!
  db:
    image: postgres:16

The Problem: Postgres takes several seconds to initialize. Your API container starts immediately after the Postgres container starts, but the database isn't accepting connections yet. The API crashes with "connection refused."

Health Checks (Readiness Verification)

A health check defines a command that Docker runs periodically to determine if a container is actually healthy (i.e., the application inside is ready).

services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

The Solution: depends_on + condition

Combine both to make a service wait until its dependency is truly healthy:

services:
  api:
    build: ./api
    depends_on:
      db:
        condition: service_healthy  # Wait until db passes health check
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 10s

Now the api service will not start until the db service passes its health check.

Health Check Conditions

Condition                       | Meaning
service_started                 | Default. Container has started (same as basic depends_on).
service_healthy                 | Container's health check is passing.
service_completed_successfully  | Container ran and exited with code 0 (for init/migration containers).

[!TIP] The service_completed_successfully condition is perfect for running database migrations before starting the API:

migrate:
  image: myapp
  command: npm run migrate
api:
  depends_on:
    migrate:
      condition: service_completed_successfully

Q: How do you manage Environment Variables and Secrets in Docker?

Answer:

Environment variables are the primary way to configure containerized applications. However, sensitive data (passwords, API keys) requires special handling.

1. Inline Environment Variables

Pass variables directly in docker run:

docker run -e DATABASE_URL=postgres://localhost:5432/mydb myapp

In Compose:

services:
  api:
    environment:
      - DATABASE_URL=postgres://db:5432/mydb
      - NODE_ENV=production

2. .env Files

Store variables in a file and load them:

docker run --env-file .env myapp

In Compose, .env in the project root is automatically loaded for variable substitution:

# .env
POSTGRES_PASSWORD=supersecret
DB_PORT=5432

# compose.yaml
services:
  db:
    image: postgres:16
    ports:
      - "${DB_PORT}:5432"
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}

3. env_file directive

Load variables from a specific file into the container's environment:

services:
  api:
    env_file:
      - ./config/.env.production

[!CAUTION] Never commit .env files with real secrets to version control. Add them to .gitignore and .dockerignore.

4. Docker Secrets (Swarm Mode)

For true secret management, Docker Swarm provides encrypted secrets that are mounted as files inside containers (never exposed as environment variables or stored in image layers).

echo "supersecretpassword" | docker secret create db_password -

# Use in a Swarm service
docker service create \
    --secret db_password \
    --name myapp myimage
# Secret is available at /run/secrets/db_password inside the container

In Compose (with Swarm):

services:
  api:
    secrets:
      - db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt

Security Hierarchy (Least to Most Secure)

  1. ❌ Hardcoded in Dockerfile (ENV PASSWORD=secret) — visible in image layers.
  2. ⚠️ -e flag or environment: in Compose — visible in docker inspect.
  3. ✅ --env-file — secrets live in a gitignored file, but are still visible in docker inspect.
  4. ✅✅ Docker Secrets — encrypted at rest, mounted as tmpfs, not in docker inspect.
  5. ✅✅✅ External vault (HashiCorp Vault, AWS Secrets Manager) — most secure for production.

Q: Why should containers run as a non-root user?

Answer:

By default, the process inside a Docker container runs as root (UID 0). This is a significant security risk because if an attacker breaks out of the container (a container escape), they gain root access to the host machine.

The Risk

# ❌ Default: runs as root
FROM node:20-alpine
WORKDIR /app
COPY . .
CMD ["node", "index.js"]
# Inside the container: whoami → root

If someone exploits a vulnerability in your Node.js app, they have root-level access inside the container. Combined with a kernel exploit, this could mean root on the host.

The Fix: Create and Use a Non-Root User

FROM node:20-alpine
WORKDIR /app

# Create a non-root user and group
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Install deps as root (needs permissions)
COPY package*.json ./
RUN npm ci --only=production

# Copy app code
COPY . .

# Change ownership of app files to the non-root user
RUN chown -R appuser:appgroup /app

# Switch to non-root user for all subsequent commands
USER appuser

EXPOSE 3000
CMD ["node", "index.js"]
# Inside the container: whoami → appuser

Other Approaches

1. Use --user at runtime:

docker run --user 1000:1000 myapp

2. Use a base image's pre-created non-root user: Many official images (like node) ship with one; you just need to switch to it with USER:

FROM node:20-alpine
USER node  # Built-in non-root user

3. Read-only filesystem:

docker run --read-only --tmpfs /tmp myapp

This prevents any writes to the container filesystem, further limiting attack surface.

Additional Hardening

# Drop all Linux capabilities, add back only what's needed
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp

# Prevent privilege escalation
docker run --security-opt no-new-privileges myapp

[!TIP] In Kubernetes, you enforce this via securityContext in the Pod spec:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true

Q: How do you scan Docker images for vulnerabilities and minimize the attack surface?

Answer:

Container image security is a critical production concern. Vulnerabilities in base images, dependencies, or OS packages can be exploited.

1. Image Scanning Tools

Scan images for known CVEs (Common Vulnerabilities and Exposures):

# Docker Scout (built into Docker Desktop)
docker scout cves myimage:latest

# Trivy (open-source, most popular)
trivy image myimage:latest

# Snyk
snyk container test myimage:latest

# Grype (by Anchore)
grype myimage:latest

2. Minimizing the Attack Surface

Use minimal base images:

# ❌ Full OS (~900MB, thousands of packages)
FROM node:20

# ✅ Alpine (~130MB, minimal packages)
FROM node:20-alpine

# ✅✅ Distroless (~20MB, no shell, no package manager)
FROM gcr.io/distroless/nodejs20-debian12

Why Distroless? Distroless images contain only the application runtime. There's no shell (/bin/sh), no package manager, no utilities. If an attacker gets inside the container, they can't run curl, wget, or even ls.
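
A hedged illustration of what that means in practice (prod-api is a placeholder container name; the exact error wording varies by version):

docker exec -it prod-api sh
# OCI runtime exec failed: exec: "sh": executable file not found in $PATH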

Multi-stage builds (see the Images chapter) are essential for keeping build tools out of production images.

3. Never Use latest Tag

# ❌ Bad: "latest" could change at any time
FROM node:latest

# ✅ Good: Pinned digest for reproducibility
FROM node:20.11-alpine3.18@sha256:abc123...

4. Scan in CI/CD Pipeline

# GitHub Actions example
- name: Scan image
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: myimage:${{ github.sha }}
    severity: CRITICAL,HIGH
    exit-code: 1  # Fail the build if critical vulnerabilities found

5. Don't Store Secrets in Images

# ❌ TERRIBLE: Secret is baked into a layer permanently
COPY .env /app/.env
ENV API_KEY=sk-abc123

# ✅ Pass secrets at runtime
docker run -e API_KEY=$API_KEY myimage

[!CAUTION] Even if you delete a secret in a later Dockerfile layer, it still exists in the earlier layer and can be extracted with docker history or by inspecting the image layers directly.
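
A short demonstration of why this matters (myimage and the API_KEY value are the hypothetical examples from above):

docker history --no-trunc myimage | grep API_KEY
# The ENV instruction, secret included, is visible to anyone who can pull the image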

Checklist

  • Use minimal base images (Alpine, Distroless, or Scratch)
  • Run as non-root user
  • Scan images in CI with Trivy or Scout
  • Pin base image versions (avoid latest)
  • No secrets in image layers
  • Use multi-stage builds
  • Drop unnecessary Linux capabilities
  • Use read-only filesystem where possible

Q: How do Health Checks work in Docker?

Answer:

A health check is a command that Docker runs periodically inside a container to determine if the application is healthy and functioning correctly.

Defining Health Checks

In a Dockerfile:

FROM node:20-alpine
WORKDIR /app
COPY . .

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:3000/health || exit 1

CMD ["node", "index.js"]

In Docker Compose:

services:
  api:
    build: .
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      start_period: 10s
      retries: 3

Health Check Parameters

Parameter    | Default | Description
interval     | 30s     | Time between health checks
timeout      | 30s     | Max time to wait for the check to complete
start_period | 0s      | Grace period for the container to initialize
retries      | 3       | Number of consecutive failures before marking unhealthy

Health Check States

State     | Meaning
starting  | Container just launched, within start_period
healthy   | Health check command exited with code 0
unhealthy | Health check failed retries times consecutively

# Check container health
docker inspect --format='{{.State.Health.Status}}' mycontainer
# Output: healthy

docker ps
# CONTAINER ID   STATUS
# abc123         Up 5 min (healthy)

Common Health Check Commands

# HTTP endpoint check (requires curl in the image)
HEALTHCHECK CMD curl -f http://localhost:3000/health || exit 1

# TCP port check (no curl needed)
HEALTHCHECK CMD nc -z localhost 3000 || exit 1

# PostgreSQL readiness
HEALTHCHECK CMD pg_isready -U postgres || exit 1

# Redis ping
HEALTHCHECK CMD redis-cli ping || exit 1

[!TIP] For Alpine-based images that don't have curl, use wget:

HEALTHCHECK CMD wget --spider -q http://localhost:3000/health || exit 1

Or for images without any HTTP tools, use a simple Node.js script or a compiled binary health checker.
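For a Node.js image with neither curl nor wget, one dependency-free option is a Node one-liner; a sketch assuming the app serves /health on port 3000:

HEALTHCHECK CMD node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"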

Q: What are Docker Logging Best Practices?

Answer:

Proper logging is essential for debugging, monitoring, and auditing containerized applications.

The Golden Rule: Log to stdout/stderr

Docker captures everything written to the container's stdout and stderr streams. Applications should NOT write logs to files inside the container.

# ❌ Bad: Logs trapped inside the container filesystem
CMD ["node", "index.js", ">>", "/var/log/app.log"]

# ✅ Good: Logs go to stdout (Docker captures them)
CMD ["node", "index.js"]

Why stdout/stderr?

  1. docker logs only shows stdout/stderr output.
  2. Log drivers can only capture stdout/stderr.
  3. Files inside the container are lost when the container is removed.
  4. Centralized logging systems (ELK, Datadog, CloudWatch) integrate with Docker's log drivers, not container files.

Docker Log Drivers

Docker supports pluggable logging drivers that determine where container logs are sent:

# View current log driver
docker info --format '{{.LoggingDriver}}'

# Run a container with a specific driver
docker run --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 myapp

Driver    | Destination
json-file | Local JSON files (default)
syslog    | Syslog daemon
fluentd   | Fluentd collector
awslogs   | AWS CloudWatch
gcplogs   | Google Cloud Logging
splunk    | Splunk HTTP Event Collector
none      | Discard all logs

Log Rotation (Critical!)

The default json-file driver has no size limit. Logs will grow until they fill the disk.

// /etc/docker/daemon.json
{
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "10m",
        "max-file": "3"
    }
}

In Compose:

services:
  api:
    image: myapp
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

[!CAUTION] Forgetting log rotation is one of the most common causes of production outages in Docker environments. A single chatty container can fill up the host's disk in hours.

Useful Commands

# View logs
docker logs mycontainer

# Follow logs (like tail -f)
docker logs -f mycontainer

# Show last 100 lines
docker logs --tail 100 mycontainer

# Show logs since a timestamp
docker logs --since 2024-01-01T00:00:00 mycontainer

Q: When would you use Docker (Compose/Swarm) vs Kubernetes?

Answer:

This is a high-level architecture question that interviewers use to gauge your understanding of container orchestration.

Docker Compose

  • Scope: Single host only.
  • Use case: Local development, CI/CD test environments, small single-server deployments.
  • Complexity: Minimal. A single YAML file.
  • Scaling: docker compose up --scale web=3 (basic, no load balancer).
  • Networking: Automatic service discovery on the same host.

Docker Swarm

  • Scope: Multi-host cluster (built into Docker Engine).
  • Use case: Simple production setups, small teams that want orchestration without Kubernetes complexity.
  • Features: Service discovery, load balancing, rolling updates, secrets management.
  • Scaling: docker service scale web=10 (across multiple nodes).
  • Learning curve: Low (if you know Docker, you know 80% of Swarm).

Kubernetes (K8s)

  • Scope: Multi-host cluster (industry standard for container orchestration).
  • Use case: Large-scale production, microservices, multi-team environments.
  • Features: Everything Swarm has, plus: auto-scaling (HPA/VPA), self-healing, RBAC, custom resource definitions (CRDs), Ingress controllers, service mesh support, advanced scheduling.
  • Scaling: Handles thousands of nodes and hundreds of thousands of pods.
  • Learning curve: Steep. Requires understanding of Pods, Deployments, Services, ConfigMaps, etc.

Comparison Table

Feature             | Compose    | Swarm          | Kubernetes
Multi-host          | ❌         | ✅             | ✅
Auto-scaling        | ❌         | ❌             | ✅ (HPA)
Self-healing        | ❌         | ✅ (basic)     | ✅ (advanced)
Rolling updates     | ❌         | ✅             | ✅
Load balancing      | ❌         | ✅ (built-in)  | ✅ (Service + Ingress)
Secrets             | File-based | ✅ (encrypted) | ✅ (encrypted)
Community/Ecosystem | N/A        | Declining      | Dominant
Setup complexity    | Minutes    | Hours          | Days

When to Use What?

  • Compose: You're developing locally or running a small app on a single server.
  • Swarm: You need multi-host orchestration but want something simpler than Kubernetes. (Note: Swarm adoption is declining; most teams go straight to K8s.)
  • Kubernetes: You need production-grade orchestration, auto-scaling, advanced networking, or you're operating at scale.

[!NOTE] In interviews, it's perfectly acceptable to say: "We used Docker Compose for local dev and Kubernetes for production." This shows practical understanding of using the right tool for the right environment.