Docker Interview Prep
A comprehensive collection of Docker interview questions and answers, ranging from fundamentals to production-grade best practices.
Topics covered:
- Core concepts (containers vs VMs, architecture)
- Dockerfile and image building
- Container lifecycle and management
- Networking (bridge, host, overlay)
- Storage (volumes, bind mounts)
- Docker Compose
- Security best practices
- Orchestration and production patterns
Q: What is the difference between Containers and Virtual Machines?
Answer:
This is the most common Docker interview opener. Both are technologies for isolating applications, but they work at fundamentally different levels.
Virtual Machines (VMs)
A VM runs a complete guest Operating System on top of a hypervisor (e.g., VMware, VirtualBox, KVM). Each VM includes its own kernel, system libraries, and binaries.
Containers
A container shares the host machine's OS kernel and isolates only the application's user-space processes using Linux kernel features like namespaces (process isolation) and cgroups (resource limits).
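A quick way to see these mechanisms on a running container (a minimal sketch; the container name and memory limit are just examples):
# Start a container with a cgroup memory limit and find its PID on the host
docker run -d --name demo --memory 256m nginx
PID=$(docker inspect --format '{{.State.Pid}}' demo)
# The container's namespaces are distinct from the host's (compare the inode numbers)
sudo ls -l /proc/$PID/ns
# The cgroup limit is recorded in the container's config (268435456 bytes = 256MB)
docker inspect --format '{{.HostConfig.Memory}}' demo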
Key Differences
| Feature | Container | Virtual Machine |
|---|---|---|
| Isolation level | Process-level (shares host kernel) | Hardware-level (full guest OS) |
| Startup time | Milliseconds | Minutes |
| Size | Megabytes (just the app + deps) | Gigabytes (full OS image) |
| Performance | Near-native (no hypervisor overhead) | Slower (hardware emulation layer) |
| Density | Run hundreds on a single host | Run tens on a single host |
| OS support | Linux containers on Linux host only* | Any OS on any host |
| Security | Weaker isolation (shared kernel) | Stronger isolation (separate kernels) |
[!NOTE] *Docker Desktop on macOS/Windows actually runs a lightweight Linux VM under the hood (using HyperKit or WSL2) to provide the Linux kernel that containers need.
When to Use Which?
- Containers: Microservices, CI/CD pipelines, dev environments, anything where speed and density matter.
- VMs: When you need full OS-level isolation (e.g., running Windows apps alongside Linux), or when security boundaries are critical (multi-tenant hosting).
Q: Explain the Docker Architecture.
Answer:
Docker uses a client-server architecture with three main components:
1. Docker Client (docker CLI)
The command-line interface that you interact with. When you run a command like docker run, the client sends it as an API request to the Docker daemon. The client can communicate with the daemon locally (via a Unix socket) or remotely (via TCP).
2. Docker Daemon (dockerd)
The background service (server) that does all the heavy lifting. It manages:
- Building images
- Running containers
- Pulling/pushing images from registries
- Managing networks and volumes
The daemon exposes a REST API that the client talks to.
3. Docker Registry (e.g., Docker Hub)
A storage and distribution system for Docker images. When you docker pull nginx, the daemon fetches the image from Docker Hub (the default public registry). Companies also run private registries (e.g., AWS ECR, GCR, Harbor).
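Because the daemon exposes a REST API, you can bypass the CLI and query it directly (a small sketch, assuming the default Unix socket at /var/run/docker.sock):
# Same information the CLI shows, fetched straight from the daemon's API
curl --unix-socket /var/run/docker.sock http://localhost/version
curl --unix-socket /var/run/docker.sock http://localhost/containers/json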
How They Work Together
┌──────────────┐ REST API ┌──────────────────┐
│ Docker Client │ ──────────────────▶ │ Docker Daemon │
│ (docker CLI) │ │ (dockerd) │
└──────────────┘ │ │
│ ┌────────────┐ │
│ │ Containers │ │
│ ├────────────┤ │
│ │ Images │ │
│ ├────────────┤ │
│ │ Volumes │ │
│ ├────────────┤ │
│ │ Networks │ │
│ └────────────┘ │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Docker Registry │
│ (Docker Hub, │
│ ECR, GCR, etc.) │
└──────────────────┘
Under the Hood: containerd & runc
The Docker daemon doesn't actually run containers directly. It delegates to:
- containerd: A high-level container runtime that manages the full container lifecycle (image transfer, storage, execution).
- runc: A low-level OCI-compliant runtime that actually creates and runs containers using Linux kernel features (namespaces, cgroups).
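A rough way to observe this delegation on a Linux Docker host (commands are illustrative; exact output varies by version):
docker info | grep -i runtime          # lists runc as the default runtime
ps -e | grep -E 'dockerd|containerd'   # dockerd and containerd run as separate processes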
[!TIP] When discussing Docker architecture in interviews, mentioning containerd and runc shows a deeper understanding. Kubernetes, for example, talks directly to containerd (not through Docker) since the dockershim integration was removed in Kubernetes v1.24.
Q: What is the difference between a Docker Image and a Container?
Answer:
This is a deceptively simple but critical distinction:
Docker Image
An image is a read-only template that contains the application code, runtime, libraries, environment variables, and configuration files needed to run an application. Think of it as a class in OOP or a blueprint for a house.
Images are built in layers. Each instruction in a Dockerfile (e.g., RUN, COPY, ADD) creates a new layer. Layers are stacked on top of each other and are cached, which makes rebuilds extremely fast.
Docker Container
A container is a running instance of an image. Think of it as an object instantiated from a class, or a house built from a blueprint. You can create multiple containers from the same image.
When a container starts, Docker adds a thin writable layer on top of the read-only image layers. All file changes (new files, modifications, deletions) happen in this writable layer.
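You can inspect both sides of this relationship (a sketch using nginx purely as an example image):
# Read-only image layers, one per Dockerfile instruction
docker history nginx:latest
# Changes made at runtime land in the container's writable layer
docker run -d --name web nginx
docker exec web touch /tmp/hello
docker diff web   # shows "A /tmp/hello" (added in the writable layer only)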
Analogy
Image = Class definition (immutable blueprint)
Container = Object/Instance (running process with mutable state)
One Image → Many Containers (just like one Class → many Objects)
Key Differences
| Feature | Image | Container |
|---|---|---|
| State | Immutable (read-only) | Mutable (has a writable layer) |
| Stored as | Layers on disk | Running process + writable layer |
| Created by | docker build or docker pull | docker run or docker create |
| Persistence | Persists until explicitly deleted | Ephemeral by default (data lost on removal) |
| Sharing | Pushed to registries | Cannot be pushed (must be docker commit'd into an image first) |
Common Follow-Up: What is docker commit?
You can take a running container's writable layer and freeze it into a new image:
docker commit <container_id> my-custom-image:v1
[!CAUTION] Using docker commit in production is considered bad practice. Always use a Dockerfile for reproducible, version-controlled image builds.
Q: What is the difference between CMD and ENTRYPOINT in a Dockerfile?
Answer:
Both CMD and ENTRYPOINT define what command runs when a container starts, but they behave very differently when users pass arguments at runtime.
CMD — The Default Command (Easily Overridden)
CMD sets the default command and/or arguments for the container. However, it is completely replaced if the user provides a command when running the container.
FROM ubuntu
CMD ["echo", "Hello from CMD"]
docker run myimage
# Output: Hello from CMD
docker run myimage echo "I replaced CMD"
# Output: I replaced CMD (CMD was completely overridden)
ENTRYPOINT — The Fixed Executable (Not Easily Overridden)
ENTRYPOINT sets the main executable for the container. User-provided arguments are appended to the entrypoint, not used to replace it.
FROM ubuntu
ENTRYPOINT ["echo", "Hello from"]
docker run myimage
# Output: Hello from
docker run myimage "Docker World"
# Output: Hello from Docker World (argument was appended)
The Power Combo: ENTRYPOINT + CMD
The most common production pattern is using them together. ENTRYPOINT defines the fixed executable, and CMD provides default arguments that can be overridden.
FROM python:3.11-slim
ENTRYPOINT ["python"]
CMD ["app.py"]
docker run myimage
# Runs: python app.py (default)
docker run myimage test.py
# Runs: python test.py (CMD overridden, ENTRYPOINT kept)
Summary
| Feature | CMD | ENTRYPOINT |
|---|---|---|
| Purpose | Default command/args | Fixed executable |
| Override behavior | Completely replaced by docker run args | Args are appended to it |
| Best for | Default arguments | The main process |
[!TIP] To override ENTRYPOINT at runtime, you must explicitly use the --entrypoint flag:
docker run --entrypoint /bin/bash myimage
Q: What is the difference between COPY and ADD in a Dockerfile?
Answer:
Both instructions copy files from the build context into the image, but ADD has extra (often unwanted) functionality.
COPY — Simple File Copy
Does exactly one thing: copies files or directories from the build context into the image filesystem. It's transparent and predictable.
COPY requirements.txt /app/
COPY src/ /app/src/
ADD — Copy with Extras
ADD does everything COPY does, plus two additional features:
- Auto-extracts compressed archives (.tar, .tar.gz, .tgz, .bz2, .xz) into the destination directory.
- Fetches files from remote URLs (like wget).
# Auto-extracts the tarball into /app/
ADD app.tar.gz /app/
# Downloads a file from the internet
ADD https://example.com/config.json /etc/app/config.json
Why You Should Almost Always Use COPY
[!WARNING] The Docker official best practices guide explicitly recommends using COPY over ADD in almost all cases.
Reasons:
- Predictability: COPY has no hidden side effects. With ADD, a developer might not realize their .tar.gz file will be auto-extracted. If you want the archive as-is (e.g., to extract it manually later), ADD will silently break your intent.
- Security: ADD from a URL does not verify SSL certificates and doesn't support authentication. Use RUN curl or RUN wget instead for better control.
- Cache invalidation: Remote URL fetches with ADD can cause unpredictable cache behavior since Docker cannot know if the remote file has changed.
Rule of Thumb
- Use COPY for all local file copies (99% of cases).
- Use ADD only when you explicitly need tar auto-extraction.
- Use RUN curl or RUN wget for downloading remote files.
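A hedged sketch of the RUN curl alternative for remote files, assuming the base image provides curl and sha256sum (the URL and checksum are placeholders):
# Download with explicit control, verify, extract, and clean up in one layer
RUN curl -fsSL https://example.com/app.tar.gz -o /tmp/app.tar.gz \
    && echo "<expected-sha256>  /tmp/app.tar.gz" | sha256sum -c - \
    && tar -xzf /tmp/app.tar.gz -C /app \
    && rm /tmp/app.tar.gz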
Q: What are Multi-Stage Builds and why are they important?
Answer:
Multi-stage builds allow you to use multiple FROM statements in a single Dockerfile. Each FROM starts a new "stage" of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.
The Problem Without Multi-Stage Builds
In a typical build, you need compilers, build tools, and dev dependencies to compile your application. If you use a single stage, all of those tools end up in your final production image, making it bloated and insecure.
# ❌ Single-stage: Final image includes Go compiler, source code, build tools
FROM golang:1.21
WORKDIR /app
COPY . .
RUN go build -o myapp
CMD ["./myapp"]
# Final image size: ~800MB (includes entire Go toolchain!)
The Solution: Multi-Stage Build
# Stage 1: Build (named "builder")
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp
# Stage 2: Production (tiny final image)
FROM alpine:3.18
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["./myapp"]
# Final image size: ~15MB (just the binary + Alpine!)
How It Works
- Stage 1 ("builder"): Uses the full golang image (800MB+) to compile the Go binary.
- Stage 2: Starts fresh from a tiny alpine image (~5MB) and copies only the compiled binary from the builder stage using COPY --from=builder.
- The final image contains nothing from stage 1 except the single file you explicitly copied.
Real-World Node.js Example
# Stage 1: Install dependencies and build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
Benefits
- Dramatically smaller images (often 10-50x reduction).
- Better security (no compilers, source code, or dev tools in production).
- Faster deployments (smaller images push/pull faster).
- Single Dockerfile (no need for separate Dockerfile.dev and Dockerfile.prod).
[!TIP] You can also copy from external images without defining them as a stage:
COPY --from=nginx:latest /etc/nginx/nginx.conf /etc/nginx/
Q: How does Docker Layer Caching work? How do you optimize a Dockerfile?
Answer:
Understanding layer caching is essential for building images fast and keeping them small.
How Layers Work
Every instruction in a Dockerfile (FROM, RUN, COPY, ADD, ENV, etc.) creates a new layer. Layers are stacked, read-only, and cached. When you rebuild an image, Docker checks each instruction:
- If the instruction and its inputs haven't changed, Docker reuses the cached layer (instant).
- If anything has changed, Docker invalidates that layer and all layers after it (the cache "busts").
The Cache Busting Problem
# ❌ Bad order: Cache busts on EVERY code change
FROM node:20-alpine
WORKDIR /app
COPY . . # Any code change invalidates THIS layer
RUN npm install # ...which forces this to re-run (slow!)
CMD ["node", "index.js"]
Every time you change a single line of code, COPY . . changes, which invalidates the cache for npm install. You end up reinstalling all dependencies from scratch on every build.
The Fix: Order by Change Frequency
# ✅ Good order: Dependencies cached separately from code
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./ # Changes rarely
RUN npm ci # Cached unless package.json changes
COPY . . # Code changes only bust THIS layer
CMD ["node", "index.js"]
Now, if you only change application code, Docker reuses the cached npm ci layer and only re-runs COPY . . — saving minutes on every build.
Optimization Best Practices
1. Use .dockerignore
Just like .gitignore, a .dockerignore file prevents unnecessary files from being sent to the Docker build context.
node_modules
.git
*.md
.env
dist
2. Combine RUN commands
Each RUN instruction creates a new layer. Combine related commands to reduce layer count and image size.
# ❌ Bad: 3 layers; the apt lists removed in the last RUN still exist in the earlier layer
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
# ✅ Good: 1 layer, cleanup in same step
RUN apt-get update && \
apt-get install -y --no-install-recommends curl && \
rm -rf /var/lib/apt/lists/*
3. Use specific base image tags
# ❌ Bad: `latest` changes unpredictably, breaks cache
FROM node:latest
# ✅ Good: Pinned version, reproducible
FROM node:20.11-alpine3.18
4. Use multi-stage builds (see the previous chapter).
5. Prefer Alpine or Distroless base images
- node:20 → ~900MB
- node:20-alpine → ~130MB
- node:20-slim → ~200MB
Q: What are the different Container States and the Container Lifecycle?
Answer:
A Docker container goes through several states during its lifecycle. Understanding these is important for debugging and orchestration.
Container States
docker create docker start (process exits) docker rm
│ │ │ │
▼ ▼ ▼ ▼
CREATED ──────▶ RUNNING ──────▶ EXITED ──────▶ REMOVED
│ ▲
docker │ │ docker
pause │ │ unpause
▼ │
PAUSED
- Created: Container exists but hasn't started yet (docker create).
- Running: The container's main process (PID 1) is actively executing (docker start or docker run).
- Paused: The container's processes are frozen (using the cgroup freezer). Memory state is preserved (docker pause).
- Exited: The main process has finished or crashed. The writable layer is still on disk.
- Removed: The container and its writable layer are deleted (docker rm).
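When a container lands in the Exited state, a few commands reveal why (a quick sketch; myapp is a placeholder name):
docker ps -a --filter status=exited                    # list stopped containers
docker inspect --format '{{.State.ExitCode}}' myapp    # 0 = clean exit, non-zero = failure
docker logs --tail 50 myapp                            # last output before the process died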
Key Commands
# Create + Start in one command
docker run -d --name myapp nginx
# View running containers
docker ps
# View ALL containers (including stopped ones)
docker ps -a
# Stop gracefully (sends SIGTERM, then SIGKILL after grace period)
docker stop myapp
# Stop immediately (sends SIGKILL)
docker kill myapp
# Remove a stopped container
docker rm myapp
# Force remove a running container
docker rm -f myapp
# Remove ALL stopped containers
docker container prune
The PID 1 Problem
The container's main process is always PID 1. When PID 1 exits, the entire container stops, regardless of whether other processes are still running inside it.
[!IMPORTANT] This is why docker run should always run the main application process as the foreground command (not as a background daemon). If your entrypoint script runs the app with & (background) and then exits, the container will immediately stop.
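A minimal entrypoint sketch that keeps the app as PID 1 (the script and app names are placeholders):
#!/bin/sh
# entrypoint.sh — do any setup, then replace the shell with the app
set -e
echo "running setup steps..."
# exec makes node PID 1, so it receives SIGTERM from `docker stop`
exec node index.js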
Q: What is the difference between docker exec and docker attach?
Answer:
Both commands let you interact with a running container, but they connect to very different things.
docker attach
Attaches your terminal's stdin/stdout/stderr to the container's main process (PID 1). You are essentially watching and interacting with the same process that docker run started.
docker run -d --name myapp python app.py
docker attach myapp
# You are now connected to the stdout of `python app.py`
Danger: If you press Ctrl+C while attached, it sends SIGINT to PID 1, which stops the container entirely.
[!WARNING] Use Ctrl+P then Ctrl+Q to detach from a container without killing it. This is the "detach sequence."
docker exec
Starts a brand new, separate process inside the running container. The new process runs alongside PID 1 without affecting it.
docker run -d --name myapp nginx
docker exec -it myapp /bin/bash
# Opens a new bash shell inside the container
# Exiting this shell does NOT stop the container
Key Differences
| Feature | docker attach | docker exec |
|---|---|---|
| Connects to | PID 1 (main process) | A new, separate process |
| Use case | Viewing main process output | Debugging, running ad-hoc commands |
| Ctrl+C | Stops the container | Only kills the exec'd process |
| Multiple terminals | All see the same PID 1 output | Each gets an independent process |
When to Use Which?
- docker exec (99% of the time): Debugging, inspecting files, running one-off commands, opening a shell.
- docker attach: Rare. Useful when you need to interact with the stdin of an interactive main process (e.g., a REPL).
Q: What are the different Docker Restart Policies?
Answer:
Restart policies control whether a container is automatically restarted when it exits or when the Docker daemon restarts.
Available Policies
docker run --restart <policy> myimage
| Policy | Behavior |
|---|---|
| no | Never restart the container (default). |
| on-failure[:max-retries] | Restart only if the container exits with a non-zero exit code. Optionally limit the number of retries. |
| always | Always restart the container, regardless of exit code. Also restarts when the Docker daemon starts. |
| unless-stopped | Same as always, but does not restart if the container was manually stopped before the daemon restart. |
Examples
# Restart up to 5 times on failure
docker run --restart on-failure:5 myapp
# Always keep the container running (survives daemon restarts)
docker run --restart always nginx
# Same as always, but respects manual stops
docker run --restart unless-stopped nginx
always vs unless-stopped
The subtle but critical difference:
- You run a container with --restart always.
- You manually docker stop it.
- The Docker daemon restarts (e.g., server reboot).
- Result: The container starts again (because the policy is always).
With unless-stopped:
- You run a container with --restart unless-stopped.
- You manually docker stop it.
- The Docker daemon restarts.
- Result: The container stays stopped (it respects your manual stop).
[!TIP] For production services, prefer unless-stopped. It auto-recovers from crashes while still respecting your intent when you explicitly stop a container for maintenance.
In Docker Compose
services:
web:
image: nginx
restart: unless-stopped
Q: What are the different Docker Network Types?
Answer:
Docker provides several built-in network drivers. Understanding them is crucial for designing multi-container applications.
1. Bridge Network (Default)
The default network type for standalone containers. Docker creates a virtual bridge (docker0) on the host and assigns each container a private IP address within that bridge's subnet.
# Containers on the default bridge can communicate via IP,
# but NOT by container name (no automatic DNS).
docker run -d --name app1 nginx
docker run -d --name app2 nginx
# app2 cannot reach app1 via http://app1 (only by IP)
# Custom bridge networks DO support DNS resolution:
docker network create mynet
docker run -d --name app1 --network mynet nginx
docker run -d --name app2 --network mynet nginx
# Now app2 CAN reach app1 via http://app1 ✅
[!IMPORTANT] Always use custom bridge networks instead of the default bridge. Custom bridges provide automatic DNS resolution, better isolation, and the ability to connect/disconnect containers dynamically.
2. Host Network
Removes network isolation entirely. The container shares the host's network stack directly. No port mapping is needed — the container's ports are the host's ports.
docker run --network host nginx
# nginx is now accessible on the host's port 80 directly
Pros: Best network performance (no NAT overhead).
Cons: Port conflicts if multiple containers use the same port. Not available on Docker Desktop (macOS/Windows).
3. Overlay Network
Enables communication between containers running on different Docker hosts (across machines). Used in Docker Swarm and Kubernetes environments.
docker network create -d overlay my-overlay
Uses VXLAN tunneling under the hood to encapsulate container traffic across physical network boundaries.
4. None Network
Completely disables networking for the container. The container only has a loopback interface.
docker run --network none myapp
# No external network access at all
Use case: Security-sensitive batch processing where no network communication should be possible.
Summary
| Driver | Scope | DNS | Use Case |
|---|---|---|---|
| bridge | Single host | Custom only | Default for standalone containers |
| host | Single host | N/A | Max performance, no isolation needed |
| overlay | Multi-host | Yes | Swarm/K8s clusters |
| none | N/A | N/A | Security, isolated batch jobs |
Q: What is the difference between -p (publish) and EXPOSE in Docker?
Answer:
This is a commonly misunderstood distinction.
EXPOSE (Dockerfile instruction)
EXPOSE is purely documentation. It tells other developers and tools (like Docker Compose) which ports the application inside the container listens on. It does NOT actually publish or open any ports.
FROM node:20-alpine
WORKDIR /app
COPY . .
EXPOSE 3000
CMD ["node", "index.js"]
Even with EXPOSE 3000, you cannot access the container on port 3000 from the host unless you explicitly publish it with -p.
-p / --publish (Runtime flag)
This is what actually creates a port mapping between the host machine and the container. It sets up iptables rules to forward traffic.
# Map host port 8080 to container port 3000
docker run -p 8080:3000 myapp
# Map to all interfaces on a random host port
docker run -p 3000 myapp # Docker picks a random host port
# Bind to a specific host interface
docker run -p 127.0.0.1:8080:3000 myapp # Only accessible from localhost
-P (Publish All)
The uppercase -P flag automatically publishes all ports that were declared with EXPOSE, mapping each to a random high port on the host.
docker run -P myapp
# If EXPOSE 3000 was in the Dockerfile,
# Docker maps a random host port → container 3000
docker port myapp
# 3000/tcp -> 0.0.0.0:32768
Summary
| Feature | EXPOSE | -p / --publish |
|---|---|---|
| Where | Dockerfile | docker run command |
| Purpose | Documentation only | Actually opens/maps the port |
| Effect | None on networking | Creates host↔container port forwarding |
| Required? | No | Yes, for external access |
Q: How does Container DNS and Service Discovery work in Docker?
Answer:
Docker has a built-in DNS server that enables containers to discover and communicate with each other by name instead of IP address.
How It Works
When you create a custom bridge network, Docker runs an embedded DNS server at 127.0.0.11. Every container on that network automatically registers its container name as a DNS hostname.
docker network create backend
docker run -d --name api --network backend myapi
docker run -d --name db --network backend postgres
# Inside the "api" container:
ping db # Resolves to the postgres container's IP ✅
curl http://db:5432 # Works by name ✅
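You can confirm the embedded DNS server from inside any container on that network (a quick check, assuming the containers above and an image that ships nslookup):
docker exec api cat /etc/resolv.conf   # nameserver 127.0.0.11 (Docker's embedded DNS)
docker exec api nslookup db            # resolves "db" to the postgres container's IP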
Default Bridge vs Custom Bridge
| Feature | Default Bridge | Custom Bridge |
|---|---|---|
| DNS resolution by name | ❌ No | ✅ Yes |
| Container isolation | Shared with all default containers | Isolated per network |
| Legacy --link needed? | Yes (deprecated) | No |
[!WARNING] The --link flag is deprecated. Always use custom bridge networks for container-to-container communication.
Network Aliases
You can give a container multiple DNS names using --network-alias:
docker run -d --name postgres-primary \
--network backend \
--network-alias db \
--network-alias database \
postgres
# Other containers can reach it via "postgres-primary", "db", OR "database"
Docker Compose — Automatic Service Discovery
In Docker Compose, each service name automatically becomes a DNS hostname on the shared network.
services:
api:
build: ./api
depends_on:
- db
db:
image: postgres:16
Inside the api container, db resolves to the Postgres container. No manual network configuration needed.
Round-Robin DNS (Load Balancing)
If multiple containers share the same network alias, Docker's DNS returns all their IPs in a round-robin fashion:
docker run -d --network backend --network-alias worker myworker
docker run -d --network backend --network-alias worker myworker
docker run -d --network backend --network-alias worker myworker
# Resolving "worker" returns all 3 IPs, rotating order each time
Q: What is the difference between Volumes, Bind Mounts, and tmpfs?
Answer:
Docker provides three mechanisms for persisting data or sharing files between the host and containers.
1. Volumes (Managed by Docker)
Volumes are the preferred mechanism for persisting data. Docker fully manages them — they're stored in a dedicated directory on the host (/var/lib/docker/volumes/) and are completely abstracted from the host filesystem.
# Create a named volume
docker volume create mydata
# Use it
docker run -v mydata:/app/data myapp
# or (more explicit long syntax):
docker run --mount type=volume,source=mydata,target=/app/data myapp
2. Bind Mounts (Host Path → Container Path)
A bind mount maps a specific file or directory on the host directly into the container. The host and container see the exact same files in real-time.
# Mount current directory into the container
docker run -v $(pwd):/app myapp
# or:
docker run --mount type=bind,source=$(pwd),target=/app myapp
3. tmpfs Mounts (In-Memory Only)
Data is stored in the host's RAM only. It is never written to disk and is lost when the container stops. Useful for sensitive data that should not persist.
docker run --tmpfs /app/secrets myapp
# or:
docker run --mount type=tmpfs,target=/app/secrets myapp
Comparison
| Feature | Volume | Bind Mount | tmpfs |
|---|---|---|---|
| Stored on | Docker-managed area on disk | Any host path | Host RAM |
| Managed by | Docker CLI (docker volume) | You (host filesystem) | Kernel |
| Portable | Yes (works across environments) | No (depends on host path) | No |
| Performance | Excellent | Excellent | Fastest (RAM) |
| Persists after stop | ✅ Yes | ✅ Yes (on host) | ❌ No |
| Pre-populated | ✅ Yes (from image) | ❌ No (overwrites) | ❌ No |
| Use case | Database storage, app data | Dev hot-reload, config files | Secrets, temp caches |
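The pre-population row is worth demonstrating: mounting an empty named volume over a directory that already contains files in the image copies those files into the volume, whereas a bind mount of an empty host directory hides them (a sketch using nginx's config directory as the example):
docker volume create web_conf
# The empty volume is seeded with the image's /etc/nginx contents on first mount
docker run --rm -v web_conf:/etc/nginx nginx ls /etc/nginx
# A bind mount of an empty host directory at the same path would hide those files instead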
When to Use What?
- Volumes: Production data (databases, uploads). Portable and manageable.
- Bind Mounts: Local development (mount source code for hot-reload).
- tmpfs: Storing sensitive info (tokens, keys) that should never hit disk.
[!IMPORTANT] Bind mounts can be dangerous in production because they give containers direct access to the host filesystem. A container running as root with a bind mount to / could access or modify any file on the host.
Q: How do you handle Data Persistence in Docker?
Answer:
Containers are ephemeral by default — all data written inside a container is lost when the container is removed. This is a key interview topic because production systems obviously need persistent data.
The Problem
docker run -d --name mydb postgres
# Write data to the database...
docker rm -f mydb
# 💀 All data is gone forever
Strategy 1: Named Volumes (Recommended)
Named volumes persist independently of container lifecycle. Even if the container is removed, the volume survives.
docker volume create postgres_data
docker run -d --name mydb \
-v postgres_data:/var/lib/postgresql/data \
postgres
# Remove the container
docker rm -f mydb
# Data is still safe in the volume!
docker run -d --name mydb-new \
-v postgres_data:/var/lib/postgresql/data \
postgres
# New container picks up right where the old one left off
Strategy 2: Docker Compose with Volumes
services:
db:
image: postgres:16
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: secret
volumes:
postgres_data: # Declared as a named volume
Strategy 3: Backup & Restore Volumes
# Backup a volume to a tar file
docker run --rm \
-v postgres_data:/data \
-v $(pwd):/backup \
alpine tar czf /backup/db-backup.tar.gz -C /data .
# Restore from backup
docker run --rm \
-v postgres_data:/data \
-v $(pwd):/backup \
alpine tar xzf /backup/db-backup.tar.gz -C /data
Strategy 4: External Storage Drivers
For production clusters, volume drivers allow Docker to store data on external systems:
- AWS EBS volumes
- NFS shares
- GlusterFS / Ceph distributed filesystems
docker volume create --driver rexray/ebs myvolume
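For plain NFS you don't even need a plugin; the built-in local driver can mount an NFS export (a sketch; the server address and export path are placeholders):
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.100,rw \
  --opt device=:/exports/appdata \
  nfs_data
docker run -d -v nfs_data:/app/data myapp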
[!TIP] In interviews, mentioning backup strategies and external volume drivers shows production-level thinking beyond basic docker run -v.
Q: What is Docker Compose and when would you use it?
Answer:
Docker Compose is a tool for defining and running multi-container applications using a single YAML configuration file (docker-compose.yml or compose.yaml). Instead of running multiple docker run commands with complex flags, you declare everything in one file and spin up the entire stack with a single command.
Without Compose (Painful)
docker network create myapp
docker volume create db_data
docker run -d --name db --network myapp -v db_data:/var/lib/postgresql/data \
-e POSTGRES_PASSWORD=secret postgres:16
docker run -d --name redis --network myapp redis:7
docker run -d --name api --network myapp -p 3000:3000 \
-e DATABASE_URL=postgres://db:5432 \
-e REDIS_URL=redis://redis:6379 myapi
With Compose (Clean)
# compose.yaml
services:
api:
build: ./api
ports:
- "3000:3000"
environment:
DATABASE_URL: postgres://db:5432/mydb
REDIS_URL: redis://redis:6379
depends_on:
- db
- redis
db:
image: postgres:16
volumes:
- db_data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: secret
redis:
image: redis:7-alpine
volumes:
db_data:
# Start everything
docker compose up -d
# View logs
docker compose logs -f api
# Stop and remove everything
docker compose down
# Stop, remove, AND delete volumes (nuclear option)
docker compose down -v
Key Features
- Automatic networking: All services in a compose.yaml automatically join a shared network and can reach each other by service name.
- Volume management: Volumes are declared and managed alongside services.
- Build integration: You can specify a build: context directly instead of pre-building images.
- Profiles: Group services into profiles for conditional startup (see the sketch after this list).
- Override files: Use compose.override.yaml for environment-specific config.
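A small profiles sketch (the service names are just examples):
services:
  api:
    build: ./api
  debug-shell:
    image: busybox
    command: sleep 3600
    profiles: ["debug"]
# docker compose up -d                    → starts only api
# docker compose --profile debug up -d    → also starts debug-shell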
When to Use Compose
- Local development: Spin up your full stack (API + DB + cache + queue) in one command.
- CI/CD: Run integration tests against real services.
- Single-host production: Small deployments that don't need Kubernetes.
[!NOTE] Docker Compose is not an orchestration tool. It runs containers on a single host. For multi-host orchestration, use Docker Swarm or Kubernetes.
Q: What is the difference between depends_on and health checks in Docker Compose?
Answer:
This is a subtle but extremely important question. depends_on controls startup order but does NOT wait for a service to be ready.
depends_on (Startup Order Only)
By default, depends_on only guarantees that the dependent container has started (i.e., docker run has been called). It does NOT wait for the application inside to be fully initialized and accepting connections.
services:
api:
build: ./api
depends_on:
- db # db container STARTS first, but may not be ready yet!
db:
image: postgres:16
The Problem: Postgres takes several seconds to initialize. Your API container starts immediately after the Postgres container starts, but the database isn't accepting connections yet. The API crashes with "connection refused."
Health Checks (Readiness Verification)
A health check defines a command that Docker runs periodically to determine if a container is actually healthy (i.e., the application inside is ready).
services:
db:
image: postgres:16
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 5
start_period: 10s
The Solution: depends_on + condition
Combine both to make a service wait until its dependency is truly healthy:
services:
api:
build: ./api
depends_on:
db:
condition: service_healthy # Wait until db passes health check
db:
image: postgres:16
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 5
start_period: 10s
Now the api service will not start until the db service passes its health check.
Health Check Conditions
| Condition | Meaning |
|---|---|
| service_started | Default. Container has started (same as basic depends_on). |
| service_healthy | Container's health check is passing. |
| service_completed_successfully | Container ran and exited with code 0 (for init/migration containers). |
[!TIP] The service_completed_successfully condition is perfect for running database migrations before starting the API:
migrate:
  image: myapp
  command: npm run migrate
api:
  depends_on:
    migrate:
      condition: service_completed_successfully
Q: How do you manage Environment Variables and Secrets in Docker?
Answer:
Environment variables are the primary way to configure containerized applications. However, sensitive data (passwords, API keys) requires special handling.
1. Inline Environment Variables
Pass variables directly in docker run:
docker run -e DATABASE_URL=postgres://localhost:5432/mydb myapp
In Compose:
services:
api:
environment:
- DATABASE_URL=postgres://db:5432/mydb
- NODE_ENV=production
2. .env Files
Store variables in a file and load them:
docker run --env-file .env myapp
In Compose, .env in the project root is automatically loaded for variable substitution:
# .env
POSTGRES_PASSWORD=supersecret
DB_PORT=5432
# compose.yaml
services:
db:
image: postgres:16
ports:
- "${DB_PORT}:5432"
environment:
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
3. env_file directive
Load variables from a specific file into the container's environment:
services:
api:
env_file:
- ./config/.env.production
[!CAUTION] Never commit .env files with real secrets to version control. Add them to .gitignore and .dockerignore.
4. Docker Secrets (Swarm Mode)
For true secret management, Docker Swarm provides encrypted secrets that are mounted as files inside containers (never exposed as environment variables or stored in image layers).
echo "supersecretpassword" | docker secret create db_password -
# Use in a Swarm service
docker service create \
--secret db_password \
--name myapp myimage
# Secret is available at /run/secrets/db_password inside the container
In Compose (with Swarm):
services:
api:
secrets:
- db_password
secrets:
db_password:
file: ./secrets/db_password.txt
Security Hierarchy (Least to Most Secure)
- ❌ Hardcoded in Dockerfile (ENV PASSWORD=secret) — visible in image layers.
- ⚠️ -e flag or environment: in Compose — visible in docker inspect.
- ✅ --env-file — secrets in a gitignored file, but still visible in docker inspect.
- ✅✅ Docker Secrets — encrypted at rest, mounted as tmpfs, not in docker inspect.
- ✅✅✅ External vault (HashiCorp Vault, AWS Secrets Manager) — most secure for production.
Q: Why should containers run as a non-root user?
Answer:
By default, the process inside a Docker container runs as root (UID 0). This is a significant security risk because if an attacker breaks out of the container (a container escape), they gain root access to the host machine.
The Risk
# ❌ Default: runs as root
FROM node:20-alpine
WORKDIR /app
COPY . .
CMD ["node", "index.js"]
# Inside the container: whoami → root
If someone exploits a vulnerability in your Node.js app, they have root-level access inside the container. Combined with a kernel exploit, this could mean root on the host.
The Fix: Create and Use a Non-Root User
FROM node:20-alpine
WORKDIR /app
# Create a non-root user and group
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Install deps as root (needs permissions)
COPY package*.json ./
RUN npm ci --only=production
# Copy app code
COPY . .
# Change ownership of app files to the non-root user
RUN chown -R appuser:appgroup /app
# Switch to non-root user for all subsequent commands
USER appuser
EXPOSE 3000
CMD ["node", "index.js"]
# Inside the container: whoami → appuser
Other Approaches
1. Use --user at runtime:
docker run --user 1000:1000 myapp
2. Use official base images that already set a non-root user:
Many official images (like node) include a pre-created user:
FROM node:20-alpine
USER node # Built-in non-root user
3. Read-only filesystem:
docker run --read-only --tmpfs /tmp myapp
This prevents any writes to the container filesystem, further limiting attack surface.
Additional Hardening
# Drop all Linux capabilities, add back only what's needed
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp
# Prevent privilege escalation
docker run --security-opt no-new-privileges myapp
[!TIP] In Kubernetes, you enforce this via securityContext in the Pod spec:
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
Q: How do you scan Docker images for vulnerabilities and minimize the attack surface?
Answer:
Container image security is a critical production concern. Vulnerabilities in base images, dependencies, or OS packages can be exploited.
1. Image Scanning Tools
Scan images for known CVEs (Common Vulnerabilities and Exposures):
# Docker Scout (built into Docker Desktop)
docker scout cves myimage:latest
# Trivy (open-source, most popular)
trivy image myimage:latest
# Snyk
snyk container test myimage:latest
# Grype (by Anchore)
grype myimage:latest
2. Minimizing the Attack Surface
Use minimal base images:
# ❌ Full OS (~900MB, thousands of packages)
FROM node:20
# ✅ Alpine (~130MB, minimal packages)
FROM node:20-alpine
# ✅✅ Distroless (~20MB, no shell, no package manager)
FROM gcr.io/distroless/nodejs20-debian12
Why Distroless?
Distroless images contain only the application runtime. There's no shell (/bin/sh), no package manager, no utilities. If an attacker gets inside the container, they can't run curl, wget, or even ls.
Multi-stage builds (see the Images chapter) are essential for keeping build tools out of production images.
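A hedged multi-stage sketch targeting a Distroless runtime (the distroless Node image already uses node as its entrypoint, so CMD lists only the script; paths are placeholders):
# Build stage: full tooling available
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Runtime stage: no shell, no package manager
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["dist/index.js"]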
3. Never Use latest Tag
# ❌ Bad: "latest" could change at any time
FROM node:latest
# ✅ Good: Pinned digest for reproducibility
FROM node:20.11-alpine3.18@sha256:abc123...
4. Scan in CI/CD Pipeline
# GitHub Actions example
- name: Scan image
uses: aquasecurity/trivy-action@master
with:
image-ref: myimage:${{ github.sha }}
severity: CRITICAL,HIGH
exit-code: 1 # Fail the build if critical vulnerabilities found
5. Don't Store Secrets in Images
# ❌ TERRIBLE: Secret is baked into a layer permanently
COPY .env /app/.env
ENV API_KEY=sk-abc123
# ✅ Pass secrets at runtime
docker run -e API_KEY=$API_KEY myimage
[!CAUTION] Even if you delete a secret in a later Dockerfile layer, it still exists in the earlier layer and can be extracted with docker history or by inspecting the image layers directly.
Checklist
- Use minimal base images (Alpine, Distroless, or Scratch)
- Run as non-root user
- Scan images in CI with Trivy or Scout
- Pin base image versions (avoid latest)
- Use multi-stage builds
- Drop unnecessary Linux capabilities
- Use read-only filesystem where possible
Q: How do Health Checks work in Docker?
Answer:
A health check is a command that Docker runs periodically inside a container to determine if the application is healthy and functioning correctly.
Defining Health Checks
In a Dockerfile:
FROM node:20-alpine
WORKDIR /app
COPY . .
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
CMD ["node", "index.js"]
In Docker Compose:
services:
api:
build: .
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
interval: 30s
timeout: 5s
start_period: 10s
retries: 3
Health Check Parameters
| Parameter | Default | Description |
|---|---|---|
| interval | 30s | Time between health checks |
| timeout | 30s | Max time to wait for the check to complete |
| start_period | 0s | Grace period for the container to initialize |
| retries | 3 | Number of consecutive failures before marking unhealthy |
Health Check States
| State | Meaning |
|---|---|
| starting | Container just launched, within start_period |
| healthy | Health check command exited with code 0 |
| unhealthy | Health check failed retries times consecutively |
# Check container health
docker inspect --format='{{.State.Health.Status}}' mycontainer
# Output: healthy
docker ps
# CONTAINER ID STATUS
# abc123 Up 5 min (healthy)
Common Health Check Commands
# HTTP endpoint check (requires curl in the image)
HEALTHCHECK CMD curl -f http://localhost:3000/health || exit 1
# TCP port check (no curl needed)
HEALTHCHECK CMD nc -z localhost 3000 || exit 1
# PostgreSQL readiness
HEALTHCHECK CMD pg_isready -U postgres || exit 1
# Redis ping
HEALTHCHECK CMD redis-cli ping || exit 1
[!TIP] For Alpine-based images that don't have curl, use wget:
HEALTHCHECK CMD wget --spider -q http://localhost:3000/health || exit 1
Or for images without any HTTP tools, use a simple Node.js script or a compiled binary health checker.
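For a Node.js image with no curl or wget at all, one option is to embed the check in node itself (a sketch; the port and /health path are assumptions):
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"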
Q: What are Docker Logging Best Practices?
Answer:
Proper logging is essential for debugging, monitoring, and auditing containerized applications.
The Golden Rule: Log to stdout/stderr
Docker captures everything written to the container's stdout and stderr streams. Applications should NOT write logs to files inside the container.
# ❌ Bad: Logs trapped inside the container filesystem
CMD ["node", "index.js", ">>", "/var/log/app.log"]
# ✅ Good: Logs go to stdout (Docker captures them)
CMD ["node", "index.js"]
Why stdout/stderr?
- docker logs only shows stdout/stderr output.
- Log drivers can only capture stdout/stderr.
- Files inside the container are lost when the container is removed.
- Centralized logging systems (ELK, Datadog, CloudWatch) integrate with Docker's log drivers, not container files.
Docker Log Drivers
Docker supports pluggable logging drivers that determine where container logs are sent:
# View current log driver
docker info --format '{{.LoggingDriver}}'
# Run a container with a specific driver
docker run --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 myapp
| Driver | Destination |
|---|---|
| json-file | Local JSON files (default) |
| syslog | Syslog daemon |
| fluentd | Fluentd collector |
| awslogs | AWS CloudWatch |
| gcplogs | Google Cloud Logging |
| splunk | Splunk HTTP Event Collector |
| none | Discard all logs |
Log Rotation (Critical!)
The default json-file driver has no size limit. Logs will grow until they fill the disk.
// /etc/docker/daemon.json
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
In Compose:
services:
api:
image: myapp
logging:
driver: json-file
options:
max-size: "10m"
max-file: "3"
[!CAUTION] Forgetting log rotation is one of the most common causes of production outages in Docker environments. A single chatty container can fill up the host's disk in hours.
Useful Commands
# View logs
docker logs mycontainer
# Follow logs (like tail -f)
docker logs -f mycontainer
# Show last 100 lines
docker logs --tail 100 mycontainer
# Show logs since a timestamp
docker logs --since 2024-01-01T00:00:00 mycontainer
Q: When would you use Docker (Compose/Swarm) vs Kubernetes?
Answer:
This is a high-level architecture question that interviewers use to gauge your understanding of container orchestration.
Docker Compose
- Scope: Single host only.
- Use case: Local development, CI/CD test environments, small single-server deployments.
- Complexity: Minimal. A single YAML file.
- Scaling: docker compose up --scale web=3 (basic, no load balancer).
- Networking: Automatic service discovery on the same host.
Docker Swarm
- Scope: Multi-host cluster (built into Docker Engine).
- Use case: Simple production setups, small teams that want orchestration without Kubernetes complexity.
- Features: Service discovery, load balancing, rolling updates, secrets management.
- Scaling: docker service scale web=10 (across multiple nodes).
- Learning curve: Low (if you know Docker, you know 80% of Swarm).
Kubernetes (K8s)
- Scope: Multi-host cluster (industry standard for container orchestration).
- Use case: Large-scale production, microservices, multi-team environments.
- Features: Everything Swarm has, plus: auto-scaling (HPA/VPA), self-healing, RBAC, custom resource definitions (CRDs), Ingress controllers, service mesh support, advanced scheduling.
- Scaling: Handles thousands of nodes and hundreds of thousands of pods.
- Learning curve: Steep. Requires understanding of Pods, Deployments, Services, ConfigMaps, etc.
Comparison Table
| Feature | Compose | Swarm | Kubernetes |
|---|---|---|---|
| Multi-host | ❌ | ✅ | ✅ |
| Auto-scaling | ❌ | ❌ | ✅ (HPA) |
| Self-healing | ❌ | ✅ (basic) | ✅ (advanced) |
| Rolling updates | ❌ | ✅ | ✅ |
| Load balancing | ❌ | ✅ (built-in) | ✅ (Service + Ingress) |
| Secrets | File-based | ✅ (encrypted) | ✅ (encrypted) |
| Community/Ecosystem | N/A | Declining | Dominant |
| Setup complexity | Minutes | Hours | Days |
When to Use What?
- Compose: You're developing locally or running a small app on a single server.
- Swarm: You need multi-host orchestration but want something simpler than Kubernetes. (Note: Swarm adoption is declining; most teams go straight to K8s.)
- Kubernetes: You need production-grade orchestration, auto-scaling, advanced networking, or you're operating at scale.
[!NOTE] In interviews, it's perfectly acceptable to say: "We used Docker Compose for local dev and Kubernetes for production." This shows practical understanding of using the right tool for the right environment.