
#Debugging & Troubleshooting

After this lesson you'll know:

  • how to systematically analyze container problems
  • why a container won't start: the 3-step diagnosis
  • docker diff, cp and exec for deep analysis
  • how to read exit codes and spot resource issues

#Container won't start? The 3-Step Diagnosis

โš ๏ธ Stay calm, go systematic

Most container problems can be solved with these three steps. Don't guess โ€” check.

# Step 1: Logs (they tell you the error message)
docker logs container-name
docker logs --tail 50 --timestamps container-name

# Step 2: Status (is it even running?)
docker ps -a | grep container-name
# Exit code ≠ 0 means: the main process failed or was killed

# Step 3: Inspect (the whole truth)
docker inspect container-name | jq '.[0].State'
# → Status, ExitCode, Error, StartedAt, FinishedAt

💡 docker inspect is your secret weapon

Pass a Go template with --format to avoid endless scrolling: docker inspect --format='{{.State.ExitCode}}' container-name.
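The three steps above can be bundled into one shell function. This is a minimal sketch, not an official Docker tool: the function name `diagnose` is hypothetical, and it assumes the `docker` CLI is on the PATH and that the container name matches exactly one container.

```shell
# Hypothetical helper: runs the three diagnosis steps for one container.
diagnose() {
    name="${1:-}"
    if [ -z "$name" ]; then
        echo "usage: diagnose <container-name>" >&2
        return 2
    fi

    echo "== Step 1: last 50 log lines =="
    docker logs --tail 50 --timestamps "$name" 2>&1

    echo "== Step 2: status =="
    docker ps -a --filter "name=$name" --format '{{.Names}}: {{.Status}}'

    echo "== Step 3: state =="
    docker inspect \
        --format 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' \
        "$name"
}
```

Call it as `diagnose container-name`. Using --format in step 3 instead of jq keeps the sketch free of extra dependencies.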

#Understanding Exit Codes

A container ends with an exit code. The code tells you what happened:

| Code | Meaning | Most Likely Cause |
|------|---------|-------------------|
| 0 | Success | Everything OK, command completed |
| 1 | General error | App crashed (exception, panic) |
| 127 | Command not found | entrypoint/cmd doesn't exist, wrong path |
| 137 | SIGKILL (128 + 9) | OOM-killed (memory limit too low) or docker kill |
| 139 | SIGSEGV (128 + 11) | Segmentation fault (memory corruption) |
| 143 | SIGTERM (128 + 15) | Docker requested graceful shutdown |

# Check exit code of a stopped container
docker inspect --format='{{.State.ExitCode}}' container-name
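
The "128 + signal number" rule from the table can be sketched as a small shell function. `explain_exit_code` is a hypothetical helper name, and it only knows the codes listed in the table; anything else is reported generically.

```shell
# Hypothetical helper: translate a container exit code into plain words.
explain_exit_code() {
    code="${1:-}"
    case "$code" in
        ''|*[!0-9]*) echo "not a number: $code" >&2; return 2 ;;
    esac

    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -eq 127 ]; then
        echo "command not found"
    elif [ "$code" -gt 128 ]; then
        # Codes above 128 mean "killed by signal (code - 128)".
        sig=$((code - 128))
        case "$sig" in
            9)  echo "killed by SIGKILL (signal 9)" ;;
            11) echo "killed by SIGSEGV (signal 11)" ;;
            15) echo "killed by SIGTERM (signal 15)" ;;
            *)  echo "killed by signal $sig" ;;
        esac
    else
        echo "application error (exit code $code)"
    fi
}
```

A possible use: explain_exit_code "$(docker inspect --format='{{.State.ExitCode}}' container-name)".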

#docker diff: what changed?

💡 Make changes visible

docker diff shows what files a container modified, added, or deleted compared to the image. Perfect for checking if a setup script actually worked.

# Shows changes in the container
# A = Added, C = Changed, D = Deleted
docker diff container-name

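# Example: was the nginx config written inside the container?
docker diff my-nginx | grep conf.d
# A /etc/nginx/conf.d/default.conf → yes, it was created
# (diff only proves the file exists; it cannot tell whether nginx loaded it)

# Great for: "Is my data really going to a volume?"
# Volume contents never appear in docker diff, so data paths listed here
# live in the container's writable layer and are lost with the container.
docker diff db-container | head -20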
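
For containers with long diffs, counting the A/C/D markers gives a quick overview before reading individual paths. The sketch below assumes the usual `docker diff` output of one marker and one path per line; `summarize_diff` is a hypothetical name, and it reads the diff from standard input so it can be tried without a running container.

```shell
# Hypothetical helper: count added, changed and deleted paths
# from `docker diff` output supplied on stdin.
summarize_diff() {
    awk '
        $1 == "A" { added++ }
        $1 == "C" { changed++ }
        $1 == "D" { deleted++ }
        END { printf "added=%d changed=%d deleted=%d\n", added, changed, deleted }
    '
}
```

Usage: docker diff container-name | summarize_diff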

#docker cp: copy files out of a container

⚠️ Container stopped?

No problem: you can still copy files as long as the container exists (even when stopped).

# From container to host (works with stopped containers too!)
docker cp container-name:/app/logs/app.log ./app.log

# From host to container
docker cp ./config.yml container-name:/app/config.yml

# Entire directories
docker cp container-name:/var/log/ ./

#Override Entrypoint: the debug trump card

⚠️ Container crashes immediately?

Override the entrypoint and start a shell, then investigate at your leisure.

# Start container and open a shell (replaces ENTRYPOINT; the image's CMD is ignored)
docker run -it --rm --entrypoint sh alpine

# With your broken app image
docker run -it --rm --entrypoint sh my-app:latest
# Now you can walk around the container and check:
# - Is a file missing? ls -la /app
# - Are permissions correct? stat /app
# - Does the runtime work? node --version

#Debugging Resource Issues

โš ๏ธ Memory limit too low โ†’ OOM Kill

When a container suddenly disappears, the memory limit is often the culprit. Containers are dead, not visible in docker ps โ€” docker ps -a shows them.

# Was the container OOM-killed?
docker inspect --format='{{.State.OOMKilled}}' container-name
# → true = the kernel killed it for exceeding its memory limit

# CPU limit too low? Container is running but slow?
docker stats container-name --no-stream
# → CPU% pinned at the configured limit (100% = one full core)? Increase the limit.

# Disk full? Docker can't pull new images
# docker system df shows the status
docker system df
# If RECLAIMABLE is high → clean up with docker system prune
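
Exit code 137 alone does not say who sent the SIGKILL, so it helps to read it together with the OOMKilled flag. The function below is a sketch of that reasoning; `classify_stop` is a hypothetical name, and its two arguments are expected to be the values of .State.ExitCode and .State.OOMKilled.

```shell
# Hypothetical helper: combine ExitCode and OOMKilled into one verdict.
classify_stop() {
    exit_code="${1:-}"
    oom_killed="${2:-}"
    if [ "$oom_killed" = "true" ]; then
        echo "OOM kill: raise --memory or reduce the app's memory use"
    elif [ "$exit_code" = "137" ]; then
        echo "SIGKILL without OOM: docker kill, or docker stop after the grace period"
    else
        echo "exit code $exit_code: not stopped by SIGKILL"
    fi
}
```

A possible call, relying on word splitting of the two values: classify_stop $(docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' container-name)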

#Port Conflicts & Networks

# Error: "port is already allocated"
# Find out what's using it
lsof -i :8080
docker ps | grep 8080  # Maybe another container?

# Alternative: use a different port
docker run -p 8081:80 nginx

# DNS issues in the container?
docker run --rm alpine ping -c 1 google.com  # Internet access and DNS working?
docker run --rm --dns 8.8.8.8 alpine ping -c 1 google.com  # Set DNS manually

# Test container-to-container communication
docker network create testnet
docker run -d --name server --network testnet alpine sleep 1000
docker run -it --rm --network testnet alpine sh
# In the shell: ping server

#Debugging Healthchecks

💡 Healthcheck failing and you don't know why?

Test the healthcheck command manually inside the container before blaming Docker.

# 1. Enter the container
docker exec -it container-name sh

# 2. Run the exact healthcheck command
# The command must exit with 0 to be healthy
curl -f http://localhost:3000/health
echo $?  # 0 = healthy, 1+ = unhealthy

# 3. Check health status
docker inspect --format='{{json .State.Health}}' container-name
# → Status: healthy / unhealthy / starting
# → Log: shows recent checks with exit codes
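
While a container is still in the "starting" state, polling the health status saves re-running docker inspect by hand. This is a rough sketch under a few assumptions: the container defines a healthcheck, `wait_healthy` is a hypothetical name, and the default timeout of 30 seconds is arbitrary.

```shell
# Hypothetical helper: poll .State.Health.Status once per second
# until the container is healthy, unhealthy, or the timeout expires.
wait_healthy() {
    name="${1:-}"
    timeout="${2:-30}"
    elapsed=0
    status=""
    while [ "$elapsed" -lt "$timeout" ]; do
        status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)" || status=""
        case "$status" in
            healthy)   echo "healthy after ${elapsed}s"; return 0 ;;
            unhealthy) echo "unhealthy after ${elapsed}s" >&2; return 1 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out after ${timeout}s (last status: ${status:-unknown})" >&2
    return 1
}
```

Usage: wait_healthy container-name 60 && echo "ready". A non-zero return means the container reported unhealthy or never left "starting" within the timeout.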

#✋ Try it out

  • Start a container that crashes immediately (docker run --name crash-test alpine invalid-command). Check the exit code with docker inspect crash-test: what do you see? (Hint: 127)
  • Start a container that is bound to exceed its memory limit: docker run --name oom-test --memory=64m alpine dd if=/dev/zero of=/dev/null bs=512M (dd tries to fill a 512 MB buffer). Check with docker inspect --format='{{.State.OOMKilled}}' oom-test whether it got killed
  • Start an Nginx container, copy the config out with docker cp and look at it: docker cp container-name:/etc/nginx/nginx.conf ./
  • Start an image with overridden entrypoint: docker run -it --rm --entrypoint sh alpine. What tools are available inside? (which curl, which ping)

#📌 Summary

  • docker logs + docker ps -a + docker inspect = the 3-step diagnosis for startup issues
  • Exit codes tell you WHAT happened: 1 = app error, 127 = command missing, 137 = SIGKILL (often an OOM kill)
  • docker diff shows file system changes, which is great for "did the setup work?" questions
  • docker cp works even with stopped containers
  • --entrypoint sh overrides the startup command, which is perfect for exploring an image
  • An OOM kill is a common reason for containers that suddenly disappear