#Debugging & Troubleshooting
After this lesson you'll know:
- how to analyze container problems systematically
- why a container won't start: the 3-step diagnosis
- how to use docker diff, cp and exec for deeper analysis
- how to read exit codes and track down resource issues
#Container won't start? The 3-Step Diagnosis
Most container problems can be solved with these three steps. Don't guess, check.
# Step 1: Logs (they tell you the error message)
docker logs container-name
docker logs --tail 50 --timestamps container-name
# Step 2: Status (is it even running?)
docker ps -a | grep container-name
# Exit code ≠ 0 means: crashed (or killed)
# Step 3: Inspect (the whole truth)
docker inspect container-name | jq '.[0].State'   # requires jq
# → Status, ExitCode, Error, StartedAt, FinishedAt
To avoid endless scrolling, pass a Go template via --format and pick out only the field you need: `docker inspect --format='{{.State.ExitCode}}' container-name`.
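If you run these three steps often, you can bundle them into one shell function. A minimal sketch: `diagnose` is a made-up helper name, not a Docker command, and it assumes the container name is unique, because `docker ps --filter name=...` also matches parts of names (`app` matches `my-app`).

```shell
# Hypothetical helper (not part of Docker): the 3-step diagnosis in one call.
# Usage: diagnose container-name
diagnose() {
  name="$1"

  echo "== Step 1: Logs (last 50 lines) =="
  # The container's stderr arrives on stderr, so merge it into stdout
  docker logs --tail 50 --timestamps "$name" 2>&1

  echo "== Step 2: Status =="
  # --filter name= matches substrings of container names
  docker ps -a --filter "name=$name" --format '{{.Names}}: {{.Status}}'

  echo "== Step 3: State =="
  docker inspect --format 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' "$name"
}
```

Usage: `diagnose my-app`. Step 3 uses a Go template instead of jq, so the sketch works without extra tools.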
#Understanding Exit Codes
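Every container that stops leaves an exit code behind, and the code tells you what happened. Codes above 128 mean the process was killed by a signal (exit code = 128 + signal number):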
| Code | Meaning | Most Likely Cause |
|---|---|---|
| 0 | Success | Everything OK, the command completed |
| 1 | General error | App crashed (exception, panic) |
| 127 | Command not found | ENTRYPOINT/CMD doesn't exist or has a wrong path |
| 137 | SIGKILL (128 + 9) | OOM-killed (memory limit too low), docker kill, or docker stop after its timeout |
| 139 | SIGSEGV (128 + 11) | Segmentation fault: invalid memory access, e.g. memory corruption |
| 143 | SIGTERM (128 + 15) | Graceful shutdown requested, e.g. by docker stop |
# Check exit code of a stopped container
docker inspect --format='{{.State.ExitCode}}' container-name
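The 128 + N rule isn't Docker magic: every shell reports 128 + the signal number when a process is killed by a signal, so you can watch it happen without Docker. Below is a small sketch that decodes a code along the lines of the table; `explain_exit` is a hypothetical helper name, and `kill -l` is the shell builtin that turns a signal number into its name.

```shell
# Hypothetical helper: decode an exit code following the table above.
explain_exit() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "0: success"
  elif [ "$code" -eq 127 ]; then
    echo "127: command not found"
  elif [ "$code" -gt 128 ]; then
    # 128 + N: killed by signal N; kill -l turns the number into a name
    sig=$((code - 128))
    echo "$code: killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "$code: application error"
  fi
}

# No Docker needed: a shell that kills itself with SIGKILL exits with 128 + 9
status=0
sh -c 'kill -9 $$' || status=$?
explain_exit "$status"   # prints: 137: killed by signal 9 (SIGKILL)
```

For a stopped container, pass in the real code: `explain_exit "$(docker inspect --format='{{.State.ExitCode}}' container-name)"`.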
#docker diff: what changed?
docker diff shows which files a container has added, changed, or deleted compared to its image. Perfect for checking whether a setup script actually worked.
# Shows changes in the container
# A = Added, C = Changed, D = Deleted
docker diff container-name
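# Example: Did the config really end up in the nginx container?
docker diff my-nginx | grep /etc/nginx
# A /etc/nginx/conf.d/default.conf → yes, it was created
# Great for: "Is my volume actually mounted?" Writes to volumes don't show up here,
# so if the DB's data directory appears in the diff, the data lands in the container layer
docker diff db-container | head -20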
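Since docker diff prints one line per path in the form `<A|C|D> <path>`, its output is easy to post-process. Below is a sketch with two hypothetical helpers: `diff_summary` counts changes by type, and `check_volume_path` turns the volume question into a yes/no answer. It only looks at paths below the given directory, on the assumption that the mount point directory itself may still appear in the diff.

```shell
# Hypothetical helper: count docker diff entries by type (A/C/D).
diff_summary() {
  docker diff "$1" | awk '
    { count[$1]++ }
    END { printf "added=%d changed=%d deleted=%d\n", count["A"], count["C"], count["D"] }'
}

# Hypothetical helper: warn if files below a path that should live on a
# volume were written into the container layer instead.
# Usage: check_volume_path db-container /var/lib/postgresql/data
check_volume_path() {
  container="$1"
  path="$2"
  if docker diff "$container" |
    awk -v p="$path/" 'index($2, p) == 1 { found = 1 } END { exit !found }'; then
    echo "WARNING: files below $path are in the container layer (volume not mounted?)"
  else
    echo "OK: no writes below $path in the container layer"
  fi
}
```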
#docker cp: copy files out of a container
Container already crashed, so docker exec no longer works? No problem: as long as the container exists (even when stopped), you can still copy files out of it.
# From container to host (works with stopped containers too!)
docker cp container-name:/app/logs/app.log ./app.log
# From host to container
docker cp ./config.yml container-name:/app/config.yml
# Entire directories (this creates ./log on the host)
docker cp container-name:/var/log/ ./
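Two related tricks, sketched as shell functions with made-up names (`show_file`, `copy_from_image`). With `-` as the destination, docker cp writes a tar archive to stdout, so you can read a single file without saving it. And docker create creates a container from an image without starting it, which lets you copy files out of an image whose entrypoint crashes immediately (assuming the image defines an ENTRYPOINT or CMD, since docker create needs a command).

```shell
# Hypothetical helper: print one file from a (possibly stopped) container.
# "docker cp SRC -" streams a tar archive; tar -xOf - extracts it to stdout.
show_file() {
  docker cp "$1:$2" - | tar -xOf -
}

# Hypothetical helper: copy a file out of an image without running it.
# docker create builds a container but never starts it.
# Usage: copy_from_image my-app:latest /app/config.yml ./config.yml
copy_from_image() {
  image="$1"
  src="$2"
  dest="$3"
  cid=$(docker create "$image") || return 1
  docker cp "$cid:$src" "$dest"
  status=$?
  docker rm "$cid" >/dev/null   # clean up the never-started container
  return "$status"
}
```

Usage: `show_file my-nginx /etc/nginx/nginx.conf | less`.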
#Override Entrypoint: the debug trump card
Override the entrypoint and start a shell, then investigate at your leisure.
# Start container and open a shell (overrides CMD/ENTRYPOINT)
docker run -it --rm --entrypoint sh alpine
# With your broken app image (only works if the image contains a shell)
docker run -it --rm --entrypoint sh my-app:latest
# Now you can walk around the container and check:
# - Is a file missing? ls -la /app
# - Are permissions correct? stat /app
# - Does the runtime work? node --version
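Before overriding anything, it helps to know what the image would run by default. A sketch with two hypothetical helpers: `show_startup` reads the image's `Config.Entrypoint` and `Config.Cmd` via docker image inspect, and `check_image` runs the checks from the list above without an interactive shell. Everything after the image name is handed to the overridden entrypoint, so it becomes `sh -c '...'`. As above, this assumes the image contains `sh` (and Node.js for `node --version`).

```shell
# Hypothetical helper: show the image's default ENTRYPOINT and CMD.
# Usage: show_startup my-app:latest
show_startup() {
  docker image inspect --format 'Entrypoint: {{json .Config.Entrypoint}}  Cmd: {{json .Config.Cmd}}' "$1"
}

# Hypothetical helper: run the checks from above without an interactive shell.
# Arguments after the image name become the arguments of the entrypoint (sh).
# Usage: check_image my-app:latest
check_image() {
  docker run --rm --entrypoint sh "$1" -c 'ls -la /app; stat /app; node --version'
}
```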
#Debugging Resource Issues
When a container suddenly disappears, the memory limit is often the culprit. Dead containers no longer show up in docker ps, but docker ps -a still lists them.
# Was the container OOM-killed?
docker inspect --format='{{.State.OOMKilled}}' container-name
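# → true = the memory limit was too tight
# CPU limit too low? Container is running but slow?
docker stats container-name --no-stream
# → CPU % stuck at the configured limit (e.g. ~100% with --cpus=1)? Raise the limit.
# Disk full? Then Docker can't pull new images
# docker system df shows how much space images, containers, volumes and build cache use
docker system df
# If RECLAIMABLE is high → prune (careful: this also removes stopped containers you may still want to debug)
docker system prune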
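To check all containers at once instead of one by one, feed `docker ps -aq` into docker inspect. A sketch with two hypothetical helpers: `list_oom_killed` prints every container whose `.State.OOMKilled` flag is set, and `top_memory` ranks running containers by memory usage using the docker stats format placeholders `{{.MemPerc}}` and `{{.Name}}`.

```shell
# Hypothetical helper: list all containers (running or stopped) that were OOM-killed.
list_oom_killed() {
  ids=$(docker ps -aq)
  [ -n "$ids" ] || return 0   # no containers at all
  # $ids is intentionally unquoted: one argument per container ID
  docker inspect --format '{{.Name}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' $ids |
    grep 'oom=true' || true
}

# Hypothetical helper: the 5 running containers with the highest MEM %, highest first.
top_memory() {
  docker stats --no-stream --format '{{.MemPerc}} {{.Name}}' | sort -rn | head -5
}
```

MEM % is usage relative to the container's memory limit (or to host memory if no limit is set), so the top entries are the containers closest to an OOM kill.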
#Port Conflicts & Networks
# Error: "port is already allocated"
# Find out what's using it
lsof -i :8080
docker ps | grep 8080 # Maybe another container?
# Alternative: use a different port
docker run -p 8081:80 nginx
# DNS issues in the container?
docker run --rm alpine ping -c 1 google.com # Internet access?
docker run --rm --dns 8.8.8.8 alpine ping -c 1 google.com # Set DNS manually
# Test container-to-container communication
# (container names only resolve on user-defined networks, not on the default bridge)
docker network create testnet
docker run -d --name server --network testnet alpine sleep 1000
docker run -it --rm --network testnet alpine sh
# In the shell: ping -c 1 server
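The interactive test above can also run as one non-interactive check that cleans up after itself. A sketch: `test_container_dns` is a hypothetical helper name, and it reuses the `server` container name from the example, so it fails if a container with that name already exists.

```shell
# Hypothetical helper: non-interactive version of the container-to-container
# test above; creates the network and "server", pings it by name, cleans up.
# Usage: test_container_dns testnet
test_container_dns() {
  net="${1:-testnet}"
  docker network create "$net" >/dev/null || return 1
  docker run -d --name server --network "$net" alpine sleep 1000 >/dev/null
  if docker run --rm --network "$net" alpine ping -c 1 server >/dev/null; then
    echo "OK: 'server' is reachable by name on $net"
    result=0
  else
    echo "FAIL: 'server' is not reachable on $net"
    result=1
  fi
  docker rm -f server >/dev/null       # clean up even if the ping failed
  docker network rm "$net" >/dev/null
  return "$result"
}
```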
#Debugging Healthchecks
Test the healthcheck command manually inside the container, before blaming Docker.
# 1. Enter the container
docker exec -it container-name sh
# 2. Run the exact healthcheck command
# The command must exit with 0 to be healthy
curl -f http://localhost:3000/health
echo $? # 0 = healthy, 1+ = unhealthy (127 = curl isn't installed in the image at all)
# 3. Leave the container and check the health status from the host
docker inspect --format='{{json .State.Health}}' container-name
# → Status: healthy / unhealthy / starting
# → Log: shows recent checks with exit codes
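After a deploy you often want to wait until the healthcheck passes instead of checking by hand. A sketch with a hypothetical `wait_healthy` helper that polls `.State.Health.Status`; the defaults (give up after 60 s, poll every 2 s) are arbitrary choices. It assumes the image or docker run defines a healthcheck: without one there is no health status to read, docker inspect fails, and the function returns 1.

```shell
# Hypothetical helper: wait until a container's healthcheck reports "healthy".
# Usage: wait_healthy container-name [timeout_seconds] [interval_seconds]
wait_healthy() {
  name="$1"
  timeout="${2:-60}"
  interval="${3:-2}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Fails if the container has no healthcheck (State.Health is missing)
    health=$(docker inspect --format '{{.State.Health.Status}}' "$name") || return 1
    case "$health" in
      healthy)   echo "$name is healthy after ${elapsed}s"; return 0 ;;
      unhealthy) echo "$name is unhealthy"; return 1 ;;
    esac
    sleep "$interval"   # still "starting": try again
    elapsed=$((elapsed + interval))
  done
  echo "$name: timed out after ${timeout}s (last status: $health)"
  return 1
}
```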
#Try it out
- Start a container that crashes immediately (`docker run alpine invalid-command`). Check its exit code with `docker inspect`: what do you see? (Hint: 127)
- Start a container whose memory use keeps growing under a limit: `docker run --name oom-test --memory=64m alpine sh -c 'x=a; while true; do x="$x$x"; done'`. Check with `docker inspect --format='{{.State.OOMKilled}}' oom-test` whether it got killed
- Start an Nginx container, copy its config out with `docker cp` and look at it: `docker cp container-name:/etc/nginx/nginx.conf ./`
- Start an image with an overridden entrypoint: `docker run -it --rm --entrypoint sh alpine`. What tools are available inside? (`which curl`, `which ping`)
#Summary
- docker logs + docker ps -a + docker inspect = the 3-step diagnosis for startup issues
- Exit codes tell you WHAT happened: 1 = app error, 127 = command missing, 137 = SIGKILL (usually an OOM kill)
- docker diff shows file system changes โ great for 'did the setup work?' questions
- docker cp works even with stopped containers
- --entrypoint sh overrides the startup command โ perfect for exploring an image
- OOM kills are one of the most frequent reasons for containers that suddenly disappear