5 minute read

Part of my 290-day AI Engineering journey - learning in public, one hour a day, writing everything in plain English so beginners can follow along. The blog is written with the help of AI

What Is This?

Most ML training happens on remote Linux servers with no graphical interface. No mouse, no icons, no desktop. Just a black terminal window and a blinking cursor. When your training job crashes at 2am, the terminal is your only way in.

Today I learned the commands every ML engineer uses daily - navigating filesystems, searching through logs, and automating repetitive tasks with bash scripts.


The Analogy

Your Mac’s graphical desktop is the “front of house” at a restaurant - nice, visual, designed for customers. The terminal is the kitchen. It’s less pretty, but it’s where the real work happens. Chefs work in the kitchen because it’s faster, more precise, and most importantly: you can automate it.

You can’t write a script to click icons. You can write a script to process a thousand log files.


The Concept Explained Simply

What is a shell?

The shell is the program that reads what you type and runs it. On Mac, it’s zsh. On most Linux servers, it’s bash. They’re 95% identical for everyday use.

What is a pipe?

A pipe (|) chains two commands together. The output of the first becomes the input of the second.

cat log.txt | grep "ERROR"   # read log.txt, then filter for lines containing "ERROR"
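Pipes aren't limited to two commands - you can chain as many as you like, each stage filtering the output of the one before it. A small sketch, assuming a log.txt in the current directory:

```shell
# three stages: find ERROR lines, keep only the ones mentioning "NaN", count them
grep "ERROR" log.txt | grep "NaN" | wc -l
```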

What is redirection?

> saves output to a file, overwriting whatever was there. >> appends to the end of the file instead.

grep "ERROR" log.txt > errors.txt    # save error lines to a new file
grep "WARNING" log.txt >> errors.txt  # add warning lines to the same file
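One more redirection worth knowing: commands actually have two output streams - stdout (normal output, stream 1) and stderr (error messages, stream 2). Plain > only captures stdout, which is why errors still show up on screen when you save output to a file. A sketch (missing_dir is made up here just to force an error):

```shell
ls data/ missing_dir > listing.txt 2> errors.txt   # stdout and stderr to separate files
ls data/ missing_dir > all.txt 2>&1                # both streams into one file
```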

What is a bash script?

A text file containing a list of commands. You run it once, and it executes all the commands in sequence. This is how ML engineers automate jobs that would otherwise take them 30 minutes of typing every time.
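For example, here's a whole script in three working lines - it previews the end of every .txt log in a folder (a minimal sketch, assuming a data/ folder of .txt files):

```shell
#!/usr/bin/env bash
# preview_logs.sh - print the last 3 lines of every .txt file in data/
for f in data/*.txt; do
    echo "=== $f ==="
    tail -3 "$f"
done
```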


How Real Companies Use This

When a training job finishes at NeuralCorp, an automated bash script archives the log files, checks for errors, and sends a Slack alert. The engineer just sees “training complete” in their phone notification - they never had to manually check anything.

Understanding the terminal is what lets you build that kind of automation.
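A stripped-down sketch of what that kind of hook could look like - the log path is the practice file created later in this post, and the echo at the end is a stand-in for a real notification call (a production version would, say, curl a Slack webhook instead):

```shell
#!/usr/bin/env bash
# post_training.sh - sketch: check a finished run's log and report the result
set -e

LOG_FILE="data/sample_log.txt"   # the practice log from this post

if grep -q "ERROR" "$LOG_FILE"; then
    STATUS="training finished with errors - check $LOG_FILE"
else
    STATUS="training complete"
fi

# stand-in for a real alert (e.g. a curl call to a Slack webhook)
echo "Notify: $STATUS"
```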


Step-by-Step: Try It Yourself

1. Navigation

pwd                     # where am I right now?
ls -la                  # list everything including hidden files
cd ~/neuralcorp-setup   # go to a specific folder (Tab autocompletes!)
cd ..                   # go up one level
cd -                    # go back to where you just were

2. Create a practice log file

mkdir ~/neuralcorp-terminal-practice
mkdir -p ~/neuralcorp-terminal-practice/data
mkdir -p ~/neuralcorp-terminal-practice/scripts
cd ~/neuralcorp-terminal-practice

cat > data/sample_log.txt << 'EOF'
2026-03-22 10:00:01 | INFO  | Training started | model=baseline_v1
2026-03-22 10:00:02 | INFO  | Epoch 1/10 | loss=1.0000 | acc=0.45
2026-03-22 10:00:03 | INFO  | Epoch 2/10 | loss=0.8200 | acc=0.58
2026-03-22 10:00:04 | ERROR | NaN loss detected at epoch 3 - stopping early
2026-03-22 10:01:00 | INFO  | Epoch 5/10 | loss=0.5300 | acc=0.72
2026-03-22 10:03:00 | WARNING | GPU memory at 92% - consider reducing batch size
2026-03-22 10:07:01 | INFO  | Training complete | best_acc=0.85
EOF

3. Search with grep

grep "ERROR" data/sample_log.txt        # find error lines
grep -n "WARNING" data/sample_log.txt   # show line numbers too
grep -c "INFO" data/sample_log.txt      # count INFO lines (returns: 5)
grep -v "INFO" data/sample_log.txt      # show everything EXCEPT INFO lines

# Pipe two commands together
grep -E "ERROR|WARNING" data/sample_log.txt | wc -l   # count errors + warnings

What you’ll see for the last command: 2
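grep can also extract just the matching part of a line with -o. Combined with cut, that pulls clean numbers out of the practice log - handy for a quick look at how the loss moved:

```shell
grep -o "loss=[0-9.]*" data/sample_log.txt                 # just the loss=... fragments
grep -o "loss=[0-9.]*" data/sample_log.txt | cut -d= -f2   # just the numbers
```

What you'll see for the second command: 1.0000, 0.8200, 0.5300 - one per line.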

4. Read files

cat data/sample_log.txt       # print entire file
head -3 data/sample_log.txt   # first 3 lines only (quick preview)
tail -3 data/sample_log.txt   # last 3 lines (most recent log entries)
tail -f data/sample_log.txt   # live follow - watch a file grow in real-time
                              # (Ctrl+C to stop - essential for monitoring live training)
wc -l data/sample_log.txt     # count lines
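These commands combine nicely with pipes. One pattern worth knowing: count how many lines each log level has, by cutting out the level field (field 2, split on |), sorting it, and counting duplicates with uniq -c:

```shell
cut -d'|' -f2 data/sample_log.txt | sort | uniq -c   # lines per log level
```

On the practice log this reports 1 ERROR, 5 INFO, and 1 WARNING.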

5. Write a bash script

Create scripts/cleanup.sh:

#!/usr/bin/env bash
# cleanup.sh - Archives old log files after a training run
#
# Usage: bash scripts/cleanup.sh <log_directory>
# Example: bash scripts/cleanup.sh data/

set -e   # stop immediately if any command fails (safety - always include this)

# Check that the user passed a directory argument
if [ "$#" -ne 1 ]; then
    echo "Usage: bash scripts/cleanup.sh <log_directory>" >&2
    exit 1   # exit code 1 = failure
fi

LOG_DIR="$1"   # named variable - much clearer than using $1 everywhere

# Verify the directory exists
if [ ! -d "$LOG_DIR" ]; then
    echo "Error: '$LOG_DIR' is not a directory." >&2
    exit 1
fi

# Create a timestamped archive folder
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
ARCHIVE_DIR="${LOG_DIR}/archived_${TIMESTAMP}"
mkdir -p "$ARCHIVE_DIR"
echo "Created: $ARCHIVE_DIR"

# Move all .txt files into the archive
LOG_COUNT=0
for log_file in "$LOG_DIR"/*.txt; do
    [ -f "$log_file" ] || continue   # the glob stays literal if nothing matched - skip it
    mv "$log_file" "$ARCHIVE_DIR/"
    LOG_COUNT=$((LOG_COUNT + 1))
    echo "  Archived: $(basename "$log_file")"
done

echo ""
echo "Done. Archived $LOG_COUNT log file(s)."

Run it:

chmod +x scripts/cleanup.sh          # make it executable (lets you run it as ./scripts/cleanup.sh)
bash scripts/cleanup.sh data/        # run it (invoking via bash works even without chmod)

What you’ll see:

Created: data/archived_20260322_100530
  Archived: sample_log.txt

Done. Archived 1 log file(s).

Try it with no arguments:

bash scripts/cleanup.sh
# Usage: bash scripts/cleanup.sh <log_directory>
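There's one more thing happening there that you can't see: the script exits with code 1 (from the exit 1 line). The shell stores the last command's exit code in the special variable $?, and the convention of 0 = success / non-zero = failure is what lets scripts and CI pipelines chain steps safely:

```shell
bash scripts/cleanup.sh    # prints the Usage line
echo $?                    # 1 - the exit code set by "exit 1"
```

This is also why `script_a && script_b` only runs script_b if script_a exited with 0.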

Common Mistakes

❌ permission denied when running script
✅ chmod +x scripts/cleanup.sh  (you forgot to make it executable)

❌ grep returns nothing (but you expect results)
✅ Check case - grep "error" won't match "ERROR". Use grep -i for case-insensitive search.

❌ Script keeps running after a failed command
✅ Add set -e at the top - without it, bash ignores errors and keeps going

✅ Today’s One-Sentence Lesson

The terminal isn’t an intimidating black box - it’s a precise, automatable tool that lets you work 10x faster and debug problems you could never find through a graphical interface.


🔗 Up Next

Day 4: Google Colab & Kaggle - how to access free GPUs and TPUs in the cloud, and why even senior ML engineers at big companies use them for experiments.


Tags: #Linux #Terminal #BashScript #AIEngineering #MLOps #Beginners #CommandLine