Learn the Unix shell, one command at a time.
The command line is how you'll drive HPC systems like Klone, run bioinformatics tools, and string small programs together into powerful pipelines. This short, hands-on tutorial takes you from "what is a terminal?" to building your first pipe, with no prior experience needed.
Set up & how to use this site
Don't just read: type every command into your terminal as you go. The shell rewards practice, and muscle memory matters more than memorizing. Check off boxes as you finish each section; your progress saves in this browser.
Download data-shell.zip (also at http://gannet.fish.washington.edu/seashell/snaps/data-shell.zip), unzip it, and open a terminal. That can be the Terminal tab inside RStudio, or a standalone app (macOS Terminal, Windows WSL/Git Bash, or any Linux shell). So that we all have the same experience, cd into the data-shell directory before running the examples.
Navigate
Sections 1-3: find out where you are, see what's around you, and move through the file system.
Create
Sections 4-5: make directories and files, then move, copy, rename, and delete them.
Combine
Sections 6-8: wildcards, redirection, and pipes, the shell's real superpower.
The file system & where you are
The part of the operating system that manages your data is the file system. It organizes data into files (which hold information) and directories, or "folders" (which hold files or other directories).
When you open a shell you'll see a prompt, often a dollar sign, telling you the shell is waiting for input. Throughout this tutorial, lines starting with $ are commands you type; the rest is the output the shell prints back.
$ whoami
jovyan
When you type whoami and press Enter, the shell (1) finds a program called whoami, (2) runs it, (3) displays its output (here, your user ID), and (4) shows a fresh prompt, ready for the next command.
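To watch step (1) happen, you can ask the shell where it finds a program. This is a minimal sketch, assuming a POSIX-style shell such as bash. The variable name whoami_path is made up for illustration, and the path printed will differ from system to system. It is written as a script, so the lines have no $ prompt.

```shell
# Ask the shell which program it would run for the name "whoami".
# command -v is a standard shell built-in; the path it prints varies by system.
whoami_path=$(command -v whoami)
echo "whoami lives at: $whoami_path"

# Running that full path does the same thing as typing the bare name.
"$whoami_path"
```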
pwd: print working directory
Find out where you are with pwd. Your current working directory is the default directory the shell runs commands in unless you say otherwise.
$ pwd
/home/jovyan
At the very top of the file system is the root directory, which holds everything else. We write it as a single slash, /, which is the leading slash in /home/jovyan.
Two meanings for /. At the front of a path, / means the root directory. Inside a path (like home/jovyan) it's just a separator between directory names.
Listing what's there with ls
Make sure you're in the data-shell directory, then list its contents with ls ("listing"):
$ ls
creatures molecules pizza.cfg
data north-pacific-gyre solar.pdf
Desktop notes.txt writing
ls prints the names of files and directories in alphabetical order, arranged in columns. Add the flag -F to mark directories with a trailing / so they're easy to tell apart:
$ ls -F
creatures/ molecules/ pizza.cfg
data/ north-pacific-gyre/ solar.pdf
Desktop/ notes.txt writing/
Now you can see data-shell holds several sub-directories (trailing /) and a few plain files like notes.txt and solar.pdf.
Note the space between ls and -F. Without it, the shell looks for a command called ls-F, which doesn't exist. Most commands take options (flags) that begin with -.
Relative & absolute paths
You can point ls at something other than the current directory by giving it an argument: a path with no leading dash.
Relative path
List the contents of the data sub-directory:
$ ls -F data
amino-acids.txt animal-counts/ animals.txt elements/
morse.txt pdb/ planets.txt salmon.txt sunspot.txt
data here is a relative path: it tells ls how to find something starting from where you are, not from the root of the file system. It has no leading slash.
Absolute path
Run it with a leading slash and you get a different answer, because /data is an absolute path measured from the root:
$ ls -F /data
ls: cannot access '/data': No such file or directory
ls reports "No such file or directory" because there is no data directory at the root. The leading / tells the computer to start from the very top of the file system, so an absolute path always points to exactly one place no matter where you currently are.
To list this same data directory by its absolute path, you'd write:
$ ls -F /home/jovyan/data-shell/data/
That works regardless of your current pwd.
Three special shortcuts
| Symbol | Means |
|---|---|
| . | The current directory. |
| .. | The directory above the current one (its parent). |
| ~ | Your home directory (e.g. ~/data = /home/jovyan/data). |
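The sketch below walks through relative paths, absolute paths, and the shortcuts in a throwaway directory tree. The names demo, project, and data are invented for this example, and mktemp -d (not covered in this tutorial) is assumed to be available to create a temporary directory. It is written as a script, so the lines have no $ prompt.

```shell
# Build a small throwaway tree so the example does not touch real files.
# The names demo, project, and data are made up for illustration.
demo=$(mktemp -d)
mkdir "$demo/project"
mkdir "$demo/project/data"

cd "$demo/project/data"    # absolute path: works from anywhere
pwd

cd ..                      # .. is the parent, so you land in project
pwd

ls -F .                    # . is the current directory; prints data/
ls -F "$demo/project"      # same listing, reached by absolute path

cd data                    # relative path: only works from inside project
pwd

echo ~                     # the shell expands ~ to your home directory
```

The two ls commands print the same thing because they name the same directory, once relative to where you are and once from the root.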
Creating & deleting: mkdir, nano, rm
Now that you can explore, let's make things. Back in data-shell, create a new directory called thesis:
$ mkdir thesis
mkdir means "make directory"; it produces no output. Since thesis is a relative path, the new directory is created inside the current working directory. Confirm with ls -F, then check it's empty:
$ ls -F thesis
Editing a file with nano
Move into thesis and open the simple text editor Nano to create draft.txt:
$ cd thesis
$ nano draft.txt
Type a few lines, then press Ctrl+O to write (save) to disk, and Ctrl+X to quit back to the shell. (Unix docs often write ^O as shorthand for "Control-O.") Now ls shows your new file:
$ ls
draft.txt
nano handles plain text only, which makes it great for learning because anyone can drive it. For real work, many people use vim or emacs in the terminal, or a graphical editor like VS Code. Whatever you use, know where it saves files (usually your current working directory if launched from the shell).
Deleting with rm
Remove the file with rm ("remove"):
$ rm draft.txt
Run ls again and the output is empty: the file is gone.
When you rm a file, it's unhooked from the file system and its space can be reused immediately. There's no reliable undo, so double-check before you delete, especially with wildcards.
Moving, renaming & copying: mv and cp
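Back in data-shell (run cd .. first if you are still inside thesis), re-create a file, then learn the two commands you'll use constantly to reorganize your work.
$ mkdir -p thesis # -p: no error if thesis already exists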
$ nano thesis/draft.txt # type something, ^O to save, ^X to quit
$ ls thesis
draft.txt
mv: move or rename
mv ("move") takes a source and a destination. Give it a new name as the destination and it effectively renames the file:
$ mv thesis/draft.txt thesis/quotes.txt
$ ls thesis
quotes.txt
Give it a directory as the destination and it keeps the filename but puts the file somewhere new. The special name . (current directory) moves it right here:
$ mv thesis/quotes.txt .
$ ls thesis # now empty
mv works on directories too; there's no separate mvdir command.
cp: copy
cp works just like mv, but leaves the original in place:
$ cp quotes.txt thesis/quotations.txt
$ ls quotes.txt thesis/quotations.txt
quotes.txt thesis/quotations.txt
Both paths exist, proving you made a copy rather than moving the original.
To delete a directory and everything inside it, use rm -r ("recursive"): rm -r thesis. It is powerful and unforgiving: there's no undo, so be certain of the target first.
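Here is a minimal sketch of the whole move, copy, and delete cycle, run in a throwaway directory so nothing real is at risk. The names scratch and backup.txt are invented for this example. touch and mktemp -d are not covered in this tutorial: touch creates an empty file and mktemp -d creates a temporary directory. It is written as a script, so the lines have no $ prompt.

```shell
# Work in a throwaway directory so nothing real is at risk.
# scratch, thesis, and the file names are illustrative only.
scratch=$(mktemp -d)
cd "$scratch"

mkdir thesis
touch thesis/draft.txt                  # create an empty file to practice on

mv thesis/draft.txt thesis/quotes.txt   # rename in place
cp thesis/quotes.txt backup.txt         # copy; the original stays put
ls -F                                   # shows backup.txt and thesis/

ls thesis                               # look before you delete
rm -r thesis                            # remove the directory and its contents
ls -F                                   # only backup.txt remains
```

Listing a directory right before rm -r is a cheap habit that catches a mistyped target.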
Wildcards & counting with wc
The shell's real power is combining simple programs. Start by listing the molecules directory, which holds six .pdb files (Protein Data Bank format).
$ ls molecules
cubane.pdb ethane.pdb methane.pdb
octane.pdb pentane.pdb propane.pdb
wc: word count
cd into molecules and run wc ("word count"), which reports the lines, words, and bytes in each file (for plain ASCII text like this, one byte is one character). The * in *.pdb is a wildcard that the shell expands into the full list of matching files:
$ cd molecules
$ wc *.pdb
20 156 1158 cubane.pdb
12 84 622 ethane.pdb
9 57 422 methane.pdb
30 246 1828 octane.pdb
21 165 1226 pentane.pdb
15 111 825 propane.pdb
107 819 6081 total
Use -l for lines only, -w for words, -c for bytes (-m counts characters):
$ wc -l *.pdb
20 cubane.pdb
12 ethane.pdb
9 methane.pdb
30 octane.pdb
21 pentane.pdb
15 propane.pdb
107 total
Wildcards
| Pattern | Matches |
|---|---|
| * | Zero or more characters; *.pdb matches every .pdb file. |
| p*.pdb | Only pentane.pdb and propane.pdb (must start with p). |
| ? | Exactly one character; p?.pdb matches p5.pdb but not propane.pdb. |
| *[AB].txt | Files ending in A or B before .txt, handy for selecting valid samples. |
The shell expands wildcards into a list of filenames before the command runs, so wc and ls never see the * itself, only what it matched.
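One way to see the expansion for yourself is to hand a pattern to echo, which simply prints whatever arguments it receives. The sketch below recreates the six file names as empty files in a throwaway directory. echo, touch, and mktemp -d are not covered in this tutorial, and the patterns ?thane.pdb and [mp]*.pdb are extra examples made up here. It is written as a script, so the lines have no $ prompt.

```shell
# Recreate the six file names in a throwaway directory (empty files are enough).
demo=$(mktemp -d)
cd "$demo"
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

# echo just prints its arguments, so it shows what each pattern expanded to.
echo *.pdb        # all six names
echo p*.pdb       # pentane.pdb propane.pdb
echo ?thane.pdb   # ethane.pdb (exactly one character before "thane")
echo [mp]*.pdb    # methane.pdb pentane.pdb propane.pdb
```

Trying a pattern with echo first is a safe way to check what it matches before using it with rm.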
Redirecting output: > and cat
Which file is shortest? Easy with six files, but what about 6000? The first step is to save the counts to a file instead of the screen.
$ wc -l *.pdb > lengths
The > redirects the command's output into a file, creating it (or overwriting it) as needed. There's no screen output because everything went into lengths instead. Confirm it exists:
$ ls lengths
lengths
cat: show a file
cat ("concatenate") prints file contents to the screen:
$ cat lengths
20 cubane.pdb
12 ethane.pdb
9 methane.pdb
30 octane.pdb
21 pentane.pdb
15 propane.pdb
107 total
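Because > overwrites an existing file without asking, it helps to see that behavior once on a file you don't care about. The sketch below also uses >>, the standard shell operator that appends instead of overwriting; it is not covered elsewhere in this tutorial. The file name log.txt and the echo messages are invented for this example. It is written as a script, so the lines have no $ prompt.

```shell
# Use a throwaway directory; log.txt is an illustrative file name.
demo=$(mktemp -d)
cd "$demo"

echo "first run" > log.txt     # creates log.txt
echo "second run" > log.txt    # overwrites it: "first run" is gone
cat log.txt                    # prints: second run

echo "third run" >> log.txt    # >> appends instead of overwriting
cat log.txt                    # prints: second run, then third run
```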
sort & head
sort -n sorts numerically (without changing the file), and head -1 shows just the first line:
$ sort -n lengths > sorted-lengths
$ head -1 sorted-lengths
9 methane.pdb
head -1 means "first line" (it is shorthand for head -n 1); -20 would give the first 20. Since the list is sorted shortest-first, this is the file with the fewest lines. (< redirects the other way, feeding a file into a command's input, e.g. wc -l < mydata.dat.)
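To see why the -n flag matters, compare sort with and without it on a small file. This sketch builds a four-line lengths file by hand with printf (not covered in this tutorial), using a subset of the counts shown earlier. The file name sorted is invented for this example. It is written as a script, so the lines have no $ prompt.

```shell
# Build a small lengths file in a throwaway directory.
demo=$(mktemp -d)
cd "$demo"
printf '9 methane.pdb\n12 ethane.pdb\n107 total\n20 cubane.pdb\n' > lengths

sort lengths       # text order: "107 total" first, "9 methane.pdb" last
sort -n lengths    # numeric order: 9, 12, 20, 107

# Save the numeric sort, then look at its first line.
sort -n lengths > sorted
head -n 1 sorted   # prints: 9 methane.pdb
```

Without -n, sort compares character by character, so 107 lands before 12 because "0" sorts before "2".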
Pipes: joining commands with |
Those intermediate files (lengths, sorted-lengths) make things hard to follow. A pipe lets one command's output flow straight into the next, with no temp files needed.
$ sort -n lengths | head -1
9 methane.pdb
The vertical bar | sends the output of the command on its left as the input to the command on its right. Chain as many as you like; here wc feeds sort, which feeds head, with no intermediate files at all:
$ wc -l *.pdb | sort -n | head -1
9 methane.pdb
A real example: sanity-checking data files
Imagine 1520 sample files that should each have 300 lines. Find the shortest five, a quick way to spot a truncated file:
$ wc -l *.txt | sort -n | head -5
240 NENE02018B.txt
300 NENE01729A.txt
300 NENE01729B.txt
300 NENE01736A.txt
300 NENE01751A.txt
One file is 60 lines short, which is worth investigating. Swap head for tail -5 to check the longest files instead (the last line of that output is wc's total), and use a wildcard like ls *Z.txt to find flagged samples. Same handful of small tools, endlessly recombined.
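You can rehearse this check on a tiny scale before pointing it at real data. The sketch below fabricates three stand-in sample files, one deliberately truncated. The file names and contents are invented for illustration and are not the NENE data; printf and mktemp -d are not covered in this tutorial. It is written as a script, so the lines have no $ prompt.

```shell
# Make three tiny stand-in sample files in a throwaway directory:
# two complete (3 lines each) and one truncated (2 lines).
demo=$(mktemp -d)
cd "$demo"
printf 'a\nb\nc\n' > sample1.txt
printf 'a\nb\nc\n' > sample2.txt
printf 'a\nb\n'    > sample3.txt

# Count lines, sort numerically, keep the first line:
# the truncated file (sample3.txt, 2 lines) floats to the top.
wc -l *.txt | sort -n | head -n 1
```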
Try it yourself
Work through these in the data-shell directory. Predict the output first, then run it and see if you were right; that's how the commands stick.
A. Trace a sequence of commands
Starting in /home/jovyan/data with a file proteins.dat, what does the final ls print?
$ mkdir recombine
$ mv proteins.dat recombine
$ cp recombine/proteins.dat ../proteins-saved.dat
$ ls
B. Build a pipeline
In molecules, write one pipeline that prints the name of the longest .pdb file. (Hint: wc, sort -n, and tail.)
C. Wildcards in practice
List only the files in molecules whose names start with a vowel. Then count how many .pdb files there are in total using a wildcard and wc -l.
- Downloaded data-shell and cd'd into it
- Navigated with pwd, ls, and cd
- Created, renamed, copied, and deleted files
- Used a wildcard with wc
- Redirected output to a file with >
- Built a pipeline with |
- Completed exercises A-C
Command cheat sheet & key points
Navigating
| Command | What it does |
|---|---|
| pwd | Print working directory |
| ls -F | List, marking directories |
| cd dir | Change directory |
| cd .. | Go up one level |
| cd ~ | Go home |
Files & directories
| Command | What it does |
|---|---|
| mkdir d | Make a directory |
| nano f | Edit a text file |
| mv a b | Move / rename |
| cp a b | Copy |
| rm f / rm -r d | Delete file / directory |
Inspecting & combining
| Command | What it does |
|---|---|
| cat f | Print file contents |
| wc -l f | Count lines |
| sort -n f | Sort numerically |
| head -n N / tail -n N | First / last N lines |
Redirection & pipes
| Syntax | What it does |
|---|---|
| cmd > f | Send output to a file |
| cmd < f | Read input from a file |
| a \| b | Pipe output of a into b |
| * ? [AB] | Wildcards |
Key points to remember
- The file system stores files inside directories, which nest into a directory tree.
- / alone is the root.
- A relative path starts from where you are; an absolute path starts from the root.
- . = current directory, .. = parent, ~ = home.
- Most commands take flags beginning with - (like ls -F or wc -l).
- The shell has no trash bin: deletion is permanent.
- > redirects output to a file; | pipes output between commands. Combine small tools to do big things.
- Use Tab completion always: it's faster and prevents typos.
You've got the fundamentals. These same commands are what you'll use to drive Klone, run bioinformatics tools, and keep your projects organized.