Roberts Lab · Self-directed tutorial

Learn the Unix shell, one command at a time.

The command line is how you'll drive HPC systems like Klone, run bioinformatics tools, and string small programs together into powerful pipelines. This short, hands-on tutorial takes you from "what is a terminal?" to building your first pipe, with no prior experience needed.

⏱️ ~1–2 hours, self-paced 🐚 bash / Unix shell 💻 Type along in a terminal ✅ No prior experience needed
Start here

Set up & how to use this site

Don't just read: type every command into your terminal as you go. The shell rewards practice, and muscle memory matters more than memorizing. Check off boxes as you finish each section; your progress saves in this browser.

Before you start: download data-shell.zip (also at http://gannet.fish.washington.edu/seashell/snaps/data-shell.zip), unzip it, and open a terminal. That can be the Terminal tab inside RStudio, or a standalone app (macOS Terminal, Windows WSL/Git Bash, or any Linux shell). So we all have the same experience, cd into the data-shell directory before running the examples.
🧭

Navigate

Sections 1–3: find out where you are, see what's around you, and move through the file system.

✏️

Create

Sections 4–5: make directories and files, then move, copy, rename, and delete them.

🔗

Combine

Sections 6–8: wildcards, redirection, and pipes, the shell's real superpower.

Section 1

The file system & where you are

The part of the operating system that manages your data is the file system. It organizes data into files (which hold information) and directories, or "folders" (which hold files or other directories).

When you open a shell you'll see a prompt, often a dollar sign, telling you the shell is waiting for input. Throughout this tutorial, lines starting with $ are commands you type; the rest is the output the shell prints back.

$ whoami
jovyan

When you type whoami and press Enter, the shell (1) finds a program called whoami, (2) runs it, (3) displays its output (here, your user ID), and (4) shows a fresh prompt, ready for the next command.
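Step (1), finding the program, is something you can watch for yourself. The sketch below assumes a bash-like shell and uses ls as the example, although the same lookup happens for any command. command -v prints the path of the program the shell would run; the exact path differs from system to system.

```shell
# Ask the shell where it finds the program named "ls".
# The path printed (often /usr/bin/ls or /bin/ls) depends on your system.
ls_path=$(command -v ls)
echo "The shell would run: $ls_path"

# Running the program by its full path does the same thing as typing its name.
"$ls_path" /
```

If the name you ask about doesn't exist, command -v prints nothing, which is the situation behind a "command not found" message.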

pwd: print working directory

Find out where you are with pwd. Your current working directory is the default directory the shell runs commands in unless you say otherwise.

$ pwd
/home/jovyan

At the very top of the file system is the root directory, which holds everything else. We write it as a single slash (/), which is the leading slash in /home/jovyan.

Two meanings for /. At the front of a path, / means the root directory. Inside a path (like home/jovyan) it's just a separator between directory names.
Section 2

Listing what's there with ls

Make sure you're in the data-shell directory, then list its contents with ls ("listing"):

$ ls
creatures  molecules           pizza.cfg
data       north-pacific-gyre  solar.pdf
Desktop    notes.txt           writing

ls prints the names of files and directories in alphabetical order, arranged in columns. Add the flag -F to mark directories with a trailing / so they're easy to tell apart:

$ ls -F
creatures/  molecules/           pizza.cfg
data/       north-pacific-gyre/  solar.pdf
Desktop/    notes.txt            writing/

Now you can see data-shell holds several sub-directories (trailing /) and a few plain files like notes.txt and solar.pdf.

Mind the space. There's a space between ls and -F. Without it, the shell looks for a command called ls-F, which doesn't exist. Most commands take options (flags) that begin with -.
Section 3

Relative & absolute paths

You can point ls at something other than the current directory by giving it an argument: a path with no leading dash.

Relative path

List the contents of the data sub-directory:

$ ls -F data
amino-acids.txt  animal-counts/  animals.txt  elements/
morse.txt        pdb/            planets.txt  salmon.txt   sunspot.txt

data here is a relative path: it tells ls how to find something starting from where you are, not from the root of the file system. It has no leading slash.

Absolute path

Run it with a leading slash and you get a different answer, because /data is an absolute path measured from the root:

$ ls -F /data
ls: cannot access '/data': No such file or directory

You get a "No such file or directory" error here because that directory doesn't exist at the root. The leading / tells the computer to start from the very top of the file system, so an absolute path always points to exactly one place no matter where you currently are.

To list this same data directory by its absolute path, you'd write:

$ ls -F /home/jovyan/data-shell/data/

That works regardless of your current pwd.
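To see the difference without touching your real files, here is a minimal sketch that builds a throwaway practice tree. The names project and salmon.txt are made up for illustration, and mktemp, mkdir -p, and touch are helper commands this tutorial doesn't otherwise cover.

```shell
# Build a small practice tree in a temporary directory (hypothetical names).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"
touch "$sandbox/project/data/salmon.txt"

# From inside project, the relative path "data" finds the directory.
cd "$sandbox/project"
relative_listing=$(ls data)

# After moving elsewhere, a relative path would be looked up from the new
# location instead; the absolute path still points at the same place.
cd /
absolute_listing=$(ls "$sandbox/project/data")

echo "relative: $relative_listing"
echo "absolute: $absolute_listing"
```

Both lines print salmon.txt: two different spellings of a path to the same directory.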

Tab completion saves typing. Start typing a name and press Tab: the shell completes it for you when there's only one match, or shows the options when there are several. Use it constantly; it also prevents typos.

Three special shortcuts

Symbol   Means
.        The current directory.
..       The directory above the current one (its parent).
~        Your home directory (e.g. ~/data = /home/jovyan/data).
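The shortcuts are easiest to trust once you've watched them work. This is a small sketch in a throwaway directory; the sandbox location comes from mktemp, which is an assumption of the example rather than part of the tutorial's data.

```shell
# Practice tree: <sandbox>/data-shell/data
sandbox=$(mktemp -d)
mkdir -p "$sandbox/data-shell/data"
cd "$sandbox/data-shell/data"

# .. is the parent directory, so this moves up from data to data-shell.
cd ..
after_dotdot=$(pwd)
echo "$after_dotdot"

# . is the current directory, so this cd goes nowhere.
cd .
after_dot=$(pwd)
echo "$after_dot"

# ~ is replaced by the path of your home directory before echo runs.
echo ~
```

The first two lines of output are identical, and the third shows your own home directory, whatever it is on your machine.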
Section 4

Creating & deleting: mkdir, nano, rm

Now that you can explore, let's make things. Back in data-shell, create a new directory called thesis:

$ mkdir thesis

mkdir means "make directory"; it produces no output. Since thesis is a relative path, the new directory is created inside the current working directory. Confirm with ls -F, then check it's empty:

$ ls -F thesis

Editing a file with nano

Move into thesis and open the simple text editor Nano to create draft.txt:

$ cd thesis
$ nano draft.txt

Type a few lines, then press Ctrl+O to write (save) to disk, and Ctrl+X to quit back to the shell. (Unix docs often write ^O as shorthand for "Control-O.") Now ls shows your new file:

$ ls
draft.txt

Which editor? nano handles plain text only, which makes it great for learning because anyone can drive it. For real work, many people use vim or emacs in the terminal, or a graphical editor like VS Code. Whatever you use, know where it saves files (usually your current working directory if launched from the shell).

Deleting with rm

Remove the file with rm ("remove"):

$ rm draft.txt

Run ls again and the output is empty: the file is gone.

Deleting is forever. The shell has no trash bin. When you rm a file it's unhooked from the file system and its space can be reused immediately. There's no reliable undo, so double-check before you delete, especially with wildcards.
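One way to double-check is to preview a deletion with ls before running rm. The sketch below uses a wildcard (covered in Section 6) and made-up filenames in a throwaway directory; it shows a habit, not a safety guarantee.

```shell
# Throwaway directory with three hypothetical files.
sandbox=$(mktemp -d)
cd "$sandbox"
touch draft.txt draft-old.txt notes.txt

# Step 1: preview. ls with the same pattern lists exactly what rm would delete.
ls draft*.txt

# Step 2: once the list looks right, reuse the identical pattern with rm.
rm draft*.txt

# Only notes.txt is left.
remaining=$(ls)
echo "$remaining"
```

If the preview lists a file you want to keep, fix the pattern before any rm is typed.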
Section 5

Moving, renaming & copying: mv and cp

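Re-create a file, then learn the two commands you'll use constantly to reorganize your work. Section 4 left you inside thesis with the directory itself still in place, so step back up to data-shell instead of running mkdir again.

$ cd ..                   # only if pwd still ends in /thesis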
$ nano thesis/draft.txt   # type something, ^O to save, ^X to quit
$ ls thesis
draft.txt

mv: move or rename

mv ("move") takes a source and a destination. Give it a new name as the destination and it effectively renames the file:

$ mv thesis/draft.txt thesis/quotes.txt
$ ls thesis
quotes.txt

Give it a directory as the destination and it keeps the filename but puts the file somewhere new. The special name . (current directory) moves it right here:

$ mv thesis/quotes.txt .
$ ls thesis     # now empty

mv works on directories too; there's no separate mvdir command.

cp: copy

cp works just like mv, but leaves the original in place:

$ cp quotes.txt thesis/quotations.txt
$ ls quotes.txt thesis/quotations.txt
quotes.txt   thesis/quotations.txt

Both paths exist, proving you made a copy rather than moving the original.
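The difference between the two commands can be checked mechanically. This is a minimal sketch in a throwaway directory; it creates the file with echo and > (redirection, covered in Section 7) so it can run without opening nano.

```shell
# Throwaway directory standing in for data-shell.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir thesis
echo "first draft" > thesis/draft.txt

# mv renames: afterwards draft.txt is gone and quotes.txt holds its contents.
mv thesis/draft.txt thesis/quotes.txt

# cp copies: afterwards both quotes.txt and quotations.txt exist.
cp thesis/quotes.txt thesis/quotations.txt

ls thesis
cat thesis/quotations.txt
```

ls lists quotations.txt and quotes.txt but no draft.txt, and the copy contains the same text as the original.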

To delete a whole directory and everything in it, use rm -r ("recursive"): rm -r thesis. Powerful and unforgiving: there's no undo, so be certain of the target first.
Section 6

Wildcards & counting with wc

The shell's real power is combining simple programs. Start in the molecules directory, which holds six .pdb files (Protein Data Bank format).

$ ls molecules
cubane.pdb    ethane.pdb    methane.pdb
octane.pdb    pentane.pdb   propane.pdb

wc: word count

cd into molecules and run wc ("word count"), which reports lines, words, and characters per file. The * in *.pdb is a wildcard that the shell expands into the full list of matching files:

$ cd molecules
$ wc *.pdb
  20  156 1158 cubane.pdb
  12   84  622 ethane.pdb
   9   57  422 methane.pdb
  30  246 1828 octane.pdb
  21  165 1226 pentane.pdb
  15  111  825 propane.pdb
 107  819 6081 total

Use -l for lines only, -w for words, -c for characters (strictly, bytes; the two match in plain ASCII files):

$ wc -l *.pdb
  20  cubane.pdb
  12  ethane.pdb
   9  methane.pdb
  30  octane.pdb
  21  pentane.pdb
  15  propane.pdb
 107  total

Wildcards

Pattern     Matches
*           Zero or more characters: *.pdb matches every .pdb file.
p*.pdb      Only pentane.pdb and propane.pdb (must start with p).
?           Exactly one character: p?.pdb matches p5.pdb but not propane.pdb.
*[AB].txt   Files ending in A or B before .txt; handy for selecting valid samples.

The shell expands wildcards into a list of filenames before the command runs, so wc and ls never see the * itself, only what it matched.
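Because the expansion happens before the command runs, echo (which simply prints its arguments) is a handy way to see what a pattern turns into. A small sketch, using empty stand-in files with the same names as the molecules directory:

```shell
# Empty stand-in files; only the names matter for wildcard matching.
sandbox=$(mktemp -d)
cd "$sandbox"
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

echo p*.pdb        # pentane.pdb propane.pdb
echo *ethane.pdb   # ethane.pdb methane.pdb   (* can match zero characters)
echo ?ethane.pdb   # methane.pdb              (? needs exactly one character)
echo [co]*.pdb     # cubane.pdb octane.pdb    (first letter is c or o)
```

Trying a pattern with echo first is a cheap way to check it before handing it to a command that changes files.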

Section 7

Redirecting output: > and cat

Which file is shortest? Easy with six files, but what about 6000? The first step is to save the counts to a file instead of the screen.

$ wc -l *.pdb > lengths

The > redirects the command's output into a file, creating it (or overwriting it) as needed. There's no screen output because everything went into lengths instead. Confirm it exists:

$ ls lengths
lengths

cat: show a file

cat ("concatenate") prints file contents to the screen:

$ cat lengths
  20  cubane.pdb
  12  ethane.pdb
   9  methane.pdb
  30  octane.pdb
  21  pentane.pdb
  15  propane.pdb
 107  total
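The "or overwriting it" part of > deserves a quick experiment, since it silently replaces whatever the file held before. The sketch below also shows >>, a related operator this tutorial doesn't otherwise use, which appends instead of replacing. The text being written is made up for the example.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

echo "first run" > lengths     # creates lengths
echo "second run" > lengths    # replaces the previous contents entirely
echo "third run" >> lengths    # >> adds to the end instead of replacing

cat lengths
```

cat prints "second run" and "third run"; the "first run" line was overwritten by the second >.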

sort & head

sort -n sorts numerically (without changing the file), and head -1 shows just the first line:

$ sort -n lengths > sorted-lengths
$ head -1 sorted-lengths
  9  methane.pdb

head -1 means "first line"; -20 would give the first 20. The same option can also be written head -n 1, the form specified by POSIX. Since the list is sorted shortest-first, this is the file with the fewest lines. (< redirects the other way, feeding a file into a command's input, e.g. wc -l < mydata.dat.)
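Why the -n matters: without it, sort compares lines as text, character by character, so 10 lands before 9. A minimal sketch with a three-line file of made-up numbers (printf is used here only as a quick way to write several lines at once):

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
printf '%s\n' 10 9 100 > numbers

sort numbers       # text order:    10, 100, 9
sort -n numbers    # numeric order: 9, 10, 100

# Same two-step pattern as above; head -n 1 is another spelling of head -1.
sort -n numbers > sorted-numbers
head -n 1 sorted-numbers    # 9
```

Neither sort command changes the numbers file itself; the sorted result only exists where you redirect it.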

Section 8

Pipes: joining commands with |

Those intermediate files (lengths, sorted-lengths) make things hard to follow. A pipe lets one command's output flow straight into the next, with no temp files needed.

$ sort -n lengths | head -1
  9  methane.pdb

The vertical bar | sends the output of the command on its left as the input to the command on its right. Chain as many as you like. Here, wc → sort → head run with no intermediate files at all:

$ wc -l *.pdb | sort -n | head -1
  9  methane.pdb

"Small pieces, loosely joined." This is the heart of Unix's design: instead of giant do-everything programs, you get many small tools that each do one job well and read/write plain text. A program that reads from standard input and writes to standard output can be piped together with any other, multiplying their power.

A real example: sanity-checking data files

Imagine 1520 sample files that should each have 300 lines. Find the shortest five, a quick way to spot a truncated file:

$ wc -l *.txt | sort -n | head -5
 240 NENE02018B.txt
 300 NENE01729A.txt
 300 NENE01729B.txt
 300 NENE01736A.txt
 300 NENE01751A.txt

One file is 60 lines short, which is worth investigating. Swap head for tail -5 to check the longest files instead, and use a wildcard like ls *Z.txt to find flagged samples. Same handful of small tools, endlessly recombined.
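You can rehearse this check on a small scale. The sketch below fabricates three tiny stand-in files (the sample names and line counts are invented, and seq is used only to generate numbered lines), then runs the same pipeline to pick out the short one.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in sample files: two with the expected 5 lines, one cut short at 3.
seq 5 > sampleA.txt
seq 5 > sampleB.txt
seq 3 > sampleC.txt

# Count lines, sort smallest first, keep the top row.
shortest=$(wc -l *.txt | sort -n | head -n 1)
echo "$shortest"
```

The row that comes out names sampleC.txt with a count of 3. One thing to keep in mind when you switch to tail: wc also prints a "total" row, and since it has the largest number it sorts to the very end.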

Section 9 · Practice

Try it yourself

Work through these using the data-shell files. Predict the output first, then run it and see if you were right; that's how the commands stick.

Easy

A. Trace a sequence of commands

Starting in /home/jovyan/data with a file proteins.dat, what does the final ls print?

$ mkdir recombine
$ mv proteins.dat recombine
$ cp recombine/proteins.dat ../proteins-saved.dat
$ ls

You'll practice: reasoning about mv, cp, and where files end up.

Medium

B. Build a pipeline

In molecules, write one pipeline that prints the name of the longest .pdb file. (Hint: wc, sort -n, and tail.)

You'll practice: chaining commands with |.

Medium

C. Wildcards in practice

List only the files in molecules whose names start with a vowel. Then count how many .pdb files there are in total using a wildcard and wc -l.

You'll practice: *, ?, and [...] patterns.

  • Downloaded data-shell and cd'd into it
  • Navigated with pwd, ls, and cd
  • Created, renamed, copied, and deleted files
  • Used a wildcard with wc
  • Redirected output to a file with >
  • Built a pipeline with |
  • Completed exercises A–C
Reference

Command cheat sheet & key points

Navigating

pwd      Print working directory
ls -F    List, marking directories
cd dir   Change directory
cd ..    Go up one level
cd ~     Go home

Files & directories

mkdir d          Make a directory
nano f           Edit a text file
mv a b           Move / rename
cp a b           Copy
rm f / rm -r d   Delete file / directory

Inspecting & combining

cat f                     Print file contents
wc -l f                   Count lines
sort -n f                 Sort numerically
head -n N / tail -n N     First / last N lines

Redirection & pipes

cmd > f    Send output to a file
cmd < f    Read input from a file
a | b      Pipe output of a into b
* ? [AB]   Wildcards

Key points to remember

  • The file system stores files inside directories, which nest into a directory tree. / alone is the root.
  • A relative path starts from where you are; an absolute path starts from the root.
  • . = current directory, .. = parent, ~ = home.
  • Most commands take flags beginning with - (like ls -F or wc -l).
  • The shell has no trash bin; deletion is permanent.
  • > redirects output to a file; | pipes output between commands. Combine small tools to do big things.
  • Use Tab completion always; it's faster and prevents typos.
Want to go deeper? This tutorial is adapted from the Software Carpentry / Data Carpentry shell lessons. For more, see Software Carpentry: The Unix Shell and Data Carpentry: Shell for Genomics.

You've got the fundamentals. These same commands are what you'll use to drive Klone, run bioinformatics tools, and keep your projects organized. 🐚