
mkatari-bioinformatics-august-2013-introlinuxnotes



Back to Manny's Bioinformatics Workshop Home

What is Linux?

It is a free and open source operating system, first released in 1991 and distributed under the GNU General Public License (GPL). The GPL allows anyone to use, modify and redistribute the software, with the requirement that they pass it on under the same license.

It is the leading operating system for servers, including supercomputers: more than 90% of the 500 fastest computers in the world run Linux.

Mac computers are related to Linux because Mac OS X is also based on UNIX, and Linux itself is a UNIX-like system.

Depending on the purpose of the Linux machine, it may or may not have the kind of Desktop environment we are familiar with on our personal computers. Linux uses the X Window System to provide the Desktop environment.

One popular distribution of the Linux operating system is Ubuntu.

REF: http://en.wikipedia.org/wiki/Linux

Why do bioinformaticians use Linux?

  • Many core bioinformatics tools are written for Linux.
    • BLAST, CLUSTALW, PHRAP, etc.
    • Many web applications are also served from web servers running on Linux machines.
  • Linux supports software development in many different programming languages.
    • Developers are lazy: writing software that does not need a graphical window is much faster and easier.
  • Multiple users can log in at the same time.
    • A user logging in over the network can do just about anything a user sitting in front of the computer can do, and Linux handles many users running many programs at once (multitasking) very well.

Remote vs. Local

Logging in with X Windows

The standard user interface for personal computers is a GUI (Graphical User Interface). On Linux, however, it is a command-line interpreter called the shell, which is simply a prompt that awaits your commands. There are several different shells, but the one used most often is called “bash” (the Bourne Again SHell), which combines features from several earlier shells.
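To check which shell you are using, something like the following should work. It is a small sketch that only reads two standard variables: SHELL (your login shell) and BASH_VERSION (set only when the current shell is bash).

```shell
# Your login shell, as recorded in the SHELL variable (e.g. /bin/bash).
echo "Login shell: ${SHELL:-unknown}"

# BASH_VERSION is only set when the shell running this is bash.
if [ -n "${BASH_VERSION:-}" ]; then
  echo "Running bash version $BASH_VERSION"
else
  echo "This shell is not bash"
fi
```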

If a program needs a GUI, log in with ssh's -X option. This turns on X11 forwarding: ssh tunnels the program's windows back to your own computer, so they open on your screen. For this to work you need an X11 server installed on your computer:
  • MobaXterm already includes one.
  • Mac: XQuartz (http://xquartz.macosforge.org/landing/)
  • Windows: Xming (http://sourceforge.net/projects/xming/)

Last login: Wed Jun  3 15:49:01 on ttys000
Manpreets-MacBook-Pro:~ manpreetkatari$ ssh -X mkatari@hpc.ilri.cgiar.org
Unauthorized access is prohibited.
mkatari@hpc.ilri.cgiar.org's password:
Last login: Wed Jun  3 16:33:26 2015 from 197.136.62.11
[mkatari@hpc ~]$ emacs

An Emacs window should now pop up on your computer.

Simply close the window to exit.
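If no window appears, a quick check is whether the DISPLAY variable is set in your ssh session; when X11 forwarding is working, ssh normally sets it to something like localhost:10.0. The function name check_x11 below is a hypothetical helper for illustration, not a standard command.

```shell
# Hypothetical helper: report whether X11 forwarding appears to be active.
# Assumption: with "ssh -X", the server sets DISPLAY (often localhost:10.0).
check_x11() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "DISPLAY=$DISPLAY: graphical programs such as emacs should open a window"
  else
    echo "DISPLAY is not set: log out and reconnect with ssh -X"
  fi
}

check_x11
```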

Home Sweet Home

When you first log in, you will be in a directory called your “home directory”:

/home/<your username>

Generally, you have complete control over creating, modifying, and executing files in this directory and in any subdirectory you create. To return to your home directory, simply type the command cd ~ at the prompt. Unless the permissions have been changed to allow it, you cannot enter anyone else's home directory or even see what is in it.
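A short sketch of leaving your home directory and coming back. The last line shows the permissions on your home directory: if the first column reads drwx------, only you can enter or list it. /tmp is used here only as a convenient place to move to.

```shell
cd /tmp          # move somewhere else
pwd              # shows /tmp
cd ~             # "~" expands to your home directory
pwd              # shows the same path as $HOME
cd               # plain "cd" with no argument also takes you home
echo "Home directory: $HOME"

# Permissions on your home directory (first column of the listing).
ls -ld "$HOME"
```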

Command Line Editing

A command is only executed once you press Enter. Until then, you can edit the line using the following keystrokes:

Keystroke                     Result
Backspace (Delete on Macs)    delete the previous character
Left Arrow, Right Arrow       move left and right on the line
Up Arrow, Down Arrow          show the previous and next command in your history
Ctrl-A                        go to the beginning of the line
Ctrl-E                        go to the end of the line
Ctrl-D                        delete the character under the cursor
Ctrl-K                        delete everything from the cursor to the end of the line
Ctrl-Y                        paste the text most recently cut (e.g. with Ctrl-K)
Ctrl-C                        stop a running job

Once you press Enter, the program is executed. When your prompt returns, you know that the program has finished. If the program produces output, it usually prints it to the screen (this is referred to as the standard output).

In the example below, date is a command that is executed with no arguments. Many commands/programs accept options, which are given immediately after the command name. In the ls -l example, ls is the command and -l is an option asking for a long listing (permissions, owner, size and modification date of each file).

[mkatari@hpc ~]$ date
Wed Jun  3 21:10:57 EAT 2015
[mkatari@hpc ~]$ ls -l
total 19443152
-rw-rw-r--. 1 mkatari mkatari      16263 Jun  3 16:29 03-06-2015.pdf
-rw-rw-r--. 1 mkatari mkatari     990646 Jun 12  2014 _1.fastq
-rw-rw-r--. 1 mkatari mkatari     381856 Jun 12  2014 _2.fastq
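A few more illustrations of commands with arguments and options. The date format string and the combined -lah flags are standard usage; the directory listed (/) is just a convenient example.

```shell
# No options, no arguments: print the current date and time.
date

# One argument: "+%Y-%m-%d" is a format string telling date what to print.
date +%Y-%m-%d

# One option (-l, long listing) and one argument (the directory to list).
ls -l /

# Single-letter options can be combined: "-l -a -h" is the same as "-lah".
# -a also shows hidden entries, -h prints sizes in human-readable units.
ls -l -a -h /
ls -lah /
```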

Directing standard output

Instead of letting the output print to the screen, we can save it to a file by using the > sign followed by a file name. If the file already exists, it will be replaced without a warning. To append to an existing file instead, use >>. It is important to mention here that once you overwrite a file, its old contents are deleted. They are gone: there is no recycling bin to restore them from.

The following command gets details about all users' home directories and saves them into a file called allusers.txt

[mkatari@hpc ~]$ ls -l /home/ > allusers.txt
[mkatari@hpc ~]$ ls -l allusers.txt
-rw-rw-r--. 1 mkatari mkatari 18897 Jun  3 21:26 allusers.txt
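The difference between > and >> can be tried safely in a throwaway directory created with mktemp -d (the file name notes.txt is just an example). The last part shows bash's noclobber option, an optional safety net that makes > refuse to overwrite an existing file; when you really do mean to overwrite, >| forces it.

```shell
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

echo "first line" > notes.txt     # > creates notes.txt (or silently overwrites it)
echo "second line" >> notes.txt   # >> appends to the end of notes.txt
cat notes.txt                     # both lines are present

echo "replacement" > notes.txt    # > again: the two earlier lines are gone for good
cat notes.txt                     # only "replacement" remains

# Optional safety net: with noclobber on, > refuses to overwrite an existing file.
set -o noclobber
echo "oops" > notes.txt || echo "bash refused to overwrite notes.txt"
set +o noclobber
```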

Command-line completion

In some cases the commands or the file names that you need as arguments can be very long, which increases the chance of spelling mistakes.

To prevent such mistakes, type just enough letters to unambiguously identify the command or file, then press Tab and the shell will complete it for you.

If you don’t know how many letters you need, press Tab twice to see all your options.

In the example below, after typing ls /usr/bin/bz, the Tab key was pressed twice to list every possible completion. The command is not executed until the Enter key is pressed.

[mkatari@hpc ~]$ ls /usr/bin/bz
bzcat         bzdiff        bzip2         bzless
bzcmp         bzgrep        bzip2recover  bzmore
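Tab completion is interactive, but bash's compgen builtin prints the same kind of candidate list from the command line or a script, which can help when you are not sure what a prefix will match. The sample file names below are made up for the demonstration.

```shell
# Create a few example files in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/sample_1.fastq" "$demo_dir/sample_2.fastq" "$demo_dir/notes.txt"

# compgen -f lists the file names that Tab would offer for a given prefix.
compgen -f "$demo_dir/sam"

# compgen -c does the same for command names, e.g. everything starting with "bz".
compgen -c bz || echo "no commands start with bz"
```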

Wildcards

In cases where you want to refer to multiple files, you can use * to match any run of characters of any length (including none). You can also use ? to match exactly one character.

In the example below, the first command lists all files/programs that start with bz. The second lists only those that begin with bz followed by exactly three more characters, one for each ?.

[mkatari@hpc ~]$ ls /usr/bin/bz*
/usr/bin/bzcat  /usr/bin/bzdiff  /usr/bin/bzip2         /usr/bin/bzless
/usr/bin/bzcmp  /usr/bin/bzgrep  /usr/bin/bzip2recover  /usr/bin/bzmore
[mkatari@hpc ~]$ ls /usr/bin/bz???
/usr/bin/bzcat  /usr/bin/bzcmp  /usr/bin/bzip2
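The same patterns work on your own data files. This sketch uses hypothetical FASTQ file names in a scratch directory; note that putting a pattern in quotes stops the shell from expanding it.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch sample_1.fastq sample_2.fastq sample_10.fastq notes.txt

ls *.fastq          # * matches any run of characters: all three .fastq files
ls sample_?.fastq   # ? matches exactly one character: sample_1 and sample_2 only
echo "*.fastq"      # quoted, so no expansion: prints the pattern itself
```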


Some useful information about Linux

Environment variables and PATH

All variables that are set in your environment can be found by using

env
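The full listing can be long. As a small sketch, sorting it makes a variable easier to find, and printenv or grep pull out a single one (HOME is used here only as an example).

```shell
# Every environment variable, one NAME=value per line, sorted by name.
env | sort

# Look up a single variable by name.
printenv HOME

# Or filter the full listing with grep.
env | grep '^HOME='
```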

The variable that is most important to us is PATH: a colon-separated list of directories that the shell searches, in order, for the commands you type. To see the contents of the variable, type:

echo $PATH
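Because PATH is printed as one long line, it is easier to read with one directory per line. command -v and bash's type -a then show which copy of a command the shell will actually run (ls is just an example).

```shell
# Each PATH directory on its own line, in the order the shell searches them.
echo "$PATH" | tr ':' '\n'

# The file the shell will run when you type "ls".
command -v ls

# Every match for "ls", in PATH order (the first one wins).
type -a ls
```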

In the sbatch files we have been adding the full path to each command. Another option is to add the directory that contains the command to the PATH variable. This is essentially what module load does.

For example, the following two commands are essentially equivalent.

export PATH=/export/apps/samtools/0.1.19/bin:$PATH

module load samtools
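To see the effect of prepending a directory to PATH without needing samtools, the sketch below creates a throwaway directory holding a hypothetical script called hello_tool, then puts that directory at the front of PATH, just as the export line above does for samtools. The change only lasts for the current shell session.

```shell
# Hypothetical tool directory containing one small script.
tooldir=$(mktemp -d)
cat > "$tooldir/hello_tool" <<'EOF'
#!/bin/sh
echo "hello from hello_tool"
EOF
chmod +x "$tooldir/hello_tool"

# Not found yet: its directory is not on PATH.
command -v hello_tool || echo "hello_tool not found on PATH yet"

# Prepend the directory, so it is searched before everything else.
export PATH="$tooldir:$PATH"
command -v hello_tool   # now prints the full path to the script
hello_tool              # runs it by name alone
```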
Last modified: 2015/06/03 18:40 by mkatari