RepeatMasker

module avail repeatmasker/

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences.

The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked.

Currently over 56% of human genomic sequence is identified and masked by the program.

Usage

Submodules

RepeatMasker uses the following submodules:

  • Tandem Repeats Finder version 4.0.9
  • RMBlast search engine version 2.6.0
  • Repbase Update Database of Repetitive DNA version 20170201 (February 2017)

License

To use RepeatMasker and its submodules, you must first accept the following license terms:

  1. RepeatMasker is licensed under the Open Software License v2.1.

  2. Tandem Repeats Finder is licensed under the following terms:

    • The author of this software grants to any individual or organization the right to use and to make an unlimited number of copies of this software. You may not de-compile, disassemble, reverse engineer, or modify the software. This software cannot be sold, incorporated into commercial software or redistributed. The author of this software accepts no responsibility for damages resulting from the use of this software and makes no warranty or representation, either express or implied, including but not limited to, any implied warranty of merchantability or fitness for a particular purpose. This software is provided as is, and the user assumes all risks when using it.
    • Please cite: G. Benson, "Tandem repeats finder: a program to analyze DNA sequences" Nucleic Acids Research (1999) Vol. 27, No. 2, pp. 573-580.
  3. RMBlast is licensed under the Open Software License v2.1.

  4. Repbase Update is a Database of Repetitive DNA published by Genetic Information Research Institute. You may use the content of the Database free of charge under the following conditions:

    • You agree NOT to make the Repbase Update (or any part thereof, including Repbase Reports, Repeat Maps and other derived materials, modified or not) available to anyone outside your research group. "Make available" includes leaving the data where it may be accessible to outside individuals without your direct knowledge (e.g. on a computer to which people outside your group have login privileges), as well as directly providing it to someone. Refer any requests for the Database to GIRI.
    • You agree NOT to use the Repbase Update for commercially restricted sequencing and/or proprietary sequence analysis. Commercially restricted sequencing is defined as sequencing for which a company retains patenting or licensing rights regarding the sequence, or the right to restrict or delay dissemination of the sequence; with the sole exception that sequencing is not considered to be commercially restricted if it is federally funded and the investigators adopt the data release policies endorsed at the Wellcome Trust-sponsored Bermuda meeting, i.e. immediate release of data as it is generated.
    • If you are doing commercially restricted sequencing or other proprietary activities involving any portion of Repbase Update, you must register commercially at GIRI.
    • You agree to properly cite the Database and its specific, original contributions if directly related to your work (details).
    • You certify that you are authorized to accept this agreement on behalf of your institution.
    • All members of your group with access to the Database agree to the same conditions.

You cannot use the program until you have accepted the license agreement.

Basic usage

Warning

RepeatMasker requires a scratch directory for calculation.

Load the repeatmasker software module, then run RepeatMasker (here with the sample input file my_repeatmasker_sample.fasta):

$ module add repeatmasker/4.0.7
$ RepeatMasker my_repeatmasker_sample.fasta

Note

For RepeatMasker versions 3.3.0 and 4.0.0, the executable is named repeatmasker. For versions 4.0.6 and 4.0.7, it is named RepeatMasker.
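
If a script has to work with more than one of these module versions, it can check which of the two names is available instead of hard-coding one. The following is a minimal sketch; the function name find_repeatmasker is hypothetical and not part of RepeatMasker or the module system.

```shell
# Print the name of the RepeatMasker executable found on PATH.
# Tries the name used by versions 4.0.6/4.0.7 first, then the name
# used by versions 3.3.0/4.0.0. Returns non-zero if neither is found.
find_repeatmasker() {
    for rm_name in RepeatMasker repeatmasker; do
        if command -v "$rm_name" >/dev/null 2>&1; then
            printf '%s\n' "$rm_name"
            return 0
        fi
    done
    echo "RepeatMasker executable not found; is the module loaded?" >&2
    return 1
}
```

After loading the module, it could be used as `rm_bin=$(find_repeatmasker) && "$rm_bin" my_repeatmasker_sample.fasta`.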

Interactive job

An interactive job can be started as follows:

skirit$ qsub -I -l select=1:ncpus=2:mem=4gb:scratch_local=10gb -l walltime=1:00:00

You are then logged in to the allocated machine, where you can run RepeatMasker on the my_repeatmasker_sample.fasta input file as follows (and then exit from the machine):

$ module add repeatmasker/4.0.7
$ cp my_repeatmasker_sample.fasta $SCRATCHDIR
$ cd $SCRATCHDIR
$ RepeatMasker my_repeatmasker_sample.fasta
...
$ exit

Please note that an interactive job does not bring any significant speed-up compared to running RepeatMasker locally on your own machine unless parallelism is used. An interactive job is useful for testing that your job runs correctly (strongly recommended); once it succeeds, switch to running it as a batch job (see below).
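
As an illustration of the parallelism mentioned above, RepeatMasker accepts a -pa option that sets the number of processors to use. The sketch below wraps it in a small helper; the function name mask_parallel is hypothetical, and the thread count is assumed to be chosen by you so that it does not exceed the ncpus value requested from the scheduler.

```shell
# Run RepeatMasker on one input file with a given number of parallel jobs.
# Usage: mask_parallel <threads> <input.fasta>
mask_parallel() {
    threads=$1
    input=$2
    # Reject an empty, non-numeric, or zero thread count.
    case $threads in
        ''|*[!0-9]*|0)
            echo "thread count must be a positive integer" >&2
            return 1
            ;;
    esac
    RepeatMasker -pa "$threads" "$input"
}
```

With the resources requested in the qsub example above (ncpus=2), a matching call would be `mask_parallel 2 my_repeatmasker_sample.fasta`.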

Batch job

This is the preferred way of running jobs. Create the shell script my_repeatmasker_script.sh with the following content:

#!/bin/bash

# load the RepeatMasker module
module add repeatmasker/4.0.7

# copy your input data (e.g. my_repeatmasker_sample.fasta) to the scratch directory;
# stop if the copy fails
cp my_repeatmasker_sample.fasta "$SCRATCHDIR" || exit 1

# go to the scratch directory; stop if it is not accessible
cd "$SCRATCHDIR" || exit 1

# run RepeatMasker
RepeatMasker my_repeatmasker_sample.fasta

Submit the script with a command such as:

qsub -l select=1:ncpus=2:mem=4gb:scratch_local=10gb my_repeatmasker_script.sh
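
The batch script above leaves its results in the scratch directory. RepeatMasker writes its output files next to the input file, named after it (for example with the suffixes .out, .masked and .tbl), so they can be copied out at the end of the job. The following is a minimal sketch of such a step; the function name collect_results is hypothetical, and the destination is assumed to be a directory you can write to.

```shell
# Copy the RepeatMasker result files for one input from scratch to a destination.
# Usage: collect_results <input-file-name> <scratch-dir> <destination-dir>
# Copies every file named <input-file-name>.<suffix>; the input itself is skipped.
# Returns non-zero if no result file was found or a copy failed.
collect_results() {
    input_name=$1
    scratch_dir=$2
    dest_dir=$3
    found=0
    for result in "$scratch_dir/$input_name".*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$result" ] || continue
        cp "$result" "$dest_dir"/ || return 1
        found=1
    done
    [ "$found" -eq 1 ]
}
```

At the end of the batch script this could be called as `collect_results my_repeatmasker_sample.fasta "$SCRATCHDIR" "$PBS_O_WORKDIR"`, where PBS_O_WORKDIR is the directory from which qsub was run; using it as the destination is an assumption, and any other writable directory works equally well.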