Transcription factor binding site extraction from ChIP-Seq data by de novo identification of consensus motifs

 Introduction

MICSA is a package for the identification of transcription factor binding sites in ChIP-Seq data, developed by the Computational Systems Biology of Cancer group at the Bioinformatics Laboratory of Institut Curie (Paris).

MICSA was developed to discover transcription factor binding sites in ChIP-Seq data with high sensitivity and specificity by taking into account the genomic sequences of the putative sites.
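The sequence-based step is what lets motif information sharpen the peak calls. As a rough illustration only, the Python sketch below scores every window of a peak sequence against a position weight matrix (PWM) in log-odds form and reports windows above a threshold on either strand. This is a generic PWM scan, not MICSA's own scoring: the function names, the pseudocount of 0.5, the uniform background, the toy "GAT" motif and the threshold are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"
# Translation table used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, background=None, pseudocount=0.5):
    """Turn per-position base counts into log2(P(base | motif) / P(base | background)).

    `counts` is a list of dicts, one per motif position, e.g. {"A": 3, "C": 0, "G": 7, "T": 0}.
    A uniform background is assumed when none is given.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def window_score(window, pwm):
    """Sum the log-odds contribution of each base in the window."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def scan(sequence, pwm, threshold):
    """Return (start, strand, score) for each window scoring at least `threshold`.

    Both strands are scored; `start` is the 0-based position on the given sequence.
    Windows containing N or other non-ACGT characters are skipped.
    """
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue
        forward = window_score(window, pwm)
        reverse = window_score(window.translate(COMPLEMENT)[::-1], pwm)
        strand, score = ("+", forward) if forward >= reverse else ("-", reverse)
        if score >= threshold:
            hits.append((start, strand, score))
    return hits


# Toy motif strongly favoring "GAT" (hypothetical counts).
gat_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
pwm = log_odds_pwm(gat_counts)
print(scan("TTGATTT", pwm, threshold=5.0))  # forward-strand hit at position 2
print(scan("AAATCAA", pwm, threshold=5.0))  # reverse-strand hit at position 2
```

In a real analysis the count matrix would come from a de novo motif, such as one reported by MEME for the strongest peaks, and the threshold would need calibrating rather than being fixed by hand.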

Cite: Boeva V, Surdez D, Guillon N, Tirode F, Fejes AP, Delattre O, Barillot E. De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis. Nucleic Acids Res. 2010 Jun 1;38(11):e126. Epub 2010 Apr 7.

Download the article: HTML, PDF.

The first version of MICSA contains:

  • a graphical user interface for the whole MICSA pipeline. The pipeline can also be used for ChIP-Seq data other than transcription factor data, because its last, motif-based step is optional.
  • version 3.3 of FindPeaks, used in the first step to exhaustively identify candidate regions of protein binding (FindPeaks website)
  • Java class Summary.class, used to create summary information about the peaks discovered by FindPeaks in the ChIP and control data
  • Java class DeleteRegions.class, used to filter out a priori false peaks in satellite or centromeric regions. It takes a BED-format file with the coordinates of these regions. Examples of such files: hg18 satellites, hg18 pericentromeric repeats, hg18 whole pericentromeric regions
  • Java class FilterPeaks.class, used to filter out peaks present in both the ChIP and control data
  • the MICSA program, used to select peaks bearing identified motifs, calculate p-values and false discovery rates, and optimize the output so that it contains fewer false positives than a predefined value
  • MEME is not included: unfortunately, we do not have the right to distribute a precompiled version of MEME on our website. Please download and install MEME from the MEME website.
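DeleteRegions and FilterPeaks both come down to asking whether a peak overlaps a set of genomic intervals. The Python sketch below is not the Java code distributed with MICSA; it assumes peaks are (chromosome, start, end) tuples in 0-based, half-open BED coordinates, and it treats an overlap of at least one base as "present in both ChIP and control", which may differ from the criterion FilterPeaks actually applies. All names (read_bed, RegionIndex, drop_overlapping) and the example coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


class RegionIndex:
    """Per-chromosome index of merged, sorted intervals for fast overlap tests."""

    def __init__(self, regions):
        by_chrom = defaultdict(list)
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end))
        self._merged = {}
        for chrom, intervals in by_chrom.items():
            merged = []
            for start, end in sorted(intervals):
                # Merge overlapping or touching intervals so the list is disjoint.
                if merged and start <= merged[-1][1]:
                    merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                else:
                    merged.append((start, end))
            self._merged[chrom] = merged
        self._starts = {c: [s for s, _ in iv] for c, iv in self._merged.items()}

    def overlaps(self, chrom, start, end):
        """True if [start, end) shares at least one base with an indexed interval."""
        merged = self._merged.get(chrom)
        if not merged:
            return False
        # Intervals 0 .. i-1 start before `end`; because the merged list is disjoint
        # and sorted, the last of them has the largest end.
        i = bisect_left(self._starts[chrom], end)
        return i > 0 and merged[i - 1][1] > start


def drop_overlapping(peaks, index):
    """Keep only the peaks that do not overlap any region in `index`."""
    return [peak for peak in peaks if not index.overlaps(*peak)]


satellites = RegionIndex(read_bed(
    ["track name=sat", "chr1\t100\t200", "chr1\t150\t300", "chr2\t0\t50"]))
chip = [("chr1", 50, 100), ("chr1", 250, 260), ("chr1", 300, 400),
        ("chr2", 40, 60), ("chr3", 0, 10)]
chip = drop_overlapping(chip, satellites)   # DeleteRegions-like step
control = RegionIndex([("chr1", 350, 360)])
chip = drop_overlapping(chip, control)      # FilterPeaks-like step
print(chip)  # [('chr1', 50, 100), ('chr3', 0, 10)]
```

Merging the intervals first keeps each per-chromosome list disjoint, so one binary search per peak answers the overlap question instead of comparing every peak with every region.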

More information is available:


 Downloads

You can download the latest versions of FindPeaks, MEME, and MICSA independently, or use the FindPeaks program included in the MICSA package. In either case, MEME must be downloaded and installed separately, and the directory containing meme.bin must be added to your PATH.

Installation steps:

  1. Download and install the latest version of MEME from the MEME website. MEME is free only for non-commercial use. Please check the MEME license before using.
  2. Add the directory containing meme.bin (meme_your_version\src?) to your PATH. If necessary, make meme.bin executable (for example with chmod +x).
  3. Download the MICSA program.
  4. If you want to use the latest version of FindPeaks, download and install it from the FindPeaks website.
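Before running the pipeline, it can help to confirm that MEME is actually reachable through PATH. The small Python check below is a sketch, not part of MICSA: the function names are hypothetical, and looking for an executable called meme in addition to meme.bin is an assumption, since the executable name may differ between MEME releases.

```python
import os
import shutil


def find_meme(names=("meme.bin", "meme")):
    """Return the full path of the first MEME executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)  # on POSIX, only files marked executable are returned
        if path:
            return path
    return None


def prepend_to_path(directory):
    """Put `directory` first on PATH for this process and any child processes it starts."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")


if __name__ == "__main__":
    meme = find_meme()
    if meme is None:
        print("MEME not found: add the directory containing meme.bin to PATH")
    else:
        print("Using MEME at", meme)
```

Changing PATH from Python only affects that Python process and the programs it launches; for a permanent change, edit your shell startup file instead.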

 Links to documentation and source code


Example results for MICSA based on ChIP-Seq data for the neuron-restrictive silencer factor (NRSF)

  • Peaks identified by FindPeaks in the control data [ download ]
  • Peaks identified by FindPeaks in the ChIP data for NRSF [ download ]
  • Peaks identified by MICSA in the ChIP data for NRSF [ download ]

Example results for MICSA based upon ChIP-Seq data for EWS-FLI1

  • Peaks identified by FindPeaks in the control data [ download ]
  • Peaks identified by FindPeaks in the ChIP data for EWS-FLI1 [ download ]
  • Peaks identified by MICSA in the ChIP data for EWS-FLI1 [ download ]
  • List of genes containing EWS-FLI1 binding sites identified by MICSA [ download ]

 MICSA working group


 Contacts

The following members of the MICSA working group will be pleased to answer any questions or address any concerns you may have about the MICSA software:


 Acknowledgements

This work was supported by grants from the Institut National de la Santé et de la Recherche Médicale, the Institut Curie, the Ligue Nationale contre le Cancer (Équipe labellisée and CIT program) and the Agence Nationale de la Recherche (SITCON project). We thank Peter Kharchenko and Andrei Zinovyev for valuable discussions and Martial Hue for the idea of the optimization procedure.

Last modified: April 8, 2011