CFCE is actively developing new analytical methods and pipelines for the systematic analysis of epigenetic and transcriptional datasets. We currently use our in-house "CHIPS", "VIPER", and "CoBRA" pipelines for initial mapping, peak calling, and unsupervised and supervised analysis of epigenetic and RNA-seq datasets.
CHIPS pipeline
We developed a Snakemake pipeline called CHIPS (CHromatin enrIchment ProcesSor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and can start from either FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, the PCR (polymerase chain reaction) bottleneck coefficient, the fraction of reads in peaks, the percentage of peaks overlapping the union of public DNaseI hypersensitivity sites, and the conservation profile of the peaks. For downstream analysis, it carries out peak annotation, motif finding, and regulatory potential calculation for all genes. The pipeline is designed to make processing robust and reproducible. The code is available at github.com/liulab-dfci/CHIPS
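Two of the quality control metrics named above reduce to simple ratios, sketched here in Python. This is an illustration only, not the CHIPS implementation. It assumes the commonly used definitions: the PCR bottleneck coefficient is the number of genomic locations covered by exactly one read divided by the number of locations covered by at least one read, and the fraction of reads in peaks is the number of reads falling inside a called peak divided by the total number of reads. The function names and the tuple representation of reads and peaks are hypothetical.

```python
from collections import Counter


def pcr_bottleneck_coefficient(reads):
    """Locations with exactly one read / locations with at least one read.

    `reads` is a list of (chrom, position, strand) tuples (a hypothetical,
    simplified stand-in for aligned reads).
    """
    counts = Counter(reads)
    if not counts:
        raise ValueError("no reads supplied")
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


def fraction_of_reads_in_peaks(reads, peaks):
    """Reads whose position lies inside any peak / all reads.

    `peaks` is a list of (chrom, start, end) half-open intervals.
    A linear scan over peaks keeps the sketch short; real data would
    call for an interval index.
    """
    if not reads:
        raise ValueError("no reads supplied")
    in_peaks = sum(
        1
        for chrom, pos, _strand in reads
        if any(c == chrom and start <= pos < end for c, start, end in peaks)
    )
    return in_peaks / len(reads)


# Toy data: five reads, two of which are duplicates at chr1:100.
example_reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 150, "+"),
    ("chr1", 900, "-"),
    ("chr2", 50, "+"),
]
example_peaks = [("chr1", 90, 200)]

pbc = pcr_bottleneck_coefficient(example_reads)                  # 3 of 4 locations are singletons
frip = fraction_of_reads_in_peaks(example_reads, example_peaks)  # 3 of 5 reads fall in the peak
print(f"PBC = {pbc:.2f}, FRiP = {frip:.2f}")
```

In this toy example, a low bottleneck coefficient would indicate that many reads are duplicates of one another, and a low fraction of reads in peaks would suggest weak enrichment. The thresholds used to judge either metric depend on the assay and are not specified here.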
VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines widely used tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, to downstream differential expression and pathway analysis. VIPER is built in a modular fashion so that new tools can be incorporated quickly to expand its capabilities. This modularity has already been used to add recently developed tools for exploring immune infiltrate and reconstructing T-cell complementarity-determining regions (CDRs). The pipeline is packaged so that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. The code is available at bitbucket.org/cfce/viper/src/master/
CoBRA (Containerized Bioinformatics workflow for Reproducible ChIP/ATAC-seq Analysis) is a comprehensive ChIP-seq and ATAC-seq analysis pipeline that can be used by scientists with limited computational experience. It enables researchers to gain rapid insight into protein–DNA interactions and chromatin accessibility through sample clustering, differential peak calling, motif enrichment, comparison of sites to a reference database, and pathway analysis. CoBRA is publicly available at https://bitbucket.org/cfce/cobra, with documentation at cfce-cobra.readthedocs.io/en/latest/
scATAnno is a workflow designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. It can generate reference atlases from publicly available datasets and enables accurate cell type annotation by integrating query data with those atlases, without the aid of scRNA-seq profiling. Documentation is available at https://scatanno-main.readthedocs.io/