Separating peaks of chip-seq with specific length

Question

My data contain several chip-seq results. I have the peaks called by MACS2.I wanted to only look at those peaks that their size is e.g 500bp to 1000bp. How can I separate those peaks efficiently?
I know one way is to take the peaks.xls file sort them based on the length column and take out the peaks less than 1kb. But I was looking for a more efficient way to do that. Any suggestion?
Many Thanks

StupidWolf · Answer

You can use rtracklayer from R bioconductor to import it, filter and do much more stuff, i use the code from this blog for importing the narrow bed file. Below I use an example file since you did not provide yours: library(rtracklayer) fl = "zas1_default_peaks.narrowPeak" extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric", qValue = "numeric", peak = "integer") gr <- import(fl, format = "BED", extraCols = extraCols_narrowPeak) head(gr) GRanges object with 6 ranges and 6 metadata columns: seqnames ranges strand | name score signalValue | [1] AB325691 8982-9219 * | zas1_default_peak_1 18 1.31697 [2] AB325691 16685-18050 * | zas1_default_peak_2 59 1.51283 [3] I 101971-102188 * | zas1_default_peak_3 17 1.23717 [4] I 102390-103538 * | zas1_default_peak_4 19 1.24967 [5] I 152286-152496 * | zas1_default_peak_5 23 1.37631 [6] I 175025-175577 * | zas1_default_peak_6 69 1.62063 pValue qValue peak [1] 3.37358 1.89101 146 [2] 8.12841 5.92232 419 [3] 3.26012 1.79545 64 [4] 3.47875 1.97673 708 [5] 3.89921 2.32948 136 [6] 9.34429 6.9608 374 The narrowBed and peaks.xls contains the same information. If you really really need to read in the xls, you can do: library(GenomicRanges) peaks = read.table("zas1_default_peaks.xls",comment="#",header=TRUE) gr = makeGRangesFromDataFrame(peaks,keep.extra.columns=TRUE) Once you have the genomic ranges object, filtering is the same: gr[width(gr)>500 & width(gr)<1000] GRanges object with 208 ranges and 7 metadata columns: seqnames ranges strand | length abs_summit pileup | [1] I 175025-175577 * | 553 175399 186.99 [2] I 186807-187489 * | 683 187009 208.66 [3] I 339070-339700 * | 631 339258 222.91 [4] I 431557-432088 * | 532 431734 195.66

geek_y · Answer

You can do it by awk.
awk -v OFS="t" '{ if ( ($3-$2) >= 500 && ($3-$2) <= 1000) { print } }' in.bed > out.bed

Assuming there is no header.

Separating peaks of chip-seq with specific length

2 Answers

Add your own answers!

Ask a Question