Bioinformatics Asked by user5191 on July 12, 2021
This question has also been asked on Biostars.
I used MACS2 to call peaks on ATAC-seq data. Now my goal is to split the peaks into 50 bp windows with a 25 bp step and then calculate the Tn5 integration frequency in each window.
How should I proceed with that?
```python
import numpy as np
import pyranges as pr

np.random.seed(0)

# Random example intervals standing in for MACS2 peaks
gr = pr.random()
gr.Score = np.random.randint(100, size=len(gr))
gr = gr.slack(25)  # make the intervals wider for this example
print(gr)

# Non-overlapping 50 bp tiles (aligned to multiples of 50) covering each interval
t1 = gr.tile(50)

def increase_by_25(df):
    """Shift every tile 25 bp to the right in genome coordinates."""
    df = df.copy()
    df.Start += 25
    df.End += 25
    return df

# The shifted copy supplies the half-step windows
t2 = t1.apply(increase_by_25)

# Together: 50 bp windows with a 25 bp step
tiled = pr.concat([t1, t2]).sort()
print(tiled)
# +--------------+-----------+-----------+--------------+-----------+
# | Chromosome | Start | End | Strand | Score |
# | (category) | (int32) | (int32) | (category) | (int64) |
# |--------------+-----------+-----------+--------------+-----------|
# | chr1 | 5205300 | 5205350 | + | 33 |
# | chr1 | 5205325 | 5205375 | + | 33 |
# | chr1 | 5205350 | 5205400 | + | 33 |
# | chr1 | 5205375 | 5205425 | + | 33 |
# | ... | ... | ... | ... | ... |
# | chrY | 41326450 | 41326500 | - | 3 |
# | chrY | 41326475 | 41326525 | - | 3 |
# | chrY | 41326500 | 41326550 | - | 3 |
# | chrY | 41326525 | 41326575 | - | 3 |
# +--------------+-----------+-----------+--------------+-----------+
# Stranded PyRanges object has 7,964 rows and 5 columns from 24 chromosomes.
# For printing, the PyRanges was sorted on Chromosome and Strand.
```
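The windowing works because `tile(50)` returns tiles aligned to multiples of 50, so a copy shifted by 25 bp fills in the half-step positions. A dependency-free sketch of the same arithmetic for a single interval, assuming 0-based half-open coordinates as in PyRanges (the function name `sliding_windows` is hypothetical, not part of any library):

```python
def sliding_windows(start, end, size=50, step=25):
    """Return (start, end) windows of `size` bp every `step` bp for the
    half-open interval [start, end).

    With size=50 and step=25 this reproduces, for one interval, the 50 bp
    tiles from tile(50) plus the copy shifted by 25 bp.
    """
    first_tile = (start // size) * size      # first tile overlapping the interval
    last_tile = ((end - 1) // size) * size   # last tile overlapping the interval
    # Window starts run from the first tile up to the shifted copy of the last tile
    return [(s, s + size) for s in range(first_tile, last_tile + size, step)]


print(sliding_windows(110, 160))
```

For example, `sliding_windows(110, 160)` returns `[(100, 150), (125, 175), (150, 200), (175, 225)]`: the two tiles overlapping the interval plus their 25 bp shifted copies. As with the PyRanges version, the last shifted window extends 25 bp past the final tile.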
This is as far as I can get without example data and a clearer explanation of what you need.
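For the second half of the question, one common approach (an assumption here, since the question does not specify it) is to reduce each read to its Tn5 insertion point using the usual ATAC-seq shift of +4 bp for reads on the + strand and -5 bp for reads on the - strand, then count insertion points per window. In the minimal sketch below, the input format (a list of `(start, end, strand)` tuples in 0-based half-open coordinates on one chromosome), the function names, and the per-million normalization are all hypothetical choices. Exact off-by-one conventions for the shift differ between tools.

```python
from bisect import bisect_left


def tn5_insertion_sites(reads):
    """Map (start, end, strand) reads to sorted Tn5 insertion positions.

    Assumes the common ATAC-seq shift: +4 bp from the start of '+' reads
    and -5 bp from the end of '-' reads.
    """
    sites = []
    for start, end, strand in reads:
        if strand == "+":
            sites.append(start + 4)
        else:
            sites.append(end - 5)
    return sorted(sites)


def count_in_windows(sites, windows):
    """Count sorted insertion positions falling in each half-open window."""
    return [bisect_left(sites, e) - bisect_left(sites, s) for s, e in windows]


def per_million(counts, total):
    """Scale raw counts to insertions per million total insertions."""
    return [c * 1e6 / total for c in counts]


# Toy data, made up for illustration only
reads = [(96, 146, "+"), (120, 170, "-"), (140, 190, "+"), (100, 180, "-")]
windows = [(100, 150), (125, 175), (150, 200), (175, 225)]

sites = tn5_insertion_sites(reads)           # [100, 144, 165, 175]
counts = count_in_windows(sites, windows)    # [2, 2, 2, 1]
print(per_million(counts, total=len(sites)))
```

Sorting the insertion sites once lets each window be counted with two binary searches. Because the windows overlap by 25 bp, most insertions are counted in two windows, so the window counts (7 here) add up to more than the number of insertions (4).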
Answered by The Unfun Cat on July 12, 2021