Bioinformatics Asked on November 26, 2020
Problem:
I am trying to convert some code written in R to Python, and part of that conversion is finding Python equivalents of the GRanges and IRanges classes used by the GenomicRanges R package. https://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.html#granges-genomic-ranges
I couldn’t find a Python library that supports the same operations as these R classes. There is the pyranges library, but it does not seem as flexible as GRanges and doesn’t implement IRanges. https://github.com/biocore-ntnu/pyranges
For example, in R I can pass the following to the GRanges constructor:
gr = GRanges(
    seqnames = Rle(DF$chr),
    ranges = IRanges(DF$Start, DF$End),
    ref = DF$ref,
    alt = DF$alt,
    sampleID = DF$sampleID,
    seqlengths = chromSize)
However, pyranges doesn’t seem to let you add arguments other than (chr, start, end, strand). In my case, I need to be able to add sample_id, seqlengths, and other arguments.
Thank you in advance.
Best
I am not sure your question has anything to do with IRanges. If I understand it correctly, the limitation you have with pyranges is adding extra columns to the metadata, like the metadata columns (mcols) of a GRanges object in R.
You can do it using insert or setattr:
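import numpy as np
import pandas as pd
import pyranges

gr = pyranges.random(10)  # 10 random intervals
bases = ['A', 'T', 'G', 'C']
# setattr adds (or overwrites) a column on gr itself
setattr(gr, 'alt', np.random.choice(bases, 10))
# insert returns a new PyRanges rather than modifying gr, so keep the result
gr = gr.insert(pd.DataFrame({'ref': np.random.choice(bases, 10)}))
print(gr)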
+--------------+-----------+-----------+--------------+------------+------------+
| Chromosome | Start | End | Strand | alt | ref |
| (category) | (int32) | (int32) | (category) | (object) | (object) |
|--------------+-----------+-----------+--------------+------------+------------|
| chr1 | 27431195 | 27431295 | - | C | G |
| chr2 | 197045893 | 197045993 | - | A | G |
| chr4 | 86316012 | 86316112 | - | C | A |
| chr8 | 4560598 | 4560698 | - | G | G |
| ... | ... | ... | ... | ... | ... |
| chr11 | 130395526 | 130395626 | - | A | A |
| chr12 | 28557156 | 28557256 | + | T | G |
| chr13 | 55519337 | 55519437 | - | T | C |
| chr14 | 1807028 | 1807128 | + | G | A |
+--------------+-----------+-----------+--------------+------------+------------+
Stranded PyRanges object has 10 rows and 6 columns from 10 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
Or add them directly, like this (thanks to @ChrisRands for pointing it out):
gr.alt = np.random.choice(['A','T','G','C'],10)
Answered by StupidWolf on November 26, 2020