Python equivalent to R GRanges and IRanges

Bioinformatics Asked on November 26, 2020

Problem:

I am trying to convert some R code to Python, and part of that process is finding Python classes equivalent to GRanges and IRanges from the GenomicRanges R package. https://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.html#granges-genomic-ranges

I couldn’t find a Python library that supports the same operations as these R classes. There is the pyranges library, but it does not seem as flexible as GRanges and has no IRanges equivalent. https://github.com/biocore-ntnu/pyranges

For example, in R I can pass the following to the GRanges constructor:

gr <- GRanges(
  seqnames = Rle(DF$chr),
  ranges = IRanges(DF$Start, DF$End),
  ref = DF$ref,
  alt = DF$alt,
  sampleID = DF$sampleID,
  seqlengths = chromSize)

However, pyranges does not seem to accept arguments other than (chr, start, end, strand). In my case, I need to be able to add sample_id, seqlengths and other fields.

Thank you in advance.
Best

One Answer

I am not sure your question has much to do with IRanges. If I understand correctly, the limitation you are hitting with pyranges is adding extra columns to the metadata, much like adding metadata columns (mcols) to a GRanges object.

You can do it using insert or setattr:

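import numpy as np
import pandas as pd
import pyranges

gr = pyranges.random(10)

# Option 1: setattr adds the column to gr in place.
setattr(gr, 'alt', np.random.choice(['A', 'T', 'G', 'C'], 10))

# Option 2: insert returns a copy with the extra column, so keep the result.
gr = gr.insert(pd.DataFrame({'ref': np.random.choice(['A', 'T', 'G', 'C'], 10)}))
print(gr)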

+--------------+-----------+-----------+--------------+------------+------------+
| Chromosome   | Start     | End       | Strand       | alt        | ref        |
| (category)   | (int32)   | (int32)   | (category)   | (object)   | (object)   |
|--------------+-----------+-----------+--------------+------------+------------|
| chr1         | 27431195  | 27431295  | -            | C          | G          |
| chr2         | 197045893 | 197045993 | -            | A          | G          |
| chr4         | 86316012  | 86316112  | -            | C          | A          |
| chr8         | 4560598   | 4560698   | -            | G          | G          |
| ...          | ...       | ...       | ...          | ...        | ...        |
| chr11        | 130395526 | 130395626 | -            | A          | A          |
| chr12        | 28557156  | 28557256  | +            | T          | G          |
| chr13        | 55519337  | 55519437  | -            | T          | C          |
| chr14        | 1807028   | 1807128   | +            | G          | A          |
+--------------+-----------+-----------+--------------+------------+------------+
Stranded PyRanges object has 10 rows and 6 columns from 10 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Or add the column directly by attribute assignment (thanks to @ChrisRands for pointing this out):

gr.alt = np.random.choice(['A','T','G','C'],10)

Answered by StupidWolf on November 26, 2020
