Get transcripts data from Bioconductor
In this example, I will extract the 3’UTR lengths of all mouse RefSeq transcripts.
I will use the GenomicFeatures library from Bioconductor to extract the 3’UTR information. I will also use the dplyr library to handle the data.
library(GenomicFeatures)
library(dplyr)
Now, we need to load the data.
#refseq <- makeTxDbFromUCSC(genome="mm10", tablename="refGene")
Since this function does not work at the moment (apparently something was changed in the UCSC table), I will load the data from a file. You can download the data for the example by clicking here.
refseq <- loadDb("mm10_refseq.sqlite")
Now we get the 3’UTRs
threeUTRs <- threeUTRsByTranscript(refseq, use.names=TRUE)
length_threeUTRs <- width(ranges(threeUTRs))
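A 3’UTR can be split across more than one exon, so each transcript may contribute several widths rather than a single number; that is why the lengths are summed per transcript further down. Here is a minimal sketch of that idea in base R, using a plain list with made-up transcript IDs and widths (these names and values are hypothetical, not taken from the mm10 data):

```r
# Hypothetical toy data: widths of the 3' UTR pieces of each transcript.
# The IDs and numbers are invented for illustration only.
utr_piece_widths <- list(
  tx_A = c(120L, 300L),      # 3' UTR split across two exons
  tx_B = 450L,               # 3' UTR contained in a single exon
  tx_C = c(10L, 20L, 30L)    # 3' UTR split across three exons
)

# The total 3' UTR length of a transcript is the sum of its pieces.
total_utr_length <- vapply(utr_piece_widths, sum, integer(1))
print(total_utr_length)
```

The real length_threeUTRs object is a list-like structure of the same shape, so calling sum() on it directly should give the per-transcript totals as well; I have not used that shortcut in this post.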
We put it all together in a data frame, summing the widths of the 3’UTR pieces of each transcript
the_lengths <- as.data.frame(length_threeUTRs)
# Sum the widths of all 3' UTR pieces belonging to the same transcript
the_lengths <- the_lengths %>%
  group_by(group, group_name) %>%
  summarise(utr_length = sum(value)) %>%
  ungroup() %>%
  distinct(group_name, utr_length)
colnames(the_lengths) <- c("RefSeq Transcript", "3' UTR Length")
The data is in the the_lengths data frame
## # A tibble: 10 x 2
## RefSeq Transcript 3' UTR Length
## <chr> <int>
## 1 NM_008866 1719
## 2 NM_001159750 1545
## 3 NM_011541 1545
## 4 NM_001159751 1545
## 5 NM_001310442 384
## 6 NM_133826 384
## 7 NM_001204371 3349
## 8 NM_001318735 3262
## 9 NM_011011 3349
## 10 NM_009826 1829
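To check the grouping step on something small, here is a sketch with a hand-made table in the same long format (one row per 3’UTR piece, with group, group_name and value columns). The transcript IDs and widths are hypothetical; only the dplyr verbs are the ones used above.

```r
library(dplyr)

# Hypothetical long-format table: one row per 3' UTR piece.
# IDs and values are invented for illustration only.
toy <- data.frame(
  group      = c(1L, 1L, 2L, 3L, 3L),
  group_name = c("tx_A", "tx_A", "tx_B", "tx_C", "tx_C"),
  value      = c(120L, 300L, 450L, 10L, 50L),
  stringsAsFactors = FALSE
)

# Sum the piece widths per transcript, then keep only name and total.
toy_lengths <- toy %>%
  group_by(group, group_name) %>%
  summarise(utr_length = sum(value)) %>%
  ungroup() %>%
  select(group_name, utr_length)

print(toy_lengths)
```

Each transcript ends up with one row whose length is the sum of its pieces, which is what we want from the real table too.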
We can save the data for later
write.csv(the_lengths, "the_lengths.csv", row.names=FALSE)
And we can get it back
again <- read.csv("the_lengths.csv", check.names=FALSE)
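One thing to watch when saving and reloading: by default write.csv adds a column of row names, and read.csv rewrites column names that are not valid R names (such as ones containing spaces or starting with a digit). Here is a small round-trip sketch on a made-up table, assuming you want the reloaded data to look exactly like what was saved; the table contents and the temporary file are hypothetical.

```r
# Hypothetical two-row table with the same column names as the_lengths.
toy_lengths <- data.frame(
  "RefSeq Transcript" = c("tx_A", "tx_B"),
  "3' UTR Length"     = c(420L, 450L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write without row names to a temporary file, then read it back
# while keeping the column names untouched.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_lengths, csv_path, row.names = FALSE)
restored <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

# Compare column names and contents of the original and reloaded tables.
print(identical(colnames(restored), colnames(toy_lengths)))
print(all.equal(restored, toy_lengths))
```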