How to read circular DNA representations

Postby mnstn » Sep 23 2016 6:26 pm

Hello All,

Could someone help me understand the circular DNA map of L. mesenteroides? Here are some specific questions, but I would appreciate an in-depth discussion as well:
1. What is the difference between the red and blue colors? Forward and reverse strand?
2. If the genome is supposed to have 2003 protein-coding regions, should I not expect 2003 red and blue regions combined?

https://microbewiki.kenyon.edu/index.ph ... genome.png
https://microbewiki.kenyon.edu/index.ph ... enteroides

Thank you,
Moon

Re: How to read circular DNA representations

Postby r.rosati » Sep 24 2016 4:13 pm

You are correct, and you have identified an error in the wiki.
Looking at the original page the map was taken from (it is stated on the image page) and searching for "Leuconostoc mesenteroides", it appears that the map on the wiki is not of the complete genome but of the sequence of plasmid pLEUM1. The map of this organism's complete genome is the one directly above it.
Do you have an account there, and would you edit the page?

Re: How to read circular DNA representations

Postby mnstn » Sep 26 2016 2:50 am

Hi r.rosati,
Thank you for the clarification. I tried correcting that wiki, but the login seems restricted to students of that course/class. I am wondering:
- Does a certain class/family of proteins co-locate on the forward strand (red) vs the reverse strand (blue)?
- Are the proteins encoded on one strand longer than those on the other?
- Why does the reverse strand seem to encode far fewer proteins than the forward strand? Maybe I got this wrong.

Re: How to read circular DNA representations

Postby r.rosati » Sep 26 2016 8:38 pm

Hello,

The first two questions can be answered by retrieving the information stored under the genome's accession number, NC_008531.
I'd be glad to point you to software that can do that, but I have never had to search for it, so I don't know of any.

However, I have half an hour to spare before going home, so maybe I can help a little:
You can first download the whole record from NCBI, then extract all the lines containing "CDS" (BASH can be downloaded for Windows too).
This assumes you saved the record as "Leuconostoc mesenteroides.gb":
Code:
grep -e "CDS" "Leuconostoc mesenteroides.gb" > CDSs.txt

This generates a file with 2 lines from the initial genome info, and 1933 CDS entries.
Now you can keep only the lines containing "complement":
Code:
grep -e "complement" "CDSs.txt" > antisense.txt

This generates 884 entries, so there are 884 CDSs on the reverse (antisense) strand and 1933 - 884 = 1049 CDSs on the forward (sense) strand.
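
The two grep steps can also be done in one pass in Python. This is only a minimal sketch, not the method used above: the function name count_cds_by_strand and the sample lines are made up for illustration, and it assumes that CDS feature lines start with five spaces followed by "CDS", as they do in a GenBank flat file.

```python
def count_cds_by_strand(lines):
    """Count CDS feature lines per strand in GenBank flat-file lines."""
    counts = {"sense": 0, "antisense": 0}
    for line in lines:
        # Keep only feature-table lines, not header text that mentions "CDS".
        if not line.startswith("     CDS"):
            continue
        strand = "antisense" if "complement" in line else "sense"
        counts[strand] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "     gene            1..300\n",
    "     CDS             1..300\n",
    "     CDS             complement(400..900)\n",
    "     CDS             join(1000..1200,1300..1500)\n",
]
print(count_cds_by_strand(sample))
```

To try it on the real record, pass in the lines of the saved file, for example the result of readlines() on "Leuconostoc mesenteroides.gb".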

(Can you tell me why the entry says there should be 1948 CDSs, but I only find 1933? That'd be awesome.)

Now about the length... A quick and dirty script in Python 3:

Code:
import re

# Read the GenBank flat file downloaded from NCBI.
with open("Leuconostoc mesenteroides.gb", "r") as genome:
    data = genome.readlines()

# Feature-table CDS lines start with five spaces followed by "CDS".
CDSs = [line for line in data if line.startswith("     CDS")]
lengths = {'sense': [], 'antisense': []}

# Matches "start..end", allowing a ">" partial marker before the end.
# Note: for join(...) locations only the first segment is measured.
location_re = re.compile(r"(\d+)\.\.>?(\d+)")

for line in CDSs:
    match = location_re.search(line)
    if match is None:
        # Show any CDS line whose location could not be parsed.
        print(line)
        continue
    start, end = (int(value) for value in match.groups())
    strand = 'antisense' if 'complement' in line else 'sense'
    lengths[strand].append(abs(end - start) + 1)

rettext = "CDSs in {0} strand: {1}, total nts: {2}; mean: {3:.2f}"

for location in lengths:
    print(rettext.format(location,
                         len(lengths[location]),
                         sum(lengths[location]),
                         sum(lengths[location])/len(lengths[location])))


This outputs:

CDSs in sense strand: 1049, total nts: 983300; mean: 937.37
CDSs in antisense strand: 884, total nts: 824962; mean: 933.21

So by this quick script, which could be wrong, it would seem that there are fewer genes on the reverse strand, but they are of similar size (one could use numpy to check this statistically).
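
One way to check the size comparison statistically without installing anything is a permutation test on the difference of the two mean lengths. This is a swapped-in technique using only the Python standard library, not the numpy approach mentioned above, and only a rough sketch: the function name permutation_test, its parameters, and the sample lengths are all made up for illustration.

```python
import random


def average(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of means of a and b."""
    rng = random.Random(seed)
    observed = abs(average(a) - average(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign the pooled values to two groups at random.
        rng.shuffle(pooled)
        diff = abs(average(pooled[:len(a)]) - average(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up lengths, for illustration only.
sense = [900, 1200, 750, 1050, 980]
antisense = [910, 1150, 800, 1000, 960]
p_value = permutation_test(sense, antisense, n_permutations=2000)
print("p = {:.3f}".format(p_value))
```

With the real data, the two lists would be lengths['sense'] and lengths['antisense'] from the script above. A large p-value would be consistent with the two strands having similar CDS lengths.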

Regarding why: I can't say for sure and I'm not an expert, but from the map I would say that the pattern follows the replication pattern of the genome. I can see that most sense genes are on the right side of the map, and most antisense genes are on the left. I would think that maybe an origin of replication is on the boundary, and that the sizes of the two halves arose evolutionarily from the different replication speeds of the forward and reverse strands. But I'd wait for someone more experienced to answer.
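
The left/right impression from the map could be checked against the coordinates by counting sense and antisense CDSs in each half of the genome. This is a rough sketch under loud assumptions: that position 1 is at the top of the map and coordinates increase clockwise (so the first half of the sequence is the right side), and that genome_length is read from the record. The function name strand_counts_by_half and the sample lines are made up for illustration.

```python
import re

# Matches "start..end", allowing a ">" partial marker before the end.
LOCATION_RE = re.compile(r"(\d+)\.\.>?(\d+)")


def strand_counts_by_half(cds_lines, genome_length):
    """Count sense/antisense CDSs whose start lies in each genome half."""
    counts = {
        "first half": {"sense": 0, "antisense": 0},
        "second half": {"sense": 0, "antisense": 0},
    }
    midpoint = genome_length / 2
    for line in cds_lines:
        match = LOCATION_RE.search(line)
        if match is None:
            continue  # skip lines without a parsable start..end range
        start = int(match.group(1))
        half = "first half" if start <= midpoint else "second half"
        strand = "antisense" if "complement" in line else "sense"
        counts[half][strand] += 1
    return counts


# Made-up CDS lines on a hypothetical 1000 nt genome, for illustration only.
sample = [
    "     CDS             10..200\n",
    "     CDS             250..480\n",
    "     CDS             complement(600..800)\n",
    "     CDS             complement(820..990)\n",
    "     CDS             complement(300..450)\n",
]
result = strand_counts_by_half(sample, genome_length=1000)
for half, by_strand in result.items():
    print(half, by_strand)
```

If the observation from the map holds, sense CDSs should dominate one half and antisense CDSs the other. If the origin is not near position 1, the split point would have to be shifted accordingly.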

