Saturday, 28 November 2009

Fourth Step "Basic Statistics"

In the Python world there are very powerful external math and statistics tools such as Gnuplot, Python(x,y), etc., but here we will use the histogram function of the Visual Python library (VPython), which is very simple; the same library will also be used for other projects ;)
You can find Vpython at http://vpython.org/contents/download_windows.html
NB: later we will use a much more powerful math library such as Gnuplot or Python(x,y), but for now we will stick to VPython.
Since we have already counted our amino acids (see the Translation step), we just have to count the nucleotides in the DNA fragment:
  • A_nb=0
  • C_nb=0
  • G_nb=0
  • T_nb=0
  • for i in range(len(ADN_5)):
    • if ADN_5[i]=='A':
      • A_nb=A_nb + 1
    • if ADN_5[i]=='C':
      • C_nb=C_nb + 1
    • if ADN_5[i]=='G':
      • G_nb=G_nb + 1
    • if ADN_5[i]=='T':
      • T_nb=T_nb + 1
  • print "A_nb =" , A_nb
  • print "C_nb =" , C_nb
  • print "G_nb =" , G_nb
  • print "T_nb =" , T_nb
Python Shell screenshot:
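The same tally can also be sketched with the standard library's collections.Counter. This is only an alternative illustration, written for Python 3 (the snippets in this post use Python 2 print statements), and the short ADN_5 fragment below is a made-up stand-in for the 330-nucleotide list built in the Replication step.

```python
from collections import Counter

# Hypothetical short fragment standing in for the ADN_5 list from the Replication step.
ADN_5 = list("ATGCGATTACA")

# Counter tallies every element of the list in a single pass.
counts = Counter(ADN_5)

# Read the four totals back into the counter names used in the post.
A_nb, C_nb, G_nb, T_nb = (counts[n] for n in "ACGT")

for name in "ACGT":
    print(name + "_nb =", counts[name])
```

A Counter returns 0 for a nucleotide that never occurs, so the four names are always defined.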



We now have the nucleotide and amino acid frequencies for this DNA, so let's draw a simple colored graph from this data.
First we need the VPython library, which must be imported at the beginning of our source code with this syntax:
  • from visual.graph import *
Then we create a graph window with a black foreground and a white background:
  • graph1 = gdisplay(foreground=color.black, background=color.white)

Each nucleotide count (A_nb, C_nb, G_nb, T_nb) is represented by a vertical bar (gvbars) with a different color (color=color.XXXX) and is displayed at a specific position on the graph.
The 4 nucleotides are represented in the graph using:
  • gvbars(delta=0.05, color=color.blue).plot(pos=(0.2,A_nb))
  • gvbars(delta=0.05, color=color.red).plot(pos=(.4,C_nb))
  • gvbars(delta=0.05, color=color.green).plot(pos=(.6,G_nb))
  • gvbars(delta=0.05, color=color.yellow).plot(pos=(.8,T_nb))
The same thing is done for the amino acids:
  • graph2 = gdisplay(foreground=color.black, background=color.white)
  • gvbars(delta=0.05, color =(.0,.0,.3)).plot(pos=(.1,Phe_nb))
  • gvbars(delta=0.05, color =(.0,.0,.6)).plot(pos=(.2,Leu_nb))
  • gvbars(delta=0.05, color =(.0,.0,.9)).plot(pos=(.3,Iso_nb))
  • gvbars(delta=0.05, color =(.0,1,.0)).plot(pos=(.4,Met_nb))
  • gvbars(delta=0.05, color =(.0,1,.3)).plot(pos=(.5,Val_nb))
  • gvbars(delta=0.05, color =(.0,1,.6)).plot(pos=(.6,Ser_nb))
  • gvbars(delta=0.05, color =(.0,1,.9)).plot(pos=(.7,Pro_nb))
  • gvbars(delta=0.05, color =(1,.0,.0)).plot(pos=(.8,Thr_nb))
  • gvbars(delta=0.05, color =(1,.0,.3)).plot(pos=(.9,Ala_nb))
  • gvbars(delta=0.05, color =(1,.0,.6)).plot(pos=(1,Tyr_nb))
  • gvbars(delta=0.05, color =(1,.0,.9)).plot(pos=(1.1,His_nb))
  • gvbars(delta=0.05, color =(1,.3,.0)).plot(pos=(1.2,Gln_nb))
  • gvbars(delta=0.05, color =(1,.6,.0)).plot(pos=(1.3,Asn_nb))
  • gvbars(delta=0.05, color =(1,.9,.0)).plot(pos=(1.4,Lys_nb))
  • gvbars(delta=0.05, color =(1,.3,.3)).plot(pos=(1.5,Asp_nb))
  • gvbars(delta=0.05, color =(1,.3,.6)).plot(pos=(1.6,Glu_nb))
  • gvbars(delta=0.05, color =(1,.3,.9)).plot(pos=(1.7,Cys_nb))
  • gvbars(delta=0.05, color =(1,.6,.0)).plot(pos=(1.8,Trp_nb))
  • gvbars(delta=0.05, color =(1,.6,.3)).plot(pos=(1.9,Arg_nb))
  • gvbars(delta=0.05, color =(1,.6,.9)).plot(pos=(2,Gly_nb))
  • gvbars(delta=0.05, color =(1,.9,.0)).plot(pos=(2.1,STOP_nb))
As a result we get graphs like these examples.
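Typing one gvbars line per amino acid is repetitive, so the x positions could be computed instead. The sketch below is plain Python 3 and does not import VPython; bar_layout is a hypothetical helper name, and the counter values are invented for the example.

```python
def bar_layout(counts, start=0.1, step=0.1):
    """Return one (x, height) pair per counter, spaced `step` apart."""
    return [(round(start + k * step, 2), n) for k, n in enumerate(counts)]

# Invented counter values, in the order of the first bars in the post:
# Phe_nb, Leu_nb, Iso_nb, Met_nb
amino_counts = [3, 5, 2, 4]

for x, height in bar_layout(amino_counts):
    print(x, height)
    # With VPython loaded, each pair could be drawn the same way as in the post:
    # gvbars(delta=0.05, color=color.blue).plot(pos=(x, height))
```

Rounding the positions keeps them at clean values such as 0.3 instead of 0.30000000000000004.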





I hope you enjoyed this little project; the whole source code is available right here



Third Step "Translation"

In this phase each triplet of nucleotides forms a codon, which codes for a START, an END (STOP), or simply an amino acid.




We will first create a list named "Codons", in which we build the triplets from our ARNm:
  • Codons=[]
  • for i in range(0,len(ADN_5),3):
    • Codons.append (ARNm[i]+ARNm[i+1]+ARNm[i+2])



We step through the sequence 3 positions at a time (ADN_5 and ARNm have the same length), and the Codons list receives the three nucleotides ARNm[i]+ARNm[i+1]+ARNm[i+2], which form a triplet.
Well, now comes the most tedious part of the typing ;)
This is the result in the Python shell:
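The triplet grouping can also be sketched with slicing. This is an illustrative Python 3 variant, not the code used for the screenshots: split_codons is a hypothetical helper, and dropping an incomplete trailing group is an assumption (with 330 nucleotides nothing is left over).

```python
def split_codons(arnm):
    """Group a list of nucleotides into 3-letter codons, dropping an incomplete tail."""
    usable = len(arnm) - len(arnm) % 3
    return ["".join(arnm[i:i + 3]) for i in range(0, usable, 3)]

# Made-up 11-nucleotide fragment: the last two letters do not form a codon.
ARNm = list("AUGUUUUAGGC")
Codons = split_codons(ARNm)
print(Codons)
```

Trimming the length first avoids the IndexError that ARNm[i+2] would raise on a sequence whose length is not a multiple of 3.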



We now have our Codons list and are ready to translate each codon into a START, an END, or an amino acid, following this table, which maps each triplet combination to its meaning:

Amino acid   Codons
Ala          GCU GCC GCA GCG
Arg          CGU CGC CGA CGG AGA AGG
Asn          AAU AAC
Asp          GAU GAC
Cys          UGU UGC
Gln          CAA CAG
Glu          GAA GAG
Gly          GGU GGC GGA GGG
His          CAU CAC
Ile          AUU AUC AUA
Leu          CUU CUC CUA CUG UUA UUG
Lys          AAA AAG
Met/START    AUG
Phe          UUU UUC
Pro          CCU CCC CCA CCG
Ser          UCU UCC UCA UCG AGU AGC
Thr          ACU ACC ACA ACG
Trp          UGG
Tyr          UAU UAC
Val          GUU GUC GUA GUG
Met          AUG
STOP codons  UAA UAG UGA
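One way to avoid typing an if test per codon is to store the table above in a dictionary and look each codon up. The following is a minimal Python 3 sketch under that assumption; AMINO_CODONS, CODON_TABLE and translate are names invented for this example.

```python
# The rule table, written as "amino acid -> its codons" exactly as listed above.
AMINO_CODONS = {
    "Ala": "GCU GCC GCA GCG",
    "Arg": "CGU CGC CGA CGG AGA AGG",
    "Asn": "AAU AAC",
    "Asp": "GAU GAC",
    "Cys": "UGU UGC",
    "Gln": "CAA CAG",
    "Glu": "GAA GAG",
    "Gly": "GGU GGC GGA GGG",
    "His": "CAU CAC",
    "Ile": "AUU AUC AUA",
    "Leu": "CUU CUC CUA CUG UUA UUG",
    "Lys": "AAA AAG",
    "Met": "AUG",
    "Phe": "UUU UUC",
    "Pro": "CCU CCC CCA CCG",
    "Ser": "UCU UCC UCA UCG AGU AGC",
    "Thr": "ACU ACC ACA ACG",
    "Trp": "UGG",
    "Tyr": "UAU UAC",
    "Val": "GUU GUC GUA GUG",
    "STOP": "UAA UAG UGA",
}

# Invert it into "codon -> amino acid" so each codon is a single lookup.
CODON_TABLE = {
    codon: amino
    for amino, codons in AMINO_CODONS.items()
    for codon in codons.split()
}

def translate(codons):
    """Map every codon in the list to its amino acid name (or "STOP")."""
    return [CODON_TABLE[c] for c in codons]

print(translate(["AUG", "UUU", "UGA"]))
```

The inverted dictionary holds all 64 possible triplets, so every codon built from A, C, G and U has an entry.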

First we walk through the Codons list to translate the codons: (Met/START AUG), (STOP UGA UAG UAA), or any other codon. In this example we look for (Phe UUU UUC) and (Met AUG):
  • for i in range(len(Codons)):
    • if Codons[i]=="AUG":
      • Met_nb=Met_nb + 1
      • for j in range(i,len(Codons)): # scan from this START to the end of the list
        • if Codons[j]=="UUU" or Codons[j]=="UUC":
          • display( "----------->Phe")
          • Protide.append("Phe")
          • Phe_nb=Phe_nb + 1
        • if Codons[j]=="AUG":
          • display( "----------->Met")
          • Protide.append("Met")
        • if Codons[j]=="UGA" or Codons[j]=="UAG" or Codons[j]=="UAA":
          • display( "Traduction Stopped")
          • Protide.append("STOP")
          • STOP_nb=STOP_nb + 1
This is part of the result in the Python shell:



We created counter variables such as Met_nb for the number of Met and Phe_nb for the number of Phe, which are incremented each time we find a matching codon; that way we know the number of each amino acid generated by this DNA sequence.
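As an illustration of the counter idea, the totals could also be derived from a translated list after the fact, using collections.Counter. This is a Python 3 sketch with an invented Protide list, not the counting used in the post.

```python
from collections import Counter

# Invented translation result standing in for the Protide list.
Protide = ["Met", "Phe", "Phe", "STOP", "Met", "STOP"]

# One pass over the list gives the count of every amino acid at once.
amino_counts = Counter(Protide)

Met_nb = amino_counts["Met"]
Phe_nb = amino_counts["Phe"]
STOP_nb = amino_counts["STOP"]
print("Met_nb =", Met_nb, "Phe_nb =", Phe_nb, "STOP_nb =", STOP_nb)
```

An amino acid that never appears simply counts as 0, so no counter has to be initialised by hand.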

In the next code fragment we look for functional protides (peptide chains), which are those that begin with (Met/START AUG) and end with (STOP UGA UAG UAA). Each time we find a START followed by a STOP codon, we add the whole fragment between START and STOP to the All_Protide list:

  • for i in range(len(Protide)):
    • if Protide[i]=="Met":
      • for j in range(i+1,len(Protide)): # search for the first STOP after this Met
        • if Protide[j]=="STOP":
          • All_Protide.append(Protide[i:j])
          • break
    • else:
      • pass
 
  Python Shell result:
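The START-to-STOP search can be sketched as a small function. This is a hedged Python 3 illustration of the same idea; find_protides is a hypothetical name and the input list is invented.

```python
def find_protides(protide):
    """Collect every fragment that starts at a Met and runs up to the next STOP."""
    fragments = []
    for i, amino in enumerate(protide):
        if amino != "Met":
            continue
        # Search only after position i, so the fragment is never empty.
        for j in range(i + 1, len(protide)):
            if protide[j] == "STOP":
                fragments.append(protide[i:j])  # the STOP itself is not included
                break
    return fragments

Protide = ["Phe", "Met", "Phe", "Met", "STOP", "Met", "Phe"]
All_Protide = find_protides(Protide)
print(All_Protide)
```

Note that a Met inside a fragment starts a fragment of its own, and a Met with no STOP after it yields nothing.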



  At this point our All_Protide list is filled with the functional protides. The next step is to draw some basic statistical graphs to get a quantitative representation of this DNA fragment in terms of nucleotide and amino acid frequency.

Friday, 27 November 2009

Second Step "Transcription"

Since we have the DNA portion, we will generate the complementary mRNA fragment that will lead us to protein building.
This is achieved by simply replacing each "T" nucleotide with a "U".
We will build a list named ARNm, as this source code shows:
  • for i in range(len(ADN_5)):
    • if ADN_3[i]=="T":
      • ARNm.append("U")
    • else:
      • ARNm.append(ADN_3[i])
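The same T-to-U substitution can be written as a list comprehension. This is a minimal Python 3 sketch; transcribe is a hypothetical helper name and the short ADN_3 list is made up for the example.

```python
def transcribe(adn):
    """Copy a DNA strand into mRNA, replacing every T with U."""
    return ["U" if n == "T" else n for n in adn]

# Made-up strand standing in for the ADN_3 list from the Replication step.
ADN_3 = list("TACGATT")
ARNm = transcribe(ADN_3)
print("".join(ARNm))
```

The input list is left untouched, so ADN_3 can still be used afterwards.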
       
Next step: Translation

First Step "Replication"

In this step we begin with our 4 nucleotides, "ACGT":
  • nucleotide=list('ACGT')

This creates a list of the 4 nucleotides that we will manipulate.
To use the random generator we must import the random module at the beginning of the source code:
  • import random
Then we build our randomly generated 5' DNA strand:
  • ADN_5=[]
  • for i in range(330): # 330 is an arbitrarily chosen number of nucleotides
    • x=random.choice(nucleotide)
    • ADN_5.append(x)
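When testing, it can help to get the same "random" strand on every run. The Python 3 sketch below assumes a hypothetical random_strand helper with an optional seed; the seed value 2009 is arbitrary.

```python
import random

def random_strand(length, seed=None):
    """Return a list of `length` nucleotides drawn uniformly from A, C, G, T."""
    rng = random.Random(seed)  # a private generator; a fixed seed makes it repeatable
    return [rng.choice("ACGT") for _ in range(length)]

ADN_5 = random_strand(330, seed=2009)
print(len(ADN_5), "".join(ADN_5[:12]))
```

Calling random_strand(330) without a seed behaves like the loop in the post and gives a different strand each time.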
Now we build the 3' DNA strand from the 5' one, following this complementarity table:
  • A -> T
  • T -> A
  • C -> G
  • G -> C
The source code should look like this (ADN_3 must first be created as an empty list, ADN_3=[]):
  • for i in range(len(ADN_5)):
    • if ADN_5[i]=="A":
      • ADN_3.append("T")
    • elif ADN_5[i]=="T":
      • ADN_3.append("A")
    • elif ADN_5[i]=="C":
      • ADN_3.append("G")
    • elif ADN_5[i]=="G":
      • ADN_3.append("C")

This loop tests each nucleotide in the ADN_5 strand and appends the matching nucleotide to the ADN_3 strand according to the complementarity table.
The result looks like this in the Python shell:



With this part of the code we have formed the entire DNA portion that we will work on in the next phase: the Transcription.
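The complementarity table also fits naturally in a dictionary, which replaces the if/elif chain with a lookup. This is an illustrative Python 3 sketch; COMPLEMENT and complement are names chosen for the example, and the 4-letter strand is made up.

```python
# The complementarity table from the post, as a lookup dictionary.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the complementary strand, nucleotide by nucleotide."""
    return [COMPLEMENT[n] for n in strand]

ADN_5 = list("ATGC")
ADN_3 = complement(ADN_5)
print("".join(ADN_5), "->", "".join(ADN_3))
```

A letter outside A, C, G, T raises a KeyError here, whereas the if/elif chain would skip it silently.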

DNA Replication/Transcription/Translation Simulator: let's do it ;)

On this page I propose to program a DNA simulator for the 3 phases of protein formation:
1-Replication
2-Transcription of the actual DNA to RNA
3-Translation of the RNA to amino acids, forming protides and proteins
A good overview of the real biological process can be found in courses all over the net, on well-documented sites; I picked this one:
http://www.vcbio.science.ru.nl/en/virtuallessons/cellcycle/trans/

Python language

What is Python
Python is a great object-oriented, interpreted, and interactive programming language. It is often compared (favorably, of course) to Lisp, Tcl, Perl, Ruby, C#, Visual Basic, Visual FoxPro, Scheme or Java... and it's much more fun. Python combines remarkable power with very clear syntax. It has modules, classes, exceptions, very high level dynamic data types, and dynamic typing. There are interfaces to many system calls and libraries, as well as to various windowing systems. New built-in modules are easily written in C or C++ (or other languages, depending on the chosen implementation). Python is also usable as an extension language for applications written in other languages that need easy-to-use scripting or automation interfaces.