<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>'Protein Folding' Blog RSS Feed</title>
    <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/</link>
    <description>Contains the latest posts from the blog 'Protein Folding'</description>
    <lastBuildDate>Sat, 25 May 2013 07:11:46 -0700</lastBuildDate>
    <generator>Argotic Syndication Framework 2007.3.0.1, http://www.codeplex.com/Argotic</generator>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <item>
      <title>detab utility in C</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/1180-detab-utility-in-C/</link>
      <description>Here's a detab utility.&lt;br /&gt;
&lt;br /&gt;
This should bread and butter code. In fact it is tricky to ensure that a&lt;br /&gt;
malicious user cannot cause a buffer overrun. Hence the maxmemory() routine.&lt;br /&gt;
getline() is also difficult.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="sourcecode"&gt;
 /*
  Utility to replaces tabs in text files with spaces.
  Can either do a blind replace or replace with user-specifed tab stops

  By Malcolm McLean, 2009
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;&lt;ol&gt;&lt;li&gt;include &amp;lt;stdio.h&amp;gt;
&lt;/li&gt;
&lt;li&gt;include &amp;lt;stdlib.h&amp;gt;
&lt;/li&gt;
&lt;li&gt;include &amp;lt;string.h&amp;gt;
&lt;/li&gt;
&lt;li&gt;include &amp;lt;limits.h&amp;gt;
&lt;/li&gt;
&lt;li&gt;include &amp;lt;assert.h&amp;gt;
&lt;/li&gt;
&lt;/ol&gt;
/*
  count the occurences of ch in str
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;size_t chcount(const char *str, int ch)
{
  size_t answer = 0;
  while(*str)
    if(*str++ == ch)
      answer++;

  return answer;
}

/*
  is a string an integer?
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;int integral(char *str)
{
  char *end;
  long x;

  if(!str)
    return 0;
  if(!*str)
    return 0;
  x = strtol(str, &amp;amp;end, 10);
  if(x &amp;gt; INT_MAX || x &amp;lt; INT_MIN)
    return 0;
  if(*end)
    return 0;
  return 1;
}


/*
  check that an array of integers is ascending
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;int ascending(int *x, int N)
{
  int i;

  assert(N &amp;gt; 0);
  for(i=1;i&amp;lt;N;i++)
    if(x[i] &amp;lt;= x[i-1])
      return 0;
  return 1;
}

/*
  get line from file
  Params: fp - input file
  Returns malloced line, 0 on EOF  
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;char *getline(FILE *fp)
{
  char *buff;
  char *temp;
  size_t buffsize = 128;
  size_t newsize;
  size_t N = 0;
  int ch;

  buff = malloc(buffsize);
  if(!buff)
    goto out_of_memory; 
  while( (ch = fgetc(fp)) != EOF)
  {
    if(N &amp;gt;= buffsize - 2)
    {
      newsize = buffsize + buffsize/10;
      if(newsize &amp;lt; buffsize)
      {
        goto out_of_memory;
      }
      temp = realloc(buff, newsize);
      if(!temp)
      {
        goto out_of_memory;
      }
      buff = temp;
      buffsize = newsize;
    }
    buff[N++] = ch;
    if(ch == '\n')
      break;
  }
  buff[N] = 0;
  if(N == 0 &amp;amp;&amp;amp; ch == EOF)
  {
    free(buff);
    return 0;
  }
  return buff;
 out_of_memory:
  free(buff);
  fprintf(stderr, "Out of memory\n");
  exit(EXIT_FAILURE);
}

/*
  calculate maximum amount of memory needed to hold tabbed line
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;size_t maxmemory(char *str, int Ntabs, int *tablist)
{
  int last;
  int penultimate;
  size_t tabcount;
  size_t answer;


  /* assume string is after last tab */
  last = tablist[Ntabs -1];
  penultimate = (Ntabs &amp;gt; 1) ? tablist[Ntabs-2] : 0;
  answer = strlen(str);
  if(answer + last + 1&amp;lt; answer)
  {
    fprintf(stderr, "Out of memory\n");
    exit(EXIT_FAILURE);
  }  
  answer += last + 1;

  /* now assume that all the tabs are stacked up after the string */
  tabcount = chcount(str, '\t');
  if(tabcount &amp;gt; 0)
  {
    if( ((size_t) (last - penultimate) * (size_t) tabcount)/tabcount != last - penultimate )
    {
      fprintf(stderr, "Out of memory\n");
      exit(EXIT_FAILURE);
    }  
    if(answer + (last - penultimate) * tabcount &amp;lt; answer)
    {
      fprintf(stderr, "Out of memory\n");
      exit(EXIT_FAILURE);
    }  
    answer = answer + (last - penultimate) * tabcount;
  }

  return answer;
}

/*
  get the position of a tab at pos
  Notes: the last tab is considered to be a repeat
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;int gettabposition(int pos, int Ntabs, int *tablist)
{
  int i;
  int last, penultimate;
  int answer;

  assert(Ntabs &amp;gt; 0);

  for(i=0;i&amp;lt;Ntabs;i++)
    if(tablist[i] &amp;gt; pos)
      return tablist[i];

  last = tablist[Ntabs -1];
  penultimate = (Ntabs &amp;gt; 1) ? tablist[Ntabs-2] : 0;
  answer = last;
  while(answer &amp;lt; pos)
    answer += last - penultimate;
  return answer; 
}

/*
  remove tabs from a line, replacing with spaces
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;int detabline(char *out, const char *in, int Ntabs, int *tablist)
{
  int i, ii, j;
  int tabpos;

  assert(Ntabs &amp;gt; 0);

  j = 0;
  for(i=0;in[i];i++)
  {
    if(in[i] != '\t')
      out[j++] = in[i];
    else
    {
      tabpos = gettabposition(j, Ntabs, tablist);
      for(ii=j;ii&amp;lt;tabpos;ii++)
        out[j++] = ' ';   
    }
  }
  out[j] = 0;
  return 0;
} 

/*
  detab a file
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;void detab(FILE *out, FILE *in, int Ntabs, int *tablist)
{
  char *buff;
  char *temp;
  char *line;
  size_t buffsize = 1024;

  buff = malloc(buffsize);
  if(!buff)
  {
    fprintf(stderr, "Put of memory\n");
    exit(EXIT_FAILURE);
  }
  while( (line = getline(in)) )
  {
    if(buffsize &amp;lt; maxmemory(line, Ntabs, tablist))
    {
      temp = realloc(buff, maxmemory(line, Ntabs, tablist));
      if(!temp)
      {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
      }
      buff = temp;
      buffsize = maxmemory(line, Ntabs, tablist);
    }

    detabline(buff, line, Ntabs, tablist);
    fprintf(out, "%s", buff);
    free(line);
  }
  free(buff);  
}

/*
  detab by simply replacing tabs with spaces
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;void detabsimple(FILE *out, FILE *in, int Nspaces)
{
  int ch;
  int i;

  while( (ch = fgetc(in)) != EOF)
  {
    if(ch == '\t')
      for(i=0;i&amp;lt;Nspaces;i++)
        fputc(' ', out);
    else
      fputc(ch, out);

  }
} 

/*
  get list of tabs from argument list
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;int *gettablist(char **args, int *Ntabs)
{
  int N;
  int i;
  char *end;
  int *answer;
  
  for(i=0;args[i];i++)
    if(!integral(args[i]))
      break;
  N = i;
  if(N == 0)
  {
&lt;ul&gt;&lt;ul&gt;&lt;ul&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;Ntabs = 0;
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;    return 0;
  }
  answer = malloc(N * sizeof(int));
  if(!answer)
  {
&lt;ul&gt;&lt;ul&gt;&lt;ul&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;Ntabs = 0;
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;    return 0;
  }
  for(i=0;i&amp;lt;N;i++)
    answer[i] = (int) strtol(args[i], &amp;amp;end, 10);
&lt;ul&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;Ntabs = N;
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;  return answer;
} 

/*
  get list of tabs from argument list, with checking
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;int *getcheckedtablist(char **args, int *Ntabs)
{
  int *answer;
  int i;

  answer = gettablist(args, Ntabs);
  if(!answer)
  {
    fprintf(stderr, "Bad tab list\n");
    exit(EXIT_FAILURE);
  }
  if(answer[0] &amp;lt;= 0)
  {
    fprintf(stderr, "tabs must be positive integers\n");
    exit(EXIT_FAILURE);
  }
  if(!ascending(answer, *Ntabs))
  {
    fprintf(stderr, "tabs must be in ascending order\n");
    exit(EXIT_FAILURE);
  }
  for(i=0;i&amp;lt;*Ntabs;i++)
    answer[i] -= 1;
  
  return answer;
}

/*
  print out usage message
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;void usage(void)
{
  printf("detab - replace tabs with spaces\n");
  printf("Usage [options] &amp;lt;filename&amp;gt;\n");
  printf("Options: -tab &amp;lt;tablist, ...&amp;gt;\n");
  printf("         -s &amp;lt;Nspaces&amp;gt;\n");
  printf(" -tab, list of tab positions (1-based). Last tab repeats.\n");
  printf(" -s -simply replace tabs with given number of spaces\n");
  printf("  (Note -s 0 will remove all the tabs)\n");
  exit(EXIT_FAILURE);
}

/*
  main fucntion, detab a text file by replacing with spaces
&lt;ul&gt;&lt;ul&gt;&lt;li&gt;/
&lt;/li&gt;
&lt;/ul&gt;&lt;/ul&gt;int main(int argc, char **argv)
{
  FILE *fpin;
  int Nspaces;
  int Ntabs;
  int *tablist;
  char *end;

  if(argc == 1)
    usage();
  if(!strcmp(argv[1], "-s"))
  {
    if(!integral(argv[2]))
      usage();
    Nspaces = (int) strtol(argv[2], &amp;amp;end, 10);
    if(Nspaces &amp;lt; 0)
    {
      fprintf(stderr, "negative number of spaces %d\n", Nspaces);
      exit(EXIT_FAILURE);
    } 
    if(argc == 3)
      fpin = stdin;
    else if(argc == 4)
    {
      fpin = fopen(argv[4], "r");
      if(!fpin)
      {
        fprintf(stderr, "Error opening %s\n", argv[4]);
        exit(EXIT_FAILURE);
      }
    }
    else
      usage();
    detabsimple(stdout, fpin, Nspaces);
    if(fpin != stdin)
      fclose(fpin);
  }
  else if(!strcmp(argv[1], "-tab"))
  {
    tablist = getcheckedtablist(argv + 2, &amp;amp;Ntabs);
    if(argc == Ntabs + 2)
       fpin = stdin;
    else if(argc == Ntabs + 3)
    {
      fpin = fopen(argv[Ntabs + 2], "r");
      if(!fpin)
      {
        fprintf(stderr, "Error opening %s\n", argv[Ntabs+2]);
        exit(EXIT_FAILURE);
      }
    }
    else
      usage();
    detab(stdout, fpin, Ntabs, tablist);
    free(tablist);
  }
  else
    usage();

  fflush(stdout);
  return 0; 
}

&lt;/pre&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/1180-detab-utility-in-C/</guid>
      <pubDate>Fri, 07 Aug 2009 06:06:52 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
    <item>
      <title>BattleBugs</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/488-BattleBugs/</link>
      <description>The big mainframe we use to do protein structure calcualtions has gone down, probably with a virus.&lt;br /&gt;
&lt;br /&gt;
In the meantime I'm thinking about a new use for my Basic Interpreter (this is available from my website, I also publish it as a book). The idea is BattleBugs.&lt;br /&gt;
&lt;br /&gt;
You have a game arena consisting of a terrain on which grows grass. You then have two teams of bugs. The bugs are controlled by little Basic programs. The bugs move to eat grass, when they get energy. After a bug has reached a certain energy level, it can reproduce.When it runs out of energy, it dies.&lt;br /&gt;
&lt;br /&gt;
[includeimage:1]&lt;br /&gt;
&lt;br /&gt;
Bugs are allowed to execute one line of Basic per move. When they move into other bugs they fight.&lt;br /&gt;
&lt;br /&gt;
The skeleton of the game is working, however I need a designer for the fighting rules. The rules have got to have a sort of open-ended quality to them so that programs can become more and more ingenious. Also at present the game does not have enough visual appeal. Fighting rules should change the morphology of the bug.&lt;br /&gt;
&lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/488-BattleBugs/</guid>
      <pubDate>Sat, 10 Jan 2009 05:48:34 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
    <item>
      <title>Fortran vs C, is your symbol really necessary</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/427-Fortran-vs-C-is-your-symbol-really-necessary/</link>
      <description>All the legacy protein energy potential code is in Fortran 77. However I use C for bread an butter work.&lt;br /&gt;
&lt;br /&gt;
For instance I recently had to load in over 400 data files, a matrix of interactionss for various amino acids (we represent the amino acid residue by virtual atom "balls", see previous posts). 400 paths are too many to hardcode or even put in a configuration file. However my naming system is regular - amino acid 1 letter code, the virtual atom or ball id (2 characters) and the same for its partner, to produce a 6 letter filename.&lt;br /&gt;
&lt;br /&gt;
In C it is dead easy to generate the files. Simply say&lt;br /&gt;
&lt;pre class="sourcecode"&gt;
sprintf(filename, "%s/%c%s%c%s", directory, aa[i], ball[i], aa[j], ball[j]);
fp = fopen(filename, "r");
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
In Fortran 77, believe it or not, there's no way to get the length of a string. Strings are just padded with spaces. However you can loop through the string, get the first space, and use that to conactenate&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="sourcecode"&gt;
      do 100 i = 1, 70
        if(plen .eq. 0 .and. path(i:i) .eq. ' ') plen .eq. i-1
100   continue
      write(fname, 1000) path(1:plen),aai,balli,aaj,ballj
1000  format(a,'/',a,a,a,a)
      open(11, file = fname, status = 'old')
&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
it's realy cluntsy and it took me the best part of an hour to work out how to coax Fortran into doing what I wanted.&lt;br /&gt;
&lt;br /&gt;
However there's another more important feature of Fortran 77 - no structures. If you want a list of points, you have to write&lt;br /&gt;
&lt;br /&gt;
real points(3,100)&lt;br /&gt;
&lt;br /&gt;
if you want amino acid resiudes to attach to them, it has to be&lt;br /&gt;
&lt;br /&gt;
character *1 aacids(100) &lt;br /&gt;
&lt;br /&gt;
This is different to C. The danger is that the C programmer will write&lt;br /&gt;
&lt;br /&gt;
typedef struct &lt;br /&gt;
{&lt;br /&gt;
  float x;&lt;br /&gt;
  float y;&lt;br /&gt;
  float z;&lt;br /&gt;
} Point;&lt;br /&gt;
&lt;br /&gt;
typedef struct&lt;br /&gt;
{&lt;br /&gt;
  Point pt;&lt;br /&gt;
  char aacid;&lt;br /&gt;
} Residue;&lt;br /&gt;
&lt;br /&gt;
This is OK as long as all structures are used consistently throughout the program. However it has the potential to go horribly wrong. What if your root mean squared deviation function takes an array of Points, for example? It cannot be passed an array of Residues. Either you need glue code to write out the Points into a temporary array, or you need to hack into the rmsd function to take a Residue *.&lt;br /&gt;
&lt;br /&gt;
This reaches the height of stupidity when you get aliases for various C types. Let's say someone passes the number of balls per residue as an unsigned int *, another as a size_t *. Underneath they are probably the same. But you need glue code just to convert an array of unsigned ints to an array of size_t's. Worse, someone will cast the problem away, and everything wil go horribly wrong when we move to 64 bits and 64 bit size_t,s 32 bit ints.&lt;br /&gt;
&lt;br /&gt;
By not having any user-defined type symbols, Fortran 77 gets round that problem. So the practical case is that Fortran 77 routines (subprograms) tend to be easier to move between programs than C routines. &lt;br /&gt;
&lt;br /&gt;
The motto is, for the C programmer, "is your symbol really necessary?".</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/427-Fortran-vs-C-is-your-symbol-really-necessary/</guid>
      <pubDate>Fri, 28 Nov 2008 09:55:22 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
    <item>
      <title>Boltzmann-weighted peptide models</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/376-Boltzmann-weighted-peptide-models/</link>
      <description>Finally got some decent models out of the computer.&lt;br /&gt;
&lt;br /&gt;
My problem is processing time. Because of the way I build up my protein models, each amino acid residue adds a factor of 18 to the number of conformations. That is, I have 18^N conformations. Somehow these have to be searched to find the minimum energy structure. &lt;br /&gt;
&lt;br /&gt;
However a molecule doesn't exist in only one conformation, except at zero Kelvin or when crystallised. It moves about in the solvent. We hope to capture this. What we do is run it in a simualted temperature bath, until it has found the minimum conformation. Then we run it for as long again, taking a thousand samples from the run. We then super-impose these to gain a view of the conformation.&lt;br /&gt;
&lt;br /&gt;
This leads to the question, how do we know it has found the minimum? What we do is run three independent simulations. if all three find the same minimum value, we assume that the genuine minimum has been found.&lt;br /&gt;
&lt;br /&gt;
Here are the results for a bovine rhdopsin segment (protein data bank code 1eds).&lt;br /&gt;
&lt;br /&gt;
[includeimage:1][includeimage:2][includeimage:3]&lt;br /&gt;
&lt;br /&gt;
The first image shows the native structure. The second image shows the results using the forcefield as it used to be. The final image shows my "improved" forcefield which includes rotamers. Rotamers give the sidechains of the amino acids a bit of freedom to wiggle about. The old forcefield treated them as a rigid stick of balls.&lt;br /&gt;
&lt;br /&gt;
What has happened is that the rotamers have increased the entropic space of the problem enormously. We go from 6^N moving just the protein backbone to 18^N moving sidechains as well. So for the same temperature, the simulation is fuzzier.&lt;br /&gt;
Eyeballing it, there seems to be no real difference in accuracy. We've more or less got the alpha helix at one terminus, but the less structured part at the other terminus is not modelled very accurately. &lt;br /&gt;
The image you see took about three days with 8 processors. The problem is that I had to run three replicates to make sure I'd got the minimum, then the old forcefield as a control. So in fact it tied up all my processor allotment for 3 days. So I've got a new account on a bigger machine. It has a job limit of 48 hours. However I've bumped up the number of processors to compensate.</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/376-Boltzmann-weighted-peptide-models/</guid>
      <pubDate>Mon, 13 Oct 2008 08:29:39 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
    <item>
      <title>GUIs versus little languages</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/357-GUIs-versus-little-languages/</link>
      <description>We use several software tools in our research. These include wordprocessors, graphics packages, slide show presentation packages, tools for manipulating data, and of course programming languages.&lt;br /&gt;
&lt;br /&gt;
There are basically two ways of designing a tool. You can point and click, or you can use some sort of scripting. Programs on the Beowulf cluster are entirely scripting - it has no graphical screen round which to build a GUI. On the desktop, we've got a choice.&lt;br /&gt;
&lt;br /&gt;
Excel is a predominantly GUI-based statistics package. It is designed as a spreadsheet. You enter numbers into cells, then select manipulations on them from a menu. Finally you create charts from the data, again largely mouse driven. However it is not entirely a GUI - it is possible to use macros that represent a substantial chunk of code. The other statisitics package we use is R. This is a "little language" package. You set up the data as csv files. You then read the data into an R data frame, which you manipulate using the R scripting langauge. It is in fact a full programming language, though it has features that make statistics particularly easy, like dedicated commands for drawing graphs.&lt;br /&gt;
&lt;br /&gt;
Which is better? Excel is easier to pick up and play with. However the great advantage of R is that you have a record of all the manipulations you have applied to your data. So if something is wrong you can go back and correct it. or you can apply the same manipulations to a different data set. The disadvantage is the difficulty of learning the language. Simple things like the code to set all missing numbers to a set value are slightly tricky.&lt;br /&gt;
&lt;br /&gt;
(The code is&lt;br /&gt;
&lt;br /&gt;
array[is.na(array)] &amp;lt;- x;&lt;br /&gt;
&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
Who would have guessed, especially since the thing will fall over if you type is.nan() by accident?&lt;br /&gt;
&lt;br /&gt;
In practise we need a balance between GUIs and little languages, so that manipulations which are more intuitive with the GUI and done via GUIs, and those which can't be easily reduced to mouse presses are done in the scripting. OpenOffice does this quite well. The bulk of the program is a GUI wordprocessor, but the equation editing uses a little language. It is much easier than trying to figure out how to get Greek subscripts from a mouse and keyboard.&lt;br /&gt;
&lt;br /&gt;
MiniBasic is designed as a little language to slot into applications. The advantage of using Basic is that, firstly, it is a complete programming language, secondly it is relatively easy to implement, and thirdly there is a huge base of non-programmers who know it, left over from the microcomputer craze of the 1980s. BASICdraw is a program I wrote to showcase MiniBasic's capabilities.&lt;br /&gt;
&lt;br /&gt;
[includeimage:1]&lt;br /&gt;
&lt;br /&gt;
It's not yet an panacea. BASICdraw will allow you to do things with graphics which are easily described but can be painful from a GUI, such as overlaying an image with pixel-wide grid points. It also allows the creation of some nice programs. For instance there's one that maps a texture to a sphere and then shades it. On the other hand it is not easy to pick it up and draw with it. It's niche is probably with professional graphic artists.&lt;br /&gt;
&lt;br /&gt;
Any suggestions for improvements to BASICdraw are welcome. The program is freeware, from my website.</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/357-GUIs-versus-little-languages/</guid>
      <pubDate>Fri, 19 Sep 2008 09:56:30 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
    <item>
      <title>Minimisation routines</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/347-Minimisation-routines/</link>
      <description>As I mentioned in the last post, we've added rotamers to the protein model.&lt;br /&gt;
&lt;br /&gt;
This threw up a problem. Proteins have torsional degrees of freedom along the backbone, however generally the angles are confined to one or two areas of Ramachandran space - given by plotting the phi versus the psi backbone angles against each other. The other torsion angle, the omega angle, is usually planar - it moves but only very slightly.&lt;br /&gt;
&lt;br /&gt;
What we do is reduce the backbone angles to 6 points on the Ramachandran plot.&lt;br /&gt;
&lt;br /&gt;
[includeimage:1]&lt;br /&gt;
&lt;br /&gt;
So now we can describe the protein backbone as a string of integers, like this&lt;br /&gt;
&lt;br /&gt;
613555555555421&lt;br /&gt;
&lt;br /&gt;
The string of 5s is typical and indicates an alpha helix - point 5 is in the helical area of the plot.&lt;br /&gt;
&lt;br /&gt;
However when I add the rotamers we also need a number between 1-3 for each sidechain. The description becomes&lt;br /&gt;
&lt;br /&gt;
Backbone  613555555555421 &lt;br /&gt;
&lt;br /&gt;
Sidechain 122111233112312&lt;br /&gt;
&lt;br /&gt;
What this means is that we've gone from 6^15 possibilities to 6^15 * 3^15 possibilities for a 15-residue peptide. With our current forcefield a 15-mer can be enumerated in about 6 months on a single processor, or in about 2 weeks if we use parallel processors. However when we add the rotamers, we'd have to throw the entire academic grid at it to enumerate in reasonable time.&lt;br /&gt;
&lt;br /&gt;
In practise we only enumerate very short peptides of 12 or so residues. We use simulated annealing to find the minimum for longer peptides. Simulated annealing works by trying a random conformation. Then we accept it if it is better than our current conformation. If it is worse than our current conformation, we accept it with a proability based on the temperature. Intially the temperature is high and we are accepting almost all conformations. Then the temperature is gradually reduced, until we are only accepting conformations that are better then our current one. The idea is that at the beginning of the run we explore all of conformational space, so we then concentrate the low temperature search in the subset where the minimum is.&lt;br /&gt;
&lt;br /&gt;
Simulated annealing works well for relatively small search spaces like 6^15. When I added the rotamers, the approach was beginning to break down. A more sophisticated algorithm was needed. I finally decided upon feedback-optimised replica exchange. Instead of reducing the temperature gradually, I perform several runs at different temperatures. Then epriodicially I exchange solutions between temepratures. The feedback-otpimisation dynamically changes the temperatures to be the optimal set.&lt;br /&gt;
&lt;br /&gt;
With this technique I can find the minimum of a 15-residue peptide in about 30 minutes, using 8 processors in parallel. So I ran the forcefield with the new rotamers on 23 peptides between 12 and 20 residues long. The results showed that adding rotamers made predictions neither better nor worse. This wasn't unexpected, because short peptides don't have close sidechain interactions, all the strong interactions are hydrogen-bonding in the backbone. The next task is to see whether we have improved performace on bigger prioteins, where the cores are packed closely together. &lt;br /&gt;</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/347-Minimisation-routines/</guid>
      <pubDate>Wed, 10 Sep 2008 09:52:03 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
    <item>
      <title>Soft Coding</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/332-Soft-Coding/</link>
      <description>Here's the chap who invented the term &lt;a href="http://thedailywtf.com/Articles/Soft_Coding.aspx"&gt; "soft coding".&lt;/a&gt;.&lt;br /&gt;
&lt;br /&gt;
I've added my rotamer set to the proteins. Now I've got to run the forcefield over them to see whether adding sidechain degrees of freedom has improved our ability to discriminate folds. It's not the end of the world if it hasn't - some tweaking of the forcefield to accomodate the extra accuracy might well be required.&lt;br /&gt;
&lt;br /&gt;
However each run needs an input file. That input file references a backbone file with the backbone degrees of freedom in it, a sequence file with the sequence data, an output file for the protein atom co-ordinates, a configuration file with the start configuration, a balls file with the original sidechains, a new balls file I've just created with the rotamers. All in all it's a huge edifice. Then the input files need to be submitted to the cluster job scheduling code. So this necessitates a shell script file for each job, and an output file to hold diagnostic prints.&lt;br /&gt;
&lt;br /&gt;
It all adds up to soft coding. Any mistake in any of these files will invalidate the run. So I have to go through, running for trivial cases, just to make sure all the ancillary files are set up right. That's assuming that the main program is bug free. It's too much work for anyone except the programmer to expect to run the program correctly, which is no good long term. We need some sort of script so that the user can type the amino acid sequence of the protein, press GO, and get out a prediction.&lt;br /&gt;
&lt;br /&gt;
[includeimage:1]&lt;br /&gt;
&lt;br /&gt;
Here's a picture of the rotamers with their correct reduced representations. The backbone is the green balls. Arginine has one blue ball and one red ball. The red ball is charge, the blue ball is oily. I've got three rotamers for the three major orientations of the sidechin, so you see six sidechain balls in all.&lt;br /&gt;
&lt;br /&gt;
[includeimage:2]&lt;br /&gt;
&lt;br /&gt;
Tyrosine just has one oily ball. There's a bit of an electrical charge on the oxygen at the end of the aromatic ring, but we're ignoring this. It's a very simple model and we don't try to include everything. Version 1 didn't even have any rotamers.&lt;br /&gt;
&lt;br /&gt;
It's all been submitted to the cluster. By tomorrow I ought to know whether we have improved predictions or not.</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/332-Soft-Coding/</guid>
      <pubDate>Wed, 27 Aug 2008 09:04:41 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
    <item>
      <title>Protein rotamers</title>
      <link>http://www.programmersheaven.com/user/Malcolm_McLean/blog/326-Protein-rotamers/</link>
      <description>Sometimes you just wish you could program visually.&lt;br /&gt;
&lt;br /&gt;
Proteins consist of a long chain of amino acids, or "residues", connected together in what we call the "backbone". All the backbone atoms for each residue are identical. The residues differ in the sidechain. Glycine, the simplest, has a single hydrogen for its sidechain. Tryptophan, the most complex, has 10 heavy atoms arranged in two rings.&lt;br /&gt;
&lt;br /&gt;
To a first approximation, all the bond lengths and bond angles are invariant. The protein has dihedral or torsional degrees of freedom. What I am concentrating on is the sidechain. Each residue has between 0 and 4 'chi' angles, torsion angles that describe the orientation of four connected atoms.&lt;br /&gt;
&lt;br /&gt;
For fundamental reasons, atoms like to go into the spaces between other atoms. So the chi one angle has three major orientations, + 60 degrees, -60 degrees, and trans, or 180 degrees. These slot the sidechin atoms into the spaces between the other atoms in the backbone. So the idea is to reduce the description of the protein sidechain to one number between 1 and 3.&lt;br /&gt;
&lt;br /&gt;
The problem is getting the other chi angles right, to place the other atoms of the sidechains in sensible positions. This has taken the best part of a week. I've got code that creates protein geometry in C, but the angle set then has to then go onto the mainframe computer in Fortran, for potential energy calculations. So I'm setting up input files and printing out co-ordinates. They've then got to be loaded into protein viewers like rasmol or chimera. Chimera won't print out the torsion angles from the mainframe, because as well as reducing the number of angles I've also reduced the number of atoms. Some "atoms" are imaginary virtual ones. It can't handle this for dihedral calcualtions. Rasmol can.&lt;br /&gt;
&lt;br /&gt;
Eventually I've gone from an input file with all the chi angles to a mainframe reduced representation, and got all the sidechains to match up. This requires four programs. One to generate the angle set, one to generate a reduced representation from the angle set, one on the mainframe to generate a model for the reduced representation and output it, and one to take in the mainframe model and check it against the original angle set.&lt;br /&gt;
&lt;br /&gt;
[includeimage:1]&lt;br /&gt;
&lt;br /&gt;
The picture shows the final result. Here's a tryptophan and a tyrosine against the rotamer set. I still haven't worked out how to get chimera to display the virtual atoms (the white model) with proper size.&lt;br /&gt;
&lt;br /&gt;
So much work. This is typical of 3d apps. We need some sort of tool so that you can program visually, or at least see co-ordinates as you go.</description>
      <guid isPermaLink="true">http://www.programmersheaven.com/user/Malcolm_McLean/blog/326-Protein-rotamers/</guid>
      <pubDate>Thu, 21 Aug 2008 05:34:39 -0700</pubDate>
      <dc:creator>Malcolm_McLean</dc:creator>
    </item>
  </channel>
</rss>