In the world of biology there are lots of unanswered questions that can't be solved using only our knowledge from biology. We might need to know techniques from programming, algorithms and other parts of mathematics as well. Bioinformatics uses these fields to solve problems in biology.
A problem like this is the prediction of the secondary structure of RNA molecules. We can't tell the secondary structure solely from the nucleotid sequence. However, this structure plays a main role in defining the features of the molecule, so determining the connections between the nucleotids is vital.
We have some knowledge of rules that determine possible connections that the nucleotids can form. But we still have to consider many possible secondary structures so at this point a different approach is necessary.
Let's consider these rules as the rules of a grammar. The RNS molecule is the sentence and the nucleotids are the words. Now we can use our knowledge of grammars in computer science, especially of context-free grammars.
The CYK algorithm makes it possible to find the possible sequence of rules that generates the RNA. From these rules we can find out where the connections between the nucleotids were.
However there are many possibilities and we only need the most likely one. So the solution is to give a probability to every rule and use this feature in the CYK algorithm to determine the most probable derivation tree. To compute the most accurate probabilities I used the inside-outside algorithm and EM algorithm.
My goal is to implement a program, that will predict secondary structure of RNAs using the beforementioned technologies.