Biological medical products, or biologics, are currently a major focus of pharmaceutical research. These products, which are mainly protein or peptide based, allow more specific therapeutic targets to be addressed and offer higher efficacy with fewer side effects. Their main drawback is their complexity, which makes research and manufacturing challenging. However, the expiring patents of the original biologics open the way for new manufacturers to produce biosimilar products, which can reduce the cost of developing the medicine. There is no specific definition of biosimilarity; only general directives are available to pharmaceutical companies. During development, companies therefore have to set their own standards for comparison with the originator product, using a wide range of analytical measurements. This situation creates the need for a method that can describe the state of a product and its biological distance from the originator. This thesis addresses this problem using statistical and machine learning approaches.
For the analyses, data were collected on a PEGylated recombinant protein-based drug developed by Richter Gedeon.
Principal component analysis was applied to the data to explore the main factors describing the data and the potency of the pharmaceutical preparations. Bayesian analysis was used to examine the dependencies among the measured parameters, with particular emphasis on potency and its connections. Regression models were created to predict potency, and these models were compared using several measures. Finally, a kernel-based data fusion method was used to compute the similarity between the reference and the Richter Gedeon (RG) products. This method can be used to rank products by their difference from a reference.
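The ranking idea in the last step can be illustrated with a minimal sketch. It is not the method used in the thesis: it assumes one RBF kernel per group of analytical measurements, fused by an unweighted average, and it assumes the measurements are already standardized. The batch names, measurement groups, and numerical values are made up for illustration only.

```python
import numpy as np


def rbf_kernel(X, gamma=None):
    """RBF (Gaussian) kernel matrix for the rows of X."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default: 1 / number of features
    return np.exp(-gamma * d2)


def fuse_kernels(kernels, weights=None):
    """Combine per-source kernels as a weighted sum (uniform by default)."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * K for w, K in zip(weights, kernels))


def rank_by_similarity(K, reference_index, names):
    """Order all non-reference products by decreasing similarity to the reference."""
    sims = K[reference_index]
    order = [i for i in np.argsort(-sims) if i != reference_index]
    return [(names[i], float(sims[i])) for i in order]


# Hypothetical, made-up measurements: two data sources for four products.
names = ["reference", "batch_A", "batch_B", "batch_C"]
purity = np.array([[0.0, 0.0], [0.1, -0.1], [1.0, 1.2], [2.5, -2.0]])
activity = np.array([[0.0], [0.2], [0.9], [3.0]])

K = fuse_kernels([rbf_kernel(purity), rbf_kernel(activity)])
ranking = rank_by_similarity(K, reference_index=0, names=names)

for name, similarity in ranking:
    print(f"{name}: {similarity:.3f}")
```

A weighted sum of positive semidefinite kernels with non-negative weights is itself a valid kernel, which is why the sources can be fused at the kernel level before ranking. Non-uniform weights could be passed to `fuse_kernels` if some measurement groups should count more than others.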