A Christian College of the Liberal Arts & Sciences

Undergraduate Research

Houghton’s undergraduate computer science students can extend their understanding of computer science by applying their skills to research, either through honors projects or summer research opportunities. Many of these research projects are in the area of bioinformatics, allowing students to apply their understanding of the liberal arts in an integrative way to address important medical issues using the tools of computer science. Such opportunities can be significant in choosing a career direction and in getting internships or jobs.

Honors Projects:

Students who excel in the standard computer science curriculum are frequently encouraged to take part in an honors project: an extended, multi-semester study that culminates in the production and defense of a final thesis.

Project 1: Codon Usage Bias in H1N1 Influenza by Keli Fancher ’11

In DNA sequences, each amino acid is encoded by a series of three nucleotide bases called a codon. However, there are 64 different codons in the genetic code and only 20 amino acids, so many amino acids are encoded by more than one codon. Studies of influenza have shown that the virus is strongly biased toward the use of some codons over others. Past work on this "synonymous codon usage bias" suggests that understanding the bias may help us better understand the mutational patterns of influenza.
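As a rough illustration of synonymous codon usage, the Python sketch below counts codons in a coding sequence and computes relative synonymous codon usage (RSCU), one commonly used bias measure. RSCU is chosen here only as an example; the project description does not name its four metrics, and the function names `codon_counts` and `rscu` are hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional T, C, A, G ordering.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(seq):
    """Count in-frame codons, reading from the first base of the sequence."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(seq):
    """Relative synonymous codon usage: observed count divided by the count
    expected if all synonymous codons for that amino acid were used equally.
    Values above 1 indicate a preferred codon; stop codons are skipped."""
    counts = codon_counts(seq)
    families = {}
    for codon, amino_acid in CODON_TABLE.items():
        families.setdefault(amino_acid, []).append(codon)
    result = {}
    for amino_acid, codons in families.items():
        total = sum(counts[c] for c in codons)
        if amino_acid == "*" or total == 0:
            continue
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result


if __name__ == "__main__":
    # Toy sequence: four alanine codons, three of them GCT.
    for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
        print(codon, value)
```

In the toy sequence, alanine appears four times and has four synonymous codons, so each codon would be expected once under equal usage; GCT appears three times and is therefore the preferred codon.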

The purpose of this study is to apply four different measurements of codon bias to the newly emerged strain of the influenza A H1N1 "Swine Flu" virus. Through this study, we aim to discover which of the four different metrics for codon bias provides the best understanding of the mutational possibilities of influenza. Of particular interest in this study is the currently circulating "Swine Flu" virus, an understanding of which may help prevent future pandemic-level threats.

Project 2: Identifying a Novel Gene Signature for Lung Cancer Survival by Erin Bard ’11

The purpose of this study is to find a particular sequence of genes that most often correctly identifies the subtype of lung cancer a patient has. This will build upon research done at other institutions in identifying the genes that are responsible for different subtypes of lung cancer.

To do this, we will implement several different machine learning algorithms that will analyze all the patient data and build a new gene sequence from that data. Then we will evaluate the new gene sequence to see what type of lung cancer it predicts in a given patient. To determine whether the new algorithm and gene sequence are effective in predicting the type of lung cancer, we will compare that result to the results achieved by researchers using other algorithms.

Project 3: Categorizing HIV-1 Subtypes using an Ant-Based Clustering Algorithm by David King ’09

Human Immunodeficiency Virus (HIV), the cause of the lethal acquired immunodeficiency syndrome (AIDS), is a worldwide pandemic. The virus is difficult to treat in part because of its rapid mutation rate: there are 11 genetic subtypes of HIV, as well as a number of recombinant subtypes. The ability to algorithmically process the genetic information of unknown HIV strains and accurately predict their subtype is of critical importance to our ability to treat the infected.

Ant-based algorithms, a subset of machine learning techniques that use many small subroutines with limited intelligence to solve computationally difficult problems, have been used in the past to cluster and classify complex data similar to the genetic information of HIV. In this study, we proposed a novel application of an ant-based algorithm to the problem of HIV subtype identification and evaluated the effectiveness of the program, finding it to be preliminarily competitive with other identification algorithms currently available.
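To give a flavor of how such limited-intelligence agents can cluster data, the Python sketch below shows the pick-up and drop rules from one classic textbook formulation of ant-based clustering (in the style of Deneubourg and of Lumer and Faieta). It is an assumption made for illustration, not the algorithm used in this study; the function names, the Hamming distance, and the constants `alpha`, `k1`, and `k2` are all hypothetical choices.

```python
def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ
    (an illustrative distance; real studies may use other measures)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def local_density(item, neighbors, distance, alpha=0.5, cells=8):
    """Similarity density around an item: high when nearby items resemble it.
    `cells` is the number of grid cells in the ant's neighborhood and
    `alpha` scales how quickly dissimilarity lowers the density."""
    total = sum(1.0 - distance(item, other) / alpha for other in neighbors)
    return max(0.0, total / cells)


def pick_probability(density, k1=0.1):
    """An unladen ant is likely to pick up an item that sits among
    dissimilar neighbors (low density)."""
    return (k1 / (k1 + density)) ** 2


def drop_probability(density, k2=0.15):
    """A laden ant is likely to drop its item among similar neighbors
    (high density)."""
    return (density / (k2 + density)) ** 2


if __name__ == "__main__":
    f = local_density("ACGT", ["ACGT", "ACGA"], hamming_fraction)
    print("density:", f)
    print("pick:", pick_probability(f), "drop:", drop_probability(f))
```

Repeating these two decisions as ants wander over a grid tends to gather similar items into piles, which can then be read off as clusters.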

This paper has been accepted for publication by The Journal of Biomedical Science and Engineering.

Summer Research

Each summer, a group of computer science students and faculty participates in Houghton College’s Summer Research Institute. Students in the sciences work closely with faculty mentors on research projects on a variety of topics for a period of five weeks before giving a presentation of their findings.

Find out more about the Summer Research Institute.

Summer Research Institute 2009: Correlation of Selected Molecular Markers in Chemosensitivity Prediction

Finding effective cancer treatment is a challenge because the sensitivity of a cancer to treatment stems from both intrinsic cellular properties and resistances acquired from prior treatment. Particular pieces of genetic and proteomic information can be highly informative about these properties and resistances. A previous study reported on a number of markers that could be used to predict how sensitive a cancer might be to a certain treatment. These markers were treated as independent variables; we know, however, that the individual pieces of data are biologically related to one another, meaning that treating them as independent will not give a complete picture of their significance. Our goal was to find correlated markers that are collectively significant to sensitivity prediction, to complement the individual markers already reported.

To validate our approach, we identified the protein markers that were strongly correlated by our analysis with the individual protein markers found in previous studies. Our feature analysis discovered highly correlated protein marker pairs, based on which we found individual protein markers with medical significance. While some of the markers uncovered were consistent with those previously reported, others were original to this work. Using these marker pairs we were able to further correlate the cellular functions associated with them. As an exploratory analysis, we discovered feature selection correlation patterns between and within different drug mechanisms of action for each of our datasets. In conclusion, the highly correlated protein marker pairs as well as their functions found by our feature analysis are validated by previous studies, and are shown to be medically significant, demonstrating D’ as an effective measurement of correlation in the context of feature selection for the first time.
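The summary above does not define D’. Assuming it refers to Lewontin's normalized linkage disequilibrium coefficient, applied here to pairs of binary (present or absent) markers, a minimal Python sketch of the calculation might look like the following. The function name `d_prime` and the 0/1 encoding of markers are hypothetical.

```python
def d_prime(x, y):
    """Signed D' between two binary features given as equal-length 0/1 lists.

    D = P(x=1, y=1) - P(x=1) * P(y=1), divided by the largest magnitude D
    could take given the marginal frequencies, so the result lies in [-1, 1].
    Assumption: D' here means Lewontin's coefficient; the source does not say.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    p_a = sum(1 for a in x if a) / n
    p_b = sum(1 for b in y if b) / n
    p_ab = sum(1 for a, b in zip(x, y) if a and b) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a feature is constant, so no association is measurable
    return d / d_max


if __name__ == "__main__":
    print(d_prime([1, 1, 0, 0], [1, 1, 0, 0]))  # perfectly associated
    print(d_prime([1, 1, 0, 0], [0, 0, 1, 1]))  # perfectly opposed
    print(d_prime([1, 0, 1, 0], [1, 1, 0, 0]))  # independent
```

Ranking marker pairs by the magnitude of such a score is one way correlated pairs could be shortlisted for feature selection.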

This paper has been accepted for publication by The Journal of Biomedical Science and Engineering. Read it online here.