PhD, EngD and MSc by research theses (Cranfield Health)
Browsing PhD, EngD and MSc by research theses (Cranfield Health) by Supervisor "Bessant, Conrad"
Now showing 1 - 13 of 13
Item Open Access
Bioinformatics solutions for confident identification and targeted quantification of proteins using tandem mass spectrometry
(Cranfield University, 2009-10) Cham, Jennifer A.; Bessant, Conrad; Regan, Stephen
Proteins are the structural supports, signal messengers and molecular workhorses that underpin living processes in every cell. Understanding when and where proteins are expressed, and their structure and functions, is the realm of proteomics. Mass spectrometry (MS) is a powerful method for identifying and quantifying proteins; however, very large datasets are produced, so researchers rely on computational approaches to transform raw data into protein information. This project develops new bioinformatics solutions to support the next generation of proteomic MS research. Part I introduces the state of the art in proteomic bioinformatics in industry and academia. The business history and funding mechanisms are examined to fill a notable gap in management research literature, and to explain events at the sponsor, GlaxoSmithKline. It reveals that public funding of proteomic science has yet to come to fruition and that only high-tech niche bioinformatics businesses can succeed in the current climate. Next, a comprehensive review of repositories for proteomic MS is performed, to locate and compile a summary of sources of datasets for research activities in this project, and as a novel summary for the community. Part II addresses the issue of false positive protein identifications produced by automated analysis with a proteomics pipeline. The work shows that by selecting a suitable decoy database design, a statistically significant improvement in identification accuracy can be made. Part III describes the development of computational resources for selecting multiple reaction monitoring (MRM) assays for quantifying proteins using MS.
A tool for transition design, MRMaid (pronounced 'mermaid'), and a database of pre-published transitions, MRMaid-DB, are developed, saving practitioners time and leveraging existing resources for superior transition selection. By improving the quality of identifications, and providing support for quantitative approaches, this project brings the field a small step closer to achieving the goal of systems biology.

Item Open Access
Building bioinformatics solutions for biomarker identification
(Cranfield University, 2008-08) Oakley, Darren; Bessant, Conrad
This thesis describes the design, implementation and application of bioinformatics systems to aid work in the field of biomarker discovery and diagnostic test development. The aim of the work was to develop a flexible data storage and analysis platform capable of housing and working with data from a variety of modern biomarker analysis techniques. To achieve this aim, several tools were developed: a database schema, taking ideas from the field of systems biology, was designed to be flexible enough to house information about experiments looking at targets such as genes, proteins and metabolites; an API was created to allow easy programmatic interaction with the database; and multivariate data analysis routines were prepared so that data imported into the database could be investigated. Together this toolset was named XPA [for 'Cross Platform Analysis']. The XPA system was tested by using it to house and analyse data from two different medical studies, one using quantitative PCR [qPCR] to observe gene expression changes in prostate cancer, and the second using surface enhanced laser desorption/ionisation mass spectrometry [SELDI MS] to generate protein profiles in sufferers of pre-eclampsia.
In both studies XPA was used to develop multivariate classification models using partial least squares discriminant analysis [PLS-DA] and support vector machines [SVMs], with the aim of evaluating the data acquired for potential diagnostic use. The results showed the benefit of a tool such as XPA to the field of biomarker discovery.

Item Open Access
Design of a field-portable low power personal data logger - A hardware perspective
(Cranfield University, 2008-01) Pitts, David G.; Bessant, Conrad
There are a vast number of field-portable data loggers currently on the market. They differ greatly in terms of capability and complexity, in many cases being application or function specific. A survey was undertaken to identify market trends and future developments, system hardware specifications and the technologies employed. After comparing system specifications, it was apparent that there was a strong correlation between system performance and power consumption: high performance systems tend to be power hungry, and are typically larger and heavier than their lower performance counterparts. The aim of this project was to design the core of an advanced, flexible, low-power portable data acquisition system, a 'personal' data logger (PDL), suitable for medical or athletic performance monitoring. The pocket-sized target system should be capable of high performance (sampling daily or up to 20,000 samples per second) with low power operation, and should be able to measure both analogue and digital signals. The data must be stored in a high-capacity non-volatile memory card, with USB and RS-232 ports provided for data upload and system configuration. With the design specification defined, low power design techniques and the various battery and power supply options were investigated. A survey of system components was carried out and suitable low-power parts identified and selected for the design.
After checking the project schematics, the circuit board was designed, manufactured and carefully assembled, ready for function and performance testing. The test results indicated that the project met the design specification, demonstrating its potential for use in a small portable personal data logger. Further work would be required to refine the power supply and power management systems, add an interface board housing a real-time clock, analogue signal conditioning, and input and output connectors, and to develop embedded system software.

Item Open Access
Development and evaluation of statistical approaches in proteomic biomarker discovery
(Cranfield University, 2011-11) Patel, Amit; Bessant, Conrad
A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacological responses to a therapeutic intervention. The aim of this project was to identify potential biomarker candidates from experimental data comparing samples displaying divergent physiological traits. Chapter 1 introduces the topic and the aims of the project. The primary aim was to identify the ideal statistical analysis methods and data pre- and post-treatment options to use for potential biomarker identification from proteomic datasets. The product of this work was a statistical analysis pipeline for identifying potential biomarker candidates from proteomic experimental data. Proteomic data often suffers from missing values, so methods to deal with these were also evaluated in this project. Chapter 2 outlines the data sets that were used and presents an overview of the "Biomarker Hunter" pipeline software solution created in this project. Chapter 3 evaluates the appropriate univariate statistical methods to use for biomarker identification and the results of biomarker identification using these techniques. Chapter 4 evaluates options for data pre- and post-processing.
Chapter 5 suggests the use of missing value imputation as well as offering a novel clustering algorithm to deal with missing values. The software pipeline also offers multivariate statistical methods, which are evaluated in Chapter 6. Chapter 7 provides some business context for both biomarker discovery and the statistical analysis software available for the purpose of proteomic biomarker discovery. As well as providing a software pipeline for the identification of biomarkers, the project aimed to identify a suggested strategy for statistical analysis of proteomic experimental data. Strong conclusions regarding the ideal statistical approach could only be made if a list of actual, validated biomarkers were available. Unfortunately this information was not available; in its absence, a strategy was suggested based on both the available literature and the author's interpretation of the results from this study. In terms of data pre-processing, this strategy involved not averaging technical replicates, and using total abundance normalisation to reduce technical variation. A novel clustering algorithm was suggested to reduce the presence of missing values prior to existing methods of missing value imputation. Following statistical analysis, multiple testing correction methods should be implemented to reduce the number of false positives.

Item Open Access
Development and optimisation of chemometric techniques for the evaluation of meat freshness
(Cranfield University, 2013) Chatzimichali, Eleni Anthippi; Bessant, Conrad
Muscle foods such as meat, fish and poultry are an integral part of the human diet. Over time, such food succumbs to spoilage, resulting from various intrinsic and extrinsic factors, the most significant of which is microbial activity. Spoilage changes the organoleptic properties of meat, rendering it unacceptable to the consumer, and may ultimately result in the food becoming toxic.
Spoilage is therefore of major commercial and public health interest. This thesis describes the development and application of a novel suite of software tools designed to support novel instrumental approaches for the accurate, rapid and inexpensive evaluation of meat freshness. A pipeline was built for the analysis of highly heterogeneous data obtained by a diverse range of high-throughput techniques across four three-class case studies. As a first step, PCA was applied for dimensionality reduction, feature extraction and exploratory analysis. PLS-DA and SVMs were employed as classifiers, and classification ensembles implemented as a means of improving classification accuracy. Rigorous validation and evaluation methods based on bootstrapping and permutation testing were applied to ensure that the performance metrics are representative of real-world application, and to ascertain the statistical significance of the results. This was made possible by the development of an advanced optimisation approach, which reduced the computational demands of SVM tuning by up to ~90 times. The functionality of the pipeline was further enhanced by exploiting GPA and CPCA as data fusion techniques, to evaluate whether better classification accuracy is achieved when integrated, as opposed to standalone, datasets are used. SVM ensembles proved to be the most powerful and accurate classification method since they produced consistently higher prediction rates than PLS-DA. Among the analytical techniques, HPLC was established as the most diagnostic method for the assessment of meat freshness, with a prediction rate of 80%. Of the two data fusion techniques, CPCA outperformed GPA.
However, CPCA only exceeded standalone HPLC in a minority of cases, presenting an overall prediction rate of 82%.

Item Open Access
Development of a database and its use in the Investigation of Interferences in SRM assay design
(Cranfield University, 2013-04) Dokpesi, Oshiobugie; Bessant, Conrad
Selected Reaction Monitoring (SRM) is a form of mass spectrometry that guarantees high throughput and also a high level of selectivity and specificity. Performing SRM experiments requires the development of assays to aid in peptide identification. This is a time-consuming and expensive process, so biological researchers have come up with bioinformatics solutions for the design of SRM assays. The accuracy of these bioinformatics methods is quite high and the next step is to optimise the process by tackling the interference issue. As various analytes may have the same signals within an SRM experiment and thus interfere with each other's signals, different solutions are being derived to tackle the issue. This thesis describes the development of an SRM transition database to store peptide and transition data, software to populate the database and also software to retrieve the data from the database. Finally, the database is tested with the MRMaid transitions for the human proteome, which were mined from the PRIDE database, and the results are analysed to investigate the transition interference issue. The database currently contains data for 20,220 proteins and approximately 870,000 tryptic peptides from the human proteome.

Item Open Access
Development of an automated identification system for nano-crystal encoded microspheres in flow cytometry
(Cranfield University, 2008-08) Clarke, Colin; Bessant, Conrad
Quantum dot encoded microspheres (QDEMs) offer much potential for bead based identification of a variety of biomolecules via flow cytometry (FCM). To date, QDEM subpopulation classification from FCM has required significant instrument modification or multiparameter gating.
It is unclear whether or not current data analysis approaches can handle the increased multiplexed capacity offered by these novel encoding schemes. In this thesis the drawbacks of currently available data analysis techniques are demonstrated and novel classification methods proposed to overcome these limitations. A commercially available 20-code QDEM library with fluorescent emissions at 4 distinct wavelengths and 4 different intensity levels was analysed using flow cytometry. Multiparameter gating (MPG), a readily available classification method for subpopulations in FCM, was evaluated. A support vector machine (SVM) and two types of artificial neural networks (ANNs), a multilayer perceptron (MLP) and a probabilistic radial basis function (PRBF) network, were also considered. For the supervised models, rigorous parameter selection using cross validation (CV) was used to construct the optimum models. Independent test set validation was also carried out. As a further test, external validation of the classifiers was performed using multiplexed QDEM solutions. The performance of MPG was poor (average misclassification (MC) rate = 9.7%): it was a time-consuming process requiring fine adjustment of the gates, it produced multiple classifications on single events, and its performance is likely to decrease as the multiplex capacity increases. The SVM had the best performance in independent test set validation, with 96.33% accuracy (MLP = 96.12%, PRBF = 94.38%). Furthermore, the SVM outperformed MPG and both ANNs on the external validation set, with an average MC rate of 2.9% compared with 6.1% for the MLP and 7.5% for the PRBF. Assuming that the external test solutions were homogeneous, the variance between classified results should be minimal; hence, the variance of correct classifications (CCs) was used as an additional indicator of classifier performance.
The SVM demonstrates the lowest variance for each of the external validation solutions (average σ² = 31479), some 50% lower than that of MPG. As a conclusion to the development of the classifier, a user-friendly software system has been developed to allow construction and evaluation of multiclass SVMs for use by FCM practitioners in the laboratory. SVMs are a promising classifier for QDEMs that can be rapidly trained, with classifications made in real time using standard FCM instrumentation. It is hoped that this work will advance SAT for bioanalytical applications.

Item Open Access
Evaluation of Wireless Sensor Networks Technologies
(Cranfield University, 2008-09) Salan Padillo, Ignacio; Bessant, Conrad
Wireless sensor networks represent a new technology that has emerged from developments in ultra low power microcontrollers and sophisticated low cost wireless data devices. Their small size and power consumption allow a number of independent 'nodes' (known as Motes) to be distributed in the field, all capable of ad-hoc networking and multihop message transmission. New routing algorithms allow remote data to be passed reliably through the network to a final control point. This occurs within the constraints of low power RF transmissions in a congested 2.4 GHz radio spectrum. Wireless sensor network nodes are suitable for applications requiring long term autonomous operation, away from mains power supplies, such as environmental or health monitoring. To achieve this, sophisticated power management techniques must be used, with the units remaining 'asleep' in ultra low power mode for long periods of time. The main aim of the research described in this thesis is first to review the area and then to evaluate one of the current hardware platforms and the popular software used with it, TinyOS. The research therefore uses a hardware platform designed at the University of Berkeley, the TmoteSky. Practical work has been carried out in different scenarios.
Using Java tools running on a PC, and customized applications running on the Motes, data has been captured, together with information showing topology configuration and adaptive routing of the network and radio link quality information. Results show that the technology is promising for distributed data acquisition applications, although in time-critical monitoring systems new power management schemes and networking protocols to improve latency in the system will be required.

Item Open Access
Extraction of genetic network from microarray data using Bayesian framework
(2007-04) Kumuthini, Judit; Bessant, Conrad; Setford, S.
The aim of the work described in this thesis was to develop novel methods for the extraction of gene regulatory networks (GRN) from gene expression data, and use these methods to capture previously unknown relationships between genes in specific biological applications. This has been accomplished through the application of Bayesian Networks (BN), using minimum description length (MDL) and taboo search for parameter and structure learning respectively, to three large scale microarray datasets from Saccharomyces cerevisiae, Escherichia coli and human stem cells. The application of BNs for modelling the well characterised yeast cell cycle demonstrated the efficacy of the techniques employed. Using the cDNA microarray data from the yeast cell cycle project by Spellman et al. (1998), this study succeeded in extracting many biologically plausible genetic relationships, which were supported by evidence from publicly available genome and literature databases. Two novel knowledge extraction techniques were applied: Target Node (TN) analysis and learning through simulation. Further, it was demonstrated how the addition of prior knowledge to the extracted network can improve the network structure extracted purely from experimental data.
The second part of this thesis demonstrated how the BN approach could be adapted to a data set of very high dimensionality, specifically data from a 54,634 probe array used to monitor human adipose tissue. Genetic networks extracted included the insulin receptor (IR) and fatty acid binding protein (FABP) families, which play key roles in fatty acid uptake, transport, and metabolism. In the final part of this thesis, the genome-wide GRNs of a prokaryotic expression system were extracted from novel oligo cDNA microarray data from E. coli K12 to identify metabolic stress responsive genes during recombinant protein production. Also, detailed analysis of known metabolic stress related genes and the genes that are directly or indirectly associated in the GRN was used to establish possible markers for host system exhaustion. In conclusion, the BN methods developed proved to be a powerful and effective means of extracting GRNs in a variety of applications.

Item Open Access
Multivariate analysis methods for veterinary diagnostics using SIFT-MS
(Cranfield University, 2010) Spooner, Andrew; Bessant, Conrad
Selected ion flow tube mass spectroscopy (SIFT-MS) is an analytical method for the investigation of volatile organic compounds (VOCs). It produces mass to charge (m/z) ratio ion counts with a range of 10-200 m/z. Current data analysis involves sifting through the spectra files one at a time looking for peaks of interest. This is time consuming and requires expert knowledge. This thesis proposes, implements and demonstrates a novel approach to the analysis of SIFT-MS data using multivariate techniques similar to those employed to analyse electronic nose and gas chromatography mass spectroscopy (GCMS) data. The methodology was developed using a set of samples created in the laboratory that belonged to two groups which contained different VOCs found in biological samples.
The methodology requires the removal of the m/z peaks associated with the precursors; principal component analysis (PCA) and partial least squares discriminant analysis (PLSDA) methods were then evaluated for biomarker discovery and sample classification. Both methods produced excellent results, identifying the volatiles in the mixtures and classifying samples with 100% accuracy. This methodology was then tested using a variety of samples. Ammonia was found as a possible marker for bovine TB (Mycobacterium bovis) infection using serum samples taken from wild badgers. A discrimination accuracy of 67% ± 6% was achieved. The number of samples needed to build the best performing model from this dataset was empirically shown to be 120. The methodology was shown to be effective for the discrimination of serum samples from cattle taken before and after introduction of bovine TB (Mycobacterium bovis) bacteria in a clinical trial (accuracy of 85% achieved). A similar dataset pertaining to infection by Mannheimia haemolytica failed to produce models that performed as well as the others; this is suspected to be due to a poor experimental design. Finally, discrimination accuracies of 88% for urine samples collected from cattle from herds infected with Mycobacterium paratuberculosis and 90% for urine samples collected in the same bovine TB trial as above were achieved. The novel multivariate approach to SIFT-MS data analysis has been shown to be effective with a number of datasets but it is sensitive to the experimental design.
Recommendations for the considerations required for analysis using this method have been made.

Item Open Access
Optimisation of machine learning methods for cancer detection using vibrational spectroscopy
(Cranfield University, 2011-01) Sattlecker, Martine; Bessant, Conrad; Stone, Nicholas
Early cancer detection drastically improves the chances of cure, and therefore methods are required that allow early detection and screening in a fast, reliable and inexpensive manner. A prospective method featuring all these characteristics is vibrational spectroscopy. In order to take the next step towards the development of this technology into a clinical diagnostic tool, classification and imaging methods for an automated diagnosis based on spectral data are required. For this study, Raman spectra, derived from axillary lymph node tissue from breast cancer patients, were used to develop a diagnostic model. For this purpose different classification methods were investigated. A support vector machine (SVM) proved to be the best choice of classification method since it classified 100% of the unseen test set correctly. The resulting diagnostic models were thoroughly tested for their robustness to the spectral corruptions that would be expected to occur during routine clinical analysis. The tests showed that sufficient robustness is provided for a future routine diagnostic application. SVMs proved to be a powerful classifier for Raman data and were therefore also investigated for infrared spectroscopic data. Since it was found that a single SVM was not capable of reliably predicting breast cancer pathology based on tissue calcifications measured by infrared micro-spectroscopy, an SVM ensemble system was implemented. The resulting multi-class SVM ensemble predicted the pathology of the unseen test set with an accuracy of 88.9%; in comparison, a single SVM assessed with the same unseen test set achieved 66.7% accuracy.
In addition, the ensemble system was extended for analysing complete infrared maps obtained from breast tissue specimens. The resulting imaging method successfully detected and staged calcification in infrared maps. Furthermore, this imaging approach revealed new insights into the calcification process in malignant development, which was not previously well understood.

Item Open Access
A study of FT-IR spectroscopy for the identification and classification of haematological malignancies
(Cranfield University, 2009-06) Babrah, Jaspreet; Stone, Nicholas; Bessant, Conrad
The aim of the work presented in this thesis was to explore the use of FT-IR spectroscopy as a complementary clinical tool for haematological laboratory analysis. FT-IR spectra were measured from air-dried and frozen cell lines derived from lymphoma, lymphoid, myeloid leukaemia and normal and chronic lymphocytic leukaemia blood samples. Multivariate statistical analysis was used to extract important spectral information with the greatest discriminative power. Principal component fed linear discriminant spectral models have been tested with leave-one-out cross validation procedures. A preliminary unfiltered classification model using 50 frozen and air-dried samples correctly classified 54% of 18556 spectra. The performance improved with the three cell line group datasets, with 71% of 19903 spectra correctly classified. Furthermore, the use of the frozen spectra improved the performance of the three cell line group classification model considerably. Findings showed that 73.3% of 9920 spectra were correctly classified in the frozen datasets, whereas in the air-dried datasets only 41.5% of 9983 spectra were correctly classified. Optimisation of the spectral models by selection of principal components, application of Savitzky-Golay filters and selection of spectra using a standard deviation and absorption filter tool was investigated.
Using the first 25 significant PCs, a 0th-derivative Savitzky-Golay filter and the absorbance filter tool on the frozen five cell line spectral dataset was shown to give the optimal parameters for constructing a classification model. When tested with leave-one-batch-out cross validation, 90% of the spectra were correctly classified for the five cell line model. Blood component classification models tested with leave-one-batch-out cross validation performed well. The whole blood model correctly classified 70% of 1736 spectra, measured on 22 samples. The plasma model correctly classified 80.6% of 331 spectra and the buffy coat model correctly classified 99.5% of 1438 spectra. This demonstrated that the buffy coat (containing white blood cells) holds the key biochemical information for discrimination between the pathology of the blood samples. Partial least squares analysis has been demonstrated as a method to support whole blood count tests for real time prediction of cellular constituents. These findings demonstrate the potential of FT-IR spectroscopy as a clinical tool, although more work is needed if it is to be applied in clinical practice.

Item Open Access
Vibrational spectroscopy for the rapid and early diagnosis of leukaemias and lymphomas
(Cranfield University, 2013-11) Jackson, Olivia; Stone, Nicholas; Bessant, Conrad; Rye, Adam; Lush, Richard; McCarthy, Keith
This thesis aimed to investigate vibrational spectroscopies for the identification of biochemical markers of leukaemias and lymphomas. In a preliminary study using the blood proteins albumin, fibrinogen and globulin, Drop Coating Deposition Raman Spectroscopy was explored and extended for use with Fourier Transform infrared spectroscopy for leukaemia blood sample analysis. Due to the low sample volumes and minimal preparation required, it was identified as a potential alternative to blood centrifugation to obtain the buffy coat for analysis.
These studies showed that the technique was capable of detecting low levels of protein from small, highly concentrated droplets. Thus this method, alongside cytospin centrifugation, was used for the spectroscopic analysis of different blood fractions. Due to the low number of lymphoma samples obtained, only a feasibility study is outlined in this thesis. Samples were collected from leukaemia patients and healthy volunteers. Infrared and Raman spectra were measured of whole blood and buffy coat samples cytospun onto slides, and of whole blood and plasma pipetted by drop coating deposition. Multivariate statistical analysis was employed to extract key spectral differences between the pathologies and develop classification models for diagnosing chronic lymphoblastic leukaemia from previously treated and untreated patient groups. Principal component analysis followed by linear discriminant analysis was employed to identify the largest variances in the data, and leave-one-sample-out cross validation evaluated the performance of the spectral models measured on different blood components in diagnosing leukaemia. The buffy coat infrared model correctly classified 59% of the spectra, and blood droplet Raman 62%. The treated and untreated groups were then combined, which improved classification to 83% for buffy coat infrared and 71% for blood droplet Raman. These findings highlight the potential of drop coating deposition spectroscopy of whole blood for leukaemia diagnosis, although further work is required to achieve a clinically validated method.
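
Several of the abstracts above evaluate spectral classifiers with leave-one-sample-out (or leave-one-batch-out) cross validation, where every spectrum measured on one sample is held out together so that the model is never tested on a sample it has already seen. The Python sketch below illustrates only that splitting idea. It is not taken from any of the theses: the function names and toy data are hypothetical, and a simple nearest-centroid classifier stands in for the PCA-fed linear discriminant models the abstracts describe.

```python
from collections import defaultdict


def leave_one_sample_out(sample_ids):
    """Yield (train_idx, test_idx); each fold holds out every spectrum of one sample."""
    by_sample = defaultdict(list)
    for i, sid in enumerate(sample_ids):
        by_sample[sid].append(i)
    for sid, test_idx in by_sample.items():
        train_idx = [i for i, other in enumerate(sample_ids) if other != sid]
        yield train_idx, test_idx


def fit_centroids(X, y):
    """Stand-in classifier (an assumption, not PCA-LDA): mean spectrum per class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        previous = sums.get(label, [0.0] * len(row))
        sums[label] = [s + v for s, v in zip(previous, row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def cross_validated_accuracy(X, y, sample_ids):
    """Fraction of spectra classified correctly across all held-out samples."""
    correct = 0
    for train_idx, test_idx in leave_one_sample_out(sample_ids):
        centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical toy data: two "spectra" (two features each) per sample.
X = [[1.0, 0.1], [1.1, 0.0], [0.9, 0.2], [1.0, 0.0],
     [0.0, 1.0], [0.1, 1.1], [0.2, 0.9], [0.0, 1.2]]
y = ["leukaemia"] * 4 + ["healthy"] * 4
sample_ids = ["p1", "p1", "p2", "p2", "h1", "h1", "h2", "h2"]

print(cross_validated_accuracy(X, y, sample_ids))
```

Splitting by sample rather than by individual spectrum matters because replicate spectra from one sample are highly correlated; holding them out one at a time would tend to give optimistic accuracy figures.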