Monday, June 23, 2014

PhD tales: (Part1) Most challenging part

Doing a PhD is indeed a once-in-a-lifetime experience for every PhD candidate. What is interesting is that every PhD tale has narratives that are both unique and comparable. As it is notorious for, a PhD comes with a lot of challenges and hurdles. Before even trying to answer the research question that you yourself propose, you need to walk through a trail of many other questions. Answer them first, satisfy your soul, and get convinced. The first and foremost question is whether you really want to do a PhD. Are you PhD material? Can you devote your next 5 years to studying? Can you make your PhD studies both your friends and your family? Most importantly, does it align with your life’s goal? And once you have examined the depth of your intent and interest in doing a PhD, then comes the never-ending list of decisions: the school, your advisor, your thesis topic, and so on.
For me, the director of computer science at an OK but not great university, who ran a bioinformatics lab, called me, interviewed me, and showed interest in me. This meant I already had my advisor when I started my PhD. When I joined his lab, I was expected to take over the project of a graduating PhD student in our lab. I have heard it is very common in the biology community to first take over projects from previous PhD students in your lab and only then think about what you yourself want to do. For that, I had to fully understand his software, written in C, and develop a web application so that users could access it and run it on our servers. So my first year was spent understanding that software and building its web server. It was indeed a very useful learning experience. I developed the online tool and wrote a conference paper as its first author.
Having understood the software so well, I started taking on service work where our collaborators wanted to use that tool. On the side, I was also digging into the various projects it had been used for in the past. I found one of our collaborators' projects and its results sitting on our computers for the last two years. The results looked very promising and showed one unique feature of the tool that had gone unnoticed until then. I polished that feature and analyzed those results further. Eventually, after prolonged efforts to bring together all the past collaborators on that project to provide me with more details, I finally wrote a paper and got it published. It is always advised to take care of the low-hanging fruit first. It was hard to get all the authors back to their 2-year-old project and have them review a long research paper with an absolutely different angle from the project they had used the tool for. Reminding busy professors and a struggling new post-doc to review the paper I wrote was a tough job in itself for a naïve, immature PhD student.
Only after my fairly successful first 2 years came my very first real challenge: deciding on my thesis topic. It was not just the specific thesis topic; I first had to decide on the broader field of bioinformatics applications where I wanted to focus. There are various directions in which you can steer your PhD studies. One is to focus on your algorithmic and computational skills and try to develop a tool or software that everyone in your research community can use and benefit from. The other is to go towards the consumer side, where you draw on the available tools, use them to your advantage, and come up with new findings.
To me, it is more useful for the community to improve an existing tool based on your needs than to reinvent the whole wheel. Having a publication record, as one of the requirements of a PhD, is a way to show your potential and caliber. However, designing software just for the sake of a publication, or developing software with the goal of adding a single feature to outperform or compete with zillions of other tools, is an injustice to the effort of driving bioinformatics forward. The number of bioinformatics web applications and tools, for example sequence aligners, genome assemblers or mappers, and many other kinds of bioinformatics software, is many times higher than the number that are actually used.
Also, if I wanted to focus on building algorithms, the next thing to do was either to look for one of our lab's collaborators with a project that needed a better algorithmic design or to find such a project myself. For the second direction, however, I only had to decide on a particular disease that I was really interested in and that was practically feasible to research. For that, cancer was the most immediate and feasible answer, with lots of datasets available online and many unanswered questions.
However, I knew the expectations of not only my advisor but also the department and my peers: a graduate student should conduct top-notch research and answer never-before-asked questions about what causes cancer. Initially, I wanted only to investigate fundamental questions in biology. Only later did I realize their importance and applicability in understanding the underlying biological processes leading to cancer. I do believe that new treatments can arise from understanding the changes that lead to cancer. And so I decided to start working specifically on cancer.

Sunday, September 30, 2012

Full NGS

Whether you already know all about sequencing technologies or want to get clearer about them, this video is highly recommended. It is a long one, but very nicely delivered by Elaine Mardis of the Genome Institute at Washington University. It is like a 101 on how sequencing works.

You can always skip the last 20 minutes, when she talks more about specific projects after covering the technology.



Saturday, February 18, 2012

Fun to see the future of Bioinformatics but need more analysis to take over technology development.

It was not very long ago that I went to India to visit my family over the December break and had to see a doctor for my back and shoulder pain. I went to a physiotherapist and, seeing the intensity of the pain, he immediately advised me to get some tests done. I got MRIs of my back and brain, x-rays and what not, but that was good, since I would not have to take random painkillers and would get an accurate diagnosis. Next, my doctor pulled up all my radiology images and reports on screen and actually explained them to me, knowing that I was somewhat familiar with the biological terms and, obviously, the biological field. And as he flipped through the images and talked, what I was thinking about was the future of personal genomics and bioinformatics.

One day, when the $1000 genome arrives and companies like 23andMe flourish, doctors will pull up not only physiological images but also genomes or other genomic data to show me the root of a disease. Or maybe they will show me the altered biological networks and protein modifications, and the action of the drugs they are prescribing: which pathway, cycle, or particular protein each one targets. They will compare my genomic or protein sequence in question with the normal one, along with its interactions or participation in particular pathways. Not only that, but when a baby is born, besides getting immunizations, they should get their genome sequenced for their medical records. Only then will an ordinary person understand the importance of bioinformatics and what a genome means. People will know which drugs they are sensitive to and which they are not. They will know, if they are suffering from anything, which part of their biological network or pathway is messed up and how it is connected to the others. That is what I understand by "sequencing in clinical applications". It might also explain to my parents what I'm studying, the hardest thing on earth to explain to them. An electrical engineer and a housewife will know what a "gene" means, what it looks like, and what it is composed of.

Advances in sequencing technology have been immensely impressive in the last few years. And the latest techniques presented at the much talked-about and tweeted AGBT conference are mind-blowing. I don't see this future as very far off. Nanopore technology looks really promising and will definitely revolutionize this field if it comes to market late this year (and is successful). I think bioinformatics has indeed kept pace with respect to technology and now needs to do so in analysis. I expect more and more companies to emerge not only in technology development, producing a data deluge, but also in analysis for the most wanted clinical applications. There is still a need to know what each of the roughly 21,000 genes in our genome does, how epigenetics differs in each of us, and how it is a characteristic of only me!

P.S. I would really not have liked it if my doctor had given me pills without showing me the proof (whatever he had) of why he was giving them to me.

Sunday, December 11, 2011

What is the main goal of doing bioinformatics?

Tons of machine learning models, algorithms and complicated methods are being developed in academic bioinformatics. Every PhD student working in bioinformatics is expected to develop a novel method to do some task. Some prefer using SVMs, some prefer Bayesian networks and some neural networks. Some try to develop new kernels that could have various applications in biology, and so on. They try to prove higher and higher accuracy for their models over others. I want to know how many of these millions of models and methods are being used by other people in the bioinformatics community. Are they simple enough to be easily understood and communicated by biologists? On top of that, no research team wants to use what has already been developed. They prefer to hire their own local programmers to build in-house tools and methods that do the same thing. And here is one big reason why many bioinformatics companies are not able to do good business selling their software and tools.

What is important is first to understand “What is bioinformatics?” and “What is the main goal of doing bioinformatics?”. I believe that the study of and approach to bioinformatics should be “problem-driven” instead of “solution-driven”. One should try to prove the quality and novelty of the findings instead of the efficiency or accuracy of the method applied. To be a good bioinformatician, I would first start by understanding the biological question, the problem and its hypothesis, and then look for methods that can be used. This approach definitely requires interdisciplinary training for bioinformatics scientists. On the other hand, interdisciplinary-trained scientists are good at solving problems, but they might not be very good at implementing solutions that require more than an interdisciplinary approach, such as some particular expert skill.

Indeed, multidisciplinary teams have become the trend over interdisciplinary integration. Most pharmaceutical and bioinformatics companies are recruiting computer scientists and biologists rather than interdisciplinary bioinformaticians. I suspect this is the approach in which a communication gap or linguistic difference arises between two groups trying to solve the same problem. Although we have already seen experts in both fields turn interdisciplinary and take this field as far as it has come, I still believe there is a gap where interdisciplinary experts with training like mine can prove a big asset.

Saturday, December 10, 2011

How should a bioinformatician be trained?

This is one question that has been troubling me for the last 3 years. I have been asking people at different seminars, reading blogs, articles and what not, but it is still not clear. The question is: how should a bioinformatician be trained? How should an ideal bioinformatics program be designed? Should he or she be trained in basic biology first, or can a computer science expert with a command of developing software tools and performing computation-intensive work specialize in life-science research? In the present scenario, the majority of leading bioinformatics experts are trained computer scientists who learn biology in the course of their research. And since bioinformatics is an inter/multidisciplinary field, some would say bioinformatics training should have both components in its program, either together or one after the other in any order. But the question is whether one single person can be trained in multiple disciplines, and how well and how quickly they can learn.

I have raised this concern because it relates to me personally. I was first exposed to bioinformatics when an institute in India (my home country) came up with a bioinformatics degree and proposed to provide courses in both computer science and basic to advanced biology. This was very attractive to me: a new concept of learning, a new kind of specialization, and something different from everyone else in India, where everyone wanted to become either a doctor or an engineer. I took the challenge, or I can also confess that I did not have the guts or passion to wait another year or two to try getting into pure medicine, plus I didn’t want to get in by using connections. Well, I was in the first batch; we didn’t have any list of assigned books or any curriculum, just current topics of bioinformatics in the syllabus. However, we had proper basic biology courses, from cell biology, molecular biology and genetics up to virology and immunology, on the biology side, and basic programming courses on the computational side, which was the weaker aspect. The program was not very bad; however, by the time I graduated after 4 years, I was not at all clear what bioinformatics means. At that time, approximately 8-10 years ago, some said it was database management for biological data, while others said it was studying or modeling genes, proteins and cells using computer programs.

Finally, I decided to come to the United States to explore more and got admitted to a master's program in computational biology. There, I started doing “mathematical modeling of malarial transmission dynamics of drug sensitive and resistant strains of malarial pathogen population under different treatment regimes”. For that, I had to learn some background mathematics and statistics along with population genetics, which I had not studied in my bachelor's. This was interesting, but sometime in my second year I saw a FASTA file on my colleague's desktop and got an urge to work on real genomic sequences rather than modeling or simulation studies. For this I needed more skills, so I joined the PhD program at the University of Missouri, Columbia. Here, I clearly stated that I wanted to work primarily on sequence data and nothing else. After starting my PhD, I realized I was lacking computer science expertise such as machine learning methods and algorithm development, and I started learning those concepts to enhance my research here. So, again and again, I encounter the same question: was my training right for bioinformatics work? Maybe I should have started with pure computer science training, gained expertise, and then learned biology to apply that expertise to biological questions or problems, or vice versa. Or maybe what I have is unique, and I should try to be distinct!