Bioinformatics Voyage

Sunday, March 15, 2015

PCR Duplicates

PCR is an amplification step in sequencing library preparation, and ideally each original DNA fragment should be sequenced once and only once. When several reads come from copies of the same original fragment, they are PCR duplicates and need to be removed. Everyone knows that. I have known it since I started working on NGS data, and I tried to read about it many times. But only now do I fully understand where they come from and how they are generated. This blog post describes it in a great way:

http://www.cureffi.org/2012/12/11/how-pcr-duplicates-arise-in-next-generation-sequencing/
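To make the removal step a bit more concrete for myself, here is a minimal sketch in Python. It assumes the common convention that aligned reads sharing the same chromosome, start position and strand are treated as duplicates of one another, and that only the best-quality read in each group is kept. The `Read` record and the `mark_duplicates` function are hypothetical names made up for illustration. Real duplicate-marking tools also deal with paired ends, clipping and other details that this sketch ignores.

```python
from collections import namedtuple

# Hypothetical minimal read record; real alignment records carry many more fields.
Read = namedtuple("Read", ["name", "chrom", "start", "strand", "mean_quality"])


def mark_duplicates(reads):
    """Split reads into (kept, duplicates).

    Assumption: reads with the same (chrom, start, strand) are duplicates,
    and the read with the highest mean quality in each group is kept.
    On a tie, the read seen first is kept.
    """
    best = {}  # (chrom, start, strand) -> index of the best read seen so far
    for i, read in enumerate(reads):
        key = (read.chrom, read.start, read.strand)
        if key not in best or read.mean_quality > reads[best[key]].mean_quality:
            best[key] = i

    kept_indices = set(best.values())
    kept = [r for i, r in enumerate(reads) if i in kept_indices]
    duplicates = [r for i, r in enumerate(reads) if i not in kept_indices]
    return kept, duplicates


if __name__ == "__main__":
    example = [
        Read("r1", "chr1", 100, "+", 30.0),
        Read("r2", "chr1", 100, "+", 35.0),  # same position and strand as r1
        Read("r3", "chr1", 100, "-", 20.0),  # opposite strand, not a duplicate
        Read("r4", "chr2", 100, "+", 25.0),
    ]
    kept, dups = mark_duplicates(example)
    print("kept:", [r.name for r in kept])
    print("duplicates:", [r.name for r in dups])
```

Grouping by position rather than by sequence is a design choice: two copies of the same fragment can differ by sequencing errors, but they should still align to the same place.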

Wednesday, July 30, 2014

PhD tales: (Part 3) Genomics vs. Proteomics with NGS flying high

In this post, I continue from my last post, where I described how I ended up deciding on cancer as the focus of my PhD. Cancer is itself a very broad field and was certainly not specific enough to be a thesis topic. This meant that I was not yet done answering questions in my PhD voyage.

I came from a background with a stronger interest in the biological sciences. Hence, I knew that being on the consumer side, using other people’s tools to study basic biology, was the easier route for me. I think the only reason I had more interest in the biological sciences was my longer journey with them. I started studying programming only in my undergrad, and I now realize that I am equally interested in both. So to say, you should not commit to something too early without trying out different options.

Then, to do bioinformatics analysis in cancer, I asked myself whether I wanted to focus on proteomics or genomics. To decide, I had to consider not only my personal interests but also which one, in my understanding, deserved to be studied first and which was easier in terms of the resources available in the current research community.

Between genomics and proteomics, some scientists argue that proteomics is a better predictor of disease than genomics. I knew that proteins are the functional units of a cell. I was also convinced that a difference in protein content between a diseased and a normal cell would definitely produce a functional difference, whereas a difference in DNA may not contribute greatly to cell function and metabolism. On top of that, protein binding and docking studies are interesting and directly applicable to drug discovery and treatment. Beyond proteomics, there is another level, lipidomics, the study of the lipid composition of cells, which attracted me a lot. Scientists predicted that it would become prominent in the near future. They have shown roles for lipids in compartmentalization, cycling of proteins, signaling, and possibly pathogenesis.

Studying proteins or any higher level might be a better predictor of disease than genomics in the future, but we should not forget that genomics is still the basic building block. Before understanding proteins and their networks in the human body, I believe we should understand our genetic makeup more deeply. There will always be more levels of complexity and more components to consider in cells and disease states. The study of all the -omics will be needed for a full understanding, but moving upwards through the hierarchy of complexity would always be my way.

My reasoning came at just the right time, when next-generation sequencing (NGS) technology was in full bloom. It was less than a decade since the first next-gen sequencers hit the market, and the technology had already transformed nearly every field of the biological sciences. The previous several years had seen revolutionary advances in DNA sequencing technologies with the advent of NGS techniques. NGS methods now allowed millions of bases to be sequenced in one round, at a fraction of the cost of traditional Sanger sequencing. As the costs and capabilities of these technologies continued to improve, we were only beginning to see the possibilities of NGS platforms, which were developing in parallel with the online availability of a wide range of biological data sets and scientific publications, allowing us to address a variety of questions that were not possible before.

I was trying to read more scientific journals to see the latest research topics, and the scientific literature was brimming with NGS-related studies. The pace of scientific discovery was largely driven by innovative applications and an astonishingly rapid evolution of the technology.

I could easily see a new era in next-gen sequencing, one in which NGS technologies were not only being used for discovery but were already being integrated into clinical care. Along with that, in terms of my future job prospects, I could see a growing need for specialists who could make sense of the mountains of information in a way that is meaningful for scientists and clinicians, and ultimately beneficial to customers and patients.

Hence, deciding on genomics and NGS was pretty simple for me at that time.

Also, its hype was one big reason for redirecting my focus from tool development to NGS analysis.

Monday, July 28, 2014

PhD tales: (Part 2) Research topic

Deciding on a thesis topic for a bioinformatics PhD is one big challenge. For the few PhD students who work on their advisor's funded projects, this is not a challenge; they just work on someone else's idea, I think. But for others, it is a time to look for their own ideas and interests. And not just ideas but also resources, like access to data, which is most important in health-related fields. Also, it is not just the specific thesis topic that has to be decided but also the broader field of bioinformatics applications on which they want to focus.

There are various directions in which you can steer your PhD studies. One is to focus on your algorithmic and computational skills and try to develop a tool or software that everyone in your research community can use and benefit from. The other is to go towards the consumer side, where you rely on the available tools, use them to your advantage, and come up with new biological findings.

To me, it is more useful for the community to improve an existing tool based on your needs than to reinvent the wheel. A publication record, as one of the requirements of a PhD, is a way for you to show your potential and caliber. However, designing software just for the sake of a publication, or developing software with the goal of adding a single feature to outperform or compete with zillions of other tools, is an injustice to driving bioinformatics forward. The number of bioinformatics web applications and tools, for example sequence aligners, genome assemblers or mappers, and many other bioinformatics software packages, is many times higher than the number that are actually used.

Also, if I wanted to focus on building algorithms, the next thing to do was either to look for a collaborator of our lab with a project that needed a better algorithmic design or to find one myself. But for the second direction, I only had to decide on a particular disease that I was really interested in and on which it was practically feasible to conduct research. For that, cancer was the prompt and feasible answer, with lots of datasets available online and many unanswered questions.

However, there are expectations, not only from my advisor but also from the department and peers, that a graduate student should conduct only top-notch research and answer never-before-asked questions about what causes cancer. Instead, I wanted to investigate fundamental questions in biology. I do believe that new treatments can arise from understanding the underlying biological processes that lead to cancer-oriented changes. And so I decided to start working specifically on cancer.

Monday, June 23, 2014

PhD tales: (Part 1) Most challenging part

Doing a PhD is indeed a once-in-a-lifetime experience for every PhD candidate. But what is interesting is that every PhD tale has both unique and comparable narratives. As it is notorious for, a PhD comes with a lot of challenges and hurdles. Before even starting to answer the research question that you yourself propose, you need to walk through a trail of many other questions. First answer them and satisfy your soul. And get convinced. The first and foremost question is whether you really want to do a PhD. Are you PhD material? Can you devote your next 5 years to studying? Can you make your PhD studies both your friends and your family? Most importantly, does it align with your life’s goal? And once you have examined the depth of your intent and interest in doing a PhD, then comes the never-ending list of decisions: the school, your advisor, thesis topic, and so on.

For me, the director of computer science at an OK but not great university, who ran a bioinformatics lab, called and interviewed me. He later offered me a PhD position, which I accepted. This meant I already had my advisor when I started my PhD. When I joined his lab, I was expected to take over the project of a graduating PhD student in our lab. I heard it is very common in the biology community to first take over projects from previous PhD students in your lab and only then think about what you yourself want to do. For that, I had to fully understand his whole software, written in C, and develop its web application. This would let its users access and run it on the web on our servers. So my first year was spent understanding that software and building its web server. It was indeed a very useful learning experience. I developed the online tool and wrote a conference paper as first author.

Having understood the software so well, I started taking on service requests from collaborators who wanted to use that tool. On the side, I was also digging into the various projects it had been used for in the past. I found one of our collaborators' projects, with its results, sitting on our computers for the last two years. The results looked very promising and revealed one unique feature of the tool that had gone unnoticed until then. I polished that feature and analyzed those results further. Eventually, after prolonged efforts to bring together all the past collaborators on that project to provide me with more details, I finally wrote a paper and got it published. It is always advised to take care of the low-hanging fruit first. It was hard to get all the authors back to their two-year-old project to review a long research paper. That paper took an absolutely different angle from what they had used the tool for before. Reminding busy professors and struggling new post-docs to review the paper I wrote was a tough job in itself for a naïve, immature PhD student.

And then, only after my fairly successful 2 years, came my very first real challenge: deciding on my thesis topic. It is the challenge that many of us think has to be settled in the first year or even before joining a PhD program.

Sunday, September 30, 2012

Full NGS

Whether you already know all about sequencing technologies or want to get clearer about them, this video is very highly recommended. It is a long one but very nicely delivered by Elaine Mardis of the Genome Institute at Washington University. It is like a 101 on how sequencing works.

You can always skip the last 20 minutes, when she talks more about specific projects after covering the technology.

Saturday, February 18, 2012

Fun to see the future of Bioinformatics but need more analysis to take over technology development.

It was not very long ago that I went to India to visit my family for the December break and had to see a doctor for my back and shoulder pain. I went to a physiotherapist, and seeing the intensity of the pain, he immediately advised me to get some tests done. I got MRIs of my back and brain, X-rays and what not, but that was good, since I would not have to take random painkillers and would get an accurate diagnosis. Next, my doctor pulled up all my radiology images and reports on screen and actually explained the reports to me, knowing that I was somewhat familiar with the biological terms and obviously the biological field. And as he flipped through the images and talked, what I was thinking about was the future of personal genomics and bioinformatics.

One day, when the $1000 genome arrives and companies like 23andMe flourish, doctors will pull up not only physiological images but also genomes or other genomic data to show me the root of a disease. Or maybe they will show the altered biological networks and protein modifications, and the action of the drugs they are prescribing: which pathway, cycle, or particular protein each one targets. They will compare my genomic or protein sequence in question with a normal one, along with their interactions or participation in particular pathways. Not only that, but when a baby is born, besides getting immunizations, they should get their genome sequenced for their medical records. Only then will an ordinary person understand the importance of bioinformatics and what a genome means. People will know which drugs they are sensitive to and which they are not. If they are suffering from anything, they will know which part of their biological network or pathway is messed up and how it is connected to the others. That is what I understand by "sequencing in clinical applications". It might also explain to my parents what I'm studying, the hardest thing on earth to explain to them. An electrical engineer and a housewife will know what a "gene" means, what it looks like, and what it is composed of.

Advances in sequencing technology have been immensely impressive in the last few years. And the latest techniques presented at the much talked-about and tweeted AGBT conference are mind-blowing. I don't see this future as very far away. Nanopore technology looks really promising and will definitely revolutionize this field if it comes to market late this year (and succeeds). I think bioinformatics has indeed picked up pace on the technology side and now needs to do the same in analysis. I expect more and more companies to emerge, not only in technology development, producing a data deluge, but also in analysis for the most wanted clinical applications. There is still a need to know what each of the roughly 21,000 genes in our genome does, how epigenetics differs in each of us, and how it is characteristic of only me!

P.S. I would really not have liked it if my doctor had given me pills without showing me the proof (of whatever he had) of why he was giving them to me.

Sunday, December 11, 2011

What is the main goal of doing bioinformatics?

Tons of machine learning models, algorithms and complicated methods are being developed in academic bioinformatics. Every PhD student working in bioinformatics is expected to develop a novel method to do some task. Some prefer SVMs, some prefer Bayesian networks, and some neural networks. Some try to develop new kernels that could have various applications in biology, and so on. They try to show higher and higher accuracy of their models over others. I want to know how many of these tons of models and methods are actually being used by other people in the bioinformatics community. Are they simple enough to be easily understood and communicated by biologists? On top of that, no research team wants to use what has already been developed. They prefer to hire their own local programmers to build in-house tools and methods that do the same thing. And this is one big reason why many bioinformatics companies are not able to do good business selling their software and tools.

What is important is first to understand “What is bioinformatics?” and “What is the main goal of doing bioinformatics?”. I believe that the approach in bioinformatics should be “problem-driven” instead of “solution-driven”. One should try to demonstrate the quality and novelty of the findings rather than the efficiency or accuracy of the method applied. To be a good bioinformatician, I would first start by understanding the biological question, the problem and its hypothesis, and then look for methods that can be used. This approach definitely requires interdisciplinary training for bioinformatics scientists. On the other hand, interdisciplinary-trained scientists are good at solving problems, but they might not be very good at implementing solutions that require more than an interdisciplinary approach, such as some particular expertise.

Indeed, multidisciplinary teams have become the trend over interdisciplinary integration. Most pharmaceutical and bioinformatics companies recruit computer scientists and biologists rather than interdisciplinary bioinformaticians. I suspect this is the approach in which a communication gap or language difference arises between two groups trying to solve the same problem. Although we have already seen experts in both fields turn interdisciplinary to take this field as far as it has come, I still believe there is a gap where interdisciplinary experts and training like mine can prove a big asset.