Sunday, July 30, 2017

A day in the life of a postdoctoral fellow at a cancer center

I'm a postdoctoral fellow in bioinformatics/computational biology at a major cancer center in the United States. I walk into lab at the cancer center around 9am, boot up my MacBook Pro, and check my email. The biologists in our group are also just wandering in, and our early-rising technician already has an experiment running. Being embedded in a lab as a computational person has its advantages, including immediate feedback on questions about experimental design or results. The main downside is that my thought process, and thus my coding, gets interrupted by the occasional question or request for a consultation. Still, I'd much rather be seated at my window in the lab, where I get to see the sun, than stuck somewhere in a cube farm.

Scanning through my email, I see that the boss has us scheduled to meet with some medical doctors later in the week. Our lab has helped pioneer a new method for looking at multiple post-translational modifications of proteins simultaneously. These docs are in the cancer center's bone marrow transplant department, and they're interested in what role post-translational modifications play in graft-versus-host disease, an awful condition wherein newly transplanted cells attack the host's body. It turns out that I'll be in charge of coming up with an analysis strategy for making sense of all of these modifications. Processing the data won't be a problem because I already have pipelines established for analyzing post-translational modifications, so I can spend most of my time figuring out exactly what question we're trying to ask and which method I'll use (or implement) to see the project through.

Email done, I SSH into our center's HPC cluster to check the status of a job. I've developed a new, network-based method for integrating multi-omic data (e.g., genomics, proteomics, metabolomics). To determine which networks are significant, the method relies on generating millions of random permutations of the existing data; as you can imagine, this is computationally intensive, which is why it runs on the HPC cluster. My job completed successfully, so I make a note to have the undergraduate intern who works with me check the results. Any tweaks that need to be made to the output will be pushed to our private GitHub repository for me to review later.
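For readers curious what that looks like in practice, here's a stripped-down sketch of a permutation test in R. To be clear, this is an illustration rather than my actual method: network_score is a hypothetical placeholder for a real network scoring function, and the data are random.

# Toy permutation test for network significance (illustration only)
set.seed(42)

# Hypothetical scoring function: difference in mean gene-gene correlation
# between the two sample groups
network_score <- function(expr, labels) {
  abs(mean(cor(t(expr[, labels == 1]))) - mean(cor(t(expr[, labels == 0]))))
}

# Fake expression matrix: 50 genes x 20 samples, two groups of 10
expr <- matrix(rnorm(50 * 20), nrow = 50)
labels <- rep(c(0, 1), each = 10)

observed <- network_score(expr, labels)

# Shuffle the sample labels many times to build a null distribution
# (millions of permutations in practice, hence the HPC cluster)
n_perm <- 10000
null_scores <- replicate(n_perm, network_score(expr, sample(labels)))

# Empirical p-value: how often does chance score as well as the real labels?
p_value <- (sum(null_scores >= observed) + 1) / (n_perm + 1)

The +1 in the numerator and denominator keeps the estimated p-value from being exactly zero, a standard trick for permutation tests.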

I then spend the rest of the morning analyzing some data generated in the lab by one of the biologists. Most of my established pipelines, written in R, process the data rather quickly, which gives me additional time to try to find biological meaning in the results. Lunchtime!

I wander over to the cancer center's cafeteria and grab a veggie burger and some fries. I sit down and thumb through a feed reader app on my phone. I use it to stay up to date on the latest peer-reviewed scientific journal articles as well as some relevant blogs (e.g., science commentary, bioinformatics, data science, programming). The cafeteria is not segregated; research staff, medical doctors, and patients all co-mingle in the dining area. I promise that if you're feeling bummed out about your day because the traffic was bad or Starbucks got your order wrong, then seeing a bald patient in the cafeteria (who is obviously undergoing chemo) will really put things in perspective for you.

Lunch done, I wander back to lab for some more data analysis. Later in the afternoon I go to a joint group meeting involving biologists, chemists, fellow bioinformaticians/computational biologists, and the occasional statistician. Our center is fairly forward-thinking, and we routinely have interdisciplinary meetings to go over progress on various projects. I enjoy these meetings because experts from many fields are represented, and each brings a unique perspective to the work being presented. One bonus of these meetings is that problems with experimental design or project feasibility get identified early, before time and money are wasted.

It's getting late before the meeting finally adjourns. I get back to my desk and check the calendar for the rest of the week. We've got a grant deadline in about a month, and I've been tasked with doing some preliminary research for a particular section. I add it to tomorrow's to-do list in Evernote to make sure I don't forget. Later in the week I'll need to write up some results for a first-author paper that I'm planning to submit to a peer-reviewed journal. That goes on the to-do list as well.

I'm fortunate to work for a boss who values work-life balance, so I typically work about 40 hours a week. The occasional grant or other deadline might push these hours up a bit, but I'm not slaving away for 60-plus hours. The nice part about being a computational researcher is that I can pick up my work from home if needed. Most of my files containing non-sensitive information are synced with Dropbox, and my scripts are all up in my private GitHub repository, meaning I can close my laptop at work, drive home, and then open my home laptop and keep working. I leave work fairly happy. Boredom at my job is rare, and I occasionally drive home with a smile on my face because I really feel like I'm doing my part in the fight against cancer. I've known since early in my college days that I wanted to pursue a PhD doing bioinformatics-related cancer research, and it took a lot of hard work and perseverance, but I feel like all of my time in school is finally paying off.

Saturday, July 29, 2017

Science Magazine special issue on AI and machine learning

It's very fitting that, as I worked to get this site up and running this month, Science released a special issue on artificial intelligence and machine learning entitled 'The cyberscientist'. I think it's wonderful that the computational sciences (especially bioinformatics) are now part of the mainstream scientific discourse, and I take it as evidence that more and more positions will need to be filled by candidates with training in this space.

From the issue:
Big data has met its match. In field after field, the ability to collect data has exploded, overwhelming human insight and analysis. But the computing advances that helped deliver the data have also conjured powerful new tools for making sense of it all. In a revolution that extends across much of science, researchers are unleashing artificial intelligence (AI), often in the form of artificial neural networks, on these mountains of data. Unlike earlier attempts at AI, such “deep learning” systems don’t need to be programmed with a human expert’s knowledge. Instead, they learn on their own, often from large training data sets, until they can see patterns and spot anomalies in data sets far larger and messier than human beings can cope with.
Wait a second, that's a lot of buzzwords for someone not familiar with the field. What's the difference between artificial intelligence, machine learning, and deep learning? In short, artificial intelligence is just human intelligence exhibited by a machine. Machine learning uses algorithms that parse data, learn from it, and then make some sort of prediction. Deep learning is a type of machine learning that uses neural networks with multiple hidden layers.
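If you've never seen the "parse data, learn, predict" loop in action, here's about the smallest honest machine learning example I can write in R, using the built-in iris data (logistic regression standing in for fancier algorithms):

# Minimal machine learning example: learn from data, then predict
data(iris)

# Binary question: is this flower the species virginica?
iris$is_virginica <- as.integer(iris$Species == "virginica")

# Hold out some data so the model is tested on examples it never saw
set.seed(1)
train_rows <- sample(nrow(iris), 100)
train <- iris[train_rows, ]
test <- iris[-train_rows, ]

# "Learning": fit a logistic regression on the training set
fit <- glm(is_virginica ~ Sepal.Length + Sepal.Width,
           data = train, family = binomial)

# "Prediction": score the held-out flowers
probs <- predict(fit, newdata = test, type = "response")
mean((probs > 0.5) == test$is_virginica)  # test-set accuracy

Swap the logistic regression for a neural network with many hidden layers and you're in deep learning territory; the learn-then-predict loop is conceptually the same.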

Before you go down the rabbit hole of neural networks and deep learning, you should ask yourself why you should care as a bioinformatician. Back to the issue in Science for a great example:
For geneticists, autism is a vexing challenge. Inheritance patterns suggest it has a strong genetic component. But variants in scores of genes known to play some role in autism can explain only about 20% of all cases. Finding other variants that might contribute requires looking for clues in data on the 25,000 other human genes and their surrounding DNA—an overwhelming task for human investigators. So computational biologists have enlisted the tools of artificial intelligence (AI), which can ask a trillion questions where scientists can ask only 10. First, these researchers combined hundreds of genomics data sets and used machine learning to build a map of gene interactions. They compared those of the few well-established autism risk genes with those of thousands of other unknown genes and last year flagged another 2500 genes likely to be involved in this disorder. Now they have developed a deep learning tool to find non-coding DNA that may also play a role in autism and other diseases.
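That "compare known risk genes with unknown genes" step is essentially guilt by association on a network. Here's a toy version in R to make the idea concrete; the interaction weights below are random placeholders, whereas the real studies build the network from hundreds of genomics data sets.

# Toy guilt-by-association gene ranking (illustration only)
set.seed(7)

genes <- paste0("gene", 1:10)

# Hypothetical gene-gene interaction network as a symmetric weight matrix
adj <- matrix(runif(100), nrow = 10, dimnames = list(genes, genes))
adj <- (adj + t(adj)) / 2
diag(adj) <- 0

known_risk <- c("gene1", "gene2")  # stand-ins for established risk genes
candidates <- setdiff(genes, known_risk)

# Score each unknown gene by its average interaction with the known set
scores <- rowMeans(adj[candidates, known_risk])

# The top-ranked candidates get flagged for follow-up
sort(scores, decreasing = TRUE)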
Machine learning is an advanced but indispensable tool for the bioinformatician's toolbox. It just so happens that Andrew Ng, a world-renowned expert in machine learning, runs an online Coursera course on Machine Learning. The course runs continuously, so if you missed the enrollment date for this session, you can sign up again in a few weeks. If you're interested in self-teaching, I highly recommend the introductory text An Introduction to Statistical Learning: with Applications in R.
