Monday, January 8, 2018

Get a bioinformatics education online for free

This post contains affiliate links, meaning when you click a link and make a purchase, we receive a commission that helps support this site.

You can now get an entire bioinformatics education online for free. This is thanks to 1) the explosion in Massive Open Online Courses (MOOCs) and 2) academics making more and more of their work (especially books) open access, meaning it's available to anyone at no cost. MOOCs are a great option for students looking to supplement their education and become more attractive to bioinformatics hiring managers. A biology major can use MOOCs to learn how to program and analyze data, while other STEM majors lacking biology credits can take MOOCs on molecular biology and next generation sequencing technologies. MOOCs are also a great option for professionals looking for a career change or promotion, since you can learn extra skills on your own time. MOOCs can even be an option for someone looking to get into science as a hobby (bioinformatics is one of the few fields in the biomedical sciences where you can analyze the genetic code of cancer from home in your underwear). Courses start every few weeks, so there are ample opportunities to get started as soon as you're ready.

When it comes to online platforms, it's best to stick with ones partnered with or owned by a major university. Two of the most popular platforms are Coursera (Johns Hopkins) and edX (Harvard). Because these platforms are affiliated with top universities in the United States, it is not uncommon to have your course taught by a leading researcher in a particular topic. These platforms offer a range of bioinformatics-related courses from introductory biology and introductory programming to advanced concepts like deep learning and systems biology. Most of these courses have an audit option, which allows you to take them for free. However, the downside to auditing is that you might not be able to access certain course materials (Coursera), you won't be able to submit certain assignments or get grades for your work (Coursera), and you won't receive a certificate proving that you successfully completed the course (both Coursera and edX).

In addition to having standalone courses, Coursera and edX both feature paid specializations (edX calls them "XSeries"). Specializations are series of related courses designed to help you master a specific topic. On Coursera, many specializations build your knowledge on a topic and culminate in a final Capstone Project that you can put straight onto GitHub. If you read my guide on your first bioinformatics project, then you know the importance of having a project to showcase your talents to prospective employers. Completing a specialization earns you a Specialization Certificate, and these certificates should be used to enhance your resume/CV and to show potential employers that you are competent in a particular topic. Not only are they good CV padding, but paid specializations will give you a little skin in the game and make you more accountable to yourself (many who start free MOOCs never finish them).

There are thousands of courses on Coursera alone, so it can be hard to sort through them all to find the ones that are worth your time. Here, I give a handpicked list of courses that will give you the tools you need to get up to speed quickly in bioinformatics. This is by no means an exhaustive list since bioinformatics touches on so many different areas. Instead, I focus on core competencies and then suggest optional courses that you can take depending on your interests and the type of job you'd like to get. For my recommendations, I tend to favor specializations because they give you a more cohesive experience instead of feeling like a patchwork of disconnected information.

I start off with introductory courses for building your knowledge of both biology and programming. If you are already comfortable with biology or have an undergraduate biology degree, then I would suggest focusing more on the programming and data analysis courses. It's better to know some biology and spend time perfecting your programming skills than it is to be an expert at biology who flounders at simple coding tasks. On the other hand, if you are a STEM major familiar with programming and data analysis, then it is probably worth your time learning biology to understand the context of the bioinformatics problems you will be working on. If you are coming from outside of STEM, then I really recommend going through all of the introductory courses. Getting a solid foundation in the basics is necessary for handling the advanced concepts to come.

Next, I list intermediate courses. These are the bread and butter of bioinformatics and cover a lot of the type of work you can expect as a bioinformatician. It's fair to say a lot of bioinformatics positions focus on gene expression pipelines and data analysis, so this is reflected in these intermediate course recommendations. Even if you aren't super familiar with more exotic types of data (I work almost exclusively with proteomics and metabolomics data), the education you receive in these intermediate courses will give you the ability to tackle most types of problems you come across. Then, I list advanced topics to further hone your skills. These courses tend to be on the more difficult side, but they are well worth the time investment. Finally, I end with a number of popular, freely available books. These make good companions to the courses or can even serve as good introductory texts if you prefer self-learning.

Introductory Courses

Biology

Essential Human Biology: Cells and Tissues
Perfect for someone with zero biology experience, this course will give you an introduction to the structure and function of human cells and tissues and lay a foundation for more advanced topics.

Introduction to Biology - The Secret of Life
An introductory-level molecular biology course taught by Professor Eric Lander, one of the leaders of the Human Genome Project. The course content reflects the topics taught in the MIT introductory biology courses and many biology courses across the world.

DNA: Biology’s Genetic Code
Most bioinformatics projects (probably the vast majority) revolve around next generation sequencing technologies and genomics, so it's really important to get a solid foundation in this area. This course explores the basics of DNA structure, packaging, replication, and manipulation. 

Computer Science and Programming

Fundamentals of Computing Specialization (Option #1)
I list two computer science tracks here depending on your preferences. I love this specialization because of the emphasis on developing critical mathematical problem solving and algorithmic thinking. These skills are the bread and butter of a bioinformatician. 

The courses include:
  • An Introduction to Interactive Programming in Python (Part 1)
  • An Introduction to Interactive Programming in Python (Part 2)
  • Principles of Computing (Part 1) 
  • Principles of Computing (Part 2)
  • Algorithmic Thinking (Part 1)
  • Algorithmic Thinking (Part 2)
  • The Fundamentals of Computing Capstone Exam

Python for Everybody (Option #2)
This track will take you through 'fundamental programming concepts including data structures, networked application program interfaces, and databases, using the Python programming language'. This track places a bit more emphasis on practical application, so it might be a good option if you find yourself struggling with the first track (which is a bit more mathy/technical). 

The courses include:
  • Programming for Everybody (Getting Started with Python)
  • Python Data Structures
  • Using Python to Access Web Data
  • Using Databases with Python
  • Capstone: Retrieving, Processing, and Visualizing Data with Python

Optional

Introduction to the Biology of Cancer
There are a lot of cancer-related bioinformatics jobs in big pharma and at academic centers, so knowing a thing or two about cancer can help you land a job. This optional course introduces the molecular biology of cancer (oncogenes and tumor suppressor genes) as well as the biologic hallmarks of cancer. 

Epigenetic Control of Gene Expression
This optional course introduces you to epigenetics, the study of heritable changes in gene function that do not involve changes in the DNA sequence. This is a more specialized course and might not be applicable to everyone, but you can run into trouble landing a job that involves epigenetics or epigenomics without some background in this area.

Introduction to Computer Science aka CS50x
Learning to program is important, but learning to think like a computer scientist is equally important. This is a good skill to develop for a bioinformatician since you will be called upon to solve complex problems at the intersection of computer science and biology. CS50x is an immensely popular course taught in person at Harvard that has been adapted for the edX platform.

Intermediate Courses

Bioinformatics

Bioinformatics Specialization (Option #1)
You've got three really great tracks to pick from for your core bioinformatics competencies. These three tracks cover roughly the same topics, so you should look into each to see which one piques your interest and is right for you. This first specialization comes from the creators of rosalind.info (a free bioinformatics practice site). The first course in this track, "Finding Hidden Messages in DNA (Bioinformatics I)", is listed as a Top 50 MOOC of All Time, and there are even two print textbooks (highly recommended) that go along with the course: Bioinformatics Algorithms Volume I and Volume II.

The courses include:
  • Finding Hidden Messages in DNA (Bioinformatics I)
  • Genome Sequencing (Bioinformatics II)
  • Comparing Genes, Proteins, and Genomes (Bioinformatics III)
  • Molecular Evolution (Bioinformatics IV)
  • Genomic Data Science and Clustering (Bioinformatics V)
  • Finding Mutations in DNA and Proteins (Bioinformatics VI)
  • Bioinformatics Capstone: Big Data in Biology (Bioinformatics VII)

Genomic Data Science Specialization (Option #2)
Taught by renowned data scientist and biostatistician Jeff Leek (Johns Hopkins), this specialization will give you the skills you need to understand, analyze, and interpret data from next generation sequencing experiments. It features hands-on exercises with the command line, Python, R, Bioconductor, and Galaxy.

The courses include:
  • Introduction to Genomic Technologies
  • Genomic Data Science with Galaxy
  • Python for Genomic Data Science
  • Algorithms for DNA Sequencing
  • Command Line Tools for Genomic Data Science
  • Bioconductor for Genomic Data Science
  • Statistics for Genomic Data Science
  • Genomic Data Science Capstone

Data Analysis for Life Sciences and Genomics Data Analysis (Option #3)
This track consists of two complementary XSeries. Both are taught by another renowned data scientist and biostatistician (Rafael Irizarry, Harvard). Data Analysis for Life Sciences '...is perfect for anyone in the life sciences who wants to learn how to analyze data. Problem sets will require coding in the R language to ensure learners fully grasp and master key concepts.' The second part of the track, Genomics Data Analysis, '...is an advanced series that will enable students to analyze and interpret data generated by modern genomics technology... is perfect for those who seek advanced training in high-throughput technology data.'

The courses include:
  • Statistics and R
  • Introduction to Linear Models and Matrix Algebra
  • Statistical Inference and Modeling for High-throughput Experiments
  • High-Dimensional Data Analysis
  • Introduction to Bioconductor: Annotation and Analysis of Genomes and Genomic Assays
  • High-performance Computing for Reproducible Genomics
  • Case Studies in Functional Genomics

Other Intermediate Courses

Data Structures and Algorithms Specialization
Knowing the right algorithm to use can mean the difference between a job that takes an hour to run and one that takes a week. This specialization has a combination of theory and practice, and you will solve nearly 100 algorithmic coding problems in your language of choice (instead of just taking a multiple choice quiz like in many MOOCs). This specialization also features two big, real-world projects: Big Networks and Genome Assembly. The first involves analyzing different networks (e.g., road and social networks) and finding the shortest path. The second covers assembly algorithms and assembling genomes from millions of short fragments of DNA.
  
The courses include:
  • Algorithmic Toolbox
  • Data Structures
  • Algorithms on Graphs
  • Algorithms on Strings
  • Advanced Algorithms and Complexity
  • Genome Assembly Programming Challenge

Optional

Big Data Specialization
As a bioinformatician, working with tens of thousands of sequenced genomes or millions of radiology images (e.g. CAT scans) means you'll need to know a thing or two about big data. This specialization will give you hands-on experience with the tools and systems you need. These courses take you through the basics of using Hadoop with MapReduce, Spark, Pig and Hive, and you will be shown how to ask the right questions about data, how to communicate like a data scientist, and how to perform exploration of large, complex datasets. 

The courses include:
  • Introduction to Big Data
  • Big Data Modeling and Management Systems
  • Big Data Integration and Processing
  • Machine Learning With Big Data
  • Graph Analytics for Big Data
  • Big Data - Capstone Project

Mathematical Biostatistics Boot Camp 1 & Mathematical Biostatistics Boot Camp 2
These courses will help if you are struggling with some of the statistical concepts you encounter in your training as a bioinformatician. They introduce 1) the fundamental probability and statistical concepts used in elementary data analysis and 2) fundamental concepts in data analysis and statistical inference. These courses are appropriate for '...undergraduate students with junior or senior college-level mathematical training including a working knowledge of calculus. A small amount of linear algebra and programming are useful for the class, but not required.'


Advanced Courses

Machine Learning and Deep Learning

Machine Learning (Option #1)
This first option is actually a standalone course (in lieu of a specialization) because I like it so much. It is taught by one of the world's foremost experts in machine learning (Andrew Ng, Baidu Research/Stanford University). The fact that this course is available for anyone to take is mind-blowing. I highly recommend it. You will be given an introduction to machine learning, data mining, and statistical pattern recognition. You'll learn both supervised and unsupervised methods as well as best practices in machine learning. The course features numerous case studies and applications, so you'll be getting a taste of everything from '...building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.'

Machine Learning Specialization (Option #2)
If you want to spend a bit more time getting familiar with machine learning, then this four-course specialization from the University of Washington is for you. 'Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.'

The courses include:
  • Machine Learning Foundations: A Case Study Approach
  • Machine Learning: Regression
  • Machine Learning: Classification
  • Machine Learning: Clustering & Retrieval

Deep Learning Specialization
An entire Deep Learning specialization by Andrew Ng? Yes please! This specialization will teach you the foundations of deep learning, how to build neural networks, and how to run a successful deep learning project. The courses cover everything from convolutional networks and recurrent neural networks (RNN) to long short-term memory (LSTM), Adam, Dropout, BatchNorm, Xavier/He initialization, and more. They teach you both the theory and how deep learning is applied in industry. This includes several case studies in healthcare, autonomous driving, sign language reading, music generation, and natural language processing. The specialization is taught in Python and in TensorFlow.

The courses include:
  • Neural Networks and Deep Learning
  • Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
  • Structuring Machine Learning Projects
  • Convolutional Neural Networks
  • Sequence Models

Other Advanced Courses

Systems Biology and Biotechnology Specialization
Many bioinformaticians work in systems biology, '...[a] field of study that focuses on complex interactions within biological systems, using a holistic approach to biological research.' This broad field utilizes a whole slew of different methodologies, so this specialization introduces you to topics like dynamical modeling, network and statistical modeling, "omics" technologies (e.g. genomics, proteomics), and single cell research technologies. Upon completion, you'll know how to combine experimental, computational, and mathematical methods to answer questions in a variety of biomedical fields. 

The courses include:
  • Introduction to Systems Biology
  • Experimental Methods in Systems Biology
  • Network Analysis in Systems Biology
  • Dynamical Modeling Methods for Systems Biology
  • Integrated Analysis in Systems Biology
  • Systems Biology and Biotechnology Capstone

Recommended Free Books

Programming

Advanced R - Free Online Version - Buy Print Version 
An R programming book by data scientist Hadley Wickham, Chief Scientist at RStudio and Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University.

Think Python - Free Online Version (1st Edition) - Buy Print Version (2nd Edition) 
'Think Python is an introduction to Python programming for beginners. It starts with basic concepts of programming, and is carefully designed to define all terms when they are first used and to develop each new concept in a logical progression. Larger pieces, like recursion and object-oriented programming are divided into a sequence of smaller steps and introduced over the course of several chapters.'

Statistics and Data Science

An Introduction to Statistical Learning with Applications in R - Free Online Version - Buy Print Version 
'This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.'

Think Stats - Free Online Version - Buy Print Version 
'Think Stats is an introduction to Probability and Statistics for Python programmers. Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real datasets. If you have basic skills in Python, you can use them to learn concepts in probability and statistics. Think Stats is based on a Python library for probability distributions (PMFs and CDFs). Many of the exercises use short programs to run experiments and help readers develop understanding.'

Exploratory Data Analysis with R - Free Online Version - Buy Print Version 
'This book teaches you to use R to effectively visualize and explore complex datasets. Exploratory data analysis is a key part of the data science process because it allows you to sharpen your question and refine your modeling strategies.'

The Elements of Statistical Learning (2nd edition) - Free Online Version - Buy Print Version 
'During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting--the first comprehensive treatment of this topic in any book.'

Machine Learning

Understanding Machine Learning - Free Online Version - Buy Print Version
'Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides a theoretical account of the fundamentals underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics, the book covers a wide array of central topics unaddressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for advanced undergraduates or beginning graduates, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics and engineering.'

Biology

Molecular Biology of the Cell - Searchable Online Version (4th Edition) - Buy Print Version (6th Edition)
'Molecular Biology of the Cell is the classic in-depth text reference in cell biology. By extracting fundamental concepts and meaning from this enormous and ever-growing field, the authors tell the story of cell biology, and create a coherent framework through which non-expert readers may approach the subject. Written in clear and concise language, and illustrated with original drawings, the book is enjoyable to read, and provides a sense of the excitement of modern biology. Molecular Biology of the Cell not only sets forth the current understanding of cell biology (updated as of Fall 2001), but also explores the intriguing implications and possibilities of that which remains unknown.'

Molecular Cell Biology - Searchable Online Version (4th Edition) - Buy Print Version (8th Edition)
'Modern biology is rooted in an understanding of the molecules within cells and of the interactions between cells that allow construction of multicellular organisms. The more we learn about the structure, function, and development of different organisms, the more we recognize that all life processes exhibit remarkable similarities. Molecular Cell Biology concentrates on the macromolecules and reactions studied by biochemists, the processes described by cell biologists, and the gene control pathways identified by molecular biologists and geneticists. In this millennium, two gathering forces will reshape molecular cell biology: genomics, the complete DNA sequence of many organisms, and proteomics, a knowledge of all the possible shapes and functions that proteins employ.'

Misc.

How to be a modern scientist - Free Online Version 
'A book about how to be a scientist the modern, open-source way.'
