<div><b>Bioinformatics Career Guide</b></div>
<div>Advice, resources, and how-tos for getting started or switching into a career in bioinformatics. By Paul (feed last updated 2024-02-06).</div>
<div><b>What degree do I need for a career in bioinformatics?</b> (published 2020-07-15)</div>
<div style="-en-clipboard: true;">
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The type of bioinformatics degree you pursue should be influenced by the type of job that you want. Go to </span><a href="http://indeed.com/" style="font-family: "helvetica neue", arial, helvetica, sans-serif;">indeed.com</a><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">, and take a look at some of the positions you may be interested in (dream big!). If they all require at least a master's, then you had better get serious about graduate school. In general, </span></span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">getting a graduate degree in bioinformatics can make you more competitive for bioinformatics jobs. Having a graduate degree can be a tie-breaker in your favor if you are competing against someone with otherwise identical qualifications for a position, and a graduate degree will qualify you for more positions since many positions require at least a master’s if not a PhD.</span></div>
<div style="-en-clipboard: true;">
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-weight: bold;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Bachelor's</span></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">A bachelor’s degree is traditionally a four-year degree in the United States. It will typically land you a job that, day to day, resembles a traditional software engineering job. Web interfaces, data visualizations, dashboards, databases, and maybe even some pipetting will be your bread and butter. If your job description mentions research, then you can expect to spend a lot of time coding other people's algorithms. If this is the type of position you want and/or are already qualified for (especially fresh out of college with a computer science degree), then the time that would otherwise be devoted to graduate school might be better spent on internships and getting work experience. Still, a master's will not hurt and might help job prospects (and starting salary) if you are on the fence. </span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-weight: bold;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Master’s degree</span></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">A master's degree is a graduate degree that you can complete after receiving a bachelor's degree. Most master's degrees, including ones in bioinformatics, will take you two years to complete. Master's degrees are coursework-based, but some master’s degrees can include a master’s thesis option where you work on a novel project or research project of your own. Graduate certificate programs are abridged master’s programs that do not award you with a master’s upon completion. Certificate programs are generally easier to get into, are cheaper, and may be an option if you are having trouble getting into a traditional master's program. Master's programs are easier to get into than PhD programs because students are typically expected to pay their own way for the master's. However, the investment can be worth it since <a href="https://cew.georgetown.edu/cew-reports/valueofcollegemajors/" target="_blank">having a master’s degree pays better than an undergraduate degree</a> and bioinformatics is no exception. Completing a master’s degree will help you land more interesting jobs, gain more independence, and work on more advanced projects. </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">A master's in bioinformatics can help you expand your biological knowledge if you have a computational background, and it will certainly help your computational skills if you have a background in a biomedical field. </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">If you are interested in developing novel algorithms, then you really need to get at least a master's and probably consider a PhD. There are programs that offer master's degrees in bioinformatics completely online. 
This might be an option if you are having trouble getting into a traditional program, but beware that you will lose out on critical in-person networking that can help you get a job later [</span><i style="font-family: "helvetica neue", arial, helvetica, sans-serif;">note: this was written before the current global pandemic; online courses are quickly becoming the norm for at least the short-term</i><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">].</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-weight: bold;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Doctoral Degree (Doctor of Philosophy aka PhD)</span></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">A PhD is a research-intensive degree, and a PhD in the computational sciences, like bioinformatics, will take at least four years to complete. It is not unheard of for a PhD to take longer since progress depends on you and your project. The first two years of a PhD generally involve coursework or a combination of coursework and novel research under a faculty advisor/mentor (also referred to as a principal investigator or PI). In the remaining years, you are expected to be a productive researcher who works on and publishes novel research, helps your PI write grant proposals, and attends scientific conferences to showcase your work. Your pièce de résistance will be your dissertation, a hundred-plus page document where you compile your research into a coherent story (or several sub-stories with similar themes). In addition to writing your dissertation, you must orally defend it in front of a committee of PhD-wielding professors that you and your PI have picked. Your dissertation work often gets turned into one or many peer-reviewed publications before or after your graduation (depending on your university and your circumstances). You are normally required to publish one or more first-author papers in peer-reviewed journals before you are allowed to graduate.</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">One huge benefit to choosing a PhD over a master's is a teaching assistantship, which essentially lets you go to graduate school for free. A teaching assistantship provides a full-time PhD student with a stipend (a salary) in the $20,000+ range with a tuition waiver (reduced or free tuition) in exchange for helping to teach undergraduate courses. This often ends up being in the realm of 20 hours a week of work grading papers, holding office hours, teaching labs, and even lecturing courses. If you play your cards right and are very, very frugal with your stipend, then you can come out of graduate school debt-free. PhD students usually get priority over master's students for teaching assistantships (in some programs, master's students are not eligible for teaching assistantships), so this is something to consider if you are on the fence between a master's and a PhD. Many PhD programs give their students assistantships by default, but others may require a separate application. PhD students are also eligible for research assistantships. A research assistantship follows the same idea as a teaching assistantship, except that instead of being paid to teach, you are paid from your PI's grant (or your own grant/fellowship) to do research.</span></div>
<div><br /></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">A PhD in bioinformatics can open the most doors for you. </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Many positions require graduate degrees, and positions that do not will very likely credit your education years towards years of experience.</span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"> </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">If you are interested in academia, then a PhD is mandatory for becoming a faculty member (professor) at a university. </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">If you are looking to one day lead a team of researchers or be a director of a program, then this definitely requires a PhD. If you want to start your own bioinformatics company, then you had better get a PhD so investors will take you seriously. Bioinformatics has A LOT of PhDs in the field, so it could be challenging to get taken seriously as a founder if you are not part of the PhD club. </span></div><div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>What program should I choose?</b></span><br /><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Your field of study matters but it also does not matter. A graduate bioinformatics degree can cover enough biological and computational topics to make you a fairly well-rounded scientist. However, many of the hard sciences (especially </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">computer science, statistics, and mathematics) </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">can give you the computational foundation needed for a career in bioinformatics. Graduate work in seemingly unrelated areas can have direct applications to bioinformatics. Electrical engineering comes to mind as a field that you might not think of but is leading the way in machine learning and artificial intelligence research. Keep in mind that not all programs are created equal. Although a bioinformatics degree might seem like the logical choice for a career in bioinformatics, if you have the opportunity to get a hard science degree from a better program/university, especially if there are groups there doing bioinformatics research, then you should definitely consider this as an alternative. </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Note that most job postings do not require "x degree in bioinformatics" but rather say "x degree in bioinformatics, computer science, statistics, or related field".</span></div><div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Closing thoughts</b></span><br /><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">I personally recommend at least a master's degree for people who ask me for career advice, mostly due to how broad the field is. However, you can be a very successful bioinformatician regardless of the level of your degree. There are bachelor's- and master's-holding bioinformaticians out there who can go toe-to-toe with a PhD-holding bioinformatician any day, but this very much depends on the person, their education/background, and their drive. </span><br /></div><div><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div><div><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Cannot afford school or do not have the time to go back? Check out my <a href="http://www.bioinformaticscareerguide.com/2018/01/get-bioinformatics-education-online-for.html" target="_blank">guide for getting a bioinformatics education online for free</a>.</span></div>
<div><b>Get a bioinformatics education online for free</b> (published 2018-01-08, updated 2020-07-23)</div>
<div><span><font face=""><i>This post contains affiliate links, meaning when you click a link and make a purchase, we receive a commission that helps support this site.</i></font></span></div><div><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">You can now get an entire bioinformatics education online for free. This is thanks to 1) the explosion in Massive Open Online Courses (MOOCs) and 2) academics making more and more of their work (especially books) open access, meaning they're available to anyone at no cost. MOOCs are a great option for students looking to supplement their education to make themselves more attractive to bioinformatics hiring managers. A biology major can use MOOCs to learn how to program and analyze data, while other STEM majors lacking biology credits can take MOOCs on molecular biology and next generation sequencing technologies. MOOCs are a great option for professionals looking for a career change or promotion since you can learn extra skills on your own time. MOOCs can even be an option for someone looking to get into science as a hobby (bioinformatics is one of the few fields in the biomedical sciences where you can analyze the genetic code of cancer from home in your underwear). Courses start every few weeks, so you can get started quickly once you're ready. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">When it comes to online platforms, it's best to stick with ones partnered with or owned by a major university. Two of the most popular platforms are <a href="https://click.linksynergy.com/fs-bin/click?id=4YJjzg8urMo&offerid=467035.207&type=3&subid=0" target="_blank">Coursera</a> (Johns Hopkins) and <a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2F" target="_blank">edX</a> (Harvard). Because these platforms are affiliated with top universities in the United States, it is not uncommon to have your course taught by a leading researcher in a particular topic. These platforms offer a range of bioinformatics-related courses from introductory biology and introductory programming to advanced concepts like deep learning and systems biology. Most of these courses have an audit option, which allows you to take them for free. However, the downside to auditing is that you might not be able to access certain course materials (Coursera), you won't be able to submit certain assignments or get grades for your work (Coursera), and you won't receive a certificate proving that you successfully completed the course (both Coursera and edX).</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">In addition to having standalone courses, Coursera and edX both feature paid specializations (edX calls them "<a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2Fxseries" target="_blank">XSeries</a>"). Specializations are series of related courses designed to help you master a specific topic. On Coursera, many specializations build your knowledge on a topic and culminate in a final Capstone Project that you can put straight onto GitHub. If you read <a href="http://www.bioinformaticscareerguide.com/2017/08/your-first-bioinformatics-project.html" target="_blank">my guide on your first bioinformatics project</a>, then you know the importance of having a project to showcase your talents to prospective employers. Completing a specialization earns you a Specialization Certificate, and these certificates should be used to enhance your resume/CV and to show potential employers that you are competent in a particular topic. Not only are they good CV padding, but paid specializations will give you a little skin in the game and make you more accountable to yourself (many who start free MOOCs never finish them).</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">There are literally thousands of courses on Coursera alone, so it can be hard to parse through them all to find the ones that are worth your time. Here, I give a handpicked list of courses that will give you the tools you need to get up to speed quickly in bioinformatics. This is by no means an exhaustive list since bioinformatics touches on so many different areas. Instead, I focus on core competencies and then suggest optional courses that you can take depending on your interests and the type of job you'd like to get. For my recommendations, I tend to favor specializations because they give you a more cohesive experience instead of feeling like a patchwork of disconnected information. I start off with introductory courses for building your knowledge of both biology and programming. If you are already comfortable with biology or have an undergraduate biology degree, then I would suggest focusing more on the programming and data analysis courses. It's better to know some biology and spend time perfecting your programming skills than it is to be an expert at biology who flounders at simple coding tasks. On the other hand, if you are a STEM major familiar with programming and data analysis, then it is probably worth your time learning biology to understand the context of the bioinformatics problems you will be working on. If you are coming from outside of STEM, then I really recommend going through all of the introductory courses. Getting a solid foundation in the basics is necessary for handling the advanced concepts to come. Next, I list intermediate courses. These are the bread and butter of bioinformatics and include a lot of the type of work you can expect as a bioinformatician. It's fair to say a lot of bioinformatics positions focus on gene expression pipelines and data analysis, so this is reflected in these intermediate course recommendations. 
Even if you aren't super familiar with more exotic types of data (I work almost exclusively with proteomics and metabolomics data), the education you receive in these intermediate courses will give you the ability to tackle most types of problems you come across. Then, I list advanced topics to further hone your skills. These courses tend to be on the more difficult side, but they are well worth the time investment. Finally, I end with a number of popular, freely available books. These make good companions to the courses or can even serve as good introductory texts if you prefer self-learning.</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif; font-size: large;"><b>Introductory Courses</b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Biology</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2Fcourse%2Fessential-human-biology-cells-tissues-adelaidex-humbio101x-1" target="_blank"><b>Essential Human Biology: Cells and Tissues</b></a></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Perfect for someone with zero biology experience, this course will give you an introduction to the structure and function of human cells and tissues and lay a foundation for more advanced topics.</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2Fcourse%2Fintroduction-biology-secret-life-mitx-7-00x-6" target="_blank"><b>Introduction to Biology - The Secret of Life</b></a></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Introductory level molecular biology course hosted by professor Eric Lander, one of the leaders of the <a href="https://www.genome.gov/10001772/all-about-the--human-genome-project-hgp/" target="_blank">Human Genome Project</a>. The course content reflects the topics taught in the MIT introductory biology courses and many biology courses across the world.</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2Fcourse%2Fdna-biologys-genetic-code-ricex-bioc300-2x-1" target="_blank">DNA: Biology’s Genetic Code</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Most bioinformatics projects (probably the vast majority) revolve around next generation sequencing technologies and <a href="https://en.wikipedia.org/wiki/Genomics" target="_blank">genomics</a>, so it's really important to get a solid foundation in this area. This course explores the basics of DNA structure, packaging, replication, and manipulation. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Computer Science and Programming</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><br /></b></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.1921197134&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fcomputer-fundamentals" target="_blank">Fundamentals of Computing Specialization</a> (</b></span><b style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Option #1)</b><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">I list two computer science tracks here depending on your preferences. I love this specialization because of the emphasis on developing critical mathematical problem solving and algorithmic thinking. These skills are the bread and butter of a bioinformatician. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">An Introduction to Interactive Programming in Python (Part 1)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">An Introduction to Interactive Programming in Python (Part 2)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Principles of Computing (Part 1) </span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Principles of Computing (Part 2)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Algorithmic Thinking (Part 1)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Algorithmic Thinking (Part 2)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The Fundamentals of Computing Capstone Exam</span></li>
</ul>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.1560499266&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fpython" target="_blank">Python for Everybody</a> (</b></span><b style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Option #2)</b><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">This track will take you through 'fundamental programming concepts including data structures, networked application program interfaces, and databases, using the Python programming language'. This track places a bit more emphasis on practical application, so it might be a good option if you find yourself struggling with the first track (which is a bit more mathy/technical). </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Programming for Everybody (Getting Started with Python)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Python Data Structures</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Using Python to Access Web Data</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Using Databases with Python</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Capstone: Retrieving, Processing, and Visualizing Data with Python</span></li>
</ul>
<b><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><u><br /></u></span></b>
<b><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><u>Optional</u></span></b><br />
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.9931479646&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fcancer" target="_blank">Introduction to the Biology of Cancer</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">There are a lot of cancer-related bioinformatics jobs in big pharma and at academic centers, so knowing a thing or two about cancer can help you land a job. This optional course introduces the molecular biology of cancer (oncogenes and tumor suppressor genes) as well as the biologic hallmarks of cancer. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.10188598344&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fepigenetics" target="_blank">Epigenetic Control of Gene Expression</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">This optional course introduces you to <a href="https://en.wikipedia.org/wiki/Epigenetics" target="_blank">epigenetics</a>, the study of heritable changes in gene function that do not involve changes in the DNA sequence. This is a more specialized course and might not be applicable to everyone, but you can run into trouble landing a job that involves epigenetics or epigenomics without some background in this area.</span><br />
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2Fcourse%2Fintroduction-computer-science-harvardx-cs50x" target="_blank">Introduction to Computer Science aka CS50x</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Learning to program is important, but learning to think like a computer scientist is equally important. This is a good skill to develop for a bioinformatician since you will be called upon to solve complex problems at the intersection of computer science and biology. CS50x is an <a href="http://www.thecrimson.com/article/2014/9/18/this-is-cs50/" target="_blank">immensely popular course</a> taught in person at Harvard that has been adapted for the edX platform. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif; font-size: large;"><b>Intermediate Courses</b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Bioinformatics</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u><br /></u></b></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.2812652887&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fbioinformatics" target="_blank">Bioinformatics Specialization</a> (Option #1)</b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">You've got three really great tracks to pick from for your core bioinformatics competencies. These three tracks cover roughly the same topics, so you should look into each to see which one piques your interest and is right for you. This first specialization comes from the creators of <a href="http://rosalind.info/">rosalind.info</a> (a free bioinformatics practice site). The first course in this track, "Finding Hidden Messages in DNA (Bioinformatics I)", is listed as a Top 50 MOOC of All Time, and there are even two print textbooks (highly recommended) that go along with the course: <a href="https://www.amazon.com/gp/product/0990374610/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=0990374610&linkId=bbe1e79e2fb3809527beebc1de1f68b9" target="_blank">Bioinformatics Algorithms Volume I</a> and <a href="https://www.amazon.com/gp/product/0990374629/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=0990374629&linkId=4f733f52a8c3cf08c2779c562e47497e" target="_blank">Volume II</a>. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Finding Hidden Messages in DNA (Bioinformatics I)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Genome Sequencing (Bioinformatics II)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Comparing Genes, Proteins, and Genomes (Bioinformatics III)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Molecular Evolution (Bioinformatics IV)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Genomic Data Science and Clustering (Bioinformatics V)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Finding Mutations in DNA and Proteins (Bioinformatics VI)</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Bioinformatics Capstone: Big Data in Biology (Bioinformatics VII)</span></li>
</ul>
<br />
<b style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.1560498632&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fgenomic-data-science" target="_blank">Genomic Data Science Specialization</a> (Option #2)</b><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Taught by a renowned data scientist and biostatistician (<a href="http://jtleek.com/" target="_blank">Jeff Leek</a>, Johns Hopkins), this specialization will give you the skills you need to understand, analyze, and interpret data from next-generation sequencing experiments. It features hands-on exercises with the command line, Python, R, Bioconductor, and Galaxy. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Introduction to Genomic Technologies</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Genomic Data Science with Galaxy</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Python for Genomic Data Science</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Algorithms for DNA Sequencing</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Command Line Tools for Genomic Data Science</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Bioconductor for Genomic Data Science</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Statistics for Genomic Data Science</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Genomic Data Science Capstone</span></li>
</ul>
<br />
<b><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2Fxseries%2Fdata-analysis-life-sciences" target="_blank">Data Analysis for Life Sciences</a> and <a href="https://www.awin1.com/cread.php?awinmid=6798&awinaffid=423057&clickref=&p=https%3A%2F%2Fwww.edx.org%2Fxseries%2Fgenomics-data-analysis" target="_blank">Genomics Data Analysis</a> </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">(Option #3)</span></b><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">This track consists of two complementary XSeries. Both are taught by another renowned data scientist and biostatistician (<a href="https://rafalab.github.io/" target="_blank">Rafael Irizarry</a>, Harvard). Data Analysis for Life Sciences '...is perfect for anyone in the life sciences who wants to learn how to analyze data. Problem sets will require coding in the R language to ensure learners fully grasp and master key concepts.' The second part of the track, Genomics Data Analysis, '...is an advanced series that will enable students to analyze and interpret data generated by modern genomics technology... is perfect for those who seek advanced training in high-throughput technology data.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Statistics and R</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Introduction to Linear Models and Matrix Algebra</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Statistical Inference and Modeling for High-throughput Experiments</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">High-Dimensional Data Analysis</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Introduction to Bioconductor: Annotation and Analysis of Genomes and Genomic Assays</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">High-performance Computing for Reproducible Genomics</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Case Studies in Functional Genomics</span></li>
</ul>
<b style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><u><br /></u></b>
<b style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><u>Other Intermediate Courses</u></b><br />
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.1745054360&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fdata-structures-algorithms" target="_blank">Data Structures and Algorithms Specialization</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Knowing the right algorithm to use can mean the difference between a job that takes an hour to run and one that takes a week. This specialization combines theory and practice, and you will solve nearly 100 algorithmic coding problems in your language of choice (instead of just taking a multiple-choice quiz, as in many MOOCs). This specialization also features two big, real-world projects: Big Networks and Genome Assembly. The first involves analyzing different networks (e.g., road and social networks) and finding the shortest path. The second will cover </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">assembly algorithms and</span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"> assembling genomes from millions of short fragments of DNA.</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"> </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Algorithmic Toolbox</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Data Structures</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Algorithms on Graphs</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Algorithms on Strings</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Advanced Algorithms and Complexity</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Genome Assembly Programming Challenge</span></li>
</ul>
<b><u><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></u></b>
<b><u><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Optional</span></u></b><br />
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.3194365719&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fbig-data" target="_blank">Big Data Specialization</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">As a bioinformatician, working with tens of thousands of sequenced genomes or millions of radiology images (e.g. CAT scans) means you'll need to know a thing or two about big data. This specialization will give you hands-on experience with the tools and systems you need. These courses take you through the basics of using Hadoop with MapReduce, Spark, Pig and Hive, and you will be shown how to ask the right questions about data, how to communicate like a data scientist, and how to perform exploration of large, complex datasets. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Introduction to Big Data</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Big Data Modeling and Management Systems</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Big Data Integration and Processing</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Machine Learning With Big Data</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Graph Analytics for Big Data</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Big Data - Capstone Project</span></li>
</ul>
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.10188599828&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fbiostatistics" target="_blank">Mathematical Biostatistics Boot Camp 1</a> & <a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.10033995912&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fbiostatistics-2" target="_blank">Mathematical Biostatistics Boot Camp 2</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">These courses will help if you are struggling with some of the statistical concepts you encounter in your training as a bioinformatician. They introduce 1) the fundamental probability and statistical concepts used in elementary data analysis and 2) fundamental concepts in data analysis and statistical inference. These courses are appropriate for '...undergraduate students with junior or senior college-level mathematical training including a working knowledge of calculus. A small amount of linear algebra and programming are useful for the class, but not required.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif; font-size: large;"><b>Advanced Courses</b></span><br />
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Machine Learning and Deep Learning</u></b></span><br />
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.1560515719&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Flearn%2Fmachine-learning" target="_blank">Machine Learning</a> (</b></span><b style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Option #1)</b><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">This first option is a standalone course rather than a specialization, and it is here because I like it so much. It is taught by one of the world's foremost experts in <a href="https://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a> (<a href="http://www.andrewng.org/" target="_blank">Andrew Ng</a>, Baidu Research/Stanford University). The fact that this course is available for anyone to take is mind-blowing. I highly recommend it. You will be given an introduction to machine learning, data mining, and statistical pattern recognition. You'll learn both supervised and unsupervised methods as well as best practices in machine learning. The course features numerous case studies and applications, so you'll get a taste of applications such as '...building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.1560499085&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fmachine-learning" target="_blank">Machine Learning Specialization</a> (</b></span><b>Option #2)</b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">If you want to spend a bit more time getting familiar with machine learning, then this four-course specialization in machine learning from the University of Washington is for you. 'Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Machine Learning Foundations: A Case Study Approach</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Machine Learning: Regression</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Machine Learning: Classification</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Machine Learning: Clustering & Retrieval</span></li>
</ul>
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.11421701896&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fdeep-learning" target="_blank">Deep Learning Specialization</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">An entire <a href="https://en.wikipedia.org/wiki/Deep_learning" target="_blank">Deep Learning</a> specialization by <a href="http://www.andrewng.org/" target="_blank">Andrew Ng</a>? Yes please! This specialization will teach you the foundations of deep learning, how to build neural networks, and how to run a successful deep learning project. The courses cover everything from convolutional networks and recurrent neural networks (RNNs) to long short-term memory (LSTM), Adam, Dropout, BatchNorm, Xavier/He initialization, and more. They teach you both the theory and how deep learning is applied in industry, with several case studies in healthcare, autonomous driving, sign language reading, music generation, and natural language processing. The courses are taught in Python using <a href="https://www.tensorflow.org/" target="_blank">TensorFlow</a>.</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Neural Networks and Deep Learning</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Structuring Machine Learning Projects</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Convolutional Neural Networks</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Sequence Models</span></li>
</ul>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><u><b><br /></b></u></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><u><b>Other Advanced Courses</b></u></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><a href="https://click.linksynergy.com/link?id=4YJjzg8urMo&offerid=467035.1560499477&type=2&murl=https%3A%2F%2Fwww.coursera.org%2Fspecializations%2Fsystems-biology" target="_blank"><b>Systems Biology and Biotechnology Specialization</b></a></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Many bioinformaticians work in <a href="https://en.wikipedia.org/wiki/Systems_biology" target="_blank">systems biology</a>, '...[a] field of study that focuses on complex interactions within biological systems, using a holistic approach to biological research.' This broad field utilizes a whole slew of different methodologies, so this specialization introduces you to topics like dynamical modeling, network and statistical modeling, "omics" technologies (e.g. genomics, proteomics), and single cell research technologies. Upon completion, you'll know how to combine experimental, computational, and mathematical methods to answer questions in a variety of biomedical fields. </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">The courses include:</span><br />
<ul>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Introduction to Systems Biology</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Experimental Methods in Systems Biology</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Network Analysis in Systems Biology</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Dynamical Modeling Methods for Systems Biology</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Integrated Analysis in Systems Biology</span></li>
<li><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Systems Biology and Biotechnology Capstone</span></li>
</ul>
<br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif; font-size: large;"><b>Recommended Free Books</b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Programming</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Advanced R - <a href="http://adv-r.had.co.nz/" target="_blank">Free Online Version</a> - <a href="https://www.amazon.com/gp/product/1466586966/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1466586966&linkId=073e64030c093b6c017185c54cca0d87" target="_blank">Buy Print Version </a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">An R programming book by data scientist <a href="http://hadley.nz/" target="_blank">Hadley Wickham</a> (Chief Scientist at RStudio and Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University).</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Think Python - <a href="http://greenteapress.com/wp/think-python/" target="_blank">Free Online Version (1st Edition)</a> - <a href="https://www.amazon.com/gp/product/1491939362/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1491939362&linkId=a0fc031f588a7fe2ec1cb9dfd56b4e80" target="_blank">Buy Print Version (2nd Edition) </a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'Think Python is an introduction to Python programming for beginners. It starts with basic concepts of programming, and is carefully designed to define all terms when they are first used and to develop each new concept in a logical progression. Larger pieces, like recursion and object-oriented programming are divided into a sequence of smaller steps and introduced over the course of several chapters.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Statistics and Data Science</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>An Introduction to Statistical Learning with Applications in R - <a href="http://www-bcf.usc.edu/~gareth/ISL/" target="_blank">Free Online Version</a> - <a href="https://www.amazon.com/gp/product/1461471370/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1461471370&linkId=7ecec0eaef65357ba1542ad555bd5aeb" target="_blank">Buy Print Version </a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'This book provides an introduction to statistical learning methods. It is aimed for upper level undergraduate students, masters students and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Think Stats - <a href="http://www.greenteapress.com/thinkstats/" target="_blank">Free Online Version</a> - <a href="https://www.amazon.com/gp/product/1491907339/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1491907339&linkId=2b9ba34de4808e32b1bbca5cc963ea76" target="_blank">Buy Print Version </a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'Think Stats is an introduction to Probability and Statistics for Python programmers. Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real datasets. If you have basic skills in Python, you can use them to learn concepts in probability and statistics. Think Stats is based on a Python library for probability distributions (PMFs and CDFs). Many of the exercises use short programs to run experiments and help readers develop understanding.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Exploratory Data Analysis with R - <a href="https://leanpub.com/exdata" target="_blank">Free Online Version</a> - <a href="https://www.amazon.com/gp/product/1365060063/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1365060063&linkId=4646626cc020d2ee47f9ccb9227f24cc" target="_blank">Buy Print Version</a> </b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'This book teaches you to use R to effectively visualize and explore complex datasets. Exploratory data analysis is a key part of the data science process because it allows you to sharpen your question and refine your modeling strategies.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>The Elements of Statistical Learning (2nd edition) - <a href="https://web.stanford.edu/~hastie/ElemStatLearn/" target="_blank">Free Online Version</a> </b></span><b style="font-family: "helvetica neue", arial, helvetica, sans-serif;">- <a href="https://www.amazon.com/gp/product/0387848576/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=0387848576&linkId=b55a6e68973e9bcd615e29bb68a0daf0" target="_blank">Buy Print Version </a></b><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'During the past decade there has been an explosion in computation and information technology. With it has come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting--the first comprehensive treatment of this topic in any book.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Machine Learning</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Understanding Machine Learning - <a href="http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/" target="_blank">Free Online Version</a> - <a href="https://www.amazon.com/gp/product/1107057132/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1107057132&linkId=1e3a36b96a84cfe7eb7508682654d3b1" target="_blank">Buy Print Version</a> </b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides a theoretical account of the fundamentals underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics, the book covers a wide array of central topics unaddressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for advanced undergraduates or beginning graduates, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics and engineering.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Biology</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Molecular Biology of the Cell - <a href="https://www.ncbi.nlm.nih.gov/books/NBK21054/" target="_blank">Searchable Online Version (4th Edition)</a> - <a href="https://www.amazon.com/gp/product/0815344325/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=0815344325&linkId=64762a62cf7f023e6b2c11e96f35e570" target="_blank">Buy Print Version (6th Edition)</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'Molecular Biology of the Cell is the classic in-depth text reference in cell biology. By extracting fundamental concepts and meaning from this enormous and ever-growing field, the authors tell the story of cell biology, and create a coherent framework through which non-expert readers may approach the subject. Written in clear and concise language, and illustrated with original drawings, the book is enjoyable to read, and provides a sense of the excitement of modern biology. Molecular Biology of the Cell not only sets forth the current understanding of cell biology (updated as of Fall 2001), but also explores the intriguing implications and possibilities of that which remains unknown.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>Molecular Cell Biology - <a href="https://www.ncbi.nlm.nih.gov/books/NBK21475/" target="_blank">Searchable Online Version (4th Edition)</a> - <a href="https://www.amazon.com/gp/product/1464183392/ref=as_li_qf_sp_asin_il_tl?ie=UTF8&tag=bioinforma074-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1464183392&linkId=f8af89f8943a35d92d32132e0d6f6a84" target="_blank">Buy Print Version (8th Edition)</a></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'Modern biology is rooted in an understanding of the molecules within cells and of the interactions between cells that allow construction of multicellular organisms. The more we learn about the structure, function, and development of different organisms, the more we recognize that all life processes exhibit remarkable similarities. Molecular Cell Biology concentrates on the macromolecules and reactions studied by biochemists, the processes described by cell biologists, and the gene control pathways identified by molecular biologists and geneticists. In this millennium, two gathering forces will reshape molecular cell biology: genomics, the complete DNA sequence of many organisms, and proteomics, a knowledge of all the possible shapes and functions that proteins employ.'</span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b><u>Misc.</u></b></span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><b>How to be a modern scientist - <a href="https://leanpub.com/modernscientist" target="_blank">Free Online Version</a></b> </span><br />
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">'A book about how to be a scientist the modern, open-source way.'</span>Paulhttp://www.blogger.com/profile/13198801666214127093noreply@blogger.comtag:blogger.com,1999:blog-5720312354909001332.post-65484485638625322922017-09-20T11:58:00.001-04:002020-05-03T00:40:23.988-04:00How to ask a good programming question<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Bioinformatics is full of programming challenges, but did you know that there are people on the Internet right now who are willing to give you programming advice for free? There is just one catch: no one is going to help you if you waste their time. Ever see a painful-to-read question on a help site where the original poster (OP) asks a question, someone responds asking for more information, OP responds with not enough information, someone else responds asking for further clarification, and so on? This is an example of a bad question. Time is important to most people, and bad questions waste OP’s time and potential respondents' time. Not many people have the time or patience to go through the effort of deciphering a bad question. On the other hand, a good question will be straight to the point and contain all of the information necessary for someone else to quickly answer the question. Asking a good question makes it much easier for someone to respond since the problem will be clear and example code will be provided. There are five elements to asking a good question, and if you follow these elements, you will drastically increase your chances of receiving a response. However, there is a key point to keep in mind before you ask your question: don't be lazy.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><b>Don't be lazy</b></span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">In my opinion, the worst type of question is "How do I do xyz?" This type of question is almost never appropriate to ask because it's too broad. When you're soliciting help from a stranger, you need to have a little skin in the game, and this type of question will make it abundantly clear that you haven't put any effort into finding a solution on your own. Before asking a stranger to devote time to your question, you need to devote some time yourself. If you're not sure where to start, then work through a tutorial on xyz, go find a book on xyz, see how people have implemented xyz themselves on <a href="https://github.com/" target="_blank">GitHub</a>, search <a href="https://stackexchange.com/" target="_blank">StackExchange</a> for questions on xyz, or anything else you can think of to increase your knowledge of xyz. </span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">You should now be at the point where you’ve tried to implement xyz yourself and you now have a specific question. Keep in mind that you will still need to do some homework (even with a specific question) before posting your question. If you ask a simple question that can easily be Googled, then you might never get a response or you might get a lot of <a href="http://lmgtfy.com/?q=programming+help" target="_blank">passive aggressive links to LMGTFY</a>. Google can give you a helpful link to documentation or to StackExchange where your question has already been answered. If you’re having trouble finding your specific error or problem with Google, then please see this post for <a href="https://knightlab.northwestern.edu/2014/03/13/googling-for-code-solutions-can-be-tricky-heres-how-to-get-started/" target="_blank">suggestions on how to format your search</a>. </span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><b>The five elements to a good question</b></span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Now that you’ve done your due diligence, you can ask your good question. There are five key elements that make up a good question. Follow these to maximize your chances of receiving a response:</span><br />
<ul>
<li><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">First, a good question needs a good title that clearly and succinctly conveys the problem. On most sites, only the title of your question will be visible until someone clicks on it, so you had better make it straight to the point. An example of a bad title would be "BAM file confusion", which doesn't specify what the actual problem is, whereas "Trouble calculating average coverage for a BAM file" makes it clear what the issue is. Your title doesn't necessarily have to be phrased as a question as long as it gets the point across. </span></li>
</ul>
<ul>
<li><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Second, have a leading sentence or two about what you're trying to accomplish. Having an idea of the big picture is helpful to others because it gives context to your problem. This context might even get you recommendations for shortcuts or alternative approaches that you hadn't considered. After this lead-in, state the problem or ask the question that you’d like answered.</span></li>
</ul>
<ul>
<li><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Third, include a <a href="http://sscce.org/" target="_blank">short, self-contained, correct/compilable example (SSCCE)</a>. This is crucially important. Coming up with an SSCCE reduces the number of things that could be going wrong in your code and allows others to quickly reproduce the issue you're having. Most sites allow you to embed code directly into your post, but you can also post your code to a site like <a href="http://www.pastebin.com/" target="_blank">Pastebin</a>, which has features like syntax highlighting. Make sure to include sample output, any errors that you get, and other relevant information (operating system, versions of programs/programming languages, etc.). </span></li>
</ul>
<ul>
<li><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Fourth, make sure you proofread your post. If you want someone to give you a serious answer, then you want to show that you are serious by using proper punctuation and grammar. Next, ask yourself if your question is clear and straight to the point. Three paragraphs of background before you get to any code means that you aren't being straight to the point. Long, rambling questions like this will cause a lot of <a href="http://www.urbandictionary.com/define.php?term=tl%3Bdr" target="_blank">TL;DR</a>.</span></li>
</ul>
<ul>
<li><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Fifth, help others help you. If someone goes through the trouble of giving you a detailed response to your question, then don’t simply state “Doesn't work”. Make sure you specify why it doesn’t work. Make an updated SSCCE with the new code. Are you getting a new error message? Is the output different now?</span></li>
</ul>
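<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">To make the third element concrete, here is a sketch of what an SSCCE might look like in a post. Everything in it is invented for illustration: the function name, the depth values, and the bug itself are assumptions, not taken from a real question. The point is that the input is hard-coded, the script runs as-is, and the expected and actual output are both stated.</span><br />

```python
# SSCCE for a hypothetical question titled
# "Average coverage is always a whole number".
# The per-base depths below are made-up values, not real BAM output.
import sys


def average_coverage(depths):
    """Return the mean depth across all positions."""
    # Suspect line: // is floor division, so the fraction is discarded.
    return sum(depths) // len(depths)


# A tiny hard-coded input stands in for the real file so anyone can run this.
depths = [10, 12, 0, 7]

print("Python version:", sys.version.split()[0])
print("expected 7.25, got", average_coverage(depths))
```

<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">A respondent can paste this into a file, run it, and see the problem in seconds, which is exactly what a long excerpt from a real pipeline would not allow.</span><br />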
<b style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br />Where to ask your good question</b><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Now that you know what makes a good question, you will need somewhere to post it. Since a majority of bioinformatics is computer work, <a href="https://stackexchange.com/" target="_blank">StackExchange</a> is a great place to find answers to programming and system administration related questions. The StackExchange communities have a voting system where the best answers rise to the top. This feature makes it very easy to find quality responses to previously asked questions. StackExchange now features <a href="https://bioinformatics.stackexchange.com/" target="_blank">a new bioinformatics community</a> for bioinformatics-related questions, but historically a StackExchange-like site, <a href="https://www.biostars.org/" target="_blank">Biostars</a>, has been the place to go for bioinformatics questions. The <a href="https://www.reddit.com/r/bioinformatics/" target="_blank">bioinformatics subreddit</a> has a very active community and it's rare that a good question goes unanswered. Another option is <a href="http://seqanswers.com/" target="_blank">SEQanswers</a>, a popular forum dedicated to bioinformatics and next generation sequencing. Some software developers (<a href="https://stat.ethz.ch/mailman/listinfo/r-help" target="_blank">like R</a>) maintain mailing lists where you can email in a question which gets disseminated to everyone subscribed. To me, this method is a bit slower than getting help from the above sites, but occasionally obscure bioinformatics software might not be well known on a mainstream site like StackExchange and your only hope for support is via a mailing list. 
If you're encountering some strange behavior or you think you've run into a bug in a program, then you should consider emailing the developer directly or <a href="https://guides.github.com/features/issues/" target="_blank">opening an issue</a> wherever the project’s code repository is hosted (these days, it’s likely to be <a href="https://github.com/" target="_blank">GitHub</a>).</span><br />
<div>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span></div>
Paulhttp://www.blogger.com/profile/13198801666214127093noreply@blogger.comtag:blogger.com,1999:blog-5720312354909001332.post-33183018444646817572017-08-14T12:19:00.003-04:002020-05-03T00:41:07.660-04:00Your first bioinformatics project<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Nothing will improve your bioinformatics skills like creating your own project, and </span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">n</span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">othing proves your bioinformatics skills to a prospective employer (or university admissions committee) like having your own project to showcase. Yes, having a solid foundation in biology and programming is important for anyone looking to do bioinformatics, but studying biology and programming books until you're blue in the face will not get you practical insights like your first project will. </span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><b>Picking your First Project</b></span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">As a beginner it might seem hard to know where you can carve out your own little piece of bioinformatics and contribute to the community, but rest assured that there are more problems than there are bioinformaticians to solve them, so with a little digging you can find something cool to work on while simultaneously solving a problem. Your project doesn't even have to solve a Big Problem™ in bioinformatics. A cooler/better/faster way of doing something existing is still enough for a first project. Even a neat way to visualize existing data is enough. If you're having trouble identifying a project to work on, then this might be a sign that you need to do some more background research. </span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Once you're familiar with a topic, it is REALLY easy to identify a problem that hasn't been solved or an area for improvement. </span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">There are many different subspecialties in bioinformatics, so to help narrow down your choices, you could always consider an area that a potential employer might be interested in, or if you still can't decide, then pick an area that might have broad appeal. For example, genomics and personalized medicine are on fire right now, so a project related to next generation sequencing pipelines will probably have more appeal than a project dealing with protein crystal structures. Still l</span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">ooking for ideas? 
W</span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">hy not ask a biologist on </span><a href="http://reddit.com/r/biology" style="font-family: "helvetica neue", arial, helvetica, sans-serif;" target="_blank">r/biology</a><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"> about the pain points in their work or data analysis and come up with a bioinformatics solution for them?</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">If you're having trouble choosing a project or if you're unsure that you have the programming skills it takes to complete a project, then I'd like to suggest working on some bioinformatics-specific programming problems. The exercises in regular programming books are great, but they don't get you thinking about programming in the context of biology. With this in mind, try working through some bioinformatics-specific programming problems on <a href="http://rosalind.info/" target="_blank">Rosalind</a>. Rosalind poses real bioinformatics problems for you to solve, gives you the relevant background, and allows you to instantly check your solution through a web interface. Rosalind gives you exposure to a variety of topics including '...computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, and string algorithms'. Working your way through these problems (warning: some can be quite challenging) will give you exposure to the variety of problems a bioinformatician might work on, allow you to hone your programming skills, identify gaps in your knowledge, and act as a springboard for you to create your own project. A bonus is that all of your Rosalind solutions can be shown to a potential employer. This isn't as great as having your very own project, but it will prove you have programming experience, and you can score some bonus points for having a particularly clever solution to a problem.</span><br />
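<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">To give a feel for the early problems, here is a minimal Python sketch of two beginner-level string tasks of the kind Rosalind starts with: counting nucleotides and taking a reverse complement. The function names and the test sequence are my own choices for illustration, and this is just one of many reasonable ways to write it.</span><br />

```python
from collections import Counter


def count_nucleotides(dna):
    """Return the counts of A, C, G, and T (in that order) in a DNA string."""
    counts = Counter(dna.upper())
    return tuple(counts[base] for base in "ACGT")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    complement = str.maketrans("ACGT", "TGCA")
    # Complement every base, then reverse the whole string.
    return dna.upper().translate(complement)[::-1]


if __name__ == "__main__":
    seq = "AAAACCCGGT"  # short made-up sequence
    print(*count_nucleotides(seq))   # 4 3 2 1
    print(reverse_complement(seq))   # ACCGGGTTTT
```

<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Short, commented solutions like this are also the kind of thing that is easy for someone else to read when you share your work.</span><br />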
<b style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></b>
<b style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Hosting Your Project on GitHub</b><br />
<div>
<br /></div>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Once you've created your project, it will need to be hosted somewhere for potential employers to access. </span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><a href="https://github.com/" target="_blank">GitHub</a> is a popular choice that offers free hosting for publicly accessible open source projects, and you can get <a href="https://education.github.com/pack" target="_blank">free private repositories if you're a student</a>. I</span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">t's now very common practice to ask for a GitHub link as part of the application process, but even if you aren't directly asked, you should always include a link to your project's repository on your resume/CV. Don't ever just provide a link to your GitHub profile because it makes more work for the other person and it might be unclear as to what project you're trying to highlight. If you're wondering, it's considered poor form to offer to email someone a zipped file of your work "upon request". However, this doesn't mean that you simply git push your code, provide a link, and then forget about it. Oh no. There is an entire art to exhibiting a professional looking project.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">The first thing you need for your project is a proper </span><a href="https://en.wikipedia.org/wiki/README" style="font-family: "helvetica neue", arial, helvetica, sans-serif;" target="_blank">readme</a><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">. This is the information displayed below the folders and files in GitHub. GitHub readmes are </span><a href="https://guides.github.com/features/mastering-markdown/" style="font-family: "helvetica neue", arial, helvetica, sans-serif;" target="_blank">written in markdown</a><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">, a quick and popular way to style text. A good readme tells a potential user what your project is about and how to use it. You should avoid using field-specific lingo or abbreviations without defining them first. Since prospective employers could be reading this, make sure to use proper grammar, spelling, and punctuation. A good readme template </span><a href="https://gist.github.com/PurpleBooth/109311bb0361f32d87a2" style="font-family: "helvetica neue", arial, helvetica, sans-serif;" target="_blank">can be found here</a><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">, and you can find examples of projects with </span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><a href="https://github.com/matiassingers/awesome-readme" target="_blank">good readmes here</a></span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">. The readme should absolutely include screenshots or sample output if you're generating images. The readme can even include an embedded video in gif form if you want to show some cool visual effects or animations. If you built a web tool, then the readme should link directly to a publicly accessible web server running the software. 
If it's not a web tool, then there should be a link to an installer or at least easy-to-follow instructions for building your tool. If it's a pain to get to a usable form of your tool, then you risk frustrating a potential employer or losing their interest. Even if you made the most mind-blowing tool, frustrating a potential employer will mean you have a near-zero chance of landing an interview.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">The second thing you need to do is properly organize your project's structure. There are certain conventions that people will expect to see when perusing your code, and a professional looking project structure will show recruiters that you know your stuff. PLOS has a <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424" target="_blank">great guide for structuring your bioinformatics project</a>. The PLOS guide seems to be more geared for projects that process data, so if your project is more software oriented, then you should check out <a href="https://github.com/kriasoft/Folder-Structure-Conventions" target="_blank">this guide</a> for project structure.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdWW9_q9vWVyNntIterymrDicbV-H11Db_6mmWfpxCDXpXqbxgxSuXtAMZyw0LjQRu7e8ab6bIa3_SbqfaHinIX8h4fJGZElEw0ryrxdtxkoQ2AVfSE6mKDKqs5ZPuvpeB43d9qK_5PylC/s1600/journal.pcbi.1000424.g001.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="893" data-original-width="1600" height="355" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdWW9_q9vWVyNntIterymrDicbV-H11Db_6mmWfpxCDXpXqbxgxSuXtAMZyw0LjQRu7e8ab6bIa3_SbqfaHinIX8h4fJGZElEw0ryrxdtxkoQ2AVfSE6mKDKqs5ZPuvpeB43d9qK_5PylC/s640/journal.pcbi.1000424.g001.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Suggested structure for your bioinformatics project from <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424" target="_blank">PLOS Computational Biology</a>.</span></td></tr>
</tbody></table>
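<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">If you want to start from a consistent skeleton every time, a few lines of Python can create one. The sketch below is only an illustration: the directory names are my assumption of a layout loosely modeled on the figure above, and the scaffold function and placeholder readme text are hypothetical, so rename things to suit your own project.</span><br />

```python
import tempfile
from pathlib import Path

# Assumed top-level layout, loosely modeled on the PLOS guide.
TOP_LEVEL_DIRS = ["bin", "data", "doc", "results", "src"]


def scaffold(root, name):
    """Create an empty project skeleton under root and return its path."""
    project = Path(root) / name
    for sub in TOP_LEVEL_DIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():  # never overwrite an existing readme
        readme.write_text(f"# {name}\n\nWhat the project does and how to run it.\n")
    return project


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold(tmp, "my_first_project")
        print(sorted(p.name for p in project.iterdir()))
```

<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Because existing directories and an existing readme are left alone, running it again on the same project is harmless.</span><br />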
<br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Advanced users should consider utilizing <a href="https://en.wikipedia.org/wiki/Continuous_integration" target="_blank">continuous integration</a> (CI). CI combines <a href="https://en.wikipedia.org/wiki/Version_control" target="_blank">version control</a> (you're already doing that with GitHub) with various other practices like build automation and self-testing. CI is good development practice, and you can get <a href="http://shields.io/" target="_blank">sweet badges</a> for your project's page. Potential employers will see a badge on your project's page and know that you're the real deal. <a href="https://travis-ci.org/" target="_blank">Travis CI</a> is a popular choice with <a href="https://github.com/marketplace/travis-ci" target="_blank">GitHub integration</a>, and Travis CI is free for open source projects. <a href="https://github.com/mbonaci/mbo-storm/wiki/Integrate-Travis-CI-with-your-GitHub-repo" target="_blank">Check out this guide</a> for getting Travis CI integrated into your GitHub project.</span><br />
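<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">The self-testing part of CI only works if your project has tests for the CI service to run. As a rough sketch, here is what a small test suite might look like using Python's built-in unittest module. The gc_content function is a hypothetical stand-in for your own code, not part of any particular tool.</span><br />

```python
import unittest


def gc_content(dna):
    """Return the fraction of G and C bases in a DNA string."""
    if not dna:
        raise ValueError("empty sequence")
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


class TestGCContent(unittest.TestCase):
    def test_half_gc(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_case_insensitive(self):
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_sequence_rejected(self):
        # Bad input should fail loudly instead of dividing by zero.
        with self.assertRaises(ValueError):
            gc_content("")
```

<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Saved in a file, these tests can be run locally with python -m unittest, and a CI service can be configured to run the same command on every push.</span><br />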
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><b>Contribute to Open Source</b></span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">In closing, an alternative to creating your own project would be to contribute to an existing <a href="https://en.wikipedia.org/wiki/Open-source_software" target="_blank">open source</a> project. It's <a href="https://scholar.google.com/scholar?q=bioinformatic%20toolkit%20github" target="_blank">really easy</a> to find an open source project to contribute to (<a href="https://galaxyproject.org/" target="_blank">Galaxy</a> is my favorite recommendation), and you can read </span><a href="https://opensource.guide/how-to-contribute/" style="font-family: "Helvetica Neue", Arial, Helvetica, sans-serif;" target="_blank">this wonderful guide</a><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"> for everything you need to know from finding a project to opening a pull request. Keep in mind that there are good ways and bad ways to go about contributing to an open source project. The good way would mean being involved in multiple aspects like contributing code and being part of a design team or a steering committee. This way you can talk to a prospective employer about many different aspects of the project. The bad way would mean contributing code without really being involved with the project. This way you can only say, "Oh, well I fixed this one bug," which doesn't come off as very impressive during an interview. </span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Contributing to open source the good way has the secondary benefit of building your reputation. Many job candidates are found through networking, so showing people you do good work and push good code can directly help you land a job. 
If you have a specific company or research group in mind, then contributing to an open source project that is used there (or was written there) can help your chances as an applicant. </span>Paulhttp://www.blogger.com/profile/13198801666214127093noreply@blogger.comtag:blogger.com,1999:blog-5720312354909001332.post-90784799474728862432017-08-07T11:49:00.002-04:002020-07-23T10:25:35.264-04:00The best programming language for getting started in bioinformatics<div><span><i><font face="helvetica">This post contains affiliate links, meaning when you click a link and make a purchase, we receive a commission that helps support this site.</font></i></span></div><div><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div><div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">"What programming language should I learn?" is one of the very first questions to tackle if you are a beginner wanting to learn bioinformatics. Choice of language is important since it will be a tool that you utilize often, so the obvious answer is that you should learn the "best" one. What features should this best language have? It should be easy to learn, cover all of your needs as a bioinformatician (like data analysis, text processing, and application development), and be amazingly fast at any task you throw at it... Unfortunately, any <a href="https://gist.github.com/lenards/3739917" target="_blank">greybeard</a> can tell you that this language does not exist. </span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">This language does not exist because it would be impossible for a room full of computer scientists to come to agreement on anything let alone the features of a best language. Instead, we have many computer languages that excel in some aspects while having shortcomings in others. This can probably best be summarized with a hammer analogy. Have you ever wondered why screwdriver attachments for hammers aren't more popular? A hammer is exceptionally good at driving nails, and a screwdriver is exceptionally good at driving screws. Why deal with an <a href="https://en.wikipedia.org/wiki/Overengineering" target="_blank">over-engineered</a> scrammer when a hammer and a screwdriver work perfectly well independently?</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">For a seasoned bioinformatician, the "best" programming language would be whatever language gets the job done efficiently. They might choose JavaScript for making a web application, Java for a graphical user interface (GUI), and C for developing a fast algorithm (like the ones used in genomics for <a href="https://en.wikipedia.org/wiki/Sequence_alignment" target="_blank">sequence alignment</a>). However, this is not helpful for someone who might not know how to program in the first place. For a fledgling bioinformatician, the best language is actually a combination of two languages: <a href="https://cran.r-project.org/" target="_blank">R</a> and <a href="https://www.python.org/" target="_blank">Python</a>. </span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Why R and Python? </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">These languages have all of the features you need to be successful, and i</span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">t is unlikely that you will run into a bioinformatics problem that can't be solved because of the limitations of these languages. R and Python</span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"> are consistently ranked as the two most popular programming languages for bioinformatics job positions according to indeed.com's job trends (accessed 08-02-17), so knowing these languages will likely help your job prospects. </span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVdKf31CEXcZnM7vxN24m8avLDDaZhVHm4WxdUB2FkJnx5oXrANZQ78YaTZIcq8xmqJGuJKfBnuMyog54_GPt-FsgNKFbUHj-D6JlEPU-0pxVNAfIHwk5jo6WeGVcQPFq_kQSJpkJxSJ_Z/s1600/indeed+trends.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="474" data-original-width="788" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVdKf31CEXcZnM7vxN24m8avLDDaZhVHm4WxdUB2FkJnx5oXrANZQ78YaTZIcq8xmqJGuJKfBnuMyog54_GPt-FsgNKFbUHj-D6JlEPU-0pxVNAfIHwk5jo6WeGVcQPFq_kQSJpkJxSJ_Z/s640/indeed+trends.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">R and Python are consistently the most popular languages for bioinformatics jobs on indeed.com.</span></td></tr>
</tbody></table>
<div>
<br /></div>
<div>
<br /></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Let's break down three major needs of a bioinformatician (data analysis, text processing, and application development; by no means an exhaustive list) and find out why these two languages are the best for getting started.</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-weight: bold;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Data Analysis (R)</span></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Although bioinformaticians spend a lot of time building software tools, many will spend at least some time working with biological data. For data analysis, R is an excellent choice. It is both a language and an environment for statistical computing and graphics, and <a href="http://r4stats.com/articles/popularity/" target="_blank">it has wide adoption</a> in the statistics and data science communities. This popularity means that there are thousands of libraries developed by others to take advantage of so you don't have to spend extra time coding. Even better, the <a href="http://bioconductor.org/" target="_blank">Bioconductor project</a> exists solely to provide R libraries for many types of bioinformatic analyses. R is a top choice among academics in bioinformatics and statistics, so a lot of the cutting edge tools based on the newest research are only available in R.</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">R has an absolutely wonderful (and free) integrated development environment (IDE) called <a href="https://www.rstudio.com/products/RStudio/" target="_blank">RStudio</a>, which takes the vanilla environment and transforms it into something much more usable. The RStudio folks also make <a href="http://shiny.rstudio.com/" target="_blank">Shiny</a>, a web application framework for R. Shiny lets bioinformaticians take their R code and quickly make polished, interactive web applications without needing to know HTML, CSS, or JavaScript. RStudio's Chief Scientist and well-known data scientist, <a href="http://hadley.nz/" target="_blank">Hadley Wickham</a>, has developed a suite of packages called <a href="http://tidyverse.org/" target="_blank">the tidyverse</a>. 'The tidyverse is a coherent system of packages for data manipulation, exploration and visualization that share a common design philosophy.' Basically, everything from transforming data to string manipulation to eye-pleasing visualizations can be done with the libraries in this suite. The tidyverse is considered a default library installation by many in the R community, and I can't recommend it enough.</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">What's the best way to learn R? I am a big proponent of structured classes with homework and project deadlines to help facilitate learning, so I highly recommend the <a href="https://www.coursera.org/learn/r-programming" target="_blank">R Programming course at Coursera</a> taught by three big names in the data science world (Peng, Leek, Caffo). This course is part of the <a href="https://www.coursera.org/specializations/jhu-data-science" target="_blank">Data Science Specialization</a>, which is a great idea if you're going to be spending a lot of time analyzing biological data, and upon completion of the specialization you even receive a certificate that you can list on your resume/CV. If you prefer to be self-taught, then I highly recommend <i><a href="https://www.amazon.com/Data-Science-Transform-Visualize-Model/dp/1491910399/" target="_blank">R for Data Science: Import, Tidy, Transform, Visualize, and Model Data</a></i> by Hadley Wickham and Garrett Grolemund. </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">If you already know a programming language, then some of the quirks of R might </span><a href="https://www.johndcook.com/blog/r_language_for_programmers/" style="font-family: "helvetica neue", arial, helvetica, sans-serif;" target="_blank">take some getting used to</a><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">, but it is still an invaluable addition to your toolbox. </span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-weight: bold;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Text Processing and Application Development (Python)</span></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Python is an <a href="https://en.wikipedia.org/wiki/Object-oriented_programming" target="_blank">object-oriented</a> <a href="https://en.wikipedia.org/wiki/Scripting_language" target="_blank">scripting language</a>. Python was designed with code readability in mind, so Python code tends to be more readable and can accomplish tasks in fewer lines compared to other object-oriented languages. Python is one of the <a href="http://r4stats.com/articles/popularity/" target="_blank">most popular languages in the world</a>, and IEEE listed Python as the <a href="http://spectrum.ieee.org/computing/software/the-2017-top-programming-languages" target="_blank">top ranked programming language of 2017</a> (up from number 3 the year before). Python's popularity benefits bioinformaticians because there are extensive libraries for everything from web frameworks to scientific computing. Python can scrape files and websites right out of the box thanks to built-in string and HTML/XML processing functions. Have a neat idea for utilizing natural language data? Your idea is only a library install away thanks to the <a href="http://www.nltk.org/" target="_blank">Natural Language Toolkit</a>. </span></div>
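<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">To give a feel for the built-in HTML processing, here is a minimal sketch that pulls every link out of a snippet of HTML using only the standard library's html.parser module. The LinkCollector class name and the sample page string are made up for illustration; a real script would first download or read the page.</span></div>

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A made-up snippet of HTML standing in for a downloaded page
page = (
    '<p>See the <a href="http://bioconductor.org/">Bioconductor project</a> '
    'and the <a href="http://www.nltk.org/">Natural Language Toolkit</a>.</p>'
)

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```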
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Python is great for both web-based and desktop-based application development. For web-based applications, beginning bioinformaticians can choose between three popular frameworks (<a href="http://flask.pocoo.org/" target="_blank">Flask</a>, <a href="https://www.djangoproject.com/" target="_blank">Django</a>, and <a href="https://trypyramid.com/" target="_blank">Pyramid</a>). For starting out, I recommend Flask since you can have a "<a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program" target="_blank">hello world</a>" page up and running in about six lines of code, and it is <a href="https://stackoverflow.com/questions/tagged/flask" target="_blank">very easy to find help</a> if you get stuck. For desktop applications, there are <a href="https://docs.python.org/3/faq/gui.html#what-platform-independent-gui-toolkits-exist-for-python" target="_blank">a number of library choices</a>. I recommend starting with the tkinter library, included in most Python installs by default, which provides an interface to the Tk GUI toolkit. The Tk GUI toolkit comes with or is available for most operating systems, meaning that tkinter applications you write are platform independent (can work on multiple operating systems). You can have a tkinter "hello world" pop-up window in as few as three or four lines of code, and it is <a href="https://stackoverflow.com/questions/tagged/tkinter" target="_blank">very easy to find help</a> if you need it.</span></div>
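<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">For a sense of how small a "hello world" web page can be, here is a minimal sketch. It assumes Flask has been installed (for example with pip) and that the code is saved in a file called hello.py, which is just a name chosen for this illustration.</span></div>

```python
from flask import Flask

# The application object; Flask uses the module name to locate resources
app = Flask(__name__)


@app.route("/")
def hello():
    # Whatever this function returns becomes the body of the page at "/"
    return "Hello, World!"
```

<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">In recent Flask releases, running "flask --app hello run" from the same folder should start the built-in development server so you can open the page in your browser.</span></div>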
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Unlike R where RStudio is really your only IDE choice, Python has several options. I highly recommend using <a href="https://www.sublimetext.com/" target="_blank">Sublime Text</a>, a fast text editor with must-have features like <a href="http://www.regular-expressions.info/" target="_blank">regex</a> searches and column highlighting. Sublime can run on Mac, Windows, and even Linux. It can run Python code directly, and with the help of a few easy-to-install packages, it can be your <a href="https://realpython.com/blog/python/setting-up-sublime-text-3-for-full-stack-python-development/" target="_blank">one stop solution for all Python development</a>. Sublime Text is not free but has an essentially indefinite trial version, and you can get rid of the occasional nag screen by purchasing a lifetime license for $75 USD. If you're looking for something free and open source, then <a href="https://pythonhosted.org/spyder/" target="_blank">Spyder</a> provides an interface similar to RStudio's. Advanced text editors like vim probably have too steep a learning curve to be useful to a beginner. However, once you get the hang of programming you might be interested in <a href="https://www.norfolkwinters.com/vim-creep/" target="_blank">upping your game</a>.</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">For learning Python, I suggest <a href="https://www.coursera.org/learn/interactive-python-1" target="_blank">An Introduction to Interactive Programming in Python</a> and <a href="https://www.coursera.org/learn/interactive-python-2" target="_blank">part two</a>, which will get you well on your way. These courses are part of the <a href="https://www.coursera.org/specializations/computer-fundamentals" target="_blank">Fundamentals of Computing Specialization</a>. If you like the two introductory Python courses, then you should consider taking the rest of the courses in the specialization since the mathematical computing skills you will learn will be helpful on your journey as a bioinformatician (and you get a certificate). For self-teaching, I can't recommend Zed Shaw's </span><i style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><a href="https://www.amazon.com/Learn-Python-Hard-Way-Introduction/dp/0134692888/" target="_blank">Learn Python 3 the Hard Way</a></i><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"> book enough. This book is great because it requires you to type all of the exercises, and I'm a big fan of "learning by doing". For biologists looking to learn how to code, I also recommend the well-written, beginner-friendly <a href="https://www.amazon.com/Python-Biologists-complete-programming-beginners/dp/1492346136/" target="_blank">Python for Biologists</a>.</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-weight: bold;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Python 2 or 3?</span></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Python 3 is the current major version of Python, and I recommend starting with Python 3 since it's <a href="http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#why-is-python-3-considered-a-better-language-to-teach-beginning-programmers" target="_blank">arguably better for beginners</a>. I might have told you otherwise a few years ago due to backwards compatibility issues and lack of library support for Python 3, but Python 3 has now been out since 2008 so these issues are mostly history. If you're still concerned, or you need to work with some legacy code, then you can read more about </span><a href="https://wiki.python.org/moin/Python2orPython3" style="font-family: "helvetica neue", arial, helvetica, sans-serif;" target="_blank">Python 2 vs Python 3</a><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"> to help you figure out which version is most appropriate.</span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-weight: bold;"><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Why not just Python?</span></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">If you twisted my arm while insisting that you didn't have time to learn two languages because you were working two jobs, helping shelter puppies find homes on weekends, and learning to play the cello, then I would concede that it is OK to only learn Python. Python can have R-like data analysis functionality with the help of the </span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><a href="http://pandas.pydata.org/" target="_blank">pandas</a> library,</span><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"> but you will likely run into situations where a library you need is available in R but not Python. In my experience, R is easier to use for data analysis because it was built for data analysis (see above hammer analogy). Keep in mind that not learning R could hurt your job prospects since there are a LOT of positions that want this kind of experience (see above indeed.com graph). If a position that lists R programming experience came down to a candidate who only knew Python and an equally qualified candidate who knew Python but who also had some small R project up on <a href="https://github.com/" target="_blank">GitHub</a>, then who do you think would get the job? </span></div>
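<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">As a rough illustration of what "R-like" means here, the sketch below builds a small table and summarizes it by group with pandas. It assumes pandas is installed, and the gene names and expression values are made up purely for the example.</span></div>

```python
import pandas as pd

# Hypothetical expression measurements (made-up values, not real data)
df = pd.DataFrame(
    {
        "gene": ["TP53", "TP53", "BRCA1", "BRCA1"],
        "condition": ["control", "treated", "control", "treated"],
        "expression": [10.0, 14.0, 8.0, 6.0],
    }
)

# Mean expression per gene, similar in spirit to a grouped summary in R
mean_by_gene = df.groupby("gene")["expression"].mean()
print(mean_by_gene)
```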
<div>
<br /></div>
<div>
<b><span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">In closing</span></b></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">You'll know when it's time to learn another language. You might come across something unbelievably cool in Erlang that draws you in, or you might start running into performance issues with your code. R and Python are great for a lot of things, but they can be very slow for computationally heavy tasks when compared to a language like C. The good news is that you aren't going to have a hard time picking up new languages since your brain has already been introduced to programming concepts by R and Python. </span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;"><br /></span></div>
<div>
<span style="font-family: "helvetica neue", arial, helvetica, sans-serif;">Biology can be quite complicated and problems are going to come in all shapes and sizes. Taking a complex problem and breaking it down into manageable pieces that can be solved by a computer program is a skill that transcends any language. As a bioinformatician, this is a very important skill to develop in addition to being able to code. With this in mind, I'd like to close with a link to this great article, '<a href="http://www.ybrikman.com/writing/2014/05/19/dont-learn-to-code-learn-to-think/" target="_blank">Don't learn to code. Learn to think.</a>'</span></div>
<div>
Posted by Paul (http://www.blogger.com/profile/13198801666214127093)</div>
<div>
<b>A day in the life of a postdoctoral fellow at a cancer center</b></div>
<div>
Published 2017-07-30; updated 2020-05-03</div>
<div style="font-family: gotham, helvetica, arial, sans-serif; font-size: 14px;">
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">I'm a postdoctoral fellow in bioinformatics/computational biology at a major cancer center in the United States. I walk into lab at the cancer center around 9am, boot up my MacBook Pro, and check my email. The biologists in our group are also just wandering in, and our early rising technician already has an experiment running. Being embedded in a lab as a computational person has its advantages, including immediate feedback for questions about an experimental design or results. The main downside is that my thought process and thus coding <a href="http://heeris.id.au/2013/this-is-why-you-shouldnt-interrupt-a-programmer/" target="_blank">will get interrupted</a> by the occasional question or request for consultation. However, I would much rather be seated at my window in the lab where I get to see the sun instead of stuck somewhere in a cube farm.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Scanning through my email, I see that the boss has us scheduled to meet with some medical doctors later in the week. Our lab has helped pioneer a new method for looking at multiple post-translational modifications of proteins simultaneously. These docs are in the cancer center's bone marrow transplant department, and they are interested in what role post-translational modifications play in graft-versus-host disease, an awful condition wherein newly transplanted cells attack the host's body. It turns out that I'll be in charge of coming up with an analysis strategy for making sense of all of these modifications. Processing the data won't be a problem because I already have individual pipelines established for analyzing post-translational modifications, so I can spend more of my time figuring out exactly what question we're trying to ask and the method I'll use or implement to see the project through. </span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Email done, I SSH into our center's HPC cluster to check the status of a job. I've developed a new, network-based method for integrating multi-omic data (e.g., genomics, proteomics, metabolomics). To determine which networks are significant, the method relies on generating millions of random permutations of the existing data. As you can imagine, this is computationally intensive, which is why I run it on the HPC cluster. My job completed successfully, so I make a note to have an undergraduate intern who works with me check the results. Any tweaks that need to be made to the output will be pushed to our private GitHub repository for me to review later.</span><br />
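<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">If you haven't seen permutation-based significance testing before, the general idea can be sketched in a few lines of Python. To be clear, this is not my network method; it is a generic permutation test on the difference between two group means, and the function name, the numbers, and the number of permutations are all made up for illustration.</span><br />

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never exactly 0
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two conditions (not real data)
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [5.0, 5.2, 4.9, 5.1, 5.3]
p = permutation_p_value(control, treated, n_permutations=2000)
print(f"estimated p-value: {p:.4f}")
```

<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Every permutation recomputes the statistic from scratch, so the cost grows with the number of permutations, which is why millions of them on large data sets end up on a cluster.</span><br />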
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">I then spend the rest of the morning analyzing some data generated in the lab by one of the biologists. Most of my established pipelines written in R process the data rather quickly, which gives me additional time to try to find biological meaning in the results. Lunchtime! </span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">I wander over to the cancer center's cafeteria and grab a veggie burger and some fries. I sit down and thumb through a feed reader app on my phone. I use it to stay up to date on the latest peer-reviewed scientific journal articles as well as some relevant blogs (e.g., science commentary, bioinformatics, data science, programming). The cafeteria is not segregated; research staff, medical doctors, and patients all co-mingle in the dining area. I promise that if you're feeling bummed out about your day because the traffic was bad or Starbucks got your order wrong, then seeing a bald patient in the cafeteria (who is obviously undergoing chemo) will really put things in perspective for you.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Lunch done, I wander back to lab for some more data analysis. Later in the afternoon I go to a joint group meeting involving biologists, chemists, fellow bioinformaticians/computational biologists, and the occasional statistician. Our center is fairly forward thinking, and we routinely have interdisciplinary meetings to go over progress on various projects. I enjoy these meetings because experts from many fields are represented and they each give a unique perspective on the work you're doing. One bonus of these meetings is that a problem with experimental design or feasibility of a project will be identified early before time/money can be wasted.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">It's getting late before the meeting finally adjourns. I get back to my desk and check the calendar for the rest of the week. We've got a grant deadline in about a month and I've been tasked with doing some preliminary research for a particular section. I make a note on my to-do list for tomorrow using Evernote to make sure I don't forget this. Later in the week I'll need to write up some results for a first-author paper that I'm planning to submit to a peer-reviewed journal. I make another note in my to-do list for this.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">I'm fortunate to work for a boss who values work-life balance, so I typically work about 40 hours a week. The occasional grant or other deadline might push these hours up a bit, but I'm not slaving away for 60 plus hours. The nice part about being a computational researcher is that I can pick up my work from home if needed. Most of my files containing non-sensitive information are synced with Dropbox while my scripts are all up on my private GitHub repository, meaning I can close my laptop at work, drive home, and then open my home laptop and keep working. I leave work fairly happy. Boredom at my job is rare, and I occasionally drive home with a smile on my face because I really feel like I'm doing my part in the fight against cancer. I knew since early in my college days that I wanted to pursue a PhD doing bioinformatics-related cancer research, and it took a lot of hard work and perseverance but I feel like all of my time at school is finally paying off.</span></div>
<div>
Posted by Paul (http://www.blogger.com/profile/13198801666214127093)</div>
<div>
<b>Science Magazine special issue on AI and machine learning</b></div>
<div>
Published 2017-07-29; updated 2020-05-03</div>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">It's very fitting that, as I worked to get this site up and running this month, Science <a href="http://science.sciencemag.org/content/357/6346" target="_blank">released a special issue </a>on artificial intelligence and machine learning entitled 'The cyberscientist'. I think it's wonderful that the computational sciences (especially bioinformatics) are now part of the mainstream scientific discourse, and I think this is evidence that more and more positions are going to need to be filled by candidates with training in this space.</span><br />
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><a href="http://science.sciencemag.org/content/357/6346/16.full" target="_blank">From the issue</a>:</span><br />
<blockquote class="tr_bq">
<span style="background-color: white; color: #333333;"><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Big data has met its match. In field after field, the ability to collect data has exploded, overwhelming human insight and analysis. But the computing advances that helped deliver the data have also conjured powerful new tools for making sense of it all. In a revolution that extends across much of science, researchers are unleashing artificial intelligence (AI), often in the form of artificial neural networks, on these mountains of data. Unlike earlier attempts at AI, such “deep learning” systems don’t need to be programmed with a human expert’s knowledge. Instead, they learn on their own, often from large training data sets, until they can see patterns and spot anomalies in data sets far larger and messier than human beings can cope with.</span></span></blockquote>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Wait a second, that's a lot of buzzwords for someone not familiar with the field. <a href="https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/" target="_blank">What's the difference between artificial intelligence, machine learning, and deep learning?</a> In short, artificial intelligence is just human intelligence<span style="background-color: white; color: #222222;"> exhibited by a </span>machine. Machine learning uses algorithms to parse data, learn from it, and then make some sort of prediction. Deep learning is a type of machine learning that uses <a href="https://en.wikipedia.org/wiki/Artificial_neural_network">neural networks</a> with multiple hidden layers. </span><br />
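<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">To make "parse data, learn from it, and make a prediction" a little more concrete, here is about the smallest example I can think of: fitting a straight line to a handful of points and using it to predict a new value. It is a toy sketch with made-up numbers and a made-up function name, and it is a long way from deep learning, but the learn-then-predict loop is the same idea.</span><br />

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data that follows y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)  # "learning" from the data
prediction = slope * 5 + intercept   # predicting an unseen input
print(slope, intercept, prediction)
```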
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"><br /></span>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Before you go down the rabbit hole of neural networks and deep learning, you should ask yourself why you should care as a bioinformatician.</span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"> </span><a href="http://science.sciencemag.org/content/357/6346/25.full" style="font-family: "helvetica neue", arial, helvetica, sans-serif;" target="_blank">Back to the issue in Science</a><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;"> for a great example</span><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">:</span><br />
<blockquote class="tr_bq">
<span style="background-color: white; color: #333333;"><span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">For geneticists, autism is a vexing challenge. Inheritance patterns suggest it has a strong genetic component. But variants in scores of genes known to play some role in autism can explain only about 20% of all cases. Finding other variants that might contribute requires looking for clues in data on the 25,000 other human genes and their surrounding DNA—an overwhelming task for human investigators. So computational biologists have enlisted the tools of artificial intelligence (AI), which can ask a trillion questions where scientists can ask only 10. First, these researchers combined hundreds of genomics data sets and used machine learning to build a map of gene interactions. They compared those of the few well-established autism risk genes with those of thousands of other unknown genes and last year flagged another 2500 genes likely to be involved in this disorder. Now they have developed a deep learning tool to find non-coding DNA that may also play a role in autism and other diseases.</span></span></blockquote>
<span style="font-family: "helvetica neue" , "arial" , "helvetica" , sans-serif;">Machine learning is an advanced but indispensable tool for the bioinformatician's toolbox. It just so happens that <a href="https://en.wikipedia.org/wiki/Andrew_Ng" target="_blank">Andrew Ng</a>, a world-renowned expert in machine learning, <a href="https://www.coursera.org/learn/machine-learning" target="_blank">runs an online Coursera course on Machine Learning</a>. The course runs continuously, so if you missed the enrollment date for this session you can sign up for the next one in a few weeks. If you're interested in self-teaching, I highly recommend the introductory text </span><span style="background-color: white; color: #111111; font-family: "amazon ember" , "arial" , sans-serif;"><i><a href="https://www.amazon.com/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/" target="_blank">An Introduction to Statistical Learning: with Applications in R</a>.</i></span><br />
Posted by Paul (http://www.blogger.com/profile/13198801666214127093)