Monday, August 14, 2017

Your first bioinformatics project

Nothing will improve your bioinformatics skills like creating your own project, and nothing proves your bioinformatics skills to a prospective employer (or university admissions committee) like having your own project to showcase. Yes, having a solid foundation in biology and programming is important for anyone looking to do bioinformatics, but studying biology and programming books until you're blue in the face will not get you practical insights like your first project will. 

Picking Your First Project

As a beginner, it might seem hard to know where you can carve out your own little piece of bioinformatics and contribute to the community, but rest assured that there are more problems than there are bioinformaticians to solve them, so with a little digging you can find something cool to work on while simultaneously solving a problem. Your project doesn't even have to solve a Big Problem™ in bioinformatics. A cooler/better/faster way of doing something that already exists is still enough for a first project. Even a neat way to visualize existing data is enough. If you're having trouble identifying a project to work on, then this might be a sign that you need to do some more background research. Once you're familiar with a topic, it is REALLY easy to identify a problem that hasn't been solved or an area for improvement. There are many different subspecialties in bioinformatics, so to help narrow down your choices, you could always consider an area that a potential employer might be interested in, or if you still can't decide, then pick an area that might have broad appeal. For example, genomics and personalized medicine are on fire right now, so a project related to next-generation sequencing pipelines will probably have more appeal than a project dealing with protein crystal structures. Still looking for ideas? Why not ask a biologist on r/biology about the pain points in their work or data analysis and come up with a bioinformatics solution for them?

If you're having trouble choosing a project or if you're unsure that you have the programming skills it takes to complete a project, then I'd like to suggest working on some bioinformatics-specific programming problems. The exercises in regular programming books are great, but they don't get you thinking about programming in the context of biology. With this in mind, try working through the problems on Rosalind. Rosalind poses real bioinformatics problems for you to solve, gives you the relevant background, and allows you to instantly check your solution through a web interface. Rosalind gives you exposure to a variety of topics including '...computational mass spectrometry, alignment, dynamic programming, genome assembly, genome rearrangements, phylogeny, probability, and string algorithms'. Working your way through these problems (warning: some can be quite challenging) will give you exposure to the variety of problems a bioinformatician might work on, allow you to hone your programming skills, identify gaps in your knowledge, and act as a springboard for you to create your own project. A bonus is that all of your Rosalind solutions can be shown to a potential employer. This isn't as great as having your very own project, but it will prove you have programming experience, and you can score some bonus points for having a particularly clever solution to a problem.
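To give you a feel for what these problems look like, here is a minimal sketch of a solution to an introductory, Rosalind-style exercise: counting how many times each nucleotide appears in a DNA string. The function name and the sample sequence are my own choices for illustration, and this is only one of many reasonable ways to solve it.

```python
from collections import Counter
from typing import Tuple


def count_nucleotides(dna: str) -> Tuple[int, int, int, int]:
    """Return the number of A, C, G, and T symbols in a DNA string."""
    # Counter returns 0 for missing keys, so absent bases need no special case.
    counts = Counter(dna.upper())
    return counts["A"], counts["C"], counts["G"], counts["T"]


if __name__ == "__main__":
    # Illustrative sequence chosen for this sketch.
    sample = "AGCTTTTCATTCTGACTGCA"
    print(*count_nucleotides(sample))  # prints: 4 5 3 8
```

Even a solution this small gives you something to talk about: why a Counter instead of four separate loops, and what should happen with lowercase or ambiguous bases.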

Hosting Your Project on GitHub

Once you've created your project, it will need to be hosted somewhere for potential employers to access. GitHub is a popular choice that offers free hosting for publicly accessible open source projects, and you can get free private repositories if you're a student. It's now very common practice to ask for a GitHub link as part of the application process, but even if you aren't directly asked, you should always include a link to your project's repository on your resume/CV. Don't ever just provide a link to your GitHub profile because it makes more work for the other person and it might be unclear which project you're trying to highlight. If you're wondering, it's considered poor form to offer to email someone a zipped file of your work "upon request". That said, hosting on GitHub doesn't mean that you simply git push your code, provide a link, and then forget about it. Oh no. There is an entire art to exhibiting a professional-looking project.

The first thing you need for your project is a proper readme. This is the information displayed below the folders and files in GitHub. GitHub readmes are written in Markdown, a quick and popular way to style text. A good readme tells a potential user what your project is about and how to use it. You should avoid using field-specific lingo or abbreviations without defining them first. Since prospective employers could be reading this, make sure to use proper grammar, spelling, and punctuation. A good readme template can be found here, and you can find examples of projects with good readmes here. The readme should absolutely include screenshots or sample output if you're generating images. The readme can even include an embedded video in gif form if you want to show some cool visual effects or animations. If you built a web tool, then the readme should link directly to a publicly accessible web server running the software. If it's not a web tool, then there should be a link to an installer or at least easy-to-follow instructions for building your tool. If it's a pain to get to a usable form of your tool, then you risk frustrating the potential employer or losing their interest. Even if you made the most mind-blowing tool, frustrating a potential employer will mean you have a near-zero chance of landing an interview.

The second thing you need to do is properly organize your project's structure. There are certain conventions that people will expect to see when perusing your code, and a professional-looking project structure will show recruiters that you know your stuff. PLOS has a great guide for structuring your bioinformatics project. The PLOS guide seems to be geared more toward projects that process data, so if your project is more software-oriented, then you should check out this guide for project structure.


Suggested structure for your bioinformatics project from PLOS Computational Biology.
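If you'd rather not create the folders by hand every time, a few lines of Python can scaffold a layout for you. The sketch below is only an illustration: the directory names (doc, data, src, bin, results), the function name, and the placeholder readme text are assumptions on my part, so adjust them to match whichever guide you follow.

```python
import tempfile
from pathlib import Path

# Assumed layout for illustration; rename these to match the guide you follow.
SUBDIRS = ("doc", "data", "src", "bin", "results")


def scaffold_project(root):
    """Create the project skeleton under root and return the created directories."""
    root = Path(root)
    created = []
    for name in SUBDIRS:
        directory = root / name
        # exist_ok=True makes the script safe to re-run on an existing project.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)

    # Only write a placeholder readme if the project does not have one yet.
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            f"# {root.name}\n\nDescribe what the project does and how to run it.\n"
        )
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_project(Path(tmp) / "my_first_project"):
            print(path.name)
```

Point scaffold_project at a real path once you're happy with the names, then commit the empty structure before you start adding code.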

If you're a more advanced user, you should consider utilizing continuous integration (CI). CI combines version control (you're already doing that with GitHub) with various other practices like build automation and self-testing. CI is good development practice, and you can get sweet badges for your project's page. Potential employers will see a badge on your project's page and know that you're the real deal. Travis CI is a popular choice with GitHub integration, and Travis CI is free for open source projects. Check out this guide for getting Travis CI integrated into your GitHub project.
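The "self-testing" part simply means that your repository contains automated tests that a CI service can run on every push. As a rough sketch of what that might look like, here is a tiny function with a few unit tests written with Python's built-in unittest module. The reverse_complement function and the test names are hypothetical examples of mine, not part of any particular project or CI guide.

```python
import unittest

# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


class ReverseComplementTest(unittest.TestCase):
    def test_known_sequence(self):
        self.assertEqual(reverse_complement("AAAACCCGGT"), "ACCGGGTTTT")

    def test_round_trip(self):
        # Applying the operation twice should give back the original sequence.
        seq = "GATTACA"
        self.assertEqual(reverse_complement(reverse_complement(seq)), seq)

    def test_empty_string(self):
        self.assertEqual(reverse_complement(""), "")
```

If you save this in a file whose name starts with test_, running python -m unittest from the project folder will discover and run it. A CI service would be configured to run that same command for you automatically.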

Contribute to Open Source

In closing, an alternative to creating your own project would be to contribute to an existing open source project. It's really easy to find an open source project to contribute to (Galaxy is my favorite recommendation), and you can read this wonderful guide for everything you need to know from finding a project to opening a pull request. Keep in mind that there are good ways and bad ways to go about contributing to an open source project. The good way would mean being involved in multiple aspects like contributing code and being part of a design team or a steering committee. This way you can talk to a prospective employer about many different aspects of the project. The bad way would mean contributing code without really being involved with the project. This way you can only say, "Oh, well I fixed this one bug," which doesn't come off as very impressive during an interview. Contributing to open source the good way has the secondary benefit of building your reputation. Many job candidates are found through networking, so showing people you do good work and push good code can directly help you land a job. If you have a specific company or research group in mind, then contributing to an open source project that is used there (or was written there) can help your chances as an applicant. 
