This week at my workplace, we hosted a workshop centered on transcriptomics analysis. I was responsible for two main things: 1) setting up Docker instances for a couple of web applications running on a virtual host that the students could access, and 2) providing tech support for the various demonstrations.

During both the setup and the demonstrations, there were a few takeaways that will benefit me in future workshops:

  • The docker commit command is really handy for demos. For our demo of the JBrowse genome browser, I decided to use the project's Docker image, which saved me the trouble of setting everything up myself. However, I ran into some issues accessing the demo data within JBrowse from a mounted volume, so I needed to copy it, along with some configuration files, into the container. Once everything was configured to my liking, I ran docker commit to save the state of the container as an image. This served as a nice way to do a quick restore of the container state, should anything happen to the existing container or even the virtual host (fortunately nothing happened). A rough sketch of the commands follows this list.
  • If a Docker container writes output to a mounted volume, that data persists after the container is shut down and removed. Our transcriptomics pipeline demo consists of a docker-compose stack in which one of the containers runs analysis pipelines and writes output to a mounted volume. Before the workshop started, I terminated the containers in the stack to remove the test pipelines and launched the stack anew. However, the output from the test pipelines persisted, since it lived in the mounted volume. While most of it was simply overwritten by new pipelines sharing the same IDs, there were cases where the presence of a stale file caused a pipeline to fail or behave erratically. Had the output from the previous runs been deleted before relaunching the docker-compose stack, this would have been a non-issue (see the second sketch after the list).
  • Using docker cp can be a lifesaver for quick bugfixes. We discovered a bug in the script that configures and launches pipelines, which caused the wrong input file to be used in a pipeline step. We use the same code internally, but this had never surfaced prior to the demo. When someone tells you that students are your best bugfinders, they aren't kidding. I quickly patched the script and committed the fix to GitHub, but knew that a rebuild of the Docker image would take too long. The next best step was to use docker cp to copy the patched file directly to the right place in the container, and the fix was immediate (see the last sketch after the list).
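
For reference, the docker commit workflow looked roughly like the following. This is a minimal sketch; the container name, image tag, paths, and port mapping are made up for illustration and were different in the actual setup.

    # copy the demo data and configuration into the running JBrowse container
    docker cp demo_data/ jbrowse:/data/
    docker cp tracks.conf jbrowse:/jbrowse/

    # snapshot the configured container as a new image
    docker commit jbrowse jbrowse-workshop:ready

    # if the container (or the whole host) is lost, restore from the snapshot
    docker run -d --name jbrowse -p 80:80 jbrowse-workshop:ready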
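
The stale-output problem from the second point comes down to clearing the mounted volume before relaunching. Something like the following would have avoided it entirely (the output path here is hypothetical):

    # tear down the stack, removing the test-pipeline containers
    docker-compose down

    # clear leftover test-pipeline output from the mounted volume
    rm -rf /data/pipeline_output/*

    # bring the stack back up against a clean output area
    docker-compose up -d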
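
And the docker cp hotfix from the last point was essentially a one-liner; the script name, container name, and destination path here are placeholders:

    # drop the patched script into the running container without rebuilding the image
    docker cp launch_pipeline.py pipeline_runner:/opt/pipeline/launch_pipeline.py

The proper fix still lives in the GitHub commit, so it gets baked in the next time the image is rebuilt.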

Personally, I am glad that we are gravitating towards using Docker in our workshops. The ability to save the state of a workshop demo makes redeploying the same demo for future workshops fairly trivial, as long as the same input data is being used. As the person primarily responsible for setting up the demonstrations, I find it makes my life a lot easier knowing that much of the setup has already been taken care of.

I am also happy to see that we are increasingly using Docker in a production environment at work, though there is still a ways to go. Our docker-compose stack for our transcriptomics analysis pipeline will soon be released to the public, and I have personally worked on Dockerized versions of a couple of other in-house bioinformatics pipelines. As of this writing, we do not have a dedicated internal host that supports Docker, so anyone who wants to run Docker must do so from their local machine. In bioinformatics this can be a real problem, given the large memory and storage requirements of some programs and datasets. Fortunately, one foot is already in the door, so I expect improvements to come sooner rather than later.