Version Control

Amanda D. Clark

Reproducibility and Version Control

Reproducibility in Science

From a review on Reproducibility and Replicability in Science by the Committee on Replicability and Reproducibility in Science of the National Academies of Sciences, Engineering, and Medicine:

  • Reproducibility “…obtaining consistent results using the same input data, computational steps, methods, code, and conditions of analysis.”
  • Replicability “…obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.”
  • More about the “Reproducibility Crisis in Science”

Version Control

  • How do you manage versions of documents and code?
  • Do you remember the changes in each version?
  • What about working code that hits a snag after edits?

    Version Control Humor Source

  • Use a Version Control System

Version Control & Reproducible Science

  • Version Control Systems (VCS) track content, providing an entire history of changes

  • VCS like Git allow code and text files to be shared in full

  • Develop and debug portions of a project, integrate the changes into working code later

Intro to Git

Version Control with Git

What is git?

  • Modern free and open source version control system
  • Developed by Linus Torvalds, the creator of Linux
  • Manages and tracks changes to code and text files
  • Aids in reproducibility and transparency

Basic Workflow

Created with BioRender.com

Git Setup

Check for git

$ git --version
  • This command should print your version of git (e.g., git version 1.7.1)

  • If you get an error (e.g., -bash: git: command not found), you need to install git
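
The check above can also be scripted. This is a minimal sketch, assuming a POSIX shell; the variable name git_status is only illustrative.

```shell
# Report whether git is on the PATH before running any git command.
if command -v git >/dev/null 2>&1; then
    git_status="installed: $(git --version)"
else
    git_status="missing"
fi
echo "$git_status"
```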

Let’s personalize some basic settings in git

$ git config --list
  • This command should print your current configuration settings and aliases

You may or may not have configured settings, but here are some essential ones…

$ git config --global user.name "Amanda D. Clark"
$ git config --global user.email "adc0032@auburn.edu"
$ git config --global core.editor "nano -w"
$ git config --global color.ui true
  • Use git config -h for more options and settings.
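
Settings can also be limited to a single repository by leaving out --global. The sketch below assumes a scratch directory made with mktemp and uses a placeholder name and email (both hypothetical), so it does not touch your real configuration.

```shell
set -eu
repo="$(mktemp -d)"          # scratch directory, safe to delete afterwards
cd "$repo"
git init -q

# Without --global, the values are written to this repository's .git/config only
git config user.name "Example Student"
git config user.email "student@example.edu"

# Read one setting back
configured_name="$(git config --get user.name)"
echo "$configured_name"
```

Inside this repository the local value takes precedence over any global one.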

Creating a New Local Repository

Git command git init

Make a course directory

$ mkdir IntrotoCompBio
$ cd IntrotoCompBio
$ mkdir repo1
$ cd repo1

Let’s make a repository for the course…

$ git init
Initialized empty Git repository in /home/adc0032/IntrotoCompBio/repo1/.git/
  • git will now be aware of content in the working directory (tracked and untracked)
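
To see what git init actually created, the sketch below initializes a repository in a scratch directory (an assumption, made with mktemp) and looks for the hidden .git directory where git keeps its history.

```shell
set -eu
repo="$(mktemp -d)"          # scratch directory standing in for repo1
cd "$repo"
git init -q

# The repository data lives in a hidden .git directory inside the working directory
ls -a

# Ask git where the repository data is stored
git_dir="$(git rev-parse --git-dir)"
echo "$git_dir"
```

Deleting the .git directory removes the history but leaves the working files in place.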

Git command git add

Let’s give git content to track…

$ echo "Just keep Swimming..." > Dory.txt
$ git add Dory.txt
  • This content is now staged to be tracked

Git command git status

Let’s see what content is being tracked and/or managed…

$ git status

On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   Dory.txt
  • No commits, but we do have new content in the staging area.
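
A compact view of the same information is available with git status --short. The sketch below, run in a scratch directory (an assumption), captures the status before and after staging to show how the marker changes.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q

echo "Just keep Swimming..." > Dory.txt
before="$(git status --short)"   # untracked files are marked ??

git add Dory.txt
after="$(git status --short)"    # newly staged files are marked A

printf '%s\n%s\n' "$before" "$after"
```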

Git command git commit

Let’s commit staged content and changes to our local repository…

$ git commit

Add a summary of changes made in the commit message, save, and exit the text editor…

$ git commit
[master (root-commit) 8566e3b] Adding Dory's song
 1 file changed, 1 insertion(+)
 create mode 100644 Dory.txt
  • This was the first snapshot of the working directory stored in the local repository!
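
For short messages, the editor step can be skipped by passing the message with -m. This is a sketch in a scratch repository with a placeholder identity (both assumptions); the symbolic-ref line only makes sure the first branch is named master, as in these slides.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt

# -m supplies the commit message on the command line instead of opening the editor
git commit -q -m "Adding Dory's song"

commit_count="$(git rev-list --count HEAD)"
git log --oneline
```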

Git command git diff

Let’s see if there are any changes that haven’t been staged…

$ git diff

What happened?

$ echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt

$ git diff 
diff --git a/Dory.txt b/Dory.txt
index 4353455..53fb00b 100644
--- a/Dory.txt
+++ b/Dory.txt
@@ -1 +1,2 @@
 Just keep Swimming...
+Just Keep Swimming, Swimming, Swimming
  • git found changes to the working directory that haven’t been staged yet!

  • Let’s go ahead and stage this change.

$ git add Dory.txt
# We forgot to add the last part! Let's edit the file we just staged

$ echo 'What do we do? We Swim!' >> Dory.txt
  • Run git diff again. What happens?

  • What about if you run git diff --staged?

  • Let’s stage and commit all changes now.
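
The difference between git diff and git diff --staged can be checked end to end. The sketch below replays the steps above in a scratch repository with a placeholder identity (both assumptions) and captures each diff in a variable.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
git add Dory.txt                             # the second line is now staged
echo 'What do we do? We Swim!' >> Dory.txt   # the third line is added after staging

unstaged="$(git diff)"            # working directory compared with the staging area
staged="$(git diff --staged)"     # staging area compared with the last commit
echo "$unstaged"
echo "$staged"

# Stage and commit everything
git add Dory.txt
git commit -q -m "Added the final lines of Dory's song."
```

Here git diff reports only the third line, while git diff --staged reports only the second.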

Git command git log

Let’s review what git has done…

$ git log
commit e40bf73c567d1b8881795a8450667b8b30673970 (HEAD -> master)
Author: Amanda D. Clark <adc0032@auburn.edu>
Date:   Sun Aug 16 17:33:46 2020 -0500

    Added the final lines of Dory's song.

commit 8566e3bde8e7aa9428e7614f7917b2ae9d4f137e
Author: Amanda D. Clark <adc0032@auburn.edu>
Date:   Sat Aug 15 02:20:41 2020 -0500

    Adding Dory's song
  • There are many elements here, but this is a log of changes we have made and asked git to track

  • This is essentially your lab notebook!
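
When the full log gets long, shorter views help. The sketch below builds the same two commits in a scratch repository with a placeholder identity (both assumptions) and prints a one-line summary per commit.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
git commit -q -am "Added the final lines of Dory's song."

git log --oneline        # one line per commit, newest first
git log --stat -1        # files touched by the newest commit

newest="$(git log -1 --format=%s)"
```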

Back to the Basics

  • Remember:

git add
git status
git commit

  • Stage and Commit Often!

  • The commit message is your friend.

Branches in Git

Branches

  • Contribute another dimension to efficient workflows

  • Useful for experimental code and new developments

  • Changes to branches are independent of the “master” branch

Git command: git branch

Let’s see what branches we have of our repository…

$ git branch
* master
  • We only have one branch, “master”

  • The asterisk indicates current branch

Now let’s make a couple of new branches…

$ git branch develop
$ git branch test
  • Run git branch again. How is it different from before?

  • How do you get from branch to branch?

Git command: git checkout

from Imgflip Meme Generator

Let’s manipulate the develop branch…

$ git checkout develop
M   Dory.txt
Switched to branch 'develop'
  • Run git branch again. What has changed?

  • ls the working directory of the “develop” branch

  • Edits, stages, and commits here are specific to the develop branch. Let’s test this…
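
Creating branches and switching between them can be replayed in isolation. This is a sketch in a scratch repository with a placeholder identity (both assumptions); a first commit is made because a branch needs a commit to point at.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

git branch develop
git branch test
git branch                       # lists develop, master, and test; * marks master

git checkout -q develop
current="$(git rev-parse --abbrev-ref HEAD)"   # name of the branch we are on
echo "$current"
```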

Let’s sprinkle a little HTML into our song…

  • Populate Dory’s song into the HTML template on ASC or GitHub and save as “Dory.html”

  • What has changed about your directory content (Check with ls and git status)?

  • Go ahead and commit these modifications to the repository (Quiz)

Git command: git merge

Let’s incorporate changes in the develop branch into the master branch…

$ git checkout master
Switched to branch 'master'

$ git merge develop
Updating e40bf73..edb4f1f
Fast-forward
 Dory.html | 16 ++++++++++++++++
 Dory.txt  |  1 +
 2 files changed, 17 insertions(+)
 create mode 100644 Dory.html
  • We just merged our content and histories from “develop” into “master”

  • What do our working directory and “master” branch look like now?

  • We have new files, so why don’t we have anything to commit?
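
One way to answer the last question is to replay the merge in a scratch repository. In this sketch (scratch directory, placeholder identity, and the one-line Dory.html are all assumptions), the merge only moves “master” forward to commits that already exist on “develop”, so nothing is left to commit.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

# Do the new work on develop
git branch develop
git checkout -q develop
echo "<p>Just keep Swimming...</p>" > Dory.html
git add Dory.html
git commit -q -m "Add an HTML version of the song"

# Back on master, Dory.html is absent until we merge
git checkout -q master
ls
git merge -q develop
ls

leftover="$(git status --short)"   # empty: the merged commits were already recorded
```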

Branches: A Review

  • Branches are extremely useful for making changes to existing content, particularly when that content is publicly available.

  • Move between branches with git checkout

  • Remember git diff? Try comparing two branches (test vs master)

  • Changes on the branch didn’t work out? Use the git branch option -d to delete the branch
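
Comparing branches and deleting one can be sketched as follows, again in a scratch repository with a placeholder identity (both assumptions). Note that -d refuses to delete a branch whose commits have not been merged; -D forces the deletion.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

# Make an experimental change on the test branch
git branch test
git checkout -q test
echo "Experimental verse" >> Dory.txt
git commit -q -am "Try an experimental verse"
git checkout -q master

# Compare the two branches
branch_diff="$(git diff master test)"
echo "$branch_diff"

# The experiment did not work out: try -d first, fall back to -D if git refuses
if git branch -d test 2>/dev/null; then
    deleted_with="-d"
else
    git branch -D test
    deleted_with="-D"
fi
remaining="$(git branch --list test)"
```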

Remote Repositories

Remotes

  • A version of your local repository hosted elsewhere (e.g., on the Internet)

  • Often read-only for most users, with read-write access limited to one person (typically, the creator)

  • A way to make your final products and code available to the public

Basic Workflow

Created with BioRender.com

Git command: git clone

Existing repositories can be copied and set up for remote access…

$ git clone <url or directory>
  • Makes a local repository that is a clone of the repository at the given address

  • Automatically makes a remote connection to the given address under the alias “origin”
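
Because git clone also accepts a directory, it can be tried without a network connection. In this sketch the directory names source and copy, the scratch directory, and the identity are all placeholders.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# A small repository to clone from
git init -q source
cd source
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"
echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

# Clone it into a second directory
cd "$work"
git clone -q source copy
cd copy

git remote -v                                # the clone already has a remote named origin
remote_names="$(git remote)"
```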

Git command: git remote

Manage remote repositories with git remote

$ git remote -v
  • Here you are asking git to list your remote connections

    • -v means verbose, and will print the remote alias and address

$ git remote add <alias> <url or directory>
  • Here you are asking git to set up a remote connection to a given address under a specific alias
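
A remote can be added to a repository that was not cloned. The sketch below uses a bare repository in a scratch directory to stand in for a hosted one; the names shared.git and local are hypothetical.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

git init -q --bare shared.git    # stands in for a repository hosted elsewhere
git init -q local
cd local

# Connect the local repository to the address under the alias origin
git remote add origin "$work/shared.git"

git remote -v                    # prints the alias and address for fetch and push
origin_url="$(git config --get remote.origin.url)"
```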

Git command: git pull

Getting content from a remote repository…

$ git pull <alias> <branch>
# often the alias is origin and the branch is master
$ git pull origin master
  • git pull actually combines two commands to incorporate remote content locally

    • git fetch which grabs content from the remote repository

    • git merge which merges remote content with your local repository

    • Running the two commands separately (fetch, then merge) is useful when you want to inspect content before merging
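
Running fetch and merge as separate steps might look like the sketch below. It assumes two local directories, upstream and mine (hypothetical names), standing in for the remote and your clone, plus a placeholder identity.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# The repository that plays the role of the remote
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master      # name the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"
echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

# Our clone of it
cd "$work"
git clone -q upstream mine

# Meanwhile, a new commit appears in the remote
cd "$work/upstream"
echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
git commit -q -am "Add a second line"

# Step 1: fetch downloads the new commit without changing our files
cd "$work/mine"
git fetch -q origin
git log --oneline master..origin/master      # inspect what would be merged
incoming="$(git rev-list --count master..origin/master)"

# Step 2: merge brings the fetched commit into our branch
git merge -q origin/master
```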

Git command: git push

Sending content to a remote repository…

$ git push origin master
  • We are asking git to send a snapshot of our local repository content to the “master” branch of the remote repository named “origin”

    • Keep your local repository up-to-date with git pull

    • Let’s see a remote repository in action…
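
A push can be tried locally by letting a bare repository play the role of the remote. This is a sketch under those assumptions; shared.git, local, and the identity are placeholders.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

git init -q --bare shared.git    # stands in for the hosted remote repository
git init -q local
cd local
git symbolic-ref HEAD refs/heads/master      # name the first branch master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.edu"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

git remote add origin "$work/shared.git"
git push -q origin master        # send our master branch to the remote named origin

# The remote's master now points at the same commit as ours
pushed="$(git --git-dir="$work/shared.git" rev-parse master)"
echo "$pushed"
```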

Collaboration with Git and GitHub

  • GitHub is one host of public and private remote code repositories (SourceForge is another example)

    • GitHub also provides a GUI for interacting with a repository

    • GitHub repositories can be accessed in a CLI using git clone <github url for repository>

  • Public Repositories can be Forked on GitHub

    • Forking is similar to git clone on the command line

    • Users can make edits or revisions to public code to submit back to the creator or to make something novel

Best Practices for Shared Repositories

  • git pull frequently to ensure the most up-to-date state of the repository

  • Make sure to add meaningful messages for your collaborators!

  • git commit and git push often to reduce merge conflicts. (Remember to git pull first!)
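
Why pull before pushing? The sketch below simulates two collaborators sharing one remote, all inside a scratch directory. The directory names a and b, the bare repository shared.git, and the identities are hypothetical. Collaborator B's first push is rejected because the remote holds a commit B does not have yet; after a pull, the push goes through.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/master

# Collaborator A creates the project and publishes it
git init -q "$work/a"
cd "$work/a"
git symbolic-ref HEAD refs/heads/master
git config user.name "Collaborator A"
git config user.email "a@example.edu"
echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"
git remote add origin "$work/shared.git"
git push -q origin master

# Collaborator B clones the shared repository
git clone -q "$work/shared.git" "$work/b"
cd "$work/b"
git config user.name "Collaborator B"
git config user.email "b@example.edu"

# A commits and pushes again
cd "$work/a"
echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
git commit -q -am "Add a second line"
git push -q origin master

# B commits different work and tries to push without pulling
cd "$work/b"
echo "Notes on the song" > Notes.txt
git add Notes.txt
git commit -q -m "Add notes"
if git push -q origin master 2>/dev/null; then
    first_push="accepted"
else
    first_push="rejected"
fi
echo "$first_push"

# Pull first (fetch + merge), then push again
git pull -q --no-rebase --no-edit origin master
git push -q origin master
```

The two collaborators edited different files here, so the pull merges cleanly; edits to the same lines would require resolving a conflict by hand.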

Resources

Git Resources

Git Documentation

Learning Git

“Git for Scientists” in Bioinformatics Data Skills by Vince Buffalo (Chapter 5)

Acknowledgements

Inspiration

This presentation was adapted from Dr. Jamie Oaks’ Intro to Git

Funding

This presentation was made possible by funding to Amanda Clark from NSF GRFP (1414475) and Auburn University