From a review on Reproducibility and Replicability in Science by the Committee on Replicability and Reproducibility in Science of the National Academies of Sciences, Engineering, and Medicine:
Use a Version Control System
Version Control Systems (VCS) track content, providing an entire history of changes
VCS like Git allow code and text files to be shared in full
Develop and debug portions of a project, integrate the changes into working code later
What is git?
Created with BioRender.com
Check for git
$ git --version
This command should print your version of git (e.g., git version 1.7.1)
If you get an error (e.g., -bash: git: command not found), you need to install git
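A small sketch of how this check could be scripted, so that a setup script can tell the two cases apart. The variable name git_state is made up for this illustration.

```shell
# command -v prints the path of a program if the shell can find it
if command -v git >/dev/null 2>&1; then
    git_state="found"
    git --version
else
    git_state="missing"
    echo "git is not installed (or is not on your PATH)"
fi
```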
Let’s personalize some basic settings in git
$ git config --list
This command should print your current configuration settings and aliases
You may or may not have configured settings, but here are some essential ones…
$ git config --global user.name "Amanda D. Clark"
$ git config --global user.email "adc0032@auburn.edu"
$ git config --global core.editor "nano -w"
$ git config --global color.ui true
Use git config -h for more options and settings.
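To check a single setting rather than the whole list, git config --get can be used. The sketch below works in a throwaway repository with --local, so it does not touch your global settings. The name "Example User" and the scratch directory are placeholders.

```shell
# Work in a scratch directory so nothing real is modified
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# --local stores the setting in this repository only (.git/config)
git config --local user.name "Example User"

# --get prints the value of one setting; a local value overrides a --global one
configured_name=$(git config --get user.name)
echo "$configured_name"
```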
git init
Make a course directory
$ mkdir IntrotoCompBio
$ cd IntrotoCompBio
$ mkdir repo1
$ cd repo1
Let’s make a repository for the course…
$ git init
Initialized empty Git repository in /home/adc0032/IntrotoCompBio/repo1/.git/
git will now be aware of content in the working directory (tracked and untracked)
git add
Let’s give git content to track…
$ echo "Just keep Swimming..." > Dory.txt
$ git add Dory.txt
This content is now staged to be tracked
git status
Let’s see what content is being tracked and/or managed…
$ git status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: Dory.txt
No commits, but we do have new content in the staging area.
git commit
Let’s commit staged content and changes to our local repository…
$ git commit
Add a summary of changes made in the commit message, save, and exit the text editor…
$ git commit
[master (root-commit) 8566e3b] Adding Dory's song
1 file changed, 1 insertion(+)
create mode 100644 Dory.txt
This was the first snapshot of the working directory stored in the local repository!
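If you would rather not open the text editor, the -m option supplies the commit message on the command line. A minimal sketch of the whole init, add, commit sequence in a scratch repository (the directory and the identity "Example User" are placeholders, set locally so the commit does not depend on your global settings):

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"         # placeholder identity for this sketch
git config user.email "user@example.com"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"        # -m: message given inline, no editor opens

# Count the commits reachable from HEAD
commit_count=$(git rev-list --count HEAD)
echo "commits so far: $commit_count"
```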
git diff
Let’s see if there are any changes that haven’t been staged…
$ git diff
What happened?
$ echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
$ git diff
diff --git a/Dory.txt b/Dory.txt
index 4353455..53fb00b 100644
--- a/Dory.txt
+++ b/Dory.txt
@@ -1 +1,2 @@
Just keep Swimming...
+Just Keep Swimming, Swimming, Swimming
git found changes to the working directory that haven’t been staged yet!
Let’s go ahead and stage the changes for a commit.
$ git add Dory.txt
# We forgot to add the last part! Let's edit the file we just staged
$ echo 'What do we do? We Swim!' >> Dory.txt
Run git diff again. What happens?
What about if you run git diff --staged?
Let’s stage and commit all changes now.
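The exercise above can be replayed in a scratch repository. This sketch (placeholder identity and directory) captures both diffs so you can compare them: git diff shows changes that are not yet staged, while git diff --staged shows what is staged for the next commit.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"         # placeholder identity for this sketch
git config user.email "user@example.com"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
git add Dory.txt                             # the second line is now staged
echo 'What do we do? We Swim!' >> Dory.txt   # the third line is NOT staged

unstaged=$(git diff)             # working directory vs. staging area
staged=$(git diff --staged)      # staging area vs. last commit
echo "$unstaged"
echo "$staged"
```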
git log
Let’s review what git has done…
$ git log
commit e40bf73c567d1b8881795a8450667b8b30673970 (HEAD -> master)
Author: Amanda D. Clark <adc0032@auburn.edu>
Date: Sun Aug 16 17:33:46 2020 -0500
Added the final lines of Dory's song.
commit 8566e3bde8e7aa9428e7614f7917b2ae9d4f137e
Author: Amanda D. Clark <adc0032@auburn.edu>
Date: Sat Aug 15 02:20:41 2020 -0500
Adding Dory's song
There are many elements here, but this is a log of the changes we have made and asked git to track
This is essentially your lab notebook!
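For a more compact view of the same history, git log --oneline prints one line per commit (abbreviated hash plus the first line of the message). A sketch in a scratch repository, with a placeholder identity:

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"         # placeholder identity for this sketch
git config user.email "user@example.com"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
git add Dory.txt
git commit -q -m "Added the final lines of Dory's song."

git log --oneline                        # newest commit first
newest=$(git log -1 --format=%s)         # subject line of the most recent commit
```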
Remember:
git add
git status
git commit
Stage and Commit Often!
The commit message is your friend.
Branches add another dimension to efficient workflows
They are useful for experimental code and new developments
Changes to branches are independent of the “master” branch
git branch
Let’s see what branches we have of our repository…
$ git branch
* master
We only have one branch, master
The asterisk indicates the current branch
Now let’s make a couple of new branches…
$ git branch develop
$ git branch test
Run git branch again. How is it different from before?
How do you get from branch to branch?
git checkout
Let’s manipulate the develop branch…
$ git checkout develop
M Dory.txt
Switched to branch 'develop'
Run git branch again. What has changed?
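A sketch of creating a branch and switching to it, in a scratch repository. Depending on your Git version and configuration, the first branch may not be called master, so the sketch sets that name explicitly with git symbolic-ref; that line, the placeholder identity, and the variable name current are additions for this illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch "master", as on the slides
git config user.name "Example User"          # placeholder identity for this sketch
git config user.email "user@example.com"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

git branch develop             # creates the branch; you are still on master
git checkout -q develop        # now switch to it
git branch                     # the asterisk has moved to develop

current=$(git rev-parse --abbrev-ref HEAD)   # name of the branch HEAD points to
```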
ls the working directory of the “develop” branch
Edits, stages, and commits here are specific to the develop branch. Let’s test this…
Let’s sprinkle a little HTML code into our song…
Put Dory’s song into the HTML template on ASC or GitHub and save it as “Dory.html”
What has changed about your directory content? (Check with ls and git status.)
Go ahead and commit these modifications to the repository (Quiz)
git merge
Let’s incorporate changes in the development branch into the master branch…
$ git checkout master
Switched to branch 'master'
$ git merge develop
Updating e40bf73..edb4f1f
Fast-forward
Dory.html | 16 ++++++++++++++++
Dory.txt | 1 +
2 files changed, 17 insertions(+)
create mode 100644 Dory.html
We just merged our content and histories from “develop” into “master”
What do our working directory and “master” branch look like now?
We have new files, so why don’t we have anything to commit?
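One way to explore the question is to replay the merge in a scratch repository. In this sketch the new file is already committed on develop, and the fast-forward merge simply moves master to that commit, so git status has nothing left to report. The HTML line is a stand-in for the course template, and the identity and variable names are placeholders.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch "master", as on the slides
git config user.name "Example User"          # placeholder identity for this sketch
git config user.email "user@example.com"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

git checkout -q -b develop                   # -b creates the branch and switches in one step
echo "<p>Just keep Swimming...</p>" > Dory.html   # stand-in for the course template
git add Dory.html
git commit -q -m "Add HTML version of the song"

git checkout -q master
git merge -q develop                         # fast-forward: master moves to develop's commit

pending=$(git status --porcelain)            # empty output means nothing to commit
```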
Branches are extremely useful for making changes to existing content, particularly when that content is publicly available.
Move between branches with git checkout
Remember git diff? Try comparing two branches (test vs master)
Changes on the branch didn’t work out? Use the git branch option -d to delete the branch (or -D if git refuses because the branch has unmerged changes)
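A sketch of comparing two branches and then discarding one, in a scratch repository. Because the experimental commit on test was never merged, git branch -d refuses and -D is needed to force the deletion. The file contents, identity, and variable names are placeholders.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch "master", as on the slides
git config user.name "Example User"          # placeholder identity for this sketch
git config user.email "user@example.com"

echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"

git checkout -q -b test
echo "An experiment" >> Dory.txt
git commit -q -a -m "Experimental change"    # -a stages modified tracked files first
git checkout -q master

branch_diff=$(git diff master test)          # what test has that master does not
echo "$branch_diff"

git branch -d test || echo "refused: test has unmerged commits"
git branch -D test                           # force the deletion
remaining=$(git branch --list test)          # empty once the branch is gone
```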
A remote repository is a version of your local repository hosted elsewhere (e.g., on the Internet)
Often read-only, or read-write accessible to only one person (typically, the creator)
A way to make your final products and code available to the public
git clone
Existing repositories can be copied and set up for remote access…
$ git clone <url or directory>
Makes a local repository that is a clone of the repository at the given address
Automatically makes a remote connection to the given address under the alias “origin”
git remote
Manage remote repositories with git remote
$ git remote -v
Here you are asking git to list your remote connections
-v means verbose, and will print the remote alias and address
$ git remote add <alias> <url or directory>
Here you are asking git to set up a remote connection to a given address under a specific alias
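Since the address can be a directory, these commands can be tried without a network. In this sketch a bare repository in a scratch directory stands in for the remote host. The names shared.git, work, and backup are hypothetical.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q --bare shared.git       # a bare repository plays the role of the remote host

# Cloning an empty repository prints a warning, which is fine here
git clone -q shared.git work        # the connection is saved under the alias "origin"
cd work
git remote -v                       # lists each alias with its address

# Add a second connection under an alias of our choosing
git remote add backup "$scratch/shared.git"

remotes=$(git remote)                              # aliases only, one per line
origin_url=$(git config --get remote.origin.url)   # address stored for "origin"
```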
git pull
Getting content from a remote repository…
$ git pull <alias> <branch>
# often the alias and branch are simply origin and master
$ git pull origin master
git pull is actually two commands that together incorporate remote content locally:
git fetch, which grabs content from the remote repository
git merge, which merges remote content with your local repository
Running the two commands separately (fetch, then merge) is useful when you want to inspect content before merging
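A sketch of the inspect-before-merging workflow, using a bare repository in a scratch directory as the remote and two clones of it. The directory names (shared.git, first, second), the identity, and the explicit git symbolic-ref lines that fix the branch name to master are all assumptions made for this illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch"
git init -q --bare shared.git
git -C shared.git symbolic-ref HEAD refs/heads/master   # match the slides' branch name

# The first clone publishes a commit
git clone -q shared.git first
cd first
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"          # placeholder identity for this sketch
git config user.email "user@example.com"
echo "Just keep Swimming..." > Dory.txt
git add Dory.txt
git commit -q -m "Adding Dory's song"
git push -q origin master
cd ..

# The second clone starts from that state
git clone -q shared.git second

# Meanwhile, a new commit reaches the remote
cd first
echo "Just Keep Swimming, Swimming, Swimming" >> Dory.txt
git commit -q -a -m "Added more of Dory's song"
git push -q origin master
cd ../second

git fetch -q origin                                    # download only; local files are untouched
incoming=$(git log --oneline master..origin/master)    # commits we do not have yet
echo "$incoming"
git merge -q origin/master                             # now bring them into local master
```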
git push
Sending content to a remote repository…
$ git push origin master
We are asking git to send a snapshot of our local repository content to the “master” branch of the remote repository named “origin”
Keep your local repository up-to-date with git pull
Let’s see a remote repository in action…
GitHub is one host of public and private remote code repositories (SourceForge is another example)
GitHub also provides a GUI for interacting with a repository
GitHub repositories can be accessed in a CLI using git clone <github url for repository>
Public Repositories can be Forked on GitHub
Forking is similar to git clone on the command line, except that the copy lives in your own GitHub account
Users can make edits or revisions to public code to submit back to the creator or to make something novel
git pull frequently to ensure you have the most up-to-date state of the repository
Make sure to add meaningful messages for your collaborators!
git commit and git push often to reduce merge conflicts. (Remember to git pull first!)
Git for Scientists in Bioinformatics Data Skills by Vince Buffalo (Chapter 5)
This presentation was adapted from Dr. Jamie Oaks’ Intro to Git
This presentation was made possible by funding to Amanda Clark from NSF GRFP (1414475) and Auburn University