Version Control for Scientific Collaboration

Summer Workshop Series – July 11, 2022

Ryan Mears

Git

A History of Git

  • Invented by Linus Torvalds in 2005
    for maintenance of the Linux kernel
  • Git: British slang for an idiot or fool
  • Complexity required for remote and local tracking of changes

Goals of the Git version control system (VCS):

  • Speed
  • Simple design
  • Strong support for non-linear development (thousands of parallel branches)
  • Fully distributed
  • Able to handle large projects like the Linux kernel efficiently (speed and data size)

Why use Git?

  • Analogy: turning on Track Changes in a Word document
  • Antipattern: auto-saving revisions as time-stamped copies

  • Remote repo: accessed & updated anytime
  • Solutions for continuous merging of sets of changes

How to use Git

  • Three States
    • Modified means that you have changed the file
      but have not committed it to your database yet.
    • Staged means that you have marked a modified file
      to go into your next commit snapshot.
    • Committed means that the data is safely stored
      in your local database.
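
The three states can be observed with `git status --short`. The following is a minimal sketch, not a prescribed workflow: the temporary directory, the file name `notes.txt`, and the identity passed to `git config` are all placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: walk one file through modified -> staged -> committed.
# The directory, file name, and identity are placeholders.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop Attendee"
git config user.email "attendee@example.org"

# Create a first commit so the file is tracked.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Modified: changed in the working directory, not yet staged.
echo "second draft" >> notes.txt
state_modified=$(git status --short notes.txt)

# Staged: marked to go into the next commit snapshot.
git add notes.txt
state_staged=$(git status --short notes.txt)

# Committed: stored in the local database (the .git directory).
git commit -q -m "Revise notes"
state_committed=$(git status --short notes.txt)

echo "modified:  '$state_modified'"
echo "staged:    '$state_staged'"
echo "committed: '$state_committed'"
```

In the short format, an `M` in the second column marks an unstaged modification and an `M` in the first column marks a staged one. A clean, committed file produces no output.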

Frequently used

git status
git add
git commit -a
git push

Used at key points

git config
git clone
git log
git pull
git branch
git diff
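
The commands listed above can be strung together into one edit, stage, commit, and push cycle. This is a rough sketch under stated assumptions: a bare repository on local disk stands in for a remote such as GitHub, and the names `remote.git`, `project`, `data.csv`, and the identity are hypothetical.

```shell
#!/bin/sh
# Sketch of an edit/stage/commit/push cycle against a stand-in remote.
set -e
work=$(mktemp -d)
cd "$work"

# Assumption: a bare repository on local disk plays the role of the remote.
# In practice this would be a URL on GitHub or another host.
git init -q --bare remote.git

# Cloning an empty repository prints a warning, silenced here.
git clone -q remote.git project 2>/dev/null
cd project
git config user.name "Workshop Attendee"      # placeholder identity
git config user.email "attendee@example.org"

echo "sample,value" > data.csv
git status --short                    # lists data.csv as untracked
git add data.csv
git commit -q -m "Add data file"
git branch -M main                    # name the branch explicitly; defaults vary
git push -q -u origin main            # -u records origin/main as the upstream

echo "a,1" >> data.csv
git diff --stat                       # summarize unstaged changes
git commit -q -a -m "Add first row"   # -a stages tracked, modified files
git push -q

git pull -q                           # nothing new here; shown for completeness
git log --oneline                     # one line per commit, newest first

commit_count=$(git rev-list --count HEAD)
remote_count=$(git rev-list --count origin/main)
echo "local commits: $commit_count, remote commits: $remote_count"
```

Note that `git commit -a` only picks up files Git already tracks. A brand-new file still needs `git add` first, which is why the first commit in the sketch stages `data.csv` explicitly.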

Pro Git Book

Best Git INTRO

Key Feature of Git: Timeline Control

Example from Think Like (a) Git: A Guide for the Perplexed

Learn Git

Practice Git!!

Repositories

You typically obtain a Git repository in one of two ways:
  • Initializing a repository
  • Cloning an existing repository
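
Both routes can be tried side by side. The sketch below assumes a scratch directory and uses a local path in place of a remote URL; the names `analysis` and `analysis-copy` and the identity are illustrative only.

```shell
#!/bin/sh
# Sketch: the two ways of obtaining a repository.
set -e
base=$(mktemp -d)
cd "$base"

# 1. Initialize: turn an existing directory into a repository.
mkdir analysis
cd analysis
git init -q
git config user.name "Workshop Attendee"      # placeholder identity
git config user.email "attendee@example.org"
echo "# Analysis" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# 2. Clone: copy an existing repository, history included.
# Assumption: a local path stands in for a remote URL here.
git clone -q analysis analysis-copy

# Both repositories now point at the same commit.
original_head=$(git -C analysis rev-parse HEAD)
copy_head=$(git -C analysis-copy rev-parse HEAD)
echo "original: $original_head"
echo "clone:    $copy_head"
```

The clone carries the full history, so the two commit identifiers match.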

GitHub

  • Analogy: similar to Dropbox, except that you choose when to sync
  • Antipattern: Dropbox has no system for handling more than one version at a time
  • One of several remote-repository hosting services; others include:
    • Bitbucket, GitLab, SourceForge
  • Extensive documentation and additional online features
  • Project organization tools: Projects, wikis, webpages
  • Security: public vs private (e.g., access-control & 2FA)
    • Private: organizations, teams, assignees
    • Public: social tools, watching, forking

Getting Started with GH Desktop

GitHub Desktop Setup

Issues

This is where the organization of development begins

  • @-mention collaborators
  • Type # followed by an issue number or title to link related issues in the repository
  • Create and assign a new branch for the issue (is the issue closed when the pull request is merged?)

Discussions

Use GitHub Repository Discussions to:

  • ask and answer questions
  • share information
  • make announcements
  • conduct or participate in conversations about a project

Repos

  • remote-repositories
  • cloning vs forking
  • tracking changes
  • commits
  • main/local
  • branches
  • merges
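
Branches and merges from the list above can be tried locally before involving a remote. The following is a minimal sketch assuming a scratch repository; the branch name `methods-section`, the file `manuscript.txt`, and the identity are hypothetical.

```shell
#!/bin/sh
# Sketch: create a branch, commit on it, and merge it back into main.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Workshop Attendee"      # placeholder identity
git config user.email "attendee@example.org"

echo "Introduction" > manuscript.txt
git add manuscript.txt
git commit -q -m "Start manuscript"
git branch -M main                  # name the branch explicitly; defaults vary

# Work on a separate line of development.
git checkout -q -b methods-section
echo "Methods" >> manuscript.txt
git commit -q -a -m "Draft methods"

# Return to main and consolidate the work.
# --no-ff forces a merge commit so the branch stays visible in the history.
git checkout -q main
git merge -q --no-ff -m "Merge methods-section" methods-section

git log --oneline --graph           # shows the branch and the merge commit
total_commits=$(git rev-list --count HEAD)
echo "commits on main: $total_commits"
```

Without `--no-ff`, Git would fast-forward `main` in this case and record no merge commit. Either choice leaves the same file contents.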

Pull-Requests

  • At some point, work from separate branches needs to be consolidated

Writing Manuscripts

Manuscript written on GitHub

Flow Diagrams

Manuscript written on GitHub: history flow diagram

GitHub Organizations, Projects, & Beyond

Teams and Code Review

  • Teams for sub-components of larger projects
  • Teams as super-groups spanning several repos
  • Groups in classrooms: template projects and assignments
  • Packages to arrange members, groups, communication, & organization

Version Control for Open Science

Projects

Webpages and Wikis

  • Built-in GitHub wikis
  • Hosting webpages
  • Hosting slides
  • Version control to other high-bandwidth servers

ROpenSci

Developer Events

Code Sprints

Codespaces

  • Integrated with VS Code
  • Run code from a repo in a containerized environment

Resources