TIL: git and github diff Differently
My team switched over to the SkullCandy git workflow last spring, and for a long time we did not make a new develop branch, because deleting a branch on github automatically closes any open pull requests against it.
So, this week we ripped the band-aid off and remastered develop.
It’s been painful.
I was hoping a pull request from the develop branch into the master branch would tell us the commits on develop that are not in master, so we could sort out the differences.
That pull request did not tell us what we wanted. Instead, it revealed something disturbing: changes that I thought were in both branches appeared not to be there. How is that so??
I ran some experiments to see what was going on. You can see them here.
Replication
This is what I replicated in the repository, which is the workflow used for SkullCandy:
- start a master branch.
- create a develop branch by branching from the master branch.
- when starting a new feature, branch off the develop branch.
- when ready to merge a change into develop, make a pull request into it.
- after the change is in develop and validated, cherry-pick the commit from the branch into master and make a new pull request.
- done.
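The steps above can be sketched locally in a throwaway repository. This is only an approximation: the pull requests are replaced by a local merge and a local cherry-pick, and the branch name feature, the file contents, the commit messages, and the user identity are placeholders rather than the team's real ones.

```shell
#!/bin/sh
# Sketch of the workflow in a temporary repository (all names are illustrative).
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example"
git config user.email "example@example.com"

# 1. start a master branch
echo "start of work stuff" > work_file.txt
git add work_file.txt
git commit -q -m "start of work"

# 2. create develop from master
git checkout -q -b develop

# 3. start a feature from develop
git checkout -q -b feature
echo "work stuff 1" >> work_file.txt
git commit -q -am "work stuff 1"

# 4. merge the feature into develop (stands in for the first pull request)
git checkout -q develop
git merge -q --no-ff -m "Merge feature into develop" feature

# 5. cherry-pick the feature commit onto master (stands in for the second pull request).
#    -x records the source commit in the message, which also guarantees a new SHA
#    even when the whole script runs within the same second.
git checkout -q master
git cherry-pick -x feature >/dev/null

# Same file contents on both branches, but the change lives under two different SHAs.
develop_sha=$(git rev-parse feature)
master_sha=$(git rev-parse master)
content_diff=$(git diff master develop)
echo "commit on develop: $develop_sha"
echo "commit on master:  $master_sha"
```

After this runs, master and develop hold identical files, yet the commit that introduced the change has one SHA on develop and another on master.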
After experimenting with different merge strategies (merge commit, squash commit, rebase commit), I started to notice a pattern: on github, a change on the master branch would be treated as the same change if and only if its commit SHA matched.
When I checked locally, though, git showed no difference between master and the corresponding develop and feature branches.
Example: develop3 and master
Let’s go through an example from the repository:
The master branch has all the work, and its file contents are:
start of work stuff
work stuff 1
work stuff 2
work stuff 3
work stuff 4
work2 changes
more work2 changes
work3 stuff
more work3 stuff
The develop3 branch has the same work in the same file, and its contents are:
start of work stuff
work stuff 1
work stuff 2
work stuff 3
work stuff 4
work2 changes
more work2 changes
work3 stuff
more work3 stuff
Locally
Doing a git diff on the command line produces:
vagrant@ubuntu-xenial:/vagrant$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
vagrant@ubuntu-xenial:/vagrant$ git diff develop3
vagrant@ubuntu-xenial:/vagrant$
On github
When making a Pull Request on github.com, the result is:
diff --git a/work_file.txt b/work_file.txt
index bd6764b..9e7d796 100644
--- a/work_file.txt
+++ b/work_file.txt
@@ -5,3 +5,5 @@ work stuff 3
work stuff 4
work2 changes
more work2 changes
+work3 stuff
+more work3 stuff
which looks pretty much as if the work never existed on master, even though it is there!
https://github.com/a-leung/commit_tests/compare/master...develop3?expand=1
Why does this matter?
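It’s important because git and github compute differences differently. I can’t trust github to be consistent with git, even for a simple change, if the SHAs do not match.
git can recognize the same code arriving under different SHAs; a github pull request depends on the SHAs to compute the differences between branches.
The reason for the difference? A plain git diff between two branches compares the contents of the two branch tips. A github pull request shows a three-dot diff: it compares the head branch against the merge base of the two branches, and the merge base is found through commit ancestry, which means through SHAs. A cherry-picked commit gets a new SHA, so it does not move the merge base forward, and its changes show up in the pull request again.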
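The two behaviours can be reproduced with git alone, since git diff supports both forms: `git diff A B` compares the two tips, while `git diff A...B` compares B against the merge base of A and B, which is the form github pull requests display. The sketch below uses a throwaway repository; the file contents, commit messages, and user identity are placeholders, and only the branch names mirror the example above.

```shell
#!/bin/sh
# Two-dot vs three-dot diff after a cherry-pick (illustrative repository).
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Shared starting point
printf 'work2 changes\nmore work2 changes\n' > work_file.txt
git add work_file.txt
git commit -q -m "work2"

# The change is made on develop3 ...
git checkout -q -b develop3
printf 'work3 stuff\nmore work3 stuff\n' >> work_file.txt
git commit -q -am "work3"

# ... and cherry-picked onto master, where it gets a different SHA
git checkout -q master
git cherry-pick -x develop3 >/dev/null

# Tip-to-tip comparison: the files are identical, so this is empty
two_dot=$(git diff master develop3)

# Merge-base-to-tip comparison: the merge base predates work3, so work3 reappears
three_dot=$(git diff master...develop3)

echo "two-dot diff (local git diff):"
echo "${two_dot:-<no differences>}"
echo "three-dot diff (pull request view):"
echo "$three_dot"
```

The two-dot diff prints nothing, while the three-dot diff lists the work3 lines as additions, matching what the pull request page showed.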
The only difference between the master and develop3 branches is the SHA of the commit that made the change:
On the master branch:
vagrant@ubuntu-xenial:/vagrant$ git blame -s work_file.txt
fabcea4b 1) start of work stuff
fabcea4b 2) work stuff 1
fabcea4b 3) work stuff 2
c92a36c5 4) work stuff 3
c92a36c5 5) work stuff 4
e492f5f3 6) work2 changes
e492f5f3 7) more work2 changes
de94346e 8) work3 stuff
de94346e 9) more work3 stuff
On the develop3 branch:
vagrant@ubuntu-xenial:/vagrant$ git blame -s work_file.txt
fabcea4b 1) start of work stuff
fabcea4b 2) work stuff 1
fabcea4b 3) work stuff 2
c92a36c5 4) work stuff 3
c92a36c5 5) work stuff 4
e492f5f3 6) work2 changes
e492f5f3 7) more work2 changes
88e6fff2 8) work3 stuff
88e6fff2 9) more work3 stuff
So, that’s one area git and github differ!
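For the original goal of finding commits on one branch that are genuinely missing from the other, git has commands that compare commits by the patch they introduce instead of by SHA. A minimal sketch, assuming a throwaway repository with placeholder contents, messages, and identity: `git cherry` marks each commit with `-` when an equivalent patch already exists upstream and `+` when it does not, and `git log --cherry-pick --right-only` lists only the commits without an equivalent.

```shell
#!/bin/sh
# Finding commits that are really missing, ignoring cherry-picked duplicates.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "work2 changes" > work_file.txt
git add work_file.txt
git commit -q -m "work2"

# A change made on develop3 and cherry-picked onto master (different SHA, same patch)
git checkout -q -b develop3
echo "work3 stuff" >> work_file.txt
git commit -q -am "work3 stuff"
git checkout -q master
git cherry-pick -x develop3 >/dev/null

# A second change that exists only on develop3
git checkout -q develop3
echo "develop-only change" >> work_file.txt
git commit -q -am "develop-only change"

# "-" = an equivalent patch is already on master, "+" = really missing from master
report=$(git cherry -v master develop3)
echo "$report"

# Only the commits on develop3 that have no patch-equivalent commit on master
missing=$(git log --oneline --cherry-pick --right-only master...develop3)
echo "$missing"
```

Here the work3 commit is reported with `-` and left out of the log output, while the develop-only commit is reported with `+` and listed as missing.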
Lesson Learned
We have to adjust our workflow to the ways git and github treat differences in code. It’s a subtle difference, but it has big consequences: we cannot rely on the tooling to help us, which adds work (that is not value add!).
For now, I will be remastering the develop branch more frequently.