Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have the following alias to show me the commit history of any given file:

file-history = log --follow --date-order --date=short -C

It works well, but it never shows merge commits, even though the file may have been modified on a branch that we merged into main, for example.

enter image description here

The solution is to add the option -m, but then it shows many, many merge commits, most of which seem unrelated to the commit history of the file.

What is the right way to write such an alias so that it behaves correctly (like BitBucket, for that matter), showing all commits that changed a file, and only those?

EXTRA INFORMATION --

Using -m shows way too many commits; concretely:

enter image description here

(In red rectangles, what I should see... that's what BitBucket displays...)

(BTW, I don't understand why the commit da3c94a1 is duplicated.)

Using -c shows even more commits (the first commit that should be reported is at the bottom of the page) and displays the diffs (which I don't want to see here):

enter image description here

Same results for --cc:

enter image description here

And --first-parent shows weird results (as I don't see at all the commits I'm interested in):

enter image description here

NEW EXTRA INFORMATION --

And, with --first-parent -m, no change:

enter image description here

ANSWER TO TOREK --

To make things simpler, I've created the following test repo:

    master    master
     C--D      I--J
    /    \    /    \
A--B      G--H      M--N  master
    \    /    \    /
     E--F      K--L
     br1       br2

where I merged br1 and br2 into master.

I've created commits which only changed one file at a time.

Commits which changed file1 (only):

  • A
  • C
  • F
  • I
  • L

Commits which changed file2 (only):

  • B
  • D
  • E
  • H
  • J
  • K
  • N

Commits which changed both files:

  • G (the merge of br1 onto master)
  • M (the merge of br2 onto master)

Let's begin with the tests:

$ git log --decorate --date=short
2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 7ae0238 (br2) Commit L
2021-11-05 affed14 Commit K
2021-11-05 ecd490f Commit J
2021-11-05 ca2e68f Commit I
2021-11-05 45d8231 Commit H
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 9aaa030 (br1) Commit F
2021-11-05 552a403 Commit E
2021-11-05 86a71ff Commit D
2021-11-05 611bef2 Commit C
2021-11-05 eceafb8 Commit B
2021-11-05 e137033 Initial commit

You know what? I was expecting to see this instead:

2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 ecd490f Commit J
2021-11-05 ca2e68f Commit I
2021-11-05 45d8231 Commit H
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 86a71ff Commit D
2021-11-05 611bef2 Commit C
2021-11-05 eceafb8 Commit B
2021-11-05 e137033 Initial commit

That is, I was expecting to see neither commits E and F from br1 nor K and L from br2. So it seems I don't understand everything...

Now, let's look at the file history of file2.txt... Both GitHub and BitBucket -- I've tested both of them -- show me the following commits (and only those) when asked to display the history of the file:

  • B
  • D
  • E
  • G
  • H
  • J
  • K
  • M
  • N

This is 1 of the 2 results I would have expected -- the other one being the same without commits E and K, as I could have thought they would be hidden (as being part of branches, not committed on master).

Now, let's play with some "file history" commands:

$ git log --follow --date-order --date=short -C file2.txt
2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 affed14 Commit K
2021-11-05 ecd490f Commit J
2021-11-05 45d8231 Commit H
2021-11-05 552a403 Commit E
2021-11-05 86a71ff Commit D
2021-11-05 eceafb8 Commit B

$ git log --follow --date-order --date=short -C -m file2.txt
2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 affed14 Commit K
2021-11-05 ecd490f Commit J
2021-11-05 45d8231 Commit H
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 552a403 Commit E
2021-11-05 86a71ff Commit D
2021-11-05 eceafb8 Commit B

$ git log --follow --date-order --date=short -C -c -s file2.txt
2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 affed14 Commit K
2021-11-05 ecd490f Commit J
2021-11-05 45d8231 Commit H
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 552a403 Commit E
2021-11-05 86a71ff Commit D
2021-11-05 eceafb8 Commit B

$ git log --follow --date-order --date=short -C --cc -s file2.txt
2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 affed14 Commit K
2021-11-05 ecd490f Commit J
2021-11-05 45d8231 Commit H
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 552a403 Commit E
2021-11-05 86a71ff Commit D
2021-11-05 eceafb8 Commit B

$ git log --follow --date-order --date=short -C -m --first-parent file2.txt
2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 ecd490f Commit J
2021-11-05 45d8231 Commit H
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 86a71ff Commit D
2021-11-05 eceafb8 Commit B

$ git log --follow --date-order --date=short -C --cc --full-history -s file2.txt
2021-11-05 d670be5 (HEAD -> master, origin/master, origin/HEAD) Commit N
2021-11-05 838f65c Merge branch 'br2' (Commit M)
2021-11-05 affed14 Commit K
2021-11-05 ecd490f Commit J
2021-11-05 45d8231 Commit H
2021-11-05 eb356b8 Merge branch 'br1'
2021-11-05 552a403 Commit E
2021-11-05 86a71ff Commit D
2021-11-05 eceafb8 Commit B

Let's analyse the results, one by one:

$ git log --follow --date-order --date=short -C file2.txt

does not show the merge commits. Incomplete results. Failure, then.

$ git log --follow --date-order --date=short -C -m file2.txt

does show all commits where file2.txt has been changed, but duplicates the merge commits. Partial failure...

$ git log --follow --date-order --date=short -C -c -s file2.txt

and

$ git log --follow --date-order --date=short -C --cc -s file2.txt

both show the 9 commits (7 "normal" + 2 merge) where file2.txt has been changed. Same results as on BitBucket and GitHub.

$ git log --follow --date-order --date=short -C -m --first-parent file2.txt

shows all commits on master where file2.txt has been changed, and the merge commits. Could be the other expected results I had, but not the same as BitBucket and GitHub. Let's discard it, then.

$ git log --follow --date-order --date=short -C --cc --full-history -s file2.txt

also shows the 9 commits.

So, the commands that give the same (complete) results as the ones from GitHub and BitBucket are:

$ git log --follow --date-order --date=short -C -c -s file2.txt
$ git log --follow --date-order --date=short -C --cc -s file2.txt
$ git log --follow --date-order --date=short -C --cc --full-history -s file2.txt

Coming back to my request, which I may have expressed badly: I want to see all commits that changed a given file, so that I can see which other files were changed in the same commits, and thereby discover the list of files I have to change for a specific functional request.

Based on my real-world example, it seems that BitBucket correctly identified those commits and that my file-history alias(es) did not: they showed too few commits, too many commits, or even inappropriate ones...

Coming back to that real-world example, the following commands:

$ git log --follow --date-order --date=short -C -c -s 32-factures-creation.R | wc -l
$ git log --follow --date-order --date=short -C --cc -s 32-factures-creation.R | wc -l
$ git log --follow --date-order --date=short -C --cc --full-history -s 32-factures-creation.R | wc -l

all return 440 lines:

2021-10-18 d5590007 Merge branch 'master' of https://bitbucket.org/.../...
2021-10-18 6ccde740 Merge branch 'master' of https://bitbucket.org/.../...
2021-10-06 9d532874 Merge branch 'indexation-RMMMG-09-2021' into release/21.10
2021-10-04 d982c3d8 Merge branch 'indexation-RMMMG-09-2021' into release/21.10
2021-10-04 0a65134f Merge branch 'indexation-RMMMG-09-2021' into release/21.10
2021-10-02 728897b9 Merge branch 'indexation-RMMMG-09-2021' into release/21.10
2021-09-30 0df507b9 Simplify SQL expression in 32-factures-creation.R
2021-09-30 16f94a10 Update format of prsAnneeMois
2021-09-29 f9a6cafb Update "Facturation à l'employeur"
2021-10-02 22ef1194 Merge branch 'feature/103-upgrade-...-logo' into release/21.10
2021-09-20 9a2244d3 (tag: xxx_21-10-20_23-01-50, tag: sh_21-10-20_22-56-11, tag: sh_21-10-20_22-54-54, tag: 2021.10.20_23.04_xxx) Merge branch 'master' of https://bitbucket.org/mc.../...
2021-09-20 9fa77b1e Merge branch 'new-new-augm-eff'
2021-07-02 b4538cce Merge branch 'new-augm-eff' into release/21.07
2021-07-02 20c72364 (tag: 2021.07.01) Merge branch 'master' of https://bitbucket.org/.../...
...

That's way more than what I see on BitBucket:

2021-09-30 0df507b9 Simplify SQL expression in 32-factures-creation.R
2021-09-30 16f94a10 Update format of prsAnneeMois
2021-09-29 f9a6c


1 Answer

You asked specifically about looking for merge commits in your output. I think now, based on all the comments under the question, this was a mistake: you don't want the merge commits at all, even if they do change the file in question. What you want is to stop git log from performing History Simplification.

To do that, simply provide the --full-history flag to git log. But it's also important to know what this flag means: in particular, I don't think you understand what Git is trying to show you here (which is not surprising, as Git documentation does a terrible job of explaining what Git is trying to do in the first place).

To get to the aha! moment, we have to start with a simple review of things you probably already know but may have shoved into the back of your mind and forgotten about:

  • Git is all about commits, and each commit is a numbered entity, found by its big ugly random-looking hash ID;
  • each commit stores a snapshot and some metadata, and the metadata include the raw hash ID of some set of earlier commits; and
  • most commits store just one previous commit hash ID.

This makes commits form simple backwards-looking chains. Let's use simple uppercase letters as pretend hash IDs, and allocate them sequentially to make things easy for our puny human brains, and imagine we have a repository that ends with a commit with hash ID H, like this:

A <-B <-C ... <-F <-G <-H

That is, the last (and therefore latest) commit in this repository is commit H. Commit H stores both a full snapshot of every file and a backwards-pointing arrow to (really, the true hash ID of) the earlier commit G.

Using the stored snapshot in G and the stored snapshot in H, Git can compare the two snapshots. Whatever is different here, those are the files we changed; by comparing those files, Git can produce a diff, showing the particular lines we changed, or Git can just make a list of the files that we changed. That's pretty straightforward, but it does mean that to know what changed in H, Git must extract both snapshots: the one from H, but also the one from its parent G.

The git log command will do this for H, then move back one step to G. Now, to see what changed in G, Git must compare the snapshot of its parent F to the snapshot in G. That suffices for knowing what changed in G.

Now git log can step backwards yet again. This repeats as needed, until we have run all the way back to the very first commit, which—by definition—simply adds all the files it has in its snapshot. There's nothing before the root commit A, so everything is new, and now git log can stop.
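Here is a minimal sketch of that walk, assuming a throwaway repository created with mktemp. The file names a.txt and b.txt and the commit messages are invented for illustration. It compares a commit against its parent to list the changed files, then does the same for the root commit, which has no parent to compare against:

```shell
set -eu
repo=$(mktemp -d)            # scratch repository, safe to delete afterwards
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # fix the branch name regardless of local defaults
git config user.name "Example User"     # placeholder identity for the demo
git config user.email "user@example.com"

# Root commit A: adds both files
echo one > a.txt
echo one > b.txt
git add a.txt b.txt
git commit -q -m "A: add a.txt and b.txt"

# Commit B: changes only a.txt
echo two >> a.txt
git commit -q -am "B: change a.txt"

# What changed in B: compare the parent's snapshot with B's snapshot
changed_in_b=$(git diff --name-only HEAD~1 HEAD)

# What changed in root commit A: there is no parent, so every file is new
changed_in_a=$(git diff-tree --root --no-commit-id --name-only -r HEAD~1)

echo "changed in B: $changed_in_b"
echo "changed in A:"
echo "$changed_in_a"
```

This is the same comparison that git log --name-only performs for each commit as it steps backwards.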

Merges mess with this

That works fine for these simple linear chains, but Git's commits are not always simple linear chains. Suppose we have our simple-so-far repository, where there is only one branch named main and it ends at H, but now we make some new branch names, make some commits on these new branches, and get ready to merge them:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

Commits up through H are on all branches, while commits I-J are only on br1 and commits K-L are only on br2. Using git log at this point shows us J, then I, then H, then G, etc., following the arrows backwards from br1's latest commit; or, it shows us L, then K, then H, then G, etc., following the arrows backwards from br2's latest commit.

Git will of course find file "changes" in the usual way: compare the snapshot in L vs that in K, or K vs H, etc. Since every commit has exactly one parent commit, this works fine.

Once we merge, however, we have a problem. The merge itself works by:

  • comparing H vs J to see what changed on br1;
  • comparing H vs L to see what changed on br2; and
  • combining these changes, and applying the combined changes to the snapshot in H.

This keeps "our" changes on br1 and adds "their" changes on br2, if that's the direction we're doing the merge. Or, it keeps "our" changes on br2 and adds "their" changes on br1. Either way the result is the same (except for conflict resolutions, if any, which depend on how we choose to resolve the conflict).

We now have Git make a new merge commit, M, which has:

  • one snapshot, but
  • two parents.

It looks like this:

          I--J
         /    \
...--G--H      M
         \    /
          K--L

I have taken the labels away because at this point we often do that: M is now the latest main commit instead, and when we add another new commit N it just extends main:

          I--J
         /    \
...--G--H      M--N
         \    /
          K--L

N is an ordinary single parent commit as usual, so the niceness of comparing the snapshot in M vs that in N works as usual, finding the changes as usual.
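If you want to see the two-parent structure for yourself, the following sketch rebuilds the graph above in a scratch repository and asks Git for each commit's parents. The helper commit_file, the file names, and the single-letter commit messages are assumptions made for the demo, not anything from your repository:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: append a line to a file and commit it
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt H
git branch br1
git branch br2
git checkout -q br1
commit_file top.txt I
commit_file top.txt J
git checkout -q br2
commit_file bottom.txt K
commit_file bottom.txt L
git checkout -q main
git merge -q --ff-only br1     # main moves up to J, no new commit
git merge -q -m M br2          # true merge: M gets parents J and L
commit_file base.txt N

# %P prints the parent hash IDs of a commit, separated by spaces
parents_of_m=$(git show -s --format=%P HEAD~1 | wc -w | tr -d ' ')
parents_of_n=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')

echo "M has $parents_of_m parents, N has $parents_of_n"
echo "first parent of M:  $(git log -n 1 --format=%s HEAD~1^1)"
echo "second parent of M: $(git log -n 1 --format=%s HEAD~1^2)"
```

The suffixes ^1 and ^2 select the first and second parent of a merge, which is how you tell Git which side you mean.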

Merge commit M, on the other hand, is quite thorny. How should git log show the changes? Changes, in Git, require that we look at "the" parent commit. But M does not have the parent. M has two parents, J and L. Which one should we use?

The -m flag means run two separate git diff operations, one against J, and then a second one against L. That way we'll see what changed vs J, i.e., what we brought in via K-L, and then we'll also see what changed vs L, i.e., what we brought in via I-J.
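This is also why a merge can show up twice in -m output. A small sketch, assuming a scratch repository where br1 only touches top.txt and br2 only touches bottom.txt (all names here are invented for the demo): the two manual diffs are what -m runs for you, and git log prints the merge once per parent whose diff is not empty.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: append a line to a file and commit it
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt "H"
git branch br1
git branch br2
git checkout -q br1
commit_file top.txt "J"
git checkout -q br2
commit_file bottom.txt "L"
git checkout -q main
git merge -q --ff-only br1
git merge -q -m "M: merge br2" br2    # parents: J (first), L (second)

# The two diffs that -m performs, done by hand
vs_first=$(git diff --name-only HEAD^1 HEAD)    # vs J: what br2 brought in
vs_second=$(git diff --name-only HEAD^2 HEAD)   # vs L: what br1 brought in

# Count how many times the merge's subject line is printed
entries_plain=$(git log -n 1 --format=%s --name-only | grep -c '^M')
entries_m=$(git log -n 1 -m --format=%s --name-only | grep -c '^M')

echo "vs first parent:  $vs_first"
echo "vs second parent: $vs_second"
echo "merge printed $entries_plain time(s) without -m, $entries_m time(s) with -m"
```

Without -m, git log lists the merge once and shows no files for it; with -m, it lists the merge once for each parent it differs from.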

Adding --first-parent means follow just one of these lines so that at M we'll see, e.g., what happened in K-L, but then we won't look at K or L at all any more. We'll just move back to J. The effect is that Git pretends, for the duration of -m --first-parent, that the commit graph looks like this:

...--G--H--I--J--M--N

This is, more or less, literally what you asked for—but it's not what Bitbucket is doing.
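To see that pretend-linear view in practice, here is a sketch on the same kind of scratch repository (the helper commit_file, the file names, and the one-letter commit messages are assumptions for the demo). It compares the full log with the --first-parent log, and then looks at what -m --first-parent reports for the merge itself:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: append a line to a file and commit it
commit_file() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file base.txt H
git branch br1
git branch br2
git checkout -q br1
commit_file top.txt I
commit_file top.txt J
git checkout -q br2
commit_file bottom.txt K
commit_file bottom.txt L
git checkout -q main
git merge -q --ff-only br1
git merge -q -m M br2          # first parent J, second parent L
commit_file base.txt N

# Every commit reachable from main
all_count=$(git log --format=%s | wc -l | tr -d ' ')

# Only the first-parent chain: K and L are never visited
first_parent=$(git log --first-parent --format=%s | paste -sd ' ' -)

# The merge M, diffed against its first parent only
merge_view=$(git log -n 1 -m --first-parent --format=%s --name-only HEAD~1)

echo "all commits: $all_count"
echo "first-parent chain: $first_parent"
echo "$merge_view"
```

At M you still see bottom.txt, the file that came in through K-L, but K and L themselves never appear.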

Undoing the merge mess several other ways

We can, if we so choose, have git log compare M vs both J and L—i.e., make two separate git diffs—but then discard most of the results of these two diffs. Git has two different "combined diff" modes, which you can get with -c or --cc.

Unfortunately, neither one does what you want. They're also rather difficult to explain (and I still don't really know what the true difference between the two is, though they are demonstrably different: I can show some differences, but I don't know what the goals are, of the two different options).

History Simplification

The real key here though is this. Suppose there is some file F that appears in all three commits M, J, and L. Remember, this particular snippet of our picture looks like this:

       I--J
      /    \
...--H      M
      \    /
       K--L

  • If F is the same in all three commits, it's not "interesting" in this merge. Nobody made any changes to it.
  • If F matches in J vs M, but is different in L vs M, then "something interesting" happened. The same is true if F matches in L vs M, but is different in J vs M.

What git log does in most cases here is to try to find out about the final state of the file. Why does file F look the way it does in M? But think about this: If F differs in J vs M but matches in L vs M, then anything we did to the file along the top row is irrelevant! We threw away the top-row copy of file F and kept only the bottom-row copy.

So, if you're asking git log about file F at this point, git log simply does not bother to look at commits I-J. It follows only the bottom row.

On the other hand, if F exactly matches in J-vs-M but differs in L-vs-M, git log -- F will follow only the top row, because we threw away anything that came out of the bottom row.

This is History Simplification in a nutshell. The git log command will, at merge points, throw out one "side" of the merge entirely if it can. If the file(s) we care about match one side, that's the side git log will pick. If the file(s) we care about match all sides of the merge, git log will pick one side at random, and follow that side.

This means git log never even looks at any of the files on the "other side" of the merge so you will not see any of those commits in the git log output. The program is assuming that since the merge took "one side" over the other, that's the interesting side, and everything that might show up on the other is irrelevant dross, to be discarded.
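You can reproduce this in a scratch repository. The sketch below is only an illustration under stated assumptions: the file F.txt, the extra file other.txt, and the commit messages are invented. The top row edits F.txt and then undoes the edit, so the merge ends up with the bottom row's copy of the file. Note that it uses a plain path argument without --follow.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo base > F.txt
git add F.txt
git commit -q -m "H: add F.txt"
git branch br1
git branch br2

# Top row: change F.txt, then put it back the way it was
git checkout -q br1
echo top >> F.txt
git commit -q -am "I: edit F.txt on br1"
echo base > F.txt
git commit -q -am "J: undo that edit on br1"

# Bottom row: one real change to F.txt, one unrelated commit
git checkout -q br2
echo bottom >> F.txt
git commit -q -am "K: edit F.txt on br2"
echo note > other.txt
git add other.txt
git commit -q -m "L: add other.txt on br2"

git checkout -q main
git merge -q --ff-only br1
git merge -q -m "M: merge br2" br2   # F.txt in M matches L, differs from J

echo "default (simplified) history of F.txt:"
git log --format=%s -- F.txt
echo "with --full-history:"
git log --full-history --format=%s -- F.txt

simplified=$(git log --format=%s -- F.txt | wc -l | tr -d ' ')
full=$(git log --full-history --format=%s -- F.txt | wc -l | tr -d ' ')
```

In the default run, commits I and J are missing even though both of them touched F.txt; adding --full-history brings them back, along with the merge M, because M differs from one of its parents.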

This is sometimes what you want

The reason git log does this kind of history simplification is that it assumes your goal is to know why the file looks the way it does in the latest version. Any irrelevant-dross commits that got thrown out don't matter, so let's not even look at them.

When that's what you want, that's what you want! But sometimes you want to know: I'm sure I changed this myself, where was that? or something similar. Here, you must tell git log not to do history simplification at all. The flag for this is --full-history. There are other history simplification flags, so that you can control the simplification: it is useful after all. Read through the git log<

