Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I read that Git uses a SHA-1 digest as the ID for a revision. Why does it not use a more modern version of SHA?

1 Answer

Why does it not use a more modern version of SHA?

Dec. 2017: It will. And Git 2.16 (Q1 2018) is the first release to illustrate and implement that intent.

Note: see Git 2.19 below: it will be SHA-256.

Git 2.16 will propose an infrastructure to define what hash function is used in Git, and will start an effort to plumb that throughout various codepaths.

See commit c250e02 (28 Nov 2017) by Ramsay Jones.
See commit eb0ccfd, commit 78a6766, commit f50e766, commit abade65 (12 Nov 2017) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 721cc43, 13 Dec 2017)


Add structure representing hash algorithm

Since in the future we want to support an additional hash algorithm, add a structure that represents a hash algorithm and all the data that must go along with it.
Add a constant to allow easy enumeration of hash algorithms.
Implement function typedefs to create an abstract API that can be used by any hash algorithm, and wrappers for the existing SHA1 functions that conform to this API.

Expose a value for hex size as well as binary size.
While one will always be twice the other, the two values are both used extremely commonly throughout the codebase and providing both leads to improved readability.

Don't include an entry in the hash algorithm structure for the null object ID.
As this value is all zeros, any suitably sized all-zero object ID can be used, and there's no need to store a given one on a per-hash basis.

The current hash function transition plan envisions a time when we will accept input from the user that might be in SHA-1 or in the NewHash format.
Since we cannot know which the user has provided, add a constant representing the unknown algorithm to allow us to indicate that we must look the correct value up.
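
The structure described in this commit message can be pictured with a minimal Python sketch. This is not Git's C code: the names `HashAlgo`, `HASH_UNKNOWN`, `HASH_SHA1`, `HASH_NALGOS`, `hash_bytes`, and `null_oid` are hypothetical, chosen only to mirror the ideas above (an enumerable table, both binary and hex sizes, an abstract hashing entry point, an "unknown" slot, and no stored null object ID).

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical identifiers for this sketch only.
HASH_UNKNOWN = 0   # "we must look the correct value up"
HASH_SHA1 = 1
HASH_NALGOS = 2    # count of entries, for easy enumeration


@dataclass(frozen=True)
class HashAlgo:
    name: str
    rawsz: int                                # digest size in bytes
    hexsz: int                                # digest size in hex characters
    new_ctx: Optional[Callable[[], object]]   # abstract "init" function


HASH_ALGOS = [
    HashAlgo(name="", rawsz=0, hexsz=0, new_ctx=None),          # unknown
    HashAlgo(name="sha-1", rawsz=20, hexsz=40, new_ctx=hashlib.sha1),
]


def hash_bytes(algo: HashAlgo, data: bytes) -> bytes:
    """Hash data through the abstract API, whatever the algorithm is."""
    if algo.new_ctx is None:
        raise ValueError("unknown hash algorithm")
    ctx = algo.new_ctx()
    ctx.update(data)
    return ctx.digest()


def null_oid(algo: HashAlgo) -> bytes:
    """The null object ID is all zeros, so it is derived, not stored."""
    return bytes(algo.rawsz)
```

Callers would index `HASH_ALGOS` rather than hard-code 20 or 40, so adding a second entry would not require touching them.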


Integrate hash algorithm support with repo setup

In future versions of Git, we plan to support an additional hash algorithm.
Integrate the enumeration of hash algorithms with repository setup, and store a pointer to the enumerated data in struct repository.
Of course, we currently only support SHA-1, so hard-code this value in read_repository_format.
In the future, we'll enumerate this value from the configuration.

Add a constant, the_hash_algo, which points to the hash_algo structure pointer in the repository global.
Note that this is the hash which is used to serialize data to disk, not the hash which is used to display items to the user.
The transition plan anticipates that these may be different.
We can add an additional element in the future (say, ui_hash_algo) to provide for this case.
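
As a rough illustration of the repository integration described above, here is a small Python sketch. It only models the idea; `Repository`, `the_repository`, and the Python function `the_hash_algo()` are stand-ins written for this example, and the real code is C inside Git.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HashAlgo:
    name: str
    rawsz: int   # digest size in bytes
    hexsz: int   # digest size in hex characters


SHA1 = HashAlgo("sha-1", 20, 40)


@dataclass
class Repository:
    # Pointer to the hash algorithm data, filled in during setup.
    hash_algo: Optional[HashAlgo] = None


the_repository = Repository()   # stand-in for the repository global


def read_repository_format(repo: Repository) -> None:
    # Only SHA-1 is supported at this stage, so the value is hard-coded.
    # A later version would take it from the configuration instead.
    repo.hash_algo = SHA1


def the_hash_algo() -> HashAlgo:
    # The hash used to serialize data to disk, not one used for display.
    if the_repository.hash_algo is None:
        raise RuntimeError("repository has not been set up")
    return the_repository.hash_algo
```

After `read_repository_format(the_repository)`, code that needs a size would ask `the_hash_algo().hexsz` instead of assuming 40.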


Update August 2018: with Git 2.19 (Q3 2018), Git picks SHA-256 as NewHash.

See commit 0ed8d8d (04 Aug 2018) by Jonathan Nieder (artagnon).
See commit 13f5e09 (25 Jul 2018) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 34f2297, 20 Aug 2018)

doc hash-function-transition: pick SHA-256 as NewHash

From a security perspective, it seems that SHA-256, BLAKE2, SHA3-256, K12, and so on are all believed to have similar security properties.
All are good options from a security point of view.

SHA-256 has a number of advantages:

  • It has been around for a while, is widely used, and is supported by just about every single crypto library (OpenSSL, mbedTLS, CryptoNG, SecureTransport, etc).

  • When you compare against SHA1DC, most vectorized SHA-256 implementations are indeed faster, even without acceleration.

  • If we're doing signatures with OpenPGP (or even, I suppose, CMS), we're going to be using SHA-2, so it doesn't make sense to have our security depend on two separate algorithms when either one of them alone could break the security when we could just depend on one.

So SHA-256 it is.
Update the hash-function-transition design doc to say so.

After this patch, there are no remaining instances of the string "NewHash", except for an unrelated use from 2008 as a variable name in t/t9700/test.pl.
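
To see what the choice changes in practice, here is a short Python sketch of how an object ID is derived: Git hashes a small header (`<type> <size>` followed by a NUL byte) together with the content. The helper name `object_id` is made up for this example, and applying the same header framing under SHA-256 is an assumption of the sketch rather than something stated in the commit message.

```python
import hashlib


def object_id(kind: str, content: bytes, algo: str = "sha1") -> str:
    """Hash '<kind> <size>\\0' + content and return the hex digest."""
    header = f"{kind} {len(content)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


empty_sha1 = object_id("blob", b"")              # 40 hex characters
empty_sha256 = object_id("blob", b"", "sha256")  # 64 hex characters
```

With SHA-1, the empty blob yields the familiar `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`. The SHA-256 ID is 64 hex characters long, which is why hard-coded lengths of 40 have to go.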


You can see this transition to SHA-256 in progress with Git 2.20 (Q4 2018):

See commit 0d7c419, commit dda6346, commit eccb5a5, commit 93eb00f, commit d8a3a69, commit fbd0e37, commit f690b6b, commit 49d1660, commit 268babd, commit fa13080, commit 7b5e614, commit 58ce21b, commit 2f0c9e9, commit 825544a (15 Oct 2018) by brian m. carlson (bk2204).
See commit 6afedba (15 Oct 2018) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit d829d49, 30 Oct 2018)

replace hard-coded constants

Replace several 40-based constants with references to GIT_MAX_HEXSZ or the_hash_algo, as appropriate.
Convert all uses of the GIT_SHA1_HEXSZ to use the_hash_algo so that they are appropriate for any given hash length.
Instead of using a hard-coded constant for the size of a hex object ID, switch to use the computed pointer from parse_oid_hex that points after the parsed object ID.
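
The last point, using the position returned by the parser rather than a fixed offset, can be sketched in Python as follows. This `parse_oid_hex` is a simplified stand-in that borrows the name from the commit message; its signature (taking `hexsz` explicitly and returning the remainder of the line) is an assumption of this example, not Git's C signature.

```python
import string
from typing import Tuple


def parse_oid_hex(line: str, hexsz: int) -> Tuple[bytes, str]:
    """Parse a hex object ID of length hexsz from the start of line.

    Returns the binary ID and the rest of the line after it, so the
    caller never needs to know how long the ID was.
    """
    hex_part = line[:hexsz]
    if len(hex_part) != hexsz or any(c not in string.hexdigits for c in hex_part):
        raise ValueError("not a valid object ID")
    return bytes.fromhex(hex_part), line[hexsz:]


# Instead of slicing with a hard-coded 40 (line[40:]), take the remainder
# that the parser hands back.
oid, rest = parse_oid_hex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 refs/heads/main", 40)
```

The same call works unchanged for a 64-character ID when `hexsz` comes from the active hash algorithm.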

GIT_SHA1_HEXSZ is further removed/replaced with Git 2.22 (Q2 2019) and commit d4e568b.


That transition continues with Git 2.21 (Q1 2019), which adds a SHA-256 hash and plugs it through the code to allow building Git with the "NewHash".

See commit 4b4e291, commit 27dc04c, commit 13eeedb, commit c166599, commit 37649b7, commit a2ce0a7, commit 50c817e, commit 9a3a0ff, commit 0dab712, commit 47edb64 (14 Nov 2018), and commit 2f90b9d, <

