Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have a database where 2 (or maybe 3 or 4) different applications are inserting information. The new information has IDs of the type GUID/UUID, but each application is using a different algorithm to generate the IDs. For example, one is using the NHibernate's "guid.comb", other is using the SQLServer's NEWID(), other might want to use .NET's Guid.NewGuid() implementation.

Is there an above normal risk of ID collision or duplicates?

Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
439 views
Welcome To Ask or Share your Answers For Others

1 Answer

The risk of collisions is elevated slightly but still vanishingly small. Consider that:

  • Both Comb and NEWID/NEWSEQUENTIALID include a timestamp with precision down to a few ms?. Thus, unless you are generating a large number of IDs at the exact same moment time from all of these different sources, it is literally impossible for IDs to collide.

  • The part of the GUID that isn't based on the timestamp can be thought of as random; most GUID algorithms base these digits on a PRNG. Thus, the likelihood of a collision between these other 10 bytes or so is on the same order as if you used two separate random number generators and watched for collisions.

    Think about this for a moment - PRNGs can and do repeat numbers, so the likelihood of a collision between two of them isn't significantly higher than a collision using just one of them, even if they use slightly different algorithms. It's sort of like playing the same lottery numbers every week vs. picking a random set every week - the odds of winning are exactly the same either way.

Now, keep in mind that when you use an algorithm like Guid.Comb, you only have 10 bits of uniqueifier, which equates to 1024 separate values. So if you're generating a huge number of GUIDs within the same few milliseconds, you will get collisions. But if you generate GUIDs at a fairly low frequency, it doesn't really matter how many different algorithms you use at the same time, the likelihood of a collision is still practically nonexistent.

The best way for you to be absolutely certain is to run a test; have all 2 or 3 (or however many you use) generating GUIDs, at the same time, at regular intervals, and write them out to a log file, and see if you get collisions (and if so, how many). That should give you a good idea of how safe this is in practice.

P.S. If you're using NHibernate's comb generator to generate GUIDs for a clustered primary key, consider using NEWSEQUENTIALID() instead of NEWID() - the whole point of Comb is to avoid page splits, and you're not accomplishing that if you have other processes using non-sequential algorithms. You should also change any code using Guid.NewGuid to use the same Comb generator - the actual Comb algorithm used in NHibernate is not complicated and easy to duplicate in your own domain logic.

? Note that there seems to be some dispute about NEWID, and whether or not it contains a timestamp. In any case, since it is based on the MAC address, the range of possible values is considerably smaller than a V4 GUID or a Comb. Further reason for me to recommend sticking to Comb GUIDs outside the database and NEWSEQUENTIALID inside the database.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...