Note: The following SO questions are related, but neither they nor the linked resources seem to fully answer my questions, particularly in relation to implementing equality tests for collections of objects.
- Best practices for overriding -isEqual: and -hash
- Techniques for implementing -hash on mutable Cocoa objects
Background
NSObject provides default implementations of -hash
(which returns the address of the instance, like (NSUInteger)self
) and -isEqual:
(which returns NO
unless the addresses of the receiver and the parameter are identical). These methods are designed to be overridden as necessary, but the documentation makes it clear that you should provide both or neither. Further, if -isEqual:
returns YES
for two objects, then the result of -hash
for those objects must be the same. If not, problems can ensue when objects that should be the same — such as two string instances for which -compare:
returns NSOrderedSame
— are added to a Cocoa collection or compared directly.
Context
I develop CHDataStructures.framework, an open-source library of Objective-C data structures. I have implemented a number of collections, and am currently refining and enhancing their functionality. One of the features I want to add is the ability to compare collections for equality with another.
Rather than comparing only memory addresses, these comparisons should consider the objects present in the two collections (including ordering, if applicable). This approach has quite a precedent in Cocoa, and generally uses a separate method, including the following:
-[NSArray isEqualToArray:]
-[NSDate isEqualToDate:]
-[NSDictionary isEqualToDictionary:]
-[NSNumber isEqualToNumber:]
-[NSSet isEqualToSet:]
-[NSString isEqualToString:]
-[NSValue isEqualToValue:]
I want to make my custom collections robust to tests of equality, so they may safely (and predictably) be added to other collections, and allow others (like an NSSet) to determine whether two collections are equal/equivalent/duplicates.
Problems
An -isEqualTo...:
method works great on its own, but classes which define these methods usually also override -isEqual:
to invoke [self isEqualTo...:]
if the parameter is of the same class (or perhaps subclass) as the receiver, or [super isEqual:]
otherwise. This means the class must also define -hash
such that it will return the same value for disparate instances that have the same contents.
In addition, Apple's documentation for -hash
stipulates the following: (emphasis mine)
"If a mutable object is added to a collection that uses hash values to determine the object's position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object's internal state information or you must make sure the object's internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)"
Edit: I definitely understand why this is necessary and totally agree with the reasoning — I mentioned it here to provide additional context, and skirted the topic of why it's the case for the sake of brevity.
All of my collections are mutable, and the hash will have to consider at least some of the contents, so the only option here is to consider it a programming error to mutate a collection stored in another collection. (My collections all adopt NSCopying, so collections like NSDictionary can successfully make a copy to use as a key, etc.)
It makes sense for me to implement -isEqual:
and -hash
, since (for example) an indirect user of one of my classes may not know the specific -isEqualTo...:
method to call, or even care whether two objects are instances of the same class. They should be able to call -isEqual:
or -hash
on any variable of type id
and get the expected result.
Unlike -isEqual:
(which has access to two instances being compared), -hash
must return a result "blindly", with access only to the data within a particular instance. Since it can't know what the hash is being used for, the result must be consistent for all possible instances that should be considered equal/identical, and must always agree with -isEqual:
. (Edit: This has been debunked by the answers below, and it certainly makes life easier.) Further, writing good hash functions is non-trivial — guaranteeing uniqueness is a challenge, especially when you only have an NSUInteger (32/64 bits) in which to represent it.
Questions
- Are there best practices when implementing equality comparisons
-hash
for collections? - Are there any peculiarities to plan for in Objective-C and Cocoa-esque collections?
- Are there any good approaches for unit testing
-hash
with a reasonable degree of confidence? - Any suggestions on implementing
-hash
to agree with-isEqual:
for collections containing elements of arbitrary types? What pitfalls should I know about? (Edit: Not as problematic as I first thought — as @kperryua points out, "equal-hash
values do not imply-isEqual:
".)
Edit: I should have clarified that I'm not confused about how to implement -isEqual: or -isEqualTo...: for collections, that's straightforward. I think my confusion stemmed mainly from (mistakenly) thinking that -hash MUST return a different value if -isEqual: returns NO. Having done cryptography in the past, I was thinking that hashes for different values MUST be different. However, the answers below made me realize that a "good" hash function is really about minimizing bucket collisions and chaining for collections that use -hash
. While unique hashes are preferable, they are not a strict requirement.