marinbenc

Thinking from Scratch with Sets in Swift

Habits are actions we do without spending too much time thinking about them. When you have done something a bunch of times, you can do it faster and without much thought. This is what makes you a better, more efficient and more disciplined programmer.

But sometimes habits narrow your field of vision. We try to fit everything into our habits, and arrive at a more convoluted solution than someone starting from scratch.

I ran into one of these cases recently.

Friends

Let’s say your application has a Friends feature. This is a list of people that the user can interact with. It holds the persons name and email. The app features friends import from two different sources: the user’s phone and Facebook.

As you are probably aware, a big problem with importing contacts is duplication. I’m sure you have experienced the case where the same Mary Smith was two different entries in your Contacts app. I want to avoid this issue.

I first started with having multiple arrays of friends for each source.

struct Friend {
  let name: String
  let email: String
}

let facebookFriends: [Friend]
let phoneFriends: [Friend]

When I wanted to create the actual friends array, I realised that solving this with arrays is pretty tough. I would have to filter out all of the duplicates in the arrays, and then merge them together. Sounds pretty messy.

Not only that, but think of the complexity of that algorithm. For each new person, it has to go trough all other people to check if it already exists. This means that the time it takes for this algorithm to complete grows exponentially with the number of contacts! (In big-O notation, this is O(N^2))

Thinking From Scratch

Had I not had thousands of cases where I used Arrays, I would have never thought an Array was a good candidate for this. There are other collection types in Swift , and obviously the contacts are best represented with a Set.

Sets are unordered collections of unique elements, so it’s impossible to have two same values in a Set. This is exactly what we need, we’ll let Swift handle checking if an element already exists in a Set.

Because elements in a Set have to be unique, they need a way to identify themselves as different than other values. This is why elements need to implement the Hashable protocol.

Hashable is anything that can be represented via a “hash value”. A hash value is an integer that Swift will use to compare your two values. So John Doe with email john.doe@mail.com, and another John Doe with the same email have the same hash value. On the other hand, John Doe and Anna Smith have different hash values.

In other words, two values that are equal are guaranteed to have the same hash value. However, the reverse is not true. Two different objects can have the same hash value.

A lot of Swift structs in the standard library already conform to Hashable, including String, different number types and Bool. We can leverage those existing implementations to create a hash value for our own types.

The easiest way to do this would be to somehow combine the hash values of our properties into a single integer. You can combine them any way you like, but using the binary XOR operator will produce the most unique integers, thus leading to less collisions.

struct Friend: Hashable {
  let name: String
  let email: String

  var hashValue: Int {
    return name.hashValue ^ email.hashValue
  }
}

Friend(name: "Adam", email: "adam@gmail.com").hashValue //-5497483150216078145
Friend(name: "Sarah", email: "sarah@gmail.com").hashValue //6542838826496608417
Friend(name: "Adam", email: "adam@gmail.com").hashValue //-5497483150216078145

Note: The XOR operator might not be the most efficient way to implement Hashable in this case. Because A ^ B = B ^ A , collisions might occur. Another implementation to consider is just concatenating the two strings and getting the hashValue of that.

You can see here that the two instances of Adam friends have the same hashValue, but Sarah has a different one.

When Swift is inserting an item into a set, it will first look at all of the existing items in the set that have the same hash value as the item you want to add. It will then go trough all of those items and actually check if they are really equal.

Because Swift is making a lookup based on the hash value, the algorithm for inserting an item into a set takes a constant amount of time (at least on average), regardless of the number of elements in our contacts.

In order for Swift to perform the actual equality check, you also need to add the == function for your type, so the compiler can compare your two elements.

struct Friend: Hashable, Equatable {
  
  //...

  static func ==(lhs: Friend, rhs: Friend)-> Bool {
    return lhs.name == rhs.name && lhs.email == rhs.email
  }
}

Thankfully, implementing == is easy, we’ll just make sure all of our properties are equal on both sides of the operator.

Another advantage of sets is that you can perform set operations like union, intersection etc. We can use union to easily merge multiple friend sources, and make sure we don’t repeat a single friend.

let facebookFriends: Set<Friend> = [
  Friend(name: "Adam", email: "adam@gmail.com"),
  Friend(name: "Sarah", email: "sarah@gmail.com"),
  Friend(name: "Sam", email: "sam@gmail.com")
]

let phoneFriends: Set<Friend> = [
  Friend(name: "Anna", email: "anna@gmail.com"),
  Friend(name: "Sarah", email: "sarah@gmail.com"),
  Friend(name: "Jane", email: "jane@gmail.com")
]

let friends = facebookFriends.union(phoneFriends)

print(friends)
//[
//  Friend(name: "Jane", email: "jane@gmail.com"), 
//  Friend(name: "Adam", email: "adam@gmail.com"), 
//  Friend(name: "Anna", email: "anna@gmail.com"), 
//  Friend(name: "Sarah", email: "sarah@gmail.com"), 
//  Friend(name: "Sam", email: "sam@gmail.com")
//]

Just keep in mind that sets are unordered, so if you need to display them in a specific order make sure to sort them first.


I try to pay attention to the complexity of my code. Whenever things start to get messy, it’s a code smell. It means that you are probably approaching the problem from the wrong perspective. If you want to find better solutions to a problem, you need to approach it like you’ve never solved it before.