Hey everyone, welcome back to part two of the Maths in IT series. I got a lot of positive response, so I guess I should just keep doing what I was already doing. This post will continue where part one left off, so if you haven’t read it I suggest you do so now before continuing.
- Maths in IT #1: Basic set theory
- Maths in IT #2: Venn diagrams
- Maths in IT #3: Algebra of sets
- Coming soon…
Here’s a quick cheat sheet with symbols I’ll use in this article:
a is an element of A:
a is not an element of A:
A is a subset of B:
As promised we’ll use this post to combine sets. Before we do that let’s take a look at how to visually represent collection. We can do this using a Venn diagram. A Venn diagram is about as simple as it gets (although they can be pretty complex too). Each set is represented by a circle. The diagram can illustrate relationships between the represented sets.
Let’s look at an example. We have two collections, and . Let’s show them in a Venn diagram.
When two sets have no shared elements, or for every goes that , we say the sets are disjoint.
Now suppose (A is a subset of B). We can show this in a Venn diagram and you’ll recognize it immediately.
When A is a subset of B then B overlaps A completely.
In the next sections we’ll combine sets and use Venn diagrams to visualize what elements we’re interested in.
So suppose we have two collections, and . If I asked you which elements are in both A and B you’d answer c and d.
This is called the intersection of A and B.
We can write this as .
In this example we can say
More formally we say that .
And the symbol for “and” is actually , so to formalize it completely:
Phew, that looks a whole lot like maths! You should read that as “the intersection of A and B is the collection of every x where x is an element of A and x is an element of B.”
And here is the Venn diagram, which visualizes this nicely. The intersection is the part in the thick black line where A and B overlap.
With an intersection we can give a formal definition of disjoint sets. When then A and B are disjoint.
Furthermore we can say that for any collection A goes that .
And when then (check the subset Venn diagram).
The intersection of two sets is the set of elements x where x is in A and B. Likewise, the union of two sets is the set of elements that are in set A or B. So when we have and then the union of A and B is (no doubles).
We can write a union of A and B as .
Like is the symbol for “and” is the symbol for “or”. So the formal definition of union is as follows:
Read that as “the union of A and B is the collection of every x where x is an element of A or x is an element of B.”
In a Venn diagram the union is basically just both sets (the part in the thick black line).
Unlike an intersection, a union of disjoint sets is not an empty set (notice that both sets have a thick black line).
Now for any collection A goes that .
And, again, .
Also when then (check the subset Venn diagram).
Universe and Complement
When we’re talking about sets we’re usually not talking about that set in isolation. When I tell you that non-smokers live longer you understand that they live longer compared to people that do smoke. And you also understand that non-smokers and smokers combined make up for the worlds population. In this case we’re saying that we have a set of non-smokers in a universe of all people. We denote a universe with the capital letter U.
For every set goes that .
That makes sense as U represents all elements we wish to consider for our case. No collection can ever contain an element that is not a part of all elements.
In a Venn diagram we draw a universe as a rectangle in which all our sets are drawn. In the following example N is the set of non-smokers and U is the universe containing all people.
Now with the notion of a universe we can say we want all elements that are not in any collection A. We call this the complement of A and we write it as . In the following Venn diagram the white part represents .
We can now formally define the complement:
It goes without saying, but and .
A little less obvious is that . It makes sense though, as we first take everything that isn’t A (the complement of A) and then we take everything that isn’t in the resulting set, but that is A. Try drawing it in a Venn diagram and you’ll see what I mean.
A complement relative to a universe is called an absolute complement. A complement can also be relative to other sets. For example, when no universe is defined and we have sets A and B then the relative complement of A in B is the set of elements of B that are not in A. This takes the form of or .
In a Venn diagram look as follows:
We can now combine sets using intersection, union and complements. For example, let’s say our universe U is all living creatures on earth. Within U we have sets M, containing all mammals, B, containing all birds, and E, containing all animals that lay eggs. Formally:
Giving the following Venn diagram we can already draw some conclusions.
We can see that all birds lay eggs. Some mammals lay eggs too. No animal is both a bird and a mammal. As you see Venn diagrams can be really useful.
Now suppose we want the set of all mammals that lay eggs and also all birds. This is the collection . That is, we take the intersection of M and E and union the result with B. In a Venn diagram we can see this collection (the red colored parts).
Now we can get every combination of sets using intersection, union and complement. It’s not always easy, but it’s possible.
As promised I’m going to keep things practical. So what’s the practical use of all this? Well, most languages let you work with exactly these functions!
For example, take a look at this SQL expression.
(SELECT 1 UNION SELECT 2 UNION SELECT 3) INTERSECT (SELECT 3 UNION SELECT 4)
What will that return? It returns only 3 because 3 is an element of both collections. This example also shows the UNION operator. Basically this is the set .
And SQL knows complement too.
(SELECT 1 UNION SELECT 2 UNION SELECT 3) EXCEPT (SELECT 3 UNION SELECT 4)
What happens here is that we have (an implied) universe and EXCEPT is the complement. So this query is basically the formula .
We could also say that U is not defined and this is the relative complement of in : .
But what about C#? In my previous post we’ve seen HashSet<T>, but I’m not going to use that now. Instead I’m just going with a List<T> as LINQ provides various extension methods for working with sets. Notice that HashSet<T> has its own methods in addition to the LINQ methods.
List<int> a = new List<int>(); a.Add(1); a.Add(2); a.Add(3); List<int> b = new List<int>(); b.Add(3); b.Add(4); List<int> intersect = a.Intersect(b).ToList(); List<int> union = a.Union(b).ToList(); List<int> except = a.Except(b).ToList();
You may wonder what that looks like in Haskell (since I’ve shown you Haskell the last time too), but it’s not really that different.
Prelude> import Data.List Prelude Data.List> [1, 2, 3] `intersect` [3, 4]  Prelude Data.List> [1, 2, 3] `union` [3, 4] [1,2,3,4] Prelude Data.List> [1, 2, 3] \\ [3, 4] [1,2]
I’ve actually needed this stuff in my day to day work. It’s not that hard and sometimes it’s an explicit business requirement. For example I needed all sales orders from The Netherlands, Belgium and Luxembourg (the BeNeLux). That’s really just a union! And I’ve needed intersect too, give me all Dutch customers that are not in some list of customers. And how about all non-Dutch customers? That’s just the complement!
Next time I’d like to continue with algebra and algebra of sets in specific. Does that sound like it will give you nightmares? Don’t worry, I’ll be gentle!
See you next time!