Maths in IT #3: Algebra of sets

Hey everyone and welcome back! I’ve already announced this posts subject in my previous post, algebra of sets. So algebra is difficult and scary. At least that’s what people say (hmm hmm), so let’s shake that off. I’m sorry if you got the reference 🙂

If you haven’t read my previous posts I really suggest you do so as this post will continue where the previous one left off.

  1. Maths in IT #1: Basic set theory
  2. Maths in IT #2: Venn diagrams
  3. Maths in IT #3: Algebra of sets
  4. Coming soon…

Here’s a quick cheat sheet of symbols introduced in the previous post:
Intersection: A \cap B = \{x | x \in A \land x \in B\}
Union: A \cup B = \{x | x \in A \lor x \in B\}
Universe U: The collection of all elements we consider for our scenario.
Absolute complement (or simply complement): A^c = \{x | x \in U \land x \notin A\}
Relative complement of A in B: B \cap A^c = A \setminus B

Algebra

What exactly is algebra? It’s about the study of symbols, or operations, and laws for manipulating these operations. Elementary algebra, for example, is a set of laws for operations (+, -, /, *) on numbers. Just like that \cap, \cup and A^c are operations on sets and the laws used to manipulating them is called the algebra of sets.

These laws can help us to simplify formulas. So let’s get started.

Commutative property

Consider the following addition: 2 + 3. I’m pretty sure we all know the answer is 5 and the answer doesn’t change if we switch the numbers to 3 + 2. More abstract we can say that for every two numbers x and y goes that x + y = y + x.
When the order of operands (that is the objects on which an operation is performed) in an operation doesn’t matter for the outcome of that operation we call that operation commutative.

Multiplication (on real numbers) is commutative too, because 2 \cdot 3 = 3 \cdot 2 (you may be used to seeing x or * as multiplication symbol, but you should remember \cdot from high school). Minus isn’t commutative, because 3 - 2 \neq 2 - 3 and neither is division, 2/3 \neq 3/2.

When working with sets we can say that A \cap B = B \cap A and A \cup B = B \cup A. That means both \cap and \cup are commutative.

Associative property

When the order in which the same operation is performed in a single formula is unimportant for the outcome of that formula we say that the operation is associative. For example (1 + 2) + 3 = 1 + (2 + 3). It doesn’t matter if we first evaluate 1 + 2 and then add 3 or first evaluate 2 + 3 and add it to 1, the outcome is always 6.
The same holds true for multiplication, because (2 \cdot 3) \cdot 4 = 2 \cdot (3 \cdot 4) but not for subtraction, (10 - 8) - 2 \neq 10 - (8 - 2) and division, (16/4)/2 \neq 16/(4/2).

When looking at sets we find that both \cap and \cup are associative, because (A \cap B) \cap C = A \cap (B \cap C) and (A \cup B) \cup C = A \cup (B \cup C).

That means parenthesis can be omitted, removing any confusion on operator precedence.

And with the commutative and associative properties of both intersection and union we can now simplify
C \cap (D \cap A) \cap B
to A \cap B \cap C \cap D
or C \cup (D \cup A) \cup B
to A \cup B \cup C \cup D.
And when we combine intersection and union we can simplify as well. D \cap (C \cup (B \cup A)) can be simplified to (A \cup B \cup C) \cap D. Notice that we still need parenthesis to define the order of the \cap and \cup operations.

We can now also take the intersection or union of any number of sets because we don’t have to worry about the order of operands or operators. A_1 \cap A_2 \cap A_3... gets tiresome really quick though. So suppose we have A_1 to A_{100}. The intersection or union of all those sets can now be expressed as follows:
\displaystyle\bigcup_{i=1}^{100} A_i     or     \displaystyle\bigcap_{i=1}^{100} A_i
And inline this look like \cup_{i=1}^{100} A_i and \cap_{i=1}^{100} A_i.

Sometimes we don’t know if a set has 100 sets. Especially when a set is an infinite set of sets the above notation is not feasible. We can now use index notation to intersect or union all sets within the set.
\displaystyle\bigcap_{i \in I} A_i     or     \displaystyle\bigcup_{i \in I} A_i
\cap_{i \in I} A_i and \cup_{i \in I} A_i basically means “intersect/union for every set i in set I” (think of a foreach loop on a List of Lists in C#).

Distributive property

The distributive property is difficult to describe in words. Just remember that multiplication is distributive over addition, meaning that 3 \cdot (4 + 5) = (3 \cdot 4) + (3 \cdot 5). More generally x(y + z) = xy + xz (the multiplication symbol is often omitted between letters).

\cap is distributive over \cup, meaning that
A \cap (B \cup C) = (A \cap B) \cup (A \cap C)

Likewise \cup is distributive over \cap, so
A \cup (B \cap C) = (A \cup B) \cap (A \cup C)

We can now show that (A \cap B) \cap (A \cup B) = A \cap B. When we apply the distributive property on (A \cap B) \cap (A \cup B) we get ((A \cap B) \cap A) \cup ((A \cap B) \cap B).
Here it is again, but with colors (and non-mathematical notation) so you can follow where everything went:
(A n B) n (A u B) = ((A n B) n A) u ((A n B) n B).
Applying commutativity and associativity we can now simplify further:
((A \cap B) \cap A) \cup ((A \cap B) \cap B) = (A \cap A \cap B) \cup (A \cap B \cap B).
In the previous article, Maths in IT #2: Venn diagrams, we have seen that A \cap A = A and A \cup A = A. This is called idempotence and can be applied here (twice) to simplify further:
(A \cap A \cap B) \cup (A \cap B \cap B) = (A \cap B) \cup (A \cap B) = A \cap B

Let’s repeat that:

(A \cap B) \cap (A \cup B)

Distributivity

((A \cap B) \cap A) \cup ((A \cap B) \cap B)

Associativity

(A \cap B \cap A) \cup (A \cap B \cap B)

Commutativity

(A \cap A \cap B) \cup (A \cap B \cap B)

Idempotence

(A \cap B) \cup (A \cap B)

Idempotence

A \cap B

So we could make use of distributivity, associativity, commutativity, and idempotence to make a rather difficult formula pretty easy! Unfortunately it was a lot less easy to do…

Other properties

Unfortunately we’re not done yet. There are a few more properties, some of which we’ve already encountered, that I’d like to go over.

First we have the absorption law, which states that A \cap (A \cup B) = A and A \cup (A \cap B) = A.

We’ve seen the so called null element, or empty set, \emptyset. Just like with numbers (x + 0 = x, x - 0 = x and x \cdot 0 = 0) there are some rules for working with \emptyset. Luckily they’re not that hard.
A \cap \emptyset = \emptyset and A \cup \emptyset = A.

When a universe U is defined we can say the following:
A \cap U = A and A \cup U = U.

I’ve shown you that a double complement returns the original: (A^c)^c = A.
Also, A \cap A^c = \emptyset and A \cup A^c = U.

De Morgan’s Law

Finally we’ll look at the law of De Morgan, named after its inventor, the British mathematician Augustus De Morgan.
Suppose my blog can only be read by people who have not studied maths or IT. Let A = \{x | \text{x has studied maths}\} and let B = \{x | \text{x has studied IT}\}. So now the people who may read my blog are people who haven’t studied maths or IT: (A \cup B)^c. I could also say people who haven’t studied maths and people who haven’t studied IT may read my blog: A^c \cap B^c. But that’s the same set of people!
In the following Venn diagram this is represented by the white part.

A Venn diagram of De Morgan's Law
A Venn diagram of De Morgan’s Law

So in short (A \cup B)^c = A^c \cap B^c.

Now suppose my blog may be read only by people who haven’t studied both maths and IT. So that’s (A \cap B)^c. Or we could say people who haven’t studied maths and who haven’t studied IT, A^c \cup B^c. And again this is the same set of people. In the Venn diagram it’s the part where A and B don’t overlap (so almost everything).

A Venn diagram of De Morgan's Law
A Venn diagram of De Morgan’s Law

And so that means (A \cap B)^c = A^c \cup B^c.

Now let’s take another example of how to simplify a formula. Suppose we have B \cap (A \cap B)^c. So let’s first apply De Morgan’s law: B \cap (A \cap B)^c = B \cap (A^c \cup B^c). And now you may recognize a nice pattern for distributivity! So let’s apply distributivity, B \cap (A^c \cup B^c) = (B \cap A^c) \cup (B \cap B^c). That’s nice, because we know that B \cap B^c = \emptyset. And now we know that (B \cap A^c) \cup \emptyset = B \cap A^c. And maybe you’ll remember this from my previous post, this is B \setminus A. And there you have it!

Again, let’s repeat:

B \cap (A \cap B)^c

De Morgan’s law

B \cap (A^c \cup B^c)

Distributivity

(B \cap A^c) \cup (B \cap B^c)

Complement rule

(B \cap A^c) \cup \emptyset

Null element

B \cap A^c

Relative complement

B \setminus A

And there you go! You algebra master, you!

So how will this help you in your day to day programming tasks? For most of us very little I’m afraid, except that you can work out some difficult set operations on paper and possibly simplify requirements. I’m no database developer, but I can imagine you’d need this when developing databases (and to a lesser extent when working with databases).

We did venture into the world of algebra though, and if you’re still with me I say congratulations! Algebra is a bit too abstract and difficult for many people.

I’m leaving you with a little cheat sheet of today’s lessons and I hope to see you again next time (no algebra, I promise)!

Commutativity:
A \cap B = B \cap A
A \cup B = B \cup A

Associativity:
(A \cap B) \cap C = A \cap (B \cap C)
(A \cup B) \cup C = A \cup (B \cup C)

Distributivity:
A \cap (B \cup C) = (A \cap B) \cup (A \cap C)
A \cup (B \cap C) = (A \cup B) \cap (A \cup C)

Idempotence:
A \cup A = A
A \cap A = A

Absorption:
A \cup (A \cap B) = A
A \cap (A \cup B) = A

Null element:
A \cap \emptyset = \emptyset
A \cup \emptyset = A

Universe:
A \cap U = A
A \cup U = U

Complement:
A \cap A^c = \emptyset
A \cup A^c = U

Double complement:
(A^c)^c = A

Relative complement:
A \cap B^c = A \setminus B

De Morgan’s Law:
(A \cap B)^c = A^c \cup B^c
(A \cup B)^c = A^c \cap B^c

Maths in IT #2: Venn diagrams

Hey everyone, welcome back to part two of the Maths in IT series. I got a lot of positive response, so I guess I should just keep doing what I was already doing. This post will continue where part one left off, so if you haven’t read it I suggest you do so now before continuing.

  1. Maths in IT #1: Basic set theory
  2. Maths in IT #2: Venn diagrams
  3. Maths in IT #3: Algebra of sets
  4. Coming soon…

Here’s a quick cheat sheet with symbols I’ll use in this article:
Explicit definition: A = \{a, b, c\}
Implicit definition: A = \{x | \text{ x is a letter in the alphabet}\}
a is an element of A: a \in A
a is not an element of A: a \notin A
A is a subset of B: A \subset B
Empty set: \emptyset

Venn diagrams

As promised we’ll use this post to combine sets. Before we do that let’s take a look at how to visually represent collection. We can do this using a Venn diagram. A Venn diagram is about as simple as it gets (although they can be pretty complex too). Each set is represented by a circle. The diagram can illustrate relationships between the represented sets.

Let’s look at an example. We have two collections, A = \{a, b, c\} and B = \{d, e, f\}. Let’s show them in a Venn diagram.

A Venn diagram
A Venn diagram

When two sets have no shared elements, or for every x \in A goes that x \notin B, we say the sets are disjoint.

Now suppose A \subset B (A is a subset of B). We can show this in a Venn diagram and you’ll recognize it immediately.

A Venn diagram of a subset
A Venn diagram of a subset

When A is a subset of B then B overlaps A completely.

In the next sections we’ll combine sets and use Venn diagrams to visualize what elements we’re interested in.

Intersection

So suppose we have two collections, A = \{a, b, c, d\} and B = \{c, d, e, f\}. If I asked you which elements are in both A and B you’d answer c and d.

This is called the intersection of A and B.
We can write this as A \cap B.
In this example we can say A \cap B = \{c, d\}
More formally we say that A \cap B = \{x | x \in A \text{ and } x \in B\}.
And the symbol for “and” is actually \land, so to formalize it completely:

A \cap B = \{x | x \in A \land x \in B\}

Phew, that looks a whole lot like maths! You should read that as “the intersection of A and B is the collection of every x where x is an element of A and x is an element of B.”
And here is the Venn diagram, which visualizes this nicely. The intersection is the part in the thick black line where A and B overlap.

A Venn diagram of an intersection
A Venn diagram of an intersection

With an intersection we can give a formal definition of disjoint sets. When A \cap B = \emptyset then A and B are disjoint.

Furthermore we can say that for any collection A goes that A \cap \emptyset = \emptyset.
Also A \cap A = A.
And when A \subset B then A \cap B = A (check the subset Venn diagram).

Union

The intersection of two sets is the set of elements x where x is in A and B. Likewise, the union of two sets is the set of elements that are in set A or B. So when we have A = \{a, b, c, d\} and B = \{c, d, e, f\} then the union of A and B is \{a, b, c, d, e, f\} (no doubles).

We can write a union of A and B as A \cup B.
Like \land is the symbol for “and” \lor is the symbol for “or”. So the formal definition of union is as follows:

A \cup B = \{x | x \in A \lor x \in B\}

Read that as “the union of A and B is the collection of every x where x is an element of A or x is an element of B.”
In a Venn diagram the union is basically just both sets (the part in the thick black line).

A Venn diagram of an overlapping union
A Venn diagram of an overlapping union

Unlike an intersection, a union of disjoint sets is not an empty set (notice that both sets have a thick black line).

A Venn diagram of a union
A Venn diagram of a union

Now for any collection A goes that A \cup \emptyset = A.
And, again, A \cup A = A.
Also when A \subset B then A \cup B = B (check the subset Venn diagram).

Universe and Complement

When we’re talking about sets we’re usually not talking about that set in isolation. When I tell you that non-smokers live longer you understand that they live longer compared to people that do smoke. And you also understand that non-smokers and smokers combined make up for the worlds population. In this case we’re saying that we have a set of non-smokers in a universe of all people. We denote a universe with the capital letter U.

For every set A_1, A_2, A_3... A_n goes that A_n \subset U.
That makes sense as U represents all elements we wish to consider for our case. No collection can ever contain an element that is not a part of all elements.

In a Venn diagram we draw a universe as a rectangle in which all our sets are drawn. In the following example N is the set of non-smokers and U is the universe containing all people.

A Venn diagram of a universe
A Venn diagram of a universe

Now with the notion of a universe we can say we want all elements that are not in any collection A. We call this the complement of A and we write it as A^c. In the following Venn diagram the white part represents A^c.

A Venn diagram of a complement
A Venn diagram of a complement

We can now formally define the complement:

A^c = \{ x | x \in U \land x \notin A \}

It goes without saying, but \emptyset^c = U and U^c = \emptyset.
A little less obvious is that (A^c)^c = A. It makes sense though, as we first take everything that isn’t A (the complement of A) and then we take everything that isn’t in the resulting set, but that is A. Try drawing it in a Venn diagram and you’ll see what I mean.

A complement relative to a universe is called an absolute complement. A complement can also be relative to other sets. For example, when no universe is defined and we have sets A and B then the relative complement of A in B is the set of elements of B that are not in A. This takes the form of B \cap A^c or A \setminus B.

A \setminus B = \{ x | x \in B \land x \notin A \}

In a Venn diagram A \setminus B look as follows:

Venn10

Combining sets

We can now combine sets using intersection, union and complements. For example, let’s say our universe U is all living creatures on earth. Within U we have sets M, containing all mammals, B, containing all birds, and E, containing all animals that lay eggs. Formally:
U = \{ x | \text{x is an animal} \}
M = \{ x | x \in U \land \text{x is a mammal} \}
B = \{ x | x \in U \land \text{x is a bird} \}
E = \{ x | x \in U \land \text{x lays eggs} \}
Giving the following Venn diagram we can already draw some conclusions.

A Venn diagram with multiple sets
A Venn diagram with multiple sets

We can see that all birds lay eggs. Some mammals lay eggs too. No animal is both a bird and a mammal. As you see Venn diagrams can be really useful.
Now suppose we want the set of all mammals that lay eggs and also all birds. This is the collection (M \cap E) \cup B. That is, we take the intersection of M and E and union the result with B. In a Venn diagram we can see this collection (the red colored parts).

A Venn diagram with multiple sets
A Venn diagram with multiple sets

Now we can get every combination of sets using intersection, union and complement. It’s not always easy, but it’s possible.

Some code

As promised I’m going to keep things practical. So what’s the practical use of all this? Well, most languages let you work with exactly these functions!

For example, take a look at this SQL expression.

(SELECT 1
UNION SELECT 2
UNION SELECT 3)
INTERSECT
(SELECT 3
UNION SELECT 4)

What will that return? It returns only 3 because 3 is an element of both collections. This example also shows the UNION operator. Basically this is the set (\{1\} \cup \{2\} \cup \{3\}) \cap (\{3\} \cup \{4\}).

And SQL knows complement too.

(SELECT 1
UNION SELECT 2
UNION SELECT 3)
EXCEPT
(SELECT 3
UNION SELECT 4)

What happens here is that we have (an implied) universe U = \{1, 2, 3, 4\} and EXCEPT is the complement. So this query is basically the formula \{3, 4\}^c.
We could also say that U is not defined and this is the relative complement of \{3, 4\} in \{1, 2, 3\}: \{3, 4\} \setminus \{1, 2, 3\} = \{1, 2\}.

But what about C#? In my previous post we’ve seen HashSet<T>, but I’m not going to use that now. Instead I’m just going with a List<T> as LINQ provides various extension methods for working with sets. Notice that HashSet<T> has its own methods in addition to the LINQ methods.

List<int> a = new List<int>();
a.Add(1);
a.Add(2);
a.Add(3);

List<int> b = new List<int>();
b.Add(3);
b.Add(4);

List<int> intersect = a.Intersect(b).ToList();
List<int> union = a.Union(b).ToList();
List<int> except = a.Except(b).ToList();

You may wonder what that looks like in Haskell (since I’ve shown you Haskell the last time too), but it’s not really that different.

Prelude> import Data.List
Prelude Data.List> [1, 2, 3] `intersect` [3, 4]
[3]
Prelude Data.List> [1, 2, 3] `union` [3, 4]
[1,2,3,4]
Prelude Data.List> [1, 2, 3] \\ [3, 4]
[1,2]

I’ve actually needed this stuff in my day to day work. It’s not that hard and sometimes it’s an explicit business requirement. For example I needed all sales orders from The Netherlands, Belgium and Luxembourg (the BeNeLux). That’s really just a union! And I’ve needed intersect too, give me all Dutch customers that are not in some list of customers. And how about all non-Dutch customers? That’s just the complement!

Next time I’d like to continue with algebra and algebra of sets in specific. Does that sound like it will give you nightmares? Don’t worry, I’ll be gentle!

See you next time!

Maths in IT #1: Basic set theory

Welcome back everyone. Today I have a little something different for you. No new language or framework, but something that’s been around for millennia: maths.
When asking programmers about maths you’ll find two kinds of people, those who say you don’t need maths to be a good programmer and those who say maths is essential. Personally I think both are true. For some applications and industries you really don’t need advanced maths, but go into robotics, machine learning, statistics, or that kind of thing and you’re going to need maths, lots of it. And whether you need it or not, computers, programming languages and databases all wouldn’t exist without maths.
For now, let’s put it this way: knowing a thing or two about maths gives you an edge as a programmer!

Maths is everywhere: physics, chemistry, biology, economy and, yes, even in the arts! For this series I’ll focus on the maths we need in IT. I don’t know how much entries it’s going to have or what I’ll be discussing (and what I won’t be discussing), but I’ll be sure to keep it somewhat practical. No prior maths knowledge is assumed. Don’t worry, I’ll still post about code once in a while too!

  1. Maths in IT #1: Basic set theory
  2. Maths in IT #2: Venn diagrams
  3. Maths in IT #3: Algebra of sets
  4. Coming soon…

Collections

Do I need to tell you why we should study collections? You probably use them in your code every day in the form of arrays, database tables, lists or hash tables. What you probably didn’t know is that collections have a lot of mathematical theory! Since collections are very important in both maths as in programming I’m going to start this series here. More specifically we’re going to talk about sets.

A set is an unordered collection containing only distinct values.

Let’s look at an example of a set in real life. Do you collect anything? Maybe you collect old records, each record in your collection can be uniquely identified and doubles are for trading or selling. Also, no matter in which order you put the records on the shelf, it’s still the same collection of records. So a record collection is really a set.

Likewise we can have a collection of paintings or stamps. Another collection is an alphabet, for example the English, Russian or Greek alphabet. Using these alphabets we can construct a language. We have natural languages (like the ones I just mentioned) and formal languages. Programming languages like C#, Java, C or Haskell are examples of formal languages.

A collection in maths is usually indicated by a single suggestive capital letter, such as R for records, S for stamps or A for alphabet. If we have more than one collection we can use index notations to uniquely identify them: A_{1}, A_{2}, A_{3}

The notation for a single collection containing elements a, b and c is \{a, b, c\}. This is called the explicit definition. We can now declare a collection A as A = \{a, b, c\}.
And of course we can have a collection of collections: C = \{\{a, b\}, \{c, d\}, \{a, c\}\} . Notice that collection \{a, c\} is unique even though a and c are already elements in other collections.
The following set of sets isn’t valid: \{\{a, b\}, \{b, a\}\}. Because the order of elements in sets is ignored this set contains the same set twice, but sets also must have distinct values.
Now let’s say we have English alphabet E, Russian alphabet R and Greek alphabet G. The collection of alphabets is written as \{E, R, G\}.

Now we want to indicate that a is an elements of E. We do this using the following syntax: a \in E.
To indicate that \lambda (the Greek letter lambda) is not an element of E we use notation: \lambda \notin E.

We can compare collections. Collections are considered to be equal when they contain exactly the same elements, no more and no less. \{a, b, c\} = \{a, b, c\}, \{a, b, c\} = \{c, b, a\} (remember, order is ignored), \{a, b, c\} \neq \{d, e, f\} and \{a, b, c\} \neq \{a, \{b\}, c\} (the collection \{b\} does not equal b).

So far we’ve seen only explicitly defined collections. We can also implicitly define collections. This is especially useful when a collection contains too many elements to write down. For example a collection containing all countries on Earth. The notation for such a collection is as follows: \{x | \text{x is a country on Earth}\}. You should read that as “the collection consisting of all (objects) x for which x is a country on Earth”.
Generally we can say that an implicitly defined collection is in the form of \{x | P(x)\} where, in this example, P(x) is the statement that x is a country on Earth. Actually P(x) is a function taking parameter x and returning whether x is or isn’t a part of the set. We’ll look at functions in a later blog post.

To indicate how many elements are in any given (finite) collection we can use notation |A|. So |\{a, b, c\}| = 3 and |E| = 26 (where E is the English alphabet). Of course this is only possible when our collection is finite (we can count the elements).

If a collection is empty (it contains no elements) or some collection A = \{\} we can use a special symbol \emptyset. And of course |\emptyset| = 0 (an empty set has 0 elements).

When we allow the same element to appear in a collection more than once and we start taking the order of elements into consideration we’re speaking of a row. The syntax for a row is simply putting the elements next to each other. For example, using the set \{a, b, c\} we can make the rows a, ababc, baac, caab, but not baad (because d is not in the set).

Infinite collections

So far we’ve looked at finite collections.  Let’s look at some infinite collections now. Consider the collection of all numbers: 1, 2, 3, 4… In theory we can always add 1 to any number, so we can never stop counting. A few of these collections are so important that they got their own symbol.
First we have the natural numbers: \mathbb{N} = \{0, 1, 2, 3, ...\}.
If we add negatives to the collection we get all whole numbers, or integers: \mathbb{Z} = \{..., -3, -2, -1, 0, 1, 2, 3, ...\}.
If we also allow fractions, like \frac{1}{2} (a half) or \frac{1}{4} (a quarter) then we have the collection of rational numbers \mathbb{Q}.
Not all numbers can be expressed in fractions, for example pi (\pi, the surface of a circle with radius 1) or \sqrt{2}. When we want to include those numbers we get the collection of real numbers \mathbb{R}.

Now suppose we want all positive natural numbers, so 1, 2, 3… (excluding 0). We can indicate this with \mathbb{N}^+. Likewise, all negative numbers -1, -2, -3… can be indicated using \mathbb{Z}^-. And of course we can use \mathbb{Q}^+, \mathbb{Q}^-, \mathbb{R}^+ and \mathbb{R}^- to indicate positive or negative fractions and real numbers too.

And we can now define new infinite collections using implicit definitions. The collection of all even numbers, for example, can be defined as follows: E = \{ x | x \in \mathbb{Z} \text{ and x is even}\}.
So here E is a collection consisting of all x for which x is an element in \mathbb{Z} (all integers) and x is even.

Subsets

If a collection A contains elements that are all elements of another collection B we say that A is a subset of B. For example A = \{a, b, c\} is a subset of B = \{x | \text{x is a letter in the English alphabet}\} because a, b and c are all letters in the English alphabet.

More formally we can say that if A and B are both collections and for every x \in A applies x \in B then A is a subset of B.

We can write this as A \subset B. Since, in the previous example, A does not contain all letters in the English alphabet B we can also say that B is not a subset of A, this is written as B \not\subset A.

Given the above definition we can also say that A \subset A or A is a subset of itself. The empty collection \emptyset is a subset of all collections (including itself).

When a collection A \subset B, but A \neq B then we say that A is a proper subset of B. We may write this as A \nsubseteq B.
Sometimes you may see the notation A \subseteq B to indicate that A is a subset of B that may or may not be equal to B.
I’ll simply use A \subset B throughout the series).

When A \subset B and B \subset A then A = B.

In a similar manner we can say that B is a superset of A if A is a subset of B. This is notated as B \supset A (it’s the subset-symbol reversed). And of course we can say B \supseteq A to indicate that B is a superset of A that may or may not be equal to A. For a proper superset we may use A \nsupseteq B notation.

And when A \supset B and B \supset A then A = B.

Some code

I promised I’d keep this series somewhat practical. So let’s look at some code. Consider the following C# sample using your favorite .NET collection class.

List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(1);
int thirdItem = list[2]; // 0-based index.

The list now contains the items 1, 2, 3 and 1 again.  We can also get the item at the nth position, which is only useful when we know the order of the elements within the list. Based on that we can conclude that the List class in .NET is not a set. If we want a set in .NET we can use the HashSet class instead.

HashSet<int> set = new HashSet<int>();
set.Add(1);
set.Add(2);
set.Add(3);
if (!set.Add(1))
{
    Console.WriteLine("Item 1 couldn't be added.");
}
//int secondItem = set[1]; // Doesn't compile.

Because a set only contains unique items a hash of each item can be generated which means that lookup time for sets is much faster than that of lists (especially when the size of the list grows). The HashSet<T> can be compared to the Dictionary<TKey, TValue> class, but without values.

Unfortunately C# doesn’t know list generators. Working with infinite collections isn’t very do-able in C# either. Let’s say you’d like to get all positive even numbers, E^+ = \{ x | x \in \mathbb{N}^+ \text{ and x is even}\}. We have a few problems. First \mathbb{N} isn’t available in C#. We could take Int32.MaxValue (but that threw an OutOfMemoryException). So let’s just take all positive evens smaller than or equal to a million.

IEnumerable evens = from x in Enumerable.Range(1, 1000000)
                         where x % 2 == 0
                         select x;
List evaluatedEvens = evens.ToList();

Something like that is really the closest we can get in C# without going through a lot of trouble.

And at this point I stand corrected. Paulo Zemek pointed out that it’s actually pretty easy to work with infinite collections by utilizing the yield keyword. The next (Console Application) example illustrates this (although int will eventually overflow…).

static void Main(string[] args)
{
    foreach (int i in Evens().Take(10))
    {
        Console.WriteLine(i);
    }
    Console.ReadKey();
}

static IEnumerable Evens()
{
    return XToInfinity(1).Where(i => i % 2 == 0);
}

static IEnumerable XToInfinity(int start)
{
    int current = start;
    while (true)
    {
        yield return current;
        current++;
    }
}

That’s still a lot of typing though…

Let’s take another language that solves these kinds of problems a little better, a language that is a little closer to actual maths, Haskell.

evens = [ x | x <- [1..], x `mod` 2 == 0]

And there you have it. Notice that the code is actually pretty close to the mathematical notation. Take each x where x is an element of [1..] (one to infinity) and where x is even.
We can easily take the first 10 items of this infinite collection without crashing the program or encountering out of memory exceptions.

*Main> take 10 evens
[2,4,6,8,10,12,14,16,18,20]

This isn’t a blog on Haskell, so I can’t really expand on what it does, but I can say that Haskell is a “lazy” language, which means it doesn’t evaluate the items in the list until you actually need them. In this case it will only evaluate the first ten items of evens, keeping it from hogging our memory and CPU.

That’s it for now. We’ve had a very basic introduction to sets, but if you’re not familiar with this stuff it gets hard pretty quickly. So let’s take it nice and easy. Next time we’re going to combine collections and take a look at the Venn diagram.

Hope to see you next time!