Maths in IT #1: Basic set theory

Welcome back everyone. Today I have a little something different for you. No new language or framework, but something that’s been around for millennia: maths.
When asking programmers about maths you’ll find two kinds of people, those who say you don’t need maths to be a good programmer and those who say maths is essential. Personally I think both are true. For some applications and industries you really don’t need advanced maths, but go into robotics, machine learning, statistics, or that kind of thing and you’re going to need maths, lots of it. And whether you need it or not, computers, programming languages and databases all wouldn’t exist without maths.
For now, let’s put it this way: knowing a thing or two about maths gives you an edge as a programmer!

Maths is everywhere: physics, chemistry, biology, economy and, yes, even in the arts! For this series I’ll focus on the maths we need in IT. I don’t know how much entries it’s going to have or what I’ll be discussing (and what I won’t be discussing), but I’ll be sure to keep it somewhat practical. No prior maths knowledge is assumed. Don’t worry, I’ll still post about code once in a while too!

  1. Maths in IT #1: Basic set theory
  2. Maths in IT #2: Venn diagrams
  3. Maths in IT #3: Algebra of sets
  4. Coming soon…


Do I need to tell you why we should study collections? You probably use them in your code every day in the form of arrays, database tables, lists or hash tables. What you probably didn’t know is that collections have a lot of mathematical theory! Since collections are very important in both maths as in programming I’m going to start this series here. More specifically we’re going to talk about sets.

A set is an unordered collection containing only distinct values.

Let’s look at an example of a set in real life. Do you collect anything? Maybe you collect old records, each record in your collection can be uniquely identified and doubles are for trading or selling. Also, no matter in which order you put the records on the shelf, it’s still the same collection of records. So a record collection is really a set.

Likewise we can have a collection of paintings or stamps. Another collection is an alphabet, for example the English, Russian or Greek alphabet. Using these alphabets we can construct a language. We have natural languages (like the ones I just mentioned) and formal languages. Programming languages like C#, Java, C or Haskell are examples of formal languages.

A collection in maths is usually indicated by a single suggestive capital letter, such as R for records, S for stamps or A for alphabet. If we have more than one collection we can use index notations to uniquely identify them: A_{1}, A_{2}, A_{3}

The notation for a single collection containing elements a, b and c is \{a, b, c\}. This is called the explicit definition. We can now declare a collection A as A = \{a, b, c\}.
And of course we can have a collection of collections: C = \{\{a, b\}, \{c, d\}, \{a, c\}\} . Notice that collection \{a, c\} is unique even though a and c are already elements in other collections.
The following set of sets isn’t valid: \{\{a, b\}, \{b, a\}\}. Because the order of elements in sets is ignored this set contains the same set twice, but sets also must have distinct values.
Now let’s say we have English alphabet E, Russian alphabet R and Greek alphabet G. The collection of alphabets is written as \{E, R, G\}.

Now we want to indicate that a is an elements of E. We do this using the following syntax: a \in E.
To indicate that \lambda (the Greek letter lambda) is not an element of E we use notation: \lambda \notin E.

We can compare collections. Collections are considered to be equal when they contain exactly the same elements, no more and no less. \{a, b, c\} = \{a, b, c\}, \{a, b, c\} = \{c, b, a\} (remember, order is ignored), \{a, b, c\} \neq \{d, e, f\} and \{a, b, c\} \neq \{a, \{b\}, c\} (the collection \{b\} does not equal b).

So far we’ve seen only explicitly defined collections. We can also implicitly define collections. This is especially useful when a collection contains too many elements to write down. For example a collection containing all countries on Earth. The notation for such a collection is as follows: \{x | \text{x is a country on Earth}\}. You should read that as “the collection consisting of all (objects) x for which x is a country on Earth”.
Generally we can say that an implicitly defined collection is in the form of \{x | P(x)\} where, in this example, P(x) is the statement that x is a country on Earth. Actually P(x) is a function taking parameter x and returning whether x is or isn’t a part of the set. We’ll look at functions in a later blog post.

To indicate how many elements are in any given (finite) collection we can use notation |A|. So |\{a, b, c\}| = 3 and |E| = 26 (where E is the English alphabet). Of course this is only possible when our collection is finite (we can count the elements).

If a collection is empty (it contains no elements) or some collection A = \{\} we can use a special symbol \emptyset. And of course |\emptyset| = 0 (an empty set has 0 elements).

When we allow the same element to appear in a collection more than once and we start taking the order of elements into consideration we’re speaking of a row. The syntax for a row is simply putting the elements next to each other. For example, using the set \{a, b, c\} we can make the rows a, ababc, baac, caab, but not baad (because d is not in the set).

Infinite collections

So far we’ve looked at finite collections.  Let’s look at some infinite collections now. Consider the collection of all numbers: 1, 2, 3, 4… In theory we can always add 1 to any number, so we can never stop counting. A few of these collections are so important that they got their own symbol.
First we have the natural numbers: \mathbb{N} = \{0, 1, 2, 3, ...\}.
If we add negatives to the collection we get all whole numbers, or integers: \mathbb{Z} = \{..., -3, -2, -1, 0, 1, 2, 3, ...\}.
If we also allow fractions, like \frac{1}{2} (a half) or \frac{1}{4} (a quarter) then we have the collection of rational numbers \mathbb{Q}.
Not all numbers can be expressed in fractions, for example pi (\pi, the surface of a circle with radius 1) or \sqrt{2}. When we want to include those numbers we get the collection of real numbers \mathbb{R}.

Now suppose we want all positive natural numbers, so 1, 2, 3… (excluding 0). We can indicate this with \mathbb{N}^+. Likewise, all negative numbers -1, -2, -3… can be indicated using \mathbb{Z}^-. And of course we can use \mathbb{Q}^+, \mathbb{Q}^-, \mathbb{R}^+ and \mathbb{R}^- to indicate positive or negative fractions and real numbers too.

And we can now define new infinite collections using implicit definitions. The collection of all even numbers, for example, can be defined as follows: E = \{ x | x \in \mathbb{Z} \text{ and x is even}\}.
So here E is a collection consisting of all x for which x is an element in \mathbb{Z} (all integers) and x is even.


If a collection A contains elements that are all elements of another collection B we say that A is a subset of B. For example A = \{a, b, c\} is a subset of B = \{x | \text{x is a letter in the English alphabet}\} because a, b and c are all letters in the English alphabet.

More formally we can say that if A and B are both collections and for every x \in A applies x \in B then A is a subset of B.

We can write this as A \subset B. Since, in the previous example, A does not contain all letters in the English alphabet B we can also say that B is not a subset of A, this is written as B \not\subset A.

Given the above definition we can also say that A \subset A or A is a subset of itself. The empty collection \emptyset is a subset of all collections (including itself).

When a collection A \subset B, but A \neq B then we say that A is a proper subset of B. We may write this as A \nsubseteq B.
Sometimes you may see the notation A \subseteq B to indicate that A is a subset of B that may or may not be equal to B.
I’ll simply use A \subset B throughout the series).

When A \subset B and B \subset A then A = B.

In a similar manner we can say that B is a superset of A if A is a subset of B. This is notated as B \supset A (it’s the subset-symbol reversed). And of course we can say B \supseteq A to indicate that B is a superset of A that may or may not be equal to A. For a proper superset we may use A \nsupseteq B notation.

And when A \supset B and B \supset A then A = B.

Some code

I promised I’d keep this series somewhat practical. So let’s look at some code. Consider the following C# sample using your favorite .NET collection class.

List<int> list = new List<int>();
int thirdItem = list[2]; // 0-based index.

The list now contains the items 1, 2, 3 and 1 again.  We can also get the item at the nth position, which is only useful when we know the order of the elements within the list. Based on that we can conclude that the List class in .NET is not a set. If we want a set in .NET we can use the HashSet class instead.

HashSet<int> set = new HashSet<int>();
if (!set.Add(1))
    Console.WriteLine("Item 1 couldn't be added.");
//int secondItem = set[1]; // Doesn't compile.

Because a set only contains unique items a hash of each item can be generated which means that lookup time for sets is much faster than that of lists (especially when the size of the list grows). The HashSet<T> can be compared to the Dictionary<TKey, TValue> class, but without values.

Unfortunately C# doesn’t know list generators. Working with infinite collections isn’t very do-able in C# either. Let’s say you’d like to get all positive even numbers, E^+ = \{ x | x \in \mathbb{N}^+ \text{ and x is even}\}. We have a few problems. First \mathbb{N} isn’t available in C#. We could take Int32.MaxValue (but that threw an OutOfMemoryException). So let’s just take all positive evens smaller than or equal to a million.

IEnumerable evens = from x in Enumerable.Range(1, 1000000)
                         where x % 2 == 0
                         select x;
List evaluatedEvens = evens.ToList();

Something like that is really the closest we can get in C# without going through a lot of trouble.

And at this point I stand corrected. Paulo Zemek pointed out that it’s actually pretty easy to work with infinite collections by utilizing the yield keyword. The next (Console Application) example illustrates this (although int will eventually overflow…).

static void Main(string[] args)
    foreach (int i in Evens().Take(10))

static IEnumerable Evens()
    return XToInfinity(1).Where(i => i % 2 == 0);

static IEnumerable XToInfinity(int start)
    int current = start;
    while (true)
        yield return current;

That’s still a lot of typing though…

Let’s take another language that solves these kinds of problems a little better, a language that is a little closer to actual maths, Haskell.

evens = [ x | x <- [1..], x `mod` 2 == 0]

And there you have it. Notice that the code is actually pretty close to the mathematical notation. Take each x where x is an element of [1..] (one to infinity) and where x is even.
We can easily take the first 10 items of this infinite collection without crashing the program or encountering out of memory exceptions.

*Main> take 10 evens

This isn’t a blog on Haskell, so I can’t really expand on what it does, but I can say that Haskell is a “lazy” language, which means it doesn’t evaluate the items in the list until you actually need them. In this case it will only evaluate the first ten items of evens, keeping it from hogging our memory and CPU.

That’s it for now. We’ve had a very basic introduction to sets, but if you’re not familiar with this stuff it gets hard pretty quickly. So let’s take it nice and easy. Next time we’re going to combine collections and take a look at the Venn diagram.

Hope to see you next time!

Leave a Reply