This is the second installment of a blog series about web development. In this post we’re going to talk about HTML.
You can find other blogs here:
- Web development #1: Internet and the World Wide Web
- Web development #2: Our first website using HTML
- Web development #3: Styling our page with CSS 3
- Web development #4: PHP in the back
- Web development #5: User input with HTML Forms
- Web development #7: Dynamic page updates with AJAX
- Web development #8: Where to go from here
At this point I presume you know a bit about the internet and the World Wide Web. If you don’t, please read my previous blog first. In this part we’re going to take a look at HTML and create our first web page.
The history of HTML
HTML, which stands for HyperText Markup Language, was first presented in 1991 by Sir Tim Berners-Lee, inventor of the World Wide Web as we know it today and founder and director of the World Wide Web Consortium (W3C). HTML was based on SGML (Standard Generalized Markup Language) and was supposed to be the standard language in which web pages would be built. A proper HTML document consists of elements, which in turn consist of tags, usually a start and an end tag. For example, a header followed by a paragraph in HTML would look as follows:
<h1>This is a header.</h1>
<p>This is a paragraph.</p>
In this example you can see the descriptive (markup) nature of HTML. The tag <h1> indicates “this is where a header starts” and </h1> indicates “this is where a header ends”. The forward slash (/) denotes a closing tag. The entire <h1>This is a header.</h1> makes up an element in HTML. The same goes for the paragraph and the <p> tag.
Elements can be nested, but a nested element must be closed before its parent element is closed. The following example demonstrates this. The indentation is purely for readability; it is not actually part of HTML. I’ve also added a little comment. The <!-- … --> denotes a comment; it is ignored by the browser, but visible to (human) readers of the HTML.
<p>
    This is a paragraph.
    <em>
        This is emphasized text inside a paragraph
        <!--Close em before closing p-->
    </em>
</p>
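To make the nesting rule a bit more concrete, here is a small sketch of my own (the texts are just placeholders) that contrasts correct and incorrect nesting using the <b> and <i> elements:

```html
<!-- Correct: <i> is opened inside <b> and closed before <b> is closed -->
<p><b>Bold and <i>italic</i></b></p>

<!-- Incorrect: <b> is closed while <i> is still open -->
<p><b>Bold and <i>italic</b></i></p>
```

Browsers will usually still display the second paragraph, but it is not valid HTML and different browsers may repair it in different ways.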
Whenever a client requests a web page from a web server, the web server sends an HTML document. Web browsers, like Internet Explorer, Firefox and Chrome, interpret these documents and show you the page (without tags).
In 1995 the IETF introduced new tags and concepts with the release of the new HTML 2 standard. Subsequent HTML versions (standards) were released by W3C exclusively. HTML 3.2 was released in January 1997 and HTML 4 in December 1997, adding and dropping tags and rules (for both writing and interpreting HTML).
In 2000, after version 4.01, came XHTML (Extensible HTML), which was based on XML rather than SGML. XML is actually a more restrictive subset of SGML, and by using XML browsers were able to parse HTML as XML. XHTML was supposed to be a backward compatible XML version of HTML. With XHTML 2 the W3C wanted to make a clean start and break free from the past, so XHTML 2 dropped backward compatibility with HTML 4 and XHTML 1. Because of the controversy this caused, XHTML 2 never saw the light of day as a standard.
In 2008 a first draft version of HTML 5 appeared, building upon HTML 4 and XHTML 1. Almost seven years later (and 14 days ago at the time of this writing) HTML 5 was released as the new standard, although browsers and developers have been working with HTML 5 for a while now. HTML 5 introduces some new concepts. In this blog post I’m working with HTML 5 and I’m going to explain what makes HTML 5 different while we’re building our first web page.
I recommend you take a look at the HTML 5 standard. It contains a description of all elements and how to use them.
As said, it is up to web browsers to interpret HTML, or markup, and translate it into visuals on screen. Each browser does this a little differently. As such, your HTML code may look different in different browsers. While most browsers nowadays follow the standard pretty well, it’s important to check that your website looks okay in different browsers. It’s also one of the reasons you should typically update your browser when updates are available.
Internet Explorer in particular is notorious for being just a little different. Back in the ’90s, when the internet was still young, IE and Netscape were the two browsers you typically used. Of course both wanted to be better than the other. They started making up their own supported HTML tags like <blink> and <marquee> (yes, this was a time when blinking text was considered flashy and high-tech). Netscape disappeared and IE had to keep supporting older, non-standard HTML.
So enough with those history lessons already! Let’s start creating our very own HTML document. Open up Notepad, type something, save it as “something.html”, open it from the location where you saved it and it will display your text in your default browser. Congrats, you’ve just created your very first web page. That was a bit of an anti-climax, right? It works, after all you can see your typed text in your browser, but creating a well-formed HTML document takes a bit more than that. You can keep using Notepad or use a more sophisticated text editor. I prefer Notepad++, which even has some auto-completion for HTML. There are other editors too, but I recommend you keep your editor simple while learning.
Now we’re heading somewhere
So, now that you have your editor in place let’s start by creating a real HTML document. First you’ll need to specify the type and version of HTML you’re using (we’ve seen there are a few). We can do this by using the DOCTYPE declaration. The version for HTML 5 has been drastically simplified. The following example illustrates this.
<!--HTML 5-->
<!DOCTYPE html>

<!--XHTML 1.1-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<!--HTML 4.01 Strict (not backward compatible)-->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
After that we need some <html> tags (starting and ending tags) between which our document will be formed. A document consists of a head, which contains the title and possibly some metadata, and a body, which contains the content of our page. While your page is displayed perfectly fine when you place the head under the body, or when your head does not have a <title> element, it is recommended to place your head above your body and put a title inside your head because that’s the W3C standard.
<!DOCTYPE html>
<html>
    <head>
        <title>Our first web page!</title>
    </head>
    <body>
    </body>
</html>
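The head can also contain metadata, such as the character set, a description, keywords and the author of the page. We use <meta> tags for this, which have no ending tag.
<head>
    <title>Our first web page!</title>
    <meta charset="utf-8">
    <meta name="description" content="HTML tutorial on sanderrossel.wordpress.com">
    <meta name="keywords" content="HTML,XHTML,HTML 5,Web Development">
    <meta name="author" content="Sander Rossel">
</head>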
Now let’s add some body to the document. We have already seen the <h1> and <p> elements. In HTML there are a couple of header tags, <h1> to <h6>. Suppose you have a header for your whole document; this would be h1. Any additional headers within your document would be h2, and any headers inside an h2 part would be h3, etc. Again, this is not mandatory, it’s a recommendation. When a page shows multiple articles (like a blog home page) each article can, of course, have an h1 header.
<body>
    <h1>Our first webpage!</h1>
    <p>HTML stands for HyperText Markup Language.</p>
    <h2>Important tags</h2>
    <p>Paragraphs are enclosed in &lt;p&gt; tags</p>
</body>
You may be wondering why &lt;p&gt; is looking so weird. Well, imagine that we wrote “<p>” and a browser is going to read our HTML: it would think we started a new paragraph! Because the < and > symbols serve a purpose in HTML we can’t use these symbols in ‘regular’ text. Instead we use &lt; for the ‘less than’ symbol (<) and &gt; for the ‘greater than’ symbol (>). We must escape the ampersand symbol (&) and the quote (") in a likewise manner (&amp; and &quot;).
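To see all four escapes side by side, here is a minimal sketch (the sentences are made up for illustration):

```html
<!-- Each special character is written as a character reference -->
<p>5 &lt; 10 and 10 &gt; 5</p>
<p>Tom &amp; Jerry</p>
<p>She said &quot;hello&quot;</p>
```

A browser shows these paragraphs as 5 < 10 and 10 > 5, Tom & Jerry, and She said "hello". In practice the quote mostly causes trouble inside attribute values, where an unescaped quote would end the value too early.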
It’s just Semantics
At this point you may be satisfied with the content, but you may want to style it a bit. Perhaps give the header text another color, use a background, center it in the middle of the screen, etc. This is where HTML 5 is different from its predecessors.
In HTML 5 we define no style whatsoever. We use elements only to describe the text semantically. For example, we want to make “HyperText Markup Language” bold, because it’s important, and we want to italicize “<p> tags” because it’s a technical term. For this we can use the <b>, <strong>, <i> or <em> (for emphasis) tags. In earlier versions of HTML the <b> tag meant bold and the <i> tag meant italic, but in HTML 5 they have a different meaning. <b> means we want to draw attention to this text without conveying extra importance. <strong> does convey this extra importance. <i> represents text in an alternative voice or mood, but without emphasizing the importance of the text. <em> is used when you want to emphasize some text.
And that’s the big difference between HTML 5 and everything that came before. In HTML 5 we think of tags in terms of meaning (or semantics). What do we want to convey with this text, without thinking about the visual aspect? In this case I’m going with <b> and <i> because they fit the meaning of the words best (alternatively, I could’ve used <code> for the “<p> tags”).
<body>
    <h1>Our first webpage!</h1>
    <p>HTML stands for <b>HyperText Markup Language</b>.</p>
    <h2>Important tags</h2>
    <p>Paragraphs are enclosed in <i>&lt;p&gt; tags</i></p>
</body>
When you view this body in your browser you’ll probably see some style already. Your <h1> header is pretty large while your <h2> header is a bit smaller. Your <b> element is bold and your <i> element is italicized. This is just some default style and you can easily override it. If the elements fit your meaning, but the way your browser renders them doesn’t (maybe we wanted <b> elements to be red), still go for the elements that convey the right meaning.
There are plenty more elements that can help you give meaning to your document. For example there are <article>, <section>, <aside>, <footer>, <small>, <blockquote>, <code>, <abbr> (for abbreviations), <address> (for contact information), etc.
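To give an idea of how some of these elements can work together, here is a rough sketch. The texts and the e-mail address are placeholders I made up, and this is just one possible way to structure an article:

```html
<article>
    <h1>Learning HTML</h1>
    <section>
        <h2>Getting started</h2>
        <p><abbr title="HyperText Markup Language">HTML</abbr> describes the meaning of your content.</p>
        <blockquote>
            <p>A quotation taken from another source goes here.</p>
        </blockquote>
    </section>
    <aside>
        <p>A side note that is related to the article, but not part of its main flow.</p>
    </aside>
    <footer>
        <p><small>Some small print about the article.</small></p>
        <address>Contact: <a href="mailto:someone@example.com">someone@example.com</a></address>
    </footer>
</article>
```

None of these elements says anything about how the page should look. They only tell the browser (and other software) what each part of the text is.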
The <a> element represents a hyperlink. It can link to other websites, pages, or a specific place on your current page. The place you want to link to is specified by the href attribute. The text the user has to click on to follow the link comes between the starting and ending <a> tags. Often you’ll find something like: “For my awesome blog click here”, where “here” takes you to the blog. Don’t be like that. Make your linking text descriptive, for example: “You can read more on my awesome blog”, where “awesome blog” takes you to the blog. This is, again, a recommendation from the W3C. Here’s a small example of the <a> element (including a target attribute, which in this case opens the link in a new window or tab):
<p>You can read all about web development on <a href="http://www.sanderrossel.wordpress.com" target="_blank">my awesome blog</a>!</p>
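Linking to a specific place on the current page works in a similar way. A minimal sketch, assuming we give the target element an id of our own choosing (important-tags is just a name I picked):

```html
<p>Jump to the <a href="#important-tags">list of important tags</a>.</p>

<!-- Further down on the same page -->
<h2 id="important-tags">Important tags</h2>
```

The href starts with a # followed by the id of the element you want to jump to.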
There’s one last element I wish to discuss, the <img> element. I want to discuss it mostly because we can learn two things from it. First, that we should really use HTML 5 for the meaning of our text, not our styling. And second why we should create our document according to the W3C standards (in a later section).
First let’s look at how we use the <img> tag. Like the <meta> tag, the <img> tag has no ending tag. It has one mandatory attribute, the src (source) attribute. Besides the src attribute it is recommended to provide an alt (alternative) attribute, which is used when the image cannot be loaded from the source. It is important that your image is a part of your text (it visualizes or supports your story). DO NOT use <img> to set your background! Again, that would be styling your document and in HTML 5 we are not styling! We’ll see an example in the next section.
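As a quick sketch before we get there, assuming an image file called logo.png (a made-up file name) sits in the same folder as your HTML file:

```html
<p>This is the logo of my blog:<br>
<img src="logo.png" alt="The logo of my blog"></p>
```

If the browser cannot find or load logo.png it will fall back to the text in the alt attribute.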
Putting it all together
So let’s put all of what we have learned together in a single document. I have created a small HTML file with some elements. I recommend you study the code and read the actual output page.
<!DOCTYPE html>
<html>
    <head>
        <title>Our first web page!</title>
        <meta charset="utf-8">
        <meta name="description" content="HTML tutorial on sanderrossel.wordpress.com">
        <meta name="keywords" content="HTML,XHTML,HTML 5,Web Development">
        <meta name="author" content="Sander Rossel">
    </head>
    <body>
        <article>
            <h1>Our first webpage!</h1>
            <p><abbr title="HyperText Markup Language">HTML</abbr> stands for <b>HyperText Markup Language</b>.</p>
            <p>The language consists of <i>tags</i> that describe the content of a document. For example, a tag can indicate that a certain text belongs to a single paragraph, that certain text is more important, less important, that an image should be displayed, or that a new line must be inserted.</p>
            <p>A typical piece of HTML may look as follows:<br>
            <code>&lt;p&gt;This is a paragraph with &lt;strong&gt;important text&lt;/strong&gt;&lt;/p&gt;.</code></p>
            <p>Because the &lt; and &gt; symbols are used as part of HTML you'll need to use a special code to display them as plain text.<br>
            You can write them as &amp;lt; (less than) and &amp;gt; (greater than).</p>
            <aside>
                <h2>Important tags</h2>
                <p>Paragraphs are enclosed in <i>&lt;p&gt; tags</i></p>
                <p>HTML ignores line breaks. Instead we use the <i>&lt;br&gt; tag</i><br>
                Like this!</p>
                <p></p>
            </aside>
            <h2>More cool stuff</h2>
            <p>You can read all about web development on <a href="http://www.sanderrossel.wordpress.com" target="_blank">Sander's bits</a>!</p>
            <p>HTML 5 is awesome!<br>
            <img src="https://www.w3.org/html/logo/downloads/HTML5_Logo_128.png" alt="The HTML 5 Logo" title="The HTML 5 Logo"></p>
        </article>
    </body>
</html>
You may want to use more or fewer elements than I did. Maybe you want to use <section> elements for the different parts, maybe you want more <abbr> elements (by the way, did you notice the tooltip on <abbr>?). Since HTML 5 is all about semantics (meaning) it’s a bit difficult to tell what is wrong and right. Do we want <strong> or <em>, do we need the <article> element and is it logical to have an <article> element inside an <article> element (or <section> inside a <section>)? It’s mostly up to you!
Why use standards
You might wonder why I’m going through so much trouble to get my markup right. The <article> element doesn’t change the way my page looks, neither does the <aside> element. Why would I need an alt attribute in my <img> tag if I know the image is available? Luckily, HTML is pretty forgiving. Your page will probably be displayed correctly anyway, even if you miss a starting or ending tag.
However, if your page is well-formed, meaning your ending tags match your starting tags and all your ‘mandatory’ elements (such as the <title> element in your <head> element) and attributes (such as the alt attribute in the <img> tag) are in place, the chances that it is displayed correctly in any browser are bigger. In addition, your page might load faster.
Your page can be better indexed by search engines, like Google, when your document is well formed. That is quite beneficial because that means people will sooner find your page.
But what is perhaps most important, and probably something you didn’t think of, is that people with disabilities will find it a little easier to access your page. Someone who is blind cannot see your image, but screen reader software can read the text in your alt attribute to that person. That’s also the reason you should not have “here” link to some page, but rather a description of that page, like “my awesome blog”!
You can use various tools to check whether your markup is correct. For example, the W3C has a validator that accepts various kinds of input (a URL, an uploaded file or pasted markup): the W3C Markup Validation Service. I suggest you use it.
We’ve seen a bit of HTML 5, but there’s much more. We might get to that in a later blog. For now we know enough to move on. In the next blog we’re going to add some style to the page we just created using CSS!