AV1000
Fundamentals of the digital humanities
Introduction to HTML

I. HTML

The HyperText Markup Language (HTML) is the markup language used to construct documents on the World Wide Web. In other words, HTML comprises a set of instructions that describe how the words of an online document are to be handled. It is called hypertextual because one of its major features is automatic linking between any two labelled places, whether within the same document or in two different documents.

Each HTML document may be displayed by the computer in two forms: (1) what the author composes, and (2) what the reader sees. The former is a mixture of HTML instructions ("markup") with the text that will appear on the page; the latter is a formatted page. If you open the attached sample Web page in a browser, you will see the formatted page (2) in the normal way. From the View or Page menu of Internet Explorer or Firefox, you can then choose View Source or Page Source to see the original HTML (1).

See also the course's template for writing new HTML documents, which saves typing the parts that are always necessary. In this course, the normal way of creating HTML documents will be to start from this template and use a text editor to fill in the rest. On Windows the Notepad program will work for this; an enhanced version, Notepad2, is available for download and may be run on PAWS machines. Copy it to your memory stick to save the trouble of downloading it each time.

Note that:

  1. HTML is a simplified derivative of the Standard Generalised Markup Language (SGML), a widely used metalanguage for describing the elements and logical organisation of electronic documents. The version of HTML we teach in this course is XHTML 1.0, a variety of HTML based on the Extensible Markup Language (XML). It is stricter than other versions of HTML.
  2. HTML consists of instructions applied to portions of a document; together these are called elements. The elements are divided into those that define how the body of a document is to be displayed by the WWW browser and those that define information about the document, such as its title, keywords, and relationship to other online documents.
  3. Most elements affect a block of text by specifying a start-tag at the beginning of the block and an end-tag immediately following it. All tags are enclosed in angle-brackets. A start-tag consists of an element name optionally followed by one or more attributes; in XHTML every attribute value must be enclosed in quotation marks. The end-tag repeats the element name preceded by a forward slash (virgule). Thus a paragraph is indicated by a <p> at its beginning and a </p> at its end, a centred paragraph by a <p align="center"> and a </p>.
  4. Some elements are empty, i.e. contain no affected text. An example is the element to produce a horizontal "rule" or line, <hr />. In these cases the forward slash appears within the start-tag, just before the closing angle bracket.
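
The points above can be illustrated with a short fragment. This is only a sketch in the XHTML 1.0 style taught in this course; the wording inside the paragraphs is invented for illustration and does not come from the course's sample page.

```html
<!-- A paragraph element: start-tag, affected text, end-tag -->
<p>This sentence is the content of a paragraph element.</p>

<!-- The same element with an attribute in its start-tag;
     the attribute value is enclosed in quotation marks -->
<p align="center">This paragraph is centred.</p>

<!-- An empty element: there is no affected text, so the
     forward slash goes inside the tag itself -->
<hr />
```

In View Source you would see exactly these lines; in the browser window you would see two paragraphs followed by a horizontal line.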

II. Structure of an HTML document

See again the attached sample Web page and use View Source for an illustration of the following.

  1. The whole document is enclosed by the <html> … </html> element.
  2. Inside it, first comes a <head> … </head> element, which usually contains a <title> … </title> element. The title is whatever words you want to appear on the title-bar of the browser window. It will also appear as the title in listings produced by search engines such as Google.
  3. The body of the document follows and encloses all other contents. It is denoted by the <body> … </body> element, which (as here) may contain attributes determining the background colour or pattern, the colour of the text and of text affected by linking.
  4. The body is free-form, i.e. may contain any mixture of HTML elements and text.
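
Put together, the structure described above might look like the following minimal document. Treat it as a sketch rather than as the course template: the DOCTYPE declaration and the xmlns attribute are what XHTML 1.0 Transitional expects but are not discussed in these notes, and the title, colour values and paragraph text are placeholders chosen for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- Shown on the title-bar and in search-engine listings -->
    <title>London: a late 12th-century opinion</title>
  </head>
  <!-- Attributes set the background colour, the text colour
       and the colour of linked text (placeholder values) -->
  <body bgcolor="#ffffff" text="#000000" link="#0000cc">
    <p>Any mixture of HTML elements and text goes here.</p>
  </body>
</html>
```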

III. Common HTML elements

Again, see the attached sample Web page and use View Source for illustration of the following.

  1. Headings or titles. These are denoted by a set of elements, <hx> … </hx>, where x = an Arabic numeral from 1 to 6. The lower the number the larger the enclosed text is rendered. How big the actual text is on screen is a function of settings in the browser. Thus <h2>London: a late 12th-century opinion</h2> renders that title in the second-largest size.
  2. Paragraphs, denoted as such by being enclosed with the <p> … </p> element. Note that "hard returns" that you type in the text-editor are not represented as such by the browser but as spaces. The text is reformatted to fit the browser window rather than following the line breaks within your HTML file. Two or more returns or spaces will be represented as a single space. A common attribute for the paragraph element, illustrated in the sample, is align, which may be set to display text centred (<p align="center">), pushed to the right-hand margin (<p align="right">) or against the left-hand margin (the default).
  3. Lists. There are two kinds of lists: "ordered" lists, denoted by the <ol> … </ol> element, which produce an enumerated series such as you see here; and "unordered" lists, denoted by the <ul> … </ul> element, which produce a series marked by bullets. Each item in the series for both <ol> … </ol> and <ul> … </ul> is defined by the list-item element, <li> … </li>.
  4. Links. Any segment of text may be made into a hypertextual link, which when activated by the user will fetch another document. Text is denoted as a link by enclosing it within an anchor element, <a> … </a>, whose href attribute specifies the destination address. See the examples in the attached sample.
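
The four kinds of element above can be sketched together as follows. The fragment is meant to sit inside the <body> … </body> element of a page. The anchor element is written <a> … </a> and takes its destination address from the href attribute; the address www.example.org and all of the wording other than the London heading are placeholders, not taken from the course's sample page.

```html
<h1>Main heading (largest size)</h1>
<h2>London: a late 12th-century opinion</h2>

<!-- The hard returns and extra spaces typed here are shown
     by the browser as single spaces -->
<p align="center">This text is typed
   over several lines     but is displayed
   as one centred paragraph.</p>

<!-- Ordered list: the browser supplies the numbers -->
<ol>
  <li>First item</li>
  <li>Second item</li>
</ol>

<!-- Unordered list: the browser supplies the bullets -->
<ul>
  <li>One bulleted item</li>
  <li>Another bulleted item</li>
</ul>

<!-- Link: the enclosed text becomes the clickable part -->
<p>See <a href="http://www.example.org/">an example page</a>.</p>
```

Try changing <h2> to <h4>, or <ol> to <ul>, and reloading the page to see the effect of each element.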

IV. Browser tricks

  1. The best way to learn HTML is by example. Extensive browsing on the Web with an eye to effective design features, followed in each case by investigation of the HTML that causes them, is highly recommended.
  2. To see the HTML behind any Web page, either choose the item "view source" or "page source" (under the View or Page menu) or save the page, using the Save As… item under the File menu, then use a text-editor to view it.
  3. You can capture any image or graphic you see, including animated GIFs. On the PC, place the mouse pointer on the image, right-click the mouse and choose the Save As… option.

V. Design

  1. Effective design is a major consideration in publishing on the Web. Observe what you think works, then ask yourself why. Copy the best examples before you try to be creative.
  2. Since HTML gives you great freedom in how your pages are designed, you have to be particularly thoughtful about what you are providing for your reader.
  3. Readers need to be told, explicitly or implicitly, what kind of a thing your page is, what genre it belongs to (e.g., personal homepage, academic c.v., essay, collection of links). It should be quickly recognisable as something familiar; otherwise be sure of what you are doing.
  4. Segmentation of a document into smaller, interlinked pages is often better than a single page containing a large document.
  5. The mechanisms and path you provide to help your reader navigate through your document will have much to do with its success. Two common mechanisms are (a) a table of contents at the top of the first page in a set, and (b) navigational "buttons" at the top and bottom of each page. Consider carefully how you think your document should be read and how it may be accessed. Note that a Web search-engine may deliver to a potential reader a page somewhere in the middle of a document; how is that reader to know what he or she has landed in the middle of?
  6. Always remember: your page should be designed to be read (or viewed), not to be admired. Cleverness is sometimes a vice.
  7. See the style guides in the Course Bibliography.
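
The two navigation mechanisms mentioned in point 5 can be sketched in HTML as below. This is one possible arrangement, not a prescribed one: the file names (index.html, chapter1.html and so on) and the label "intro" are hypothetical, and plain text links stand in for graphical buttons. A place in a page is labelled here with the id attribute, and a link reaches it by putting # and the label in the href attribute.

```html
<!-- (a) Table of contents at the top of the first page -->
<ul>
  <li><a href="#intro">Introduction</a> (a labelled place on this page)</li>
  <li><a href="chapter2.html">Chapter 2</a> (a separate page)</li>
</ul>

<!-- The labelled place that the first link jumps to -->
<h2 id="intro">Introduction</h2>
<p>The text of the introduction goes here.</p>

<!-- (b) Navigation links, repeated at the top and bottom of each page -->
<p align="center">
  <a href="index.html">Contents</a> |
  <a href="chapter1.html">Previous</a> |
  <a href="chapter3.html">Next</a>
</p>
```

Because the same links appear on every page, a reader delivered to the middle of the document by a search-engine can still find the way to its beginning.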

VI. Pragmatics of displaying & ethics of copying

  1. Anything you put on a Web page may be copied. You are in effect giving away whatever you publish there. (This point is of special concern to artists, but note the technology for imposing a "digital watermark".)
  2. Information about yourself that you put on a Web page may be used by others for purposes you may not welcome, e.g. sending you unwanted e-mail.
  3. Your Internet service provider (ISP), King's College London, holds you responsible for the contents. Nothing naughty!
  4. You are expected to acknowledge your sources explicitly. Ideas cannot be owned, but significant implementations of them can be. Document your sources as carefully as you would for a conventional academic paper. Failure to observe the ethics of copying could get you into trouble (i.e. a charge of plagiarism, or worse).

revised October 2007