The World Wide Web

John O’Gorman

john@og.co.nz

12 December 2010

1 Intro

This document tries to explain a little of the history of the development of the World Wide Web on the internet from the 1980s onwards. It is a rewrite of a document which was lost in a computer crash (Real Programmers do not take backups).
The Internet can work using just the IP (Internet Protocol) addresses of each host machine. The web on the other hand relies on easily recognised names such as google.com or wikipedia.org rather than IP number sequences. To use this we need something analogous to a phone book. In the beginning the phone book was implemented as a file /etc/hosts. But the growth of the internet soon made this impractical. There are now billions of hosts each with a unique IP address. No practical file (or real life phone book) could hold all this. The solution is a distributed Domain Name System (DNS).
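The phone book analogy can be made concrete with a small sketch. It assumes the usual /etc/hosts layout (an address followed by one or more names, with # starting a comment). The sample entries and the function name parse_hosts are illustrative only.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name found in hosts-file text to its IP address."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)  # the first entry for a name wins
    return table


# Illustrative entries only (192.0.2.0/24 is reserved for documentation).
SAMPLE = """\
# address      names
127.0.0.1      localhost
192.0.2.10     alpha.example.org alpha
"""

hosts = parse_hosts(SAMPLE)
print(hosts["alpha"])
```

Every machine needs its own complete copy of such a file, which is why the approach stopped scaling as the number of hosts grew.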

2 BIND

The first Unix implementation of DNS lookups was the Berkeley Internet Name Domain (BIND), implemented in 1984 by four students of UCB (University of California, Berkeley). Paul Vixie took over its maintenance in 1988; it is now maintained by the Internet Systems Consortium (ISC). BIND is freely available and comes with most flavours of Unix, including Linux and Mac OS X.

2.1 dig

To get an idea of how DNS is structured, run the command dig (Domain Information Groper) with no arguments.
The output is something like this:
; <<>> DiG 9.8.3-P1 <<>>
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61583
;; flags: qr rd ra; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;.				IN	NS
;; ANSWER SECTION:
.			3062	IN	NS	k.root-servers.net.
.			3062	IN	NS	g.root-servers.net.
.			3062	IN	NS	l.root-servers.net.
.			3062	IN	NS	c.root-servers.net.
.			3062	IN	NS	j.root-servers.net.
.			3062	IN	NS	h.root-servers.net.
.			3062	IN	NS	b.root-servers.net.
.			3062	IN	NS	d.root-servers.net.
.			3062	IN	NS	a.root-servers.net.
.			3062	IN	NS	e.root-servers.net.
.			3062	IN	NS	m.root-servers.net.
.			3062	IN	NS	i.root-servers.net.
.			3062	IN	NS	f.root-servers.net.
;; Query time: 5 msec
;; SERVER: 192.168.168.8#53(192.168.168.8)
;; WHEN: Sat Apr 29 14:14:33 2017
;; MSG SIZE  rcvd: 228
The above lists the 13 root name servers: a to m. The last dot is the root node of the domain name tree.
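Names in the tree are read from right to left, starting at the root. The sketch below lists the zones passed through on the way down to a name. The function name zones_from_root is invented for this illustration and is not part of BIND or dig.

```python
def zones_from_root(name: str) -> list[str]:
    """List the zones passed through when resolving name, starting at the root."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    zones = ["."]          # the root node
    suffix = ""
    for label in reversed(labels):
        suffix = f"{label}.{suffix}"   # prepend the next label down the tree
        zones.append(suffix)
    return zones


print(zones_from_root("og.co.nz."))
# ['.', 'nz.', 'co.nz.', 'og.co.nz.']
```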
Try the command: dig co.nz.
; <<>> DiG 9.8.3-P1 <<>> co.nz.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32121
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;co.nz.				IN	A
;; AUTHORITY SECTION:
co.nz.			2080	IN	SOA	loopback.dns.net.nz. soa.nzrs.net.nz. 2017042956 900 300 604800 3600
;; Query time: 11 msec
;; SERVER: 192.168.168.8#53(192.168.168.8)
;; WHEN: Sat Apr 29 14:33:11 2017
;; MSG SIZE  rcvd: 85
Now try the command: dig og.co.nz.
; <<>> DiG 9.8.3-P1 <<>> og.co.nz.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3552
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 7, ADDITIONAL: 14
;; QUESTION SECTION:
;og.co.nz.			IN	A
;; ANSWER SECTION:
og.co.nz.		1493	IN	A	65.99.223.195
;; AUTHORITY SECTION:
nz.			119265	IN	NS	ns4.dns.net.nz.
nz.			119265	IN	NS	ns7.dns.net.nz.
nz.			119265	IN	NS	ns1.dns.net.nz.
nz.			119265	IN	NS	ns2.dns.net.nz.
nz.			119265	IN	NS	ns3.dns.net.nz.
nz.			119265	IN	NS	ns6.dns.net.nz.
nz.			119265	IN	NS	ns5.dns.net.nz.
;; ADDITIONAL SECTION:
ns1.dns.net.nz.		119265	IN	A	202.46.190.130
ns2.dns.net.nz.		119265	IN	A	202.46.187.130
ns3.dns.net.nz.		119265	IN	A	202.46.188.130
ns4.dns.net.nz.		119265	IN	A	202.46.189.130
ns5.dns.net.nz.		119265	IN	A	185.159.197.130
ns6.dns.net.nz.		119265	IN	A	185.159.198.130
ns7.dns.net.nz.		119265	IN	A	194.146.106.54
ns1.dns.net.nz.		119265	IN	AAAA	2001:dce:2000:2::130
ns2.dns.net.nz.		119265	IN	AAAA	2001:dce:7000:2::130
ns3.dns.net.nz.		119265	IN	AAAA	2001:dce:8400:2::130
ns4.dns.net.nz.		119265	IN	AAAA	2001:dce:d454::53
ns5.dns.net.nz.		119265	IN	AAAA	2620:10a:80aa::130
ns6.dns.net.nz.		119265	IN	AAAA	2620:10a:80ab::130
ns7.dns.net.nz.		119265	IN	AAAA	2001:67c:1010:13::53
;; Query time: 2 msec
;; SERVER: 192.168.168.8#53(192.168.168.8)
;; WHEN: Sat Apr 29 14:32:54 2017
;; MSG SIZE  rcvd: 484
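Each line that does not start with a semicolon in the output above is a resource record with five fields: name, time to live in seconds, class (IN for internet), record type (A, AAAA, NS, SOA, ...) and data. A minimal sketch of splitting such lines follows. The Record type and parse_records function are names made up here, and the parser only handles the simple layout shown in this section.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rrclass: str
    rrtype: str
    data: str


def parse_records(output: str) -> list[Record]:
    """Extract resource records from dig-style output."""
    records = []
    for line in output.splitlines():
        if not line.strip() or line.startswith(";"):
            continue  # comments, section headers, and the echoed question
        fields = line.split(None, 4)  # at most 5 fields; data may contain spaces
        if len(fields) == 5 and fields[1].isdigit():
            name, ttl, rrclass, rrtype, data = fields
            records.append(Record(name, int(ttl), rrclass, rrtype, data.strip()))
    return records


SAMPLE = """\
;; ANSWER SECTION:
og.co.nz.   1493    IN  A   65.99.223.195
;; AUTHORITY SECTION:
nz.         119265  IN  NS  ns4.dns.net.nz.
"""

for record in parse_records(SAMPLE):
    print(record.rrtype, record.name, "->", record.data)
```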

2.2 whois

The whois command is used to find those responsible for DNS for any domain: whois og.co.nz
% Terms of Use
%
% By submitting a WHOIS query you are entering into an agreement with Domain
% Name Commission Ltd on the following terms and conditions, and subject to
% all relevant .nz Policies and procedures as found at https://dnc.org.nz/.
%
% It is prohibited to:
% - Send high volume WHOIS queries with the effect of downloading part of or
%   all of the .nz Register or collecting register data or records;
% - Access the .nz Register in bulk through the WHOIS service (ie. where a
%   user is able to access WHOIS data other than by sending individual queries
%   to the database);
% - Use WHOIS data to allow, enable, or otherwise support mass unsolicited
%   commercial advertising, or mass solicitations to registrants or to
%   undertake market research via direct mail, electronic mail, SMS, telephone
%   or any other medium;
% - Use WHOIS data in contravention of any applicable data and privacy laws,
%   including the Unsolicited Electronic Messages Act 2007;
% - Store or compile WHOIS data to build up a secondary register of
%   information;
% - Publish historical or non-current versions of WHOIS data; and
% - Publish any WHOIS data in bulk.
%
% Copyright Domain Name Commission Limited (a company wholly-owned by Internet
% New Zealand Incorporated) which may enforce its rights against any person or
% entity that undertakes any prohibited activity without its written
% permission.
%
% The WHOIS service is provided by NZRS Limited.
%
version: 7.40
query_datetime: 2017-04-29T14:43:07+12:00
domain_name: og.co.nz
query_status: 200 Active
domain_dateregistered: 1997-03-13T00:00:00+13:00
domain_datebilleduntil: 2020-01-01T00:00:00+13:00
domain_datelastmodified: 2011-09-13T14:06:40+12:00
domain_delegaterequested: yes
domain_signed: no
%
registrar_name: Domainz Limited
registrar_address1: Private Bag 1810
registrar_city: Wellington
registrar_country: NZ (NEW ZEALAND)
registrar_phone: +64 4 473 4567
registrar_fax: +64 4 473 4569
registrar_email: 4service@domainz.net.nz
%
registrant_contact_name: OGorman Computer Consulting Ltd
registrant_contact_address1: 21 Ngake Street
registrant_contact_address2: Orakei
registrant_contact_city: AUCKLAND
registrant_contact_postalcode: 1071
registrant_contact_country: NZ (NEW ZEALAND)
registrant_contact_phone: +64 9 5210336
registrant_contact_fax: +64 9 5210336
registrant_contact_email: ogormanj@xtra.co.nz
%
admin_contact_name: John O’’Gorman
admin_contact_address1: 21 Ngake Street
admin_contact_address2: Orakei
admin_contact_city: AUCKLAND
admin_contact_postalcode: 1071
admin_contact_country: NZ (NEW ZEALAND)
admin_contact_phone: +64 9 5210336
admin_contact_fax: +64 9 5210336
admin_contact_email: ogormanj@xtra.co.nz
%
technical_contact_name: Thomas Matysik
technical_contact_address1: 46 Bel Air Drive, Hillsborough
technical_contact_address2: Auckland, New Zealand
technical_contact_city: Auckland, 1061
technical_contact_country: NZ (NEW ZEALAND)
technical_contact_phone: +64 21 210 4822
technical_contact_email: thomas@extant.net.nz
%
ns_name_01: ns1.extant.net.nz
ns_name_02: ns2.extant.net.nz
%
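The output above is a series of "key: value" lines, with % marking comment lines. A rough sketch of reading it into a dictionary follows. It assumes this particular layout; other registries format their whois output differently, and parse_whois is a name made up for the example.

```python
def parse_whois(text: str) -> dict[str, str]:
    """Collect the key: value lines of whois output, skipping % comment lines."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("%") or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split at the first colon only
        fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
% The WHOIS service is provided by NZRS Limited.
%
domain_name: og.co.nz
query_datetime: 2017-04-29T14:43:07+12:00
ns_name_01: ns1.extant.net.nz
ns_name_02: ns2.extant.net.nz
"""

info = parse_whois(SAMPLE)
print(info["domain_name"], info["ns_name_01"])
```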

3 World Wide Web

In 1989 Tim Berners-Lee proposed a solution to the problem of sharing data on the internet. In 1990 he took time out to develop the 3 fundamental technologies which are still basic to today’s web:
  1. HTML: Hypertext Markup Language - the format of web pages
  2. URI and URL: Uniform Resource Identifier or Locator - how resources on the web are identified
  3. HTTP: Hypertext Transfer Protocol - the requests/commands run between browsers and webservers
These 3 standards allow a client-server combination: the browser as front end communicates with a webserver as back end, which fetches HTML files and returns them to the browser for presentation.
By the end of 1990 Tim had also created the first browser on a NeXT computer and written a webserver program (httpd). So we had the first working web.

3.1 HTML

HTML is a markup language. It derives ultimately from IBM’s policy of separation of content from presentation. IBM has millions of lines of documentation available in multifarious formats. The principle is that the content is identical for all formats. Tags placed in the document control the formatting for different media. IBM created GML (Generalized Markup Language) to convert all their documents to various formats; GML grew into the international standard SGML, and HTML was originally defined as an application of SGML. The basic structure for an HTML web page is:
<html>
   <head> ... </head>
   <body> ... </body>
</html>
In principle each chunk of text should be surrounded by tag pairs of the form: <tag> ... </tag>. In practice the earliest versions of HTML permitted upper or lower case tags. XHTML, which is defined in terms of XML, prescribes lowercase. Developers generally prefer lowercase.
Empty elements can take the form <tag />, for example <br />.
Historically HTML has undergone development changes. There have been versions 3.2, 4, XHTML, and the latest: HTML5.
Version 4 will have a header like the following:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
 "http://www.w3.org/TR/html4/loose.dtd">
The xhtml version will have a header like the following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
HTML5, which became a W3C Recommendation in 2014, has a much simpler header:
<!DOCTYPE html>
The W3C in the above URLs is the World Wide Web Consortium, the body which accepts and promulgates proposals for web standards. It was founded by Tim Berners-Lee, who left CERN in 1994 to start up the consortium and manage the process of developing standards to meet new requirements.
DTD stands for Document Type Definition, a mechanism inherited from SGML and also used by XML for defining what tags are permitted in a document.

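Within an HTML page you can create a table with rows and columns. Rows are tagged <tr> .. </tr>. The cells that make up the columns are tagged <td> .. </td> and are nested within <tr> .. </tr> tags:
<table>
   <tr> <td>row 1, column 1</td> <td>row 1, column 2</td> </tr>
   <tr> <td>row 2, column 1</td> <td>row 2, column 2</td> </tr>
</table>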
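To show how the nesting of <tr> and <td> carries the row and column structure, here is a small sketch using html.parser from the Python standard library. The class name TableReader and the sample page are invented for the illustration, and the reader ignores everything except plain <tr> and <td> tags.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Collect the text of each <td> cell, grouped by enclosing <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # a new row starts
        elif tag == "td" and self.rows:
            self.rows[-1].append("")      # a new cell in the current row
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # text outside cells is ignored


PAGE = """
<table>
  <tr> <td>name</td> <td>type</td> </tr>
  <tr> <td>og.co.nz.</td> <td>A</td> </tr>
</table>
"""

reader = TableReader()
reader.feed(PAGE)
reader.close()
rows = reader.rows
print(rows)
# [['name', 'type'], ['og.co.nz.', 'A']]
```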
Interactive forms are created using the tag <form> .. </form>. e.g.
<form action="prog.cgi" method="POST">
</form>
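When such a form is submitted with method POST, the browser by default sends the field values in the request body as name=value pairs joined by &, with special characters escaped. The sketch below reproduces that encoding with urllib.parse. The field names (name, domain) are hypothetical, since the form above does not define any fields.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields and the values a user might type in.
fields = {"name": "John O'Gorman", "domain": "og.co.nz"}

# What the browser would put in the body of the POST request.
body = urlencode(fields)
print(body)

# What the program named in the form's action attribute decodes on the server.
decoded = parse_qs(body)
print(decoded["name"][0])
```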

3.2 URI and URL

A Uniform Resource Identifier is a kind of address for a resource available on the web. An example:
http://www.w3.org/People/Berners-Lee/FAQ.html
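A URI which also says where the resource is and how to fetch it, as this one does, is a URL (Uniform Resource Locator). In everyday use the two terms are often treated as interchangeable.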
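The parts of such an address can be pulled apart with urllib.parse from the Python standard library. This is only a sketch of the structure: the scheme names the protocol, the host names the webserver (found via DNS), and the path names the resource on that server.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.w3.org/People/Berners-Lee/FAQ.html")

print(parts.scheme)    # protocol to use
print(parts.hostname)  # webserver to contact
print(parts.path)      # resource to ask that server for
```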

3.3 HTTP

The Hypertext Transfer Protocol prescribes how and what communications may pass between browser and webserver. In the beginning the GET command sufficed. HTTP/1.0 standardised GET, HEAD, and POST; HTTP/1.1 added PUT, DELETE, TRACE, OPTIONS, and CONNECT; PATCH followed in 2010.
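HTTP messages are plain text. A request starts with a line giving the command, the path, and the protocol version; a response starts with a status line giving the version, a numeric code, and a reason phrase. The sketch below builds a request and reads a status line without touching the network. The function names are made up for the illustration and the response shown is a hand-written sample, not captured output.

```python
def build_request(method: str, host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 request."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",          # required by HTTP/1.1
        "Connection: close",
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into version, code and reason."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


request = build_request("GET", "www.w3.org", "/People/Berners-Lee/FAQ.html")
print(request.decode("ascii"))

sample_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_status_line(sample_response))
```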

3.4 Webserver

The 4th component of the WWW is the webserver program named httpd. When a request arrives over HTTP the program reads the requested file, interprets it appropriately, executes any embedded server-side code such as PHP (which will be tagged <?php ... ?>; older versions of PHP also accepted <script language="php"> ... </script>), then transmits the result back to the calling browser. Any Javascript in the page is normally passed through untouched and run by the browser.
The first httpd program was created by Tim Berners-Lee. Nowadays one of the most popular implementations is Apache, which is free and open source.
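The core job of a webserver can be sketched in a few lines: read the request line, find the file, and send back a status line, headers, and the content. In this sketch a dictionary stands in for the files on disk, and the function names are invented. It is an illustration of the idea, not how Apache or the original httpd is written.

```python
def make_response(code: int, reason: str, body: str, send_body: bool = True) -> bytes:
    """Assemble status line, headers and (optionally) the body."""
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.0 {code} {reason}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + (payload if send_body else b"")


def respond(request: str, files: dict[str, str]) -> bytes:
    """Answer one request; files maps paths to page contents (stand-in for disk)."""
    parts = request.split("\r\n", 1)[0].split(" ")
    if len(parts) != 3:
        return make_response(400, "Bad Request", "<p>Bad request</p>")
    method, path, _version = parts
    if method not in ("GET", "HEAD"):
        return make_response(405, "Method Not Allowed", "<p>Not allowed</p>")
    if path not in files:
        return make_response(404, "Not Found", "<p>Not found</p>")
    # HEAD returns the same headers as GET but no body.
    return make_response(200, "OK", files[path], send_body=(method == "GET"))


SITE = {"/index.html": "<html><body>Hello</body></html>"}  # illustrative content
print(respond("GET /index.html HTTP/1.0\r\n\r\n", SITE).decode("utf-8"))
```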

3.4.1 Server Languages

3.4.1.1 CGI
The Common Gateway Interface is a mechanism allowing programs on the webserver to be executed by the server with the purpose of generating content for web pages.
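Under CGI the server passes details of the request to the program in environment variables (QUERY_STRING holds whatever follows the ? in the URL), and whatever the program writes to standard output, headers first and then a blank line, becomes the response. A minimal sketch of such a program follows. The parameter called name and the greeting are invented for the example.

```python
import html
import os
from urllib.parse import parse_qs


def cgi_page(environ) -> str:
    """Return the CGI output (header, blank line, HTML) for one request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]   # hypothetical parameter
    # Escape user input before placing it in the page.
    body = f"<html><body><p>Hello, {html.escape(name)}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by the webserver, the environment carries the request details.
    print(cgi_page(os.environ), end="")
```

Requesting the program with ?name=Tim on the end of its URL would produce a page greeting Tim.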
3.4.1.2 Javascript
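Javascript was invented at Netscape by Brendan Eich in 1995. It is now standardised under the name ECMAScript. It shares much of its syntax with Java but is a different language with a different purpose. It runs mainly in the browser, although it can also be run on the server.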
3.4.1.3 PHP
PHP, originally Personal Home Page tools and now PHP: Hypertext Preprocessor, was created by Rasmus Lerdorf and first released in 1995. It is a server-side language embedded in HTML or HTML5 pages.

3.5 Browser

Tim Berners-Lee created the first browser. This was followed very soon by Arena (an experimental browser used by W3C to test implementations of new standards proposals) and Mosaic (a graphical browser), then by later rivals: Netscape, Mozilla Firefox, Opera from Norway, Apple’s Safari, and Google’s Chrome.