Warning: Although many browsers allow you to omit the
part http:// when specifying the URL of a document to be
visited, you must not omit it when writing a normal URL
into an HTML document. (Otherwise browsers will try to interpret it
as a relative URL.)
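The effect of a missing http:// can be sketched with Python's standard urllib.parse module, which resolves references in much the same way a browser does. The base URL below is a hypothetical example, not one taken from this document.

```python
from urllib.parse import urljoin

# Hypothetical address of the HTML document that contains the link.
base = "http://www.example.com/docs/index.html"

# With the scheme, the reference is absolute and the base is ignored.
with_scheme = urljoin(base, "http://www.w3.org/Addressing/")

# Without the scheme, the same text is resolved as a relative URL,
# that is, as a path below the directory of the base document.
without_scheme = urljoin(base, "www.w3.org/Addressing/")

print(with_scheme)     # http://www.w3.org/Addressing/
print(without_scheme)  # http://www.example.com/docs/www.w3.org/Addressing/
```

The second result points to a location on the base document's own server, which is almost never what the author intended.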
Actually, this pattern is mainly for Web documents, i.e. http
URLs. For other URLs, simplifications and special interpretations are
applied. For example, a mailto URL is just of the form
mailto:address where address is
a normal Internet E-mail address like
Jukka.Korpela@hut.fi
(as specified in
RFC 822).
Please notice that appending anything to the E-mail address in
a mailto URL
is nonstandard and
may result in lost mail without
anyone noticing! (See also
the discussion of mailto: URLs in the description of the
A element.)
An http URL can also carry
a fragment identifier:
the complete reference then consists of an absolute URL, the # sign and a
name (which refers to a location within the
document specified by the absolute URL).
See the description of the A element for more information.
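As a small illustration, the two parts of such a reference can be separated with urldefrag from Python's standard library. The URL and the name urls are made up for this sketch.

```python
from urllib.parse import urldefrag

# Hypothetical reference: absolute URL, the # sign, and a name.
reference = "http://www.example.com/guide.html#urls"

# urldefrag splits the reference at the # sign.
document_url, name = urldefrag(reference)

print(document_url)  # http://www.example.com/guide.html
print(name)          # urls
```

Only the part before the # sign identifies the document; the name selects a location within that document.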
It is safest to enclose URLs in
quotes when
writing them as attribute values in HTML.
For an overview of URLs, see
W3C
material on addressing.
As regards the
technical specifications of the
syntax of URLs, see RFC 1738 (absolute URLs) and RFC 1808 (relative URLs).
In particular, the specifications
say that within a URL
only a limited set of characters can be used as such:
- alphanumeric characters (A to Z, a to z, 0 to 9)
- the characters $-_.+!*'(),
- the characters ;/?:@=&# provided that they are used in the special
  meaning reserved for them in the RFCs mentioned above.
Other characters must be encoded.
(The characters ;/?:@=&# must also be encoded, if they
are not used in the special meaning.)
This encoding (which is defined by URL specifications, not HTML
specifications) consists of using the percent sign followed by two
hexadecimal digits representing the character's code position.
For example, tilde (~) should be represented as
%7E and space as %20.
(Violating the rules is much more likely to cause problems
in the latter case than in the former.)
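The encoding rule can be sketched as a short Python function. This is a minimal illustration of the character classes listed above, not a complete implementation of the RFCs. The function name percent_encode and the keep_reserved flag are hypothetical, and encoding non-ASCII characters as UTF-8 bytes is an assumption of this sketch that the cited RFCs do not prescribe.

```python
import string

# Characters that may always appear as such (first two list items above).
SAFE = set(string.ascii_letters + string.digits + "$-_.+!*'(),")
# Characters that may appear as such only in their special meaning.
RESERVED = set(";/?:@=&#")

def percent_encode(text, keep_reserved=True):
    """Encode every character outside the allowed set as %XX.

    keep_reserved=True assumes the reserved characters in text are used
    in their special meaning; pass False when they are plain data.
    Non-ASCII characters are encoded as UTF-8 bytes (an assumption).
    """
    allowed = (SAFE | RESERVED) if keep_reserved else SAFE
    pieces = []
    for ch in text:
        if ch in allowed:
            pieces.append(ch)
        else:
            pieces.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
    return "".join(pieces)

print(percent_encode("~"))                           # %7E
print(percent_encode("/~user/my file.html"))         # /%7Euser/my%20file.html
print(percent_encode("a&b", keep_reserved=False))    # a%26b
```

Python's own urllib.parse.quote follows a later specification and leaves the tilde unencoded in recent versions, which is why the sketch spells out the character set itself.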
When a URL occurs as an
attribute value in HTML,
there is another complication caused by the
& character which may have special
use in query form submissions. In principle,
that character should be escaped as &amp;amp;
or as &amp;#38; (there is
a footnote in the HTML 2.0 specification about this) and browsers should process it so that the actual URL passed to the
processing CGI script has that notation
replaced by a plain & character. (Notice that it must not be
percent-encoded as %26. This is a confusing issue, and CGI scripts should
really be written so that semicolon ; and not ampersand & is used
as field separator.)
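The escaping step can be illustrated with Python's standard html module. The URL, its query fields, and the link text are hypothetical. The sketch shows only the HTML-level escaping, which is separate from the percent-encoding defined by the URL specifications.

```python
from html import escape, unescape

# Hypothetical URL with two query fields separated by an ampersand.
url = "http://www.example.com/cgi-bin/search?name=korpela&lang=en"

# Escape the URL for use inside a quoted HTML attribute value.
attribute_value = escape(url, quote=True)
anchor = '<A HREF="{}">search</A>'.format(attribute_value)
print(anchor)

# A browser reverses the escaping before it uses the URL,
# so the script on the server sees the plain ampersand again.
assert unescape(attribute_value) == url
```

The printed anchor contains the five-character entity reference in place of the ampersand, while the URL that is actually requested still contains the single character.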