Document: HTML & CSS: The Complete Reference - P16 (PDF)
Part III: Appendixes
Appendix A: Character Entities
Keyboard characters such as < and > have special meaning in (X)HTML because they
delimit tags, so they must be encoded when they are meant literally. Other characters,
such as accented characters from other languages and special symbols, can be difficult
to type, depending on the keyboard being used. Character entities address both needs:
escaping special-purpose characters and inserting a wide range of characters and
symbols.
The general format of a character entity is
&code;
where code may be one of the following:
• A decimal form like &#203;
• A hex form like &#x00CB; or, stripped of leading zeros, simply &#xCB;
• A named value if available, such as &Euml;
NOTE When using a hex form, either a lowercase or uppercase x may be used, as well as upper- and lowercase values for the digits A–F, so &#x00CB; and &#X00CB; and &#x00cb; and so on are all
equivalent. Case insensitivity is not, however, guaranteed for named entities, and relying on it may
result in errors or wrong characters. Good style would suggest lowercase for the hex symbol and
uppercase for the digits.
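The equivalence of these forms can be checked outside a browser. The following is a minimal sketch in Python, used here purely for illustration; it assumes the standard-library html.unescape function, which decodes character references according to HTML5 rules rather than those of any particular browser.

```python
import html

# Decimal, hex (with and without leading zeros, mixed case), and named
# forms of the same character, U+00CB (E with diaeresis).
forms = ["&#203;", "&#x00CB;", "&#xCB;", "&#X00cb;", "&Euml;"]

decoded = [html.unescape(form) for form in forms]

# Report the code point each form decoded to; all should be identical.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
print(code_points)
```

Every entry in the printed list is the same code point, which is the sense in which the forms are interchangeable.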
As an example,
<p>Numeric entity decimal: &#163;</p>
<p>Numeric entity hex: &#xA3;</p>
<p>Named entity: &pound;</p>
would render with the pound sign (£) at the end of each line.
Encoding Quirks and Considerations
Encoding characters is quite important if you want to validate your markup. For example,
consider when you have nontrivial query strings in (X)HTML links like so:
<p>Does this <a href="http://www.pint.com/program?p1=foo&p2=bar">link</a>
validate?</p>
The markup will not validate, because the bare ampersand before p2 is read as the start of an entity.
For this line to validate, you must encode the special characters in the link like so:
<p>Does this <a href="http://www.pint.com/program?p1=foo&amp;p2=bar">link</a>
validate?</p>
Do not, however, take this as advice to change ampersands in typed URLs everywhere you
encounter them, such as within e-mails or the browser’s location bar. Typically, a browser
will exchange an entity for its correct value, but this change may not take place in other
environments.
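When links are generated by a program, the encoding step is usually delegated to a library rather than done by hand. The sketch below assumes Python's standard-library html.escape and html.unescape; the unescape call only stands in for the decoding a browser performs before it follows the link.

```python
import html

raw_url = "http://www.pint.com/program?p1=foo&p2=bar"

# Encode for use inside an (X)HTML attribute value: & becomes &amp;.
encoded_url = html.escape(raw_url, quote=True)
link = f'<a href="{encoded_url}">link</a>'
print(link)

# Decoding the attribute value recovers the URL that should actually be
# requested (or pasted into a location bar or e-mail).
decoded_url = html.unescape(encoded_url)
print(decoded_url)
```

The encoded form belongs only in the markup; everywhere else the original URL, with its bare ampersand, is the one to use.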
Commonly, you will also have trouble when using characters that are part of (X)HTML
itself, particularly the less than (<) and greater than (>) symbols and, of course, the
ampersand that starts entities. Consider this contrived example with a
mathematical expression:
<p>A silly math statement ahead x<y>z is dangerous to validation.</p>
For the greatest safety, the markup should have had the special characters encoded like so:
<p>A silly math statement ahead x&lt;y&gt;z is not dangerous to
validation.</p>
We note that this example is fairly contrived and often just an extra space will allow the
validator (and browser) to tokenize the text correctly. For example,
<p>A silly math statement ahead x < y > z is dangerous to validation?</p>
will likely validate. The loose enforcement of special character handling is both a blessing
and a curse. It leads to sloppy usage and surprising bugs.
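How a tokenizer reads the three variants can be observed directly. The following sketch assumes Python's standard-library html.parser.HTMLParser, which is a lenient parser and not a validator; TagAndTextCollector and tokenize are hypothetical helper names introduced only for this illustration.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Hypothetical helper: records start tags and text seen by the parser."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def tokenize(markup):
    """Return (start tags, concatenated text) for a markup string."""
    collector = TagAndTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, "".join(collector.text_parts)


unescaped = "<p>A silly math statement ahead x<y>z is dangerous to validation.</p>"
escaped = "<p>A silly math statement ahead x&lt;y&gt;z is not dangerous to validation.</p>"
spaced = "<p>A silly math statement ahead x < y > z is dangerous to validation?</p>"

unescaped_tags, unescaped_text = tokenize(unescaped)
escaped_tags, escaped_text = tokenize(escaped)
spaced_tags, spaced_text = tokenize(spaced)

print(unescaped_tags)  # the unescaped form yields a stray "y" element
print(escaped_tags, escaped_text)
print(spaced_tags, spaced_text)
```

In the unescaped form the characters <y> are consumed as a tag and vanish from the text, while the encoded and spaced forms keep the whole expression as text. Only the encoded form does so without relying on the parser's leniency.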
Sloppy syntax is troubling because interpretation may vary browser to browser.
Consider the point of case sensitivity of named entities in browsers. Named entities are
supposed to be case sensitive. For example, &agrave; and &Agrave; are two different
characters.
Now given this fact, what should a browser do when faced with
<p>&POUND; and &pound;</p>
Apparently it treats the first as text and the second as an entity.
But does that hold for all characters? Apparently not: some entities like &copy; are
generally case insensitive, while others like &trade; may vary by browser, and others like
&yen; will always be case sensitive.
Initial drafts of HTML5 attempted to formalize which named entities should be case
insensitive; these drafts focused on the commonly used and supported entities. The current
list of what should be case-insensitive named entities is shown in Table A-1.
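The case behavior described here can be probed with any decoder that uses the HTML5 named character reference list. As a sketch, the following assumes Python's standard-library html.unescape, which uses that list; it shows what one HTML5-style decoder does and is not evidence about any specific browser.

```python
import html

# Each entity is paired with the result expected from an HTML5-style decoder.
samples = {
    "&agrave;": "\u00E0",  # lowercase a with grave
    "&Agrave;": "\u00C0",  # uppercase A with grave: a different character
    "&pound;": "\u00A3",   # recognized
    "&POUND;": "&POUND;",  # no uppercase alias, left as literal text
    "&copy;": "\u00A9",
    "&COPY;": "\u00A9",    # uppercase alias defined by HTML5
    "&trade;": "\u2122",
    "&TRADE;": "\u2122",   # uppercase alias defined by HTML5
    "&yen;": "\u00A5",
    "&YEN;": "&YEN;",      # no uppercase alias, left as literal text
}

results = {entity: html.unescape(entity) for entity in samples}

for entity, decoded in results.items():
    print(entity, "->", ascii(decoded))
```

Only the handful of names that have an explicit uppercase alias decode in both cases; everything else is matched exactly as written.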
Best practice, however, would be not to rely on case insensitivity of named entities, because
browser behavior is still inconsistent. In general, lax syntax enforcement and permissive
interpretation of entities in browsers lead to all sorts of small quirks. Consider
<p>&quotE; and &quote;</p>
Under Internet Explorer, the rendering engine even in a strict mode will “fix” this
problem and effectively convert this into
<p>&quot;E; and &quot;e;</p>
while other browsers will correctly leave this mistake alone.
While it turns out that SGML (and thus traditional HTML) does allow the final
semicolon to be left off in an entity in some cases, the preceding example clearly indicates it
does not allow for that latitude in the middle of words. Just as when dealing with markup
and CSS, it is best to get syntax right rather than rely on some variable fix-up applied by a
browser’s rendering engine.
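Fix-up of this kind is not limited to one browser engine. As an illustration, and assuming Python's standard-library html.unescape, the sketch below shows a decoder that accepts a missing semicolon on certain legacy entity names and, as a side effect, also rewrites the malformed &quotE; and &quote; entities. This is a property of that function's HTML5-style rules for text content, not a statement about what any given browser does.

```python
import html

# A legacy entity name without its final semicolon is still decoded.
no_semicolon = html.unescape("&copy 2023")
print(ascii(no_semicolon))

# The malformed entities are matched on the prefix "quot", and the
# remaining characters are passed through as plain text.
fixed_up = html.unescape("&quotE; and &quote;")
print(fixed_up)

# Writing the entity correctly removes any dependence on fix-up.
correct = html.unescape("&quot;E and &quot;e")
print(correct)
```

The correctly terminated form produces a predictable result everywhere, which is the practical argument for always writing the semicolon.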
There will be instances when you may get the syntax correct but the browser may not be
able to render the characters meaningfully. The reasons for nonsupport vary: a particular
font may be missing, or the operating environment or browser may be unable to render the
character. Generally, browsers will present these failures as boxes or diamonds.
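Named Entity   HTML5 Alias   Numbered Entity   Unicode Entity   Intended Rendering   Description
&amp;          &AMP;         &#38;             &#x0026;         &                    Ampersand
&copy;         &COPY;        &#169;            &#x00A9;         ©                    Copyright
&gt;           &GT;          &#62;             &#x003E;         >                    Greater than
&lt;           &LT;          &#60;             &#x003C;         <                    Less than
&quot;         &QUOT;        &#34;             &#x0022;         "                    Double quotes
&reg;          &REG;         &#174;            &#x00AE;         ®                    Registration mark
&trade;        &TRADE;       &#8482;           &#x2122;         ™                    Trademark symbol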
TABLE A-1 Entities Considered Case Insensitive in HTML5
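The named, numbered, and hex columns of such a table can be derived from the characters themselves. The following sketch assumes Python's standard-library html.entities.codepoint2name table, which covers the HTML 4 entity names; entity_forms is a hypothetical helper name, and the four-digit hex padding is just one formatting choice.

```python
from html.entities import codepoint2name


def entity_forms(char):
    """Return (named, decimal, hex) entity forms for a single character.

    The named form is None when the HTML 4 name table has no entry.
    """
    code_point = ord(char)
    name = codepoint2name.get(code_point)
    named = f"&{name};" if name else None
    return named, f"&#{code_point};", f"&#x{code_point:04X};"


# Ampersand, copyright, greater than, less than, double quote,
# registration mark, trademark.
for char in "&\u00A9><\"\u00AE\u2122":
    print(f"U+{ord(char):04X}", *entity_forms(char))
```

The uppercase aliases are not derivable this way, since they exist only because the HTML5 list names them explicitly.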