Changes between Initial Version and Version 1 of web_validation


Timestamp:
May 6, 2008, 7:53:12 PM (17 years ago)
Author:
Daniel Kahn Gillmor
  • web_validation

    v1 v1  
     1[[PageOutline]]
     2= Validating your web site =
     3
     4If you post content on the web, and you want it to be readable by everyone, it makes sense to try to post that content in a universally-readable format.  This page is about helping web site authors and administrators ensure that the data they offer is presented in a universally-readable format.
     5
     6== Declaring a Standard ==
     7
     8The first step i take when trying to debug cross-browser compatibility
     9issues for web sites is to make sure that the pages i'm generating
     10explicitly specify the standard they're targeting (e.g. it's better to say "please use XHTML
     111.0 Strict" than to ask the browser to guess, in which case it
     12might choose something else, like "HTML 4.01 Transitional").  This is done with a "[WikiPedia:DOCTYPE]"
     13declaration at the start of each page -- if you're using a dynamic page
     14generation tool, there should be a way to ask it to emit a DOCTYPE
     15declaration as well.
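
As a rough illustration of what such a declaration looks like and how a script might check for one, here is a minimal Python sketch. The function name `declared_doctype` and the regular expression are assumptions made for this example, not part of any validator; the `XHTML_STRICT` string is the XHTML 1.0 Strict declaration as published by the W3C.

```python
import re

# Matches a DOCTYPE declaration at the very start of a page, allowing
# leading whitespace and an optional XML declaration in front of it.
# (Hypothetical helper pattern; it does not handle leading comments.)
DOCTYPE_RE = re.compile(
    r'\s*(?:<\?xml[^>]*\?>\s*)?(<!DOCTYPE\s+html\b[^>]*>)',
    re.IGNORECASE,
)

# The XHTML 1.0 Strict declaration.
XHTML_STRICT = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def declared_doctype(page):
    """Return the DOCTYPE declaration at the top of `page`, or None."""
    match = DOCTYPE_RE.match(page)
    return match.group(1) if match else None
```

A page that starts with `XHTML_STRICT` yields that same string back, while a page that starts directly with `<html>` yields `None`, which is the case where the browser is left to guess.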
     16
     17
     18== Meeting the Standard ==
     19
     20After i've specified a doctype, i make sure that the pages i'm emitting are syntactically
     21valid according to the standard i've chosen as a target.  One easy way to do
     22this is to ask the [http://validator.w3.org/ W3C's validator] to check the syntax of the pages.
     23
     24The goal is to get the site to a state where the validator reports no errors.  Ideally this should be done on ''all'' pages of a site, since each page might be different.  However, checking a few representative pages is a good start.
     25
     26While this process can be frustrating initially, it really does help
     27to ensure cross-browser compatibility.  If your page doesn't
     28syntactically match the standard you specify (or if you don't specify
     29one at all), you're asking each browser to take its best
     30guess at what you meant.  Since browsers are written by different
     31people, when you diverge from the defined common language, you'll run
     32into different assumptions and have to deal with different quirks.
     33Sticking to rigorous syntactic validity will help you minimize these
     34kinds of surprises.
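
For pages targeting XHTML, a quick local pre-check is possible before involving a validator at all: an XHTML page must at least be well-formed XML. The sketch below uses Python's standard `xml.etree.ElementTree` parser; the function name `well_formedness_error` is made up for this example. Note the limits: this is ''not'' validation against the DTD, it does not apply to HTML 4.01 pages, and because no DTD is loaded, named entities such as `&nbsp;` may be reported as errors even on a valid page.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xhtml):
    """Return None if `xhtml` parses as well-formed XML, else the parser's message.

    Well-formedness is necessary for XHTML validity but not sufficient:
    a page can pass this check and still fail the real validator.
    """
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as err:
        return str(err)
    return None
```

A page with an unclosed `<p>` element, for instance, produces an error message here, while a page with properly nested tags returns `None` and is ready to be checked by a real validator.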
     35
     36== Character Sets ==
     37
     38In addition to making sure your site has valid syntax, you might also want to make sure that it uses Unicode (the most popular Unicode encoding is UTF-8).  This choice of character set encoding (or "charset") is crucial for sites that contain (or may one day contain) characters from outside the standard alphabet used in your native language.  If you're not sure which character set your site uses, the [http://validator.w3.org W3C's validator] will also report the charset your site defaults to.  You can [http://www.w3.org/International/O-HTTP-charset.en.php read up on declaring a new
     39charset] if you're interested in making this change.
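
One place a charset is declared is the HTTP `Content-Type` response header (for example `text/html; charset=UTF-8`). As a minimal sketch, the following Python function pulls the charset out of such a header value using the standard library's `email.message.Message` class, which parses MIME-style headers. The function name `charset_from_content_type` is an assumption for this example, and fetching the header from a live site is left out.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset named in a Content-Type value, or None."""
    msg = Message()
    msg['Content-Type'] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()
```

A result of `None` means the server did not declare a charset in this header, so the browser has to rely on other hints (such as a declaration inside the page) or guess.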
     40 
     41== Offline Validators ==
     42
     43If you run a [http://debian.org Debian]-derived operating system (including [http://ubuntu.com/ Ubuntu]), you might be interested in [DebianPackage:w3c-markup-validator] and [DebianPackage:wdg-html-validator], both of which are tools designed to let you run similar validation tests on your own.
     44
     45Running a local validator gives you more control over what you do with the output, and makes it easier to incorporate the tools into scripted or automated tests.
     46
     47It also lets you run the validator on sites that you might not want to grant access to over the internet (for example, if you're working on a local staging or development copy of a site that is only available over a loopback device).
     48
     49== Validation and [wiki:content_management_systems Content Management Systems] ==
     50
     51Many [wiki:content_management_systems CMSes] already declare a DOCTYPE and charset for you, and use them in their templates and static pages.  This usually means that any code you write that is published through such a CMS needs to be carefully checked to ensure that it meets the standard (and charset!) selected by the CMS.  Some CMSes will help you meet that standard by providing filters to run user-submitted content through.  While there is still guesswork about how to translate non-standard markup into standards-compliant markup, using the CMS's provided filters moves the guesswork to the server side, where it can be made ''once'' in a canonical fashion, instead of asking each viewer's web browser to do its own guesswork.
     52
     53An example of this is Drupal's [http://drupal.org/handbook/modules/filter Input Formats provided by the "filter" module].
     54
     55Also note that sometimes you might paste data from one system (e.g. Microsoft Word) into another system (e.g. an input box on a Drupal-based web site), and the two systems might be using different character sets.  Since common Latin characters (a, b, c, etc) are handled identically in most character sets, you'll usually notice the problem first in things like unusual punctuation (e.g. [WikiPedia:"smart quotes"], em dashes, the interrobang (‽), etc), unusual characters, new symbols, or characters with unusual diacritics (e.g. ß, €, ẍ, etc).  If characters like these are showing up in some garbled state, you should consider the choice of character set as a possible culprit.
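
To make the garbling concrete, here is a small Python sketch that reproduces one common form of it: text stored as UTF-8 but interpreted as Windows-1252 (`cp1252`). The function names `garble` and `ungarble` and the choice of `cp1252` as the mistaken charset are assumptions for this example; other mismatched pairs produce different debris.

```python
def garble(text, wrong_charset='cp1252'):
    """Encode `text` as UTF-8, then decode the bytes with the wrong charset."""
    # errors='replace' keeps the demo from failing on bytes that the
    # wrong charset leaves undefined.
    return text.encode('utf-8').decode(wrong_charset, errors='replace')


def ungarble(text, wrong_charset='cp1252'):
    """Reverse garble() when no bytes were lost along the way."""
    return text.encode(wrong_charset).decode('utf-8')


if __name__ == '__main__':
    for sample in ['abc', 'ß', '€']:
        print(repr(sample), '->', repr(garble(sample)))
```

Plain `abc` passes through unchanged, while `ß` becomes `ÃŸ` and `€` becomes `â‚¬`. Seeing sequences like these on a page is a strong hint that the two systems disagree about the charset.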
     56
     57== Conclusion ==
     58
     59Once you've got a syntactically valid page that does what you want in
     60one browser, the difference between that and other modern browsers
     61should be relatively small, and you can focus your time on resolving
     62those differences.  Regular checks of page validity when you make
     63changes are also a good idea, just to make sure the page keeps
     64working.