4 Using Writer2xhtml and Calc2xhtml
Writer2xhtml produces standards-compliant XHTML files; in particular, it can be used to put math on the web using the XHTML + MathML combination. Writer2xhtml can convert to any of these XHTML variants:
-
XHTML 1.0 strict, which follows the guidelines for HTML compatibility, so that the output should be viewable with any browser that supports HTML 4.
-
XHTML 1.1 + MathML 2.0, which is currently viewable with the Mozilla and Amaya browsers only.
-
XHTML 1.1 + MathML 2.0 using XSL transformations from the W3C Math Working Group, which make the file viewable also in some browsers that need a plugin to display MathML, e.g. Internet Explorer with the MathPlayer plugin.
This is how the W3C Math Working Group recommends putting "math on the web".
Note that the default file extension and the recommended MIME type vary with the output format:
Output format | Default file extension | MIME type
XHTML 1.0 | .html | text/html
XHTML 1.1 + MathML 2.0 | .xhtml | application/xhtml+xml
XHTML 1.1 + MathML 2.0 (with xsl transformation) | .xml | application/xml
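The table can also be written as a small lookup, for instance in a script that renames or serves converted files. This is only an illustrative sketch: the dictionary name OUTPUT_FORMATS and the helper output_info are assumptions and are not part of Writer2xhtml.

```python
# Illustrative lookup built from the table above; all names are hypothetical.
OUTPUT_FORMATS = {
    "XHTML 1.0": (".html", "text/html"),
    "XHTML 1.1 + MathML 2.0": (".xhtml", "application/xhtml+xml"),
    "XHTML 1.1 + MathML 2.0 (with xsl transformation)": (".xml", "application/xml"),
}


def output_info(output_format):
    """Return (default file extension, recommended MIME type) for an output format."""
    try:
        return OUTPUT_FORMATS[output_format]
    except KeyError:
        raise ValueError(f"unknown output format: {output_format!r}") from None
```

A web server that delivers the converted files would typically need to be configured so that each extension is sent with the MIME type listed here.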
Writer2xhtml is quite flexible, in particular with respect to the handling of formatting:
-
You can let Writer2xhtml convert the style information in the source document and thus get an xhtml document that has the same general appearance as the original, but with an online look and feel.
-
You can use your own style sheet and let Writer2xhtml convert the content only. You can map styles in OOo to xhtml elements and css classes from your style sheet; see sections 4.3 and 4.4.
Calc2xhtml is a companion to Writer2xhtml that produces XHTML 1.0 strict from your Calc documents.
4.1 Converting to XHTML from the command line
To convert a file to XHTML, use the command line
w2l [options] <document/directory to convert> [<output path and/or file name>]
The available options are
-
-xhtml, -xhtml+mathml and -xhtml+mathml+xsl specify the output format (if you leave this out, the output format will be LaTeX!).
-
-recurse to specify that batch conversion of a directory should recurse into subdirectories.
-
-template filename to specify a template file. Writer2xhtml will use this file as a template for the converted document. The template must contain an element with the attribute id="content". This element should accept block content, e.g. div or td. Optionally, it can also contain elements with the attributes id="header" and id="footer". These will be used for navigation links.
-
-config filename to specify a configuration file. Writer2xhtml will load this configuration file before converting your document. You can read more about configuration in section 4.3.
-
-option value to set any simple configuration option, where option is the name of a simple option; see section 4.3.
This will produce an XHTML file with the specified name. If no output file is specified, Writer2xhtml will use the same name as the original document, but a different file extension.
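Regarding the -template option, the following sketch shows what a minimal template might look like and how it could be checked before use. Only the id values content, header and footer come from the description above; the template markup itself, the use of div elements and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal template; only the id attributes are taken from the text.
TEMPLATE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Template</title></head>
  <body>
    <div id="header"></div>
    <div id="content"></div>
    <div id="footer"></div>
  </body>
</html>"""


def template_ids(template_text):
    """Collect the id attributes of all elements in a well-formed template."""
    root = ET.fromstring(template_text)
    return {el.get("id") for el in root.iter() if el.get("id") is not None}


def check_template(template_text):
    """Return (has_content, optional_ids).

    has_content is True when an element with id="content" exists;
    optional_ids lists which of "header" and "footer" are present.
    """
    ids = template_ids(template_text)
    return "content" in ids, sorted(ids & {"header", "footer"})
```

The check only looks at the id attributes; it does not verify that the content element accepts block content.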
Examples:
w2l -xhtml+mathml+xsl mydocument.sxw
or
w2l -xhtml -config myconfig.xml mydocument.sxw
The script w2l also provides a shorthand notation to use the sample configuration file included in writer2latex05.zip. The command line is
w2l -cleanxhtml <writer document to convert> [<output path and/or file name>]
This configuration file produces a "clean" xhtml file (see section 4.4), for example:
w2l -cleanxhtml mydocument.sxw mypath/myoutputdoc.html
It is recommended that you create scripts to support your own configuration files.
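Such a script could look like the sketch below, which wraps the command line in a small Python helper. It uses only the options described in this section; the function names are hypothetical, and it assumes that the w2l script can be found on the PATH (on some systems the script may need to be started through a shell instead).

```python
import subprocess

FORMAT_FLAGS = ("-xhtml", "-xhtml+mathml", "-xhtml+mathml+xsl")


def w2l_command(source, output=None, fmt="-xhtml", config=None, recurse=False):
    """Build the argument list: w2l [options] <source> [<output>]."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unsupported output format flag: {fmt}")
    cmd = ["w2l", fmt]
    if recurse:
        cmd.append("-recurse")
    if config is not None:
        cmd += ["-config", config]
    cmd.append(source)
    if output is not None:
        cmd.append(output)
    return cmd


def convert(source, **kwargs):
    """Run w2l and raise CalledProcessError if it exits with an error."""
    return subprocess.run(w2l_command(source, **kwargs), check=True)
```

For example, convert("mydocument.sxw", config="myconfig.xml") would run the second example command shown above.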
4.2 Using Writer2xhtml as an export filter
If you choose File – Export in Writer you should be able to choose XHTML 1.0 strict, XHTML 1.1 + MathML 2.0 or XHTML 1.1 + MathML 2.0 (xsl) as file type. Using Calc2xhtml as an export filter is not yet supported.
Note: You have to use the export menu because Writer2xhtml does not provide an import filter for XHTML. You should always save in the native format of OOo as well!
4.3 Configuration
XHTML export can be configured with a configuration file. Where the configuration is read from depends on how you use Writer2xhtml:
If you use Writer2xhtml as an export filter in OOo, the configuration is handled as follows:
-
The file writer2latex.xml is read from the user installation directory of OOo
On Linux/Unix usually something like <home directory>/.OpenOffice.org2/user
On Windows usually something like <user profile>\OpenOffice.org2\user
If the file does not exist, it will be created automatically.
If, on the other hand, you use Writer2xhtml from the command line, you will have to specify on the command line which configuration file to use.
The configuration is a file in xml format. Here is a sample configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <option name="xhtml_custom_stylesheet" value="/mystyle.css" />
  <option name="xhtml_ignore_styles" value="false" />
  <option name="xhtml_use_dublin_core" value="true" />
  <option name="xhtml_convert_to_px" value="true" />
  <option name="xhtml_split_level" value="1" />
  <xhtml-style-map name="mystyle" class="paragraph" element="p" css="mycssstyle" />
</config>
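If you maintain several configurations, a file with this structure can also be generated by a script. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper build_config is hypothetical, and the option names and values are simply taken from the sample above.

```python
import xml.etree.ElementTree as ET


def build_config(options, style_maps=()):
    """Return a configuration document as a string.

    options: dict mapping option name -> value (both strings)
    style_maps: iterable of dicts holding xhtml-style-map attributes
    """
    config = ET.Element("config")
    for name, value in options.items():
        ET.SubElement(config, "option", {"name": name, "value": value})
    for attrs in style_maps:
        ET.SubElement(config, "xhtml-style-map", dict(attrs))
    body = ET.tostring(config, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sample = build_config(
    {"xhtml_custom_stylesheet": "/mystyle.css", "xhtml_split_level": "1"},
    [{"name": "mystyle", "class": "paragraph", "element": "p", "css": "mycssstyle"}],
)
```

When writing the result to disk, the file should be saved with UTF-8 encoding so that it matches the XML declaration.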
Options
-
The option xhtml_no_doctype can have the values true or false (default). When this option is true, Writer2xhtml will not include the !DOCTYPE declaration in the converted document. The !DOCTYPE is required for a valid xhtml document; this option should only be used if you need to process the document further.
-
The option xhtml_encoding (default UTF-8) is used to specify the character encoding to use for the xhtml document.
-
The option xhtml_custom_stylesheet is used to specify a URL to your own, external stylesheet. If the value is empty or the option is not specified, no external stylesheet will be used.
-
The option xhtml_formatting is used to specify how much text formatting (character, paragraph and list formatting) to export (see footnote 7). Possible values are
-
convert_all (default): Convert all formatting to css.
-
ignore_styles: Convert hard formatting but not formatting by styles. Use this value if you use a custom stylesheet, but still want to be able to add some hard formatting (e.g. a centered paragraph or some bold text).
-
ignore_hard: Convert formatting by styles, but no hard formatting (except as given by attribute style maps, see below). Use this if the document is well structured using styles, so that any hard formatting should be considered an error.
-
ignore_all: Convert no formatting at all. Use this value if you use a custom stylesheet and the document is well structured using styles, so that any hard formatting should be considered an error.
-
The option xhtml_frame_formatting is used for the same purpose for frame formatting.
-
The option xhtml_section_formatting is used for the same purpose for section formatting. (But note that OOo does not offer section styles currently).
-
The option xhtml_table_formatting is used for the same purpose for table formatting. (But note that OOo does not offer table styles currently).
-
The option xhtml_ignore_table_dimensions is used to specify that you don't want table dimensions (table width, column width and row height) to be exported, but want to leave the layout of the tables to the browser.
-
The option xhtml_use_dublin_core is used to specify if Dublin Core Meta data should be exported (the format will be as specified in http://dublincore.org/documents/dcq-html/). If the value is false, it will not be exported.
-
The option xhtml_convert_to_px can have the values true (default) or false. When this option is true, Writer2xhtml will convert all units to px; otherwise the original units are used. The resolution is assumed to be 96 ppi; you can change this with the xhtml_scaling option. For example, a scaling of 75% changes the resolution to 72 ppi.
-
The option xhtml_scaling is used to specify a scaling of all formatting, i.e. to get a different text size than the original document. The value must be a percentage.
-
The option xhtml_column_scaling is used to specify an additional scaling for table columns. The value must be a percentage.
-
The option xhtml_split_level is used to specify that a Writer document should be split into several documents, and the outline level at which the split should happen (the default, 0, means no split). This is convenient for long documents. Each output document will get a simple navigation panel in the header and the footer.
-
The option xhtml_calc_split is used to specify that the Calc documents should be split into several documents, one for each sheet. This is convenient for large spreadsheets. Each output document will get a simple navigation panel in the header and the footer.
-
The option xhtml_uplink is used to specify a link which brings the user up in the page hierarchy. For example "../index.html".
-
The option xhtml_directory_icon is used to specify an (icon) image that represents a directory. This is used when Writer2xhtml creates index pages for a directory.
-
The option xhtml_document_icon is used to specify an (icon) image that represents a document. This is used when Writer2xhtml creates index pages for a directory.
-
The option xhtml_use_list_hack is used to fix a problem with continued lists. This will export a list that continues on level 2 or below like <ol><ol><li>...</li></ol></ol>, which is not valid in xhtml, but works in browsers. Also two deprecated attributes are used to continue numbering. Default is false.
-
The option xhtml_tabstop_style can be used to specify a style used for tabstops. Normally tabstops are exported as spaces, but with this option the space will be contained in a span element, e.g. <span class="tabstop"> </span>. You can then define a CSS rule such as .tabstop { width: 2em; }.
-
The option ignore_double_spaces can have the values true (default) or false. Setting the option to true will instruct Writer2xhtml to ignore double spaces, otherwise they are converted to non-breaking spaces.
-
The option ignore_empty_paragraphs can have the values true (default) or false. Setting the option to true will instruct Writer2xhtml to ignore empty paragraphs.
-
The option ignore_hard_line_breaks can have the values true or false (default). Setting the option to true will instruct Writer2xhtml to ignore hard line breaks (shift-Enter).
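The arithmetic behind xhtml_convert_to_px and xhtml_scaling described above can be illustrated with a short sketch. The 96 ppi default and the percentage scaling come from the option descriptions; the function name, the set of supported units and the absence of rounding are assumptions, and this is not the converter's actual code.

```python
import re

# Number of each unit in one inch (illustrative selection of units).
UNITS_PER_INCH = {"in": 1.0, "cm": 2.54, "mm": 25.4, "pt": 72.0, "pc": 6.0}


def to_px(length, scaling_percent=100.0, ppi=96.0):
    """Convert a length such as '2.54cm' to pixels at the given resolution."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(in|cm|mm|pt|pc)\s*", length)
    if match is None:
        raise ValueError(f"cannot parse length: {length!r}")
    value, unit = float(match.group(1)), match.group(2)
    # Pixels at the base resolution, then scaled by the percentage.
    return value * ppi / UNITS_PER_INCH[unit] * scaling_percent / 100.0
```

With the default settings one inch corresponds to 96 px; with a scaling of 75% the same inch corresponds to 72 px, which is the 72 ppi resolution mentioned in the description of xhtml_convert_to_px.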
Style maps
In addition to the options, you can specify that certain styles in Writer should be mapped to specific XHTML elements and CSS style classes. Here are some examples showing how to use some of the built-in Writer styles to create XHTML elements:
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- map OOo paragraph styles to xhtml elements -->
  <xhtml-style-map name="Text body" class="paragraph"
                   element="p" css="(none)" />
  <xhtml-style-map name="Sender" class="paragraph"
                   element="address" css="(none)" />
  <xhtml-style-map name="Quotations" class="paragraph"
                   block-element="blockquote" block-css="(none)"
                   element="p" css="(none)" />
  <!-- map OOo text styles to xhtml elements -->
  <xhtml-style-map name="Citation" class="text"
                   element="cite" css="(none)" />
  <xhtml-style-map name="Emphasis" class="text"
                   element="em" css="(none)" />
  <!-- map hard formatting attributes to xhtml elements -->
  <xhtml-style-map name="bold" class="attribute"
                   element="b" css="(none)" />
  <xhtml-style-map name="italics" class="attribute"
                   element="i" css="(none)" />
</config>
An extended version of this is distributed with Writer2LaTeX; please see the file cleanxhtml.xml.
The attributes of the xhtml-style-map element are used as follows:
-
name specifies the name of the Writer style.
-
class specifies the style class in Writer; this can be text, paragraph, frame, list or attribute. The last value does not specify a real style, but refers to hard formatting attributes. The possible names in this case are bold, italics, fixed (for fixed pitch fonts), superscript and subscript.
-
element specifies the XHTML element to use when converting this style. This is not used for frame and list styles.
-
css specifies the CSS style class to use when converting this style. If it is not specified or the value is “(none)”, no CSS class will be used.
-
block-element only has an effect for paragraph styles. It is used to specify a block XHTML element that should surround several exported paragraphs with this style.
-
block-css specifies the CSS style class to be used for this block element. If it is not specified or the value is “(none)”, no CSS class will be used.
For example, the rules above produce code like this:
<p>This paragraph is Text body</p>
<address>This paragraph is Sender</address>
<blockquote>
<p>This paragraph is Quotations</p>
<p>This paragraph is also Quotations</p>
</blockquote>
<p>This paragraph is also Text body and has some <em>text with emphasis style</em> and uses some <b>hard formatting</b>.</p>
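The effect of block-element can be modelled with a short sketch. It assumes that consecutive paragraphs with the same style are wrapped together in one block element, which is how the output above reads; the in-memory STYLE_MAP and the function names are hypothetical and do not reflect how Writer2xhtml is implemented.

```python
from html import escape
from itertools import groupby

# Hypothetical in-memory form of three of the paragraph rules shown above.
STYLE_MAP = {
    "Text body": {"element": "p"},
    "Sender": {"element": "address"},
    "Quotations": {"element": "p", "block-element": "blockquote"},
}


def open_tag(element, css=None):
    """Return an opening tag, adding a class unless css is missing or "(none)"."""
    if css and css != "(none)":
        return f'<{element} class="{escape(css)}">'
    return f"<{element}>"


def render(paragraphs):
    """Render a list of (style name, text) pairs to XHTML lines."""
    lines = []
    # Consecutive paragraphs sharing a style form one run.
    for style, run in groupby(paragraphs, key=lambda p: p[0]):
        rule = STYLE_MAP.get(style, {"element": "p"})
        block = rule.get("block-element")
        if block:
            lines.append(open_tag(block, rule.get("block-css")))
        for _, text in run:
            element = rule["element"]
            lines.append(f"{open_tag(element, rule.get('css'))}{escape(text)}</{element}>")
        if block:
            lines.append(f"</{block}>")
    return "\n".join(lines)
```

Two adjacent Quotations paragraphs therefore end up inside a single blockquote, while paragraphs of styles without a block-element are emitted on their own.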
You can use your own Writer styles together with your own CSS style sheet to create further style mappings, for example:
<xhtml-style-map name="Some OOo style" class="paragraph"
block-element="div" block-css="block_style"
element="p" css="par_style" />
to produce output like this:
<div class="block_style">
<p class="par_style">Paragraph with Some OOo style</p>
<p class="par_style">Yet another</p>
</div>
Note that the rules for hard formatting are only used when xhtml_ignore_styles is set to true. It is not recommended to rely on these rules; using real text styles is preferable. They are included because the use of hard character formatting is very common even in otherwise well-structured documents.
4.4 Using OpenOffice.org to create XHTML documents
The configuration file cleanxhtml.xml, which is distributed with Writer2LaTeX, can be used to create semantically rich XHTML content, which can be formatted with your own stylesheet (you should edit the file to add the URL to the stylesheet you want to use).
A subset of the built-in styles in Writer is mapped to XHTML elements (note that the style names are localized, so this is for the English version of OpenOffice.org):
OOo Writer style | OOo Writer class | XHTML element
Text body | paragraph style | p
Sender | paragraph style | address
Quotations | paragraph style | blockquote
Preformatted Text | paragraph style | pre
List Heading | paragraph style | dt (in dl)
List Contents | paragraph style | dd (in dl)
Horizontal Rule | paragraph style | hr
Citation | text style | cite
Definition | text style | dfn
Emphasis | text style | em
Example | text style | samp
Source Text | text style | code
Strong Emphasis | text style | strong
Teletype | text style | tt
User entry | text style | kbd
Variable | text style | var
bold | hard formatting attribute | b
italics | hard formatting attribute | i
fixed pitch font | hard formatting attribute | tt
superscript | hard formatting attribute | sup
subscript | hard formatting attribute | sub
By using only these styles, you will create well-structured XHTML documents. See the document sample-xhtml.sxw for an example of how to use this.
Warning: Some elements are not allowed inside pre, so this might in some cases lead to invalid documents. This will be fixed in a later version of Writer2xhtml.
7 This and the following options replace the former option xhtml_ignore_styles.