4 Configuration
4.2 Writer2xhtml and Calc2xhtml configuration
The XHTML export can also be configured with a configuration file in XML format. This is a sample configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<config>
<option name="custom_stylesheet" value="/mystyle.css" />
<option name="ignore_styles" value="false" />
<option name="use_dublin_core" value="true" />
<option name="convert_to_px" value="true" />
<option name="split_level" value="1" />
<xhtml-style-map name="mystyle" family="paragraph" element="p" css="mycssclass" />
</config>
The following subsections explain the available options. The options written in italics can be set using the dialog if you use Writer2xhtml as an export filter.
Style options
You can control some general aspects of the generated XHTML documents using these technical options.
template_ids | This option is used to specify the ids used for XHTML templates. These should be provided as a comma-separated list defining the id for content, header, footer and panel, in that order. The list can be truncated if you don't need them all. The default is empty, which is equivalent to content,header,footer,panel. |
pretty_print | Set this option to false (default is true) if you don't want “pretty print” (using indentations and line breaks) in the XHTML output. |
no_doctype | If you set this option to true (default is false), Writer2xhtml will not include the !DOCTYPE declaration in the converted document. The !DOCTYPE is required for a valid XHTML document, so this option should only be used if you need to process the document further. |
encoding | This option is used to specify the character encoding to use for the XHTML document. Currently supported encodings are UTF-8 (default), UTF-16, ISO-8859-1 and US-ASCII. Characters not supported by the encoding are exported as numeric character entities. |
hexadecimal_entities | When this option is set to true (default), numeric character entities are exported using hexadecimal numbers; otherwise decimal numbers are used. |
use_named_entities | If you set this option to true (default is false), Writer2xhtml will use named character entities as defined by (X)HTML. If you export to XHTML+MathML, named MathML entities will also be used. |
add_bom | In rare cases, it may be required to add a BOM (Byte Order Mark) to the XHTML document. Most applications will not need this, but you can set this option to true to enable it (default is false). |
multilingual | Set this to false (default is true) to remove language information from the file (except on the root element). |
separate_stylesheet | Set this to true (default is false) to generate a separate CSS file if the XHTML document is split over several files (thus avoiding repeating the style information in every file). |
custom_stylesheet | Use this option to give a URL to your own, external CSS stylesheet. If the value is empty or the option is not specified, no external stylesheet will be used. For more advanced solutions (e.g. different style sheets for screen viewing and printing) you can use an XHTML template (see below). |
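As an illustration of the options above, a configuration might combine them as follows. This is only a sketch: the chosen values are examples, and the stylesheet path is the placeholder from the sample file at the start of this section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- ISO-8859-1 output; unsupported characters become numeric entities -->
  <option name="encoding" value="ISO-8859-1" />
  <!-- write numeric entities as decimal rather than hexadecimal numbers -->
  <option name="hexadecimal_entities" value="false" />
  <!-- one shared CSS file if the output is split over several files -->
  <option name="separate_stylesheet" value="true" />
  <!-- external stylesheet; the path is a placeholder -->
  <option name="custom_stylesheet" value="/mystyle.css" />
</config>
```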
The following options control the conversion of the formatting in the source document. They are particularly important to define if you use an external CSS style sheet.
formatting | The option formatting is used to specify how much text formatting (character, paragraph and list formatting) to export. Possible values are: convert_all (default): Convert all formatting to CSS. ignore_styles: Convert hard formatting but not formatting by styles. Use this value if you use a custom stylesheet, but still want to be able to add some hard formatting (e.g. a centered paragraph or some bold text). ignore_hard: Convert formatting by styles, but no hard formatting (except as given by attribute style maps, see below). Use this if the document is well structured using styles, so that any hard formatting should be considered an error. ignore_all: Convert no formatting at all. Use this value if you use a custom stylesheet and the document is well structured using styles, so that any hard formatting should be considered an error. |
frame_formatting | Used for the same purpose for frame formatting. |
section_formatting | Used for the same purpose for section formatting. (But note that LO does not offer section styles currently). |
table_formatting | Used for the same purpose for table formatting. (But note that LO does not offer table styles currently). |
table_size | This option defines how to export table dimensions. The possible values are: none: Do not export table dimensions (table width, column width and row height), leaving the layout of the tables to the browser. auto (default): Convert the dimensions in the source document using relative or absolute values as defined. relative: Convert the dimensions in the source document, but always using relative values for table width and column width. |
list_formatting | This option determines how list formatting is exported. Possible values are: css1: List formatting is exported using CSS1. This only provides basic support for list labels, and currently the browser's default indentations are used. css1_hack: This value is used to fix a problem with continued lists. Writer2xhtml will export a list that continues on level 2 or below like <ol><ol><li>...</li></ol></ol>. This is not valid in XHTML, but works in browsers. Two deprecated attributes are also used to continue numbering. hard_labels: If you use this value, list labels are exported as part of the text. This adds full support for list labels (e.g. labels of the form 1.2.3). Unlike the other values, the indentations of the list are exported as well. |
max_width | This option is used to specify the maximum width of the text (XHTML and text documents only). The default value is 800px. An empty value means that a maximum width is not set. |
tabstop_style | Use this option to specify a style used for tabstops. Normally tabstops are exported as spaces, but with this option the space will be contained in a span element, e.g. <span class="tabstop"> </span>. You can then define a CSS rule such as .tabstop { width: 2em; } |
use_default_font | Set this option to true (default is false) to ignore all font information in the document and use a default font for the entire exported document. |
default_font_name | Use this option to supply a font name to use if the option use_default_font is set to true. A blank value will not insert any font information. |
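A typical use of these options is to leave the styling to an external stylesheet while still keeping hard formatting. The following is a minimal sketch of such a setup; the stylesheet path and the class name tabstop are placeholders, not required values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- styles come from the external stylesheet (placeholder path) -->
  <option name="custom_stylesheet" value="/mystyle.css" />
  <!-- convert hard formatting, but not formatting by styles -->
  <option name="formatting" value="ignore_styles" />
  <!-- export list labels as part of the text -->
  <option name="list_formatting" value="hard_labels" />
  <!-- an empty value means that no maximum width is set -->
  <option name="max_width" value="" />
  <!-- wrap tabstops in a span element with this class -->
  <option name="tabstop_style" value="tabstop" />
</config>
```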
In addition, a number of options define how dimensions in the source document should be handled.
convert_to_px | When this option is true (default), Writer2xhtml will convert all units to px; otherwise the original units are used. The resolution is assumed to be 96ppi, but you can change this with the scaling option. For example, a scaling of 75% will change the resolution to 72ppi. For EPUB export this option will export font sizes as percentages (and use px for other dimensions). |
scaling | Use this option to specify a scaling of all formatting, i.e. to get a different text size than the original document. The value must be a percentage; the default is 100%. |
column_scaling | Use this option to specify an additional scaling for table columns. The value must be a percentage; the default is 100%. |
image_size | Use this option to specify how to export the size of images and text boxes. Possible values are absolute or auto (default, export absolute size), relative (export the size as a percentage of the current text width) and none or original_image_size (do not export size information; hence the browser or reader will use the original (unscaled) image size). |
relative_font_size | Set this option to true (default is false) to export all font sizes as percentages rather than using absolute dimensions. The font size is calculated relative to the default font size in the document. |
font_scaling | Use this option to specify a scaling for all font sizes if relative_font_size is set to true. Default is 100%. |
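The dimension options can be combined as in the sketch below. The values are illustrative only; in particular the font_scaling of 90% is an arbitrary example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- convert all units to px -->
  <option name="convert_to_px" value="true" />
  <!-- 75% of the assumed 96ppi gives 72ppi -->
  <option name="scaling" value="75%" />
  <!-- image sizes as a percentage of the current text width -->
  <option name="image_size" value="relative" />
  <!-- font sizes as percentages of the document's default font size -->
  <option name="relative_font_size" value="true" />
  <!-- arbitrary example value -->
  <option name="font_scaling" value="90%" />
</config>
```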
Options for special content
formulas | If you are not exporting to XHTML+MathML or HTML5, this option defines how formulas are treated. The possible values are starmath (default) to export the formula in StarMath notation, latex to export the formula in LaTeX notation, image+starmath and image+latex to export the formula as an image, with an alt attribute giving the formula in StarMath or LaTeX notation. |
use_mathjax | If you export to XHTML+MathML or HTML5, you can set this option to true (default is false) to load the MathJax library on pages with formulas. This will ensure that formulas are viewable on all modern browsers, even if they do not support MathML natively. |
embed_svg | If you export to HTML5, you can set this option to true to export vector graphics embedded in the HTML documents as SVG (scalable vector graphics). If set to false (default), external SVG image files will be used. |
embed_img | If you set this option to true (default is false), all images will be embedded directly in the HTML document (base64 encoded). This is not recommended for documents with large images. |
endnotes_heading | In LO the endnotes are set on a separate page at the end of the document. It is not possible to give this page a heading, but you can use this option to add a heading. In EPUB export this heading will also appear in the navigation table. Default is empty (no heading). |
footnotes_heading | In LO the footnotes may be set on a separate page at the end of the document (if configured to do so). It is not possible to give this page a heading, but you can use this option to add a heading. In EPUB export this heading will also appear in the navigation table. Default is empty (no heading). |
use_dublin_core | Use this option to specify whether Dublin Core metadata should be exported. For the XHTML export, the format will be as specified in http://dublincore.org/documents/dcq-html/. For the EPUB export this option has no effect. If the value is false, the metadata will not be exported (default is true). |
notes | If this option is set to true (default), notes in the document will be exported as XHTML comments. These are not directly visible in the browser. If you don't want to include notes, set this option to false. |
display_hidden_text | If this option is set to true (default is false), paragraphs and text portions marked as hidden will be exported. Otherwise they will be ignored. |
include_toc | If this option is set to true (default), the table of contents is exported. If it is set to false, the table of contents is ignored. The latter possibility is mainly intended for EPUB, which also provides an external navigation table. |
include_ncx | This option is specific for EPUB export. If the target format is EPUB 3 and this option is set to true (default is false), a navigation document in the old NCX format is included together with the EPUB 3 Navigation Document. |
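As a sketch of how the options for special content fit together, the following configuration assumes an export to HTML5. The heading text is an arbitrary example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- load MathJax on pages with formulas -->
  <option name="use_mathjax" value="true" />
  <!-- embed vector graphics as SVG in the HTML document -->
  <option name="embed_svg" value="true" />
  <!-- do not export notes as comments -->
  <option name="notes" value="false" />
  <!-- heading for the endnotes (example text) -->
  <option name="endnotes_heading" value="Endnotes" />
</config>
```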
AutoCorrect options
ignore_double_spaces | This option can have the values true (default) or false. Setting the option to true will instruct Writer2xhtml to ignore double spaces; otherwise they are converted to non-breaking spaces. |
ignore_empty_paragraphs | This option can have the values true (default) or false. Setting the option to true will instruct Writer2xhtml to ignore empty paragraphs. |
ignore_hard_line_breaks | This option can have the values true or false (default). Setting the option to true will instruct Writer2xhtml to ignore hard line breaks (Shift-Enter in LO). |
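For instance, to keep double spaces and empty paragraphs but drop hard line breaks, the AutoCorrect options could be set as follows (a sketch that simply inverts the defaults listed above):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- convert double spaces to non-breaking spaces -->
  <option name="ignore_double_spaces" value="false" />
  <!-- keep empty paragraphs -->
  <option name="ignore_empty_paragraphs" value="false" />
  <!-- ignore hard line breaks (Shift-Enter in LO) -->
  <option name="ignore_hard_line_breaks" value="true" />
</config>
```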
File options
external_toc_depth | In addition to the text content, an EPUB document contains a table of contents, which can be used for navigation in the reader. This table is generated by Writer2xhtml from the headings in your document. This option is used to specify the number of levels to include in the table. The default value is auto, which determines the depth from the option split_level. If you want to set the depth independently of split_level, set this option to a positive integer. |
split_level | This option is used to specify that the Writer document should be split into several documents, and the outline level at which the splitting should happen (the default 0 means no split). This is convenient for long documents. Each output document will get a simple navigation panel in the header and the footer (with labels in the same language as the document). |
repeat_levels | If you split the document, you can use this option to specify that headings of higher levels should be repeated on page breaks. This may help the user to identify the current position in the document. Default is 5 (all levels are repeated). |
page_break_split | An alternative method to split the document is to use the original page breaks. Possible values are: none (default): Do not split at page breaks. styles: Split at page breaks which are defined in styles. explicit: Split at all explicit page breaks (page breaks defined in styles and manual page breaks). all: Split at all page breaks. Automatic page breaks may occur within a paragraph, list or table, but Writer2xhtml will not split until this structure has ended. Also in this case, each output document will get a simple navigation panel in the header and the footer. |
split_after | This option (which only has effect for EPUB export) is used to automatically split long documents. When a single file exceeds the number of characters defined by this option (in 1000s), the document will be split at the first possible break point. The value 0 disables automatic splitting. |
image_split | This option (which only has effect for EPUB export) is used to convert large images to "full screen" images. The value of the option can be either none or a percentage. If set to a percentage, an image which is wider than this percentage and has an aspect ratio of at least 3:4 is placed in a separate file. |
cover_image | If you set this option to true (default is false), the first image in the document is used as cover image in EPUB export. |
save_images_in_subdir | Images contained in the document are normally placed in the same directory as the XHTML document. If the document contains a large number of images, it may be more convenient to put the images in a subdirectory. Set this option to true to do this. |
uplink | This option is used to specify a link which brings the user up in a page hierarchy. For example "../index.html". |
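A long document published as a set of linked pages might use the file options like this. The sketch below is illustrative; the uplink target is the example path mentioned above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- start a new output document at every level 1 heading -->
  <option name="split_level" value="1" />
  <!-- put images in a subdirectory -->
  <option name="save_images_in_subdir" value="true" />
  <!-- link one step up in the page hierarchy -->
  <option name="uplink" value="../index.html" />
</config>
```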
Options specific for spreadsheet documents
calc_split | Set this option to true if you want spreadsheet documents to be split into several documents (one for each sheet). This is convenient for large spreadsheets. Each output document will get a simple navigation panel in the header and the footer. The default value is false, which means that the entire spreadsheet will be converted to a single XHTML document. |
display_hidden_sheets | Set this option to true if you want to export sheets that are defined as hidden. Default is false. |
display_hidden_rows_cols | Set this option to true if you want to export rows or columns that are defined as hidden. Default is false. |
display_filtered_rows_cols | Set this option to true if you want to export rows or columns that are not visible due to a filter. Default is false. |
apply_print_ranges | If you set this option to true, the print ranges defined in the document will be used. The content of the result will thus be identical to the content of printed output. If you set the option to false (default), the content of the output will be identical to the content that you can see when editing the document. |
use_title_as_heading | If you set this option to true (default), the title of the document will be included in the XHTML document as a heading. |
use_sheet_names_as_headings | If you set this option to true (default), the sheet name will be added as a heading above each table in the XHTML document. |
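For a large spreadsheet, the options above could be combined as in the following sketch, which produces one document per sheet and restricts the output to the print ranges:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- one output document for each sheet -->
  <option name="calc_split" value="true" />
  <!-- export only the print ranges defined in the document -->
  <option name="apply_print_ranges" value="true" />
  <!-- hidden sheets stay hidden (this is the default) -->
  <option name="display_hidden_sheets" value="false" />
  <!-- no heading with the sheet name above each table -->
  <option name="use_sheet_names_as_headings" value="false" />
</config>
```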
Options for batch conversion
directory_icon | Used to specify a URL for an (icon) image that represents a directory. This is used when Writer2xhtml creates index pages for a directory. |
document_icon | Used to specify a URL for an (icon) image that represents a document. This is used when Writer2xhtml creates index pages for a directory. |
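A sketch of these two options is shown below. Both image URLs are hypothetical placeholders; replace them with the location of your own icons.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- hypothetical icon URLs, used on generated index pages -->
  <option name="directory_icon" value="/icons/folder.png" />
  <option name="document_icon" value="/icons/document.png" />
</config>
```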
Style maps
In addition to the options, you can specify that certain styles in Writer should be mapped to specific XHTML elements and CSS style classes. Here are some examples showing how to use some of the built-in Writer styles to create XHTML elements:
<?xml version="1.0" encoding="UTF-8"?>
<config>
<!-- map LO paragraph styles to xhtml elements -->
<xhtml-style-map name="Text body" family="paragraph"
element="p" css="(none)" />
<xhtml-style-map name="Sender" family="paragraph"
element="address" css="(none)" />
<xhtml-style-map name="Quotations" family="paragraph"
block-element="blockquote" block-css="(none)"
element="p" css="(none)" />
<!-- map LO text styles to xhtml elements -->
<xhtml-style-map name="Citation" family="text"
element="cite" css="(none)" />
<xhtml-style-map name="Emphasis" family="text"
element="em" css="(none)" />
<!-- map hard formatting attributes to xhtml elements -->
<xhtml-style-map name="bold" family="attribute"
element="b" css="(none)" />
<xhtml-style-map name="italics" family="attribute"
element="i" css="(none)" />
</config>
An extended version of this is distributed with Writer2LaTeX; please see the file cleanxhtml.xml.
The attributes of the xhtml-style-map element are used as follows:
name specifies the name of the Writer style.
family specifies the style family in Writer; this can be text, paragraph, heading, frame, list or attribute. The last value does not specify a real style, but refers to hard formatting attributes. The possible names in this case are bold, italics, fixed (for fixed pitch fonts), superscript, subscript, underline and overstrike.
element specifies the XHTML element to use when converting this style. This is not used for frame and list styles.
css specifies the CSS style class to use when converting this style. If it is not specified or the value is “(none)”, no CSS class will be used.
block-element only has effect for paragraph and heading styles. For paragraphs it is used to specify a block XHTML element that should surround several exported paragraphs with this style. For headings it is used to specify the element containing the entire heading (in that case the element attribute is used for the text content only, excluding the label).
block-css specifies the CSS style class to be used for this block element. If it is not specified or the value is “(none)”, no CSS class will be used.
before and after only have effect for paragraph and heading styles. These attributes define a fixed text to add before/after the text of all paragraphs formatted with this style. This is similar to the pseudo-elements ::before and ::after in CSS.
For example, the rules above produce code like this:
<p>This paragraph is Text body</p>
<address>This paragraph is Sender</address>
<blockquote>
<p>This paragraph is Quotations</p>
<p>This paragraph is also Quotations</p>
</blockquote>
<p>This paragraph is also Text body and has some <em>text with emphasis style</em> and uses some <b>hard formatting</b>.</p>
You can use your own Writer styles together with your own CSS style sheet to create further style mappings, for example:
<xhtml-style-map name="Some LO style" family="paragraph"
block-element="div" block-css="block_style"
element="p" css="par_style" />
to produce output like this:
<div class="block_style">
<p class="par_style">Paragraph with Some LO style</p>
<p class="par_style">Yet another</p>
</div>
Note that the rules for hard formatting are only used when formatting is set to ignore_hard or ignore_all. It is not recommended to rely on these rules; using real text styles is preferable. They are included because the use of hard character formatting is very common even in otherwise well-structured documents.
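The before and after attributes described in the attribute list are not used in the examples so far. A minimal sketch of a rule using before could look like this; the Writer style name Warning and the CSS class warning are hypothetical and would have to exist in your own document and stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- hypothetical paragraph style; adds a fixed text before each paragraph -->
  <xhtml-style-map name="Warning" family="paragraph"
    element="p" css="warning" before="Warning: " />
</config>
```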
XHTML templates
You can use your own XHTML document as a template for the generated XHTML documents. This should be an ordinary XHTML file (do not include a DOCTYPE declaration) with some special elements:
An element with the id content is used to fill the text content. If no such element exists, the <body> element is used. If there is no <body> element in the template, the root element is used.
Elements with the id header or footer (optional) will be filled with a simple navigation panel using a first/previous/next/last scheme (for spreadsheet documents, sheet names are used for navigation).
An element with the id panel (optional) will be filled with a simple navigation panel using a table of contents-like scheme.
You can change the names of the id attributes using the template_ids option.
A simple template including a header might look like this:
<html>
<head>
<title/>
</head>
<body>
<div id='header' />
<div id='content' />
</body>
</html>
As the template does not include footer and panel nodes, these elements will not be included.
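If the default ids clash with ids already used in your pages, the template_ids option can rename them. As a sketch, the template below uses the hypothetical ids main and top in place of content and header:

```xml
<html>
  <head>
    <title/>
  </head>
  <body>
    <div id='top' />
    <div id='main' />
  </body>
</html>
```

The matching option in the configuration file would then list the ids in the order content, header. The list is truncated here because footer and panel are not used.

```xml
<option name="template_ids" value="main,top" />
```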
A template with all the elements, suitable for HTML5 might look like this:
<html>
<head>
<title/>
</head>
<body>
<header><nav id='header' /></header>
<aside><nav id='panel' /></aside>
<div id='content' />
<footer><nav id='footer' /></footer>
</body>
</html>
The absolute minimal template is this:
<div/>
The div-element will be used as the content container. The generated document will not be a complete XHTML document (no <html>, <head> and <body> nodes). It will however still be a well-formed XML file that can be handled with standard tools. The use case for this is that you can produce XHTML fragments suitable for inclusion in e.g. a CMS.
Note: Make sure to set the option no_doctype to true in this case!
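A configuration to accompany the minimal template could be sketched as follows. Only the no_doctype setting follows from the note above; turning off pretty_print is an optional extra shown purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- the output is a fragment, so omit the !DOCTYPE declaration -->
  <option name="no_doctype" value="true" />
  <!-- optional: no indentation or extra line breaks in the fragment -->
  <option name="pretty_print" value="false" />
</config>
```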