The standard can be downloaded from the ISO website at this direct link
A DOCX document is a zipped folder containing several interacting components. The main ones are:
- word/document.xml: The main document content
- word/styles.xml: Named style information (e.g., "Heading 1"), similar to CSS
- word/numbering.xml: Sort of like CSS for numbering styles (e.g., "a)" vs "iii.")
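Because the container is an ordinary zip archive, its parts can be listed with a standard zip library. The following is a minimal sketch: it builds a stand-in archive in memory holding only the three parts named above (a real DOCX contains more parts, which are omitted here), and the helper names `make_demo_docx` and `list_parts` are made up for illustration.

```python
import io
import zipfile


def make_demo_docx() -> bytes:
    """Build a tiny stand-in archive in memory (not a valid Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr("word/styles.xml", "<styles/>")
        zf.writestr("word/numbering.xml", "<numbering/>")
    return buf.getvalue()


def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the sorted names of the parts stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return sorted(zf.namelist())


parts = list_parts(make_demo_docx())
print(parts)
```

For a file on disk, `zipfile.ZipFile("some.docx")` can be opened the same way, where `some.docx` is a placeholder path.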
Note on measures: The fundamental unit in DOCX is the TWIP, a "twentieth of a point", where a point ("pt") is 1/72 of an inch (so one inch is 1440 TWIPS). Typically, properties referring to a physical length accept either a number of TWIPS or a string consisting of a number followed by one of "mm|cm|in|pt|pc|pi" to indicate the units.
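The arithmetic behind these units can be sketched as a small converter. This is an illustration rather than a spec-complete parser: the function name `to_twips` is hypothetical, and treating both "pc" and "pi" as picas (12 points) is an assumption that should be checked against the spec.

```python
import re

TWIPS_PER_POINT = 20

# Points per unit. "pc" and "pi" are both treated as picas (12 points);
# that reading is an assumption, not something stated in the text above.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
    "pc": 12.0,
    "pi": 12.0,
}

MEASURE = re.compile(r"^(-?\d+(?:\.\d+)?)(mm|cm|in|pt|pc|pi)$")


def to_twips(value) -> int:
    """Convert a TWIPS number or a 'number + unit' string to whole TWIPS."""
    if isinstance(value, (int, float)):
        return round(value)  # bare numbers are already TWIPS
    text = str(value).strip()
    match = MEASURE.match(text)
    if match:
        number, unit = match.groups()
        return round(float(number) * POINTS_PER_UNIT[unit] * TWIPS_PER_POINT)
    return round(float(text))  # numeric string without a unit: TWIPS


print(to_twips("1in"), to_twips("12pt"), to_twips(720))
```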
The main document content consists of a sequence of block-level items wrapped in
a body element. There are other "stories" you can include beyond body, such
as comments, headers, etc. The main types of block-level content are paragraphs
(p) and tables (tbl). Block-level elements have a sub-element specifying
their "properties" (pPr for paragraphs and tblPr for tables), which include
different options for styling and layout of the element. Each option corresponds
to a child element of the properties element. Tables and paragraphs have different
properties available, as follows:
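To make the nesting concrete, here is a small sketch that parses a hand-written document.xml fragment and reports each block-level item together with the property elements it carries. The namespace URI and the `w:` prefix are the ones commonly seen in WordprocessingML files; they are not discussed in the text above and are included as an assumption, as are the style names and the helper `block_items`.

```python
import xml.etree.ElementTree as ET

# Commonly seen WordprocessingML namespace (assumption, not from the text).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
    </w:p>
    <w:tbl>
      <w:tblPr><w:tblStyle w:val="TableGrid"/></w:tblPr>
    </w:tbl>
  </w:body>
</w:document>
"""


def block_items(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (element name, property names) for each child of body."""
    root = ET.fromstring(xml_text.strip())
    body = root.find(f"{{{W}}}body")
    items = []
    for child in body:
        kind = child.tag.split("}")[1]  # local name: "p" or "tbl"
        props = child.find(f"{{{W}}}{kind}Pr")  # pPr or tblPr
        names = [] if props is None else [p.tag.split("}")[1] for p in props]
        items.append((kind, names))
    return items


items = block_items(DOCUMENT_XML)
print(items)
```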
Paragraph properties (pPr):
- Style name
  pStyle: Reference to an entry in word/styles.xml. Sort of like a CSS class.
- Numbering info
  numPr: Reference to an entry in word/numbering.xml. The reference is both an id (numId) and a level number (ilvl). Paragraphs with this property have a number/bullet placed before the beginning of text.
- Tab stops
  tabs: Contains a list of tab stops (tab) to set on the given paragraph. Each tab stop specifies a distance (pos), a stop type (val), and an optional leader, indicating the fill character. Valid values for these attributes are:
  - val: "clear", "start", "center", "end", "decimal", "bar", "num"
  - leader: "none", "dot", "hyphen", "underscore", "heavy", "middleDot"
  - pos: The distance from the left margin to the tab stop
- Indentation
  ind: This is an element with the following attributes:
  - start: Indentation from the left margin. Negative values move text backwards.
  - firstLine: Additional indentation for the first line. If hanging is also given, this is ignored.
  - hanging: Negative indentation for the first line. Takes precedence over firstLine if both are given.
  - end: Additional margin to leave empty on the right. Negative values move the margin backwards.
  All of these attributes accept a TWIPS number or a number + unit value. They also all have alternates suffixed with Chars (e.g., startChars) to specify the indentation in "character units".
- Spacing
  spacing: Controls spacing between lines and above/below the paragraph. The core attributes are as follows:
  - before: Similar to CSS margin-top, in TWIPS or number + unit.
  - beforeLines: Similar to CSS margin-top, measured in hundredths of a line.
  - after: Similar to CSS margin-bottom, in TWIPS or number + unit.
  - afterLines: Similar to CSS margin-bottom, measured in hundredths of a line.
  - line: Similar to CSS line-height, in 240ths of a line. The meaning of this attribute can change if lineRule is not blank or auto (see the spec for details).
  - There are a couple more attributes; see section 17.3.1.33 of the spec.
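The indentation and spacing rules above can be sketched in code. The fragment below is hand-written for illustration (the style name, ids, and values are made up), the namespace URI is the commonly seen one and is an assumption, and the helpers `first_line_offset` and `line_multiple` are hypothetical names. It shows hanging taking precedence over firstLine and the 240ths and hundredths arithmetic for line spacing.

```python
import xml.etree.ElementTree as ET

# Commonly seen WordprocessingML namespace (assumption, not from the text).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name: str) -> str:
    """Qualified name for a tag or attribute in the w: namespace."""
    return f"{{{W}}}{name}"


PPR_XML = f"""
<w:pPr xmlns:w="{W}">
  <w:pStyle w:val="ListParagraph"/>
  <w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr>
  <w:tabs><w:tab w:val="end" w:leader="dot" w:pos="9000"/></w:tabs>
  <w:ind w:start="720" w:firstLine="360" w:hanging="360"/>
  <w:spacing w:before="240" w:afterLines="50" w:line="360" w:lineRule="auto"/>
</w:pPr>
"""


def first_line_offset(ind: ET.Element) -> int:
    """Signed first-line offset in TWIPS; hanging wins over firstLine."""
    hanging = ind.get(w("hanging"))
    if hanging is not None:
        return -int(hanging)
    return int(ind.get(w("firstLine"), "0"))


def line_multiple(spacing: ET.Element):
    """Line spacing as a multiple of one line, or None if not applicable.

    Only handles a blank or "auto" lineRule; other rules change the
    meaning of `line` and are left to the spec.
    """
    line = spacing.get(w("line"))
    if line is None or spacing.get(w("lineRule"), "auto") != "auto":
        return None
    return int(line) / 240


ppr = ET.fromstring(PPR_XML.strip())
spacing = ppr.find(w("spacing"))
offset = first_line_offset(ppr.find(w("ind")))          # hanging indent
multiple = line_multiple(spacing)                        # 360 / 240
after_lines = int(spacing.get(w("afterLines"))) / 100    # hundredths of a line
print(offset, multiple, after_lines)
```

This sketch only reads plain TWIPS numbers; values written as number + unit would need to be converted first.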
There are many more possible properties for a paragraph. See the spec for details on the following:
- adjustRightInd
- autoSpaceDE
- autoSpaceDN
- bidi
- cnfStyle
- contextualSpacing
- divId
- framePr
- jc
- keepLines
- keepNext
- kinsoku
- mirrorIndents
- outlineLvl
- overflowPunct
- pBdr
- pageBreakBefore
- shd
- snapToGrid
- suppressAutoHyphens
- suppressLineNumbers
- suppressOverlap
- textAlignment
- textDirection
- textboxTightWrap
- topLinePunct
- widowControl
- wordWrap
Table properties (tblPr):
- Style
  tblStyle: A reference to a style in word/styles.xml. Sort of like a CSS class.
- Left indent
  tblInd: This element has two attributes used to specify the leading indentation for a table, type and w. Depending on the value of type, w takes on a different meaning as follows:
  - dxa: w is interpreted as a number of TWIPS.
  - pct: If w is a number, it is interpreted as fiftieths of a percent of the document width (excluding margins). If it ends in "%", it specifies the percentage of document width directly.
  - nil: w is ignored and the margin is 0.
  - auto: w is ignored and the margin is deferred to parent styles.
- Borders
  tblBorders: Contains up to six elements: top, start, bottom, end, insideH, and insideV (the first four correspond to the CSS top, left, bottom, and right). If there is a conflict between a cell border and the table border, cell borders typically win (but see tblPrEx for the corner case where they don't). Each element has the following attributes:
  - color: "RRGGBB" in hex (no leading "#") or "auto"
  - sz: Border size in eighths of a point. The minimum border size is 0.25pt and the maximum is 12pt (sz values 2 through 96).
  - val: Type of border, e.g., "single", "dashed", "dotted", "double", etc. See the spec for the full list (17.18.2).
  - Many more. See the spec (17.3.4).
- Width
  tblW: This is an indication of the preferred width, which is an input into the overall layout algorithm. This element has the same two attributes as tblInd above.
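The type/w pairing shared by tblInd and tblW can be sketched as a single resolver. This is an illustration under stated assumptions: the function `resolve_table_measure` and its `available_twips` parameter (the document width excluding margins, in TWIPS) are hypothetical, and returning None for "auto" simply stands in for "decided elsewhere".

```python
def resolve_table_measure(type_: str, w: str, available_twips: int):
    """Resolve a (type, w) pair to TWIPS, or None when type is "auto"."""
    if type_ == "dxa":
        return int(w)  # w is already a TWIPS count
    if type_ == "pct":
        text = str(w)
        if text.endswith("%"):
            percent = float(text[:-1])   # e.g. "50%" means 50 percent
        else:
            percent = float(text) / 50   # fiftieths of a percent
        return round(available_twips * percent / 100)
    if type_ == "nil":
        return 0                         # w is ignored
    if type_ == "auto":
        return None                      # w is ignored; decided elsewhere
    raise ValueError(f"unknown type: {type_!r}")


# 9360 TWIPS is 6.5 inches, used here as an example text width.
print(resolve_table_measure("pct", "2500", 9360))
print(resolve_table_measure("pct", "50%", 9360))
```

Both calls describe half of the available width, once as 2500 fiftieths of a percent and once as an explicit percentage.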
Other table properties include:
- bidiVisual
- jc
- shd
- tblCaption
- tblCellMar
- tblCellSpacing
- tblDescription
- tblLayout
- tblLook
- tblOverlap
- tblStyleColBandSize
- tblStyleRowBandSize
- tblW
- tblpPr
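As an illustration of the border attributes described earlier, the sketch below converts a size in points to the sz value (eighths of a point, clamped to the 0.25pt to 12pt range) and builds a tblBorders element with all six sides. The helper names are hypothetical and the namespace URI is the commonly seen one, included as an assumption. Some producers write left/right instead of start/end, so the side names may need adjusting for a given file.

```python
import xml.etree.ElementTree as ET

# Commonly seen WordprocessingML namespace (assumption, not from the text).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

SIDES = ("top", "start", "bottom", "end", "insideH", "insideV")


def w(name: str) -> str:
    """Qualified name for a tag or attribute in the w: namespace."""
    return f"{{{W}}}{name}"


def border_size(points: float) -> int:
    """Points to eighths of a point, clamped to 2..96 (0.25pt..12pt)."""
    return max(2, min(96, round(points * 8)))


def make_borders(points: float, color: str = "auto", val: str = "single"):
    """Build a tblBorders element with the same border on every side."""
    borders = ET.Element(w("tblBorders"))
    for side in SIDES:
        ET.SubElement(borders, w(side), {
            w("val"): val,
            w("sz"): str(border_size(points)),
            w("color"): color,  # "RRGGBB" without "#", or "auto"
        })
    return borders


borders = make_borders(0.5, color="336699")
print(ET.tostring(borders, encoding="unicode"))
```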