Docx4j – Getting Started

This document is for docx4j 2. The most up to date copy of this document is in English. From time to time, it is machine translated into other languages. Recent versions of docx4j also support PowerPoint pptx files and Excel xlsx files.

As an open source project, docx4j has been substantially improved by a number of contributions (see the README or POM file for contributors), and further contributions are always welcome. Please see the docx4j forum at http:

The docx4j social contract

docx4j is currently available under the Apache Software license. This gives you freedom to do pretty much anything you like with it. It also means you don’t have to pay for it (there is no incentive to take up a commercial license, so we don’t offer one).

The quid pro quo is that if docx4j helps you out, you should please “give something back”, by way of code, community support, by “spreading the word” (promotion), or by buying commercial services. If you choose promotion, your options include: It includes a question on the above. You can think of docx4j as a JAXB implementation of (amongst others): Support for new Word features will be added soon. The docx4j project is sponsored by Plutext www.

Is docx4j for you?

Docx4j is for processing docx documents, pptx presentations, and xlsx spreadsheets in Java.

It isn’t for the old binary doc format. If you wish to invest your effort around docx (as is wise), but you also need to be able to handle old doc files, see further below for your options. Nor is it for RTF files. If you want to process docx documents on the. An alternative to docx4j is Apache POI.

I’d particularly recommend that if you are only processing Excel documents, and need support for the old binary xls format.

What sorts of things can you do with docx4j?

Specific to docx4j (as opposed to pptx4j, xlsx4j): You can try it or download its source code at www. The relevant parts of docx4j are generated from the ECMA schemas.


The main problem with those is that the XML namespace is different. An effective approach is to use OpenOffice via jodconverter to convert the doc to docx, which docx4j can then process. If you need to return a binary. There is also http: If a pure Java approach were required, this could be converted.

If you can volunteer to moderate a forum in another language (for example, Chinese, Spanish…), please let us know.

Using docx4j binaries

You can download the latest version of docx4j from http: Supporting jars can be found in the.

Command Line Samples

With docx4j version 2. The two to try (both discussed in detail further below) are: To actually enable logging, log4j usually requires a log4. See for example http: You can disable the autoconfiguration by setting docx4j property “docx4j. If you are using Eclipse to run things, in the run configuration: The following table explains the other dependencies:

Using docx4j via Maven

As from version 2.


This makes it really easy to get going with docx4j. No need to mess around with manually installing jars, setting class paths etc. The blog entry hello-maven-central shows you what to do, starting with a fresh OS (Win 7 is used, but these steps would work equally well on OSX or Linux).

JDK versions

You need to be using Java 1.

This is because of JAXB1. If you must use 1. If you are using 1.

A word about JAXB

docx4j uses JAXB to marshal and unmarshal the key parts in a WordprocessingML document, including the main document part, the styles part, the theme part, and the properties parts.

So if you are using the 1. We modified the wml.

Javadoc

Javadoc for browsing online or download can be found in the directory http: Our subversion repository is obsolete. See docx4j-from-github-in-eclipse for details.

Building docx4j from source

Get the source code from GitHub (see above), then… you probably want to skip down to the next page, to get it working in Eclipse. You can get them from the binary distribution, or via Maven.

Enable Maven (make sure you have Maven and its plugin installed – see above): If not, remove, then click “Add Library”. Now, we need to check the class path etc within Eclipse so that it can build.

Using a different IDE? Please post setup instructions in the forum, or as a wiki page on GitHub.

The type is not accessible due to restriction on required library rt.

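WordprocessingMLPackage represents a docx document. To load one from a file:

WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));

There is a similar signature to load from an input stream.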


A similar approach works for pptx files:

PresentationMLPackage presentationMLPackage = (PresentationMLPackage) OpcPackage.load(new java.io.File(inputfilepath));

And similarly for xlsx files.

A Part is usually XML, but might not be (an image part, for example, isn’t). The parts form a tree. If a part has child parts, it must have a relationships part which identifies these. The part which contains the main text of the document is the Main Document Part. Each Part has a name.
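To make "each Part has a name" concrete: a docx, pptx, or xlsx file is a ZIP archive, and each part is stored as one named entry in it. The following is a minimal sketch using only the JDK, not docx4j. The class name ZipPartLister, its methods, and the placeholder contents are assumptions made for illustration. It builds a tiny stand-in archive in memory and lists its entry names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only; this is not a docx4j class.
class ZipPartLister {

    // Returns the entry names of a ZIP-based package, in the order they are stored.
    static List<String> listPartNames(byte[] packageBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(packageBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a small stand-in archive whose entry names mimic a docx layout.
    // The entry contents are placeholders, so this is not a valid Word document.
    static byte[] samplePackage() {
        String[] names = {
            "[Content_Types].xml",
            "_rels/.rels",
            "word/document.xml",
            "word/_rels/document.xml.rels",
            "word/header1.xml"
        };
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        for (String name : listPartNames(samplePackage())) {
            System.out.println(name);
        }
    }
}
```

To inspect a real document instead, the bytes could be read with java.nio.file.Files.readAllBytes and passed to listPartNames.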

If the document has a header, then the main document part would have a header child part, and this would be described in the main document part’s relationships part. Similarly for any images. To see the structure of any given document, see “Parts List” further below.

An introduction to WordML is beyond the scope of this document. You can find a very readable introduction in 1st edition Part 3 (Primer) at http:

Specification versions

From Wikipedia: Office supports 4 transitional, and also has read only support for strict.

Docx4j can open documents which contain Word content.

As noted in "

AlternateContent contained in the document. If you use docx4j to save the document, the w

Architecture

Docx4j has 3 layers: Parts are generally subclasses of org. JaxbXmlPart This (the JAXB content tree) is the second level of the three-layered model. Parts are arranged in a tree.

If a part has descendants, it will have an org. RelationshipsPart which identifies those descendant parts. The sample PartsList (see next section) shows you how this works.
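As an illustration of what a relationships part holds, the sketch below parses a small relationships XML string with the JDK's DOM parser and maps each relationship Id to its Target. It does not use docx4j's RelationshipsPart class. The class name RelationshipReader, its methods, and the sample ids and targets are assumptions made for this example; the namespace and relationship type URIs are the ones used in Open Packaging Conventions packages.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; this is not a docx4j class.
class RelationshipReader {

    // Namespace of the Relationships and Relationship elements in a .rels part.
    static final String RELS_NS =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Maps each relationship Id to its Target, preserving document order.
    static Map<String, String> readTargets(String relsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)));
            NodeList rels = doc.getElementsByTagNameNS(RELS_NS, "Relationship");
            Map<String, String> targets = new LinkedHashMap<>();
            for (int i = 0; i < rels.getLength(); i++) {
                Element rel = (Element) rels.item(i);
                targets.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
            }
            return targets;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse relationships XML", e);
        }
    }

    // Sample content of the kind a main document part's relationships part
    // might hold for one header and one image (ids and targets are made up).
    static String sampleRels() {
        String typeBase = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";
        return "<Relationships xmlns=\"" + RELS_NS + "\">"
            + "<Relationship Id=\"rId1\" Type=\"" + typeBase + "header\" Target=\"header1.xml\"/>"
            + "<Relationship Id=\"rId2\" Type=\"" + typeBase + "image\" Target=\"media/image1.png\"/>"
            + "</Relationships>";
    }

    public static void main(String[] args) {
        readTargets(sampleRels())
            .forEach((id, target) -> System.out.println(id + " -> " + target));
    }
}
```

Targets are relative to the part that owns the relationships part, which is why the header target here carries no "word/" prefix.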

A JaxbXmlPart has a content tree: These classes were generated from the Open XML schemas. For example, there is a MainDocumentPart class. When you want to work with the contents of a part, you work with its jaxbElement. When you open a docx document using docx4j, docx4j automatically unmarshals the contents of each XML part to a strongly-typed Java object tree (the jaxbElement).

Sometimes you will want to marshal or unmarshal things yourself.

Parts List

To get a better understanding of how docx4j works, and of the structure of a docx document, you can run the PartsList sample on a docx (or a pptx or xlsx).

If you do, it will list the hierarchy of parts used in that package. It will tell you which class is used to represent each part, and where that part is a JaxbXmlPart, it will also tell you what class the jaxbElement is.
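The idea of listing a hierarchy of parts can be sketched as a depth-first walk over a tree. The code below is not the PartsList sample and uses none of docx4j's classes. PartTreeSketch, its nested Part type, and the sample part names are assumptions made for illustration. Each part is indented one level deeper than the part that refers to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; these are not docx4j classes.
class PartTreeSketch {

    // A part with a name and the child parts its relationships would point to.
    static final class Part {
        final String name;
        final List<Part> children = new ArrayList<>();

        Part(String name) {
            this.name = name;
        }

        // Adds a child and returns this part, so calls can be chained.
        Part add(Part child) {
            children.add(child);
            return this;
        }
    }

    // Returns one line per part, indented two spaces per level of depth.
    static List<String> describe(Part root) {
        List<String> lines = new ArrayList<>();
        walk(root, 0, lines);
        return lines;
    }

    // Depth-first traversal: record the part, then each of its children.
    private static void walk(Part part, int depth, List<String> lines) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(part.name);
        lines.add(line.toString());
        for (Part child : part.children) {
            walk(child, depth + 1, lines);
        }
    }

    // A package whose main document part has a header and an image as children.
    static Part sampleTree() {
        Part mainDocument = new Part("/word/document.xml")
            .add(new Part("/word/header1.xml"))
            .add(new Part("/word/media/image1.png"));
        return new Part("(package)").add(mainDocument);
    }

    public static void main(String[] args) {
        for (String line : describe(sampleTree())) {
            System.out.println(line);
        }
    }
}
```

The real sample additionally reports the class representing each part and, for a JaxbXmlPart, the class of its jaxbElement; this sketch leaves that out.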