Microsoft OpenXML format for Office documents
Gustavo Velez

Microsoft Open XML is an XML- and ZIP-based file format developed by Microsoft for spreadsheets (Excel), diagrams (Visio), presentations (PowerPoint) and word processing (Word) documents.
23-01-2021

The OpenXML format is standardized by ECMA (European Computer Manufacturers Association) as ECMA-376, and by ISO (International Organization for Standardization) as ISO/IEC 29500. OpenXML has been the default file format since Microsoft Office 2007, but only from Office 2013 onward can Office both read and write the strict variant of the ISO standard. OpenXML is not only supported by Microsoft Word; several other word processors can read and write the Office OpenXML format (LibreOffice, OpenOffice, Lotus Notes, WordPerfect, etc.).

An OpenXML-based document is nothing more than a ZIP file with several XML files inside. To look inside a .docx Word document, for example, add the .zip extension at the end of the file name and open it with any ZIP program. The parts inside an OpenXML package are plain text (XML) files that can be viewed with any text editor and queried with technologies such as XPath.
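To make the ZIP-plus-XML idea concrete, here is a minimal sketch that reads a .docx without the OpenXML SDK, using only the ZIP and XML classes of .NET. The class and method names (DocxAsZip, ReadRunTexts) are hypothetical, and the sketch assumes the main part is stored at word/document.xml, which is the usual location but is not guaranteed for every package.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml;

public static class DocxAsZip
{
    // Namespace used by WordprocessingML elements such as w:p, w:r and w:t.
    private const string WordNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the content of every w:t (text) element found in word/document.xml.
    // Assumption: the main document part lives at the conventional path.
    public static List<string> ReadRunTexts(Stream docxStream)
    {
        var texts = new List<string>();

        // A .docx is a regular ZIP archive, so ZipArchive can open it directly.
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                return texts; // Not a Word package, or the part is stored elsewhere.
            }

            var xml = new XmlDocument();
            using (Stream partStream = entry.Open())
            {
                xml.Load(partStream);
            }

            // XPath needs the "w" prefix mapped to the WordprocessingML namespace.
            var namespaces = new XmlNamespaceManager(xml.NameTable);
            namespaces.AddNamespace("w", WordNamespace);

            foreach (XmlNode node in xml.SelectNodes("//w:t", namespaces))
            {
                texts.Add(node.InnerText);
            }
        }

        return texts;
    }
}
```

A call such as DocxAsZip.ReadRunTexts(File.OpenRead(path)) would list the text runs of an existing document. This works for quick inspection, but for anything beyond reading, the SDK described next is usually the safer route.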

The OpenXML Software Development Kit (SDK) provides the tools for working with Office Word, Excel, PowerPoint, and Visio documents. The SDK is an open-source project maintained on GitHub (https://github.com/OfficeDev/Open-XML-SDK), and the binaries can be added to a Visual Studio project through a NuGet package (https://www.nuget.org/packages/DocumentFormat.OpenXml). Although the current version of the NuGet package is 2.9.1 and the documentation on MSDN covers OpenXML SDK 2.5, the functionality of the latest version is similar to that of version 2.5, so the OpenXML SDK 2.5 for Office documentation is still accurate.

To work with OpenXML in Visual Studio, install the DocumentFormat.OpenXml NuGet package. The package adds references to DocumentFormat.OpenXml and WindowsBase to the project.

The OpenXML SDK supports the following programmatic tasks:

    - Strongly typed classes and objects. Instead of manipulating the XML directly, use the SDK objects that represent elements, attributes, and values. All schema types are represented as strongly typed Common Language Runtime (CLR) classes and all attribute values as enumerations.

    - Content construction, search, and manipulation. LINQ, integrated into the SDK, allows functional constructs and lambda expression queries to run directly on the objects inside the document.

    - Validation. The SDK can validate documents and report any constructs that the format does not allow.
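The last two tasks can be sketched as follows. This is an illustrative example, not code from the SDK documentation: the class OpenXmlSdkTasks and its three methods are hypothetical names, and the sample document is built in memory only so that the sketch is self-contained. It assumes the DocumentFormat.OpenXml NuGet package is installed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;
using DocumentFormat.OpenXml.Wordprocessing;

public static class OpenXmlSdkTasks
{
    // Hypothetical helper: builds a document with one paragraph per string
    // and returns the package as a byte array.
    public static byte[] BuildSampleDocument(params string[] paragraphs)
    {
        using (var stream = new MemoryStream())
        {
            using (WordprocessingDocument doc =
                WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document))
            {
                MainDocumentPart mainPart = doc.AddMainDocumentPart();
                mainPart.Document = new Document();
                Body body = mainPart.Document.AppendChild(new Body());
                foreach (string paragraph in paragraphs)
                {
                    body.AppendChild(new Paragraph(new Run(new Text(paragraph))));
                }
            }
            return stream.ToArray();
        }
    }

    // Search: a LINQ query over the strongly typed Text elements of the body.
    public static List<string> FindTexts(byte[] docx, string keyword)
    {
        using (var stream = new MemoryStream(docx))
        using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, false))
        {
            return doc.MainDocumentPart.Document.Body
                .Descendants<Text>()
                .Where(t => t.Text.Contains(keyword))
                .Select(t => t.Text)
                .ToList();
        }
    }

    // Validation: returns the description of each problem the validator reports.
    public static List<string> Validate(byte[] docx)
    {
        using (var stream = new MemoryStream(docx))
        using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, false))
        {
            var validator = new OpenXmlValidator();
            return validator.Validate(doc).Select(e => e.Description).ToList();
        }
    }
}
```

For a document on disk, File.ReadAllBytes(path) can supply the byte array. An empty list from Validate means the validator found nothing to report for the Office version it targets by default.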

As an example of the use of OpenXML, the following C# routine creates a new Word document containing a line of text and writes it directly to a file.

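using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Place this method inside any class of the project.
public static void WordOpenXmlCreateDocument()
{
	// Create the package on disk; the using block closes it when done.
	using (WordprocessingDocument myWordDoc =
		WordprocessingDocument.Create(@"C:\Temporary\WordDoc01.docx",
			WordprocessingDocumentType.Document))
	{
		// The main document part holds the body of the Word document.
		MainDocumentPart docMainPart = myWordDoc.AddMainDocumentPart();

		// Build the hierarchy Document > Body > Paragraph > Run > Text.
		docMainPart.Document = new Document();
		Body docBody = docMainPart.Document.AppendChild(new Body());
		Paragraph docParagraph = docBody.AppendChild(new Paragraph());
		Run docRun = docParagraph.AppendChild(new Run());
		docRun.AppendChild(new Text("Text in the document"));
	}
}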
