Dismantling a DOCX

Software — Joe Anderson @ 8:22 pm Thursday 26 January 2006

This hack no longer works. If you need a file converted I might help if you email it to computerjoe (at) gmail (dot) com. PGP key here
Ubuntu user? Try this

DOCX is the format Office 12 uses by default. This format is XML based; however, with a little work it can be made backwards compatible.

Like an OpenOffice.org document, a DOCX is a bunch of files, mainly XML based, stuck together in a zip. However, DOCXs also contain a Microsoft Office 97-2003 compatible .DOC. In this guide I will walk you through how to get this file, and the rest of the data, from a DOCX.

I made this DOCX earlier. It is basically 5 paragraphs of Lorem Ipsum.

Once you’ve downloaded it, if you already have Office 12 or any other program associated with DOCX, make sure your folder view displays file extensions (go into Control Panel > Folder Options > View and uncheck Hide Extensions for Known File Types). Then go into the folder containing test.docx and check that the file shows its extension. If it doesn’t, please follow the above steps!

The first step is to change its file extension to zip, so the file becomes test.zip.

The file is now ready for you to open in WinZip, WinRAR, 7-Zip or any other ZIP program. I personally use, and suggest, WinRAR. You could extract the files, though it’s not necessary; it just makes life easier. For that reason, I’m extracting them.
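
If you would rather script this step than click through an archive program, the following is a minimal sketch, assuming Python 3 and its standard zipfile module. It is not part of the original walkthrough, and the function names list_parts and extract_all are made up for illustration.

```python
import zipfile


def list_parts(path):
    """Return (name, uncompressed size in bytes) for every file in the archive."""
    # zipfile looks at the file's contents, not its extension,
    # so test.docx can be opened without renaming it to test.zip.
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def extract_all(path, target_dir):
    """Unpack every file in the archive into target_dir."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(target_dir)
```

Calling list_parts("test.docx") would give one entry per file inside the DOCX, and extract_all("test.docx", "test_extracted") would unpack them into a folder of that (hypothetical) name.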

This is everything contained within the zip:

To access the actual .doc, go into the word folder. There you have your .doc and various XML files.

Oi Viola!
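
A quick way to check whether a given DOCX still carries an embedded .doc is sketched below, assuming Python 3 and the standard zipfile module. The helper name find_embedded_docs is hypothetical, and it makes no assumption about what the .doc is called. As the note at the top says, this hack no longer works, so for most files the result will be an empty list.

```python
import zipfile


def find_embedded_docs(path):
    """Return the names of all archive entries that end in .doc."""
    with zipfile.ZipFile(path) as archive:
        # Compare in lower case so .DOC and .doc are both found.
        return [name for name in archive.namelist()
                if name.lower().endswith(".doc")]
```

An empty list means the archive holds no .doc at all, only the XML and supporting files.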

The XML through all of it is nifty too, so if you like that kind of thing, take a look.
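
For anyone who wants the text rather than the markup, here is a rough sketch in Python 3 using only the standard library. It rests on two assumptions that hold for the released Office 2007 format but may not hold for beta-era files like the sample here: the main body lives in word/document.xml, and it uses the WordprocessingML namespace shown in W_NS. The function name docx_paragraphs is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: namespace used by released Office 2007 documents.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path, part="word/document.xml"):
    """Return the plain text of each paragraph in the main document part."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read(part))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

This drops all formatting, footnotes and images; it only pulls out the visible paragraph text.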

Office 12 does a fantastic job of compressing it too. All the files, extracted, total 105KB. The DOCX/ZIP is only 21KB! That’s one fifth the size of the extracted files.
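
The same comparison can be made for any DOCX. The sketch below assumes Python 3 and the standard zipfile module; compression_summary is a hypothetical helper name. It adds up the uncompressed sizes recorded in the archive and compares them with the size of the file on disk.

```python
import os
import zipfile


def compression_summary(path):
    """Return (extracted_bytes, packed_bytes, packed/extracted ratio)."""
    with zipfile.ZipFile(path) as archive:
        extracted = sum(info.file_size for info in archive.infolist())
    packed = os.path.getsize(path)
    ratio = packed / extracted if extracted else 0.0
    return extracted, packed, ratio


# The figures quoted above: 21KB packed against 105KB extracted.
print(21 / 105)  # prints 0.2, i.e. one fifth
```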

41 Comments »

  1. Ahhh Joe! You speak a language that I do not understand, at all! In fact it might as well be Chinese, for all I am getting out of it! (LOL)
    Sorry about that….
    Here from Michele today and wish I had a clue!

    Comment by OldOldLady Of The Hills — 28 January 2006 @ 2:35 pm
  2. So does that mean if I rename the DOCX as .zip, I can use the native compressed folder capability of XP or Vista to view the innards? This would be the fastest way to grab the .doc and doesn’t need any special utilities?

    Written from a Mac!

    And Michele sent me today!

    Best.

    rashbre

    Comment by rashbre — 28 January 2006 @ 3:03 pm
  3. Thanks for the tech tips… here from Michele’s!

    Comment by Charles — 29 January 2006 @ 2:11 am
  4. holy krispy kremes batman! i actually was able to decipher a good deal of that. you must be a flat out genius. i admire anyone who can wrap their minds around the complexities of such advanced programming.
    do you like what you see so far with Office 12?
    i’m here via michele’s by the way. :)

    Comment by jacque — 29 January 2006 @ 2:52 am
  5. office 12 beta1 files do not contain the .doc files anymore

    Comment by kevinh90 — 2 April 2006 @ 3:13 am
  6. Hi,

    Did what was said, didn’t find a *.doc file! which then brought me to the comments made and read that Kevin had mentioned beta 1 doesn’t have the doc file!

    Any suggestions of how we may view the docx files now?! (without reinstalling office 12 of course)

    Comment by Ash — 17 May 2006 @ 1:42 pm
  7. Err… you can extract the text manually from the XML.

    Comment by Joe — 17 May 2006 @ 5:02 pm
  8. Ash, if you want to view Office 2007 docx files in Office 2000/XP/2003 then you will need to install the Compatibility Pack. Choose “File->Open” in Word after installing and change the file type to “All Files” so you can see the docx file.
    http://www.microsoft.com/office/preview/beta/converter.mspx

    If anyone wants to have a look at the XML code then I suggest viewing the files in IE because it automatically lays the contents out making it more readable.

    Comment by mcm — 2 June 2006 @ 1:16 pm
  9. Thanks very much for laying this out. I just got a .docx sent over and had no idea what it was - now I’ve been able to extract the text from it! You rock.

    Comment by Wil Harris — 17 July 2006 @ 10:21 am
  10. hi there, how can i view a 2007 office docx file on a 2003 office?

    Comment by Leyli Nashiba — 22 July 2006 @ 11:36 am
  11. Hi,
    Thanks a lot, it helped I am able to extract the contents.
    Gurv

    Comment by egurv — 22 August 2006 @ 3:50 pm
  12. thanxx yaar. I was bit apprehensive about using .docx files earlier

    Comment by varun tyagi — 7 October 2006 @ 1:39 pm
  13. I didn’t know about these .docx files. I’ve got MS Office 2007 beta and there were some problems with file extensions. I removed MS Office and tried to install Works 8 and after that I am not able to open any Works program. Word processor provided with Works 8 doesn’t open any doc or txt file, it just shows a note Application couldn’t start or sth.
    I don’t know what to do with it.

    Comment by Norbert — 14 October 2006 @ 11:23 am
  14. I’m scratching my head a little why 5 paragraphs of Lorem Ipsum is 105KB and why you’re saying it’s small. :P

    Comment by Nick — 23 October 2006 @ 9:39 pm
  15. Grrreaatt! Thanks all, the one by mcm really helped.

    Comment by Joey — 8 November 2006 @ 8:31 am
  16. I followed your instructions for opening a .docx file. I changed the extension to .zip, then opened it as a ZIP file. I expected to extract some ’stuff’ among which I would find the .doc file. But no “oi viola!”. There were a bunch of .htm files but no .doc file. %^$^%#@*

    Comment by david f watts — 5 December 2006 @ 6:53 pm
  17. I couldn’t find your docx file on this site; instead I was redirected to http://lipsum.com/ and was unable to save in anything but HTML format. I did try your procedure on a docx I obtained from another site, but it did not contain a Word 97-2003 doc file, only xml files.

    Comment by James — 27 December 2006 @ 4:07 am
  18. NOTE THIS DOESN’T APPEAR TO WORK ANYMORE (AS A HACK)

    Comment by Joe Anderson — 28 December 2006 @ 11:05 pm
  19. This is a very cheap trick from Microsoft, offering the beta version for free and then expiring it without any warning. Very useful tip indeed with regards to that docx to doc conversion.
    Otherwise I would have lost all my work.

    Comment by Gates B — 3 February 2007 @ 8:38 pm
  20. THIS HACK DOESN’T COMPLETELY WORK ANYMORE..

    the below link is useful to recover the plain text version of a docx document

    http://docx-converter.com/convert-online.php

    Comment by Muthu — 3 February 2007 @ 9:22 pm
  21. Hi. I just wrote a long reply, but forgot my email. and the whole thing deleted itself, so I’m going to be brief. I installed Office 2007 beta 2. In the .zip there was no .doc, the compatibility pack didn’t work, but I managed to extract the plain text online. But this paper is due tomorrow and I had tons of footnotes in the document.

    Is there anyone I could mail it to, who happen to have Office 2007 installed? They could save it as an Office 2003 and send it back? Please?

    Stupid microsoft…

    Comment by Ingrid — 4 February 2007 @ 12:00 pm
  22. oh and by the way, changing the file ending to .zip didn’t change the program (WORD). But i chose WinRAR automatically instead. still no .doc…

    If you can help, email me at ingolfdur(at)hotmail(dot)com

    Comment by Ingrid — 4 February 2007 @ 12:02 pm
  23. Have you seen http://www.docx2doc.com ? Amazing service where you can upload your docx and have it automatically converted to the format of your liking. For example you can go from Word 2007 (docx) to Word 2003 (doc). Or you can go from docx to rtf which is a more universal format.

    Check it out :)

    Comment by macuser — 11 February 2007 @ 7:43 am
  24. hi, I’m a bit curious as to how you figured out the relationship between a docx and a zip file. I feel that knowing this will allow me to figure out how to repair a database file (*.fgl) by uncovering what it “really” is, if that makes sense. I’d appreciate any help regarding this matter.

    Comment by Choong — 20 February 2007 @ 7:31 pm
  25. Open Office is an Open Source product, free to use. You can have extremely small text files, extremely small calc tables, and presentations.
    It can easily convert all your .doc .rtf .xlc files to its format and you will be pleasantly surprised by the freed space on your HD.
    For those left suffering with MS Office, you can share your files easily: just save as .doc or .xls
    You need no licenses to use it!
    You will have no payments for future releases at all.
    Have a nice day.

    Comment by Alex — 21 March 2007 @ 7:41 am
  26. Wow - that’s a superb bit of info. I knew Office 2007 was meant to use xml-based files, but when I went to view source had no luck. This makes it all clear.

    Comment by Rich Quick — 31 March 2007 @ 8:00 pm
  27. Spot On!
    thanks for the tip. Now one begins to wonder why all this would be necessary… was the original *.DOC format really that bad?

    thanks!
    -mk

    Comment by Mike — 2 April 2007 @ 3:28 pm
  28. .DOC isn’t “really that bad” in any technical sense. It works, right?

    However, it is a closed, proprietary format that is not compatible with other formats and, in the long run, is bad for that reason. Open XML is “supposed” to be more “open” but unfortunately it, too, suffers from Microsoft’s monopolistic mindset and is not really all that “open”. Google a bit about it and you will see what I mean. Maybe.

    However, Open XML is a step in the right direction, methinks.

    But why would anyone shell out for MS Office (or Windows Vista, for that matter) when there are so many at-least-as-good-if-not-better FREE alternatives? (And I don’t mean “freeware”, I mean Open Source alternatives.)

    OpenOffice.org

    Ubuntu.com

    later.

    I love the docx2doc.com suggestion! But I wonder how long *that* website will be around….

    -JDS

    Comment by JDS — 26 April 2007 @ 6:48 pm
  29. Install the Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint 2007 File Formats:

    Comment by Office ID — 3 May 2007 @ 10:03 pm
  30. I opened the docx file as a zip. Couldn’t see a .doc in there. Maybe some have it, some don’t. A setting in MS-Office 2007 maybe?

    I have tried the Novell Openoffice 2.1 and added the Novell DocX converter plugin odfconverter-1.0.0-2.oxt, but the current plugin seems not to work. I understand Openoffice 2.3 will have a built-in filter for .docx .xlsx .pptx Word/Excel/ppt files.

    Someone further up the thread mentioned Microsoft Works and not being able to open Works files. Openoffice can open and edit MS-Works files and save them in any common format, including .doc, .xls etc.

    You can get OpenOffice without charge, and without “pirating” from http://www.openoffice.org/ . Openoffice works on Windows, and has a standard user interface, unlike the newer MS-Office products. So if you’re used to Word 2002, you should have no trouble with OO2.

    Comment by NickH — 12 July 2007 @ 9:16 pm
  31. The original hack only works on very old .docx files made in a beta version of Office 2007.

    Comment by Joe Anderson — 13 July 2007 @ 6:53 am
  32. For the record, docx2doc.com is no longer free. They have begun charging. I suppose if you really need to convert a lot of files over long periods of time, it’s a great site.

    Comment by Keith — 24 August 2007 @ 6:18 pm
  33. Right click on docx document and save it.
    Open ‘My computer’
    Find document
    Right click on document for drop down menu
    click on ‘open with’
    click on Microsoft word.

    Not my expertise, my daughter’s!

    Comment by Julia Medd — 17 September 2007 @ 4:07 pm
  34. Julia: Opening a docx is that simple if you have Office 2007 but most people have to convert them

    Comment by Joe Anderson — 17 September 2007 @ 4:09 pm
  35. Thanks for that Joe, I and my friends are beginning to regret I ever paid to upgrade my Microsoft Office! Could you give some simple move by move instructions (as mine above) to others who like me find computer speak impossible to remember or follow, regarding ‘how to do the conversion’ if they have Microsoft 2003?

    Comment by Julia Medd — 17 September 2007 @ 10:03 pm
  36. Julia: Try this.

    Comment by Joe Anderson — 18 September 2007 @ 6:42 am
  37. Download the Compatibility Pack from msn. It’s much easier and safer.

    http://www.microsoft.com/downloads/details.aspx?FamilyId=941B3470-3AE9-4AEE-8F43-C6BB74CD1466&displaylang=en

    Comment by azad — 22 November 2007 @ 8:03 am
  38. I know good tool - recover docx, know how recovering a corrupted docx file is very easy to use, a step-by-step wizard allows used by anyone for Microsoft Word recover docx, from experts to beginners to recover corrupt docx file, most popular tools for recovering a corrupted docx file and docx document recovery, can right now and test, whether it can help you and recover your damaged files in Microsoft Word format, restore corrupt docx files right now, this reliable solution will save many hours of your precious time for manual recover damaged docx file.

    Comment by Alex Krenvalk — 12 December 2007 @ 1:31 pm
  39. Open Source Sux! And I mean literally… Open Formats, sure, but Open Source?! Amateur software that has caused me (personally) many problems in the past. Just buy Office you cheap sod. Open formats let developers (anywhere) use the files you create in Word/Excel/Other in their own products…. Open Source means “please DONATE your time to building this for me, I won’t pay you but we’ll give it away free, because thats what people want”… Which also puts other software developers out of business (by the way people study for degrees in software, just as lawyers and accountants study for their qualifications, people will eventually stop developing if you dont pay them) Stupid bloody open source “Community”

    Comment by OPENSOURCESUX — 20 April 2008 @ 10:31 pm
  40. Modifying the extension to .zip is brilliant, but a bit technical for the average person. After researching, this is what we came up with:

    If you’re on a PC, the fastest way to view/print/copy a .docx document is to install the Microsoft Word 2007 Viewer:
    http://www.microsoft.com/downloads/details.aspx?FamilyId=3657CE88-7CFA-457A-9AEC-F4F827F20CAC&displaylang=en

    Most people won’t have much trouble downloading and installing the small program; however, remember that you must be a Windows Administrator to install this. If so, you may need to switch Windows users if the program prompts you with an Administrator rights error.

    Hope this helps.

    Comment by toneee.com — 17 July 2008 @ 10:17 am
  41. One more important note regarding the above post! You must download and install the following link AFTER you’ve downloaded the viewer.

    If you already have Microsoft Office Word installed on your PC, you ONLY need to install the below link. However, those without Word will need to install the Viewer (see previous comment above this one) first, then the below link.

    http://www.microsoft.com/downloads/details.aspx?familyid=941b3470-3ae9-4aee-8f43-c6bb74cd1466

    Comment by toneee.com — 17 July 2008 @ 10:31 am


This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Licence. (c) 2008 Webby’s World