Original WWW proposal is a Word for Macintosh 4 file from 1990, can we open it?

jasomill · on Feb 13, 2024

For anyone interested, here's the document in modern Word format, with all vector artwork and fonts intact:

To convert it, I first opened and re-saved using Word 98[1] running on a QEMU-emulated Power Mac, at which point it opened in modern Word for Mac (viz., version 16.82).

The pictures were missing, however, with Word claiming "There is not enough memory or disk space to display or print the picture." (given 64 GB RAM with 30+ GB free at the time, I assume the actual problem is that Word no longer supports the PICT image format).

To restore the images, I used Acrobat (5.0.10) print-to-PDF in Word 98 to create a PDF, then extracted the three images to separate PDFs using (modern) Adobe Illustrator, preserving the original fonts, vector artwork, size, and exact bounding box of each image.

At this point, restoring the images was a simple matter of deleting the original images and dragging and dropping the PDF replacements from the Finder.

For comparison, here's the PDF created by Acrobat from Word 98 on the Power Mac

https://jasomill.at/proposal-Word98.pdf

and here's a PDF created by modern Word running on macOS Sonoma

https://jasomill.at/proposal-Word16.82.pdf

[1] https://archive.org/details/ms-word98-special-edition

whoopdedo · on Feb 13, 2024

Did you attempt to extract the pictures so they could be converted directly by another program? Archive Team says that LibreOffice can read vector PICT files[1]. And then saved as SVG. Of course you still have the font problem if it has text. I hadn't thought of using PDF to preserve vectors, but of course it does, as well as embedding the fonts.

[1] http://fileformats.archiveteam.org/wiki/PICT

jasomill · on Feb 14, 2024

Good question. I saved the original document as RTF and extracted what I believe is the raw PICT binary data, but quickly decided on the Acrobat route when I realized I didn't know of any software that could easily convert PICT to a more modern vector format (other than by printing the PICT to Acrobat PDF, but that's essentially what I did in Word with extra steps).

If you want to give it a go, here's the raw PICT data from the RTF:

https://jasomill.at/Picture1.PICT

(extracted from RTF tag \pict\macpict\picw513\pich459)

https://jasomill.at/Picture2.PICT

(\pict\macpict\picw410\pich327)

https://jasomill.at/Picture3.PICT

(\pict\macpict\picw420\pich291)

and here are MacBinary-encoded[1] PICT files containing the same data:

https://jasomill.at/Picture1.bin

https://jasomill.at/Picture2.bin

https://jasomill.at/Picture3.bin

[1] https://en.wikipedia.org/wiki/MacBinary

Encoding is required because the PICT file format stores image data in the file's resource fork[2].

[2] https://en.wikipedia.org/wiki/Resource_fork

Gormo · on Feb 14, 2024

Just tried it and confirmed that LibreOffice can indeed read PICT files as vector images and re-export to SVG.

This can be scripted using the `--convert-to` option on the LibreOffice command line.

jasomill · on Feb 13, 2024

As an aside, MacClippy 98 knew the score:

https://jasomill.at/Clippy.png

throwaway828 · on Feb 14, 2024

MacClippy seems like a useful bot. Similar to AI chat windows on websites without the second guessing.

animal_spirits · on Feb 14, 2024

The sci-fi job of digital archaeologists are becoming real!

tomjakubowski · on Feb 14, 2024

any time you dig through layers of git commit history to answer a question, you are performing archaeology

ragebol · on Feb 14, 2024

I did not expect to read about the LHC in such an 'old' document. I couldn't find (in the time I was willing to spend during work) when the LHC project started to this already be relevant in 1990 (20 years before it started, which is also longer than I would have guessed)

jgrahamc · on Feb 14, 2024

Marvellous. Thank you!

whoopdedo · on Feb 13, 2024

> That way I can see actual fonts, font sizes and layout to confirm how the document should have looked.

Or you would if you had the original fonts. Word 4.0 was released for System 6 with support as far back as System 3.2. Fonts at that time had separate screen and printer files for the different output resolutions. If you're missing the printer font it'll print a scaled (using nearest-neighbor) rendering of the screen font. If you're missing the screen font it'll substitute the system font. (Geneva by default, as seen in the screenshot.)

In this case, only the well-known Palatino and Courier typefaces are needed. But LibreOffice substituted Times New Roman even though I have Palatino Linotype installed.

jasomill · on Feb 13, 2024

This is probably because the (internal) name of Palatino Linotype is "PalatinoLinotype" (for the version shipped with Windows) or "PalatinoLTStd" (for the Adobe OpenType version).

In the absence of a hard-coded special case, font matching based on common prefixes could easily match something inappropriate, such as — taking the first example I see on my machine — mapping "Lucida" to "LucidaConsole", when almost any proportional sans-serif font would arguably be a better match for the document author's design intent.

Then again, even exact name matches provide no guarantees. For example, Apple has shipped two fonts (internally) named NewYork: the TrueType conversion of Susan Kare's 1983 bitmap design for the original Macintosh, and an unrelated design released in 2019.

whoopdedo · on Feb 13, 2024

It's more that I half-expected well-known mappings to be baked in. Like "Times" -> "Times New Roman".

Didn't they also name one of their new fonts "SanFrancisco" much to the ire of Susan Kare fans.

jasomill · on Feb 14, 2024

Yes, but the current OpenType San Francisco fonts use "SF" in their (display and internal) names, so no naming conflict exists with the original "ransom note" bitmap font.

Also, as far as I know, of the original Mac fonts, Apple only ever shipped TrueType versions of Chicago, Geneva, Monaco, and New York. And I'm not aware of any OS with native support for both OpenType and classic Mac bitmap fonts (conversions are always possible, of course).

jgrahamc · on Feb 13, 2024

That may go some way to explaining some of the differences I see, but the main thing I was looking for in the emulation was the font sizes.

aidenn0 · on Feb 13, 2024

Doesn't the font matter almost as much as the font-size setting for font sizes, given that different font families can have wildly different metrics at the same font size?

jgrahamc · on Feb 13, 2024

I bet it does. I should redo the final part after installing the required fonts.

noufalibrahim · on Feb 13, 2024

One underappreciated (though mentioned) hero in this little saga is the venerable file(1) command.

      proposal: Microsoft Word for Macintosh 4.0

It's so incredibly useful and so easily overlooked. I almost reflexively reach out to it when I'm curious about a file and the information it returns is just sufficient to satiate my curiosity and be useful.

cpach · on Feb 13, 2024

I agree, file is such a great tool.

I have cursed so many times in the past when I sat in front of a work computer that ran Windows and didn’t have this tool easily available. (Later on, WSL made life easier, but now I’m luckily nearly Windows-free.)

AdamJacobMuller · on Feb 13, 2024

One might even say that file has a lot of magic in it.

pdmccormick · on Feb 13, 2024

file has a lot of magic, but a file typically has only one magic.

layer8 · on Feb 14, 2024

I'd say it has a number of magic.

noufalibrahim · on Feb 14, 2024

Definitely uses magic to do its work.

msephton · on Feb 13, 2024

LibreOffice opens it right up. It's support for old document file formats is really excellent. I keep it around for just this purpose. https://imgur.com/a/JENgq6V

But I also love using BasiliskII and InfiniteMac emulators!

sigspec · on Feb 13, 2024

Yeah we read the article--- which matches your screenshot.

msephton · on Feb 13, 2024

This is for all the TL;DR folks.

jgrahamc · on Feb 13, 2024

I think your summary is a bit short. Sure, LibreOffice opens the file but there are multiple problems with the formatting that need correcting. Your screenshot shows at least one of them (there shouldn't be any headers on the first page and the page layout should be different).

msephton · on Feb 14, 2024

The question was "can we open it?"

chris_wot · on Feb 13, 2024

The question is: is there a bug report?

Karellen · on Feb 13, 2024

> LibreOffice opens it right up. It's support for old document formats is really excellent.

Yes, the OP also mentions that LibreOffice opens it.

...but they also point out with LibreOffice that "Although there's something weird about the margins and there are other formatting problems." - which is also apparent in your screenshot? Certainly that level support for such an old proprietary format is pretty good, but I'm not sure I'd class it as "really excellent" with those issues.

msephton · on Feb 13, 2024

I should have been clearer: what I meant was that its support for very many different old document formats is excellent. Atari ST, Amiga, Macintosh, and so on. The OP and you are quite right that it won't open the documents with exactly the right formatting, but it's good enough in a pinch so you don't have to learn how to use 40 year old computers. It's a good tool to have.

7zip has similar support for a wide range of compressed file formats, exes, data files, cabinets, and so on. Another good tool to save time and keep you on your modern operating system.

opello · on Feb 14, 2024

> 7zip has similar support for a wide range of compressed file formats, exes, data files, cabinets, and so on.

7zfm.exe (7-Zip File Manager) anyway, which I agree is very useful. I've wanted it in Linux multiple times to avoid creating loopback devices but seem to always find it's Windows only.

msephton · on Feb 14, 2024

I was referring to 7z on the command line.

opello · on Feb 14, 2024

Ah nice, I didn't realize it worked with the wider types of archives. I'm pretty sure I dug into the source in the past when trying to get it to handle an ISO in Linux and found that it was only supported on Windows. But that might have just been the GUI and not the command line tool.

Thanks!

jgrahamc · on Feb 13, 2024

Yes, LibreOffice opened it right up with the wrong font sizes, headers and footers messed up, incorrect gutter and margins, and a bunch of other problems. But they were all fixable.

jasomill · on Feb 13, 2024

Give QEMU a try — current versions do a great job emulating a Power Mac, able to run the most recent PowerPC versions of both classic Mac OS (9.2.2) and Mac OS X (10.5).

voltagex_ · on Feb 14, 2024

With what command line?

Figuring out what to ask qemu to do (without libvirt!) is half the battle.

(Thanks though, I have something to play with tonight)

jasomill · on Feb 14, 2024

On macOS, I typically run it from an .app bundle containing a one-line shell script that execs the following script with the "-monitor vc" option (to enable access to the QEMU monitor via a menu command in the Cocoa GUI; when actively using the monitor, I run the script directly with the "-monitor stdio" option instead, as opening the monitor in the Cocoa GUI hides the emulated Mac's display):

  #!/bin/bash
  export PATH=
  here="$(/opt/ld/bin/realpath -s "$(/usr/bin/dirname "$0")")"
  workdir="$here"
  name="$(/usr/bin/basename "$workdir")"
  qemu='/opt/qemu/bin/qemu-system-ppc'
  
  cd "$workdir" \
      && exec "$qemu" \
          -display cocoa \
          -L pc-bios -boot c -no-reboot \
          -M mac99,via=pmu -m 768 \
          -rtc base=localtime \
          -g 1920x1080x32 \
          -prom-env 'boot-args=-v' \
          -prom-env 'auto-boot?=true' \
          -prom-env 'vga-ndrv?=true' \
          -nodefaults \
          -device pci-ohci,id=usb0 \
          -device usb-kbd,id=keyboard0 \
          -device usb-mouse,id=mouse0 \
          -device VGA,edid=on,vgamem_mb=32,id=vga0 \
          -nic tap,id=nic0,ifname=tap9,script=no,downscript=no,model=sungem,mac=00:50:56:16:65:09 \
          -drive file="$here/disk/Classic.img",format=raw,media=disk,id=hd0 \
          -drive file="$here/../../scratch/$name/Scratch.img",format=raw,media=disk,cache=unsafe,id=hd1 \
          -drive media=cdrom,id=cd0 \
          "$@"

Paths are (obviously) site-specific, realpath is the GNU version — used here to ensure nice-looking absolute paths in light of my heavily symlinked filesystem — and specific details (options supplied in no particular order, $workdir vs $here, etc.) are artifacts of hours of fiddling and not cleaning up afterwards.

I'm currently running a version of QEMU recently built from Git, though I haven't changed this script in years.

For networking, I'm currently using the notarized tap kext bundled with Tunnelblick[1].

Finally, I'm currently using an Intel Mac, so YMMV with Apple Silicon or Linux, though I have no particular reason to believe any command-line changes would be necessary, other than the obvious -display change to something other than cocoa for Linux.

[1] https://www.tunnelblick.net/downloads.html

ogurechny · on Feb 13, 2024

Well, StarOffice already existed back then. Now I wonder whether LibreOffice still has some early '90s third party format parsing code inside, or some reverse engineered compatibility and conversion code from much later Word version actually does the job.

graemep · on Feb 13, 2024

LibreOffice was the first thing I tried, and it worked with no problem.

jgrahamc · on Feb 13, 2024

Well, except for all the problems I outlined in the post.

soperj · on Feb 13, 2024

headline says "open" and libreoffice opened it with no problem.

SequoiaHope · on Feb 13, 2024

I simply opened the file with my hex editor. Problem solved. (sarcasm)

jgrahamc · on Feb 13, 2024

I actually opened it in emacs in hexl-mode before I ran the file command!

skissane · on Feb 13, 2024

In the past, I have in all seriousness read Microsoft Word documents on Linux using less. I might have had LibreOffice installed, but it can’t run over SSH.

It works okay with most old school (pre-XML) ones, since the document text is in the file in plain ASCII amidst all the binary formatting stuff. For the new XML formats, less by itself doesn’t do anything useful, but unzip them and you can read the XML containing the document text.

NikkiA · on Feb 14, 2024

> but it can’t run over SSH.

I know it's being pedantic, but it absolutely can, libreoffice will happily run over a ssh -X tunnelled X display.

skissane · on Feb 14, 2024

Oh yeah, but that would require me to start an X server. Which I could do, but why bother when less does the job?

Also, less starts a lot faster than LibreOffice does

pests · on Feb 14, 2024

Word supported a mode, in order to speed up saving, changes were appended to the file in a diff-like format. How could you know you were reading the right content if it could be overwritten later on?

vidarh · on Feb 14, 2024

I once negotiated a higher offer for a job because the company sent out an offer letter they'd done this with, where the deleted details for another offer gave me info about another role that made me (correctly) guess there was room to ask for more.

pests · on Feb 14, 2024

Reminds me of whatever image format or editor that handled cropping the same way. Data was still there, bounds just redefined.

I remember a celebrity leaking some photos or similar back in the early 2000s or similar.

skissane · on Feb 14, 2024

Sometimes “reading the right content” isn’t that important - e.g. “what is this random doc document about?” “oh, it is a design doc for the XYZ subsystem”. Unless the changes completely rewrote the document into a completely different document, which I expect would be rare

If I was going to use the document in anger, I would open it with something proper, of course

vdaea · on Feb 13, 2024

So does Word 2019 for Windows.

jgrahamc · on Feb 13, 2024

Is the formatting correct? Are the images visible? Because others report (see other comments) that Word opens the file but the images are missing. See the Word generated PDF here: https://news.ycombinator.com/item?id=39359079

vdaea · on Feb 13, 2024

Yes, you are right, apologies. I thought it wouldn't open at all, like in the screenshot in that blog post.

lizknope · on Feb 13, 2024

Yeah, I stopped reading the article, downloaded the file, the only word processor is in Libre Office. It seemed to work fine so I didn't know what the issue was. Then I read the article and kept scrolling to the end where the author finally uses LibreOffice and it opens mostly okay.

markus92 · on Feb 13, 2024

As a testament to Microsoft's backwards compatibility: the file opened mostly fine in the Windows version of Word (version 2401), and the layout seems to be identical to the PDF of the article. It did block the file format by default but that was easy enough to allow.

The graphics did not open however, due to a missing graphics filter for the Microsoft Word Picture format. Seem it's been deprecated for a while now but Word 2003 should be able to open it? Which is old, but not that old not to run on modern systems.

markus92 · on Feb 13, 2024

Installed a copy of Word 2003, document opened flawlessly immediately with default settings. Saving it from there converted it to a modern .doc which I could open with Office 365 and convert to PDF etc.

I think the moral of the story is that the Windows Office team seems to spend a bit more time on backwards compatibility.

Moru · on Feb 14, 2024

I think they spend extra time creating those backward compatibility problems just to make it harder to create a perfect third-party tool.

[1] https://www.infoworld.com/article/2618153/how-microsoft-was-...

jgrahamc · on Feb 13, 2024

I would be interested to see a PDF generated from Office 365 to understand how flawless it really is.

zokier · on Feb 13, 2024

Here you go, exported from desktop Word to PDF.

https://drive.google.com/file/d/1lnaSr22l3kQbmFHnxg3Ggd3-46v...

Full version string:

Microsoft® Word for Microsoft 365 MSO (Version 2311 Build 16.0.17029.20140) 64-bit

jgrahamc · on Feb 13, 2024

Right. So all the images are missing. LibreOffice still gives the best conversion I think.

markus92 · on Feb 13, 2024

Yeah, that’s why you need Word 2003 for the images, it’s a deprecated format full of security holes I guess.

giancarlostoro · on Feb 13, 2024

Ah… yeah I was wondering why they would deprecate an image format at all. My understanding is that Word in the old days serialized what was in memory, maybe that was a little too exploitable with images?

Not sure just curious not even sure where to look that one up honestly.

zokier · on Feb 13, 2024

Digging through the files a bit I think the images are in PICT format which is very specific to Macs (the original ones). Its not surprising that modern Word doesn't support those that well as they are actually somewhat complicated kinda-vector image format. I am surprised that even Word 2003 implemented PICT on Windows.

ogurechny · on Feb 14, 2024

It's not “kinda-vector”, it's a metafile format for QuickDraw operations (Windows did the same later with WMF, which was a list of GDI operations).

http://fileformats.archiveteam.org/wiki/PICT

Imagemagick supports it. What's more important, QuickDraw source is available, so not only we can have “some” conversion, we can also reason about its correctness (to some extent — according to comments, it's from 1982-1985).

https://computerhistory.org/blog/macpaint-and-quickdraw-sour...

Extracting raw embedded PICT files from the document and working with them would be the best way to get proper charts. To see what appeared on paper, we can direct emulated system output to an emulated printer, or capture the PostScript commands and rasterize them at the resolution that was used by device available to the author. It is well known that Word for Windows stored last used printer settings in the document, so it could be the same for files produced by Mac version.

(M-hm, it says “Laserwriter” at 0x10097. Maybe they all do.)

Because Microsoft made the most popular document editor for both Windows and Mac, they had to deal with interoperability of two versions of their own software. Supporting WMF/EMF on Mac meant they had to drag GDI implementation along with Office (luckily, the reference could be grabbed from their colleagues). Supporting PICT on Windows meant they had to re-implement QuickDraw primitives.

https://en.wikipedia.org/wiki/History_of_Microsoft_Word

https://news.microsoft.com/1999/04/26/office-98-built-for-th...

It is totally possible that Office applications used built-in PICT parser even on Mac to make things simple, and not rely on 15 years of compatibility layers in the system.

zokier · on Feb 13, 2024

Probably the completely best would be to use LO for the images and Word otherwise... needs some manual twiddling but I suspect that way you can get pretty much perfect layout and images.

ogurechny · on Feb 13, 2024

Office applications up to (and probably including) version 2010 break and crash on latest Windows versions. That behavior varies based on Office service packs and updates installed. You were lucky to be able to just save the document.

Unless, of course, you've found some portable version on the net that packs ThinApp and an assortment of old system libraries under the hood.

astura · on Feb 14, 2024

This has not been my experience, I'm wondering where you heard this information from?

I have Office 2003 (or maybe it's 2007?) installed on my work computer, no problems. It even happily coexists with whatever modern Office version I have installed on there too.

I also have Office 2010 installed on my home computer and my husband uses it all the time. No issues.

Both computers are running Windows 10, so I guess it's not technically "the latest version."

ogurechny · on Feb 15, 2024

It might be forgotten now, but Office was a trade show for new ideas in Microsoft. Its UI routinely used non-standard controls, and relied on knowing how things move under the hood. On the other hand, it was one of the most important applications which had to remain compatible with any Windows system. There is probably an enormous amount of Office-specific fixes in Windows, and that complex and brittle symbiosis requires continuous work to function.

There is a multi-dimensional community of people who deal with old software and old hardware. It has noticed that compatibility of Windows 10 with old versions of Office got worse in releases made in 2018. It seems that those old Office versions were finally removed from integration testing environments at that time.

Problems are different in different applications (Excel might be the most sensitive). Problems may depend on language version. Problems may only affect certain actions (say, crashing during spelling checks). Problems may depend on updates installed — and certain official updates were known to break some applications for some people even before that, so they certainly were not as thoroughly tested as regular ones.

It is great if everything works for you. It is also possible that your specific combination of COM objects and silently overloaded SxS libraries from multiple Office versions works, while some other combination wouldn't.

lproven · on Feb 15, 2024

This thread is amazing to me in lots of ways, but this line:

> I have Office 2003 (or maybe it's 2007?)

... is the stand out jaw-dropping moment.

I use Word 2003 for outline mode today, because for me, it's the final version that's usable. Word 2007 has the "Fluent UI" with a ribbon, making it unusable to me.

I boggled that someone might not notice what was a deal-breaking total UI change that drove me off a platform I'd been using for 19 years.

astura · on Feb 15, 2024

What I have installed on my work computer is the last version of Office without the ribbon and before the new XML file formats. I don't know off the top of my head if that's 2003 or 2007. What exactly is so "jaw dropping" about not remembering exact version numbers?

I don't use it day-to-day, but I have a few legacy "spreadsheets" (really small programs) from clients I have to work with once in a while that have macros that don't work right in modern office.

lproven · on Feb 15, 2024

Ahh, OK, now I understand!

Sorry. I meant no offence.

There's a big difference -- at least for me -- between "I can't remember if it has the ribbon or not" (very surprising) and "I don't want the ribbon but I don't remember which version introduced it" (perfectly legit).

My apologies.

markus92 · on Feb 14, 2024

I had no problems installing a vanilla Office 2003 on Windows 11 23H2. Got the iso from archive.org and it installed without a hitch.

elzbardico · on Feb 13, 2024

I am deeply disappointed that a company like Microsoft doesn't make a point of Microsoft Word being able to open any document created by any version of Word, no matter how ancient it is. I think they have the social/historical/economical responsibility of doing so.

If they are worried about vulnerabilities in the old parsing code, move it to an external process, run it under isolation in a sandbox to spit out a newer readable version on the fly, but don't eliminate this capability from the software.

EDIT: zokier pointed out to me that the desktop version of Word opens the file fine, it is only the web version that doesn't. So, consider this post void.

EDIT 2: Well it opens the document, but is not able to display or print the embedded graphics, it seems.

larsrc · on Feb 13, 2024

Many old formats were essentially just binary dumps of memory, or something not far removed. Documenting the formats was not a standard. Yes, I agree that there is a social responsibility, but having worked in digital archiving I can tell you that the olden days were really, really messy. No, really.

resters · on Feb 13, 2024

This is the point that many of the commenters who criticize Microsoft are missing, and it's why the old formats are not enabled by default (security vulnerabilities) and why it's not as simple as creating a parser.

autoexec · on Feb 13, 2024

Microsoft still deserves criticism for designing their old word formats so badly. It was a design choice to turn documents of mostly text into obscure binary formats that were badly standardized and maintained.

resters · on Feb 13, 2024

Not true at all. Some of Microsoft's best minds created extremely ingenious methods that allowed early word processors to be usable on files that were dramatically larger than what would fit in memory. OSes didn't support suitable performance via VM infrastructure at the time. It was clever, outside of the box thinking that got MS to be able to beat WordPerfect (a worthy competitor) and the many other also-rans.

There was (contrary to popular belief) not a deliberate strategy to limit interoperability. It was simply the reality of the approaches utilized that made them tightly coupled to the MS Word codebase and less standardizable than would have otherwise been ideal.

Source: one of the guys who worked on it at MS.

layer8 · on Feb 14, 2024

Word 4.0 ran from floppy disks on PC XTs (8088 CPU) with 320 KB of RAM. You can't afford an elaborate parser in such limited memory, or you'd have to swap out its implementation on floppy on every load and save. Just running the parser would have slowed down document loading significantly. The floppy disk capacity also wasn't much larger. You already had to swap the disks for doing spell checking or similar. For comparison, the first web browser (WorldWideWeb) was an executable of about 1 MB and ran on a much faster 32-bit NeXT computer with 8 MB of RAM and a hard drive.

unsui · on Feb 14, 2024

no they don't.

They were effectively working at embedded scale, trying to capture state within tremendously limiting constraints.

This is a case of interpreting past decisions based on current criteria, when those same conditions would have prevented modern methods from being implemented.

bogantech · on Feb 14, 2024

> Microsoft still deserves criticism for designing their old word formats so badly.

I would love to see some modern devs try to write software for a 68000 system with only 512K of memory

OJFord · on Feb 13, 2024

You don't have to go anywhere near 1990 to find issues with modern Microsoft (especially cloud) apps opening documents created in older ones!

kiwijamo · on Feb 14, 2024

Indeed. If I ever end up in the cloud version of Word (or indeed any other app) my first instinct is to click 'Open in App'.

pompino · on Feb 13, 2024

Is there any commercial software development company with better backwards compatibility creds than Microsoft? I'm genuinely curious.

nullindividual · on Feb 13, 2024

Old file formats have security vulnerabilities. The online version of Word is designed for docx only, although it can open certain binary documents.

kelnos · on Feb 13, 2024

No they don't. Parsers can have security vulnerabilities, but you can fix those, and there's little reason why a parser for an old format would have more vulnerabilities than for a new format. Some formats can also have certain (intended) features that have security implications, but parsers can choose to disable them if they are concerned.

o11c · on Feb 13, 2024

Fundamentally, a data file format can't have vulnerabilities. At most it can be prone to vulnerabilities, but more often it's just that popular implementations are bad.

nullindividual · on Feb 13, 2024

Sorry, the Word parser does and Microsoft did not feel it important enough to fix as their focus is on OpenXml formats.

kelnos · on Feb 13, 2024

Then that's on Microsoft. There's no fundamental reason why a secure parser can't be written for old formats.

nullindividual · on Feb 13, 2024

Why would Microsoft do that? It makes zero financial sense to continue with a parser that may need to be rewritten from scratch for a ~30 year old format.

genewitch · on Feb 13, 2024

they can do what they want, and i'll continue on my 2 decade long decision to never give microsoft money, for anything. Same way i'll never give propellerhead another dime, or Plex[0], or any of these other consumer-hostile companies.

I don't trust MS to maintain software, even though as far as that goes, they're better than a lot of companies that have been writing software for decades. "time marches on" is silly when we have millions of times the compute, storage, and transit speeds available to us. I also don't see why people see the need to shill for multi-billion dollar companies.

What microsoft should have done is trademark a new name for their word processor the second they made the decision to not open word .doc from older versions. That way there's no confusion.

[0] having a hard time remembering the name/company of the software i purchased for in-house streaming over a decade ago. Plex is still a hassle to use for in-house streaming compared to the "service" or whatever they're selling. Unfortunately Synology seems to have grown weary of releasing a version of their client for every newfangled device that comes to market, so i'm stuck with plex on my TV; that is, unless i want to use a stick/set-top/computer attached to it.

nullindividual · on Feb 13, 2024

> I don't trust MS to maintain software

Then you should champion removal of any "old" software they have that is under maintenance-only status. You wouldn't want security vulnerabilities to go unfixed, would you?

> What microsoft should have done is trademark a new name for their word processor the second they made the decision to not open word .doc from older versions. That way there's no confusion.

That makes zero sense. Word is still Word. It performs the same tasks (and more) as Word 1.0 did.

And Word today still reads/writes .doc, just not versions that are that old.

zokier · on Feb 13, 2024

You missed the fact that the real Word does open this file just fine, its just the toy web version that has issues (and maybe Mac too but eh)

jgrahamc · on Feb 13, 2024

Yes, it opens it and throws away the graphics, so not "just fine".

zokier · on Feb 13, 2024

If we go into splitting hairs, it doesn't really throw the graphics away, it simply lacks the "filter" to display them but they are there still, as in it recognizes the graphics object correctly and lays out it on the page. Based on the error message, hypothetically I suppose you could even make a custom filter to handle the object.

But this really goes more into the facet of Office files that allowed embedding pretty much anything into them, and relying on this "filter" system (I guess OLE) to handle embedded objects. So while the DOC file itself is getting parsed and rendered pretty much perfectly, the embedded objects are another story.

In the same sense I'd say browser might open some HTML page "fine" even if it doesn't know how to handle some image format that is used on the page; it'd still handles the HTML correctly.

petersmagnusson · on Feb 13, 2024

if you read the blog, the main point of OP’s project was to get at the diagrams, so hardly “splitting hairs”.

jdofaz · on Feb 13, 2024

Makes me wonder if the graphics are in PICT format

zokier · on Feb 13, 2024

I think they are. You can even find some PICT files inside the ODT in the github from TFA

ben7799 · on Feb 13, 2024

The Office 365 Mac version refuses to open it.

You can recover text but the result is horrible. No graphics and all formatting lost.

nullindividual · on Feb 13, 2024

This is expected with the web versions of Office. They can read (certain) binary Office formats but not edit them. The web version of Office is designed for OpenXml file formats.

elzbardico · on Feb 13, 2024

Oh, really? I stand corrected. Thanks for pointing this out.

jgrahamc · on Feb 13, 2024

No, you're not wrong, another commenter points out that latest Word opens the document but doesn't display the graphics.

zdw · on Feb 13, 2024

If you wanted exactly what would have been printed, on the emulator running Word for Mac 4.0 you should be able to install a print queue that can generate a .ps (Postscript) file, which would could be converted to PDF.

Or Acrobat may be available for that old of an OS and would have a virtual print driver to go directly to PDF.

chrisfinazzo · on Feb 13, 2024

https://web.mit.edu/ghostscript/www/Ps2pdf.htm

Or, if you prefer to do more tweaking yourself, dive into the Ghostscript deep end :)

https://www.ghostscript.com

detourdog · on Feb 13, 2024

I know I have running Macs with Word 5.1a which I consider the last Word version needed. I'm sure I opened Word 4.0 files.

kps · on Feb 13, 2024

Yes, a few years ago I helped a friend recover a bunch of old documents. The solution was to use Mac Word 5 to open the Word 4 files and save them as something newer versions could read.

jgrahamc · on Feb 13, 2024

Ah. Great suggestion! I just used Print2PDF to make a PDF from Word. Will update the blog.

dorfsmay · on Feb 13, 2024

LibreOffice is amazing, beside being able to open many document formats, it can run headless and has command line options which allow automating some tasks such as converting format that would not be possible otherwise.

https://help.libreoffice.org/latest/en-US/text/shared/guide/...

https://opensource.com/article/21/3/libreoffice-command-line

scaglio · on Feb 13, 2024

This rises a potential problem, often underrated by companies: some have backups with infinite retention.

It is common to have backups with retention of 10 years, some may have 20 years for legal reasons… but the majority of people don't understand the difference between "readable" and "usable".

Of course, it depends on the data… And there are companies backing up whole virtual machines with infinite retention, believing to be able to run them: it is hard enough to restore a vSphere 5.x machine on a brand new vSphere 8, I really don't understand this waste of space.

rvnx · on Feb 13, 2024

If you backup all, you can sort later, and even eventually never. It costs 1 USD per month at Google Cloud to store 1TB of data.

At this price it's not worth sorting, when one single devops costs 100 USD+ per hour, not including the opportunity cost of not working on something more productive (and less boring for the developer).

Then X years after the company is acquired, or sufficient time has lapsed, you can delete / drop the data without sorting.

Regarding virtual machines, if it's VMDK for example, you can read the raw disks without booting it, and again, it's not worth taking a risk to lose data to potentially save 10 USD per month, which is similar to one developer taking one beer extra at a team event.

scaglio · on Feb 14, 2024

> if it's VMDK for example, you can read the raw disks without booting it

Yes, but that's the difference between "readable" and "usable". Many companies don't realize the technical difficulties to be able to run the VMs. They just expect that it will work, if needed.

actionfromafar · on Feb 13, 2024

Often an old file or disk image is tiny compared to modern file sizes.

So the waste of space is more of an administrative character than a waste of disk space.

arnaudsm · on Feb 13, 2024

Great cautionary tale about how quickly formats get obsolete, especially closed source ones.

I use markdown, plaintext and png for all the documents I need to store long term.

Even if these formats disappear, I could trivially reimplement my own parser.

kragen · on Feb 13, 2024

implementing a markdown parser is far from trivial

implementing a parser that tricks people into believing it parses markdown because it acts like a markdown parser in simple cases is what is trivial

it's likely that your markdown data will indeed be recoverable, but if you're generating it yourself, html is probably safer

samatman · on Feb 13, 2024

The (only) issue is that Markdown isn't a format, it's a loose family of formats with many extensions. Implementing a parser Commonmark is not an especially difficult task in the grand scheme of things, it's quite well specified and has an extensive test suite.

Although I find myself wondering what this "parsing Markdown" business is even about. It's perfectly legible as plain text, that was the main design principle behind it. If the goal is to have your data accessible in future, if you can read it now, and you don't go blind, you'll be able to read it later as well.

kragen · on Feb 16, 2024

in my comment 'markdown' does refer to a format, not a loose family of formats. that is also the case in the comment to which i am replying, as clarified in https://news.ycombinator.com/item?id=39358594, and it is also the case in the comment at https://news.ycombinator.com/item?id=39363100. (https://news.ycombinator.com/item?id=39360358 is using the same divergent definition you are using, and other comments are ambiguous.)

in the controversy around the release of commonmark, john gruber made it clear that his objection to the term 'standard markdown' was specifically that what he meant by 'markdown' was a specific syntax, not a loose family of formats; see https://nitter.lanterne-rouge.info/gruber/status/50765149869...

> @twangus @s_margheim @danielpunkass They’ve done more than “formalize”. They’ve changed the syntax. New syntax, new name. That’s all I ask.

you are of course free to use any word to mean whatever you like, however much it may annoy gruber; but consider that interpreting a word in a way at variance to the rest of the conversation will leave you, unnecessarily, wondering what the conversation is even about, as in this case

if you want to discuss what people should mean when they say 'markdown', please leave me out of that discussion, and please do not attempt to use it to imply that my words meant something i did not intend

it is true that markdown degrades gracefully, as does html. but graceful degradation is not the same thing as faithful preservation of archived documents

it would of course not make sense to write a parser for a loose family of file formats, but it makes perfect sense to write a parser for a file format. as discussed extensively in the funding documents for a project we recently worked on together, this is in fact an essential activity for the faithful preservation of archived documents in that format

samatman · on Feb 16, 2024

> consider that interpreting a word in a way at variance to the rest of the conversation will leave you, unnecessarily, wondering what the conversation is even about, as in this case

True.

So why did you do so?

You're well aware that in common parlance, markdown refers to any of the following incomplete list: Gruber's Markdown reference implementation (practically extinct in the wild), Git-flavored Markdown, and Commonmark. There are others, Julia's version of Markdown has some extensions and is missing at least one feature from the original. Its standard library for parsing this is called Markdown.

To write the comment you wrote, you had to pretend that you didn't know this. That's disingenuous. That kind of "what you're referring to is actually GNU/Linux" sort of nitpick is annoying.

kragen · on Feb 18, 2024

sam, you're capable of interesting, thought-provoking conversations and creations. this kind of kindergarten why-are-you-hitting-yourself nonsense is beneath you

you're well aware you waltzed into a conversation where we were using 'markdown' in precisely the way you're claiming people don't use it, then attempted to impose your own definition on the conversation, despite the fact that the existing conversation made no sense when interpreted through it. doubling down on that by attacking my integrity is contemptible behavior

you're not a contemptible person, and your better side will be ashamed of it when you look back on this conversation later

arnaudsm · on Feb 13, 2024

Parsing markdown is multiple orders of magnitude easier than Microsoft Word, especially before docx.

And it has the merit to be human-readable in plaintext!

kragen · on Feb 13, 2024

that's probably true

jprete · on Feb 13, 2024

But the Markdown document doesn't actually need a parser to still be usable. Markdown as a whole imitates the conventions of typed text. The table formats would even be usable on an old typewriter.

kragen · on Feb 13, 2024

markdown doesn't have tables, although you can include html <table> tags in it.

    perhaps you mean
      indented fixed-width blocks
        you can use for ascii art
          or typewriter-style tables?

kelnos · on Feb 13, 2024

Sure it does. It may not be in the original standard, but many/most parsers support tables that use pipe characters to separate columns.

And regardless, markdown documents -- including the table extension -- are readable without a parser.

kragen · on Feb 13, 2024

extensions to markdown aren't markdown; that's why commonmark is called commonmark

not being able to tell which variant of a language is in use is one of the biggest problems for archival, and in particular various extensions to the microsoft word format (all made by the same company!) were what made jgc's archival work so difficult in this case

language extensions are an especially bad problem when there's no extension mechanism—because sometimes a pipe is just a pipe. but unfortunately markdown's only extension mechanism is html

samatman · on Feb 13, 2024

It's called CommonMark because Gruber insisted. Not because extensions to markdown aren't Markdown®, which no one cares about, and not because it isn't markdown in the ways that matter.

Ironically, his objection was to the idea of a single and rigorous standard, you'll note that Git-flavored markdown never drew his wrath. And yet you're treating him and Swartz's implementation as if it was such a standard. Which it is not.

kragen · on Feb 16, 2024

see https://news.ycombinator.com/item?id=39398056 for corrections

zilti · on Feb 13, 2024

Or org-mode format. Then you even get tables properly.

inopinatus · on Feb 13, 2024

strictly speaking, markdown is a superset of html

mnw21cam · on Feb 13, 2024

The problem with markdown is that if you want to convert it to a formatted set of pages, the output will differ based on the version of your markdown converter. Similarly for HTML and also for plaintext to an extent. A PDF should remain exactly the same forever, but AFAIK the only properly editable document type that really keeps exactly the same formatting over time with updated software releases is TeX/LaTeX. In fact, that is a guarantee - if a LaTeX version doesn't produce exactly the same layout as a previous version for the same input document, it's officially a bug.

zzo38computer · on Feb 14, 2024

For such reasons, I think it is a good idea to use plain ASCII text format to document protocols and file formats as much as possible. (It is especially a problem if the documentation of a more complicated format or protocol requires use of that format or protocol itself.)

There is also Just Solve The File Format Problem wiki (which I have added stuff to), although it uses HTML, and does not include full specifications for all file formats (but it does for some of them), and in some cases are links to external files, but it is helpful to find information about file formats anyways.

ComputerGuru · on Feb 13, 2024

Isn’t markdown plaintext? (I didn’t downvote.)

williamcotton · on Feb 13, 2024

Isn’t HTML plaintext?

;)

ComputerGuru · on Feb 13, 2024

Yes, but not intended to be directly human readable by contrast.

Narishma · on Feb 13, 2024

If it wasn't intended to be human readable it would have been a binary format.

robinsonb5 · on Feb 13, 2024

It may have been intended to be human readable, but it failed dismally in that goal.

Even before the web turned into the javascript infested swamp that is now, the tags having the same visual weight as the text they enclose made it tiring to read.

Markdown's genius is in the formatting tags being almost no hindrance to readability.

williamcotton · on Feb 13, 2024

I definitely agree that Markdown is more readable than markup, but personally I abhor what some frameworks do to HTML. I make sure my HTML is legible! There is even a benefit when it comes to hyperlinks in that you can see the URL!

elzbardico · on Feb 13, 2024

As a society we should have been thinking more about digital preservation since the time we started eschewing archiving hard copies in paper.

People who don't know history are doomed to repeat it, but how can our future generations learn from our mistakes if all our documents are unreadable or lost by their time?

zokier · on Feb 13, 2024

Are you just casually dismissing all the work that digital archivists have done over the past couple of decades?

https://www.loc.gov/librarians/standards

https://www.loc.gov/preservation/digital/

https://www.loc.gov/programs/digital-collections-management/...

and that's just Library of Congress, they are hardly alone in this field

crazygringo · on Feb 13, 2024

I'm surprised he didn't try an intermediate version of Word -- not the original Word 4.0 for Mac, but not the current online version of Word either.

I had a lot of old Word 4.0 for Mac files at one point, and remember some point in the late 1990's or early 2000's opening them all up in a version of Word for Windows, and then re-saving them in a more up-to-date Word format. I believe there was an official converter tool Microsoft provided as a free add-on or an optional install component -- it wouldn't open the "ancient" Word formats otherwise.

There's definitely going to be a chain here of 1 or 2 intermediate versions of Word that should be able to open the document perfectly and get it into a modern Word format, I should think -- and I'm curious what the exact versions are. (Although as other people point out, if you don't need to edit it, then exporting it as PostScript in Word 4.0 and converting it to PDF works fine too.)

jasomill · on Feb 13, 2024

As I've discovered while playing with this document and reading this thread:

Current Word for Mac blocks opening the file under discussion, with no obvious workarounds.

Current Word for Windows will only open the file with non-default security settings, and won't render the images at all.

Per Microsoft, PICT image support was removed from all versions of Word for Windows in August 2019[1].

The current version of Word for Mac fails to render the images with a misleading error message ("There is not enough memory or disk space to display or print the picture.").

As for fonts, they should render fine assuming you have matching fonts, where "matching" is defined by some application- or OS-specific algorithm, e.g., a post above indicates LibreOffice (on Linux?) substituting Times New Roman for Palatino when Palatino Linotype was avilable, whereas current Word on Windows 11 has no problem rendering Palatino as Palatino, presumably using the copy of Palatino Linotype installed with the OS.

Finally, if matching spacing (character, word, and line), line breaks, and page breaks is important, you should definitely open the document using as close a version of Word as possible with the exact fonts used when creating the document installed.

Oh, and hope the original author didn't rely on printer fonts without matching scalable screen fonts available, or else you're probably SOL unless your goal is printing to a sufficiently similar printer.

[1] https://support.microsoft.com/en-gb/office/support-for-pict-...

traceroute66 · on Feb 13, 2024

Interestingly, the latest and greatest version (desktop app via Office365) of Microsoft Word on Mac appears to know what it is but refuses to open it.

If you drag the file onto Word, it launches a dialogue box telling you "proposal uses a file type that is blocked from opening in this version" along with a link to the supporting page on the Microsoft website[1].

[1] https://support.microsoft.com/en-us/office/error-filename-us...

worik · on Feb 13, 2024

> telling you "proposal uses a file type that is blocked from opening in this version"

"blocked"?

That sounds like Microsoft has some IP problems with their old software.

rietta · on Feb 13, 2024

Extremely interesting and thank you for doing this. I feel strongly that this goes to show just how important preserving historical software and emulation is. I have dabbled myself with old Windows 3.1 software for this very reason. We really, truly are going to have a period where web application driven software just disappears and we wont easily have this retro computing view of these decades in a short time from now.

dfxm12 · on Feb 13, 2024

I also think it is important to show the importance of open formats or open source in general if we want future generations to read our documents or run/compile/understand our software.

dzdt · on Feb 13, 2024

Somehow the author doesnt recognize that emulation is a legitimate answer to this question. Yes he was able to open the document, by using the original software on a highly accurate emulation of the original system. Everything beyond that point is a different question: can we get it inside of a modern word processor.

londons_explore · on Feb 13, 2024

Emulation is starting to get gaps too... for example, running Windows 95 in an emulator on a modern machine is getting harder and harder (emulators like vmware and virtualbox don't emulate the CPU speed accurately, which causes the system not to boot, and they also don't emulate various paging behaviours of old intel CPU's accurately which causes windows applications to crash within a few seconds of starting).

There are binary patches to windows 95 to fix these issues, but as the system gets older it's less likely people will put effort into binary patching it for compatibility with modern systems. And if it were more obscure, you'd be SOL.

fourfour3 · on Feb 13, 2024

Whole system emulation like 86box does a much better job of emulating older hardware and OSes - I use it quite a bit for DOS/Win3.11/Win9x era stuff.

thawkth · on Feb 13, 2024

PCem is far, far better for Win95 emulation - it can handle a P2 233 and a Voodoo3 fairly accurately - and tons and tons of hardware on top of that.

It’s amazing. I keep a 95 / 98 and some other vintage machines around as a hobby, but being able to play Unreal in an emulator with 3D acceleration blows my mind

fourfour3 · on Feb 14, 2024

How have you found the Voodoo 3 emulation? I have found it a bit ropey in 86box/PCem - but I find voodoo 1 or 2 works really well.

Narishma · on Feb 13, 2024

Those are virtual machines, not emulators. If you use a proper emulator like PCem or 86box, Windows 95 works fine.

thaumasiotes · on Feb 13, 2024

> running Windows 95 in an emulator on a modern machine is getting harder and harder (emulators like vmware and virtualbox don't emulate the CPU speed accurately, which causes the system not to boot, and they also don't emulate various paging behaviours of old intel CPU's accurately which causes windows applications to crash within a few seconds of starting)

I thought the normal way to run Windows 95 was in dosbox?

jgrahamc · on Feb 13, 2024

Sort of. What I wanted was to be able to get a PDF version of it. I was hoping that a modern word processor would read the file format, and LibreOffice did. But it's also true that using emulation I was able to get a PDF (albeit one that has different fonts).

nextaccountic · on Feb 13, 2024

> it's also true that using emulation I was able to get a PDF (albeit one that has different fonts).

Maybe you needed to have the right fonts installed in your emulated mac? Another comment in this thread pointed out this

0xcde4c3db · on Feb 14, 2024

See also: "How to hire Guillaume Portes" [1]

(also "autoSpaceLikeWord95" in case anyone shares that specific brainworm with me and is Ctrl+Fing for it)

[1] https://www.robweir.com/blog/2007/01/how-to-hire-guillaume-p...

api · on Feb 13, 2024

Today's historic working documents will mostly be SaaS hosted documents in systems like Google Docs, Notion, etc. In the future nobody will be able to open them. They won't exist, and the software won't exist, and there will be no way to restore it since the software is SaaS that can't be emulated or even installed anywhere.

jtotheh · on Feb 13, 2024

Tragically, Postscript support has been largely removed from MacOS now. Apparently the language was weird enough that supporting it made some (in)security hacks possible. I guess I'm old ! I remember first finding out about it in 1986 when is very "leet". Postscript printers were big $.

I say tragically because Postscript was pretty key in making DTP as compelling as it used to be, which kind of saved the Mac in terms of being the "killer app" for it.

I think you may be able to run some kind of postscript support in some tool from Adobe, or even Ghostscript. And probably, the newer software is better, but it's sad that you can't view a postscript file on macOS out of the box now.

jasomill · on Feb 14, 2024

While I agree — my first exposure to PostScript as a programming language was playing around with examples from the Adobe "blue book"[1] over a bidirectional serial connection to a LaserWriter sometime in the '80s — nothing in this document requires PostScript.

The embedded images are in PICT format, and TrueType versions of the three fonts used (Courier, Helvetica, and Palatino) have shipped with all versions of the Mac OS since System 7 in 1991.

And while Word 4.0 shipped in 1989, so did Adobe Type Manager[2], which supported Type 1 fonts onscreen and on non-PostScript printers, though to get a Type 1 version of Palatino for ATM at that time you'd have also needed the Adobe Plus Pack[3] (or possibly acquiring Palatino by other means; I don't recall when Adobe started selling individual fonts and the Font Folio).

[1] https://archive.org/details/postscriptlangua00adobrich

[2] https://www.nytimes.com/1989/12/19/science/personal-computer...

[3] https://archive.org/details/adobe-a

Lammy · on Feb 14, 2024

> or possibly acquiring Palatino by other means

Relevant: The Palatino FAQ (1998)

https://web.archive.org/web/19990202052926/http://www.mindsp... https://news.ycombinator.com/item?id=24005172

jtotheh · on Feb 14, 2024

Your information is much more detailed and specific. I was just giving an example of the loss of support for old software/formats. I didn’t mean that postscript support was involved in this particular case.

bilsbie · on Feb 13, 2024

I wonder if it would be a viable business to keep running versions of computers going back say 40 years and offering to recover and convert files for people. (Just getting stuff off floppy disks and Zip drives might be useful)

Sembiance · on Feb 13, 2024

This does an “okay” job at converting the document: https://archive.org/details/KeyViewPro

Here is the converted PDF: https://smallpdf.com/result#r=091f20f23de353fac21376a3a49a60...

jgrahamc · on Feb 13, 2024

Not sure that's really true. It did something but the images are a mess and a lot of formatting is gone. I think LibreOffice is still the winner here.

aidenn0 · on Feb 13, 2024

Somewhat off-topic, but I remember Word for Windows 6.0 would take considerable time (like a minute for a 10 page document on my AM386DX/40) to reflow paragraphs across page-breaks (trying to handle widows, orphans &c). If I made an edit to the first page and hit print before it was done, I would end up with a printed document that contained either duplicated or dropped lines at page boundaries.

acheron · on Feb 13, 2024

"Here's a 4000 year old letter from a merchant to his partners describing how to avoid taxes by smuggling goods in their underwear." ( https://www.britishmuseum.org/blog/trade-and-contraband-anci... )

vs

"Not sure if it's possible to read this 30 year old file!"

kelnos · on Feb 13, 2024

I get the point you're trying to make, but your former example is rare. While there are more exceedingly-old paper records that are still around and have been preserved than we might expect, we've lost so, so much. Paper and ink (and variations on that) are both fragile.

Digital documents are otherwise easy to preserve indefinitely, if care is taken up-front to choose a simple document format that is likely to remain parseable (or at least documented) for a long time. And even when you don't do that, there's always the possibility of writing a parser later (assuming documentation is around) or reverse-engineering the format.

And in this case, the 30-year-old file did end up getting opened, albeit not as trivially easily as one might hope.

thaumasiotes · on Feb 13, 2024

> but your former example is rare. While there are more exceedingly-old paper records that are still around and have been preserved than we might expect, we've lost so, so much. Paper and ink (and variations on that) are both fragile.

Depends what you mean by "rare". Ancient Near Eastern correspondence isn't rare at all, precisely because they didn't use paper. (And they went to war a lot.) You seem to be writing as if that letter was a paper document, but it isn't. Paper records that old only exist in Egypt.

> Digital documents are otherwise easy to preserve indefinitely, if care is taken up-front to choose a simple document format that is likely to remain parseable (or at least documented) for a long time.

This isn't a good match to the example either; Ancient Near Eastern records had to be deciphered. (The Semitic ones had to be deciphered. The Sumerian ones benefited from surviving documentation, but we had to find that and learn how to read it.)

The original example isn't particularly apt; reading this 30-year-old file, or a similar one, is a task that one guy can do in less than a week using existing tools and know that he's done it correctly. Reading a 4000-year-old cuneiform letter was a much larger project than that.

pjmlp · on Feb 14, 2024

Until they find a storage medium that don't deteriorate through time, nope, digital storage is still worse than plain paper or clay, in losing its storage capacity and it is enough to have one bad bit.

LarryMade2 · on Feb 14, 2024

Props to LibreOffice

Recently I was asked to locate an old form document which I found it was written in WriteNow for Macintosh, libreOffice opened it up easily (even without a filename extension) and except for some font substitutions the tables seemed to be all correct. Very impressive.

peter_hansteen · on Feb 14, 2024

This reminds me of my own screed of a much simpler document (an ASCII table generated as a printer test back in the late 1980s) that was not possible to render correctly some years later - https://bsdly.blogspot.com/2013/11/compatibility-is-hard-cha... - also contains a link to a further rant about other document formats that were supposed to be "standard" and "portable".

anonymouskimmer · on Feb 13, 2024

WordPerfect claims the ability to open MS Word 4.0 files. The standard edition is currently $175. I'm not buying it, but if you're willing to spend $175 it might be something to try.

bluedino · on Feb 13, 2024

That Mac Word screenshot gives me claustrophobic flashbacks to trying to work on those tiny screens in middle school computer lab, writing science fair papers.

retrac · on Feb 13, 2024

Heh, that screenshot is relatively high-resolution for the time in question, too. 800x600 maybe? The compact Macs were 512x342: https://www.betalogue.com/images/uploads/microsoft/pce-mac-w... (The toolbars, rulers, etc., could be hidden in the settings.)

cynicalsecurity · on Feb 13, 2024

It wasn't so bad. It's better now, but it was fine back then.

whoopdedo · on Feb 13, 2024

I consider it more of not knowing how much better we could have had it. Small monitors were "normal." But I imagine people who got to work with the Portrait Display[1] (an impressive 640x870 resolution!) felt then as we do now when they had to switch back to the internal screen.

[1] https://wiki.preterhuman.net/Apple_Macintosh_Portrait_Displa...

Dwedit · on Feb 13, 2024

Is there a way to make a PS or PDF file using the actual Word for Macintosh 4? I'd think that would be the definitive render.

wrs · on Feb 13, 2024

Keep reading…he did that. But it’s not clear he had the right PS fonts installed.

jgrahamc · on Feb 13, 2024

I probably did not as I did it really fast after someone suggested it.

aidenn0 · on Feb 13, 2024

Normally I have good success with abiword, but it completely barfs on this file; it seems to be falling back on its RTF support.

stuaxo · on Feb 13, 2024

This is good.

It would be good to get some feature requests into libreoffice to fix the remaining mis-matches in the formatting.

dusted · on Feb 14, 2024

It's an interesting problem we have with file formats.. Emulation saves us, but at which point will we need to run emulators in emulators to reach the documents ? I suppose it's still somewhat easier than trying to understand some symbols on a cave wall..

jmclnx · on Feb 13, 2024

I have a few Wang WP Documents from decades ago. I could not open them at all. Libreoffice thought they were corrupted Word Docs.

So the concern about some document formats being unreadable is still valid. Who knows what obscure proprietary formats exist out there.

pseingatl · on Feb 14, 2024

Wasn't Multimate a Wang clone? Of course, finding an 8" floppy drive might be difficult.

jmclnx · on Feb 14, 2024

It could have been. The Docs I have were created on the Wang PC using Wang WP. This 5¼" diskettes were used on those.

I actually coped then to 3½" later on.

willmadden · on Feb 13, 2024

MS word for mac 16.16 opens it with the diagrams intact in "compatibility mode". The only issue is the text is indented slightly too far on the left.

Libre Office opens it with the same quality, but has some weird gray ghost lines around tables.

vman81 · on Feb 14, 2024

> I downloaded the latest Apache OpenOffice and it did open the file

The last decade of Apache OpenOffice can VERY generously be described as "maintenance mode". Most of the pull requests are grammar and dictionary tweaks.

im_down_w_otp · on Feb 14, 2024

There’s a System 7.1 Mac SE/30 sitting 2ft to my right with Word 5 on it. Send it to me. I’ve got you. Using a combination of LocalTalk and two other computers on that shelf I should get it up to Office 2001 in no time.

DrNosferatu · on Feb 15, 2024

All you need at is at:

https://infinitemac.org/1998/Mac%20OS%208.1

j45 · on Feb 13, 2024

https://www.ebay.com/itm/235033043066

The original word for macOS software seems more than available.

voltagex_ · on Feb 14, 2024

ITT: people repeatedly making the same mistakes, misunderstanding archival and also ignoring glaring problems with converted output

ogurechny · on Feb 14, 2024

Just ask The Neural Net to draw something appropriate to illustrate the given text. There's little noticeable difference.

(ducks and runs away)

jxdxbx · on Feb 13, 2024

Amazing that you can just pop up an emulator in a browser window. Retro Mac emulation used to be such a pain in the ass.

_rupertius · on Feb 13, 2024

Now do one with Google Docs

melomac · on Feb 13, 2024

I was able to download and transfer the proposal document to a Mini vMac emulator, set the Finder's type and creator to those of a Microsoft Word 5 document i.e. respectively WDBN and MSWD, and finally open the document with Microsoft Word 5 for Mac to export it as a RTF document.

Here you have it: https://neko.melomac.net/tmp/proposal.rtf

I certainly agree opening a document from this Macintosh era should be, by far, easier than the process I detailed below, but this is how it is ¯\_(ツ)_/¯

jgrahamc · on Feb 13, 2024

Thanks. Unfortunately, the images are all missing.

melomac · on Feb 13, 2024

It is even more frustrating that the image are in the document, and Microsoft Word for Mac would still display them accurately.

And LibreOffice would display the images in the RTF document in a different size (a tiny block).

If my old Mac display would work, I could have been able to send the document over to CUPS via Netatalk, and make a PDF out of it. Unfortunately Mini vMac can't connect to that VM on the LAN...

Anyhow, it is scandalous that opening legacy documents became such a PITA.

cxr · on Feb 14, 2024

I've been collecting notes about this file for a few years.

Some of the information in this post was previously covered right here in the comments on HN a few years back: <https://news.ycombinator.com/item?id=12793157>

The top reply there links to an online file(1)-like tool that identified it as a MacWrite II document. Last time I checked, the tool was updated and identifies the file as "Word for the Macintosh document (v4.0)" (pretty much what my system's file(1) says about it).

We actually have a scan of Robert Cailliau's copy with his handwritten notes (including the infamous, "Vague but exciting..." remark). It's neither 20 nor 24 pages but instead 16 and differs in several respects: <https://cds.cern.ch/record/1405411>; the version linked in the post and described erroneously as "the original" on w3.org clearly isn't the original and has been changed in several ways besides just "the date added in May 1990". Rather, the May 1990 version here is the second revision of the original that was first passed to Cailliau, and by November 1990 Berners-Lee and Cailliau were calling this second revision "HyperText and CERN"[1][2].

That is, "Information Management: A Proposal" is the one authored solely by TBL and given to Cailliau. It's not the version that appears here. "HyperText and CERN" from May 1990 is what we're looking at here, but was mistakenly also published as "Information Management: A Proposal". Later, TBL and Cailliau coauthored a joint work called "WorldWideWeb: Proposal for a Hypertext Project"[1][3] that referenced "HyperText and CERN" by name.

TBL is also known to have used WriteNow—there are lots of .wn files littering w3.org. I now believe (since last summer) that it's likely that TBL authored this revision of the proposal in WriteNow (even if he didn't save it in the WriteNow format) or used WriteNow at least for the RTF export. Refer again to [2].

1. <https://cds.cern.ch/record/2639699/files/Proposal_Nov-1990.p...>

2. <https://www.w3.org/Administration/HTandCERN.>

3. <https://www.w3.org/Proposal>

cxr · on Feb 14, 2024

> We actually have a scan of Robert Cailliau's copy with his handwritten notes (including the infamous, "Vague but exciting..." remark).

Sorry, it was late when I wrote this. That was actually Mike Sendall (though TBL and Cailliau did collaborate on the others).

cranberryturkey · on Feb 13, 2024

libreoffice opened it.

kelnos · on Feb 13, 2024

Sure, but the layout was screwed up and the fonts and sizes were wrong.

Certainly this is helpful: it's better to be able to open a document and then have to manually fix those issues than to be unable to open it at all. But it was far from perfect.

EasyMark · on Feb 13, 2024

It's orders of magnitude better than "I can't open this file at all, -1"?