After years of working exclusively on Windows, I took a job that required me to build file management features, except now on macOS and Linux (along with Windows).
All I can say is, this article is the tip of the iceberg on Windows I/O weirdness. You really don't realize how strange it is until you are actively comparing it to an equivalent implementation on the two other competing operating systems day-to-day.
Each of them has their quirks, but I think Windows takes the cake for "out there" hacks you can do to get things running. I sometimes like to ponder what the business case behind all of them was, then I Google it and find the real reasons are wilder than most ideas I can imagine.
>Windows I/O weirdness. You really don't realize how strange it is until you
"can I get a wut... WUT?" - Developers, Developers, Developers
It's just hard for people today to understand what a revolution stdin and stdout were: fully 8-bit, but sticking with ASCII as much as possible. There was nothing about them that limited Unices from having whatever performant I/O underneath, but they gave programmers at the terminal the ability to get a lot done right from the command line.
The web itself is an extension of stdin and stdout, and the subsequent encrusting of the simple HTTP/HTML/et al. standards with layer upon layer of goop that calls for binary blobs can be seen as the invasion of Cutlerian-like mentalities. It's sad that Linux so successfully took over that all the people who we were happy to let use IIS and ASP and ActiveX had to come over to this side with their ideas. Neither set of ideas is bad, but together they are incoherent.
"No idea of which is bad, but which together are incoherent."
Right, the filename/path length issue I complained about in my earlier post often occurs when a web page is saved by a browser (some web pages have outrageously long filenames).
Incidentally, many years ago I did a tour of Microsoft's operation in Seattle around the time Microsoft introduced subdirectories into MSDOS, and the tour guide (can't recall his name, but he was responsible for the development of MS's Flight Simulator) gave a considerable spiel about why Microsoft decided to run with the backslash instead of the forward slash as per Unix. Even then, I thought 'oh no, here comes confusion', and others with me thought the same. When we challenged him about it, he said we (Microsoft) want to clearly differentiate ourselves from Unix (there was an arrogance about his answer that I well remember).
Flight Simulator was developed outside Microsoft, but within Microsoft you could only be referring to Alan Boyd. I received a similar tour.
I'm sure you heard what you thought you heard, arrogance and all, but IIRC it was backslash because IBM insisted the slash be the switch character. Anybody remember $SWITCHAR?
I know Flight Simulator was originally developed outside MS and I think my tour wasn't that long after MS acquired it.
It's too long ago for me to associate the name 'Alan Boyd' with the person in question but I do remember that he had a loud, penetrating self-assured voice. (Incidentally, he spent considerable time demonstrating Flight Simulator's new features).
You're right, IBM was a large part of the discussion as back then it was the principal client for MSDOS. However, I came away from the visit with the understanding that MS was in full agreement with IBM's decision despite MS's dabblings with Unix.
I had a particular interest at the time as I had a S-100 Godbout CompuPro computer, and in addition to CP/M, I ended up putting Seattle Computer Products' DOS (SB86 from Lifeboat Associates) on it which meant that I had compatibility with MSDOS.
I can understand why MS would have wanted to differentiate MSDOS and the backslash being one way, what I'm still not clear about is why IBM would have wanted to make such a distinction.
Re $SWITCHAR, very vaguely, but from my point of view it added confusion. Like some other commands, its implementation and architecture appeared to be the result of afterthought rather than good design. I've forgotten much of that stuff.
I rather liked VMS conventions as it was immediately obvious what was the directory part of the path and what was the filename and the device. You could create virtual devices as well, so for example, with VMS TeX, I had TEX_ROOT defined to point to the root of the TeX distribution and you would have TEX_ROOT:[INPUTS] for the input directory, TEX_ROOT:[MF] for Metafont files, TEX_ROOT:[EXE] for executables, etc. and everything was logically arranged. CLD files were another wonderful thing where you could define your CLI interface in an external file and let the OS handle argument and option parsing for you.
On 2 & 3, what if you have a directory? There’s no easy way to tell whether /foo/bar is a directory or a file. FOO:[BAR] must be a directory.
I have a vague notion that there might have been the ability to have a virtual device span multiple directories, but as I think about it, this seems unlikely since it would make it ambiguous where something would be created if a directory exists in both root1 and root2, so perhaps not. It’s been 23 years since I last used VMS and 30 since it was my daily driver though, so it’s hard for me to say too much.
The downside of building in stuff is that you can't easily replace it. I don't have any VMS experience, but the versioned file systems I've studied are very far from being a replacement for what I would consider version control today. They're more like the automatic writing of backup files in Emacs.
Distributed FS sounds like something with a huge design space. It's better to do that in userspace. Was Plan 9 really the first time virtual file systems could be implemented in userspace? It seems like such an obviously useful idea in retrospect.
I'm not familiar with the inner workings, but simply moving files feels odd in Windows compared to macOS. It's most obvious when it's a big lift in terms of data size or file/folder counts, but it feels like Windows literally copies the files into memory, then rewrites them on disk, or something similar that results in poor performance and a long-running cut/copy/paste dialog box. I've had some of these run for hours on decent hardware (SSD, etc.) for what I consider small datasets (a couple GB). It's been a major Windows gripe of mine for years now.
Meanwhile macOS appears to just change an internal link to the data that’s already written on disk. As such, it’s usually so very fast compared to Windows.
Windows File Explorer does a lot of extra work to get a sense of file sizes and other metadata to try to keep the UI looking fresh/interesting/useful to someone watching the job in real time.
If you need to seriously move/copy lots of files or lots of data in Windows it is generally a good idea to use shell commands. Robocopy [1], especially, is one of the strongest tools you can learn on Windows. (It gets very close to being Windows' native rsync.)
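For example, a minimal sketch of a mirror job (the paths are placeholders; note that /MIR deletes files in the destination that are gone from the source, so double-check the target first):

    # Mirror a tree: 16 copy threads, 2 retries, 5s between retries, with a log
    robocopy C:\Data D:\Backup\Data /MIR /MT:16 /R:2 /W:5 /LOG:C:\Temp\robocopy.log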
Windows does literally copy (parts of) files into memory. More precisely it's Windows Defender Real-Time Protection. It's a real menace when you're dealing with a lot of small files, e.g. node_modules.
Windows Explorer is also slow for an unknown reason.
Doing file operations through the API with Real-Time Protection turned off is several orders of magnitude faster in the case of small files. It's crazy stuff.
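If you hit this, one workaround is to exclude the hot directory rather than disabling protection wholesale. A sketch (needs an elevated PowerShell; the path is just an example):

    # Exclude a build tree from Defender real-time scanning
    Add-MpPreference -ExclusionPath 'C:\dev\myapp\node_modules'
    # Confirm the exclusion is in place
    (Get-MpPreference).ExclusionPath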
A lot of this depends on whether you're crossing devices. If you think of drive letters as mount points it may make more sense - if you're moving between mountable filesystems obviously a move has to be a copy-then-delete; if you're remaining on the same filesystem a move can typically be a rewriting of indexing information only with very limited data rewriting.
One other thing that can be an issue particularly on NTFS with ACLs is that moving files typically retains their ownership and permissions, while copying files typically inherits the ownership and permissions of the destination. This can bite you if as an administrator you're moving data from one user's account to another because a move will leave the original owner still owning the files.
Eh, with moving files on Windows in the same security context it is 'generally' pretty fast on the same drive.... Are you sure you didn't paste in a directory that is setting new security permissions on all files?
> All I can say is, this article is the tip of the iceberg on Windows I/O weirdness. You really don't realize how strange it is until
In one way it is beautiful. "A card laid is a card played", as they say. Don't mess with the user for your conception of Agile Clean Extreme Code (tm). Each stupid design decision is forever.
Windows .bat files win the awfulness and quirkiness contest against sh by a razor-thin margin. And both are awesome.
PowerShell is an absolutely amazing scripting language; it gets things done way quicker than Bash because it's object-oriented, and you don't have to call external tools to get anything done (sed, grep, find, touch, curl, etc.). It can even run raw C# code for you.
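For instance, a small sketch of what the object pipeline buys you - each stage passes typed objects with properties, so there's no awk/cut-style text surgery:

    # Five largest files under the current directory, no text parsing involved
    Get-ChildItem -Recurse -File |
        Sort-Object Length -Descending |
        Select-Object Name, Length, LastWriteTime -First 5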
This definitely falls into the category for me of "things that I wasn't there for."
Because I learned computers when DOS was a thing, I will always be able to write a .bat or use CMD when necessary, but having been on the UNIX/Linux side since 2003, I didn't learn C# or PowerShell but rather bash, php, ruby. So while I'm friendly with modern Windows now that they closed up the "is it a stable OS" gap with Apple, I don't really know what to do in PowerShell and am more likely to use WSL!
PowerShell is the closest thing to the Xerox PARC REPL experience that ships in the box on modern platforms.
Not only is it a proper programming language, it is integrated with .NET and COM/DLLs as well, so you can script not just the OS: any application automation library is exposed too.
Nowadays, it is possible to automate anything on Windows via PowerShell, the same OS APIs exposed by GUIs are also accessible to PowerShell.
On the UNIX side there are things like the Fish shell that offer some of these capabilities, but they aren't as widely adopted as PowerShell is on Windows.
Sorry, but Powershell is not a programming language. It has way too many quirks and gotchas to qualify as a programming language. It is an interactive scripting language first, and a scripted language second. But a programming language it is not.
To give just one example: its automatic boxing and unboxing of arrays disqualifies it as a programming language. Try to return a one-element array from a Powershell function and you'll see what I mean.
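For anyone who hasn't been bitten yet, here's roughly what that looks like (a sketch with hypothetical function names):

    function Get-Items { return @(42) }         # returns a one-element array...
    (Get-Items).GetType().Name                  # Int32 - it got unrolled to a scalar
    function Get-ItemsSafe { return ,@(42) }    # the unary comma re-wraps it
    (Get-ItemsSafe).GetType().Name              # Object[] - the array survives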
It's worth learning at any age, especially now that it is an open-source, cross-platform shell. The PS Koans [1] that recently showed up on HN seemed like an interesting way to try to learn it.
I just want a shell that runs my commands, I don't want yet another language.
The _beauty_ of bash is you can learn the basics of the language very easily, call out to external tools, and _take that knowledge with you_. Those tools exist independently.
I've tried to switch to PowerShell a few different times and I always find it to occupy this no man's land between a quality shell and a quality scripting language. As a shell I find it inferior to Bash, and as a scripting language I find it inferior to Python.
Not that it’s a particularly compelling feature on Linux with the standard offering, but it’s a good option for cross platform scripts at times, particularly running in docker.
I mean, if I want something that can run on as many platforms as possible without a prior installation, I stick as closely as I can to posix sh. If I want something more flexible that can run consistently, but may require an installation beforehand, I use python. I don’t really see what niche PowerShell would fill for me.
It doesn't have to fill a niche for you. Before cross-platform PowerShell I certainly used Python for some of those kinds of scripts.
I think a lot of it gets down to ergonomics/aesthetics to decide if you find a useful niche for PowerShell for yourself. Python's os module is powerful and lets you run/chain almost any native commands and shell operations you want to spawn, but it is still a very different API and abstraction with different ergonomics and aesthetics than shell-style pipes and redirects.
PowerShell gives you that focus on shell-like pipes/redirects, but then gives you some Python-like power on top of that to also work with the outputs of some commands as objects in a scripting environment.

There's a lot of interesting value in comparing/contrasting PowerShell and Python, and if you are happy with Python maybe there isn't a big reason to learn PowerShell. PowerShell is there for when you are doing a lot of shell-like processing pipelines and want to write them as such, but want some of the power of a language like Python behind it.

It's a lot more powerful than POSIX sh, and it is similarly but differently powerful to Python, but it starts from a REPL that looks/acts more like POSIX sh. I don't know if you have a need for that niche yourself, but I find it useful for that.
`$ErrorActionPreference = "Stop"` is very similar and does that for all PowerShell cmdlets. You still have to check $LastExitCode manually for BATs/EXEs, though.
`$PSNativeCommandUseErrorActionPreference = $true` is an experimental flag [1] as of PowerShell 7.3 that applies the same $ErrorActionPreference to BAT/EXEs (native commands), stopping (if $ErrorActionPreference is "Stop") on any write to stderr or any non-zero return value.
Having to handle exe's and bats separately is _exactly_ the problem with $ErrorActionPreference, and why it's not suitable.
I wasn't aware of $PSNativeCommandUseErrorActionPreference though, seems like it's very new. How does that work with the helpful Windows tools that decide not to return 0 on success (hello, robocopy)?
The answer to that, including robocopy as the direct example used, is at the bottom of that documentation I linked on $PSNativeCommandUseErrorActionPreference: you set it to $false before calling something like robocopy and then reset it when done.
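So the pattern ends up looking roughly like this (a sketch; robocopy exit codes 0-7 are success/informational, 8 and up are failures):

    $ErrorActionPreference = 'Stop'
    $PSNativeCommandUseErrorActionPreference = $true

    # robocopy returns non-zero even on success, so opt it out and check manually
    $PSNativeCommandUseErrorActionPreference = $false
    robocopy C:\src C:\dst /MIR
    if ($LASTEXITCODE -ge 8) { throw "robocopy failed with exit code $LASTEXITCODE" }
    $PSNativeCommandUseErrorActionPreference = $true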
5.1 was the last "Windows-specific"/"Windows-only" PowerShell (and is still branded "Windows PowerShell" more than "PowerShell") before it went full cross-platform (and open source). It's an easy install for PowerShell 7+ and absolutely worth installing. If you are using tools like the modern Windows Terminal and VS Code they automatically pick up PowerShell 7+ installations (and switch to them as default over the bundled "Windows PowerShell"), so the above command line really is the one and only step.
You can also install the latest PowerShell Core (the open-source, cross-platform releases we're talking about) via Scoop, which is a package manager for Windows that works even if you don't have admin rights: https://scoop.sh/#/apps?q=pwsh&s=0&d=1&o=true
Unless I can rely on it being somewhat available, it's not really feasible to use. It's a bit like writing scripts in fish because it's easily installable - nobody is going to use it.
Winget isn't bundled with Windows 10 either (but I think it is with 11), and it's not on Windows Server.
If I need to install a package manager _and_ a shell, I might as well just install WSL and be done with it.
Winget is auto-installed on Windows 10 by Windows Update and/or Store updates for every copy of Windows 10 with a recent enough build, and has been for more than a year or two, so long as the machine doesn't have the Store disabled or blocked. It is bundled inside the "Application Installer Platform", a low-level Store package that powers a lot of little things like the "double-click to install an MSIX file" experience, and which Windows generally keeps up to date quickly if Store updates aren't blocked.
I can't speak to your usage of Windows Server, but provisioning winget and PowerShell 7+ are standard bootstrapping steps in VM images at places I work, because those are generally assumed to be basic equipment at this point.
It also adds its own special brand of crap... as in, after trying 10 different ways (not kidding: https://social.technet.microsoft.com/wiki/contents/articles/...) of executing an external ffmpeg command over several hours, I eventually wrote a one-line .bat file* and was done with it. Never again.
*
for %%a in ("*.mp4") do ffmpeg -i "%%a" -vcodec libx265 -crf 26 -tune animation "%%~na.mkv"
The maximum path length of the NT kernel is 32,767-ish UCS-2 characters. 260 is a limit imposed by the legacy Win32 interfaces IIRC. I believe the W-interfaces get you the full-fat version, it's just that they're so inconsistently used as to all but guarantee that something you need will have problems.
The user-mode stuff is kind of a mess. The kernel-mode stuff is comparatively orthogonal.
> I think the margin of bat vs sh is a larger margin [...] the entire file is reread for each line
Yeah, agreed. Separation of data and code always was a mistake.
I wonder if this feature of bat files was once considered a "best practice"? Practically, you should only append lines, I guess. When I close my eyes, I can see a DOS batch file doing actual batch job processing, appending to itself and becoming an intertwined log and execution script.
"All I can say is, this article is the tip of the ice berg on Windows I/O weirdness."
Well, then, is there a more detailed summary than this one that's accessible?
This one looks very useful and I'll use it, but to make the point about more info it'd be nice to know how they differ across Windows versions.
For example, I've never been sure about path length of 260 and file length of 255. I seem to recall these were a little different in earlier versions, f/l being 254 for instance. Can anyone clear that up for me?
Incidentally, I hit the 255/260 limit regularly; it's a damn nuisance when copying stops because the path is, say, 296 or 320, or more.
"no amount of prefixing will help you, if random app enforces the limit"
I've noticed that; it's partially the reason for my confusion (I didn't wake up to it for quite a while as I put it down to the different versions of Windows I was running on various machines). Other pains are caused by apps that still don't support Unicode and crash or stop copying when they encounter a non-ASCII character.
Thanks for the reference, I wasn't aware of those changes in Win 10 (I run mainly Linux and have been weaning myself off Win for some years).
"...(this value is commonly 255 characters)."
I think the word 'commonly' in those notes confirms my point in that it's changed slightly over the years. Also, I recall the internal processing was once 16k and not 32k; come to think of it, this may have been with the previous version of NTFS (can't remember which version of Win that was). My interest is now piqued so I'll search it out.
That we're even discussing such matters confirms the thrust of the article.
Explorer can resolve some http urls to network paths, e.g. for SharePoint libraries.
Programs, e.g. python scripts, can then use those paths, but only after explorer has resolved them. Before that, the path will be treated as not existing.
From my experience as an EE, working with serial ports is much nicer on Windows (COM1, COM2, etc.) than on Linux, where serial ports are abstracted as files like /dev/ttyACM0 and have a lot more gotchas.
PowerShell is also quite a powerful alternative to Bash/Mingw, although it came out much later.
Windows might do some things differently than UNIX-like OSs, but it does them really well.
Technically, COM1, COM2 etc. are filenames as well. They are just special in that they are available everywhere. That's why you are not allowed to create any file named COM1 or such.
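And because those names are resolved by the Win32 layer before the filesystem ever sees them, opening 'COM1' works from any directory. A minimal sketch of talking to a port from PowerShell via .NET (assumes a device on COM3):

    $port = New-Object System.IO.Ports.SerialPort 'COM3', 115200, 'None', 8, 'One'
    $port.Open()
    $port.WriteLine('hello')   # send a line to the device
    $port.Close()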
Wow! There's an operating system I ain't heard tell of in a good long while. That's the very first OS I used in a professional context. Got me my first computer store job (in my late "teens") on a CP/M system.
I deal with software that processes files on a Windows system... loves to break when people on other OSes submit AUX, PRN, COM, File:Name, and tons of other unacceptable names (like 'file ').
I'm glad our new releases work on Linux and we don't have to deal with that crap in 99.99% of cases now.
I've done quite a bit of work with serial ports on Windows, Linux and other unixes. I've also written a serial device driver.
Your comment is very confusing to me. The serial ports are abstracted to a file on Windows just like on unixes - the file is actually discussed in the above article: \\.\COM1
Maybe you're talking about the old days where you would just outb 0x3f8? The modern interfaces are actually fairly similar.
Remember typing in entire programs from magazines and computer manuals and saving them to cassette tape or floppy disc? That was "the good ol' days" for sure… :)
There is also the persistent problem of USB serial adapters being assigned incremental numbers until they're in double digits that many tools don't let you select from their GUI. So you have to go in and manually purge those devices to get back to sane numbering.
I just started using serial ports on Windows while doing some Raspberry Pi Pico hobby projects. Something that I find strange is that every new device gets assigned a new COM port. I mean, if I do this for a while, one day I will have COM port 100, 200 and so on. Is that right, or does it somehow reset the COM ports?
That's how it works, and generally it's to the user's advantage. We often set specific parameters based on the device's serial number, so getting the same COM port is nice; sometimes the devices are so simple that you cannot query their serial number.
Sometimes I'll do a "blank slate" and delete all my accumulated COM ports in Device Manager (need to enable "Show Hidden Devices").
COM ports on Windows are crap nowadays due to how crappy USB to serial adapters have become. I've seen Windows reassigning different COM names to the same device every single time it was unplugged due to it "remembering" what COM port was used previously. Needless to say, that was an anti-feature if there ever was one.
Windows tries to keep a long-term identity for all of the device instances that it knows about (and, in the ideal world, assign the same COM port number to the same physical serial adapter). For USB this is supposed to be done by the combination of VID, PID and serial number in the device descriptor. But even early on there were a lot of devices that had the serial number empty, and thus Windows came up with some heuristics about when this method is unreliable and switches to identifying the device by its physical position on the USB bus. The whole mechanism is well intentioned, but in the end probably somewhat misguided, because it is not exactly reliable and has surprising consequences even if it were reliable.
As a side note: on modern Windows NT implementations the so-called "registry bloat" is a non-issue (anyone who tells you otherwise is trying to sell you something), but keeping a list of every device that was ever plugged into the computer in there is somewhat ridiculous.
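If you're curious what Windows currently sees versus what it merely remembers, a quick sketch:

    # Serial ports present right now
    [System.IO.Ports.SerialPort]::GetPortNames()
    # The PnP identities (VID/PID/serial) that Windows keyed the ports on
    Get-CimInstance Win32_PnPEntity -Filter "Name LIKE '%(COM%'" |
        Select-Object Name, DeviceID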
> As a side note: on modern Windows NT implementations the so called "registry bloat" is non-issue
How modern? I manage Windows 7 (transitioning to 10) machines that are used for QC in a hardware manufacturing environment that enumerate hundreds of devices (with mostly identical VID/PID) every week. We find that if we don't clear old devices out of the registry every so often, the enumeration time slows to a crawl.
In the times when it was a real issue (I would hazard a guess that that means “before XP”) the reason was that the original registry on disk format made every registry access more or less O(n) in bunch of things like the overall on disk hive size, total number of keys, number of subkeys in each of the keys along the path…
It also does this for monitors and USB/Bluetooth earphones. So you end up getting earphone(2) and monitor(2) even though you never had a second one. The only way to fix it is to delete the hidden device in Device Manager and rename it back in the monitor/audio settings.
It was really confusing to me when the script I use to change sound output and levels suddenly didn't work after a BIOS/mobo software/whatever Windows update, until I noticed the device had an appended (2).
And this is why I hate Windows in an industrial automation environment. I dislike having to troubleshoot why that USB NIC or serial device gets broken by plugging it into another port. I had to write a PowerShell script for the USB NIC issue to reapply NIC settings with a reboot.
Also, always locking an open file is repulsive. Other OSes allow renaming an open file. Not Windows! Thumbs.db gets locked because File Explorer keeps the file open, preventing deletion of an otherwise empty folder and wasting so much time waiting for Windows to unlock the file.
We do all the time. In industrial automation COM ports are still shockingly popular, although it's usually the USB emulated variety. On a lot of our development and on some of our production tools we end up with COM20 or COM30, not because we have that many running at one time but because over time we've plugged in that many distinct devices. Nowadays most drivers will assign COM(n+1) when they see a device with a new serial number.
UART is available on nearly every microcontroller under the sun, and USB<->UART serial chips are super cheap, so it makes complete sense to me that this became the de facto system for interfacing the automation controller with a computer.
Even beyond that, USB is available on many microcontrollers, a USB CDC device is dead simple to implement, the drivers are baked into every modern OS, and all the software developers operating at that layer already know how to interact with text streams. Add in the ease of debugging when you can just manually send/receive ASCII to operate a device, and you've got the makings of a standard practice.
If you use USB dongles for serial adapters, then each path through USB is assigned a different COM number when you plug it in. For example, if you plug into USB controller 2, port 3, which goes to a hub, and then you plug into port 2 on the hub, that gets a number. Now plug the same thing into a different port and it will get another COM number.
Under the hood this is because the USB devices do not have the (optional) unique serial number (or in some cases they all get the same serial number).
So do I, I find the addressing more consistent, too.
It used to be completely predictable when I was working with drivers in 1994 (patching the code), then less predictable when hardware got more diverse, and predictable again (or at least "always the same") with UUIDs.
It was always amateur/hobby dev or sysadmin so I may have had the wrong impression.
It’s the flip side of the ‘we bent over backwards so SimCity runs’ coin. Even though Windows hasn’t supported programs out of this era since 64bit became the standard, it’s still held back by clinging on to the legacy. Because it doesn’t dare say ‘this is too old, run it in a VM’.
The fact these paths are considered at all "weird" just underlines how much we live in a Unix world.
Filesystem paths used to all be weird in the sense that there was more OS diversity. I'm sure some people here remember that classic MacOS paths used colon as the separator:
Hard Drive:My Folder:My Document
VMS (designed by the same person as Windows NT, by the way) had paths of the general form device:[directory.subdirectory]filename.type;version (per Wikipedia).
As someone who grew up with Windows, I don't think these paths are that weird at all. Drive letter working directories just make sense, for example. The weirdest part is the (edit: HFS) compatibility mode (file.ext:substream).
One fun surprise is that, because of codepage reasons, Windows will use ¥ as a path separator in Japanese. In Korean, it's ₩. These characters represent U+005C, which is \ in Latin-compatible character sets.
I tend to use /dev/sda1 more than /dev/disk/by-path/pci-0000:00:17.0-ata-1.0-part1. Disk names are nice, but also often longer than 8 characters and usually not very unique.
Starting from A and iterating on through Z makes sense, for an OS that's designed for two drives at most. /dev/sda and /dev/sdb are no less arbitrary than A: and B:.
One major difference was that Unix was used on big servers and couldn't fit itself onto a single disk, so /usr had to be created. DOS and Windows never needed a second drive to boot, so they didn't need to embed their resources into the drive hierarchy.
Of course, you can mount NTFS volumes at any directory you wish since at least somewhere in the early 2000s. Very few people do it, but you can!
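A sketch with the built-in mountvol tool (the GUID is a placeholder; running `mountvol` with no arguments lists the real ones, and the target must be an empty directory on an NTFS volume):

    # List volume GUIDs and their current mount points
    mountvol
    # Mount a volume at a directory instead of a drive letter
    mountvol C:\mnt\data \\?\Volume{00000000-0000-0000-0000-000000000000}\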
The first floppy drive was A, the second B, and when internal hard drives came along they defaulted to C so as not to collide on computers that had two floppy drives.
If I remember correctly, you could use the B: drive even if you had just one physical unit. It was useful for copying files from one disk to another when you didn't have a hard drive to use as temporary storage.
> The weirdest part is the HPFS compatibility mode (file.ext:substream).
HPFS had extended attributes, but not substreams. You are thinking about HFS; substreams were added to NTFS to support storing resource forks on network shares used by Macs.
Windows is younger than Unix, and the Unix filesystem has "evolved less" due to getting it right the first time, avoiding backward compatibility issues.
NTFS implemented it to be compatible with Macs. They then started using it for storing the Mark of the Web and other special system properties, but practical use came much later.
> I'm sure some people here remember that classic MacOS paths used colon as the separator
In modern macOS (previously OS X), you’ll eventually bump into those if you need to work with paths in AppleScript. You have to specify when you’re using a POSIX path so it is properly converted. Example:
$ osascript -e 'POSIX file "/System/Applications/Mail.app/Contents/MacOS/Mail"'
=> file Macintosh HD:System:Applications:Mail.app:Contents:MacOS:Mail
As another example, with ADFS (Advanced Disc Filing System) on the Acorn/BBC computer family, the root directory was specified with `$`, and the directory separator was `.`, giving full paths like `ADFS::IDEDisc4.$.Games.!Repton.Arctic`.
ADFS is the filesystem, IDEDisc4 is the disc name, $ is the root directory, Games is a subdirectory, !Repton is an application directory (since it begins with !) and Arctic is a file within the application directory, not normally referenced by users.
macOS still uses colons as the path separator, it just does a great job of hiding them from the user. If you try to open a file with a slash in its name in a shell, though, you'll need to use a colon.
I suspect that it is the other way around and the Finder and standard dialogs (both of which use slashes as path separator when you type the path) simply shows colons in filenames as slashes.
macOS's kernel has BSD roots, so I'd be surprised if its VFS code accepts anything other than unix paths. Just a guess, but it's probably the Cocoa APIs accepting colon paths and translating it to unix paths internally.
I was parsing paths into an array of dirs recently.
The `/` root dir is quirky. You can't just do `dirPath.split('/')`. You have to handle it as a special case. It would be easier if it had a special name, like `$/dir1/dir2`.
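A sketch of the quirk in PowerShell: splitting an absolute path on the separator leaves an empty first element standing in for the root, which you have to special-case or filter out:

    '/var/log' -split '/'             # '', 'var', 'log'
    '/' -split '/'                    # '', '' - the bare root is even worse
    @('/var/log' -split '/') -ne ''   # 'var', 'log' - empties dropped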
Thank goodness Unix has cleared this up, with paths, mountpoints, overlay filesystems, chroot, device trees, bind mounts, loopback mounts and probably a few I forgot...
(sort of amazing the original premise, and the exceptions and workarounds you gradually accumulate and take for granted)
Since Windows creates the UUID the first time it "sees" a volume, and - usually - uses the network card MAC as the node, by decoding the UUID you can get the MAC address of the PC and the time the volume was first seen (this can be useful for forensics, especially with removable devices and to verify there has been no manipulation of the MountedDevices key in the Registry).
[1] Possibly Windows 11 changed that, or at least the UUIDs shown in the article are version 4.
So many things wrong with this article. Some things that I noticed by skimming over it:
> UNC paths can also be used to access local drives in a similar way:
> \\127.0.0.1\C$\Users\Alan Wilder
> UNC paths have a peculiar way of indicating the drive letter, we must use $ instead of :.
This is actually incorrect... he's accessing some random share that has no real connection to a drive. Yes, sometimes (quite often) the C$ share corresponds to the C: drive's root, but this is by no means given, as one can easily either delete the C$ share or have it pointing somewhere else entirely.
> When the current directory is accessed via a UNC path, a current drive-relative path is interpreted relative to the current root share, say \\Earth\Asia.
This is also wrong. There is no "current directory" on an UNC share (which can easily be shown by trying to open a command prompt on a UNC share, it will show an error and start you somewhere on C:\users), and the example he gives just tries to access the share "Asia" on the server "Earth"
> Less commonly used, paths specifying a drive without a backslash, e.g. E:Kreuzberg, are interpreted relative to the current directory of that drive. This really only makes sense in the context of the command line shell, which keeps track of a current working directory for each drive.
Also wrong, it's not the command line shell that keeps track of the current directories, it's the Windows kernel itself. But I agree that such a scenario is quite useless, as you can never be quite sure what the CWD is on a given drive.
> For the most part, : is also banned. However, there is an exotic exception in the form of NTFS alternate data streams.
Yeah, well, surprise: the ":" is not part of the file name, it's just a separator between filename and stream name. This is like saying that "you cannot have \ characters in a file name, but in directory names it is allowed". No, it's not. It's a separator
> Also wrong, it's not the command line shell that keeps track of the current directories, it's the Windows kernel itself. But I agree that such a scenario is quite useless as you can never be quite sure on what CWD you are on a given drive
>SetCurrentDirectory allows setting the current directory to a UNC share.
Exactly. The cmd prompt's refusal to set UNC paths as the current directory was introduced around Windows 2000 (or maybe post-XP, it's been a while) to help legacy batch files being run from a share that would get confused by being on a UNC path instead of one beginning with a drive letter.
This was also why, when you do a pushd \\server\share, cmd.exe puts you on a mapped drive instead of directly on a UNC path.
If you use the Windows native version of tcsh, for example, you can happily use UNC paths as current directories and run commands (provided they don't try to parse drive letters from their CWD)
Eh, I just thought the $ in a Windows NAS share was to ensure the share was hidden from browsing. Microsoft used to have documentation on that, but seems to be missing from their site after they removed old articles.
The bit about "UNC Paths" is a bit simplified. The "$" shares are administrative shares. They're created by default, you can delete or disable them (though, if you delete them, they'll be recreated on a reboot). You can also add normal users to them.
It should also be noted that while the single-drive-letter ones are automatically created, the "$" at the end just marks them as hidden. You can create your own hidden shares if you ever want to.
The (second-)worst offense I'm aware of here is that alternate data stream names can have otherwise special characters in them, like backslashes. So if you (for example) want to strip off the last path component, you technically cannot do this by just stripping everything after the last backslash.
In fact this probably isn't the worst thing - it's even worse than this. Because you first need to strip off the prefix that represent the volume (like C:\) before you can look for a colon. But the prefix can be something like \\.\C:\ or \\.\HarddiskVolume2\, or even \\?\GLOBALROOT\DosDevices\HarddiskVolume2\. Or it can be any mount point inside another volume! (Remember that feature inside Disk Management?)
Moreover you can't even assume the colon and alternate data streams are even a thing on the file system - it's an NTFS feature. So you gotta query the file system name first. And if the file system is something else with its own special syntax you don't know, then in general you can't find the file name and strip the last component at all.
All of which I think means it's impossible to figure out the prefix length without performing syscalls on the target system, and that the answer might vary if the mounts change at run time.
Oops, thanks for the correction! I must've seen this with other characters (most likely double quotes) and not realized slashes and backslashes are an exception.
Though ironically that still doesn't help you strip the last component, since it could still be a volume mount point. Like you don't want C:\mnt\..\foo to suddenly become C:\foo, just like how you don't want \\.\Server\Share1\..\Share2 to become \\.\Server\Share2, or for \\.\C:\..\HarddiskVolume1 to become \\.\HarddiskVolume1, etc.
> Moreover you can't even assume the colon and alternate data streams are even a thing on the file system - it's an NTFS feature. So you gotta query the file system name first. And if the file system is something else with its own special syntax you don't know, then in general you can't find the file name and strip the last component at all.
If the :stream syntax is not FS-specific then you can parse the data stream name out statically in almost every case. Yes, you have to work out the prefix, but you can mostly do that statically too, I think:
> In fact this might not even be the worst thing - it's even worse than this because you first need to strip off the prefix that represent the volume (like C:\) before you can look for a colon. But the prefix can be something like \\.\C:\ or \\.\HarddiskVolume2\, or even \\?\GLOBALROOT\DosDevices\HarddiskVolume2\. Or it can be any mount point inside another volume! Which I think means it's impossible to figure out the prefix length without performing syscalls on the target system, and that the answer might vary if the mounts change at run time.
The prefix of `\\.\C:\Foo:Bar` is `\\.\C:` as `C:` couldn't be a file name. The prefix of `\\.\HarddiskVolume2\Foo:Bar` is `\\.\HarddiskVolume2` because the volume name ends at the backslash. The prefix of `\\?\GLOBALROOT\DosDevices\HarddiskVolume2\Foo:Bar`... can be harder to determine but it doesn't matter because clearly there is no letter drive name in sight since a letter drive name would be... a single letter, but if the volume name were a single letter then it might require using system calls to resolve it (`\\?\GLOBALROOT\DosDevices\X\Y:Z\A:B` is harder to parse because X might be the volume name, or maybe Y: might be the letter drive and X might be part of the path prefix).
> `\\?\GLOBALROOT\DosDevices\X\Y:Z\A:B` is harder to parse
As in, this is impossible to do statically in the general case - those names aren't guaranteed to look like that. See the note I had added about mount points. Remember C:\mnt can itself be the mount point of a volume instead of a drive letter. (Junctions present a similar problem, but at least for those, you can make an argument that they're intended to look like physical folders, and treat them similarly. With mount points, you might not have that intention - you might be just trying to go over 26 drive letters.)
> It is, I believe, as I alluded to in the comment.
The FILE_STANDARD_INFORMATION_EX structure alludes to a common handling of alternateStream. Winbtrfs is a great resource on this, since it implements many bells and whistles from NTFS in an open way -- you just grep for a keyword and you will be close. The code exercising the Windows API for testing is src/tests/streams.cpp.
Grep on FILE_STREAM_INFORMATION in the source should provide more useful hits on the source, but phone browsers are clumsy.
A data stream is basically the file content, and on NTFS a file can have more than one. In practice it is comparable to extended attributes in the Linux world, but somewhat superior.
But like extended attributes, it doesn't seem to have much real-world use. The only use case for alternate data streams I can remember is the "this file was downloaded from the internet, do you really want to run it" warning. In such cases the browser attaches a standardized marker as an alternate data stream to the file.
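That marker is easy to inspect, since PowerShell grew first-class ADS support via the -Stream parameter. A sketch, assuming a freshly downloaded file:

    Get-Content .\setup.exe -Stream Zone.Identifier   # shows [ZoneTransfer] / ZoneId=3
    Unblock-File .\setup.exe                          # deletes the Zone.Identifier stream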
Au contraire :-). Alternate data streams are widely used by virus writers, and by spies using them to exfiltrate data from foreign (to the spy) government and corporate Windows IT systems.
You think I jest ? Look up the leaked source code for the US government spy tooling. They hide data to be exfiltrated in an ADS on the root directory of the share :-).
I finally realized ADS were the mother of bad ideas when Ted Ts'o responded to me asking why I couldn't have them in Linux for the umpteenth time by showing me a Windows Task Manager screenshot of Myfile.txt as an actively running process.
If the ADS ends in .exe then Windows will happily run it :-).
If you have macOS clients connecting to an SMB file share hosted by a Windows server, they use alternate data streams to store resource forks - like fonts. Makes for a fun 'oh shit' moment if you zip up files on Windows to archive, then realize you're missing data when you later unzip, as most compression applications don't keep them.
> macOS stores fonts in resource forks? I'm confused, what use does this have and what happens when you accidentally miss them?
Classic MacOS considers fonts to be a type of resource, and hence stores them in the resource fork. Contemporary macOS fonts are just ordinary files with a data fork only. I think grandparent is talking about the 1990s, although some of those machines remained in active use through the first few years of this century.
Windows originally considered fonts to be a type of resource too - the original bitmap fonts used with Windows 1.x-3.x are stored as a resource - except unlike MacOS it embeds resources into the EXE/DLL file data instead of putting them in a fork. In fact, a .FON file containing a Windows bitmap font is just an EXE with no code, only resources. Nobody really uses this any more, everything is TrueType now and TrueType uses its own file format, not resources, but Windows still supports the old bitmap fonts for any legacy apps which still use them.
I originally thought you meant "macOS takes random fonts, stuffs them in resource forks for other non-font files, then bad things happen if the resource forks are ever lost" which makes zero sense to me.
Anyway... so macOS fonts themselves were made of resource forks and therefore trying to transfer fonts themselves across a non-resource-fork-supporting network share will fail? As in, the resource forks were needed in order to use the font file?
Not me. ajcoll5 made the statement, you expressed confusion with it, I tried to explain what (I assume) ajcoll5 meant.
> Anyway... so macOS fonts themselves were made of resource forks and therefore trying to transfer fonts themselves across a non-resource-fork-supporting network share will fail? As in, the resource forks were needed in order to use the font file?
On Classic MacOS, for some files all the actual content is in the resource fork, and the data fork is ignored and can be empty. So if you copy such a file to a filesystem which doesn't support resource forks, you can end up with an empty file.
A good example of this is executables. In 68k Mac executables, all the code is stored in the resource fork (as code resources), and the data fork is ignored and can be empty. So if you copy a 68k Mac executable to a forkless filesystem, you can end up with an empty file.
By contrast, in PPC Classic MacOS executables the code is in the data fork, and the resource fork only contained actual resources such as icons or strings, not the code. If you lost the resource fork, you'd still have the code of the executable. But it probably wouldn't run without the icons/strings/etc. it expected.
This was how Apple's original (1994) implementation of "fat binaries" worked. The data fork contained the PowerPC binary and the resource fork contained the 68K binary. PPC Macs would load and run the PPC code from the data fork, 68K Macs would ignore the data fork and load and run the code from the resource fork. If you only needed PPC support, you could shrink the executable by deleting all the 68K code resources from its resource fork.
The core resources of Classic MacOS were originally stored in a single file, the "System suitcase". Originally, each installed font was a separate resource in the resource fork of that file; its data fork was unused, except to store an easter egg text message. Fonts were distributed as resources in separate suitcase files, and the "Font/DA Mover" copied them from the distribution suitcases into the system suitcase. So yes, a suitcase file used to distribute a classic MacOS font, the actual font data would be in the resource fork, and the data fork could be empty. In System 7.1, Apple introduced a separate folder called "Fonts". In some MacOS versions (not sure when it was introduced, but definitely was there by System 7.0), Finder displays suitcases as if they were folders, even though they are actually resource forks.
Contemporary macOS doesn't really use any of this stuff. It supports resource forks for backward compatibility, but modern applications don't use them. The "Font Book" app can import Classic MacOS fonts (not bitmap ones, but TrueType and Type 1) from the resource fork of a suitcase file. But once imported, the fonts are stored in ordinary files (with a data fork only) on the filesystem.
Eh, whatever. I originally thought that's what whoever-it-was meant; can't edit the comment now.
> On Classic MacOS, for some files all the actual content is in the resource fork, and the data fork is ignored and can be empty. So if you copy such a file to a filesystem which doesn't support resource forks, you can end up with an empty file.
Yeah, that's about what I thought. That makes sense, thank you~
You seem to be talking about a specific command-line argument of the compact command, with the Windows-typical (and IMO ugly) option style of '/' instead of '--' as the option marker and ':' instead of '=' as the option value separator.
But that would not be directly related to ADS, and I cannot imagine a good use case where the compact command would use ADS.
In the context of ADS, the first thing I imagined was storing the compressed and uncompressed file alongside each other (which is rather silly - why compress at all?).
This use case is also kinda strange. Have the compressed content as an ADS, the primary content filled with 0 as sparse, and fill it when needed/accessed. :/
C:\foo has a default (primary) data stream; the name of that stream is empty, so it's omitted entirely when writing the name. But the file can also have C:\foo:bar on NTFS. It's a different stream that's part of the same file. (Look up "NTFS ADS" or just "NTFS streams".) These are often used to store information tied to a file that shouldn't affect the file contents.
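A quick sketch of creating and reading one on an NTFS volume (the file name is arbitrary):

    Set-Content C:\temp\foo.txt -Value 'visible contents'
    Set-Content C:\temp\foo.txt -Stream bar -Value 'hidden contents'
    Get-Item C:\temp\foo.txt -Stream *        # lists :$DATA plus 'bar'
    Get-Content C:\temp\foo.txt -Stream bar   # 'hidden contents'
    # dir only reports the default stream's size, so the file looks unchanged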
In the late 1990s, there was a bug in MS IIS where if you requested http://example.com/page.php , it would execute the PHP script, but if you requested http://example.com/page.php: , it would give you the PHP source code. Even more than today, it was common to hard-code database connection details, including passwords, into the source code.
One thing that makes Windows paths weird is that the Windows API, NTFS and most Windows tools all have different restrictions on file paths.
NTFS will accept almost anything. The Windows API (I'm thinking of the old Win32 one) applies most of the restrictions the article mentions.
But, for example, not the normalization part. A filename can end with a space, no problem.
That once led me to a minor bug in .NET Framework. One of the path-related functions (I think it was Directory.Move) did not correctly apply this normalization and could produce directories with trailing whitespace. Good luck removing/fixing those in Windows Explorer.
So for the longest time Adobe software had random bugs where it would create a series of folders named "Application Data" repeating recursively 3000+ characters deep.
The bit about allowing / as a path separator is one of my favorite bits of DOS/Windows trivia. As a unix guy it's fun to give a windows person a path with the slashes wrong like "z:/foo/bar", being corrected for a unix-ism, then having it actually work!
In practice I think the biggest problem with using forward slashes on Windows is confusing programs which expect "/" to indicate program switches. The non-uniformity of shell parsing is also a big unix/win design difference.
It doesn’t work everywhere. For example tab completion in cmd.exe doesn’t work for a path containing forward slashes (even when quoted), because forward slash is the prefix character for command-line options.
Right, but that's a cmd.exe thing, not a Windows thing.
Windows supports it, CMD doesn't. Programs that you run from a CMD prompt support other option-flag syntaxes, so it's just a cmd.exe feature.
CMD.exe is its own thing with its own backwards compatibility requirements and the case could be made that cmd.exe is "Windows" as much as anything else is, so I get it.
The point is, you can’t just blindly use forward-slash as a file system path separator everywhere on Windows. It’s not on equal footing with backslash in that respect.
That's incorrect. You're confusing totally separate issues by examining specific pieces of software with product specific bugs. This isn't a valid way to examine the issue: by this metric, spaces aren't supported on unix because many programs choke on them.
In fact, you can use forward slashes across the entire file API on Windows. That's the point.
I'm viewing this from the end user's perspective. They can use backslashes everywhere as a path separator, but they can't use forward slashes everywhere. In that sense, forward slashes are in practice a second-class citizen on Windows. The canonical path syntax is and will remain with backslashes.
I'd qualify that as "almost no non-programmers know". Forward slashes are so useful in languages that use \ as an escape sequence that most programmers do know this.
I had to double-check, but I ran into some issues at work where .NET Framework got confused if I used both separators in a path and used ".." to try to access the parent directory.
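In my experience the reliable approach in .NET is to normalise early; a sketch:

    # GetFullPath resolves both separator styles and '..' to one canonical path
    [System.IO.Path]::GetFullPath('C:/Users/../Windows')   # -> C:\Windows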
> UNC paths have a peculiar way of indicating the drive letter, we must use $ instead of :.
I don't believe that's true, I am almost positive they're SMB shares, just like any other, but are created by the system, which is why "accessing drives in this way will only work if you’re logged in as an administrator."
The dollar sign indicates that the share is 'hidden' and can't be enumerated by traditional means. The C$ share is created by default and provides root level access to the system drive, and is locked down by default for this reason
You are correct that they are just SMB shares like any other. They can be removed, though many management processes across different applications assume that those shares will be present.
In UNC paths you can append “$NOCSC$” to the hostname to force the client to bypass the “Offline Files” cache. (There are probably other wild undocumented bits like this one hiding in other places in the Windows stack.)
I don't recall. Like the other reply to you says, these get leaked in support, etc. I'll also run "strings" or even Ghidra on closed-source binaries when I'm troubleshooting issues. There's usually good fun to be had from Microsoft binaries doing that. I've discovered undocumented debugging switches, registry entries, etc.
(In version 10.0.19041.985 of cscsvc.dll in Windows 10 I'm seeing the string "If you hit this breakpoint, send debugger remote to BrianAu." Presumably that's "Brian Aust", referenced in a chat[0] re: Offline Files.)
I knew the Windows filesystem layout was super bonkers when I had to explain to fellow devs that on a 64-bit machine, you put the 32-bit libraries in SysWOW64 and the 64-bit libraries in System32.
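You can watch the redirection happen; a sketch ('Sysnative' is the documented escape hatch that only exists inside 32-bit processes on a 64-bit OS):

    [Environment]::Is64BitProcess               # False in a 32-bit PowerShell
    # Under WOW64, System32 silently redirects to SysWOW64, so this
    # alias is the only way for 32-bit code to reach the real System32:
    Test-Path "$env:windir\Sysnative\cmd.exe"   # True only in a 32-bit process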
This is a great article and really illustrates just how hard Windows works to be backwards compatible.
Lots of these (eg: the COM/LPT stuff) could be dropped and wouldn't affect most people either way, but for those things depending on it, it would be a profoundly breaking change.
'echo foo > COM1' returns 'The system cannot find the file specified.' on Windows 11. (Machine doesn't have a COM1; if this wasn't being redirected to the port, it'd have gone into a file of that name.)
I think what's missing from this discussion is an emphasis on how layered Windows paths are.
The Win32 paths are like an emulation layer. They parse the given path and produce a kernel path. Win32 implements all the weird history you know and love as well as things like `.` and `..`. You can use the `\\?\` prefix to escape this parsing and pass paths to the kernel.
The NT kernel has paths like `\Device\HarddiskVolume2\path\to\file`. NT paths are much simpler. There are no restrictions on paths except that they can't contain empty components (notably they can contain nul). At this layer, `.` and `..` are legit filenames.
However, it's the filesystem driver that ultimately says what's a valid filename and what isn't.
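You can poke at both layers from PowerShell; a sketch, assuming PowerShell 7+/.NET Core, which passes `\\?\` paths through untouched:

    # Win32-style parsing resolves '..' before the kernel ever sees the path
    [System.IO.Path]::GetFullPath('C:\Windows\System32\..\win.ini')   # C:\Windows\win.ini
    # The \\?\ prefix hands the path over verbatim - no normalisation
    [System.IO.File]::Exists('\\?\C:\Windows\win.ini')                # True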
> Say you, for whatever incomprehensible reason, need to access a file named .., which would normally be resolved to the parent directory during normalisation, you can do so via a literal device path.
Oh no. No. Windows allows files to be named `..`?!
But seriously: no, at least not on NTFS. This filename does have a trailing space, though, and that is enough to defeat Explorer: you cannot move or delete it, and the properties window is broken.
Preferring a minimal look (and being immature) my desktop shortcuts for "This PC" and "Recycle Bin" have been renamed with two of the many invisible characters that windows allows.
I also routinely use single extended unicode characters as root folder names and identifiers for various purposes.
Using a search program, "Everything", it's a lot easier to find things if I use something like the pilcrow symbol as the root folder for any directory dedicated to text documents, when the alternative is to wade through results for 'documents', 'text', 'reading' or any combination of those words.
For the same reason, I find I can make much more memorable associations. It helps me harness things relationally. I can preserve uncertainty and avoid the frustration and negativity of trying to make shades of grey and rose fit black and white patterns. It does sound a bit new age, but there's no doubt in my mind: flat hierarchical alphanumeric patterns are restrictive, prescriptive, insufficient. For example, a lot of artists actively work to defy pigeonholing. I still need identifiers.
I mean, even if I wasn't into 'bleeding edge' culture, restrictions, problems and frustrations are the normal experience. I think this is illustrated by the unsatisfactory experiences that people find when they try to make id3 tagging "work".
It's as close as I can get to banishing the pervasive 'what-if' heartbreak of WinFS being cancelled. Sadly it doesn't help at all make up for what 'Semantic Web' promised. But that's probably why I'm a believer in GPT and the like.
Is it just me that can't help thinking they are products that have arisen from the need to make non-semantic computing useful again?
Yes, you can create files named `.` and `..`. However, any sensible filesystem driver will reject those names (spoiler: there do exist drivers that aren't sensible).
Under unix, if you create a symlink to a directory, e.g. `~/syslogs` is a symlink to `/var/log`, then `..` can be used to traverse the "true" parent directory. So `~/syslogs/../lib` will traverse `/var/log/..` and refer to `/var/lib`, not to `~/lib`.
However, a "normalising" path interpreter will just take something like `~/syslogs/../lib` and change it to `~/lib` without consulting the filesystem.
Given that (AIUI) Windows has supported symlinks for a while now (?), it's possible that files called `..` aren't actually allowed, but the ability to access `..` is still necessary.
(Notably, the article does point out that filenames ending in `.` are disallowed - which should exclude `..` as a name one can give a file.)
Just using Cygwin will show that file names can indeed end in periods (and spaces). The article is very much restricting itself to the standard limitations imposed by the Win32 API, not what the operating system actually allows. Case sensitivity has always been a thing, since Windows NT 3.1, for example; the "forbidden" characters are not so forbidden with the right file access APIs.
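For example, a sketch of creating and removing a name with a trailing space by going through the extended-length prefix (assumes NTFS, an existing C:\temp, and PowerShell 7+/.NET Core, which passes `\\?\` paths through):

    $p = '\\?\C:\temp\trailing.txt '                       # note the trailing space
    [System.IO.File]::WriteAllText($p, 'Explorer hates me')
    [System.IO.File]::Delete($p)                           # needs the same literal path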
No coverage of this nonsense would be complete without also mentioning that CON, AUX, PRN and a couple of others are verboten as filenames in Windows. Although apparently you can defeat this via e.g. \\?\C:\con
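For the curious, a minimal sketch of that escape hatch in Python (assuming Windows and an existing C:\temp; a bare C:\temp\con would be rejected):

```python
import os

# The \\?\ prefix tells Win32 to skip its path normalisation, so the
# reserved device name CON becomes an ordinary filename:
path = r"\\?\C:\temp\con"
with open(path, "w") as f:
    f.write("not the console\n")

os.remove(path)   # deletion needs the \\?\ form too
```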
One of the early stupid annoying teenager programs I wrote was a tool that would spam your desktop with CON.001, CON.002, and so on through the \\?\ trick.
Windows Explorer could not delete the file. You had to specify the \\?\ path to get the delete call to work, but that didn't play well with cmd.exe's `del` command.
I've since used these files to create directories that can't be deleted by automated cleanups and such, like a special folder in %TEMP% that one program needed but didn't create on its own.
The 260-character path limit has been the bane of my existence. Even though it can be disabled in the registry, there is a gazillion pieces of software built against old APIs that will still not work. You also get really odd bugs when that plague hits you.
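If you can't flip the registry switch, the `\\?\` prefix sidesteps the limit per call; a rough Python illustration (Windows only):

```python
import os

# A single 255-char component (the NTFS per-component cap) pushes the
# full path well past 260, yet the open succeeds thanks to \\?\:
name = "x" * 255
path = "\\\\?\\" + os.path.join(os.getcwd(), name)
with open(path, "w") as f:
    f.write("past MAX_PATH\n")
os.remove(path)
```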
Not only there. Windows originally created them on disks that supported long names, too.
That was necessary to support the use case where an older OS tried to read the disk (which could happen because the user rebooted into an old DOS, for example, or because an external disk was moved to a different computer).
“VFAT, a variant of FAT with an extended directory format, was introduced in Windows 95 and Windows NT 3.5. It allowed mixed-case Unicode long filenames (LFNs) in addition to classic 8.3 names by using multiple 32-byte directory entry records for long filenames (in such a way that only one will be recognised by old 8.3 system software as a valid directory entry).
To maintain backward-compatibility with legacy applications (on DOS and Windows 3.1), on FAT and VFAT filesystems an 8.3 filename is automatically generated for every LFN, through which the file can still be renamed, deleted or opened, although the generated name (e.g. OVI3KV~N) may show little similarity to the original. On NTFS filesystems the generation of 8.3 filenames can be turned off. The 8.3 filename can be obtained using the Kernel32.dll function GetShortPathName.”
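You can ask for that alias yourself via GetShortPathNameW; a small ctypes sketch (Windows only; where 8.3 generation is disabled you simply get the long name back):

```python
import ctypes

def short_name(path: str) -> str:
    # GetShortPathNameW fills buf with the 8.3 alias of an existing path
    buf = ctypes.create_unicode_buffer(260)
    ctypes.windll.kernel32.GetShortPathNameW(path, buf, len(buf))
    return buf.value

print(short_name(r"C:\Program Files"))   # typically C:\PROGRA~1
```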
Right. Back in the 90s I worked on a network server to allow AppleTalk clients into DOS or OS/2 based networks. The Mac users enjoyed their filename freedom but the PC clients had trouble with the super-weird 8.3 short names. You couldn't really tell what the Mac filenames were.
The other direction worked great, though, DOS filenames always worked on the Mac side of the network.
I realized at some point that there is a discrepancy between what's allowed on the file system and what's allowed by "Windows" itself (or, more exactly, by the programs running on Windows and using its APIs to communicate with said file system).
In this case, NTFS totally allows "illegal" characters such as < > : " | ? * etc.; pretty much everything except / and \, and \0, I think.
This makes for funny situations where Windows programs sometimes cannot deal with such files. At best, they can't read, write or rename them... at worst, they'll crash, which is always fun.
This got me to try out the Fileside app (an Explorer/Finder alternative) from the blog post's author. It's available for Windows and Mac.
I think this is an interesting space with room for innovation.
Fileside starts out with a grid of four directories: Home, Documents, Desktop and Downloads. You can customize and name new grid layouts that are shown in a sidebar for quick switching. This seems like a neat idea for specific recurring manual workflows.
It doesn't seem to be targeted at the minimalist crowd. Directory entries beginning with a dot are visible (but greyed out), full Unix-style permissions are shown for each entry, etc.
It looks like it's Electron-based and implemented in a JavaScript SPA framework. It doesn't use the default system font (SF Pro) on Mac, and a bunch of other things also don't look or behave as you'd expect.
The font weight in the size column maps to each file's relative size. All the way from very thin to very bold. Kind of cute.
The path completion seems pretty good - as could be guessed from the blog post.
I think this app sometimes confuses power with details/verbosity. There are some gold nuggets in there though.
The worst thing about Windows paths is how unintegrated it all is. You can navigate into all kinds of weird paths in the Win32 COM shell (File Explorer), which is itself possibly the pinnacle of executed design MS ever achieved. But those paths you build... you can't copy them to the clipboard, you can't serialize them, you can't move them between various tools, and particularly not to the command prompt nor to so-called PowerShell. If there ever was a continent of independent fiefdoms, Windows is it :-/
If you don't know what I am talking about, try navigating to your Android phone's image folder in File Explorer. Then try to USE that path in PowerShell or cmd to copy those image files... good luck.
There must certainly have been some moron in charge to make SURE things couldn't interoperate on Windows... in spite of them having Explorer's design.
I believe it’s because it’s executing a Windows program (ssh.exe) located in Cygwin’s mount for your C: drive and that program therefore expects Windows-style paths.
There is also this can of worms regarding translations.
For example, C:\Users will be shown as C:\Benutzer on a German system but will still be C:\Users on the FS.
"C:\Program Files (x86)" will show up as "C:\Programme (x86)", BUT "C:\Program Files" will stay the same (untranslated).
(I forget what this topic is called, though, and at what layer it takes place.)
Windows XP could already do this, but for the most part didn't. I remember, ~18 years ago when I was a sysadmin, one user out of 50 got his "My Documents/Pictures/etc." in English, but for me on the file server it was all in German. Very confusing.
It's the shell (i.e. Explorer) showing a localized name. Although I think in Windows XP it was still baked into the language you installed Windows with and stuck with that.
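The on-disk name never changes, which is easy to verify from anything that bypasses the shell; for instance, in Python on a (hypothetical) German install:

```python
import os

# Explorer may show "Benutzer" on a German system, but the directory's
# real on-disk name is still "Users":
print("Users" in os.listdir("C:\\"))      # True
print("Benutzer" in os.listdir("C:\\"))   # False
```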
That's because the Unix convention is far simpler: everything except / and \0 (the null byte) is allowed in filenames, which are also case-sensitive (exact byte match).
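Which means almost any byte sequence is a legal name; a quick Python sketch, on a typical Linux filesystem, using characters Windows would reject several times over:

```python
import os

# Everything but "/" and NUL is fair game on ext4 and friends,
# including newlines and the whole Windows-forbidden set:
name = 'so<>:"|?*\nweird'
with open(name, "w") as f:
    f.write("perfectly legal here\n")
os.remove(name)
```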
Amiga used to have the best file-path convention I've ever worked with: drive:dir/file, where drive was not a letter like in Windows but rather the drive's label, so you could have Work:Pictures/HN.jpg.
To go up to the parent directory you had to use an additional slash. So /xxx was the equivalent of Windows ..\xxx, and you could add more slashes to go further up in the directory tree:
Work:a/b/c/d////file was the same as Work:a/file.
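If I'm remembering the rule right, it's easy to model; a toy Python resolver (my own sketch, not AmigaDOS code, and it ignores leading-slash relative paths):

```python
def amiga_resolve(path: str) -> str:
    # Toy resolver: an empty component between slashes means
    # "go up one level", as in AmigaDOS.
    drive, rest = path.split(":", 1)
    stack = []
    for part in rest.split("/"):
        if part == "":
            if stack:
                stack.pop()
        else:
            stack.append(part)
    return drive + ":" + "/".join(stack)

assert amiga_resolve("Work:a/b/c/d////file") == "Work:a/file"
```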
Drives could be "virtual" ones, similar to "bind" mount points in Unix, that could be associated with multiple positions. E.g. you could assign both System:Libs and Data:MyLibs to the virtual drive LIBS: so that LIBS:xxx would match a file called xxx in either directory.
Files used to have a comment field for storing a kind of extended attribute, but it was seldom used IIRC.
Wildcards were quite unique: I think ? was like regexp ., meaning any single character, and # was like * but prefixed, meaning any number of the _following_ pattern, so that #? would match anything.
I'm sure there were other niceties I can't remember right now.
Ahhh, the memories of naming my folder AUX and then having my Uni professors with administrative privileges unable to access my files in that folder. It drove them crazy: they thought the secret of the Universe was hidden there and demanded I let them see what was inside, when in reality I had nothing at all and only wanted to show off. Novell NetWare on top of DOS, the year is 1994. Good times.
.NET Framework to .NET Core transition, Xamarin.Forms to MAUI, XNA, the multiple rewrites of the WinRT platform since Windows 8, the Windows 8 to 10 user-space driver framework, .NET Native, C++/CX, Win2D, WinRT for Xbox replaced by WinGDK, the .NET SharePoint Framework replaced by a JavaScript one...
I still don't think you know what backwards compatibility is. You are saying whatever is necessary to get me to concede this point, and since you are confused about backwards compatibility, you are not correct.
Backwards compatibility is not about keeping all features once supported in Visual Studio in all future versions. That is forwards compatibility, and Microsoft does not do that.
We are talking about backwards compatibility: the ability of new operating systems to run, unmodified, software which ran on old versions of the same operating system.
This might be some run-of-the-mill weirdness, but I was recently using a Microsoft file-globbing library (https://learn.microsoft.com/en-us/dotnet/api/microsoft.exten...) and it handed back file paths with forward slashes (instead of double backslashes), even on Windows (and on .NET Framework). I don't know if this is a library someone developed for .NET Core, forgetting that it was also going to be used in .NET Framework? Anyway, another reason I don't like the occasional dip into Windows I have to do at my job (which is 80% Mac/Linux).
It'd be nice if Microsoft read this list and adjusted their software, like perhaps File Explorer, to be able to read and write this data. Or at least delete it.
A long time ago I found a hidden directory on my file server that some student had created to store their pirated software. This was back in the days of DOS and Novell NetWare.
Turns out you could create files with "illegal" (and invisible) characters in the filename. The standard OS utilities would not allow them, but the underlying file system did not care. So you could write a short program to do it.
They should have fixed that when they went to long file names. It's ridiculous that you can't name a file with its contents' actual title. Random example: http://doi.org/10.1145/327070.327153
Connecting to a Windows share from another operating system that allowed names like that.
There are a few other possibilities, like booting into Linux with an NTFS driver that allows you to create illegal file names, and/or odd things like WSL/Cygwin.
Possibly the most annoying thing about Windows w.r.t. this topic is that you need a large physical C: drive to accommodate future Windows bloat. It is very hard to get anything installed so that it puts its data on another physical drive, and it is impossible to extend C: onto another physical drive. Realistically you need a 1TB SSD/NVMe as your primary drive, so if you get a laptop you usually need a high-end one to offer you that.
If they could slowly start adopting Unix file paths and slowly phase out Windows file paths, I think more people would start to use Windows. I would love to want to use Windows. It's a platform with first-class hardware support and paid support, and it's designed (well, in theory) with users and application platforms in mind. And it actually has a bunch of advanced features that Linux and Mac don't have.
Windows was built with a POSIX layer; it already does this.
"Broad software compatibility was initially achieved with support for several API 'personalities', including Windows API, POSIX, and OS/2 APIs – the latter two were phased out starting with Windows XP."
You can create files and folders with illegal names like '.' or '..' with Python. Comedy ensues if you try to recursively go into the '.' directory, and Explorer crashes -_-. Despite Windows having rules, apparently not everything needs to play by them, despite setting those expectations.
Add \\wsl$ to the list of "weird paths". This one fits the general schema the article is about but has the special meaning of "your files in the Linux VM hiding in the Windows box" (aka WSL). It's served over the Plan 9 network filesystem protocol.
Is it a weird path? It's just a simulated computer with hostname wsl$. It looks like this is just a network share server running on the loopback interface. Similarly, \\wsl.localhost\ will also show you your WSL files. The dollar sign has some weird special meaning to the Windows security system (trusted account, local computer, etc.; in AD it's related to this mess: https://techcommunity.microsoft.com/t5/core-infrastructure-a...) but it's essentially just a standard Windows UNC path.
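And it behaves like any other share, e.g. from Python on the Windows side (assuming you have a distro actually named Ubuntu):

```python
import os

# Each WSL distro shows up as a share; "Ubuntu" here is whatever
# your distro happens to be named:
print(os.listdir(r"\\wsl.localhost\Ubuntu"))   # etc, home, usr, ...
```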
One of these tripped me up: the "Disallowed Names" section has a bunch of not-too-wordlike names, except one.
When I made carefulwords.com (which I made because I wanted a thesaurus where you could just write e.g. https://carefulwords.com/book for "book" and get the results), I found out the hard way that you cannot make a file named "con" on Windows. Or "con.html", or with any other extension. You can try to force this, creating it via a script, but then programs like Git will hang when they come across it. So in my thesaurus the actual page is /con-word.html and I just have it rewrite to /con.
This has actually changed in Windows 11. You can use "con.html" without fear. "con" is still a bit of a problem. ".\con" will work but not a bare "con".
It's not really a new version of Windows if it does not introduce a new variation of universal path addressing to end all variations of universal path addressing.
Recently we had a bug in an Electron app that couldn't call exec on a file path containing a space on Windows. After two days of researching and trying, I still don't have a solution. I reckon adding Node.js to the fantastic world of Windows file I/O does not make it any easier.
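For what it's worth, the usual culprit is building a single command string that a shell re-parses; passing the path as its own argv element generally fixes it. A Python sketch of the two shapes (the .exe path is hypothetical):

```python
import subprocess

exe = r"C:\My Tools\app.exe"   # hypothetical path containing a space

# Fragile: handed to the shell as one string, which splits on the
# space and tries to run "C:\My":
#   subprocess.run(exe + " --version", shell=True)

# Robust: an argv list, so nothing re-parses the path:
subprocess.run([exe, "--version"])
```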
As a related anecdote: you NEVER want to put whitespace in file paths. Ever. Tons of programs will just break, even in 2023, even Microsoft ones (looking at you, PowerShell), and I'd imagine forevermore.
Your life will end up as a series of awful MYPATH~1\ kludges. You have been warned.
EDIT: There seem to be different camps here, and perhaps a generational divide. Maybe the kids haven't been (or never will be) burned by this one, but I've seen too much, wasted too many hours, written too many workarounds, and will forever remain #TeamNoWhiteSpace.
Counterpoint: Nearly all of my (thousands of) users use whitespace in basically every single folder name and file name every single day (which are usually titles of papers or experiments), and none of them ever have any problems.
When writing software I always make sure that all my test data has spaces in paths and filenames - and not just the normal space but weird Unicode ones too.
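Something like this hypothetical fixture loop, say (names and characters are just examples):

```python
from pathlib import Path

# Smoke test with ordinary and exotic whitespace: U+00A0 (no-break
# space) and U+3000 (ideographic space) catch different bugs:
for ws in [" ", "\u00a0", "\u3000"]:
    p = Path(f"test{ws}data.txt")
    p.write_text("ok")
    assert p.read_text() == "ok"
    p.unlink()
```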
I grew up in the days of DOS 3 and onward, so I am basically physically incapable of using whitespace in filenames. And I'll frequently replace spaces with underscores in files shared with me.
Fascinatingly though (and sometimes irritatingly), several of my mentors have cautioned me to drop this habit as I get promoted. At management level, they all use white space and dots haphazardly, and they apparently perceive underscored filenames negatively. This goes up drastically at executive level.
That's... definitely not as dramatic as you try to make it sound. "Program Files" and "Documents and Settings" have been around for almost three decades, and most programs work just fine with files living somewhere inside those paths.
And the "cherry on top" is there's an env-var named `%ProgramFiles(x86)%` (aka `${env:ProgramFiles(x86)}` in PS) that points to where that directory actually lives since AFAIK it's controlled by a registry entry
Really? You posted a divisive and absolute statement and people pointed out how it's not accurate. Considering "Program Files", a very common Windows path, was introduced back in Windows 95 (https://devblogs.microsoft.com/oldnewthing/20120307-00/?p=81...), almost 30 years ago, it's pretty hard to blame it on "the kids".
That doesn't sound like "those damn kids and their newfangled apps that can handle spaces, they'll never know the pain I went through!!!" and more like "I said something that was clearly not accurate and people pushed back".
I've had lots of programs, libraries and scripts freak out from having whitespace in a file path.
> I said something that was clearly not accurate and people pushed back
Is my life experience invalid? It's a needless error that I never want to deal with again. I cover my mouth when I cough, I use my turn signal when changing lanes, and I don't put whitespace in a file path or a URL. It's that simple.
What I cannot fathom for the life of me, on HN of all places, is vehemently defending a practice that is not guaranteed to work 100% of the time: "Well, iiiive never grazed an oven coil pulling a pot roast out of the oven, so obviously this guy is an idiot for advocating oven mitts." Give me a break, dude.
It’s almost as if, get this, it’s not a problem, let alone anywhere even remotely in the vicinity of a problem of the scale and magnitude you claim it to be.
It’s almost as if people in 2023 get by fine with spaces in their filenames whereas you seem to be stuck squarely in the 1980s.
I know, it’s a crazy idea. Those kids and their insanity. /s
You know, my main hobby is writing ASM for classic video game consoles, and my opinions and experience involve lots of janky / homemade / antiquated programs, so honestly you're not wrong =p
Microsoft called the directory where programs are installed "Program Files" and put documents in "My Documents" specifically so that developers would have to learn to deal with spaces immediately.
Curious about PowerShell misbehaving there. Care to share any details? Typically you never want to handle paths to files as strings, or at least resolve them to File/DirectoryInfo objects as soon as possible. But the only real blunder regarding paths in PowerShell I'm aware of is the Path parameter to various cmdlets and having [] in file names (which is why LiteralPath always exists alongside Path parameters).
Came here to mention []. For those who haven't looked it up, in addition to * and ? as wildcard characters, PowerShell supports "[xyz]" and "[a-z]", meaning "x y or z" and "anything from a to z", so "[a-z]alls" would match calls, falls, walls, etc.
I knew the guy who tested the feature in PowerShell and he was deeply frustrated over the fact that there was no way to escape some sequences. For example, you can loosely match "[abc]" with something like "?abc?" or "[[]abc?" because the [[] indicates exactly one opening square bracket but there's no way (as far as I know) to say "this section ends with a square bracket".
The PM really wanted that feature, though, even though you could probably count on one hand how many times it's been used in real life.
Ah, OK. This works because you're using single quotes on your strings, meaning the escape sequence `[ is preserved until it reaches the globbing layer. This will not work, for example...
gi "`[abc`]"
...because the `[ will be processed before the string gets sent to globbing, meaning the back tick will be removed. This would work with double quotes:
gi "``[abc``]"
...because at the command line level this will evaluate to `[abc`] and the globbing will know that the square brackets are literal. So I will concede and downgrade my complaint from "impossible" to merely "overly complicated for the layman".
Man, it's never that simple. How about generating PowerShell scripts with quotes? Sometimes you are dealing with escape characters like \", sometimes you aren't. Why bother with all that? Hell, versions of software and compilers change such behaviors, either on purpose or by accident, all the time. It just gets messy, dude. C:\NEVER_~1\AGAIN_~1.NOP
Perhaps I'm a dinosaur still scarred by the golden olden days, but there is no convincing me that whitespace in filenames or URLs is ever a good idea.
> You gave two examples that aren't actually issues.
Man, I've had config files break because I saved them in UTF-8 instead of ANSI. I hope to God you never have to experience the horror... You also have a nice day.
Put that into a desktop.ini file, assign it the right attributes (`attrib +S +H desktop.ini`) and your folder will look right in any program that uses the shell API without any risk of breaking programs.
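A rough Python sketch of the whole dance ("My Folder" and the display name are made up; LocalizedResourceName is the relevant desktop.ini key):

```python
import os
import subprocess

folder = "My Folder"   # hypothetical target folder
os.makedirs(folder, exist_ok=True)

ini = os.path.join(folder, "desktop.ini")
with open(ini, "w") as f:
    f.write("[.ShellClassInfo]\n")
    f.write("LocalizedResourceName=Fancy Display Name\n")

# The shell only honours desktop.ini when the file is system+hidden
# and the folder itself carries the read-only (or system) attribute:
subprocess.run(["attrib", "+S", "+H", ini])
subprocess.run(["attrib", "+R", folder])
```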
Fun stuff!