Long File Names

Long File Names (LFNs) were introduced in the original Windows 95, and remain a significant compatibility issue to this day. But to understand LFNs, you have to understand how normal file names work! A good learning tool for this purpose is "DirSnoop", which can be downloaded from the 'net. This views directory entries in their raw form, and it can be fun to associate DirSnoop as a non-default right-click action for "File Folder" (Win9x-speak for "directory").

The internal structure of LFNs is more rigorously documented elsewhere on the Internet; the focus here will be on practical things that go wrong with LFNs and how to anticipate and manage these. It's also crucial not to get confused between LFNs and other Win9x file system compatibility issues, such as FAT32 or VFAT (e.g. DOS Compatibility Mode).

8.3 names

Every file system data object is pointed to by a directory entry 32 bytes long, which contains information about the file: its name, the address of the first cluster (if any), the length of the file in bytes, time and date stamps, and a set of attribute bits that determine whether the file was recently backed up, should be displayed, may be written to, etc.

But not every data object within the file system is a file - some of the attribute bits are used to signify subdirectories or volume labels. There should only be one volume label per disk volume, found in the root; however, subdirectories may abound. A subdirectory is stored like a file, except that the data clusters of this "file" are interpreted as further lists of file system objects.

There are 11 bytes set aside for the object name in a directory entry. All 11 can be used for volume labels, but in the case of files and (sub)directories, the 11 bytes are interpreted as 8 bytes of name, and 3 bytes of extension (the "three letters after the dot" that determine what should be done with the file). This pattern of 8 name characters plus a 3 character extension is referred to as the 8.3 naming convention, which also excludes spaces, certain other characters, and uses only upper case letters internally. It's quite a restrictive convention to live with, if you want to use meaningful names; hence the challenge to add long file name support while remaining compatible with programs bound to 8.3 names and conventions.
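To make the 32-byte entry a little more concrete, here is a minimal Python sketch that unpacks one plain 8.3 entry. The byte offsets are an assumption taken from the commonly published FAT layout (the text above only lists the fields), and parse_entry is a hypothetical helper name, not part of any real tool. It does not try to handle LFN entries.

```python
import struct

# Assumed layout: 8 name bytes, 3 extension bytes, 1 attribute byte,
# 8 bytes skipped here, cluster high word, time, date, cluster low word, size.
ENTRY = struct.Struct("<8s3sB8xHHHHI")  # 32 bytes in total

ATTR_READ_ONLY, ATTR_HIDDEN, ATTR_SYSTEM = 0x01, 0x02, 0x04
ATTR_VOLUME_LABEL, ATTR_DIRECTORY, ATTR_ARCHIVE = 0x08, 0x10, 0x20


def parse_entry(raw):
    """Decode one plain (non-LFN) 32-byte directory entry into a dict."""
    name, ext, attr, clus_hi, _time, _date, clus_lo, size = ENTRY.unpack(raw)
    if attr & ATTR_VOLUME_LABEL:
        # Volume labels use all 11 name bytes as one field.
        kind = "volume label"
        display = (name + ext).decode("ascii").rstrip()
    else:
        kind = "directory" if attr & ATTR_DIRECTORY else "file"
        base = name.decode("ascii").rstrip()
        suffix = ext.decode("ascii").rstrip()
        # The dot is not stored on disk; it is added for display only.
        display = base + "." + suffix if suffix else base
    return {
        "name": display,
        "kind": kind,
        "first_cluster": (clus_hi << 16) | clus_lo,
        "size": size,
        "read_only": bool(attr & ATTR_READ_ONLY),
        "hidden": bool(attr & ATTR_HIDDEN),
        "needs_backup": bool(attr & ATTR_ARCHIVE),
    }


# Made-up sample entries, built with the same layout.
sample_file = ENTRY.pack(b"README  ", b"TXT", ATTR_ARCHIVE, 0, 0, 0, 5, 1234)
sample_dir = ENTRY.pack(b"WINDOWS ", b"   ", ATTR_DIRECTORY, 0, 0, 0, 2, 0)
print(parse_entry(sample_file))
print(parse_entry(sample_dir))
```

Note how the name is padded with spaces to fill the fixed 8 and 3 byte fields, which is one reason spaces are awkward within 8.3 names.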

Long File Names

LFNs are stored as directory entries with an "impossible" combination of attribute bits - thus causing them to be safely ignored by most software written before Windows 95. Like volume labels, they point to no data clusters; in fact, they are almost pure character data, with only a few bytes of bookkeeping (a sequence number and a checksum) alongside. Each character takes two bytes of space, so that up to 13 characters of an LFN can be held within a single directory entry; longer names take up additional entries.
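As a rough illustration of that "impossible" combination: published FAT documentation gives it as read-only, hidden, system and volume label all set at once (0x0F). That value is an assumption brought in from outside this page, and is_lfn_entry is a hypothetical helper name.

```python
ATTR_READ_ONLY, ATTR_HIDDEN, ATTR_SYSTEM, ATTR_VOLUME_LABEL = 0x01, 0x02, 0x04, 0x08

# Assumed marker value: all four low attribute bits set together.
ATTR_LFN = ATTR_READ_ONLY | ATTR_HIDDEN | ATTR_SYSTEM | ATTR_VOLUME_LABEL  # 0x0F


def is_lfn_entry(attr):
    """True if the attribute byte carries the LFN marker combination."""
    # Mask to the six defined attribute bits before comparing.
    return (attr & 0x3F) == ATTR_LFN


print(is_lfn_entry(0x0F))  # LFN entry
print(is_lfn_entry(0x08))  # genuine volume label
print(is_lfn_entry(0x20))  # ordinary file with the archive bit set
```

Old software that tests only the volume label bit sees such an entry as a label and skips it, which is why "normal" pre-LFN programs are not upset by LFNs.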

Spaces can be used, but some characters are still reserved for use as delimiters or redirection symbols. Both upper and lower case letters can be used, although the case is used only for display within the Win9x system. An LFN is generated whenever a file name that is invalid as 8.3 is created within Windows through the Win32 API. File names that are 8.3-valid, but not in ALLCAPS, have a "cosmetic" LFN to preserve the letter case for display - something that becomes important when uploading to UNIX-based servers or web sites.

It's important to remember that the LFN is used in addition to the 8.3 name within the Win9x file system, and is associated with it by virtue of its position in the directory. All other information about the file (length, cluster address, etc.) is stored within the 8.3-named entry. When the file is stored in contexts other than the Win9x file system, this may not be the case; for example, only one name is used within a .zip archive, and the other name is lost.

Other file systems handle longer names (with or without spaces) natively, but may have limitations of their own, e.g. a file system used on CD-ROM may not accept names that start with a space. But few file systems support or store alternate names for the same object.

LFN Risks and Issues

These include software compatibility issues, data corruption risks, slight performance degradation, and problems with command line parameter management:

Pre-LFN disk utilities

Whereas "normal" pre-LFN programs will ignore LFNs, some programs that are written to manage or repair the details of the file system will trip over them. File system repair utilities such as MS-DOS 6.x Scandisk, pre-LFN Norton Disk Doctor, etc. will flag them as illegal entries (as indeed they are, within pre-LFN file systems) and may delete them; pre-LFN "directory sort" utilities will move them away from the real 8.3 named entries they are supposed to be associated with, as will pre-LFN defraggers such as MS-DOS 6.x Defrag and pre-LFN Norton Speed Disk.

While the file is still intact and accessible via the underlying 8.3 name, problems arise when inter-file links are broken or when it's not possible to guess the identity of the file (e.g. which "Invoice *.doc" is INVOIC~9.DOC ?). Windows uses LFNs internally (crazy, but true) so that the OS may not be able to run if these are mangled.

Pre-LFN software

Pre-LFN software won't see LFNs; instead, the raw underlying 8.3 names are seen and used. This can create problems if such files are copied, moved or renamed; changing the 8.3 name has the effect of blowing away the LFN associated with it. Under these circumstances, a rename will cause the LFN to change to the new 8.3 name; a copy or move will simply leave it behind.

This difference is useful when one needs to deliberately strip away LFNs, either for performance reasons or for use on disks that are destined to be maintained by pre-LFN environments.

DOS and DOS Mode

Anything prior to WinStart.bat within the startup axis cannot see LFNs, and neither can Win9x DOS mode or previous versions of MS-DOS.

The only exceptions to this are utilities that access the directory entries directly, rather than relying on API calls. Examples include the rather hairy Microsoft LFNBack.exe that is on most Win9x CDs, and the safer and better 3rd-party downloadable freeware DOSLFNBk.exe; both of these can be used to back up LFNs from within DOS mode.

LFNs used as 8.3 name data

There's a need for DOS Mode archivers or backup utilities that can "see" LFNs, but you can go seriously wrong here. For example, the freeware InfoZip might appear just the ticket as it sees LFNs and runs in both Windows and DOS Mode, and the .zip file standard itself is quite comfortable with LFNs.

However, if you create a .zip such that this contains LFNs, and then extract it in a non-LFN-aware environment (such as PKUnZip 2.04g, or InfoZip in DOS Mode), you will get neither the original 8.3 name (which was not stored in the archive) nor the LFN. What you will get is an 8.3 name taken as-is from the LFN name data; a name that may contain lower case letters, spaces, or other illegal characters. For example, the name "My Documents" might be extracted as "My Docum.ent", and as that space is within the actual 8.3 name itself, it will cause problems in both DOS Mode and Windows.

For this reason I prefer to use PKZip 2.50 (PKZip25.exe) rather than InfoZip, as it will refuse to operate in DOS or DOS Mode.

Numeric Tails

One of two Great Discredited Registry Hacks (the other being killing off IsShortcut as a way of hiding icon shortcut arrows) involved a setting that changed the way 8.3 names were generated from LFNs when creating names.

The LFN-to-8.3 naming method is:

Spaces are stripped, then the first 6 characters are used as the name stub, followed by a tilde (~ or "squiggle") and next required digit {1,2,3...}, then a dot (not stored internally) and the first 3 characters following the last dot in the LFN. The digit chosen will be the lowest that avoids a same-name clash with an 8.3 name already present in that directory; if all digits 1-9 are taken, the name stub is shortened and the number takes an extra digit to the left (i.e. ...NEXTIS~8.EXT, NEXTIS~9.EXT, NEXTI~10.EXT...)

Examples:

"A Long File Name.a.b.c.d" -> ALONGF~1.D

"My Safe Picture.gif.exe" -> MYSAFE~1.EXE

"My Safe Picture.gif.executable" -> MYSAFE~2.EXE

If the registry setting to disable numeric tails was added, these names would be created differently:

"A Long File Name.a.b.c.d" -> ALONGFIL.D

"My Safe Picture.gif.exe" -> MYSAFEPI.EXE

"My Safe Picture.gif.executable" -> MYSAFE~1.EXE

Where you have ambiguity like this, you no longer have a function (i.e. a process that generates only one result) and situations arise where "behavior is undefined".
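The default numeric-tail rule described above can be sketched in Python as follows. This follows only the rule as worded on this page; real Windows handles more cases (illegal characters and so on), dropping interior dots from the stub is my assumption, and short_alias is a hypothetical name. The sketch assumes the input really does need an alias.

```python
def short_alias(long_name, existing):
    """Build an 8.3 alias for long_name, avoiding names in `existing`."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:
        # No dot at all: the whole name is the stem, with no extension.
        stem, ext = long_name, ""
    # Strip spaces (and, by assumption, interior dots), then upper-case.
    stub = "".join(ch for ch in stem if ch not in " .").upper()
    # First 3 characters after the last dot become the extension.
    ext = ext.replace(" ", "").upper()[:3]
    for n in range(1, 1000000):
        tail = "~" + str(n)
        # The stub shrinks as the number grows, keeping 8 characters in all.
        candidate = stub[:8 - len(tail)] + tail
        if ext:
            candidate += "." + ext
        if candidate not in existing:
            return candidate
    raise ValueError("no free alias found")


taken = set()
for lfn in ["A Long File Name.a.b.c.d",
            "My Safe Picture.gif.exe",
            "My Safe Picture.gif.executable"]:
    alias = short_alias(lfn, taken)
    taken.add(alias)
    print(lfn, "->", alias)
```

The second argument is the point of the exercise: the result depends on what is already in the directory, not just on the long name itself.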

Ambiguous LFNs

Any LFNs that have the same first six non-space characters and extension will generate the same name stub. The numeric tail is generated by enumeration rather than identity, so it is a matter of what order they were created in that determines which is called what. For example, consider this:

"Invoice 001.doc" -> INVOIC~1.DOC

"Invoice 002.doc" -> INVOIC~2.DOC

"Invoice 003.doc" -> INVOIC~3.DOC

"Invoice 004.doc" -> INVOIC~4.DOC

Now, "Invoice 003.doc" is deleted. Will "Invoice 005.doc" be INVOIC~3.DOC, or INVOIC~5.DOC? If these files are linked to from an Excel 6 spreadsheet (which is LFN-unaware), will the link to "Invoice 003.doc" now point to "Invoice 005.doc"? If this directory is backed up, and then restored elsewhere with the files created out of sequence (or into a dir that already has several "Invoice ???.doc" files present) will that spreadsheet's links be anything remotely sane?

Moral: Don't use the same first six characters for lots of file names; it also bedevils data recovery!
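Taking the "lowest free digit" rule at face value, the Invoice scenario can be played out in a few lines of Python. This is a simulation under that stated rule only, not a claim about what every Windows version does; alias_for is a hypothetical helper and assumes every name contains a dot.

```python
def alias_for(long_name, taken):
    """Lowest-free-digit 8.3 alias; assumes long_name contains a dot."""
    stem, _, ext = long_name.rpartition(".")
    stub = stem.replace(" ", "").upper()
    ext = ext.upper()[:3]
    n = 1
    while True:
        tail = "~" + str(n)
        candidate = stub[:8 - len(tail)] + tail + "." + ext
        if candidate not in taken:
            return candidate
        n += 1


# Original session: create 001 to 004, delete 003, then create 005.
directory = {}
for name in ["Invoice 001.doc", "Invoice 002.doc",
             "Invoice 003.doc", "Invoice 004.doc"]:
    directory[name] = alias_for(name, directory.values())
del directory["Invoice 003.doc"]
directory["Invoice 005.doc"] = alias_for("Invoice 005.doc", directory.values())

# Restore elsewhere: same files, recreated in alphabetical order.
restored = {}
for name in sorted(directory):
    restored[name] = alias_for(name, restored.values())

for name in sorted(directory):
    print(name, "was", directory[name], "now", restored[name])
```

Under these assumptions "Invoice 005.doc" inherits INVOIC~3.DOC in the original directory, and after the restore "Invoice 004.doc" and "Invoice 005.doc" have swapped short names, so anything linking via 8.3 names now points at the wrong file.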

Life would be a lot cleaner and simpler had Microsoft called "Program Files" and "My Documents" "Programs" and "Docs" respectively. That avoids both parsing issues and the problem of ambiguous 8.3 names. For example, if you were to backup in this order...

"C:\Program Files" -> PROGRA~1

"C:\Program Fools" -> PROGRA~2

...and restored these in this order...

"C:\Program Fools" -> PROGRA~1

"C:\Program Files" -> PROGRA~2

...then apps that refer to their paths via 8.3 names (e.g. anything involving .inf handler, AutoExec.bat Path, etc.) will get the wrong directory and won't work.

This isn't such an unlikely scenario as it sounds; it's common to rename away a "Program Files" when doing a parallel Windows installation, and if you'd renamed it to "Program Files old" rather than, say, "ex-Program Files", your new installation might track PROGRA~2 instead of PROGRA~1. That might cause problems when you try and integrate the two into one working installation.

This enumeration-vs.-identity dilemma is a basic info-theory boo-boo that recurs in Plug-n-Play and drive letter management. It is one cause of problems that can arise after restoring Windows-based backups; the others being files that were open or in a dynamic state when the backup was made, and inter-file inconsistencies due to processes running within the backup period.

The only way to really back up all the information within the file system (while regenerating actual cluster positioning) is to back everything up outside of Windows, and use a separate process to back up the LFNs (e.g. DOSLFNBk.exe). Most of the time it works out OK, as long as you don't have hybrid LFN/8.3 access to similarly-named data files.

Ambiguous LFN display

There are certain characters that can be valid within LFNs, but will not be displayed by the Windows interface (i.e. Explorer.exe in its various guises). Typically the trouble character is shown as an underscore ("_") character.

Suspect this if you have what appears to be two entries with the same name (having excluded a .pif or .lnk etc.) or a file that you can't seem to "get a grip on" ("not found" errors when trying to access or delete it).

As long as the entire directory is not deranged (genuine same-name entries are quite common in an insane file system) and the rest of the file's directory information is sane (i.e. not a 57G file on a 2G drive) then you can use the DOS wildcard approach to rename or remove it.

You may also have to do this in the case of invalid 8.3 names generated by processing LFN name data outside an LFN-aware environment.

False extensions

This has significant safety implications, and is already exploited by malware. Because you can have as many dots as you like within an LFN (the dot is a valid character under LFN rules, though not under 8.3 rules), you can get misleading names such as:

"LifeStages.txt.vbs"

"My Safe Picture.GIF.pif"

"Zipped_Files.zip.exe"

Couple this with the Microsoft default practice of hiding file extensions for registered file types, and you have a recipe for disaster. An .exe can have any icon embedded with it, so it's trivial to create a Zipped_Files.exe with a WinZip icon within - looking just like a "safe" archive.
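A crude way to spot such names is to check whether a risky real extension hides behind something that looks like a harmless one. The sketch below is a heuristic only; the list of risky extensions is my own assumption built from the types mentioned on this page, and both function names are hypothetical.

```python
# Assumed list of risky extensions, for illustration only.
RISKY = {"exe", "com", "pif", "scr", "vbs", "bat", "lnk", "shs", "shb"}


def shown_name(name):
    """What the user sees when the final extension is hidden."""
    return name.rsplit(".", 1)[0] if "." in name else name


def looks_disguised(name):
    """True if a risky extension follows an extension-like inner part."""
    parts = name.rsplit(".", 2)
    if len(parts) < 3:
        return False
    _, inner, outer = parts
    return outer.lower() in RISKY and 1 <= len(inner) <= 4 and inner.isalnum()


for name in ["LifeStages.txt.vbs", "My Safe Picture.GIF.pif",
             "Zipped_Files.zip.exe", "Setup.exe", "Report.v2.doc"]:
    print(name, "| shown as:", shown_name(name),
          "| suspicious:", looks_disguised(name))
```

It will raise false alarms on honest names like "my.app.exe", so treat a hit as a prompt to look at the real extension rather than as proof of malice.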

The problem is compounded when certain dangerous extensions are hidden, regardless of how you set up Explorer; .shs, .shb, .lnk and .pif are all dangerous file types that fall into this category. Part of the problem can be managed by renaming away SHSCrap.dll so that .shs and .shb files cannot be processed by the system.

In this respect, Windows Millennium exacerbates the problem by making it more difficult to rename away system files (SFP replaces them on the fly) and by losing the facility to display the real 8.3 name via the file's Properties.

LFN bloat

Each directory sector can hold 16 entries, or 8 files if all names have fairly short LFNs (each file then needs two entries). Every subdirectory starts with two entries for the . (self) and .. (parent) pointers. A FAT32 volume under 8G in size will use 4k clusters by default, so the first cluster of a subdirectory can hold 126 directory entries (or 63 files with short LFNs) before having to link in additional clusters and thus potentially become fragmented.
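The arithmetic is easy to check, and to extend to bigger directories. A small Python sketch, assuming 32-byte entries, the two reserved . and .. entries, and (for the LFN case) names short enough to need just one extra entry per file; the function names are hypothetical:

```python
ENTRY_SIZE = 32      # bytes per directory entry
DOT_ENTRIES = 2      # the . and .. entries at the start of a subdirectory


def files_per_cluster(cluster_bytes, entries_per_file):
    """Files that fit in the first cluster of a subdirectory."""
    return (cluster_bytes // ENTRY_SIZE - DOT_ENTRIES) // entries_per_file


def clusters_needed(file_count, entries_per_file, cluster_bytes=4096):
    """Clusters a subdirectory needs to list file_count files."""
    entries = file_count * entries_per_file + DOT_ENTRIES
    per_cluster = cluster_bytes // ENTRY_SIZE
    return -(-entries // per_cluster)  # ceiling division


print(files_per_cluster(4096, 1))   # 8.3 names only
print(files_per_cluster(4096, 2))   # one LFN entry per file
print(clusters_needed(5000, 1))     # 5000 files, 8.3 names only
print(clusters_needed(5000, 2))     # 5000 files with short or cosmetic LFNs
```

With 4k clusters that gives 126 and 63 files in the first cluster, and a 5000-file directory grows from 40 clusters to 79 once every file carries even a short LFN.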

But directories can often hold thousands of files, so fragmentation and slowdown are common. Fragmentation not only impacts performance, but increases the size of the double-zero on the dartboard (the time during which a crash will interrupt a file write operation and thus cause data corruption).

This is probably one of the reasons why users complain about "My Documents" taking long to "open", and is a reason why one gets fed up with programmers who blithely create thousands of temp files with lower-case names that generate cosmetic LFNs and thus double the bloat factor.

You can imagine the slowdown when processes have to create new, arbitrary-but-unique named temp files at the end of a 20-cluster fragmented mess of a temp directory.

Other scenarios where directory length (rather than file load) causes slowdowns are the case of a software botch that causes masses of zero-length .inf files to be spawned, and the Prolin/Creative virus that moves wads of .jpg and .zip files to the root of the C: volume. The latter causes slowdown on FAT32 volumes, and oddball errors on FAT16 volumes (as FAT16 has a fixed limit on the number of entries the root directory can hold).

The other form of LFN bloat is nonsense like "C:\Program Files\Microsoft\Common Files\Office\Microsoft Common Files\Some Common Files for MS Office\Version 10\Standard edition\Shared\Blob.dll". These gratuitously long paths break several backup utilities, CD file system standards, Path environment and Command.com parameter space restrictions, and are thrown up as errors by ScanDisk in DOS Mode.

Consider this if you were wondering why "clean up" batch files with lines like 'Del "C:\Windows\Application Data\Microsoft\Internet Explorer\Quick Launch\Launch Outlook Express.lnk"' don't seem to work.

Win9x internals

Strangely, some fresh-for-Win95 parts of Windows 9x are not LFN aware. A classic example is the .inf handler that is used when installing hardware drivers and so forth; it cannot list or find LFNs, and will often throw up a "where-is-it?" browse dialog when trying to access stuff that you'd pointed it to just one mouse click and dialog ago.

This has not got better, right up to Windows ME. It's odd, because the PnP and .inf handler is new to Win9x; it's not a legacy thing brought over from Win3.yuk - presumably it was one of the first things they developed and stabilized before kludging on LFN support.

Non-LFN volumes in Win9x

Sometimes users report they are unable to copy LFNs onto a particular hard drive volume; all they see there are 8.3 names. I've never seen this, but then I always do my FDisking and Formatting in DOS Mode.

I've read that this can happen if you do these actions within Windows, and then start working on the new volume while in the same Windows session; the problem goes away after restarting Windows.

You may also see this if you explicitly disable LFN support within System, Performance, File System, Troubleshooting for some reason. DOS Compatibility Mode and Safe Mode will not do this, however.

Parameter management

The space character is used as a parameter delimiter by Command.com, and the parameter parsing logic of many programs. This is countered by enclosing LFNs with spaces in quotes, so that whereas My Proctologist would be seen as two parameters, "My Proctologist" is seen as one.

Command line processors may add quotes, or not, and parser logic may strip quotes, or not. For example, you need to add an explicit "%1" to the command line for LView Pro, else it won't see associated files if there's a space in the file spec, but if you do that for IView, it won't see anything.

Consider this the likely problem when you have "unable to run Program" (for a reference to "C:\Program Files\SomePath\SomeApp.exe") errors on startup, or starting an application, or launching a file.

Consider this also when you see "can't find xxx" errors when the file you are trying to "open" happens to be in a directory with a space in it, or itself has a name with a space in it.

Somewhere, either in a shortcut, an .ini file, or within HKEY_CLASSES_ROOT, you will see a %1 that should be "%1", or a command line like "C:\Program Files\SomePath\SomeApp.exe" that needs an extra set of quotes. Exported .reg files have quotes around string values, so explicit quotes appear as "doubled" there. HKEY_CLASSES_ROOT takes the first parameter as %1, so adding an explicit "%1" via the "front door" has the effect of enquoting the auto-generated %1 parameter.
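To see why the quotes matter, here is a toy Python model of a parser that splits on spaces and honours double quotes. It is a deliberately simplified stand-in, not how Command.com or any particular program actually parses; the paths, the Viewer.exe name and both function names are made up for illustration.

```python
def split_args(command_line):
    """Split on spaces, except inside double quotes (quotes are dropped)."""
    args, current, in_quotes = [], "", False
    for ch in command_line:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == " " and not in_quotes:
            if current:
                args.append(current)
                current = ""
        else:
            current += ch
    if current:
        args.append(current)
    return args


def expand(template, path):
    """Substitute the file spec for %1, as an association would."""
    return template.replace("%1", path)


doc = r"C:\My Documents\Holiday Snap.jpg"

print(split_args(expand(r"C:\Apps\Viewer.exe %1", doc)))    # bare %1
print(split_args(expand(r'C:\Apps\Viewer.exe "%1"', doc)))  # quoted "%1"
print(split_args(r"C:\Program Files\SomePath\SomeApp.exe")) # unquoted program
```

With a bare %1 the file spec arrives as three fragments, with "%1" it arrives whole, and the unquoted program path is cut at C:\Program, which is where those "unable to run Program" messages come from.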

Makes you really wish they'd called it C:\PROGRAMS, doesn't it?

 

(C) Chris Quirke, all rights reserved - January 2001
