Troubleshooting DLLs

If you get error messages about a DLL file every time you do a particular thing (such as run a particular program), or you get crashes every time you do the same thing in a particular program, then you may have entered "DLL hell". The former's a lot easier to troubleshoot, as you at least have a file name to deal with; the latter has other possible causes and needs a wider approach.

Distinguish these situations from two others: erratic problems across a wide range of activities that don't happen every time (more likely a generic problem, e.g. bad RAM or malware), and problems that restrict themselves to a particular set of data files (typically all based on the same template, or derived from one parent).

What is a DLL?

DLL stands for "Dynamic Link Library"; DLLs are a way of packaging code so that it can be shared between programs. This was hailed as a Good Thing in the Windows 3.yuk era, when code files formed a substantial drain on hard drive capacities of the time. Some .EXE files are used in the same way as .DLLs, and both may be infected by Windows code malware. Unlike an .EXE file, a .DLL has no entry point to be run as a program itself, but you can use RunDLL.exe or RunDll32.exe to call functions within a .DLL as if it were a program. This doesn't always make sense, of course; many functions only work in particular contexts.

Some .DLL files are unique to an application, but there are several core .DLL files included with various software development environments, and others that are part of Windows itself. The distinction is blurry; for example, one might think of OLE32.DLL, OLEAUT32.DLL etc. as "Windows files" and MFC42.DLL, MSVCRT.DLL, VBRUN100.DLL (MFC = Microsoft Foundation Classes, VB = Visual BASIC) etc. as being part of software development libraries - but both can come with Windows and both can be updated on an ad-hoc basis as side-effects of other software installations or upgrades.

Why are DLLs a problem?

The trouble is, .DLLs get updated with newer versions of software, and it's easy to end up with "version soup". If a newer program calls a function that isn't in the installed .DLL (e.g. one added in a later version of the file), you'd expect an error such as "call to undefined function..." or "call to unlinked function..." or "missing function export...". But if a program assumes a particular version of the .DLL, and a newer one behaves differently, you might get crashes, etc.

Some installers may splat an old version of a .DLL over a newer one, or install a newer one over an old one. Only the latter is supposed to happen, but you know what software is like. If the file already exists and isn't older, the installer's expected to increase the reference count of the file's registry entry, but not overwrite it.

Uninstallers are supposed to decrease the reference count, and only delete the .DLL file if this counter reaches zero. Two things can go wrong here: a bad uninstaller may just delete the file, or a bad installer that never incremented the count may cause trouble much later, when the count reaches zero prematurely. If you "scrape over" programs without formally installing them, then you can fall into this trap; the scrape-over may work if the required .DLLs happen to be there, but the reference count won't be incremented, so...
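
The counting rules above can be modelled in a few lines. This is only a toy sketch in Python, not how any real installer is written: the two dictionaries stand in for the registry counts and the files on disk, the function names are made up for illustration, and the version number (6, 0) is invented.

```python
# Toy model of shared-.DLL reference counting. Nothing here touches a real
# registry or file system; the dicts are stand-ins and all names are made up.

def install(refcounts, versions, dll, new_version):
    """Well-behaved installer: bump the count, overwrite only if ours is newer."""
    refcounts[dll] = refcounts.get(dll, 0) + 1
    if dll not in versions or new_version > versions[dll]:
        versions[dll] = new_version

def scrape_over(refcounts, versions, dll):
    """Program copied across by hand: it runs only if the .DLL happens to be
    there, and the reference count is never incremented."""
    return dll in versions

def uninstall(refcounts, versions, dll):
    """Well-behaved uninstaller: drop the count, delete the file only at zero."""
    refcounts[dll] = refcounts.get(dll, 0) - 1
    if refcounts[dll] <= 0:
        refcounts.pop(dll)
        versions.pop(dll, None)

refcounts, versions = {}, {}
install(refcounts, versions, "MFC42.DLL", (6, 0))             # program A, installed properly
works_before = scrape_over(refcounts, versions, "MFC42.DLL")  # program B, scraped over
uninstall(refcounts, versions, "MFC42.DLL")                   # program A removed, count hits zero
works_after = scrape_over(refcounts, versions, "MFC42.DLL")   # program B has lost its .DLL
print(works_before, works_after)
```

Here every installer and uninstaller followed the rules, and program B still broke, because the scrape-over never registered its interest in the file.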

Because .DLL files are code that is intended to be shared by many programs, they should seldom if ever be written to (unless being replaced with new versions). So they should rarely be corrupted by "bad exits" from Windows, but anything that causes wild writes or interrupts a defrag of the file system can break the files. As code, they can also be infected by viruses.

A way of troubleshooting "DLL Hell"

First, think back to whether the problem started just after installing or uninstalling something. Then look at the logs of Scandisk and antivirus activity, to see if any particular files were "fixed" or cleaned.

If Scandisk "fixes" a file that was cross-linked or had an "incorrect length", then you can be fairly sure that file will be damaged (even though the file is now "fine" as far as Scandisk is concerned), and half a .DLL is no bread - in fact it may be a one-way ticket to BSoDsville.

If an antivirus cleans an infected code file, then that file won't be an infectious risk anymore - but it may not work properly either. Typically if you do a binary comparison (e.g. using FC /B) with the original file, you will find it differs, and the difference may or may not matter. Viruses infect code in different ways, sometimes overwriting code, or simply posing difficulties to the repair process. Other malware may replace the contents of the file completely, much as eggs laid inside a cocoon may replace whatever was metamorphosing there.
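
If you'd rather script the comparison than read FC /B output, something like the following rough Python sketch would do. It is not a replacement for FC: it only reports where the first difference is, and the function name first_difference is made up here.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if the files match.

    If one file is a truncated copy of the other, the offset returned is the
    length of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the exact byte within this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap, so one file simply ended.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended together with no difference found.
                return None
            offset += len(a)
```

For example, first_difference("cleaned.dll", "original.dll") (file names assumed) gives None for identical files, and an offset you can go and look at otherwise.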

Tip: Do not allow anything (or anyone) to "fix" or "clean" files without logging these changes!

Here's a generic way to troubleshoot .DLL hell, using MFC42.DLL as the example:

  1. Start, Find, on all HD volumes; MFC42.DL?
  2. Right-click each, Properties
  3. For each, check size in bytes, time/date, Version tab, location
  4. Ballpoint all info from (3) somewhere
  5. Copy each to another directory (open another folder window, Copy/Paste)
  6. Name each copy from (5) differently, e.g. MFC42.DL1, .DL2, .DL3...
  7. Now rename each original .DLL in the Find results in the same way
  8. The one you aren't allowed to rename is the active one
  9. Rename away the active one from DOS mode
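
Steps (1) to (4) above can also be done by script rather than ballpoint. The following is a minimal Python sketch under some assumptions: the function name inventory is made up, it does not read the Version tab (that needs the Windows version-info API), and it records a SHA-256 digest of each file instead, which is an addition of mine and not part of the manual method.

```python
import fnmatch
import hashlib
import os
import time

def inventory(roots, pattern="MFC42.DL?"):
    """List every file under the given roots whose name matches the pattern,
    with its location, size in bytes, time/date and a content digest."""
    found = []
    for root in roots:
        for folder, _subdirs, files in os.walk(root):
            for name in files:
                # Compare in upper case so mfc42.dll and MFC42.DLL both match.
                if fnmatch.fnmatchcase(name.upper(), pattern.upper()):
                    path = os.path.join(folder, name)
                    info = os.stat(path)
                    with open(path, "rb") as f:
                        digest = hashlib.sha256(f.read()).hexdigest()
                    stamp = time.strftime("%Y-%m-%d %H:%M:%S",
                                          time.localtime(info.st_mtime))
                    found.append({"path": path, "size": info.st_size,
                                  "modified": stamp, "sha256": digest})
    return found
```

You'd call it with a list of every hard drive volume, e.g. inventory(["C:\\", "D:\\"]) (drive letters assumed), and print or save the result before renaming anything.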

Now that you have no copies of that .DLL active, you'd expect the affected applications to not work. Try it, and make a note of the error message you get - remember, error messages are your friends!

Consulting your ballpointed notes and looking at the copied files now conveniently gathered in one directory, you may see that some files within your MFC42.DL? series should be identical, because they have identical Version tab info. Any files truncated to zero bytes are obviously dead; check the others using FC /B to see if they are in fact equal. If the shorter file has a byte count that is a round multiple of the volume's cluster size, then you are probably looking at Scandisk "fix" truncation damage.
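
The same triage can be sketched in Python. Treat this as a rough aid under stated assumptions, not a verdict: the function names are made up, files are treated as identical when their SHA-256 digests match (not a literal FC /B compare), and the default cluster size of 4096 bytes is only a guess that you should replace with the real value for the volume.

```python
import hashlib
from collections import defaultdict

def group_identical(paths):
    """Group files whose SHA-256 digests match, i.e. whose contents are
    almost certainly the same."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            groups[hashlib.sha256(f.read()).hexdigest()].append(path)
    return list(groups.values())

def looks_truncated(size, reference_size, cluster_size=4096):
    """True if a file is shorter than its supposed twin and ends exactly on a
    cluster boundary, the pattern described above for Scandisk "fixes".
    A zero-byte file also counts. cluster_size is an assumption."""
    return size < reference_size and size % cluster_size == 0
```

For instance, looks_truncated(8192, 10000) flags a copy cut off after exactly two 4096-byte clusters, while looks_truncated(9000, 10000) does not; a file that is merely shorter may just be a different build.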

Now it's time to try each of the multitude, one at a time, to see what works. Repeat steps (1) and (9), renaming a different one of the .DL? sequence into active .DLL form, then retry the program. If you get the same error as you had when none were active, you may need to copy that active .DLL to the Windows or System directory.

You may also need to shut down and restart Windows between tests, ideally between renaming a file as active and testing the application that needs it. If the .DLL is already loaded into RAM, then the changes you make may be ignored for the rest of that Windows session.

If you need newer (or less broken) .DLLs, they may be available within the pre-installation self-extractor, CD-ROM, .zip file or .cab set, or (in the case of common development libraries) by download. If you download, choose a trustworthy source!

If your testing is ambiguous, e.g. one program works with one .DLL version, but other programs only work with another version, then you have the classic "DLL Hell" situation. This may work out OK if you place copies of the required .DLL in the same directory as each of the feuding programs. Recent versions of Windows address this problem in various ways, e.g. XP has the "Side by Side" feature that lets each application see the .DLLs it wants, without affecting everything else. This is nearly as good as the DOS days, when programs didn't drool their files into the operating system and stayed strictly in their own subtrees.


(C) Chris Quirke, all rights reserved - April 2003
