On Data Management

This document will cover the following topics:

Volume FAT types

Windows 95 OEM SR2.x, Windows 98 and Windows ME support two file systems for hard drives based on the File Allocation Table format: FAT16 and FAT32. Both divide the data area of a hard disk volume into a number of clusters in which file data is stored, and (unless disk compression is used) each cluster can hold data from only one file.

The 32-bit cluster address size used in FAT32 allows more clusters per volume, whereas FAT16's 16-bit cluster addresses limit this to under 65536 clusters per volume. This means that on large disk volumes, the size of the cluster has to be increased with FAT16. As every file stored on the volume has to occupy a whole number of clusters (rounded up), larger clusters mean more wasted disk space.

Each type of FAT has its pros and cons:

FAT16 cluster size will double at certain roll-over points:

Up to 255M - 4k clusters
256M to 511M - 8k clusters
512M to 1023M - 16k clusters
1024M to 2047M - 32k clusters
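
To put rough numbers on the wasted space, here is a minimal Python sketch. It takes the roll-over points from the list above at face value; the function names and the sample file sizes are my own assumptions for illustration, not part of any Windows tool.

```python
# Roll-over points from the list above: (largest volume size in MB, cluster size in KB).
FAT16_ROLLOVER = [(255, 4), (511, 8), (1023, 16), (2047, 32)]

def fat16_cluster_kb(volume_mb):
    """Return the FAT16 cluster size in KB for a volume of the given size in MB."""
    for max_mb, cluster_kb in FAT16_ROLLOVER:
        if volume_mb <= max_mb:
            return cluster_kb
    raise ValueError("volume is larger than the FAT16 sizes listed")

def slack_bytes(file_sizes, cluster_kb):
    """Bytes wasted when every file is rounded up to a whole number of clusters."""
    cluster = cluster_kb * 1024
    # -(-size // cluster) is integer ceiling division
    allocated = sum(-(-size // cluster) * cluster for size in file_sizes)
    return allocated - sum(file_sizes)

# Hypothetical workload: 1000 small files of 1000 bytes each.
small_files = [1000] * 1000
for volume_mb in (200, 1023, 2047):
    cluster_kb = fat16_cluster_kb(volume_mb)
    print(volume_mb, "MB volume,", cluster_kb, "k clusters,",
          slack_bytes(small_files, cluster_kb), "bytes wasted")
```

The same set of small files wastes far more space as the volume crosses each roll-over point, which is the trade-off to keep in mind when choosing volume sizes.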

There's an example of how all this can be applied at the end of this page.

Partitioning

Space on a hard drive is divided into up to 4 partitions, each of which can belong to a different operating system. One of these will usually be marked as active by setting the most significant bit (80h) of the boot indicator byte in its partition table entry; the operating system on this partition is the one that will boot if the hard drive is the boot device. DOS and Windows 9x support two types of partition: primary, from which the operating system can boot, and extended, which can be further subdivided into multiple logical volumes.
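
For readers who want to see what the partition table looks like on disk, here is a minimal Python sketch that decodes the four entries from a 512-byte master boot record. It assumes the standard layout (table at offset 446, 16 bytes per entry, boot indicator in the first byte of each entry, type in the fifth, 55h AAh signature at the end); the function name and the returned dictionary keys are my own and purely illustrative.

```python
import struct

PARTITION_TABLE_OFFSET = 446   # the table starts at byte 446 of the sector
ENTRY_SIZE = 16                # four entries of 16 bytes each
ACTIVE_FLAG = 0x80             # boot indicator value for the active partition

def parse_partition_table(sector):
    """Decode the used partition entries from a 512-byte master boot record."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid master boot record")
    entries = []
    for slot in range(4):
        start = PARTITION_TABLE_OFFSET + slot * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # boot indicator (1), CHS start (3), type (1), CHS end (3),
        # first sector (4), sector count (4), all little-endian
        boot, _chs_start, ptype, _chs_end, lba_start, sectors = struct.unpack(
            "<B3sB3sII", raw)
        if ptype == 0:
            continue  # type 0 marks an unused slot
        entries.append({
            "slot": slot,
            "active": boot == ACTIVE_FLAG,
            "type": ptype,          # e.g. 06h FAT16, 0Bh FAT32, 05h extended
            "lba_start": lba_start,
            "sectors": sectors,
        })
    return entries
```

This only reads the four primary slots; logical volumes inside an extended partition are chained through further boot records, which this sketch does not follow.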

The FDisk utility is used to create and manage partitions and logical volumes, which then have to be formatted before use. Unlike some 3rd-party utilities such as Partition Magic, FDisk cannot change the size of a volume while preserving the contents thereof. In short, you should consider all FDisk operations other than "2: Set active partition", "4: View partition information" and "5: Select hard drive" as destructive.

When you start FDisk in versions of Windows 9x that support FAT32, and at least one hard drive is large enough for FAT32 to be deemed relevant, you will be asked whether you want "Support for large hard drives". If you say Yes, FAT32 will be used for "large" volumes, else FAT16 will be used. To create a mixture of FAT16 and FAT32 volumes on the same hard drive, you need to exit and restart FDisk so that you can give a different response to this question in order to "change gears".

Drive letter allocation

When the OS boots, IO.SYS allocates drive letters from C: upwards to all detected primary partitions, then to all logical volumes on all detected extended partitions. If a hard drive is found by the Windows 9x GUI that was hidden from IO.SYS (e.g. was defined as "none" in CMOS), then drive letters are assigned to its volumes after those detected by IO.SYS, even if these include a primary partition.

This is highly relevant when adding a hard drive to an existing installation that uses logical volumes on an extended partition. If the hard drive that is added has a primary partition on it, and this has been detected by IO.SYS, then the logical volumes on the existing hard drive will be shifted up by one letter. This will break the path information for all shortcuts, registry settings, batch files, ini files, private application settings, links within data files etc. that refer to files on these volumes.

To avoid this, you should either set the added drive up as an extended partition with logicals (and no primary partition), or leave the drive undefined in CMOS. I would advise the former approach, as this allows the drive to be accessible from DOS mode and isn't vulnerable to failure should CMOS settings be changed or default to "Auto detect" for some reason (such as a power glitch or CMOS-corrupting crash).
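
The letter-shifting effect is easy to see in a small simulation. The Python sketch below is a simplified model of the rule described above (primaries on all drives first, then logical volumes); it ignores drives hidden from IO.SYS, and the data structure and volume labels are hypothetical.

```python
import string

def assign_letters(drives):
    """Assign letters from C: upwards: all primaries first, then all logicals.

    `drives` is a list of dicts in detection order, each with an optional
    "primary" label and an optional list of "logicals" labels.
    """
    letters = iter(string.ascii_uppercase[2:])  # A and B are left for diskettes
    assignment = {}
    for drive in drives:
        if drive.get("primary"):
            assignment[drive["primary"]] = next(letters)
    for drive in drives:
        for volume in drive.get("logicals", []):
            assignment[volume] = next(letters)
    return assignment

existing = {"primary": "hd0-core", "logicals": ["hd0-data", "hd0-extras"]}

# Added drive with a primary partition: hd0-data moves from D: to E:.
print(assign_letters([existing, {"primary": "hd1-primary", "logicals": []}]))

# Added drive with logical volumes only: the existing letters stay put.
print(assign_letters([existing, {"logicals": ["hd1-logical"]}]))
```

In the second case the new volume simply takes the next free letter after the existing logical volumes, which is why I prefer the extended-only layout for added drives.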

You may also need to consider the CD-ROM drive. By default, this will take the first letter after all hard drive volumes have been dealt with, but other devices added to the system (e.g. Zip drives) may "push in" first. For this reason, there is a case to be made for fixing the CD-ROM drive letter higher than the next available, so that should an additional hard drive or Zip drive be added, the CD-ROM stays on the same letter and pointers to CD files will still operate correctly. This is done in Windows 9x GUI mode via Device Manager, and in DOS or DOS mode via an /L: parameter added to the MSCDEX line. Note that should Windows Plug-n-Play redetect the CD-ROM drive for some reason, the Device Manager setting will be lost.

Finally, if you have reserved a drive letter or two via the above methods, you should take care that network drive mappings do not fall into this reserved area.

LAN file sharing

The "normal" approach is to FDisk the hard drive as one big happy C: and share the whole thing, with perhaps some subtrees shared redundantly so that drive letters can be mapped to these. This is a bad idea for several reasons, and is one of the main reasons why network administrators write off peer-to-peer LANs as "unmanageable".

Some factors that may be applied to rational sharing:

You can relax the "don't share subtrees" recommendation where the user will not have to use paths within the subtree, e.g. a single "inbox" directory, or a server-based archive repository that is not accessed from the system by users.

User data types

Files fall into four categories:

Generally, only the last two of these need to be shared. User data is of these types:

Backup strategies for these three can differ markedly. Typical examples of the last (and most problematic) category include user preference settings, e-mail and fax mailboxes, address books and browser bookmarks, and incoming attached files.

A good strategy is to locate all "normal" user data within a single subtree that can be easily shared and backed up, and all "large" user data somewhere else. The "large" data may be either backed up separately, or redundancy-mirrored to "cold" space on another system on the LAN via a "run this at night" batch file. Where data of the third category is concerned, you would either make provision to extract just the relevant files from the application directory, or notify users that this is not backed up.
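
As an illustration of the "run this at night" mirroring idea, here is a minimal Python sketch standing in for such a batch file. It is an assumption on my part that a scripting language is available on the system; the function name is hypothetical, and the two-second tolerance allows for the coarse time stamps that FAT volumes keep.

```python
import os
import shutil

# FAT time stamps have a 2-second resolution, so ignore smaller differences.
TOLERANCE_SECONDS = 2.0

def mirror_tree(source, target):
    """Copy files that are missing or older in target; return the paths copied."""
    copied = []
    for folder, _subdirs, files in os.walk(source):
        rel = os.path.relpath(folder, source)
        dest_folder = os.path.join(target, rel)
        os.makedirs(dest_folder, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(dest_folder, name)
            stale = (not os.path.exists(dst) or
                     os.path.getmtime(src) - os.path.getmtime(dst) > TOLERANCE_SECONDS)
            if stale:
                shutil.copy2(src, dst)  # copy2 keeps the original time stamp
                copied.append(os.path.normpath(os.path.join(rel, name)))
    return copied
```

Note that this sketch never deletes anything from the mirror, so files removed from the source will linger in the "cold" copy until cleared by hand. That errs on the side of keeping data, at the cost of disk space.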

Traditional practice often clashes with this approach. Users weaned on old systems are often accustomed to working from diskette, while others will be used to saving data into the same directory as the application that created it. Finally, a depressing number of users have no concept of file locations at all, and just save wherever the dialog defaults to (then complain they can't find anything).

Each of these is a problem for different reasons, and requires re-education:

All of these issues will snap into focus when you attempt to put a LAN backup strategy into place, as will be discussed a few pages ahead.

Backup issues

There are several different backup strategies:

Full system backups

Windows 9x is considerably more difficult for full system backup and restore than previous versions of DOS and Windows, for these reasons:

For these reasons, I typically prefer to abandon attempts at backing up the whole system as a unit. In fact, I prefer to exclude all program code from backups, as this avoids version clashes and the re-introduction of dormant code viruses on restore.

Incremental backups

Incremental backups require everything since the last full backup to be restored: either the full backup followed by each successive increment, or the full backup followed by the last increment (depending on the scheme used). Not only is this more tedious, but files that were deleted between increments will persist from the full backup, unless the backup software makes provision for this.

Finally, incremental backups are exposed to time/date failures such as Y2k or CMOS loss that causes the system date to reset to 1980, or attribute failures where reliance is placed on the validity of the archive file attribute.
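
The way deleted files come back from the dead during an incremental restore can be sketched in a few lines of Python. This is a toy model in which each backup is just a dictionary of path to contents; the function names and the idea of a per-increment deletion list are assumptions for illustration, not a description of any particular backup product.

```python
def restore(full_backup, increments):
    """Replay a full backup, then each increment in order, oldest first."""
    restored = dict(full_backup)
    for changed in increments:
        restored.update(changed)  # an increment only holds changed files
    return restored

def restore_with_deletions(full_backup, increments):
    """Same replay, but each increment is (changed_files, deleted_paths)."""
    restored = dict(full_backup)
    for changed, deleted in increments:
        restored.update(changed)
        for path in deleted:
            restored.pop(path, None)  # drop files the user had deleted
    return restored

full = {"report.doc": "v1", "old.doc": "v1"}

# The user deletes old.doc and edits report.doc before the next increment.
print(restore(full, [{"report.doc": "v2"}]))
print(restore_with_deletions(full, [({"report.doc": "v2"}, ["old.doc"])]))
```

The first restore brings old.doc back because nothing in the increment records that it was deleted; the second only avoids this because the backup scheme kept a deletion list.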

Risk levels and backup layering

Typically you will have a trade-off between these factors:

Your risks and levels of protection are:

Ease of use levels are:

Somewhere in there falls the automatically scheduled full data LAN backup, which impacts LAN bandwidth and requires systems to be running but with data files closed. If the total volume of backed up data fails to fit on a single unit of off-system media, this can no longer be fully automated. You then either have to wait around to change the media, or perform the backup as a two-stage process; unattended backup to hard drive space, followed by attended transfer of backup data to media.

Finally, there comes the policy issue of who should be responsible for the backup of data; the administrator, who performs the LAN backup, or the user. I suggest formally quantified administrator responsibility, with users encouraged to do their own backups as a redundant fall-back and to promote better data awareness. However, this policy may be inappropriate where there is concern about data leaving the confines of the organization.

Perils of restore

The backup process should be made as easy as possible, so that users back up often and there is a minimum amount of work lost between last backup and time of disaster. But the restore process is another matter entirely, as it is fraught with risks:

These risks can be contained in various ways:

The last point has three benefits:

You also need to verify that backups work, as several things can go wrong:

Finally, off-system backups create a problem of a different kind: data security. Depending on the nature of your data, you may want to control the ease with which volumes of data can be copied and taken off site.

Policy issues

In a less formal office, there's usually a user who others look to for assistance, and who acts as the point of contact with out-sourced support personnel and management. In a more formal office, a technically skilled administrator may be added to the staff to fulfil this role. What you want is the social compatibility of the former and the technical competence of the latter.

The responsibilities of this role need to be defined for both the administrator and the user pool, otherwise the users will leave everything to the administrator who becomes the scapegoat when any sort of technical failure ensues.

There should also be clarity on what matters should be brought to the administrator's attention, and I would suggest this as a basic list:

You can augment this policy by looking for signs of unreported problems:

Note that automated file system maintenance can remove some of these aids; for example, clearing files and auto-repairing file system errors. This is why I'd generally avoid clearing old ScanDisk .chk files from the root directory; the time/date stamps on these files point to when they were recovered by ScanDisk. If there are enough of these to fill the root directory and cause errors, you should get to hear about it.
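
If you want to check a volume for these tell-tale files without browsing it by hand, something like the following Python sketch would do. It assumes the usual ScanDisk naming of recovered fragments (FILE0000.CHK, FILE0001.CHK and so on); the function name is hypothetical.

```python
import os
import re

# Assumed ScanDisk naming for recovered fragments: FILE0000.CHK, FILE0001.CHK, ...
CHK_PATTERN = re.compile(r"^FILE\d{4}\.CHK$", re.IGNORECASE)

def recovered_fragments(root):
    """Return (name, modified time) pairs for .chk files in root, oldest first."""
    found = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isfile(path) and CHK_PATTERN.match(name):
            found.append((name, os.path.getmtime(path)))
    return sorted(found, key=lambda item: item[1])
```

Called on the root directory of a volume, this gives a rough timeline of when ScanDisk last had to recover lost clusters there.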

How to implement a LAN backup policy:

I would avoid simply assuring users that the administrator will look after data backup and virus checking chores on their behalf. The key to data security is redundancy, and the user's own backups may be the only game in town in the event of failure in the LAN backup system. By providing some tasks for the user to perform, you can focus on the importance of the task rather than who to blame in the event of failure, and regular performance of such tasks maintains user awareness of data issues.

Hard drive setup

Having been through all that, we can now apply these concepts to how hard drives may be set up for various purposes. Here is an example…

C: Core FAT32 2047M, 4k clusters

Windows and Windows application code, swap file and temporary directories. Fast, supporting 4k-orientated optimizations, minimum of contents to defrag, volume is not shared over LAN.

D: Data FAT16 1023M, 16k clusters

Internet and other applications, DOS utilities, other software, and "normal" user data. User data kept away from perils of constant Windows temp/swap writes, and file system is fast, robust, and quick to defrag. May be shared over LAN.

E: Extras FAT32 Large, 4k clusters

Using all but 1G of remaining disk space, this is where games and "large" user data or workspace is held. May be shared over LAN.

F: Factory FAT16 1023M, 16k clusters

Cold storage for pre-installation archives, Windows .cab files, device drivers, downloaded archives and local hard drive data backups. These are kept out of the other volumes to prevent the "sandbar effect" and speed up defrag, and are located at the far end of the drive where head travel distance causes slowest access. Typically shared over LAN on read-only basis.

Finally, as every hacking script-kiddie and trojan writer knows about C:\Windows and "C:\My Documents", there's a strong case to be made for using non-default paths for these entities - noting that not all malware or hacks will rely on these assumptions.

This also allows you to spot badly-programmed applications ("assumption is the mother of all screw-ups") that create and use these paths; you may want to root these out before their other assumptions (e.g. that their system .dlls are the latest possible versions) create problems on the system.

(C) Chris Quirke, all rights reserved
