Compressing a DocFile

DocFiles are optimized for speed. This is great for obvious reasons, but the optimization can also cause a few minor problems. In the same way that data on a hard drive gets fragmented, so too does data in a DocFile. On a hard drive this is not a major problem, since a file can be split across many locations and does not need contiguous space. The same is true for a DocFile: the data in a stream can be spread all over the file and is not necessarily contiguous. The problem is that a lot of space can be wasted this way.

Consider a DocFile with 3 streams, "a", "b" and "c". Each of the 3 streams contains 5 MB of data. Now assume that you delete stream "b". The size of the DocFile does not drop from 15 MB (3 x 5 MB) to 10 MB (2 x 5 MB). Instead the file stays 15 MB in size, so 5 MB is being wasted!

When a stream or storage is deleted, the data it contains is not physically removed. The space is marked as unused, but it stays in the file. If you now add 2 MB of data to the file, it will not get any bigger: the 2 MB of data will be stored in the space previously allocated for stream "b".
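The bookkeeping described above can be sketched with a small console program. This is only a toy model and not how a DocFile is laid out internally: the 1 MB block size and the names TToyFile, AddStream, DeleteStream and Compact are all made up for illustration. It should build as a Delphi console application or with Free Pascal.

```pascal
program FreeSpaceModel;
{$APPTYPE CONSOLE}

{ Toy model only: a "file" is a row of 1 MB blocks, each owned by one
  stream or marked free. All names here are hypothetical. }

const
   MAX_BLOCKS = 64;
   FREE_BLOCK = '.';

type
   TToyFile = record
      Blocks : array[ 1 .. MAX_BLOCKS ] of char;
      Size : LongInt;   { number of blocks in the file, used or free }
   end;

{ Store a stream: reuse free blocks first, grow the file only if needed }
procedure AddStream(  var F : TToyFile;  Name : char;  SizeMB : LongInt  );
var
   i : LongInt;
begin
   i := 1;
   while(  (SizeMB > 0) and (i <= F.Size)  ) do
   begin
      if(  F.Blocks[i] = FREE_BLOCK  ) then
      begin
         F.Blocks[i] := Name;
         Dec(  SizeMB  );
      end;
      Inc(  i  );
   end;
   while(  (SizeMB > 0) and (F.Size < MAX_BLOCKS)  ) do
   begin
      Inc(  F.Size  );
      F.Blocks[F.Size] := Name;
      Dec(  SizeMB  );
   end;
end;

{ Delete a stream: its blocks are only marked free, the file keeps its size }
procedure DeleteStream(  var F : TToyFile;  Name : char  );
var
   i : LongInt;
begin
   for i := 1 to F.Size do
      if(  F.Blocks[i] = Name  ) then
         F.Blocks[i] := FREE_BLOCK;
end;

{ Compact: copy only the used blocks into a fresh file }
procedure Compact(  var F : TToyFile  );
var
   Fresh : TToyFile;
   i : LongInt;
begin
   Fresh.Size := 0;
   for i := 1 to F.Size do
      if(  F.Blocks[i] <> FREE_BLOCK  ) then
      begin
         Inc(  Fresh.Size  );
         Fresh.Blocks[Fresh.Size] := F.Blocks[i];
      end;
   F := Fresh;
end;

var
   F : TToyFile;
begin
   F.Size := 0;
   AddStream(  F,  'a',  5  );
   AddStream(  F,  'b',  5  );
   AddStream(  F,  'c',  5  );
   WriteLn(  'a, b, c stored:   ',  F.Size,  ' MB'  );
   DeleteStream(  F,  'b'  );
   WriteLn(  'b deleted:        ',  F.Size,  ' MB'  );
   AddStream(  F,  'd',  2  );
   WriteLn(  '2 MB added:       ',  F.Size,  ' MB'  );
   Compact(  F  );
   WriteLn(  'after compacting: ',  F.Size,  ' MB'  );
end.
```

The model reports 15 MB after the three streams are stored, still 15 MB after "b" is deleted and after the 2 MB stream is added, and 12 MB once the used blocks have been copied into a fresh file.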

Fortunately there is an easy way to compress (defragment) a DocFile. Compressing here means reclaiming the unused space, not applying data compression. Using IStorage.CopyTo you can write a simple function to compress any DocFile. Here is the algorithm that I use:

  1. Open the file to compress
  2. Get the file's CLSID
  3. Create a new temporary file
  4. Copy everything to the temporary file (IStorage.CopyTo)
  5. Close the original file (opened in step 1)
  6. Create a new DocFile with the same name as the original one (replacing it)
  7. Set the new file's CLSID so that it is the same as the original file's
  8. Copy everything from the temporary file to the file created in step 6
  9. Close both files
  10. Delete the temporary file

You might be tempted to replace steps 5 - 9 with

  1. Close both files
  2. Copy the temporary file over the original file (CopyFile)

However, I found that this does not work. The original algorithm reliably compresses any DocFile, so use it instead.


Here is the CompressDocFile function, as well as GetTempDirFile, which returns the name of a temporary file in the temporary directory. GetStorageCLSID is defined in the section titled "CLSIDs".



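   function GetTempDirFile(   sPre : string  ) : string;
   var
      nLen : DWORD;
      szFileName,  szPath : array[ 0 .. MAX_PATH ] of char;
   begin
      Result := '';
         {Get temp path, returns 0 on failure or the size needed if the
          buffer is too small}
      nLen := GetTempPath(  MAX_PATH,  szPath  );
      if(   (nLen = 0) or (nLen > MAX_PATH)   ) then
         Exit;
         {Get a temporary file name. With a unique value of 0 Windows also
          creates an empty file with that name. Only the first 3
          characters of sPre are used as the prefix}
      if(   GetTempFileName(  szPath,  PChar(sPre),  0,  szFileName  ) = 0   ) then
         Exit;
      Result := string(szFileName);
   end;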



   function TForm1.CompressDocFile(  sStorageFileName : WideString  ) : boolean;
   var
      Hr : HResult;
      CLSID : TCLSID;
      StatStg : TStatStg;
      sTmpFileName : WideString;
      Storage,  StorageTmp : IStorage;
   begin   
         {Try to open the file}
      Hr := StgOpenStorage(  PWideChar(sStorageFileName),
                             nil,
                             STGM_READWRITE or STGM_SHARE_EXCLUSIVE or
                             STGM_DIRECT,
                             nil,
                             0,
                             Storage
                           );
   
      if(   not SUCCEEDED(  Hr  )   ) then
      begin
         Result := false;
         Exit;
      end;

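         {Get the CLSID. STATFLAG_NONAME tells Stat not to allocate the
          name string, which the caller would otherwise have to free}
      Hr := Storage.Stat(  StatStg,  STATFLAG_NONAME  );

      if(   not SUCCEEDED(  Hr  )   ) then
      begin
         Result := false;
         Exit;
      end;

      CLSID := StatStg.clsid;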

         {Get a tmp file name in the temporary directory}
      sTmpFileName := GetTempDirFile(  'ole_'  );

         {Create the temporary file}
      Hr := StgCreateDocFile(  PWideChar(sTmpFileName),
                               STGM_CREATE or STGM_SHARE_EXCLUSIVE or
                               STGM_DIRECT or STGM_READWRITE,
                               0,
                               StorageTmp
                             );

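      if(   not SUCCEEDED(  Hr  )   ) then
      begin
            {GetTempDirFile has already created an empty file with this
             name, so remove it}
         DeleteFile(  sTmpFileName  );
         Result := false;
         Exit;
      end;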

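         {Copy everything to tmp file. If this fails, stop here, because
          the next step overwrites the original file}
      Hr := Storage.CopyTo(  0,  nil,  nil,  StorageTmp  );

      if(   not SUCCEEDED(  Hr  )   ) then
      begin
            {Close both files before deleting the tmp file}
         Storage := nil;
         StorageTmp := nil;
         DeleteFile(  sTmpFileName  );
         Result := false;
         Exit;
      end;

         {Close old file}
      Storage := nil;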

         {Create file, del old one in the process}
      Hr := StgCreateDocFile(  PWideChar(sStorageFileName),
                               STGM_CREATE or STGM_SHARE_EXCLUSIVE or
                               STGM_DIRECT or STGM_READWRITE,
                               0,
                               Storage
                             );

      if(   not SUCCEEDED(  Hr  )   ) then
      begin
            {Close the tmp file first, it can not be deleted while open}
         StorageTmp := nil;
         DeleteFile(  sTmpFileName  );
         Result := false;
         Exit;
      end;

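        {Set the CLSID}
      Hr := Storage.SetClass(  CLSID  );

        {Copy everything back from tmp file}
      if(   SUCCEEDED(  Hr  )   ) then
         Hr := StorageTmp.CopyTo(  0,  nil,  nil,  Storage  );

         {Close both files}
      Storage := nil;
      StorageTmp := nil;

      if(   not SUCCEEDED(  Hr  )   ) then
      begin
            {The original has already been replaced, so keep the tmp file.
             It holds the only complete copy of the data}
         Result := false;
         Exit;
      end;

         {Delete tmp file}
      DeleteFile(  sTmpFileName  );

      Result := true;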
   end;






All information on these www pages is copyright (©) 1997 Andre .v.d. Merwe and may not be copied or mirrored without my permission.