netlib compression

"What do these .gz suffixes mean?"

A few files in netlib are large, intrinsically binary objects and are compressed even in the master collection. This is straightforward and causes little confusion. However, most files in netlib are uncompressed in the master copy and compressed on the ftp and http server, to encourage socially responsible use of the "free" Internet resource.

The compression issues are somewhat subtle, so let me try another way of describing this. We want to support the mental model that the "real" collection consists of files like /netlib/eispack/rg.f and everything else is a transport mechanism that, ideally, is transparent to the end user but thrifty in its use of network resources.

The best approximation I can give today is to publish the "real" uncompressed filename, and give simple algorithmic rules for how to add prefixes (like http://netlib.bell-labs.com) and suffixes (like .gz) that transports should apply. The current rule is: add .gz unless the real filename has a suffix listed in /netlib/crc/net/compressed.
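The naming rule can be sketched in a few lines of Python. This is only an illustration, not the netlib server's code: the function name transport_name and the suffix set ALREADY_COMPRESSED are made up here, and the set is a guessed stand-in for whatever /netlib/crc/net/compressed actually lists.

```python
import posixpath

# Hypothetical stand-in for the suffix list in /netlib/crc/net/compressed;
# the real contents of that file are not reproduced here.
ALREADY_COMPRESSED = {".gz", ".Z", ".zip", ".tgz"}


def transport_name(real_path, prefix="http://netlib.bell-labs.com",
                   compressed_suffixes=ALREADY_COMPRESSED):
    """Return the name a transport would use for a "real" netlib file."""
    _, suffix = posixpath.splitext(real_path)
    # Rule from the text: add .gz unless the suffix is on the list.
    if suffix not in compressed_suffixes:
        real_path += ".gz"
    return prefix + real_path


print(transport_name("/netlib/eispack/rg.f"))
# prints http://netlib.bell-labs.com/netlib/eispack/rg.f.gz
```

Going the other way, a client that knows the same suffix list can strip the prefix and the added .gz to recover the "real" filename.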

File dates, seen for example in directory listings from the ftp server and in the /netlib/crc/ checksum files, are modification dates of the intrinsic file contents, and are not affected by compression.

Consistent application of this rule implies that small files are compressed when they don't really need to be. Because compressed files are such a burden on people using nonconforming Web browsers, we make a special exception for .html files and provide those in both compressed and uncompressed form. This requires some care in adjusting internal links in .html files, of course, but that is done inside the netlib server.

The ftp server at netlib.bell-labs.com has an additional feature of creating, on the fly, tar files of entire netlib subdirectories. This tar file does not get any additional compression, because the files inside are already compressed. (Microsoft later adopted the same approach, but a proprietary format, in their .cab "cabinet" files.)
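To make the "compress the members, not the archive" idea concrete, here is a minimal Python sketch. It is not how the netlib ftp server is implemented; the function name tar_of_gzipped and the in-memory dictionary of files are assumptions chosen to keep the example self-contained.

```python
import gzip
import io
import tarfile


def tar_of_gzipped(files):
    """Pack {name: bytes} into a plain tar whose members are .gz files."""
    buf = io.BytesIO()
    # Mode "w" writes an uncompressed tar: no outer gzip layer is added.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            payload = gzip.compress(data)          # each member compressed on its own
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


archive = tar_of_gzipped({"eispack/rg.f": b"      subroutine rg\n      end\n"})
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
    print(tar.getnames())
```

Because each member is an independent .gz file, a single file can be pulled out of the archive and decompressed without touching the rest.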

compression resources on the Web

gzip (netlib copy of source, and executables for Cray, DOS, DEC5000, HP, IBM RS/6000, Sun, SGI)
WinZip (but be sure to turn off the "smart" CR/LF conversion and change the viewer to wordpad.exe!)
StuffIt
UMich archive (has, among others, MacGzip)
MacArchive
how to add helper applications to Web browsers

Our http server properly specifies both the Content-Type and the Content-Encoding fields, but some older versions of browsers ignore the information and may even corrupt the file during download. Fortunately, the current Netscape and Internet Explorer on Windows seem to be working correctly. Unfortunately, even recent versions of Netscape on Unix systems have a bug causing them to strip off the .gz suffix (but not decompress) when saving files labelled as "Content-Encoding: gzip". You may not have seen this behavior with other web servers that use the older syntax "Content-Encoding: x-gzip", which Unix Netscape handles correctly. The relevant standard, IETF RFC 2616, says that servers should say "gzip" rather than "x-gzip" and that browsers should treat both the same.
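What a well-behaved client should do with these headers can be sketched as follows. This is an illustrative Python fragment, not code from any browser or from netlib; the function name decode_body and the plain dictionary of headers (keyed by the exact spelling "Content-Encoding") are assumptions.

```python
import gzip


def decode_body(headers, body):
    """Undo the transport compression named by Content-Encoding, if any."""
    encoding = headers.get("Content-Encoding", "").strip().lower()
    # RFC 2616 asks clients to treat "x-gzip" as equivalent to "gzip".
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # No encoding (or one this sketch does not handle): pass through unchanged.
    return body


payload = gzip.compress(b"      subroutine rg\n")
assert decode_body({"Content-Encoding": "x-gzip"}, payload) == \
       decode_body({"Content-Encoding": "gzip"}, payload)
```

The Unix Netscape bug described above amounts to taking the first branch for naming purposes (dropping .gz) while skipping the decompression itself.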

Broken browsers that violate the long-standing HTTP specification do you a disservice by needlessly slowing transfers by a factor of 2 to 4. If you get stuck with one of these buggy browsers and have no other way to get your files, try webget, which you can get in (uncompressed) email via

     mail netlib@research.bell-labs.com
     send access/webget.c

ehg