Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility.  The format includes a
   cyclic redundancy check value for detecting data corruption.  The
   format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods.  The format can be
   implemented readily in a manner not covered by patents.

Deutsch                      Informational                      [Page 1]

RFC 1952             GZIP File Format Specification             May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
         2.3.1. Member header and trailer
            2.3.1.1. Extra field
            2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes).  It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here.  The
      material in the appendices is not part of the specification per se
      and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.)  See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification.  In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do pre-
      and post-conditioning.  Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8
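
      As an illustration of the byte order described above, the
      following sketch reads 16- and 32-bit values stored
      least-significant byte first.  The helper names read_le16 and
      read_le32 are assumptions of this example; the specification does
      not define them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 16-bit value stored least-significant
   byte first, as all multi-byte numbers in the format are. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: read a 32-bit value stored least-significant
   byte first. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

      Applied to the two bytes 0x08, 0x02 from the diagram, read_le16
      yields 8 + 2 x 256 = 520.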

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets).  The format of each member is specified in the following
      section.  The members simply appear one after another in the file,
      with no additional information before, between, or after them.

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+
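
      The first row of the diagram above is always ten bytes long, so
      it can be decoded without looking at the flags.  A minimal
      sketch follows; the struct and function names are hypothetical,
      chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the ten fixed header bytes. */
struct gz_fixed_header {
    uint8_t  id1, id2;  /* identification bytes */
    uint8_t  cm;        /* compression method */
    uint8_t  flg;       /* flag bits */
    uint32_t mtime;     /* stored as 4 bytes, least-significant first */
    uint8_t  xfl;       /* extra flags */
    uint8_t  os;        /* operating system */
};

/* Decode the fixed part of a member header from p[0..len-1].
   Returns 0 on success, -1 if fewer than 10 bytes are available. */
static int gz_read_fixed_header(const unsigned char *p, size_t len,
                                struct gz_fixed_header *h)
{
    if (len < 10)
        return -1;
    h->id1 = p[0];
    h->id2 = p[1];
    h->cm  = p[2];
    h->flg = p[3];
    h->mtime = (uint32_t)p[4] | ((uint32_t)p[5] << 8)
             | ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    h->xfl = p[8];
    h->os  = p[9];
    return 0;
}
```

      The function only splits the bytes into fields; it performs no
      validation.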

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file.  CM
            = 0-7 are reserved.  CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved

            If FTEXT is set, the file is probably ASCII text.  This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present.  In case of doubt, FTEXT
            is cleared, indicating binary data.  For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data.  The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]

            If FEXTRA is set, optional extra fields are present, as
            described in a following section.

            If FNAME is set, an original file name is present,
            terminated by a zero byte.  The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set.  This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case.  There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present.  This comment is not interpreted; it is only
            intended for human consumption.  The comment must consist of
            ISO 8859-1 (LATIN-1) characters.  Line breaks should be
            denoted by a single line feed character (10 decimal).

            Reserved FLG bits must be zero.
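
      The FHCRC rule above (CRC16 is the two least significant bytes
      of the CRC32 of the preceding header bytes) can be sketched as
      follows.  The function names are hypothetical, and the bitwise
      CRC-32 loop is a compact stand-in for the table-driven sample
      code in the appendix, not a replacement for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the same polynomial constant (0xedb88320) and
   one's-complement pre- and post-conditioning as the appendix code. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t c = 0xffffffffu;
    size_t n;
    int k;

    for (n = 0; n < len; n++) {
        c ^= buf[n];
        for (k = 0; k < 8; k++)
            c = (c & 1) ? (0xedb88320u ^ (c >> 1)) : (c >> 1);
    }
    return c ^ 0xffffffffu;
}

/* Hypothetical helper: hdr[0..len-1] must hold every header byte up
   to, but not including, the CRC16 field itself. */
static uint16_t gz_header_crc16(const unsigned char *hdr, size_t len)
{
    return (uint16_t)(crc32_bitwise(hdr, len) & 0xffffu);
}
```

      Like every multi-byte number in the format, the resulting CRC16
      is stored least-significant byte first.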

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed.  The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan. 1, 1970.  (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.)  If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started.  MTIME = 0 means no time stamp is
            available.

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods.  The "deflate" method (CM = 8) sets these flags as
            follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place.  This may be useful in determining end-of-line
            convention for text files.  The currently defined values are
            as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field.  See below for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to the CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42.  (See http://www.iso.ch for
            ordering ISO documents.  See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.
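
      Because ISIZE is stored modulo 2^32, a decompressor that counts
      output bytes in a wider integer has to reduce its count before
      comparing.  The sketch below decodes the eight trailer bytes and
      performs that comparison; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of the member trailer. */
struct gz_trailer {
    uint32_t crc32;  /* CRC-32 of the uncompressed data */
    uint32_t isize;  /* uncompressed size modulo 2^32 */
};

static uint32_t gz_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the eight bytes that follow the compressed blocks. */
static struct gz_trailer gz_read_trailer(const unsigned char *p)
{
    struct gz_trailer t;
    t.crc32 = gz_le32(p);
    t.isize = gz_le32(p + 4);
    return t;
}

/* Compare a full-width output byte count against ISIZE, keeping
   only the low 32 bits of the count. */
static int gz_isize_matches(uint64_t total_out, uint32_t isize)
{
    return (uint32_t)(total_out & 0xffffffffu) == isize;
}
```

      One consequence is that ISIZE alone cannot distinguish an input
      of n bytes from one of n + 2^32 bytes.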

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.  It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.  Jean-Loup Gailly
         <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.  The following
         IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('P')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.
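
      The subfield layout above lends itself to a simple walk over the
      extra field.  The following sketch looks up one subfield by its
      ID; the function name and the decision to return NULL for a
      malformed field are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: search an extra field of xlen bytes for the
   subfield with ID (si1, si2).  On success, returns a pointer to the
   subfield data and stores its length in *out_len.  Returns NULL if
   the subfield is absent or a LEN value overruns the extra field. */
static const unsigned char *gz_find_subfield(const unsigned char *extra,
                                             size_t xlen,
                                             unsigned char si1,
                                             unsigned char si2,
                                             size_t *out_len)
{
    size_t pos = 0;

    while (xlen - pos >= 4) {
        /* LEN is two bytes, least-significant byte first. */
        size_t len = (size_t)extra[pos + 2]
                   | ((size_t)extra[pos + 3] << 8);

        if (len > xlen - pos - 4)
            return NULL;            /* subfield runs past XLEN */
        if (extra[pos] == si1 && extra[pos + 1] == si2) {
            *out_len = len;
            return extra + pos + 4;
        }
        pos += 4 + len;             /* skip SI1, SI2, LEN and the data */
    }
    return NULL;
}
```

      A decompressor that has no use for subfields can skip the whole
      extra field using XLEN alone, without walking it.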

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others).  The compressor must set all reserved
         bits to zero.
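
         The smallest header a compressor may emit under this rule can
         be sketched as follows.  The function name is hypothetical,
         and CM = 8 reflects the "deflate" method, the only one this
         specification defines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: fill out[0..9] with a fixed header that has
   correct ID1, ID2 and CM and default values everywhere else. */
static void gz_write_minimal_header(unsigned char out[10])
{
    memset(out, 0, 10);  /* FLG, MTIME and XFL all zero */
    out[0] = 0x1f;       /* ID1 */
    out[1] = 0x8b;       /* ID2 */
    out[2] = 8;          /* CM = deflate */
    out[9] = 255;        /* OS = unknown */
}
```

         With FLG = 0 no optional fields follow, so the compressed
         blocks begin immediately after these ten bytes.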

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present.  It need not examine any other part of the header or
         trailer; in particular, a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant.  A
         compliant decompressor must give an error indication if any
         reserved bit is non-zero, since such a bit could indicate the
         presence of a new field that would cause subsequent data to be
         interpreted incorrectly.
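
         The checks and skips required of a decompressor can be
         sketched as a single function that locates the start of the
         compressed blocks.  All names are hypothetical.  The sketch
         assumes the whole header is in memory, treats any CM other
         than 8 as an error, reports every failure as 0, and skips the
         CRC16 without verifying it.

```c
#include <assert.h>
#include <stddef.h>

enum {
    GZ_FHCRC = 0x02, GZ_FEXTRA = 0x04, GZ_FNAME = 0x08,
    GZ_FCOMMENT = 0x10, GZ_RESERVED = 0xe0  /* bits 5, 6 and 7 */
};

/* Advance past a zero-terminated string starting at pos.
   Returns the new offset, or 0 if no terminator is found. */
static size_t gz_skip_zstring(const unsigned char *p, size_t len,
                              size_t pos)
{
    while (pos < len && p[pos] != 0)
        pos++;
    return (pos < len) ? pos + 1 : 0;
}

/* Returns the offset of the first byte of compressed data in
   p[0..len-1], or 0 if the header is truncated or fails a check. */
static size_t gz_header_length(const unsigned char *p, size_t len)
{
    size_t pos = 10;
    unsigned flg;

    if (len < 10 || p[0] != 0x1f || p[1] != 0x8b || p[2] != 8)
        return 0;
    flg = p[3];
    if (flg & GZ_RESERVED)
        return 0;
    if (flg & GZ_FEXTRA) {
        size_t xlen;
        if (len - pos < 2)
            return 0;
        xlen = (size_t)p[pos] | ((size_t)p[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return 0;
        pos += xlen;
    }
    if (flg & GZ_FNAME) {
        pos = gz_skip_zstring(p, len, pos);
        if (pos == 0)
            return 0;
    }
    if (flg & GZ_FCOMMENT) {
        pos = gz_skip_zstring(p, len, pos);
        if (pos == 0)
            return 0;
    }
    if (flg & GZ_FHCRC) {
        if (len - pos < 2)
            return 0;
        pos += 2;
    }
    return pos;
}
```

         The optional fields are skipped in the order in which the
         member format diagram lists them: extra field, file name,
         comment, then CRC16.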

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII.  Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.

4. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data.  Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct.  Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data, such as by
   setting and checking the CRC-32 check value.

5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

6. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail: <ghost@aladdin.com>

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>

7. Appendix: Jean-Loup Gailly's gzip utility

   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>.  Since this
   implementation is a de facto standard, we mention some more of its
   features here.  Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself.  Since the file format
   includes a modification time, the gzip decompressor provides a
   command line switch that assigns the modification time from the file,
   rather than the local modification time of the compressed input, to
   the decompressed output.

8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check).  (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language.  Non-C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator.  When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).

      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;

        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
         the updated crc. The crc should be initialized to zero. Pre- and
         post-conditioning (one's complement) is performed within this
         function so it shouldn't be done by the caller. Usage example:

           unsigned long crc = 0L;

           while (read_buffer(buffer, length) != EOF) {
             crc = update_crc(crc, buffer, length);
           }
           if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }
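
   One way to sanity-check a CRC implementation is to run it on the
   nine ASCII bytes "123456789", for which the widely published CRC-32
   check value is 0xcbf43926.  The sketch below is a table-free variant
   written with fixed-width types; it is not the sample code above, and
   the name crc32_update is an assumption of this example.  It keeps
   the same calling convention: start from zero and feed the running
   value back in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table-free counterpart of update_crc: updates a
   running CRC-32 with buf[0..len-1].  Conditioning (one's
   complement) is applied on entry and on return, so the caller
   starts from 0 and passes the previous result back in. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf,
                             size_t len)
{
    uint32_t c = crc ^ 0xffffffffu;
    size_t n;
    int k;

    for (n = 0; n < len; n++) {
        c ^= buf[n];
        for (k = 0; k < 8; k++)
            c = (c & 1) ? (0xedb88320u ^ (c >> 1)) : (c >> 1);
    }
    return c ^ 0xffffffffu;
}
```

   Feeding the same bytes in one call or in several consecutive calls
   gives the same result, which is what allows the CRC to be computed
   while the data is streamed.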