[1050] | 1 |
|
---|
| 2 |
|
---|
| 3 |
|
---|
| 4 |
|
---|
| 5 |
|
---|
| 6 |
|
---|
| 7 | Network Working Group P. Deutsch
|
---|
| 8 | Request for Comments: 1950 Aladdin Enterprises
|
---|
| 9 | Category: Informational J-L. Gailly
|
---|
| 10 | Info-ZIP
|
---|
| 11 | May 1996
|
---|
| 12 |
|
---|
| 13 |
|
---|
| 14 | ZLIB Compressed Data Format Specification version 3.3
|
---|
| 15 |
|
---|
| 16 | Status of This Memo
|
---|
| 17 |
|
---|
| 18 | This memo provides information for the Internet community. This memo
|
---|
| 19 | does not specify an Internet standard of any kind. Distribution of
|
---|
| 20 | this memo is unlimited.
|
---|
| 21 |
|
---|
| 22 | IESG Note:
|
---|
| 23 |
|
---|
| 24 | The IESG takes no position on the validity of any Intellectual
|
---|
| 25 | Property Rights statements contained in this document.
|
---|
| 26 |
|
---|
| 27 | Notices
|
---|
| 28 |
|
---|
| 29 | Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly
|
---|
| 30 |
|
---|
| 31 | Permission is granted to copy and distribute this document for any
|
---|
| 32 | purpose and without charge, including translations into other
|
---|
| 33 | languages and incorporation into compilations, provided that the
|
---|
| 34 | copyright notice and this notice are preserved, and that any
|
---|
| 35 | substantive changes or deletions from the original are clearly
|
---|
| 36 | marked.
|
---|
| 37 |
|
---|
| 38 | A pointer to the latest version of this and related documentation in
|
---|
| 39 | HTML format can be found at the URL
|
---|
| 40 | <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
|
---|
| 41 |
|
---|
| 42 | Abstract
|
---|
| 43 |
|
---|
| 44 | This specification defines a lossless compressed data format. The
|
---|
| 45 | data can be produced or consumed, even for an arbitrarily long
|
---|
| 46 | sequentially presented input data stream, using only an a priori
|
---|
| 47 | bounded amount of intermediate storage. The format presently uses
|
---|
| 48 | the DEFLATE compression method but can be easily extended to use
|
---|
| 49 | other compression methods. It can be implemented readily in a manner
|
---|
| 50 | not covered by patents. This specification also defines the ADLER-32
|
---|
| 51 | checksum (an extension and improvement of the Fletcher checksum),
|
---|
| 52 | used for detection of data corruption, and provides an algorithm for
|
---|
| 53 | computing it.
|
---|
| 54 |
|
---|
| 55 |
|
---|
| 56 |
|
---|
| 57 |
|
---|
| 58 | Deutsch & Gailly Informational [Page 1]
|
---|
| 59 | |
---|
| 60 |
|
---|
| 61 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 62 |
|
---|
| 63 |
|
---|
| 64 | Table of Contents
|
---|
| 65 |
|
---|
| 66 | 1. Introduction ................................................... 2
|
---|
| 67 | 1.1. Purpose ................................................... 2
|
---|
| 68 | 1.2. Intended audience ......................................... 3
|
---|
| 69 | 1.3. Scope ..................................................... 3
|
---|
| 70 | 1.4. Compliance ................................................ 3
|
---|
| 71 | 1.5. Definitions of terms and conventions used ................ 3
|
---|
| 72 | 1.6. Changes from previous versions ............................ 3
|
---|
| 73 | 2. Detailed specification ......................................... 3
|
---|
| 74 | 2.1. Overall conventions ....................................... 3
|
---|
| 75 | 2.2. Data format ............................................... 4
|
---|
| 76 | 2.3. Compliance ................................................ 7
|
---|
| 77 | 3. References ..................................................... 7
|
---|
| 78 | 4. Source code .................................................... 8
|
---|
| 79 | 5. Security Considerations ........................................ 8
|
---|
| 80 | 6. Acknowledgements ............................................... 8
|
---|
| 81 | 7. Authors' Addresses ............................................. 8
|
---|
| 82 | 8. Appendix: Rationale ............................................ 9
|
---|
| 83 | 9. Appendix: Sample code ..........................................10
|
---|
| 84 |
|
---|
| 85 | 1. Introduction
|
---|
| 86 |
|
---|
| 87 | 1.1. Purpose
|
---|
| 88 |
|
---|
| 89 | The purpose of this specification is to define a lossless
|
---|
| 90 | compressed data format that:
|
---|
| 91 |
|
---|
| 92 | * Is independent of CPU type, operating system, file system,
|
---|
| 93 | and character set, and hence can be used for interchange;
|
---|
| 94 |
|
---|
| 95 | * Can be produced or consumed, even for an arbitrarily long
|
---|
| 96 | sequentially presented input data stream, using only an a
|
---|
| 97 | priori bounded amount of intermediate storage, and hence can
|
---|
| 98 | be used in data communications or similar structures such as
|
---|
| 99 | Unix filters;
|
---|
| 100 |
|
---|
| 101 | * Can use a number of different compression methods;
|
---|
| 102 |
|
---|
| 103 | * Can be implemented readily in a manner not covered by
|
---|
| 104 | patents, and hence can be practiced freely.
|
---|
| 105 |
|
---|
| 106 | The data format defined by this specification does not attempt to
|
---|
| 107 | allow random access to compressed data.
|
---|
| 108 |
|
---|
| 109 |
|
---|
| 110 |
|
---|
| 111 |
|
---|
| 112 |
|
---|
| 113 |
|
---|
| 114 |
|
---|
| 115 | Deutsch & Gailly Informational [Page 2]
|
---|
| 116 | |
---|
| 117 |
|
---|
| 118 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 119 |
|
---|
| 120 |
|
---|
| 121 | 1.2. Intended audience
|
---|
| 122 |
|
---|
| 123 | This specification is intended for use by implementors of software
|
---|
| 124 | to compress data into zlib format and/or decompress data from zlib
|
---|
| 125 | format.
|
---|
| 126 |
|
---|
| 127 | The text of the specification assumes a basic background in
|
---|
| 128 | programming at the level of bits and other primitive data
|
---|
| 129 | representations.
|
---|
| 130 |
|
---|
| 131 | 1.3. Scope
|
---|
| 132 |
|
---|
| 133 | The specification specifies a compressed data format that can be
|
---|
| 134 | used for in-memory compression of a sequence of arbitrary bytes.
|
---|
| 135 |
|
---|
| 136 | 1.4. Compliance
|
---|
| 137 |
|
---|
| 138 | Unless otherwise indicated below, a compliant decompressor must be
|
---|
| 139 | able to accept and decompress any data set that conforms to all
|
---|
| 140 | the specifications presented here; a compliant compressor must
|
---|
| 141 | produce data sets that conform to all the specifications presented
|
---|
| 142 | here.
|
---|
| 143 |
|
---|
| 144 | 1.5. Definitions of terms and conventions used
|
---|
| 145 |
|
---|
| 146 | byte: 8 bits stored or transmitted as a unit (same as an octet).
|
---|
| 147 | (For this specification, a byte is exactly 8 bits, even on
|
---|
| 148 | machines which store a character on a number of bits different
|
---|
| 149 | from 8.) See below, for the numbering of bits within a byte.
|
---|
| 150 |
|
---|
| 151 | 1.6. Changes from previous versions
|
---|
| 152 |
|
---|
| 153 | Version 3.1 was the first public release of this specification.
|
---|
| 154 | In version 3.2, some terminology was changed and the Adler-32
|
---|
| 155 | sample code was rewritten for clarity. In version 3.3, the
|
---|
| 156 | support for a preset dictionary was introduced, and the
|
---|
| 157 | specification was converted to RFC style.
|
---|
| 158 |
|
---|
| 159 | 2. Detailed specification
|
---|
| 160 |
|
---|
| 161 | 2.1. Overall conventions
|
---|
| 162 |
|
---|
| 163 | In the diagrams below, a box like this:
|
---|
| 164 |
|
---|
| 165 | +---+
|
---|
| 166 | | | <-- the vertical bars might be missing
|
---|
| 167 | +---+
|
---|
| 168 |
|
---|
| 169 |
|
---|
| 170 |
|
---|
| 171 |
|
---|
| 172 | Deutsch & Gailly Informational [Page 3]
|
---|
| 173 | |
---|
| 174 |
|
---|
| 175 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 176 |
|
---|
| 177 |
|
---|
| 178 | represents one byte; a box like this:
|
---|
| 179 |
|
---|
| 180 | +==============+
|
---|
| 181 | | |
|
---|
| 182 | +==============+
|
---|
| 183 |
|
---|
| 184 | represents a variable number of bytes.
|
---|
| 185 |
|
---|
| 186 | Bytes stored within a computer do not have a "bit order", since
|
---|
| 187 | they are always treated as a unit. However, a byte considered as
|
---|
| 188 | an integer between 0 and 255 does have a most- and least-
|
---|
| 189 | significant bit, and since we write numbers with the most-
|
---|
| 190 | significant digit on the left, we also write bytes with the most-
|
---|
| 191 | significant bit on the left. In the diagrams below, we number the
|
---|
| 192 | bits of a byte so that bit 0 is the least-significant bit, i.e.,
|
---|
| 193 | the bits are numbered:
|
---|
| 194 |
|
---|
| 195 | +--------+
|
---|
| 196 | |76543210|
|
---|
| 197 | +--------+
|
---|
| 198 |
|
---|
| 199 | Within a computer, a number may occupy multiple bytes. All
|
---|
| 200 | multi-byte numbers in the format described here are stored with
|
---|
| 201 | the MOST-significant byte first (at the lower memory address).
|
---|
| 202 | For example, the decimal number 520 is stored as:
|
---|
| 203 |
|
---|
| 204 | 0 1
|
---|
| 205 | +--------+--------+
|
---|
| 206 | |00000010|00001000|
|
---|
| 207 | +--------+--------+
|
---|
| 208 | ^ ^
|
---|
| 209 | | |
|
---|
| 210 | | + less significant byte = 8
|
---|
| 211 | + more significant byte = 2 x 256
|
---|
| 212 |
|
---|
| 213 | 2.2. Data format
|
---|
| 214 |
|
---|
| 215 | A zlib stream has the following structure:
|
---|
| 216 |
|
---|
| 217 | 0 1
|
---|
| 218 | +---+---+
|
---|
| 219 | |CMF|FLG| (more-->)
|
---|
| 220 | +---+---+
|
---|
| 221 |
|
---|
| 222 |
|
---|
| 223 |
|
---|
| 224 |
|
---|
| 225 |
|
---|
| 226 |
|
---|
| 227 |
|
---|
| 228 |
|
---|
| 229 | Deutsch & Gailly Informational [Page 4]
|
---|
| 230 | |
---|
| 231 |
|
---|
| 232 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 233 |
|
---|
| 234 |
|
---|
| 235 | (if FLG.FDICT set)
|
---|
| 236 |
|
---|
| 237 | 0 1 2 3
|
---|
| 238 | +---+---+---+---+
|
---|
| 239 | | DICTID | (more-->)
|
---|
| 240 | +---+---+---+---+
|
---|
| 241 |
|
---|
| 242 | +=====================+---+---+---+---+
|
---|
| 243 | |...compressed data...| ADLER32 |
|
---|
| 244 | +=====================+---+---+---+---+
|
---|
| 245 |
|
---|
| 246 | Any data which may appear after ADLER32 are not part of the zlib
|
---|
| 247 | stream.
|
---|
| 248 |
|
---|
| 249 | CMF (Compression Method and flags)
|
---|
| 250 | This byte is divided into a 4-bit compression method and a 4-
|
---|
| 251 | bit information field depending on the compression method.
|
---|
| 252 |
|
---|
| 253 | bits 0 to 3 CM Compression method
|
---|
| 254 | bits 4 to 7 CINFO Compression info
|
---|
| 255 |
|
---|
| 256 | CM (Compression method)
|
---|
| 257 | This identifies the compression method used in the file. CM = 8
|
---|
| 258 | denotes the "deflate" compression method with a window size up
|
---|
| 259 | to 32K. This is the method used by gzip and PNG (see
|
---|
| 260 | references [1] and [2] in Chapter 3, below, for the reference
|
---|
| 261 | documents). CM = 15 is reserved. It might be used in a future
|
---|
| 262 | version of this specification to indicate the presence of an
|
---|
| 263 | extra field before the compressed data.
|
---|
| 264 |
|
---|
| 265 | CINFO (Compression info)
|
---|
| 266 | For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
|
---|
| 267 | size, minus eight (CINFO=7 indicates a 32K window size). Values
|
---|
| 268 | of CINFO above 7 are not allowed in this version of the
|
---|
| 269 | specification. CINFO is not defined in this specification for
|
---|
| 270 | CM not equal to 8.
|
---|
| 271 |
|
---|
| 272 | FLG (FLaGs)
|
---|
| 273 | This flag byte is divided as follows:
|
---|
| 274 |
|
---|
| 275 | bits 0 to 4 FCHECK (check bits for CMF and FLG)
|
---|
| 276 | bit 5 FDICT (preset dictionary)
|
---|
| 277 | bits 6 to 7 FLEVEL (compression level)
|
---|
| 278 |
|
---|
| 279 | The FCHECK value must be such that CMF and FLG, when viewed as
|
---|
| 280 | a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
|
---|
| 281 | is a multiple of 31.
|
---|
| 282 |
|
---|
| 283 |
|
---|
| 284 |
|
---|
| 285 |
|
---|
| 286 | Deutsch & Gailly Informational [Page 5]
|
---|
| 287 | |
---|
| 288 |
|
---|
| 289 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 290 |
|
---|
| 291 |
|
---|
| 292 | FDICT (Preset dictionary)
|
---|
| 293 | If FDICT is set, a DICT dictionary identifier is present
|
---|
| 294 | immediately after the FLG byte. The dictionary is a sequence of
|
---|
| 295 | bytes which are initially fed to the compressor without
|
---|
| 296 | producing any compressed output. DICT is the Adler-32 checksum
|
---|
| 297 | of this sequence of bytes (see the definition of ADLER32
|
---|
| 298 | below). The decompressor can use this identifier to determine
|
---|
| 299 | which dictionary has been used by the compressor.
|
---|
| 300 |
|
---|
| 301 | FLEVEL (Compression level)
|
---|
| 302 | These flags are available for use by specific compression
|
---|
| 303 | methods. The "deflate" method (CM = 8) sets these flags as
|
---|
| 304 | follows:
|
---|
| 305 |
|
---|
| 306 | 0 - compressor used fastest algorithm
|
---|
| 307 | 1 - compressor used fast algorithm
|
---|
| 308 | 2 - compressor used default algorithm
|
---|
| 309 | 3 - compressor used maximum compression, slowest algorithm
|
---|
| 310 |
|
---|
| 311 | The information in FLEVEL is not needed for decompression; it
|
---|
| 312 | is there to indicate if recompression might be worthwhile.
|
---|
| 313 |
|
---|
| 314 | compressed data
|
---|
| 315 | For compression method 8, the compressed data is stored in the
|
---|
| 316 | deflate compressed data format as described in the document
|
---|
| 317 | "DEFLATE Compressed Data Format Specification" by L. Peter
|
---|
| 318 | Deutsch. (See reference [3] in Chapter 3, below)
|
---|
| 319 |
|
---|
| 320 | Other compressed data formats are not specified in this version
|
---|
| 321 | of the zlib specification.
|
---|
| 322 |
|
---|
| 323 | ADLER32 (Adler-32 checksum)
|
---|
| 324 | This contains a checksum value of the uncompressed data
|
---|
| 325 | (excluding any dictionary data) computed according to Adler-32
|
---|
| 326 | algorithm. This algorithm is a 32-bit extension and improvement
|
---|
| 327 | of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
|
---|
| 328 | standard. See references [4] and [5] in Chapter 3, below)
|
---|
| 329 |
|
---|
| 330 | Adler-32 is composed of two sums accumulated per byte: s1 is
|
---|
| 331 | the sum of all bytes, s2 is the sum of all s1 values. Both sums
|
---|
| 332 | are done modulo 65521. s1 is initialized to 1, s2 to zero. The
|
---|
| 333 | Adler-32 checksum is stored as s2*65536 + s1 in most-
|
---|
| 334 | significant-byte first (network) order.
|
---|
| 335 |
|
---|
| 336 |
|
---|
| 337 |
|
---|
| 338 |
|
---|
| 339 |
|
---|
| 340 |
|
---|
| 341 |
|
---|
| 342 |
|
---|
| 343 | Deutsch & Gailly Informational [Page 6]
|
---|
| 344 | |
---|
| 345 |
|
---|
| 346 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 347 |
|
---|
| 348 |
|
---|
| 349 | 2.3. Compliance
|
---|
| 350 |
|
---|
| 351 | A compliant compressor must produce streams with correct CMF, FLG
|
---|
| 352 | and ADLER32, but need not support preset dictionaries. When the
|
---|
| 353 | zlib data format is used as part of another standard data format,
|
---|
| 354 | the compressor may use only preset dictionaries that are specified
|
---|
| 355 | by this other data format. If this other format does not use the
|
---|
| 356 | preset dictionary feature, the compressor must not set the FDICT
|
---|
| 357 | flag.
|
---|
| 358 |
|
---|
| 359 | A compliant decompressor must check CMF, FLG, and ADLER32, and
|
---|
| 360 | provide an error indication if any of these have incorrect values.
|
---|
| 361 | A compliant decompressor must give an error indication if CM is
|
---|
| 362 | not one of the values defined in this specification (only the
|
---|
| 363 | value 8 is permitted in this version), since another value could
|
---|
| 364 | indicate the presence of new features that would cause subsequent
|
---|
| 365 | data to be interpreted incorrectly. A compliant decompressor must
|
---|
| 366 | give an error indication if FDICT is set and DICTID is not the
|
---|
| 367 | identifier of a known preset dictionary. A decompressor may
|
---|
| 368 | ignore FLEVEL and still be compliant. When the zlib data format
|
---|
| 369 | is being used as a part of another standard format, a compliant
|
---|
| 370 | decompressor must support all the preset dictionaries specified by
|
---|
| 371 | the other format. When the other format does not use the preset
|
---|
| 372 | dictionary feature, a compliant decompressor must reject any
|
---|
| 373 | stream in which the FDICT flag is set.
|
---|
| 374 |
|
---|
| 375 | 3. References
|
---|
| 376 |
|
---|
| 377 | [1] Deutsch, L.P.,"GZIP Compressed Data Format Specification",
|
---|
| 378 | available in ftp://ftp.uu.net/pub/archiving/zip/doc/
|
---|
| 379 |
|
---|
| 380 | [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
|
---|
| 381 | available in ftp://ftp.uu.net/graphics/png/documents/
|
---|
| 382 |
|
---|
| 383 | [3] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
|
---|
| 384 | available in ftp://ftp.uu.net/pub/archiving/zip/doc/
|
---|
| 385 |
|
---|
| 386 | [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
|
---|
| 387 | Transmissions," IEEE Transactions on Communications, Vol. COM-30,
|
---|
| 388 | No. 1, January 1982, pp. 247-252.
|
---|
| 389 |
|
---|
| 390 | [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
|
---|
| 391 | November, 1993, pp. 144, 145. (Available from
|
---|
| 392 | gopher://info.itu.ch). ITU-T X.244 is also the same as ISO 8073.
|
---|
| 393 |
|
---|
| 394 |
|
---|
| 395 |
|
---|
| 396 |
|
---|
| 397 |
|
---|
| 398 |
|
---|
| 399 |
|
---|
| 400 | Deutsch & Gailly Informational [Page 7]
|
---|
| 401 | |
---|
| 402 |
|
---|
| 403 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 404 |
|
---|
| 405 |
|
---|
| 406 | 4. Source code
|
---|
| 407 |
|
---|
| 408 | Source code for a C language implementation of a "zlib" compliant
|
---|
| 409 | library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.
|
---|
| 410 |
|
---|
| 411 | 5. Security Considerations
|
---|
| 412 |
|
---|
| 413 | A decoder that fails to check the ADLER32 checksum value may be
|
---|
| 414 | subject to undetected data corruption.
|
---|
| 415 |
|
---|
| 416 | 6. Acknowledgements
|
---|
| 417 |
|
---|
| 418 | Trademarks cited in this document are the property of their
|
---|
| 419 | respective owners.
|
---|
| 420 |
|
---|
| 421 | Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
|
---|
| 422 | the related software described in this specification. Glenn
|
---|
| 423 | Randers-Pehrson converted this document to RFC and HTML format.
|
---|
| 424 |
|
---|
| 425 | 7. Authors' Addresses
|
---|
| 426 |
|
---|
| 427 | L. Peter Deutsch
|
---|
| 428 | Aladdin Enterprises
|
---|
| 429 | 203 Santa Margarita Ave.
|
---|
| 430 | Menlo Park, CA 94025
|
---|
| 431 |
|
---|
| 432 | Phone: (415) 322-0103 (AM only)
|
---|
| 433 | FAX: (415) 322-1734
|
---|
| 434 | EMail: <ghost@aladdin.com>
|
---|
| 435 |
|
---|
| 436 |
|
---|
| 437 | Jean-Loup Gailly
|
---|
| 438 |
|
---|
| 439 | EMail: <gzip@prep.ai.mit.edu>
|
---|
| 440 |
|
---|
| 441 | Questions about the technical content of this specification can be
|
---|
| 442 | sent by email to
|
---|
| 443 |
|
---|
| 444 | Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
|
---|
| 445 | Mark Adler <madler@alumni.caltech.edu>
|
---|
| 446 |
|
---|
| 447 | Editorial comments on this specification can be sent by email to
|
---|
| 448 |
|
---|
| 449 | L. Peter Deutsch <ghost@aladdin.com> and
|
---|
| 450 | Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
|
---|
| 451 |
|
---|
| 452 |
|
---|
| 453 |
|
---|
| 454 |
|
---|
| 455 |
|
---|
| 456 |
|
---|
| 457 | Deutsch & Gailly Informational [Page 8]
|
---|
| 458 | |
---|
| 459 |
|
---|
| 460 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 461 |
|
---|
| 462 |
|
---|
| 463 | 8. Appendix: Rationale
|
---|
| 464 |
|
---|
| 465 | 8.1. Preset dictionaries
|
---|
| 466 |
|
---|
| 467 | A preset dictionary is specially useful to compress short input
|
---|
| 468 | sequences. The compressor can take advantage of the dictionary
|
---|
| 469 | context to encode the input in a more compact manner. The
|
---|
| 470 | decompressor can be initialized with the appropriate context by
|
---|
| 471 | virtually decompressing a compressed version of the dictionary
|
---|
| 472 | without producing any output. However for certain compression
|
---|
| 473 | algorithms such as the deflate algorithm this operation can be
|
---|
| 474 | achieved without actually performing any decompression.
|
---|
| 475 |
|
---|
| 476 | The compressor and the decompressor must use exactly the same
|
---|
| 477 | dictionary. The dictionary may be fixed or may be chosen among a
|
---|
| 478 | certain number of predefined dictionaries, according to the kind
|
---|
| 479 | of input data. The decompressor can determine which dictionary has
|
---|
| 480 | been chosen by the compressor by checking the dictionary
|
---|
| 481 | identifier. This document does not specify the contents of
|
---|
| 482 | predefined dictionaries, since the optimal dictionaries are
|
---|
| 483 | application specific. Standard data formats using this feature of
|
---|
| 484 | the zlib specification must precisely define the allowed
|
---|
| 485 | dictionaries.
|
---|
| 486 |
|
---|
| 487 | 8.2. The Adler-32 algorithm
|
---|
| 488 |
|
---|
| 489 | The Adler-32 algorithm is much faster than the CRC32 algorithm yet
|
---|
| 490 | still provides an extremely low probability of undetected errors.
|
---|
| 491 |
|
---|
| 492 | The modulo on unsigned long accumulators can be delayed for 5552
|
---|
| 493 | bytes, so the modulo operation time is negligible. If the bytes
|
---|
| 494 | are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
|
---|
| 495 | and order sensitive, unlike the first sum, which is just a
|
---|
| 496 | checksum. That 65521 is prime is important to avoid a possible
|
---|
| 497 | large class of two-byte errors that leave the check unchanged.
|
---|
| 498 | (The Fletcher checksum uses 255, which is not prime and which also
|
---|
| 499 | makes the Fletcher check insensitive to single byte changes 0 <->
|
---|
| 500 | 255.)
|
---|
| 501 |
|
---|
| 502 | The sum s1 is initialized to 1 instead of zero to make the length
|
---|
| 503 | of the sequence part of s2, so that the length does not have to be
|
---|
| 504 | checked separately. (Any sequence of zeroes has a Fletcher
|
---|
| 505 | checksum of zero.)
|
---|
| 506 |
|
---|
| 507 |
|
---|
| 508 |
|
---|
| 509 |
|
---|
| 510 |
|
---|
| 511 |
|
---|
| 512 |
|
---|
| 513 |
|
---|
| 514 | Deutsch & Gailly Informational [Page 9]
|
---|
| 515 | |
---|
| 516 |
|
---|
| 517 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 518 |
|
---|
| 519 |
|
---|
| 520 | 9. Appendix: Sample code
|
---|
| 521 |
|
---|
| 522 | The following C code computes the Adler-32 checksum of a data buffer.
|
---|
| 523 | It is written for clarity, not for speed. The sample code is in the
|
---|
| 524 | ANSI C programming language. Non C users may find it easier to read
|
---|
| 525 | with these hints:
|
---|
| 526 |
|
---|
| 527 | & Bitwise AND operator.
|
---|
| 528 | >> Bitwise right shift operator. When applied to an
|
---|
| 529 | unsigned quantity, as here, right shift inserts zero bit(s)
|
---|
| 530 | at the left.
|
---|
| 531 | << Bitwise left shift operator. Left shift inserts zero
|
---|
| 532 | bit(s) at the right.
|
---|
| 533 | ++ "n++" increments the variable n.
|
---|
| 534 | % modulo operator: a % b is the remainder of a divided by b.
|
---|
| 535 |
|
---|
| 536 | #define BASE 65521 /* largest prime smaller than 65536 */
|
---|
| 537 |
|
---|
| 538 | /*
|
---|
| 539 | Update a running Adler-32 checksum with the bytes buf[0..len-1]
|
---|
| 540 | and return the updated checksum. The Adler-32 checksum should be
|
---|
| 541 | initialized to 1.
|
---|
| 542 |
|
---|
| 543 | Usage example:
|
---|
| 544 |
|
---|
| 545 | unsigned long adler = 1L;
|
---|
| 546 |
|
---|
| 547 | while (read_buffer(buffer, length) != EOF) {
|
---|
| 548 | adler = update_adler32(adler, buffer, length);
|
---|
| 549 | }
|
---|
| 550 | if (adler != original_adler) error();
|
---|
| 551 | */
|
---|
| 552 | unsigned long update_adler32(unsigned long adler,
|
---|
| 553 | unsigned char *buf, int len)
|
---|
| 554 | {
|
---|
| 555 | unsigned long s1 = adler & 0xffff;
|
---|
| 556 | unsigned long s2 = (adler >> 16) & 0xffff;
|
---|
| 557 | int n;
|
---|
| 558 |
|
---|
| 559 | for (n = 0; n < len; n++) {
|
---|
| 560 | s1 = (s1 + buf[n]) % BASE;
|
---|
| 561 | s2 = (s2 + s1) % BASE;
|
---|
| 562 | }
|
---|
| 563 | return (s2 << 16) + s1;
|
---|
| 564 | }
|
---|
| 565 |
|
---|
| 566 | /* Return the adler32 of the bytes buf[0..len-1] */
|
---|
| 567 |
|
---|
| 568 |
|
---|
| 569 |
|
---|
| 570 |
|
---|
| 571 | Deutsch & Gailly Informational [Page 10]
|
---|
| 572 | |
---|
| 573 |
|
---|
| 574 | RFC 1950 ZLIB Compressed Data Format Specification May 1996
|
---|
| 575 |
|
---|
| 576 |
|
---|
| 577 | unsigned long adler32(unsigned char *buf, int len)
|
---|
| 578 | {
|
---|
| 579 | return update_adler32(1L, buf, len);
|
---|
| 580 | }
|
---|
| 581 |
|
---|
| 582 |
|
---|
| 583 |
|
---|
| 584 |
|
---|
| 585 |
|
---|
| 586 |
|
---|
| 587 |
|
---|
| 588 |
|
---|
| 589 |
|
---|
| 590 |
|
---|
| 591 |
|
---|
| 592 |
|
---|
| 593 |
|
---|
| 594 |
|
---|
| 595 |
|
---|
| 596 |
|
---|
| 597 |
|
---|
| 598 |
|
---|
| 599 |
|
---|
| 600 |
|
---|
| 601 |
|
---|
| 602 |
|
---|
| 603 |
|
---|
| 604 |
|
---|
| 605 |
|
---|
| 606 |
|
---|
| 607 |
|
---|
| 608 |
|
---|
| 609 |
|
---|
| 610 |
|
---|
| 611 |
|
---|
| 612 |
|
---|
| 613 |
|
---|
| 614 |
|
---|
| 615 |
|
---|
| 616 |
|
---|
| 617 |
|
---|
| 618 |
|
---|
| 619 |
|
---|
| 620 |
|
---|
| 621 |
|
---|
| 622 |
|
---|
| 623 |
|
---|
| 624 |
|
---|
| 625 |
|
---|
| 626 |
|
---|
| 627 |
|
---|
| 628 | Deutsch & Gailly Informational [Page 11]
|
---|
| 629 | |
---|
| 630 |
|
---|