source: Vago/zlib-1.2.8/doc/rfc1950.txt@ 1051

Last change on this file since 1051 was 1049, checked in by s10k, 8 years ago
File size: 20.0 KB
Line 
1
2
3
4
5
6
7Network Working Group P. Deutsch
8Request for Comments: 1950 Aladdin Enterprises
9Category: Informational J-L. Gailly
10 Info-ZIP
11 May 1996
12
13
14 ZLIB Compressed Data Format Specification version 3.3
15
16Status of This Memo
17
18 This memo provides information for the Internet community. This memo
19 does not specify an Internet standard of any kind. Distribution of
20 this memo is unlimited.
21
22IESG Note:
23
24 The IESG takes no position on the validity of any Intellectual
25 Property Rights statements contained in this document.
26
27Notices
28
29 Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly
30
31 Permission is granted to copy and distribute this document for any
32 purpose and without charge, including translations into other
33 languages and incorporation into compilations, provided that the
34 copyright notice and this notice are preserved, and that any
35 substantive changes or deletions from the original are clearly
36 marked.
37
38 A pointer to the latest version of this and related documentation in
39 HTML format can be found at the URL
40 <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.
41
42Abstract
43
44 This specification defines a lossless compressed data format. The
45 data can be produced or consumed, even for an arbitrarily long
46 sequentially presented input data stream, using only an a priori
47 bounded amount of intermediate storage. The format presently uses
48 the DEFLATE compression method but can be easily extended to use
49 other compression methods. It can be implemented readily in a manner
50 not covered by patents. This specification also defines the ADLER-32
51 checksum (an extension and improvement of the Fletcher checksum),
52 used for detection of data corruption, and provides an algorithm for
53 computing it.
54
55
56
57
58Deutsch & Gailly Informational [Page 1]
59
60
61RFC 1950 ZLIB Compressed Data Format Specification May 1996
62
63
64Table of Contents
65
66 1. Introduction ................................................... 2
67 1.1. Purpose ................................................... 2
68 1.2. Intended audience ......................................... 3
69 1.3. Scope ..................................................... 3
70 1.4. Compliance ................................................ 3
71 1.5. Definitions of terms and conventions used ................ 3
72 1.6. Changes from previous versions ............................ 3
73 2. Detailed specification ......................................... 3
74 2.1. Overall conventions ....................................... 3
75 2.2. Data format ............................................... 4
76 2.3. Compliance ................................................ 7
77 3. References ..................................................... 7
78 4. Source code .................................................... 8
79 5. Security Considerations ........................................ 8
80 6. Acknowledgements ............................................... 8
81 7. Authors' Addresses ............................................. 8
82 8. Appendix: Rationale ............................................ 9
83 9. Appendix: Sample code ..........................................10
84
851. Introduction
86
87 1.1. Purpose
88
89 The purpose of this specification is to define a lossless
90 compressed data format that:
91
92 * Is independent of CPU type, operating system, file system,
93 and character set, and hence can be used for interchange;
94
95 * Can be produced or consumed, even for an arbitrarily long
96 sequentially presented input data stream, using only an a
97 priori bounded amount of intermediate storage, and hence can
98 be used in data communications or similar structures such as
99 Unix filters;
100
101 * Can use a number of different compression methods;
102
103 * Can be implemented readily in a manner not covered by
104 patents, and hence can be practiced freely.
105
106 The data format defined by this specification does not attempt to
107 allow random access to compressed data.
108
109
110
111
112
113
114
115Deutsch & Gailly Informational [Page 2]
116
117
118RFC 1950 ZLIB Compressed Data Format Specification May 1996
119
120
121 1.2. Intended audience
122
123 This specification is intended for use by implementors of software
124 to compress data into zlib format and/or decompress data from zlib
125 format.
126
127 The text of the specification assumes a basic background in
128 programming at the level of bits and other primitive data
129 representations.
130
131 1.3. Scope
132
133 The specification specifies a compressed data format that can be
134 used for in-memory compression of a sequence of arbitrary bytes.
135
136 1.4. Compliance
137
138 Unless otherwise indicated below, a compliant decompressor must be
139 able to accept and decompress any data set that conforms to all
140 the specifications presented here; a compliant compressor must
141 produce data sets that conform to all the specifications presented
142 here.
143
144 1.5. Definitions of terms and conventions used
145
146 byte: 8 bits stored or transmitted as a unit (same as an octet).
147 (For this specification, a byte is exactly 8 bits, even on
148 machines which store a character on a number of bits different
149 from 8.) See below, for the numbering of bits within a byte.
150
151 1.6. Changes from previous versions
152
153 Version 3.1 was the first public release of this specification.
154 In version 3.2, some terminology was changed and the Adler-32
155 sample code was rewritten for clarity. In version 3.3, the
156 support for a preset dictionary was introduced, and the
157 specification was converted to RFC style.
158
1592. Detailed specification
160
161 2.1. Overall conventions
162
163 In the diagrams below, a box like this:
164
165 +---+
166 | | <-- the vertical bars might be missing
167 +---+
168
169
170
171
172Deutsch & Gailly Informational [Page 3]
173
174
175RFC 1950 ZLIB Compressed Data Format Specification May 1996
176
177
178 represents one byte; a box like this:
179
180 +==============+
181 | |
182 +==============+
183
184 represents a variable number of bytes.
185
186 Bytes stored within a computer do not have a "bit order", since
187 they are always treated as a unit. However, a byte considered as
188 an integer between 0 and 255 does have a most- and least-
189 significant bit, and since we write numbers with the most-
190 significant digit on the left, we also write bytes with the most-
191 significant bit on the left. In the diagrams below, we number the
192 bits of a byte so that bit 0 is the least-significant bit, i.e.,
193 the bits are numbered:
194
195 +--------+
196 |76543210|
197 +--------+
198
199 Within a computer, a number may occupy multiple bytes. All
200 multi-byte numbers in the format described here are stored with
201 the MOST-significant byte first (at the lower memory address).
202 For example, the decimal number 520 is stored as:
203
204 0 1
205 +--------+--------+
206 |00000010|00001000|
207 +--------+--------+
208 ^ ^
209 | |
210 | + less significant byte = 8
211 + more significant byte = 2 x 256
212
213 2.2. Data format
214
215 A zlib stream has the following structure:
216
217 0 1
218 +---+---+
219 |CMF|FLG| (more-->)
220 +---+---+
221
222
223
224
225
226
227
228
229Deutsch & Gailly Informational [Page 4]
230
231
232RFC 1950 ZLIB Compressed Data Format Specification May 1996
233
234
235 (if FLG.FDICT set)
236
237 0 1 2 3
238 +---+---+---+---+
239 | DICTID | (more-->)
240 +---+---+---+---+
241
242 +=====================+---+---+---+---+
243 |...compressed data...| ADLER32 |
244 +=====================+---+---+---+---+
245
246 Any data which may appear after ADLER32 are not part of the zlib
247 stream.
248
249 CMF (Compression Method and flags)
250 This byte is divided into a 4-bit compression method and a 4-
251 bit information field depending on the compression method.
252
253 bits 0 to 3 CM Compression method
254 bits 4 to 7 CINFO Compression info
255
256 CM (Compression method)
257 This identifies the compression method used in the file. CM = 8
258 denotes the "deflate" compression method with a window size up
259 to 32K. This is the method used by gzip and PNG (see
260 references [1] and [2] in Chapter 3, below, for the reference
261 documents). CM = 15 is reserved. It might be used in a future
262 version of this specification to indicate the presence of an
263 extra field before the compressed data.
264
265 CINFO (Compression info)
266 For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
267 size, minus eight (CINFO=7 indicates a 32K window size). Values
268 of CINFO above 7 are not allowed in this version of the
269 specification. CINFO is not defined in this specification for
270 CM not equal to 8.
271
272 FLG (FLaGs)
273 This flag byte is divided as follows:
274
275 bits 0 to 4 FCHECK (check bits for CMF and FLG)
276 bit 5 FDICT (preset dictionary)
277 bits 6 to 7 FLEVEL (compression level)
278
279 The FCHECK value must be such that CMF and FLG, when viewed as
280 a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
281 is a multiple of 31.
282
283
284
285
286Deutsch & Gailly Informational [Page 5]
287
288
289RFC 1950 ZLIB Compressed Data Format Specification May 1996
290
291
292 FDICT (Preset dictionary)
293 If FDICT is set, a DICT dictionary identifier is present
294 immediately after the FLG byte. The dictionary is a sequence of
295 bytes which are initially fed to the compressor without
296 producing any compressed output. DICT is the Adler-32 checksum
297 of this sequence of bytes (see the definition of ADLER32
298 below). The decompressor can use this identifier to determine
299 which dictionary has been used by the compressor.
300
301 FLEVEL (Compression level)
302 These flags are available for use by specific compression
303 methods. The "deflate" method (CM = 8) sets these flags as
304 follows:
305
306 0 - compressor used fastest algorithm
307 1 - compressor used fast algorithm
308 2 - compressor used default algorithm
309 3 - compressor used maximum compression, slowest algorithm
310
311 The information in FLEVEL is not needed for decompression; it
312 is there to indicate if recompression might be worthwhile.
313
314 compressed data
315 For compression method 8, the compressed data is stored in the
316 deflate compressed data format as described in the document
317 "DEFLATE Compressed Data Format Specification" by L. Peter
318 Deutsch. (See reference [3] in Chapter 3, below)
319
320 Other compressed data formats are not specified in this version
321 of the zlib specification.
322
323 ADLER32 (Adler-32 checksum)
324 This contains a checksum value of the uncompressed data
325 (excluding any dictionary data) computed according to Adler-32
326 algorithm. This algorithm is a 32-bit extension and improvement
327 of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073
328 standard. See references [4] and [5] in Chapter 3, below)
329
330 Adler-32 is composed of two sums accumulated per byte: s1 is
331 the sum of all bytes, s2 is the sum of all s1 values. Both sums
332 are done modulo 65521. s1 is initialized to 1, s2 to zero. The
333 Adler-32 checksum is stored as s2*65536 + s1 in most-
334 significant-byte first (network) order.
335
336
337
338
339
340
341
342
343Deutsch & Gailly Informational [Page 6]
344
345
346RFC 1950 ZLIB Compressed Data Format Specification May 1996
347
348
349 2.3. Compliance
350
351 A compliant compressor must produce streams with correct CMF, FLG
352 and ADLER32, but need not support preset dictionaries. When the
353 zlib data format is used as part of another standard data format,
354 the compressor may use only preset dictionaries that are specified
355 by this other data format. If this other format does not use the
356 preset dictionary feature, the compressor must not set the FDICT
357 flag.
358
359 A compliant decompressor must check CMF, FLG, and ADLER32, and
360 provide an error indication if any of these have incorrect values.
361 A compliant decompressor must give an error indication if CM is
362 not one of the values defined in this specification (only the
363 value 8 is permitted in this version), since another value could
364 indicate the presence of new features that would cause subsequent
365 data to be interpreted incorrectly. A compliant decompressor must
366 give an error indication if FDICT is set and DICTID is not the
367 identifier of a known preset dictionary. A decompressor may
368 ignore FLEVEL and still be compliant. When the zlib data format
369 is being used as a part of another standard format, a compliant
370 decompressor must support all the preset dictionaries specified by
371 the other format. When the other format does not use the preset
372 dictionary feature, a compliant decompressor must reject any
373 stream in which the FDICT flag is set.
374
3753. References
376
377 [1] Deutsch, L.P.,"GZIP Compressed Data Format Specification",
378 available in ftp://ftp.uu.net/pub/archiving/zip/doc/
379
380 [2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
381 available in ftp://ftp.uu.net/graphics/png/documents/
382
383 [3] Deutsch, L.P.,"DEFLATE Compressed Data Format Specification",
384 available in ftp://ftp.uu.net/pub/archiving/zip/doc/
385
386 [4] Fletcher, J. G., "An Arithmetic Checksum for Serial
387 Transmissions," IEEE Transactions on Communications, Vol. COM-30,
388 No. 1, January 1982, pp. 247-252.
389
390 [5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
391 November, 1993, pp. 144, 145. (Available from
392 gopher://info.itu.ch). ITU-T X.244 is also the same as ISO 8073.
393
394
395
396
397
398
399
400Deutsch & Gailly Informational [Page 7]
401
402
403RFC 1950 ZLIB Compressed Data Format Specification May 1996
404
405
4064. Source code
407
408 Source code for a C language implementation of a "zlib" compliant
409 library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.
410
4115. Security Considerations
412
413 A decoder that fails to check the ADLER32 checksum value may be
414 subject to undetected data corruption.
415
4166. Acknowledgements
417
418 Trademarks cited in this document are the property of their
419 respective owners.
420
421 Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
422 the related software described in this specification. Glenn
423 Randers-Pehrson converted this document to RFC and HTML format.
424
4257. Authors' Addresses
426
427 L. Peter Deutsch
428 Aladdin Enterprises
429 203 Santa Margarita Ave.
430 Menlo Park, CA 94025
431
432 Phone: (415) 322-0103 (AM only)
433 FAX: (415) 322-1734
434 EMail: <ghost@aladdin.com>
435
436
437 Jean-Loup Gailly
438
439 EMail: <gzip@prep.ai.mit.edu>
440
441 Questions about the technical content of this specification can be
442 sent by email to
443
444 Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
445 Mark Adler <madler@alumni.caltech.edu>
446
447 Editorial comments on this specification can be sent by email to
448
449 L. Peter Deutsch <ghost@aladdin.com> and
450 Glenn Randers-Pehrson <randeg@alumni.rpi.edu>
451
452
453
454
455
456
457Deutsch & Gailly Informational [Page 8]
458
459
460RFC 1950 ZLIB Compressed Data Format Specification May 1996
461
462
4638. Appendix: Rationale
464
465 8.1. Preset dictionaries
466
467 A preset dictionary is specially useful to compress short input
468 sequences. The compressor can take advantage of the dictionary
469 context to encode the input in a more compact manner. The
470 decompressor can be initialized with the appropriate context by
471 virtually decompressing a compressed version of the dictionary
472 without producing any output. However for certain compression
473 algorithms such as the deflate algorithm this operation can be
474 achieved without actually performing any decompression.
475
476 The compressor and the decompressor must use exactly the same
477 dictionary. The dictionary may be fixed or may be chosen among a
478 certain number of predefined dictionaries, according to the kind
479 of input data. The decompressor can determine which dictionary has
480 been chosen by the compressor by checking the dictionary
481 identifier. This document does not specify the contents of
482 predefined dictionaries, since the optimal dictionaries are
483 application specific. Standard data formats using this feature of
484 the zlib specification must precisely define the allowed
485 dictionaries.
486
487 8.2. The Adler-32 algorithm
488
489 The Adler-32 algorithm is much faster than the CRC32 algorithm yet
490 still provides an extremely low probability of undetected errors.
491
492 The modulo on unsigned long accumulators can be delayed for 5552
493 bytes, so the modulo operation time is negligible. If the bytes
494 are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
495 and order sensitive, unlike the first sum, which is just a
496 checksum. That 65521 is prime is important to avoid a possible
497 large class of two-byte errors that leave the check unchanged.
498 (The Fletcher checksum uses 255, which is not prime and which also
499 makes the Fletcher check insensitive to single byte changes 0 <->
500 255.)
501
502 The sum s1 is initialized to 1 instead of zero to make the length
503 of the sequence part of s2, so that the length does not have to be
504 checked separately. (Any sequence of zeroes has a Fletcher
505 checksum of zero.)
506
507
508
509
510
511
512
513
514Deutsch & Gailly Informational [Page 9]
515
516
517RFC 1950 ZLIB Compressed Data Format Specification May 1996
518
519
5209. Appendix: Sample code
521
522 The following C code computes the Adler-32 checksum of a data buffer.
523 It is written for clarity, not for speed. The sample code is in the
524 ANSI C programming language. Non C users may find it easier to read
525 with these hints:
526
527 & Bitwise AND operator.
528 >> Bitwise right shift operator. When applied to an
529 unsigned quantity, as here, right shift inserts zero bit(s)
530 at the left.
531 << Bitwise left shift operator. Left shift inserts zero
532 bit(s) at the right.
533 ++ "n++" increments the variable n.
534 % modulo operator: a % b is the remainder of a divided by b.
535
536 #define BASE 65521 /* largest prime smaller than 65536 */
537
538 /*
539 Update a running Adler-32 checksum with the bytes buf[0..len-1]
540 and return the updated checksum. The Adler-32 checksum should be
541 initialized to 1.
542
543 Usage example:
544
545 unsigned long adler = 1L;
546
547 while (read_buffer(buffer, length) != EOF) {
548 adler = update_adler32(adler, buffer, length);
549 }
550 if (adler != original_adler) error();
551 */
552 unsigned long update_adler32(unsigned long adler,
553 unsigned char *buf, int len)
554 {
555 unsigned long s1 = adler & 0xffff;
556 unsigned long s2 = (adler >> 16) & 0xffff;
557 int n;
558
559 for (n = 0; n < len; n++) {
560 s1 = (s1 + buf[n]) % BASE;
561 s2 = (s2 + s1) % BASE;
562 }
563 return (s2 << 16) + s1;
564 }
565
566 /* Return the adler32 of the bytes buf[0..len-1] */
567
568
569
570
571Deutsch & Gailly Informational [Page 10]
572
573
574RFC 1950 ZLIB Compressed Data Format Specification May 1996
575
576
577 unsigned long adler32(unsigned char *buf, int len)
578 {
579 return update_adler32(1L, buf, len);
580 }
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628Deutsch & Gailly Informational [Page 11]
629
630
Note: See TracBrowser for help on using the repository browser.