Network Working Group                                         P. Deutsch
Request for Comments: 1950                           Aladdin Enterprises
Category: Informational                                      J-L. Gailly
                                                                Info-ZIP
                                                                May 1996


         ZLIB Compressed Data Format Specification version 3.3

Status of This Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.

IESG Note:

The IESG takes no position on the validity of any Intellectual
Property Rights statements contained in this document.

Notices

Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly

Permission is granted to copy and distribute this document for any
purpose and without charge, including translations into other
languages and incorporation into compilations, provided that the
copyright notice and this notice are preserved, and that any
substantive changes or deletions from the original are clearly
marked.

A pointer to the latest version of this and related documentation in
HTML format can be found at the URL
<ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

This specification defines a lossless compressed data format. The
data can be produced or consumed, even for an arbitrarily long
sequentially presented input data stream, using only an a priori
bounded amount of intermediate storage. The format presently uses
the DEFLATE compression method but can be easily extended to use
other compression methods. It can be implemented readily in a manner
not covered by patents. This specification also defines the ADLER-32
checksum (an extension and improvement of the Fletcher checksum),
used for detection of data corruption, and provides an algorithm for
computing it.

Deutsch & Gailly             Informational                      [Page 1]

RFC 1950       ZLIB Compressed Data Format Specification        May 1996

Table of Contents

1. Introduction
   1.1. Purpose
   1.2. Intended audience
   1.3. Scope
   1.4. Compliance
   1.5. Definitions of terms and conventions used
   1.6. Changes from previous versions
2. Detailed specification
   2.1. Overall conventions
   2.2. Data format
   2.3. Compliance
3. References
4. Source code
5. Security Considerations
6. Acknowledgements
7. Authors' Addresses
8. Appendix: Rationale
9. Appendix: Sample code

1. Introduction

1.1. Purpose

The purpose of this specification is to define a lossless
compressed data format that:

   * Is independent of CPU type, operating system, file system,
     and character set, and hence can be used for interchange;

   * Can be produced or consumed, even for an arbitrarily long
     sequentially presented input data stream, using only an a
     priori bounded amount of intermediate storage, and hence can
     be used in data communications or similar structures such as
     Unix filters;

   * Can use a number of different compression methods;

   * Can be implemented readily in a manner not covered by
     patents, and hence can be practiced freely.

The data format defined by this specification does not attempt to
allow random access to compressed data.

1.2. Intended audience

This specification is intended for use by implementors of software
to compress data into zlib format and/or decompress data from zlib
format.

The text of the specification assumes a basic background in
programming at the level of bits and other primitive data
representations.

1.3. Scope

The specification specifies a compressed data format that can be
used for in-memory compression of a sequence of arbitrary bytes.

1.4. Compliance

Unless otherwise indicated below, a compliant decompressor must be
able to accept and decompress any data set that conforms to all
the specifications presented here; a compliant compressor must
produce data sets that conform to all the specifications presented
here.

1.5. Definitions of terms and conventions used

byte: 8 bits stored or transmitted as a unit (same as an octet).
(For this specification, a byte is exactly 8 bits, even on
machines that store a character in a number of bits different
from 8.) See section 2.1 for the numbering of bits within a byte.

1.6. Changes from previous versions

Version 3.1 was the first public release of this specification.
In version 3.2, some terminology was changed and the Adler-32
sample code was rewritten for clarity. In version 3.3, the
support for a preset dictionary was introduced, and the
specification was converted to RFC style.

2. Detailed specification

2.1. Overall conventions

In the diagrams below, a box like this:

   +---+
   |   | <-- the vertical bars might be missing
   +---+

represents one byte; a box like this:

   +==============+
   |              |
   +==============+

represents a variable number of bytes.

Bytes stored within a computer do not have a "bit order", since
they are always treated as a unit. However, a byte considered as
an integer between 0 and 255 does have a most- and
least-significant bit, and since we write numbers with the
most-significant digit on the left, we also write bytes with the
most-significant bit on the left. In the diagrams below, we number
the bits of a byte so that bit 0 is the least-significant bit, i.e.,
the bits are numbered:

   +--------+
   |76543210|
   +--------+

Within a computer, a number may occupy multiple bytes. All
multi-byte numbers in the format described here are stored with
the MOST-significant byte first (at the lower memory address).
For example, the decimal number 520 is stored as:

       0        1
   +--------+--------+
   |00000010|00001000|
   +--------+--------+
    ^        ^
    |        |
    |        + less significant byte = 8
    + more significant byte = 2 x 256
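
As a minimal sketch of the byte order described above, the following C
code assembles a multi-byte number from bytes stored most-significant
byte first. The function names read_be16 and read_be32 are illustrative
assumptions; they are not defined by this specification.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the specification): combine bytes
   stored MOST-significant byte first into one unsigned value. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

For the two bytes in the example above (00000010, 00001000), read_be16
returns 2 x 256 + 8 = 520.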

2.2. Data format

A zlib stream has the following structure:

     0   1
   +---+---+
   |CMF|FLG|   (more-->)
   +---+---+

(if FLG.FDICT set)

     0   1   2   3
   +---+---+---+---+
   |     DICTID    |   (more-->)
   +---+---+---+---+

   +=====================+---+---+---+---+
   |...compressed data...|    ADLER32    |
   +=====================+---+---+---+---+

Any data which may appear after ADLER32 are not part of the zlib
stream.

CMF (Compression Method and flags)
   This byte is divided into a 4-bit compression method and a
   4-bit information field depending on the compression method.

      bits 0 to 3  CM     Compression method
      bits 4 to 7  CINFO  Compression info

CM (Compression method)
   This identifies the compression method used in the file. CM = 8
   denotes the "deflate" compression method with a window size up
   to 32K. This is the method used by gzip and PNG (see
   references [1] and [2] in Chapter 3, below, for the reference
   documents). CM = 15 is reserved. It might be used in a future
   version of this specification to indicate the presence of an
   extra field before the compressed data.

CINFO (Compression info)
   For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
   size, minus eight (CINFO=7 indicates a 32K window size). Values
   of CINFO above 7 are not allowed in this version of the
   specification. CINFO is not defined in this specification for
   CM not equal to 8.
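
The CMF bit layout above could be decoded as in the following C sketch.
The helper names cmf_method, cmf_info and deflate_window_size are
assumptions made for illustration; only the bit positions and the
window-size rule come from the text.

```c
/* CM: bits 0 to 3 of the CMF byte. */
unsigned cmf_method(unsigned char cmf)
{
    return cmf & 0x0fu;
}

/* CINFO: bits 4 to 7 of the CMF byte. */
unsigned cmf_info(unsigned char cmf)
{
    return (cmf >> 4) & 0x0fu;
}

/* LZ77 window size in bytes for CM = 8. Returns 0 when CM is not 8 or
   CINFO is above 7, which this version of the specification does not
   define or allow. */
unsigned long deflate_window_size(unsigned char cmf)
{
    if (cmf_method(cmf) != 8 || cmf_info(cmf) > 7)
        return 0;
    return 1UL << (cmf_info(cmf) + 8);   /* log2(size) = CINFO + 8 */
}
```

For CMF = 0x78 this gives CM = 8, CINFO = 7 and a window of 32768
bytes.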

FLG (FLaGs)
   This flag byte is divided as follows:

      bits 0 to 4  FCHECK  (check bits for CMF and FLG)
      bit  5       FDICT   (preset dictionary)
      bits 6 to 7  FLEVEL  (compression level)

   The FCHECK value must be such that CMF and FLG, when viewed as
   a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
   form a multiple of 31.
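
A minimal sketch of the FCHECK rule, assuming hypothetical helper names
header_check_ok and make_flg: the first function tests an existing
CMF/FLG pair, the second picks the FCHECK bits for a given CMF, FDICT
and FLEVEL.

```c
/* Returns 1 if CMF*256 + FLG is a multiple of 31, otherwise 0. */
int header_check_ok(unsigned char cmf, unsigned char flg)
{
    return ((unsigned)cmf * 256u + flg) % 31u == 0;
}

/* Builds a FLG byte from FDICT (0 or 1) and FLEVEL (0..3), choosing
   FCHECK (bits 0 to 4) so that header_check_ok() succeeds. */
unsigned char make_flg(unsigned char cmf, unsigned fdict, unsigned flevel)
{
    unsigned flg = ((flevel & 3u) << 6) | ((fdict & 1u) << 5);
    unsigned rem = ((unsigned)cmf * 256u + flg) % 31u;
    unsigned fcheck = (31u - rem) % 31u;   /* 0..30, fits in 5 bits */

    return (unsigned char)(flg | fcheck);
}
```

For CMF = 0x78 with FDICT clear and FLEVEL = 2, this yields FLG = 0x9C,
and 0x789C = 30876 = 31 x 996.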

FDICT (Preset dictionary)
   If FDICT is set, a DICTID dictionary identifier is present
   immediately after the FLG byte. The dictionary is a sequence of
   bytes which are initially fed to the compressor without
   producing any compressed output. DICTID is the Adler-32 checksum
   of this sequence of bytes (see the definition of ADLER32
   below). The decompressor can use this identifier to determine
   which dictionary has been used by the compressor.

FLEVEL (Compression level)
   These flags are available for use by specific compression
   methods. The "deflate" method (CM = 8) sets these flags as
   follows:

      0 - compressor used fastest algorithm
      1 - compressor used fast algorithm
      2 - compressor used default algorithm
      3 - compressor used maximum compression, slowest algorithm

   The information in FLEVEL is not needed for decompression; it
   is there to indicate if recompression might be worthwhile.
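
The FLG fields described above could be separated as in this C sketch.
The structure zlib_flags and the functions split_flg and
zlib_header_length are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical structure; the members mirror the terms in the text. */
struct zlib_flags {
    unsigned fcheck;   /* bits 0 to 4: check bits for CMF and FLG */
    unsigned fdict;    /* bit 5: 1 if a DICTID follows the FLG byte */
    unsigned flevel;   /* bits 6 to 7: compression level hint */
};

struct zlib_flags split_flg(unsigned char flg)
{
    struct zlib_flags f;

    f.fcheck = flg & 0x1fu;
    f.fdict  = (flg >> 5) & 1u;
    f.flevel = (flg >> 6) & 3u;
    return f;
}

/* Number of bytes that precede the compressed data: CMF and FLG, plus
   the four DICTID bytes when FDICT is set. */
size_t zlib_header_length(unsigned char flg)
{
    return split_flg(flg).fdict ? 6 : 2;
}
```

A decompressor may ignore the flevel member entirely; it is decoded
here only to show where the bits sit.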

compressed data
   For compression method 8, the compressed data is stored in the
   deflate compressed data format as described in the document
   "DEFLATE Compressed Data Format Specification" by L. Peter
   Deutsch. (See reference [3] in Chapter 3, below.)

   Other compressed data formats are not specified in this version
   of the zlib specification.

ADLER32 (Adler-32 checksum)
   This contains a checksum value of the uncompressed data
   (excluding any dictionary data) computed according to the
   Adler-32 algorithm. This algorithm is a 32-bit extension and
   improvement of the Fletcher algorithm, used in the ITU-T X.224 /
   ISO 8073 standard. (See references [4] and [5] in Chapter 3,
   below.)

   Adler-32 is composed of two sums accumulated per byte: s1 is
   the sum of all bytes, s2 is the sum of all s1 values. Both sums
   are done modulo 65521. s1 is initialized to 1, s2 to zero. The
   Adler-32 checksum is stored as s2*65536 + s1 in
   most-significant-byte first (network) order.
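
The following C sketch illustrates how the two sums combine and how the
result is laid out in the four ADLER32 bytes. The names adler32_of and
store_adler32 are assumptions for this example only.

```c
#include <stddef.h>

#define ADLER_BASE 65521UL   /* modulus given in the text */

/* Adler-32 of buf[0..len-1]: s1 starts at 1, s2 at zero, and both
   sums are reduced modulo 65521 after every byte. */
unsigned long adler32_of(const unsigned char *buf, size_t len)
{
    unsigned long s1 = 1, s2 = 0;
    size_t n;

    for (n = 0; n < len; n++) {
        s1 = (s1 + buf[n]) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return s2 * 65536UL + s1;
}

/* Write the checksum as it appears in the stream: four bytes,
   most-significant byte first. */
void store_adler32(unsigned long adler, unsigned char out[4])
{
    out[0] = (unsigned char)((adler >> 24) & 0xff);
    out[1] = (unsigned char)((adler >> 16) & 0xff);
    out[2] = (unsigned char)((adler >> 8) & 0xff);
    out[3] = (unsigned char)(adler & 0xff);
}
```

For the three bytes "abc" (97, 98, 99), s1 ends at 295 and s2 at 589,
so the checksum is 589 x 65536 + 295 = 0x024D0127 and the stored bytes
are 02 4D 01 27.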

2.3. Compliance

A compliant compressor must produce streams with correct CMF, FLG
and ADLER32, but need not support preset dictionaries. When the
zlib data format is used as part of another standard data format,
the compressor may use only preset dictionaries that are specified
by this other data format. If this other format does not use the
preset dictionary feature, the compressor must not set the FDICT
flag.

A compliant decompressor must check CMF, FLG, and ADLER32, and
provide an error indication if any of these have incorrect values.
A compliant decompressor must give an error indication if CM is
not one of the values defined in this specification (only the
value 8 is permitted in this version), since another value could
indicate the presence of new features that would cause subsequent
data to be interpreted incorrectly. A compliant decompressor must
give an error indication if FDICT is set and DICTID is not the
identifier of a known preset dictionary. A decompressor may
ignore FLEVEL and still be compliant. When the zlib data format
is being used as a part of another standard format, a compliant
decompressor must support all the preset dictionaries specified by
the other format. When the other format does not use the preset
dictionary feature, a compliant decompressor must reject any
stream in which the FDICT flag is set.
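
The header checks required of a decompressor might be organized as in
the C sketch below. It covers only CMF, FLG and DICTID; verifying
ADLER32 after decompression is still required and is not shown. The
enumeration, the function name check_zlib_header and its parameters
are hypothetical. Passing n_known = 0 models a containing format that
does not use preset dictionaries.

```c
#include <stddef.h>

enum zlib_header_status {
    ZH_OK = 0,
    ZH_TRUNCATED,      /* fewer bytes than the header needs */
    ZH_BAD_CHECK,      /* CMF*256 + FLG is not a multiple of 31 */
    ZH_BAD_METHOD,     /* CM is not 8 */
    ZH_BAD_WINDOW,     /* CINFO is above 7 */
    ZH_UNKNOWN_DICT    /* FDICT set, DICTID not a known dictionary */
};

/* p[0..len-1] holds the start of a stream. known_ids lists the DICTID
   values of the supported preset dictionaries (may be NULL when
   n_known is 0). */
enum zlib_header_status check_zlib_header(const unsigned char *p,
                                          size_t len,
                                          const unsigned long *known_ids,
                                          size_t n_known)
{
    if (len < 2)
        return ZH_TRUNCATED;
    if (((unsigned)p[0] * 256u + p[1]) % 31u != 0)
        return ZH_BAD_CHECK;
    if ((p[0] & 0x0fu) != 8)
        return ZH_BAD_METHOD;
    if ((p[0] >> 4) > 7)
        return ZH_BAD_WINDOW;

    if (p[1] & 0x20u) {                    /* FDICT is set */
        unsigned long id;
        size_t i;

        if (len < 6)
            return ZH_TRUNCATED;
        id = ((unsigned long)p[2] << 24) | ((unsigned long)p[3] << 16) |
             ((unsigned long)p[4] << 8)  |  (unsigned long)p[5];
        for (i = 0; i < n_known; i++)
            if (known_ids[i] == id)
                return ZH_OK;
        return ZH_UNKNOWN_DICT;
    }
    return ZH_OK;
}
```

The order of the checks is a design choice of this sketch; the
specification only requires that each failure produce an error
indication.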

3. References

[1] Deutsch, L.P., "GZIP Compressed Data Format Specification",
    available in ftp://ftp.uu.net/pub/archiving/zip/doc/

[2] Thomas Boutell, "PNG (Portable Network Graphics) specification",
    available in ftp://ftp.uu.net/graphics/png/documents/

[3] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
    available in ftp://ftp.uu.net/pub/archiving/zip/doc/

[4] Fletcher, J. G., "An Arithmetic Checksum for Serial
    Transmissions," IEEE Transactions on Communications, Vol. COM-30,
    No. 1, January 1982, pp. 247-252.

[5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
    November, 1993, pp. 144, 145. (Available from
    gopher://info.itu.ch). ITU-T X.224 is also the same as ISO 8073.

4. Source code

Source code for a C language implementation of a "zlib" compliant
library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.

5. Security Considerations

A decoder that fails to check the ADLER32 checksum value may be
subject to undetected data corruption.

6. Acknowledgements

Trademarks cited in this document are the property of their
respective owners.

Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
the related software described in this specification. Glenn
Randers-Pehrson converted this document to RFC and HTML format.

7. Authors' Addresses

L. Peter Deutsch
Aladdin Enterprises
203 Santa Margarita Ave.
Menlo Park, CA 94025

Phone: (415) 322-0103 (AM only)
FAX:   (415) 322-1734
EMail: <ghost@aladdin.com>


Jean-Loup Gailly

EMail: <gzip@prep.ai.mit.edu>

Questions about the technical content of this specification can be
sent by email to

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

Editorial comments on this specification can be sent by email to

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>

8. Appendix: Rationale

8.1. Preset dictionaries

A preset dictionary is especially useful for compressing short input
sequences. The compressor can take advantage of the dictionary
context to encode the input in a more compact manner. The
decompressor can be initialized with the appropriate context by
virtually decompressing a compressed version of the dictionary
without producing any output. However, for certain compression
algorithms such as the deflate algorithm, this operation can be
achieved without actually performing any decompression.

The compressor and the decompressor must use exactly the same
dictionary. The dictionary may be fixed or may be chosen among a
certain number of predefined dictionaries, according to the kind
of input data. The decompressor can determine which dictionary has
been chosen by the compressor by checking the dictionary
identifier. This document does not specify the contents of
predefined dictionaries, since the optimal dictionaries are
application specific. Standard data formats using this feature of
the zlib specification must precisely define the allowed
dictionaries.
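
One way a decompressor might pick among several predefined dictionaries
is sketched below in C: it computes the Adler-32 checksum of each
candidate and compares it with the identifier read from the stream. The
structure preset_dict and the function find_preset_dict are
hypothetical, and a real application would normally precompute the
identifiers rather than rescan the dictionaries for every stream.

```c
#include <stddef.h>

/* Hypothetical description of one predefined dictionary. */
struct preset_dict {
    const unsigned char *data;
    size_t len;
};

/* Adler-32 of buf[0..len-1], written for clarity rather than speed. */
static unsigned long dict_adler32(const unsigned char *buf, size_t len)
{
    unsigned long s1 = 1, s2 = 0;
    size_t n;

    for (n = 0; n < len; n++) {
        s1 = (s1 + buf[n]) % 65521UL;
        s2 = (s2 + s1) % 65521UL;
    }
    return (s2 << 16) + s1;
}

/* Returns the index of the dictionary whose Adler-32 equals dictid,
   or -1 if none matches (the stream must then be rejected). */
int find_preset_dict(const struct preset_dict *dicts, size_t count,
                     unsigned long dictid)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (dict_adler32(dicts[i].data, dicts[i].len) == dictid)
            return (int)i;
    return -1;
}
```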

8.2. The Adler-32 algorithm

The Adler-32 algorithm is much faster than the CRC32 algorithm yet
still provides an extremely low probability of undetected errors.

The modulo on unsigned long accumulators can be delayed for 5552
bytes, so the modulo operation time is negligible. If the bytes
are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
and order sensitive, unlike the first sum, which is just a
checksum. That 65521 is prime is important to avoid a possible
large class of two-byte errors that leave the check unchanged.
(The Fletcher checksum uses 255, which is not prime and which also
makes the Fletcher check insensitive to single byte changes 0 <->
255.)
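
The delayed modulo mentioned above can be sketched in C as follows.
This is an illustrative variant, not the sample code of this
specification; the names adler32_deferred, ADLER_BASE and ADLER_NMAX
are assumptions. It relies on unsigned long holding at least 32 bits,
which the C language guarantees, and on the incoming checksum having
both halves below 65521.

```c
#include <stddef.h>

#define ADLER_BASE 65521UL
#define ADLER_NMAX 5552   /* bytes that may be summed between reductions */

/* Update a running Adler-32 checksum (initially 1) with
   buf[0..len-1], reducing modulo 65521 once per block of at most
   5552 bytes instead of once per byte. */
unsigned long adler32_deferred(unsigned long adler,
                               const unsigned char *buf, size_t len)
{
    unsigned long s1 = adler & 0xffff;
    unsigned long s2 = (adler >> 16) & 0xffff;

    while (len > 0) {
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;

        len -= chunk;
        while (chunk-- > 0) {
            s1 += *buf++;    /* no reduction inside the block */
            s2 += s1;
        }
        s1 %= ADLER_BASE;
        s2 %= ADLER_BASE;
    }
    return (s2 << 16) + s1;
}
```

For the bytes a, b, c = 97, 98, 99 the second sum is
3 x 97 + 2 x 98 + 99 + 3 = 589, matching the formula above.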

The sum s1 is initialized to 1 instead of zero to make the length
of the sequence part of s2, so that the length does not have to be
checked separately. (Any sequence of zeroes has a Fletcher
checksum of zero.)

9. Appendix: Sample code

The following C code computes the Adler-32 checksum of a data buffer.
It is written for clarity, not for speed. The sample code is in the
ANSI C programming language. Non-C users may find it easier to read
with these hints:

   &      Bitwise AND operator.
   >>     Bitwise right shift operator. When applied to an
          unsigned quantity, as here, right shift inserts zero bit(s)
          at the left.
   <<     Bitwise left shift operator. Left shift inserts zero
          bit(s) at the right.
   ++     "n++" increments the variable n.
   %      modulo operator: a % b is the remainder of a divided by b.

   #define BASE 65521 /* largest prime smaller than 65536 */

   /*
      Update a running Adler-32 checksum with the bytes buf[0..len-1]
      and return the updated checksum. The Adler-32 checksum should be
      initialized to 1.

      Usage example:

        unsigned long adler = 1L;

        while (read_buffer(buffer, length) != EOF) {
          adler = update_adler32(adler, buffer, length);
        }
        if (adler != original_adler) error();
   */
   unsigned long update_adler32(unsigned long adler,
      unsigned char *buf, int len)
   {
     unsigned long s1 = adler & 0xffff;
     unsigned long s2 = (adler >> 16) & 0xffff;
     int n;

     for (n = 0; n < len; n++) {
       s1 = (s1 + buf[n]) % BASE;
       s2 = (s2 + s1)     % BASE;
     }
     return (s2 << 16) + s1;
   }

   /* Return the adler32 of the bytes buf[0..len-1] */

   unsigned long adler32(unsigned char *buf, int len)
   {
     return update_adler32(1L, buf, len);
   }