Network Working Group                                         P. Deutsch
Request for Comments: 1950                           Aladdin Enterprises
Category: Informational                                      J-L. Gailly
                                                                Info-ZIP
                                                                May 1996


         ZLIB Compressed Data Format Specification version 3.3

Status of This Memo

This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.

IESG Note:

The IESG takes no position on the validity of any Intellectual
Property Rights statements contained in this document.

Notices

Copyright (c) 1996 L. Peter Deutsch and Jean-Loup Gailly

Permission is granted to copy and distribute this document for any
purpose and without charge, including translations into other
languages and incorporation into compilations, provided that the
copyright notice and this notice are preserved, and that any
substantive changes or deletions from the original are clearly
marked.

A pointer to the latest version of this and related documentation in
HTML format can be found at the URL
<ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

This specification defines a lossless compressed data format. The
data can be produced or consumed, even for an arbitrarily long
sequentially presented input data stream, using only an a priori
bounded amount of intermediate storage. The format presently uses
the DEFLATE compression method but can be easily extended to use
other compression methods. It can be implemented readily in a manner
not covered by patents. This specification also defines the ADLER-32
checksum (an extension and improvement of the Fletcher checksum),
used for detection of data corruption, and provides an algorithm for
computing it.

Deutsch & Gailly             Informational                      [Page 1]

RFC 1950       ZLIB Compressed Data Format Specification        May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. Data format
      2.3. Compliance
   3. References
   4. Source code
   5. Security Considerations
   6. Acknowledgements
   7. Authors' Addresses
   8. Appendix: Rationale
   9. Appendix: Sample code

1. Introduction

1.1. Purpose

The purpose of this specification is to define a lossless
compressed data format that:

   * Is independent of CPU type, operating system, file system,
     and character set, and hence can be used for interchange;

   * Can be produced or consumed, even for an arbitrarily long
     sequentially presented input data stream, using only an a
     priori bounded amount of intermediate storage, and hence can
     be used in data communications or similar structures such as
     Unix filters;

   * Can use a number of different compression methods;

   * Can be implemented readily in a manner not covered by
     patents, and hence can be practiced freely.

The data format defined by this specification does not attempt to
allow random access to compressed data.

1.2. Intended audience

This specification is intended for use by implementors of software
to compress data into zlib format and/or decompress data from zlib
format.

The text of the specification assumes a basic background in
programming at the level of bits and other primitive data
representations.

1.3. Scope

This specification defines a compressed data format that can be
used for in-memory compression of a sequence of arbitrary bytes.

1.4. Compliance

Unless otherwise indicated below, a compliant decompressor must be
able to accept and decompress any data set that conforms to all
the specifications presented here; a compliant compressor must
produce data sets that conform to all the specifications presented
here.

1.5. Definitions of terms and conventions used

byte: 8 bits stored or transmitted as a unit (same as an octet).
(For this specification, a byte is exactly 8 bits, even on
machines that store a character in a number of bits other than
8.) See below for the numbering of bits within a byte.

1.6. Changes from previous versions

Version 3.1 was the first public release of this specification.
In version 3.2, some terminology was changed and the Adler-32
sample code was rewritten for clarity. In version 3.3, the
support for a preset dictionary was introduced, and the
specification was converted to RFC style.

2. Detailed specification

2.1. Overall conventions

In the diagrams below, a box like this:

   +---+
   |   | <-- the vertical bars might be missing
   +---+

represents one byte; a box like this:

   +==============+
   |              |
   +==============+

represents a variable number of bytes.

Bytes stored within a computer do not have a "bit order", since
they are always treated as a unit. However, a byte considered as
an integer between 0 and 255 does have a most- and least-
significant bit, and since we write numbers with the most-
significant digit on the left, we also write bytes with the most-
significant bit on the left. In the diagrams below, we number the
bits of a byte so that bit 0 is the least-significant bit, i.e.,
the bits are numbered:

   +--------+
   |76543210|
   +--------+

Within a computer, a number may occupy multiple bytes. All
multi-byte numbers in the format described here are stored with
the MOST-significant byte first (at the lower memory address).
For example, the decimal number 520 is stored as:

       0        1
   +--------+--------+
   |00000010|00001000|
   +--------+--------+
    ^        ^
    |        |
    |        + less significant byte = 8
    + more significant byte = 2 x 256
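
The byte order described above can be illustrated with a short C
sketch. The helper names read_be16 and read_be32 are assumptions made
for this example only; they are not defined by this specification or
by any particular library. Each helper assembles a value from bytes
stored most-significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit unsigned value stored most-significant byte first.
   (Hypothetical helper, for illustration only.) */
uint16_t read_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Read a 32-bit unsigned value stored most-significant byte first.
   (Hypothetical helper, for illustration only.) */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Given the two bytes from the diagram, 0x02 followed by 0x08, read_be16
yields 2 x 256 + 8 = 520.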

2.2. Data format

A zlib stream has the following structure:

     0   1
   +---+---+
   |CMF|FLG|   (more-->)
   +---+---+

(if FLG.FDICT set)

     0   1   2   3
   +---+---+---+---+
   |     DICTID    |   (more-->)
   +---+---+---+---+

   +=====================+---+---+---+---+
   |...compressed data...|    ADLER32    |
   +=====================+---+---+---+---+

Any data which may appear after ADLER32 are not part of the zlib
stream.

CMF (Compression Method and flags)
   This byte is divided into a 4-bit compression method and a
   4-bit information field depending on the compression method.

      bits 0 to 3  CM     Compression method
      bits 4 to 7  CINFO  Compression info

CM (Compression method)
   This identifies the compression method used in the file. CM = 8
   denotes the "deflate" compression method with a window size up
   to 32K. This is the method used by gzip and PNG (see
   references [1] and [2] in Chapter 3, below, for the reference
   documents). CM = 15 is reserved. It might be used in a future
   version of this specification to indicate the presence of an
   extra field before the compressed data.

CINFO (Compression info)
   For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
   size, minus eight (CINFO=7 indicates a 32K window size). Values
   of CINFO above 7 are not allowed in this version of the
   specification. CINFO is not defined in this specification for
   CM not equal to 8.
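
As an illustration of the CMF layout, the following C sketch unpacks
CM and CINFO and derives the LZ77 window size. The function names are
hypothetical and chosen only for this example; returning 0 for a CMF
byte that this version of the specification does not allow is likewise
an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 0 to 3 of CMF: compression method. */
unsigned zlib_cm(uint8_t cmf)    { return cmf & 0x0Fu; }

/* Bits 4 to 7 of CMF: compression info. */
unsigned zlib_cinfo(uint8_t cmf) { return (cmf >> 4) & 0x0Fu; }

/* LZ77 window size in bytes for a deflate stream, or 0 if the CMF
   byte is not valid under this version of the specification. */
uint32_t zlib_window_size(uint8_t cmf)
{
    unsigned cinfo = zlib_cinfo(cmf);

    if (zlib_cm(cmf) != 8 || cinfo > 7)
        return 0;
    assert(cinfo + 8 <= 15);
    return (uint32_t)1 << (cinfo + 8);   /* log2(size) = CINFO + 8 */
}
```

For the common CMF value 0x78 (CM = 8, CINFO = 7) this gives a window
of 32768 bytes; for 0x08 (CINFO = 0) it gives 256 bytes.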

FLG (FLaGs)
   This flag byte is divided as follows:

      bits 0 to 4  FCHECK  (check bits for CMF and FLG)
      bit  5       FDICT   (preset dictionary)
      bits 6 to 7  FLEVEL  (compression level)

   The FCHECK value must be such that CMF and FLG, when viewed as
   a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
   is a multiple of 31.
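
The FCHECK rule can be sketched in C as follows. The function names
zlib_make_flg and zlib_fcheck_ok are assumptions made for this
example. The sketch fills in the smallest FCHECK value that makes
CMF*256 + FLG a multiple of 31, which is one straightforward way to
satisfy the rule.

```c
#include <assert.h>
#include <stdint.h>

/* Build a FLG byte from FLEVEL (0..3) and FDICT (zero or non-zero),
   choosing FCHECK so that CMF*256 + FLG is a multiple of 31. */
uint8_t zlib_make_flg(uint8_t cmf, unsigned flevel, int fdict)
{
    assert(flevel <= 3);
    /* Start with FCHECK = 0 and only FLEVEL and FDICT filled in. */
    unsigned flg = (flevel << 6) | (fdict ? 0x20u : 0u);
    unsigned rem = (cmf * 256u + flg) % 31u;

    flg += (31u - rem) % 31u;   /* result stays within bits 0 to 4 */
    return (uint8_t)flg;
}

/* Non-zero if the CMF/FLG pair passes the multiple-of-31 check. */
int zlib_fcheck_ok(uint8_t cmf, uint8_t flg)
{
    return (cmf * 256u + flg) % 31u == 0;
}
```

With CMF = 0x78, FLEVEL = 2 and FDICT clear, the sum before FCHECK is
30848, which leaves remainder 3 modulo 31, so FCHECK = 28 and
FLG = 0x9C.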

FDICT (Preset dictionary)
   If FDICT is set, a DICTID dictionary identifier is present
   immediately after the FLG byte. The dictionary is a sequence of
   bytes which are initially fed to the compressor without
   producing any compressed output. DICTID is the Adler-32 checksum
   of this sequence of bytes (see the definition of ADLER32
   below). The decompressor can use this identifier to determine
   which dictionary has been used by the compressor.

FLEVEL (Compression level)
   These flags are available for use by specific compression
   methods. The "deflate" method (CM = 8) sets these flags as
   follows:

      0 - compressor used fastest algorithm
      1 - compressor used fast algorithm
      2 - compressor used default algorithm
      3 - compressor used maximum compression, slowest algorithm

   The information in FLEVEL is not needed for decompression; it
   is there to indicate if recompression might be worthwhile.
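
The three FLG fields can be separated with simple shifts and masks, as
in the following C sketch. The struct and function names are
hypothetical, and the short level names are only labels for the four
FLEVEL values listed above.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of the FLG byte (hypothetical struct). */
struct zlib_flg {
    unsigned fcheck;  /* bits 0 to 4 */
    unsigned fdict;   /* bit 5 */
    unsigned flevel;  /* bits 6 to 7 */
};

/* Split a FLG byte into its fields. */
struct zlib_flg zlib_split_flg(uint8_t flg)
{
    struct zlib_flg f;

    f.fcheck = flg & 0x1Fu;
    f.fdict  = (flg >> 5) & 0x01u;
    f.flevel = (flg >> 6) & 0x03u;
    return f;
}

/* Short label for a deflate FLEVEL value (0..3). */
const char *zlib_flevel_name(unsigned flevel)
{
    static const char *const names[4] = {
        "fastest", "fast", "default", "maximum"
    };

    assert(flevel <= 3);
    return names[flevel];
}
```

For FLG = 0x9C this gives FCHECK = 28, FDICT = 0 and FLEVEL = 2, so a
decompressor would not expect a DICTID after the FLG byte.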

compressed data
   For compression method 8, the compressed data is stored in the
   deflate compressed data format as described in the document
   "DEFLATE Compressed Data Format Specification" by L. Peter
   Deutsch. (See reference [3] in Chapter 3, below.)

   Other compressed data formats are not specified in this version
   of the zlib specification.

ADLER32 (Adler-32 checksum)
   This contains a checksum value of the uncompressed data
   (excluding any dictionary data) computed according to the
   Adler-32 algorithm. This algorithm is a 32-bit extension and
   improvement of the Fletcher algorithm, used in the ITU-T X.224 /
   ISO 8073 standard. (See references [4] and [5] in Chapter 3,
   below.)

   Adler-32 is composed of two sums accumulated per byte: s1 is
   the sum of all bytes, s2 is the sum of all s1 values. Both sums
   are done modulo 65521. s1 is initialized to 1, s2 to zero. The
   Adler-32 checksum is stored as s2*65536 + s1 in
   most-significant-byte first (network) order.
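
As a small worked example, take an input consisting of the single byte
97 (ASCII "a"): s1 = 1 + 97 = 98 and s2 = 0 + 98 = 98, so the checksum
is 98*65536 + 98 = 0x00620062. The C sketch below shows how the two
sums might be written out as the four ADLER32 bytes. The function name
zlib_store_adler32 is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Store s2*65536 + s1 into out[0..3], most-significant byte first.
   Both sums are expected to be already reduced modulo 65521. */
void zlib_store_adler32(uint32_t s1, uint32_t s2, unsigned char out[4])
{
    assert(s1 < 65521u && s2 < 65521u);
    uint32_t adler = s2 * 65536u + s1;

    out[0] = (unsigned char)(adler >> 24);
    out[1] = (unsigned char)(adler >> 16);
    out[2] = (unsigned char)(adler >> 8);
    out[3] = (unsigned char)adler;
}
```

For the single-byte input above the stored bytes are 00 62 00 62.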

2.3. Compliance

A compliant compressor must produce streams with correct CMF, FLG
and ADLER32, but need not support preset dictionaries. When the
zlib data format is used as part of another standard data format,
the compressor may use only preset dictionaries that are specified
by this other data format. If this other format does not use the
preset dictionary feature, the compressor must not set the FDICT
flag.

A compliant decompressor must check CMF, FLG, and ADLER32, and
provide an error indication if any of these have incorrect values.
A compliant decompressor must give an error indication if CM is
not one of the values defined in this specification (only the
value 8 is permitted in this version), since another value could
indicate the presence of new features that would cause subsequent
data to be interpreted incorrectly. A compliant decompressor must
give an error indication if FDICT is set and DICTID is not the
identifier of a known preset dictionary. A decompressor may
ignore FLEVEL and still be compliant. When the zlib data format
is being used as a part of another standard format, a compliant
decompressor must support all the preset dictionaries specified by
the other format. When the other format does not use the preset
dictionary feature, a compliant decompressor must reject any
stream in which the FDICT flag is set.
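
The header checks required of a decompressor can be sketched in C as
below. This is one possible arrangement under stated assumptions, not
a definitive implementation: the status codes, the function name
zlib_check_header, the dictid_known callback and the order in which
the checks are made are all choices of this example. The ADLER32
check is not shown because it can only be made after decompression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum zlib_hdr_status {
    ZHDR_OK = 0,
    ZHDR_TRUNCATED,     /* fewer bytes than the header needs */
    ZHDR_BAD_FCHECK,    /* CMF*256 + FLG is not a multiple of 31 */
    ZHDR_BAD_METHOD,    /* CM is not 8 */
    ZHDR_BAD_CINFO,     /* CINFO is above 7 */
    ZHDR_UNKNOWN_DICT   /* FDICT set, DICTID not recognized */
};

/* Check the first bytes of a zlib stream. dictid_known is a
   hypothetical callback that returns non-zero for a supported
   DICTID; pass NULL when no preset dictionaries are supported. */
enum zlib_hdr_status zlib_check_header(const unsigned char *p, size_t n,
                                       int (*dictid_known)(uint32_t))
{
    assert(p != NULL || n == 0);
    if (n < 2)
        return ZHDR_TRUNCATED;

    unsigned cmf = p[0], flg = p[1];

    if ((cmf * 256u + flg) % 31u != 0)
        return ZHDR_BAD_FCHECK;
    if ((cmf & 0x0Fu) != 8)
        return ZHDR_BAD_METHOD;
    if ((cmf >> 4) > 7)
        return ZHDR_BAD_CINFO;
    if (flg & 0x20u) {                      /* FDICT is set */
        if (n < 6)
            return ZHDR_TRUNCATED;
        uint32_t dictid = ((uint32_t)p[2] << 24) | ((uint32_t)p[3] << 16) |
                          ((uint32_t)p[4] << 8)  |  (uint32_t)p[5];
        if (dictid_known == NULL || !dictid_known(dictid))
            return ZHDR_UNKNOWN_DICT;
    }
    return ZHDR_OK;                         /* FLEVEL is ignored */
}
```

Passing NULL for the callback models a containing format that does
not use preset dictionaries: any stream with FDICT set is rejected.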

3. References

[1] Deutsch, L.P., "GZIP Compressed Data Format Specification",
    available in ftp://ftp.uu.net/pub/archiving/zip/doc/

[2] Boutell, T., "PNG (Portable Network Graphics) specification",
    available in ftp://ftp.uu.net/graphics/png/documents/

[3] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
    available in ftp://ftp.uu.net/pub/archiving/zip/doc/

[4] Fletcher, J. G., "An Arithmetic Checksum for Serial
    Transmissions," IEEE Transactions on Communications, Vol. COM-30,
    No. 1, January 1982, pp. 247-252.

[5] ITU-T Recommendation X.224, Annex D, "Checksum Algorithms,"
    November, 1993, pp. 144, 145. (Available from
    gopher://info.itu.ch). ITU-T X.224 is the same as ISO 8073.

4. Source code

Source code for a C language implementation of a "zlib" compliant
library is available at ftp://ftp.uu.net/pub/archiving/zip/zlib/.

5. Security Considerations

A decoder that fails to check the ADLER32 checksum value may be
subject to undetected data corruption.

6. Acknowledgements

Trademarks cited in this document are the property of their
respective owners.

Jean-Loup Gailly and Mark Adler designed the zlib format and wrote
the related software described in this specification. Glenn
Randers-Pehrson converted this document to RFC and HTML format.

7. Authors' Addresses

L. Peter Deutsch
Aladdin Enterprises
203 Santa Margarita Ave.
Menlo Park, CA 94025

Phone: (415) 322-0103 (AM only)
FAX:   (415) 322-1734
EMail: <ghost@aladdin.com>

Jean-Loup Gailly

EMail: <gzip@prep.ai.mit.edu>

Questions about the technical content of this specification can be
sent by email to

   Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
   Mark Adler <madler@alumni.caltech.edu>

Editorial comments on this specification can be sent by email to

   L. Peter Deutsch <ghost@aladdin.com> and
   Glenn Randers-Pehrson <randeg@alumni.rpi.edu>

8. Appendix: Rationale

8.1. Preset dictionaries

A preset dictionary is especially useful to compress short input
sequences. The compressor can take advantage of the dictionary
context to encode the input in a more compact manner. The
decompressor can be initialized with the appropriate context by
virtually decompressing a compressed version of the dictionary
without producing any output. However, for certain compression
algorithms such as the deflate algorithm, this operation can be
achieved without actually performing any decompression.

The compressor and the decompressor must use exactly the same
dictionary. The dictionary may be fixed or may be chosen among a
certain number of predefined dictionaries, according to the kind
of input data. The decompressor can determine which dictionary has
been chosen by the compressor by checking the dictionary
identifier. This document does not specify the contents of
predefined dictionaries, since the optimal dictionaries are
application specific. Standard data formats using this feature of
the zlib specification must precisely define the allowed
dictionaries.
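
Selecting a dictionary by its identifier might look like the
following C sketch. The preset_dict struct, the table of predefined
dictionaries and both function names are hypothetical; a real
application would likely compute each identifier once rather than on
every lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One predefined dictionary (hypothetical struct). */
struct preset_dict {
    const unsigned char *bytes;
    size_t len;
};

/* Identifier of a dictionary: the Adler-32 checksum of its bytes,
   computed here in the simple per-byte form. */
uint32_t dict_id(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 1, s2 = 0;

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % 65521u;
        s2 = (s2 + s1) % 65521u;
    }
    return (s2 << 16) | s1;
}

/* Return the table entry whose identifier equals dictid, or NULL
   if no predefined dictionary matches. */
const struct preset_dict *find_dict(const struct preset_dict *table,
                                    size_t count, uint32_t dictid)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (dict_id(table[i].bytes, table[i].len) == dictid)
            return &table[i];
    return NULL;
}
```

A NULL result corresponds to the case in which a compliant
decompressor must give an error indication.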

8.2. The Adler-32 algorithm

The Adler-32 algorithm is much faster than the CRC32 algorithm yet
still provides an extremely low probability of undetected errors.

The modulo on unsigned long accumulators can be delayed for 5552
bytes, so the modulo operation time is negligible. If the bytes
are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
and order sensitive, unlike the first sum, which is just a
checksum. That 65521 is prime is important to avoid a possible
large class of two-byte errors that leave the check unchanged.
(The Fletcher checksum uses 255, which is not prime and which also
makes the Fletcher check insensitive to single byte changes
0 <-> 255.)
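
The delayed modulo mentioned above can be sketched in C as follows.
The function and macro names are assumptions made for this example,
and 32-bit accumulators are assumed; the sums are reduced only once
per block of at most 5552 bytes, which keeps them from overflowing 32
bits even when every byte is 255.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* largest prime smaller than 65536 */
#define ADLER_NMAX 5552u   /* bytes that can be summed before reducing */

/* Update a running Adler-32 checksum (initially 1) with buf[0..len-1],
   taking the modulo once per block instead of once per byte. */
uint32_t adler32_deferred(uint32_t adler, const unsigned char *buf,
                          size_t len)
{
    uint32_t s1 = adler & 0xFFFFu;
    uint32_t s2 = (adler >> 16) & 0xFFFFu;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t chunk = len < ADLER_NMAX ? len : ADLER_NMAX;

        len -= chunk;
        while (chunk-- > 0) {
            s1 += *buf++;      /* no modulo inside the block */
            s2 += s1;
        }
        s1 %= ADLER_BASE;      /* reduce once per block */
        s2 %= ADLER_BASE;
    }
    return (s2 << 16) | s1;
}
```

For the three bytes "abc" (97, 98, 99) this returns 0x024D0127: s1 is
1 + 97 + 98 + 99 = 295 and s2 is 3*97 + 2*98 + 99 + 3 = 589, matching
the formula given above.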

The sum s1 is initialized to 1 instead of zero to make the length
of the sequence part of s2, so that the length does not have to be
checked separately. (Any sequence of zeroes has a Fletcher
checksum of zero.)

9. Appendix: Sample code

The following C code computes the Adler-32 checksum of a data buffer.
It is written for clarity, not for speed. The sample code is in the
ANSI C programming language. Non-C users may find it easier to read
with these hints:

   &      Bitwise AND operator.
   >>     Bitwise right shift operator. When applied to an
          unsigned quantity, as here, right shift inserts zero bit(s)
          at the left.
   <<     Bitwise left shift operator. Left shift inserts zero
          bit(s) at the right.
   ++     "n++" increments the variable n.
   %      modulo operator: a % b is the remainder of a divided by b.

#define BASE 65521 /* largest prime smaller than 65536 */

/*
   Update a running Adler-32 checksum with the bytes buf[0..len-1]
   and return the updated checksum. The Adler-32 checksum should be
   initialized to 1.

   Usage example:

     unsigned long adler = 1L;

     while (read_buffer(buffer, length) != EOF) {
       adler = update_adler32(adler, buffer, length);
     }
     if (adler != original_adler) error();
*/
unsigned long update_adler32(unsigned long adler,
   unsigned char *buf, int len)
{
  unsigned long s1 = adler & 0xffff;
  unsigned long s2 = (adler >> 16) & 0xffff;
  int n;

  for (n = 0; n < len; n++) {
    s1 = (s1 + buf[n]) % BASE;
    s2 = (s2 + s1)     % BASE;
  }
  return (s2 << 16) + s1;
}

/* Return the adler32 of the bytes buf[0..len-1] */

unsigned long adler32(unsigned char *buf, int len)
{
  return update_adler32(1L, buf, len);
}