Context Navigation

source: Vago/Libs/zlib-1.2.8/examples/gzlog.c@ 1089

Last change on this file since 1089 was 1050, checked in by s10k, 8 years ago

File size: 40.5 KB

Rev	Line
[1050]	1	/*
	2	* gzlog.c
	3	* Copyright (C) 2004, 2008, 2012 Mark Adler, all rights reserved
	4	* For conditions of distribution and use, see copyright notice in gzlog.h
	5	* version 2.2, 14 Aug 2012
	6	*/
	7
	8	/*
	9	gzlog provides a mechanism for frequently appending short strings to a gzip
	10	file that is efficient both in execution time and compression ratio. The
	11	strategy is to write the short strings in an uncompressed form to the end of
	12	the gzip file, only compressing when the amount of uncompressed data has
	13	reached a given threshold.
	14
	15	gzlog also provides protection against interruptions in the process due to
	16	system crashes. The status of the operation is recorded in an extra field
	17	in the gzip file, and is only updated once the gzip file is brought to a
	18	valid state. The last data to be appended or compressed is saved in an
	19	auxiliary file, so that if the operation is interrupted, it can be completed
	20	the next time an append operation is attempted.
	21
	22	gzlog maintains another auxiliary file with the last 32K of data from the
	23	compressed portion, which is preloaded for the compression of the subsequent
	24	data. This minimizes the impact to the compression ratio of appending.
	25	*/
	26
	27	/*
	28	Operations Concept:
	29
	30	Files (log name "foo"):
	31	foo.gz -- gzip file with the complete log
	32	foo.add -- last message to append or last data to compress
	33	foo.dict -- dictionary of the last 32K of data for next compression
	34	foo.temp -- temporary dictionary file for compression after this one
	35	foo.lock -- lock file for reading and writing the other files
	36	foo.repairs -- log file for log file recovery operations (not compressed)
	37
	38	gzip file structure:
	39	- fixed-length (no file name) header with extra field (see below)
	40	- compressed data ending initially with empty stored block
	41	- uncompressed data filling out originally empty stored block and
	42	subsequent stored blocks as needed (16K max each)
	43	- gzip trailer
	44	- no junk at end (no other gzip streams)
	45
	46	When appending data, the information in the first three items above plus the
	47	foo.add file are sufficient to recover an interrupted append operation. The
	48	extra field has the necessary information to restore the start of the last
	49	stored block and determine where to append the data in the foo.add file, as
	50	well as the crc and length of the gzip data before the append operation.
	51
	52	The foo.add file is created before the gzip file is marked for append, and
	53	deleted after the gzip file is marked as complete. So if the append
	54	operation is interrupted, the data to add will still be there. If due to
	55	some external force, the foo.add file gets deleted between when the append
	56	operation was interrupted and when recovery is attempted, the gzip file will
	57	still be restored, but without the appended data.
	58
	59	When compressing data, the information in the first two items above plus the
	60	foo.add file are sufficient to recover an interrupted compress operation.
	61	The extra field has the necessary information to find the end of the
	62	compressed data, and contains both the crc and length of just the compressed
	63	data and of the complete set of data including the contents of the foo.add
	64	file.
	65
	66	Again, the foo.add file is maintained during the compress operation in case
	67	of an interruption. If in the unlikely event the foo.add file with the data
	68	to be compressed is missing due to some external force, a gzip file with
	69	just the previous compressed data will be reconstructed. In this case, all
	70	of the data that was to be compressed is lost (approximately one megabyte).
	71	This will not occur if all that happened was an interruption of the compress
	72	operation.
	73
	74	The third state that is marked is the replacement of the old dictionary with
	75	the new dictionary after a compress operation. Once compression is
	76	complete, the gzip file is marked as being in the replace state. This
	77	completes the gzip file, so an interrupt after being so marked does not
	78	result in recompression. Then the dictionary file is replaced, and the gzip
	79	file is marked as completed. This state prevents the possibility of
	80	restarting compression with the wrong dictionary file.
	81
	82	All three operations are wrapped by a lock/unlock procedure. In order to
	83	gain exclusive access to the log files, first a foo.lock file must be
	84	exclusively created. When all operations are complete, the lock is
	85	released by deleting the foo.lock file. If when attempting to create the
	86	lock file, it already exists and the modify time of the lock file is more
	87	than five minutes old (set by the PATIENCE define below), then the old
	88	lock file is considered stale and deleted, and the exclusive creation of
	89	the lock file is retried. To assure that there are no false assessments
	90	of the staleness of the lock file, the operations periodically touch the
	91	lock file to update the modified date.
	92
	93	Following is the definition of the extra field with all of the information
	94	required to enable the above append and compress operations and their
	95	recovery if interrupted. Multi-byte values are stored little endian
	96	(consistent with the gzip format). File pointers are eight bytes long.
	97	The crc's and lengths for the gzip trailer are four bytes long. (Note that
	98	the length at the end of a gzip file is used for error checking only, and
	99	for large files is actually the length modulo 2^32.) The stored block
	100	length is two bytes long. The gzip extra field two-byte identification is
	101	"ap" for append. It is assumed that writing the extra field to the file is
	102	an "atomic" operation. That is, either all of the extra field is written
	103	to the file, or none of it is, if the operation is interrupted right at the
	104	point of updating the extra field. This is a reasonable assumption, since
	105	the extra field is within the first 52 bytes of the file, which is smaller
	106	than any expected block size for a mass storage device (usually 512 bytes or
	107	larger).
	108
	109	Extra field (35 bytes):
	110	- Pointer to first stored block length -- this points to the two-byte length
	111	of the first stored block, which is followed by the two-byte, one's
	112	complement of that length. The stored block length is preceded by the
	113	three-bit header of the stored block, which is the actual start of the
	114	stored block in the deflate format. See the bit offset field below.
	115	- Pointer to the last stored block length. This is the same as above, but
	116	for the last stored block of the uncompressed data in the gzip file.
	117	Initially this is the same as the first stored block length pointer.
	118	When the stored block gets to 16K (see the MAX_STORE define), then a new
	119	stored block as added, at which point the last stored block length pointer
	120	is different from the first stored block length pointer. When they are
	121	different, the first bit of the last stored block header is eight bits, or
	122	one byte back from the block length.
	123	- Compressed data crc and length. This is the crc and length of the data
	124	that is in the compressed portion of the deflate stream. These are used
	125	only in the event that the foo.add file containing the data to compress is
	126	lost after a compress operation is interrupted.
	127	- Total data crc and length. This is the crc and length of all of the data
	128	stored in the gzip file, compressed and uncompressed. It is used to
	129	reconstruct the gzip trailer when compressing, as well as when recovering
	130	interrupted operations.
	131	- Final stored block length. This is used to quickly find where to append,
	132	and allows the restoration of the original final stored block state when
	133	an append operation is interrupted.
	134	- First stored block start as the number of bits back from the final stored
	135	block first length byte. This value is in the range of 3..10, and is
	136	stored as the low three bits of the final byte of the extra field after
	137	subtracting three (0..7). This allows the last-block bit of the stored
	138	block header to be updated when a new stored block is added, for the case
	139	when the first stored block and the last stored block are the same. (When
	140	they are different, the numbers of bits back is known to be eight.) This
	141	also allows for new compressed data to be appended to the old compressed
	142	data in the compress operation, overwriting the previous first stored
	143	block, or for the compressed data to be terminated and a valid gzip file
	144	reconstructed on the off chance that a compression operation was
	145	interrupted and the data to compress in the foo.add file was deleted.
	146	- The operation in process. This is the next two bits in the last byte (the
	147	bits under the mask 0x18). The are interpreted as 0: nothing in process,
	148	1: append in process, 2: compress in process, 3: replace in process.
	149	- The top three bits of the last byte in the extra field are reserved and
	150	are currently set to zero.
	151
	152	Main procedure:
	153	- Exclusively create the foo.lock file using the O_CREAT and O_EXCL modes of
	154	the system open() call. If the modify time of an existing lock file is
	155	more than PATIENCE seconds old, then the lock file is deleted and the
	156	exclusive create is retried.
	157	- Load the extra field from the foo.gz file, and see if an operation was in
	158	progress but not completed. If so, apply the recovery procedure below.
	159	- Perform the append procedure with the provided data.
	160	- If the uncompressed data in the foo.gz file is 1MB or more, apply the
	161	compress procedure.
	162	- Delete the foo.lock file.
	163
	164	Append procedure:
	165	- Put what to append in the foo.add file so that the operation can be
	166	restarted if this procedure is interrupted.
	167	- Mark the foo.gz extra field with the append operation in progress.
	168	+ Restore the original last-block bit and stored block length of the last
	169	stored block from the information in the extra field, in case a previous
	170	append operation was interrupted.
	171	- Append the provided data to the last stored block, creating new stored
	172	blocks as needed and updating the stored blocks last-block bits and
	173	lengths.
	174	- Update the crc and length with the new data, and write the gzip trailer.
	175	- Write over the extra field (with a single write operation) with the new
	176	pointers, lengths, and crc's, and mark the gzip file as not in process.
	177	Though there is still a foo.add file, it will be ignored since nothing
	178	is in process. If a foo.add file is leftover from a previously
	179	completed operation, it is truncated when writing new data to it.
	180	- Delete the foo.add file.
	181
	182	Compress and replace procedures:
	183	- Read all of the uncompressed data in the stored blocks in foo.gz and write
	184	it to foo.add. Also write foo.temp with the last 32K of that data to
	185	provide a dictionary for the next invocation of this procedure.
	186	- Rewrite the extra field marking foo.gz with a compression in process.
	187	* If there is no data provided to compress (due to a missing foo.add file
	188	when recovering), reconstruct and truncate the foo.gz file to contain
	189	only the previous compressed data and proceed to the step after the next
	190	one. Otherwise ...
	191	- Compress the data with the dictionary in foo.dict, and write to the
	192	foo.gz file starting at the bit immediately following the last previously
	193	compressed block. If there is no foo.dict, proceed anyway with the
	194	compression at slightly reduced efficiency. (For the foo.dict file to be
	195	missing requires some external failure beyond simply the interruption of
	196	a compress operation.) During this process, the foo.lock file is
	197	periodically touched to assure that that file is not considered stale by
	198	another process before we're done. The deflation is terminated with a
	199	non-last empty static block (10 bits long), that is then located and
	200	written over by a last-bit-set empty stored block.
	201	- Append the crc and length of the data in the gzip file (previously
	202	calculated during the append operations).
	203	- Write over the extra field with the updated stored block offsets, bits
	204	back, crc's, and lengths, and mark foo.gz as in process for a replacement
	205	of the dictionary.
	206	@ Delete the foo.add file.
	207	- Replace foo.dict with foo.temp.
	208	- Write over the extra field, marking foo.gz as complete.
	209
	210	Recovery procedure:
	211	- If not a replace recovery, read in the foo.add file, and provide that data
	212	to the appropriate recovery below. If there is no foo.add file, provide
	213	a zero data length to the recovery. In that case, the append recovery
	214	restores the foo.gz to the previous compressed + uncompressed data state.
	215	For the the compress recovery, a missing foo.add file results in foo.gz
	216	being restored to the previous compressed-only data state.
	217	- Append recovery:
	218	- Pick up append at + step above
	219	- Compress recovery:
	220	- Pick up compress at * step above
	221	- Replace recovery:
	222	- Pick up compress at @ step above
	223	- Log the repair with a date stamp in foo.repairs
	224	*/
	225
	226	#include <sys/types.h>
	227	#include <stdio.h> /* rename, fopen, fprintf, fclose */
	228	#include <stdlib.h> /* malloc, free */
	229	#include <string.h> /* strlen, strrchr, strcpy, strncpy, strcmp */
	230	#include <fcntl.h> /* open */
	231	#include <unistd.h> /* lseek, read, write, close, unlink, sleep, */
	232	/* ftruncate, fsync */
	233	#include <errno.h> /* errno */
	234	#include <time.h> /* time, ctime */
	235	#include <sys/stat.h> /* stat */
	236	#include <sys/time.h> /* utimes */
	237	#include "zlib.h" /* crc32 */
	238
	239	#include "gzlog.h" /* header for external access */
	240
	241	#define local static
	242	typedef unsigned int uint;
	243	typedef unsigned long ulong;
	244
	245	/* Macro for debugging to deterministically force recovery operations */
	246	#ifdef DEBUG
	247	#include <setjmp.h> /* longjmp */
	248	jmp_buf gzlog_jump; /* where to go back to */
	249	int gzlog_bail = 0; /* which point to bail at (1..8) */
	250	int gzlog_count = -1; /* number of times through to wait */
	251	# define BAIL(n) do { if (n == gzlog_bail && gzlog_count-- == 0) \
	252	longjmp(gzlog_jump, gzlog_bail); } while (0)
	253	#else
	254	# define BAIL(n)
	255	#endif
	256
	257	/* how old the lock file can be in seconds before considering it stale */
	258	#define PATIENCE 300
	259
	260	/* maximum stored block size in Kbytes -- must be in 1..63 */
	261	#define MAX_STORE 16
	262
	263	/* number of stored Kbytes to trigger compression (must be >= 32 to allow
	264	dictionary construction, and <= 204 * MAX_STORE, in order for >> 10 to
	265	discard the stored block headers contribution of five bytes each) */
	266	#define TRIGGER 1024
	267
	268	/* size of a deflate dictionary (this cannot be changed) */
	269	#define DICT 32768U
	270
	271	/* values for the operation (2 bits) */
	272	#define NO_OP 0
	273	#define APPEND_OP 1
	274	#define COMPRESS_OP 2
	275	#define REPLACE_OP 3
	276
	277	/* macros to extract little-endian integers from an unsigned byte buffer */
	278	#define PULL2(p) ((p)[0]+((uint)((p)[1])<<8))
	279	#define PULL4(p) (PULL2(p)+((ulong)PULL2(p+2)<<16))
	280	#define PULL8(p) (PULL4(p)+((off_t)PULL4(p+4)<<32))
	281
	282	/* macros to store integers into a byte buffer in little-endian order */
	283	#define PUT2(p,a) do {(p)[0]=a;(p)[1]=(a)>>8;} while(0)
	284	#define PUT4(p,a) do {PUT2(p,a);PUT2(p+2,a>>16);} while(0)
	285	#define PUT8(p,a) do {PUT4(p,a);PUT4(p+4,a>>32);} while(0)
	286
	287	/* internal structure for log information */
	288	#define LOGID "\106\035\172" /* should be three non-zero characters */
	289	struct log {
	290	char id[4]; /* contains LOGID to detect inadvertent overwrites */
	291	int fd; /* file descriptor for .gz file, opened read/write */
	292	char path; / allocated path, e.g. "/var/log/foo" or "foo" */
	293	char end; / end of path, for appending suffices such as ".gz" */
	294	off_t first; /* offset of first stored block first length byte */
	295	int back; /* location of first block id in bits back from first */
	296	uint stored; /* bytes currently in last stored block */
	297	off_t last; /* offset of last stored block first length byte */
	298	ulong ccrc; /* crc of compressed data */
	299	ulong clen; /* length (modulo 2^32) of compressed data */
	300	ulong tcrc; /* crc of total data */
	301	ulong tlen; /* length (modulo 2^32) of total data */
	302	time_t lock; /* last modify time of our lock file */
	303	};
	304
	305	/* gzip header for gzlog */
	306	local unsigned char log_gzhead[] = {
	307	0x1f, 0x8b, /* magic gzip id */
	308	8, /* compression method is deflate */
	309	4, /* there is an extra field (no file name) */
	310	0, 0, 0, 0, /* no modification time provided */
	311	0, 0xff, /* no extra flags, no OS specified */
	312	39, 0, 'a', 'p', 35, 0 /* extra field with "ap" subfield */
	313	/* 35 is EXTRA, 39 is EXTRA + 4 */
	314	};
	315
	316	#define HEAD sizeof(log_gzhead) /* should be 16 */
	317
	318	/* initial gzip extra field content (52 == HEAD + EXTRA + 1) */
	319	local unsigned char log_gzext[] = {
	320	52, 0, 0, 0, 0, 0, 0, 0, /* offset of first stored block length */
	321	52, 0, 0, 0, 0, 0, 0, 0, /* offset of last stored block length */
	322	0, 0, 0, 0, 0, 0, 0, 0, /* compressed data crc and length */
	323	0, 0, 0, 0, 0, 0, 0, 0, /* total data crc and length */
	324	0, 0, /* final stored block data length */
	325	5 /* op is NO_OP, last bit 8 bits back */
	326	};
	327
	328	#define EXTRA sizeof(log_gzext) /* should be 35 */
	329
	330	/* initial gzip data and trailer */
	331	local unsigned char log_gzbody[] = {
	332	1, 0, 0, 0xff, 0xff, /* empty stored block (last) */
	333	0, 0, 0, 0, /* crc */
	334	0, 0, 0, 0 /* uncompressed length */
	335	};
	336
	337	#define BODY sizeof(log_gzbody)
	338
	339	/* Exclusively create foo.lock in order to negotiate exclusive access to the
	340	foo.* files. If the modify time of an existing lock file is greater than
	341	PATIENCE seconds in the past, then consider the lock file to have been
	342	abandoned, delete it, and try the exclusive create again. Save the lock
	343	file modify time for verification of ownership. Return 0 on success, or -1
	344	on failure, usually due to an access restriction or invalid path. Note that
	345	if stat() or unlink() fails, it may be due to another process noticing the
	346	abandoned lock file a smidge sooner and deleting it, so those are not
	347	flagged as an error. */
	348	local int log_lock(struct log *log)
	349	{
	350	int fd;
	351	struct stat st;
	352
	353	strcpy(log->end, ".lock");
	354	while ((fd = open(log->path, O_CREAT \| O_EXCL, 0644)) < 0) {
	355	if (errno != EEXIST)
	356	return -1;
	357	if (stat(log->path, &st) == 0 && time(NULL) - st.st_mtime > PATIENCE) {
	358	unlink(log->path);
	359	continue;
	360	}
	361	sleep(2); /* relinquish the CPU for two seconds while waiting */
	362	}
	363	close(fd);
	364	if (stat(log->path, &st) == 0)
	365	log->lock = st.st_mtime;
	366	return 0;
	367	}
	368
	369	/* Update the modify time of the lock file to now, in order to prevent another
	370	task from thinking that the lock is stale. Save the lock file modify time
	371	for verification of ownership. */
	372	local void log_touch(struct log *log)
	373	{
	374	struct stat st;
	375
	376	strcpy(log->end, ".lock");
	377	utimes(log->path, NULL);
	378	if (stat(log->path, &st) == 0)
	379	log->lock = st.st_mtime;
	380	}
	381
	382	/* Check the log file modify time against what is expected. Return true if
	383	this is not our lock. If it is our lock, touch it to keep it. */
	384	local int log_check(struct log *log)
	385	{
	386	struct stat st;
	387
	388	strcpy(log->end, ".lock");
	389	if (stat(log->path, &st) \|\| st.st_mtime != log->lock)
	390	return 1;
	391	log_touch(log);
	392	return 0;
	393	}
	394
	395	/* Unlock a previously acquired lock, but only if it's ours. */
	396	local void log_unlock(struct log *log)
	397	{
	398	if (log_check(log))
	399	return;
	400	strcpy(log->end, ".lock");
	401	unlink(log->path);
	402	log->lock = 0;
	403	}
	404
	405	/* Check the gzip header and read in the extra field, filling in the values in
	406	the log structure. Return op on success or -1 if the gzip header was not as
	407	expected. op is the current operation in progress last written to the extra
	408	field. This assumes that the gzip file has already been opened, with the
	409	file descriptor log->fd. */
	410	local int log_head(struct log *log)
	411	{
	412	int op;
	413	unsigned char buf[HEAD + EXTRA];
	414
	415	if (lseek(log->fd, 0, SEEK_SET) < 0 \|\|
	416	read(log->fd, buf, HEAD + EXTRA) != HEAD + EXTRA \|\|
	417	memcmp(buf, log_gzhead, HEAD)) {
	418	return -1;
	419	}
	420	log->first = PULL8(buf + HEAD);
	421	log->last = PULL8(buf + HEAD + 8);
	422	log->ccrc = PULL4(buf + HEAD + 16);
	423	log->clen = PULL4(buf + HEAD + 20);
	424	log->tcrc = PULL4(buf + HEAD + 24);
	425	log->tlen = PULL4(buf + HEAD + 28);
	426	log->stored = PULL2(buf + HEAD + 32);
	427	log->back = 3 + (buf[HEAD + 34] & 7);
	428	op = (buf[HEAD + 34] >> 3) & 3;
	429	return op;
	430	}
	431
	432	/* Write over the extra field contents, marking the operation as op. Use fsync
	433	to assure that the device is written to, and in the requested order. This
	434	operation, and only this operation, is assumed to be atomic in order to
	435	assure that the log is recoverable in the event of an interruption at any
	436	point in the process. Return -1 if the write to foo.gz failed. */
	437	local int log_mark(struct log *log, int op)
	438	{
	439	int ret;
	440	unsigned char ext[EXTRA];
	441
	442	PUT8(ext, log->first);
	443	PUT8(ext + 8, log->last);
	444	PUT4(ext + 16, log->ccrc);
	445	PUT4(ext + 20, log->clen);
	446	PUT4(ext + 24, log->tcrc);
	447	PUT4(ext + 28, log->tlen);
	448	PUT2(ext + 32, log->stored);
	449	ext[34] = log->back - 3 + (op << 3);
	450	fsync(log->fd);
	451	ret = lseek(log->fd, HEAD, SEEK_SET) < 0 \|\|
	452	write(log->fd, ext, EXTRA) != EXTRA ? -1 : 0;
	453	fsync(log->fd);
	454	return ret;
	455	}
	456
	457	/* Rewrite the last block header bits and subsequent zero bits to get to a byte
	458	boundary, setting the last block bit if last is true, and then write the
	459	remainder of the stored block header (length and one's complement). Leave
	460	the file pointer after the end of the last stored block data. Return -1 if
	461	there is a read or write failure on the foo.gz file */
	462	local int log_last(struct log *log, int last)
	463	{
	464	int back, len, mask;
	465	unsigned char buf[6];
	466
	467	/* determine the locations of the bytes and bits to modify */
	468	back = log->last == log->first ? log->back : 8;
	469	len = back > 8 ? 2 : 1; /* bytes back from log->last */
	470	mask = 0x80 >> ((back - 1) & 7); /* mask for block last-bit */
	471
	472	/* get the byte to modify (one or two back) into buf[0] -- don't need to
	473	read the byte if the last-bit is eight bits back, since in that case
	474	the entire byte will be modified */
	475	buf[0] = 0;
	476	if (back != 8 && (lseek(log->fd, log->last - len, SEEK_SET) < 0 \|\|
	477	read(log->fd, buf, 1) != 1))
	478	return -1;
	479
	480	/* change the last-bit of the last stored block as requested -- note
	481	that all bits above the last-bit are set to zero, per the type bits
	482	of a stored block being 00 and per the convention that the bits to
	483	bring the stream to a byte boundary are also zeros */
	484	buf[1] = 0;
	485	buf[2 - len] = (*buf & (mask - 1)) + (last ? mask : 0);
	486
	487	/* write the modified stored block header and lengths, move the file
	488	pointer to after the last stored block data */
	489	PUT2(buf + 2, log->stored);
	490	PUT2(buf + 4, log->stored ^ 0xffff);
	491	return lseek(log->fd, log->last - len, SEEK_SET) < 0 \|\|
	492	write(log->fd, buf + 2 - len, len + 4) != len + 4 \|\|
	493	lseek(log->fd, log->stored, SEEK_CUR) < 0 ? -1 : 0;
	494	}
	495
	496	/* Append len bytes from data to the locked and open log file. len may be zero
	497	if recovering and no .add file was found. In that case, the previous state
	498	of the foo.gz file is restored. The data is appended uncompressed in
	499	deflate stored blocks. Return -1 if there was an error reading or writing
	500	the foo.gz file. */
	501	local int log_append(struct log log, unsigned char data, size_t len)
	502	{
	503	uint put;
	504	off_t end;
	505	unsigned char buf[8];
	506
	507	/* set the last block last-bit and length, in case recovering an
	508	interrupted append, then position the file pointer to append to the
	509	block */
	510	if (log_last(log, 1))
	511	return -1;
	512
	513	/* append, adding stored blocks and updating the offset of the last stored
	514	block as needed, and update the total crc and length */
	515	while (len) {
	516	/* append as much as we can to the last block */
	517	put = (MAX_STORE << 10) - log->stored;
	518	if (put > len)
	519	put = (uint)len;
	520	if (put) {
	521	if (write(log->fd, data, put) != put)
	522	return -1;
	523	BAIL(1);
	524	log->tcrc = crc32(log->tcrc, data, put);
	525	log->tlen += put;
	526	log->stored += put;
	527	data += put;
	528	len -= put;
	529	}
	530
	531	/* if we need to, add a new empty stored block */
	532	if (len) {
	533	/* mark current block as not last */
	534	if (log_last(log, 0))
	535	return -1;
	536
	537	/* point to new, empty stored block */
	538	log->last += 4 + log->stored + 1;
	539	log->stored = 0;
	540	}
	541
	542	/* mark last block as last, update its length */
	543	if (log_last(log, 1))
	544	return -1;
	545	BAIL(2);
	546	}
	547
	548	/* write the new crc and length trailer, and truncate just in case (could
	549	be recovering from partial append with a missing foo.add file) */
	550	PUT4(buf, log->tcrc);
	551	PUT4(buf + 4, log->tlen);
	552	if (write(log->fd, buf, 8) != 8 \|\|
	553	(end = lseek(log->fd, 0, SEEK_CUR)) < 0 \|\| ftruncate(log->fd, end))
	554	return -1;
	555
	556	/* write the extra field, marking the log file as done, delete .add file */
	557	if (log_mark(log, NO_OP))
	558	return -1;
	559	strcpy(log->end, ".add");
	560	unlink(log->path); /* ignore error, since may not exist */
	561	return 0;
	562	}
	563
	564	/* Replace the foo.dict file with the foo.temp file. Also delete the foo.add
	565	file, since the compress operation may have been interrupted before that was
	566	done. Returns 1 if memory could not be allocated, or -1 if reading or
	567	writing foo.gz fails, or if the rename fails for some reason other than
	568	foo.temp not existing. foo.temp not existing is a permitted error, since
	569	the replace operation may have been interrupted after the rename is done,
	570	but before foo.gz is marked as complete. */
	571	local int log_replace(struct log *log)
	572	{
	573	int ret;
	574	char *dest;
	575
	576	/* delete foo.add file */
	577	strcpy(log->end, ".add");
	578	unlink(log->path); /* ignore error, since may not exist */
	579	BAIL(3);
	580
	581	/* rename foo.name to foo.dict, replacing foo.dict if it exists */
	582	strcpy(log->end, ".dict");
	583	dest = malloc(strlen(log->path) + 1);
	584	if (dest == NULL)
	585	return -2;
	586	strcpy(dest, log->path);
	587	strcpy(log->end, ".temp");
	588	ret = rename(log->path, dest);
	589	free(dest);
	590	if (ret && errno != ENOENT)
	591	return -1;
	592	BAIL(4);
	593
	594	/* mark the foo.gz file as done */
	595	return log_mark(log, NO_OP);
	596	}
	597
	598	/* Compress the len bytes at data and append the compressed data to the
	599	foo.gz deflate data immediately after the previous compressed data. This
	600	overwrites the previous uncompressed data, which was stored in foo.add
	601	and is the data provided in data[0..len-1]. If this operation is
	602	interrupted, it picks up at the start of this routine, with the foo.add
	603	file read in again. If there is no data to compress (len == 0), then we
	604	simply terminate the foo.gz file after the previously compressed data,
	605	appending a final empty stored block and the gzip trailer. Return -1 if
	606	reading or writing the log.gz file failed, or -2 if there was a memory
	607	allocation failure. */
	608	local int log_compress(struct log log, unsigned char data, size_t len)
	609	{
	610	int fd;
	611	uint got, max;
	612	ssize_t dict;
	613	off_t end;
	614	z_stream strm;
	615	unsigned char buf[DICT];
	616
	617	/* compress and append compressed data */
	618	if (len) {
	619	/* set up for deflate, allocating memory */
	620	strm.zalloc = Z_NULL;
	621	strm.zfree = Z_NULL;
	622	strm.opaque = Z_NULL;
	623	if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED, -15, 8,
	624	Z_DEFAULT_STRATEGY) != Z_OK)
	625	return -2;
	626
	627	/* read in dictionary (last 32K of data that was compressed) */
	628	strcpy(log->end, ".dict");
	629	fd = open(log->path, O_RDONLY, 0);
	630	if (fd >= 0) {
	631	dict = read(fd, buf, DICT);
	632	close(fd);
	633	if (dict < 0) {
	634	deflateEnd(&strm);
	635	return -1;
	636	}
	637	if (dict)
	638	deflateSetDictionary(&strm, buf, (uint)dict);
	639	}
	640	log_touch(log);
	641
	642	/* prime deflate with last bits of previous block, position write
	643	pointer to write those bits and overwrite what follows */
	644	if (lseek(log->fd, log->first - (log->back > 8 ? 2 : 1),
	645	SEEK_SET) < 0 \|\|
	646	read(log->fd, buf, 1) != 1 \|\| lseek(log->fd, -1, SEEK_CUR) < 0) {
	647	deflateEnd(&strm);
	648	return -1;
	649	}
	650	deflatePrime(&strm, (8 - log->back) & 7, *buf);
	651
	652	/* compress, finishing with a partial non-last empty static block */
	653	strm.next_in = data;
	654	max = (((uint)0 - 1) >> 1) + 1; /* in case int smaller than size_t */
	655	do {
	656	strm.avail_in = len > max ? max : (uint)len;
	657	len -= strm.avail_in;
	658	do {
	659	strm.avail_out = DICT;
	660	strm.next_out = buf;
	661	deflate(&strm, len ? Z_NO_FLUSH : Z_PARTIAL_FLUSH);
	662	got = DICT - strm.avail_out;
	663	if (got && write(log->fd, buf, got) != got) {
	664	deflateEnd(&strm);
	665	return -1;
	666	}
	667	log_touch(log);
	668	} while (strm.avail_out == 0);
	669	} while (len);
	670	deflateEnd(&strm);
	671	BAIL(5);
	672
	673	/* find start of empty static block -- scanning backwards the first one
	674	bit is the second bit of the block, if the last byte is zero, then
	675	we know the byte before that has a one in the top bit, since an
	676	empty static block is ten bits long */
	677	if ((log->first = lseek(log->fd, -1, SEEK_CUR)) < 0 \|\|
	678	read(log->fd, buf, 1) != 1)
	679	return -1;
	680	log->first++;
	681	if (*buf) {
	682	log->back = 1;
	683	while ((*buf & ((uint)1 << (8 - log->back++))) == 0)
	684	; /* guaranteed to terminate, since buf != 0 /
	685	}
	686	else
	687	log->back = 10;
	688
	689	/* update compressed crc and length */
	690	log->ccrc = log->tcrc;
	691	log->clen = log->tlen;
	692	}
	693	else {
	694	/* no data to compress -- fix up existing gzip stream */
	695	log->tcrc = log->ccrc;
	696	log->tlen = log->clen;
	697	}
	698
	699	/* complete and truncate gzip stream */
	700	log->last = log->first;
	701	log->stored = 0;
	702	PUT4(buf, log->tcrc);
	703	PUT4(buf + 4, log->tlen);
	704	if (log_last(log, 1) \|\| write(log->fd, buf, 8) != 8 \|\|
	705	(end = lseek(log->fd, 0, SEEK_CUR)) < 0 \|\| ftruncate(log->fd, end))
	706	return -1;
	707	BAIL(6);
	708
	709	/* mark as being in the replace operation */
	710	if (log_mark(log, REPLACE_OP))
	711	return -1;
	712
	713	/* execute the replace operation and mark the file as done */
	714	return log_replace(log);
	715	}
	716
	717	/* log a repair record to the .repairs file */
	718	local void log_log(struct log log, int op, char record)
	719	{
	720	time_t now;
	721	FILE *rec;
	722
	723	now = time(NULL);
	724	strcpy(log->end, ".repairs");
	725	rec = fopen(log->path, "a");
	726	if (rec == NULL)
	727	return;
	728	fprintf(rec, "%.24s %s recovery: %s\n", ctime(&now), op == APPEND_OP ?
	729	"append" : (op == COMPRESS_OP ? "compress" : "replace"), record);
	730	fclose(rec);
	731	return;
	732	}
	733
	734	/* Recover the interrupted operation op. First read foo.add for recovering an
	735	append or compress operation. Return -1 if there was an error reading or
	736	writing foo.gz or reading an existing foo.add, or -2 if there was a memory
	737	allocation failure. */
	738	local int log_recover(struct log *log, int op)
	739	{
	740	int fd, ret = 0;
	741	unsigned char *data = NULL;
	742	size_t len = 0;
	743	struct stat st;
	744
	745	/* log recovery */
	746	log_log(log, op, "start");
	747
	748	/* load foo.add file if expected and present */
	749	if (op == APPEND_OP \|\| op == COMPRESS_OP) {
	750	strcpy(log->end, ".add");
	751	if (stat(log->path, &st) == 0 && st.st_size) {
	752	len = (size_t)(st.st_size);
	753	if ((off_t)len != st.st_size \|\|
	754	(data = malloc(st.st_size)) == NULL) {
	755	log_log(log, op, "allocation failure");
	756	return -2;
	757	}
	758	if ((fd = open(log->path, O_RDONLY, 0)) < 0) {
	759	log_log(log, op, ".add file read failure");
	760	return -1;
	761	}
	762	ret = (size_t)read(fd, data, len) != len;
	763	close(fd);
	764	if (ret) {
	765	log_log(log, op, ".add file read failure");
	766	return -1;
	767	}
	768	log_log(log, op, "loaded .add file");
	769	}
	770	else
	771	log_log(log, op, "missing .add file!");
	772	}
	773
	774	/* recover the interrupted operation */
	775	switch (op) {
	776	case APPEND_OP:
	777	ret = log_append(log, data, len);
	778	break;
	779	case COMPRESS_OP:
	780	ret = log_compress(log, data, len);
	781	break;
	782	case REPLACE_OP:
	783	ret = log_replace(log);
	784	}
	785
	786	/* log status */
	787	log_log(log, op, ret ? "failure" : "complete");
	788
	789	/* clean up */
	790	if (data != NULL)
	791	free(data);
	792	return ret;
	793	}
	794
	795	/* Close the foo.gz file (if open) and release the lock. */
	796	local void log_close(struct log *log)
	797	{
	798	if (log->fd >= 0)
	799	close(log->fd);
	800	log->fd = -1;
	801	log_unlock(log);
	802	}
	803
	804	/* Open foo.gz, verify the header, and load the extra field contents, after
	805	first creating the foo.lock file to gain exclusive access to the foo.*
	806	files. If foo.gz does not exist or is empty, then write the initial header,
	807	extra, and body content of an empty foo.gz log file. If there is an error
	808	creating the lock file due to access restrictions, or an error reading or
	809	writing the foo.gz file, or if the foo.gz file is not a proper log file for
	810	this object (e.g. not a gzip file or does not contain the expected extra
	811	field), then return true. If there is an error, the lock is released.
	812	Otherwise, the lock is left in place. */
	813	local int log_open(struct log *log)
	814	{
	815	int op;
	816
	817	/* release open file resource if left over -- can occur if lock lost
	818	between gzlog_open() and gzlog_write() */
	819	if (log->fd >= 0)
	820	close(log->fd);
	821	log->fd = -1;
	822
	823	/* negotiate exclusive access */
	824	if (log_lock(log) < 0)
	825	return -1;
	826
	827	/* open the log file, foo.gz */
	828	strcpy(log->end, ".gz");
	829	log->fd = open(log->path, O_RDWR \| O_CREAT, 0644);
	830	if (log->fd < 0) {
	831	log_close(log);
	832	return -1;
	833	}
	834
	835	/* if new, initialize foo.gz with an empty log, delete old dictionary */
	836	if (lseek(log->fd, 0, SEEK_END) == 0) {
	837	if (write(log->fd, log_gzhead, HEAD) != HEAD \|\|
	838	write(log->fd, log_gzext, EXTRA) != EXTRA \|\|
	839	write(log->fd, log_gzbody, BODY) != BODY) {
	840	log_close(log);
	841	return -1;
	842	}
	843	strcpy(log->end, ".dict");
	844	unlink(log->path);
	845	}
	846
	847	/* verify log file and load extra field information */
	848	if ((op = log_head(log)) < 0) {
	849	log_close(log);
	850	return -1;
	851	}
	852
	853	/* check for interrupted process and if so, recover */
	854	if (op != NO_OP && log_recover(log, op)) {
	855	log_close(log);
	856	return -1;
	857	}
	858
	859	/* touch the lock file to prevent another process from grabbing it */
	860	log_touch(log);
	861	return 0;
	862	}
	863
	864	/* See gzlog.h for the description of the external methods below */
	865	gzlog gzlog_open(char path)
	866	{
	867	size_t n;
	868	struct log *log;
	869
	870	/* check arguments */
	871	if (path == NULL \|\| *path == 0)
	872	return NULL;
	873
	874	/* allocate and initialize log structure */
	875	log = malloc(sizeof(struct log));
	876	if (log == NULL)
	877	return NULL;
	878	strcpy(log->id, LOGID);
	879	log->fd = -1;
	880
	881	/* save path and end of path for name construction */
	882	n = strlen(path);
	883	log->path = malloc(n + 9); /* allow for ".repairs" */
	884	if (log->path == NULL) {
	885	free(log);
	886	return NULL;
	887	}
	888	strcpy(log->path, path);
	889	log->end = log->path + n;
	890
	891	/* gain exclusive access and verify log file -- may perform a
	892	recovery operation if needed */
	893	if (log_open(log)) {
	894	free(log->path);
	895	free(log);
	896	return NULL;
	897	}
	898
	899	/* return pointer to log structure */
	900	return log;
	901	}
	902
	903	/* gzlog_compress() return values:
	904	0: all good
	905	-1: file i/o error (usually access issue)
	906	-2: memory allocation failure
	907	-3: invalid log pointer argument */
	908	int gzlog_compress(gzlog *logd)
	909	{
	910	int fd, ret;
	911	uint block;
	912	size_t len, next;
	913	unsigned char *data, buf[5];
	914	struct log *log = logd;
	915
	916	/* check arguments */
	917	if (log == NULL \|\| strcmp(log->id, LOGID))
	918	return -3;
	919
	920	/* see if we lost the lock -- if so get it again and reload the extra
	921	field information (it probably changed), recover last operation if
	922	necessary */
	923	if (log_check(log) && log_open(log))
	924	return -1;
	925
	926	/* create space for uncompressed data */
	927	len = ((size_t)(log->last - log->first) & ~(((size_t)1 << 10) - 1)) +
	928	log->stored;
	929	if ((data = malloc(len)) == NULL)
	930	return -2;
	931
	932	/* do statement here is just a cheap trick for error handling */
	933	do {
	934	/* read in the uncompressed data */
	935	if (lseek(log->fd, log->first - 1, SEEK_SET) < 0)
	936	break;
	937	next = 0;
	938	while (next < len) {
	939	if (read(log->fd, buf, 5) != 5)
	940	break;
	941	block = PULL2(buf + 1);
	942	if (next + block > len \|\|
	943	read(log->fd, (char *)data + next, block) != block)
	944	break;
	945	next += block;
	946	}
	947	if (lseek(log->fd, 0, SEEK_CUR) != log->last + 4 + log->stored)
	948	break;
	949	log_touch(log);
	950
	951	/* write the uncompressed data to the .add file */
	952	strcpy(log->end, ".add");
	953	fd = open(log->path, O_WRONLY \| O_CREAT \| O_TRUNC, 0644);
	954	if (fd < 0)
	955	break;
	956	ret = (size_t)write(fd, data, len) != len;
	957	if (ret \| close(fd))
	958	break;
	959	log_touch(log);
	960
	961	/* write the dictionary for the next compress to the .temp file */
	962	strcpy(log->end, ".temp");
	963	fd = open(log->path, O_WRONLY \| O_CREAT \| O_TRUNC, 0644);
	964	if (fd < 0)
	965	break;
	966	next = DICT > len ? len : DICT;
	967	ret = (size_t)write(fd, (char *)data + len - next, next) != next;
	968	if (ret \| close(fd))
	969	break;
	970	log_touch(log);
	971
	972	/* roll back to compressed data, mark the compress in progress */
	973	log->last = log->first;
	974	log->stored = 0;
	975	if (log_mark(log, COMPRESS_OP))
	976	break;
	977	BAIL(7);
	978
	979	/* compress and append the data (clears mark) */
	980	ret = log_compress(log, data, len);
	981	free(data);
	982	return ret;
	983	} while (0);
	984
	985	/* broke out of do above on i/o error */
	986	free(data);
	987	return -1;
	988	}
	989
	990	/* gzlog_write() return values:
	991	0: all good
	992	-1: file i/o error (usually access issue)
	993	-2: memory allocation failure
	994	-3: invalid log pointer argument */
	995	int gzlog_write(gzlog logd, void data, size_t len)
	996	{
	997	int fd, ret;
	998	struct log *log = logd;
	999
	1000	/* check arguments */
	1001	if (log == NULL \|\| strcmp(log->id, LOGID))
	1002	return -3;
	1003	if (data == NULL \|\| len <= 0)
	1004	return 0;
	1005
	1006	/* see if we lost the lock -- if so get it again and reload the extra
	1007	field information (it probably changed), recover last operation if
	1008	necessary */
	1009	if (log_check(log) && log_open(log))
	1010	return -1;
	1011
	1012	/* create and write .add file */
	1013	strcpy(log->end, ".add");
	1014	fd = open(log->path, O_WRONLY \| O_CREAT \| O_TRUNC, 0644);
	1015	if (fd < 0)
	1016	return -1;
	1017	ret = (size_t)write(fd, data, len) != len;
	1018	if (ret \| close(fd))
	1019	return -1;
	1020	log_touch(log);
	1021
	1022	/* mark log file with append in progress */
	1023	if (log_mark(log, APPEND_OP))
	1024	return -1;
	1025	BAIL(8);
	1026
	1027	/* append data (clears mark) */
	1028	if (log_append(log, data, len))
	1029	return -1;
	1030
	1031	/* check to see if it's time to compress -- if not, then done */
	1032	if (((log->last - log->first) >> 10) + (log->stored >> 10) < TRIGGER)
	1033	return 0;
	1034
	1035	/* time to compress */
	1036	return gzlog_compress(log);
	1037	}
	1038
	1039	/* gzlog_close() return values:
	1040	0: ok
	1041	-3: invalid log pointer argument */
	1042	int gzlog_close(gzlog *logd)
	1043	{
	1044	struct log *log = logd;
	1045
	1046	/* check arguments */
	1047	if (log == NULL \|\| strcmp(log->id, LOGID))
	1048	return -3;
	1049
	1050	/* close the log file and release the lock */
	1051	log_close(log);
	1052
	1053	/* free structure and return */
	1054	if (log->path != NULL)
	1055	free(log->path);
	1056	strcpy(log->id, "bad");
	1057	free(log);
	1058	return 0;
	1059	}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: