Context Navigation

TODO

Last change on this file was 1166, checked in by rossy, 4 years ago
Daodan: Replace MinGW build env with an up-to-date MSYS2 env
File size: 35.0 KB

Rev	Line
[1166]	1	Copyright 1999-2020 Free Software Foundation, Inc.
	2	Contributed by the AriC and Caramba projects, INRIA.
	3
	4	This file is part of the GNU MPFR Library.
	5
	6	The GNU MPFR Library is free software; you can redistribute it and/or modify
	7	it under the terms of the GNU Lesser General Public License as published by
	8	the Free Software Foundation; either version 3 of the License, or (at your
	9	option) any later version.
	10
	11	The GNU MPFR Library is distributed in the hope that it will be useful, but
	12	WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
	13	or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
	14	License for more details.
	15
	16	You should have received a copy of the GNU Lesser General Public License
	17	along with the GNU MPFR Library; see the file COPYING.LESSER. If not, see
	18	https://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
	19	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
	20
	21	Table of contents:
	22	1. Documentation
	23	2. Compiler/library detection
	24	3. Changes in existing functions
	25	4. New functions to implement
	26	5. Efficiency
	27	6. Miscellaneous
	28	7. Portability
	29
	30	##############################################################################
	31	1. Documentation
	32	##############################################################################
	33
	34	- add a description of the algorithms used and a proof of correctness
	35
	36	##############################################################################
	37	2. Compiler/library detection
	38	##############################################################################
	39
	40	- update ICC detection.
	41	* Use only __INTEL_COMPILER instead of the obsolete macro __ICC?
	42
	43	##############################################################################
	44	3. Changes in existing functions
	45	##############################################################################
	46
	47	- export mpfr_overflow and mpfr_underflow as public functions
	48
	49	- many functions currently taking into account the precision of the input
	50	variable to set the initial working precision (acosh, asinh, cosh, ...).
	51	This is nonsense since the "average" working precision should only depend
	52	on the precision of the output variable (and maybe on the value of
	53	the input in case of cancellation).
	54	-> remove those dependencies from the input precision.
	55
	56	- mpfr_can_round:
	57	change the meaning of the 2nd argument (err). Currently the error is
	58	at most 2^(MPFR_EXP(b)-err), i.e. err is the relative shift wrt the
	59	most significant bit of the approximation. I propose that the error
	60	is now at most 2^err ulps of the approximation, i.e.
	61	2^(MPFR_EXP(b)-MPFR_PREC(b)+err).
	62
	63	- mpfr_set_q first tries to convert the numerator and the denominator
	64	to mpfr_t. But this conversion may fail even if the correctly rounded
	65	result is representable. New way to implement:
	66	Function q = a/b. nq = PREC(q) na = PREC(a) nb = PREC(b)
	67	If na < nb
	68	a <- a*2^(nb-na)
	69	n <- na-nb+ (HIGH(a,nb) >= b)
	70	if (n >= nq)
	71	bb <- b*2^(n-nq)
	72	a = q*bb+r --> q has exactly n bits.
	73	else
	74	aa <- a*2^(nq-n)
	75	aa = q*b+r --> q has exactly n bits.
	76	If RNDN, takes nq+1 bits. (See also the new division function).
	77
	78	- revisit the conversion functions between a MPFR number and a native
	79	floating-point value.
	80	* Consequences if some exception is trapped?
	81	* Specify under which conditions (current rounding direction and
	82	precision of the FPU, whether a format has been recognized...),
	83	correct rounding is guaranteed. Fix the code if need be. Do not
	84	forget subnormals.
	85	* Provide mpfr_buildopt_* functions to tell whether the format of a
	86	native type (float / double / long double) has been recognized and
	87	which format it is?
	88	* For functions that return a native floating-point value (mpfr_get_flt,
	89	mpfr_get_d, mpfr_get_ld, mpfr_get_decimal64), in case of underflow or
	90	overflow, follow the convention used for the functions in <math.h>?
	91	See §7.12.1 "Treatment of error conditions" of ISO C11, which provides
	92	two ways of handling error conditions, depending on math_errhandling:
	93	errno (to be set to ERANGE here) and floating-point exceptions.
	94	If floating-point exceptions need to be generated, do not use
	95	feraiseexcept(), as this function may require the math library (-lm);
	96	use a floating-point expression instead, such as DBL_MIN * DBL_MIN
	97	(underflow) or DBL_MAX * DBL_MAX (overflow), which are probably safe
	98	as used in the GNU libc implementation.
	99	* For testing the lack of subnormal support:
	100	see the -mfpu GCC option for ARM and
	101	https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level
	102
	103
	104	##############################################################################
	105	4. New functions to implement
	106	##############################################################################
	107
	108	- a function to compute the hash of a floating-point number
	109	(suggested by Patrick Pelissier)
	110	- implement new functions from the C++17 standard:
	111	http://en.cppreference.com/w/cpp/numeric/special_math
	112	assoc_laguerre, assoc_legendre, comp_ellint_1, comp_ellint_2, comp_ellint_3,
	113	cyl_bessel_i, cyl_bessel_j, cyl_bessel_k, cyl_neumann, ellint_1, ellint_2,
	114	ellint_3, hermite, legendre, laguerre, sph_bessel, sph_legendre,
	115	sph_neumann.
	116	Already in mpfr4: beta and riemann_zeta.
	117	See also https://isocpp.org/files/papers/P0226R1.pdf and §29.9.5 in the
	118	C++17 draft:
	119	https://github.com/cplusplus/draft/blob/master/source/numerics.tex
	120	- implement mpfr_q_sub, mpfr_z_div, mpfr_q_div?
	121	- implement mpfr_pow_q and variants with two integers (native or mpz)
	122	instead of a rational? See IEEE P1788.
	123	- implement functions for random distributions, see for example
	124	https://sympa.inria.fr/sympa/arc/mpfr/2010-01/msg00034.html
	125	(suggested by Charles Karney <ckarney@Sarnoff.com>, 18 Jan 2010):
	126	* a Bernoulli distribution with prob p/q (exact)
	127	* a general discrete distribution (i with prob w[i]/sum(w[i]) (Walker
	128	algorithm, but make it exact)
	129	* a uniform distribution in (a,b)
	130	* exponential distribution (mean lambda) (von Neumann's method?)
	131	* normal distribution (mean m, s.d. sigma) (ratio method?)
	132	- wanted for Magma [John Cannon <john@maths.usyd.edu.au>, Tue, 19 Apr 2005]:
	133	HypergeometricU(a,b,s) = 1/gamma(a)int(exp(-su)u^(a-1)*(1+u)^(b-a-1),
	134	u=0..infinity)
	135	JacobiThetaNullK
	136	PolylogP, PolylogD, PolylogDold: see http://arxiv.org/abs/math.CA/0702243
	137	and the references herein.
	138	JBessel(n, x) = BesselJ(n+1/2, x)
	139	KBessel, KBessel2 [2nd kind]
	140	JacobiTheta
	141	(see http://www.ams.org/journals/mcom/0000-000-00/S0025-5718-2017-03245-2/home.html)
	142	LogIntegral
	143	ExponentialIntegralEn (formula 5.1.4 of Abramowitz and Stegun)
	144	DawsonIntegral
	145	GammaD(x) = Gamma(x+1/2)
	146	- new functions of IEEE 754-2008, and more generally functions of the
	147	C binding draft TS 18661-4:
	148	http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1946.pdf
	149	Some propositions about rootn: mpfr_rootn_si, mpfr_rootn_sj, mpfr_rootn_z,
	150	and versions with an unsigned integer: mpfr_rootn_ui (now implemented, as
	151	similar to mpfr_root) and mpfr_rootn_uj.
	152	- functions defined in the LIA-2 standard
	153	+ minimum and maximum (5.2.2): max, min, max_seq, min_seq, mmax_seq
	154	and mmin_seq (mpfr_min and mpfr_max correspond to mmin and mmax);
	155	+ rounding_rest, floor_rest, ceiling_rest (5.2.4);
	156	+ remr (5.2.5): x - round(x/y) y;
	157	+ error functions from 5.2.7 (if useful in MPFR);
	158	+ power1pm1 (5.3.6.7): (1 + x)^y - 1;
	159	+ logbase (5.3.6.12): \log_x(y);
	160	+ logbase1p1p (5.3.6.13): \log_{1+x}(1+y);
	161	+ rad (5.3.9.1): x - round(x / (2 pi)) 2 pi = remr(x, 2 pi);
	162	+ axis_rad (5.3.9.1) if useful in MPFR;
	163	+ cycle (5.3.10.1): rad(2 pi x / u) u / (2 pi) = remr(x, u);
	164	+ axis_cycle (5.3.10.1) if useful in MPFR;
	165	+ sinu, cosu, tanu, cotu, secu, cscu, cossinu, arcsinu, arccosu,
	166	arctanu, arccotu, arcsecu, arccscu (5.3.10.{2..14}):
	167	sin(x 2 pi / u), etc.;
	168	[from which sinpi(x) = sin(Pi*x), ... are trivial to implement, with u=2.]
	169	+ arcu (5.3.10.15): arctan2(y,x) u / (2 pi);
	170	+ rad_to_cycle, cycle_to_rad, cycle_to_cycle (5.3.11.{1..3}).
	171	- From GSL, missing special functions (if useful in MPFR):
	172	(cf https://www.gnu.org/software/gsl/manual/gsl-ref.html#Special-Functions)
	173	+ The Airy functions Ai(x) and Bi(x) defined by the integral representations:
	174	* Ai(x) = (1/\pi) \int_0^\infty \cos((1/3) t^3 + xt) dt
	175	* Bi(x) = (1/\pi) \int_0^\infty (e^(-(1/3) t^3) + \sin((1/3) t^3 + xt)) dt
	176	* Derivatives of Airy Functions
	177	+ The Bessel functions for n integer and n fractional:
	178	* Regular Modified Cylindrical Bessel Functions I_n
	179	* Irregular Modified Cylindrical Bessel Functions K_n
	180	* Regular Spherical Bessel Functions j_n: j_0(x) = \sin(x)/x,
	181	j_1(x)= (\sin(x)/x-\cos(x))/x & j_2(x)= ((3/x^2-1)\sin(x)-3\cos(x)/x)/x
	182	Note: the "spherical" Bessel functions are solutions of
	183	x^2 y'' + 2 x y' + [x^2 - n (n+1)] y = 0 and satisfy
	184	j_n(x) = sqrt(Pi/(2x)) J_{n+1/2}(x). They should not be mixed with the
	185	classical Bessel Functions, also noted j0, j1, jn, y0, y1, yn in C99
	186	and mpfr.
	187	Cf https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions
	188	*Irregular Spherical Bessel Functions y_n: y_0(x) = -\cos(x)/x,
	189	y_1(x)= -(\cos(x)/x+\sin(x))/x &
	190	y_2(x)= (-3/x^3+1/x)\cos(x)-(3/x^2)\sin(x)
	191	* Regular Modified Spherical Bessel Functions i_n:
	192	i_l(x) = \sqrt{\pi/(2x)} I_{l+1/2}(x)
	193	* Irregular Modified Spherical Bessel Functions:
	194	k_l(x) = \sqrt{\pi/(2x)} K_{l+1/2}(x).
	195	+ Clausen Function:
	196	Cl_2(x) = - \int_0^x dt \log(2 \sin(t/2))
	197	Cl_2(\theta) = \Im Li_2(\exp(i \theta)) (dilogarithm).
	198	+ Dawson Function: \exp(-x^2) \int_0^x dt \exp(t^2).
	199	+ Debye Functions: D_n(x) = n/x^n \int_0^x dt (t^n/(e^t - 1))
	200	+ Elliptic Integrals:
	201	* Definition of Legendre Forms:
	202	F(\phi,k) = \int_0^\phi dt 1/\sqrt((1 - k^2 \sin^2(t)))
	203	E(\phi,k) = \int_0^\phi dt \sqrt((1 - k^2 \sin^2(t)))
	204	P(\phi,k,n) = \int_0^\phi dt 1/((1 + n \sin^2(t))\sqrt(1 - k^2 \sin^2(t)))
	205	* Complete Legendre forms are denoted by
	206	K(k) = F(\pi/2, k)
	207	E(k) = E(\pi/2, k)
	208	* Definition of Carlson Forms
	209	RC(x,y) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1)
	210	RD(x,y,z) = 3/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-3/2)
	211	RF(x,y,z) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2)
	212	RJ(x,y,z,p) = 3/2 \int_0^\infty dt
	213	(t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) (t+p)^(-1)
	214	+ Elliptic Functions (Jacobi)
	215	+ N-relative exponential:
	216	exprel_N(x) = N!/x^N (\exp(x) - \sum_{k=0}^{N-1} x^k/k!)
	217	+ exponential integral:
	218	E_2(x) := \Re \int_1^\infty dt \exp(-xt)/t^2.
	219	Ei_3(x) = \int_0^x dt \exp(-t^3) for x >= 0.
	220	Ei(x) := - PV(\int_{-x}^\infty dt \exp(-t)/t)
	221	+ Hyperbolic/Trigonometric Integrals
	222	Shi(x) = \int_0^x dt \sinh(t)/t
	223	Chi(x) := Re[ \gamma_E + \log(x) + \int_0^x dt (\cosh[t]-1)/t]
	224	Si(x) = \int_0^x dt \sin(t)/t
	225	Ci(x) = -\int_x^\infty dt \cos(t)/t for x > 0
	226	AtanInt(x) = \int_0^x dt \arctan(t)/t
	227	[ \gamma_E is the Euler constant ]
	228	+ Fermi-Dirac Function:
	229	F_j(x) := (1/r\Gamma(j+1)) \int_0^\infty dt (t^j / (\exp(t-x) + 1))
	230	+ Pochhammer symbol (a)_x := \Gamma(a + x)/\Gamma(a) : see [Smith01] in
	231	algorithms.bib
	232	logarithm of the Pochhammer symbol
	233	+ Gegenbauer Functions
	234	+ Laguerre Functions
	235	+ Eta Function: \eta(s) = (1-2^{1-s}) \zeta(s)
	236	Hurwitz zeta function: \zeta(s,q) = \sum_0^\infty (k+q)^{-s}.
	237	+ Lambert W Functions, W(x) are defined to be solutions of the equation:
	238	W(x) \exp(W(x)) = x.
	239	This function has multiple branches for x < 0 (2 funcs W0(x) and Wm1(x))
	240	From Fredrik Johansson:
	241	See https://cs.uwaterloo.ca/research/tr/1993/03/W.pdf, in particular
	242	formulas 5.2 and 5.3 for the error bound: one first computes an
	243	approximation w, and then evaluates the residual w e^w - x. There is an
	244	expression for the error in terms of the residual and the derivative W'(t),
	245	where the derivative can be bounded by piecewise simple functions,
	246	something like min(1, 1/t) when t >= 0.
	247	See https://arxiv.org/abs/1705.03266 for rigorous error bounds.
	248	+ Trigamma Function psi'(x).
	249	and Polygamma Function: psi^{(m)}(x) for m >= 0, x > 0.
	250	- functions from ISO/IEC 24747:2009 (Extensions to the C Library,
	251	to Support Mathematical Special Functions).
	252	Standard: http://www.iso.org/iso/catalogue_detail.htm?csnumber=38857
	253	Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1292.pdf
	254	Rationale: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1244.pdf
	255	See also: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf
	256	(similar, for C++).
	257	Also check whether the functions that are already implemented in MPFR
	258	match this standard.
	259
	260	- from gnumeric (www.gnome.org/projects/gnumeric/doc/function-reference.html):
	261	- incomplete beta function, see message from Martin Maechler
	262	<maechler@stat.math.ethz.ch> on 18 Jan 2016, and Section 6.6 in
	263	Abramowitz & Stegun
	264	- betaln
	265	- degrees
	266	- radians
	267	- sqrtpi
	268
	269	- mpfr_inp_raw, mpfr_out_raw (cf mail "Serialization of mpfr_t" from Alexey
	270	and answer from Granlund on mpfr list, May 2007)
	271	- [maybe useful for SAGE] implement companion frac_* functions to the rint_*
	272	functions. For example mpfr_frac_floor(x) = x - floor(x). (The current
	273	mpfr_frac function corresponds to mpfr_rint_trunc.)
	274	- scaled erfc (https://sympa.inria.fr/sympa/arc/mpfr/2009-05/msg00054.html)
	275	- asec, acsc, acot, asech, acsch and acoth (mail from Björn Terelius on mpfr
	276	list, 18 June 2009)
	277
	278	- function to reduce the precision of a variable, with a ternary value in
	279	input, i.e. taking care of double rounding. Two possible forms: like
	280	mpfr_set (i.e. with input and output) or like mpfr_prec_round (i.e. with
	281	a single variable). mpfr_subnormalize and mpfr_round_nearest_away_end
	282	could use it.
	283
	284	- UBF functions for +, -, *, fmma, /, sqrt.
	285	Support UBF in mpfr_check_range or add mpfr_ubf_check_range?
	286	Make this available in the API, e.g. for MPC.
	287
	288	- mpfr_cmp_uj and mpfr_cmp_sj. They would be useful to test MPFR with
	289	_MPFR_EXP_FORMAT=4.
	290
	291	- base conversion with the round-trip property using a minimal precision,
	292	such as the to_chars functions from the C++ standard:
	293
	294	The functions [...] ensure that the string representation consists
	295	of the smallest number of characters such that there is at least
	296	one digit before the radix point (if present) and parsing the
	297	representation using the corresponding from_chars function
	298	recovers value exactly. [Note: This guarantee applies only if
	299	to_chars and from_chars are executed on the same implementation.
	300	— end note] If there are several such representations, the
	301	representation with the smallest difference from the
	302	floating-point argument value is chosen, resolving any remaining
	303	ties using rounding according to round_to_nearest.
	304
	305	Text from: https://www.zsh.org/mla/workers/2019/msg01138.html
	306
	307	- Serialization / Deserialization. Suggested by Frédéric Pétrot:
	308	https://sympa.inria.fr/sympa/arc/mpfr/2020-02/msg00006.html
	309	like mpfr_fpif_{import,export}, but with memory instead of file.
	310
	311	Idea of implementation to reuse most of the code and change very little:
	312
	313	Instead of passing a FILE fh, pass a struct ext_data h, and instead of
	314	using fread and fwrite, use
	315	h->read (h, buffer, size)
	316	h->write (h, buffer, size)
	317	respectively.
	318
	319	The struct ext_data structure could contain the following fields:
	320	* read: pointer to a wrapper function for the read method.
	321	* write: pointer to a wrapper function for the write method.
	322	* FILE *fh: to be used for operations with files.
	323	* unsigned char *arena: to be used for operations with memory.
	324
	325	The wrapper functions for the read method could be:
	326
	327	static int
	328	read_from_file (struct ext_data h, unsigned char buffer, size_t size)
	329	{
	330	return fread (buffer, size, 1, h->fh) != 1;
	331	}
	332
	333	static int
	334	read_from_memory (struct ext_data h, unsigned char buffer, size_t size)
	335	{
	336	if (h->arena == NULL)
	337	return 1;
	338	memcpy (buffer, h->arena, size);
	339	h->arena += size;
	340	return 0;
	341	}
	342
	343	So I expect very few changes in the existing code:
	344	* Write a few wrapper functions.
	345	* Rename mpfr_fpif_export to mpfr_fpif_export_aux and
	346	mpfr_fpif_import to mpfr_fpif_import_aux.
	347	* In the existing functions, replace FILE *fh, and fread/fwrite
	348	calls as mentioned above.
	349	* Add new mpfr_fpif_export, mpfr_fpif_import, mpfr_fpif_export_mem,
	350	mpfr_fpif_import_mem.
	351
	352	##############################################################################
	353	5. Efficiency
	354	##############################################################################
	355
	356	- Fredrik Johansson reports that mpfr_ai is slow for large arguments: an
	357	asymptotic expansion should be used (once done, remove REDUCE_EMAX from
	358	tests/tai.c and update the description in mpfr.texi).
	359	- for exp(x), Fredrik Johansson reports a 20% speed improvement starting from
	360	4000 bits, and up to a 75% memory improvement in his Arb implementation, by
	361	using recursive instead of iterative binary splitting:
	362	https://github.com/fredrik-johansson/arb/blob/master/elefun/exp_sum_bs_powtab.c
	363	- improve mpfr_grandom using the algorithm in http://arxiv.org/abs/1303.6257
	364	- implement a mpfr_sqrthigh algorithm based on Mulders' algorithm, with a
	365	basecase variant
	366	- use mpn_div_q to speed up mpfr_div. However mpn_div_q, which is new in
	367	GMP 5, is not documented in the GMP manual, thus we are not sure it
	368	guarantees to return the same quotient as mpn_tdiv_qr.
	369	Also mpfr_div uses the remainder computed by mpn_divrem. A workaround would
	370	be to first try with mpn_div_q, and if we cannot (easily) compute the
	371	rounding, then use the current code with mpn_divrem.
	372	- improve atanh(x) for small x by using atanh(x) = log1p(2x/(1-x)),
	373	and log1p should also be improved for small arguments.
	374	- compute exp by using the series for cosh or sinh, which has half the terms
	375	(see Exercise 4.11 from Modern Computer Arithmetic, version 0.3)
	376	The same method can be used for log, using the series for atanh, i.e.,
	377	atanh(x) = 1/2*log((1+x)/(1-x)).
	378	- improve mpfr_gamma (see https://code.google.com/p/fastfunlib/). A possible
	379	idea is to implement a fast algorithm for the argument reconstruction
	380	gamma(x+k): instead of performing k products by x+i, we could precompute
	381	x^2, ..., x^m for m ~ sqrt(k), and perform only sqrt(k) products.
	382	One could also use the series for 1/gamma(x), see for example
	383	http://dlmf.nist.gov/5/7/ or formula (36) from
	384	http://mathworld.wolfram.com/GammaFunction.html
	385	- improve the computation of Bernoulli numbers: instead of computing just one
	386	B[2n] at a time in mpfr_bernoulli_internal, we could compute several at a
	387	time, sharing the expensive computation of the 1/p^(2n) series.
	388	- fix regression with mpfr_mpz_root (from Keith Briggs, 5 July 2006), for
	389	example on 3Ghz P4 with gmp-4.2, x=12.345:
	390	prec=50000 k=2 k=3 k=10 k=100
	391	mpz_root 0.036 0.072 0.476 7.628
	392	mpfr_mpz_root 0.004 0.004 0.036 12.20
	393	See also mail from Carl Witty on mpfr list, 09 Oct 2007.
	394	- for sparse input (say x=1 with 2 bits), mpfr_exp is not faster than for
	395	full precision when precision <= MPFR_EXP_THRESHOLD. The reason is
	396	that argument reduction kills sparsity. Maybe avoid argument reduction
	397	for sparse input?
	398	- speed up mpfr_atan for large arguments (to speed up mpc_log) see FR #6198
	399	- improve mpfr_sin on values like ~pi (do not compute sin from cos, because
	400	of the cancellation). For instance, reduce the input modulo pi/2 in
	401	[-pi/4,pi/4], and define auxiliary functions for which the argument is
	402	assumed to be already reduced (so that the sin function can avoid
	403	unnecessary computations by calling the auxiliary cos function instead of
	404	the full cos function). This will require a native code for sin, for
	405	example using the reduction sin(3x)=3sin(x)-4sin(x)^3.
	406	See https://sympa.inria.fr/sympa/arc/mpfr/2007-08/msg00001.html and
	407	the following messages.
	408	- improve generic.c to work for number of terms <> 2^k
	409	- rewrite mpfr_greater_p... as native code.
	410
	411	- mpf_t uses a scheme where the number of limbs actually present can
	412	be less than the selected precision, thereby allowing low precision
	413	values (for instance small integers) to be stored and manipulated in
	414	an mpf_t efficiently.
	415
	416	Perhaps mpfr should get something similar, especially if looking to
	417	replace mpf with mpfr, though it'd be a major change. Alternately
	418	perhaps those mpfr routines like mpfr_mul where optimizations are
	419	possible through stripping low zero bits or limbs could check for
	420	that (this would be less efficient but easier).
	421
	422	- try the idea of the paper "Reduced Cancellation in the Evaluation of Entire
	423	Functions and Applications to the Error Function" by W. Gawronski, J. Mueller
	424	and M. Reinhard, to be published in SIAM Journal on Numerical Analysis: to
	425	avoid cancellation in say erfc(x) for x large, they compute the Taylor
	426	expansion of erfc(x)*exp(x^2/2) instead (which has less cancellation),
	427	and then divide by exp(x^2/2) (which is simpler to compute).
	428
	429	- replace the *_THRESHOLD macros by global (TLS) variables that can be
	430	changed at run time (via a function, like other variables)? One benefit
	431	is that users could use a single MPFR binary on several machines (e.g.,
	432	a library provided by binary packages or shared via NFS) with different
	433	thresholds. On the default values, this would be a bit less efficient
	434	than the current code, but this isn't probably noticeable (this should
	435	be tested). Something like:
	436	long *mpfr_tune_get(void) to get the current values (the first value
	437	is the size of the array).
	438	int mpfr_tune_set(long *array) to set the tune values.
	439	int mpfr_tune_run(long level) to find the best values (the support
	440	for this feature is optional, this can also be done with an
	441	external function).
	442
	443	- better distinguish different processors (for example Opteron and Core 2)
	444	and use corresponding default tuning parameters (as in GMP). This could be
	445	done in configure.ac to avoid hacking config.guess, for example define
	446	MPFR_HAVE_CORE2.
	447	Note (VL): the effect on cross-compilation (that can be a processor
	448	with the same architecture, e.g. compilation on a Core 2 for an
	449	Opteron) is not clear. The choice should be consistent with the
	450	build target (e.g. -march or -mtune value with gcc).
	451	Also choose better default values. For instance, the default value of
	452	MPFR_MUL_THRESHOLD is 40, while the best values that have been found
	453	are between 11 and 19 for 32 bits and between 4 and 10 for 64 bits!
	454
	455	- during the Many Digits competition, we noticed that (our implantation of)
	456	Mulders short product was slower than a full product for large sizes.
	457	This should be precisely analyzed and fixed if needed.
	458
	459	- for various functions, check the timings as a function of the magnitude
	460	of the input (and the input and/or output precisions?), and use better
	461	thresholds for asymptotic expansions.
	462
	463	- improve the special case of mpfr_{add,sub} (x, x, y, ...) when \|x\| > \|y\|
	464	to do the addition in-place and have a complexity of O(prec(y)) in most
	465	cases. The mpfr_{add,sub}_{d,ui} functions should automatically benefit
	466	from this change.
	467
	468	- in gmp_op.c, for functions with mpz_srcptr, check whether mpz_fits_slong_p
	469	is really useful in all cases (see TODO in this file).
	470
	471	- optimize code that uses a test based on the fact that x >> s is
	472	undefined in C for s == width of x but the result is expected to
	473	be 0. ARM and PowerPC could benefit from such an optimization,
	474	but not x86. This needs support from the compiler.
	475	For PowerPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79233
	476
	477	- deal with MPFR_RNDF in mpfr_round_near_x (replaced by MPFR_RNDZ).
	478
	479	- instead of a fixed mparam.h, optionally use function multiversioning
	480	(FMV), currently only available with the GNU C++ front end:
	481	https://gcc.gnu.org/wiki/FunctionMultiVersioning
	482	According to https://lwn.net/Articles/691932/ the dispatch resolution
	483	is now done by the dynamic loader, so that this should be fast enough
	484	(the cost would be the reading of a static variable, initialized at
	485	load time, instead of a constant).
	486	In particular, binary package distributions would benefit from FMV as
	487	only one binary is generated for different processor families.
	488
	489
	490	##############################################################################
	491	6. Miscellaneous
	492	##############################################################################
	493
	494	- [suggested by Tobias Burnus <burnus(at)net-b.de> and
	495	Asher Langton <langton(at)gcc.gnu.org>, Wed, 01 Aug 2007]
	496	support quiet and signaling NaNs in mpfr:
	497	* functions to set/test a quiet/signaling NaN: mpfr_set_snan, mpfr_snan_p,
	498	mpfr_set_qnan, mpfr_qnan_p
	499	* correctly convert to/from double (if encoding of s/qNaN is fixed in 754R)
	500	Note: Signaling NaNs are not specified by the ISO C standard and may
	501	not be supported by the implementation. GCC needs the -fsignaling-nans
	502	option (but this does not affect the C library, which may or may not
	503	accept signaling NaNs).
	504
	505	- check the constants mpfr_set_emin (-16382-63) and mpfr_set_emax (16383) in
	506	get_ld.c and the other constants, and provide a testcase for large and
	507	small numbers.
	508
	509	- from Kevin Ryde <user42@zip.com.au>:
	510	Also for pi.c, a pre-calculated compiled-in pi to a few thousand
	511	digits would be good value I think. After all, say 10000 bits using
	512	1250 bytes would still be small compared to the code size!
	513	Store pi in round to zero mode (to recover other modes).
	514
	515	- add other prototypes for round to nearest-away (mpfr_round_nearest_away
	516	only deals with the prototypes of say mpfr_sin) or implement it as a native
	517	rounding mode
	518	- add a new roundind mode: round to odd. If the result is not exactly
	519	representable, then round to the odd mantissa. This rounding
	520	has the nice property that for k > 1, if:
	521	y = round(x, p+k, TO_ODD)
	522	z = round(y, p, TO_NEAREST_EVEN), then
	523	z = round(x, p, TO_NEAREST_EVEN)
	524	so it avoids the double-rounding problem.
	525	VL: I prefer the (original?) term "sticky rounding", as used in
	526	J Strother Moore, Tom Lynch, Matt Kaufmann. A Mechanically Checked
	527	Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point
	528	Division Algorithm. IEEE Transactions on Computers, 1996.
	529	and
	530	http://www.russinoff.com/libman/text/node26.html
	531
	532	- new rounding mode MPFR_RNDE when the result is known to be exact?
	533	* In normal mode, this would allow MPFR to optimize using
	534	this information.
	535	* In debug mode, MPFR would check that the result is exact
	536	(i.e. that the ternary value is 0).
	537
	538	- add tests of the ternary value for constants
	539
	540	- When doing Extensive Check (--enable-assert=full), since all the
	541	functions use a similar use of MACROS (ZivLoop, ROUND_P), it should
	542	be possible to do such a scheme:
	543	For the first call to ROUND_P when we can round.
	544	Mark it as such and save the approximated rounding value in
	545	a temporary variable.
	546	Then after, if the mark is set, check if:
	547	- we still can round.
	548	- The rounded value is the same.
	549	It should be a complement to tgeneric tests.
	550
	551	- in div.c, try to find a case for which cy != 0 after the line
	552	cy = mpn_sub_1 (sp + k, sp + k, qsize, cy);
	553	(which should be added to the tests), e.g. by having {vp, k} = 0, or
	554	prove that this cannot happen.
	555
	556	- add a configure test for --enable-logging to ignore the option if
	557	it cannot be supported. Modify the "configure --help" description
	558	to say "on systems that support it".
	559
	560	- add generic bad cases for functions that don't have an inverse
	561	function that is implemented (use a single Newton iteration).
	562
	563	- add bad cases for the internal error bound (by using a dichotomy
	564	between a bad case for the correct rounding and some input value
	565	with fewer Ziv iterations?).
	566
	567	- add an option to use a 32-bit exponent type (int) on LP64 machines,
	568	mainly for developers, in order to be able to test the case where the
	569	extended exponent range is the same as the default exponent range, on
	570	such platforms.
	571	Tests can be done with the exp-int branch (added on 2010-12-17, and
	572	many tests fail at this time).
	573
	574	- test underflow/overflow detection of various functions (in particular
	575	mpfr_exp) in reduced exponent ranges, including ranges that do not
	576	contain 0.
	577
	578	- add an internal macro that does the equivalent of the following?
	579	MPFR_IS_ZERO(x) \|\| MPFR_GET_EXP(x) <= value
	580
	581	- check whether __gmpfr_emin and __gmpfr_emax could be replaced by
	582	a constant (see README.dev). Also check the use of MPFR_EMIN_MIN
	583	and MPFR_EMAX_MAX.
	584
	585	- add a test checking that no mpfr.h macros depend on mpfr-impl.h
	586	(the current tests cannot check that since mpfr-impl.h is always
	587	included).
	588
	589	- move some macro definitions from acinclude.m4 to the m4 directory
	590	as suggested by the Automake manual? The reason is that the
	591	acinclude.m4 file is big and a bit difficult to read.
	592
	593	- use symbol versioning.
	594
	595	- check whether mpz_t caching (pool) is necessary. Timings with -static
	596	with details about the C / C library implementation should be put
	597	somewhere as a comment in the source or in the doc. Using -static
	598	is important because otherwise the cache saves the dynamic call to
	599	mpz_init and mpz_clear; so, what we're measuring is not clear.
	600	See thread:
	601	https://gmplib.org/list-archives/gmp-devel/2015-September/004147.html
	602	Summary: It will not be integrated in GMP because 1) This yields
	603	problems with threading (in MPFR, we have TLS variables, but this is
	604	not the case of GMP). 2) The gain (if confirmed with -static) would
	605	be due to a poor malloc implementation (timings would depend on the
	606	platform). 3) Applications would use more RAM.
	607	Additional notes [VL]: the major differences in the timings given
	608	by Patrick in 2014-01 under Linux were:
	609	Before:
	610	arccos(x) took 0.054689 ms (32767 eval in 1792 ms)
	611	arctan(x) took 0.042116 ms (32767 eval in 1380 ms)
	612	After:
	613	arccos(x) took 0.043580 ms (32767 eval in 1428 ms)
	614	arctan(x) took 0.035401 ms (32767 eval in 1160 ms)
	615	mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes
	616	from mpfr_atan, which uses mpz a lot. The problem mainly comes from the
	617	reallocations in GMP because mpz_init is used instead of mpz_init2 with
	618	the estimated maximum size. Other places in the code that uses mpz_init
	619	may be concerned.
	620	Issues with mpz_t caching:
	621	* The pool can take much memory, which may no longer be useful.
	622	For instance:
	623	mpfr_init2 (x, 10000000);
	624	mpfr_log_ui (x, 17, MPFR_RNDN);
	625	/* ... */
	626	mpfr_clear (x);
	627	/* followed by code using only small precision */
	628	while contrary to real caches, they contain no data. This is not
	629	valuable memory: freeing/allocating a large block of memory is
	630	much faster than the actual computations, so that mpz_t caching
	631	has no impact on the performance in such cases. A pool with large
	632	blocks also potentially destroys the data locality.
	633	* It assumes that the real GMP functions are __gmpz_init and
	634	__gmpz_clear, which are not part of the official GMP API, thus
	635	is based on GMP internals, which may change in the future or
	636	may be different in forks / compatible libraries / etc. This
	637	can be solved if MPFR code calls mpfr_mpz_init / mpfr_mpz_clear
	638	directly, avoiding the #define's.
	639	Questions that need to be answered:
	640	* What about the comparisons with other memory allocators?
	641	* Shouldn't the pool be part of the memory allocator?
	642	For the default memory allocator (malloc): RFE?
	643	If it is decided to keep some form of mpz_t caching, a possible solution
	644	for both issues: define mpfr_mpz_init2 and mpfr_mpz_clear2, which both
	645	take 2 arguments like mpz_init2, where mpfr_mpz_init2 behaves in a way
	646	similar to mpz_init2, and mpfr_mpz_clear2 behaves in a way similar to
	647	mpz_clear but where the size argument is a hint for the pool; if it is
	648	too large, then the mpz_t should not be pushed back to the pool. The
	649	size argument of mpfr_mpz_init2 could also be a hint to decide which
	650	element to pull from the pool.
	651
	652	- in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722,
	653	that is, with a large error during the computation of the secondary term
	654	(when the TMD occurs).
	655
	656	- use the keyword "static" in array indices of parameter declarations with
	657	C99 compilers (6.7.5.3p7) when the pointer is expected not to be null?
	658	For instance, if mpfr.h is changed to have:
	659	__MPFR_DECLSPEC void mpfr_dump (const __mpfr_struct [static 1]);
	660	and one calls
	661	mpfr_dump (NULL);
	662	one gets a warning with Clang. This is just an example; this needs to be
	663	done in a clean way.
	664	See:
	665	http://stackoverflow.com/a/3430353/3782797
	666	https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html
	667
	668	- change most mpfr_urandomb occurrences to mpfr_urandom in the tests?
	669	(The one done in r10573 allowed us to find a bug even without
	670	assertion checking.)
	671
	672	- tzeta has been much slower since r9848 (which increases the precision
	673	of the input for the low output precisions), at least with the x86
	674	32-bit ABI. This seems to come from the fact that the working precision
	675	in the mpfr_zeta implementation depends on the precision of the input.
	676	Once mpfr_zeta has improved, change the last argument of test_generic
	677	in tzeta.c back to 5 (as it was before r10667).
	678
	679	- check the small-precision tables in the tests?
	680	This may require to export some pointer to the tables, but this could
	681	be done only if some debug macro is defined.
	682
	683	- optionally use malloc() for the caches? See mpfr_mp_memory_cleanup.
	684	Note: This can be implemented by adding a TLS flag saying whether we
	685	are under cache generation or not, and by making the MPFR allocation
	686	functions consider this flag. Moreover, this can only work for mpfr_t
	687	caching (floating-point constants), not for mpz_t caching (Bernoulli
	688	constants) because we do not have the control of memory allocation for
	689	mpz_init.
	690
	691	- use GCC's nonnull attribute (available since GCC 4.0) where applicable.
	692
	693	- avoid the use of MPFR_MANT(x) as an lvalue; use other (more high level)
	694	internal macros if possible, such as MPFR_TMP_INIT1, MPFR_TMP_INIT and
	695	MPFR_ALIAS.
	696
	697
	698	##############################################################################
	699	7. Portability
	700	##############################################################################
	701
	702	- add a web page with results of builds on different architectures
	703
	704	- [Kevin about texp.c long strings]
	705	For strings longer than c99 guarantees, it might be cleaner to
	706	introduce a "tests_strdupcat" or something to concatenate literal
	707	strings into newly allocated memory. I thought I'd done that in a
	708	couple of places already. Arrays of chars are not much fun.
	709
	710	- use https://gcc.gnu.org/viewcvs/gcc/trunk/config/stdint.m4 for mpfr-gmp.h
	711
	712	- By default, GNU Automake adds -I options to local directories, with
	713	the side effect that these directories have the precedence to search
	714	for system headers (#include <...>). This may make the build fail if
	715	a C implementation includes a file that has the same name as one used
	716	in such a directory.
	717	For instance, if one adds an empty file "src/bits/types.h", then the
	718	MPFR build fails under Linux because /usr/include/stdio.h has
	719	#include <bits/types.h>
	720	Possible workaround:
	721	* disable the default -I options with nostdinc as documented in
	722	the Automake manual;
	723	* have a rule that copies the needed files ("mpfr.h" or they should
	724	be prefixed with "mpfr-") to $(top_builddir)/include;
	725	* use "-I$(top_builddir)/include".

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format