Context Navigation

← Previous Revision
Latest Revision
Next Revision →
Blame
Revision Log
Repository URL

TODO@ 1193

Last change on this file since 1193 was 1166, checked in by rossy, 4 years ago
Daodan: Replace MinGW build env with an up-to-date MSYS2 env
File size: 35.0 KB

Line
1	Copyright 1999-2020 Free Software Foundation, Inc.
2	Contributed by the AriC and Caramba projects, INRIA.
3
4	This file is part of the GNU MPFR Library.
5
6	The GNU MPFR Library is free software; you can redistribute it and/or modify
7	it under the terms of the GNU Lesser General Public License as published by
8	the Free Software Foundation; either version 3 of the License, or (at your
9	option) any later version.
10
11	The GNU MPFR Library is distributed in the hope that it will be useful, but
12	WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
13	or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
14	License for more details.
15
16	You should have received a copy of the GNU Lesser General Public License
17	along with the GNU MPFR Library; see the file COPYING.LESSER. If not, see
18	https://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc.,
19	51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
20
21	Table of contents:
22	1. Documentation
23	2. Compiler/library detection
24	3. Changes in existing functions
25	4. New functions to implement
26	5. Efficiency
27	6. Miscellaneous
28	7. Portability
29
30	##############################################################################
31	1. Documentation
32	##############################################################################
33
34	- add a description of the algorithms used and a proof of correctness
35
36	##############################################################################
37	2. Compiler/library detection
38	##############################################################################
39
40	- update ICC detection.
41	* Use only __INTEL_COMPILER instead of the obsolete macro __ICC?
42
43	##############################################################################
44	3. Changes in existing functions
45	##############################################################################
46
47	- export mpfr_overflow and mpfr_underflow as public functions
48
49	- many functions currently taking into account the precision of the input
50	variable to set the initial working precision (acosh, asinh, cosh, ...).
51	This is nonsense since the "average" working precision should only depend
52	on the precision of the output variable (and maybe on the value of
53	the input in case of cancellation).
54	-> remove those dependencies from the input precision.
55
56	- mpfr_can_round:
57	change the meaning of the 2nd argument (err). Currently the error is
58	at most 2^(MPFR_EXP(b)-err), i.e. err is the relative shift wrt the
59	most significant bit of the approximation. I propose that the error
60	is now at most 2^err ulps of the approximation, i.e.
61	2^(MPFR_EXP(b)-MPFR_PREC(b)+err).
62
63	- mpfr_set_q first tries to convert the numerator and the denominator
64	to mpfr_t. But this conversion may fail even if the correctly rounded
65	result is representable. New way to implement:
66	Function q = a/b. nq = PREC(q) na = PREC(a) nb = PREC(b)
67	If na < nb
68	a <- a*2^(nb-na)
69	n <- na-nb+ (HIGH(a,nb) >= b)
70	if (n >= nq)
71	bb <- b*2^(n-nq)
72	a = q*bb+r --> q has exactly n bits.
73	else
74	aa <- a*2^(nq-n)
75	aa = q*b+r --> q has exactly n bits.
76	If RNDN, takes nq+1 bits. (See also the new division function).
77
78	- revisit the conversion functions between a MPFR number and a native
79	floating-point value.
80	* Consequences if some exception is trapped?
81	* Specify under which conditions (current rounding direction and
82	precision of the FPU, whether a format has been recognized...),
83	correct rounding is guaranteed. Fix the code if need be. Do not
84	forget subnormals.
85	* Provide mpfr_buildopt_* functions to tell whether the format of a
86	native type (float / double / long double) has been recognized and
87	which format it is?
88	* For functions that return a native floating-point value (mpfr_get_flt,
89	mpfr_get_d, mpfr_get_ld, mpfr_get_decimal64), in case of underflow or
90	overflow, follow the convention used for the functions in <math.h>?
91	See §7.12.1 "Treatment of error conditions" of ISO C11, which provides
92	two ways of handling error conditions, depending on math_errhandling:
93	errno (to be set to ERANGE here) and floating-point exceptions.
94	If floating-point exceptions need to be generated, do not use
95	feraiseexcept(), as this function may require the math library (-lm);
96	use a floating-point expression instead, such as DBL_MIN * DBL_MIN
97	(underflow) or DBL_MAX * DBL_MAX (overflow), which are probably safe
98	as used in the GNU libc implementation.
99	* For testing the lack of subnormal support:
100	see the -mfpu GCC option for ARM and
101	https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level
102
103
104	##############################################################################
105	4. New functions to implement
106	##############################################################################
107
108	- a function to compute the hash of a floating-point number
109	(suggested by Patrick Pelissier)
110	- implement new functions from the C++17 standard:
111	http://en.cppreference.com/w/cpp/numeric/special_math
112	assoc_laguerre, assoc_legendre, comp_ellint_1, comp_ellint_2, comp_ellint_3,
113	cyl_bessel_i, cyl_bessel_j, cyl_bessel_k, cyl_neumann, ellint_1, ellint_2,
114	ellint_3, hermite, legendre, laguerre, sph_bessel, sph_legendre,
115	sph_neumann.
116	Already in mpfr4: beta and riemann_zeta.
117	See also https://isocpp.org/files/papers/P0226R1.pdf and §29.9.5 in the
118	C++17 draft:
119	https://github.com/cplusplus/draft/blob/master/source/numerics.tex
120	- implement mpfr_q_sub, mpfr_z_div, mpfr_q_div?
121	- implement mpfr_pow_q and variants with two integers (native or mpz)
122	instead of a rational? See IEEE P1788.
123	- implement functions for random distributions, see for example
124	https://sympa.inria.fr/sympa/arc/mpfr/2010-01/msg00034.html
125	(suggested by Charles Karney <ckarney@Sarnoff.com>, 18 Jan 2010):
126	* a Bernoulli distribution with prob p/q (exact)
127	* a general discrete distribution (i with prob w[i]/sum(w[i]) (Walker
128	algorithm, but make it exact)
129	* a uniform distribution in (a,b)
130	* exponential distribution (mean lambda) (von Neumann's method?)
131	* normal distribution (mean m, s.d. sigma) (ratio method?)
132	- wanted for Magma [John Cannon <john@maths.usyd.edu.au>, Tue, 19 Apr 2005]:
133	HypergeometricU(a,b,s) = 1/gamma(a)int(exp(-su)u^(a-1)*(1+u)^(b-a-1),
134	u=0..infinity)
135	JacobiThetaNullK
136	PolylogP, PolylogD, PolylogDold: see http://arxiv.org/abs/math.CA/0702243
137	and the references herein.
138	JBessel(n, x) = BesselJ(n+1/2, x)
139	KBessel, KBessel2 [2nd kind]
140	JacobiTheta
141	(see http://www.ams.org/journals/mcom/0000-000-00/S0025-5718-2017-03245-2/home.html)
142	LogIntegral
143	ExponentialIntegralEn (formula 5.1.4 of Abramowitz and Stegun)
144	DawsonIntegral
145	GammaD(x) = Gamma(x+1/2)
146	- new functions of IEEE 754-2008, and more generally functions of the
147	C binding draft TS 18661-4:
148	http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1946.pdf
149	Some propositions about rootn: mpfr_rootn_si, mpfr_rootn_sj, mpfr_rootn_z,
150	and versions with an unsigned integer: mpfr_rootn_ui (now implemented, as
151	similar to mpfr_root) and mpfr_rootn_uj.
152	- functions defined in the LIA-2 standard
153	+ minimum and maximum (5.2.2): max, min, max_seq, min_seq, mmax_seq
154	and mmin_seq (mpfr_min and mpfr_max correspond to mmin and mmax);
155	+ rounding_rest, floor_rest, ceiling_rest (5.2.4);
156	+ remr (5.2.5): x - round(x/y) y;
157	+ error functions from 5.2.7 (if useful in MPFR);
158	+ power1pm1 (5.3.6.7): (1 + x)^y - 1;
159	+ logbase (5.3.6.12): \log_x(y);
160	+ logbase1p1p (5.3.6.13): \log_{1+x}(1+y);
161	+ rad (5.3.9.1): x - round(x / (2 pi)) 2 pi = remr(x, 2 pi);
162	+ axis_rad (5.3.9.1) if useful in MPFR;
163	+ cycle (5.3.10.1): rad(2 pi x / u) u / (2 pi) = remr(x, u);
164	+ axis_cycle (5.3.10.1) if useful in MPFR;
165	+ sinu, cosu, tanu, cotu, secu, cscu, cossinu, arcsinu, arccosu,
166	arctanu, arccotu, arcsecu, arccscu (5.3.10.{2..14}):
167	sin(x 2 pi / u), etc.;
168	[from which sinpi(x) = sin(Pi*x), ... are trivial to implement, with u=2.]
169	+ arcu (5.3.10.15): arctan2(y,x) u / (2 pi);
170	+ rad_to_cycle, cycle_to_rad, cycle_to_cycle (5.3.11.{1..3}).
171	- From GSL, missing special functions (if useful in MPFR):
172	(cf https://www.gnu.org/software/gsl/manual/gsl-ref.html#Special-Functions)
173	+ The Airy functions Ai(x) and Bi(x) defined by the integral representations:
174	* Ai(x) = (1/\pi) \int_0^\infty \cos((1/3) t^3 + xt) dt
175	* Bi(x) = (1/\pi) \int_0^\infty (e^(-(1/3) t^3) + \sin((1/3) t^3 + xt)) dt
176	* Derivatives of Airy Functions
177	+ The Bessel functions for n integer and n fractional:
178	* Regular Modified Cylindrical Bessel Functions I_n
179	* Irregular Modified Cylindrical Bessel Functions K_n
180	* Regular Spherical Bessel Functions j_n: j_0(x) = \sin(x)/x,
181	j_1(x)= (\sin(x)/x-\cos(x))/x & j_2(x)= ((3/x^2-1)\sin(x)-3\cos(x)/x)/x
182	Note: the "spherical" Bessel functions are solutions of
183	x^2 y'' + 2 x y' + [x^2 - n (n+1)] y = 0 and satisfy
184	j_n(x) = sqrt(Pi/(2x)) J_{n+1/2}(x). They should not be mixed with the
185	classical Bessel Functions, also noted j0, j1, jn, y0, y1, yn in C99
186	and mpfr.
187	Cf https://en.wikipedia.org/wiki/Bessel_function#Spherical_Bessel_functions
188	*Irregular Spherical Bessel Functions y_n: y_0(x) = -\cos(x)/x,
189	y_1(x)= -(\cos(x)/x+\sin(x))/x &
190	y_2(x)= (-3/x^3+1/x)\cos(x)-(3/x^2)\sin(x)
191	* Regular Modified Spherical Bessel Functions i_n:
192	i_l(x) = \sqrt{\pi/(2x)} I_{l+1/2}(x)
193	* Irregular Modified Spherical Bessel Functions:
194	k_l(x) = \sqrt{\pi/(2x)} K_{l+1/2}(x).
195	+ Clausen Function:
196	Cl_2(x) = - \int_0^x dt \log(2 \sin(t/2))
197	Cl_2(\theta) = \Im Li_2(\exp(i \theta)) (dilogarithm).
198	+ Dawson Function: \exp(-x^2) \int_0^x dt \exp(t^2).
199	+ Debye Functions: D_n(x) = n/x^n \int_0^x dt (t^n/(e^t - 1))
200	+ Elliptic Integrals:
201	* Definition of Legendre Forms:
202	F(\phi,k) = \int_0^\phi dt 1/\sqrt((1 - k^2 \sin^2(t)))
203	E(\phi,k) = \int_0^\phi dt \sqrt((1 - k^2 \sin^2(t)))
204	P(\phi,k,n) = \int_0^\phi dt 1/((1 + n \sin^2(t))\sqrt(1 - k^2 \sin^2(t)))
205	* Complete Legendre forms are denoted by
206	K(k) = F(\pi/2, k)
207	E(k) = E(\pi/2, k)
208	* Definition of Carlson Forms
209	RC(x,y) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1)
210	RD(x,y,z) = 3/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-3/2)
211	RF(x,y,z) = 1/2 \int_0^\infty dt (t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2)
212	RJ(x,y,z,p) = 3/2 \int_0^\infty dt
213	(t+x)^(-1/2) (t+y)^(-1/2) (t+z)^(-1/2) (t+p)^(-1)
214	+ Elliptic Functions (Jacobi)
215	+ N-relative exponential:
216	exprel_N(x) = N!/x^N (\exp(x) - \sum_{k=0}^{N-1} x^k/k!)
217	+ exponential integral:
218	E_2(x) := \Re \int_1^\infty dt \exp(-xt)/t^2.
219	Ei_3(x) = \int_0^x dt \exp(-t^3) for x >= 0.
220	Ei(x) := - PV(\int_{-x}^\infty dt \exp(-t)/t)
221	+ Hyperbolic/Trigonometric Integrals
222	Shi(x) = \int_0^x dt \sinh(t)/t
223	Chi(x) := Re[ \gamma_E + \log(x) + \int_0^x dt (\cosh[t]-1)/t]
224	Si(x) = \int_0^x dt \sin(t)/t
225	Ci(x) = -\int_x^\infty dt \cos(t)/t for x > 0
226	AtanInt(x) = \int_0^x dt \arctan(t)/t
227	[ \gamma_E is the Euler constant ]
228	+ Fermi-Dirac Function:
229	F_j(x) := (1/r\Gamma(j+1)) \int_0^\infty dt (t^j / (\exp(t-x) + 1))
230	+ Pochhammer symbol (a)_x := \Gamma(a + x)/\Gamma(a) : see [Smith01] in
231	algorithms.bib
232	logarithm of the Pochhammer symbol
233	+ Gegenbauer Functions
234	+ Laguerre Functions
235	+ Eta Function: \eta(s) = (1-2^{1-s}) \zeta(s)
236	Hurwitz zeta function: \zeta(s,q) = \sum_0^\infty (k+q)^{-s}.
237	+ Lambert W Functions, W(x) are defined to be solutions of the equation:
238	W(x) \exp(W(x)) = x.
239	This function has multiple branches for x < 0 (2 funcs W0(x) and Wm1(x))
240	From Fredrik Johansson:
241	See https://cs.uwaterloo.ca/research/tr/1993/03/W.pdf, in particular
242	formulas 5.2 and 5.3 for the error bound: one first computes an
243	approximation w, and then evaluates the residual w e^w - x. There is an
244	expression for the error in terms of the residual and the derivative W'(t),
245	where the derivative can be bounded by piecewise simple functions,
246	something like min(1, 1/t) when t >= 0.
247	See https://arxiv.org/abs/1705.03266 for rigorous error bounds.
248	+ Trigamma Function psi'(x).
249	and Polygamma Function: psi^{(m)}(x) for m >= 0, x > 0.
250	- functions from ISO/IEC 24747:2009 (Extensions to the C Library,
251	to Support Mathematical Special Functions).
252	Standard: http://www.iso.org/iso/catalogue_detail.htm?csnumber=38857
253	Draft: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1292.pdf
254	Rationale: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1244.pdf
255	See also: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf
256	(similar, for C++).
257	Also check whether the functions that are already implemented in MPFR
258	match this standard.
259
260	- from gnumeric (www.gnome.org/projects/gnumeric/doc/function-reference.html):
261	- incomplete beta function, see message from Martin Maechler
262	<maechler@stat.math.ethz.ch> on 18 Jan 2016, and Section 6.6 in
263	Abramowitz & Stegun
264	- betaln
265	- degrees
266	- radians
267	- sqrtpi
268
269	- mpfr_inp_raw, mpfr_out_raw (cf mail "Serialization of mpfr_t" from Alexey
270	and answer from Granlund on mpfr list, May 2007)
271	- [maybe useful for SAGE] implement companion frac_* functions to the rint_*
272	functions. For example mpfr_frac_floor(x) = x - floor(x). (The current
273	mpfr_frac function corresponds to mpfr_rint_trunc.)
274	- scaled erfc (https://sympa.inria.fr/sympa/arc/mpfr/2009-05/msg00054.html)
275	- asec, acsc, acot, asech, acsch and acoth (mail from Björn Terelius on mpfr
276	list, 18 June 2009)
277
278	- function to reduce the precision of a variable, with a ternary value in
279	input, i.e. taking care of double rounding. Two possible forms: like
280	mpfr_set (i.e. with input and output) or like mpfr_prec_round (i.e. with
281	a single variable). mpfr_subnormalize and mpfr_round_nearest_away_end
282	could use it.
283
284	- UBF functions for +, -, *, fmma, /, sqrt.
285	Support UBF in mpfr_check_range or add mpfr_ubf_check_range?
286	Make this available in the API, e.g. for MPC.
287
288	- mpfr_cmp_uj and mpfr_cmp_sj. They would be useful to test MPFR with
289	_MPFR_EXP_FORMAT=4.
290
291	- base conversion with the round-trip property using a minimal precision,
292	such as the to_chars functions from the C++ standard:
293
294	The functions [...] ensure that the string representation consists
295	of the smallest number of characters such that there is at least
296	one digit before the radix point (if present) and parsing the
297	representation using the corresponding from_chars function
298	recovers value exactly. [Note: This guarantee applies only if
299	to_chars and from_chars are executed on the same implementation.
300	— end note] If there are several such representations, the
301	representation with the smallest difference from the
302	floating-point argument value is chosen, resolving any remaining
303	ties using rounding according to round_to_nearest.
304
305	Text from: https://www.zsh.org/mla/workers/2019/msg01138.html
306
307	- Serialization / Deserialization. Suggested by Frédéric Pétrot:
308	https://sympa.inria.fr/sympa/arc/mpfr/2020-02/msg00006.html
309	like mpfr_fpif_{import,export}, but with memory instead of file.
310
311	Idea of implementation to reuse most of the code and change very little:
312
313	Instead of passing a FILE fh, pass a struct ext_data h, and instead of
314	using fread and fwrite, use
315	h->read (h, buffer, size)
316	h->write (h, buffer, size)
317	respectively.
318
319	The struct ext_data structure could contain the following fields:
320	* read: pointer to a wrapper function for the read method.
321	* write: pointer to a wrapper function for the write method.
322	* FILE *fh: to be used for operations with files.
323	* unsigned char *arena: to be used for operations with memory.
324
325	The wrapper functions for the read method could be:
326
327	static int
328	read_from_file (struct ext_data h, unsigned char buffer, size_t size)
329	{
330	return fread (buffer, size, 1, h->fh) != 1;
331	}
332
333	static int
334	read_from_memory (struct ext_data h, unsigned char buffer, size_t size)
335	{
336	if (h->arena == NULL)
337	return 1;
338	memcpy (buffer, h->arena, size);
339	h->arena += size;
340	return 0;
341	}
342
343	So I expect very few changes in the existing code:
344	* Write a few wrapper functions.
345	* Rename mpfr_fpif_export to mpfr_fpif_export_aux and
346	mpfr_fpif_import to mpfr_fpif_import_aux.
347	* In the existing functions, replace FILE *fh, and fread/fwrite
348	calls as mentioned above.
349	* Add new mpfr_fpif_export, mpfr_fpif_import, mpfr_fpif_export_mem,
350	mpfr_fpif_import_mem.
351
352	##############################################################################
353	5. Efficiency
354	##############################################################################
355
356	- Fredrik Johansson reports that mpfr_ai is slow for large arguments: an
357	asymptotic expansion should be used (once done, remove REDUCE_EMAX from
358	tests/tai.c and update the description in mpfr.texi).
359	- for exp(x), Fredrik Johansson reports a 20% speed improvement starting from
360	4000 bits, and up to a 75% memory improvement in his Arb implementation, by
361	using recursive instead of iterative binary splitting:
362	https://github.com/fredrik-johansson/arb/blob/master/elefun/exp_sum_bs_powtab.c
363	- improve mpfr_grandom using the algorithm in http://arxiv.org/abs/1303.6257
364	- implement a mpfr_sqrthigh algorithm based on Mulders' algorithm, with a
365	basecase variant
366	- use mpn_div_q to speed up mpfr_div. However mpn_div_q, which is new in
367	GMP 5, is not documented in the GMP manual, thus we are not sure it
368	guarantees to return the same quotient as mpn_tdiv_qr.
369	Also mpfr_div uses the remainder computed by mpn_divrem. A workaround would
370	be to first try with mpn_div_q, and if we cannot (easily) compute the
371	rounding, then use the current code with mpn_divrem.
372	- improve atanh(x) for small x by using atanh(x) = log1p(2x/(1-x)),
373	and log1p should also be improved for small arguments.
374	- compute exp by using the series for cosh or sinh, which has half the terms
375	(see Exercise 4.11 from Modern Computer Arithmetic, version 0.3)
376	The same method can be used for log, using the series for atanh, i.e.,
377	atanh(x) = 1/2*log((1+x)/(1-x)).
378	- improve mpfr_gamma (see https://code.google.com/p/fastfunlib/). A possible
379	idea is to implement a fast algorithm for the argument reconstruction
380	gamma(x+k): instead of performing k products by x+i, we could precompute
381	x^2, ..., x^m for m ~ sqrt(k), and perform only sqrt(k) products.
382	One could also use the series for 1/gamma(x), see for example
383	http://dlmf.nist.gov/5/7/ or formula (36) from
384	http://mathworld.wolfram.com/GammaFunction.html
385	- improve the computation of Bernoulli numbers: instead of computing just one
386	B[2n] at a time in mpfr_bernoulli_internal, we could compute several at a
387	time, sharing the expensive computation of the 1/p^(2n) series.
388	- fix regression with mpfr_mpz_root (from Keith Briggs, 5 July 2006), for
389	example on 3Ghz P4 with gmp-4.2, x=12.345:
390	prec=50000 k=2 k=3 k=10 k=100
391	mpz_root 0.036 0.072 0.476 7.628
392	mpfr_mpz_root 0.004 0.004 0.036 12.20
393	See also mail from Carl Witty on mpfr list, 09 Oct 2007.
394	- for sparse input (say x=1 with 2 bits), mpfr_exp is not faster than for
395	full precision when precision <= MPFR_EXP_THRESHOLD. The reason is
396	that argument reduction kills sparsity. Maybe avoid argument reduction
397	for sparse input?
398	- speed up mpfr_atan for large arguments (to speed up mpc_log) see FR #6198
399	- improve mpfr_sin on values like ~pi (do not compute sin from cos, because
400	of the cancellation). For instance, reduce the input modulo pi/2 in
401	[-pi/4,pi/4], and define auxiliary functions for which the argument is
402	assumed to be already reduced (so that the sin function can avoid
403	unnecessary computations by calling the auxiliary cos function instead of
404	the full cos function). This will require a native code for sin, for
405	example using the reduction sin(3x)=3sin(x)-4sin(x)^3.
406	See https://sympa.inria.fr/sympa/arc/mpfr/2007-08/msg00001.html and
407	the following messages.
408	- improve generic.c to work for number of terms <> 2^k
409	- rewrite mpfr_greater_p... as native code.
410
411	- mpf_t uses a scheme where the number of limbs actually present can
412	be less than the selected precision, thereby allowing low precision
413	values (for instance small integers) to be stored and manipulated in
414	an mpf_t efficiently.
415
416	Perhaps mpfr should get something similar, especially if looking to
417	replace mpf with mpfr, though it'd be a major change. Alternately
418	perhaps those mpfr routines like mpfr_mul where optimizations are
419	possible through stripping low zero bits or limbs could check for
420	that (this would be less efficient but easier).
421
422	- try the idea of the paper "Reduced Cancellation in the Evaluation of Entire
423	Functions and Applications to the Error Function" by W. Gawronski, J. Mueller
424	and M. Reinhard, to be published in SIAM Journal on Numerical Analysis: to
425	avoid cancellation in say erfc(x) for x large, they compute the Taylor
426	expansion of erfc(x)*exp(x^2/2) instead (which has less cancellation),
427	and then divide by exp(x^2/2) (which is simpler to compute).
428
429	- replace the *_THRESHOLD macros by global (TLS) variables that can be
430	changed at run time (via a function, like other variables)? One benefit
431	is that users could use a single MPFR binary on several machines (e.g.,
432	a library provided by binary packages or shared via NFS) with different
433	thresholds. On the default values, this would be a bit less efficient
434	than the current code, but this isn't probably noticeable (this should
435	be tested). Something like:
436	long *mpfr_tune_get(void) to get the current values (the first value
437	is the size of the array).
438	int mpfr_tune_set(long *array) to set the tune values.
439	int mpfr_tune_run(long level) to find the best values (the support
440	for this feature is optional, this can also be done with an
441	external function).
442
443	- better distinguish different processors (for example Opteron and Core 2)
444	and use corresponding default tuning parameters (as in GMP). This could be
445	done in configure.ac to avoid hacking config.guess, for example define
446	MPFR_HAVE_CORE2.
447	Note (VL): the effect on cross-compilation (that can be a processor
448	with the same architecture, e.g. compilation on a Core 2 for an
449	Opteron) is not clear. The choice should be consistent with the
450	build target (e.g. -march or -mtune value with gcc).
451	Also choose better default values. For instance, the default value of
452	MPFR_MUL_THRESHOLD is 40, while the best values that have been found
453	are between 11 and 19 for 32 bits and between 4 and 10 for 64 bits!
454
455	- during the Many Digits competition, we noticed that (our implantation of)
456	Mulders short product was slower than a full product for large sizes.
457	This should be precisely analyzed and fixed if needed.
458
459	- for various functions, check the timings as a function of the magnitude
460	of the input (and the input and/or output precisions?), and use better
461	thresholds for asymptotic expansions.
462
463	- improve the special case of mpfr_{add,sub} (x, x, y, ...) when \|x\| > \|y\|
464	to do the addition in-place and have a complexity of O(prec(y)) in most
465	cases. The mpfr_{add,sub}_{d,ui} functions should automatically benefit
466	from this change.
467
468	- in gmp_op.c, for functions with mpz_srcptr, check whether mpz_fits_slong_p
469	is really useful in all cases (see TODO in this file).
470
471	- optimize code that uses a test based on the fact that x >> s is
472	undefined in C for s == width of x but the result is expected to
473	be 0. ARM and PowerPC could benefit from such an optimization,
474	but not x86. This needs support from the compiler.
475	For PowerPC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79233
476
477	- deal with MPFR_RNDF in mpfr_round_near_x (replaced by MPFR_RNDZ).
478
479	- instead of a fixed mparam.h, optionally use function multiversioning
480	(FMV), currently only available with the GNU C++ front end:
481	https://gcc.gnu.org/wiki/FunctionMultiVersioning
482	According to https://lwn.net/Articles/691932/ the dispatch resolution
483	is now done by the dynamic loader, so that this should be fast enough
484	(the cost would be the reading of a static variable, initialized at
485	load time, instead of a constant).
486	In particular, binary package distributions would benefit from FMV as
487	only one binary is generated for different processor families.
488
489
490	##############################################################################
491	6. Miscellaneous
492	##############################################################################
493
494	- [suggested by Tobias Burnus <burnus(at)net-b.de> and
495	Asher Langton <langton(at)gcc.gnu.org>, Wed, 01 Aug 2007]
496	support quiet and signaling NaNs in mpfr:
497	* functions to set/test a quiet/signaling NaN: mpfr_set_snan, mpfr_snan_p,
498	mpfr_set_qnan, mpfr_qnan_p
499	* correctly convert to/from double (if encoding of s/qNaN is fixed in 754R)
500	Note: Signaling NaNs are not specified by the ISO C standard and may
501	not be supported by the implementation. GCC needs the -fsignaling-nans
502	option (but this does not affect the C library, which may or may not
503	accept signaling NaNs).
504
505	- check the constants mpfr_set_emin (-16382-63) and mpfr_set_emax (16383) in
506	get_ld.c and the other constants, and provide a testcase for large and
507	small numbers.
508
509	- from Kevin Ryde <user42@zip.com.au>:
510	Also for pi.c, a pre-calculated compiled-in pi to a few thousand
511	digits would be good value I think. After all, say 10000 bits using
512	1250 bytes would still be small compared to the code size!
513	Store pi in round to zero mode (to recover other modes).
514
515	- add other prototypes for round to nearest-away (mpfr_round_nearest_away
516	only deals with the prototypes of say mpfr_sin) or implement it as a native
517	rounding mode
518	- add a new roundind mode: round to odd. If the result is not exactly
519	representable, then round to the odd mantissa. This rounding
520	has the nice property that for k > 1, if:
521	y = round(x, p+k, TO_ODD)
522	z = round(y, p, TO_NEAREST_EVEN), then
523	z = round(x, p, TO_NEAREST_EVEN)
524	so it avoids the double-rounding problem.
525	VL: I prefer the (original?) term "sticky rounding", as used in
526	J Strother Moore, Tom Lynch, Matt Kaufmann. A Mechanically Checked
527	Proof of the Correctness of the Kernel of the AMD5K86 Floating-Point
528	Division Algorithm. IEEE Transactions on Computers, 1996.
529	and
530	http://www.russinoff.com/libman/text/node26.html
531
532	- new rounding mode MPFR_RNDE when the result is known to be exact?
533	* In normal mode, this would allow MPFR to optimize using
534	this information.
535	* In debug mode, MPFR would check that the result is exact
536	(i.e. that the ternary value is 0).
537
538	- add tests of the ternary value for constants
539
540	- When doing Extensive Check (--enable-assert=full), since all the
541	functions use a similar use of MACROS (ZivLoop, ROUND_P), it should
542	be possible to do such a scheme:
543	For the first call to ROUND_P when we can round.
544	Mark it as such and save the approximated rounding value in
545	a temporary variable.
546	Then after, if the mark is set, check if:
547	- we still can round.
548	- The rounded value is the same.
549	It should be a complement to tgeneric tests.
550
551	- in div.c, try to find a case for which cy != 0 after the line
552	cy = mpn_sub_1 (sp + k, sp + k, qsize, cy);
553	(which should be added to the tests), e.g. by having {vp, k} = 0, or
554	prove that this cannot happen.
555
556	- add a configure test for --enable-logging to ignore the option if
557	it cannot be supported. Modify the "configure --help" description
558	to say "on systems that support it".
559
560	- add generic bad cases for functions that don't have an inverse
561	function that is implemented (use a single Newton iteration).
562
563	- add bad cases for the internal error bound (by using a dichotomy
564	between a bad case for the correct rounding and some input value
565	with fewer Ziv iterations?).
566
567	- add an option to use a 32-bit exponent type (int) on LP64 machines,
568	mainly for developers, in order to be able to test the case where the
569	extended exponent range is the same as the default exponent range, on
570	such platforms.
571	Tests can be done with the exp-int branch (added on 2010-12-17, and
572	many tests fail at this time).
573
574	- test underflow/overflow detection of various functions (in particular
575	mpfr_exp) in reduced exponent ranges, including ranges that do not
576	contain 0.
577
578	- add an internal macro that does the equivalent of the following?
579	MPFR_IS_ZERO(x) \|\| MPFR_GET_EXP(x) <= value
580
581	- check whether __gmpfr_emin and __gmpfr_emax could be replaced by
582	a constant (see README.dev). Also check the use of MPFR_EMIN_MIN
583	and MPFR_EMAX_MAX.
584
585	- add a test checking that no mpfr.h macros depend on mpfr-impl.h
586	(the current tests cannot check that since mpfr-impl.h is always
587	included).
588
589	- move some macro definitions from acinclude.m4 to the m4 directory
590	as suggested by the Automake manual? The reason is that the
591	acinclude.m4 file is big and a bit difficult to read.
592
593	- use symbol versioning.
594
595	- check whether mpz_t caching (pool) is necessary. Timings with -static
596	with details about the C / C library implementation should be put
597	somewhere as a comment in the source or in the doc. Using -static
598	is important because otherwise the cache saves the dynamic call to
599	mpz_init and mpz_clear; so, what we're measuring is not clear.
600	See thread:
601	https://gmplib.org/list-archives/gmp-devel/2015-September/004147.html
602	Summary: It will not be integrated in GMP because 1) This yields
603	problems with threading (in MPFR, we have TLS variables, but this is
604	not the case of GMP). 2) The gain (if confirmed with -static) would
605	be due to a poor malloc implementation (timings would depend on the
606	platform). 3) Applications would use more RAM.
607	Additional notes [VL]: the major differences in the timings given
608	by Patrick in 2014-01 under Linux were:
609	Before:
610	arccos(x) took 0.054689 ms (32767 eval in 1792 ms)
611	arctan(x) took 0.042116 ms (32767 eval in 1380 ms)
612	After:
613	arccos(x) took 0.043580 ms (32767 eval in 1428 ms)
614	arctan(x) took 0.035401 ms (32767 eval in 1160 ms)
615	mpfr_acos doesn't use mpz, but calls mpfr_atan, so that the issue comes
616	from mpfr_atan, which uses mpz a lot. The problem mainly comes from the
617	reallocations in GMP because mpz_init is used instead of mpz_init2 with
618	the estimated maximum size. Other places in the code that uses mpz_init
619	may be concerned.
620	Issues with mpz_t caching:
621	* The pool can take much memory, which may no longer be useful.
622	For instance:
623	mpfr_init2 (x, 10000000);
624	mpfr_log_ui (x, 17, MPFR_RNDN);
625	/* ... */
626	mpfr_clear (x);
627	/* followed by code using only small precision */
628	while contrary to real caches, they contain no data. This is not
629	valuable memory: freeing/allocating a large block of memory is
630	much faster than the actual computations, so that mpz_t caching
631	has no impact on the performance in such cases. A pool with large
632	blocks also potentially destroys the data locality.
633	* It assumes that the real GMP functions are __gmpz_init and
634	__gmpz_clear, which are not part of the official GMP API, thus
635	is based on GMP internals, which may change in the future or
636	may be different in forks / compatible libraries / etc. This
637	can be solved if MPFR code calls mpfr_mpz_init / mpfr_mpz_clear
638	directly, avoiding the #define's.
639	Questions that need to be answered:
640	* What about the comparisons with other memory allocators?
641	* Shouldn't the pool be part of the memory allocator?
642	For the default memory allocator (malloc): RFE?
643	If it is decided to keep some form of mpz_t caching, a possible solution
644	for both issues: define mpfr_mpz_init2 and mpfr_mpz_clear2, which both
645	take 2 arguments like mpz_init2, where mpfr_mpz_init2 behaves in a way
646	similar to mpz_init2, and mpfr_mpz_clear2 behaves in a way similar to
647	mpz_clear but where the size argument is a hint for the pool; if it is
648	too large, then the mpz_t should not be pushed back to the pool. The
649	size argument of mpfr_mpz_init2 could also be a hint to decide which
650	element to pull from the pool.
651
652	- in tsum, add testcases for mpfr_sum triggering the bug fixed in r9722,
653	that is, with a large error during the computation of the secondary term
654	(when the TMD occurs).
655
656	- use the keyword "static" in array indices of parameter declarations with
657	C99 compilers (6.7.5.3p7) when the pointer is expected not to be null?
658	For instance, if mpfr.h is changed to have:
659	__MPFR_DECLSPEC void mpfr_dump (const __mpfr_struct [static 1]);
660	and one calls
661	mpfr_dump (NULL);
662	one gets a warning with Clang. This is just an example; this needs to be
663	done in a clean way.
664	See:
665	http://stackoverflow.com/a/3430353/3782797
666	https://hamberg.no/erlend/posts/2013-02-18-static-array-indices.html
667
668	- change most mpfr_urandomb occurrences to mpfr_urandom in the tests?
669	(The one done in r10573 allowed us to find a bug even without
670	assertion checking.)
671
672	- tzeta has been much slower since r9848 (which increases the precision
673	of the input for the low output precisions), at least with the x86
674	32-bit ABI. This seems to come from the fact that the working precision
675	in the mpfr_zeta implementation depends on the precision of the input.
676	Once mpfr_zeta has improved, change the last argument of test_generic
677	in tzeta.c back to 5 (as it was before r10667).
678
679	- check the small-precision tables in the tests?
680	This may require to export some pointer to the tables, but this could
681	be done only if some debug macro is defined.
682
683	- optionally use malloc() for the caches? See mpfr_mp_memory_cleanup.
684	Note: This can be implemented by adding a TLS flag saying whether we
685	are under cache generation or not, and by making the MPFR allocation
686	functions consider this flag. Moreover, this can only work for mpfr_t
687	caching (floating-point constants), not for mpz_t caching (Bernoulli
688	constants) because we do not have the control of memory allocation for
689	mpz_init.
690
691	- use GCC's nonnull attribute (available since GCC 4.0) where applicable.
692
693	- avoid the use of MPFR_MANT(x) as an lvalue; use other (more high level)
694	internal macros if possible, such as MPFR_TMP_INIT1, MPFR_TMP_INIT and
695	MPFR_ALIAS.
696
697
698	##############################################################################
699	7. Portability
700	##############################################################################
701
702	- add a web page with results of builds on different architectures
703
704	- [Kevin about texp.c long strings]
705	For strings longer than c99 guarantees, it might be cleaner to
706	introduce a "tests_strdupcat" or something to concatenate literal
707	strings into newly allocated memory. I thought I'd done that in a
708	couple of places already. Arrays of chars are not much fun.
709
710	- use https://gcc.gnu.org/viewcvs/gcc/trunk/config/stdint.m4 for mpfr-gmp.h
711
712	- By default, GNU Automake adds -I options to local directories, with
713	the side effect that these directories have the precedence to search
714	for system headers (#include <...>). This may make the build fail if
715	a C implementation includes a file that has the same name as one used
716	in such a directory.
717	For instance, if one adds an empty file "src/bits/types.h", then the
718	MPFR build fails under Linux because /usr/include/stdio.h has
719	#include <bits/types.h>
720	Possible workaround:
721	* disable the default -I options with nostdinc as documented in
722	the Automake manual;
723	* have a rule that copies the needed files ("mpfr.h" or they should
724	be prefixed with "mpfr-") to $(top_builddir)/include;
725	* use "-I$(top_builddir)/include".

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format