ieee 754 - Converting from floating-point to decimal with floating-point computations -


i trying convert floating-point double-precision value x decimal 12 (correctly rounded) significant digits. assuming x between 10^110 , 10^111 such decimal representation of form x.xxxxxxxxxxxe110. and, fun, trying use floating-point arithmetic only.

i arrived pseudo-code below, operations double-precision operations, notation 1e98 double nearest mathematical 10^98, , 1e98_2 double nearest result of mathematical subtraction 10^98-1e98. notation fmadd(x * y + z) fused multiply-add operation operands x,y, z.

  y = x * 2^-1074;    // exact   q = y / 1e98;       // q denormal , significand of q interpreted                       // integer our candidate 12 decimal                       // digits of x    r = fmadd(q * 1e98 - y);  // close 1e98 * (error made during division)    // if 1e98_2 >= 0, divided number smaller wished   // correct answer may q or q+1.    if (r , 1e98_2 have opposite signs)   {     return significand of q;   }    s = copysign(2^-1074, r);   r1 = abs(r);   r2 = abs(1e98_2);    h = 1e98 * 0.5 * 2^-1074;    set rounding mode downwards    r3 = fmadd(r2 * q + r1);    if (r3 < h)   {     return significand of q;   }   else   {     return significand of (q + s)   } 

i apologize confusion pervades above pseudo-code, not clear me yet, hence following questions:

  1. does first fmadd work intended (to compute 1e98 * (error made during division))?

  2. the signs. cannot convince myself right. cannot convince myself wrong either.

  3. any idea, perhaps arguments, frequency algorithm might produce wrong result?

  4. if works @ all, there chance algorithm continue work if “q = y / 1e98” changed “q = y * 1e-98” (leaving other instructions same)?

i have not tested algorithm. not have computer fmadd instruction, although hope find 1 can execute above.

let y/d exact operation, , q=rnd(y/d) result rounded nearest float.
true error multiplied d rt=(rnd(y/d)-y/d)*d=q*d-y , operation performed fmadd r=rnd(q*d-y)
why q*d-y exact (fmadd no final rounding) less clear explain, q*d has limited number of bits (<nbits(q)+nbits(d)), exponent of y of q*d (+/- 1) , since error |rt|<0.5*ulp(q)*d, means first nbits(q) vanishing... answers question 1.

so q*1e98 - y = r , |r|*2^1074 <= 0.5e98 < 5*10^98 (2nd inequality lucky)

q*(10^98) - y = r + (10^98-1e98)*q |10^98-1e98|*q*2^1074 <= 0.5e95 (assuming @ least 15 digits precision, log(2^53)/log(10) > 15)

so ask whether |q*(10^98)-y|*2^1074>5*10^97

you have approximation of |q*(10^98)-y| r+1e98_2*q

since |r| < 5*10^98, , |r+(10^98-1e98)*q|<|r| if signs opposite, think answers positively question 2. wouldn't sure if 1e98_2 < 0.

if r , 1e98_2 have same sign might exceed 5*10^97, further handling discussion of r3 = 1e98_2*q + r versus h=0.5e98*2^-1074

for question 3, @ first sight, i'd 2 things might make algorithm fail:

  • 1e98_2 not exact (10^98-1e98-1e98_2 = -3.6e63 approx.)

  • and h not ht=0.5*10^98*2^-1074 bit smaller saw above.

the true error r3t approximately (1e98_2-3e63)*q + r < r3 (and case when >0 interesting us, because 1e98_2>0).

so approximation of error r3 falling above approximated tie h when true error r3t below true tie ht lead incorrect rounding. possible, , if yes how frequent question 3?

to mitigate above inequality risk, tried truncate magnitude of r3, r3 <= 1e98_2*q + r. felt bit tired perform true analysis of error bounds...

so scanned error, , first failing example found 1.0000000001835e110 (i assume correctly rounded nearest double, in fact 1000000000183.49999984153799821120915424942630528225695526491963291846957919215885146546696544423465444842668032e98).

in case, r , 1e98_2 have same sign, ,

  • (x/1e98) > 1000000000183.50000215

  • q significand rounded 1000000000184

  • r3>h (r3*2^1074 approx. 5.000001584620017e97) , incorrectly incremented q+s, when should have been q-s, definitely bug.

my answers are:

  1. yes, r=fmadd(q * 1e98 - y) 1e98*(error made during division), don't care of division, it's providing guess, counts subtraction exact.

  2. yes, sign correct because |r| < 5*10^98, , |r+(10^98-1e98)*q|<|r| if signs opposite. wouldn't sure if 1e98_2 < 0.

  3. taking first failing example (1.0000000001835e110 - 1.0e110)/1.0e110 ulp -> 1.099632e6, very naive conjecture 1 case out of million, r3 falling on h... once q+s corrected q-s, occurence of r3>h while r3t<ht much smaller 1/1,000,000 in case... there more 10^15 doubles in range of interest, consider not serious answer...

  4. yes, discussion above solely guess q, independently of way produced, , subtraction in 1. still exact...


Comments

Popular posts from this blog

php - Calling a template part from a post -

Firefox SVG shape not printing when it has stroke -

How to mention the localhost in android -