c++ - c++11 fast constexpr integer powers -


beating dead horse here. typical (and fast) way of doing integer powers in c classic:

int64_t ipow(int64_t base, int exp){   int64_t result = 1;   while(exp){     if(exp & 1)       result *= base;     exp >>= 1;     base *= base;   }   return result; } 

however needed compile time integer power went ahead , made recursive implementation using constexpr:

constexpr int64_t ipow_(int base, int exp){   return exp > 1 ? ipow_(base, (exp>>1) + (exp&1)) * ipow_(base, exp>>1) : base; } constexpr int64_t ipow(int base, int exp){   return exp < 1 ? 1 : ipow_(base, exp); } 

the second function handle exponents less 1 in predictable way. passing exp<0 error in case.

the recursive version 4 times slower

i generate vector of 10e6 random valued bases , exponents in range [0,15] , time both algorithms on vector (after doing non-timed run try remove caching effects). without optimization recursice method twice fast loop. -o3 (gcc) loop 4 times faster recursice method.

my question guys this: can 1 come faster ipow() function handles exponent , bases of 0 , can used constexpr?

(disclaimer: don't need faster ipow, i'm interested see smart people here can come with).

a optimizing compiler transform tail-recursive functions run fast imperative code. can transform function tail recursive pumping. gcc 4.8.1 compiles test program:

#include <cstdint>  constexpr int64_t ipow(int64_t base, int exp, int64_t result = 1) {   return exp < 1 ? result : ipow(base*base, exp/2, (exp % 2) ? result*base : result); }  int64_t foo(int64_t base, int exp) {   return ipow(base, exp); } 

into loop (see @ gcc.godbolt.org):

foo(long, int):     testl   %esi, %esi     movl    $1, %eax     jle .l4 .l3:     movq    %rax, %rdx     imulq   %rdi, %rdx     testb   $1, %sil     cmovne  %rdx, %rax     imulq   %rdi, %rdi     sarl    %esi     jne .l3     rep; ret .l4:     rep; ret 

vs. your while loop implementation:

ipow(long, int):     testl   %esi, %esi     movl    $1, %eax     je  .l4 .l3:     movq    %rax, %rdx     imulq   %rdi, %rdx     testb   $1, %sil     cmovne  %rdx, %rax     imulq   %rdi, %rdi     sarl    %esi     jne .l3     rep; ret .l4:     rep; ret 

instruction-by-instruction identical enough me.


Comments

Popular posts from this blog

php - Calling a template part from a post -

Firefox SVG shape not printing when it has stroke -

How to mention the localhost in android -