Optimize C++ bitmap processing algorithm
I have written the following algorithm (for Android/NDK) to apply levels to a bitmap. The problem is that it is really slow: on a fast device such as the SGSIII it can take up to 4 seconds for an 8MP image, and on devices with ARMv6 it takes ages (over 10 seconds). Is there any way to optimize it?
    void applylevels(unsigned int *buffer, const unsigned int width, const unsigned int height,
                     const float exposure, const float brightness, const float contrast,
                     const float saturation)
    {
        float r, g, b;
        unsigned int pixelindex = 0;

        float exposurefactor = powf(2.0f, exposure);
        float brightnessfactor = brightness / 10.0f;
        float contrastfactor = contrast > 0.0f ? contrast : 0.0f;

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                const int pixelvalue = buffer[pixelindex];

                r = ((pixelvalue & 0xff0000) >> 16) / 255.0f;
                g = ((pixelvalue & 0xff00) >> 8) / 255.0f;
                b = (pixelvalue & 0xff) / 255.0f;

                // clamp values
                r = r > 1.0f ? 1.0f : r < 0.0f ? 0.0f : r;
                g = g > 1.0f ? 1.0f : g < 0.0f ? 0.0f : g;
                b = b > 1.0f ? 1.0f : b < 0.0f ? 0.0f : b;

                // exposure
                r *= exposurefactor;
                g *= exposurefactor;
                b *= exposurefactor;

                // contrast
                r = (((r - 0.5f) * contrastfactor) + 0.5f);
                g = (((g - 0.5f) * contrastfactor) + 0.5f);
                b = (((b - 0.5f) * contrastfactor) + 0.5f);

                // saturation
                float gray = (r * 0.3f) + (g * 0.59f) + (b * 0.11f);
                r = gray * (1.0f - saturation) + r * saturation;
                g = gray * (1.0f - saturation) + g * saturation;
                b = gray * (1.0f - saturation) + b * saturation;

                // brightness
                r += brightnessfactor;
                g += brightnessfactor;
                b += brightnessfactor;

                // clamp values
                r = r > 1.0f ? 1.0f : r < 0.0f ? 0.0f : r;
                g = g > 1.0f ? 1.0f : g < 0.0f ? 0.0f : g;
                b = b > 1.0f ? 1.0f : b < 0.0f ? 0.0f : b;

                // store new pixel value
                r *= 255.0f;
                g *= 255.0f;
                b *= 255.0f;

                buffer[pixelindex] = ((int)r << 16) | ((int)g << 8) | (int)b;

                pixelindex++;
            }
        }
    }
Most of the computations can trivially be tabled... the whole processing can become
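    for (int i = 0; i < n; i++) {
        int px = buffer[i];
        // exposure and contrast, one lookup per component
        int r = tab1[(px >> 16) & 255];
        int g = tab1[(px >> 8) & 255];
        int b = tab1[px & 255];
        // fixed-point gray value (kr, kg, kb are the weights scaled by 65536)
        int gray = (kr*r + kg*g + kb*b) >> 16;
        int grayval = tsat1[gray];
        // saturation, then brightness and clamping
        r = brtab[tsat2[r] + grayval];
        g = brtab[tsat2[g] + grayval];
        b = brtab[tsat2[b] + grayval];
        buffer[i] = (r << 16) | (g << 8) | b;
    }

where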
- tab1 is a table of 256 bytes tabling the result of exposure and contrast processing
- tsat1, tsat2 are 256-byte tables for saturation processing
- brtab is a 512-byte table for brightness processing
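As a rough sketch of how tab1 might be filled: the helper name buildTab1 is made up for illustration, and clamping the intermediate result to 0..255 is an assumption needed to make it fit in a byte (the float code in the question only clamps at the very end, so results can differ slightly for extreme settings).

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical helper: precomputes exposure + contrast for every possible
// 8-bit channel value, using the same formulas as the question's float code.
// Assumption: the intermediate value is clamped to 0..255 to fit in a byte.
std::array<std::uint8_t, 256> buildTab1(float exposure, float contrast) {
    const float exposureFactor = std::pow(2.0f, exposure);
    const float contrastFactor = contrast > 0.0f ? contrast : 0.0f;

    std::array<std::uint8_t, 256> tab{};
    for (int i = 0; i < 256; ++i) {
        float v = i / 255.0f;                      // normalize to 0..1
        v *= exposureFactor;                       // exposure
        v = (v - 0.5f) * contrastFactor + 0.5f;    // contrast
        v = std::min(1.0f, std::max(0.0f, v));     // clamp (assumption)
        tab[i] = static_cast<std::uint8_t>(v * 255.0f);  // truncates like (int)r
    }
    return tab;
}
```

The table only has to be rebuilt when exposure or contrast changes, so the 256 floating-point evaluations are negligible next to the 8 million pixels.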
Note that without the saturation processing you need just one lookup per component in a 256-byte table.
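To illustrate the no-saturation case, here is a minimal sketch. It assumes saturation is exactly 1.0f, so every output channel depends only on its own input channel and exposure, contrast, brightness and the final clamp all fold into one table. The names buildLevelsTable and applyLevelsTable are invented for this example.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: one table covering exposure, contrast, brightness and
// the final clamp. Only equivalent to the question's code if saturation == 1.0f.
std::array<std::uint8_t, 256> buildLevelsTable(float exposure, float brightness,
                                               float contrast) {
    const float exposureFactor = std::pow(2.0f, exposure);
    const float brightnessFactor = brightness / 10.0f;
    const float contrastFactor = contrast > 0.0f ? contrast : 0.0f;

    std::array<std::uint8_t, 256> tab{};
    for (int i = 0; i < 256; ++i) {
        float v = i / 255.0f;
        v *= exposureFactor;                       // exposure
        v = (v - 0.5f) * contrastFactor + 0.5f;    // contrast
        v += brightnessFactor;                     // brightness
        v = std::min(1.0f, std::max(0.0f, v));     // final clamp
        tab[i] = static_cast<std::uint8_t>(v * 255.0f);
    }
    return tab;
}

// Applies the table in place to n pixels laid out as 0x00RRGGBB.
// Like the code in the question, the top byte is written back as zero.
void applyLevelsTable(std::uint32_t* buffer, std::size_t n,
                      const std::array<std::uint8_t, 256>& tab) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t px = buffer[i];
        const std::uint32_t r = tab[(px >> 16) & 255];
        const std::uint32_t g = tab[(px >> 8) & 255];
        const std::uint32_t b = tab[px & 255];
        buffer[i] = (r << 16) | (g << 8) | b;
    }
}
```

The per-pixel loop then contains only shifts, masks and three byte loads, with no floating point at all.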
A huge part of the speed problem may be that you are using floating-point computations on hardware that has no dedicated support for them. A software implementation of floating point is really slow.
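For example, the gray value from the question can be computed with integers only by scaling the weights 0.3, 0.59 and 0.11 by 65536. This is just a sketch: the constant values and the helper name grayFixed are assumptions chosen for illustration, with the weights rounded so that they sum to exactly 65536.

```cpp
#include <cstdint>

// Luma weights from the question (0.3, 0.59, 0.11) in 16.16 fixed point.
// Assumed rounding: 19661 + 38666 + 7209 == 65536, so gray pixels stay unchanged.
constexpr std::int32_t kr = 19661;  // about 0.30 * 65536
constexpr std::int32_t kg = 38666;  // about 0.59 * 65536
constexpr std::int32_t kb = 7209;   // about 0.11 * 65536

// Hypothetical helper: integer-only gray value for channels in 0..255.
// The largest intermediate sum is 255 * 65536, which fits in 32 bits.
inline int grayFixed(int r, int g, int b) {
    return (kr * r + kg * g + kb * b) >> 16;
}
```

The shift by 16 replaces the division by 65536, so the whole expression is three multiplies, two adds and one shift.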