The author is confusing bins with bin edges. In their first plot, the standard approach looks strange because 0-7 should be the bin edges, not the center points as shown in the plot.
You can see this confusion again in the histogram example. There are only 255 bins, not 256. If you fix that mistake and remove the 0.5 offset, then the histogram is distributed correctly at both ends.
I'll argue for the +0.5 solution. First, I don't like half-sized intervals at the edges, and second, a 255-based representation is typically an SDR (not HDR) image.
RGB values represent luminances against some adapted state, and a "zero" in a daylit scene is not "zero luminance" - it's just about 0.001x as bright as the brightest point - it's millions of photons, way more than zero. In a sense our eyes experience contrast on a sliding scale, and there is no absolute zero in the system. For example, broadcast systems historically used 16-235 as their luminance range for SDR. I think any argument that says "we must have zero" is going to have a bias, but I don't think zero is needed for most things.
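For what it's worth, the "half-sized intervals at the edges" point is easy to check numerically. A rough Python sketch (the function names are made up for illustration, not taken from the article) that pushes evenly spaced floats in [0, 1) through both quantizers and counts how many land on each 8-bit code:

```python
import math
from collections import Counter

# Sample count chosen so that no sample sits exactly on a bin boundary
# of either quantizer (an assumption of this sketch, not of the article).
SAMPLES = 255 * 256 * 2

def quantize_round255(f):
    # Scale by 255 and round to nearest; clamp for safety.
    return min(math.floor(f * 255.0 + 0.5), 255)

def quantize_floor256(f):
    # Scale by 256 and truncate; clamp so f == 1.0 would still give 255.
    return min(math.floor(f * 256.0), 255)

def histogram(quantize):
    # Evaluate the quantizer at the midpoint of each of SAMPLES equal slices.
    return Counter(quantize((k + 0.5) / SAMPLES) for k in range(SAMPLES))

counts_round = histogram(quantize_round255)
counts_floor = histogram(quantize_floor256)

print(counts_round[0], counts_round[1], counts_round[255])
print(counts_floor[0], counts_floor[1], counts_floor[255])
```

With the rounding quantizer, codes 0 and 255 collect half as many samples as the interior codes; with the floor-by-256 quantizer every code gets the same count.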
As someone with a lot of experience in this area doing image processing and rendering for VFX (including writing image readers and writers for my own software and commercial VFX software), I think you might be forgetting that colourspace conversion (to sRGB 'linear' rec709 for old-school SDR, but other, wider gamuts for newer formats) would happen after this, so the 'squish' of the dynamic range would happen after loading.
Also, a lot of workflows for image processing and compositing do assume that 0 means zero, whether correctly or not (often incorrectly). So there are often assumptions that for 8-bit, 0u maps to 0.0f and 255 maps to 1.0f for things like masking or alpha: as soon as you have 0 values which become just over 0.0, you then have artifacts because some code somewhere is using a hard threshold of 0.0 to mask some other operation, and vice-versa for 1.0 with alpha, where suddenly because the 255 values are no longer 1.0, you have very slightly see-through objects (often only visible in certain situations or when pixel-peeping) after pre-multiplication.
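To make the alpha problem concrete, here is a small Python sketch. The `over` helper is a generic premultiplied "over" composite written for illustration, not code from any particular package:

```python
def decode_div255(i):
    # Conventional decode: 0 -> 0.0 and 255 -> 1.0 exactly.
    return i / 255.0

def decode_half256(i):
    # Bin-centre decode: 0 -> 1/512, 255 -> 511/512.
    return (i + 0.5) / 256.0

def over(fg, alpha, bg):
    # Premultiply the foreground, then composite over the background.
    fg_premult = fg * alpha
    return fg_premult + bg * (1.0 - alpha)

def is_masked_out(value):
    # The kind of hard threshold that exists in a lot of pipeline code.
    return value == 0.0

opaque_255 = over(0.25, decode_div255(255), 1.0)
opaque_256 = over(0.25, decode_half256(255), 1.0)

print(opaque_255, opaque_256)
print(is_masked_out(decode_div255(0)), is_masked_out(decode_half256(0)))
```

With the bin-centre decode, a "fully opaque" dark pixel over a white background picks up a little of the background, and a "fully masked" 0 no longer passes the `== 0.0` test.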
I agree. Additionally, both 0.0 and 1.0 don't really exist for dithered signals, so a byte should map to [0.5, 255.5] before division by 256. This also solves the signed integer asymmetry, as a signed byte maps to [-127.5, 127.5] before division by 128. I wonder if audio DSP folks have done this already.
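A quick Python sketch of that mapping, assuming a plain floor on the way back (the function names are hypothetical):

```python
import math

def decode_unsigned(b):
    # b in [0, 255] -> [0.5, 255.5] / 256
    return (b + 0.5) / 256.0

def decode_signed(s):
    # s in [-128, 127] -> [-127.5, 127.5] / 128
    return (s + 0.5) / 128.0

def encode_signed(x):
    # Inverse of decode_signed: floor, then clamp to the int8 range.
    return max(-128, min(127, math.floor(x * 128.0)))

print(decode_signed(-128), decode_signed(127))
print(decode_unsigned(0), decode_unsigned(255))
```

The two signed extremes come out as exact negatives of each other, and every code round-trips. The flip side is that no code decodes to exactly 0.0.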
Interesting idea, but somehow I feel the world is shaking. For the processing program, what used to be black (0.0) and white (1.0) has become very dark gray and very bright gray.
Both solutions add 0.5, the difference is where in the process it happens.
> In a sense our eyes experience contrast on a sliding scale
There's a whole visual center to check the amount of incoming light and adjust your pupils for you. It's intentionally reactive.
> and there is no absolute zero in the system.
There maybe is. I think we call that "blind."
> broadcast systems historically used 16-235 as their luminance range for SDR
Mostly because it was a fully analog system and these all translate down to signal voltage. Jokingly NTSC used to be referred to as "Never Twice the Same Color" due to being a compromise bolted onto the side of an already compromised system.
That was a fun article to read of something I haven't had to think about in a while. It brought to mind moments in game development of having pixel art needing to be drawn on an integer value despite the game logic using floating point math. I tried something similar to the +0.5 in places so that it wouldn't look as bad (especially when there's a moving camera, which also needed to be truncated..).
I also enjoyed the 2002 article by Jonathan Blow [1] that's linked at the bottom. The visualization from the first article helped a lot once this started to go more in-depth.
[1] https://web.archive.org/web/20240706043551/https://number-no...
If you have a ruler and it goes to 12 inches, you should normalize by the length L and not by 13, the number of points on the ruler.
I'm confused by that analogy. Is the "ruler" a 255-inch ruler with 256 points labeled 0-255, or is it a 256-inch ruler with 256 1-inch segments, making L = 256×1?
yes but >> 8 is so much faster
You don't divide a float by 256 by shifting it right eight bits; that would yield complete garbage. You subtract 8 from the exponent, then check if you got an underflow.
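For anyone who wants to see the exponent trick spelled out, a Python sketch on float32 bit patterns. It only handles normal inputs whose result is still normal; `div256_bits` is a name invented here:

```python
import math
import struct

def div256_bits(x):
    # Reinterpret x as IEEE 754 binary32 and read the 8-bit exponent field.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent = (bits >> 23) & 0xFF
    if exponent == 0 or exponent == 0xFF:
        raise ValueError("zero, subnormal, inf and NaN are not handled in this sketch")
    if exponent <= 8:
        raise ValueError("result would underflow out of the normal range")
    # Subtracting 8 from the exponent field divides by 2**8.
    bits -= 8 << 23
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y

def div256_ldexp(x):
    # The portable way to say the same thing: scale by 2**-8.
    return math.ldexp(x, -8)

print(div256_bits(200.0), div256_ldexp(200.0), 200.0 / 256.0)
```

All three expressions agree for these inputs, since dividing by a power of two only changes the exponent.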
Same point; divide by power of 2 is a fast subtraction operation in float world, while divide by 255 shits all over the whole float
It's just multiplication. Floating multiply is extraordinarily fast.
The difference between 20 cycles and 1 clock cycle in a hot loop is very noticeable
It's 3 cycles for float multiplication (and 1 for shift right):
https://uops.info/table.html?search=mulss&cb_lat=on&cb_tp=on...
https://uops.info/table.html?search=shr&cb_lat=on&cb_tp=on&c...
In throughput it's even less of a difference: 2 per cycle vs 3 per cycle.
Useful, then, that you can start several vectorized floating-point muls each cycle. (E.g., most modern x86 are 3/0.5 cycles for vmulps. No 20 cycles in sight.)
FP Division by constant is optimized by a compiler into a multiply. Graphics processing typically happens on the GPU these days, and on all recent GPUs FPMUL belongs to the class of lowest-latency operations. That is, there are no other instructions that complete faster.
Only with things like -ffast-math enabled will compilers do the reciprocal. It can make a fair difference in some cases, but it's often better to selectively use it in code locations you know are acceptable by doing it manually in the code.
That's only valid to do if the reciprocal is representable exactly.
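The "representable exactly" part can be checked in a few lines of Python (doubles here rather than float32, and the helper names are mine):

```python
from fractions import Fraction

def reciprocal_is_exact(d):
    # Fraction(float) converts the stored double exactly, so this compares
    # the rounded reciprocal against the true rational 1/d.
    return Fraction(1.0 / d) == Fraction(1, d)

def count_mismatches(d, values):
    # How many inputs give a different double for v / d and v * (1 / d).
    r = 1.0 / d
    return sum(1 for v in values if v / d != v * r)

codes = [float(i) for i in range(256)]
print(reciprocal_is_exact(256), reciprocal_is_exact(255))
print(count_mismatches(256, codes), count_mismatches(255, codes))
```

1/256 is a power of two and therefore exact, so multiply and divide always agree there. 1/255 is not exact, which is why a compiler needs something like -ffast-math before it will swap the division for a multiply by the reciprocal.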
Only in micro-benchmarks.
For real usage, today's CPUs are limited by memory bandwidth.
What are you talking about? In a hot loop in my software renderer this is like 10x faster.
If the latter is 10x faster, the issue is some kind of weird compilation failure for the above version. For one, it only cuts a third of the multiplies.
Because you are working in the cache.
Also, you should use SIMD.
> Also, you should use SIMD.
Ironically no, clang is better at auto-vectorizing.
I'm dumb. Doesn't 0 start at the beginning?
Both of these assume a linear transfer function, which is rarely the case.
Basically never for 8-bit color channels.
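For reference, a Python sketch of what decoding an 8-bit sRGB channel to linear light looks like, using the standard piecewise sRGB curve. It assumes the usual divide-by-255 normalisation before the curve is applied:

```python
def srgb8_to_linear(i):
    # Normalise the 8-bit code to [0, 1] first (divide-by-255 convention).
    c = i / 255.0
    # Standard sRGB decode: linear segment near black, power curve elsewhere.
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

for code in (0, 1, 128, 255):
    print(code, srgb8_to_linear(code))
```

Mid-grey code 128 decodes to roughly 0.216 in linear light, not 0.5, so whichever normalisation is picked, the transfer function moves the values around far more than the 255-versus-256 choice does.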
Advice for anyone on mobile: read in landscape mode if you want to be able to see the division by 256 version code example at the start.
The HTML/CSS is bad in that it lets the code completely overflow the right edge of the page instead of wrapping.
I re-read this post three times in total confusion before I figured out the most important piece was off-screen entirely.
You should multiply by 255.0, optionally add a dither (triangular is okay), and then let the FPU round using its default IEEE 754 round-to-nearest, ties-to-even mode. None of this crazy 0.5 stuff. :-)
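In Python that recipe might look like the sketch below. The built-in `round()` already rounds halves to even; the dither amplitude (two uniforms, so a triangular distribution over ±1 LSB) and the final clamp are my assumptions:

```python
import random

def quantize(f, rng=None):
    x = f * 255.0
    if rng is not None:
        # Difference of two uniforms gives a triangular PDF on (-1, 1).
        x += rng.random() - rng.random()
    # round() on a float uses round-half-to-even; clamp to the uint8 range.
    return max(0, min(255, round(x)))

print(quantize(0.0), quantize(0.5), quantize(1.0))

rng = random.Random(0)
print([quantize(0.5, rng) for _ in range(8)])
```

Without dither, 0.5 scales to exactly 127.5 and the tie goes to the even code 128. With dither the same input is spread across neighbouring codes.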
Interesting article. I tend to use
- i = min(floor(f * 256), 255) (from float to uint8)
- f = i / 255 (from uint8 to float)
Basically a mix of the 2 approaches mentioned in the article.
For all integers between [0,255], if I do uint8 -> float -> uint8 conversion, I will get the same result.
edit: I wonder what's the maximum jitter amount that I can introduce to the float and get the same uint8 value. And also these 0->0.0 and 255->1.0 should map properly.
With my approach at the top, maximum jitter that I can introduce is ~1/65280.
But as the article mentioned, this is the approach:
- i = floor(f * 255 + 0.5)
- f = i / 255
with maximum jitter margin of ~1/510.
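Those two margins can be double-checked with exact rationals. A Python sketch, assuming inputs are clamped so that code 0 has no lower edge and code 255 has no upper edge (the helper names are made up):

```python
import math
from fractions import Fraction

def encode_mixed(f):
    # floor(f * 256), clamped to [0, 255]
    return max(0, min(math.floor(f * 256), 255))

def encode_round(f):
    # floor(f * 255 + 0.5), clamped to [0, 255]
    return max(0, min(math.floor(f * 255 + Fraction(1, 2)), 255))

def decode(i):
    return Fraction(i, 255)

def worst_margin(lower_edge, upper_edge):
    # Smallest distance from any decoded value to the edge of its own bin.
    margins = []
    for i in range(256):
        centre = decode(i)
        if i > 0:
            margins.append(centre - lower_edge(i))
        if i < 255:
            margins.append(upper_edge(i) - centre)
    return min(margins)

mixed_margin = worst_margin(lambda i: Fraction(i, 256),
                            lambda i: Fraction(i + 1, 256))
round_margin = worst_margin(lambda i: Fraction(2 * i - 1, 510),
                            lambda i: Fraction(2 * i + 1, 510))

print(mixed_margin, round_margin)
```

Both schemes round-trip every code, but the mixed one leaves only 1/65280 of slack at its tightest point, against 1/510 for the rounding quantizer.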
It's worth pointing out that the article explicitly calls out your first mixed technique:
> Finally, one should never mix the encode and decode steps of the two quantizers. That's just broken code. It's an easy mistake to make, though.
This is what I do for the former:
Oh very nice idea to get rid of the min operator.
Should always be 0-255 as that fits an unsigned byte.
That's not what the article is about.
> assume that in both cases the output values are clamped before the final typecast
A similar issue exists in the audio world, for example 16-bit integer audio is between [-32768, 32767] (non-symmetric), but floating point audio is [-1.0, 1.0].
note that floating point audio very often exceeds [-1.0, 1.0] within the pipeline, just to be tamed at the very end of the mix to fit within those bounds. this is pretty much why every modern DAW uses floating point these days.
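A small Python sketch of the audio case. Dividing by 32768 on the way in and clamping on the way out is one common convention, assumed here for illustration; other software scales by 32767 instead:

```python
INT16_MIN, INT16_MAX = -32768, 32767

def int16_to_float(s):
    # -32768 -> -1.0 exactly; 32767 lands just below +1.0.
    return s / 32768.0

def float_to_int16(x):
    # Scale, round, then clamp: anything beyond the range clips here.
    return max(INT16_MIN, min(INT16_MAX, round(x * 32768.0)))

# Inside the pipeline a gain stage can push samples past full scale.
hot_sample = int16_to_float(30000) * 2.0

print(int16_to_float(INT16_MIN), int16_to_float(INT16_MAX))
print(hot_sample, float_to_int16(hot_sample))
```

The float value happily sits above 1.0 mid-pipeline and only gets clipped when it is converted back to 16-bit at the end.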
255 gives 0-255, which gives you a zero value. 256 is 1-256, you lose the option of setting 0.
That's not what the article is about.
Both. 255 for each color and the last 1 as the alpha for each channel.
Why not??? Fight me
"Let's say you're writing an image processing program. The program takes in an image, converts it to floating point, does some processing and finally saves the modified pixels to disk as 8-bit colors. "
excuse to argue about the best way aside, if this is the goal you should not be rolling your own image file reading. you should use openimageio. idk what approach it takes in its internal conversion to float, but that library is more likely to have the right answer than you trying to roll it yourself given its the library used internally by tons of professional image manipulation software...
If you're a beginner, or just want something which works quickly, sure.
However OIIO is far from perfect in all situations (having had to debug and fix issues with its mip-map generation filtering code in the past), so don't always assume that just because there's a mature open source library out there doing something that it's always perfect.
sure of course nothing is perfect and oiio has a lot of surface area / is still oss. thats good advice.
ive just seen a lot of "ai researchers" who are getting into professional image processing and are both beginners and want things quickly and so could do much worse than just starting from what they get out of oiio. especially for a lot of the non-obvious stuff (more of that in color handling than just the io stuff though)
OpenImageIO uses the standard division by 255 technique: https://openimageio.readthedocs.io/en/latest/imageoutput.htm...