Rasterio and OpenCV read JPEG differently

Question

I realise a very similar question was already asked (Rasterio and OpenCV shows two different pixel arrays for same image). However, my situation is slightly different, and I only wish to know why this difference occurs.
I converted a GeoTIFF (uint16) of mine to JPEG (uint8) via gdal_translate as follows:
gdal_translate -of JPEG -scale ./rgb.tif rgb.jpg
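For context, `-scale` without arguments linearly maps the input values onto 0-255 for the 8-bit JPEG output. Below is a rough numpy sketch of that mapping. `linear_rescale` is a hypothetical helper, and it assumes each band's own min/max is used as the source range with round-to-nearest; GDAL computes the source range itself (possibly from approximate statistics) and its exact rounding may differ.

```python
import numpy as np

def linear_rescale(band, src_min=None, src_max=None, dst_min=0, dst_max=255):
    """Linearly map one band from [src_min, src_max] to [dst_min, dst_max] as uint8.

    Hypothetical helper approximating `gdal_translate -scale`; when the source
    range is omitted, this band's own min/max is used (an assumption).
    """
    band = np.asarray(band, dtype=np.float64)
    src_min = band.min() if src_min is None else src_min
    src_max = band.max() if src_max is None else src_max
    if src_max == src_min:
        # Flat band: avoid dividing by zero.
        return np.full(band.shape, dst_min, dtype=np.uint8)
    scale = (dst_max - dst_min) / (src_max - src_min)
    out = (band - src_min) * scale + dst_min
    # Round to nearest and clamp to the 8-bit range (GDAL's rounding may differ).
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

band = np.array([0, 32768, 65535], dtype=np.uint16)
print(linear_rescale(band))
```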

To confirm whether the image was read correctly, I ran the following script (named test.py):
import cv2
import rasterio as rio
from skimage import io
import numpy as np

def check(path):
    # All images should be in RGB mode
    im1 = rio.open(path)
    im1 = im1.read()
    im1 = im1.transpose(1, 2, 0) # CHW to HWC
    im2 = cv2.imread(path)
    im2 = im2[:,:,::-1] # BGR to RGB
    im3 = io.imread(path)

    print(f"Rasterio and OpenCV = {np.all(im1[:,:,0] == im2[:,:,0])}")
    print(f"Rasterio and Skimage = {np.all(im1[:,:,0] == im3[:,:,0])}")

if __name__ == "__main__":
    check("rgb.jpg")

which returns
Rasterio and OpenCV = False
Rasterio and Skimage = True

Apparently OpenCV reads the image differently from rasterio and skimage. I can also confirm rgb.jpg is of type uint8 and is read as such by the three libraries.
So does anyone have an idea as to why this happens? Is this expected behaviour?
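`np.all` only answers yes or no. Here is a sketch of a helper that also reports how many pixels differ and by how much. `compare_reads` is a hypothetical name, and it assumes both inputs are HWC uint8 arrays already in RGB order (as `im1`, `im2` and `im3` are after the conversions in `check`).

```python
import numpy as np

def compare_reads(a, b):
    """Summarise how two HWC uint8 images differ (hypothetical helper)."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    # Cast to a signed type first: uint8 subtraction would wrap around.
    diff = a.astype(np.int16) - b.astype(np.int16)
    differing = np.any(diff != 0, axis=-1)  # a pixel differs if any channel differs
    return {
        "identical": not differing.any(),
        "differing_pixels": int(differing.sum()),
        "fraction": float(differing.mean()),
        "max_abs_diff": int(np.abs(diff).max()),
    }
```

Inside `check()`, `print(compare_reads(im1, im2))` would show the size of the rasterio vs OpenCV mismatch rather than just `False`.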
Versions:
OS: Ubuntu 16.04
pip = 21.0
python = 3.7.3

Link to image here
How to reproduce
conda create -n s1 -c conda-forge rasterio scikit-image python=3.7
conda activate s1
pip install opencv-python

Running test.py in the s1 environment 'fails', because rasterio and cv2 read the image differently.
Here is the output of gdalinfo rgb.jpg:
Gdalinfo output
Driver: JPEG/JPEG JFIF
Files: rgb.jpg
       rgb.jpg.aux.xml
Size is 1000, 1000
Coordinate System is:
PROJCS["unknown",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Lambert_Azimuthal_Equal_Area"],
    PARAMETER["latitude_of_center",-90],
    PARAMETER["longitude_of_center",0],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["metre",1],
    AXIS["Easting",NORTH],
    AXIS["Northing",NORTH]]
Origin = (-2383139.439709701109678,1465870.867263491498306)
Pixel Size = (25.610955512797485,-25.610955512797485)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=JPEG
  INTERLEAVE=PIXEL
  SOURCE_COLOR_SPACE=YCbCr
Corner Coordinates:
Upper Left  (-2383139.440, 1465870.867) ( 58d24'15.47"W, 64d43'49.98"S)
Lower Left  (-2383139.440, 1440259.912) ( 58d51'11.39"W, 64d51'11.12"S)
Upper Right (-2357528.484, 1465870.867) ( 58d 7'38.49"W, 64d55'50.63"S)
Lower Right (-2357528.484, 1440259.912) ( 58d34'42.36"W, 65d 3'15.06"S)
Center      (-2370333.962, 1453065.390) ( 58d29'26.93"W, 64d53'32.72"S)
Band 1 Block=1000x1 Type=Byte, ColorInterp=Red
  Overviews: 500x500, 250x250
  Image Structure Metadata:
    COMPRESSION=JPEG
Band 2 Block=1000x1 Type=Byte, ColorInterp=Green
  Overviews: 500x500, 250x250
  Image Structure Metadata:
    COMPRESSION=JPEG
Band 3 Block=1000x1 Type=Byte, ColorInterp=Blue
  Overviews: 500x500, 250x250
  Image Structure Metadata:
    COMPRESSION=JPEG

Ashwin Nair · Accepted Answer

TL;DR: opencv from pip uses libjpeg-turbo, whereas opencv from conda uses libjpeg, i.e. two different libraries, hence two different results.
The problem was due to how OpenCV was installed (conda vs pip). The conda and pip versions differ because they use different libraries to read JPEG files: the conda version uses libjpeg, while the pip version uses libjpeg-turbo. According to this issue, it seems rasterio has no plans to support libjpeg-turbo.
This can be verified by running the following:
python -c "import cv2; print(cv2.getBuildInformation())" | grep jpeg

If opencv was installed via pip:
3rdparty dependencies:       ittnotify libprotobuf libjpeg-turbo libwebp libtiff libopenjp2 IlmImf quirc ippiw ippicv
JPEG:                        libjpeg-turbo (ver 2.0.6-62)

If opencv was installed via conda:
JPEG:        /home/ash/anaconda3/envs/s2/lib/libjpeg.so (ver 90)
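The same check can be done from Python instead of grep. A small sketch; `jpeg_backend` is a hypothetical helper, and it assumes the build information keeps the `JPEG:` layout shown above.

```python
import re

def jpeg_backend(build_info):
    """Return the value of the 'JPEG:' entry in OpenCV build information, or None."""
    for line in build_info.splitlines():
        # Matches "JPEG:  <backend>" but not "JPEG 2000:  <backend>".
        match = re.match(r"\s*JPEG:\s+(.*\S)", line)
        if match:
            return match.group(1)
    return None
```

With OpenCV installed, `print(jpeg_backend(cv2.getBuildInformation()))` should print an entry like the ones above.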

A GitHub issue was raised here regarding this very matter. Simply put, because JPEG is lossy and libjpeg-turbo applies its own optimizations, there is no guarantee that the two libraries will decode a file to identical pixel values.
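To illustrate why bit-exact agreement isn't guaranteed, here is a toy sketch of a single decoding step, the JFIF YCbCr-to-RGB conversion (gdalinfo above reports SOURCE_COLOR_SPACE=YCbCr). Two implementations that differ only in how they round can disagree by one level. This is an illustration under simplifying assumptions, not what libjpeg or libjpeg-turbo actually do: real decoders use fixed-point arithmetic and may also differ in other steps, such as chroma upsampling and the inverse DCT, which can produce larger differences.

```python
import numpy as np

def ycbcr_to_rgb(y, cb, cr, rounding):
    """JFIF YCbCr -> RGB with float coefficients; `rounding` is e.g. np.rint or np.floor."""
    y, cb, cr = (np.asarray(v, dtype=np.float64) for v in (y, cb, cr))
    r = y + 1.402 * (cr - 128)
    g = y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    rgb = np.stack([r, g, b], axis=-1)
    # Round with the chosen strategy, then clamp to the 8-bit range.
    return np.clip(rounding(rgb), 0, 255).astype(np.uint8)

# Two equally plausible implementations that only differ in rounding.
rounded = ycbcr_to_rgb(100, 130, 131, np.rint)
truncated = ycbcr_to_rgb(100, 130, 131, np.floor)
print(rounded, truncated)
```

Here the blue channel comes out as 104 with rounding and 103 with truncation, for the same input.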
Verification
OpenCV officially moved from libjpeg to libjpeg-turbo after opencv-python 3.3.0.10. So, to verify whether this was indeed the issue, I simply compared the two versions.
# For opencv-python<=3.3.0.10 
>>python -c "import cv2; print(cv2.getBuildInformation())" | grep jpeg

3rdparty dependencies:       ittnotify libprotobuf libjpeg libwebp libpng libtiff libjasper IlmImf
JPEG:                        libjpeg (ver 90)

With this version, the Rasterio vs OpenCV comparison returns True.

# For opencv-python>=3.3.1.11
>>python -c "import cv2; print(cv2.getBuildInformation())" | grep jpeg

3rdparty dependencies:       ittnotify libprotobuf libjpeg-turbo libwebp libtiff libopenjp2 IlmImf quirc ippiw ippicv
JPEG:                        libjpeg-turbo (ver 2.0.6-62)

With this version, the Rasterio vs OpenCV comparison returns False.

Thanks to @user2856 and laurent.berger from the opencv thread for helping me narrow down the cause.

swiss_knight · Answer

Very interesting issue...
I also checked on my side (Python 3.6.9 on Ubuntu 18.04) and can reproduce it with:
rio.__version__: 1.1.8
cv2.__version__: 4.4.0
skimage.__version__: 0.17.2

Let me share my investigations.
First, two tiny practical differences:
I reshaped the rasterio image using rasterio's built-in tool, as explained here:
https://rasterio.readthedocs.io/en/latest/topics/image_processing.html#imageorder
from rasterio.plot import reshape_as_image
im1 = reshape_as_image(im1)
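According to the rasterio docs, `reshape_as_image` swaps the axes from (bands, rows, columns) to (rows, columns, bands), i.e. the same reordering as the `transpose(1, 2, 0)` in the question, so this change should not affect pixel values. A numpy-only sketch on a tiny fake raster:

```python
import numpy as np

# A tiny fake 3-band raster in rasterio's (bands, rows, cols) order.
chw = np.arange(3 * 2 * 4, dtype=np.uint8).reshape(3, 2, 4)

hwc_transpose = chw.transpose(1, 2, 0)  # what the question's script does
hwc_moveaxis = np.moveaxis(chw, 0, -1)  # an equivalent spelling: bands axis moved last

print(hwc_transpose.shape, np.array_equal(hwc_transpose, hwc_moveaxis))
```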

Second, I used OpenCV's own color conversion function to convert the image to RGB, as documented here: https://docs.opencv.org/master/d8/d01/group__imgproc__color__conversions.html
im2 = cv2.cvtColor(im2, cv2.COLOR_BGR2RGB)
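For a 3-channel uint8 image, `cv2.cvtColor(im2, cv2.COLOR_BGR2RGB)` and the `im2[:,:,::-1]` slice from the question should give the same pixel values, so this change cannot explain the mismatch either. One practical difference, sketched with numpy only: the slice is a view with a negative stride, which some OpenCV functions may reject, and `np.ascontiguousarray` makes a contiguous copy with the same values.

```python
import numpy as np

# A 2x2 image stored in OpenCV's B, G, R channel order.
bgr = np.array([[[38, 40, 58], [36, 37, 57]],
                [[39, 42, 63], [28, 30, 54]]], dtype=np.uint8)

rgb = bgr[:, :, ::-1]                   # reverse the channel axis: B,G,R -> R,G,B (a view)
rgb_contig = np.ascontiguousarray(rgb)  # contiguous copy, same values

print(rgb[0, 0], rgb.flags["C_CONTIGUOUS"], rgb_contig.flags["C_CONTIGUOUS"])
```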

Please also note that reading the image with the cv2.IMREAD_UNCHANGED flag doesn't change anything.
Finally, I double-checked the shapes and dtypes:
ims = [im1, im2, im3]
for im in ims:
    print("shape: {} and dtype: {}".format(im.shape, im.dtype))

The output is consistent:
shape: (1000, 1000, 3) and dtype: uint8
shape: (1000, 1000, 3) and dtype: uint8
shape: (1000, 1000, 3) and dtype: uint8

From here, I dug into the image using a plotting function:
import matplotlib.pyplot as plt

def plotlist(lst, cmap=None):
    # Show each image of the list side by side, without interpolation
    figsize = (12,12)
    fig, axs = plt.subplots(
        nrows=1,
        ncols=len(lst),
        figsize=figsize,
        dpi=100,
        sharex=True,
        sharey=True, )
    for i, ax in enumerate(axs):
        ax.imshow(lst[i], interpolation='none', cmap=cmap)

    plt.tight_layout()
    plt.show()

Then I used a tiny subset of only 4 pixels (the last four on the right of the bottom row of the image):
subimgs = [im[-1:,-4:,:] for im in ims] # bottom right
plotlist(subimgs)

Which rendered:

Rendering of the last 4 pixels of the bottom row.
Pretty much the same, right?
Let's plot the differences:
deltas = []
deltas.append(np.subtract(subimgs[0],subimgs[1])) # rio - cv2
deltas.append(np.subtract(subimgs[0],subimgs[2])) # rio - skimage
deltas.append(np.subtract(subimgs[2],subimgs[1])) # skimage - cv2
plotlist(deltas)

Which rendered:

Rendering of the differences between the images for the last 4 pixels of the bottom row.
Whoops, is something wrong with OpenCV? Maybe, but from this alone you cannot be sure: it could also be that rasterio and skimage are both wrong by the same amount (a small chance, but...).
Let's make this clear by printing the actual values of these last 4 pixels:
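A caveat when reading that difference plot: `np.subtract` on two uint8 arrays stays in uint8, so negative differences wrap around modulo 256 and a difference of -1 shows up as 255, which makes tiny differences look large. A sketch using the second pixel from the printed arrays (rasterio vs OpenCV):

```python
import numpy as np

rio_px = np.array([57, 37, 36], dtype=np.uint8)  # pixel 2 as read by rasterio
cv2_px = np.array([56, 38, 36], dtype=np.uint8)  # pixel 2 as read by OpenCV

wrapped = np.subtract(rio_px, cv2_px)                       # stays uint8: 37 - 38 wraps to 255
signed = rio_px.astype(np.int16) - cv2_px.astype(np.int16)  # true signed differences
print(wrapped, signed)
```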
>>> subimgs[0] # rio
array([[[58, 40, 38],
        [57, 37, 36],
        [63, 42, 39],
        [54, 30, 28]]], dtype=uint8)

>>> subimgs[1] # cv2
array([[[58, 40, 38],
        [56, 38, 36],
        [62, 42, 41],
        [51, 31, 30]]], dtype=uint8)

>>> subimgs[2] # skimage
array([[[58, 40, 38],
        [57, 37, 36],
        [63, 42, 39],
        [54, 30, 28]]], dtype=uint8)

Again, they look pretty much the same, right? But wait... there actually are tiny differences (look carefully at each individual value...).
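To put numbers on those differences, here is a sketch that subtracts the printed rasterio and OpenCV values in a signed type: three of the four pixels differ, by at most 3 levels in a single channel.

```python
import numpy as np

# Last four pixels of the bottom row, copied from the printed arrays (RGB).
rio = np.array([[58, 40, 38], [57, 37, 36], [63, 42, 39], [54, 30, 28]], dtype=np.int16)
cv = np.array([[58, 40, 38], [56, 38, 36], [62, 42, 41], [51, 31, 30]], dtype=np.int16)

delta = rio - cv                                      # signed, so no uint8 wrap-around
n_differing = int(np.any(delta != 0, axis=1).sum())  # pixels with any channel different
max_abs = int(np.abs(delta).max())                   # largest single-channel difference
print(delta)
print("pixels that differ:", n_differing)
print("largest difference:", max_abs)
```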
So, what is the truth?
This may (or may not) help, but a third-party tool can help decide. So I opened the image in Gimp and used the color picker (pipette) on these 4 pixels; here are the results:

Gimp
pixel 1 RGB: [58, 40, 38]
pixel 2 RGB: [56, 38, 36]
pixel 3 RGB: [62, 42, 41]
pixel 4 RGB: [51, 31, 30]

Yes, these values are the same as the ones from OpenCV!
As a GIS person, I also loaded the image into QGIS and queried the values of each of these pixels; the results are the same as in Gimp and OpenCV.
As a last test, I also checked what happens with the GDAL Python bindings:
from osgeo import gdal
print(gdal.__version__) # 3.1.0
ds = gdal.Open(path)
for band in range(1, ds.RasterCount + 1):  # GDAL band indices start at 1
    print(ds.GetRasterBand(band).ReadAsArray()[-1:, -4:])  # last 4 pixels of the bottom row

# Output (each row is one band, so read each pixel's RGB values down a column):
[[58 56 62 51]]
[[40 38 42 31]]
[[38 36 41 30]]

Again, the same as Gimp and OpenCV...
So now the problem is balanced:
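Since GDAL prints one row per band, transposing its output gives one row per pixel, which can then be compared directly with the Gimp values. A short sketch using the numbers above:

```python
import numpy as np

# GDAL output: one row per band (R, G, B), one column per pixel.
bands = np.array([[58, 56, 62, 51],
                  [40, 38, 42, 31],
                  [38, 36, 41, 30]], dtype=np.uint8)
pixels = bands.T  # one row per pixel: [R, G, B]

gimp = np.array([[58, 40, 38], [56, 38, 36], [62, 42, 41], [51, 31, 30]], dtype=np.uint8)
rio = np.array([[58, 40, 38], [57, 37, 36], [63, 42, 39], [54, 30, 28]], dtype=np.uint8)

print(np.array_equal(pixels, gimp), np.array_equal(pixels, rio))  # True False
```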
OpenCV == Gimp == QGIS == GDAL
rasterio == skimage
As I said, there are two possibilities from here:

maybe rasterio and skimage use the same library under the hood, and the artifact is hidden there?
maybe Gimp, QGIS, GDAL and OpenCV are based on the same library (which I doubt), and the artifact is hidden there?

If someone knows the dependency tree of all the image libraries used here, it could help fix this.
I would intuitively say that the chances are higher that there is a glitch in skimage and that rasterio is, somehow, based on it.
My advice: consider also investigating in this direction, not only OpenCV.
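One way to start mapping that dependency tree on Linux is to list which JPEG shared libraries are actually mapped into the Python process after importing the libraries. A sketch under stated assumptions: the helper names are hypothetical, it relies on the Linux-only `/proc/self/maps` file, and a decoder compiled into a module (OpenCV's pip wheels list libjpeg-turbo among their 3rdparty dependencies, for example) may not show up as a separate file.

```python
def jpeg_libs_from_maps(lines):
    """Return sorted, de-duplicated mapped file paths whose name contains 'jpeg'."""
    libs = set()
    for line in lines:
        # Fields: address perms offset dev inode [pathname]; the path may contain spaces.
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and "jpeg" in parts[5].lower():
            libs.add(parts[5].strip())
    return sorted(libs)

def loaded_jpeg_libraries(maps_path="/proc/self/maps"):
    """Linux only: JPEG-related shared objects mapped into the current process."""
    try:
        with open(maps_path) as fh:
            return jpeg_libs_from_maps(fh.read().splitlines())
    except OSError:
        return []  # not Linux, or /proc is unavailable
```

For example, calling `loaded_jpeg_libraries()` after `import rasterio, cv2` and `from skimage import io` lists the separately loaded JPEG libraries, if any.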
