Exporting GEE satellite imagery to Numpy array

Question

I'm trying to extract LANDSAT-8 images around a lat/lon location, using the Google Earth Engine API for Python.
Specifically, I want to generate a bounding box of 1km by 1km around my point, and extract the pixels for the RGB bands, so my output should be 33 by 33 by 3, given that LANDSAT-8 resolution is 30m.
However, I can't get this to work. I'm generating my box using ee.Geometry.Point.buffer and ee.Geometry.Polygon.bounds, and then I'm trying to overlay my geometry over LANDSAT-8 images of 2020, of which I want to get the average value pixel-wise (so the annual average image should be 33 by 33 by 3). After that, I plan to extract the data as a numpy.Array using the ee.Image.sampleRectangle function, but it is not working. Here is a minimal working example:
#importing packages
import numpy as np
import ee
ee.Authenticate()
ee.Initialize()

#converting a lon/lat pair to an ee.point object
point=ee.Geometry.Point( [-122.2036486, 37.4237011] )
# converting the point to a patch: we define a circle with a 500m radius, and then we put a box around it
patch=point.buffer(500).bounds()

#checking that the patch has approximately the right area 1000^2 m^2
patch.area(1).getInfo()

#defining the image: Landsat 8 collection of images from 2020, with RGB bands. We take the mean of them
imagery=ee.ImageCollection("LANDSAT/LC08/C01/T1").filterDate('2020-01-01', '2020-12-31').select(['B3','B4','B2']).mean()

#getting a matrix from the image for our patch of interest
rect_image = imagery.sampleRectangle(region=patch)

#we should obtain a 33*33 matrix for each one of the 3 bands

#extracting one band to check
band_b4 = rect_image.get('B4')
#checking the shape of the resulting matrix
np.array(band_b4.getInfo()).shape

#the result is 1*1

I feel that one problem here is that the mean() function is reducing collapsing the dimensions of the images, so my "fix" to that is instead using the first() function to get a proper image, even if it is not the average image I want. Here is what happens:
imagery2=ee.ImageCollection("LANDSAT/LC08/C01/T1").filterDate('2020-01-01', '2020-12-31').select(['B3','B4','B2']).first()

#checking the dimensions
imagery2.getInfo()['bands'][1]['dimensions']

#getting a matrix from the image for our patch of interest
rect_image2 = imagery2.sampleRectangle(region=patch)

#in theory, since this is 1000m*1000m and the resolution of the satellite is 30m,
#we should obtain a 33*33 matrix for each one of the 3 bands

#extracting 1 band to check
band_b4_2 = rect_image2.get('B4')

#checking the shape of the resulting matrix
np.array(band_b4_2.getInfo()).shape

#now we get an error

How could I solve this?
I have tried a lot of things like using ee.ImageCollection.filterBound over the image collection, ee.Image.clip, and I have tried to export the images in TIFF or TFRecord formats with the ee.batch.Export.image.toDrive method, but that also hasn't worked.

Nuno César de Sá · Answer

You can export the image to your drive (or gdrive):
Consider l8_image is an image ee object which you want to save.
### Getting the filenames - might be useful
l8_img_meta = l8_img.getInfo()
imagename = l8_img_meta.get('properties',{}).get('PRODUCT_ID') #fetches the name

Storing the image as a .tif
This is a task command to GEE, you will have to wait a bit for it to finish
task_config = {
    'image': l8_img,
    'fileFormat': 'GeoTIFF',
    'folder': '<foldername>',
    'fileNamePrefix': imagename[0:19],
    'description': "clipped area",
    'scale':20,
    'region':poly_area
}

# This is how we order it to start
task = ee.batch.Export.image.toDrive(**task_config )
task.start()

### check task status - you can see if it failed, it's running or finished
task.status()

Operating with rasters in Python:
# required: pyrsgis, rasterio, pyproj

import rasterio
import rasterio.plot
import pyproj
from pyrsgis.convert import changeDimension

### loading the file
s2_data = "path to file"

### loading and checking
ds1, bands = raster.read(s2_data)
print(ds1)
print(bands.shape)

bandByPixel = changeDimension(bands)/10000. #we have to devide all values by 10k - its a conversion from bits to reflectances
bandByPixel_t = np.transpose(bandByPixel)
print(bandByPixel.shape)
print(bandByPixel_t.shape)

#opening the raster
with rasterio.open(s2_data) as src:
    subset = src.read(7)/10000.

#plotting the raster
plt.figure(figsize=(6,8.5))
plt.imshow(subset)
plt.colorbar(shrink=0.5)
#plt.title(f'Band 4 Subsetn{window}')
plt.xlabel('Column #')
plt.ylabel('Row #')

By here, it should be np.array or easily convertible.

Exporting GEE satellite imagery to Numpy array

One Answer

Add your own answers!

Ask a Question