Geographic Information Systems
Asked on January 30, 2021
I am trying to retrieve data from 324 EOPatches that I sampled beforehand. Patches are essentially containers of numpy arrays, as explained in the EOPatch documentation.
This is an example of the content of a patch after I resampled it:
EOPatch(
  data: {
    FEATURES: FeatureIO(/data/FEATURES.npy)
    FEATURES_SAMPLED: FeatureIO(/data/FEATURES_SAMPLED.npy)
  }
  mask: {}
  scalar: {}
  label: {}
  vector: {}
  data_timeless: {}
  mask_timeless: {
    LULC: FeatureIO(/mask_timeless/LULC.npy)
    LULC_ERODED: FeatureIO(/mask_timeless/LULC_ERODED.npy)
    LULC_ERODED_SAMPLED: FeatureIO(/mask_timeless/LULC_ERODED_SAMPLED.npy)
  }
  scalar_timeless: {}
  label_timeless: {}
  vector_timeless: {}
  meta_info: {}
  bbox: BBox(((500077.09501641133, 5095402.981379905), (501746.98615037295, 5097085.523204274)), crs=CRS('32633'))
  timestamp: [datetime.datetime(2017, 1, 1, 0, 0), ..., datetime.datetime(2017, 12, 19, 0, 0)], length=23
)
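To make the structure concrete, here is a minimal check on a single patch showing that those FeatureIO entries resolve to plain numpy arrays once accessed. The path matches the loading code below; the shapes in the comments follow eo-learn's (time, height, width, channels) convention for data features and are assumptions, not values from my patches:

from eolearn.core import EOPatch

# Load one patch lazily; arrays are read from disk only when accessed
eopatch = EOPatch.load('./eopatches_sampled_Slovenia/eopatch_0', lazy_loading=True)

features = eopatch.data['FEATURES_SAMPLED']            # ndarray, (time, height, width, bands)
labels = eopatch.mask_timeless['LULC_ERODED_SAMPLED']  # ndarray, (height, width, 1)
print(features.shape, features.dtype)
print(labels.shape, labels.dtype)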
To retrieve certain data from the patches as my training and testing data, I first loaded the 324 patches like this:
import numpy as np
from eolearn.core import EOPatch

# Load the sampled EOPatches lazily, so arrays are only read from disk on access
eopatches = []
path_out_sampled = './eopatches_sampled_Slovenia'
for idx in range(len(patchIDs)):
    eopatches.append(EOPatch.load(f'{path_out_sampled}/eopatch_{idx}', lazy_loading=True))
eopatches = np.array(eopatches)
And then I used:
labels_train = np.array([eopatch.mask_timeless['LULC_ERODED_SAMPLED'] for eopatch in eopatches[patchIDs]])
features_train = np.array([eopatch.data['FEATURES_SAMPLED'] for eopatch in eopatches[patchIDs]])
However, executing this took forever and consumed a huge amount of RAM.
patchIDs is a list of numbers ranging from 0 to 323.
I tried to use this instead:
def data_retrieval(eopatches):
    # Collect the sampled features and labels from an iterable of EOPatches
    feature_data = []
    label_data = []
    for eopatch in eopatches:
        feature_data.append(eopatch.data['FEATURES_SAMPLED'])
        label_data.append(eopatch.mask_timeless['LULC_ERODED_SAMPLED'])
    return np.array(feature_data), np.array(label_data)
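Calling it on the array built above (a sketch, with patchIDs as defined earlier):

features_train, labels_train = data_retrieval(eopatches[patchIDs])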
I just ended up with the same result.
I have also reinitialized the environment multiple times, but nothing changed, and I am using a GPU.
Using h5py didn't help either: I was able to load only 20 of the 324 patches.
I suspect you really do have a large dataset underneath your patch objects, and the lazy loading is masking that until you start to retrieve specific subsets: calling np.array on the full list of patches forces every patch's arrays into memory at once. I suggest you use a tool more suited to large datasets, such as h5py, which can write arrays to disk incrementally instead of accumulating them in RAM. Hopefully that or a similar tool will still meet your requirements.
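As a minimal sketch of that approach (assuming the patchIDs list and directory layout from your question, and a hypothetical output file training_data.h5), you could write each patch's arrays straight to disk so only one patch is ever held in memory:

import h5py
from eolearn.core import EOPatch

path_out_sampled = './eopatches_sampled_Slovenia'

with h5py.File('training_data.h5', 'w') as f:
    for idx in patchIDs:
        # Lazy-load one patch; its arrays are read from disk only here
        eopatch = EOPatch.load(f'{path_out_sampled}/eopatch_{idx}', lazy_loading=True)
        # Write both arrays to the HDF5 file, then let the patch be freed
        f.create_dataset(f'features/{idx}', data=eopatch.data['FEATURES_SAMPLED'])
        f.create_dataset(f'labels/{idx}', data=eopatch.mask_timeless['LULC_ERODED_SAMPLED'])

# Later, individual datasets can be read back without loading the whole file
with h5py.File('training_data.h5', 'r') as f:
    first_features = f['features/0'][...]

Whether this fits your workflow depends on whether your model can consume per-patch chunks rather than one giant in-memory array.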
Answered by Modern geoSystems on January 30, 2021