TransWikia.com

Using an attribute index to find matching attributes of two layers faster?

Geographic Information Systems Asked by MrXsquared on February 17, 2021

Similar to "Indexing attribute field of shapefile in QGIS", I am wondering whether something like an attribute index exists in PyQGIS. The goal would be to iterate over two vector layers and find matching attribute values in a specified field of each layer. It would work like a spatial index, just with attributes instead of geometries. So far I have only found that I can create an index using createAttributeIndex(), as stated here and here, but I could find no further information about its usage, how it works, or any examples.

Basically the idea is to speed up code written like this:

vectorlayer_a = QgsProject.instance().mapLayersByName("layer_a")[0]
vectorlayer_b = QgsProject.instance().mapLayersByName("layer_b")[0]

for feat_a in vectorlayer_a.getFeatures():
    value_a = feat_a.attribute(1)
    for feat_b in vectorlayer_b.getFeatures():
        value_b = feat_b.attribute(1)
        if value_a == value_b:
            print('Hurray, finally found (another) one. Can I find all of them faster with an attribute index?')
            # Do some stuff, like...
            geom_a = feat_a.geometry()
            geom_b = feat_b.geometry()

Also, could attribute(1) hold any data type, or would such an index only work with numerical values, if it exists at all?

One Answer

I still don't know whether there is an attribute index for PyQGIS or, if so, how to use it. But comments from bwp8nt and Michael Stimson pointed me in the right direction: using dictionaries to optimize my code without one. With the help of this great answer on SO, I finally achieved the desired optimization without an attribute index (explanation as comments):

vectorlayer_a = QgsProject.instance().mapLayersByName("layer_a")[0]
vectorlayer_b = QgsProject.instance().mapLayersByName("layer_b")[0]

# Creating a dictionary of both layers containing feature id and desired attribute
# feature id is needed to access desired features later on
# attribute is needed to find matches later on
# loop through both layers only once!
dict_a = {}
dict_b = {}
for feat_a in vectorlayer_a.getFeatures():
    dict_a[feat_a.id()] = feat_a.attribute(1) # feature id is used as key and attribute of column 1 as value (can have any datatype and need not be unique)
for feat_b in vectorlayer_b.getFeatures():
    dict_b[feat_b.id()] = feat_b.attribute(1) # feature id is used as key and attribute of column 1 as value (can have any datatype and need not be unique)

# Avoid unnecessary loops through layer_b by using a dictionary for desired matches
# Source: https://stackoverflow.com/a/64597197/8947209 (don't forget to upvote!)
dic2 = {}
# invert dict_b: each attribute value becomes a (now unique) key that maps to the list of feature ids carrying it
for i, elem in dict_b.items():
    # setdefault creates the empty list the first time a value is seen
    dic2.setdefault(elem, []).append(i)
matches = {}
# look up each value of dict_a in dic2; on a match, keep the list of layer_b feature ids
for i in dict_a.keys():
    elem = dict_a[i]
    x = dic2.get(elem, None)
    if x:
        matches[i] = x
#print(dic2)
#print(matches)

# Access desired features from matching dictionary by using feature ids
for featureid_layer_a, featureids_layer_b in matches.items(): # key = feature id of layer_a, value = list of matching feature ids of layer_b
    geom_a = vectorlayer_a.getFeature(featureid_layer_a).geometry() # accessing stuff by using featureid (once per layer_a feature)
    for featureid_layer_b in featureids_layer_b: # loop through the matching feature ids of layer_b
        print('Hurray, found (another) pair really fast: ' + 'matching-dict-key|dict_a-key|layer_a-featureid = ' + str(featureid_layer_a) + ' | matching-dict-value|dict_b-key|layer_b-featureid = ' + str(featureid_layer_b))
        geom_b = vectorlayer_b.getFeature(featureid_layer_b).geometry() # accessing stuff by using featureid
        #print('geom_a: ' + str(geom_a))
        #print('geom_b: ' + str(geom_b))
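
The matching step itself does not depend on QGIS, so it can be sketched as a small hash join in plain Python. This is only a minimal sketch under some assumptions: `match_by_attribute` and the sample pairs are hypothetical names and stand-in data, not part of the code above, and the attribute values are assumed to be hashable (strings and numbers are, lists are not).

```python
from collections import defaultdict


def match_by_attribute(pairs_a, pairs_b):
    """Return {id_a: [id_b, ...]} for every id_a whose value also occurs in pairs_b.

    pairs_a and pairs_b are iterables of (feature_id, value) tuples.
    Hypothetical helper for illustration only.
    """
    # Lookup table: attribute value -> list of ids in b that carry this value
    ids_by_value = defaultdict(list)
    for fid_b, value in pairs_b:
        ids_by_value[value].append(fid_b)

    # One pass over a; each lookup is a dictionary access instead of a loop over b
    matches = {}
    for fid_a, value in pairs_a:
        if value in ids_by_value:  # membership test does not create new keys
            matches[fid_a] = ids_by_value[value]
    return matches


# Stand-in data: (feature id, attribute value); values may mix data types
pairs_a = [(0, "oak"), (1, "elm"), (2, 42)]
pairs_b = [(10, "oak"), (11, "oak"), (12, 42), (13, "ash")]

matches = match_by_attribute(pairs_a, pairs_b)
print(matches)  # {0: [10, 11], 2: [12]}
```

In the QGIS Python console, the pairs could presumably be produced from the layers with a generator such as `((f.id(), f.attribute(1)) for f in vectorlayer_a.getFeatures())`, so each layer is still read only once. The resulting dictionary has the same shape as `matches` above and can be consumed by the same final loop.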

Correct answer by MrXsquared on February 17, 2021
