How to specify from which file duplicates should be deleted after appending two datasets

Geographic Information Systems Asked by Olive on December 22, 2020

I am using a Python script with ArcGIS Desktop 10.8.1 to synchronize two datasets. There are thousands of duplicate features that I would like to drop from the output. How can I specify that, when a feature is identical across the two datasets, I want to keep the row from dmanfile and delete the duplicate from cadfile? I am totally new to Python, but here are the relevant parts of the code I have so far:

#input files from user console
dmanfile = input("./DSchemaFix1/D_Man_OG/D_Man_Fields_Complete.shp")
cadfile = input("./DSchemaFix1/CurrentCAD/CurrentCADFiles.shp")

gdf = gpd.read_file(dmanfile)
cad = gpd.read_file(cadfile)

gdf_appended = cad.append(gdf)

gdf_dupdropped = gdf_appended.drop_duplicates(keep='first', subset=['StreetName','Address', 'Apartment','ZipCode'])

One Answer

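Add a source column to each dataset, combine them, sort by that column, and drop duplicates with keep='first'. Because 'dman' sorts after 'cad' alphabetically, sorting in descending order puts the dman rows first, so they are the ones kept: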

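import geopandas as gpd
import pandas as pd

# Tag each dataset so its rows can be told apart after combining
dman = gpd.read_file('/home/bera/Desktop/tempgis/dman.shp')
dman['source'] = 'dman'
cad = gpd.read_file('/home/bera/Desktop/tempgis/cadfile.shp')
cad['source'] = 'cad'

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
both = pd.concat([dman, cad], ignore_index=True)

# Descending sort puts 'dman' rows before 'cad' rows, so keep='first' retains dman
no_dups = both.sort_values(by='source', ascending=False).drop_duplicates(
    subset=['StreetName', 'Address', 'Apartment', 'ZipCode'], keep='first')
no_dups.to_file('/home/bera/Desktop/tempgis/nodups.shp')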
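
A variation on the same idea, sketched with small made-up tables instead of the shapefiles: if the preferred dataset is placed first when the two are combined, keep='first' retains its rows without a sort step. The sample rows and the key_cols name are illustrative assumptions, not data from the question. Plain pandas DataFrames are used so the snippet runs without any files; GeoDataFrames loaded with gpd.read_file accept the same pd.concat and drop_duplicates calls.

```python
import pandas as pd

# Hypothetical stand-ins for the two shapefiles (values are made up).
dman = pd.DataFrame({
    "StreetName": ["Main St", "Oak Ave"],
    "Address": ["10", "22"],
    "Apartment": ["1A", "2B"],
    "ZipCode": ["12345", "12345"],
    "source": "dman",
})
cad = pd.DataFrame({
    "StreetName": ["Main St", "Pine Rd"],
    "Address": ["10", "5"],
    "Apartment": ["1A", "3"],
    "ZipCode": ["12345", "54321"],
    "source": "cad",
})

# Columns that define a duplicate, as in the question.
key_cols = ["StreetName", "Address", "Apartment", "ZipCode"]

# The preferred dataset goes first, so keep="first" retains its rows.
both = pd.concat([dman, cad], ignore_index=True)
no_dups = both.drop_duplicates(subset=key_cols, keep="first")

print(no_dups[["StreetName", "source"]])
```

Here the "Main St" row exists in both tables and only the dman copy survives, while "Oak Ave" and "Pine Rd" are each kept from their own dataset. The source column is not needed for the deduplication in this version, but it is handy for checking afterwards which dataset each row came from.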

Answered by BERA on December 22, 2020
