Use Python to find duplicate values in a feature class and populate a field

Question

I found this Python script on another post here and have been trying to adjust it for my needs. I'm a novice Python user, so I'm struggling with how to modify it. I have a feature class stored in a feature dataset, and I want to search a field for duplicate values and populate a new field with Y for duplicates or N for non-duplicates. The script below is what I found; it looks like it will work once I find a way to drill down into my file geodatabase.

from arcpy import *

inShapefile = pointsShapefile

checkField = "xyCombine"
updateField = "dplicate"

#List of values found once
occursOnce = []
#List of values found twice
occursTwice = []

cursor = da.SearchCursor (inShapefile, [checkField])
for row in cursor:
    #Check value is not null
    if row[0]:
        #If not already found to occur twice, proceed
        if not row[0] in occursTwice:
            #If hasn't occurred once yet
            if not row[0] in occursOnce:
                #Add to occurs once list
                occursOnce.append (row[0])
            #If value has already been found once
            else:
                #Add to occurs twice list (duplicates)
                occursTwice.append (row[0])
del cursor

cursor = da.UpdateCursor (inShapefile, [checkField, updateField])
for row in cursor:
    #Check value is not null
    if row[0]:
        #Check if value in occursTwice list (i.e. is duplicate)
        if row[0] in occursTwice:
            row[1] = "Y"
        else:
            row[1] = "N"
        cursor.updateRow(row)
del cursor

crmackey · Accepted Answer

Something like this should work:

import arcpy

inShapefile = pointsShapefile
checkField = "xyCombine"
updateField = "dplicate"

with arcpy.da.SearchCursor(inShapefile, [checkField]) as rows:
    values = [r[0] for r in rows]

d = {}
for item in set(values):
    if values.count(item) > 1:
        d[item] = 'Y'
    else:
        d[item] = 'N'

with arcpy.da.UpdateCursor(inShapefile, [checkField, updateField]) as rows:
    for row in rows:
        if row[0] in d:
            row[1] = d[row[0]]
            rows.updateRow(row)

And as @mr.adam suggested, the dictionary is not needed. Here is the cleaner version:

import arcpy

def findDupes(inShapefile, checkField, updateField):
    with arcpy.da.SearchCursor(inShapefile, [checkField]) as rows:
        values = [r[0] for r in rows]

    with arcpy.da.UpdateCursor(inShapefile, [checkField, updateField]) as rows:
        for row in rows:
            if values.count(row[0]) > 1:
                row[1] = 'Y'
            else:
                row[1] = 'N'
            rows.updateRow(row)

if __name__ == '__main__':
    fc = r'C:\TEMP\crm_test.gdb\test'
    fld = 'Project_Manager'
    up = 'duplicates'

    findDupes(fc, fld, up)
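
On the "drill down into my file geodatabase" part of the question: a feature class inside a feature dataset is normally addressed by putting the dataset name between the geodatabase and the feature class. A minimal sketch, where the geodatabase, dataset and feature class names are all made up for illustration:

```python
import os

# Hypothetical names; substitute your own geodatabase, dataset and feature class.
gdb = r"C:\data\example.gdb"
feature_dataset = "MyDataset"
feature_class = "MyPoints"

# Path layout assumed here: <geodatabase>/<feature dataset>/<feature class>
fc_path = os.path.join(gdb, feature_dataset, feature_class)
print(fc_path)
```

The resulting string can then be passed as the first argument to findDupes in place of fc.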

recurvata · Answer

If you have an Advanced or Info license, another option in ArcGIS is the Find Identical tool. It produces a table of row IDs that have matching values; use the ONLY_DUPLICATES option. Then join the table to the feature class (feature class ObjectID to the table's InFID) using the KEEP_COMMON keyword for the join type. This is similar to a definition query, in that your feature class will only display matching records. Then perform a field calculation on the layer. Finally, remove the join so the rest of the features are available again.

I don't know how this compares with the da cursor for efficiency. Just another option.
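
For anyone who wants to script this, here is a rough sketch of a variation on the workflow above. It is not the join-and-calculate sequence itself: it runs Find Identical, reads the duplicate ObjectIDs from the output table, and writes Y/N with a da cursor instead. The function names and the out_table argument are hypothetical, and it assumes the output table exposes the source ObjectID in a field named IN_FID.

```python
def flag_for(oid, duplicate_oids):
    """Return 'Y' if the ObjectID is among the duplicates, else 'N'."""
    return "Y" if oid in duplicate_oids else "N"


def flag_with_find_identical(feature_class, check_field, update_field, out_table):
    """Flag duplicates in update_field using the Find Identical tool.

    Assumes update_field already exists and an Advanced license is available.
    """
    import arcpy  # imported here so flag_for can be used without ArcGIS

    # Write one row per duplicated feature to out_table.
    arcpy.management.FindIdentical(
        feature_class, out_table, [check_field],
        output_record_option="ONLY_DUPLICATES",
    )

    # Assumption: IN_FID holds the ObjectID of each duplicated feature.
    with arcpy.da.SearchCursor(out_table, ["IN_FID"]) as rows:
        duplicate_oids = {row[0] for row in rows}

    # Visit every feature and write Y or N based on its ObjectID.
    with arcpy.da.UpdateCursor(feature_class, ["OID@", update_field]) as rows:
        for row in rows:
            row[1] = flag_for(row[0], duplicate_oids)
            rows.updateRow(row)
```

Unlike the join approach, this also writes N to the non-duplicate rows in the same pass.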

Pfalbaum · Answer

I'm providing a more recent solution for finding duplicates and adding the count to a new field. It's straight from ESRI's help document: How to identify duplicate or unique values in ArcGIS Pro.

import arcpy

'''
This script will count the number of occurrences of a value in a field ("field_in") and write them to a
new field ("field_out")
'''

arcpy.env.workspace = r"C:\Users\DuplicateTesting.gdb" #path to GDB goes here
infeature = "backup_02232021" #name of feature class goes here
field_in = "location_string_output" #column you're looking for the duplicates in
field_out = "COUNT_"+field_in
arcpy.AddField_management(infeature, field_out,"SHORT")

lista= []
cursor1=arcpy.SearchCursor(infeature)
for row in cursor1:
    i=row.getValue(field_in)
    lista.append(i)
del cursor1, row

cursor2=arcpy.UpdateCursor(infeature)
for row in cursor2:
    i=row.getValue(field_in)
    occ=lista.count(i)
    row.setValue(field_out, occ)
    cursor2.updateRow(row)
del cursor2, row
print("----done----")
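
One thing to watch with the script above: calling lista.count(i) for every row rescans the whole list each time, which can get slow on large tables. Below is a sketch of the same idea using collections.Counter and the arcpy.da cursors, so the values are tallied once. The function names are my own, and it assumes field_out already exists. It also differs from the script above in that null values are not counted and end up with 0.

```python
from collections import Counter


def count_values(values):
    """Return a Counter of the non-null values in an iterable."""
    return Counter(v for v in values if v is not None)


def write_counts(feature_class, field_in, field_out):
    """Write each row's occurrence count of field_in into field_out.

    Assumes field_out already exists and that ArcGIS (arcpy) is installed.
    """
    import arcpy  # imported here so count_values can be used without ArcGIS

    # First pass: tally every value once.
    with arcpy.da.SearchCursor(feature_class, [field_in]) as rows:
        counts = count_values(row[0] for row in rows)

    # Second pass: look up each row's count and write it back.
    with arcpy.da.UpdateCursor(feature_class, [field_in, field_out]) as rows:
        for row in rows:
            # Counter returns 0 for missing keys, so null values get 0.
            row[1] = counts[row[0]]
            rows.updateRow(row)
```

To get the Y/N flag from the original question instead of a count, the same lookup can be compared against 1 (a count greater than 1 means duplicate).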
