Data deduplication via Automation?

Soonimproveduser · February 14, 2024, 9:05am

Hey,

The Data deduplication plugin works fine. However, is there any way to do this via an automation?

Does anyone have an idea?

Best regards

cdb · February 14, 2024, 9:14am

This question was already answered in the past:

Soonimproveduser · February 14, 2024, 9:32am

Works perfectly!

Thank you for the assistance.

webdienste · February 14, 2024, 12:21pm

I would love to have this Python automation if it could do the following:

Find duplicate rows on the basis of a “name” column
Copy information from selected columns in the duplicate row(s) into the original (i.e. first) row
Delete the duplicate row(s) to leave only the original first row

My use case: I have a table that should contain only one copy per registered person. Every time a user signs up for a course, a new row is created. I then have to manually merge the information from the new row into the original to leave only one row per person. The information I have to copy is only ever the course they are signed up for. However, this course is linked from a “courses” table.
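The three steps above can be sketched in plain Python. This is a minimal sketch under a few assumptions: the rows have already been fetched as a list of dicts that each carry a `_id` key, and the merged column (here a hypothetical `course` link column) holds a list of linked row IDs. It only plans the changes. It returns the merged values to write into the first row of each group and the IDs of the later duplicates to delete.

```python
def plan_merge(rows, key_column, merge_columns):
    """Group rows by key_column and plan a merge into the first row of each group.

    Returns (updates, ids_to_delete):
      updates       maps the first row's _id to merged values for merge_columns
      ids_to_delete lists the _id of every later duplicate row
    Assumes each row is a dict with an "_id" key (hypothetical row format).
    """
    groups = {}  # dicts keep insertion order, so group[0] is the first row seen
    for row in rows:
        key = row.get(key_column)
        if key in (None, ""):
            continue  # rows without a name are left alone
        groups.setdefault(key, []).append(row)

    updates = {}
    ids_to_delete = []
    for group in groups.values():
        if len(group) < 2:
            continue  # no duplicates for this name
        first, duplicates = group[0], group[1:]
        merged = {}
        for column in merge_columns:
            values = []
            for row in group:
                value = row.get(column)
                # Link columns are assumed to hold lists; wrap single values so both work
                items = value if isinstance(value, list) else ([] if value is None else [value])
                for item in items:
                    if item not in values:  # avoid linking the same course twice
                        values.append(item)
            merged[column] = values
        updates[first["_id"]] = merged
        ids_to_delete.extend(row["_id"] for row in duplicates)
    return updates, ids_to_delete


# Hypothetical example: three sign-ups by Alice, one by Bob
rows = [
    {"_id": "r1", "name": "Alice", "course": ["c1"]},
    {"_id": "r2", "name": "Bob", "course": ["c2"]},
    {"_id": "r3", "name": "Alice", "course": ["c3"]},
    {"_id": "r4", "name": "Alice", "course": ["c1"]},
]
updates, to_delete = plan_merge(rows, "name", ["course"])
print(updates)    # {'r1': {'course': ['c1', 'c3']}}
print(to_delete)  # ['r3', 'r4']
```

Planning first and writing second makes it easy to review the result before anything is deleted. How the updates and deletions are sent back depends on the platform's Python API; link columns in particular may need dedicated link methods rather than a plain row update, so check the API documentation before wiring this into an automation.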

Soonimproveduser · February 14, 2024, 2:37pm

I had a similar problem and fixed it like this:

I had a date and a user ID in one table.
I created a second table with summaries and linked it via the date column (in your case, the link could be to the course). This produced many rows with the same date for the same user, but thanks to the link column, each of those rows already contained all of that user's links for the date. So I created a formula column combining {date}+{id}, which ended up with three or four identical values per day per person. The Python script only checked this {date}+{id} formula column and deleted the extra rows via an automation.

In your case, a formula column combining {course name/id}+{user id} for the Python script to check could be the solution.
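The formula-column approach can be sketched the same way: treat the combination of columns as a key, keep the first row for each key, and collect the IDs of the rest for deletion. This is a sketch assuming rows are dicts with a `_id` key; `course` and `user_id` are hypothetical column names, and passing the name of the formula column alone works just as well.

```python
def duplicate_ids_by_composite_key(rows, key_columns):
    """Return the _id of every row whose key was already seen earlier.

    Mirrors a formula column such as {course}+{user id}: rows sharing the same
    combination of values are duplicates, and only the first one is kept.
    """
    seen = set()
    to_delete = []
    for row in rows:
        # str() makes list values (e.g. link columns) usable inside a hashable key
        key = tuple(str(row.get(column)) for column in key_columns)
        if key in seen:
            to_delete.append(row["_id"])
        else:
            seen.add(key)
    return to_delete


# Hypothetical example rows
rows = [
    {"_id": "a", "course": "Yoga", "user_id": 7},
    {"_id": "b", "course": "Yoga", "user_id": 7},
    {"_id": "c", "course": "Pilates", "user_id": 7},
    {"_id": "d", "course": "Yoga", "user_id": 8},
]
print(duplicate_ids_by_composite_key(rows, ["course", "user_id"]))  # ['b']
```

Keeping the first occurrence matches the approach described above, where the extra rows are redundant because the link column already gathers everything on each row. The returned IDs could then be passed to a batch delete call, if the platform's API offers one.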

system · February 16, 2024, 2:38pm

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.