
Friends,

 

Do any of you have an algorithm or solution for checking a new record against an existing database to ensure that it does not constitute a duplicate?

 

Example:

Match: If SSN and LastName match, it is a definite match.

Match: If Birthdate, FirstName and LastName match, it is a definite match.

 

Possible Match:

· Birthdate and LastName

· FirstName, LastName and ZipCode

 

Etc?
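The rules listed above could be sketched roughly as follows. This is only a minimal illustration, not a vetted matching engine: the `Person` type, its field names, and the `classify` helper are hypothetical names chosen for the example, and the normalization (trim and lowercase) is an assumption about how values would be compared.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical record layout; field names are assumptions for this sketch."""
    first_name: str
    last_name: str
    birthdate: Optional[date] = None
    ssn: Optional[str] = None
    zip_code: Optional[str] = None


def _norm(value):
    """Compare strings case-insensitively, ignoring surrounding whitespace."""
    return value.strip().lower() if isinstance(value, str) else value


def _same(a, b, *fields):
    """True only if every named field is populated on both records and equal."""
    for f in fields:
        x, y = _norm(getattr(a, f)), _norm(getattr(b, f))
        if x in (None, "") or y in (None, "") or x != y:
            return False
    return True


# The two tiers from the example in the message.
DEFINITE_RULES = [("ssn", "last_name"), ("birthdate", "first_name", "last_name")]
POSSIBLE_RULES = [("birthdate", "last_name"), ("first_name", "last_name", "zip_code")]


def classify(new, existing):
    """Return 'definite', 'possible', or 'none' for one candidate pair."""
    if any(_same(new, existing, *rule) for rule in DEFINITE_RULES):
        return "definite"
    if any(_same(new, existing, *rule) for rule in POSSIBLE_RULES):
        return "possible"
    return "none"
```

A blank field never counts as a match here, so a record with no SSN cannot trigger the SSN rule by accident.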

 

Any and all feedback is welcome and appreciated.  Locally developed algorithms and vendor solutions are equally welcome.  (Of course, vendors should reply separately.)

 

 

DP Harris, PhD, Vice President/CIO

LOMA LINDA UNIVERSITY | Information Services

 

11139 Anderson Street, Loma Linda, California 92350

(909) 558-7600

 

********** Participation and subscription information for this EDUCAUSE Constituent Group discussion list can be found at http://www.educause.edu/groups/.

Comments

DP,

 

Banner has a system in place for this, called Common Matching.  It’s configurable, so you can decide which fields to use.  How well it works, and the best way to configure it, depends heavily on what data you have in the base system and what data is coming in.  Your first definite-match example seems entirely reasonable, but it may not hold up if you frequently receive bad incoming data (for example, foreign students making up SSNs; we actually see that regularly).  Your second definite-match example may work if you don’t have a large database, but ours has millions of records, and it isn’t uncommon to have two Michael Smiths with the same birthday.  The more fields you have populated in both the database and the incoming data, the better: you will have fewer cases that are neither a clear match nor a clear mismatch.  Good luck.
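One way to see whether a "definite" rule is safe for a given database is to count how many existing records already share the same values for that rule's fields. The sketch below is a generic illustration and has nothing to do with how Banner implements Common Matching; `collision_groups`, the dictionary keys, and the sample rows are all made up for the example.

```python
from collections import defaultdict


def collision_groups(records, fields):
    """Group record ids by the values of `fields`; keep keys shared by 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        values = [rec.get(f) for f in fields]
        if any(v in (None, "") for v in values):
            continue  # an unpopulated field cannot support a match
        key = tuple(str(v).strip().lower() for v in values)
        groups[key].append(rec["id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}


# Made-up sample rows: two different people with the same name and birthday.
existing = [
    {"id": 1, "first": "Michael", "last": "Smith", "birthdate": "1990-04-12", "ssn": "111-11-1111"},
    {"id": 2, "first": "Michael", "last": "Smith", "birthdate": "1990-04-12", "ssn": "222-22-2222"},
    {"id": 3, "first": "Ana", "last": "Lopez", "birthdate": "1988-09-30", "ssn": ""},
]

name_dob = collision_groups(existing, ("first", "last", "birthdate"))
ssn_last = collision_groups(existing, ("ssn", "last"))
```

If a rule produces many groups on data that is believed to be clean, it is probably better treated as a "possible" match than a "definite" one.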

 

Kevin

 

From: Harris, DP (LLU) [mailto:dpharris@LLU.EDU]
Sent: Tuesday, August 07, 2012 11:46 AM
Subject: Preventing Duplicate Records in Key Person Database

 



DP,

 

Kevin is right on target about how Common Matching works in Banner.  We share the problem of student typos and false SSN reporting.  With roughly half a million records converted from the legacy system into Banner several years ago, we see a rather high rate of matches kicked out.  Our experience at Suffolk County CC has been that we prefer looser matching criteria, so that more potential duplicates are kicked out for review.  For us, the real challenge has been making sure that the Admissions staff has a solid process for vetting the new records that get kicked out.  There needs to be good training and continuing evaluation of how well the methodology is working.  My advice would be not to depend heavily on the system to resolve duplicates automatically, but to have the team or department responsible for data integrity work up a sustainable process for staff to manage them.
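The "looser criteria, human review" approach described above might look something like the sketch below. It assumes a simple additive score; the weights, the threshold, and the `score` and `route` helpers are hypothetical and would need tuning against real data. Nothing is merged automatically: a new record is either inserted or sent to staff with a list of candidate matches.

```python
# Hypothetical weights: how much agreement on each field counts toward a match.
WEIGHTS = {"ssn": 5, "birthdate": 3, "last_name": 2, "first_name": 2, "zip_code": 1}
REVIEW_THRESHOLD = 4  # hypothetical; lowering it sends more records to review


def score(new, old):
    """Sum the weights of fields that are populated on both records and agree."""
    total = 0
    for field, weight in WEIGHTS.items():
        a, b = new.get(field), old.get(field)
        if a and b and str(a).strip().lower() == str(b).strip().lower():
            total += weight
    return total


def route(new, existing, threshold=REVIEW_THRESHOLD):
    """Return ('review', candidate ids, best first) or ('insert', [])."""
    candidates = [(score(new, old), old["id"]) for old in existing]
    hits = sorted((c for c in candidates if c[0] >= threshold), reverse=True)
    if hits:
        return "review", [rec_id for _, rec_id in hits]
    return "insert", []


# Made-up sample data for illustration.
existing = [
    {"id": 10, "first_name": "Jo", "last_name": "Park", "birthdate": "1991-01-01", "zip_code": "11784"},
    {"id": 11, "first_name": "Sam", "last_name": "Park", "zip_code": "11784"},
]
incoming = {"first_name": "Joe", "last_name": "Park", "birthdate": "1991-01-01", "zip_code": "11784"}
decision = route(incoming, existing)
```

Tracking how many reviewed candidates turn out to be true duplicates gives staff a concrete basis for the continuing evaluation mentioned above.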

 

Regards,

 

Doug