
Friends,

 

Do any of you have an algorithm or solution for challenging an existing database to ensure that a new record does not constitute a duplicate?

 

Example:

Match: If SSN and LastName match, it is a definite match.

Match: If Birthdate, FirstName and LastName match, it is a definite match.

 

Possible Match:

·         Birthdate and LastName

·         FirstName, LastName and ZipCode

 

Etc?
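
The example rules above can be sketched in a few lines of Python. This is only a minimal illustration, not a vendor algorithm: the Person record, its field names, and the classify and find_candidates helpers are all hypothetical, and comparison is plain case-insensitive equality with no fuzzy matching.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; real field names will differ per system.
    ssn: Optional[str]
    first_name: Optional[str]
    last_name: Optional[str]
    birthdate: Optional[date]
    zip_code: Optional[str]


def _same(a: Optional[str], b: Optional[str]) -> bool:
    """Case- and whitespace-insensitive equality; missing values never match."""
    if not a or not b:
        return False
    return a.strip().casefold() == b.strip().casefold()


def classify(new: Person, existing: Person) -> str:
    """Return 'match', 'possible', or 'none' using the rules from the post."""
    ssn = _same(new.ssn, existing.ssn)
    first = _same(new.first_name, existing.first_name)
    last = _same(new.last_name, existing.last_name)
    zip_ = _same(new.zip_code, existing.zip_code)
    dob = new.birthdate is not None and new.birthdate == existing.birthdate

    # Definite match: SSN + LastName, or Birthdate + FirstName + LastName.
    if (ssn and last) or (dob and first and last):
        return "match"
    # Possible match: Birthdate + LastName, or FirstName + LastName + ZipCode.
    if (dob and last) or (first and last and zip_):
        return "possible"
    return "none"


def find_candidates(
    new: Person, existing_records: Iterable[Person]
) -> List[Tuple[str, Person]]:
    """Compare a new record against every existing record (linear scan)."""
    results = []
    for rec in existing_records:
        verdict = classify(new, rec)
        if verdict != "none":
            results.append((verdict, rec))
    return results
```

A linear scan like this is only practical for small tables; against a large database the same rules would more likely be expressed as indexed queries.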

 

Any and all feedback is welcomed and appreciated.  Locally developed algorithms as well as vendor solutions are equally welcomed.  (Of course, vendors should reply separately.)

 

 

DP Harris, PhD, Vice President/CIO

LOMA LINDA UNIVERSITY | Information Services

 

11139 Anderson Street, Loma Linda, California 92350

(909) 558-7600

 

********** Participation and subscription information for this EDUCAUSE Constituent Group discussion list can be found at http://www.educause.edu/groups/.

Comments

DP,

 

Banner has a system in place for this, called Common Matching.  It’s configurable, so you can decide which fields to use.  How well it works, and how best to configure it, depends heavily on what data you have in the base system and what data is coming in.  Your first definite match example seems entirely reasonable, but it may not hold up if you frequently receive bad incoming data (for example, foreign students making up SSNs; we actually see that regularly).  Your second definite match example may work if you don’t have a large database, but we have millions of records in ours, and it isn’t uncommon to have two Michael Smiths with the same birthday.  The more fields you have populated in both the database and the incoming data, the better: that leaves fewer cases that are neither a clear match nor a clear mismatch.  Good luck.
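
To make the two caveats concrete, here is a rough Python sketch. It is not Banner's Common Matching and does not use any Banner API. The rule table, the field names, and the helper functions are hypothetical. The SSN check relies on the published structure rules (area never 000, 666, or 900-999; group never 00; serial never 0000) plus a small assumed blocklist of obviously made-up values. Adding ZIP code to the name-and-birthdate rule is just one assumed way of tightening it for a large database.

```python
import re
from typing import Dict, FrozenSet, Iterable, List, Optional

_SSN_RE = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")


def plausible_ssn(ssn: Optional[str]) -> bool:
    """Reject SSNs that cannot be real or look like placeholders."""
    if not ssn:
        return False
    m = _SSN_RE.match(ssn.strip())
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    if group == "00" or serial == "0000":
        return False
    digits = area + group + serial
    # Assumed local blocklist: one repeated digit, or the classic 123-45-6789.
    if len(set(digits)) == 1 or digits == "123456789":
        return False
    return True


# Hypothetical, configurable rule table: every field in a rule must agree.
DEFINITE_RULES: List[FrozenSet[str]] = [
    frozenset({"ssn", "last_name"}),
    frozenset({"birthdate", "first_name", "last_name", "zip_code"}),
]


def _norm(field: str, value: str) -> str:
    """Normalize a value for comparison (digits only for SSNs)."""
    if field == "ssn":
        return re.sub(r"\D", "", value)
    return value.strip().casefold()


def fields_agree(new: Dict[str, str], old: Dict[str, str],
                 fields: Iterable[str]) -> bool:
    """True only if every listed field is present on both sides and equal."""
    for f in fields:
        a, b = new.get(f), old.get(f)
        if not a or not b:
            return False
        # An implausible SSN is treated as missing, so it cannot drive a match.
        if f == "ssn" and not (plausible_ssn(a) and plausible_ssn(b)):
            return False
        if _norm(f, a) != _norm(f, b):
            return False
    return True


def is_definite_match(new: Dict[str, str], old: Dict[str, str],
                      rules: Iterable[FrozenSet[str]] = DEFINITE_RULES) -> bool:
    return any(fields_agree(new, old, rule) for rule in rules)
```

Because the rules live in a plain list, loosening or tightening them is a data change rather than a code change, which is the spirit of a configurable matcher.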

 

Kevin

 

From: Harris, DP (LLU) [mailto:dpharris@LLU.EDU]
Sent: Tuesday, August 07, 2012 11:46 AM
Subject: Preventing Duplicate Records in Key Person Database

 



DP,

 

Kevin is right on target about how Common Matching works in Banner.  We share the problem of student typos and false SSN reporting.  With roughly half a million records converted from our legacy system into Banner several years ago, we have a rather high rate of matches kicked out.  Our experience at Suffolk County CC has been that we prefer looser matching criteria, so that more potential duplicates are kicked out for review.  For us, the real challenge has been making sure that the Admissions staff has a solid process for vetting the new records that get kicked out.  That takes good training and ongoing evaluation of how well the methodology is working.  My advice would be not to depend heavily on the system to resolve duplicates automatically, but to have the team or department responsible for data integrity work out a sustainable process for staff to manage them.
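
As a rough illustration of "looser criteria plus human review", the Python sketch below never merges anything automatically. It only splits incoming records into those that can be loaded and those held for staff to vet. The record layout, the loose_key_match test, and the triage function are all hypothetical and are not how Banner implements this.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, str]


def loose_key_match(new: Record, old: Record) -> bool:
    """Deliberately loose test (an assumption for this sketch):
    same last name plus either the same birthdate or the same ZIP code."""
    def same(field: str) -> bool:
        a, b = new.get(field), old.get(field)
        return bool(a) and bool(b) and a.strip().casefold() == b.strip().casefold()

    return same("last_name") and (same("birthdate") or same("zip_code"))


def triage(
    incoming: Sequence[Record],
    existing: Sequence[Record],
    is_suspect: Callable[[Record, Record], bool] = loose_key_match,
) -> Tuple[List[Record], List[Tuple[Record, List[Record]]]]:
    """Split incoming records into (accepted, review_queue).

    A record goes to the review queue, together with the existing records it
    resembles, whenever any existing record looks like a potential duplicate.
    Nothing is merged here; a person makes that call.
    """
    accepted: List[Record] = []
    review_queue: List[Tuple[Record, List[Record]]] = []
    for new in incoming:
        candidates = [old for old in existing if is_suspect(new, old)]
        if candidates:
            review_queue.append((new, candidates))
        else:
            accepted.append(new)
    return accepted, review_queue
```

The size of the review queue over time is one simple measure for the ongoing evaluation mentioned above: if it grows faster than staff can clear it, the criteria or the process need another look.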

 

Regards,

 

Doug

 
