Message from benjamin.oshrin@the.oshrinium.net

For those of you with home grown ID match systems (Perl scripts, etc.), what, if any, fuzzy matching do you do, and how do you do it? I'm specifically looking for details like "We have a lookup table of first names, so if we get a first name of Bill we'll also check the database for Will and William" or "We pull any records with the same SSN and the same (or transposed) DOB, and then use the Perl module Text::LevenshteinXS to compare the candidate names."

Thanks,
-Benn-
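To make the two examples in the question concrete, here is a minimal Python sketch of the three ingredients they mention: a nickname lookup table, a same-or-transposed DOB check, and an edit-distance comparison. Everything in it is an assumption for illustration: the `NICKNAMES` table is hypothetical and tiny, "transposed" is read as month and day swapped, and the `levenshtein` function is a plain-Python stand-in rather than the Text::LevenshteinXS module itself.

```python
from datetime import date

# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {
    "william": {"bill", "will", "billy"},
    "robert": {"bob", "rob", "bobby"},
}


def name_variants(first_name: str) -> set[str]:
    """Return the name plus every name sharing a nickname group with it."""
    name = first_name.strip().lower()
    variants = {name}
    for formal, nicks in NICKNAMES.items():
        group = {formal} | nicks
        if name in group:
            variants |= group
    return variants


def dob_same_or_transposed(a: date, b: date) -> bool:
    """True if the dates are equal or differ only by swapped month and day.

    Assumption: "transposed" means month/day swapped, not swapped digits.
    """
    if a == b:
        return True
    return a.year == b.year and a.month == b.day and a.day == b.month


def levenshtein(s: str, t: str) -> int:
    """Edit distance (insert, delete, substitute) by dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]
```

A candidate search could call `name_variants("Bill")` to widen the first-name lookup, then keep only candidates whose last name is within a small `levenshtein` distance of the incoming one. What counts as "small" is a local tuning decision.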

Comments

We keep our matching simple. We have found this sufficiently mitigates the amount of manual matching that must be done.
Benn,

We also keep our matching simple today. We've built the standard record match into the record creation mechanism (a home grown API with an alternative REST interface to the API). The process does a simple lookup in the DB for a matching SSN and last name; if it finds more than one match, it bails. We have about 12 different SORs; some provide SSNs and some provide partial SSNs. We expect we'll soon need to remove SSNs from the Person Registry, so the entire matching process will need to be revisited, along with our ID proofing mechanisms where SSN is used today. Other matching types (e.g., student number in payroll and real student number, or last name and birthdate) are manually reviewed by staff in our account help desk.

-- Jonathan Pass, IAM Technical Lead
UW-IT Identity and Access Management
206-543-0278

On Thu, 15 Nov 2012, Jones, Mark B wrote:
> We keep our matching simple. We have found this sufficiently mitigates the amount of manual matching that must be done.
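The SSN and last name lookup Jonathan describes could look roughly like the Python sketch below. It is not UW's code: the `Person` record, the in-memory `registry` list standing in for the DB, the `AmbiguousMatch` exception, and the case-insensitive last-name comparison are all assumptions made for illustration. Partial SSNs are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    person_id: str
    ssn: str
    last_name: str


class AmbiguousMatch(Exception):
    """Raised when more than one registry record matches (the "bail" case)."""


def find_match(registry: list[Person], ssn: str, last_name: str) -> Optional[Person]:
    """Return the single record matching SSN and last name, or None.

    Raises AmbiguousMatch if more than one record matches, leaving the
    decision to a human reviewer.
    """
    wanted = last_name.strip().lower()  # assumption: compare case-insensitively
    hits = [
        p for p in registry
        if p.ssn == ssn and p.last_name.strip().lower() == wanted
    ]
    if len(hits) > 1:
        raise AmbiguousMatch(f"{len(hits)} records match; manual review needed")
    return hits[0] if hits else None
```

Returning `None` means "create a new record"; raising on multiple hits keeps the automated path from guessing.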
Don't remove SSNs from your Person Registry!!! Yes, protect your Registry like you protect your HR system. No, don't REQUIRE SSN. But matching is a valid use of SSN if you already have it. Any identifier attributes you have are of rare value with respect to matching.
We don't do any fuzzy matching on individual elements (e.g., matching Bill to Will, checking for character transposition, etc). 

Our match is on 4 basic data elements, and our "fuzzy" logic is based on how many of those data elements match and how many mismatch. (If a value such as DoB is blank on a record, then we don't record either a match or a mismatch).

So in general our rules look like:

- If 4 attributes match, it's a match
- If 3 attributes match and none mismatch, it's a match
- If 3 attributes match and one mismatches, it's a fuzzy match
- If 2 attributes match and none mismatch, it's a fuzzy match
- Else it's a new record

I may have the logic slightly wrong (e.g., we might call 2 match/1 mismatch "fuzzy" for example), but the overall idea still holds.
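The counting rules above can be sketched in Python as follows. The four attribute names in `ATTRS` are placeholders (the message does not say which four data elements are used), and the rule table is the one listed above, so the same caveat about the exact thresholds applies.

```python
# Hypothetical attribute names; the actual four data elements are not stated.
ATTRS = ("ssn", "last_name", "first_name", "dob")


def compare(incoming: dict, existing: dict, attrs=ATTRS) -> tuple[int, int]:
    """Count (matches, mismatches) across the attributes.

    A value that is blank or missing on either record counts as neither
    a match nor a mismatch.
    """
    matches = mismatches = 0
    for attr in attrs:
        a, b = incoming.get(attr), existing.get(attr)
        if not a or not b:
            continue  # blank: record neither a match nor a mismatch
        if a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches


def classify(matches: int, mismatches: int) -> str:
    """Apply the listed rules: returns "match", "fuzzy", or "new"."""
    if matches == 4:
        return "match"
    if matches == 3 and mismatches == 0:
        return "match"
    if matches == 3 and mismatches == 1:
        return "fuzzy"
    if matches == 2 and mismatches == 0:
        return "fuzzy"
    return "new"
```

Usage would be `classify(*compare(incoming, candidate))` for each candidate record, with "fuzzy" results queued for the human resolvers.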

This works reasonably well for our institution, though the "fuzzy match resolver" folks get very unhappy with us at the start of every enrollment cycle, when hundreds of fuzzy matches suddenly appear in the system on the same day.

--- Eric

No matter what set of matching rules you use there will be five possible outcomes:

Positive match – person already exists

Negative match – new person

Possible match – can’t tell

False Positive match error – matched an existing person but is actually new

False Negative match error – looked new but actually already exists

A strict rule set will increase the rate of Possible matches and False Negative matches and decrease False Positive matches.

A loose rule set will increase the rate of False Positive matches and decrease Possible matches and False Negative matches.

The errors and Possible matches all require manual intervention. So the goal in tuning the rule set you use is to minimize the amount of manual intervention required. Note that False Positive matches can be hard or impossible to resolve without data loss.
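One way to put a number on "manual intervention" is to tally the five outcomes over a sample where the true answer is known (for example, a hand-audited batch) and add up the three that need a human. The Python sketch below assumes such a sample exists; the decision labels "match", "new", and "possible" and the function names are illustrative, not taken from any particular system.

```python
from collections import Counter


def outcome(decision: str, actually_exists: bool) -> str:
    """Map a rule-set decision plus the audited truth to one of five outcomes."""
    if decision == "possible":
        return "possible"
    if decision == "match":
        return "positive" if actually_exists else "false_positive"
    if decision == "new":
        return "false_negative" if actually_exists else "negative"
    raise ValueError(f"unknown decision: {decision!r}")


def manual_workload(results) -> int:
    """Count cases needing a human: possible matches plus both error types.

    `results` is an iterable of (decision, actually_exists) pairs.
    """
    tally = Counter(outcome(d, truth) for d, truth in results)
    return tally["possible"] + tally["false_positive"] + tally["false_negative"]
```

Running the same audited sample through a stricter and a looser rule set and comparing `manual_workload` gives a rough basis for tuning. A plain count treats all three cases as equally costly, so False Positives may deserve extra weight given how hard they are to undo.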

 

My advice is to lean toward a strict rule set and then loosen rules until manual intervention is minimized for your population.

We have also found that it is best if you can adjust the rule set independently for each of your Systems of Record. For instance, we have found that for our Guest system the matching rule set can be much looser than with our other SORs.
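A per-SOR rule set can be as simple as a lookup table with a strict default. The Python sketch below assumes a rule set is expressed as "how many attributes must agree" and "how many may conflict"; that representation, the `RuleSet` type, the SOR names, and the specific numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    min_matches: int     # agreeing attributes required to auto-match
    max_mismatches: int  # conflicting attributes tolerated for an auto-match


# Strict default; only the guest SOR is loosened (illustrative values).
DEFAULT_RULES = RuleSet(min_matches=3, max_mismatches=0)
RULES_BY_SOR = {
    "guest": RuleSet(min_matches=2, max_mismatches=1),
}


def rules_for(sor: str) -> RuleSet:
    """Return the rule set for a System of Record, falling back to the default."""
    return RULES_BY_SOR.get(sor.lower(), DEFAULT_RULES)


def auto_match(sor: str, matches: int, mismatches: int) -> bool:
    """True if the counts satisfy the rule set configured for this SOR."""
    rules = rules_for(sor)
    return matches >= rules.min_matches and mismatches <= rules.max_mismatches
```

Keeping the thresholds in data rather than code means one SOR can be loosened without touching the rules for the others.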

 

From: Identity Management Constituent Group Discussion list [mailto:IDM@LISTSERV.EDUCAUSE.EDU] On Behalf Of Eric Goodman
Sent: Thursday, November 15, 2012 11:23 AM
To: IDM@LISTSERV.EDUCAUSE.EDU
Subject: Re: [IDM] Home Grown ID Match Fuzzy Match?

 
