Home / Documentation / Basic Usage / Exact Name Matching

Basic Usage Tutorials 📖

This tutorial is part of a series to help you learn and perform the basic functions of zentity. Each tutorial adds a little more sophistication to the prior tutorials, so you can start simple and learn the more advanced features over time.

  1. Exact Name Matching ← You are here.
  2. Robust Name Matching
  3. Multiple Attribute Resolution
  4. Multiple Resolver Resolution
  5. Cross Index Resolution
  6. Scoping Resolution

Exact Name Matching

Welcome to the "Hello world!" of entity resolution.

This tutorial will guide you through one the simplest forms of entity resolution – exact name matching. You will learn how to create an entity model and how to resolve an entity using a single attribute mapped to a single field of a single index. This is meant to introduce you to the most basic functions of entity resolution with zentity.

Let's dive in.

Important

You must install Elasticsearch, Kibana, and zentity to complete this tutorial. This tutorial was tested with zentity-1.0.2-elasticsearch-6.7.0.

1. Prepare for the tutorial

1.1 Open the Kibana Console UI

The Kibana Console UI makes it easy to submit requests to Elasticsearch and read responses.

1.2 Delete any old tutorial indices

Let's start from scratch. Delete any tutorial indices you might have created from other tutorials.

DELETE .zentity-tutorial-*

1.3 Create the tutorial index

Now create the index for this tutorial.

PUT .zentity-tutorial-index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "first_name": {
          "type": "text"
        },
        "last_name": {
          "type": "text"
        },
        "street": {
          "type": "text"
        },
        "city": {
          "type": "text"
        },
        "state": {
          "type": "text"
        },
        "phone": {
          "type": "text"
        },
        "email": {
          "type": "text"
        }
      }
    }
  }
}

1.4 Load the tutorial data

Add the tutorial data to the index.

POST _bulk?refresh
{"index": {"_id": "1", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "[email protected]", "first_name": "Allie", "id": "1", "last_name": "Jones", "phone": "202-555-1234", "state": "DC", "street": "123 Main St"}
{"index": {"_id": "2", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "", "first_name": "Alicia", "id": "2", "last_name": "Johnson", "phone": "202-123-4567", "state": "DC", "street": "300 Main St"}
{"index": {"_id": "3", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "", "first_name": "Allie", "id": "3", "last_name": "Jones", "phone": "", "state": "DC", "street": "123 Main St"}
{"index": {"_id": "4", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "", "email": "", "first_name": "Ally", "id": "4", "last_name": "Joans", "phone": "202-555-1234", "state": "", "street": ""}
{"index": {"_id": "5", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Arlington", "email": "[email protected]", "first_name": "Eli", "id": "5", "last_name": "Jonas", "phone": "", "state": "VA", "street": "500 23rd Street"}
{"index": {"_id": "6", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "[email protected]", "first_name": "Allison", "id": "6", "last_name": "Jones", "phone": "202-555-1234", "state": "DC", "street": "123 Main St"}
{"index": {"_id": "7", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "", "first_name": "Allison", "id": "7", "last_name": "Smith", "phone": "+1 (202) 555 1234", "state": "DC", "street": "555 Broad St"}
{"index": {"_id": "8", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "[email protected]", "first_name": "Alan", "id": "8", "last_name": "Smith", "phone": "202-000-5555", "state": "DC", "street": "555 Broad St"}
{"index": {"_id": "9", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "[email protected]", "first_name": "Alan", "id": "9", "last_name": "Smith", "phone": "2020005555", "state": "DC", "street": "555 Broad St"}
{"index": {"_id": "10", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "", "first_name": "Alison", "id": "10", "last_name": "Smith", "phone": "202-555-9876", "state": "DC", "street": "555 Broad St"}
{"index": {"_id": "11", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "", "email": "[email protected]", "first_name": "Alison", "id": "11", "last_name": "Jones-Smith", "phone": "2025559867", "state": "", "street": ""}
{"index": {"_id": "12", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Washington", "email": "[email protected]", "first_name": "Allison", "id": "12", "last_name": "Jones-Smith", "phone": "", "state": "DC", "street": "555 Broad St"}
{"index": {"_id": "13", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Arlington", "email": "[email protected]", "first_name": "Allison", "id": "13", "last_name": "Jones Smith", "phone": "703-555-5555", "state": "VA", "street": "1 Corporate Way"}
{"index": {"_id": "14", "_index": ".zentity-tutorial-index", "_type": "_doc"}}
{"city": "Arlington", "email": "[email protected]", "first_name": "Elise", "id": "14", "last_name": "Jonas", "phone": "703-555-5555", "state": "VA", "street": "1 Corporate Way"}

Here's what the tutorial data looks like.

id first_name last_name street city state phone email
1 Allie Jones 123 Main St Washington DC 202-555-1234 [email protected]
2 Alicia Johnson 300 Main St Washington DC 202-123-4567
3 Allie Jones 123 Main St Washington DC
4 Ally Joans 202-555-1234
5 Eli Jonas 500 23rd Street Arlington VA [email protected]
6 Allison Jones 123 Main St Washington DC 202-555-1234 [email protected]
7 Allison Smith 555 Broad St Washington DC +1 (202) 555 1234
8 Alan Smith 555 Broad St Washington DC 202-000-5555 [email protected]
9 Alan Smith 555 Broad St Washington DC 2020005555 [email protected]
10 Alison Smith 555 Broad St Washington DC 202-555-9876
11 Alison Jones-Smith 2025559867 [email protected]
12 Allison Jones-Smith 555 Broad St Washington DC [email protected]
13 Allison Jones Smith 1 Corporate Way Arlington VA 703-555-5555 [email protected]
14 Elise Jonas 1 Corporate Way Arlington VA 703-555-5555 [email protected]

2. Create the entity model

Let's use the Models API to create the entity model below. We'll review each part of the model in depth.

PUT _zentity/models/zentity-tutorial-person
{
  "attributes": {
    "first_name": {
      "type": "string"
    },
    "last_name": {
      "type": "string"
    }
  },
  "resolvers": {
    "name_only": {
      "attributes": [ "first_name", "last_name" ]
    }
  },
  "matchers": {
    "simple": {
      "clause": {
        "match": {
          "{{ field }}": "{{ value }}"
        }
      }
    }
  },
  "indices": {
    ".zentity-tutorial-index": {
      "fields": {
        "first_name": {
          "attribute": "first_name",
          "matcher": "simple"
        },
        "last_name": {
          "attribute": "last_name",
          "matcher": "simple"
        }
      }
    }
  }
}

2.1 Review the attributes

We defined two attributes called "first_name" and "last_name" as shown in this section:

{
  "attributes": {
    "first_name": {
      "type": "string"
    },
    "last_name": {
      "type": "string"
    }
  }
}

The default type of any attribute is "string". You can exclude "type" to simplify the entity model like this:

{
  "attributes": {
    "first_name": {},
    "last_name": {}
  }
}

2.2 Review the resolvers

We defined a single resolver called "name_only" as shown in this section:

{
  "resolvers": {
    "name_only": {
      "attributes": [ "first_name", "last_name" ]
    }
  }
}

This resolver requires only the "first_name" and "last_name" attributes to resolve an entity. So if you try to resolve a person named "Alice," then every document with the name "Alice" will be grouped with her. Obviously this would raise many false positives in the real world. We're doing this as a gentle introduction to the concept of entity resolution.

Tip

Most resolvers should use multiple attributes to resolve an entity to minimize false positives. Many people share the same name, but few people share the same name and address. Consider all the combinations of attributes that could resolve an entity with confidence, and then create a resolver for each combination. Other tutorials explore how to use resolvers with multiple attributes.

2.3 Review the matchers

We defined a single matcher called "simple" as shown in this section:

{
  "matchers": {
    "simple": {
      "clause": {
        "match": {
          "{{ field }}": "{{ value }}"
        }
      }
    }
  }
}

This matcher uses a simple match clause:

{
  "match": {
    "{{ field }}": "{{ value }}"
  }
}

The "{{ field }}" and "{{ value }}" strings are special variables. Every matcher should have these variables defined somewhere in the "clause" field. zentity will replace the "{{ field }}" variable with the name of an index field and the "{{ value }}" variable with the value of an attribute.

2.4 Review the indices

We defined a single index as shown in this section:

{
  "indices": {
    ".zentity-tutorial-index": {
      "fields": {
        "first_name": {
          "attribute": "first_name",
          "matcher": "simple"
        },
        "last_name": {
          "attribute": "last_name",
          "matcher": "simple"
        }
      }
    }
  }
}

3. Resolve an entity

Let's use the Resolution API to resolve a person with a known first name and last name:

POST _zentity/resolution/zentity-tutorial-person?pretty
{
  "attributes": {
    "first_name": [ "Allie" ],
    "last_name": [ "Jones" ]
  }
}

The results will look like this:

{
  "took" : 2,
  "hits" : {
    "total" : 2,
    "hits" : [ {
      "_index" : ".zentity-tutorial-index",
      "_type" : "_doc",
      "_id" : "1",
      "_hop" : 0,
      "_attributes" : {
        "first_name" : "Allie",
        "last_name" : "Jones"
      },
      "_source" : {
        "city" : "Washington",
        "email" : "[email protected]",
        "first_name" : "Allie",
        "id" : "1",
        "last_name" : "Jones",
        "phone" : "202-555-1234",
        "state" : "DC",
        "street" : "123 Main St"
      }
    }, {
      "_index" : ".zentity-tutorial-index",
      "_type" : "_doc",
      "_id" : "3",
      "_hop" : 0,
      "_attributes" : {
        "first_name" : "Allie",
        "last_name" : "Jones"
      },
      "_source" : {
        "city" : "Washington",
        "email" : "",
        "first_name" : "Allie",
        "id" : "3",
        "last_name" : "Jones",
        "phone" : "",
        "state" : "DC",
        "street" : "123 Main St"
      }
    } ]
  }
}

As expected, we retrieved two documents each with a first name that exactly matches "Allie" and a last name that exactly matches "Jones." All other documents, including those that were similar to these, were excluded from the results because we required exact matches on those two fields.

Conclusion

Congratulations! You just did one of the simplest forms of entity resolution – exact name matching.

Not too exciting yet, right? Let's make things a little more interesting.

The next tutorial will show how you can accomplish robust name matching using multiple forms of a name to handle challenges such as typos or phonetic variance. You will resolve an entity using a single attribute matched to multiple fields of a single index, rather than a single field of a single index.

 


Continue Reading

Basic Usage Robust Name Matching
© 2018 Dave Moore.
Licensed under the Apache License, Version 2.0.
Elasticsearch is a trademark of Elasticsearch BV.
This website uses Google Analytics.