
Basic Usage Tutorials 📖

This tutorial series will help you learn and perform the basic functions of zentity. Each tutorial adds a little more sophistication to the prior tutorials, so you can start simple and learn the more advanced features over time.

  1. Exact Name Matching
  2. Robust Name Matching
  3. Multiple Attribute Resolution
  4. Multiple Resolver Resolution
  5. Cross Index Resolution
  6. Scoping Resolution

Prerequisites

You must know how to use the Elasticsearch APIs before you can learn how to use zentity.

Specifically you should know:

If you truly wish to master the most important aspects of Elasticsearch for zentity, then I would recommend you take these training courses offered by Elastic, the creators of Elasticsearch.

If you have some basic experience with Elasticsearch, then you are ready to learn how to use zentity.

How to use zentity

Before we dive in, let's look at the typical usage of zentity at a high level.

You can think of zentity as a three-step process:

  1. Index some data
  2. Define an entity model
  3. Resolve an entity

Let's break it down a bit more.

Step 1. Index some data

zentity operates on data that is indexed in Elasticsearch, an open source search engine for real-time search and analytics at scale. The most common tools for indexing documents in Elasticsearch are Logstash and Beats. You can also index documents directly, either one at a time with the Index API or many per request with the Bulk API. You need to have data in Elasticsearch before you can use zentity, and you need to know how to use Elasticsearch, too.
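As a rough illustration of the Bulk API option, the sketch below builds a newline-delimited JSON body of the kind that the Bulk API accepts. The helper name build_bulk_payload and the sample documents are made up for this example; the index and field names simply mirror foo_index from the embedded model example later on this page.

```python
import json


def build_bulk_payload(index_name, documents):
    """Return an NDJSON string suitable as a Bulk API request body (hypothetical helper)."""
    lines = []
    for doc_id, source in documents:
        # Action line: names the target index and the document ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample documents, for illustration only.
documents = [
    ("1", {"full_name": "Alice Jones", "date_of_birth": "1984-01-01", "telephone": "555-123-4567"}),
    ("2", {"full_name": "Alice Jones-Smith", "date_of_birth": "1984-01-01", "telephone": "555-987-6543"}),
]

payload = build_bulk_payload("foo_index", documents)
print(payload)
```

The resulting text would be sent as the body of a POST _bulk request. Each tutorial supplies its own sample data, so treat this only as a picture of the format.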

Each tutorial in this series will give you sample data that you can use for practice.

Step 2. Define an entity model

Entity models are the most important constructs you need to learn about. zentity uses entity models to construct queries, match attributes across disparate indices, and resolve entities.

An entity model defines the logic for resolving an entity type such as a person or organization. It defines the attributes of the entity ("attributes"), the logic to match each attribute ("matchers"), the logic to resolve documents to an entity based on the matching attributes ("resolvers"), and the associations between attributes and matchers with index fields ("indices"). This is the step that demands the most thinking. You need to think about what attributes constitute an entity type, what logic goes into matching each attribute, which attributes and matchers map to which fields of which indices, and what combinations of matched attributes lead to resolution.
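To make the relationships between the four parts of a model concrete, here is a minimal sketch that checks a model for dangling references: resolvers that name undefined attributes, and index fields that name undefined attributes or matchers. The function find_model_problems is a hypothetical helper written for this page, not part of zentity, and the sample model is deliberately incomplete so that the check has something to report.

```python
def find_model_problems(model):
    """Return a list of human-readable problems found in an entity model dict."""
    problems = []
    attributes = model.get("attributes", {})
    matchers = model.get("matchers", {})

    # Every resolver should only reference attributes that the model defines.
    for resolver_name, resolver in model.get("resolvers", {}).items():
        for attribute in resolver.get("attributes", []):
            if attribute not in attributes:
                problems.append(
                    f"resolver '{resolver_name}' uses undefined attribute '{attribute}'"
                )

    # Every index field should map to a defined attribute and a defined matcher.
    for index_name, index in model.get("indices", {}).items():
        for field_name, field in index.get("fields", {}).items():
            if field.get("attribute") not in attributes:
                problems.append(
                    f"field '{index_name}.{field_name}' uses undefined attribute '{field.get('attribute')}'"
                )
            if field.get("matcher") not in matchers:
                problems.append(
                    f"field '{index_name}.{field_name}' uses undefined matcher '{field.get('matcher')}'"
                )
    return problems


# A deliberately broken model: "dob" and "fuzzy" are referenced but never defined.
model = {
    "attributes": {"name": {"type": "string"}, "phone": {"type": "string"}},
    "resolvers": {
        "name_phone": {"attributes": ["name", "phone"]},
        "name_dob": {"attributes": ["name", "dob"]},
    },
    "matchers": {"exact": {"clause": {"term": {"{{ field }}": "{{ value }}"}}}},
    "indices": {
        "foo_index": {
            "fields": {
                "full_name": {"attribute": "name", "matcher": "fuzzy"},
                "telephone.keyword": {"attribute": "phone", "matcher": "exact"},
            }
        }
    },
}

problems = find_model_problems(model)
for problem in problems:
    print(problem)
```

A check like this only covers cross-references. It says nothing about whether the chosen matchers and resolvers are sensible for your data, which is the part that demands the thinking.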

Luckily, all this thinking will pay off quickly, because entity models have two great features:

Reusability

Once you have an entity model you can use it everywhere. As you index new data sets with fields that map to familiar attributes, you can include them in your entity resolution jobs. If you index data with new attributes that aren't already in your model, you can simply update your model to support them.

Flexibility

You don't need to change your data to use an entity model. An entity model only controls the execution of queries. So there's no risk in updating or experimenting with an entity model.

Step 3. Resolve an entity

So you have some data and an entity model. Now you can resolve entities!

Once you have an entity model, you can use the Resolution API to run an entity resolution job using some input.

Example

Run an entity resolution job using an indexed entity model called person.

POST _zentity/resolution/person?pretty
{
  "attributes": {
    "name": [ "Alice Jones" ],
    "dob": [ "1984-01-01" ],
    "phone": [ "555-123-4567", "555-987-6543" ]
  }
}
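If you prefer to call the Resolution API from a script rather than a console, the same request can be assembled with the Python standard library. This is a sketch under assumptions: the base URL http://localhost:9200 and the helper name build_resolution_request are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_resolution_request(base_url, entity_type, attributes):
    """Build (but do not send) a POST request for an indexed entity model."""
    body = json.dumps({"attributes": attributes}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_zentity/resolution/{entity_type}?pretty",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed cluster address; replace it with your own.
request = build_resolution_request(
    "http://localhost:9200",
    "person",
    {
        "name": ["Alice Jones"],
        "dob": ["1984-01-01"],
        "phone": ["555-123-4567", "555-987-6543"],
    },
)
print(request.get_method(), request.full_url)

# To send it against a running cluster with zentity installed:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```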

Run an entity resolution job using an embedded entity model. This example uses three attributes, two resolvers, three matchers, and two indices.

POST _zentity/resolution?pretty
{
  "attributes": {
    "name": [ "Alice Jones" ],
    "dob": [ "1984-01-01" ],
    "phone": [ "555-123-4567", "555-987-6543" ]
  },
  "model": {
    "attributes": {
      "name": {
        "type": "string"
      },
      "dob": {
        "type": "string"
      },
      "phone": {
        "type": "string"
      }
    },
    "resolvers": {
      "name_dob": {
        "attributes": [
          "name", "dob"
        ]
      },
      "name_phone": {
        "attributes": [
          "name", "phone"
        ]
      }
    },
    "matchers": {
      "exact": {
        "clause": {
          "term": {
            "{{ field }}": "{{ value }}"
          }
        }
      },
      "fuzzy": {
        "clause": {
          "match": {
            "{{ field }}": {
              "query": "{{ value }}",
              "fuzziness": "{{ params.fuzziness }}"
            }
          }
        },
        "params": {
          "fuzziness": "auto"
        }
      },
      "standard": {
        "clause": {
          "match": {
            "{{ field }}": "{{ value }}"
          }
        }
      }
    },
    "indices": {
      "foo_index": {
        "fields": {
          "full_name": {
            "attribute": "name",
            "matcher": "fuzzy"
          },
          "full_name.phonetic": {
            "attribute": "name",
            "matcher": "standard"
          },
          "date_of_birth.keyword": {
            "attribute": "dob",
            "matcher": "exact"
          },
          "telephone.keyword": {
            "attribute": "phone",
            "matcher": "exact"
          }
        }
      },
      "bar_index": {
        "fields": {
          "nm": {
            "attribute": "name",
            "matcher": "fuzzy"
          },
          "db": {
            "attribute": "dob",
            "matcher": "standard"
          },
          "ph": {
            "attribute": "phone",
            "matcher": "standard"
          }
        }
      }
    }
  }
}
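The matcher clauses in the model above contain placeholders such as {{ field }}, {{ value }}, and {{ params.fuzziness }}. The sketch below shows, conceptually, what a clause looks like once those placeholders are filled in for one field and one input value. It is an illustration only: render_clause is a hypothetical helper, and the plain string substitution used here is an assumption, not a description of how zentity renders clauses internally.

```python
import json


def render_clause(clause, field, value, params=None):
    """Fill {{ field }}, {{ value }}, and {{ params.* }} placeholders in a matcher clause."""
    text = json.dumps(clause)
    replacements = {"{{ field }}": field, "{{ value }}": value}
    for name, param_value in (params or {}).items():
        replacements["{{ params." + name + " }}"] = param_value
    for placeholder, replacement in replacements.items():
        # json.dumps(...)[1:-1] escapes the replacement so the text stays valid JSON.
        text = text.replace(placeholder, json.dumps(str(replacement))[1:-1])
    return json.loads(text)


# The "fuzzy" matcher from the model above.
fuzzy = {
    "clause": {
        "match": {
            "{{ field }}": {
                "query": "{{ value }}",
                "fuzziness": "{{ params.fuzziness }}",
            }
        }
    },
    "params": {"fuzziness": "auto"},
}

# Render it for the full_name field of foo_index and one input name.
rendered = render_clause(fuzzy["clause"], "full_name", "Alice Jones", fuzzy["params"])
print(json.dumps(rendered, indent=2))
```

Seen this way, an index field entry such as "full_name" with the "fuzzy" matcher amounts to one query clause per input value of the mapped attribute.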

Now that you have a sense of what to expect, let's walk through some guided tutorials to help you master the basic functions of zentity.

 


© 2018 Dave Moore.
Licensed under the Apache License, Version 2.0.
Elasticsearch is a trademark of Elasticsearch BV.