
Entity Resolution Output Specification

{
  "took": TOOK_IN_MILLIS,
  "hits": {
    "total": HITS_TOTAL,
    "hits": [
      {
        "_index": INDEX_NAME,
        "_id": DOC_ID,
        "_hop": HOP_NUMBER,
        "_query": QUERY_NUMBER,
        "_score": COMPOSITE_IDENTITY_CONFIDENCE_SCORE,
        "_attributes": {
          ATTRIBUTE_NAME: [
            ATTRIBUTE_VALUE,
            ...
          ],
          ...
        },
        "_explanation": {
          "resolvers": {
            RESOLVER_NAME: {
              "attributes": [
                ATTRIBUTE_NAME,
                ...
              ]
            },
            ...
          },
          "matches": [
            {
              "attribute": ATTRIBUTE_NAME,
              "target_field": FIELD_NAME,
              "target_value": FIELD_VALUE,
              "input_value": ATTRIBUTE_VALUE,
              "input_matcher": MATCHER_NAME,
              "input_matcher_params": MATCHER_PARAMS,
              "score": ATTRIBUTE_IDENTITY_CONFIDENCE_SCORE
            },
            ...
          ]
        },
        "_source": {
          FIELD_NAME: FIELD_VALUE,
          ...
        }
      },
      ...
    ]
  },
  "queries": [
    {
      "_hop": HOP_NUMBER,
      "_query": QUERY_NUMBER,
      "_index": INDEX_NAME,
      "filters": {
        "attributes": {
          "tree": {
            WEIGHT_LEVEL: {
              ATTRIBUTE_NAME: {
                ...
              },
              ...
            },
            ...
          },
          "resolvers": {
            RESOLVER_NAME: {
              "attributes": [
                ATTRIBUTE_NAME,
                ...
              ]
            },
            ...
          }
        },
        "terms": {
          "tree": {
            WEIGHT_LEVEL: {
              ATTRIBUTE_NAME: {
                ...
              },
              ...
            },
            ...
          },
          "resolvers": {
            RESOLVER_NAME: {
              "attributes": [
                ATTRIBUTE_NAME,
                ...
              ]
            },
            ...
          }
        }
      },
      "search": {
        "request": SEARCH_REQUEST,
        "response": SEARCH_RESPONSE
      }
    },
    ...
  ]
}

Entity resolution outputs are JSON documents. In the framework shown above, lowercase quoted values (e.g. "attributes") are constant fields, uppercase literal values (e.g. ATTRIBUTE_NAME) are variable fields or values, and ellipses (...) are optional repetitions of the preceding field or value.

An entity resolution output is the response to a resolution request. Its structure is similar to the response of an Elasticsearch Search API query. It contains the documents ("hits") associated with the entity as well as information about the job itself. Documents can contain the original source values, the normalized attribute values, and information about the index and hop number from which the document was retrieved.
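Client code that consumes this output can mirror the framework as typed dictionaries. The following is a minimal Python sketch. The type names (`Match`, `Explanation`, `Hit`, `Hits`, `ResolutionOutput`) and the value types chosen for the variable fields are assumptions for illustration, not part of zentity.

```python
from typing import Any, Dict, List, TypedDict


class Match(TypedDict):
    """One entry of "_explanation"."matches"."""
    attribute: str
    target_field: str
    target_value: Any
    input_value: Any
    input_matcher: str
    input_matcher_params: Dict[str, Any]
    score: float  # assumed numeric; the type is not stated in the specification


class Explanation(TypedDict):
    # RESOLVER_NAME -> {"attributes": [ATTRIBUTE_NAME, ...]}
    resolvers: Dict[str, Dict[str, List[str]]]
    matches: List[Match]


class Hit(TypedDict, total=False):
    # total=False because "_attributes", "_explanation", and "_source"
    # can be switched off or on through URI parameters.
    _index: str
    _id: str
    _hop: int
    _query: int
    _score: float
    _attributes: Dict[str, List[Any]]
    _explanation: Explanation
    _source: Dict[str, Any]


class Hits(TypedDict):
    total: int
    hits: List[Hit]


class ResolutionOutput(TypedDict, total=False):
    took: int
    hits: Hits
    queries: List[Dict[str, Any]]  # left loosely typed in this sketch
```

Static type checkers can then flag a misspelled key such as `"_hops"` when reading a hit.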

The "queries", "_source", "_attributes", and "hits" fields can each be excluded from the output. By default, "queries" is excluded to reduce the amount of data transferred from the cluster to the client.
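As a sketch of how these fields are switched on and off, the helper below builds a resolution request URL with boolean URI parameters. The endpoint path `_zentity/resolution/{entity_type}`, the host, and the entity type `person` are assumptions for illustration; only the parameter names (hits, _attributes, _source, _explanation, queries) come from this page.

```python
from urllib.parse import urlencode


def resolution_url(base_url: str, entity_type: str, **params: bool) -> str:
    """Build a resolution request URL with boolean URI parameters.

    The "_zentity/resolution/<entity_type>" path is an assumption,
    not something defined by this specification.
    """
    # Send booleans as lowercase "true"/"false" strings.
    query = urlencode({name: str(value).lower() for name, value in params.items()})
    url = f"{base_url.rstrip('/')}/_zentity/resolution/{entity_type}"
    return f"{url}?{query}" if query else url


# Include "queries" and "_explanation", and drop "_source" from each hit.
print(resolution_url("http://localhost:9200", "person",
                     queries=True, _explanation=True, _source=False))
```

Keyword arguments keep their order, so the parameters appear in the URL in the order they were passed.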

"took"

The number of milliseconds elapsed between the start time and stop time of the entity resolution job. This excludes the time it takes to validate the request, fetch and parse the entity model, and serialize the response.

"hits"

An object containing the documents that matched the input across all hops.

This field can be excluded from the output by setting hits=false in the URI parameters of the resolution request. Doing so skips some processing of the responses from Elasticsearch, which gives a slightly more accurate measurement of "took". It also minimizes the amount of data transferred from the cluster to the client, which can give slightly more accurate timing measurements when stress testing zentity.

"hits"."total"

The total number of documents that matched the input across all hops.

"hits"."hits"

An array of objects, each of which is a document that matched the input.

"hits"."hits"."_index"

The name of the index from which the document was retrieved.

"hits"."hits"."_id"

The _id of the document.

"hits"."hits"."_hop"

The hop number at which the document was retrieved. A "hop" is an iteration in which zentity submits a query to each index that can be queried.

"hits"."hits"."_query"

The number of the query, within the given "_hop", that retrieved the document.
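Because every hit records the "_hop" and "_query" that retrieved it, a client can reconstruct the order in which documents were found. A minimal Python sketch follows; the sample output (index names and ids) is hypothetical and trimmed to the fields the function reads.

```python
from collections import defaultdict


def hits_by_hop(output):
    """Group (index, id) pairs by the (_hop, _query) that retrieved them."""
    groups = defaultdict(list)
    # .get() tolerates outputs requested with hits=false.
    for hit in output.get("hits", {}).get("hits", []):
        groups[(hit["_hop"], hit["_query"])].append((hit["_index"], hit["_id"]))
    return dict(groups)


# Hypothetical output, trimmed to the fields used above.
output = {
    "took": 12,
    "hits": {
        "total": 3,
        "hits": [
            {"_index": "users", "_id": "a1", "_hop": 0, "_query": 0},
            {"_index": "users", "_id": "a2", "_hop": 1, "_query": 0},
            {"_index": "accounts", "_id": "b7", "_hop": 1, "_query": 1},
        ],
    },
}

for (hop, query), docs in sorted(hits_by_hop(output).items()):
    print(f"hop {hop}, query {query}: {docs}")
```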

"hits"."hits"."_score"

The composite identity confidence score calculated from the attribute identity confidence scores.

"hits"."hits"."_attributes"

An object containing the normalized values for each attribute of the document. This object is constructed by taking each "_source" field that is associated with an attribute in the entity model and mapping it to the name of the attribute. Some values, such as dates, are normalized into a common format so that they are consistent across documents from disparate indices.
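The mapping from "_source" to "_attributes" can be sketched in Python as follows. The `field_to_attribute` dictionary stands in for the entity model and, like the field names, is hypothetical. The sketch only handles top-level fields and performs no normalization such as date formatting.

```python
def build_attributes(source, field_to_attribute):
    """Collect each attribute's values from the _source fields mapped to it."""
    attributes = {}
    for field, attribute in field_to_attribute.items():
        if field not in source:
            continue  # this document lacks the field
        value = source[field]
        values = attributes.setdefault(attribute, [])
        # A multi-valued field contributes every element.
        if isinstance(value, list):
            values.extend(value)
        else:
            values.append(value)
    return attributes


# Hypothetical entity-model mapping: index field -> attribute name.
field_to_attribute = {
    "full_name": "name",
    "email_primary": "email",
    "email_other": "email",
}
source = {
    "full_name": "Alice Jones",
    "email_primary": "alice@example.com",
    "email_other": ["aj@example.com"],
    "notes": "not mapped to any attribute",
}
print(build_attributes(source, field_to_attribute))
```

Fields with no associated attribute, such as "notes" here, remain in "_source" but do not appear in "_attributes".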

This field can be excluded from the output by setting _attributes=false in the URI parameters of the resolution request.

"hits"."hits"."_explanation"

An object that explains which "resolvers" caused the document to match and the reasons for the "matches".

This field can be included in the output by setting _explanation=true in the URI parameters of the resolution request.

"hits"."hits"."_explanation"."resolvers"

An object that explains which resolvers caused the document to match. A resolver is listed if each of its "attributes" has at least one match. The reason for each match is listed in the "matches" array.
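The rule that a resolver is listed only when each of its attributes has at least one match can be expressed directly. In this Python sketch the resolver definitions and the match entries are hypothetical, and each match is trimmed to its "attribute" key.

```python
def matched_resolvers(resolvers, matches):
    """Return the resolvers whose attributes all have at least one match."""
    matched = {match["attribute"] for match in matches}
    return {
        name: {"attributes": attributes}
        for name, attributes in resolvers.items()
        if all(attribute in matched for attribute in attributes)
    }


# Hypothetical resolvers: resolver name -> required attributes.
resolvers = {
    "name_dob": ["name", "dob"],
    "name_phone": ["name", "phone"],
    "email": ["email"],
}
# Hypothetical "matches" entries, trimmed to the "attribute" key.
matches = [{"attribute": "name"}, {"attribute": "dob"}, {"attribute": "name"}]

print(matched_resolvers(resolvers, matches))
```

"name_phone" is omitted because "phone" has no match, even though "name" matched twice.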

"hits"."hits"."_explanation"."resolvers".RESOLVER_NAME."attributes"

The attributes of a resolver that caused the document to match.

"hits"."hits"."_explanation"."matches"

An array of objects that explain how the input values matched the fields of the document. Each object represents a match between an input value and an indexed value for a given attribute, using a given matcher and its params.

"hits"."hits"."_explanation"."matches"."attribute"

The name of the attribute for a match.

"hits"."hits"."_explanation"."matches"."target_field"

The name of the index field for a match.

"hits"."hits"."_explanation"."matches"."target_value"

The value of the index field for a match.

"hits"."hits"."_explanation"."matches"."input_value"

The value of the input for a match.

"hits"."hits"."_explanation"."matches"."input_matcher"

The name of the matcher for a match.

"hits"."hits"."_explanation"."matches"."input_matcher_params"

The params of the matcher for a match.

"hits"."hits"."_explanation"."matches"."score"

The attribute identity confidence score calculated from the attribute identity confidence base score, matcher quality score, and index field quality score.

"hits"."hits"."_source"

The original fields from the document.

This field can be excluded from the output by setting _source=false in the URI parameters of the resolution request.

"queries"

An array of objects, each containing information about a query that was submitted to Elasticsearch during the resolution job.

This field is excluded from the output by default. It can be included by setting queries=true in the URI parameters of the resolution request.

"queries"."_hop"

The hop number at which the query was submitted.

"queries"."_query"

The number of the query within the given "_hop".

"queries"."_index"

The index that was queried.

"queries"."filters"

An object containing information about the filters created to construct the query.

"queries"."filters"."attributes"

An object containing information about the filters created for known attribute values of the entity.

"queries"."filters"."attributes"."tree"

A recursive object containing the attributes of the resolvers as they were constructed in the query.

Different resolvers often share attributes. Consider the following resolvers, each written as a list of its attributes:

[
  [ "name", "street", "city", "state" ],
  [ "name", "street", "zip" ],
  [ "name", "dob", "state" ],
  [ "name", "phone" ],
  [ "name", "email" ],
  [ "id" ]
]

Many of these attributes ("name", "street", "state") are shared by multiple resolvers. It would be highly inefficient to populate the values of each attribute multiple times in a single query. zentity optimizes queries by determining how the attributes can be nested to minimize redundant clauses.

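Here is the effect of the optimization. The top-level key "0" is the weight level (WEIGHT_LEVEL in the framework above) under which these resolvers are grouped: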

{
  "0": {
    "name": {
      "street": {
        "city": {
          "state": {}
        },
        "zip": {}
      },
      "dob": {
        "state": {}
      },
      "phone": {},
      "email": {}
    },
    "id": {}
  }
}

In this example, zentity eliminates four redundant copies of "name" values and one redundant copy of "street" values. The clauses of attributes at the same level of the hierarchy are combined with their siblings using OR, while the clauses of child attributes are combined with their parents using AND.
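The nesting can be approximated by inserting each resolver's attribute list into a prefix tree and then reading the tree back as a boolean expression, with siblings joined by OR and children joined to their parent by AND. This Python sketch reproduces the example tree for these six resolvers, but it is an illustration rather than zentity's actual algorithm, which may order or group attributes differently. It also assumes that no resolver's attribute list is a prefix of another's, because a plain prefix tree cannot mark where such a shorter resolver ends.

```python
import json


def build_tree(resolvers):
    """Insert each resolver's attribute list into a prefix tree (trie)."""
    tree = {}
    for attributes in resolvers:
        node = tree
        for attribute in attributes:
            # Resolvers that start with the same attributes share a branch.
            node = node.setdefault(attribute, {})
    return tree


def to_boolean(tree):
    """Render a tree: siblings are ORed, children are ANDed with their parent."""
    clauses = []
    for attribute, children in tree.items():
        if children:
            clauses.append(f"({attribute} AND {to_boolean(children)})")
        else:
            clauses.append(attribute)
    return clauses[0] if len(clauses) == 1 else "(" + " OR ".join(clauses) + ")"


resolvers = [
    ["name", "street", "city", "state"],
    ["name", "street", "zip"],
    ["name", "dob", "state"],
    ["name", "phone"],
    ["name", "email"],
    ["id"],
]
tree = build_tree(resolvers)
# Wrap the tree under weight level "0", as in the example above.
print(json.dumps({"0": tree}, indent=2))
print(to_boolean(tree))
```

In the rendered expression, "name" and "street" each appear once, while "state" appears twice because it sits on two different branches.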

"queries"."filters"."attributes"."resolvers"

An object containing the resolvers that were used to construct the filter for the known attributes of the entity.

"queries"."filters"."terms"

An object containing information about the filters created for arbitrary terms given in the resolution input.

"queries"."filters"."terms"."tree"

A recursive object containing the attributes of the resolvers as they were constructed in the query. Follows the same structure as "queries"."filters"."attributes"."tree".

"queries"."filters"."terms"."resolvers"

An object containing the resolvers that were used to construct the filter for arbitrary terms given in the resolution input.

"queries"."search"

An object containing information about the search request and response to and from Elasticsearch.

"queries"."search"."request"

An object containing the search request payload to Elasticsearch.

"queries"."search"."response"

An object containing the search response payload from Elasticsearch.

If profile=true is set in the URI parameters of the resolution request, this field also includes the query profile data.

 


© 2018 - 2024 Dave Moore.
Licensed under the Apache License, Version 2.0.
Elasticsearch is a trademark of Elasticsearch BV.