Home / Documentation / REST APIs / Resolution API
Resolution API
Runs an entity resolution job and returns the results.
The request accepts two endpoints:
POST _zentity/resolution
POST _zentity/resolution/{entity_type}
Example Request
This example request resolves a person identified by a name, a dob, and
two phone values, while limiting the search to one index called users_index
and two resolvers called name_dob and name_phone. The request passes a param
called fuzziness to the phone attribute, which can be referenced in any
matcher clause that uses the fuzziness param. Note that an attribute can
accept either an array of values or an object with the values specified in a
field called "values". It's also valid to specify an attribute with no values
but to override the default params, such as to format the results of any date
attributes in the response.
Read the input specification for complete details about the structure of a request.
POST _zentity/resolution/person?pretty
{
"attributes": {
"name": [ "Alice Jones" ],
"dob": {
"values": [ "1984-01-01" ]
},
"phone": {
"values": [
"555-123-4567",
"555-987-6543"
],
"params": {
"fuzziness": 2
}
}
},
"scope": {
"exclude": {
"attributes": {
"name": [
"unknown",
"n/a"
],
"phone": "555-555-5555"
}
},
"include": {
"indices": [
"users_index"
],
"resolvers": [
"name_dob",
"name_phone"
]
}
}
}
Example Response
This example response took 64 milliseconds and returned 2 hits. The _source
field contains the fields and values as they exist in the document indexed in
Elasticsearch. The _attributes field contains any values from the _source
field that can be mapped to the "attributes"
field of the entity model. The _hop field shows the level of recursion at
which the document was fetched. Entities with many documents can span many hops
if they have highly varied attribute values.
Read the output specification for complete details about the structure of a response.
{
"took": 64,
"hits": {
"total": 2,
"hits": [
{
"_index": "users_index",
"_id": "iaCn-mABDJZDR09hUNon",
"_hop": 0,
"_attributes": {
"city": "Beverly Halls",
"first_name": "Alice",
"last_name": "Jones",
"phone": "555 123 4567",
"state": "CA",
"street": "123 Main St",
"zip": "90210-0000"
},
"_source": {
"@version": "1",
"city": "Beverly Halls",
"fname": "Alice",
"lname": "Jones",
"phone": "555 987 6543",
"state": "CA",
"street": "123 Main St",
"zip": "90210-0000"
}
},
{
"_index": "users_index",
"_id": "iqCn-mABDJZDR09hUNoo",
"_hop": 0,
"_attributes": {
"city": "Beverly Hills",
"first_name": "Alice",
"last_name": "Jones",
"phone": "(555)-987-6543",
"state": "CA",
"street": "123 W Main Street",
"zip": "90210"
}
"_source": {
"@version": "1",
"city": "Beverly Hills",
"fname": "Alice",
"lname": "Jones",
"phone": "(555)-987-6543",
"state": "CA",
"street": "123 W Main Street",
"zip": "90210"
}
}
]
}
}
HTTP Headers
| Header | Value |
|---|---|
Content-Type |
application/json |
URL Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
_attributes |
Boolean | true |
No | Return the "_attributes" field in each doc. |
_explanation |
Boolean | false |
No | Return the "_explanation" field in each doc. |
_seq_no_primary_term |
Boolean | false |
No | Return the "_seq_no" and "_primary_term" fields in each doc. |
_source |
Boolean | true |
No | Return the "_source" field in each doc. |
_version |
Boolean | false |
No | Return the "_version" field in each doc. |
entity_type |
String | Depends | The entity type. Required if model is not specified. |
|
error_trace |
Boolean | true |
No | Return the Java stack trace when an exception is thrown. |
hits |
Boolean | true |
No | Return the "hits" field in the response. |
max_docs_per_query |
Integer | 1000 |
No | Maximum number of docs per query result. See size |
max_hops |
Integer | 100 |
No | Maximum level of recursion. |
max_time_per_query |
String | 10s |
No | Timeout per query. Uses time units. Timeouts are best effort and not guaranteed (more info). |
pretty |
Boolean | false |
No | Indents the JSON response data. |
profile |
Boolean | false |
No | Profile each query. Used for debugging. |
queries |
Boolean | false |
No | Return the "queries" field in the response. Used for debugging. |
URL Parameters (advanced)
These are advanced search optimizations. Most users will not require them. It's recommended to use the default settings of the cluster unless you know what you're doing.
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
search.allow_partial_search_results |
Boolean | Cluster default | No | allow_partial_search_results |
search.batched_reduce_size |
Integer | Cluster default | No | batched_reduce_size |
search.max_concurrent_shard_requests |
Integer | Cluster default | No | max_concurrent_shard_requests |
search.preference |
String | Cluster default | No | preference |
search.pre_filter_shard_size |
Integer | Cluster default | No | pre_filter_shard_size |
search.request_cache |
Boolean | Cluster default | No | request_cache |
Request Body Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
attributes |
Object | Deopends | The initial attribute values to search. Required if terms and ids are not specified. |
|
terms |
Object | Depends | The initial terms to search. Required if attributes and ids are not specified. |
|
ids |
Object | Depends | The initial document _ids to search. Required if attributes and terms are not specified. |
|
scope.exclude |
Object | No | The names of indices to limit the job to. | |
scope.exclude.attributes |
Object | No | The names and values of attributes to exclude in each query. | |
scope.exclude.indices |
Object | No | The names of indices to exclude in each query. | |
scope.exclude.resolvers |
Object | No | The names of resolvers to exclude in each query. | |
scope.include.attributes |
Object | No | The names and values of attributes to require in each query. | |
scope.include.indices |
Object | No | The names of indices to require in each query. | |
scope.include.resolvers |
Object | No | The names of resolvers to require in each query. | |
model |
Object | Depends | The entity model. Required if entity_type is not specified. |
Notes
- If you define an
entity_type, zentity will use its model from the.zentity-modelsindex. - If you don't define an
entity_type, then you must include amodelobject in the request body. - You can define an
entity_typein the request body or the URL, but not both.
Tips
- If you only need to search a few indices, use
scope.exclude.indicesandscope.include.indicesparameter to prevent the job from searching unnecessary indices in the entity model at each hop. - Beware if your data is transactional or has many duplicates.
You might need to lower the values of
max_hopsandmax_docs_per_queryif your jobs are timing out. - Use
scope.exclude.attributesto prevent entities from being over-resolved (a.k.a. "snowballed") due to common meaningless values such as "unknown" or "n/a". - Use
scope.include.attributesto limit the job within a particular context, such as by matching documents only within a given state or country.
Continue Reading
| ‹ | Bulk Models API | Bulk Resolution API | › |
|---|---|---|---|