Resolution API
Runs an entity resolution job and returns the results.
The API exposes two endpoints:
POST _zentity/resolution
POST _zentity/resolution/{entity_type}
Example Request
This example request resolves a person identified by a name, a dob, and two phone values, while limiting the search to one index called users_index and two resolvers called name_dob and name_phone. The scope also excludes the name values "unknown" and "n/a" and the phone value "555-555-5555" from each query. The request passes a param called fuzziness to the phone attribute, which can be referenced in any matcher clause that uses the fuzziness param. Note that an attribute can accept either an array of values or an object with the values specified in a field called "values". It's also valid to specify an attribute with no values in order to override its default params, such as to format the results of any date attributes in the response.
Read the input specification for complete details about the structure of a request.
POST _zentity/resolution/person?pretty
{
  "attributes": {
    "name": [ "Alice Jones" ],
    "dob": {
      "values": [ "1984-01-01" ]
    },
    "phone": {
      "values": [
        "555-123-4567",
        "555-987-6543"
      ],
      "params": {
        "fuzziness": 2
      }
    }
  },
  "scope": {
    "exclude": {
      "attributes": {
        "name": [
          "unknown",
          "n/a"
        ],
        "phone": "555-555-5555"
      }
    },
    "include": {
      "indices": [
        "users_index"
      ],
      "resolvers": [
        "name_dob",
        "name_phone"
      ]
    }
  }
}
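
The request above could be prepared from a client along the following lines. This is a minimal sketch using only the Python standard library; the helper name build_resolution_request and the base URL http://localhost:9200 are assumptions made for illustration and are not part of zentity. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request


def build_resolution_request(base_url, entity_type, body):
    """Prepare (but do not send) a POST to the Resolution API.

    Hypothetical helper: the name and signature are illustrative only.
    """
    url = f"{base_url.rstrip('/')}/_zentity/resolution/{entity_type}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "attributes": {
        # Array form: a plain list of values.
        "name": ["Alice Jones"],
        # Object form: values listed under "values".
        "dob": {"values": ["1984-01-01"]},
        # Object form with a param override for this attribute.
        "phone": {
            "values": ["555-123-4567", "555-987-6543"],
            "params": {"fuzziness": 2},
        },
    },
    "scope": {
        "include": {
            "indices": ["users_index"],
            "resolvers": ["name_dob", "name_phone"],
        }
    },
}

# Assumed local cluster address; replace with your own.
request = build_resolution_request("http://localhost:9200", "person", body)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), against a cluster with zentity installed.
```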
Example Response
This example response took 64 milliseconds and returned 2 hits. The _source field contains the fields and values as they exist in the document indexed in Elasticsearch. The _attributes field contains any values from the _source field that can be mapped to the "attributes" field of the entity model. The _hop field shows the level of recursion at which the document was fetched. Entities with many documents can span many hops if they have highly varied attribute values.
Read the output specification for complete details about the structure of a response.
{
  "took": 64,
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "users_index",
        "_id": "iaCn-mABDJZDR09hUNon",
        "_hop": 0,
        "_attributes": {
          "city": "Beverly Halls",
          "first_name": "Alice",
          "last_name": "Jones",
          "phone": "555 123 4567",
          "state": "CA",
          "street": "123 Main St",
          "zip": "90210-0000"
        },
        "_source": {
          "@version": "1",
          "city": "Beverly Halls",
          "fname": "Alice",
          "lname": "Jones",
          "phone": "555 123 4567",
          "state": "CA",
          "street": "123 Main St",
          "zip": "90210-0000"
        }
      },
      {
        "_index": "users_index",
        "_id": "iqCn-mABDJZDR09hUNoo",
        "_hop": 0,
        "_attributes": {
          "city": "Beverly Hills",
          "first_name": "Alice",
          "last_name": "Jones",
          "phone": "(555)-987-6543",
          "state": "CA",
          "street": "123 W Main Street",
          "zip": "90210"
        },
        "_source": {
          "@version": "1",
          "city": "Beverly Hills",
          "fname": "Alice",
          "lname": "Jones",
          "phone": "(555)-987-6543",
          "state": "CA",
          "street": "123 W Main Street",
          "zip": "90210"
        }
      }
    ]
  }
}
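
A client will often want to summarize a response like the one above. The sketch below, which assumes the response has already been parsed into a Python dict, groups document _ids by their _hop and collects the distinct values of one attribute. The function names ids_by_hop and attribute_values are hypothetical, and the sample response is a trimmed copy of the example.

```python
from collections import defaultdict


def ids_by_hop(response):
    """Group document _ids by the hop at which they were fetched."""
    grouped = defaultdict(list)
    for hit in response.get("hits", {}).get("hits", []):
        grouped[hit["_hop"]].append(hit["_id"])
    return dict(grouped)


def attribute_values(response, name):
    """Collect distinct values of one attribute across hits, in first-seen order."""
    seen = []
    for hit in response.get("hits", {}).get("hits", []):
        attributes = hit.get("_attributes", {})
        if name not in attributes:
            continue
        value = attributes[name]
        # Tolerate either a single value or a list of values.
        for item in value if isinstance(value, list) else [value]:
            if item not in seen:
                seen.append(item)
    return seen


# Trimmed copy of the example response.
response = {
    "took": 64,
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "users_index", "_id": "iaCn-mABDJZDR09hUNon", "_hop": 0,
             "_attributes": {"last_name": "Jones", "phone": "555 123 4567"}},
            {"_index": "users_index", "_id": "iqCn-mABDJZDR09hUNoo", "_hop": 0,
             "_attributes": {"last_name": "Jones", "phone": "(555)-987-6543"}},
        ],
    },
}

print(ids_by_hop(response))
print(attribute_values(response, "phone"))
```

Both documents in the example were found at hop 0, so they fall into a single group.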
HTTP Headers
Header | Value |
---|---|
Content-Type | application/json |
URL Parameters
Parameter | Type | Default | Required | Description |
---|---|---|---|---|
_attributes | Boolean | true | No | Return the "_attributes" field in each doc. |
_explanation | Boolean | false | No | Return the "_explanation" field in each doc. |
_seq_no_primary_term | Boolean | false | No | Return the "_seq_no" and "_primary_term" fields in each doc. |
_source | Boolean | true | No | Return the "_source" field in each doc. |
_version | Boolean | false | No | Return the "_version" field in each doc. |
entity_type | String | | Depends | The entity type. Required if model is not specified. |
error_trace | Boolean | true | No | Return the Java stack trace when an exception is thrown. |
hits | Boolean | true | No | Return the "hits" field in the response. |
max_docs_per_query | Integer | 1000 | No | Maximum number of docs per query result. See size. |
max_hops | Integer | 100 | No | Maximum level of recursion. |
max_time_per_query | String | 10s | No | Timeout per query. Uses time units. Timeouts are best effort and not guaranteed (more info). |
pretty | Boolean | false | No | Indents the JSON response data. |
profile | Boolean | false | No | Profile each query. Used for debugging. |
queries | Boolean | false | No | Return the "queries" field in the response. Used for debugging. |
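
URL parameters are appended to the endpoint as a query string. The following sketch shows one way a client might assemble such a URL; the helper name resolution_url and the base URL are assumptions for illustration. Python booleans are written as lowercase true/false, which is the form used in the table above.

```python
from urllib.parse import urlencode


def resolution_url(base_url, entity_type=None, params=None):
    """Build a Resolution API URL with optional URL parameters.

    Hypothetical helper: not part of zentity.
    """
    path = "/_zentity/resolution"
    if entity_type:
        path += "/" + entity_type
    encoded = {}
    for key, value in (params or {}).items():
        # Render Python booleans as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        encoded[key] = value
    query = urlencode(encoded)
    return base_url.rstrip("/") + path + ("?" + query if query else "")


url = resolution_url(
    "http://localhost:9200",  # assumed cluster address
    "person",
    {
        "max_hops": 3,
        "max_time_per_query": "5s",
        "_explanation": True,
        "_source": False,
    },
)
print(url)
```

Passing the parameters as a dict rather than keyword arguments also allows names that contain dots, such as the advanced search parameters.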
URL Parameters (advanced)
These are advanced search optimizations. Most users will not require them. It's recommended to use the default settings of the cluster unless you know what you're doing.
Parameter | Type | Default | Required | Description |
---|---|---|---|---|
search.allow_partial_search_results | Boolean | Cluster default | No | allow_partial_search_results |
search.batched_reduce_size | Integer | Cluster default | No | batched_reduce_size |
search.max_concurrent_shard_requests | Integer | Cluster default | No | max_concurrent_shard_requests |
search.preference | String | Cluster default | No | preference |
search.pre_filter_shard_size | Integer | Cluster default | No | pre_filter_shard_size |
search.request_cache | Boolean | Cluster default | No | request_cache |
Request Body Parameters
Parameter | Type | Default | Required | Description |
---|---|---|---|---|
attributes | Object | | Depends | The initial attribute values to search. Required if terms and ids are not specified. |
terms | Object | | Depends | The initial terms to search. Required if attributes and ids are not specified. |
ids | Object | | Depends | The initial document _ids to search. Required if attributes and terms are not specified. |
scope.exclude | Object | | No | The attributes, indices, and resolvers to exclude from the job. |
scope.exclude.attributes | Object | | No | The names and values of attributes to exclude in each query. |
scope.exclude.indices | Object | | No | The names of indices to exclude in each query. |
scope.exclude.resolvers | Object | | No | The names of resolvers to exclude in each query. |
scope.include.attributes | Object | | No | The names and values of attributes to require in each query. |
scope.include.indices | Object | | No | The names of indices to require in each query. |
scope.include.resolvers | Object | | No | The names of resolvers to require in each query. |
model | Object | | Depends | The entity model. Required if entity_type is not specified. |
Notes
- If you define an entity_type, zentity will use its model from the .zentity-models index.
- If you don't define an entity_type, then you must include a model object in the request body.
- You can define an entity_type in the request body or the URL, but not both.
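
The rules in these notes, together with the "Depends" rows of the request body table, can be checked on the client before a request is sent. The sketch below is one possible pre-flight check; the function validate_resolution_input is hypothetical, and zentity performs its own validation on the server, which may differ in detail.

```python
def validate_resolution_input(body, url_entity_type=None):
    """Return a list of problems found in a Resolution API request.

    Hypothetical client-side check based only on the rules stated
    in this page; an empty list means no problem was found.
    """
    errors = []
    body_entity_type = body.get("entity_type")

    # entity_type may be given in the URL or the body, but not both.
    if url_entity_type and body_entity_type:
        errors.append("entity_type is defined in both the URL and the request body")

    # Without an entity_type, the request must carry its own model.
    if not (url_entity_type or body_entity_type) and "model" not in body:
        errors.append("either entity_type or model is required")

    # At least one of attributes, terms, or ids must be provided.
    if not any(body.get(key) for key in ("attributes", "terms", "ids")):
        errors.append("one of attributes, terms, or ids is required")

    return errors


ok = validate_resolution_input(
    {"attributes": {"name": ["Alice Jones"]}}, url_entity_type="person"
)
bad = validate_resolution_input({"entity_type": "person"}, url_entity_type="person")
print(ok)
print(bad)
```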
Tips
- If you only need to search a few indices, use the scope.exclude.indices and scope.include.indices parameters to prevent the job from searching unnecessary indices in the entity model at each hop.
- Beware if your data is transactional or has many duplicates. You might need to lower the values of max_hops and max_docs_per_query if your jobs are timing out.
- Use scope.exclude.attributes to prevent entities from being over-resolved (a.k.a. "snowballed") due to common meaningless values such as "unknown" or "n/a".
- Use scope.include.attributes to limit the job within a particular context, such as by matching documents only within a given state or country.
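
The scope-related tips above can be combined into a single "scope" object. The sketch below assembles one from optional parts; the helper name build_scope is hypothetical, and the use of a state attribute assumes the entity model defines one, as the example response suggests.

```python
import json


def build_scope(include_indices=None, exclude_attributes=None, include_attributes=None):
    """Assemble a "scope" object, omitting any part that was not provided.

    Hypothetical helper: not part of zentity.
    """
    scope = {}
    if include_indices:
        scope.setdefault("include", {})["indices"] = list(include_indices)
    if include_attributes:
        scope.setdefault("include", {})["attributes"] = dict(include_attributes)
    if exclude_attributes:
        scope.setdefault("exclude", {})["attributes"] = dict(exclude_attributes)
    return scope


scope = build_scope(
    # Search only the index that is needed.
    include_indices=["users_index"],
    # Keep placeholder values from snowballing the entity.
    exclude_attributes={"name": ["unknown", "n/a"]},
    # Stay within one state (assumes the model has a "state" attribute).
    include_attributes={"state": ["CA"]},
)
print(json.dumps(scope, indent=2))
```

The resulting object would be placed under the "scope" key of the request body, next to "attributes", "terms", or "ids".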