Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I want to group by sysCode, keep only the documents with the latest date for each sysCode, and then aggregate on the employeeId and type fields. The query below does not return the desired results: for GER it returns employeeId=1 and for IND it returns employeeId=3, which is not what I want.

Sample JSON documents:

{
  "sysCode": "GER",
  "employeeId": 1,
  "date": "2014-06-14",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "GER",
  "employeeId": 2,
  "date": "2014-06-15",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "IND",
  "employeeId": 3,
  "date": "2014-06-16",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "LATE"
      }
    ]
  }
}
{
  "sysCode": "IND",
  "employeeId": 3,
  "date": "2014-06-16",
  "categories": {
    "pb": [
      {
        "metric": "OVERDUE",
        "type": "MISSED"
      }
    ]
  }
}
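
To make the desired result concrete, here is a minimal sketch in plain Python (not Elasticsearch) of one reading of the requirement: find the latest date per sysCode, keep only the documents on that date, then count by employeeId and type. The function name `latest_breakdown` and the in-memory `docs` list are illustrative assumptions, not part of the original question.

```python
from collections import Counter, defaultdict

# The four sample documents from the question, as Python dicts.
docs = [
    {"sysCode": "GER", "employeeId": 1, "date": "2014-06-14",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "GER", "employeeId": 2, "date": "2014-06-15",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "IND", "employeeId": 3, "date": "2014-06-16",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "LATE"}]}},
    {"sysCode": "IND", "employeeId": 3, "date": "2014-06-16",
     "categories": {"pb": [{"metric": "OVERDUE", "type": "MISSED"}]}},
]


def latest_breakdown(docs):
    """Count (employeeId, type) pairs using only each sysCode's latest-dated docs."""
    # Pass 1: latest date per sysCode (ISO yyyy-MM-dd strings compare correctly).
    latest = {}
    for d in docs:
        code = d["sysCode"]
        if code not in latest or d["date"] > latest[code]:
            latest[code] = d["date"]

    # Pass 2: count nested types only for documents on that latest date.
    result = defaultdict(Counter)
    for d in docs:
        if d["date"] == latest[d["sysCode"]]:
            for entry in d["categories"]["pb"]:
                result[d["sysCode"]][(d["employeeId"], entry["type"])] += 1
    return {code: dict(counts) for code, counts in result.items()}


print(latest_breakdown(docs))
```

Under this reading, GER keeps only employeeId 2 (dated 2014-06-15) and drops employeeId 1, while IND keeps both documents dated 2014-06-16.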

Aggregation query:

{
  "aggs": {
    "result_by_sys_code": {
      "terms": {
        "field": "sysCode"
      },
      "aggs": {
        "max_as_of_date": {
          "max": {
            "field": "date"
          }
        },
        "employees": {
          "terms": {
            "field": "employeeId"
          },
          "aggs": {
            "nested": {
              "nested": {
                "path": "categories.pb"
              },
              "aggs": {
                "metrics": {
                  "terms": {
                    "field": "categories.pb.type.keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Mappings:

{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      },
      "categories": {
        "properties": {
          "pb": {
            "type": "nested",
            "properties": {
              "metric": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "type": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      },
      "controlCode": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
Question from: https://stackoverflow.com/questions/66059413/elasticsearch-aggregation-by-max-date-giving-wrong-results


1 Answer

I think you can fulfil your requirement with the top_hits aggregation. From the official documentation:

This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.

curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "type",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "date", "price" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}
'

In this example from the documentation, the sales are grouped by type, and for each type the last sale is shown. For each sale, only the date and price fields are included in the source.
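
Adapted to the index in the question, the same pattern might look like the sketch below, written as a Python dict that serializes to the JSON request body. The helper name `latest_per_sys_code_body`, the aggregation name `latest_docs`, and the `_source` field list are assumptions for illustration. It also assumes `sysCode` is aggregatable as written in the question; the posted mapping does not show that field, and a dynamically mapped string would likely need `sysCode.keyword` instead. In the question's query, `max_as_of_date` is a sibling of `employees`, so it reports the maximum date but does not filter the documents that `employees` sees. Note too that top_hits returns documents and does not accept sub-aggregations, so this shows the newest document per sysCode rather than a further employeeId/type breakdown.

```python
import json


def latest_per_sys_code_body(size_per_bucket=1):
    """Build a search body returning the newest document(s) per sysCode bucket."""
    return {
        "size": 0,  # suppress top-level hits; only aggregations are needed
        "aggs": {
            "result_by_sys_code": {
                # Assumption: "sysCode" is a keyword-like field, as in the question.
                "terms": {"field": "sysCode"},
                "aggs": {
                    "latest_docs": {
                        "top_hits": {
                            # Newest first, so the first hit is the latest document.
                            "sort": [{"date": {"order": "desc"}}],
                            "_source": {
                                "includes": ["employeeId", "date", "categories"]
                            },
                            "size": size_per_bucket,
                        }
                    }
                },
            }
        },
    }


body = latest_per_sys_code_body()
print(json.dumps(body, indent=2))
```

With a size of 1, only one of the two IND documents dated 2014-06-16 comes back. Raising the size returns the N newest documents per bucket, which can include documents from earlier dates, so the caller would still need to filter on the first hit's date.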

The sampler aggregation does something similar, but in a different way. The documentation describes it as:

A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.

See "Limit ElasticSearch aggregation to top n query results" for a TL;DR on the sampler aggregation.

