Text search for matches in _path

I’m implementing a global text content search for a site. Truly loving being lifted by the elastic engine!

At the moment I have this query (somewhat simplified for the sake of the example):

ngram('displayName^5, _allText', 'my search query', 'AND')
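Here is a rough mental model of what that query asks for, as a TypeScript sketch. It treats ngram(..., 'AND') as "every query word must be the beginning of some word in the text". That prefix-matching behaviour is an assumption about the ngram analyzer, not Enonic's implementation. The sketch ignores scoring entirely, including the ^5 boost that weights displayName hits more heavily, and it only handles ASCII letters and digits.

```typescript
// Hypothetical model of ngram(fields, query, 'AND'): NOT Enonic's implementation.
// Assumes ngram behaves like a per-word prefix (edge n-gram) match.

// Lowercase and split on anything that is not an ASCII letter or digit.
function words(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

// 'AND' operator: every query word must be a prefix of at least one text word.
function ngramAndMatch(text: string, query: string): boolean {
  const textWords = words(text);
  return words(query).every((q) => textWords.some((w) => w.startsWith(q)));
}
```

Under this model, "micro team" matches "Microsoft Teams guide", but "soft" does not match "Microsoft", because it is not the start of a word.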

This produces fairly decent results ordered by score, but I would really like the search to 1) also include _path in the fields it searches and 2) boost the score of those hits.

I’ve played a bit with the pathMatch function, but I don’t seem to get a hit for e.g. “microsoft” if the path is /content/whatever/microsoft/morepath. I also don’t quite see how I could combine a pathMatch with an ngram or fulltext search in the same query.

Does anyone have any good pointers here?

(It would also be nice to boost the score of hits in “rare” content types, but that would probably be somewhat complex.)

Hi. pathMatch is most useful for scoring documents so that the closest-matching path gets the highest score. Partially matching paths are also matched, but they score lower the fewer path elements they share with the query path.
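A toy TypeScript model of that scoring helps explain why searching for "microsoft" finds nothing: that element only appears in the middle of the path. The model assumes, without confirmation from the reply above, that path elements are compared from the root down. The function names are hypothetical, and this is not Enonic's implementation.

```typescript
// Hypothetical model of pathMatch-style scoring: NOT Enonic's implementation.
// Assumes path elements are compared from the root down.

function splitPath(path: string): string[] {
  return path.split("/").filter((segment) => segment.length > 0);
}

// Count how many leading elements the query path and document path share.
function sharedLeadingElements(queryPath: string, docPath: string): number {
  const q = splitPath(queryPath);
  const d = splitPath(docPath);
  let count = 0;
  while (count < q.length && count < d.length && q[count] === d[count]) {
    count++;
  }
  return count;
}

// Score = fraction of the query path matched from the root (0 means no match).
function pathMatchScore(queryPath: string, docPath: string): number {
  const q = splitPath(queryPath);
  if (q.length === 0) return 0;
  return sharedLeadingElements(queryPath, docPath) / q.length;
}
```

Under this model, pathMatchScore("/microsoft", "/content/whatever/microsoft/morepath") is 0. A query path that shares two of its three leading elements with the document path scores 2/3.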

Sadly, I don't think this is what you are looking for. I also see that we have a weakness when it comes to efficiently searching for an arbitrary path element within a path. I'll add a task for this.

The only way to achieve this at the moment is to use the dreaded like '*stuff*' expression, but that will probably be OK for now if you don't have a lot of content.
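The wildcard, by contrast, matches anywhere in the value. The sketch below models like '*microsoft*' on the client by turning the pattern into an anchored regular expression. It only handles the * wildcard and ignores any case normalization the real index may apply. It also hints at why the expression is "dreaded". In Lucene-based engines such as Elasticsearch, a pattern that starts with a wildcard cannot narrow the search by prefix, so the engine has to check many more indexed terms for the field.

```typescript
// Hypothetical client-side model of a `like` pattern: '*' matches any run of characters.
function likeToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters in the literal parts, then join them with '.*'.
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  // Anchor at both ends: the whole value has to match the pattern.
  return new RegExp(`^${body}$`);
}

function matchesLike(value: string, pattern: string): boolean {
  return likeToRegExp(pattern).test(value);
}
```

For example, matchesLike("/content/whatever/microsoft/morepath", "*microsoft*") is true, which is exactly the case pathMatch missed.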

Matching several boolean expressions will increase the score, so adding multiple expressions will help you:

ngram('displayName^5, _allText', 'my search query', 'AND') OR 
( ngram('displayName^5, _allText', 'my search query', 'AND') AND _path like '*microsoft*')
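To see why the "doubled" query ranks path hits higher, here is a simplified TypeScript model in which an OR adds up the scores of the clauses a document matches. This is roughly how Lucene-based engines score a disjunction. The fixed score of 1 for the path clause and the textScore values are made-up assumptions for illustration.

```typescript
// Simplified scoring model, not Enonic's actual scoring.
interface Doc {
  path: string;
  textScore: number; // hypothetical relevance from the ngram clause; 0 = no text match
}

// A clause returns its score for a document, or 0 when it does not match.
type Clause = (doc: Doc) => number;

const PATH_SCORE = 1; // assumed constant contribution of the wildcard clause

// ngram('displayName^5, _allText', 'my search query', 'AND')
const textMatch: Clause = (doc) => doc.textScore;

// ( ngram(...) AND _path like '*microsoft*' ): both parts must match, scores add up.
const textAndPath: Clause = (doc) =>
  doc.textScore > 0 && doc.path.includes("microsoft") ? doc.textScore + PATH_SCORE : 0;

// OR: sum the scores of every clause the document matches.
function orScore(doc: Doc, clauses: Clause[]): number {
  return clauses.reduce((sum, clause) => sum + clause(doc), 0);
}
```

A document with text score 2 under /content/whatever/microsoft/... gets 2 + (2 + 1) = 5, while one elsewhere keeps 2. Path hits therefore float to the top without excluding anything the plain ngram clause already found.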

You could also add expressions that boost certain content types to further increase the score, e.g.:

type IN ('rareContentType', 'rareContentType2')
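One detail is worth keeping in mind. OR'ing a bare type IN (...) onto the query would also return every document of those types, whether or not it matches the search text. The hypothetical helpers below pair each boost condition with the text match, following the same pattern as the _path example. The content-type names are placeholders.

```typescript
// Hypothetical helpers for composing NoQL boost clauses as plain strings.

// Each boost is AND'ed with the text match, so it only raises the score of
// documents that already match the text instead of adding unrelated hits.
function withBoosts(textMatch: string, boosts: string[]): string {
  const clauses = [textMatch, ...boosts.map((b) => `(${textMatch} AND ${b})`)];
  return clauses.join(" OR ");
}

// Content-type names are assumed to be developer-supplied, not user input.
function typeIn(types: string[]): string {
  return `type IN (${types.map((t) => `'${t}'`).join(", ")})`;
}
```

For example, withBoosts(text, ["_path like '*microsoft*'", typeIn(["rareContentType", "rareContentType2"])]) yields the text clause, the path branch, and a content-type branch, all OR'ed together.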

You could also have a look at terms aggregations to help categorize the results further, e.g. by content type.
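To illustrate what a terms aggregation on type gives you, here is a client-side TypeScript model that groups hits into count buckets. In a real query the search engine computes the aggregation over all hits, not just one page. In Enonic's lib-content it would go in the aggregations parameter of query(). The { key, docCount } bucket shape is modeled on that response, but check both against the docs. The hypothetical rarestTypes helper shows one possible way to find the "rare" content types mentioned in the question.

```typescript
// Client-side model of a terms aggregation; the engine would normally do this.
interface Hit {
  type: string;
  displayName: string;
}

interface Bucket {
  key: string;
  docCount: number;
}

// Count hits per content type, most frequent first (ties broken by name).
function termsByType(hits: Hit[]): Bucket[] {
  const counts = new Map<string, number>();
  for (const hit of hits) {
    counts.set(hit.type, (counts.get(hit.type) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([key, docCount]) => ({ key, docCount }))
    .sort((a, b) => b.docCount - a.docCount || a.key.localeCompare(b.key));
}

// Hypothetical: treat types with at most `maxCount` hits as "rare".
function rarestTypes(buckets: Bucket[], maxCount: number): string[] {
  return buckets.filter((b) => b.docCount <= maxCount).map((b) => b.key);
}
```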


Excellent – I didn’t think of “doubling” the query like this, OR’ing together several complete hit options. Playing with it some more, I ended up with your solution, only adding the option of hits on _path alone.

ngram('displayName^5, _allText', 'my search query', 'AND')
OR ( ngram('displayName^5, _allText', 'my search query', 'AND') AND _path like '*my search query*' )
OR _path like '*my search query*'
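A helper along these lines may be useful for building this query from user input. It is a sketch with two assumptions. Quote characters are stripped rather than escaped, because it does not assume any escape syntax for NoQL string literals. Spaces in the path term are also replaced with hyphens, on the assumption that content names are generated that way, so a multi-word search can still hit a path. buildSiteSearchQuery is a hypothetical name.

```typescript
// Hypothetical builder for the three-clause query above.
function buildSiteSearchQuery(rawTerm: string, fields = "displayName^5, _allText"): string {
  // Strip quotes instead of escaping them (no NoQL escape syntax assumed).
  const term = rawTerm.replace(/['"]/g, "").trim();
  // Assumption: content names use hyphens where display names have spaces.
  const pathTerm = term.replace(/\s+/g, "-");

  const text = `ngram('${fields}', '${term}', 'AND')`;
  const path = `_path like '*${pathTerm}*'`;

  // Text hit OR (text hit AND path hit) OR path-only hit.
  return [text, `(${text} AND ${path})`, path].join(" OR ");
}
```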

With only a thousand or so content items, I see no noticeable performance hit in the search.

Thanks!

I hope to also expose the query language as JSON in the future, which would make boolean expressions cleaner and easier to use, with expressions like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "fulltext": {
            "field": "someField",
            "query": "my query string"
          }
        }
      ],
      "should": [
        {
          "like": "*microsoft*"
        },
        {
          "in": [
            "fisk",
            "ost"
          ]
        }
      ]
    }
  }
}
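The TypeScript below sketches how such a JSON shape could map back onto today's string syntax. It compiles a bool query by AND'ing the must clauses and turning each should clause into the "doubled" OR branch used earlier in the thread. The types are hypothetical. They assume each like and in clause names the field it applies to (here _path and type). Neither the types nor compileBool are an Enonic API.

```typescript
// Hypothetical types for the proposed JSON query shape.
type Expr =
  | { fulltext: { field: string; query: string } }
  | { like: { field: string; value: string } }
  | { in: { field: string; values: string[] } };

interface BoolQuery {
  must: Expr[];
  should: Expr[];
}

// Compile a single expression to NoQL string syntax.
function compileExpr(expr: Expr): string {
  if ("fulltext" in expr) {
    return `fulltext('${expr.fulltext.field}', '${expr.fulltext.query}', 'AND')`;
  }
  if ("like" in expr) {
    return `${expr.like.field} like '${expr.like.value}'`;
  }
  const values = expr.in.values.map((v) => `'${v}'`).join(", ");
  return `${expr.in.field} IN (${values})`;
}

// must -> AND; each should clause -> an extra OR'ed copy of the must part.
function compileBool(query: BoolQuery): string {
  if (query.must.length === 0) {
    throw new Error("this sketch requires at least one must clause");
  }
  const joined = query.must.map(compileExpr).join(" AND ");
  const must = query.must.length > 1 ? `(${joined})` : joined;
  const boosts = query.should.map((s) => `(${must} AND ${compileExpr(s)})`);
  return [must, ...boosts].join(" OR ");
}
```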