Aggregation on term is case insensitive (returns term in lowercase)

Enonic version: 6.8.0
OS: Linux

Which makes it kind of hard to filter by query on an exact match…
Let's say country is stored as “Norway”.

Then query needs to be ‘data.country.name = “Norway”’
But aggregation country returns ‘norway’

So if I use the aggregation to build filter dropdowns on the website, I must somehow know that ‘norway’ means ‘Norway’.

I guess I could work around it by doing ‘data.country.name like “norway”’

But I also localize “Norway” to “Norge” (using my own JS structure built from a phrase content type).
And again ‘norway’ !== ‘Norway’;
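One way to make such a localization lookup independent of case is to normalize the phrase keys to lowercase once. A minimal sketch, assuming a hypothetical `phrases` object (the real phrase structure is not shown here) and hypothetical helper names `buildLookup` and `localize`:

```javascript
// Hypothetical phrase data: stored display value -> localized text.
const phrases = { Norway: 'Norge', Sweden: 'Sverige' };

// Build a lookup keyed by the lowercased phrase key.
function buildLookup(source) {
    const lookup = Object.create(null);
    Object.keys(source).forEach(function (key) {
        lookup[key.toLowerCase()] = source[key];
    });
    return lookup;
}

// Resolve a key regardless of its casing; fall back to the key itself.
function localize(lookup, key) {
    const hit = lookup[String(key).toLowerCase()];
    return hit === undefined ? key : hit;
}

const lookup = buildLookup(phrases);
const fromBucket = localize(lookup, 'norway'); // lowercase bucket key
const fromStored = localize(lookup, 'Norway'); // stored casing
```

With this, both the lowercase aggregation key and the stored value resolve to the same localized phrase.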

I guess as long as people are involved I should suspect that any content field has inconsistencies in case.
In this instance data.country.name is populated via the content api.
So I could store values in lowercase.
I could also add a config regexp to the TextLine input which only accepts lowercase chars.

There might be future use cases where case sensitivity matters.

It should at least be documented that aggregations are case insensitive and return a lowercase version of whatever was stored.

But really, should it not be possible to do case sensitive aggregations?

http://xp.readthedocs.io/en/stable/developer/search/aggregations/terms.html

All indexing and query operations in XP are lowercased, at both storage and query time. For your search example, ‘data.country.name’ = “norway” and ‘data.country.name’ = “Norway” will produce the same result.

Hence, the aggregations will always return as lowercase.
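To illustrate why the bucket keys come back lowercased, here is a simplified model of a terms aggregation over lowercased values. This is only a sketch under that assumption, not XP's actual implementation; `termsAggregation` is a hypothetical helper, and the `key`/`docCount` shape mirrors the buckets that an aggregation result contains.

```javascript
// Simplified model (an assumption, not XP's real code): values are
// lowercased before they are counted, so the original casing is lost.
function termsAggregation(values) {
    const counts = Object.create(null);
    values.forEach(function (value) {
        const key = String(value).toLowerCase();
        counts[key] = (counts[key] || 0) + 1;
    });
    // One bucket per distinct lowercased term, sorted by key.
    return Object.keys(counts).sort().map(function (key) {
        return { key: key, docCount: counts[key] };
    });
}

const buckets = termsAggregation(['Norway', 'norway', 'Sweden']);
// "Norway" and "norway" end up in the same bucket, keyed 'norway'.
```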

So say I want to display a combobox on a web page with the different values that are stored in XP.
How can I then display a human friendly version of the lowercase aggregation bucket key?

  • I could upcase the first letter of the key. (might be wrong in some languages)
  • I could upcase the first letter of every word in the key. (might be wrong in some languages)
  • I could store a “translation” for every lowercase value, of which there might be thousands. (too much work)
  • I could fetch all content, and process it in js to figure out the human friendly strings. (slow)
  • I could make another content type, and aggregate on id, and fetch the human friendly version (more code when storing, higher maintenance)
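A variation on the "translation" idea that avoids maintaining labels by hand: record the display form at the moment a value is stored, keyed by its lowercase form, and use that map when rendering the buckets. A minimal sketch; `rememberLabel`, `toOptions` and the `labels` structure are hypothetical, and where the map would be persisted is left open.

```javascript
// labels maps a lowercased key to the first display form seen for it.
function rememberLabel(labels, storedValue) {
    const key = storedValue.toLowerCase();
    if (!(key in labels)) {
        labels[key] = storedValue;
    }
    return labels;
}

// Turn aggregation buckets into combobox options with readable labels.
// Falls back to the raw bucket key when no label was recorded.
function toOptions(buckets, labels) {
    return buckets.map(function (bucket) {
        return {
            value: bucket.key,
            label: labels[bucket.key] || bucket.key,
            count: bucket.docCount
        };
    });
}

// Called while syncing values into XP (assumed workflow).
const labels = Object.create(null);
['Norway', 'Sweden', 'norway'].forEach(function (value) {
    rememberLabel(labels, value);
});

const options = toOptions(
    [{ key: 'norway', docCount: 2 }, { key: 'sweden', docCount: 1 }],
    labels
);
```

The option `value` stays lowercase, which is fine for querying since queries are lowercased anyway; only the `label` carries the stored casing.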

Any other ideas?

This would be very easy with ‘GROUP BY’ in SQL.

Or a ‘SELECT DISTINCT’

I guess it depends on which sql server we are talking about.
I’m most familiar with PostgreSQL.

So the only viable solution I see is the last idea.

But that will be a lot of hassle and code every time you want to aggregate something.

Well, do you want “Norway” and “norway” as different entities when aggregating on user defined tags?

It's a tradeoff between this issue and ensuring that you don't need to know the case when searching or aggregating. We have discussed the possibility of also storing a raw value, but it's not obvious how to solve every scenario.

I do understand that it is a tradeoff.

I guess I want it both ways :slight_smile:

How about another argument to the term aggregation to choose case sensitivity? With the default being lowercase.

A “benefit” to case sensitivity: It would make it easier to spot data inconsistencies.

Anyway, I guess I’ll have to start implementing the relation for now.

Consider that if the data is consistent, then case sensitive filtering would work perfectly, and look good to the end user too.

Are we talking about an unlimited number of items to be selected in your combobox? Where are these values coming from, who creates them?

Every time I wanted to use aggregations for something, I couldn’t because it all comes out lower case. For example, if you want to make a tag cloud for Java classes then it should look like this:

HistogramAggregationQuery AdminToolDescriptor ContentTypeName FindNodesByQueryResult InputTypeNotFoundException ReorderChildNodesResult

But if you use aggregations then your tag cloud will come out like this:

histogramaggregationquery admintooldescriptor contenttypename findnodesbyqueryresult inputtypenotfoundexception reorderchildnodesresult

The values are the aggregations from content in XP.
Thus not unlimited number of items, but varying depending on synced data.

The reason I sync the data instead of using the external API directly is I want a common interface to sort and filter. The part will be used for more than that API. It’s already used for manually added content in XP.

I am bumping this thread again. Would it be possible to return raw data with an option? :slight_smile:

Hi @an2n,

This somehow got overlooked, sorry.

What do you mean by “raw data”, what is your use case?

Hey @ase,

I had a support ticket on this and it was marked as not doable at the moment. By “raw” I meant the data as it was stored, whereas the aggregation returns it in lowercase.

Funny, I wanted to ask the same question as I currently also have this issue :slight_smile:

My use case is the following:
I am working on a headless application in React using GraphQL.
My content contains a tag property, so I am working on a TagInput component.
In Content Studio I get suggestions for tags exactly as they were entered before (not lowercased!).
I also use a terms aggregation to fetch the tags.

How did you implement this in Content Studio (or how do you fetch the data from the repo)?

I found a way by extending my GraphQL API and would like to share it:

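// Assumed imports; the original post does not show them.
const contentLib = require('/lib/xp/content');
const graphQlLib = require('/lib/graphql');

const options = {
    creationCallbacks: {
        // Adds a getDistinctTerm field to the HeadlessCms type.
        HeadlessCms: function (context, params) {
            params.fields.getDistinctTerm = {
                type: graphQlLib.list(graphQlLib.GraphQLString),
                args: {
                    dataField: graphQlLib.GraphQLString,
                    contentType: graphQlLib.GraphQLString,
                },
                resolve: function (env) {
                    const queryParams = {
                        query: "type LIKE '*'",
                        // count defaults to 10; -1 is meant to fetch all hits.
                        count: -1,
                        // lib-content expects `filters`; content data is indexed under `data.`.
                        filters: {exists: {field: 'data.' + env.args.dataField}}
                    };
                    if (env.args.contentType) {
                        // lib-content expects `contentTypes` as an array.
                        queryParams.contentTypes = [env.args.contentType];
                    }

                    const hits = contentLib.query(queryParams).hits;

                    const terms = [];
                    hits.forEach(function (hit) {
                        const value = hit.data[env.args.dataField];
                        if (value === undefined || value === null) {
                            return; // skip hits without the field
                        }
                        // A field with several occurrences comes back as an array.
                        [].concat(value).forEach(function (term) {
                            terms.push(term);
                        });
                    });

                    // Keep the first occurrence of each term (stored casing), then sort.
                    return terms.filter((v, i, a) => a.indexOf(v) === i).sort();
                }
            };
        }
    }
};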

Example of how to use it (here in graphql-studio):

query ($field: String, $type: String) {
  guillotine {
    getDistinctTerm(dataField:$field, contentType: $type)
  }
}

The response is an array with unique entries in alphabetical order!
Any feedback is welcome!
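One possible refinement: the resolver reads `hit.data[dataField]`, which only reaches top-level fields, while fields discussed earlier in this thread (such as `data.country.name`) are nested. A minimal sketch of how a dotted path could be resolved instead; `getByPath` and `collectDistinct` are hypothetical helpers, and the `hits` array below is made-up sample data standing in for a query result.

```javascript
// Walk a dotted path such as 'country.name' through a plain object.
// Limitation: does not descend into arrays in the middle of the path.
function getByPath(obj, path) {
    return path.split('.').reduce(function (current, part) {
        return current === undefined || current === null ? undefined : current[part];
    }, obj);
}

// Collect the distinct values found at `path` in each hit's data,
// keeping the stored casing, sorted alphabetically.
function collectDistinct(hits, path) {
    const terms = [];
    hits.forEach(function (hit) {
        const value = getByPath(hit.data, path);
        if (value === undefined || value === null) {
            return; // field missing on this hit
        }
        [].concat(value).forEach(function (term) {
            if (terms.indexOf(term) === -1) {
                terms.push(term);
            }
        });
    });
    return terms.sort();
}

// Made-up sample hits.
const hits = [
    { data: { country: { name: 'Norway' } } },
    { data: { country: { name: ['Sweden', 'Norway'] } } },
    { data: {} }
];
const distinct = collectDistinct(hits, 'country.name');
```

Inside the resolver, `collectDistinct(result.hits, env.args.dataField)` could then replace the manual loop, assuming `dataField` is passed without the `data.` prefix.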