Aggregation on term is case insensitive (returns term in lowercase)

Cwe · December 15, 2016, 12:34pm

Enonic version: 6.8.0
OS: Linux

Which makes it kinda hard to filter by query on exact match…
Let's say country is stored as “Norway”.

Then query needs to be ‘data.country.name = “Norway”’
But aggregation country returns ‘norway’

So if I use the aggregation to build filter dropdowns on the website, I must somehow know that ‘norway’ means ‘Norway’.

I guess I could work around it by doing ‘data.country.name like “norway”’

But I also localize “Norway” to “Norge”. (using my own js structure built from a phrase contenttype)
And again ‘norway’ !== ‘Norway’;

I guess as long as people are involved I should suspect that any content field has inconsistencies in case.
In this instance data.country.name is populated via the content api.
So I could store values in lowercase.
I could also add a config regexp to the TextLine input which only accepts lowercase chars.

There might be future usecases where case sensitivity matters.

It should at least be documented that aggregations are case insensitive and return the lowercase version of whatever was stored.

But really should it not be possible to do case sensitive aggregations?

http://xp.readthedocs.io/en/stable/developer/search/aggregations/terms.html

tsi · December 15, 2016, 12:51pm

All indexing and query operations in XP are lowercased, both at storage and at query time. In your search example, ‘data.country.name = “norway”’ and ‘data.country.name = “Norway”’ will produce the same result.

Hence, the aggregations will always return as lowercase.

Cwe · December 15, 2016, 1:02pm

So say I want to display a combobox on a web page with the different values which are stored in XP.
How can I then display a human friendly version of the lowercase aggregation bucket key?

I could upcase the first letter of the key. (might be wrong in some languages)
I could upcase the first letter of every word in the key. (might be wrong in some languages)
I could store a “translation” for every lowercase value, of which there might be thousands. (too much work)
I could fetch all content, and process it in js to figure out the human friendly strings. (slow)
I could make another content type, and aggregate on id, and fetch the human friendly version (more code when storing, higher maintenance)

Any other ideas?
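
One more option, sketched here under assumptions: if some of the stored values are already at hand (for example from the query hits that come back together with the aggregation), the lowercase bucket keys can be mapped back to a stored spelling in plain JavaScript. `labelBuckets` and the sample data are hypothetical and not part of the XP API, and the bucket shape (`key`, `docCount`) is assumed.

```javascript
// Hypothetical helper: attach a display label to each lowercase bucket key.
// `storedValues` holds values in their stored case, e.g. taken from query hits.
function labelBuckets(buckets, storedValues) {
    const labelByKey = new Map();
    storedValues.forEach((value) => {
        const key = String(value).toLowerCase();
        // Keep the first spelling seen for each lowercase key.
        if (!labelByKey.has(key)) {
            labelByKey.set(key, value);
        }
    });
    return buckets.map((bucket) => ({
        key: bucket.key,
        docCount: bucket.docCount,
        // Fall back to the raw key when no stored spelling is known.
        label: labelByKey.has(bucket.key) ? labelByKey.get(bucket.key) : bucket.key
    }));
}

// Made-up sample data for illustration.
const buckets = [{ key: 'norway', docCount: 3 }, { key: 'sweden', docCount: 1 }];
const labelled = labelBuckets(buckets, ['Norway', 'Norway', 'Sweden']);
```

This still needs at least one hit per bucket, and `toLowerCase()` may not match the index's lowercasing for every language, so treat it as a stopgap.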

Cwe · December 15, 2016, 1:03pm

This would be very easy with ‘GROUP BY’ in SQL.

Cwe · December 15, 2016, 1:10pm

Or a ‘SELECT DISTINCT’

I guess it depends on which sql server we are talking about.
I’m most familiar with PostgreSQL.

Cwe · December 15, 2016, 1:14pm

So the only viable solution I see is the last idea.

But that will be a lot of hassle and code every time you want to aggregate something.

rmy · December 15, 2016, 1:30pm

Well, do you want “Norway” and “norway” as different entities when aggregating on user defined tags?

It's a tradeoff between ensuring that you don't need to know the case when searching/aggregating and this issue. We have discussed the possibility of also storing a raw value, but it's not obvious how to solve every scenario.

Cwe · December 15, 2016, 2:15pm

I do understand that it is a tradeoff.

I guess I want it both ways

How about another argument to the term aggregation to choose case sensitivity? With the default being lowercase.

A “benefit” to case sensitivity: It would make it easier to spot data inconsistencies.
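
Even without a case sensitive aggregation, that check can be approximated in plain JavaScript once the stored values have been fetched. A minimal sketch; `findCaseVariants` and the sample values are made up for illustration.

```javascript
// Hypothetical helper: group stored values by their lowercase form and
// report every key that occurs with more than one spelling.
function findCaseVariants(values) {
    const spellingsByKey = new Map();
    values.forEach((value) => {
        const key = value.toLowerCase();
        if (!spellingsByKey.has(key)) {
            spellingsByKey.set(key, new Set());
        }
        spellingsByKey.get(key).add(value);
    });
    const report = {};
    spellingsByKey.forEach((spellings, key) => {
        // Only keys with inconsistent casing end up in the report.
        if (spellings.size > 1) {
            report[key] = Array.from(spellings);
        }
    });
    return report;
}

// Made-up sample data: "Norway" is stored with three different spellings.
const inconsistent = findCaseVariants(['Norway', 'norway', 'Sweden', 'NORWAY', 'Sweden']);
```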

Anyways, I guess I’ll have to start implementing the relation for now.

Cwe · December 15, 2016, 3:29pm

Consider that if the data is consistent, then case sensitive filtering would work perfectly, and look good to the end user too.

tsi · December 15, 2016, 4:36pm

Are we talking about an unlimited number of items to be selected in your combobox? Where are these values coming from, who creates them?

mla · December 15, 2016, 7:54pm

Every time I wanted to use aggregations for something, I couldn’t because it all comes out lower case. For example, if you want to make a tag cloud for Java classes then it should look like this:

HistogramAggregationQuery AdminToolDescriptor ContentTypeName FindNodesByQueryResult InputTypeNotFoundException ReorderChildNodesResult

But if you use aggregations then your tag cloud will come out like this:

histogramaggregationquery admintooldescriptor contenttypename findnodesbyqueryresult inputtypenotfoundexception reorderchildnodesresult

Cwe · December 19, 2016, 8:27am

The values are the aggregations from content in XP.
So not an unlimited number of items, but the number varies depending on the synced data.

The reason I sync the data instead of using the external API directly is I want a common interface to sort and filter. The part will be used for more than that API. It’s already used for manually added content in XP.

an2n · June 27, 2022, 1:44pm

I am bumping this thread again, could it be possible to return raw data with an option?

Alan · September 19, 2022, 8:42am

Hi @an2n,

This somehow got overlooked, sorry.

What do you mean by “raw data”, what is your use case?

an2n · September 20, 2022, 6:46am

Hey @Alan,

I had a support ticket on this and it was marked as not doable at the moment. By raw I meant the data as it was stored, whereas the aggregation returns it in lowercase.

luctho · September 20, 2022, 8:34am

Funny, I wanted to ask the same question as I currently also have this issue

My use case is the following:
I am working on a headless application in React using GraphQL.
My content contains a tag property, so I am working on a TagInput component.
In Content Studio I get the suggestions for tags as entered before (without lowercase!).
I also use a term aggregation to fetch the tags.

How did you implement this in Content Studio? (Or how do you fetch the data from the repo?)

luctho · September 22, 2022, 2:43pm

I found a way by extending my GraphQL API and would like to share this:

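const graphQlLib = require('/lib/graphql');
const contentLib = require('/lib/xp/content');

const options = {
    creationCallbacks: {
        // Adds a getDistinctTerm field to the HeadlessCms type.
        HeadlessCms: function (context, params) {
            params.fields.getDistinctTerm = {
                type: graphQlLib.list(graphQlLib.GraphQLString),
                args: {
                    dataField: graphQlLib.GraphQLString,
                    contentType: graphQlLib.GraphQLString,
                },
                resolve: function (env) {
                    // contentLib.query expects "filters" and "contentTypes" (an array).
                    // dataField is relative to "data", matching hit.data[...] below.
                    // Note: only 10 hits are returned unless "count" is set.
                    const queryParams = {
                        query: "type LIKE '*'",
                        filters: {exists: {field: 'data.' + env.args.dataField}}
                    };
                    if (env.args.contentType) {
                        queryParams.contentTypes = [env.args.contentType];
                    }

                    const result = contentLib.query(queryParams);
                    const hits = result.hits;

                    // Collect the stored values (original case) from every hit.
                    const terms = [];
                    hits.forEach(function (hit) {
                        const certs = hit.data[env.args.dataField];
                        if (Array.isArray(certs)) {
                            certs.forEach((cert) => {
                                terms.push(cert);
                            });
                        } else if (certs !== undefined && certs !== null) {
                            terms.push(certs);
                        }
                    });

                    // Drop duplicates, then sort.
                    return terms.filter((v, i, a) => a.indexOf(v) === i).sort();
                }
            };
        }
    }
};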

Example of how to use it (here in graphql-studio):

query( $field: String, $type: String) {
  guillotine {
    getDistinctTerm(dataField:$field, contentType: $type)
  }
}

The response is an array of unique entries in alphabetical order!
Any feedback is welcome!
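
One possible refinement, offered as a sketch rather than a tested change to the resolver above: the default `.sort()` compares UTF-16 code units, so capitalised tags sort before all lowercase ones, and the `indexOf` filter gets slow on long lists. A `Set` plus `localeCompare` avoids both. `uniqueSorted` is a hypothetical helper name and the sample tags are made up.

```javascript
// Hypothetical helper: remove duplicates with a Set, then sort with
// localeCompare so that "beta" sorts before "Gamma".
function uniqueSorted(terms) {
    return Array.from(new Set(terms)).sort((a, b) => a.localeCompare(b));
}

// Made-up sample tags for illustration.
const tags = uniqueSorted(['beta', 'Alpha', 'beta', 'Gamma']);
// tags is ['Alpha', 'beta', 'Gamma']; a plain .sort() would give ['Alpha', 'Gamma', 'beta']
```

The last line of the resolver could then return `uniqueSorted(terms)`, assuming the runtime supports `Set` and `localeCompare`.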
