tokenize - Elasticsearch Facet Tokenization
I am using a terms facet to get the top terms in my Elasticsearch server. Tags like "indian-government" are not treated as one tag; they are split into "indian" and "government", so the most-used tag ends up being "indian". How can I fix this? Should I change the tokenization?
    'settings': {
        'analysis': {
            'analyzer': {
                'my_ngram_analyzer': {
                    'tokenizer': 'my_ngram_tokenizer',
                    'filter': ['my_synonym_filter']
                }
            },
            'filter': {
                'my_synonym_filter': {
                    'type': 'synonym',
                    'format': 'wordnet',
                    'synonyms_path': 'analysis/wn_s.pl'
                }
            },
            'tokenizer': {
                'my_ngram_tokenizer': {
                    'type': 'ngram',
                    'min_gram': '1',
                    'max_gram': '50'
                }
            }
        }
    }
Edit: based on the comments, my indexing code follows. The results do not change, though:
    es.indices.create(
        index="article-index",
        body={
            'settings': {
                'analysis': {
                    'analyzer': {
                        'my_ngram_analyzer': {
                            'tokenizer': 'my_ngram_tokenizer',
                            'filter': ['my_synonym_filter']
                        }
                    },
                    'filter': {
                        'my_synonym_filter': {
                            'type': 'synonym',
                            'format': 'wordnet',
                            'synonyms_path': 'analysis/wn_s.pl'
                        }
                    },
                    'tokenizer': {
                        'my_ngram_tokenizer': {
                            'type': 'ngram',
                            'min_gram': '1',
                            'max_gram': '50'
                        }
                    }
                }
            },
            'mappings': {
                'my_mapping_type': {
                    '_all': {'enabled': False},
                    '_source': {'compressed': True},
                    'properties': {
                        'tags': {'type': 'string', 'index': 'not_analyzed'}
                    }
                }
            }
        },
        # ignore the error if the index already exists
        ignore=400
    )
Edit: solved. Replacing my_mapping_type with the doc_type I actually index with (in my case, 'article') makes it work :)
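To make the fix concrete, here is a minimal sketch of the corrected index body. The key point is that the mapping type name must equal the doc_type used when indexing documents ('article' here, taken from the edit above); the analysis settings are trimmed for brevity:

```python
def build_index_body():
    # The mapping type key ('article') must match the doc_type passed to
    # es.index(...) / es.create(...), otherwise the mapping is never applied
    # and the tags field falls back to the default (analyzed) behavior.
    return {
        'mappings': {
            'article': {
                '_all': {'enabled': False},
                'properties': {
                    # not_analyzed keeps "indian-government" as one term
                    'tags': {'type': 'string', 'index': 'not_analyzed'}
                }
            }
        }
    }

body = build_index_body()
```

The body would then be passed as before: es.indices.create(index="article-index", body=build_index_body(), ignore=400).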
Making the field not_analyzed should work, if it fits your requirements.
    curl -XPUT localhost:9200/index -d '{
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 2
      },
      "mappings": {
        "my_type": {
          "_all": {"enabled": false},
          "_source": {"compressed": true},
          "properties": {
            "tag": {"type": "string", "index": "not_analyzed"}
          }
        }
      }
    }'
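To see why not_analyzed matters for the facet counts, here is a rough Python simulation (not Elasticsearch's actual analyzer, and the tag values are made up): an analyzed string field is split into word-level terms, while a not_analyzed field contributes the whole value as a single term.

```python
import re
from collections import Counter

tags = ["indian-government", "indian-government", "sports"]

# Analyzed (roughly what a standard analyzer does): lowercase and split
# on non-alphanumeric characters, so "indian-government" becomes two terms.
analyzed_terms = Counter(
    term for tag in tags for term in re.findall(r"[a-z0-9]+", tag.lower())
)

# not_analyzed: each tag value is indexed verbatim as one term.
not_analyzed_terms = Counter(tags)

print(analyzed_terms.most_common(1))      # top term is a fragment like "indian"
print(not_analyzed_terms.most_common(1))  # top term is the full tag
```

With the analyzed field, the terms facet counts "indian" and "government" separately; with not_analyzed, "indian-government" is counted as one tag, which is the behavior the question asks for.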