tokenize - Elasticsearch Facet Tokenization
I am using a terms facet to get the top terms in my Elasticsearch server. Tags like "indian-government" are not treated as one tag; they are split into "indian" and "government", so the most-used tag ends up being "indian". How can I fix this? Should I change the tokenization?
    'settings': {
        'analysis': {
            'analyzer': {
                'my_ngram_analyzer': {
                    'tokenizer': 'my_ngram_tokenizer',
                    'filter': ['my_synonym_filter']
                }
            },
            'filter': {
                'my_synonym_filter': {
                    'type': 'synonym',
                    'format': 'wordnet',
                    'synonyms_path': 'analysis/wn_s.pl'
                }
            },
            'tokenizer': {
                'my_ngram_tokenizer': {
                    'type': 'ngram',
                    'min_gram': '1',
                    'max_gram': '50'
                }
            }
        }
    }
Edit: based on the comments, my indexing code follows. The results do not change, though:
    es.indices.create(
        index="article-index",
        body={
            'settings': {
                'analysis': {
                    'analyzer': {
                        'my_ngram_analyzer': {
                            'tokenizer': 'my_ngram_tokenizer',
                            'filter': ['my_synonym_filter']
                        }
                    },
                    'filter': {
                        'my_synonym_filter': {
                            'type': 'synonym',
                            'format': 'wordnet',
                            'synonyms_path': 'analysis/wn_s.pl'
                        }
                    },
                    'tokenizer': {
                        'my_ngram_tokenizer': {
                            'type': 'ngram',
                            'min_gram': '1',
                            'max_gram': '50'
                        }
                    }
                }
            },
            'mappings': {
                'my_mapping_type': {
                    '_all': {'enabled': False},
                    '_source': {'compressed': True},
                    'properties': {
                        'tags': {'type': 'string', 'index': 'not_analyzed'}
                    }
                }
            }
        },
        # ignore the error if the index already exists
        ignore=400
    )
Edit: solved. Replacing my_mapping_type with the doc_type I actually index with (in my case, 'article') makes it work :)
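To make the fix concrete, here is a minimal sketch of the corrected index body. The key point is that the mapping type name must equal the doc_type used when indexing documents ('article' here, taken from the edit above); the analysis settings are trimmed for brevity:

```python
def build_index_body():
    # The mapping type key ('article') must match the doc_type passed to
    # es.index(...) / es.create(...), otherwise the mapping is never applied
    # and the tags field falls back to the default (analyzed) behavior.
    return {
        'mappings': {
            'article': {
                '_all': {'enabled': False},
                'properties': {
                    # not_analyzed keeps "indian-government" as one term
                    'tags': {'type': 'string', 'index': 'not_analyzed'}
                }
            }
        }
    }

body = build_index_body()
```

The body would then be passed as before: es.indices.create(index="article-index", body=build_index_body(), ignore=400).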
Making the field not_analyzed should work, if it fits your requirements.
    curl -XPUT localhost:9200/index -d '{
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 2
      },
      "mappings": {
        "my_type": {
          "_all": {"enabled": false},
          "_source": {"compressed": true},
          "properties": {
            "tag": {"type": "string", "index": "not_analyzed"}
          }
        }
      }
    }'
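To see why not_analyzed matters for the facet counts, here is a rough Python simulation (not Elasticsearch's actual analyzer, and the tag values are made up): an analyzed string field is split into word-level terms, while a not_analyzed field contributes the whole value as a single term.

```python
import re
from collections import Counter

tags = ["indian-government", "indian-government", "sports"]

# Analyzed (roughly what a standard analyzer does): lowercase and split
# on non-alphanumeric characters, so "indian-government" becomes two terms.
analyzed_terms = Counter(
    term for tag in tags for term in re.findall(r"[a-z0-9]+", tag.lower())
)

# not_analyzed: each tag value is indexed verbatim as one term.
not_analyzed_terms = Counter(tags)

print(analyzed_terms.most_common(1))      # top term is a fragment like "indian"
print(not_analyzed_terms.most_common(1))  # top term is the full tag
```

With the analyzed field, the terms facet counts "indian" and "government" separately; with not_analyzed, "indian-government" is counted as one tag, which is the behavior the question asks for.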