python - How do I test whether an nltk resource is already installed on the machine running my code?
I just started my first NLTK project and am confused about the proper setup. I need several resources, such as the Punkt tokenizer and the maxent POS tagger. I downloaded them myself using the GUI opened by nltk.download(). For my collaborators, of course, I want these things to be downloaded automatically. I haven't found any idiomatic code for that in the documentation.
Am I supposed to put nltk.data.load('tokenizers/punkt/english.pickle') into my code? Is this going to download the resources every time the script is run? Will it provide feedback to the user (i.e. my co-developers) about what is being downloaded and why it is taking so long? There must be some gear out there that does the job, right? :)
//edit To clarify my question:
How do I test whether an NLTK resource (like the Punkt tokenizer) is already installed on the machine running my code, and install it if it is not?
You can use the nltk.data.find() function, see https://github.com/nltk/nltk/blob/develop/nltk/data.py:
    >>> import nltk
    >>> nltk.data.find('tokenizers/punkt.zip')
    ZipFilePathPointer(u'/home/alvas/nltk_data/tokenizers/punkt.zip', u'')
When the resource is not available, you'll get an error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/nltk-3.0a3-py2.7.egg/nltk/data.py", line 615, in find
        raise LookupError(resource_not_found)
    LookupError:
    **********************************************************************
      Resource u'punkt.zip' not found.  Please use the NLTK Downloader to
      obtain the resource:  >>> nltk.download()
      Searched in:
        - '/home/alvas/nltk_data'
        - '/usr/share/nltk_data'
        - '/usr/local/share/nltk_data'
        - '/usr/lib/nltk_data'
        - '/usr/local/lib/nltk_data'
    **********************************************************************
Most probably, you would like to do something like this to ensure that your collaborators have the package:
    >>> try:
    ...     nltk.data.find('tokenizers/punkt')
    ... except LookupError:
    ...     nltk.download('punkt')
    ...
    [nltk_data] Downloading package punkt to /home/alvas/nltk_data...
    [nltk_data]   Package punkt is already up-to-date!
    True
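For a script rather than an interactive session, the same check-then-download pattern could be wrapped in a small helper. The following is only a sketch: the name ensure_resource and its find/download parameters are hypothetical and not part of NLTK. They default to nltk.data.find and nltk.download, and exist only so the helper can be exercised without network access. Since nltk.data.find only searches the local data directories, the download should happen only when the resource is actually missing, and the printed message tells your co-developers what is being fetched.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download an NLTK package only if its resource is not found locally.

    resource_path: path as understood by nltk.data.find, e.g. 'tokenizers/punkt'
    package_id:    identifier passed to nltk.download, e.g. 'punkt'
    find/download: hypothetical hooks (not NLTK API); default to
                   nltk.data.find and nltk.download.

    Returns True if a download was triggered, False if the resource
    was already present.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the hooks can replace it in tests
        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # Only looks through the local NLTK data directories.
        find(resource_path)
        return False
    except LookupError:
        # Tell the user why the script is pausing here.
        print("NLTK resource %r not found, downloading package %r ..."
              % (resource_path, package_id))
        download(package_id)
        return True
```

Calling ensure_resource('tokenizers/punkt', 'punkt') once at the top of the script, before the tokenizer is loaded, would then cover the first run on a fresh machine. The other resources you need would take their own path and package id in the same way.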