How to read a UTF-8 file char by char in Python


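I have a UTF-8 file and want to replace the characters that are 2 bytes long with HTML tags.

I wanted to write a Python script for that: read the file char by char, test each one with an if, and so on.

The problem is the following: if I read char by char, I am reading 1 byte at a time, but some characters are 1 byte long and others are 2 bytes long.

How can I solve this?

I need a way to read char by char while knowing whether each char is 1 or 2 bytes in size.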

You need to open the file while specifying the correct encoding. In Python 3, that's done using

    with open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
        contents = myfile.read()
        for char in contents:
            # do something with the character
            pass
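If the file is too large to read in one go, the same idea works incrementally. The following is a minimal sketch, assuming Python 3; `iter_chars` is a hypothetical helper name, and the temporary file only exists to make the example self-contained. In text mode, `read(1)` returns one decoded character, however many bytes it occupies on disk.

```python
import os
import tempfile


def iter_chars(path, encoding="utf-8-sig"):
    """Yield one decoded character at a time (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as f:
        while True:
            char = f.read(1)  # text mode: one character, not one byte
            if not char:      # empty string means end of file
                break
            yield char


# Demo on a throwaway file containing a 1-byte and a 2-byte character.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("a\u00e9")
    demo_path = tmp.name

chars = list(iter_chars(demo_path))
os.remove(demo_path)
print(chars)
```

The file holds three bytes, but the loop sees two characters, because decoding happens before the data reaches your code.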

In Python 2, you can use the codecs module:

    import codecs

    with codecs.open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
        contents = myfile.read()
        for char in contents:
            # do something with the character
            pass

Note that in this case, Python 2 will not do automatic newline conversion, so you need to handle \r\n line endings explicitly.

As an alternative (also Python 2), you can open the file normally and decode it afterwards; that will normalize line endings to \n:

    with open("myfile.txt", "r") as myfile:
        contents = myfile.read().decode("utf-8-sig")
        for char in contents:
            # do something with the character
            pass

Note that in both cases, you will end up with Unicode objects in Python 2, not strings (in Python 3, all strings are Unicode objects).
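Once the text is decoded, the byte size the question asks about can be recovered by re-encoding each character. This is a rough sketch, assuming Python 3 and assuming that the "HTML tags" are numeric character references such as `&#233;`; `utf8_size` and `replace_two_byte_chars` are hypothetical names, not part of any library.

```python
def utf8_size(char):
    """Return how many bytes this character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def replace_two_byte_chars(text):
    """Replace every 2-byte character with a numeric HTML reference.

    Assumption: a reference of the form &#NNN; is the wanted replacement.
    """
    out = []
    for char in text:
        if utf8_size(char) == 2:
            out.append("&#{};".format(ord(char)))
        else:
            out.append(char)  # 1-byte (ASCII) and 3+ byte chars pass through
    return "".join(out)


sample = "caf\u00e9 ok"
result = replace_two_byte_chars(sample)
print(result)
```

Characters that take 3 or 4 bytes in UTF-8 are left untouched here; widen the condition to `utf8_size(char) > 1` if every non-ASCII character should be replaced.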

