How to read a UTF-8 file char by char in Python


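I have a UTF-8 file and I want to replace the characters that are 2 bytes long with HTML tags.

I wanted to write a Python script for that: read the file char by char, put each one through an if, and so on.

The problem I have is the following: if I read char by char, I am reading 1 byte at a time, but some characters are 1 byte long and others are 2 bytes long.

How can I solve this?

I need a way to read char by char that knows whether the char has a size of 1 or 2 bytes.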

You need to open the file while specifying the correct encoding. In Python 3, that's done using

    with open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
        contents = myfile.read()
        for char in contents:
            # do something with the character
            pass
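Once the file is decoded, each item you iterate over is a whole character, regardless of how many bytes it occupied on disk. If you still need the byte size the question asks about, one possible approach is to re-encode each character and measure it. The helper name `utf8_lengths` below is made up for this illustration, and note that UTF-8 characters can be up to 4 bytes long, not just 1 or 2.

```python
def utf8_lengths(text):
    """Return (character, byte_count) pairs for an already decoded string."""
    return [(char, len(char.encode("utf-8"))) for char in text]


# "a" is ASCII, U+00E9 is e with acute accent, U+20AC is the euro sign.
sample = "a\u00e9\u20ac"
for char, size in utf8_lengths(sample):
    print(repr(char), size)
```

For the sample string this reports 1, 2 and 3 bytes respectively.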

In Python 2, you can use the codecs module:

    import codecs

    with codecs.open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
        contents = myfile.read()
        for char in contents:
            # do something with the character
            pass

Note that in this case, Python 2 will not do automatic newline conversion, so you need to handle \r\n line endings explicitly.
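Handling the line endings explicitly could look roughly like this. It is only a sketch: `normalize_newlines` is a hypothetical helper, and it assumes you want every \r\n (and any stray \r) turned into \n after the contents have been decoded.

```python
def normalize_newlines(text):
    """Convert CR LF pairs and lone CR characters to a single LF."""
    # Replace the two-character sequence first so it is not split in two.
    return text.replace(u"\r\n", u"\n").replace(u"\r", u"\n")


print(repr(normalize_newlines(u"one\r\ntwo\rthree\n")))
```

You would call it on `contents` right after `myfile.read()`.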

As an alternative (Python 2), you can open the file normally and decode it afterwards; that will normalize line endings to \n:

    with open("myfile.txt", "r") as myfile:
        contents = myfile.read().decode("utf-8-sig")
        for char in contents:
            # do something with the character
            pass

Note that in both cases, you'll end up with Unicode objects in Python 2, not strings (in Python 3, all strings are Unicode objects).
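To tie this back to the original goal, here is a minimal Python 3 sketch of the replacement step. It assumes that "HTML tags" means numeric character references such as `&#233;`, and that every non-ASCII character (anything taking 2 or more bytes in UTF-8) should be replaced; `to_html_entities` is a name invented for this example.

```python
def to_html_entities(text):
    """Replace every non-ASCII character with a numeric HTML reference."""
    pieces = []
    for char in text:
        if ord(char) < 128:
            # ASCII characters are 1 byte in UTF-8; keep them unchanged.
            pieces.append(char)
        else:
            # Multi-byte characters become references like &#233;
            pieces.append("&#{};".format(ord(char)))
    return "".join(pieces)


print(to_html_entities("caf\u00e9"))  # caf&#233;
```

If that assumption matches what you need, the built-in `xmlcharrefreplace` error handler gives the same result without a loop: `text.encode("ascii", "xmlcharrefreplace").decode("ascii")`.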

