How to read a UTF-8 file char by char in Python
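I have a UTF-8 file and want to replace some characters that are 2 bytes long with HTML tags.
I wanted to write a Python script for that: just read the file char by char, put in an if, and so on.
The problem is the following: if I read char by char, I am reading only 1 byte at a time, but some characters are 1 byte long and others are 2 bytes long.
How do I solve this?
I need a way to read char by char and to know whether a char is 1 or 2 bytes in size.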
You need to open the file while specifying the correct encoding. In Python 3, that's done using:
    with open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
        contents = myfile.read()
        for char in contents:
            pass  # do something with the character
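For the original goal of replacing 2-byte characters, one possible Python 3 sketch follows. The helper names `utf8_size` and `replace_two_byte_chars` are hypothetical, and the use of numeric character references (such as `&#233;`) as the replacement is an assumption, since the question does not say which HTML markup is wanted. The byte size of a character is found by encoding it back to UTF-8.

```python
def utf8_size(char):
    """Return how many bytes `char` occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


def replace_two_byte_chars(text):
    """Replace every character that needs exactly 2 UTF-8 bytes.

    Hypothetical helper: the replacement is a numeric character
    reference, chosen only for illustration.
    """
    pieces = []
    for char in text:
        if utf8_size(char) == 2:
            pieces.append("&#{};".format(ord(char)))
        else:
            pieces.append(char)
    return "".join(pieces)


# "\u00e9" (e with acute accent) is 2 bytes in UTF-8, "\u20ac" (euro sign) is 3.
sample = "caf\u00e9 \u20ac5"
sizes = [utf8_size(c) for c in sample]
converted = replace_two_byte_chars(sample)
print(sizes)  # [1, 1, 1, 2, 1, 3, 1]
```

Here `converted` keeps the 1-byte and 3-byte characters untouched and only rewrites the 2-byte one. In a real script, `sample` would be the `contents` string read from the file.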
In Python 2, you can use the codecs module:
    import codecs
    with codecs.open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
        contents = myfile.read()
        for char in contents:
            pass  # do something with the character
Note that in this case, Python 2 will not do automatic newline conversion, so you need to handle \r\n line endings explicitly.
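A minimal sketch of handling line endings explicitly after decoding is shown here. It works on an in-memory byte string rather than a real file, and the helper name `normalize_newlines` is hypothetical. It relies only on plain string replacement, so the same idea should carry over to the Unicode objects returned by `codecs.open`.

```python
def normalize_newlines(text):
    """Convert Windows (CR LF) and bare CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = b"first\r\nsecond\r\n"        # bytes as they might appear on disk
decoded = raw.decode("utf-8-sig")   # decode first, then fix the newlines
lines = normalize_newlines(decoded).split("\n")
```

The `\r\n` pairs are replaced before any lone `\r`, so a Windows line ending does not turn into two newlines.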
As an alternative (Python 2), you can open the file normally and decode it afterwards; that will normalize line endings to \n (on Windows, where text mode translates \r\n):
    with open("myfile.txt", "r") as myfile:
        contents = myfile.read().decode("utf-8-sig")
        for char in contents:
            pass  # do something with the character
Note that in both cases, you will end up with Unicode objects in Python 2, not strings (in Python 3, all strings are Unicode objects).
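To illustrate why the examples use "utf-8-sig" rather than plain "utf-8", and how character counts differ from byte counts, here is a small Python 3 sketch. The sample data is made up for illustration.

```python
import codecs

# A UTF-8 byte order mark followed by "h" (1 byte) and "\u00e9" (2 bytes).
data = codecs.BOM_UTF8 + "h\u00e9".encode("utf-8")

with_bom = data.decode("utf-8")         # the BOM survives as U+FEFF
without_bom = data.decode("utf-8-sig")  # the BOM is stripped

print(len(data))         # 6 bytes: 3 (BOM) + 1 + 2
print(len(with_bom))     # 3 characters, the first is U+FEFF
print(len(without_bom))  # 2 characters
```

Once the bytes are decoded, iterating over the string yields whole characters regardless of how many bytes each one took in the file.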