Getting lines of a file of any encoding type in Python

I really don’t want to know the encoding.  I only want the data.  In other words, I don’t want to think.  I don’t want to open notepad++ and convert between types of encoding.

My old standby doesn’t work on various file encodings that aren’t ansi (ascii, cp1252, whatever):

f = open("poo.txt", "r")
lines = f.readlines()
for line in lines:

I have had enough.  (I am also venturing into Python 3 as I have been on Python 2 forever but that is a different story.)

The following code will read a file of different encoding and split them into lines:

import os

def DecodeBytes(byteArray, codecs=['utf-8', 'utf-16']):
  for codec in codecs:
      return byteArray.decode(codec)

def ReadLinesFromFile(filename):
  file = open(filename, "rb")
  rawbytes =
  content = DecodeBytes(rawbytes)
  if content is not None:
    return content.split(os.linesep)

lines = ReadLinesFromFile("poo.txt")
for line in lines:

If you need to add encodings, simply add them to the codecs default assignment (or make it more elegant as you deem).


Leave a Reply

Your email address will not be published. Required fields are marked *