encoding Tutorial => How to detect the encoding of a text file with...

Example

There is a useful package in Python - chardet, which helps to detect the encoding used in your file. Actually there is no program that can say with 100% confidence which encoding was used - that's why chardet gives the encoding with the highest probability the file was encoded with. Chardet can detect following encodings:

ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
EUC-KR, ISO-2022-KR (Korean)
KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
ISO-8859-2, windows-1250 (Hungarian)
ISO-8859-5, windows-1251 (Bulgarian)
windows-1252 (English)
ISO-8859-7, windows-1253 (Greek)
ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
TIS-620 (Thai)

You can install chardet with a pip command:

pip install chardet

Afterward you can use chardet either in the command line:

% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0

or in python:

import chardet    
rawdata = open(file, "r").read()
result = chardet.detect(rawdata)
charenc = result['encoding']

PDF - Download encoding for free

Previous Next

encoding

Fastest Entity Framework Extensions

Example

Got any encoding Question?

encoding

encoding Getting started with encoding How to detect the encoding of a text file with Python?

Fastest Entity Framework Extensions

Example

Got any encoding Question?