Comparing strings in a case-insensitive way seems trivial, but it's not. This section only considers Unicode strings (the default in Python 3). Note that Python 2 may have subtle weaknesses relative to Python 3 - the latter's Unicode handling is much more complete.
The first thing to note is that case-removing conversions in Unicode aren't trivial. There is text for which text.lower() != text.upper().lower(), such as "ß":
    >>> "ß".lower()
    'ß'
    >>> "ß".upper().lower()
    'ss'
But let's say you wanted to caselessly compare "BUSSE" and "Buße". Heck, you probably also want "BUSSE" and "BUẞE" to compare equal - that's the newer capital form. The recommended way is to use casefold:
    >>> help(str.casefold)
    """
    Help on method_descriptor:

    casefold(...)
        S.casefold() -> str

        Return a version of S suitable for caseless comparisons.
    """
Do not just use lower. If casefold is not available, doing .upper().lower() helps (but only somewhat).
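To make the difference concrete, here is a small sketch that runs the three approaches over "BUSSE" (the all-caps spelling with "SS"), "Buße" and "BUẞE". The helper names are made up for illustration, and the escapes \u00df and \u1e9e stand for "ß" and "ẞ" so the code does not depend on how the file is encoded.

```python
# Hypothetical helpers, named here only for illustration.
def fold_lower(text):
    return text.lower()

def fold_upper_lower(text):
    # Fallback for when str.casefold is unavailable.
    return text.upper().lower()

def fold_casefold(text):
    return text.casefold()

words = ["BUSSE", "Bu\u00dfe", "BU\u1e9eE"]  # "BUSSE", "Buße", "BUẞE"

for fold in (fold_lower, fold_upper_lower, fold_casefold):
    # Collect the distinct folded forms; a single form means all words match.
    folded = {fold(word) for word in words}
    print(fold.__name__, "all equal:", len(folded) == 1)
```

Only casefold maps all three spellings to the same string. The .upper().lower() fallback handles "Buße" but still leaves "BUẞE" distinct, which is why it only helps somewhat.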
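Then you should consider accents. If your font renderer is good, you probably think "ê" == "ê" - but it doesn't when one side is the precomposed character and the other is "e" followed by a combining accent (written with escapes here so the difference is visible):
    >>> "\u00ea" == "e\u0302"
    False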
This is because they are actually different sequences of code points:
    >>> import unicodedata
    >>> [unicodedata.name(char) for char in "\u00ea"]
    ['LATIN SMALL LETTER E WITH CIRCUMFLEX']
    >>> [unicodedata.name(char) for char in "e\u0302"]
    ['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']
The simplest way to deal with this is unicodedata.normalize. You probably want to use NFKD normalization, but feel free to check the documentation. Then one does
    >>> unicodedata.normalize("NFKD", "\u00ea") == unicodedata.normalize("NFKD", "e\u0302")
    True
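As a rough guide to choosing a form, the sketch below contrasts NFD (canonical decomposition) with NFKD (compatibility decomposition) on the "ﬁ" ligature, U+FB01. The variable names are illustrative only.

```python
import unicodedata

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI

nfd = unicodedata.normalize("NFD", ligature)
nfkd = unicodedata.normalize("NFKD", ligature)

# Canonical decomposition leaves the ligature as a single character.
print("NFD gives 'fi': ", nfd == "fi")
# Compatibility decomposition splits it into "f" followed by "i".
print("NFKD gives 'fi':", nfkd == "fi")

# For the accented letter above, both forms decompose it the same way.
e_circumflex = "\u00ea"
print(unicodedata.normalize("NFD", e_circumflex)
      == unicodedata.normalize("NFKD", e_circumflex))
```

So NFKD is the more aggressive choice: it also erases distinctions such as ligatures, which is usually what you want for loose matching but not if those distinctions matter to you.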
To finish up, here is this expressed as functions:
    import unicodedata

    def normalize_caseless(text):
        return unicodedata.normalize("NFKD", text.casefold())

    def caseless_equal(left, right):
        return normalize_caseless(left) == normalize_caseless(right)
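One caveat: the Unicode Standard defines canonical caseless matching as NFD(toCasefold(NFD(X))), that is, it normalizes before casefolding as well as after. Below is a sketch of that stricter variant. The function names are made up for illustration, it uses NFD because that is the form the definition names, and the extra pass mostly matters for rare input such as text containing U+0345 COMBINING GREEK YPOGEGRAMMENI.

```python
import unicodedata

def nfd(text):
    return unicodedata.normalize("NFD", text)

def canonical_caseless(text):
    # Normalize first so combining marks are in canonical order, casefold,
    # then normalize again because casefolding can produce text that is
    # no longer in normalized form.
    return nfd(nfd(text).casefold())

def canonical_caseless_equal(left, right):
    return canonical_caseless(left) == canonical_caseless(right)

# Everyday cases: "BUSSE" vs "Buße", and "Ê" vs "e" + combining circumflex.
print(canonical_caseless_equal("BUSSE", "Bu\u00dfe"))
print(canonical_caseless_equal("\u00ca", "e\u0302"))

# Two canonically equivalent spellings of Greek alpha with psili and
# ypogegrammeni: one precomposed, one with the marks in a different order.
precomposed = "\u1f80"
decomposed = "\u03b1\u0345\u0313"
print(canonical_caseless_equal(precomposed, decomposed))
```

All three comparisons should report a match. Casefolding without the initial normalization pass turns U+0345 into a full iota before the marks are reordered, so the last pair can end up with its characters in a different order and compare unequal.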