An assumption which pops up regularly is that when dealing with English text only, it’s unlikely to encounter characters outside the ASCII character set. To avoid problems with handling Unicode correctly, people are tempted to do things like stripping non-ASCII characters, or removing any accents ...