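Introduction to ftfy
The goal of ftfy is to take in bad Unicode and output good Unicode.
It is useful in cases such as the following:
- Mojibake: garbled text produced when Unicode text was decoded with the wrong encoding. ftfy can correct it.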
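- HTML entities such as &amp; are decoded by ftfy.
- Some terminals add control sequences, for example to set text colors. When text is copied from such a terminal, these extra control characters are copied along with it, and ftfy can remove them.
- Text copied from some sources has display problems, which ftfy can also correct.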
Note: the input must already be Unicode text (a decoded Python str), not bytes in some other encoding.
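The following sketch illustrates the first case and the note above. It uses only Python's standard library, not ftfy, to create mojibake by decoding UTF-8 bytes as Latin-1. The result is an ordinary but garbled str, which is the kind of input ftfy works on. Raw bytes would first have to be decoded. The helper make_mojibake is hypothetical.

```python
def make_mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Hypothetical helper: encode as UTF-8, then decode with the wrong codec."""
    raw = text.encode("utf-8")      # the correct bytes for the original text
    return raw.decode(wrong_codec)  # misread as another encoding -> garbled str


original = "café"
garbled = make_mojibake(original)
print(repr(garbled))             # 'cafÃ©'
print(isinstance(garbled, str))  # True: a str, which is what ftfy expects as input
```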
fix_text
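Note: fix_text processes one line at a time, because different lines might be in different encodings. When you are sure the whole segment is in a single encoding, you can use fix_text_segment instead of fix_text so that the whole segment is fixed faster.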
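A deliberately naive fixer shows why line-by-line processing matters. The sketch below is a hypothetical, standard-library-only stand-in. Neither naive_fix nor fix_per_line is an ftfy function, and real ftfy uses far more careful heuristics. The fixer undoes one layer of UTF-8-read-as-Latin-1 mojibake and leaves the text alone if that fails. Suppose one line is mojibake and the next is correct text. Treating both as a single segment fixes nothing, while fixing each line separately repairs the broken one.

```python
def naive_fix(segment: str) -> str:
    """Hypothetical, much-simplified fixer: undo one UTF-8-read-as-Latin-1 layer.

    If re-encoding as Latin-1 and decoding as UTF-8 fails, the segment is
    assumed not to be mojibake and is returned unchanged.
    """
    try:
        return segment.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return segment


def fix_per_line(text: str) -> str:
    # Fix each line independently, mirroring fix_text's line-by-line approach.
    return "".join(naive_fix(line) for line in text.splitlines(keepends=True))


# Line 1 is mojibake; line 2 is already-correct text containing "ï".
mixed = "café".encode("utf-8").decode("latin-1") + "\n" + "naïve"

print(repr(naive_fix(mixed)))     # whole segment: the decode fails, nothing is fixed
print(repr(fix_per_line(mixed)))  # per line: 'café\nnaïve'
```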
fix_encoding
Fix text with incorrectly-decoded garbage (“mojibake”) whenever possible.
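The core idea behind reversing mojibake can be sketched without ftfy. Re-encode the garbled str with the encoding it was wrongly decoded with, then decode the bytes as UTF-8. Repeat while that keeps succeeding, because text is sometimes garbled more than once. This is a simplified illustration: undo_mojibake is hypothetical and only handles the Latin-1 case. It is not how fix_encoding is implemented.

```python
def undo_mojibake(text: str, max_rounds: int = 5) -> str:
    """Hypothetical sketch: repeatedly undo UTF-8-decoded-as-Latin-1 layers.

    Stops when a round fails (the text no longer looks like this kind of
    mojibake) or makes no change. Real ftfy also considers other codecs and
    uses heuristics to avoid "fixing" text that was already correct.
    """
    for _ in range(max_rounds):
        try:
            candidate = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break
        if candidate == text:
            break
        text = candidate
    return text


# "café" garbled twice: UTF-8 bytes misread as Latin-1, two times over.
once = "café".encode("utf-8").decode("latin-1")
twice = once.encode("utf-8").decode("latin-1")
print(undo_mojibake(twice))  # café
```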
fix_file
Fix text that is found in a file.
If the file is being read as Unicode text, use that. If it’s being read as bytes, then we hope an encoding was supplied. If not, unfortunately, we have to guess what encoding it is. We’ll try a few common encodings, but we make no promises. See the guess_bytes function for how this is done.
The output is a stream of fixed lines of text.
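The sketch below shows the pattern of decoding bytes by guessing and then emitting a stream of lines. It uses only the standard library. guess_decode and lines_from_bytes are illustrative names, and the order of encodings tried is an assumption rather than a copy of guess_bytes.

```python
import codecs
from typing import Iterator, Tuple


def guess_decode(data: bytes) -> Tuple[str, str]:
    """Hypothetical stand-in for guess_bytes: try a few common encodings in order.

    The order here (UTF-16 with a BOM, then UTF-8, then Latin-1 as a
    catch-all) is an assumption for illustration, not ftfy's exact logic.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16"), "utf-16"  # the BOM tells us the byte order
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return data.decode("latin-1"), "latin-1"


def lines_from_bytes(data: bytes) -> Iterator[str]:
    # Like fix_file, yield a stream of lines (here only decoded, not fixed).
    text, _encoding = guess_decode(data)
    yield from text.splitlines(keepends=True)


print(guess_decode(b"caf\xc3\xa9"))       # ('café', 'utf-8')
print(guess_decode(b"caf\xe9"))           # ('café', 'latin-1')
print(list(lines_from_bytes(b"a\nb\n")))  # ['a\n', 'b\n']
```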
explain_unicode
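Prints a breakdown of a string: for each character, its code point in hexadecimal, its glyph, its Unicode category, and its Unicode name. It is a debugging aid for seeing exactly which characters a piece of text contains.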
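A rough equivalent of this kind of breakdown can be built with the standard unicodedata module. describe_chars is a hypothetical helper that returns rows instead of printing a table, and its output format is not ftfy's.

```python
import unicodedata
from typing import List, Tuple


def describe_chars(text: str) -> List[Tuple[str, str, str, str]]:
    """Hypothetical helper in the spirit of explain_unicode.

    Returns one row per code point: (code point, glyph, category, name).
    Characters without a Unicode name (e.g. control characters) get "<unknown>".
    """
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            ch,
            unicodedata.category(ch),
            unicodedata.name(ch, "<unknown>"),
        ))
    return rows


# Inspect a piece of mojibake character by character.
for code, glyph, category, name in describe_chars("cafÃ©"):
    print(f"{code:8} {glyph!r:6} [{category}] {name}")
```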
Command-line tool
ftfy can be used from the command line. By default, it takes UTF-8 input and writes it to UTF-8 output, fixing problems in its Unicode as it goes.
Running ftfy --help prints the usage documentation for the ftfy command.
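The behavior of taking UTF-8 in, writing UTF-8 out, and fixing as it goes can be sketched as a small line-by-line filter. filter_utf8 is hypothetical. The fix function is passed in, so in a real script it could be ftfy.fix_text. The example below uses a trivial stand-in so it runs without ftfy installed.

```python
import io
from typing import BinaryIO, Callable


def filter_utf8(src: BinaryIO, dst: BinaryIO, fix: Callable[[str], str]) -> None:
    """Hypothetical sketch of a UTF-8-in, UTF-8-out filter like the ftfy command.

    Each input line is decoded as UTF-8, passed through `fix`, and written
    back out encoded as UTF-8.
    """
    for raw_line in src:  # iterating a binary stream yields lines ending in b"\n"
        line = raw_line.decode("utf-8")
        dst.write(fix(line).encode("utf-8"))


src = io.BytesIO("one &amp; two\nthree\n".encode("utf-8"))
dst = io.BytesIO()
filter_utf8(src, dst, lambda s: s.replace("&amp;", "&"))  # stand-in for a real fixer
print(dst.getvalue().decode("utf-8"))
```

Wired to real streams, it would be called as filter_utf8(sys.stdin.buffer, sys.stdout.buffer, ftfy.fix_text). The actual ftfy command does more than this sketch.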
ftfy.fixes module
ftfy.fixes.unescape_html
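Decodes HTML entities and character references (such as &amp;, &lt; or &#233;) into the characters they represent.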
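Python's standard html.unescape does the same basic job. The example shows that escaped markup such as &lt;b&gt; becomes literal tag text rather than being removed. The decode_entities wrapper is hypothetical and is not ftfy's implementation.

```python
import html


def decode_entities(text: str) -> str:
    """Hypothetical wrapper: decode HTML entities with the standard library."""
    return html.unescape(text)


print(decode_entities("&lt;b&gt;bold&lt;/b&gt; &amp; caf&#233;"))  # <b>bold</b> & café
```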
ftfy.fixes.remove_terminal_escapes
Removes terminal escape sequences, such as the ANSI codes that set text colors.
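Most such escapes are ANSI "CSI" sequences. Each one consists of ESC, then "[", then some numeric parameters, then a final letter. A regular expression can strip them. The pattern and the strip_terminal_escapes helper below are an illustrative assumption, not ftfy's code.

```python
import re

# Matches CSI sequences such as "\x1b[31m" (red) or "\x1b[0m" (reset):
# ESC, "[", optional digits/semicolons, then a final letter.
# This pattern is an assumption for illustration; ftfy's own pattern may differ.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_terminal_escapes(text: str) -> str:
    # Delete every matched escape sequence, keeping the visible text.
    return ANSI_ESCAPE.sub("", text)


colored = "\x1b[1;31mERROR\x1b[0m: disk full"
print(strip_terminal_escapes(colored))  # ERROR: disk full
```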
ftfy.fixes.fix_line_breaks
Convert all line breaks to Unix style.
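A minimal sketch of this normalization uses a regular expression. The set of breaks it converts (CRLF, lone CR, U+2028, U+2029 and U+0085) is an assumption for illustration, and to_unix_newlines is hypothetical.

```python
import re

# CRLF must come before lone CR so that "\r\n" becomes one "\n", not two.
# The Unicode separators included here are an assumption, not a description
# of ftfy's exact implementation.
LINE_BREAK = re.compile("\r\n|\r|\u2028|\u2029|\u0085")


def to_unix_newlines(text: str) -> str:
    return LINE_BREAK.sub("\n", text)


print(repr(to_unix_newlines("a\r\nb\rc\u2028d")))  # 'a\nb\nc\nd'
```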
ftfy.formatting module
ftfy.formatting.display_center
ftfy.formatting.display_ljust
ftfy.formatting.display_rjust
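These three functions pad a string to a target display width in a monospaced terminal. This differs from str.ljust and its relatives, because wide characters such as CJK ideographs occupy two cells. The sketch below approximates the idea with unicodedata.east_asian_width. The helpers display_width, ljust_display, rjust_display and center_display are hypothetical, and their width rules are a simplification of what ftfy does.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate monospace display width (hypothetical helper).

    Wide and fullwidth characters count as 2 cells, combining marks as 0,
    everything else as 1. This is a simplification.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks draw on top of the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def ljust_display(text: str, width: int, fillchar: str = " ") -> str:
    return text + fillchar * max(0, width - display_width(text))


def rjust_display(text: str, width: int, fillchar: str = " ") -> str:
    return fillchar * max(0, width - display_width(text)) + text


def center_display(text: str, width: int, fillchar: str = " ") -> str:
    padding = max(0, width - display_width(text))
    left = padding // 2  # any odd leftover cell goes on the right
    return fillchar * left + text + fillchar * (padding - left)


print("[" + "中文".ljust(6) + "]")          # str.ljust counts code points: 4 spaces added
print("[" + ljust_display("中文", 6) + "]")  # counts display cells: 2 spaces added
```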