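====== Python difflib: Using SequenceMatcher, Differ, and HtmlDiff ======

===== Introduction =====
difflib is the Python standard-library module for comparing sequences, such as strings or lists of strings. \\
It implements three classes: \\
   * SequenceMatcher  compares sequences of any type (including strings)
   * Differ           compares sequences of text lines
   * HtmlDiff         renders the comparison result as HTML
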
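The examples on this page focus on SequenceMatcher. For the other two classes, the following is a minimal sketch rather than a complete guide; the sample lines old_lines and new_lines are made up for illustration. It uses Differ().compare() to produce a line-by-line delta and HtmlDiff().make_table() to render the same comparison as an HTML table.

<code python>
import difflib

# Hypothetical sample input: two versions of a short text, as lists of lines
old_lines = ["pythonclub.org is wonderful\n", "difflib compares sequences\n"]
new_lines = ["Pythonclub.org also wonderful\n", "difflib compares sequences\n"]

# Differ yields one string per output line, each with a two-character prefix:
# "- " only in the first list, "+ " only in the second list,
# "  " common to both, "? " hints that mark changes inside a line
delta = list(difflib.Differ().compare(old_lines, new_lines))
print("".join(delta))

# HtmlDiff renders the same comparison as a side-by-side HTML table;
# "old" and "new" are the column captions
html_table = difflib.HtmlDiff().make_table(old_lines, new_lines, "old", "new")
print(len(html_table))
</code>

Use make_file() instead of make_table() when a complete HTML page is needed rather than a table fragment.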
===== SequenceMatcher Example =====

==== Code ====

<code python>
import difflib
from pprint import pprint

a = 'pythonclub.org is wonderful'
b = 'Pythonclub.org also wonderful'
# Build the SequenceMatcher (None means no junk filter)
s = difflib.SequenceMatcher(None, a, b)

# Blocks that are identical in a and b: (start in a, start in b, size)
print("s.get_matching_blocks():")
pprint(s.get_matching_blocks())
print("")
# Edit operations that turn a into b
print("s.get_opcodes():")
for tag, i1, i2, j1, j2 in s.get_opcodes():
    print("%7s a[%d:%d] (%s) b[%d:%d] (%s)" % (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))
    # Add your own handling for each operation here

</code>

==== Output ====
<file>
s.get_matching_blocks():
[(1, 1, 14), (16, 17, 1), (17, 19, 10), (27, 29, 0)]

s.get_opcodes():
replace a[0:1] (p) b[0:1] (P)
  equal a[1:15] (ythonclub.org ) b[1:15] (ythonclub.org )
replace a[15:16] (i) b[15:17] (al)
  equal a[16:17] (s) b[17:18] (s)
 insert a[17:17] () b[18:19] (o)
  equal a[17:27] ( wonderful) b[19:29] ( wonderful)
</file>
On Python 2.6 and later, get_matching_blocks() returns named tuples, so each block prints as Match(a=1, b=1, size=14) instead of a plain tuple.

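One possible use of the opcode loop above, as a sketch: replaying the opcodes rebuilds b from a, and ratio() gives a similarity score between 0 and 1. The names pieces, rebuilt and similarity are illustrative only.

<code python>
import difflib

a = 'pythonclub.org is wonderful'
b = 'Pythonclub.org also wonderful'
s = difflib.SequenceMatcher(None, a, b)

# Rebuild b from a by replaying the opcodes in order
pieces = []
for tag, i1, i2, j1, j2 in s.get_opcodes():
    if tag == 'equal':
        pieces.append(a[i1:i2])    # unchanged text is taken from a
    elif tag in ('replace', 'insert'):
        pieces.append(b[j1:j2])    # new text is taken from b
    # 'delete': a[i1:i2] is dropped, so nothing is appended

rebuilt = ''.join(pieces)
print(rebuilt == b)

# Similarity score: 2 * (matched characters) / (len(a) + len(b))
similarity = s.ratio()
print(round(similarity, 3))
</code>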
===== SequenceMatcher find_longest_match Bug =====
<code python>
import difflib

str1 = "Poor Impulse Control: A Good Babysitter Is Hard To Find"

str2 = """     A Good Babysitter Is Hard To Find    This is Frederick
by Leo Lionni, the first book I picked for myself.
I was in kindergarten, I believe, which would be either 1968 or 1969.
Frederick has a specific lesson for children about how art is as
important in life as bread, but there's a secondary consideration
I took away: if we pool our talents our lives are immeasurably better.
Curiously, this book is the story of my life, however one interprets
those things. I expect Mickey Rooney to show up any time with a barn
and a plan for a show, though my mom is not making costumes. My sisters
own a toy store with a fantastic selection of imaginative children's books.
I try not to open them because I can't close them and put them back.
My tantrums are setting a bad example for the kids. Anyway, I mention
this because yesterday was Mr. Rogers' 40th anniversary. I appreciate
the peaceful gentleman more as time passes, as I play with finger puppets
in department meetings, as I eye hollow trees for Lady Elaine Fairchild
infestations. Maybe Pete can build me trolley tracks!Labels: To Take
Your Heart Away   """

s = difflib.SequenceMatcher(None, str1, str2)
print("%d %d" % (len(str1), len(str2)))
# The upper bounds are exclusive, so pass the full lengths
start_a, start_b, length = s.find_longest_match(0, len(str1), 0, len(str2))
print("%d %d %d" % (start_a, start_b, length))
print(str1[start_a:start_a + length])
</code>

Output:
<file>
55 1116
0 1048 1
P

Version:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
</file>
The longest match should instead be "A Good Babysitter Is Hard To Find".

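==== Workaround ====
Swap str1 and str2 so that the longer string comes first, i.e. len(str1) > len(str2). \\
The output is then the expected result. \\
**Thanks to davies(at)newsmth** \\

This turns out to be a known bug: http://psf.upfronthosting.co.za/roundup/tracker/issue1528074 \\
It only appears when the second string is 200 characters or longer. \\
The workaround is to pass the longer string as the first argument and the shorter one as the second.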
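
The workaround can be sketched on a small synthetic input. The strings short_text and long_text and the helper longest() are hypothetical stand-ins, not the texts used above. The last call relies on the autojunk argument, which this page does not cover: it was added to SequenceMatcher in Python 2.7.1 and 3.2, so it is not available on the Python 2.5.1 shown above.

<code python>
import difflib

# Synthetic stand-ins (assumption, not the texts from this page):
# long_text is built from two heavily repeated characters plus one rare "Z"
short_text = "Z" + "ab" * 10     # 21 characters
long_text = "ab" * 150 + "Z"     # 301 characters, i.e. 200 or more


def longest(first, second, **kwargs):
    """Return the longest common block as (start in first, start in second, size)."""
    matcher = difflib.SequenceMatcher(None, first, second, **kwargs)
    return tuple(matcher.find_longest_match(0, len(first), 0, len(second)))


# Long text passed second: only the rare "Z" is matched
buggy = longest(short_text, long_text)
print(buggy)         # (0, 300, 1)

# Workaround from this page: pass the longer text first
swapped = longest(long_text, short_text)
print(swapped)       # (0, 1, 20)

# Alternative on Python 2.7.1+ / 3.2+: switch off the popularity heuristic
no_autojunk = longest(short_text, long_text, autojunk=False)
print(no_autojunk)   # (1, 0, 20)
</code>

Both fixes recover the 20-character block "abab...ab". Swapping only helps while the shorter text stays under 200 characters, whereas autojunk=False does not depend on the argument order.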
python-basic/difflib.txt · Last modified: 2010/06/02 01:18 (external edit)