21 快速排序

把递归思想用到排序中

大纲

快速排序算法
找出文本高频词

快速排序算法

在前面理解的递归基础上，我们来理解一种更快的排序方法，快速排序。快速排序使用了递归的思路，如果一个列表长度小于2，那么它不需要排序了，就直接输出这个列表，这是我们的极端状态，如果不是极端状态，我们就把这个列表进行分拆缩减，使之往极端状态走，这里分拆的做法是将先找一个基准值mid，再把列表中大于基准值的放在一堆，小于基准值的放在一堆，这样就实现了分拆，分拆后对两堆数字分别进行递归排序的调用。

def quick_sort_list(x):
    if len(x)<2:
        return x
    else:
        mid = x[0]
        less = [value for value in x[1:] if value <= mid]
        greater = [value for value in x[1:] if value > mid]
        output = quick_sort_list(less)+[mid]+quick_sort_list(greater)
        return output

quick_sort_list([14,6,7,4,9,30])

[4, 6, 7, 9, 14, 30]

上面代码中，极端状态就是当len(x)<2的状态，也就是说列表中只有一个元素。当有多个元素时，会将第一个元素做为基准值，让所有其它元素去和基准值做对比，如果小于基准值的放一起，大于的放一起，分别是用列表解析，存在less和greater中，这样就实现了分拆，再对分拆后的列表再进行排序调用。

我们可以将上面函数稍微修改一下，用来对课程成绩进行排序。

def quick_sort_list(x):
    if len(x)<2:
        return x
    else:
        mid = x[0]
        less = [value for value in x[1:] if value[1] <= mid[1]]
        greater = [value for value in x[1:] if value[1] > mid[1]]
        output = quick_sort_list(less)+[mid]+quick_sort_list(greater)
        return output

course = [['语文',85],['数学',90],['英语',94],['历史',87],['体育',85],['音乐',98]]
quick_sort_list(course)

[['体育', 85], ['语文', 85], ['历史', 87], ['数学', 90], ['英语', 94], ['音乐', 98]]

找到高频词

词频统计问题：来处理一个有趣的例子，我们将一篇英文短文输入成一个长长的字符串，然后这个短文分切成一个列表，再看一下这个列表中，单词出现最高的前三个单词是哪三个。

word_string = '''Last week I went to the theatre.
I had a very good seat. 
The play was very interesting. 
I did not enjoy it.
A young man and a young woman were sitting behind me. 
They were talking loudly. 
I got very angry. 
I could not hear the actors.
I turned round. 
I looked at the man and the woman angrily.
They did not pay any attention. 
In the end, I could not bear it.
I turned round again.
"I can't hear a word!" I said angrily.
It's none of your business," the young man said rudely.
"This is a private conversation!"
'''

在python中除了可以用双引号和单引号来定义字符串以外，还可以用三引号来定义这种大段的字符串。

word_string

'Last week I went to the theatre.\nI had a very good seat. \nThe play was very interesting. \nI did not enjoy it.\nA young man and a young woman were sitting behind me. \nThey were talking loudly. \nI got very angry. \nI could not hear the actors.\nI turned round. \nI looked at the man and the woman angrily.\nThey did not pay any attention. \nIn the end, I could not bear it.\nI turned round again.\n"I can\'t hear a word!" I said angrily.\nIt\'s none of your business," the young man said rudely.\n"This is a private conversation!"\n'

观察这个字符串变量，你会发现，其中有一个奇怪的符号，比如"\n"这种，这个是表示了换行符号。

text_list = word_string.split()

使用前面学到的字符串分拆成列表的方法，用spllit函数进行分解，它默认使用换行、空格等符号进行分解。分解后得到一个列表，每个元素是一个单词。

text_list = [w.strip('.,"!') for w in text_list]

但是单词前后还会有一些标点符号，这些是我们不需要的，使用strip函数把这些符号删除。

word_freq_list = []
for word in text_list:
    word_list = [word[0] for word in word_freq_list]
    freq_list = [word[1] for word in word_freq_list]
    if word not in word_list:
        word_freq_list.append([word,1])
    else:
        num = word_list.index(word)
        freq_list[num] = freq_list[num] + 1
        word_freq_list[num][1] = freq_list[num]

用一个嵌套列表来保存需要的信息，即单词和对应的出现次数。我们遍历了所有单词text_list，如果之前没遇到过这个单词，就在列表中保存这个单词和对应的次数1，如果之前遇到过这个单词，就将次数加1.

得到单词和次数的信息最终保存在变量word_freq_list中，这种信息称为词频表。它的前四个元素就是下面这个样子。

word_freq_list[:4]

[['Last', 1], ['week', 1], ['I', 11], ['went', 1]]

然后我们就可以拿排序函数去排序了。

sorted_word = quick_sort_list(word_freq_list)

sorted_word[-3:]

[['a', 4], ['the', 6], ['I', 11]]

可以发现频数最高的几个单词是最常见的几个单词。

Previous20 递归算法实操 Next22 汉诺塔游戏

Last updated 4 years ago

Was this helpful?