Python Code Performance Profiling Tools

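Python has quite a few tools for measuring code performance. They fall into two groups: CPU profiling and memory profiling.

CPU performance shows up as running time; the tools include time, timeit, perf_counter, profile, cProfile, and line_profiler. Memory performance shows up as memory usage and memory leaks; the tools include memory_profiler and objgraph.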

As an example, let's profile a piece of insertion-sort code:

test_profile.py

# coding: utf-8
"""A simple profile test."""

from random import randint
from time import perf_counter as pc
import cProfile
import pstats


def insertion_sort(data):
    """Take values from the list data, insert them in order into a new list, and return it."""
    result = []
    for value in data:
        insert_value(result, value)
    return result


def insert_value(array, value):
    """Insert value into array in ascending order."""
    for i, d in enumerate(array):
        if d > value:
            array.insert(i, value)
            return
    # No element in array is greater than value, so append it at the end
    array.append(value)


# Generate random data
MAX_SIZE = 10 ** 4
data = [randint(0, MAX_SIZE) for _ in range(MAX_SIZE)]

# The function under test
test = lambda: insertion_sort(data)

if __name__ == '__main__':
    test()

The simplest approach is to run the .py file from the command line under the time command:

time python3 test_profile.py

Output:

python3 test_profile.py  1.69s user 0.03s system 98% cpu 1.751 total

Timing with Python 3's time.perf_counter:

if __name__ == '__main__':
    # Use Python 3's time.perf_counter to print the program's running time
    t = pc()
    test()
    print(pc() - t)

Running the code directly prints:

1.638762446003966
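timeit, listed among the CPU tools at the top, can repeat a measurement several times, which smooths out some of the noise in a single perf_counter reading. The following is a minimal sketch rather than part of the original script: `best_time`, `SIZE`, and `sample` are illustrative names, and the input is deliberately smaller than the 10 ** 4 elements used above so that it runs quickly.

```python
from random import randint
from timeit import repeat


def insertion_sort(data):
    """Insert each value from data into a new list in ascending order."""
    result = []
    for value in data:
        insert_value(result, value)
    return result


def insert_value(array, value):
    """Insert value into array, keeping it in ascending order."""
    for i, d in enumerate(array):
        if d > value:
            array.insert(i, value)
            return
    array.append(value)


def best_time(func, runs=3):
    """Return the fastest of `runs` single-call timings, in seconds."""
    # repeat() returns one elapsed time per run; number=1 means one call per run.
    return min(repeat(func, number=1, repeat=runs))


SIZE = 1000  # assumption: smaller than the article's input, to keep this quick
sample = [randint(0, SIZE) for _ in range(SIZE)]

if __name__ == '__main__':
    print(best_time(lambda: insertion_sort(sample)))
```

Taking the minimum of several runs is a common convention, since slower runs usually reflect interference from other processes rather than the code itself.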

Profiling with the standard library's cProfile:

Run the original test_profile.py directly from the command line:

python3 -m cProfile test_profile.py

The output is very detailed: how many times each function was called, how long each one took, and so on.

Using cProfile inside the code:

if __name__ == '__main__':
    cProfile.run('test()')

Output:

      20005 function calls in 1.618 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.618    1.618 <string>:1(<module>)
        1    0.003    0.003    1.618    1.618 test_profile.py:10(insertion_sort)
    10000    1.598    0.000    1.615    0.000 test_profile.py:18(insert_value)
        1    0.000    0.000    1.618    1.618 test_profile.py:33(<lambda>)
        1    0.000    0.000    1.618    1.618 {built-in method builtins.exec}
       12    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     9988    0.017    0.000    0.017    0.000 {method 'insert' of 'list' objects}

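This output is shorter than what the command-line run produces, because the command-line run also profiles module-level work such as the imports and the data generation. The key performance information is all still here, so using cProfile from within the code is recommended.

The main output fields are: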

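ncalls: the number of times the function was called
tottime: the total time spent in the function itself, excluding time spent in the sub-functions it calls
First percall: percall = tottime / ncalls
cumtime: the cumulative time spent in the function and all the sub-functions it calls, that is, the time from when the call starts until it returns
Second percall: percall = cumtime / ncalls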
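To make the difference between tottime and cumtime concrete, here is a small sketch with two hypothetical functions, `parent` and `child`, that are not part of test_profile.py. It reads the raw numbers from the `stats` dictionary of a `pstats.Stats` object; that attribute is an implementation detail rather than a documented API, so treat its layout as an assumption.

```python
import cProfile
import pstats


def child():
    """Do enough work to accumulate measurable time."""
    total = 0
    for i in range(50_000):
        total += i * i
    return total


def parent():
    """Spend almost all of its time inside child()."""
    return child() + child() + child()


def collect_fields(func, names):
    """Profile func() and return ncalls/tottime/cumtime for the named functions."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    fields = {}
    # Assumed layout: (filename, lineno, funcname) ->
    # (primitive calls, total calls, tottime, cumtime, callers)
    for (_file, _line, name), (_cc, nc, tt, ct, _callers) in stats.stats.items():
        if name in names:
            fields[name] = {'ncalls': nc, 'tottime': tt, 'cumtime': ct}
    return fields


if __name__ == '__main__':
    for name, row in collect_fields(parent, ('parent', 'child')).items():
        print(name, row['ncalls'],
              row['tottime'] / row['ncalls'],   # first percall
              row['cumtime'] / row['ncalls'])   # second percall
```

Here `parent` should show a tiny tottime but a cumtime that covers all three `child` calls, which is the same pattern insertion_sort and insert_value show in the output above.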

Another benefit of running cProfile from within the code is that it works together with the pstats tool:

if __name__ == '__main__':
    # Save the profile data to a file named profile_with_pstats
    cProfile.run('test()', 'profile_with_pstats')
    p = pstats.Stats('profile_with_pstats')
    p.strip_dirs().sort_stats(-1).print_stats()

    # Sort by time and show the first 10 lines
    p.sort_stats('time').print_stats(10)

    # Sort by time, then by cumulative time, and print only the first 50%
    p.sort_stats('time', 'cumtime').print_stats(.5)

What some of these methods do:

strip_dirs(): removes the leading path information from file names
sort_stats(-1): sorts by filename:lineno(function), the "standard name" order
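pstats does not have to go through a file on disk. As a sketch, assuming you want the report as a string (for logging, say), a `cProfile.Profile` object can be passed straight to `pstats.Stats` together with an in-memory stream. The names `busy_work` and `profile_report` are hypothetical helpers introduced only for this illustration.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical workload: build an ascending list by inserting at the front."""
    result = []
    for value in range(n, 0, -1):
        result.insert(0, value)
    return result


def profile_report(func, *args, limit=5):
    """Profile func(*args) and return the top `limit` rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    buffer = io.StringIO()
    # stream= redirects print_stats() output away from stdout
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats('cumulative').print_stats(limit)
    return buffer.getvalue()


if __name__ == '__main__':
    print(profile_report(busy_work, 2000))
```

The same strip_dirs(), sort_stats(), and print_stats() chain is used as in the file-based version; only the source of the data and the destination of the report change.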

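Code optimization:

The point of profiling is to optimize the code. The profile shows that the bottleneck here is insert_value, which accounts for the vast majority of the running time. It is also not hard to see that this kind of insertion sort performs poorly. An important principle in Python engineering practice is to use what Python and its standard library already provide wherever possible instead of reinventing the wheel; otherwise you often end up doing twice the work for half the result. Before writing code, consider whether the data structures and algorithms in the standard library already meet your needs. In fact, the vast majority of needs can be met this way. Python has been developing for nearly 30 years, and its powerful standard library and wealth of excellent third-party tools are among its main strengths.

Sorted insertion can use bisect from the standard library. insert_value can be improved as follows: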

from bisect import bisect_left
def insert_value(array, value):
    i = bisect_left(array, value)
    array.insert(i, value)

Output:

      30005 function calls in 0.021 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.021    0.021 <string>:1(<module>)
        1    0.002    0.002    0.021    0.021 test_profile.py:10(insertion_sort)
    10000    0.003    0.000    0.019    0.000 test_profile.py:30(insert_value)
        1    0.000    0.000    0.021    0.021 test_profile.py:39(<lambda>)
    10000    0.005    0.000    0.005    0.000 {built-in method _bisect.bisect_left}
        1    0.000    0.000    0.021    0.021 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.011    0.000    0.011    0.000 {method 'insert' of 'list' objects}

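With bisect_left, the running time drops to roughly 1/77 of the original (1.618 s versus 0.021 s), a huge improvement. Finding the insertion point by sequential search takes linear time, O(n), while binary search takes O(log n); over all n insertions, the search cost is therefore O(n^2) versus O(n log n). Note that list.insert still shifts elements in O(n) time in both versions, so the bisect version is not O(n log n) overall; that shifting happens in C, however, and is far cheaper than a Python-level loop. The larger n is, the more pronounced the effect: with data set to 100,000 elements, the difference is a factor of 135.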
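The comparison can be reproduced without a profiler. The sketch below times both insertion strategies with perf_counter at a few input sizes. The names `insert_linear`, `insert_bisect`, `sort_with`, and `time_sort` are illustrative, the sizes are assumptions chosen to keep the run short, and the ratios you get will depend on your machine.

```python
from bisect import bisect_left, insort_left
from random import randint
from time import perf_counter


def insert_linear(array, value):
    """Linear scan for the insertion point, as in the original insert_value."""
    for i, d in enumerate(array):
        if d > value:
            array.insert(i, value)
            return
    array.append(value)


def insert_bisect(array, value):
    """Binary search for the insertion point, as in the improved insert_value."""
    array.insert(bisect_left(array, value), value)


def sort_with(insert, data):
    """Build a sorted list from data using the given insert function."""
    result = []
    for value in data:
        insert(result, value)
    return result


def time_sort(insert, data):
    """Return the seconds taken to sort data with the given insert function."""
    start = perf_counter()
    sort_with(insert, data)
    return perf_counter() - start


if __name__ == '__main__':
    for size in (1000, 2000, 4000):
        data = [randint(0, size) for _ in range(size)]
        slow = time_sort(insert_linear, data)
        fast = time_sort(insert_bisect, data)
        print(size, round(slow / fast, 1))
```

The standard library's bisect.insort_left combines the search and the insert in one call, so `sort_with(insort_left, data)` gives the same result as the bisect version above.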

