Debugging a Running Python Process with gdb

Suppose a server is running the test.py program below. How can we tell whether it is running normally, and which step it has reached?

# -*- coding: utf-8 -*-
import time


def do(x):
    time.sleep(10)


def main():
    for x in range(10000):
        do(x)


if __name__ == '__main__':
    main()

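This program writes no logs and prints nothing, so its state cannot be checked from log files, stdout, or stderr. One workable approach is to use gdb to inspect what the program is currently doing.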

Test environment

  • OS: Ubuntu 16.04.1 LTS
  • Python: 2.7.12

Preparation

Install gdb and python2.7-dbg:

$ sudo apt-get install gdb python2.7-dbg

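Set /proc/sys/kernel/yama/ptrace_scope to 0. Ubuntu's default of 1 only lets a process trace its own descendants, which stops an unprivileged gdb from attaching to an already running process: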

$ echo 0 |sudo tee /proc/sys/kernel/yama/ptrace_scope
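To check the setting before or after the change, a small script can read the same file. This is a sketch assuming a kernel with the Yama security module (without Yama the file is absent); the meanings of values 0 to 3 follow the kernel's Yama documentation, and the function names are hypothetical.

```python
# Sketch: read Yama's ptrace_scope and explain what it means for gdb.
# Assumes a Linux kernel with the Yama LSM; without Yama the file is
# absent and only the normal ptrace permission checks apply.
# Function names are hypothetical helpers, not part of any tool.
import os

PTRACE_SCOPE_PATH = "/proc/sys/kernel/yama/ptrace_scope"

# Value meanings as described in the kernel's Yama documentation.
SCOPE_MEANINGS = {
    0: "classic ptrace permissions: any process of the same user can be attached",
    1: "restricted ptrace: only descendant processes can be attached by default",
    2: "admin-only attach: CAP_SYS_PTRACE is required",
    3: "no attach: ptrace attach is disabled entirely",
}


def describe_ptrace_scope(value):
    """Return a human-readable description of a ptrace_scope value."""
    return SCOPE_MEANINGS.get(value, "unknown value %r" % (value,))


def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Return ptrace_scope as an int, or None when the file does not exist."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())
```

After the tee command above, describe_ptrace_scope(read_ptrace_scope()) should report the classic permissions for value 0.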

Run test.py:

$ python test.py &
[1] 6489

Attach gdb to the running process with gdb python PID:

$ gdb python 6489
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/lib/debug/.build-id/90/d1300febaeb0a626baa2540d19df2416cd3361.debug...done.
done.
...
Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug//lib/i386-linux-gnu/ld-2.23.so...done.
done.
0xb778fc31 in __kernel_vsyscall ()
(gdb)

Generating a core file

To minimize the impact on the running process, you can save its current state by generating a core file and then detaching:

(gdb) generate-core-file
warning: target file /proc/6489/cmdline contained unexpected null characters
Saved corefile core.6489
(gdb) quit
A debugging session is active.

    Inferior 1 [process 6489] will be detached.

Quit anyway? (y or n) y
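The same snapshot can be taken without an interactive session. The sketch below, with hypothetical helper names, passes generate-core-file to gdb's -batch mode; gdb then exits and, as in the transcript above, detaches from the attached process. The gcore script that typically ships with gdb is a ready-made alternative.

```python
# Sketch: write core.PID for a running process without typing commands.
# Assumes gdb is installed and ptrace_scope allows attaching.
# Helper names are hypothetical.
import subprocess


def core_dump_argv(pid, python="python", core_path=None):
    """argv that attaches to `pid`, writes a core file, and exits."""
    core_path = core_path or "core.%d" % pid  # same default name gdb uses
    return ["gdb", "-batch", "-ex", "generate-core-file " + core_path,
            python, str(pid)]


def dump_core(pid, python="python", core_path=None):
    """Write a core file for `pid` and return its path."""
    core_path = core_path or "core.%d" % pid
    subprocess.check_call(core_dump_argv(pid, python, core_path))
    return core_path
```

For the example process, dump_core(6489) would leave core.6489 in the current directory.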

The core file can then be loaded with gdb python core.PID:

$ gdb python core.6489
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
...
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/lib/debug/.build-id/90/d1300febaeb0a626baa2540d19df2416cd3361.debug...done.
done.

warning: core file may not match specified executable file.
[New LWP 6489]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `python'.
#0  0xb778fc31 in __kernel_vsyscall ()
(gdb)
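Both forms, a live PID and a core file, can also be driven non-interactively. This sketch assumes gdb's -batch and -ex options, which run the given commands after the target is loaded and then exit; the helper names are hypothetical, and the commands used are the py- commands described in the next section.

```python
# Sketch: run py- commands with gdb -batch instead of an interactive prompt.
# `target` is a PID (gdb attaches, then detaches on exit) or a core file
# path such as "core.6489". Helper names are hypothetical.
import subprocess


def gdb_batch_argv(target, commands=("py-bt",), python="python"):
    """argv for `gdb python TARGET` that runs `commands` and then exits."""
    argv = ["gdb", "-batch"]
    for cmd in commands:
        argv += ["-ex", cmd]  # -ex commands run after the target is loaded
    return argv + [python, str(target)]


def gdb_batch(target, commands=("py-bt",), python="python"):
    """Run the commands and return gdb's combined stdout/stderr as text."""
    out = subprocess.check_output(gdb_batch_argv(target, commands, python),
                                  stderr=subprocess.STDOUT)
    return out.decode("utf-8", "replace")
```

For example, gdb_batch(6489) prints the live process's Python backtrace, and gdb_batch("core.6489", ["py-bt", "py-locals"]) inspects the saved snapshot.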

Available Python-related commands

Type py and press Tab to list the available commands:

(gdb) py
py-bt               py-down             py-locals           py-up               python-interactive
py-bt-full          py-list             py-print            python

Use help CMD to see a description of each command:

(gdb) help py-bt
Display the current python frame and all the frames within its call stack (if any)

Source code at the current execution point

(gdb) py-list
   1    # -*- coding: utf-8 -*-
   2    import time
   3
   4
   5    def do(x):
  >6        time.sleep(10)
   7
   8
   9    def main():
  10        for x in range(10000):
  11            do(x)
(gdb)

This shows that the program is currently executing time.sleep(10).
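py-list prints the source lines around the current one (five on either side in the transcripts here) and flags the current line with >. The sketch below approximates that layout with the standard linecache module; it only illustrates the output format, not how gdb produces it, and source_window is a hypothetical name.

```python
# Sketch: a py-list-like window of source lines, current line marked '>'.
# Illustrates the layout only; the exact spacing gdb uses may differ.
import linecache


def source_window(filename, lineno, context=5):
    """Return formatted lines lineno-context .. lineno+context."""
    out = []
    start = max(1, lineno - context)
    for n in range(start, lineno + context + 1):
        line = linecache.getline(filename, n)
        if not line:  # linecache returns "" past the end of the file
            break
        marker = ">" if n == lineno else " "
        out.append("%s%4d    %s" % (marker, n, line.rstrip("\n")))
    return out
```

Called as source_window("test.py", 6), it would mark the time.sleep(10) line, matching the listing above.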

Call stack at the current position

(gdb) py-bt
Traceback (most recent call first):
  <built-in function sleep>
  File "test.py", line 6, in do
    time.sleep(10)
  File "test.py", line 11, in main
    do(x)
  File "test.py", line 15, in <module>
    main()
(gdb)

The call chain is main() -> do(x) -> time.sleep(10).
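For comparison, the same "most recent call first" chain can be produced from inside a process, because every Python frame object links to its caller through f_back. This is an in-process analogue, not how gdb works: it needs code running in the target and relies on the CPython-specific sys._getframe. The `<built-in function sleep>` entry in py-bt has no Python frame object, so it would not appear in this walk. stack_summary is a hypothetical name.

```python
# Sketch: walk the f_back chain to list Python frames, newest first,
# mirroring the order py-bt prints. CPython-specific (sys._getframe).
import sys


def stack_summary(frame=None):
    """Return (filename, lineno, function) tuples, most recent call first."""
    frame = frame or sys._getframe(1)  # start at the caller, not this helper
    out = []
    while frame is not None:
        code = frame.f_code
        out.append((code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back  # step to the calling frame
    return out
```

Calling stack_summary() inside do() of test.py would list do, then main, then <module>.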

Inspecting variable values

(gdb) py-list
   1    # -*- coding: utf-8 -*-
   2    import time
   3
   4
   5    def do(x):
  >6        time.sleep(10)
   7
   8
   9    def main():
  10        for x in range(10000):
  11            do(x)
(gdb) py-print x
local 'x' = 12
(gdb)
(gdb) py-locals
x = 12
(gdb)

Inspecting the caller's frame

(gdb) py-up
#9 Frame 0xb74c0994, for file test.py, line 11, in main (x=12)
    do(x)
(gdb) py-list
   6        time.sleep(10)
   7
   8
   9    def main():
  10        for x in range(10000):
 >11            do(x)
  12
  13
  14    if __name__ == '__main__':
  15        main()
(gdb) py-print x
local 'x' = 12
(gdb)

Use py-down to go back down:

(gdb) py-down
#6 Frame 0xb74926e4, for file test.py, line 6, in do (x=12)
    time.sleep(10)
(gdb) py-list
   1    # -*- coding: utf-8 -*-
   2    import time
   3
   4
   5    def do(x):
  >6        time.sleep(10)
   7
   8
   9    def main():
  10        for x in range(10000):
  11            do(x)
(gdb)

Debugging multithreaded programs

Test program test2.py:

# -*- coding: utf-8 -*-
from threading import Thread
import time


def do(x):
    x = x * 3
    time.sleep(x * 60)


def main():
    threads = []
    for x in range(1, 3):
        t = Thread(target=do, args=(x,))
        t.start()
        threads.append(t)  # keep a reference so main() can join it below
    for t in threads:
        t.join()


if __name__ == '__main__':
    main()

Run it in the background:

$ python test2.py &
[2] 12281

Listing all threads

info threads

As before, generate a core file for this process with generate-core-file, then load it:

$ gdb python core.12281

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0xb74b9700 (LWP 11039) 0xb7711c31 in __kernel_vsyscall ()
  2    Thread 0xb73b8b40 (LWP 11040) 0xb7711c31 in __kernel_vsyscall ()
  3    Thread 0xb69ffb40 (LWP 11041) 0xb7711c31 in __kernel_vsyscall ()
(gdb)

The program currently has 3 threads, and gdb is on thread 1 (marked with *).

Switching threads

thread ID

(gdb) thread 3
[Switching to thread 3 (Thread 0xb69ffb40 (LWP 11041))]
#0  0xb7711c31 in __kernel_vsyscall ()
(gdb) info threads
  Id   Target Id         Frame
  1    Thread 0xb74b9700 (LWP 11039) 0xb7711c31 in __kernel_vsyscall ()
  2    Thread 0xb73b8b40 (LWP 11040) 0xb7711c31 in __kernel_vsyscall ()
* 3    Thread 0xb69ffb40 (LWP 11041) 0xb7711c31 in __kernel_vsyscall ()
(gdb)

We are now on thread 3.

The py- commands described earlier then show information about the currently selected thread:

[Current thread is 1 (Thread 0xb74b9700 (LWP 11039))]
(gdb) py-list
 335            waiter.acquire()
 336            self.__waiters.append(waiter)
 337            saved_state = self._release_save()
 338            try:    # restore state no matter what (e.g., KeyboardInterrupt)
 339                if timeout is None:
>340                    waiter.acquire()
 341                    if __debug__:
 342                        self._note("%s.wait(): got it", self)
 343                else:
 344                    # Balancing act:  We can't afford a pure busy loop, so we
 345                    # have to sleep; but if we sleep the whole timeout time,
(gdb) thread 2
[Switching to thread 2 (Thread 0xb73b8b40 (LWP 11040))]
#0  0xb7711c31 in __kernel_vsyscall ()
(gdb) py-list
   3    import time
   4
   5
   6    def do(x):
   7        x = x * 3
  >8        time.sleep(x * 60)
   9
  10
  11    def main():
  12        threads = []
  13        for x in range(1, 3):
(gdb)

Running a command on all threads at once

thread apply all CMD (or the abbreviation t a a CMD)

(gdb) thread apply all py-list

Thread 3 (Thread 0xb69ffb40 (LWP 11041)):
   3    import time
   4
   5
   6    def do(x):
   7        x = x * 3
  >8        time.sleep(x * 60)
   9
  10
  11    def main():
  12        threads = []
  13        for x in range(1, 3):

Thread 2 (Thread 0xb73b8b40 (LWP 11040)):
   3    import time
   4
   5
   6    def do(x):
   7        x = x * 3
  >8        time.sleep(x * 60)
   9
  10
  11    def main():
  12        threads = []
  13        for x in range(1, 3):

---Type <return> to continue, or q <return> to quit---
Thread 1 (Thread 0xb74b9700 (LWP 11039)):
 335            waiter.acquire()
 336            self.__waiters.append(waiter)
 337            saved_state = self._release_save()
 338            try:    # restore state no matter what (e.g., KeyboardInterrupt)
 339                if timeout is None:
>340                    waiter.acquire()
 341                    if __debug__:
 342                        self._note("%s.wait(): got it", self)
 343                else:
 344                    # Balancing act:  We can't afford a pure busy loop, so we
 345                    # have to sleep; but if we sleep the whole timeout time,
(gdb)
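For comparison, a process can produce a similar all-threads report about itself: sys._current_frames() (CPython-specific) maps each thread id to that thread's topmost frame. This is a different technique from gdb, since it needs code running inside the target (for example from a signal handler you add yourself), and traceback.format_stack prints oldest call first, the reverse of py-bt. dump_all_threads is a hypothetical name.

```python
# Sketch: in-process counterpart of "thread apply all py-bt".
# CPython-specific (sys._current_frames); names are hypothetical.
import sys
import threading
import traceback


def dump_all_threads():
    """Return {thread name: formatted stack} for every running thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    report = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, "thread-%d" % ident)
        # format_stack lists frames oldest first, newest last
        report[name] = "".join(traceback.format_stack(frame))
    return report
```

In test2.py, the two worker threads would show do() sleeping, while the main thread would show the join() wait, matching the gdb output above.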

Those are the commonly used Python-related gdb operations. Don't forget that all the regular gdb commands are still available as well.
