Python: Debugging running processes by means of preset backdoors (debugging interfaces)

Preface

When a running program or service misbehaves, we want to find the cause as quickly as possible, reproduce the problem, and fix it. To do that, it helps to know what the program is doing right now, to be able to reproduce the problem on demand, and to see the current values of its variables. There are many ways to get this information. This article introduces one of them: debugging a running process through a backdoor that was set up in advance.

A common way to preset such a backdoor is to define a signal handler. When the corresponding signal arrives, the handler performs a specific action that makes the problem easier to debug and locate. Two examples follow:

  • Output debugging information when a signal is received
  • Start a remote debugging service when a signal is received
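The mechanism behind both examples is small. The sketch below registers a handler and then sends the signal to its own process with os.kill. That has the same effect as running `kill -s USR1 <pid>` from a shell.

A few assumptions and caveats apply:
  • The sketch is POSIX only, because Windows has no SIGUSR1.
  • The handler name on_usr1 and the list received are illustrative names, not part of the original programs.
  • CPython always runs Python-level signal handlers in the main thread. It runs them at a later point, not inside the low-level C signal handler, which is why the sketch waits briefly before checking the result.

```python
import os
import signal
import threading
import time

received = []  # (signal number, name of the thread that ran the handler)


def on_usr1(signum, frame):
    # `frame` is the frame that was executing when the signal arrived
    received.append((signum, threading.current_thread().name))


signal.signal(signal.SIGUSR1, on_usr1)  # must be called from the main thread
os.kill(os.getpid(), signal.SIGUSR1)    # same as `kill -s USR1 <pid>` in a shell

# The Python-level handler runs at the interpreter's next check for pending
# signals; polling briefly makes that deterministic for this demo.
for _ in range(100):
    if received:
        break
    time.sleep(0.01)

print(received)
```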

Output debugging information when a signal is received

For example, the handler can print the traceback of every thread, so you can see exactly where the program is currently executing. Because the handler has access to the live frame objects, it can even show the values of local and global variables at that point in the code.

In the following example, the program prints the traceback of every thread when it receives the SIGUSR1 signal:

# -*- coding: utf-8 -*-
from queue import Queue
import signal
import sys
import threading
import time
import traceback


def output_tracebacks(signum, frame):
    id2thread = {}
    for thread in threading.enumerate():
        id2thread[thread.ident] = thread
    for thread_id, stack in sys._current_frames().items():
        stack_list = traceback.format_list(traceback.extract_stack(stack))
        print('thread {}:'.format(id2thread.get(thread_id, thread_id)))
        print(''.join(stack_list))


def setup_backdoor():
    signal.signal(signal.SIGUSR1, output_tracebacks)


def worker(q):
    while True:
        task = q.get()
        if task is None:
            break
        # do something with task
        time.sleep(1.2)


def producer(q):
    for x in range(100):
        q.put(x)
        time.sleep(1)
    q.put(None)


setup_backdoor()
q = Queue()
t1 = threading.Thread(target=producer, args=(q,))
t1.start()
t2 = threading.Thread(target=worker, args=(q,))
t2.start()
for t in [t1, t2]:
    t.join()

Save the program as testa.py, run it in the background, and trigger the backdoor by sending it SIGUSR1. The traceback of each thread is printed:

$ python testa.py &
[1] 79163
$ kill -s USR1 79163
thread <Thread(Thread-2, started 123145565609984)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testa.py", line 30, in worker
    time.sleep(1.2)

thread <Thread(Thread-1, started 123145560354816)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testa.py", line 36, in producer
    time.sleep(1)

thread <_MainThread(MainThread, started 140736812057536)>:
  File "testa.py", line 47, in <module>
    t.join()
  File "/xxx/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/xxx/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File "testa.py", line 15, in output_tracebacks
    stack_list = traceback.format_list(traceback.extract_stack(stack))
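output_tracebacks above prints only file names, line numbers and source lines. To also see the local variables mentioned earlier, the standard traceback module can capture them (Python 3.5+):
  • traceback.walk_stack() follows a frame's f_back chain.
  • StackSummary.extract(..., capture_locals=True) records the repr() of every local in each frame.

The names format_thread_stacks and output_tracebacks_with_locals are our own, so treat this as a sketch. Keep in mind that repr() of large objects can produce a lot of output or take noticeable time.

```python
import sys
import threading
import traceback


def format_thread_stacks(capture_locals=True):
    """Return every thread's stack as text, optionally with each frame's locals."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        summary = traceback.StackSummary.extract(
            traceback.walk_stack(frame), capture_locals=capture_locals)
        summary.reverse()  # walk_stack() yields innermost first; show outermost first
        chunks.append('thread {}:\n{}'.format(
            names.get(thread_id, thread_id), ''.join(summary.format())))
    return '\n'.join(chunks)


def output_tracebacks_with_locals(signum, frame):
    # Can be registered as a SIGUSR1 handler in place of output_tracebacks
    print(format_thread_stacks())
```

Each frame in the output is followed by indented `name = repr(value)` lines for its locals.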

For more on getting traceback information, see the article "Python: Getting traceback information for concurrent programs (threading/gevent/asyncio)".
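If you only need the stack dump, the standard library can install it for you. faulthandler.register() (POSIX only, Python 3.3+) dumps the traceback of every thread when the given signal arrives.

This has one advantage over the Python-level handler. faulthandler writes from the C-level signal handler, so it still works when the main thread is stuck in C code, for example inside an extension module, and never gets back to the interpreter loop to run output_tracebacks. The trade-off is a fixed, terser output format with no local variables.

The wrapper name install_stack_dumper is our own.

```python
import faulthandler
import signal
import sys


def install_stack_dumper(signum=signal.SIGUSR1, file=None):
    """Dump every thread's stack to `file` (default: the real stderr) on `signum`."""
    # faulthandler writes to the file's descriptor directly, so `file` must
    # have a working fileno() and must stay open until the signal is unregistered.
    faulthandler.register(signum, file=file or sys.__stderr__, all_threads=True)
```

Calling install_stack_dumper() once at startup is enough. After that, `kill -s USR1 <pid>` writes the dump to stderr.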

Start a remote debugging service when a signal is received

For example, the handler can start a remote debugger: an interactive Python shell that runs inside the process and uses its current runtime environment. In this shell you can inspect global variables as they change at runtime and execute code in the context of the running process:

# -*- coding: utf-8 -*-
from code import InteractiveConsole
from queue import Queue
import signal
import socketserver
import sys
import threading
import time
import traceback


class FileLikeObject(object):
    def __init__(self, rfile, wfile):
        self._rfile = rfile
        self._wfile = wfile

    def __getattr__(self, name):
        try:
            return getattr(self._rfile, name)
        except AttributeError:
            return getattr(self._wfile, name)

    def write(self, data):
        if not isinstance(data, bytes):
            data = data.encode('utf-8')
        self._wfile.write(data)

    def isatty(self):
        return True

    def flush(self):
        pass

    def readline(self, *args):
        try:
            data = self._rfile.readline(*args).replace(b'\r\n', b'\n')
            if not isinstance(data, str):
                data = data.decode('utf-8')
            return data
        except UnicodeError:
            return ''


class DebuggerTCPHandler(socketserver.StreamRequestHandler):

    def handle(self):
        fileobj = FileLikeObject(self.rfile, self.wfile)
        sys.stdin = sys.stdout = sys.stderr = fileobj  # process-wide: other threads' output also goes to the client

        try:
            console = InteractiveConsole(locals=globals())
            console.interact(banner='== debug server ==', exitmsg='')
        except SystemExit:
            pass
        finally:
            sys.stdin = sys.__stdin__
            sys.stdout = sys.__stdout__
            sys.stderr = sys.__stderr__


def output_tracebacks():
    id2thread = {}
    for thread in threading.enumerate():
        id2thread[thread.ident] = thread
    for thread_id, stack in sys._current_frames().items():
        stack_list = traceback.format_list(traceback.extract_stack(stack))
        print('thread {}:'.format(id2thread.get(thread_id, thread_id)))
        print(''.join(stack_list))


debugger = None


def start_debugger(signum, frame):
    global debugger
    print('start debugger...')
    if debugger is None or not debugger[1].is_alive():  # start only if not already running
        server = socketserver.TCPServer(('localhost', 9999), DebuggerTCPHandler)
        debugger = (server, threading.Thread(target=server.serve_forever, daemon=True))
        debugger[1].start()
    print('started debugger')


def close_debugger(signum, frame):
    global debugger
    print('close debugger...')
    if debugger is not None:
        server, t = debugger
        # shutdown() blocks until any connected console session has ended
        server.shutdown()
        server.server_close()
        t.join()
        debugger = None
    print('closed debugger')


def setup_backdoor():
    signal.signal(signal.SIGUSR1, start_debugger)
    signal.signal(signal.SIGUSR2, close_debugger)


def worker(q):
    while True:
        task = q.get()
        if task is None:
            break
        # do something with task
        time.sleep(1.2)


def producer(q):
    for x in range(100):
        q.put(x)
        time.sleep(1)
    q.put(None)


setup_backdoor()
q = Queue()
t1 = threading.Thread(target=producer, args=(q,))
t1.start()
t2 = threading.Thread(target=worker, args=(q,))
t2.start()
for t in [t1, t2]:
    t.join()

Save the program as testb.py and run it. Then do the following:
  • Start the remote debugger with SIGUSR1.
  • Connect to it with telnet.
  • Shut the debugging service down with SIGUSR2 when you are done.

$ python testb.py &
[1] 87173
$ kill -s USR1 87173
start debugger...
started debugger
$
$ telnet 127.0.0.1 9999
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
== debug server ==
>>> output_tracebacks()
thread <Thread(Thread-3, started 123145482240000)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/xxx/lib/python3.6/socketserver.py", line 238, in serve_forever
    self._handle_request_noblock()
  File "/xxx/lib/python3.6/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/xxx/lib/python3.6/socketserver.py", line 348, in process_request
    self.finish_request(request, client_address)
  File "/xxx/lib/python3.6/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/xxx/lib/python3.6/socketserver.py", line 696, in __init__
    self.handle()
  File "testb.py", line 52, in handle
    console.interact(banner='== debug server ==', exitmsg='')
  File "/xxx/lib/python3.6/code.py", line 233, in interact
    more = self.push(line)
  File "/xxx/lib/python3.6/code.py", line 259, in push
    more = self.runsource(source, self.filename)
  File "/xxx/lib/python3.6/code.py", line 75, in runsource
    self.runcode(code)
  File "/xxx/lib/python3.6/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "testb.py", line 66, in output_tracebacks
    stack_list = traceback.format_list(traceback.extract_stack(stack))

thread <Thread(Thread-2, started 123145476984832)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testb.py", line 108, in worker
    time.sleep(1.2)

thread <Thread(Thread-1, started 123145471729664)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testb.py", line 114, in producer
    time.sleep(1)

thread <_MainThread(MainThread, started 140736812057536)>:
  File "testb.py", line 125, in <module>
    t.join()
  File "/xxx/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/xxx/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):

>>> q
<queue.Queue object at 0x10a2e8fd0>
>>> q.qsize()
13
>>> q.qsize()
14
>>> exit()
Connection closed by foreign host.

$ jobs
[1]+  Running                 python testb.py &
$ kill -s USR2 87173
close debugger...
closed debugger

$ telnet 127.0.0.1 9999
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host
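telnet is not installed everywhere, and sometimes you want to query the debugger from a script. Below is a minimal client sketch. It assumes the console behaves as in the session above, printing '>>> ' whenever it is ready for input.

The helper names run_remote and _read_until_prompt are hypothetical, and the client has two limitations:
  • It sends one complete single-line statement per call. An incomplete statement makes the console answer with a '... ' continuation prompt, and the call then ends in a socket timeout.
  • testb.py redirects sys.stdout for the whole process, so output printed by other threads during the call can end up in the reply.

```python
import socket

PROMPT = b'>>> '


def _read_until_prompt(sock):
    """Collect bytes until the console prints its '>>> ' prompt (or the peer closes)."""
    buf = b''
    while not buf.endswith(PROMPT):
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
    return buf


def run_remote(command, host='127.0.0.1', port=9999, timeout=5.0):
    """Send one line of Python to the debug console and return what it printed."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        _read_until_prompt(sock)  # skip the banner and the first prompt
        sock.sendall(command.encode('utf-8') + b'\n')
        reply = _read_until_prompt(sock)
        sock.sendall(b'exit()\n')  # end the session so the server thread is freed
    if reply.endswith(PROMPT):
        reply = reply[:-len(PROMPT)]
    return reply.decode('utf-8', 'replace')
```

For example, with testb.py's debugger running, `run_remote('q.qsize()')` returns the text the console printed for that expression.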

The code above is only a rough demonstration of how a remote debugger can be implemented. For real use, consider full-featured third-party modules such as gevent.backdoor, twisted.conch.manhole, or ionelmc/python-manhole.
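If you do keep a hand-rolled server like the one in testb.py, two weaknesses are cheap to fix:
  • An exception raised inside a signal handler propagates into whatever the main thread was doing at that moment. If TCPServer(...) fails, for instance because port 9999 is already in use, the service itself can crash.
  • Without SO_REUSEADDR, restarting the debugger soon after a session ended may fail with "Address already in use" while the old connection sits in TIME_WAIT.

The sketch below addresses both. ReusableTCPServer and start_debug_server are our own names. The intent is that start_debugger would call start_debug_server instead of constructing TCPServer directly.

```python
import socketserver
import threading


class ReusableTCPServer(socketserver.TCPServer):
    # TCPServer sets SO_REUSEADDR on the socket before bind() when this is True
    allow_reuse_address = True


def start_debug_server(handler_class, host='127.0.0.1', port=9999):
    """Start a debug server in a daemon thread; return (server, thread) or None."""
    try:
        server = ReusableTCPServer((host, port), handler_class)
    except OSError as exc:
        # Report and carry on instead of letting the error escape the signal handler
        print('debugger not started: {}'.format(exc))
        return None
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread
```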

Summary

These two examples are only the most common kinds of preset backdoor. You can preset other functions as well, for example an HTTP server that returns different runtime information, or performs auxiliary debugging operations, depending on the URL you request. All of them serve the same purpose: locating and solving problems as quickly as possible.

Which backdoors to preset depends on the situation:
  • Consider whether a backdoor could affect the normal operation of the service.
  • Consider which kinds of information, obtained in which way, will help you locate and fix problems fastest.
  • Most importantly, consider security. Protect the backdoor properly and never expose its port to the Internet.
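As an illustration of the HTTP variant, here is a sketch that uses only the standard library's http.server. The URL paths /threads and /thread-count and the helper names are hypothetical.

A few design choices are worth noting:
  • The server binds to 127.0.0.1 by default, in line with the advice not to expose the port.
  • port=0 asks the OS for a free port. A real service would use a fixed, documented port.
  • start_debug_http() can be called at startup or from a signal handler, in the same way as start_debugger.

```python
import sys
import threading
import traceback
from http.server import BaseHTTPRequestHandler, HTTPServer


def dump_threads():
    """Plain-text stack dump of every thread, similar to output_tracebacks."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append('thread {}:\n'.format(names.get(thread_id, thread_id)))
        parts.append(''.join(traceback.format_stack(frame)))
    return ''.join(parts)


class DebugHandler(BaseHTTPRequestHandler):
    # Hypothetical URL layout: each path maps to a function returning text
    routes = {
        '/threads': dump_threads,
        '/thread-count': lambda: '{}\n'.format(threading.active_count()),
    }

    def do_GET(self):
        view = self.routes.get(self.path)
        if view is None:
            self.send_error(404)
            return
        body = view().encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the service's own stderr free of access-log lines


def start_debug_http(host='127.0.0.1', port=0):
    """Serve the debug URLs from a daemon thread and return the server."""
    server = HTTPServer((host, port), DebugHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the server running, `curl http://127.0.0.1:<port>/threads` returns a dump similar to the one produced by output_tracebacks.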

Although the title and examples are Python-specific, the idea is not limited to Python and applies equally to services written in other languages. Feel free to share and discuss debugging and problem-solving techniques with me.

