Preface
Whenever a running program or service misbehaves, we want to locate the cause as quickly as possible, reproduce the problem, and fix it. It helps to know what the program is doing right now and what values its variables currently hold. There are many ways to get this information; this article introduces one idea and method: debugging a running process through a preset backdoor.
A common way to preset a backdoor is to define a signal handler that, when the corresponding signal is received, performs a specific action that makes it easier to debug and locate problems. This article gives two examples:
- Output debugging information when a signal is received
- Start a remote debugging service when a signal is received
Output debugging information when a signal is received
For example, the handler can print the traceback of the running program, so that you can see where the program is currently executing and even inspect the values of local and global variables at that location.
In the following example, the program prints its traceback information when it receives the USR1 signal:
# -*- coding: utf-8 -*-
from queue import Queue
import signal
import sys
import threading
import time
import traceback


def output_tracebacks(signum, frame):
    id2thread = {}
    for thread in threading.enumerate():
        id2thread[thread.ident] = thread
    for thread_id, stack in sys._current_frames().items():
        stack_list = traceback.format_list(traceback.extract_stack(stack))
        print('thread {}:'.format(id2thread[thread_id]))
        print(''.join(stack_list))


def setup_backdoor():
    signal.signal(signal.SIGUSR1, output_tracebacks)


def worker(q):
    while True:
        task = q.get()
        if task is None:
            break
        # do something with task
        time.sleep(1.2)


def producer(q):
    for x in range(100):
        q.put(x)
        time.sleep(1)
    q.put(None)


setup_backdoor()
q = Queue()
t1 = threading.Thread(target=producer, args=(q,))
t1.start()
t2 = threading.Thread(target=worker, args=(q,))
t2.start()
for t in [t1, t2]:
    t.join()
Run the program and activate the backdoor with the USR1 signal to get the traceback information of the program.
$ python testa.py &
[1] 79163
$ kill -s USR1 79163
thread <Thread(Thread-2, started 123145565609984)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testa.py", line 30, in worker
    time.sleep(1.2)

thread <Thread(Thread-1, started 123145560354816)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testa.py", line 36, in producer
    time.sleep(1)

thread <_MainThread(MainThread, started 140736812057536)>:
  File "testa.py", line 47, in <module>
    t.join()
  File "/xxx/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/xxx/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File "testa.py", line 15, in output_tracebacks
    stack_list = traceback.format_list(traceback.extract_stack(stack))
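The output above shows only where each thread is, not the variable values mentioned earlier. As a minimal sketch of how those could be included, the helper below walks each thread's frames and appends a truncated repr of every local. The name format_stacks_with_locals and the max_value_len parameter are illustrative assumptions, not part of the original program.

```python
import sys
import threading
import traceback


def format_stacks_with_locals(max_value_len=80):
    """Return a text report of every thread's stack, including frame locals.

    Hypothetical helper for illustration; values are truncated to
    max_value_len characters so large objects do not flood the output.
    """
    id2thread = {t.ident: t for t in threading.enumerate()}
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append('thread {}:'.format(id2thread.get(thread_id, thread_id)))
        # walk_stack yields (frame, lineno) from the innermost frame outwards;
        # reverse it so the report reads oldest call first, like a traceback.
        frames = list(traceback.walk_stack(frame))
        for f, lineno in reversed(frames):
            code = f.f_code
            lines.append('  File "{}", line {}, in {}'.format(
                code.co_filename, lineno, code.co_name))
            # Copy the items first: other threads may mutate the mapping.
            for name, value in list(f.f_locals.items()):
                try:
                    text = repr(value)
                except Exception:
                    text = '<unrepresentable>'
                lines.append('    {} = {}'.format(name, text[:max_value_len]))
    return '\n'.join(lines)


# Possible wiring as a signal handler (assumption, mirrors the example above):
# signal.signal(signal.SIGUSR1,
#               lambda signum, frame: print(format_stacks_with_locals()))
```

Keep in mind that locals can contain passwords or tokens, so a dump like this should only go somewhere you trust.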
For more on getting traceback information, see Python: Getting traceback information for concurrent programs (threading/gevent/asyncio).
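If a plain traceback dump is all you need, the standard library's faulthandler module (available since Python 3.3, with register not available on Windows) can install such a handler for you. The sketch below is an alternative to the hand-written handler, not the method used in this article; the wrapper name enable_traceback_dump is an assumption made for illustration.

```python
import faulthandler
import signal
import sys


def enable_traceback_dump(signum=signal.SIGUSR1, file=None):
    """Dump the tracebacks of all threads to `file` when `signum` arrives.

    Hypothetical wrapper around faulthandler.register. The target must be
    a real file with a file descriptor, because faulthandler writes to the
    descriptor directly.
    """
    if file is None:
        file = sys.stderr
    faulthandler.register(signum, file=file, all_threads=True)


# Typical use at program start-up:
# enable_traceback_dump()
# then, from a shell: kill -s USR1 <pid>
```

Unlike the handler above, faulthandler prints each stack most recent call first and shows only file names, line numbers and function names.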
Start a remote debugging service when a signal is received
For example, you can start a remote debugger (a Python shell) that runs inside the process's own runtime environment. From this shell you can inspect global variables as they change at runtime and execute code in the context of the running process:
# -*- coding: utf-8 -*-
from code import InteractiveConsole
from queue import Queue
import signal
import socketserver
import sys
import threading
import time
import traceback


class FileLikeObject(object):
    def __init__(self, rfile, wfile):
        self._rfile = rfile
        self._wfile = wfile

    def __getattr__(self, name):
        try:
            return getattr(self._rfile, name)
        except AttributeError:
            return getattr(self._wfile, name)

    def write(self, data):
        if not isinstance(data, bytes):
            data = data.encode('utf-8')
        self._wfile.write(data)

    def isatty(self):
        return True

    def flush(self):
        pass

    def readline(self, *args):
        try:
            data = self._rfile.readline(*args).replace(b'\r\n', b'\n')
            if not isinstance(data, str):
                data = data.decode('utf-8')
            return data
        except UnicodeError:
            return ''


class DebuggerTCPHandler(socketserver.StreamRequestHandler):
    def handle(self):
        fileobj = FileLikeObject(self.rfile, self.wfile)
        sys.stdin = sys.stdout = sys.stderr = fileobj
        try:
            console = InteractiveConsole(locals=globals())
            console.interact(banner='== debug server ==', exitmsg='')
        except SystemExit:
            pass
        finally:
            sys.stdin = sys.__stdin__
            sys.stdout = sys.__stdout__
            sys.stderr = sys.__stderr__


def output_tracebacks():
    id2thread = {}
    for thread in threading.enumerate():
        id2thread[thread.ident] = thread
    for thread_id, stack in sys._current_frames().items():
        stack_list = traceback.format_list(traceback.extract_stack(stack))
        print('thread {}:'.format(id2thread[thread_id]))
        print(''.join(stack_list))


debugger = None


def start_debugger(signum, frame):
    print('start debugger...')
    server = socketserver.TCPServer(('localhost', 9999), DebuggerTCPHandler)
    t = threading.Thread(target=server.serve_forever)
    t.start()
    global debugger
    debugger = (server, t)
    print('started debugger')


def close_debugger(signum, frame):
    print('close debugger...')
    if debugger is None:
        print('closed debugger')
        return
    server, t = debugger
    server.shutdown()
    server.server_close()
    t.join()
    print('closed debugger')


def setup_backdoor():
    signal.signal(signal.SIGUSR1, start_debugger)
    signal.signal(signal.SIGUSR2, close_debugger)


def worker(q):
    while True:
        task = q.get()
        if task is None:
            break
        # do something with task
        time.sleep(1.2)


def producer(q):
    for x in range(100):
        q.put(x)
        time.sleep(1)
    q.put(None)


setup_backdoor()
q = Queue()
t1 = threading.Thread(target=producer, args=(q,))
t1.start()
t2 = threading.Thread(target=worker, args=(q,))
t2.start()
for t in [t1, t2]:
    t.join()
Run the program, activate the remote debugger via USR1 and close the remote debugging service via USR2 after debugging:
$ python testb.py &
[1] 87173
$ kill -s USR1 87173
start debugger...
started debugger
$
$ telnet 127.0.0.1 9999
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
== debug server ==
>>> output_tracebacks()
thread <Thread(Thread-3, started 123145482240000)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/xxx/lib/python3.6/socketserver.py", line 238, in serve_forever
    self._handle_request_noblock()
  File "/xxx/lib/python3.6/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/xxx/lib/python3.6/socketserver.py", line 348, in process_request
    self.finish_request(request, client_address)
  File "/xxx/lib/python3.6/socketserver.py", line 361, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/xxx/lib/python3.6/socketserver.py", line 696, in __init__
    self.handle()
  File "testb.py", line 52, in handle
    console.interact(banner='== debug server ==', exitmsg='')
  File "/xxx/lib/python3.6/code.py", line 233, in interact
    more = self.push(line)
  File "/xxx/lib/python3.6/code.py", line 259, in push
    more = self.runsource(source, self.filename)
  File "/xxx/lib/python3.6/code.py", line 75, in runsource
    self.runcode(code)
  File "/xxx/lib/python3.6/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "testb.py", line 66, in output_tracebacks
    stack_list = traceback.format_list(traceback.extract_stack(stack))

thread <Thread(Thread-2, started 123145476984832)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testb.py", line 108, in worker
    time.sleep(1.2)

thread <Thread(Thread-1, started 123145471729664)>:
  File "/xxx/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/xxx/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/xxx/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "testb.py", line 114, in producer
    time.sleep(1)

thread <_MainThread(MainThread, started 140736812057536)>:
  File "testb.py", line 125, in <module>
    t.join()
  File "/xxx/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/xxx/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):

>>> q
<queue.Queue object at 0x10a2e8fd0>
>>> q.qsize()
13
>>> q.qsize()
14
>>> exit()
Connection closed by foreign host.
$ jobs
[1]+ Running python testb.py &
$ kill -s USR2 87173
close debugger...
closed debugger
$ telnet 127.0.0.1 9999
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host
The code above is only a rough demonstration of how to implement a remote debugger. For a production-grade remote debugger, refer to and use full-featured third-party modules such as gevent.backdoor, twisted.conch.manhole, or ionelmc/python-manhole.
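Stripped of the socket plumbing, the rough demonstration rests on one idea: InteractiveConsole executes each line it is given against the namespace passed as locals. A minimal sketch of just that part follows, using a small hypothetical dictionary in place of the process's globals():

```python
from code import InteractiveConsole

# Stand-in for the process's globals(); the name and value are made up.
namespace = {'counter': 41}

console = InteractiveConsole(locals=namespace)

# push() compiles and runs one line of source in `namespace`.
# It returns True only when more input is needed to complete the statement.
more = console.push('counter += 1')

print(more, namespace['counter'])
```

Because the console and the program share the same dictionary, anything typed into the remote shell can change the live state of the service, which is why access to it has to be restricted.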
Summary
The two examples above are just common preset backdoors. You can preset other facilities as well, for example an HTTP server whose URLs return different runtime information or trigger auxiliary debugging operations. All of them serve the same goal: locating and solving problems as quickly as possible. Which backdoors to preset depends on your actual situation. Consider whether a backdoor could affect the normal operation of the service, and which information, obtained in which way, would help you locate and solve problems fastest. Most importantly, consider security: protect the backdoor properly and never expose its port to the Internet.
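The HTTP variant mentioned above could look roughly like the following sketch. It is an illustration under assumptions, not code from the original examples: the /threads path and the names dump_threads, DebugHandler and start_debug_http are hypothetical, and the server binds to 127.0.0.1 only, in line with the security advice.

```python
import sys
import threading
import traceback
from http.server import BaseHTTPRequestHandler, HTTPServer


def dump_threads():
    """Return the formatted stack of every running thread as one string."""
    id2thread = {t.ident: t for t in threading.enumerate()}
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append('thread {}:\n'.format(id2thread.get(thread_id, thread_id)))
        parts.extend(traceback.format_stack(frame))
    return ''.join(parts)


class DebugHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only one hypothetical endpoint; anything else is a 404.
        if self.path == '/threads':
            status, body = 200, dump_threads().encode('utf-8')
        else:
            status, body = 404, b'not found\n'
        self.send_response(status)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep debug requests out of the service's stderr


def start_debug_http(port=0):
    """Serve the debug endpoint on localhost; port 0 picks a free port."""
    server = HTTPServer(('127.0.0.1', port), DebugHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

After server = start_debug_http(9998), a request such as curl http://127.0.0.1:9998/threads returns the thread dump, and server.shutdown() followed by server.server_close() stops it again. Like the debugger above, it could be started and stopped from signal handlers instead of running permanently.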
Although the title and the examples in the article are Python related, the idea is not limited to Python, but can be applied to services written in other languages as well. Feel free to share and discuss debugging techniques and problem solving with me.
References
- 18.8. signal — Set handlers for asynchronous events — Python 3.6.4 documentation
- Python: get traceback information(threading/gevent/asyncio) - mozillazg's blog
- gevent.backdoor – Interactive greenlet-based network console that can be used in any process — gevent 1.3.0.dev0 documentation
- twisted.conch.manhole : API documentation
- ionelmc/python-manhole: Debugging manhole for python applications.
- 21.21. socketserver — A framework for network servers — Python 3.6.4 documentation
- 30.1. code — Interpreter base classes — Python 3.6.4 documentation