\chapter{Intercepting Function Calls}\label{ch:intercepting-function-calls}
Lorem Ipsum.
\section{Identified Methods for Intercepting Function and System Calls}\label{sec:methods-for-intercepting}
Lorem Ipsum.
\subsection{\texttt{ptrace} System Call}\label{subsec:ptrace}
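When researching how to intercept system calls on Linux, the first thing one comes across is the \texttt{ptrace} (``process trace'') system call.
It allows one process (the tracer) to observe and control the execution of another process (the tracee), including examining and changing the tracee's memory and registers.
The tracee stops each time a signal is delivered to it (except for \texttt{SIGKILL}), and control is handed to the tracer.
In addition, the tracer can request that the tracee is stopped at every entry to and exit from a system call (\texttt{PTRACE\_SYSCALL}), which is what makes \texttt{ptrace} suitable for intercepting system calls.
\cite{ptrace.2}
A ready-made command that builds on this system call already exists (see~\ref{subsec:strace}).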
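To illustrate how a tracer makes use of these stops, listing~\ref{lst:ptrace-count} shows a minimal sketch that starts a command as a tracee and counts the system calls it enters.
It is not part of the tooling developed in this work: the helper name \texttt{count\_syscalls} is hypothetical, the sketch assumes Linux with glibc, and it ignores threads and job-control stops, which a real tracer such as \texttt{strace} has to handle.
Setting \texttt{PTRACE\_O\_TRACESYSGOOD} lets the tracer distinguish system call stops from genuine \texttt{SIGTRAP} signals, and since these stops alternate between entry and exit, only every entry is counted.
Calling the function for the command \texttt{true} should return a small positive number.
\begin{listing}[htbp]
\begin{minted}[linenos]{c}
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (searched in PATH) as a traced child and
 * returns the number of system calls it entered, or -1 on error.
 * Threads and job-control stops are not handled. */
long count_syscalls(char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: become a tracee of the parent, then run the command.
         * A successful execve stops the child with SIGTRAP. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1; /* e.g. execvp failed and the child already exited */

    /* TRACESYSGOOD: syscall stops report SIGTRAP | 0x80.
     * TRACEEXEC: later execve calls cause an event stop instead of SIGTRAP. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL,
           (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC));

    long count = 0;
    int in_syscall = 0; /* syscall stops alternate between entry and exit */
    int sig = 0;        /* signal to pass on to the tracee when resuming */
    for (;;) {
        /* Resume the tracee until the next syscall entry/exit or signal. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) == -1 ||
            waitpid(pid, &status, 0) == -1) {
            kill(pid, SIGKILL);
            waitpid(pid, NULL, 0);
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return count;
        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            if (!in_syscall)
                count++; /* syscall-entry stop */
            in_syscall = !in_syscall;
        } else if (status >> 16 == 0) {
            /* Signal-delivery stop: forward the signal to the tracee. */
            sig = WSTOPSIG(status);
        } /* otherwise: ptrace event stop (exec), nothing to forward */
    }
}
\end{minted}
\caption{Sketch of a \texttt{ptrace}-based tracer counting the system calls of a command.}
\label{lst:ptrace-count}
\end{listing}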
\subsection{\texttt{strace} Command}\label{subsec:strace}
The \texttt{strace} (``system call/signal trace'') command runs a specified command and intercepts and records the system calls made by the resulting process as well as the signals it receives.
The name of each system call, its arguments and its return value are printed as one line, either to the standard error output or to a file specified with the \texttt{-o} option.
\cite{strace.1}
Listings \ref{lst:main.c} and \ref{lst:strace} give a simple example of what this output looks like.
Only system calls are recorded; calls to library functions (like \texttt{malloc} or \texttt{free}) do not appear.
Also note that the arguments of the calls are displayed in a ``pretty'' way.
For example, string arguments are passed as plain pointers, but \texttt{strace} reads the referenced memory of the traced process and displays them as C-like string literals.
\begin{listing}[htbp]
\begin{minted}[linenos]{c}
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(const int argc, char *const argv[]) {
char *str = malloc(10);
strcpy(str, "Abc123");
printf("Hello World!\nString: %s\n", str);
free(str);
}
\end{minted}
\caption{Contents of \texttt{main.c}.}
\label{lst:main.c}
\end{listing}
\begin{listing}[htbp]
\begin{minted}{text}
execve("./main", ["./main"], 0x7ffd63b32bb0 /* 71 vars */) = 0
[-- 32 lines omitted --]
write(1, "Hello World!\n", 13) = 13
write(1, "String: Abc123\n", 15) = 15
exit_group(0) = ?
+++ exited with 0 +++
\end{minted}
\caption{Output of \texttt{strace ./main}.}
\label{lst:strace}
\end{listing}
This approach works well for debugging and similar use cases,
but intercepting only system calls does not satisfy the requirements of this work.
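Since the tracee runs in a separate address space, a string argument cannot simply be dereferenced by the tracer.
Listing~\ref{lst:peek-string} sketches one way to copy such a string: it is read word by word with \texttt{PTRACE\_PEEKDATA} until the terminating null byte is found.
This is a simplified illustration, not a description of how \texttt{strace} is implemented; the helper name \texttt{read\_tracee\_string} is hypothetical, and the sketch assumes that the caller is already tracing the stopped process \texttt{pid}.
Note that reading a word that crosses into an unmapped page fails, even if the string itself ended before that page.
\begin{listing}[htbp]
\begin{minted}[linenos]{c}
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: copies the null-terminated string found at address
 * addr in the stopped tracee pid into buf, which holds size bytes.
 * Longer strings are truncated. Returns 0 on success and -1 on error. */
int read_tracee_string(pid_t pid, unsigned long addr, char *buf, size_t size) {
    if (size == 0)
        return -1;
    size_t copied = 0;
    while (copied < size - 1) {
        /* PEEKDATA returns one word of tracee memory. Since -1 is also a
         * valid word, errors can only be detected via errno. */
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + copied), NULL);
        if (word == -1 && errno != 0)
            return -1;
        /* Append the word's bytes, but never more than fits into buf. */
        size_t n = sizeof word;
        if (n > size - 1 - copied)
            n = size - 1 - copied;
        memcpy(buf + copied, &word, n);
        if (memchr(buf + copied, '\0', n) != NULL)
            return 0; /* terminator copied, buf holds the whole string */
        copied += n;
    }
    buf[copied] = '\0'; /* string was truncated */
    return 0;
}
\end{minted}
\caption{Sketch of reading a string argument from the memory of a traced process.}
\label{lst:peek-string}
\end{listing}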
\subsection{\texttt{ltrace} Command}\label{subsec:ltrace}
The \texttt{ltrace} (``library call trace'') command may be used to trace dynamic library calls instead of system calls.
It works similarly to \texttt{strace} (see \ref{subsec:strace}).
\cite{ltrace.1}
Listings \ref{lst:main.c} and \ref{lst:ltrace} illustrate what the output of \texttt{ltrace} looks like.
In contrast to the output of \texttt{strace}, only calls to library functions are now included in the output.
Therefore, far less ``noise'' is generated (compare the omitted lines in listing~\ref{lst:strace}).
Again, the function arguments are displayed in a ``pretty'' way.
To do so, \texttt{ltrace} relies on function prototypes~\cite{ltrace.conf.5}, which describe the parameter and return types of the traced functions.
\begin{listing}[htbp]
\begin{minted}{text}
malloc(10) = 0x55624164b2a0
printf("Hello World!\nString: %s\n", "Abc123") = 28
free(0x55624164b2a0) = <void>
+++ exited (status 0) +++
\end{minted}
\caption{Output of \texttt{ltrace ./main}.}
\label{lst:ltrace}
\end{listing}
This method fits the requirements of this work much better than \texttt{strace} (see~\ref{subsec:strace}),
but it is not very flexible and offers no means of modifying the intercepted function calls.
\subsection{Wrapper Functions in gcc}\label{subsec:wrapper-functions}
A different approach to intercepting function calls is to tell the compiler directly which functions should be intercepted.
The compiler, or more precisely the linker, then resolves calls to the specified functions to wrapper functions instead.
(See \ref{subsec:preloading} for more details on dynamic linking.)
The GNU linker \texttt{ld} includes such a feature.
See the OPTIONS section in the ld(1) Linux manual page~\cite{ld.1}:
\begin{quote}
\begin{description}
\item[\texttt{-{}-wrap=\textit{symbol}}]
Use a wrapper function for \texttt{\textit{symbol}}.
Any undefined reference to \texttt{\textit{symbol}} will be resolved to \texttt{\_\_wrap\_\textit{symbol}}.
Any undefined reference to \texttt{\_\_real\_\textit{symbol}} will be resolved to \texttt{\textit{symbol}}.
This can be used to provide a wrapper for a system function.
The wrapper function should be called \texttt{\_\_wrap\_\textit{symbol}}.
If it wishes to call the system function, it should call \texttt{\_\_real\_\textit{symbol}}.
\lbrack\dots\rbrack
\end{description}
\end{quote}
The gcc compiler supports this as well, as it allows passing options through to the linker.
See the OPTIONS section in the gcc(1) Linux manual page~\cite{gcc.1}:
\begin{quote}
\begin{description}
\item[\texttt{-Wl,\textit{option}}]
Pass \texttt{\textit{option}} as an option to the linker.
If \texttt{\textit{option}} contains commas, it is split into multiple options at the commas.
You can use this syntax to pass an argument to the option.
For example, \texttt{-Wl,-Map,output.map} passes \texttt{-Map output.map} to the linker.
When using the GNU linker, you can also get the same effect with \texttt{-Wl,-Map=output.map}.
\lbrack\dots\rbrack
\end{description}
\end{quote}
This means that, by specifying \texttt{-Wl,-{}-wrap=\textit{symbol}} when compiling with gcc,
all calls to \texttt{\textit{symbol}} from the program being compiled are redirected to \texttt{\_\_wrap\_\textit{symbol}}.
Inside the wrapper, the original function can be called via \texttt{\_\_real\_\textit{symbol}}.
Listings \ref{lst:wrap.c} and \ref{lst:wrap} illustrate this by wrapping the \texttt{malloc} function of the C standard library.
\begin{listing}[htbp]
\begin{minted}[linenos]{c}
#include <stddef.h>
extern void *__real_malloc(size_t size);
void *__wrap_malloc(size_t size) {
// before call to malloc
void *ret = __real_malloc(size);
// after call to malloc
return ret;
}
\end{minted}
\caption{Contents of \texttt{wrap.c}.}
\label{lst:wrap.c}
\end{listing}
\begin{listing}[htbp]
\begin{minted}{shell}
gcc -o main_wrapped main.c wrap.c -Wl,--wrap=malloc
./main_wrapped
\end{minted}
\caption{Compile \texttt{main.c} and \texttt{wrap.c} and run the resulting program.}
\label{lst:wrap}
\end{listing}
This approach allows wrapping any function in a relatively clean way.
However, it cannot be applied to arbitrary, already built binaries:
a given program has to be re-compiled (or at least re-linked) to make use of this feature of ld.
Therefore, the source code (or the corresponding object files, \texttt{*.o}) needs to be available.
Note that only calls from the re-linked code are redirected; calls from within other (shared) libraries are not.
In theory, it might be possible to re-link a given binary without having access to its source code.
However, since more straightforward methods exist (see \ref{subsec:preloading}), this has not been investigated further.
\subsection{Preloading using \texttt{LD\_PRELOAD}}\label{subsec:preloading}
To execute dynamically linked binaries on Linux systems, a dynamic linker is needed at runtime
(statically linked binaries do not need one).
On Linux, \texttt{ld.so} and \texttt{ld-linux.so} serve as dynamic linkers.
They find and load the shared objects (shared libraries) needed by a program, prepare the program, and finally run it.
\cite{ld.so.8}
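Which loaded object a function is actually taken from at runtime can be inspected with \texttt{dladdr}, a GNU extension declared in \texttt{dlfcn.h}.
Listing~\ref{lst:dladdr} is a minimal sketch; the helper name \texttt{defining\_object} is hypothetical.
In a dynamically linked, position-independent executable, calling \texttt{defining\_object((const void *)\&printf)} typically yields the path of the C standard library (e.g., \texttt{libc.so.6}), whereas in a non-PIE executable the address of \texttt{printf} may refer to a stub inside the executable itself.
With glibc versions older than 2.34, the program additionally has to be linked with \texttt{-ldl}.
\begin{listing}[htbp]
\begin{minted}[linenos]{c}
#define _GNU_SOURCE /* dladdr and Dl_info are GNU extensions */
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns the path name of the loaded object
 * (executable or shared library) whose mapping contains addr,
 * or NULL if addr does not belong to any loaded object. */
const char *defining_object(const void *addr) {
    Dl_info info;
    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}
\end{minted}
\caption{Sketch of determining the loaded object that contains a given function.}
\label{lst:dladdr}
\end{listing}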
As the overwhelming majority of programs are dynamically linked,
most calls to library functions (e.g., to functions of the C standard library) refer to a shared object that has to be loaded by the dynamic linker at runtime.
Therefore, it is possible to ``hijack'' (or intercept) these function calls,
provided that the dynamic linker allows loading other functions in place of the original ones.
Conveniently, \texttt{ld.so} supports exactly this by means of so-called ``preloading''.
See the ENVIRONMENT section in the ld.so(8) Linux manual page~\cite{ld.so.8}:
\begin{quote}
\begin{description}
\item[\texttt{LD\_PRELOAD}]
A list of additional, user-specified, ELF shared objects to be loaded before all others.
This feature can be used to selectively override functions in other shared objects.
\lbrack\dots\rbrack
\end{description}
\end{quote}
This means that, by setting the environment variable \texttt{LD\_PRELOAD} to a shared object that defines functions with the same names, it is possible to override these specific functions.
Listings \ref{lst:preload.c} and \ref{lst:preload} illustrate this by overriding the \texttt{malloc} function of the C standard library.
\begin{listing}[htbp]
\begin{minted}[linenos]{c}
#define _GNU_SOURCE // required for RTLD_NEXT
#include <stdlib.h>
#include <dlfcn.h>
#include <errno.h>
void *malloc(size_t size) {
// before call to malloc
void *(*_malloc)(size_t);
if ((_malloc = dlsym(RTLD_NEXT, "malloc")) == NULL) {
errno = ENOSYS;
return NULL;
}
void *ret = _malloc(size);
// after call to malloc
return ret;
}
\end{minted}
\caption{Contents of \texttt{preload.c}.}
\label{lst:preload.c}
\end{listing}
\begin{listing}[htbp]
\begin{minted}{shell}
# assumes ./main has already been compiled from main.c
gcc -shared -fPIC -o preload.so preload.c -ldl
LD_PRELOAD="$(pwd)/preload.so" ./main
\end{minted}
\caption{Compile \texttt{preload.c} and run a program with \texttt{LD\_PRELOAD}.}
\label{lst:preload}
\end{listing}
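The function \texttt{dlsym} is used to retrieve the address of the original \texttt{malloc} function.
The pseudo-handle \texttt{RTLD\_NEXT} instructs \texttt{dlsym} to find the next occurrence of \texttt{malloc} in the search order after the current object, i.e., the definition that is shadowed by the preloaded one.
Note that glibc only defines \texttt{RTLD\_NEXT} if the feature test macro \texttt{\_GNU\_SOURCE} is defined before \texttt{dlfcn.h} is included.
\cite{dlsym.3}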
Using this method, it is possible to override, and therefore wrap, any function that a program imports from a shared object, as long as the targeted binary is not statically linked.
However, one has to be aware that not only calls from the targeted binary itself, but also calls from within other libraries (e.g., calls to \texttt{malloc} made inside the C standard library) may be redirected to the overriding function.
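As a consequence, a wrapper has to be careful when it calls other functions that might end up in the overriding function again.
Listing~\ref{lst:preload-guard} sketches a common pattern for this, shown for \texttt{fopen} instead of \texttt{malloc}: the address of the original function is looked up only once, and a thread-local flag lets nested calls pass straight through to the original function.
To keep the example short, the nested call is made deliberately, by opening the log file with \texttt{fopen} itself; in practice, such calls typically happen indirectly inside other library functions.
The log path \texttt{/tmp/fopen-calls.log} and the counter \texttt{intercepted\_fopen\_calls} are hypothetical, the sketch assumes 64-bit Linux with glibc, and its lazy initialization is not thread-safe.
It can be compiled and preloaded in the same way as shown in listing~\ref{lst:preload}.
\begin{listing}[htbp]
\begin{minted}[linenos]{c}
#define _GNU_SOURCE /* required for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical log file and call counter, for illustration only. */
#define LOG_PATH "/tmp/fopen-calls.log"
unsigned long intercepted_fopen_calls = 0;

typedef FILE *(*fopen_fn)(const char *, const char *);

static fopen_fn real_fopen = NULL;       /* original fopen, looked up once */
static _Thread_local int in_wrapper = 0; /* nonzero while inside the wrapper */
static FILE *log_file = NULL;            /* opened lazily on first use */

FILE *fopen(const char *path, const char *mode) {
    if (real_fopen == NULL) {
        /* Find the definition that this one shadows (usually in libc). */
        real_fopen = (fopen_fn)dlsym(RTLD_NEXT, "fopen");
        if (real_fopen == NULL) {
            errno = ENOSYS;
            return NULL;
        }
    }
    /* Nested call made while already intercepting: do not log it. */
    if (in_wrapper)
        return real_fopen(path, mode);

    in_wrapper = 1;
    intercepted_fopen_calls++;
    if (log_file == NULL)
        log_file = fopen(LOG_PATH, "a"); /* re-enters fopen, see guard above */

    FILE *ret = real_fopen(path, mode);
    int saved_errno = errno; /* logging must not clobber errno of the call */
    if (log_file != NULL) {
        fprintf(log_file, "fopen(\"%s\", \"%s\") = %p\n", path, mode, (void *)ret);
        fflush(log_file);
    }
    errno = saved_errno;
    in_wrapper = 0;
    return ret;
}
\end{minted}
\caption{Sketch of a preloadable \texttt{fopen} wrapper with a reentrancy guard.}
\label{lst:preload-guard}
\end{listing}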
\subsection{Conclusion}\label{subsec:conclusion}
Lorem Ipsum.
\section{Combining Preloading and Wrapper Functions}\label{sec:combining-preloading-and-wrapper-functions}
Lorem Ipsum.
\section{Retrieving Function Argument Values}\label{sec:Retrieving-function-argument-values}
Lorem Ipsum.
\section{Determining Function Call Location}\label{sec:determining-function-call-location}
Lorem Ipsum.
\section{Example}\label{sec:intercepting-example}
Lorem Ipsum.
\section{Analyzing Intercepted Function Calls}\label{sec:analyzing-intercepted-function-calls}
Lorem Ipsum.
\section{Parsing Intercepted Function Calls in Python}\label{sec:parsing-intercepted-function-calls}
Lorem Ipsum.
\section{Automated Testing on Intercepted Function Calls}\label{sec:automated-testing-on-intercepted-function-calls}
Lorem Ipsum.