\chapter{Intercepting Function Calls}\label{ch:intercepting-function-calls}
|
|
|
|
This chapter discusses the steps taken in this work to intercept function calls.
|
|
An example of what the resulting interception looks like may be found in section \ref{sec:intercepting-example}.
|
|
Furthermore, an overview of how to test given programs is presented in section \ref{sec:automated-testing-on-intercepted-function-calls}.
|
|
This chapter does not discuss how these function calls may be manipulated in any way.
|
|
For that see chapter \ref{ch:manipulating-function-calls}.
|
|
|
|
|
|
\section{Identified Methods for Intercepting Function and System Calls}\label{sec:methods-for-intercepting}
|
|
|
|
First, one has to answer the question of \textit{how exactly} to intercept function or system calls.
|
|
At the beginning of this work it was not yet determined if the interception of function calls, system calls, or both should be used to achieve the overarching goal (see\todo{Goal}).
|
|
This first section lists the methods identified for intercepting function or system calls but does not claim completeness.
|
|
The order of the following subsections roughly follows the thought process of finding the most appropriate method for this work.
|
|
|
|
|
|
\subsection{\texttt{ptrace} System Call}\label{subsec:ptrace}
|
|
|
|
The first result one encounters when researching how to intercept system calls on Linux is the \texttt{ptrace} (``process trace'') system call.
|
|
This system call allows one process to observe and control the execution of another process (including memory and registers).
|
|
Control is handed from the traced process to the tracing process each time a signal is delivered.
|
|
\cite{ptrace.2}
|
|
|
|
A command that makes use of this system call already exists.
|
|
See~\ref{subsec:strace}.
|
|
|
|
|
|
\subsection{\texttt{strace} Command}\label{subsec:strace}
|
|
|
|
The \texttt{strace} (``system call/signal trace'') command may be used to run a specified command and to thereby intercept and record the system calls which are made.
|
|
Each system call is recorded as a line and either written to the standard error output or a specified file.
|
|
\cite{strace.1}
|
|
|
|
Listings \ref{lst:main.c} and \ref{lst:strace} give a simple example of what this output looks like.
|
|
It is clearly visible that only (``pure'') system calls are recorded, and calls to library functions (like \texttt{malloc} or \texttt{free}) do not appear.
|
|
Also note that arguments to the calls are displayed in a ``pretty'' way.
|
|
For example, string arguments would be simple pointers, but \texttt{strace} displays them as C-like strings.
|
|
|
|
\begin{listing}[htbp]
|
|
\inputminted[linenos]{c}{listings/main.c}
|
|
\caption{Contents of \texttt{main.c}.}
|
|
\label{lst:main.c}
|
|
\end{listing}
|
|
|
|
\begin{listing}[htbp]
|
|
\begin{minted}{text}
execve("./main", ["./main"], 0x7ffd63b32bb0 /* 71 vars */) = 0
[-- 32 lines omitted --]
write(1, "Hello World!\n", 13) = 13
write(1, "String: Abc123\n", 15) = 15
exit_group(0) = ?
+++ exited with 0 +++
\end{minted}
|
|
\caption{Output of \texttt{strace ./main}.}
|
|
\label{lst:strace}
|
|
\end{listing}
|
|
|
|
This approach works great for debugging and other use-cases,
|
|
but intercepting only system calls does not satisfy the requirements for this work.
|
|
|
|
|
|
\subsection{\texttt{ltrace} Command}\label{subsec:ltrace}
|
|
|
|
The \texttt{ltrace} (``library call trace'') command may be used to trace dynamic library calls instead of system calls.
|
|
It works similarly to \texttt{strace} (see \ref{subsec:strace}).
|
|
\cite{ltrace.1}
|
|
|
|
Listings \ref{lst:main.c} and \ref{lst:ltrace} illustrate what the output of \texttt{ltrace} looks like.
|
|
In contrast to the output of \texttt{strace}, only ``real'' calls to library functions are now included in the output.
|
|
Therefore, a lot less ``noise'' is generated (see omitted lines in listing \ref{lst:strace}).
|
|
Again, the function arguments are displayed in a ``pretty'' way.
|
|
This command uses so-called prototype functions~\cite{ltrace.conf.5} to format function arguments.
|
|
|
|
\begin{listing}[htbp]
|
|
\begin{minted}{text}
malloc(10) = 0x55624164b2a0
printf("Hello World!\nString: %s\n", "Abc123") = 28
free(0x55624164b2a0) = <void>
+++ exited (status 0) +++
\end{minted}
|
|
\caption{Output of \texttt{ltrace ./main}.}
|
|
\label{lst:ltrace}
|
|
\end{listing}
|
|
|
|
This method fits the requirements for this work a lot better than \texttt{strace} (see~\ref{subsec:strace}),
|
|
but it is not very flexible and offers no means to modify the intercepted function calls.
|
|
|
|
\subsection{Kernel Module}\label{subsec:kernel-module}
|
|
|
|
Another possibility to intercept system calls is to intercept them directly in the kernel via a kernel module.
|
|
However, this work did not explore this approach further due to time constraints and other, better-fitting alternatives.
|
|
See \cite[Section~7.2]{netsectools2005} for more details on how to intercept system calls using kernel modules.
|
|
|
|
|
|
\subsection{Wrapper Functions in gcc}\label{subsec:wrapper-functions}
|
|
|
|
A different approach to intercepting function calls is to tell the compiler directly which functions should be intercepted.
|
|
The compiler, and the linker respectively, then directly link calls to the specified functions to wrapper functions.
|
|
(See \ref{subsec:preloading} for more details.)
|
|
|
|
The default linker \texttt{ld} includes such a feature.
|
|
See the OPTIONS section in the ld(1) Linux manual page~\cite{ld.1}:
|
|
|
|
\begin{quote}
|
|
\begin{description}
|
|
\item[\texttt{-{}-wrap=\textit{symbol}}]
|
|
Use a wrapper function for \texttt{\textit{symbol}}.
|
|
Any undefined reference to \texttt{\textit{symbol}} will be resolved to \texttt{\_\_wrap\_\textit{symbol}}.
|
|
Any undefined reference to \texttt{\_\_real\_\textit{symbol}} will be resolved to \texttt{\textit{symbol}}.
|
|
|
|
This can be used to provide a wrapper for a system function.
|
|
The wrapper function should be called \texttt{\_\_wrap\_\textit{symbol}}.
|
|
If it wishes to call the system function, it should call \texttt{\_\_real\_\textit{symbol}}.
|
|
\lbrack\dots\rbrack
|
|
\end{description}
|
|
\end{quote}
|
|
|
|
The gcc compiler also supports this by allowing options to be passed to the linker.
|
|
See the OPTIONS section in the gcc(1) Linux manual page~\cite{gcc.1}:
|
|
|
|
\begin{quote}
|
|
\begin{description}
|
|
\item[\texttt{-Wl,\textit{option}}]
|
|
Pass \texttt{\textit{option}} as an option to the linker.
|
|
If \texttt{\textit{option}} contains commas, it is split into multiple options at the commas.
|
|
You can use this syntax to pass an argument to the option.
|
|
For example, \texttt{-Wl,-Map,output.map} passes \texttt{-Map output.map} to the linker.
|
|
When using the GNU linker, you can also get the same effect with \texttt{-Wl,-Map=output.map}.
|
|
\lbrack\dots\rbrack
|
|
\end{description}
|
|
\end{quote}
|
|
|
|
This means that by specifying \texttt{-Wl,-{}-wrap=\textit{symbol}} when compiling with gcc,
|
|
all calls from the currently compiled program to \texttt{\textit{symbol}} are redirected to \texttt{\_\_wrap\_\textit{symbol}}.
|
|
To call the real function inside the wrapper, \texttt{\_\_real\_\textit{symbol}} may be used.
|
|
Listings \ref{lst:wrap.c} and \ref{lst:wrap} illustrate this by overriding the \texttt{malloc} function of the C standard library.
|
|
|
|
\begin{listing}[htbp]
|
|
\inputminted[linenos]{c}{listings/wrap.c}
|
|
\caption{Contents of \texttt{wrap.c}.}
|
|
\label{lst:wrap.c}
|
|
\end{listing}
|
|
|
|
\begin{listing}[htbp]
|
|
\begin{minted}{shell}
gcc -o main_wrapped main.c wrap.c -Wl,--wrap=malloc
./main_wrapped
\end{minted}
|
|
\caption{Compile \texttt{main.c} and \texttt{wrap.c} and run the resulting program.}
|
|
\label{lst:wrap}
|
|
\end{listing}
|
|
|
|
This approach allows wrapping any function in a relatively clean way.
|
|
However, it is not possible to override functions in an arbitrary, already linked binary.
|
|
It is required to re-compile (or to re-link) a given program to use this feature of ld.
|
|
Therefore, the source code (or the corresponding \texttt{*.o} object files) needs to be available.
|
|
Note that only calls from the targeted source code are redirected; calls from other libraries are not.
|
|
|
|
Theoretically, it should be possible to re-link a given binary without having access to its source code.
|
|
However, because other, more straightforward methods exist (see \ref{subsec:preloading}), this has not been investigated further.
|
|
|
|
|
|
\subsection{Preloading using \texttt{LD\_PRELOAD}}\label{subsec:preloading}
|
|
|
|
To execute binary files on Linux systems, a dynamic linker is needed at runtime.
|
|
(Unless the binaries were statically linked at compile-time.)
|
|
Usually, \texttt{ld.so} and \texttt{ld-linux.so} are used as dynamic linkers.
|
|
They find and load the shared objects (shared libraries) needed by a program, prepare the program and finally run it.
|
|
\cite{ld.so.8}
|
|
|
|
As the overwhelming majority of programs are dynamically linked,
|
|
most function calls to other libraries (like to the C standard library) reference a shared object, which has to be loaded by the linker at runtime.
|
|
Therefore, it would be possible to ``hijack'' (or intercept) these function calls
|
|
if the linker allowed other functions to be loaded in place of the original ones.
|
|
|
|
Luckily, \texttt{ld.so} allows this so-called ``preloading''.
|
|
See the ENVIRONMENT section in the ld.so(8) Linux manual page~\cite{ld.so.8}:
|
|
|
|
\begin{quote}
|
|
\begin{description}
|
|
\item[\texttt{LD\_PRELOAD}]
|
|
A list of additional, user-specified, ELF shared objects to be loaded before all others.
|
|
This feature can be used to selectively override functions in other shared objects.
|
|
\lbrack\dots\rbrack
|
|
\end{description}
|
|
\end{quote}
|
|
|
|
This means that by setting the environment variable \texttt{LD\_PRELOAD}, it is possible to override specific functions.
|
|
Listings \ref{lst:preload.c} and \ref{lst:preload} illustrate this by overriding the \texttt{malloc} function of the C standard library.
|
|
|
|
\begin{listing}[htbp]
|
|
\inputminted[linenos]{c}{listings/preload.c}
|
|
\caption{Contents of \texttt{preload.c}.}
|
|
\label{lst:preload.c}
|
|
\end{listing}
|
|
|
|
\begin{listing}[htbp]
|
|
\begin{minted}{shell}
# ./main is already compiled and ready
gcc -shared -fPIC -o preload.so preload.c
LD_PRELOAD="$(pwd)/preload.so" ./main
\end{minted}
|
|
\caption{Compile \texttt{preload.c} and run a program with \texttt{LD\_PRELOAD}.}
|
|
\label{lst:preload}
|
|
\end{listing}
|
|
|
|
The function \texttt{dlsym} is used to retrieve the original address of the \texttt{malloc} function.
|
|
The handle \texttt{RTLD\_NEXT} instructs \texttt{dlsym} to find the next occurrence of \texttt{malloc} in the search order after the current object.
|
|
\cite{dlsym.3}
|
|
|
|
By using this method, it is possible to override, and therefore wrap, any function as long as the targeted binary was not statically linked.
|
|
However, one has to be aware that not only function calls inside the targeted binary, but also calls inside other libraries (e.g., to \texttt{malloc}) are redirected to the overriding function.
|
|
|
|
|
|
\subsection{Conclusion}\label{subsec:methods-for-intercepting-conclusion}
|
|
|
|
During the research on different approaches to intercepting system and function calls,
|
|
it has been found that the most reliable way to achieve the goals of this work (see \todo{goals}) is to intercept function calls instead of system calls.
|
|
This is because (as long as the programs to test are dynamically linked), intercepting function calls allows one to intercept many more calls and in a more flexible way.
|
|
Therefore, from now on this work only considers function calls and no system calls directly.
|
|
|
|
For this work, preloading (see \ref{subsec:preloading}) was chosen
|
|
because it is simple to use (``clean'' source code, easy to compile and run programs with it) and offers the means to arbitrarily execute code when the intercepted function call is redirected.
|
|
The following sections describe what else is needed to create a powerful ``interceptor''.
|
|
|
|
|
|
\section{Fundamental Project Structure}\label{sec:fundameltal-project-structure}
|
|
|
|
After deciding to use the preloading method to intercept function calls, a more detailed plan is needed to continue developing.
|
|
It was decided to have one single \texttt{intercept.so} file as a resulting artifact which then may be loaded via the \texttt{LD\_PRELOAD} environment variable.
|
|
The easiest and most straightforward way to structure the source code was to put all code in one single C file.
|
|
Listing \ref{lst:intercept-preload.c} gives an overview of the basic code structure.
|
|
Each function that should be intercepted simply has to be declared and defined in the same way as \texttt{malloc}.
|
|
|
|
\begin{listing}[htbp]
|
|
\inputminted[linenos]{c}{listings/intercept-preload.c}
|
|
\caption{Contents of \texttt{intercept-preload.c}.}
|
|
\label{lst:intercept-preload.c}
|
|
\end{listing}
|
|
|
|
|
|
\section{Retrieving Function Argument Values}\label{sec:retrieving-function-argument-values}
|
|
|
|
Now that the first steps have been done, one needs to think about what exactly to record when intercepting.
|
|
A simple notification that a given function was called would be too little.
|
|
The following subsections describe how to extract as much information as possible from each function call.
|
|
|
|
As already mentioned, \texttt{ltrace} uses prototype functions to format its function arguments.
|
|
This allows \texttt{ltrace} to ``dynamically'' display function arguments for any new or unknown functions without the need for recompilation.
|
|
\cite{ltrace.conf.5}
|
|
|
|
However, due to the implementation complexity and the need for ``complex''\todo{} return types (see~\ref{sec:retrieving-function-return-values}), a statically compiled approach has been used for this work.
|
|
This means that each function formats its arguments and return values itself without any configuration option.
|
|
|
|
The reason for retrieving as much information as possible from each function call is to make it possible, at a later point in time, to completely reconstruct the exact function calls and their sequence.
|
|
This allows analysis on these records to be performed independently of the corresponding execution of the program.
|
|
It should always be possible for any parser to fully parse the recorded calls without knowledge of specific functions, their argument types, or their return value types.
|
|
|
|
|
|
\subsection{Numbers}\label{subsec:retrieving-numbers}
|
|
|
|
The simplest argument types are plain numbers, like integers (\texttt{int}, \texttt{long}, \ldots) or floating point numbers (\texttt{float}, \texttt{double}).
|
|
(In fact, \textit{all} arguments are represented as numbers or integers.
|
|
See the following subsections for examples.)
|
|
Plain numbers may be formatted simply as what they are, in base 10 notation, or with a prefix like \texttt{0x} for hexadecimal or \texttt{0} for octal representation.
|
|
|
|
Example: \texttt{malloc(123)} (or \texttt{malloc(0x7B)}).
|
|
|
|
\subsection{Unspecific Pointers}\label{subsec:retrieving-unspecific-pointers}
|
|
|
|
Pointers about which no further information is known (like \texttt{void *}) are essentially integers.
|
|
Therefore, they may be treated as such.
|
|
|
|
Example: \texttt{free(0x55624164b2a0)}.
|
|
|
|
\subsection{Strings and Buffers}\label{subsec:retrieving-strings-buffers}
|
|
|
|
Strings in C are simply pointers to a null-terminated sequence of bytes in memory.
|
|
This means that the strings end with the first occurrence of the null-byte (\texttt{0x00}).
|
|
To distinguish unspecific pointers from pointers to strings, it was chosen to use a colon (\texttt{:}) after the numerical value of the pointer.
|
|
The colon is followed by the contents of the string enclosed in double quotes (\texttt{"}).
|
|
Special values inside the string are escaped with a backslash.
|
|
|
|
Example: \texttt{sem\_unlink(0x1234:"/test-semaphore")}.
|
|
|
|
Another type of ``string'' in C is a buffer with a known length.
|
|
When buffers are used, usually another argument is passed to the function which indicates the length of the buffer.
|
|
This fact may be used to print out the contents of the buffer in the same way as normal C strings.
|
|
|
|
Example: \texttt{write(3, 0x1234:"Test\textbackslash{}x00ABC", 8)}.
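
The buffer notation can be sketched in C as follows.
This is only a minimal illustration and not the code used in this work: the helper name \texttt{format\_buffer} and the exact set of escaped characters are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `<pointer>:"<escaped bytes>"` into out.
 * Returns the length of the result, or -1 if out is too small. */
static int format_buffer(char *out, size_t out_size, const void *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    const unsigned char *bytes = buf;

    /* Pointer value first, then a colon and the opening quote. */
    int n = snprintf(out, out_size, "%#lx:\"", (unsigned long)(uintptr_t)buf);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    size_t pos = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = bytes[i];
        char esc[5]; /* longest escape is "\xNN" plus terminator */
        if (c == '"' || c == '\\')
            snprintf(esc, sizeof esc, "\\%c", c);
        else if (c == '\n')
            snprintf(esc, sizeof esc, "\\n");
        else if (c < 0x20 || c >= 0x7f)
            snprintf(esc, sizeof esc, "\\x%02X", (unsigned int)c);
        else
            snprintf(esc, sizeof esc, "%c", c);

        size_t esc_len = strlen(esc);
        if (pos + esc_len >= out_size)
            return -1;
        memcpy(out + pos, esc, esc_len);
        pos += esc_len;
    }

    /* Closing quote and terminator. */
    if (pos + 2 > out_size)
        return -1;
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}
```

Because the length is passed explicitly, embedded null bytes are printed as \texttt{\textbackslash{}x00} instead of ending the output; for null-terminated strings the same helper could be called with \texttt{strlen(buf)} as the length.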
|
|
|
|
\subsection{Flags}\label{subsec:retrieving-flags}
|
|
|
|
Some functions have one of their arguments dedicated to flags which may be combined by bitwise OR.
|
|
These arguments are also of type integer.
|
|
To distinguish flag arguments from others, a pipe symbol (\texttt{|}) is used after the colon and between the flags.
|
|
|
|
Example: \texttt{open(0x1234:"test.txt", 0102:|O\_CREAT|O\_RDWR|, 0644)}.
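
A possible way to produce this notation for the flags of \texttt{open} is sketched below.
The helper names \texttt{append} and \texttt{format\_open\_flags}, the selection of flags in the table, and the output order (bit flags first, access mode last) are assumptions made for illustration only.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

struct flag_name { int value; const char *name; };

/* Appends s to out at *pos; returns -1 if out would overflow. */
static int append(char *out, size_t out_size, size_t *pos, const char *s)
{
    size_t len = strlen(s);
    if (*pos + len >= out_size)
        return -1;
    memcpy(out + *pos, s, len + 1); /* copies the terminator as well */
    *pos += len;
    return 0;
}

/* Hypothetical helper: formats open(2) flags as `<octal>:|NAME|NAME|`. */
static int format_open_flags(char *out, size_t out_size, int flags)
{
    static const struct flag_name bit_flags[] = {
        { O_CREAT, "O_CREAT" }, { O_EXCL, "O_EXCL" }, { O_TRUNC, "O_TRUNC" },
        { O_APPEND, "O_APPEND" }, { O_NONBLOCK, "O_NONBLOCK" },
    };
    assert(out != NULL && out_size > 0);

    int n = snprintf(out, out_size, "%#o:|", (unsigned int)flags);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    size_t pos = (size_t)n;

    /* Each known bit flag that is set contributes its name. */
    for (size_t i = 0; i < sizeof bit_flags / sizeof bit_flags[0]; i++) {
        if ((flags & bit_flags[i].value) == 0)
            continue;
        if (append(out, out_size, &pos, bit_flags[i].name) != 0 ||
            append(out, out_size, &pos, "|") != 0)
            return -1;
    }

    /* The access mode is not a bit flag (O_RDONLY is 0), so it is masked out. */
    int access = flags & O_ACCMODE;
    const char *mode = access == O_RDWR ? "O_RDWR"
                     : access == O_WRONLY ? "O_WRONLY" : "O_RDONLY";
    if (append(out, out_size, &pos, mode) != 0 ||
        append(out, out_size, &pos, "|") != 0)
        return -1;
    return (int)pos;
}
```

Keeping the numerical value in front of the names means that a parser which does not know a flag name can still fall back to the plain integer.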
|
|
|
|
\subsection{Constants}\label{subsec:retrieving-constants}
|
|
|
|
For some functions constants are used.
|
|
These constants are typically defined as C macros in the source code.
|
|
This makes the source code more readable (and portable).
|
|
Constants are represented as an integer, again followed by a colon, this time without any special characters to distinguish them from other types.
|
|
|
|
Example: \texttt{socket(2:AF\_INET, 1:SOCK\_STREAM, 6)}.
|
|
|
|
\subsection{Pointers to Arrays}\label{subsec:retrieving-pointers-to-arrays}
|
|
|
|
Sometimes arrays are used as arguments.
|
|
Arrays in C work similarly to strings: they are either terminated by a null element (an element of value 0), or their length is given explicitly.
|
|
To represent them, square brackets (\texttt{[]}) are used, with commas (\texttt{,}) separating the elements.
|
|
Each element may be represented as an ``argument'' on its own (as illustrated by the example).
|
|
|
|
Example: \\
|
|
\texttt{getopt(2, 0x7f0b8:[0x7feb3:"./main", 0x7fee6:"arg"], 0x123:"v")}.
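
For a null-terminated array of strings such as \texttt{argv}, the notation could be generated roughly as follows.
The helpers \texttt{appendf} and \texttt{format\_string\_array} are hypothetical, and escaping of special characters inside the strings is omitted to keep the sketch short.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text at *pos; returns -1 if out would overflow. */
static int appendf(char *out, size_t out_size, size_t *pos, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(out + *pos, out_size - *pos, fmt, args);
    va_end(args);
    if (n < 0 || (size_t)n >= out_size - *pos)
        return -1;
    *pos += (size_t)n;
    return 0;
}

/* Hypothetical helper: formats a NULL-terminated string array as
 * `<pointer>:[<pointer>:"<string>", ...]`. */
static int format_string_array(char *out, size_t out_size, char *const array[])
{
    assert(out != NULL && out_size > 0 && array != NULL);
    size_t pos = 0;

    if (appendf(out, out_size, &pos, "%#lx:[", (unsigned long)(uintptr_t)array) != 0)
        return -1;

    /* Every element is formatted like a stand-alone string argument. */
    for (size_t i = 0; array[i] != NULL; i++) {
        if (appendf(out, out_size, &pos, "%s%#lx:\"%s\"", i > 0 ? ", " : "",
                    (unsigned long)(uintptr_t)array[i], array[i]) != 0)
            return -1;
    }

    if (appendf(out, out_size, &pos, "]") != 0)
        return -1;
    return (int)pos;
}
```

An array with an explicitly given length would use the same loop with a counter as the termination condition instead of the null element.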
|
|
|
|
\subsection{Pointers to Structures}\label{subsec:retrieving-pointers-to-structures}
|
|
|
|
In rare cases structures (\texttt{struct}) are used as argument types.
|
|
Two curly brackets (\texttt{\{\}}) are used to indicate structures.
|
|
Then the field names are displayed plainly, followed by a colon and then the value of that field.
|
|
Commas are used to separate the fields.
|
|
|
|
Example: \texttt{\tiny connect(2, 0x123:\{sa\_family: 2:AF\_INET, sin\_addr: "1.1.1.1", sin\_port: 80\}, 16)}.
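
As an illustration, an IPv4 socket address could be rendered in this notation as sketched below.
The helper name \texttt{format\_sockaddr\_in} is an assumption; the field names in the output follow the example above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: formats an IPv4 socket address as
 * `<pointer>:{sa_family: 2:AF_INET, sin_addr: "<ip>", sin_port: <port>}`.
 * Returns the length of the result, or -1 on error. */
static int format_sockaddr_in(char *out, size_t out_size, const struct sockaddr_in *addr)
{
    assert(out != NULL && addr != NULL);

    /* Only AF_INET is handled in this sketch. */
    char ip[INET_ADDRSTRLEN];
    if (addr->sin_family != AF_INET ||
        inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof ip) == NULL)
        return -1;

    /* The port is stored in network byte order and converted for display. */
    int n = snprintf(out, out_size,
                     "%#lx:{sa_family: %d:AF_INET, sin_addr: \"%s\", sin_port: %u}",
                     (unsigned long)(uintptr_t)addr, (int)addr->sin_family, ip,
                     (unsigned int)ntohs(addr->sin_port));
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

The address family is printed using the notation for constants, so the different argument notations compose inside a structure.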
|
|
|
|
|
|
\section{Retrieving Function Return Values}\label{sec:retrieving-function-return-values}
|
|
|
|
It might seem that retrieving return values of functions is as straightforward as retrieving their arguments, but this is not entirely the case.
|
|
Many libc functions return \texttt{-1} (or a null pointer) on error and set \texttt{errno} to indicate the exact type of error.
|
|
Other functions (like \texttt{read}, \texttt{pipe}, or \texttt{sem\_getvalue}) even store their output in memory referenced by a pointer which was passed to them as an argument.
|
|
The following examples illustrate how this challenge was solved.
|
|
|
|
Example (\texttt{malloc}): \\
|
|
\texttt{return 0x1234; errno 0}, \\
|
|
\texttt{return 0; errno ENOMEM}.
|
|
|
|
Example (\texttt{pipe}): \\
|
|
\texttt{return 0; errno 0; fildes=[3,4]}, \\
|
|
\texttt{return -1; errno ENFILE}.
|
|
|
|
Example (\texttt{read}): \\
|
|
\texttt{return 12; errno 0; buf=0x7fff70:"Hello World!"}, \\
|
|
\texttt{return -1; errno EINTR}.
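
The return record for \texttt{pipe} could, for instance, be assembled as in the following sketch.
The names \texttt{errno\_name} and \texttt{format\_pipe\_result} are hypothetical, the table of error names is deliberately incomplete, and printing \texttt{errno 0} on success is an assumption derived from the examples above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Maps a few errno values to their symbolic names; NULL if unknown. */
static const char *errno_name(int err)
{
    switch (err) {
    case 0:      return "0";
    case EINTR:  return "EINTR";
    case EBADF:  return "EBADF";
    case ENOMEM: return "ENOMEM";
    case ENFILE: return "ENFILE";
    case EMFILE: return "EMFILE";
    default:     return NULL;
    }
}

/* Hypothetical helper: formats the result of pipe(fildes).
 * On success the output argument fildes is appended to the record. */
static int format_pipe_result(char *out, size_t out_size, int ret, int err,
                              const int fildes[2])
{
    assert(out != NULL && fildes != NULL);

    /* Unknown error numbers fall back to their decimal value. */
    char number[16];
    const char *name = errno_name(ret == 0 ? 0 : err);
    if (name == NULL) {
        snprintf(number, sizeof number, "%d", err);
        name = number;
    }

    int n;
    if (ret == 0)
        n = snprintf(out, out_size, "return %d; errno %s; fildes=[%d,%d]",
                     ret, name, fildes[0], fildes[1]);
    else
        n = snprintf(out, out_size, "return %d; errno %s", ret, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

In an actual wrapper, \texttt{errno} would have to be read immediately after the call to the original function, before any other library call can overwrite it.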
|
|
|
|
|
|
\section{Determining Function Call Location}\label{sec:determining-function-call-location}
|
|
|
|
\todo{}
|
|
Besides argument values and return values, it would be interesting to know from where inside the intercepted program the function call originated.
|
|
At first this seems quite impossible.
|
|
But\dots
|
|
|
|
\subsection{Return Address and Relative Position}\label{subsec:return-address-and-relative-position}
|
|
|
|
\todo{}
|
|
See in the manual of GCC~\cite[Section~7.6]{gcc}:
|
|
|
|
\begin{quote}
|
|
\begin{description}
|
|
\item[\texttt{void *\_\_builtin\_return\_address(unsigned int \textit{level})}] \ \
|
|
|
|
This function returns the return address of the current function, or of one of its callers.
|
|
The \textit{level} argument is number of frames to scan up the call stack.
|
|
A value of \texttt{0} yields the return address of the current function, a value of \texttt{1} yields the return address of the caller of the current function, and so forth.
|
|
\lbrack\dots\rbrack
|
|
\end{description}
|
|
\end{quote}
|
|
|
|
\todo{}
|
|
See the dladdr(3) Linux manual page~\cite{dladdr.3}:
|
|
|
|
\begin{quote}
|
|
|
|
\begin{description}
|
|
\item[\texttt{int dladdr(const void *addr, Dl\_info *info)}] \ \
|
|
|
|
The function \texttt{dladdr()} determines whether the address specified in \textit{addr} is located in one of the shared objects loaded by the calling application.
|
|
If it is, then \texttt{dladdr()} returns information about the shared object and symbol that overlaps \textit{addr}.
|
|
This information is returned in a \texttt{Dl\_info} structure:
|
|
|
|
\begin{minted}{C}
typedef struct {
    const char *dli_fname;  /* Pathname of shared object
                               that contains address */
    void       *dli_fbase;  /* Base address at which
                               shared object is loaded */
    const char *dli_sname;  /* Name of symbol whose
                               definition overlaps addr */
    void       *dli_saddr;  /* Exact address of symbol
                               named in dli_sname */
} Dl_info;
\end{minted}
|
|
|
|
\lbrack\dots\rbrack
|
|
\end{description}
|
|
\end{quote}
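
The two facilities quoted above can be combined as in the following sketch, which resolves the return address of the current function to the containing object and an offset relative to its base address.
The helper name \texttt{describe\_caller} and the output format are assumptions; \texttt{dladdr} is a GNU extension that requires \texttt{\_GNU\_SOURCE} and, with glibc versions before 2.34, linking with \texttt{-ldl}.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes `<return address> <object path>+<offset>` into out.
 * It must not be inlined, otherwise the return address would belong
 * to the caller's caller. */
__attribute__((noinline))
static int describe_caller(char *out, size_t out_size)
{
    assert(out != NULL);

    /* Address of the instruction following the call to this function. */
    void *ret_addr = __builtin_return_address(0);

    Dl_info info;
    if (dladdr(ret_addr, &info) == 0 || info.dli_fname == NULL)
        return snprintf(out, out_size, "%p", ret_addr);

    /* The offset relative to the load address stays the same between runs,
     * even when address space layout randomization is active. */
    uintptr_t offset = (uintptr_t)ret_addr - (uintptr_t)info.dli_fbase;
    return snprintf(out, out_size, "%p %s+%#lx", ret_addr, info.dli_fname,
                    (unsigned long)offset);
}
```

Inside an overriding function, the same two calls would identify the object from which the intercepted function was called.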
|
|
|
|
|
|
\subsection{Source File and Line Number}\label{subsec:source-file-and-line-number}
|
|
|
|
\todo{}
|
|
See the OPTIONS section in the readelf(1) Linux manual page~\cite{readelf.1}:
|
|
|
|
\begin{quote}
|
|
\begin{description}
|
|
\item[\texttt{-{}-debug-dump}]
|
|
Displays the contents of the DWARF debug sections in the file, if any are present.
|
|
[\dots]
|
|
The letters and words refer to the following information:
|
|
\begin{description}
|
|
\item {}[\dots]
|
|
\item[\texttt{=rawline}] Displays the contents of the \texttt{.debug\_line} section in a raw format.
|
|
\item[\texttt{=decodedline}] Displays the interpreted contents of the \texttt{.debug\_line} section.
|
|
\item {}[\dots]
|
|
\end{description}
|
|
\end{description}
|
|
\end{quote}
|
|
|
|
|
|
\section{\texttt{intercept.so} Library}\label{sec:intercept.so-library}
|
|
|
|
The time has come for putting it all together.
|
|
As mentioned in \ref{sec:fundameltal-project-structure}, almost the whole project exists in one source file, \texttt{intercept.c}.
|
|
This file is compiled to \texttt{intercept.so}, which may be preloaded using \texttt{LD\_PRELOAD} and controlled with other environment variables.
|
|
These other environment variables are described in the following:
|
|
|
|
\begin{description}
|
|
\item[\texttt{INTERCEPT}]
|
|
This variable has to be set to enable function call interception.
|
|
The value determines where the recorded function calls are written.
|
|
Values may be \texttt{stdout}, \texttt{stderr}, \texttt{file:\textit{<path>}}, \texttt{unix:\textit{<path>}}.
|
|
\item[\texttt{INTERCEPT\_VERBOSE}]
|
|
This variable indicates whether string and structure types should be printed in full or left empty.
|
|
Possible values are \texttt{0} and \texttt{1} (default).
|
|
\item[\texttt{INTERCEPT\_FUNCTIONS}]
|
|
This variable is used to specify which function calls should be intercepted.
|
|
It is a list separated by commas, colons, or semicolons.
|
|
Wildcards (\texttt{*}) at the end of function names are possible.
|
|
A prefix of \texttt{-} indicates that the following function should not be intercepted.
|
|
Example: \texttt{*,-sem\_*} intercepts all functions except those whose names start with \texttt{sem\_}.
|
|
By default, all (implemented) functions are intercepted.
|
|
\item[\texttt{INTERCEPT\_LIBRARIES}]
|
|
This variable is used to specify which libraries' function calls should be intercepted.
|
|
It is a list separated by commas, colons, or semicolons.
|
|
Wildcards (\texttt{*}) at the end of library paths are possible.
|
|
A prefix of \texttt{-} indicates that the following library path should not be intercepted.
|
|
Example: \texttt{*,-/lib*,-/usr/lib*} intercepts only function calls originating from binaries outside \texttt{/lib*} or \texttt{/usr/lib*}, which in most cases means the executed program itself.
|
|
By default, function calls from everywhere are intercepted.
|
|
\end{description}
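
How such a list might be evaluated is sketched below.
The function names \texttt{pattern\_matches} and \texttt{is\_selected} are hypothetical, and the rule that the last matching entry decides is an assumption, not necessarily the behavior of \texttt{intercept.so}.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name matches pattern; a trailing '*' matches any suffix. */
static bool pattern_matches(const char *pattern, size_t pattern_len, const char *name)
{
    if (pattern_len > 0 && pattern[pattern_len - 1] == '*')
        return strncmp(pattern, name, pattern_len - 1) == 0;
    return strlen(name) == pattern_len && strncmp(pattern, name, pattern_len) == 0;
}

/* Hypothetical evaluation of a list such as "*,-/lib*,-/usr/lib*":
 * entries are applied from left to right and the last matching entry decides. */
static bool is_selected(const char *list, const char *name)
{
    assert(list != NULL && name != NULL);
    bool selected = false;

    const char *p = list;
    while (*p != '\0') {
        /* Entries are separated by commas, colons, or semicolons. */
        size_t len = strcspn(p, ",:;");
        const char *rule = p;
        size_t rule_len = len;

        /* A leading '-' turns the entry into an exclusion. */
        bool exclude = rule_len > 0 && rule[0] == '-';
        if (exclude) {
            rule++;
            rule_len--;
        }
        if (rule_len > 0 && pattern_matches(rule, rule_len, name))
            selected = !exclude;

        p += len;
        if (*p != '\0')
            p++; /* skip the separator */
    }
    return selected;
}
```

With these semantics, \texttt{is\_selected("*,-/lib*,-/usr/lib*", path)} is true for a path such as \texttt{/home/user/main} and false for \texttt{/usr/lib/libc.so.6}.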
|
|
|
|
The shared object currently supports intercepting the following functions:
|
|
\texttt{malloc}, \texttt{calloc}, \texttt{realloc}, \texttt{reallocarray}, \texttt{free}, \texttt{getopt}, \texttt{exit},
|
|
\texttt{read}, \texttt{pread}, \texttt{write}, \texttt{pwrite}, \texttt{close}, \texttt{sigaction}, \texttt{sem\_init},
|
|
\texttt{sem\_open}, \texttt{sem\_post}, \texttt{sem\_wait}, \texttt{sem\_trywait}, \texttt{sem\_timedwait}, \texttt{sem\_getvalue},
|
|
\texttt{sem\_close}, \texttt{sem\_unlink}, \texttt{sem\_destroy}, \texttt{shm\_open}, \texttt{shm\_unlink}, \texttt{mmap},
|
|
\texttt{munmap}, \texttt{ftruncate}, \texttt{fork}, \texttt{wait}, \texttt{waitpid}, \texttt{execl}, \texttt{execlp},
|
|
\texttt{execle}, \texttt{execv}, \texttt{execvp}, \texttt{execvpe}, \texttt{execve}, \texttt{fexecve}, \texttt{pipe},
|
|
\texttt{dup}, \texttt{dup2}, \texttt{dup3}, \texttt{socket}, \texttt{bind}, \texttt{listen}, \texttt{accept}, \texttt{connect},
|
|
\texttt{getaddrinfo}, \texttt{freeaddrinfo}, \texttt{send}, \texttt{sendto}, \texttt{sendmsg}, \texttt{recv}, \texttt{recvfrom},
|
|
\texttt{recvmsg}, \texttt{getline}, \texttt{getdelim}.
|
|
|
|
\section{\texttt{intercept} Command}\label{sec:intercept-command}
|
|
|
|
To make the aforementioned shared object easier to use, a simple Python script has been put together.
|
|
This script may be used as a command line tool.
|
|
See listing \ref{lst:intercept}.
|
|
|
|
\begin{listing}[htbp]
|
|
\inputminted[linenos]{python}{../proj/intercept/intercept}
|
|
\caption{Contents of \texttt{intercept}.}
|
|
\label{lst:intercept}
|
|
\end{listing}
|
|
|
|
The synopsis of the command is as follows:
|
|
\begin{minted}{text}
intercept [-h] [-F FUNCTIONS] [-s] [-o | -L LIBRARIES] \
          [-l LOG | -i INTERCEPT] [--] COMMAND [ARGS...]
\end{minted}
|
|
|
|
\begin{description}
|
|
\item[\texttt{-F}, \texttt{-{}-functions}]
|
|
A list of functions to intercept.
|
|
See \ref{sec:intercept.so-library} for more details.
|
|
Default value is \texttt{*}.
|
|
\item[\texttt{-s}, \texttt{-{}-sparse}]
|
|
Indicates that strings and structures should be printed empty to save bandwidth.
|
|
\item[\texttt{-o}, \texttt{-{}-only-own}]
|
|
A shorthand for \texttt{-L *,-/lib*,-/usr/lib*}.
|
|
This has the effect that only function calls from the executed binary itself are recorded.
|
|
\item[\texttt{-L}, \texttt{-{}-libraries}]
|
|
A list of library paths to intercept function calls from.
|
|
See \ref{sec:intercept.so-library} for more details.
|
|
Default value is \texttt{*} (except when \texttt{-o} is present).
|
|
\item[\texttt{-l}, \texttt{-{}-log}]
|
|
Used to specify in which file the recorded function calls should be logged.
|
|
Shorthand for \texttt{-i file:\textit{<arg>}}.
|
|
\item[\texttt{-i}, \texttt{-{}-intercept}]
|
|
Determines where the recorded function calls are written.
|
|
Values may be \texttt{stdout}, \texttt{stderr}, \texttt{file:\textit{<path>}}, \texttt{unix:\textit{<path>}}.
|
|
See \ref{sec:intercept.so-library} for more details.
|
|
\end{description}
|
|
|
|
|
|
\section{Example}\label{sec:intercepting-example}
|
|
|
|
To illustrate the result, listing \ref{lst:intercept-client} provides some recorded function calls.
|
|
Most lines had to be broken up into multiple lines for better readability.
|
|
The recorded calls stem from a program written by the author as a solution for an assignment in the Operating Systems course at university.
|
|
It is a simple HTTP client.
|
|
The program was invoked using \texttt{./intercept -o -{}- ./client http://www.complang.tuwien.ac.at/}.
|
|
|
|
The first number on each line indicates the Unix time with nanosecond precision.
|
|
The second and third numbers correspond to the process ID and thread ID respectively.
|
|
Each line contains either a recorded call to a function or a recorded return of a function.
|
|
After the arguments of each function call a colon (\texttt{:}) indicates the beginning of meta-information.
|
|
This information always includes the return address to which the function jumps upon completion.
|
|
If available, the interpretation of the return address is also provided.
|
|
This includes the offset relative to the calling binary and a source file and line number combination if the binary was compiled using \texttt{gcc -g} or \texttt{gcc -gdwarf}.
|
|
|
|
\begin{listing}[htbp]
|
|
\inputminted[fontsize=\tiny]{text}{listings/intercept-client.txt}
|
|
\caption{Recorded function calls from \texttt{./client}.}
|
|
\label{lst:intercept-client}
|
|
\end{listing}
|
|
|
|
|
|
\section{Analyzing Intercepted Function Calls}\label{sec:analyzing-intercepted-function-calls}
|
|
|
|
Lorem Ipsum.
|
|
|
|
|
|
\section{Automated Testing on Intercepted Function Calls}\label{sec:automated-testing-on-intercepted-function-calls}
|
|
|
|
Lorem Ipsum.
|