thesis: Incorporate feedback
@@ -1,18 +1,17 @@
|
||||
|
||||
\chapter{Intercepting Function Calls}\label{ch:intercepting-function-calls}
|
||||
|
||||
In this chapter all steps on how to intercept function calls in this work are discussed.
|
||||
In this chapter, all steps on how to intercept function calls in this work are discussed.
|
||||
An example of what the resulting interception looks like may be found in Section~\ref{sec:intercepting-example}.
|
||||
Furthermore, an overview on how to test given programs is presented in Section~\ref{sec:automated-testing-on-intercepted-function-calls}.
|
||||
This chapter does not discuss how these function calls may be manipulated in any way.
|
||||
For that see Chapter~\ref{ch:manipulating-function-calls}.
|
||||
How these function calls may be manipulated is discussed in Chapter~\ref{ch:manipulating-function-calls}.
|
||||
|
||||
|
||||
\section{Identified Methods for Intercepting Function and System Calls}\label{sec:methods-for-intercepting}
|
||||
|
||||
First, one has to answer the question of \textit{how exactly} to intercept function or system calls.
|
||||
At the beginning of this work it was not yet determined if the interception of function calls, system calls, or both should be used to achieve the overarching goal (see Section~\ref{sec:motivation-and-goal}).
|
||||
This first section tries to list all possible methods on how to intercept function or system calls but does not claim completeness.
|
||||
At the beginning of this work, it was not yet determined if the interception of function calls, system calls, or both should be used to achieve the overarching goal (see Section~\ref{sec:motivation-and-goal}).
|
||||
This first section tries to list all possible and relevant methods on how to intercept function or system calls but does not claim exhaustiveness.
|
||||
The order of the following subsections roughly follows the thought process of finding the most appropriate method for this work.
|
||||
|
||||
|
||||
@@ -135,7 +134,7 @@ See the gcc(1) Linux manual page~\cite[Section OPTIONS]{gcc.1}:
|
||||
This means that, by specifying \texttt{-Wl,-{}-wrap=\textit{symbol}} when compiling with gcc,
|
||||
all calls from the currently compiled program to \texttt{\textit{symbol}} are redirected to \texttt{\_\_wrap\_\textit{symbol}}.
|
||||
To call the real function inside the wrapper, \texttt{\_\_real\_\textit{symbol}} may be used.
|
||||
Listings~\ref{lst:wrap.c} and~\ref{lst:wrap} try to illustrate this by overriding the \texttt{malloc} function of the C standard library.
|
||||
Listings~\ref{lst:wrap.c} and~\ref{lst:wrap} illustrate this by overriding the \texttt{malloc} function of the C standard library.
|
||||
|
||||
\begin{listing}[htbp]
|
||||
\inputminted[linenos]{c}{src/listings/wrap.c}
|
||||
@@ -159,7 +158,7 @@ Therefore, the source code (or the corresponding \texttt{*.out} files) needs to
|
||||
Note that only calls from the targeted source code are redirected; calls from other libraries are not.
|
||||
|
||||
Theoretically, it should be possible to re-link a given binary without having access to its source code.
|
||||
But due to other more straight-forward methods (see Subsection~\ref{subsec:preloading}), this has not been further investigated.
|
||||
However, because other, more straightforward methods exist (see Subsection~\ref{subsec:preloading}), this has not been investigated further.
|
||||
|
||||
|
||||
\subsection{Preloading using \texttt{LD\_PRELOAD}}\label{subsec:preloading}
|
||||
@@ -188,7 +187,7 @@ See the ld.so(8) Linux manual page~\cite[Section ENVIRONMENT]{ld.so.8}:
|
||||
\end{quote}
|
||||
|
||||
This means that, by setting the environment variable \texttt{LD\_PRELOAD}, it is possible to override specific functions.
|
||||
Listings~\ref{lst:preload.c} and~\ref{lst:preload} try to illustrate this by overriding the \texttt{malloc} function of the C standard library.
|
||||
Listings~\ref{lst:preload.c} and~\ref{lst:preload} illustrate this by overriding the \texttt{malloc} function of the C standard library.
|
||||
|
||||
\begin{listing}[htbp]
|
||||
\inputminted[linenos]{c}{src/listings/preload.c}
|
||||
@@ -218,10 +217,10 @@ Although, one has to be aware that not only function calls inside the targeted b
|
||||
|
||||
During the research on different approaches to intercepting system and function calls,
|
||||
it has been found that the most reliable way to achieve the goals of this work (see Section~\ref{sec:motivation-and-goal}) is to intercept function calls instead of system calls.
|
||||
This is because (as long as the programs to test are dynamically linked), intercepting function calls allows one to intercept many more calls and in a more flexible way.
|
||||
This is because, as long as the programs to test are dynamically linked, intercepting function calls allows one to intercept many more calls and in a more flexible way.
|
||||
Therefore, from now on, this work only considers function calls and does not consider system calls directly.
|
||||
|
||||
In this work preloading (see Subsection~\ref{subsec:preloading}) was chosen to be used
|
||||
In this work, preloading (see Subsection~\ref{subsec:preloading}) was chosen
|
||||
because it is simple to use (``clean'' source code, easy to compile and run programs with it) and offers the means to arbitrarily execute code when the intercepted function call is redirected.
|
||||
The following sections describe what else is needed to create a powerful ``interceptor''.
|
||||
|
||||
@@ -231,7 +230,7 @@ The following sections concern the next steps in what else is needed to create a
|
||||
After deciding to use the preloading method to intercept function calls, a more detailed plan is needed to continue developing.
|
||||
It was decided to have one single \texttt{intercept.so} file as a resulting artifact which then may be loaded via the \texttt{LD\_PRELOAD} environment variable.
|
||||
The easiest and most straightforward way to structure the source code was to put all code in one single C file.
|
||||
Listing~\ref{lst:intercept-preload.c} gives an overview over the grounding code structure.
|
||||
Listing~\ref{lst:intercept-preload.c} gives an overview of the underlying code structure.
|
||||
Each function that should be intercepted simply has to be declared and defined the same way \texttt{malloc} was.
|
||||
|
||||
\begin{listing}[htbp]
|
||||
@@ -244,8 +243,8 @@ For each function that should be intercepted, this function simply has to be dec
|
||||
\section{Retrieving Function Argument Values}\label{sec:retrieving-function-argument-values}
|
||||
|
||||
Now that the first steps have been done, one needs to think about what exactly to record when intercepting.
|
||||
A simple notification that a given function was called would be too less.
|
||||
Within the following subsections it is tried to get as much information as possible from each function call.
|
||||
A simple notification that a given function was called would not be sufficient.
|
||||
The following subsections aim to extract as much information as possible from each function call.
|
||||
|
||||
As already mentioned, \texttt{ltrace} uses prototype functions to format its function arguments.
|
||||
This allows \texttt{ltrace} to ``dynamically'' display function arguments for any new or unknown functions without the need for recompilation.
|
||||
@@ -254,9 +253,9 @@ This allows \texttt{ltrace} to ``dynamically'' display function arguments for an
|
||||
However, due to implementation complexity and the need for ``complex'' return types for string/buffer and structure values (see Section~\ref{sec:retrieving-function-return-values}), a statically compiled approach has been used for this work.
|
||||
This means that each function formats its arguments and return values itself without any configuration option.
|
||||
|
||||
The reason for retrieving as much information as possible from each function call is that at a later point in time it is possible to completely reconstruct the exact function calls and their sequence.
|
||||
The reason for retrieving as much information as possible from each function call is that the exact function calls and their sequence can then be completely reconstructed at a later point in time.
|
||||
This allows analysis on these records to be performed independently of the corresponding execution of the program.
|
||||
It should always be possible for any parser to fully parse the recorded calls without any specific knowledge of specific functions, their argument types, or return value type.
|
||||
It should always be possible to fully parse the recorded calls without any knowledge of the specific functions, their argument types, or their return value types.
|
||||
|
||||
|
||||
\subsection{Numbers}\label{subsec:retrieving-numbers}
|
||||
@@ -293,7 +292,7 @@ Example: \texttt{write(3, 0x1234:"Test\textbackslash{}x00ABC", 8)}.
|
||||
|
||||
\subsection{Flags}\label{subsec:retrieving-flags}
|
||||
|
||||
Some functions have one of their arguments dedicated to flags which may be combined by bitwise OR.
|
||||
Some functions have one of their arguments dedicated to flags which may be combined by bitwise OR\@.
|
||||
These arguments are also of type integer.
|
||||
To distinguish flag arguments from others, a pipe symbol (\texttt{|}) is used after the colon and between the flags.
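The flag notation described above can be sketched in C as follows.
This is only a minimal illustration and not the formatting code of this work: the helper \texttt{format\_open\_flags} and the table of flag names are hypothetical, only a handful of \texttt{open} flags are covered, and bits without an entry in the table are not named.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup entry: a single-bit flag and its macro name. */
struct flag_name {
    int value;
    const char *name;
};

/*
 * Writes "<octal value>:|FLAG|FLAG|" into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_open_flags(int flags, char *buf, size_t size)
{
    static const struct flag_name bits[] = {
        {O_CREAT, "O_CREAT"}, {O_EXCL, "O_EXCL"},
        {O_TRUNC, "O_TRUNC"}, {O_APPEND, "O_APPEND"},
    };
    const char *mode;
    size_t used;
    int n;

    assert(buf != NULL && size > 0);

    /* Numeric value first, then the colon and the opening pipe symbol. */
    n = snprintf(buf, size, "%#o:|", (unsigned int)flags);
    if (n < 0 || (size_t)n >= size)
        return -1;
    used = (size_t)n;

    /* Append the name of every known flag bit that is set. */
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        if ((flags & bits[i].value) == 0)
            continue;
        n = snprintf(buf + used, size - used, "%s|", bits[i].name);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }

    /* The access mode is a small value inside O_ACCMODE, not a single bit. */
    switch (flags & O_ACCMODE) {
    case O_RDONLY: mode = "O_RDONLY"; break;
    case O_WRONLY: mode = "O_WRONLY"; break;
    case O_RDWR:   mode = "O_RDWR";   break;
    default:       mode = "?";        break;
    }
    n = snprintf(buf + used, size - used, "%s|", mode);
    if (n < 0 || (size_t)n >= size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

On platforms where \texttt{O\_CREAT} is \texttt{0100} and \texttt{O\_RDWR} is \texttt{02} (for example Linux on x86-64), calling \texttt{format\_open\_flags(O\_CREAT | O\_RDWR, buf, sizeof buf)} produces \texttt{0102:|O\_CREAT|O\_RDWR|}.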
|
||||
|
||||
@@ -304,7 +303,7 @@ Example: \texttt{open(0x1234:"test.txt", 0102:|O\_CREAT|O\_RDWR|, 0644)}.
|
||||
Some functions take constants as arguments.
|
||||
These constants are typically defined as C macros.
|
||||
This makes the source code more readable (and portable).
|
||||
Constants are represented as an integer again followed by a colon, this time without any special characters to disdinguish them from other types.
|
||||
Constants are represented as an integer again followed by a colon, this time without any special characters to distinguish them from other types.
|
||||
|
||||
Example: \texttt{socket(2:AF\_INET, 1:SOCK\_STREAM, 6)}.
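As a rough sketch of how such a constant could be rendered, the following C fragment maps the first argument of \texttt{socket} to its macro name.
The helper names \texttt{address\_family\_name} and \texttt{format\_address\_family} are hypothetical, only three address families are covered, and unknown values are simply rejected here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical lookup: returns the macro name of a known address family. */
const char *address_family_name(int domain)
{
    switch (domain) {
    case AF_UNIX:  return "AF_UNIX";
    case AF_INET:  return "AF_INET";
    case AF_INET6: return "AF_INET6";
    default:       return NULL;
    }
}

/*
 * Writes "<value>:<NAME>" into buf, e.g. "2:AF_INET" where AF_INET is 2.
 * Returns the number of characters written, or -1 if the value is not
 * in the lookup above or buf is too small.
 */
int format_address_family(int domain, char *buf, size_t size)
{
    const char *name = address_family_name(domain);
    int n;

    assert(buf != NULL && size > 0);
    if (name == NULL)
        return -1;
    n = snprintf(buf, size, "%d:%s", domain, name);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```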
|
||||
|
||||
@@ -320,7 +319,7 @@ Example: \\
|
||||
|
||||
\subsection{Pointers to Structures}\label{subsec:retrieving-pointers-to-structures}
|
||||
|
||||
In rare cases structures (\texttt{struct}) are used as argument types.
|
||||
In rare cases, structures (\texttt{struct}) are used as argument types.
|
||||
Two curly brackets (\texttt{\{\}}) are used to indicate structures.
|
||||
Then the field names are displayed plainly, followed by a colon and then the value of that field.
|
||||
Commas are used to separate the fields.
|
||||
@@ -347,13 +346,15 @@ Example (\texttt{read}): \\
|
||||
\texttt{return 12; errno 0; buf=0x7fff70:"Hello World!"}, \\
|
||||
\texttt{return -1; errno EINTR}.
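To illustrate that such records can be read back without knowing the intercepted function, the following C sketch parses the leading \texttt{return} and \texttt{errno} parts of the two example lines above.
It is an assumption-laden illustration, not the parser of this work: the names \texttt{return\_record} and \texttt{parse\_return\_record} are hypothetical, only integer return values are handled, and any trailing fields such as \texttt{buf=...} are ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the leading part of a return record. */
struct return_record {
    long value;    /* number following "return" */
    char err[32];  /* text following "errno": "0" or a name such as "EINTR" */
};

/*
 * Parses "return <number>; errno <code>" at the start of line.
 * Returns 0 on success and -1 if the line does not match.
 */
int parse_return_record(const char *line, struct return_record *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "return %ld; errno %31[A-Z0-9]",
               &out->value, out->err) != 2)
        return -1;
    return 0;
}
```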
|
||||
|
||||
\todo{Explain Examples}
|
||||
|
||||
|
||||
\section{Determining Function Call Location}\label{sec:determining-function-call-location}
|
||||
|
||||
Besides from argument values and return values, it would be interesting to know from where inside the intercepted program the function call came from.
|
||||
Besides argument values and return values, it would be interesting to know from where inside the intercepted program the function call came.
|
||||
At first, this seems impossible.
|
||||
But a function always knows at least the return address, the address to set then instruction pointer to when the function finishes.
|
||||
With this information it may be estimated where the call to the current function came from.
|
||||
But a function always knows at least the return address, the address to set the instruction pointer to when the function finishes.
|
||||
With this information, it may be estimated where the call to the current function came from.
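With GCC and Clang, the return address of the current function is available through the builtin \texttt{\_\_builtin\_return\_address(0)}.
The following C sketch shows the idea in isolation; the function name \texttt{own\_return\_address} is hypothetical, and whether this work obtains the address in exactly this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the address execution continues at once this function
 * finishes, i.e. a location inside the caller right after the call.
 * noinline keeps this a real call; otherwise the builtin would report
 * the return address of whichever function it was inlined into.
 */
__attribute__((noinline)) void *own_return_address(void)
{
    void *address = __builtin_return_address(0);

    assert(address != NULL);
    return address;
}
```

Inside an intercepted function, the same builtin yields an address within the code that called the intercepted function, and two different call sites yield two different addresses.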
|
||||
|
||||
\subsection{Return Address and Relative Position}\label{subsec:return-address-and-relative-position}
|
||||
|
||||
@@ -486,7 +487,7 @@ The shared object currently supports intercepting the following functions:
|
||||
|
||||
\section{\texttt{intercept} Command}\label{sec:intercept-command}
|
||||
|
||||
To make the usage of the aforementioned shared object more easy, a simple python script has been put together.
|
||||
To make the usage of the aforementioned shared object easier, a simple Python script has been put together.
|
||||
This script may be used as a command line tool.
|
||||
See Listing~\ref{lst:intercept}.
|
||||
|
||||
@@ -551,14 +552,14 @@ This includes the offset relative to the calling binary and a source file and li
|
||||
|
||||
\section{Automated Testing on Intercepted Function Calls}\label{sec:automated-testing-on-intercepted-function-calls}
|
||||
|
||||
The recorded function calls of a program run now may be used to perform checks and tests on them.
|
||||
The recorded function calls of a program run may now be used to perform checks and tests on them.
|
||||
It is trivially possible to check which functions were called and in what order.
|
||||
Furthermore, it is possible to check various pre- and post-conditions for each function call.
|
||||
This is beneficial because many library functions in C rely on these pre- and post-conditions, which are not enforced by the compiler or in any other way.
|
||||
|
||||
For example, the \texttt{malloc} function has the post-condition that the returned value later needs to be passed to \texttt{free} to avoid memory leaks.
|
||||
The \texttt{free} function, on the other hand, has the pre-condition that the passed value was previously acquired using \texttt{malloc} and has not yet been freed.
|
||||
Any violation of such pre- and post-conditions may be reported as incompliant behavior.
|
||||
Any violation of such pre- and post-conditions may be reported as non-compliant behavior.
|
||||
\cite{malloc.3}
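The \texttt{malloc}/\texttt{free} conditions described above could be checked on a recorded call sequence roughly as follows.
This C sketch is only an illustration under simplifying assumptions: the types \texttt{call\_record} and \texttt{check\_result} and the function \texttt{check\_malloc\_free} are hypothetical, the records are assumed to be parsed already, only \texttt{malloc} and \texttt{free} are considered, and at most \texttt{MAX\_LIVE} allocations may be live at once.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIVE 64 /* assumed upper bound on simultaneously live blocks */

enum call_kind { CALL_MALLOC, CALL_FREE };

/* Hypothetical, already parsed record of one intercepted call. */
struct call_record {
    enum call_kind kind;
    const void *ptr; /* return value of malloc or argument of free */
};

struct check_result {
    size_t invalid_frees; /* free of a pointer that is not currently allocated */
    size_t leaks;         /* allocations that were never freed */
};

struct check_result check_malloc_free(const struct call_record *calls,
                                      size_t count)
{
    const void *live[MAX_LIVE];
    size_t live_count = 0;
    struct check_result result = {0, 0};

    assert(calls != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        const void *ptr = calls[i].ptr;
        size_t j = 0;

        /* A failed malloc returns NULL and free(NULL) does nothing. */
        if (ptr == NULL)
            continue;

        if (calls[i].kind == CALL_MALLOC) {
            assert(live_count < MAX_LIVE);
            live[live_count++] = ptr;
            continue;
        }

        /* CALL_FREE: the pointer has to be among the live allocations. */
        while (j < live_count && live[j] != ptr)
            j++;
        if (j == live_count)
            result.invalid_frees++;        /* double free or unknown pointer */
        else
            live[j] = live[--live_count];  /* forget the freed block */
    }

    result.leaks = live_count;
    return result;
}
```

A sequence consisting of one \texttt{malloc} followed by two calls to \texttt{free} with the same pointer results in one invalid free, while a \texttt{malloc} without a matching \texttt{free} is counted as a leak.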
|
||||
|
||||
This means that intercepted function calls allow a tester to check whether programmers use library functions in compliance with their specifications.
|
||||
|
||||