diff --git a/thesis/src/02.intercept.tex b/thesis/src/02.intercept.tex
index 2add520..c463c18 100644
--- a/thesis/src/02.intercept.tex
+++ b/thesis/src/02.intercept.tex
@@ -232,7 +232,7 @@ After deciding to use the preloading method to intercept function calls, a more
 It was decided to have one single \texttt{intercept.so} file as a resulting artifact which then may be loaded via the \texttt{LD\_PRELOAD} environment variable.
 The easiest and most straightforward way to structure the source code was to put all code in one single C file.
 Listing \ref{lst:intecept-preload.c} gives an overview over the grounding code structure.
-For each function that should be intercepted, this function simply has to be declared and defined as \texttt{malloc} was.
+For each function that should be intercepted, this function simply has to be declared and defined in the same way as \texttt{malloc}.
 
 \begin{listing}[htbp]
 \inputminted[linenos]{c}{listings/intercept-preload.c}
@@ -241,26 +241,118 @@ For each function that should be intercepted, this function simply has to be dec
 \end{listing}
 
 
-\section{Retrieving Function Argument Values}\label{sec:Retrieving-function-argument-values}
+\section{Retrieving Function Argument Values}\label{sec:retrieving-function-argument-values}
+
+Now that the first steps have been taken, the next question is what exactly to record when intercepting a call.
+A mere notification that a given function was called would not be sufficient.
+The following subsections describe how as much information as possible is extracted from each function call.
+
+As already mentioned, \texttt{ltrace} uses prototype functions to format its function arguments.
+This allows \texttt{ltrace} to ``dynamically'' display function arguments for any new or unknown functions without the need for recompilation.
+\cite{ltrace.conf.5}
+
+However, because of the implementation complexity and the need for ``complex''\todo{} return types (see~\ref{sec:retrieving-function-return-values}), a statically compiled approach has been used for this work.
+This means that each intercepted function formats its own arguments and return value, without any configuration options.
+
+The reason for retrieving as much information as possible from each function call is that this makes it possible to later reconstruct the exact function calls and their sequence completely.
+This allows the records to be analyzed independently of the program execution that produced them.
+It should always be possible for any parser to fully parse the recorded calls without specific knowledge of the individual functions, their argument types, or their return types.
+
+
+\subsection{Numbers}\label{subsec:retrieving-numbers}
+
+The simplest type of argument is a plain number, such as an integer (\texttt{int}, \texttt{long}, \ldots) or a floating-point number (\texttt{float}, \texttt{double}).
+(In fact, \textit{all} arguments are ultimately represented as numbers.
+See the following subsections for examples.)
+Plain numbers may simply be formatted as they are, either in base-10 notation or with a prefix such as \texttt{0x} for hexadecimal or \texttt{0} for octal representation.
+
+Example: \texttt{malloc(123)} (or \texttt{malloc(0x7B)}).
+
+\subsection{Unspecific Pointers}\label{subsec:retrieving-unspecific-pointers}
+
+Pointers about which no further information is known (such as \texttt{void *}) are essentially integers.
+Therefore, they may be treated as such.
+
+Example: \texttt{free(0x55624164b2a0)}.
+
+\subsection{Strings and Buffers}\label{subsec:retrieving-strings-buffers}
+
+Strings in C are simply pointers to a null-terminated sequence of characters in memory.
+This means that a string ends at the first occurrence of the null byte (\texttt{0x00}).
+To distinguish pointers to strings from unspecific pointers, a colon (\texttt{:}) is appended to the numerical value of the pointer.
+The colon is followed by the contents of the string, enclosed in double quotes (\texttt{"}).
+Special characters inside the string, such as non-printable bytes, are escaped with a backslash.
+
+Example: \texttt{sem\_unlink(0x1234:"/test-semaphore")}.
+
+Another type of ``string'' in C is a buffer with a known length.
+When a buffer is passed to a function, its length is usually passed as another argument.
+This length may be used to print the contents of the buffer in the same way as a normal C string, even if the buffer contains null bytes.
+
+Example: \texttt{write(3, 0x1234:"Test\textbackslash{}x00ABC", 8)}.
+
+\subsection{Flags}\label{subsec:retrieving-flags}
+
+Some functions dedicate one of their arguments to flags, which are combined using bitwise OR.
+These arguments are also of type integer.
+To distinguish flag arguments from others, the numerical value is followed by a colon and the names of the set flags, separated and surrounded by pipe symbols (\texttt{|}).
+
+Example: \texttt{open(0x1234:"test.txt", 0102:|O\_CREAT|O\_RDWR|, 0644)}.
+
+\subsection{Constants}\label{subsec:retrieving-constants}
+
+Some functions take constants as arguments.
+In the source code, these constants are typically written as C macros.
+This makes the source code more readable (and portable).
+Constants are represented as an integer, again followed by a colon and the name of the constant, this time without any additional special characters, which distinguishes them from the other types.
+
+Example: \texttt{socket(2:AF\_INET, 1:SOCK\_STREAM, 6)}.
+
+\subsection{Pointers to Arrays}\label{subsec:retrieving-pointers-to-arrays}
+
+Sometimes arrays are used as arguments.
+Arrays in C work similarly to strings: they are either terminated by an element with the value 0, or their length is given explicitly.
+They are represented using square brackets (\texttt{[]}), with commas (\texttt{,}) separating the individual elements.
+Each element is formatted as if it were an argument on its own, as illustrated by the following example.
+
+Example: \\
+\texttt{getopt(2, 0x7f0b8:[0x7feb3:"./main", 0x7fee6:"arg"], 0x123:"v")}.
+
+\subsection{Pointers to Structures}\label{subsec:retrieving-pointers-to-structures}
+
+In rare cases, pointers to structures (\texttt{struct}) are used as arguments.
+Structures are enclosed in curly braces (\texttt{\{\}}).
+Inside the braces, each field name is written plainly, followed by a colon and the value of that field.
+Fields are separated by commas.
+
+Example: \texttt{\tiny connect(2, 0x123:\{sa\_family: 2:AF\_INET, sin\_addr: "1.1.1.1", sin\_port: 80\}, 16)}.
+
+
+\section{Retrieving Function Return Values}\label{sec:retrieving-function-return-values}
 
 Lorem Ipsum.
 
+
 \section{Determining Function Call Location}\label{sec:determining-function-call-location}
 
 Lorem Ipsum.
 
+
 \section{Example}\label{sec:intercepting-example}
 
 Lorem Ipsum.
 
+
 \section{Analyzing Intercepted Function Calls}\label{sec:analyzing-intercepted-function-calls}
 
 Lorem Ipsum.
 
+
 \section{Parsing Intercepted Function Calls in Python}\label{sec:parsing-intercepted-function-calls}
 
 Lorem Ipsum.
 
+
 \section{Automated Testing on Intercepted Function Calls}\label{sec:automated-testing-on-intercepted-function-calls}
 
 Lorem Ipsum.
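The following C sketch illustrates the argument notation introduced in this patch. C is the language of the interception library. The sketch rests on stated assumptions and is not the implementation from listings/intercept-preload.c:

- The helper names (fmt_pointer, fmt_buffer, fmt_string, fmt_string_array, fmt_open_flags, fmt_socket_domain) are hypothetical.
- Each helper writes to a FILE *.
- The escaping rules beyond \x00 are assumed: quotes and backslashes get a backslash, and any other non-printable byte is written as \xNN.
- Only a handful of open() flags and socket domains are mapped to names.

```c
/* Minimal sketch (hypothetical helpers, not the thesis implementation):
 * formatting intercepted arguments in the textual notation described above. */
#define _POSIX_C_SOURCE 200809L /* so open_memstream() is declared for callers */
#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Unspecific pointer: a plain hexadecimal number, e.g. 0x55624164b2a0. */
void fmt_pointer(FILE *out, const void *ptr) {
    fprintf(out, "0x%" PRIxPTR, (uintptr_t) ptr);
}

/* Buffer with known length: pointer, colon, then the quoted contents.
 * Assumed escaping: '"' and '\' get a backslash, every byte outside
 * printable ASCII is written as \xNN (so embedded null bytes survive). */
void fmt_buffer(FILE *out, const void *ptr, size_t len) {
    fmt_pointer(out, ptr);
    if (ptr == NULL)
        return; /* nothing to dereference, print the pointer only */
    const unsigned char *p = ptr;
    fputs(":\"", out);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = p[i];
        if (c == '"' || c == '\\')
            fprintf(out, "\\%c", c);
        else if (c >= 0x20 && c < 0x7f)
            fputc(c, out);
        else
            fprintf(out, "\\x%02x", (unsigned) c);
    }
    fputc('"', out);
}

/* Null-terminated C string: a buffer whose length is found via strlen(). */
void fmt_string(FILE *out, const char *s) {
    fmt_buffer(out, s, s != NULL ? strlen(s) : 0);
}

/* NULL-terminated array of strings (like argv): square brackets around
 * the elements, each formatted like a stand-alone string argument. */
void fmt_string_array(FILE *out, char *const *arr) {
    fmt_pointer(out, arr);
    if (arr == NULL)
        return;
    fputs(":[", out);
    for (size_t i = 0; arr[i] != NULL; i++) {
        if (i > 0)
            fputs(", ", out);
        fmt_string(out, arr[i]);
    }
    fputc(']', out);
}

/* Flags for open(): octal value, colon, then the names of the set flags
 * separated and surrounded by '|'. Only a few flags are mapped here;
 * unmapped bits still appear in the numeric value. */
void fmt_open_flags(FILE *out, int flags) {
    static const struct { int bit; const char *name; } names[] = {
        {O_CREAT, "O_CREAT"}, {O_EXCL, "O_EXCL"},
        {O_TRUNC, "O_TRUNC"}, {O_APPEND, "O_APPEND"},
    };
    fprintf(out, "%#o:|", (unsigned) flags);
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (flags & names[i].bit)
            fprintf(out, "%s|", names[i].name);
    /* The access mode is an enumeration, not a set of bits (O_RDONLY is 0),
     * so it is decoded separately via O_ACCMODE. */
    switch (flags & O_ACCMODE) {
    case O_RDONLY: fputs("O_RDONLY|", out); break;
    case O_WRONLY: fputs("O_WRONLY|", out); break;
    case O_RDWR:   fputs("O_RDWR|", out);   break;
    }
}

/* Constant: value, colon, name, with no extra delimiters.
 * Values without a known name are printed as plain numbers. */
void fmt_socket_domain(FILE *out, int domain) {
    const char *name = NULL;
    switch (domain) {
    case AF_UNIX:  name = "AF_UNIX";  break;
    case AF_INET:  name = "AF_INET";  break;
    case AF_INET6: name = "AF_INET6"; break;
    }
    if (name != NULL)
        fprintf(out, "%d:%s", domain, name);
    else
        fprintf(out, "%d", domain);
}
```

On x86-64 Linux, O_CREAT is 0100 and O_RDWR is 02 (octal). There, fmt_open_flags(out, O_CREAT | O_RDWR) writes 0102:|O_CREAT|O_RDWR|, which matches the open() example in the Flags subsection. In this sketch, every annotated value starts with the raw number. A parser without per-function knowledge can therefore always read that number and treat whatever follows the colon as additional detail.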