thesis: Complete 2.3

This commit is contained in:
2025-07-16 20:04:08 +02:00
parent 846a0c49c4
commit 10a2f9a897

@@ -232,7 +232,7 @@ After deciding to use the preloading method to intercept function calls, a more
It was decided to have one single \texttt{intercept.so} file as a resulting artifact which then may be loaded via the \texttt{LD\_PRELOAD} environment variable. It was decided to have one single \texttt{intercept.so} file as a resulting artifact which then may be loaded via the \texttt{LD\_PRELOAD} environment variable.
The easiest and most straightforward way to structure the source code was to put all code in one single C file. The easiest and most straightforward way to structure the source code was to put all code in one single C file.
Listing \ref{lst:intecept-preload.c} gives an overview over the grounding code structure. Listing \ref{lst:intecept-preload.c} gives an overview over the grounding code structure.
For each function that should be intercepted, this function simply has to be declared and defined as \texttt{malloc} was. For each function that should be intercepted, this function simply has to be declared and defined the same way \texttt{malloc} was.
\begin{listing}[htbp] \begin{listing}[htbp]
\inputminted[linenos]{c}{listings/intercept-preload.c} \inputminted[linenos]{c}{listings/intercept-preload.c}
@@ -241,26 +241,118 @@ For each function that should be intercepted, this function simply has to be dec
\end{listing} \end{listing}
\section{Retrieving Function Argument Values}\label{sec:Retrieving-function-argument-values} \section{Retrieving Function Argument Values}\label{sec:retrieving-function-argument-values}
With the basic interception in place, the next question is what exactly to record for each intercepted call.
A simple notification that a given function was called would be too little.
The following subsections therefore describe how as much information as possible is extracted from each function call.
As already mentioned, \texttt{ltrace} uses function prototypes to format its function arguments.
This allows \texttt{ltrace} to ``dynamically'' display function arguments for any new or unknown functions without the need for recompilation.
\cite{ltrace.conf.5}
However, due to implementation complexity and the need for ``complex''\todo{} return types (see Section~\ref{sec:retrieving-function-return-values}), a statically compiled approach was used for this work.
This means that each function formats its arguments and return values itself without any configuration option.
The reason for retrieving as much information as possible from each function call is that the exact function calls and their sequence can then be completely reconstructed at a later point in time.
This allows analysis on these records to be performed independently of the corresponding execution of the program.
It should always be possible for any parser to fully parse the recorded calls without any knowledge of specific functions, their argument types, or their return value types.
\subsection{Numbers}\label{subsec:retrieving-numbers}
The simplest types of arguments are plain numbers, like integers (\texttt{int}, \texttt{long}, \ldots) or floating-point numbers (\texttt{float}, \texttt{double}).
(In fact, \textit{all} arguments are ultimately represented as numbers.
See the following subsections for examples.)
Plain numbers may be formatted simply as what they are, in base 10 notation, or with a prefix like \texttt{0x} for hexadecimal or \texttt{0} for octal representation.
Example: \texttt{malloc(123)} (or \texttt{malloc(0x7B)}).
\subsection{Unspecific Pointers}\label{subsec:retrieving-unspecific-pointers}
Pointers about which no further information is known (like \texttt{void *}) are essentially integers.
Therefore, they may be treated as such.
Example: \texttt{free(0x55624164b2a0)}.
\subsection{Strings and Buffers}\label{subsec:retrieving-strings-buffers}
Strings in C are simply pointers to a null-terminated sequence of bytes in memory.
This means that a string ends with the first occurrence of the null byte (\texttt{0x00}).
To distinguish unspecific pointers from pointers to strings, a colon (\texttt{:}) is appended after the numerical value of the pointer.
The colon is followed by the contents of the string, enclosed in double quotes (\texttt{"}).
Special characters inside the string are escaped with a backslash.
Example: \texttt{sem\_unlink(0x1234:"/test-semaphore")}.
Another type of ``string'' in C is a buffer with a known length.
When buffers are used, usually another argument is passed to the function which indicates the length of the buffer.
This fact may be used to print out the contents of the buffer in the same way as normal C strings.
Example: \texttt{write(3, 0x1234:"Test\textbackslash{}x00ABC", 8)}.
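The buffer notation can be illustrated with a minimal C sketch.
The helper \texttt{format\_buffer}, its signature, and the choice to escape quotes, backslashes, and non-printable bytes as \texttt{\textbackslash{}xNN} are assumptions made for illustration and are not taken from the actual implementation.
The address is passed separately only so that the output is reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the thesis code.
 * Writes `len` bytes starting at `buf` into `out` as <addr>:"<escaped bytes>".
 * Returns the number of characters written (without the terminating NUL),
 * or -1 if `out` is too small. */
int format_buffer(char *out, size_t cap, uintptr_t addr,
                  const void *buf, size_t len)
{
    const unsigned char *bytes = buf;
    assert(out != NULL && cap > 0);

    int n = snprintf(out, cap, "0x%jx:\"", (uintmax_t)addr);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t pos = (size_t)n;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = bytes[i];
        char piece[5]; /* longest piece: backslash, 'x', two hex digits, NUL */
        int m;
        if (c == '"' || c == '\\')
            m = snprintf(piece, sizeof piece, "\\%c", c);   /* escape quote/backslash */
        else if (c >= 0x20 && c < 0x7f)
            m = snprintf(piece, sizeof piece, "%c", c);     /* printable ASCII as-is */
        else
            m = snprintf(piece, sizeof piece, "\\x%02x", (unsigned)c);
        if (pos + (size_t)m >= cap)
            return -1;
        memcpy(out + pos, piece, (size_t)m);
        pos += (size_t)m;
    }

    if (pos + 1 >= cap) /* room for closing quote and NUL */
        return -1;
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}
```

Called with the address \texttt{0x1234}, the eight bytes \texttt{Test}, null byte, \texttt{ABC}, and a length of 8, this sketch produces \texttt{0x1234:"Test\textbackslash{}x00ABC"}.
Because the length is passed explicitly, the embedded null byte does not end the output, which is the difference to ordinary C strings.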
\subsection{Flags}\label{subsec:retrieving-flags}
Some functions dedicate one of their arguments to flags, which may be combined by bitwise OR.
These arguments are also of type integer.
To distinguish flag arguments from others, a pipe symbol (\texttt{|}) is placed after the colon, between the flags, and after the last flag.
Example: \texttt{open(0x1234:"test.txt", 0102:|O\_CREAT|O\_RDWR|, 0644)}.
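A minimal sketch of how such a flag argument could be formatted for \texttt{open} is given here.
The function \texttt{format\_open\_flags}, the helper \texttt{append\_flag}, and the small name table are hypothetical and cover only a subset of the flags.
Note that the access mode (\texttt{O\_RDONLY}, \texttt{O\_WRONLY}, \texttt{O\_RDWR}) is not a single bit and is therefore extracted with \texttt{O\_ACCMODE}.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name table, not from the thesis code; only a small subset. */
struct flag_name {
    int value;
    const char *name;
};

static const struct flag_name open_flag_names[] = {
    { O_CREAT, "O_CREAT" },
    { O_EXCL, "O_EXCL" },
    { O_TRUNC, "O_TRUNC" },
    { O_APPEND, "O_APPEND" },
};

/* Appends "<name>|" at *pos; returns 0 on success, -1 if it does not fit. */
static int append_flag(char *out, size_t cap, size_t *pos, const char *name)
{
    int n = snprintf(out + *pos, cap - *pos, "%s|", name);
    if (n < 0 || (size_t)n >= cap - *pos)
        return -1;
    *pos += (size_t)n;
    return 0;
}

/* Formats `flags` as <octal>:|FLAG|...|MODE| and returns the length, or -1. */
int format_open_flags(char *out, size_t cap, int flags)
{
    assert(out != NULL && cap > 0);

    int n = snprintf(out, cap, "%#o:|", (unsigned)flags);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t pos = (size_t)n;

    /* Single-bit flags from the table; unknown bits only show up in the number. */
    for (size_t i = 0; i < sizeof open_flag_names / sizeof open_flag_names[0]; i++) {
        if ((flags & open_flag_names[i].value) == open_flag_names[i].value
            && append_flag(out, cap, &pos, open_flag_names[i].name) != 0)
            return -1;
    }

    /* The access mode is a small enumeration, not a bit flag. */
    const char *mode;
    switch (flags & O_ACCMODE) {
    case O_RDONLY: mode = "O_RDONLY"; break;
    case O_WRONLY: mode = "O_WRONLY"; break;
    case O_RDWR:   mode = "O_RDWR";   break;
    default:       mode = "?";        break;
    }
    if (append_flag(out, cap, &pos, mode) != 0)
        return -1;
    return (int)pos;
}
```

On Linux x86-64, where \texttt{O\_CREAT} is \texttt{0100} and \texttt{O\_RDWR} is \texttt{02}, the call \texttt{format\_open\_flags(out, sizeof out, O\_CREAT | O\_RDWR)} yields \texttt{0102:|O\_CREAT|O\_RDWR|}.
The numerical values differ between platforms, which is one reason to record both the number and the names.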
\subsection{Constants}\label{subsec:retrieving-constants}
Some functions expect constants as arguments.
These constants are typically defined as C macros.
This makes the source code more readable (and portable).
Constants are represented as an integer, again followed by a colon and the name of the constant, this time without any special characters to distinguish them from other types.
Example: \texttt{socket(2:AF\_INET, 1:SOCK\_STREAM, 6)}.
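The constant notation amounts to a lookup from value to macro name, as the following sketch for the first argument of \texttt{socket} shows.
The functions \texttt{address\_family\_name} and \texttt{format\_address\_family} are hypothetical, and falling back to the plain number for unknown values is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical lookup, not from the thesis code; returns NULL if unknown. */
static const char *address_family_name(int domain)
{
    switch (domain) {
    case AF_UNIX:  return "AF_UNIX";
    case AF_INET:  return "AF_INET";
    case AF_INET6: return "AF_INET6";
    default:       return NULL;
    }
}

/* Formats `domain` as <number>:<NAME>, or as a plain number if the value
 * is not in the table. Returns the length, or -1 if `out` is too small. */
int format_address_family(char *out, size_t cap, int domain)
{
    assert(out != NULL && cap > 0);

    const char *name = address_family_name(domain);
    int n = name != NULL
        ? snprintf(out, cap, "%d:%s", domain, name)
        : snprintf(out, cap, "%d", domain);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

On Linux, where \texttt{AF\_INET} has the value 2, \texttt{format\_address\_family(out, sizeof out, AF\_INET)} yields \texttt{2:AF\_INET}.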
\subsection{Pointers to Arrays}\label{subsec:retrieving-pointers-to-arrays}
Sometimes arrays are used as arguments.
Arrays in C work similarly to strings: they are either terminated by a sentinel element (an element of value 0), or their length is explicitly given.
To represent them, square brackets (\texttt{[]}) enclose the elements and commas (\texttt{,}) separate them.
Each element may be represented as an ``argument'' on its own (as illustrated by the example).
Example: \\
\texttt{getopt(2, 0x7f0b8:[0x7feb3:"./main", 0x7fee6:"arg"], 0x123:"v")}.
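For a null-terminated array of strings such as \texttt{argv}, the notation can be sketched in C as follows.
The function \texttt{format\_string\_array} is hypothetical, and for brevity the element strings are copied without the escaping described for strings and buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the thesis code.
 * Formats a NULL-terminated array of C strings as
 *   <array ptr>:[<elem ptr>:"<elem>", <elem ptr>:"<elem>", ...]
 * Returns the length written, or -1 if `out` is too small. */
int format_string_array(char *out, size_t cap, char *const *array)
{
    assert(out != NULL && cap > 0 && array != NULL);

    int n = snprintf(out, cap, "0x%jx:[", (uintmax_t)(uintptr_t)array);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    size_t pos = (size_t)n;

    /* Each element is formatted like a string argument of its own. */
    for (size_t i = 0; array[i] != NULL; i++) {
        n = snprintf(out + pos, cap - pos, "%s0x%jx:\"%s\"",
                     i > 0 ? ", " : "",
                     (uintmax_t)(uintptr_t)array[i], array[i]);
        if (n < 0 || (size_t)n >= cap - pos)
            return -1;
        pos += (size_t)n;
    }

    n = snprintf(out + pos, cap - pos, "]");
    if (n < 0 || (size_t)n >= cap - pos)
        return -1;
    return (int)(pos + (size_t)n);
}
```

The printed addresses are the actual pointer values at run time, so they differ between executions.
An array with an explicitly given length would use a counted loop instead of the check for the terminating \texttt{NULL} element.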
\subsection{Pointers to Structures}\label{subsec:retrieving-pointers-to-structures}
In rare cases structures (\texttt{struct}) are used as argument types.
Curly brackets (\texttt{\{\}}) enclose the fields of a structure.
Each field is displayed as its plain name, followed by a colon and the value of that field.
Commas separate the fields.
Example: \texttt{\tiny connect(2, 0x123:\{sa\_family: 2:AF\_INET, sin\_addr: "1.1.1.1", sin\_port: 80\}, 16)}.
\section{Retrieving Function Return Values}\label{sec:retrieving-function-return-values}
Lorem Ipsum. Lorem Ipsum.
\section{Determining Function Call Location}\label{sec:determining-function-call-location} \section{Determining Function Call Location}\label{sec:determining-function-call-location}
Lorem Ipsum. Lorem Ipsum.
\section{Example}\label{sec:intercepting-example} \section{Example}\label{sec:intercepting-example}
Lorem Ipsum. Lorem Ipsum.
\section{Analyzing Intercepted Function Calls}\label{sec:analyzing-intercepted-function-calls} \section{Analyzing Intercepted Function Calls}\label{sec:analyzing-intercepted-function-calls}
Lorem Ipsum. Lorem Ipsum.
\section{Parsing Intercepted Function Calls in Python}\label{sec:parsing-intercepted-function-calls} \section{Parsing Intercepted Function Calls in Python}\label{sec:parsing-intercepted-function-calls}
Lorem Ipsum. Lorem Ipsum.
\section{Automated Testing on Intercepted Function Calls}\label{sec:automated-testing-on-intercepted-function-calls} \section{Automated Testing on Intercepted Function Calls}\label{sec:automated-testing-on-intercepted-function-calls}
Lorem Ipsum. Lorem Ipsum.