thesis: Incorporate feedback

This commit is contained in:
2025-08-13 12:23:20 +02:00
parent a4378a7c8e
commit b48c5b4921
3 changed files with 39 additions and 38 deletions

@@ -1,18 +1,17 @@
\chapter{Intercepting Function Calls}\label{ch:intercepting-function-calls}
In this chapter all steps on how to intercept function calls in this work are discussed.
In this chapter, all steps on how to intercept function calls in this work are discussed.
An example of what the resulting interception looks like may be found in Section~\ref{sec:intercepting-example}.
Furthermore, an overview on how to test given programs is presented in Section~\ref{sec:automated-testing-on-intercepted-function-calls}.
This chapter does not discuss how these function calls may be manipulated in any way.
For that see Chapter~\ref{ch:manipulating-function-calls}.
How these function calls may be manipulated is discussed in Chapter~\ref{ch:manipulating-function-calls}.
\section{Identified Methods for Intercepting Function and System Calls}\label{sec:methods-for-intercepting}
First, one has to answer the question on \textit{how exactly} to intercept function or system calls.
At the beginning of this work it was not yet determined if the interception of function calls, system calls, or both should be used to achieve the overarching goal (see Section~\ref{sec:motivation-and-goal}).
This first section tries to list all possible methods on how to intercept function or system calls but does not claim completeness.
At the beginning of this work, it was not yet determined if the interception of function calls, system calls, or both should be used to achieve the overarching goal (see Section~\ref{sec:motivation-and-goal}).
This first section tries to list all possible and relevant methods on how to intercept function or system calls but does not claim exhaustiveness.
The order of the following subsections is roughly based on the thought process on finding the most appropriate method suitable for this work.
@@ -135,7 +134,7 @@ See the gcc(1) Linux manual page~\cite[Section OPTIONS]{gcc.1}:
This means, by specifying \texttt{-Wl,-{}-wrap=\textit{symbol}} when compiling using gcc,
all calls from the currently compiled program to \texttt{\textit{symbol}} are redirected to \texttt{\_\_wrap\_\textit{symbol}}.
To call the real function inside the wrapper, \texttt{\_\_real\_\textit{symbol}} may be used.
Listings~\ref{lst:wrap.c} and~\ref{lst:wrap} try to illustrate this by overriding the \texttt{malloc} function of the C standard library.
Listings~\ref{lst:wrap.c} and~\ref{lst:wrap} illustrate this by overriding the \texttt{malloc} function of the C standard library.
\begin{listing}[htbp]
\inputminted[linenos]{c}{src/listings/wrap.c}
@@ -159,7 +158,7 @@ Therefore, the source code (or the corresponding \texttt{*.out} files) needs to
Note that only calls from the targeted source code are redirected; calls from other libraries are not.
Theoretically, it should be possible to re-link a given binary without having access to its source code.
But due to other more straight-forward methods (see Subsection~\ref{subsec:preloading}), this has not been further investigated.
But due to other more straightforward methods (see Subsection~\ref{subsec:preloading}), this has not been investigated further.
\subsection{Preloading using \texttt{LD\_PRELOAD}}\label{subsec:preloading}
@@ -188,7 +187,7 @@ See the ld.so(8) Linux manual page~\cite[Section ENVIRONMENT]{ld.so.8}:
\end{quote}
This means, by setting the environment variable \texttt{LD\_PRELOAD}, it is possible to override specific functions.
Listings~\ref{lst:preload.c} and~\ref{lst:preload} try to illustrate this by overriding the \texttt{malloc} function of the C standard library.
Listings~\ref{lst:preload.c} and~\ref{lst:preload} illustrate this by overriding the \texttt{malloc} function of the C standard library.
\begin{listing}[htbp]
\inputminted[linenos]{c}{src/listings/preload.c}
@@ -218,10 +217,10 @@ Although, one has to be aware that not only function calls inside the targeted b
During the research on different approaches to intercepting system and function calls,
it has been found that the most reliable way to achieve the goals of this work (see Section~\ref{sec:motivation-and-goal}) is to intercept function calls instead of system calls.
This is because (as long as the programs to test are dynamically linked), intercepting function calls allows one to intercept many more calls and in a more flexible way.
This is because, as long as the programs to test are dynamically linked, intercepting function calls allows one to intercept many more calls and in a more flexible way.
Therefore, from now on this work only considers function calls and no system calls directly.
In this work preloading (see Subsection~\ref{subsec:preloading}) was chosen to be used
In this work, preloading (see Subsection~\ref{subsec:preloading}) was chosen to be used
because it is simple to use (``clean'' source code, easy to compile and run programs with it) and offers the means to arbitrarily execute code when the intercepted function call is redirected.
The following sections concern the next steps in what else is needed to create a powerful ``interceptor''.
@@ -231,7 +230,7 @@ The following sections concern the next steps in what else is needed to create a
After deciding to use the preloading method to intercept function calls, a more detailed plan is needed to continue developing.
It was decided to have one single \texttt{intercept.so} file as a resulting artifact which then may be loaded via the \texttt{LD\_PRELOAD} environment variable.
The easiest and most straightforward way to structure the source code was to put all code in one single C file.
Listing~\ref{lst:intercept-preload.c} gives an overview over the grounding code structure.
Listing~\ref{lst:intercept-preload.c} gives an overview of the underlying code structure.
For each function that should be intercepted, this function simply has to be declared and defined the same way \texttt{malloc} was.
\begin{listing}[htbp]
@@ -244,8 +243,8 @@ For each function that should be intercepted, this function simply has to be dec
\section{Retrieving Function Argument Values}\label{sec:retrieving-function-argument-values}
Now that the first steps have been done, one needs to think about what exactly to record when intercepting.
A simple notification that a given function was called would be too less.
Within the following subsections it is tried to get as much information as possible from each function call.
A simple notification that a given function was called would not be sufficient.
Within the following subsections, effort is put into getting as much information as possible from each function call.
As already mentioned, \texttt{ltrace} uses prototype functions to format its function arguments.
This allows \texttt{ltrace} to ``dynamically'' display function arguments for any new or unknown functions without the need for recompilation.
@@ -254,9 +253,9 @@ This allows \texttt{ltrace} to ``dynamically'' display function arguments for an
However, due to implementation complexity reasons and the need for ``complex'' return types for string/buffer and structure values (see Section~\ref{sec:retrieving-function-return-values}) a statically compiled approach has been used for this work.
This means that each function formats its arguments and return values itself without any configuration option.
The reason for retrieving as much information as possible from each function call is that at a later point in time it is possible to completely reconstruct the exact function calls and their sequence.
The reason for retrieving as much information as possible from each function call is that at a later point in time, it is possible to completely reconstruct the exact function calls and their sequence.
This allows analysis on these records to be performed independently of the corresponding execution of the program.
It should always be possible for any parser to fully parse the recorded calls without any specific knowledge of specific functions, their argument types, or return value type.
It should always be possible to fully parse the recorded calls without any knowledge of specific functions, their argument types, or their return value types.
\subsection{Numbers}\label{subsec:retrieving-numbers}
@@ -293,7 +292,7 @@ Example: \texttt{write(3, 0x1234:"Test\textbackslash{}x00ABC", 8)}.
\subsection{Flags}\label{subsec:retrieving-flags}
Some functions have one of their arguments dedicated to flags which may be combined by bitwise OR.
Some functions have one of their arguments dedicated to flags which may be combined by bitwise OR\@.
These arguments are also of type integer.
To distinguish flag arguments from others, a pipe symbol (\texttt{|}) is used after the colon and between the flags.
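The flag notation can be sketched in C as follows.
This is only a minimal illustration and not the code used in this work: the helper name \texttt{format\_open\_flags}, the selection of known flags, and the order in which flag names are emitted are assumptions.

```c
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper (not from the thesis sources): writes the flags of
 * open(2) into buf as "<octal>:|FLAG|FLAG|". Returns 0 on success and -1
 * if buf is too small. */
int format_open_flags(int flags, char *buf, size_t size) {
    /* Assumption: only a small subset of flags is known to the formatter. */
    static const struct { int bit; const char *name; } known[] = {
        { O_CREAT, "O_CREAT" }, { O_EXCL, "O_EXCL" },
        { O_TRUNC, "O_TRUNC" }, { O_APPEND, "O_APPEND" },
    };
    const char *mode = NULL;
    size_t len;
    int n = snprintf(buf, size, "%#o:|", (unsigned int)flags);
    if (n < 0 || (size_t)n >= size) return -1;
    len = (size_t)n;

    /* Append the name of every known flag bit that is set. */
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if ((flags & known[i].bit) != known[i].bit) continue;
        n = snprintf(buf + len, size - len, "%s|", known[i].name);
        if (n < 0 || (size_t)n >= size - len) return -1;
        len += (size_t)n;
    }

    /* The access mode is not a single bit and has to be masked out. */
    switch (flags & O_ACCMODE) {
        case O_RDONLY: mode = "O_RDONLY"; break;
        case O_WRONLY: mode = "O_WRONLY"; break;
        case O_RDWR:   mode = "O_RDWR";   break;
        default: break;
    }
    if (mode != NULL) {
        n = snprintf(buf + len, size - len, "%s|", mode);
        if (n < 0 || (size_t)n >= size - len) return -1;
    }
    return 0;
}
```

Because the numeric value is kept in front of the colon, a parser can still recover the exact argument even if a flag bit is unknown to the formatter.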
@@ -304,7 +303,7 @@ Example: \texttt{open(0x1234:"test.txt", 0102:|O\_CREAT|O\_RDWR|, 0644)}.
For some functions constants are used.
These constants are typically defined as C macros in the source code.
This makes the source code more readable (and portable).
Constants are represented as an integer again followed by a colon, this time without any special characters to disdinguish them from other types.
Constants are represented as an integer again followed by a colon, this time without any special characters to distinguish them from other types.
Example: \texttt{socket(2:AF\_INET, 1:SOCK\_STREAM, 6)}.
@@ -320,7 +319,7 @@ Example: \\
\subsection{Pointers to Structures}\label{subsec:retrieving-pointers-to-structures}
In rare cases structures (\texttt{struct}) are used as argument types.
In rare cases, structures (\texttt{struct}) are used as argument types.
Two curly brackets (\texttt{\{\}}) are used to indicate structures.
Then the field names are displayed plainly, followed by a colon and then the value of that field.
Commas are used to separate the fields.
@@ -347,13 +346,15 @@ Example (\texttt{read}): \\
\texttt{return 12; errno 0; buf=0x7fff70:"Hello World!"}, \\
\texttt{return -1; errno EINTR}.
\todo{Explain Examples}
\section{Determining Function Call Location}\label{sec:determining-function-call-location}
Besides from argument values and return values, it would be interesting to know from where inside the intercepted program the function call came from.
Besides argument values and return values, it would be interesting to know from where inside the intercepted program the function call came.
At first this seems quite impossible.
But a function always knows at least the return address, the address to set then instruction pointer to when the function finishes.
With this information it may be estimated where the call to the current function came from.
But a function always knows at least the return address, the address to set the instruction pointer to when the function finishes.
With this information, it may be estimated where the call to the current function came from.
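With GCC and Clang, the return address is available through the compiler builtin \texttt{\_\_builtin\_return\_address}.
The following is a minimal sketch, assuming one of these compilers; the function name \texttt{current\_return\_address} is hypothetical and only serves as an illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the address at which execution continues
 * once this function returns, i.e. a location just after the call site.
 * noinline ensures that the function keeps a return address of its own. */
__attribute__((noinline)) void *current_return_address(void) {
    /* Level 0 refers to the return address of the current function. */
    return __builtin_return_address(0);
}
```

Inside an intercepting wrapper, the builtin would be called directly in the wrapper body, so that the address points into the code that called the intercepted function.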
\subsection{Return Address and Relative Position}\label{subsec:return-address-and-relative-position}
@@ -486,7 +487,7 @@ The shared object currently supports intercepting the following functions:
\section{\texttt{intercept} Command}\label{sec:intercept-command}
To make the usage of the aforementioned shared object more easy, a simple python script has been put together.
To make the usage of the aforementioned shared object easier, a simple Python script has been put together.
This script may be used as a command line tool.
See Listing~\ref{lst:intercept}.
@@ -551,14 +552,14 @@ This includes the offset relative to the calling binary and a source file and li
\section{Automated Testing on Intercepted Function Calls}\label{sec:automated-testing-on-intercepted-function-calls}
The recorded function calls of a program run now may be used to perform checks and tests on them.
The recorded function calls of a program run may now be used to perform checks and tests on them.
It is trivially possible to check which functions were called and in what order.
Furthermore, it is possible to check various pre- and post-conditions for each function call.
This is beneficial because many library functions in C rely on these pre- and post-conditions, which are not enforced by the compiler or in any other way.
For example, the \texttt{malloc} function has the post-condition that the returned value later needs to be passed to \texttt{free} to avoid memory leaks.
The \texttt{free} function, on the other hand, has the pre-condition that the passed value was previously acquired using \texttt{malloc} and has not been freed yet.
Any violation of such pre- and post-conditions may be reported as incompliant behavior.
Any violation of such pre- and post-conditions may be reported as non-compliant behavior.
\cite{malloc.3}
This means that intercepted function calls allow a tester to check whether programmers use library functions in compliance with their specifications.
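
The \texttt{malloc}/\texttt{free} conditions described above can be checked on a recorded call sequence roughly as follows.
This is a simplified sketch and not the test tooling of this work: the record layout (\texttt{struct call\_record}), the function \texttt{check\_malloc\_free}, and the fixed upper bound on live allocations are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified representation of one recorded call. */
enum call_kind { CALL_MALLOC, CALL_FREE };
struct call_record {
    enum call_kind kind;
    uintptr_t ptr; /* value returned by malloc or passed to free */
};

struct check_result {
    size_t invalid_frees; /* free of a pointer that is not currently allocated */
    size_t leaks;         /* allocations that were never freed */
};

#define MAX_LIVE 1024 /* assumption: upper bound on tracked live allocations */

struct check_result check_malloc_free(const struct call_record *calls, size_t n) {
    uintptr_t live[MAX_LIVE];
    size_t n_live = 0;
    struct check_result res = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        uintptr_t p = calls[i].ptr;
        if (calls[i].kind == CALL_MALLOC) {
            /* Post-condition: a non-null result must be freed later. */
            if (p != 0 && n_live < MAX_LIVE) live[n_live++] = p;
        } else {
            size_t j = 0;
            if (p == 0) continue; /* free(NULL) is always allowed */
            /* Pre-condition: the pointer must be a live allocation. */
            while (j < n_live && live[j] != p) j++;
            if (j == n_live) res.invalid_frees++;
            else live[j] = live[--n_live]; /* remove from the live set */
        }
    }
    res.leaks = n_live;
    return res;
}
```

For a sequence in which two blocks are allocated and the first one is freed twice, the sketch reports one invalid free and one leak.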