thesis: Complete 2.5

2025-07-24 09:07:48 +02:00
parent 12bd315eb1
commit cfa68b78e7
2 changed files with 42 additions and 13 deletions
@@ -11,7 +11,7 @@ For that see Chapter~\ref{ch:manipulating-function-calls}.
 \section{Identified Methods for Intercepting Function and System Calls}\label{sec:methods-for-intercepting}

 First, one has to answer the question of \textit{how exactly} to intercept function or system calls.
-At the beginning of this work it was not yet determined if the interception of function calls, system calls, or both should be used to achieve the overarching goal (see\todo{Goal}).
+At the beginning of this work it was not yet determined whether the interception of function calls, system calls, or both should be used to achieve the overarching goal (see \todo{Goals}).
 This first section tries to list all possible methods on how to intercept function or system calls but does not claim completeness.
 The order of the following subsections is roughly based on the thought process on finding the most appropriate method suitable for this work.

@@ -101,7 +101,7 @@ The compiler, and the linker respectively, then directly link calls to the speci
 (See Subsection~\ref{subsec:preloading} for more details.)

 The default linker \texttt{ld} includes such a feature.
-See the OPTIONS section in the ld(1) Linux manual page~\cite{ld.1}:
+See the ld(1) Linux manual page~\cite[Section OPTIONS]{ld.1}:

 \begin{quote}
  \begin{description}
@@ -118,7 +118,7 @@ See the OPTIONS section in the ld(1) Linux manual page~\cite{ld.1}:
 \end{quote}

 The gcc compiler also supports this by allowing passing options to the linker.
-See the OPTIONS section in the gcc(1) Linux manual page~\cite{gcc.1}:
+See the gcc(1) Linux manual page~\cite[Section OPTIONS]{gcc.1}:

 \begin{quote}
  \begin{description}
@@ -176,7 +176,7 @@ Therefore, it would be possible to ``hijack'' (or intercept) these function call
 when the linker would allow loading other functions instead of the proper ones.

 Luckily, \texttt{ld.so} allows this so-called ``preloading''.
-See the ENVIRONMENT section in the ld.so(8) Linux manual page~\cite{ld.so.8}:
+See the ld.so(8) Linux manual page~\cite[Section ENVIRONMENT]{ld.so.8}:

 \begin{quote}
  \begin{description}
@@ -217,7 +217,7 @@ Although, one has to be aware that not only function calls inside the targeted b
 \subsection{Conclusion}\label{subsec:methods-for-intercepting-conclusion}

 During the research on different approaches to intercepting system and function calls,
-it has been found that the most reliable way to achieve the goals of this work (see \todo{goals}) is to intercept function calls instead of system calls.
+it has been found that the most reliable way to achieve the goals of this work (see \todo{Goals}) is to intercept function calls instead of system calls.
 This is because (as long as the programs to test are dynamically linked), intercepting function calls allows one to intercept many more calls and in a more flexible way.
 Therefore, from now on this work only considers function calls and no system calls directly.

@@ -350,14 +350,15 @@ Example (\texttt{read}): \\

 \section{Determining Function Call Location}\label{sec:determining-function-call-location}

-\todo{}
 Besides argument values and return values, it would be interesting to know where inside the intercepted program the function call came from.
 At first this seems quite impossible.
-But\dots
+But a function always knows at least its return address, that is, the address the instruction pointer is set to when the function finishes.
+With this information, the origin of the call to the current function can be estimated.

 \subsection{Return Address and Relative Position}\label{subsec:return-address-and-relative-position}

-\todo{}
+As already mentioned, the return address of a function is vital for estimating where the call came from.
+Luckily, GCC provides the means to get the return address of the current function.
 See the GCC manual~\cite[Section~7.6]{gcc}:

 \begin{quote}
@@ -371,11 +372,15 @@ See in the manual of GCC~\cite[Section~7.6]{gcc}:
  \end{description}
 \end{quote}

-\todo{}
+The return address on its own is of limited use, among other things because of Address Space Layout Randomization (ASLR), which is enabled on almost all modern systems.
+ASLR is a security feature that randomly places shared objects (libraries) in the virtual memory of a program on each execution.
+In contrast to always positioning the same object at the same address each time, this makes it harder to exploit internal memory structures.
+
+Fortunately, the dynamic linking library includes a function to translate a given virtual memory address to symbolic information without having to worry about ASLR and other obstacles.
 See the dladdr(3) Linux manual page~\cite{dladdr.3}:

 \begin{quote}
-
  \begin{description}
    \item[\texttt{int dladdr(const void *addr, Dl\_info *info)}] \ \

@@ -400,11 +405,23 @@ typedef struct {
  \end{description}
 \end{quote}

+Using the information in \texttt{Dl\_info}, it is possible to determine exactly which (shared) object the call came from (\texttt{dli\_fname}).
+Furthermore, the relative position inside this (shared) object can be calculated by subtracting \texttt{dli\_fbase} from the return address.
+Keep in mind that the return address is only an estimate of the origin of the call.
+Heavily optimized programs in particular might share the same return address between different code paths.
+Optionally, the name of the ``symbol'' (function) from which the call came may be retrieved as well (\texttt{dli\_sname}).
+
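+The two steps described above (obtaining the return address and translating it with \texttt{dladdr}) can be sketched in C as follows.
+This is a minimal sketch under stated assumptions, not the implementation used in this work: the helper names \texttt{locate\_address} and \texttt{report\_caller} are hypothetical, and the code assumes GCC on a glibc-based Linux system with a dynamically linked program (older glibc versions additionally require linking with \texttt{-ldl}).
+

```c
#define _GNU_SOURCE /* dladdr() and Dl_info are GNU extensions in <dlfcn.h> */
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of the thesis): translate addr into the path
 * of the object containing it and the offset of addr inside that object.
 * Returns 0 on success, -1 if no loaded object contains addr. */
static int locate_address(const void *addr, const char **object, uintptr_t *offset)
{
    Dl_info info;

    assert(object != NULL && offset != NULL);

    /* dladdr() returns 0 if addr could not be matched to a loaded object. */
    if (dladdr(addr, &info) == 0 || info.dli_fbase == NULL)
        return -1;

    *object = info.dli_fname;
    /* Subtracting the load base yields a position that is stable under ASLR. */
    *offset = (uintptr_t)addr - (uintptr_t)info.dli_fbase;
    return 0;
}

/* Hypothetical stand-in for an intercepted function. noinline keeps a real
 * call (and therefore a real return address) in place. */
__attribute__((noinline)) static int report_caller(void)
{
    void *ret = __builtin_return_address(0); /* return address of this call */
    const char *object;
    uintptr_t offset;

    if (locate_address(ret, &object, &offset) != 0)
        return -1;

    fprintf(stderr, "called from %s+0x%lx\n", object, (unsigned long)offset);
    return 0;
}
```

+Calling \texttt{report\_caller()} from any function prints the object that contains the call site together with the offset of the return address inside that object.
+The concrete values depend on the build, so no output is shown here.
+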

 \subsection{Source File and Line Number}\label{subsec:source-file-and-line-number}

-\todo{}
-See the OPTIONS section in the readelf(1) Linux manual page~\cite{readelf.1}:
+DWARF is a debugging information format used for storing information such as source file and line number inside compiled binaries.
+This allows debuggers and other analysis programs to give better feedback to the user~\cite{dwarfstd.org}.
+
+This also helps to find the origin of a given function call.
+When a program is compiled with GCC using the flag \texttt{-g} or \texttt{-gdwarf}, GCC includes DWARF debugging information in the resulting binary.
+Using the readelf tool, it is possible to make use of this information.
+See the readelf(1) Linux manual page~\cite[Section OPTIONS]{readelf.1}:

 \begin{quote}
  \begin{description}
@@ -414,13 +431,16 @@ See the OPTIONS section in the readelf(1) Linux manual page~\cite{readelf.1}:
    The letters and words refer to the following information:
    \begin{description}
      \item {}[\dots]
-      \item[\texttt{=rawline}] Displays the contents of the \texttt{.debug\_line }section in a raw format.
+      \item[\texttt{=rawline}] Displays the contents of the \texttt{.debug\_line} section in a raw format.
      \item[\texttt{=decodedline}] Displays the interpreted contents of the \texttt{.debug\_line} section.
      \item {}[\dots]
    \end{description}
  \end{description}
 \end{quote}

+The resulting output relates relative addresses to source files and line numbers, so both values can be looked up for any given relative address inside the binary.
+If this information is present, it is printed within the meta-information of the function call (see Section~\ref{sec:intercepting-example}).
+
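+The lookup described above can be sketched in C as follows.
+This is only an illustration under stated assumptions: the type \texttt{line\_entry} and the function \texttt{find\_line} are hypothetical, the decoded line table is assumed to have already been parsed into an array sorted by ascending address, and details such as end-of-sequence markers are ignored.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one row of the decoded line table. */
struct line_entry {
    uintptr_t address; /* relative address at which this row starts */
    const char *file;  /* source file name */
    unsigned line;     /* source line number */
};

/* Return the row with the greatest address that is <= target, or NULL if
 * target lies before the first row. The table must be sorted by ascending
 * address. */
static const struct line_entry *find_line(const struct line_entry *table,
                                          size_t count, uintptr_t target)
{
    size_t lo = 0, hi = count; /* binary search in the half-open range [lo, hi) */

    assert(table != NULL || count == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].address <= target)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the index of the first row whose address is > target. */
    return lo == 0 ? NULL : &table[lo - 1];
}
```

+Because a return address points to the instruction \emph{after} the call, it can already belong to the next source line.
+A common workaround, assumed here and not taken from this work, is to look up the return address minus one.
+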

 \section{\texttt{intercept.so} Library}\label{sec:intercept.so-library}

@@ -1,3 +1,9 @@
+@online{dwarfstd.org,
+    author = {DWARF Committee},
+    title = {DWARF Debugging Information Format},
+    date = {2025-06-24},
+    url = {https://dwarfstd.org/},
+}
@manual{ld.so.8,
    title = {ld.so(8) -- System Manager's Manual -- Linux manual pages},
 }
@@ -28,6 +34,9 @@
@manual{readelf.1,
    title = {READELF(1) -- GNU Development Tools -- Linux manual pages},
 }
+@manual{malloc.3,
+    title = {malloc(3) -- Library Functions Manual -- Linux manual pages},
+}
@book{netsectools2005,
    author = {Dhanjani, Nitesh and Clarke, Justin},
    title = {Network Security Tools},