From 50167182831f59d19018b6aab4e88a75f30c65a9 Mon Sep 17 00:00:00 2001
From: Lorenz Stechauner
Date: Sun, 24 Aug 2025 17:38:31 +0200
Subject: [PATCH] thesis: Commas

---
 thesis/src/01.introduction.tex | 15 +++++++--------
 thesis/src/02.intercept.tex | 6 +++---
 thesis/src/03.manipulate.tex | 12 ++++++------
 thesis/src/05.evaluation.tex | 6 +++---
 4 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/thesis/src/01.introduction.tex b/thesis/src/01.introduction.tex
index f6bf9fc..4f1f001 100644
--- a/thesis/src/01.introduction.tex
+++ b/thesis/src/01.introduction.tex
@@ -8,14 +8,13 @@ This chapter gives a general overview about what the motivation and goal for thi
 \section{Motivation and Goal}\label{sec:motivation-and-goal}
-When teaching students about Operating Systems, their interfaces, and standard libraries, C is still a widely used language.
-Especially when using Linux.
+When teaching students about Operating Systems, their interfaces, and standard libraries, C is still a widely used language, especially when using Linux.
 Therefore, it is obvious, why many university courses still require students to write their assignments and exams in C\@.
-The problem when trying to verify whether students have correctly implemented their assignment is that low-level OS constructs (like semaphores, pipes, sockets, memory management) make it hard to run automated tests, because the testing system needs to keep track, set up, and verify the usage of these resources.
+The problem, when trying to verify whether students have correctly implemented their assignment, is that low-level OS constructs (like semaphores, pipes, sockets, memory management) make it hard to run automated tests, because the testing system needs to keep track, set up, and verify the usage of these resources.
-The goal of this work was to find a way to easily intercept system or function calls and to verify if students called the right functions with the right arguments at the right time.
+The goal of this work was to find a way to easily intercept system or function calls, and to verify if students called the right functions, with the right arguments, at the right time.
 This restriction in scope allows focusing on simple binary programs without having to think about complex or I/O heavy programs.
-Furthermore, in this setting the source code of the student's programs is obviously available because this is what they need to deliver.
+Furthermore, in this setting the source code of the student's programs is obviously available, because this is what they need to deliver.
 The availability of source code is a key concern when trying to intercept function or system calls, as will be clear in the next chapters.
@@ -27,7 +26,7 @@ The following subsections concern these definitions.
 \subsection{Function Calls}\label{subsec:function-calls}
-Generally, a function in C (and also most other programming languages) is a piece of code that may be called and therefore executed from elsewhere.
+Generally, a function in C (and also most other programming languages) is a piece of code that may be called, and therefore executed, from elsewhere.
 Functions have zero or more arguments and return a single value.
 When calling a function, the caller places the return address onto the stack.
 This address indicates where the function should continue executing when it is finished.
@@ -43,7 +42,7 @@ Intercepting calls to functions allows one to see the function name, arguments,
 In contrast to functions, system calls are calls to the kernel itself.
 Many operations on a modern operating system require special privileges, which a simple user-space process does not have.
-By invoking a system call, the (user-space) process hands control over to the (privileged) kernel and requests an operation to be performed.
+By invoking a system call, the (user-space) process hands control over to the (privileged) kernel, and requests an operation to be performed.
 \cite[Chapter~10]{linuxkernel}
 How exactly these system calls work depends on the architecture and operating system.
@@ -52,6 +51,6 @@ Then the kernel executes the requested operation and places the return value ins
 \cite[Chapter~10]{linuxkernel}
 Intercepting calls to system calls allows one to see the system call number, arguments, and return value.
-One has to keep in mind that many system-related functionalities are not in fact translated to system calls one-to-one.
+One has to keep in mind, that many system-related functionalities are not, in fact, translated to system calls one-to-one.
 For example, \texttt{malloc}~\cite{malloc.3} has no dedicated system call, it is managed by the C standard library internally.
 Many system calls have corresponding wrapper functions in the C standard library (like \texttt{open}, \texttt{close}, \texttt{sem\_wait}).
diff --git a/thesis/src/02.intercept.tex b/thesis/src/02.intercept.tex
index b9bca04..cdb4f33 100644
--- a/thesis/src/02.intercept.tex
+++ b/thesis/src/02.intercept.tex
@@ -209,13 +209,13 @@ The function \texttt{dlsym} is used to retrieve the original address of the \tex
 \cite{dlsym.3}
 By using this method, it is possible to override, and therefore wrap, any function as long as the targeted binary was not statically linked.
-However, one must be aware that not only function calls inside the targeted binary, but also calls inside other libraries (e.g., \texttt{malloc}) are redirected to the overriding function.
+However, one must be aware that, not only function calls inside the targeted binary, but also calls inside other libraries (e.g., \texttt{malloc}), are redirected to the overriding function.
 \subsection{Conclusion}\label{subsec:methods-for-intercepting-conclusion}
 During the research on different approaches to intercepting system and function calls,
-it has been found that the most reliable way to achieve the goals of this work (see Section~\ref{sec:motivation-and-goal}) is to intercept function calls instead of system calls.
+it has been found, that the most reliable way to achieve the goals of this work (see Section~\ref{sec:motivation-and-goal}) is to intercept function calls instead of system calls.
 This is because---as long as the programs to test are dynamically linked---, intercepting function calls allows one to intercept many more calls and in a more flexible way.
 Therefore, from now on this work only considers function calls and no system calls directly.
@@ -252,7 +252,7 @@ This allows \texttt{ltrace} to ``dynamically'' display function arguments for an
 However, due to implementation complexity reasons and the need for ``complex'' return types for string/buffer and structure values (see Section~\ref{sec:retrieving-function-return-values}) a statically compiled approach has been used for this work.
 This means that each function formats its arguments and return values itself without any configuration option.
-The reason for retrieving as much information as possible from each function call is that at a later point in time, it is possible to completely reconstruct the exact function calls and their sequence.
+The reason for retrieving as much information as possible from each function call, is that at a later point in time, it is possible to completely reconstruct the exact function calls and their sequence.
 This allows analysis on these records to be performed independently of the corresponding execution of the program.
 It should always be possible to fully parse the recorded calls without any specific knowledge of specific functions, their argument types, or return value type.
diff --git a/thesis/src/03.manipulate.tex b/thesis/src/03.manipulate.tex
index 5a67802..e94a54f 100644
--- a/thesis/src/03.manipulate.tex
+++ b/thesis/src/03.manipulate.tex
@@ -4,11 +4,11 @@
 This chapter discusses how to manipulate function calls and how this may be used to test programs.
 How function calls may be intercepted at all is discussed in Chapter~\ref{ch:intercepting-function-calls}.
 This chapter builds on the basis of the previous one and expands its functions.
-In this context, ``manipulation'' means changing the arguments of a function before calling it with the modified arguments, or skipping the execution of the real function completely and simply returning a given value (``mocking'').
+In this context, ``manipulation'' means changing the arguments of a function, before calling it with the modified arguments, or skipping the execution of the real function completely and simply returning a given value (``mocking'').
 These techniques allow in-depth testing of programs.
-In contrast to simply recording and logging function calls which may be controlled via environment variables, manipulation of such function calls requires some other process to indicate how to handle each call.
-This work uses simple sockets to communicate between the process of the program to be tested and a ``server'' which decides what action to perform for each function call.
+In contrast to simply recording and logging function calls, which may be controlled via environment variables, manipulation of such function calls requires some other process to indicate how to handle each call.
+This work uses simple sockets to communicate between the process of the program to be tested, and a ``server'', which decides what action to perform for each function call.
 Currently, only communication over Unix sockets is implemented, but communication over TCP sockets is also easily possible.
 Figure~\ref{fig:control-flow} illustrates the control flow for manipulating function calls.
@@ -97,10 +97,10 @@ The contents of this message type correspond to the second line of an intercepte
 \section{Automated Testing using Function Call Manipulation}\label{sec:automated-testing-using-function-call-manipulation}
 As seen in Figure~\ref{fig:control-flow} function call manipulation allows for mocking individual calls.
-Mocking may be used to see how the program behaves when individual calls to function fail or return an unusual, but valid, value.
+Mocking may be used to see how the program behaves when individual calls to function fail, or return an unusual, but valid, value.
 The simplest way to automatically test programs is to run them multiple times, allowing a single function call to fail in each run.
 The resulting sequence of function calls now may be put together to a call sequence graph (or tree).
-By analyzing this call graph, it is possible to decide if a program correctly terminated when faced with a failed function call.
+By analyzing this call graph, it is possible to decide, if a program correctly terminated, when faced with a failed function call.
 This may be the case when the following function calls differ from those which were recorded on a default run (without any mocked function calls).
@@ -113,7 +113,7 @@ In reality, there are multiple failing paths, each for every possible error retu
 To test, if a programmer always checked the return value of a function and acted accordingly, this resulting call sequence graph now may be analyzed.
 At first glance, this test appears trivial.
-The simplest approach is to verify that after a failing function call only ``cleanup'' function calls (\texttt{free}, \texttt{close}, \texttt{exit}, \dots) follow.
+The simplest approach is to verify, that after a failing function call, only ``cleanup'' function calls (\texttt{free}, \texttt{close}, \texttt{exit}, \dots) follow.
 For simple programs, this assumption may hold, but there are many exceptions.
 For example, what if the program recognizes the failed call correctly as failed but recovers and continues to operate normally?
 Or what if the ``cleanup'' path is very complex and includes function calls not priorly marked as valid cleanup functions?
diff --git a/thesis/src/05.evaluation.tex b/thesis/src/05.evaluation.tex
index 70047f7..cf74b43 100644
--- a/thesis/src/05.evaluation.tex
+++ b/thesis/src/05.evaluation.tex
@@ -12,7 +12,7 @@ Up until recently the Operating Systems course (mentioned in Section~\ref{sec:mo
 Files, Shared Memory, Semaphores; Related Processes and Inter-Process Communication via Unnamed Pipes; and Sockets.
 Table~\ref{tab:functions} lists all functions presented in the course and their implementation status in \texttt{intercept.so}.
 As one may see, simple file stream functions are not currently implemented in \texttt{intercept.so}.
-This is because of time restrictions on this work and the fact that simple file operations may be tested easily in the conventional way of checking the resulting output.
+This is because of time restrictions on this work, and the fact that simple file operations may be tested easily in the conventional way of checking the resulting output.
 Note that the future implementation of single functions is not very complex.
 All other functions have at least interception and mocking (returning, failing) implemented.
 For some functions the modification of function arguments has been implemented.
@@ -80,11 +80,11 @@ To test the performance of \texttt{intercept.so}, the following measurement envi
 On an x86-64 machine with an AMD Ryzen 7 7700X 8-Core processor, a simple program was called with an increasing number of iterations it had to perform.
 The program simply called the \texttt{pipe} function and then closed the created pipes in a for loop.
 At first execution time of the program was measured without using \texttt{intercept.so} (``Baseline'').
-Then \texttt{intercept.so} was preloaded but without any action to perform when intercepting (``Intercepting'').
+Then \texttt{intercept.so} was preloaded, but without any action to perform when intercepting (``Intercepting'').
 After that, logging to \texttt{stderr} was enabled (``Logging to stderr'').
 Finally, the logs were written to a file (``Logging to file'').
 For each of the four variants, the program was called with an iteration count beginning at 100 and increasing in steps of 100 up to 5000.
-Each measurement was taken 30 times with one second between program executions to rule out statistical outliers.
+Each measurement was taken 30 times, with one second between program executions to rule out statistical outliers.
 Figure~\ref{fig:performance} illustrates the results.
 It is clearly visible that the initialization step of \texttt{intercept.so} always takes around 10 ms.