diff --git a/thesis/src/01.introduction.tex b/thesis/src/01.introduction.tex
index e3ec0c8..f6bf9fc 100644
--- a/thesis/src/01.introduction.tex
+++ b/thesis/src/01.introduction.tex
@@ -11,7 +11,7 @@ This chapter gives a general overview about what the motivation and goal for thi
 When teaching students about Operating Systems, their interfaces, and standard libraries, C is still a widely used language.
 Especially when using Linux.
 Therefore, it is obvious, why many university courses still require students to write their assignments and exams in C\@.
-The problem when trying to verify, if students correctly implemented their assignment is that low-level OS constructs (like semaphores, pipes, sockets, memory management) make it hard to run automated tests, because the testing system needs to keep track, set up, and verify the usage of these resources.
+The problem when trying to verify whether students have correctly implemented their assignment is that low-level OS constructs (like semaphores, pipes, sockets, memory management) make it hard to run automated tests, because the testing system needs to keep track, set up, and verify the usage of these resources.
 The goal of this work was to find a way to easily intercept system or function calls and to verify if students called the right functions with the right arguments at the right time.
 This restriction in scope allows focusing on simple binary programs without having to think about complex or I/O heavy programs.
@@ -36,7 +36,7 @@ Functions are used to structure programs, reuse functionality, or expose functio
 Other languages than C differentiate between functions, methods, procedures, and so on.
 A function written in the source code is almost always compiled to a function in the resulting binary.
-Intercepting calls to functions would one allow seeing the name of the function, arguments, return value, and return address.
+Intercepting calls to functions allows one to see the function name, arguments, return value, and return address.
 \subsection{System Calls}\label{subsec:system-calls}
@@ -46,12 +46,12 @@ Many operations on a modern operating system require special privileges, which a
 By invoking a system call, the (user-space) process hands control over to the (privileged) kernel and requests an operation to be performed.
 \cite[Chapter~10]{linuxkernel}
-How exactly these system calls work is architecture and system specific.
+How exactly these system calls work depends on the architecture and operating system.
 But generally, the process places the system call number and its arguments in defined registers and then executes a special system call opcode.
 Then the kernel executes the requested operation and places the return value inside another register, and lastly hands the execution back to the process.
 \cite[Chapter~10]{linuxkernel}
-Intercepting calls to system calls would one allow seeing the system call number, arguments, and return value.
+Intercepting calls to system calls allows one to see the system call number, arguments, and return value.
 One has to keep in mind that many system-related functionalities are not in fact translated to system calls one-to-one.
 For example, \texttt{malloc}~\cite{malloc.3} has no dedicated system call, it is managed by the C standard library internally.
 Many system calls have corresponding wrapper functions in the C standard library (like \texttt{open}, \texttt{close}, \texttt{sem\_wait}).
diff --git a/thesis/src/02.intercept.tex b/thesis/src/02.intercept.tex
index c8c0a00..94b1d23 100644
--- a/thesis/src/02.intercept.tex
+++ b/thesis/src/02.intercept.tex
@@ -56,8 +56,7 @@ exit_group(0) = ?
 \label{lst:strace}
 \end{listing}
-This approach works great for debugging and other use-cases,
-but only intercepting system calls does not statisfy the requirements for this work.
+This approach works well for debugging and other use cases, but intercepting system calls alone does not satisfy the requirements of this work.
 \subsection{\texttt{ltrace} Command}\label{subsec:ltrace}
@@ -210,7 +209,7 @@ The function \texttt{dlsym} is used to retrieve the original address of the \tex
 \cite{dlsym.3}
 By using this method, it is possible to override, and therefore wrap, any function as long as the targeted binary was not statically linked.
-Although, one has to be aware that not only function calls inside the targeted binary, but also calls inside other libraries (e.g., to \texttt{malloc}) are redirected to the overriding function.
+However, one must be aware that not only function calls inside the targeted binary, but also calls inside other libraries (e.g., \texttt{malloc}) are redirected to the overriding function.
 \subsection{Conclusion}\label{subsec:methods-for-intercepting-conclusion}
@@ -260,7 +259,7 @@ It should always be possible to fully parse the recorded calls without any speci
 \subsection{Numbers}\label{subsec:retrieving-numbers}
-The most simple types of argument are plain numbers, like integers (\texttt{int}, \texttt{long}, \ldots) or floating point numbers (\texttt{float}, \texttt{double}).
+The simplest types of arguments are plain numbers, like integers (\texttt{int}, \texttt{long}, \ldots) or floating point numbers (\texttt{float}, \texttt{double}).
 (In fact, \textit{all} arguments are represented as numbers or integers.
 See the following subsections for examples.)
 Plain numbers may be formatted simply as what they are, in base 10 notation, or with a prefix like \texttt{0x} for hexadecimal or \texttt{0} for octal representation.
@@ -284,7 +283,7 @@ Special values inside the string are escaped with a backslash.
 Example: \texttt{sem\_unlink(0x1234:"/test-semaphore")}.
-Another type of ``string'' in C is a buffer with a known length.
When buffers are used, usually another argument is passed to the function which indicates the length of the buffer. This fact may be used to print out the contents of the buffer in the same way as normal C strings. @@ -338,8 +337,8 @@ Example (\texttt{malloc}): \\ \texttt{return 0x1234; errno 0}, \\ \texttt{return -1; errno ENOMEM}. -Some libc functions return their results via a pointer which was previously given to them as an argument. -The \texttt{pipe} function is called with an \texttt{int} array of size two as argument and stores its two pipe ends into this array. +Some libc functions return their results via a pointer, which was previously given to them as an argument. +The \texttt{pipe} function is called with an \texttt{int} array of size two as an argument and stores its two pipe ends into this array. The \texttt{read} function is called with a pointer to a buffer and a corresponding size and stores its read data into this buffer. Example (\texttt{pipe}): \\ @@ -548,7 +547,7 @@ This includes the offset relative to the calling binary and a source file and li \begin{listing}[htbp] \inputminted[fontsize=\tiny]{text}{src/listings/intercept-client.txt} - \caption{Recoreded function calls from \texttt{./client}.} + \caption{Recorded function calls from \texttt{./client}.} \label{lst:intercept-client} \end{listing} @@ -565,7 +564,7 @@ The \texttt{free} function, on the other hand, has the pre-condition that the pa Any violation of such pre- and post-conditions may be reported as non-compliant behavior. \cite{malloc.3} -This means that intercepted function calls allow a tester to check if programmers use library function in compliance to their specification. +This means that intercepted function calls allow a tester to check whether programmers use library functions in compliance with their specifications. Other checks may also include guards to calls to ``forbidden'' functions, or that specific functions must be called exactly three times. 
 Another important post-condition of most library functions is the return value, which in most cases indicates success or failure of an operation.
 However, intercepting of calls alone may not be able to verify if a program really checks the return value of a function and acts accordingly.
@@ -597,7 +596,7 @@ By only intercepting these functions, it is possible to check if all allocated m
 \subsection{Validating Resource Management}\label{subsec:validating-resource-management}
-Besides memory management, the proper use of other resources, most notably file descriptors, may be checked.
+In addition to memory management, the proper use of other resources---most notably file descriptors---can also be checked.
 Many functions in the C standard library rely on file descriptors.
 It may be checked if file descriptors were properly acquired, if only previously acquired file descriptors are used, and if these file descriptors are closed after their use.
 Relevant for this work are also semaphores because they do not rely on file descriptor in their API\@.
diff --git a/thesis/src/03.manipulate.tex b/thesis/src/03.manipulate.tex
index 0df187e..f427272 100644
--- a/thesis/src/03.manipulate.tex
+++ b/thesis/src/03.manipulate.tex
@@ -4,7 +4,7 @@
 This chapter discusses how to manipulate function calls and how this may be used to test programs.
 How function calls may be intercepted at all is discussed in Chapter~\ref{ch:intercepting-function-calls}.
 This chapter builds on the basis of the previous one and expands its functions.
-``Manipulation'' in this context means to change the arguments of a function then calling it with those changed arguments, or skipping the execution of the real function completely and simply returning a given value (``mocking'').
+In this context, ``manipulation'' means changing the arguments of a function before calling it with the modified arguments, or skipping the execution of the real function completely and simply returning a given value (``mocking'').
 These techniques allow in-depth testing of programs.
 In contrast to simply recording and logging function calls which may be controlled via environment variables, manipulation of such function calls requires some other process to indicate how to handle each call.
@@ -98,7 +98,7 @@ The contents of this message type correspond to the second line of an intercepte
 As seen in Figure~\ref{fig:control-flow} function call manipulation allows for mocking individual calls.
 Mocking may be used to see how the program behaves when individual calls to function fail or return an unusual, but valid, value.
-The simplest way to automatically test programs is to run them multiple times and on each run let a single function call fail.
+The simplest way to automatically test programs is to run them multiple times, allowing a single function call to fail in each run.
 The resulting sequence of function calls now may be put together to a call sequence graph (or tree).
 By analyzing this call graph, it is possible to decide if a program correctly terminated when faced with a failed function call.
 This may be the case when the following function calls differ from those which were recorded on a default run (without any mocked function calls).
@@ -112,7 +112,7 @@ Edges labeled with ``fail'' indicate the next function call after a mocked faile
 In reality, there are multiple failing paths, each for every possible error return value, but in this example they all yield the same resulting path, therefore, they have been collapsed.
 To test, if a programmer always checked the return value of a function and acted accordingly, this resulting call sequence graph now may be analyzed.
-This test seems trivial at first.
+At first glance, this test appears trivial.
 The simplest approach is to verify that after a failing function call only ``cleanup'' function calls (\texttt{free}, \texttt{close}, \texttt{exit}, \dots) follow.
 For simple programs, this assumption may hold, but there are many exceptions.
 For example, what if the program recognizes the failed call correctly as failed but recovers and continues to operate normally?
diff --git a/thesis/src/04.related-work.tex b/thesis/src/04.related-work.tex
index e7003db..f524c41 100644
--- a/thesis/src/04.related-work.tex
+++ b/thesis/src/04.related-work.tex
@@ -19,7 +19,7 @@ This excludes techniques already discussed in Section~\ref{sec:methods-for-inter
 like \texttt{ptrace} (Subsection~\ref{subsec:ptrace}), and \texttt{strace} (Subsection~\ref{subsec:strace}).
 Almost all following methods use binary rewriting to replace system calls with other instructions (except SUD, Subsection~\ref{subsec:syscall-user-dispatch}).
 This is one of the reasons why they were not mentioned in Section~\ref{sec:methods-for-intercepting}.
-Another one is that the focus of this work is function call interception, and not system call interception.
+Another reason is that this work focuses on function call interception rather than system call interception.
 \subsection{\texttt{int3} Signaling}\label{subsec:int3-signaling}
@@ -27,7 +27,7 @@ Another one is that the focus of this work is function call interception, and no
 \texttt{int3} is a one-byte instruction (\texttt{0xcc}) that invokes a software interrupt.
 On Linux, the kernel handles it and raises \texttt{SIGTRAP} to the user-space process that executed \texttt{int3}.
 The \texttt{int3} signaling technique exploits this behavior to hook system calls; it replaces \texttt{syscall}/\texttt{sysenter} with \texttt{int3} and employs the signal handler for \texttt{SIGTRAP} as the hook function.
-Since \texttt{int3} is one byte, it can replace an arbitrary instruction without breaking the neighbor instructions.
+Since \texttt{int3} is one byte, it can replace an arbitrary instruction without breaking neighboring instructions.
 This technique is traditionally used in debuggers to implement breakpoints.
 However, signal handling incurs a large overhead because it involves context manipulation by the kernel.
 \cite{zpoline}
diff --git a/thesis/src/05.evaluation.tex b/thesis/src/05.evaluation.tex
index 3a1b9ba..c5e6f62 100644
--- a/thesis/src/05.evaluation.tex
+++ b/thesis/src/05.evaluation.tex
@@ -6,12 +6,12 @@ Lorem Ipsum.
 \section{Usefulness for the Operating Systems Course}\label{sec:usefulness}
-Up until recently the Operating Systems Course (mentioned in Section~\ref{sec:motivation-and-goal}) was split into three exercise blocks:
+Up until recently the Operating Systems course (mentioned in Section~\ref{sec:motivation-and-goal}) was split into three exercise blocks:
 Files, Shared Memory, Semaphores; Related Processes and Inter-Process Communication via Unnamed Pipes; and Sockets.
 Table~\ref{tab:functions} lists all functions presented in the course and their implementation status in \texttt{intercept.so}.
 As one may see, simple file stream functions are not currently implemented in \texttt{intercept.so}.
-This is because of time restrictions on this work and the fact, that simple file operations may be tested easily in the conventional way of checking the resulting output.
-Note, that the future implementation of single functions is not very complex.
+This is because of time restrictions on this work and the fact that simple file operations may be tested easily in the conventional way of checking the resulting output.
+Note that the future implementation of single functions is not very complex.
 All other functions have at least interception and mocking (returning, failing) implemented.
 For some functions the modification of function arguments has been implemented.
@@ -68,14 +68,14 @@ For some functions the modification of function arguments has been implemented.
 \section{Performance}\label{sec:performance}
-Although high performance was not a primary goal of this work, the resulting performance degradation by intercepting and manipulation should not be too excessive.
+Although high performance was not a primary goal of this work, the performance degradation caused by interception and manipulation should not be excessive.
 The following two subsections test and discuss the performance degradation of \texttt{intercept.so} compared to running a program without any intercepting or hooking.
 \subsection{Performance when Intercepting}\label{subsec:performance-intercepting}
-To test the performance of \texttt{intercept.so} the following measurement environment was set up.
-On an x86-64 machine with an AMD Ryzen 7 7700X 8-Core processor a simple program was called with an increasing number of iterations it had to perform.
+To test the performance of \texttt{intercept.so}, the following measurement environment was set up.
+On an x86-64 machine with an AMD Ryzen 7 7700X 8-Core processor, a simple program was called with an increasing number of iterations it had to perform.
 The program simply called the \texttt{pipe} function and then closed the created pipes in a for loop.
 At first execution time of the program was measured without using \texttt{intercept.so} (``Baseline'').
 Then \texttt{intercept.so} was preloaded but without any action to perform when intercepting (``Intercepting'').
@@ -85,9 +85,9 @@ For each of the four variants, the program was called with an iteration count be
 Each measurement was taken 30 times with one second between program executions to rule out statistical outliers.
 Figure~\ref{fig:performance} illustrates the results.
-It is clearly visible, that the initialization step of \texttt{intercept.so} always takes around 10 ms.
-In comparison to that, the performance degradation of the intercepting procedure alone is negligible, only around +13 \% compared to the baseline execution (see ``Intercept -- 10 ms'' and ``Baseline'').
-Most delay comes from logging the recorded function calls.
+It is clearly visible that the initialization step of \texttt{intercept.so} always takes around 10 ms.
+In comparison to that, the performance degradation of the intercepting procedure alone is negligible, only around +13\% compared to the baseline execution (see ``Intercept -- 10 ms'' and ``Baseline'').
+Most of the delay is caused by logging the recorded function calls.
 \begin{figure}
 \centering
@@ -150,4 +150,4 @@ Measuring performance for function call manipulation makes no sense without know
 As seen in Subsection~\ref{subsec:performance-intercepting}, most delay comes not from intercepting itself, but from the further processing.
 This also applies to function call manipulation.
 The performance degradation heavily depends on the response speed of the used socket.
-Therefore, an explicit performance test on manipulation was deemed not yielding meaningful result and not carried out.
+Therefore, an explicit performance test on manipulation was deemed unlikely to yield meaningful results and was not carried out.
diff --git a/thesis/src/06.conclusion.tex b/thesis/src/06.conclusion.tex
index d9e2a7d..4ba685f 100644
--- a/thesis/src/06.conclusion.tex
+++ b/thesis/src/06.conclusion.tex
@@ -3,7 +3,7 @@
 This work presented \texttt{intercept.so}, a shared object file intended to be preloaded using \texttt{LD\_PRELOAD}, which may be used to intercept function calls on Linux systems.
 Furthermore, a tool to use this shared object easier---the \texttt{intercept} Python program---was presented.
-By using preloading to hook/intercept function calls, the overhead and performance degradation is negligible for the purpose of testing student submissions.
+By using preloading to hook or intercept function calls, the overhead and performance degradation remain negligible for the purpose of testing student submissions.
 To make use of intercepted function calls, some techniques of automatic testing of simple C programs were discussed.
 The source code of the programs developed in this work is attached below.
diff --git a/thesis/thesis.tex b/thesis/thesis.tex
index e7d3adc..c2c7ec2 100644
--- a/thesis/thesis.tex
+++ b/thesis/thesis.tex
@@ -11,8 +11,7 @@
 \Author{\authorname} % The author's name in the document properties.
 \Title{Intercepting and Manipulating System/Function Calls in Linux Systems} % The document's title in the document properties.
 \Language{de-AT} % The document's language in the document properties. Select 'en-US', 'en-GB', or 'de-AT'.
-% TODO
-\Keywords{a\sep list\sep of\sep keywords} % The document's keywords in the document properties (separated by '\sep ').
+\Keywords{system call\sep syscall\sep function call\sep intercept\sep hook} % The document's keywords in the document properties (separated by '\sep ').
 \Publisher{TU Wien} % The document's publisher in the document properties.
 \Subject{Thesis} % The document's subject in the document properties.
 \end{filecontents*}
@@ -169,7 +168,8 @@
 % Declare the use of AI tools as mentioned in the statement of originality.
 % Use either the English aitools or the German kitools.
 \begin{aitools}
-No generative AI tools were used in and for this work whatsoever.
+ No generative AI tools were used in and for this work whatsoever.
+ The only exception was the use of ChatGPT for proofreading.
 \end{aitools}
 %\begin{kitools}