Protection and system calls

This session is dedicated to the study of protection and system calls in Operating Systems.

You don't need to submit source files or reports for this session.

For this session, you will need to log into a PC running Linux. If you are at Eurecom, simply log on to a PC in room 52 or 53. If you use another PC running GNU/Linux, I cannot guarantee that the lab works in the same way (outputs, manual pages, etc. may differ).

Important: you will create test files, some of which will be large. As your disk quota is limited, create these files in the /tmp directory, whose purpose is precisely to store temporary files. Another student may well be working remotely on the same computer as you, so to avoid clashes, first create a subdirectory named after your user name (in the following, USERNAME stands for your user name):

```
# Do not just copy-paste this: replace USERNAME with your user name
$ mkdir -p /tmp/USERNAME
```

At the end of the lab please do not forget to delete all temporary files to free disk space in /tmp (but be careful when using rm -rf, if you get it wrong the consequences can be catastrophic; check your typing twice):

```
$ rm -rf /tmp/USERNAME
```

Trick. Under GNU/Linux you can easily copy/paste with the mouse: select some text (click and drag, double-click or triple-click), move your cursor to the destination and click with the middle button (or the wheel).

I. Basic

Make sure you have understood all the slides of the Introduction to Operating Systems.
How would you define a system call? Once you are happy with your definition, you can check it against Wikipedia, for instance.

II. Services

Type the following command. How do you explain the difference from the write() function used in the example source code given in the slides?
```
$ man write
```
Similarly, try to figure out whether each of the following is a bash command, a syscall, or a C library function: read, print, printf, malloc, exit, getpid, ps

(Help me!) Use man -s2 f to find out whether a function f is a syscall (section 2 of the manual documents system calls, section 3 documents C library functions).

III. Using system calls

In this exercise you will learn how to compile a C program, how to correct errors and warnings, and how to use Linux system calls. You will also discover how to manipulate files from a C program.

Save the writeToFile.c file into your work directory and open it with your favourite text editor. Locate where the open() function is used and verify that open() is a system call. Find out how many syscalls are used in writeToFile.c.
Study this source code in detail and find out its main purpose. What errors can be raised during execution? Can you explain in which case each of them is raised?
Compile writeToFile.c to a binary file named writeToFile:
```
$ gcc -Wall -o writeToFile writeToFile.c
```
Execute writeToFile:
```
$ ./writeToFile /tmp/USERNAME/test hello
```
The process should write hello to file /tmp/USERNAME/test (where USERNAME is your user name). Check that the file exists and check its content.
Use the program in such a way that it fails to write to the file. Check that it indeed fails.

(Help me!) What happens if the program tries to write in a file for which you don't have the write permission?
When open() is called and returns -1 (a negative value), an error occurred. In the current version, when there is an error, the program exits with status 1. You can see the exit status of the last executed command with:
```
$ echo $?
```
We would like to provide more information to the user when an error occurs. Open the manual page of the open syscall (man 2 open). Near the end of the page, the ERRORS section lists the possible error code names. Pick one that you are able to reproduce.

These more or less meaningful names are macros defined in the errno.h header file. Add the following line at the top of your program:
```
#include <errno.h>
```
errno.h not only defines macros for the error codes, it also declares the variable errno. With the header file included, each time a syscall fails (for instance when open returns -1), you can read the content of errno and compare it with the macros to identify the exact cause of the error:
```
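if (errno == EINTR) {
  /* the call was interrupted by a signal handler */
} else if (errno == EINVAL) {
  /* an argument was invalid, e.g. an invalid value in flags */
}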
```
Be careful: errno can be overwritten by subsequent syscalls or function calls; if needed, save it in a variable of your own. Note: if the moreutils package is installed on your computer, you can use its errno command to find out which error corresponds to a given code:
```
$ errno 12
ENOMEM 12 Cannot allocate memory
```
Add the management of at least one error you know how to reproduce. Check that you obtain the expected error code. Provide a more specific error message to the user when this error occurs.

The program passes strlen(toBeWritten) to the write syscall to write all characters at once. Assume that we want to write only one character at a time. Code a new version named writeOnetoFile.c with this restriction on write.

| What the provided code does | What your code should do |
|---|---|
| `write(all characters at once)` | `while (not done) { write(one character) }` |

Instead of passing the input text on the command line we now want to read it from an input file. Code a new program named readThenWrite.c that reads all input data from an input file and then writes it back to an output file.
1. First change the handling of the arguments: the first argument is the name of the input file, the second is the name of the output file.
2. Then, read the input data from the input file. To store input data in a buffer, you may need to first get the size of the input file. Use stat(). By the way, is stat() a library function or a syscall? Do not forget to handle potential errors.
3. After the read completes, write the data to the output file. Compile and test. Again, try to trigger errors to figure out if your error management works as expected. What happens if you provide too few or too many arguments?
4. Generate a very large file (1 GB, that is 1000000000 bytes) with the dd utility (see man dd), use the stat utility to check its size in bytes (see man stat), measure the time taken by your program to duplicate this large file, and check that the two files are identical:
```
$ dd if=/dev/zero of=/tmp/USERNAME/myBigFile bs=1000000 count=1000
$ stat -c%s /tmp/USERNAME/myBigFile
$ time ./readThenWrite /tmp/USERNAME/myBigFile /tmp/USERNAME/myOtherBigFile
$ diff --brief /tmp/USERNAME/myBigFile /tmp/USERNAME/myOtherBigFile && echo "OK"
```
  Once done, save some disk space by deleting the duplicate (you will need the original for the bonus question):
```
$ rm -f /tmp/USERNAME/myOtherBigFile
```

Bonus: Reading the whole input file at once and then writing all the data to the destination file at once works, but it does not scale: on a memory-limited computer, we could not use this approach for large files.

Code a new version named ReadBlockThenWrite.c that takes a third parameter: buf_size, the byte size of the buffer you use for reading and writing. Your program shall allocate only buf_size bytes and split the read/write operations accordingly.

| What the initial code does | What the code should do |
|---|---|
| `read(all characters at once)` then `write(all characters at once)` | `while (not over) { read(up to buf_size bytes); write(what has been read) }` |

Program, test, and prove that it works as before with different buffer sizes.
Measure the time taken by this new version to duplicate the large file with a buffer size of 1000000000 bytes (that is, the size of the complete large file). Is it the same as before?
Measure the time taken by the new version to duplicate the large file with a buffer size of 100000000 bytes (that is, a tenth of the size of the complete large file). Repeat with smaller and smaller buffer sizes. What do you notice? Can you explain?

Assume you want to write the same input content to two output files. Would it be more efficient to write the entire content to the first file and then to the second, or to write to both files incrementally?

Do not forget to delete all temporary files to free disk space in /tmp (but be careful when using rm -rf, if you get it wrong the consequences can be catastrophic; check your typing twice):
```
$ rm -rf /tmp/USERNAME
```