BasicOS
Basics
of
Operating
Systems

Eurecom Dictionary

General overview

The objective of this project is to complete the C code of a dictionary management system. This system is built upon two components:
  • A file, storing words (in alphabetical order, or not),

  • A C program called words, performing operations on the dictionary. This program reads arguments from the command line, including commands to be performed on the dictionary (e.g., looking for a word), and returns an answer in the terminal. In your C code, you are asked to manipulate files using only system calls, e.g., with open(), read(), write() and close().

System specification

Overview

A dictionary is first stored on disk in a text file. Let's assume we have a dictionary file named words.txt. This text file contains a list of English words: one word per line. They don't have to be classified in alphabetical order. A program, called words, that you ought to write, takes as input two references:
  1. A reference to a "command file" containing a list of commands, one command per line: these commands are to be executed by words. Valid commands are given later.

  2. A reference to a dictionary file (e.g., to words.txt)

For instance, a typical use of words is:
$ ./words fileOfCommands words.txt

Actually, in the file repository (gitlab) we provide as input (see below), the dictionary file is in the subdirectory data, words is expected to be generated in bin by the Makefile, and we provide examples of command files in examples. Thus, in our file repository, a valid execution of words looks like:
$ cd bin&&./words ../examples/commands1 ../data/words.txt
Do add your own examples in order to debug your application as explained below.

Commands

The commands to implement are given in the following table. We assume that a dictionary named D has been provided as the second argument to words. Do use exactly the specified format for the outputs since we will check your outputs with our own tests. If you want to put extra information, do use a -debug options that activate extra outputs. A command must return either a number, ok, true, false, or error.
$ ./words -debug fileOfCommands words.txt


Command Action on DOutput (if no error)
add word1 word2 word3 ... adds words to Dok
remove word1 word2 ... removes words from Dok
size Total number of words in D (int)
advancedsize ab Total number of words starting with "ab" in D (int)
save filename saves D in "filename" in alphabetical orderok
search word prints true if the word was found, false otherwise
advancedsearch1 w?r?d prints true is at least one word was found, false otherwise
helphelp on commands

For each command that could not be executed because of an error, "error" is output to the terminal. It may be optionnaly followed by a message, e.g., "error: memory allocation problem". Any other error or information message you may decide to use in your program must be sent to stderr (and not to stdout). The maximum size for a command, including its parameters, is 10000 characters.

Implementation

Valid characters for words

All valid characters are the one present in words.txt. This includes:
  • Uppercase and lowercase letters
  • Numbers 0-9
  • ! & ' , - . /
We assume that the alphabetical order is the one of the ascii table.
$ man ascii

You can easily find this valid list of characters by running this bash command on the words.txt file:
$ grep -o . data/words.txt | sort -u
Or: another way to do:
$ cat data/words.txt | fold -w1 | sort | uniq 

Compilation and execution

Your implementation must be compilable and runnable on the Linux PCs in the laboratory rooms: these PCs will be used to evaluate your work. So, even if the project compiles on your own compuyter, be sure that it also compiles on Eurecom PCs. We will evaluate the compilation of your code, and then the execution of your program using the tests we provide with the project. The performance of your implementation is very important as well. Thus, for the provided examples, we will evaluate how much time the execution of your program takes.

We have prepared a C environment (Makefile, libraries, include) for the project in the following git: https://gitlab.eurecom.fr/ludovic.apvrille/basicos_project_fall2025_forstudents. Clone it as follows:
$ git clone https://gitlab.eurecom.fr/ludovic.apvrille/basicos_project_fall2025_forstudents.git
 
Once cloned, have a look at the README.md file at the root of this git repository. Using the following command, we can compute how much time your program takes to execute commands given in the commands1.cmd file:
$ time make example1
We will also run the provided tests on Eurecom PCs. The reference machine will be "eurecom10". Tests are meant to evaluate whether your program executes commands as expected, and also how much time it takes to execute. The time command returns how much time was used in user mode and in kernel mode by your program when executed. Test execution returnw whether your program is correct (PASSED / FAILED). If the test is passed, then how much time your program takes with regards to the time we have measured. 90\% means than your program takes 0.9 * our time to execute. 120 means that your programs takes 1.2 * our reference time. We provide 10 tests, that you can start as follows.
$ make test



Bonus

  • advancedsearch2. Extend the advancedsearch1 command as follows. When no words were found, it still returns false. But now, when it returns true, you should print in the same line all identified words, separated by a space. There must also be a space between true and the first word.

  • Command Action on DOutput (if no error)
    advancedsearch2 w?r?d true + list of words if is at least one word was found, false otherwise
  • advancedsearch3 command: this new command supports "?" and "*" when looking for words. "*" matches zero or more characters. For instance: "h*l?" should match "hello" and "hell" (and probably many others).

  • Command Action on DOutput (if no error)
    advancedsearch3 w*r?d true i+ list of words if at least one word was found, false otherwise.

  • Implementing a command to merge a referenced dictionary file with the opened dictionary.

  • Command Action on DOutput (if no error)
    merge refToFile add all words of refToFile to Dok
  • Automatically compressing/uncompressing dictionary files.

For each bonus, you have to provide relevant tests proving that your implementation works as expected.

Deadlines, deliverables and organization

Program the specified system, and provide the sources files in C, Makefile and one report in markdown format, by Dec. 11th, 2025, 18h00 CET. Advices on how to write a report are given here. Everything should be available in the gitlab of your project. No commit after this date will be considered. If your code does not compile, don't expect to have a grave over 5. In your gitlab, commit only source files, never commit what can be obtained by compilation.

All members of the group must work on the general definition of the dictionary handling (loading in a data structure), and the implementation of the core functions (command analysis). Then, the implementation of commands must be split between the members of the group, as follows:
  • Members #1 (leader): data structure management and size

  • Members #2: remove and advancedsize

  • Members #3: add and save

  • Members #4: search and advancedsearch1

Grading

10 points are given to the whole group, 10 points are given to each student.
  • 5 points for the report. The report covers the work of the whole group, but must emphasize on the work done by each group member. A few instructions on how to write a good report.

  • 5 points for the global approach: global algorithm, common code, regular progression on the project.

  • 10 individual points for personal implementation. This grade is individual. If your implementation in C is not ready by the deadline, provide the corresponding algorithm in pseudo code: you can get up three points with a pseudo-code.

How to proceed?

Making groups

  1. Form 17 groups of 4 students and one group of 3 students. Once your group has been established, I would like the designated group leader to send me an email providing the full names and email addresses of all the group members. In the subject line of your email, please include the phrase "[BasicOS] New group". Upon receiving your email, I will confirm your group's formation, assign you a group number.

  2. Setup a gitlab projet (use gitlab.eurecom.fr for this) called EXACTLY basicos2025-teamXX (With XX your group number, for group 1: basicos2025-team01) and send me the link to your gitlab: I will follow your progression that way. Do allow me to clone your gitlab project, as well as Sophie Coudert and Yahya Frioui.

How to work?

  1. First sessions. Understand the repository we provide (.c and .h files, and the Makefile). Clearly define the general structure of your project.

  2. First and second sessions: program together the main framework: handling of arguments, handling of main commands, handling of errors.

  3. Third session: individually program the commands you are meant to implement. Make individual tests.

  4. Fourth session: integrate together all commands, and check that all the tests you provide run: a part of your grade is based on this. Improve program performance. Provide at least 5 more tests with more that 10 commands each (could be the same command to intensively test them).

  5. Your git must be ready (=STRICT DEADLINE) the 11th of December, 18h00, Paris time.

Structure of your gitlab

And do not forget to regularly commit&push on the git. Working in a regular way is part of the final grade.