Page not found :(
+The page you are looking for doesn't exist or has been moved.
+diff --git a/404.html b/404.html new file mode 100644 index 0000000..6a71881 --- /dev/null +++ b/404.html @@ -0,0 +1,195 @@ + + +
+The page you are looking for doesn't exist or has been moved.
+Computer Science and Communication Systems
+ +first deadline (Sunday 05/05 23:59): steps 1 to 3 (weeks 7, 8 and 9)
+ + +Project finalization and delivery
+second deadline (Sunday 02/06 23:59): the whole project
+ + +Regarding the project, you'll have two deadlines with deliverables:
+an intermediate evaluation, to be delivered at the end of week 10, corresponding to weeks 7 to 9 (project steps 1 to 3); this part will be graded for 30%, but, since it will be graded again, with improvement, in the final grading step, the actual overall weight of this part is ∼52%;
+the final, end of week 14; weighted 70%; this final deliverable includes also the first part, which may be corrected based on first evaluation feedback.
+Each of those two grading steps will be evaluated based on:
+We will also pay attention to the regularity of your workflow (commits on GitHub).
This work (project) has to be the own work of the corresponding group (pair of students). No other help is allowed, be it human or artificial. The code delivered has to be your own (except the one provided by us).
+The above warning covers both getting external help and providing help (this is also part of cheating), directly or indirectly. It thus also includes making sure you do not make your code publicly available in any manner.
+Any plagiarism, in whatever form, will be considered as cheating and will be handled accordingly, including informing the EPFL Legal Affairs.
+We designed that project to correspond on average to 4 hours of work per student per week (i.e. 8 hours of work per group). We'd like to insist on these two aspects:
+VERY IMPORTANT: As with any homework without a time constraint, the "danger" is to work too much on it, more than we expect, trying to reach the best possible complete code regardless of the amount of time spent. This is not the proper way to handle homework, especially big projects: rather than trying to do a perfect final (whole) project in an infinite time, try to do what you can in the amount of time you planned to dedicate to it. We do not mean that you have to deliver bad code, but you can deliver good enough code on a decent fraction of the project (e.g. do 75% of the project at an 80% quality level, rather than doing 100% of the project at a 33% level [bad grade], or spending an indecent amount of time to reach a 100% level on 100% of the project).
+DO NOT HESITATE TO COME TO US AND TALK ABOUT IT!
+A good way to reduce the workload is to:
+A bad example of points 2 and 5 above: some groups in the past completely recoded some functionalities that are present in the C library (e.g. string functions).
+In order to help you handling your workload and priorities (and also see if we don't fool ourselves), we ask you to weekly commit a CSV file time.csv
(in the done/
directory) counting the total number of hours (decimal number) you spent on the project for the corresponding week (sum for the two students).
+This is absolutely not to control you; you really can, without any penalty nor judgment on our side, put a 0 on some week if that was the case. This is only a tool for you, a kind of "compass" that we put in place following suggestions of former students.
The format for this file is very simple:
+one line for each week, starting with the week number and then, comma separated (CSV), the total number of hours (in decimal) spent by the group on the project that week, regardless of the handout number (I mean, if in week 8 you are still working on the week 6 of the project, count this in week 8, the week you do the work, not the week number of the handout you're doing).
For example:
+6,2.75
+7,3.5
+
Welcome to the project part of the CS202 course!
+This project is organized as follows:
The objectives of the project are:
+to concretely illustrate several aspects of the lectures;
+to let you develop a real system application in C (with files, pointers, sockets, threads, external library calls, ...);
+to let you practice usual development tools, among which: version control systems (git
), manpages, make
, debugger (gdb
);
to teach you how to use system (or external) libraries;
+to practice (a bit of) refactoring.
+This week, we will set up and learn several tools that will be useful for the project:
+The aim of this first week is to guarantee that you are ready to start with the project; that you have the proper working environment to do so. It is really important that this objective is fulfilled before the actual start of the project (week 7 of the semester). Do not hesitate to come to us for help.
+Concretely, what we expect you to do this week is to:
+make
);
For this project, you have to work on Linux. For this you can either:
+No other OS will be supported (nor accepted) for this project.
+In addition to the standard C development framework (editor, compiler, debugger), you'll need the following tools (sudo apt install <package>
if you are on your own Debian/Ubuntu-like computer):
+git (this is the package name to be installed);
+openssh-client;
+manpages and manpages-dev;
+doxygen if you want to automatically produce the documentation out of your source code;
+graphviz to see the graphs generated by Doxygen;
+libssl-dev: some cryptographic functions we will use to compute the "hash" code of images;
+libvips-dev to process images from C code;
+libjson-c-dev to process JSON content from C code.
+You certainly already know Git and GitHub (not to be confused!), maybe from some former classes. This is just a quick recap, or a gentle introduction if you don't know them yet.
+GitHub is one of the public servers to offer Git services. Each student will first receive a personal repository on GitHub for this first step (warm-up); then each group (pair of two students) will also get another repository for its core project (this will be explained later in the semester).
+The first thing to do is to have a GitHub account. If you don't have one yet, create it by registering here. A free account is more than enough for this course.
+(if you already have a GitHub account, please use it for this class).
Then, once you have a GitHub account, join the first assignment, here: https://classroom.github.com/a/WG78CBVj.
+GitHub Classroom may ask you the right to access your repositories:
+then to join this first assignment (click on YOUR SCIPER NUMBER; please don't use someone else's SCIPER!):
+and then to create a GitHub repository for that first assignment:
+Once all this is done, you should receive a message from GitHub that you joined the "Warm-up" assignment and that you have a new repository cs202-24-warmup-YOUR_GITHUB_ID
, the URI of which looks like:
git@github.com:projprogsys-epfl/cs202-24-warmup-GITHUBID.git
+
+To be able to clone this repository, you have to add your SSH public key (the one of the computer you are using) to GitHub.
+If you don't have any SSH key yet (on the computer you are using), you can generate one with:
+ssh-keygen
+
+Copy then the content of the file ~/.ssh/id_rsa.pub
into GitHub SSH public keys.
NOTE: you can also use https
URI rather than SSH:
https://github.com/projprogsys-epfl/cs202-24-warmup-YOURID.git
+
+but then you'll have to authenticate each time (each command).
+It's not the purpose of this class to teach you Git, nor to present all its details. The purpose of this section is to provide you with a short description of the necessary commands:
+git clone [REPO_URI]
git pull
git add [FILE]
git commit -m "Commit message"
git push
git status
git tag
For each command, you can get help from git
by doing:
git help <COMMAND>
+
+In case you need a recap on git, either go to your former material (e.g. CS-214 if you took it), or see this complementary recitation page (in French).
+(If you have received the confirmation email from GitHub) +Now go and get the content of this warm-up assignment:
+git clone REPO_URI
+
+This will create a local directory on your computer, named like cs202-24-warmup-YOURID
(with your GitHub ID at the end).
Go into that directory:
+cd cs202-24-warmup-YOURID
+
+You should find two sub-directories: done
and provided
. This is how we will proceed for the project:
provided
sub-directories; THIS SUB-DIRECTORY (provided
) SHALL NOT BE MODIFIED (by you);done
directory; (incrementally) copy the necessary files from provided
to done
and then proceed.
Before moving on, let us recap that the manpages are THE reference documentation in the Unix world.
+You can read them with the man
command (they can also be read online).
The first manpage to read (maybe not in whole ;-)
, but at least have a look at it) is the manpage of the man
command itself:
man man
+
+Use the space bar to move forward, 'b' to go backward and 'q' to quit. Type 'h' to get more help.
+man
actually uses another command, a "page viewer". In most of the modern Unix systems, this page viewer is less
(replacing former more
command!). Thus maybe the second manpage to read is the one of less
:
man less
+
+One of the first functions you dealt with in C was printf()
. Let's try to see its manpage:
man printf
+
+Hmm?... This does not seem to be the right printf
...
+If you have a "PRINTF(1)
" on the very top of the page, this is indeed not the expected C printf()
function.
There can indeed be several manpages with the same "title". To mark the difference, the manpages are organized in "Sections". Go and read the manpage of man
again if you missed that information:
man man
+
+To go to the desired printf
manpage, we have to look for the one in "section 3". Try to do it by yourself (maybe read the manpage of man
once again).
And don't forget to use man
in the future, whenever needed!
The aim of this first exercise is to continue setting up your environment to be able to properly code the project.
+In the coming project, we will make use of several libraries (as explained in the introduction). Let's try here to use the first one:
libssl
; this library offers cryptographic functionalities (see man ssl
; we will use it to calculate the hash ("SHA code") of some images.
If you work on your own Linux (not on EPFL VMs), and you didn't install it yet, please install libssl-dev
:
sudo apt install libssl-dev libssl-doc
+
+In the provided
sub-directory you find a file sha.c
. First copy it to your done
and work there:
cd done
+cp ../provided/sha.c .
+
+To compile it, you need to add the ssl
and crypto
libraries. This is done by adding the -lssl
and -lcrypto
flags; e.g.:
gcc -std=c99 -o sha sha.c -lssl -lcrypto
+
+If everything is properly installed, the above compilation should succeed and you should have a ./sha
program in your done
sub-directory. This executable does not do much for the moment as its main part is still missing. This is what you have to add now.
A "SHA code", or "SHA" (which stands for "Secure Hash Algorithm"), is a compact representation, almost certainly unique, and hardly invertible (reciprocal), of any data. More concretely:
+compact: whatever the data, whatever their length, they will be represented by the same number of bits; in this project, we will use 256 bits ("SHA256");
+for example, the SHA256 of "hello" (without newline, nor quotes) is
+2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
;
almost certainly unique: different data will most probably have two different SHA; this is not guaranteed (there are "only" about 10^77 different SHA256), but highly probable: with 10^35 different data, the probability to get the same SHA is 10^-6;
+when two different data have the same SHA, this is called a "collision";
+hardly invertible: from a SHA code, it's extremely difficult (= impossible in practice) to guess its corresponding data; one consequence of that is that a small variation in the data leads to a completely different SHA; for instance, the SHA256 for "hello!" is
+ce06092fb948d9ffac7d1a376e404b26b7575bcc11ee05a4615fef4fec3a308b
+(to be compared to the one of "hello" above) and the one of "hello\n" (i.e. with a newline) is
+5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
.
The provided code (sha.c
) compiles but does nothing really interesting. Actually there is no computation of the SHA256 of the input; no call to the SHA256()
function from the libssl
.
Have a look at how to use this function:
+man SHA256
+
+(if you have the manpages for this library installed on your computer; otherwise, read it online).
+Add, where indicated by "TODO
", a call to compute the SHA256 of the input string.
Example:
+If everything is properly done, you should get:
+Type a sentence: Hello world!
+The SHA256 code of
+"Hello world!
+"
+is:
+0ba904eae8773b70c75333db4de2f3ac45a8ad4ddba1b242f0b3cfc199391dd8
+
+You can also debug with the "hello" string given above (with the newline!).
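As a hedged illustration of the printing side only (this is not the provided solution): SHA256() fills a 32-byte buffer, which then has to be displayed as 64 hexadecimal characters, as in the examples above. The names digest_to_hex and DIGEST_LEN below are hypothetical, chosen for this sketch; OpenSSL's own constant for the digest size is SHA256_DIGEST_LENGTH.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical constant: a SHA-256 digest is 32 bytes long
 * (OpenSSL names this SHA256_DIGEST_LENGTH). */
#define DIGEST_LEN 32

/* Hypothetical helper: writes the lowercase hexadecimal form of `digest`
 * into `out`, which must hold at least 2 * DIGEST_LEN + 1 bytes. */
void digest_to_hex(const unsigned char digest[DIGEST_LEN],
                   char out[2 * DIGEST_LEN + 1])
{
    for (size_t i = 0; i < DIGEST_LEN; ++i) {
        /* two hex digits per byte; size 3 leaves room for the '\0' */
        snprintf(out + 2 * i, 3, "%02x", (unsigned int) digest[i]);
    }
    out[2 * DIGEST_LEN] = '\0';
}
```

With libssl-dev installed, the digest itself would come from a call of the form SHA256(data, length, digest), where data, length and digest are your own variables; check man SHA256 for the exact prototype before relying on this.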
+The source code of real-life applications written in C is often distributed over several text files called "source files", which are "glued together" by the compiler to create an executable program from the source code. This way of proceeding is called "modularization" and is detailed
+in those video lectures:
+the slides of which are on Moodle.
+Choose your favorite learning way (maybe benefit from both).
+We strongly recommend you go through this material before moving on.
+The objective of this section is that
+Makefile
s;Makefile
;Makefile
s.
We don't expect you to write your own Makefile
s from scratch, nor to master all the arcane details (while reading).
Makefile
s
Go ahead with the above-mentioned tutorial.
+Follow (and understand) examples 1 and 2, then exercises 1 and 2, and then 3.
+The sub-directory bigprj
contains a "big" project for which we propose you to write its Makefile
.
IMPORTANT REMARK : the code provided in bigprj
is under copyright and shall not be copied nor reused anywhere else, neither in whole nor in part.
+(It's furthermore quite bad code and is thus not at all a good example of good practice.)
+It's there only for you to learn Makefile
s by trying to write one to compile this project.
That code uses sub-directories and one C99 function (roundf()
).
To tell the compiler to search for header files in some sub-directories, add an -I
option per sub-directory. For example, when compiling machin.c
, to tell the compiler to search for a header file in the stuff
sub-directory, you would do :
gcc -c -I stuff machin.c -o machin.o
+
+(See also the CFLAGS
variable in the make
tutorial.)
To compile to the C99 standard (or higher), pass the -std=c99
(or -std=c17
or -std=c2x
) option to the compiler (see the CFLAGS
variable).
Finally, to be able to use C99's roundf()
function, you need to link to the math library by adding -lm
.
(See also the LDLIBS
variable in the make
tutorial.)
Notes:
+The provided code compiles with many "warnings". We do not ask you to fix these warnings, but simply to write a Makefile
that produces a hand
executable.
You don't need to run it. If you do, you can simply quit it by typing Ctrl-C
.
In this last part, we'd like you to practice debugging C code. It is very important to master debugging before going deeper into the project. Otherwise, without good debugging practice, you'll really lose lots of time.
+For those who took the CS-214 Software Construction class, also remember/review all the methodology and good practices you learned there about debugging. Maybe refresh that material first.
+To help you find faults in code (especially in your own code later on in the project; think about it!), there are several tools available:
+compiler options
+static code analysis
+dynamic memory analysis;
+and, of course, debuggers.
+The compiler is a great help when you know how to use it and interpret its messages.
+Its behavior, more or less verbose, can be modified using compiler options, the most useful of which are detailed here.
+In the same spirit (using the compiler to find errors), it can also be useful to use different compilers (with the options below) on the same code, as they don't necessarily detect the same things. On VMs, you have gcc
and clang
.
The first thing to do is to specify the standard used (as there are many non-standard "dialects"). This is done with the -std=
option. We recommend -std=c99
or -std=c17
. To stick strictly to the specified standard (and reject the associated "GNU dialect") add the -pedantic
option.
Then, it can be useful to let the compiler warn us with lots of the usual warnings. This is done with the -Wall
option (like "all warnings", even if they're actually not all there ;-)).
For even more warnings, add -Wextra
.
And here are a few more that we think are worth adding (you're free not to if you find them too fussy):
+-Wuninitialized
: warns of uninitialized variables;
-Wfloat-equal
: warns of equality tests on floating-point numbers;
-Wshadow
: warns if one name masks another (risk of scope problems);
-Wbad-function-cast
: warns of incorrect function return type conversion;
-Wcast-qual
: warns of pointed type conversion that removes a qualifier (typically const
);
-Wcast-align
: warns of pointed type conversions that do not respect memory word alignment;
-Wwrite-strings
: warns of (risk of) confusion between const char *
and char *
;
-Wconversion
: warns of implicit type conversion;
-Wunreachable-code
: warns of useless (unreachable) code;
-Wformat=2
: increases the level of format warnings (such as printf
and scanf
) compared to -Wall
;
-Winit-self
: warns of recursive initialization (such as int i = 3 * i;
);
-Wstrict-prototypes
: warns of function declarations without arguments;
-Wmissing-declarations
: warns of functions defined but not prototyped; this can be useful for detecting the omission of a prototype in a .h
(or the omission of a #include
).
Finally, you can of course add other options if you feel they are useful. As usual, check out the "man pages" for more details.
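To make one of these warnings concrete: -Wfloat-equal flags tests such as a == b on floating-point values, because rounding makes exact equality fragile. A common alternative is a tolerance test. The sketch below is only an illustration; the name nearly_equal is hypothetical and the tolerance is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical helper: true if a and b differ by at most eps. */
bool nearly_equal(double a, double b, double eps)
{
    double diff = a - b;
    if (diff < 0.0) {
        diff = -diff;   /* absolute value, without needing the math library */
    }
    return diff <= eps;
}
```

For instance, nearly_equal(0.1 + 0.2, 0.3, 1e-9) is true even though the two sides are generally not bit-for-bit identical.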
+The static code analyzer is a tool that tries to find errors in code by "imagining" all possible execution paths. The scan-build
(and scan-view
) analyzer is available on VMs. It is used by simply adding scan-build
in front of the build command, e.g. :
scan-build make
+scan-build make cecicela
+scan-build gcc -o monexo monexo.c
+
+The easiest way is to try :
+scan-build gcc stats.c -lm
+
+This command tells you (at the very end) to look at its analysis using scan-view
, e.g. :
scan-view /tmp/scan-build-2024-01-17-175346-107146-1
+
+(but this file name changes every time).
+We'll let you have a look at what it found...
+See this tutorial for instructions on using the gdb
debugger.
This tutorial takes as its example the first program you'll have to hand in (stats.c
), but we encourage you to try your hand at the other codes too (ex1.c
and ex2.c
) and to go back and forth between this topic and the tutorial in question (rather than reading it linearly and then continuing with this topic).
To find an error efficiently, we suggest the following general tips (other more job-specific tips are also provided below):
+try to correct only one bug at a time;
+always start with the first error;
+isolate/identify the bug in a reproducible way: always retest with exactly the same values each time;
+apply the following methodology (it may seem trivial, but all too often we've seen students waste their time looking for bugs in the wrong place because one of the following 2 "dots" was not placed on the right side; often due to over-strong assumptions (wrong guesses) or wrong/too-fast deductions):
+always have 2 clear places (2 "dots") in your code:
+one place where you are absolutely sure that the bug has not yet occurred (e.g. the very beginning of the program);
+and another where you are absolutely sure that the bug has occurred (e.g. the point where the program crashes, or simply the end of the program);
+move (forward or backward) the most promising of these two points, being sure not to "cross over" to the other side of the bug; check this aspect ("not cross over") with certainty;
+at the end of this process (a binary search, in fact), the two "dots" will be exactly on the spot of the bug.
+if you're searching for bugs using display messages (printf()
) :
always put a \n
at the end of each message;
mark the beginning of each of your debugging messages with a clear identifier reserved only for this purpose (e.g. "####
"); this allows you :
to easily see these messages in the program output;
+find them easily in your code to edit/delete later;
+have a unique part in each message (e.g. "debug(1):", "debug(2):", "debug(3):", etc., or "here i=", "here j=", "here k=", etc.; you can of course combine);
+having this discipline with debugging messages may seem like a waste of time (especially when you're looking for the bug), but, believe me, it actually saves a lot of time in the end!
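The discipline described above (a reserved "####" marker, a unique identifier per message, a final newline) can be wrapped in one small function. This is a minimal sketch under stated assumptions: the name debug_msg and its parameters are hypothetical, not part of the provided code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: prints "#### debug(<id>): <message>\n" to `out`
 * and returns the number of characters written (negative on error). */
int debug_msg(FILE *out, int id, const char *fmt, ...)
{
    va_list args;
    int prefix = fprintf(out, "#### debug(%d): ", id);
    if (prefix < 0) {
        return prefix;
    }

    va_start(args, fmt);
    int body = vfprintf(out, fmt, args);
    va_end(args);
    if (body < 0) {
        return body;
    }

    if (fputc('\n', out) == EOF) {   /* always end the message with a newline */
        return -1;
    }
    fflush(out);                     /* make the message visible immediately */
    return prefix + body + 1;
}
```

A call such as debug_msg(stderr, 1, "here i=%d", i) then produces a line that is easy to spot in the output and easy to find in the code by searching for "####".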
+Here are two exercises to help you get to grips with gdb
.
Look at the ex1.c
code to get an initial idea.
Then compile it for the debugger (either by hand or by making a small Makefile
).
Run it in the terminal to see what it produces.
+Then use the debugger to determine the values of d0, d1 and d17 :
+set one or more well-placed breakpoints
+try the commands :
+step
and next
;
continue
(abbreviated cont
), followed, or not, by a number;
print
and/or display
.
You can also try advance
and finish
.
NOTES:
+you can see the syntax and explanation of a command in gdb
by using help
followed by the command; e.g. :
help adv
+
+you can list all your breakpoints via :
+ info break
+
+Take a look at the ex2.c
code to get an initial idea. The aim of this code is to calculate the entropy of a given distribution by its frequencies (= integer counts).
Some examples (useful for debugging):
+the entropy without any count is 0, no matter how you enter it:
+0
+0 0 0
+0 0 0 0 0 0
+etc;
the entropy of any 1-value distribution is 0, however you enter it:
+0
+0 1 0
+0 12 0
+0 0 12
+0 0 0 33 0 0 0
+etc;
the entropy of any distribution with 2 equiprobable values is 1 bit, regardless of how it is entered:
+1 1
+0 1 1 0
+0 12 0 12
+etc;
the entropy of any distribution with 3 equiprobable values is 1.58496 bit, however you enter it;
+the entropy of the distribution :
+1 2 3 4 5
+is 2.14926 bit.
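For reference when checking these values by hand: they are consistent with the usual Shannon entropy in bits, assuming this is the definition ex2.c intends. Here c_i are the entered counts and N is their sum; zero counts contribute nothing.

```latex
H = -\sum_{i \,:\, c_i > 0} \frac{c_i}{N} \log_2 \frac{c_i}{N},
\qquad N = \sum_i c_i .

% Two equiprobable values (counts 1 1):
H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1 \text{ bit}

% Three equiprobable values:
H = \log_2 3 \approx 1.58496 \text{ bit}
```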
+The code provided contains several errors. Try to find them using the debugger: breakpoints, next, cont, display, etc.
+You can even start by running the code directly in gdb
, typing run
.
+then enter :
1 0
+
+and see what happens.
+To locate the error in the call stack, do :
+where
+
+To see the code :
+layout src
+
+To navigate the call stack :
+up
+down
+
+Give it a try...
+Note: there are four things to be corrected.
+All the above tools (compiler options, static code analysis, dynamic memory analysis (when you have pointers) and debugger) will help you to be more efficient in your project. We therefore ask you to start using them to correct the stats.c
code provided.
Try to fix entirely the stats.c
file provided, whose purpose is to calculate the mean and standard deviation (unbiased) of the age of a set of 1 to 1024 people (beware! it contains several errors, of different kinds; there are, however, no errors of a mathematical nature: the formulas are mathematically correct; but note however that the standard deviation of a population reduced to a single individual must be zero).
The first thing to do might be to complete your Makefile
so that it can produce stats
with information useful to the debugger (option -g
). You could also take the opportunity to turn on the compiler's warnings and look in detail at what it's telling you and, above all, understand why it's telling you.
Once the program has compiled, if possible without warning, here are 3 ways to go further in correcting the program:
+test values that are outside the expected limits, and see if the program reacts as you'd expect. For example: what happens if you enter a negative number of people? a negative age?
+calculate by hand the mean and standard deviation (following the provided formula!) of a small sample and compare them with the output of your program. If there are differences, use the debugger to find out where they come from;
+remember to test all the limiting cases of these formulas.
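For the hand calculation, the sketch below assumes that the formulas provided with stats.c are the usual mean and unbiased standard deviation; check against the provided formulas, which take precedence. The n = 1 case follows the convention stated above.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \quad (n \ge 2),
\qquad
s = 0 \ \text{for } n = 1.

% Worked example with two ages, 20 and 30:
\bar{x} = 25, \qquad s = \sqrt{\frac{(-5)^2 + 5^2}{1}} = \sqrt{50} \approx 7.07
```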
+When editing the sha.c
file, you may have noticed that it is commented (always comment your programs!), in a rather peculiar format ("what's with the @
?").
It's not just for show; it's also useful!
+Type :
+doxygen -g
+doxygen Doxyfile
+
+then view the file html/index.html
in your favorite browser.
+Click on "Files", then on "sha.c".
+Cool, isn't it?
Clean it up with the command
+rm -r latex html
+
+In future, remember to document your code with Doxygen-compatible comments.
+Examples will be provided, but if you want to know more, have a look at the Doxygen website.
+IMPORTANT REMARK: For the project, make your code anonymous: don't put any author's name, SCIPER, email, etc.
+ +The aim of this week is to:
+So start by reading the main project description file to understand the general framework of the project. Once you've done that, you can continue below.
+In your group's GitHub repository, you will find the following files in provided/src/
:
imgfs.h: function prototypes for the operations described here;
+imgfscmd.c: the core of your "Filesystem Manager", the command line interface (CLI) to handle imgFS; it reads a command and calls the corresponding functions to manipulate the database;
+imgfs_tools.c: the tool functions for imgFS; for example to display the data structure;
+imgfscmd_functions.h and imgfscmd_functions.c: prototypes and definitions of the functions used by the CLI;
+util.h and util.c: macros and miscellaneous functions; you do not need to use them (have a look to see if some may be useful);
+error.h and error.c: error codes and messages;
+Makefile containing useful rules and targets;
+provided/tests/{unit,end-to-end}/;
+provided/tests/data.
+To avoid any trouble, the contents of the provided/ directory must never be modified!
Start by copying the files you need from provided/src/
into the done/
directory at the root of the project and registering it in git (git add
); for instance:
cp provided/src/*.h provided/src/Makefile provided/src/imgfs*.c provided/src/util.c provided/src/error.c done
+git add done
+
+You'll proceed similarly in the next weeks, whenever you'll need new files from provided/src
.
The provided code does not compile; some work is still required, in the following steps (which are further detailed below):
+imgFS
;do_open()
and do_close()
;do_list()
;do_list_cmd()
.
After reaching that point, the code should compile without errors. You will then have to test it.
+An example usage of the CLI (the name of which is imgfscmd
) is:
./imgfscmd list empty.imgfs
+
+where list
is a command provided to the CLI and empty.imgfs
is an argument for that command, here simply an ImgFS file (thus a file containing a whole filesystem).
Important Note: writing clean code, readable by everyone is very important. From experience, it seems that not everyone does this spontaneously at first ;-)
. There are tools that can help. For example, astyle
is a program designed to reformat source code to follow a standard (man astyle
for more details).
We provide you with a shortcut (which uses astyle
): see the target style
in the provided Makefile (make style
to use it). We recommend you do a make style
before any (Git) commit.
The exact format of the header
and metadata
is given in the global project description. The types
struct imgfs_header
;struct img_metadata
;struct imgfs_file
;are to be defined in replacement of the "TODO WEEK 07: DEFINE YOUR STRUCTS HERE.
" in imgfs.h
.
The second objective of this week is to process the arguments received from the command line. For modularization purposes, we will use function pointers.
+To achieve this, the signatures of the functions do_COMMAND_cmd()
(and help()
) are uniform:
int do_COMMAND_cmd(int argc, char* argv[])
+
+Those functions will handle the parsing of their respective additional arguments, while the main()
dispatches through them using the first CLI argument.
To process all the different commands, we would like to avoid an "if-then-else" approach. Indeed, this would make adding new commands (which will arrive in the following weeks) more difficult, since it would require adding a new case for each of them. It would also make the code much less readable.
+To avoid that, we put the various do_COMMAND_cmd()
(and help()
) functions in an array. We will take advantage of this to associate the names of the commands with their respective functions (e.g. the string "list"
with the do_list_cmd()
function), and then simply add a loop to the main()
function, to search for the received command among the list of possible commands -- for the moment, "list"
, "create"
, "help"
and "delete"
-- and call the corresponding function.
In imgfscmd.c
:
command
type, a pointer to functions such as those unified above;struct command_mapping
type containing a string (constant) and a command
.
Then use these definitions to create an array named commands
associating the commands
+"list", "create", "help", and "delete" to the corresponding functions.
+Note: The "create"
, "help"
and "delete"
commands are not yet implemented, but you can already add them to the array.
Finally, complete the main()
using this array inside a loop. When the right command is found, simply call the function pointed to in the corresponding array entry, passing all the command line arguments.
For example, if you call the program
+./imgfscmd list imgfs_file
+
+then your code must call do_list_cmd()
with the following parameters: argc = 1
and argv = { "imgfs_file", NULL }
.
Your code must correctly handle the case where the command is not defined: in this case, simply call help()
and return ERR_INVALID_COMMAND
.
Your code can perfectly well assume that all commands in the commands
array are distinct.
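The table-driven dispatch described above can be sketched as follows. This is an illustration under stated assumptions, not the expected solution: the demo_* functions, the demo_commands array, the dispatch() helper and DEMO_ERR_INVALID_COMMAND are all hypothetical names (the project defines its real error codes in error.h).

```c
#include <stddef.h>
#include <string.h>

/* Uniform signature shared by all commands, as described in the text. */
typedef int (*command)(int argc, char *argv[]);

/* Associates a command name with the function implementing it. */
struct command_mapping {
    const char *name;
    command func;
};

/* Hypothetical placeholder commands, only for this illustration. */
int demo_list_cmd(int argc, char *argv[]) { (void) argv; return argc; }
int demo_help_cmd(int argc, char *argv[]) { (void) argc; (void) argv; return 0; }

/* Hypothetical error value; the project has its own codes in error.h. */
#define DEMO_ERR_INVALID_COMMAND (-1)

static const struct command_mapping demo_commands[] = {
    { "list", demo_list_cmd },
    { "help", demo_help_cmd },
};

/* Looks `name` up in the table and calls the matching function. */
int dispatch(const char *name, int argc, char *argv[])
{
    const size_t count = sizeof(demo_commands) / sizeof(demo_commands[0]);
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(name, demo_commands[i].name) == 0) {
            return demo_commands[i].func(argc, argv);
        }
    }
    return DEMO_ERR_INVALID_COMMAND;   /* no entry matched */
}
```

Adding a command then only means adding one line to the array. In a main(), after checking that a command name is present, the call would look roughly like dispatch(argv[1], argc - 2, argv + 2), which matches the example above where the command receives argc = 1 and argv = { "imgfs_file", NULL }.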
do_open()
and do_close()
Now, we will implement the functions to open and close existing imgfs
files.
You need to write the definitions of do_open()
and do_close()
in the file imgfs_tools.c
.
The do_open()
function takes as arguments:
const char *
);const char *
, e.g. "rb"
, "rb+"
);imgfs_file
structure in which to store read data.
The function must
+The function should return the value ERR_NONE
if all went well, and otherwise an appropriate error code in case of problems. You need to handle all possible error cases in this function, using the definitions in error.h
(see unit tests below).
+Note: to check the validity of a pointer given as parameter, you can use the macro M_REQUIRE_NON_NULL(ptr)
, which will make the function return ERR_INVALID_ARGUMENT
if ptr == NULL
(see util.h
).
The do_close()
function takes a single argument of structure type imgfs_file
and must close the file and free the metadata array. It returns no value. Here too, remember to handle the possible error case: if the file (FILE*
) is NULL
. This should be a reflex when you're writing code, especially when you're using a pointer. We won't mention it again.
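The "close the file, free the array, check pointers first" pattern can be sketched on a simplified stand-in structure. Everything here is hypothetical: struct demo_file and demo_close() are not the project's imgfs_file or do_close(), whose exact definitions come from the project description.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the project's file structure. */
struct demo_file {
    FILE *file;
    int  *metadata;   /* stands in for the dynamically allocated metadata array */
};

/* Releases the resources held by `f`; safe to call twice or with NULL. */
void demo_close(struct demo_file *f)
{
    if (f == NULL) {
        return;
    }
    if (f->file != NULL) {
        fclose(f->file);
        f->file = NULL;       /* avoid a dangling FILE pointer */
    }
    free(f->metadata);        /* free(NULL) is a no-op */
    f->metadata = NULL;
}
```

Resetting the pointers to NULL after releasing them is a design choice of this sketch: it makes a second call harmless.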
do_list()
Then create a new file imgfs_list.c
to implement the do_list()
function. If output_mode
is STDOUT
, the purpose of do_list()
is first to print the contents of the "header" using the supplied print_header()
tool function, and then to print (examples below)
either
+<< empty imgFS >>
+
+if the database does not contain any images;
+or the metadata of all valid images (see print_metadata()
, provided in imgfs.h
).
The case output_mode == JSON
will be implemented later in the project; you may just call TO_BE_IMPLEMENTED()
in this case (see util.h
).
Warning: there may be "holes" in the metadata array: one or more invalid images may exist between two valid ones.
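Because of such holes, the loop has to scan every slot of the array and test each entry, rather than stop after the first invalid one. A minimal sketch, assuming a simplified, hypothetical entry type (the real struct img_metadata is specified in the global project description):

```c
#include <stddef.h>

/* Hypothetical, simplified metadata entry used only for this sketch. */
struct demo_metadata {
    char id[16];
    int  is_valid;   /* non-zero when the slot holds a valid image */
};

/* Counts valid entries by scanning all `max_files` slots, since valid
 * images are not necessarily stored contiguously. */
size_t count_valid(const struct demo_metadata *entries, size_t max_files)
{
    size_t valid = 0;
    for (size_t i = 0; i < max_files; ++i) {
        if (entries[i].is_valid) {
            ++valid;   /* a listing function would print this entry here */
        }
    }
    return valid;
}
```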
+do_list_cmd()
In order to be able to use the do_list()
function from the command line, implement the do_list_cmd()
function in imgfscmd_functions.c
, which receives the command line arguments as parameters (as explained before).
The first element of the array is the name of the file containing the database. After checking that the parameters are correct, open the database and display its contents, using the above functions.
+To make it easier to understand the various functions described above, a few examples are given here. These examples are in the provided tests (see below).
+It's best to start testing your code on simple cases that you're familiar with.
+You can test your code with the supplied .imgfs
files: the command
./imgfscmd list ../provided/tests/data/empty.imgfs
+
+should display (exact file here):
+*****************************************
+********** IMGFS HEADER START ***********
+TYPE: EPFL ImgFS 2024
+VERSION: 0
+IMAGE COUNT: 0 MAX IMAGES: 10
+THUMBNAIL: 64 x 64 SMALL: 256 x 256
+*********** IMGFS HEADER END ************
+*****************************************
+<< empty imgFS >>
+
+while
+./imgfscmd list ../provided/tests/data/test02.imgfs
+
+should display (exact file here) :
+*****************************************
+********** IMGFS HEADER START ***********
+TYPE: EPFL ImgFS 2024
+VERSION: 2
+IMAGE COUNT: 2 MAX IMAGES: 100
+THUMBNAIL: 64 x 64 SMALL: 256 x 256
+*********** IMGFS HEADER END ************
+*****************************************
+IMAGE ID: pic1
+SHA: 66ac648b32a8268ed0b350b184cfa04c00c6236af3a2aa4411c01518f6061af8
+VALID: 1
+UNUSED: 0
+OFFSET ORIG.: 21664 SIZE ORIG.: 72876
+OFFSET THUMB.: 0 SIZE THUMB.: 0
+OFFSET SMALL: 0 SIZE SMALL: 0
+ORIGINAL: 1200 x 800
+*****************************************
+IMAGE ID: pic2
+SHA: 95962b09e0fc9716ee4c2a1cf173f9147758235360d7ac0a73dfa378858b8a10
+VALID: 1
+UNUSED: 0
+OFFSET ORIG.: 94540 SIZE ORIG.: 98119
+OFFSET THUMB.: 0 SIZE THUMB.: 0
+OFFSET SMALL: 0 SIZE SMALL: 0
+ORIGINAL: 1200 x 800
+*****************************************
+
+Note: you may compare your results by using:
+./imgfscmd list ../provided/tests/data/test02.imgfs > mon_res_02.txt
+diff -w ../provided/tests/data/list_out/test02.txt mon_res_02.txt
+
+More details: man diff
.
The provided test suites require several dependencies: Check and Robot Framework (and its own dependency, parse). On (your own) Ubuntu, you can install them with:
+sudo apt install check pip pkg-config
+
+then, depending on how you're used to working in Python, either as root or in your Python virtual environment (maybe to be created):
+pip install parse robotframework
+
+(Of course you'll have to run the tests in that Python venv, if that's your usual way to work with Python.)
+On EPFL VMs, you have to set up a personal Python virtual environment.
+If you already have one, activate it and install the two above mentioned packages (parse
and robotframework
).
If you don't, we recommend you create your personal Python virtual environment in myfiles
:
cd ~/Desktop/myfiles
+python -m venv mypyvenv
+cd mypyvenv
+cp -r lib lib64 ## this fixes the first warning
+cd ..
+python -m venv mypyvenv
+
+Ignore the (second) warnings.
+Then activate it:
+source mypyvenv/bin/activate
+
+and then install the required packages:
+pip install parse robotframework
+
+And you're done.
+The only thing you'll have to do the next time you log in and want to run the "end to end" tests is to activate your Python virtual environment:
+source ~/Desktop/myfiles/mypyvenv/bin/activate
+
+Of course, you can also add that to your ~/.bashrc
!
We provide you with a few tests to run against your code by using make check
, both unit tests (testing functions one by one) and end-to-end tests (testing the whole executable at once).
We strongly advise you to complete them by adding your own tests for edge cases; the imgFS
files are in provided/test/data
. You can check the unit tests in provided/test/unit
and the end-to-end ones in provided/test/end-to-end
to understand how to write your own.
+Note: Remember never to push modifications to the provided/
directory; instead move the test/
directory to done/
and update the TEST_DIR
variable in the Makefile
accordingly.
We also provide a make feedback
(make feedback-VM-CO
if you're working on EPFL VMs) which gives partial feedback on your work. This is normally used for a minimal final check of your work, before handing it in. It's better to run local tests directly on your machine beforehand (including more tests you've added yourself, if necessary).
The Docker image used by make feedback
will be tagged latest
every week, but if you want to run feedback for a specific week, change (in the Makefile
at the line that defines IMAGE
) this latest
tag to weekNN
where NN
is the desired week number, e.g.:
IMAGE=chappeli/cs202-feedback:week07
+
+It's up to you to organize the group work as best you can, according to your objectives and constraints; but remember to divide the tasks properly between the two members of the group. If you haven't already read it in full, we recommend you read the end of the foreword page.
+You don't have to formally deliver your work for this first week of the project, as the first deliverable will only be due at the end of week 10 (deadline: Sunday May 5th, 23:59), together with the work of weeks 8 and 9.
+Having said that, we strongly advise you to mark with a commit when you think you've completed some part of the work and especially once you reached the end of this week (you can do other commits beforehand, of course!):
add the new imgfs_list.c
file to the done/
directory (of your group GitHub repository; i.e. corresponding to the project), along with your own tests if required:
git add imgfs_list.c
+
+also add the modified files (but NOT the .o
, nor the executables!): imgfs_tools.c
, imgfs.h
and maybe Makefile
:
git add -u
+
+check that everything is ok:
+git status
+
+or
+git status -uno
+
+to hide unwanted files, but be careful to not hide any required file!
+create the commit:
+git commit -m "final version week07"
+
+In fact, we strongly advise you to systematically make these regular commits, at least weekly, when your work is up and running. This will help you save your work and measure your progress.
+The aim of this project is to have you develop a large program in C on a "system" theme. The framework chosen this year is the construction of a command-line utility to manage images in a specific format file system, inspired by the one used by Facebook. For your information, Facebook's system is called "Haystack" and is described in the following paper: https://www.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf. You are not required to read this paper as part of the course (it's just for information) because, obviously, we'll be implementing a simplified version of this system. All the basic concepts required for this project are introduced here in a simple way, assuming only standard "user" knowledge of a computer system.
+Social networks have to manage hundreds of millions of images. The usual file systems (such as the one used on your hard disk, for example) have efficiency problems with such large numbers of files. Furthermore, they aren't designed to handle the fact that we want to have each of these images in several resolutions, e.g. very small (icon), medium for a quick preview and normal size (original resolution).
+In the "Haystack" approach, several images are contained in a single file. What's more, different resolutions of the same image are stored automatically. This single file contains both data (the images) and metadata (information about each image). The key idea is that the image server has a copy of this metadata in memory, to enable very rapid access to a specific photo in the right resolution.
+This approach has a number of advantages: firstly, it reduces the number of files managed by the operating system; secondly, it elegantly implements two important aspects of image database management:
+This deduplication is done using a "hash" function, which summarizes binary content (an image in our case) into a much smaller signature. Here, we will use the "SHA-256" function, which produces a 256-bit signature and has the useful property of being collision resistant: it is almost impossible for two different contents to have the same signature. In this project, we will assume that two images with the same signature are identical. Although it may seem surprising, many systems are based on this principle.
+You will build an image server, a simplified version inspired by Haystack. During the first weeks, you will implement the basic functions of the system, which are:
+During this first part, those functions will be exposed through a command line interface (CLI). Further on, you will build a true webserver to distribute the image over the network using the HTTP protocol.
+Here, we will describe the main concepts and structures you will need for this project. Their implementation details will be specified later in the weekly handouts.
+You will use a specific format -- let's call it "imgfs
" -- to represent an "image file system". A file of type imgfs
contains three distinct parts:
imgfs
creation;max_files
field of the header
; each of its entries describes the metadata of a single image, especially its position in the file;
This file format will be used by the two tools that you will develop:
+The three parts explained above consist of the following data structures:
+struct imgfs_header: the header with the configuration data:
name: a string of at most MAX_IMGFS_NAME characters, the name of the database;
version: a 32-bit unsigned int; the version of the database, incremented after each insertion/deletion;
nb_files: a 32-bit unsigned int; the current number of images in the system;
max_files: a 32-bit unsigned int; the maximum number of images that the system can contain; this field is specified during the creation and must not be modified afterwards;
resized_res: an array of 2 times (NB_RES - 1) elements, each of which is a 16-bit unsigned int; the resolutions of the "thumbnail" and "small" images (in order: "thumbnail width", "thumbnail height", "small width", "small height"); this field is specified during the creation and must not be modified afterwards; the handling of the original resolution is explained below;
unused_32 and unused_64: two unsigned int (of 32 and 64 bits); unused (but intended for future evolutions or temporary information - it is often useful to include fields of this type in large-scale projects; this allows old data structures to be used directly in newer versions of the software).
struct img_metadata: image metadata:
img_id
: a string of at most MAX_IMG_ID
characters, containing a unique identifier (name) for the image;
SHA
: an array of SHA256_DIGEST_LENGTH
unsigned char
; the image hash code, as explained above;
orig_res
: an array of two 32-bit unsigned int; the resolution of the original image;
size: an array of NB_RES 32-bit unsigned int; memory sizes (in bytes) of images at different resolutions ("thumbnail", "small" and "original"; in this order, given by the X_RES indices defined in imgfs.h);
offset: an array of NB_RES 64-bit unsigned int; the positions in the "image database" file of images at the various possible resolutions (in the same order as for size; also use the X_RES indices defined in imgfs.h to access the elements of this array);
is_valid
: a 16-bit unsigned int
; indicates whether the image is in use (value NON_EMPTY
) or not (value EMPTY
);
unused_16
: a 16-bit unsigned int
; not used (but intended for future evolutions).
struct imgfs_file
:
file
: a FILE*
indicating the file containing everything (on disk);
header
: a struct imgfs_header
; the general information ("header") of the image database;
metadata
: a dynamic array of struct img_metadata
; the "metadata" of the images in the database.
header
and dynamically allocated to max_files
;is_valid
; there may therefore be "holes" in the metadata array, and unused parts in the file (since the images themselves are not deleted); the basic idea behind all this is to be prepared to lose a little space to save time;
+At a more complex level, we can imagine a "garbage collector" (or a "defrag") which, in parallel, when "there's time", effectively deletes images that are no longer in use, reorganizes metadata to reduce gaps, and so on.
+We won't go into such considerations in this project, but you may implement it as an extension.
(To check: whatever the architecture, sizeof(struct img_metadata) must give 216.)
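The field descriptions above can be written down as C structures. The sketch below is one possible layout, not the official imgfs.h: the values MAX_IMGFS_NAME = 31 and MAX_IMG_ID = 127 are assumptions, chosen because on common 64-bit ABIs they give a 64-byte header and a 216-byte metadata entry, which agrees with the first image offset of the test02.imgfs listing (64 + 100 × 216 = 21664). demo_data_start() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values: not stated in this handout, chosen to match its numbers. */
#define MAX_IMGFS_NAME 31
#define MAX_IMG_ID     127
#ifndef SHA256_DIGEST_LENGTH
#define SHA256_DIGEST_LENGTH 32 /* a SHA-256 digest is 256 bits = 32 bytes */
#endif

/* Resolution indices, in the order given above: thumbnail, small, original. */
enum { THUMB_RES = 0, SMALL_RES = 1, ORIG_RES = 2, NB_RES = 3 };

struct imgfs_header {
    char     name[MAX_IMGFS_NAME + 1];
    uint32_t version;
    uint32_t nb_files;
    uint32_t max_files;
    uint16_t resized_res[2 * (NB_RES - 1)]; /* thumb w, thumb h, small w, small h */
    uint32_t unused_32;
    uint64_t unused_64;
};

struct img_metadata {
    char          img_id[MAX_IMG_ID + 1];
    unsigned char SHA[SHA256_DIGEST_LENGTH];
    uint32_t      orig_res[2];
    uint32_t      size[NB_RES];
    uint64_t      offset[NB_RES];
    uint16_t      is_valid;
    uint16_t      unused_16;
};

/* Compile-time version of the size check mentioned above. */
_Static_assert(sizeof(struct img_metadata) == 216, "unexpected img_metadata size");

/* Hypothetical helper: first byte after the header and the metadata array. */
static inline uint64_t demo_data_start(uint32_t max_files)
{
    return sizeof(struct imgfs_header)
           + (uint64_t) max_files * sizeof(struct img_metadata);
}
```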
This week's objective is to implement three features for our image management system:
+create
command, to create a new (empty) file in imgFS
format (= a new image database);delete
);help
command, a standard and essential element of any command line interface.
One of the aims of this exercise is to learn how to write data structures to disk using basic I/O operations.
+As in previous weeks, you'll be writing your own code, modifying the elements provided.
+Apart from new tests, no new material is provided.
+You will continue to modify the files used last week: imgfscmd.c
and imgfscmd_functions.c
.
This week's work consists of five modifications, summarized here and detailed below if necessary:
+in a new imgfs_create.c
file (to be created), implement the do_create()
function (prototyped in imgfs.h
), the purpose of which is to create a new image database in a (binary) file on disk;
complete the do_create_cmd()
function in the imgfscmd_functions.c
file in order to call do_create()
correctly;
implement the do_delete()
function (prototyped in imgfs.h
) in a new imgfs_delete.c
file; the do_delete()
function must "delete" a specified image (we'll see below what this really means);
complete the do_delete_cmd()
function in the imgfscmd_functions.c
file in order to call do_delete()
correctly;
define the help()
function, which will print instructions for using the imgfscmd
command line interface (CLI).
do_create()
.do_create()
must create a new database for the imgfs
format. It receives the name of the database file, and a partially filled imgfs_file
structure, containing only, in the header, max_files
and resized_res
.
This function should finish initializing the received imgfs_file
structure before writing it to disk, first the header, then the metadata. It must use standard C input/output functions to create the new image base in a binary file on disk. If the file already exists, it is simply overwritten (without message nor error).
It is important to initialize all relevant elements explicitly before writing. And, of course, it's essential to write the right-sized array of metadata
in the file.
+Note: the database name must be set by do_create()
from the provided constant CAT_TXT
.
It is also important to handle all possible errors. In the absence of an error, do_create()
should return ERR_NONE
; in the event of an error, it returns the corresponding value code as defined in error.h
.
As the create
command is only used once (to create a database) and always from the command line utility imgfscmd
(it will never be launched from a Web server, for example), we are exceptionally going to add a side effect in the form of a display indicating the (true) number of objects saved on disk.
+For example, with one header followed by ten metadata entries, we'll have the following display:
11 item(s) written
+
+11
because the header and then each of the ten metadata entries have been successfully written by fwrite()
.
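To make the item count concrete, here is a minimal sketch of the write sequence: one fwrite() for the header, then one fwrite() for the whole metadata array, adding up the return values. The demo_* structures are hypothetical, trimmed-down stand-ins, and the real do_create() also has to initialize the remaining header fields and return proper error codes.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-ins for the structures of imgfs.h. */
struct demo_header   { uint32_t version; uint32_t nb_files; uint32_t max_files; };
struct demo_metadata { char img_id[16]; uint16_t is_valid; };

/* Writes the header, then max_files zeroed metadata entries.
 * Returns the number of items actually written (0 on allocation failure). */
size_t demo_create(FILE *out, uint32_t max_files)
{
    struct demo_header header = { .version = 0, .nb_files = 0, .max_files = max_files };

    /* calloc() zero-fills, so every entry starts out empty (is_valid == 0). */
    struct demo_metadata *metadata = calloc(max_files, sizeof(*metadata));
    if (metadata == NULL) return 0;

    size_t written = fwrite(&header, sizeof(header), 1, out);
    if (written == 1) {
        written += fwrite(metadata, sizeof(*metadata), max_files, out);
    }
    free(metadata);

    printf("%zu item(s) written\n", written);
    return written;
}
```

With max_files set to 10 and no I/O error, this prints "11 item(s) written": 1 for the header plus 10 for the metadata entries.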
do_create_cmd()
.We have provided you with an incomplete implementation of do_create_cmd()
. As part of your solution, you need to create an imgfs_file
, initialize the max_files
and resized_res
fields of its header with the values provided, then call do_create()
(which will initialize the other fields).
create
command arguments
The main role of do_create_cmd()
is to correctly parse all of its arguments, both mandatory and optional.
Your solution should have the following structure:
+start by retrieving the mandatory argument (<imgFS_filename>
)
iterate on argv
;
at each iteration, first determine whether it's an acceptable optional argument (-max_files
, -thumb_res
or -small_res
; see also the help
text below);
if so, check if there are still enough parameters for the corresponding values (at least one for -max_files
and at least 2 for the other two); if not, return ERR_NOT_ENOUGH_ARGUMENTS
;
then convert the next parameter(s) to the correct type; check that the value is correct (neither zero nor too large); if not, return either ERR_MAX_FILES
(for -max_files
), or ERR_RESOLUTIONS
;
+note that util.c
, already supplied in the past, offers two tool functions (atouint16()
and atouint32()
) for converting a character string containing a number into its uint16
or uint32
value; we encourage you to use these two functions to convert character strings in command line arguments; they handle the various error cases in the event of converting an invalid number, or a number too large for the specified type (e.g., trying to convert 1000000 to a 16-bit number); they return 0 in these cases; use them to implement your code correctly;
if not an optional argument, return error ERR_INVALID_ARGUMENT
.
Please note:
+optional arguments may be repeated, e.g. -max_files 1000 -max_files 1291
; in this case, only the last value is valid;
the mandatory argument cannot be repeated.
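The parsing structure above can be sketched as follows. This is only an illustration: demo_atouint() is a hypothetical stand-in for the provided atouint16() and atouint32() (it also returns 0 on invalid or too-large input), the DEMO_ERR_* codes stand in for those of error.h, and argv is assumed to contain only what follows the mandatory file name. The upper bounds 128 and 512 are the maximum values quoted in the help text.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the error codes of error.h. */
enum {
    DEMO_ERR_NONE = 0,
    DEMO_ERR_NOT_ENOUGH_ARGUMENTS,
    DEMO_ERR_INVALID_ARGUMENT,
    DEMO_ERR_MAX_FILES,
    DEMO_ERR_RESOLUTIONS
};

struct demo_options {
    uint32_t max_files;
    uint16_t thumb_res[2];
    uint16_t small_res[2];
};

/* Stand-in for atouint16()/atouint32(): returns 0 if s is not a number in 1..max. */
static uint32_t demo_atouint(const char *s, uint32_t max)
{
    if (s == NULL || *s < '0' || *s > '9') return 0;
    char *end = NULL;
    errno = 0;
    unsigned long value = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || value > max) return 0;
    return (uint32_t) value;
}

/* Parses the optional arguments; a repeated option overwrites the previous value. */
int demo_parse_create_options(int argc, char *argv[], struct demo_options *opt)
{
    for (int i = 0; i < argc; ++i) {
        if (strcmp(argv[i], "-max_files") == 0) {
            if (i + 1 >= argc) return DEMO_ERR_NOT_ENOUGH_ARGUMENTS;
            const uint32_t value = demo_atouint(argv[i + 1], UINT32_MAX);
            if (value == 0) return DEMO_ERR_MAX_FILES;
            opt->max_files = value;
            i += 1; /* skip the consumed value */
        } else if (strcmp(argv[i], "-thumb_res") == 0 || strcmp(argv[i], "-small_res") == 0) {
            const int is_thumb = (strcmp(argv[i], "-thumb_res") == 0);
            if (i + 2 >= argc) return DEMO_ERR_NOT_ENOUGH_ARGUMENTS;
            const uint32_t max = is_thumb ? 128 : 512;
            const uint32_t width  = demo_atouint(argv[i + 1], max);
            const uint32_t height = demo_atouint(argv[i + 2], max);
            if (width == 0 || height == 0) return DEMO_ERR_RESOLUTIONS;
            uint16_t *res = is_thumb ? opt->thumb_res : opt->small_res;
            res[0] = (uint16_t) width;
            res[1] = (uint16_t) height;
            i += 2; /* skip the two consumed values */
        } else {
            return DEMO_ERR_INVALID_ARGUMENT;
        }
    }
    return DEMO_ERR_NONE;
}
```

Because each option simply overwrites the stored value, "-max_files 1000 -max_files 1291" ends with max_files equal to 1291 without any special handling.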
+do_delete()
.
We describe here how to implement the functionality for deleting an image. The idea is as follows: we don't actually delete the contents of the image, as this would be too costly (especially in terms of time). In fact, the size of the image base file on disk never decreases, even when you ask to "delete" an image from the base.
+Rather, an image is "deleted" by
EMPTY
in is_valid
;
Changes must be made first to the metadata (memory, then disk), then to the header if successful.
+Note: for reasons of compatibility between systems, it is preferable to rewrite the entire "struct
" to disk, rather than just the modified fields.
The do_delete()
function takes the following arguments:
const char *
);imgfs_file
structure.
To write the changes to disk, you first need to move to the right position in the file, using fseek()
(see the course and man fseek
) and then fwrite()
.
Of course, if the reference in the image database does not exist (and there is no invalidation), this must be handled correctly.
+Don't forget to update the header if the operation is successful. You also need to increase the version number (imgfs_version
) by 1, adjust the number of valid images stored (nb_files
) and write the header to disk.
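The order of operations described above (metadata in memory, metadata on disk, then header) can be sketched like this. It assumes the file layout "header, then metadata array", so entry i lives at sizeof(header) + i * sizeof(entry). All demo_* names and error codes are hypothetical stand-ins for those of imgfs.h and error.h.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_EMPTY     0
#define DEMO_NON_EMPTY 1

/* Hypothetical stand-ins for the error codes of error.h. */
enum { DEMO_ERR_NONE = 0, DEMO_ERR_IO, DEMO_ERR_IMAGE_NOT_FOUND };

/* Hypothetical, trimmed-down stand-ins for the structures of imgfs.h. */
struct demo_header   { uint32_t version; uint32_t nb_files; uint32_t max_files; };
struct demo_metadata { char img_id[16]; uint16_t is_valid; };
struct demo_file {
    FILE *file;
    struct demo_header header;
    struct demo_metadata *metadata;
};

int demo_delete(const char *img_id, struct demo_file *db)
{
    for (uint32_t i = 0; i < db->header.max_files; ++i) {
        struct demo_metadata *m = &db->metadata[i];
        if (m->is_valid != DEMO_NON_EMPTY || strcmp(m->img_id, img_id) != 0) continue;

        /* 1. Invalidate in memory, then rewrite the whole entry on disk. */
        m->is_valid = DEMO_EMPTY;
        const long pos = (long) (sizeof(db->header) + i * sizeof(*m));
        if (fseek(db->file, pos, SEEK_SET) != 0
            || fwrite(m, sizeof(*m), 1, db->file) != 1) {
            return DEMO_ERR_IO;
        }

        /* 2. Only once the metadata is safely written, update the header. */
        db->header.version  += 1;
        db->header.nb_files -= 1;
        if (fseek(db->file, 0, SEEK_SET) != 0
            || fwrite(&db->header, sizeof(db->header), 1, db->file) != 1) {
            return DEMO_ERR_IO;
        }
        return DEMO_ERR_NONE;
    }
    return DEMO_ERR_IMAGE_NOT_FOUND; /* no valid entry with this identifier */
}
```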
do_delete_cmd()
Complete the code for do_delete_cmd()
. If the received imgID
is empty or its length is greater than MAX_IMG_ID
, do_delete_cmd()
should return the error ERR_INVALID_IMGID
(defined in error.h
).
help()
.The help
command is intended to be used in two different cases (already covered):
imgfscmd help
.
The command output must have exactly the following format:
+imgfscmd [COMMAND] [ARGUMENTS]
+ help: displays this help.
+ list <imgFS_filename>: list imgFS content.
+ create <imgFS_filename> [options]: create a new imgFS.
+ options are:
+ -max_files <MAX_FILES>: maximum number of files.
+ default value is 128
+ maximum value is 4294967295
+ -thumb_res <X_RES> <Y_RES>: resolution for thumbnail images.
+ default value is 64x64
+ maximum value is 128x128
+ -small_res <X_RES> <Y_RES>: resolution for small images.
+ default value is 256x256
+ maximum value is 512x512
+ delete <imgFS_filename> <imgID>: delete image imgID from imgFS.
+
+Write the function in imgfscmd_functions.c
.
It's best to start testing your code on a simple case you're familiar with.
+Use a copy of the provided/tests/data/test02.imgfs
file from previous weeks (we insist: make a copy!!) to see its contents, delete one or two image(s). Check each time by looking at the result with list
.
Also test any edge cases you can think of.
+Test your two new commands (use help
to find out how to use create
;-P ).
To check that the binary file has been correctly written to disk, use last week's list
command.
We provide you with a bunch of unit and end-to-end tests, you can run them as usual.
+If you're on your own VM, please install libvips-dev
, e.g.:
sudo apt install libvips-dev
+
+As we move forward with the project, it is important that you can write your own tests, to complete the provided ones. You can find those in provided/tests/unit/
. Before adding new tests, don't forget to copy the test/
directory in done/
. You will also need to modify the TEST_DIR
variable in the Makefile
.
We strongly advise you to edit these files to add your own tests, or even to create new ones as you move forward. This can be done quite simply by adding your own values or lines of code to the tests already provided, or by copying this file and drawing inspiration from it (don't forget to update the tests' Makefile
accordingly). You don't need to understand everything in this file, at least not initially, but it is important you start to get familiar with its content.
That said, for those who want to go further, the main test functions available in the environment we use (Check) are described here: https://libcheck.github.io/check/doc/check_html/check_4.html#Convenience-Test-Functions. For example, to test whether two int
are equal, use the ck_assert_int_eq
macro: ck_assert_int_eq(a, b)
.
We have also defined the following "functions" in tests.h
:
ck_assert_err(int actual_error, int expected_error): assert that actual_error is expected_error;
ck_assert_err_none(int error): assert that error is ERR_NONE;
ck_assert_invalid_arg(int error): assert that error is ERR_INVALID_ARGUMENT (i.e. corresponds to the return code of a function which received an invalid argument; see error.h);
ck_assert_ptr_nonnull(void* ptr): assert that ptr is not NULL;
ck_assert_ptr_null(void* ptr): assert that ptr is NULL.
Finally, we'd like to remind you that just because 100% of the tests provided here pass doesn't mean you'll get 100% of the points. Firstly, because these tests may not be exhaustive (it's also part of a programmer's job to think about tests), but also and above all (as indicated on the page explaining the project grading scale) because we attach great importance to the quality of your code, which will therefore be evaluated by a human review (and not blindly by a machine).
+ +This week consists of two distinct objectives (remember to divide up the work):
+read
and insert
) which will be finalized next week;
Note also that the work up to and including this week (i.e. weeks 7, 8 and 9) is the first of the two deliverables that will be evaluated for this project. More details in the foreword.
+So don't forget to submit it before the deadline. Submission procedure is indicated at the end of this handout.
This week we provide you new tests as usual, as well as the script used to submit your first version of the project.
+One of the aims of this project course is to learn how to incorporate complex external libraries into your own work. In our case, we will make use of the VIPS library, for compressing images.
+First, you need to update your Makefile
to include the library in the compilation, by adding the following lines:
# Add options for the compiler to include the library's headers
+CFLAGS += $(shell pkg-config vips --cflags)
+
+# Add the library to the linker
+LDLIBS += $(shell pkg-config vips --libs)
+
+Then, you need to
+call VIPS_INIT() at the start of your main() function, and give it argv[0] as parameter;
call vips_shutdown() at the end of the execution.
To help you, please take a look at the online documentation of this library. You will need to use the following functions:
+vips_jpegload_buffer()
vips_jpegsave_buffer()
vips_thumbnail_image()
g_object_unref(): equivalent of free() for all VipsObject*. To convert a VipsSOMETHING* to a VipsObject*, use the VIPS_OBJECT() functional macro.
Be aware that the first three functions take a variable number of parameters; you must therefore terminate the parameter list by passing a NULL pointer.
We stress that it's a significant part of your work this week to understand how to use this library.
+Note: You must be very careful when managing allocated memory and using VIPS at the same time. VIPS executes some operations lazily, i.e. they are deferred to the last moment. This means that, even if it does seem that you won't need an object anymore, it may actually still be needed to complete operations later on.
+One of the main functions of imgFS
is to transparently and efficiently manage the different resolutions of the same image (as a reminder: in this project, we'll have the original resolution and the "small" and "thumbnail" resolutions).
As a first step this week, you'll need to implement a function called lazily_resize()
. Its name suggests its usage: in computing, "lazy" corresponds to a commonly used strategy of deferring the work until the last moment, avoiding unnecessary work.
+(Teacher's note: don't confuse "computer science" with "studies in computer science" ;-)
).
This function has three arguments:
+THUMB_RES
or SMALL_RES
(see imgfs.h
);ORIG_RES
is passed, the function simply does nothing and returns no error (ERR_NONE
));imgfs_file
structure (the one we're working with);size_t
, position/index of the image to be processed.
It must implement the following logic:
+error.h
and error.c
);resized_res
field) for the requested resolution; this is already the case when using vips_thumbnail_image()
with the simplest (= almost none) options;imgFS
file;metadata
in memory and on disk.
To create the new image variant, you'll use the VIPS
library introduced above.
Your solution should consist of:
+image_content.c
file implementing the lazily_resize()
function;Makefile
(see above).
The second component of the week concerns the de-duplication of images, to avoid the same image (same content) being present several times in the database. For a social network, this type of optimization saves a lot of space (and time).
+To do this, you need to write a do_name_and_content_dedup()
function, to be defined in a new image_dedup.c
file (and prototyped in image_dedup.h
).
This function returns an error code (int
) and takes two arguments (in this order):
a previously opened imgFS
file;
an index (type uint32_t
here) which specifies the position of a given image in the metadata
array.
In the image_dedup.c
file, implement this function as follows.
For all valid images in the imgfs_file
(other than the one at position index
and in ascending positions):
if the name (img_id
) of the image is identical to that of the image at position index
, return ERR_DUPLICATE_ID
; this is to ensure that the image database does not contain two images with the same internal identifier;
(then) if the SHA value of the image is identical to that of the image at position index
, we can avoid duplicating the image at position index
(for all its resolutions).
To de-duplicate, you need to modify the metadata at the index
position, to reference the attributes of the copy found (its three offsets and sizes; note that the original size is necessarily the same).
Note: don't modify the name (img_id
) of the image at the index
position: it's only the contents that are de-duplicated; you'll have two images with different names, but pointing to the same contents.
+This is, by the way, a good illustration of how indirection tables are used in file-systems.
If the image at position index
has no duplicate content, set its ORIG_RES
offset to 0.
+If the image at position index
has no duplicate name (img_id
), return ERR_NONE
.
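The de-duplication rules above can be sketched as follows, on a trimmed-down metadata structure. The demo_* names and the out-of-range error code are hypothetical. One design choice of this sketch, which the text leaves open: after a content duplicate is found, the loop keeps scanning so that a later name duplicate is still reported.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_SHA_LEN 32 /* size in bytes of a SHA-256 digest */

/* Resolution indices, in the order thumbnail, small, original. */
enum { THUMB_RES = 0, SMALL_RES = 1, ORIG_RES = 2, NB_RES = 3 };

/* Hypothetical stand-ins for the error codes of error.h. */
enum { DEMO_ERR_NONE = 0, DEMO_ERR_DUPLICATE_ID, DEMO_ERR_IMAGE_NOT_FOUND };

/* Hypothetical, trimmed-down stand-in for struct img_metadata. */
struct demo_metadata {
    char          img_id[16];
    unsigned char SHA[DEMO_SHA_LEN];
    uint32_t      size[NB_RES];
    uint64_t      offset[NB_RES];
    uint16_t      is_valid;
};

int demo_dedup(struct demo_metadata *metadata, uint32_t max_files, uint32_t index)
{
    if (index >= max_files) return DEMO_ERR_IMAGE_NOT_FOUND;

    struct demo_metadata *target = &metadata[index];
    int content_found = 0;

    for (uint32_t i = 0; i < max_files; ++i) {
        const struct demo_metadata *other = &metadata[i];
        if (i == index || other->is_valid == 0) continue; /* skip self and holes */

        /* Same identifier: the database must not hold two images with one name. */
        if (strcmp(other->img_id, target->img_id) == 0) return DEMO_ERR_DUPLICATE_ID;

        /* Same content: reuse the offsets and sizes of the copy found, keep the name. */
        if (!content_found && memcmp(other->SHA, target->SHA, DEMO_SHA_LEN) == 0) {
            memcpy(target->size, other->size, sizeof(target->size));
            memcpy(target->offset, other->offset, sizeof(target->offset));
            content_found = 1;
        }
    }

    /* No duplicate content: signal it to the caller through a zero offset. */
    if (!content_found) target->offset[ORIG_RES] = 0;
    return DEMO_ERR_NONE;
}
```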
As always, we provide you with a few tests, to run with make check
. We strongly advise you to write your own tests to complete those. Once you have finished your testing, you can also use the make feedback
.
As mentioned in the introduction, this week's work, together with the work of weeks 7 and 8, constitutes the first submission of the project.
+The deadline for this assignment is Sunday May 05, 23:59; make sure you don't fall behind schedule and properly divide up the work between you.
+The easiest way to submit is to do
+make submit1
+
+from your done/
directory. This simply adds a project01_1
tag to your commit (in the main
branch).
Although you can do as many make submit1
as you want, we really recommend that you do it only when you are sure you want to deliver your work.
This week, you'll implement the commands read
(extract an image from the image database) and insert
(insert an image into the database). To do this, you'll use features developed last week.
Warning:
+if you're working ahead of the submission, don't forget to run make submit
before "polluting" your first deliverable with early work for this week (which must not be part of what is submitted for grading).
+If in doubt/difficulty, ask an assistant (or teacher).
As in the previous weeks, we're providing you with test material for this week.
+Moreover, in order to reduce the workload, we also provide you with four functions to be added in your code. These functions can be found in provided/src/week10_provided_code.txt
and consist in:
resolution_atoi()
, to be added to imgfs_tools.c
;
the purpose of this function is to transform a string specifying an image resolution into one of the enumerations specifying a resolution type, namely:
+THUMB_RES
if the argument is either "thumb"
or "thumbnail"
;SMALL_RES
if the argument is "small"
;ORIG_RES
if argument is either "orig"
or "original"
;-1
in all other cases, including if the argument is NULL
;
this function is needed to process the command line arguments of the read
command;
get_resolution()
, to be added to image_content.c
;
the purpose of this function is to retrieve the resolution of a JPEG image;
+it takes an image_buffer
as input, which is a pointer to a memory region containing a JPEG image, and image_size
which is the size of this region; and it "outputs" (fills, actually) the two parameters height
and width
;
the function returns ERR_NONE
if there is no problem, or ERR_IMGLIB
if there is a VIPS error;
this function is needed by the insert
command;
Note for those who look at the provided code and may be puzzled by the cast: the prototype of vips_jpegload_buffer()
is in fact wrong, as its first argument should be const void*
(instead of void *
; we read from these data!). In fact, if you take a look at their code, that function only calls vips_blob_new()
whose second argument is correctly qualified as const void *
(well, even if there is that horrible casting). So we can safely pass a const
image_buffer
to vips_jpegload_buffer()
(by casting it, unfortunately... :-(
);
the two do_read_cmd()
and do_insert_cmd()
, to be added to imgfscmd_functions.c
; they do not compile as such yet since they require three utility functions (to be written, see below).
What you have to do this week, is to implement the do_read()
and do_insert()
functions, as well as three utility functions for
+do_read_cmd()
and do_insert_cmd()
.
do_insert()
The do_insert()
function adds an image to the "imgFS". Create a new imgfs_insert.c
file to implement it.
The implementation logic contains several steps, in an order that must be respected.
+First of all, check that the current number of images is less than max_files
. Return ERR_IMGFS_FULL
if this is not the case.
Next, you have to find an empty entry in the metadata
table. When this is the case, you must:
SHA
field (if necessary, review what you did in the warmup regarding SHA256 computation);img_id
string into the corresponding field;ORIG_RES
field (beware of type change);get_resolution()
function (see above) to determine the image width and height; put these values into the orig_res
fields of the metadata
.
Call last week's do_name_and_content_dedup()
function using the correct parameters. In the event of an error, do_insert()
returns the same error code.
First, check whether the de-duplication step has found (or not) another copy of the same image. To do this, test whether the original resolution offset is zero (if necessary, review the do_name_and_content_dedup()
function).
If the image does not exist, write its contents at the end of the file. Don't forget to finish initializing the metadata.
+Update all the necessary image database header fields. The version shall be increased by 1.
+Finally, all that's left to do is write the header
, and then the corresponding metadata
entry to disk (your code must not write all the metadata to disk for each operation!).
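The last step (write the header, then only the modified metadata entry) can be sketched as below. This is a minimal sketch, not the reference solution: `write_header_and_entry()`, `struct demo_header` and `struct demo_metadata` are hypothetical stand-ins for the real project types, and we assume the file starts with the header, immediately followed by the metadata array.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the real ImgFS structures. */
struct demo_header   { uint32_t version; uint32_t nb_files; uint32_t max_files; };
struct demo_metadata { char img_id[32]; uint64_t offset; uint32_t size; uint16_t is_valid; };

/* Rewrites the header, then only the metadata entry at position `index`.
 * Assumed on-disk layout: header, then the array of metadata entries.
 * Returns 0 on success, -1 on invalid argument or I/O error. */
static int write_header_and_entry(FILE *file, const struct demo_header *header,
                                  const struct demo_metadata *entries, uint32_t index)
{
    if (file == NULL || header == NULL || entries == NULL) return -1;
    if (index >= header->max_files) return -1;

    /* The header lives at the very beginning of the file. */
    if (fseek(file, 0, SEEK_SET) != 0) return -1;
    if (fwrite(header, sizeof(*header), 1, file) != 1) return -1;

    /* Jump directly to the one entry that changed; the others stay untouched. */
    const long entry_pos = (long) (sizeof(*header) + (size_t) index * sizeof(entries[0]));
    if (fseek(file, entry_pos, SEEK_SET) != 0) return -1;
    if (fwrite(&entries[index], sizeof(entries[0]), 1, file) != 1) return -1;

    return 0;
}
```

The point of the `fseek()` to `entry_pos` is that the cost of an insertion does not grow with `max_files`.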
do_read()
The second main function of the week is do_read()
, to be implemented in a new file, imgfs_read.c
.
This function must first find the entry in the metadata table corresponding to the supplied identifier.
+If successful, determine whether the image already exists in the requested resolution (offset
or size
null). If not, call the lazily_resize()
function from last week to create the image at the required resolution. (Note: this should never be the case for ORIG_RES
).
At this point, the position of the image (in the correct resolution) in the file is known, as is its size; you can then read the contents of the file image into a dynamically allocated memory region.
+If successful, the output parameters image_buffer
and image_size
should contain the memory address and size of the image.
Be careful to handle possible error cases:
+ERR_IO
in the event of a read error;ERR_OUT_OF_MEMORY
in the event of a memory allocation error;ERR_IMAGE_NOT_FOUND
if the requested identifier could not be found.Note: in case any of you are wondering: please note that read
on a duplicated image does not modify any of its duplicates. Indeed, lazily_resize()
has no impact on other images than the one under consideration (and was written before do_name_and_content_dedup()
). Such a behavior (which must be your program's behavior) isn't a big deal in practice, because:
We already provided you with the two wrap-up functions do_read_cmd()
and do_insert_cmd()
(see top of this handout), but they still require three utility functions (in imgfscmd_functions.c
):
static void create_name(const char* img_id, int resolution, char** new_name);
+static int write_disk_image(const char *filename, const char *image_buffer, uint32_t image_size);
+static int read_disk_image(const char *path, char **image_buffer, uint32_t *image_size);
+
+The purpose of create_name()
is to create, in new_name
the name of the file to use to save the read image (do_read_cmd()
),
+using the following naming convention:
image_id + resolution_suffix + '.jpg'
+
+where:
+image_id
is the image identifier;resolution_suffix
corresponds to _orig
, _small
or _thumb
;for instance, if the image id is "myid"
and the resolution is SMALL_RES
, then new_name
will contain "myid_small.jpg"
.
+Also have a look at its call for further details if needed.
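As an illustration of the naming convention, here is one possible sketch of `create_name()`. The numeric values of the resolution constants are assumptions (the `*_DEMO` names are hypothetical; use the real `ORIG_RES`, `SMALL_RES` and `THUMB_RES` from the project headers), and so is the choice of setting `*new_name` to `NULL` on failure.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values: the real constants come from the project headers. */
enum { THUMB_RES_DEMO = 0, SMALL_RES_DEMO = 1, ORIG_RES_DEMO = 2 };

/* Builds "<img_id><resolution_suffix>.jpg" in freshly allocated memory.
 * On failure (bad argument, unknown resolution, allocation error),
 * *new_name is set to NULL. The caller must free(*new_name). */
static void create_name(const char *img_id, int resolution, char **new_name)
{
    if (new_name == NULL) return;
    *new_name = NULL;
    if (img_id == NULL) return;

    const char *suffix = NULL;
    switch (resolution) {
    case ORIG_RES_DEMO:  suffix = "_orig";  break;
    case SMALL_RES_DEMO: suffix = "_small"; break;
    case THUMB_RES_DEMO: suffix = "_thumb"; break;
    default: return; /* unknown resolution code */
    }

    /* +1 for the terminating '\0'. */
    const size_t len = strlen(img_id) + strlen(suffix) + strlen(".jpg") + 1;
    char *name = malloc(len);
    if (name == NULL) return;

    snprintf(name, len, "%s%s.jpg", img_id, suffix);
    *new_name = name;
}
```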
write_disk_image()
is a very simple tool function (five lines or so) to write the content of the provided image_buffer
, the size of which is image_size
, to a file, the name of which is provided. Have a look at its call for further details if needed.
This function returns ERR_IO
on error and ERR_NONE
otherwise.
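A possible shape for this function is sketched below. The numeric error values are placeholders (the real ones are defined by the project), and treating a `NULL` argument as `ERR_IO` is our assumption.

```c
#include <stdio.h>
#include <stdint.h>

/* Placeholder values: the real codes are defined by the project. */
enum { ERR_NONE = 0, ERR_IO = -1 };

static int write_disk_image(const char *filename, const char *image_buffer, uint32_t image_size)
{
    if (filename == NULL || image_buffer == NULL) return ERR_IO;

    FILE *file = fopen(filename, "wb"); /* binary mode: the content is a JPEG */
    if (file == NULL) return ERR_IO;

    const size_t written = fwrite(image_buffer, 1, image_size, file);
    /* fclose() flushes the stream, so its result matters as well. */
    const int close_failed = (fclose(file) != 0);

    return (written == image_size && !close_failed) ? ERR_NONE : ERR_IO;
}
```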
Finally, read_disk_image()
reads an image from disk, the filename of which is provided in path
. It reads the image into image_buffer
and sets image_size
to its corresponding size.
This function returns ERR_IO
in case of a filesystem error, ERR_OUT_OF_MEMORY
in case of a memory allocation error, and ERR_NONE
otherwise.
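One way to write it, as a hedged sketch: the file size is obtained with `fseek()`/`ftell()`, then the whole content is read into a buffer allocated with `malloc()`. The numeric error values are placeholders, and returning `ERR_IO` for `NULL` arguments is our assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Placeholder values: the real codes are defined by the project. */
enum { ERR_NONE = 0, ERR_IO = -1, ERR_OUT_OF_MEMORY = -2 };

static int read_disk_image(const char *path, char **image_buffer, uint32_t *image_size)
{
    if (path == NULL || image_buffer == NULL || image_size == NULL) return ERR_IO;

    FILE *file = fopen(path, "rb");
    if (file == NULL) return ERR_IO;

    /* Determine the file size by seeking to the end, then rewind. */
    long size = -1;
    if (fseek(file, 0, SEEK_END) == 0) size = ftell(file);
    if (size < 0 || (unsigned long) size > UINT32_MAX || fseek(file, 0, SEEK_SET) != 0) {
        fclose(file);
        return ERR_IO;
    }

    char *buffer = malloc(size > 0 ? (size_t) size : 1); /* never request 0 bytes */
    if (buffer == NULL) {
        fclose(file);
        return ERR_OUT_OF_MEMORY;
    }

    if (fread(buffer, 1, (size_t) size, file) != (size_t) size) {
        free(buffer);
        fclose(file);
        return ERR_IO;
    }
    fclose(file);

    /* Only touch the output parameters once everything succeeded. */
    *image_buffer = buffer;
    *image_size = (uint32_t) size;
    return ERR_NONE;
}
```

The caller owns the returned buffer and must `free()` it.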
help
Finally, modify the help
command to reflect the new commands:
> ./imgfscmd help
+imgfscmd [COMMAND] [ARGUMENTS]
+ help: displays this help.
+ list <imgFS_filename>: list imgFS content.
+ create <imgFS_filename> [options]: create a new imgFS.
+ options are:
+ -max_files <MAX_FILES>: maximum number of files.
+ default value is 128
+ maximum value is 4294967295
+ -thumb_res <X_RES> <Y_RES>: resolution for thumbnail images.
+ default value is 64x64
+ maximum value is 128x128
+ -small_res <X_RES> <Y_RES>: resolution for small images.
+ default value is 256x256
+ maximum value is 512x512
+ read <imgFS_filename> <imgID> [original|orig|thumbnail|thumb|small]:
+ read an image from the imgFS and save it to a file.
+ default resolution is "original".
+ insert <imgFS_filename> <imgID> <filename>: insert a new image in the imgFS.
+ delete <imgFS_filename> <imgID>: delete image imgID from imgFS.
+
+
+ This week we start a new aspect of the project: adding HTTP access (server and client) to our Image Database. Basically, we want to convert our imgfscmd
application to a client-server application that uses HTTP (over TCP as its transport-layer protocol).
+This work will be structured as follows over the next three weeks:
this week: create a socket layer for network communications; and use that layer to create a simple HTTP server (to be made more complex next week);
+next week: create a (simplified) HTTP layer over the socket layer that contains all the functionalities needed for this project (mainly: parse HTTP requests designed for this project), but in a blocking way (handles only one connection at a time);
+and in the last week, create a server that can serve (!) our image database commands (read, insert, ...) through HTTP access; and use it via an HTTP client; and in a non blocking way (multiple connections via a multi-threaded program).
+We thus have three logical layers, each of which shall be tested on its own:
+the socket layer, to be tested with tcp-test-client.c
and tcp-test-server.c
(to be done);
the "generic" (but incomplete) HTTP layer, to be tested with http-test-server.c
(provided) and curl
;
the ImgFS-over-HTTP layer, to be tested with imgfs_server
and either curl
(early tests) or a browser, using index.html
(already provided).
For this week, we focus on the transport layer (TCP), simply using standard Unix sockets in C to provide the four following functions (see socket_layer.h
):
tcp_server_init()
, to initialize a network communication over TCP;
tcp_accept()
, to create a blocking call that accepts a new TCP connection;
tcp_read()
, to create a blocking call that reads the active socket once and stores the output in buf
;
tcp_send()
to send a response message.
Most of these functions are simply interfaces to sys/socket.h
C functions socket(2)
, bind(2)
, listen(2)
, accept(2)
, recv(2)
and send(2)
. We strongly recommend you have a look at the corresponding man-pages.
We then use that layer to create a simple HTTP-server API. +There, you'll have to implement two functions:
+http_receive()
, to create a call and read from it;
http_reply()
to send a response message.
http_init()
, to initialize an HTTP communication, and http_close()
, to close it, are provided.
+The fifth function that appears in http_net.h
, http_serve_file()
, will be implemented later.
In the provided/src
directory, you can find the following files (some of which have certainly already been copied to your done/
):
socket_layer.h
: prototypes of the tcp_*()
functions, which interact with UNIX socket and serve as basis for our HTTP web server;http_net.h
: prototypes of the HTTP layer, responsible for receiving incoming requests, and generating HTTP responses;http_prot.h
: parse HTTP requests;imgfs_server_service.h
: core functions of the imgfs
HTTP server: sets up and shutdown server, dispatch requests;http_net.c
: implementation of the HTTP layer,imgfs_server.c
: the main code of our server,imgfs_server_service.c
: the implementation of the core functions to offer HTTP services to our ImgFS database;http-test-server.c
: a simple test of the HTTP layer.tcp_server_init()
In a file socket_layer.c
(to be created), define the tcp_server_init()
function (see its prototype in socket_layer.h
) which:
socket(2)
man-page; use AF_INET
and SOCK_STREAM
);struct sockaddr_in
type); notice that for portability, the port number received as argument shall be converted using htons()
(see htons(3)
man-page);bind(2)
); note: there is no problem passing a pointer to a struct sockaddr_in
as a pointer to a struct sockaddr
;listen(2)
);The function returns the socket id.
+Whenever an error is encountered, this function prints an informative message on stderr (see perror(3)
), closes whatever it has already opened, and returns ERR_IO
. Sockets must be closed using close(2)
.
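The sequence socket / bind / listen can be sketched as follows. Several details are assumptions on our side and not taken from `socket_layer.h`: the exact parameter type, listening on all interfaces (`INADDR_ANY`), the backlog value (`SOMAXCONN`) and the numeric value of `ERR_IO`.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

enum { ERR_IO = -1 }; /* placeholder value */

int tcp_server_init(uint16_t port)
{
    /* 1. Create a TCP (stream) socket over IPv4. */
    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return ERR_IO;
    }

    /* 2. Describe the local address; the port goes in network byte order. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* assumption: all interfaces */
    addr.sin_port = htons(port);

    /* 3. Bind the socket to that address. */
    if (bind(fd, (const struct sockaddr *) &addr, sizeof(addr)) == -1) {
        perror("bind");
        close(fd);
        return ERR_IO;
    }

    /* 4. Start listening for incoming connections. */
    if (listen(fd, SOMAXCONN) == -1) { /* assumption: default backlog */
        perror("listen");
        close(fd);
        return ERR_IO;
    }

    return fd;
}
```

Note how every error path after `socket()` closes the descriptor before returning.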
tcp_accept()
The tcp_accept()
function (to be defined also in socket_layer.c
) is simply a (one line of code) frontend to the accept(2)
function.
+We don't make any use of the addr
and addr_len
arguments of accept()
(use NULL
).
This function returns the return value of accept()
.
tcp_read()
and tcp_send()
Similarly, tcp_read()
and tcp_send()
are also frontends to recv(2)
and send(2)
functions, respectively. They return either ERR_INVALID_ARGUMENT
if they received an improper argument, or the return value of the system function called.
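As an illustration, here are two such thin wrappers. The parameter lists are assumptions (check the real prototypes in `socket_layer.h`), and the value of `ERR_INVALID_ARGUMENT` is a placeholder.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

enum { ERR_INVALID_ARGUMENT = -3 }; /* placeholder value */

/* Reads once from the socket: at most buflen bytes end up in buf. */
ssize_t tcp_read(int active_socket, char *buf, size_t buflen)
{
    if (active_socket < 0 || buf == NULL || buflen == 0) return ERR_INVALID_ARGUMENT;
    return recv(active_socket, buf, buflen, 0);
}

/* Sends response_len bytes; like send(2), it may send fewer than requested. */
ssize_t tcp_send(int active_socket, const char *response, size_t response_len)
{
    if (active_socket < 0 || response == NULL) return ERR_INVALID_ARGUMENT;
    return send(active_socket, response, response_len, 0);
}
```

Both can be tried without any network by connecting the two ends with `socketpair(2)`.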
Test your implementation by creating two simple programs (see usage examples below):
+a client (tcp-test-client.c
) that takes two arguments from the command line: a port (number) and a (short) file;
a server (tcp-test-server.c
) that takes one argument from the command line: a port (number).
The client tests whether the file exists and has a size less than 2048 bytes. If so, it:
+The server waits for connections and when it receives a file (length first):
+The server never terminates, as it may have to serve several clients/requests.
+You need to make sure that the two ends of the communication will never get stuck waiting for each other at the same point in time (this would lead to a "deadlock").
+However, when sending several messages using TCP, the boundaries of these messages get lost. For instance, if you use a TCP socket to transmit "Hello" and "Goodbye" as two separate messages, the receiver may interpret this as one single message: "HelloGoodbye". This is because all data transmitted using TCP get "serialized" into a single byte-stream.
+We thus need to construct our messages in such a way that we can deserialize the byte-stream back to the original messages. We can for instance make use of a delimiting character or string. For instance, if we know that the character "|" can never be part of our message, we can transmit "Hello", then "|", then "Goodbye" to make the remote end (who may thus receive "Hello|Goodbye" altogether) understand that those are two different messages. In this case, the role of "|" is that of a delimiter.
+If there is no character that can act as a delimiter for our protocol, you may add headers containing meta-data about the following message. These headers can then be used by the other end to deserialize the messages.
+To keep this test simple, we designed it as a two-message exchange: first the size, then the content. But an issue may arise if the file sent starts with some digits. We thus suggest adding a simple delimiter character at the end of the size message.
+Similarly we should have a way to delimit the end of the file (otherwise the next size may still be considered to be part of a former file). We propose you to add a simple delimiter string, e.g. "<EOF>"
.
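To make the delimiter idea concrete, here is a small sketch of a function that cuts the next message out of a received byte-stream. `next_message()` is a hypothetical helper (it is not part of the provided code), and the choice of "|" after the size is only an example.

```c
#include <stddef.h>
#include <string.h>

/* Looks for the first occurrence of `delim` in the first `len` bytes of `stream`.
 * On success, stores the message length (the bytes before the delimiter)
 * in *msg_len and returns a pointer to the byte right after the delimiter.
 * Returns NULL if the delimiter has not been received yet. */
const char *next_message(const char *stream, size_t len, const char *delim, size_t *msg_len)
{
    if (stream == NULL || delim == NULL || msg_len == NULL) return NULL;

    const size_t dlen = strlen(delim);
    if (dlen == 0 || len < dlen) return NULL;

    for (size_t i = 0; i + dlen <= len; ++i) {
        if (memcmp(stream + i, delim, dlen) == 0) {
            *msg_len = i;
            return stream + i + dlen;
        }
    }
    return NULL; /* incomplete: the caller should read more bytes */
}
```

With the stream "12|Hello there!<EOF>", a first call with "|" isolates the size ("12"), and a second call on the remainder with "<EOF>" isolates the 12 bytes of content. A `NULL` result simply means "read again from the socket and retry".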
Server (in one terminal):
+./tcp-test-server 6789
+
+Server started on port 6789
+Waiting for a size...
+Received a size: 32 --> accepted
+About to receive file of 32 bytes
+Received a file:
+Hello there!
+How are you doing?
+
+Waiting for a size...
+...
+
+Client (in another terminal):
+./tcp-test-client 6789 ../provided/tests/data/hello_there.txt
+
+Talking to 6789
+Sending size 32:
+Server responded: "Small file"
+Sending ../provided/tests/data/hello_there.txt:
+Accepted
+Done
+
+You can launch the client several times, with different files
+(for instance ../provided/tests/data/aiw.txt
).
(Terminate the server with Ctrl-C.)
+Use Wireshark to debug your code.
+Try many clients at the same time:
+for i in $(seq 5); do ./tcp-test-client 6789 ../provided/tests/data/2047.txt > log-$i 2>&1 & done
+
+What happens? (maybe nothing particular, actually)
+-->
concurrent access will not be addressed at this layer but in the last week in the HTTP layer.
In order to be generic (and be able to use our HTTP layer for other services than the one used in this project), we separate the handling of the content of the HTTP requests/services from the handling of the HTTP protocol itself.
+This separation is done by passing a function, responsible for the handling of the content of the HTTP requests/services, to the initialization of the HTTP connection. Such a function is called a "HTTP messages handler".
+To be able to pass it to the initialization function, we need a specific type: EventCallback
, to be defined in http_net.h
as a pointer to a function taking a pointer to struct http_message
and an int
as parameters, and returning an int
.
http_receive()
In a file http_net.c
(copy it from provided
; this file offers the API to a (simplified) generic HTTP server), the http_receive()
function is the main function to handle HTTP connections. But in order to prepare for multi-threaded version (last week), we recommend you to split it into two parts:
connects the socket with tcp_accept()
(returns ERR_IO
in case of error);
(if no error,) handles the connection through a tool function (we propose to name it handle_connection()
).
Of course, most of the work now remains to be done in handle_connection()
.
For future compatibility, its signature has to be:
+static void* handle_connection(void* arg)
+
+In our case, it receives a pointer to an int
containing the socket file descriptor.
+And it returns a pointer to an int
containing some error code (ERR_NONE
if none). This may seem far-fetched (why not receive and return an int
?), but this will be required when adding multi-threading. We provided two examples of how to handle that.
The handle_connection()
function:
reads the HTTP header from the socket into some buffer (max size of HTTP headers is provided in MAX_HEADER_SIZE
from http_net.h
); notice that this may require several calls to tcp_read()
: read as long as the headers do not contain HTTP_HDR_END_DELIM
(and you didn't read more than MAX_HEADER_SIZE
); you can use strstr(3)
to find HTTP_HDR_END_DELIM
in the buffer;
handles error cases;
+sends the reply using http_reply()
: if the headers contain "test: ok"
(use strstr(3)
once again), use the HTTP_OK
status, otherwise HTTP_BAD_REQUEST
; the other parameters can be empty; if http_reply()
fails, handle_connection()
returns &our_ERR_IO
.
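The "read until the end of the headers" loop can be sketched as below. `read_headers()` is a hypothetical helper, `DEMO_MAX_HEADER_SIZE` is an assumed value standing in for `MAX_HEADER_SIZE`, and the sketch calls `recv(2)` directly where your code would call `tcp_read()`.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define HTTP_HDR_END_DELIM   "\r\n\r\n" /* an empty line after the last header */
#define DEMO_MAX_HEADER_SIZE 8192       /* assumed value, see MAX_HEADER_SIZE */

/* Reads from `fd` until the end-of-headers delimiter shows up in `buf`.
 * `buf` must be able to hold DEMO_MAX_HEADER_SIZE + 1 bytes.
 * Returns the number of bytes read, or -1 on a read error, if the peer
 * closed the connection too early, or if the headers are too large. */
ssize_t read_headers(int fd, char *buf)
{
    size_t total = 0;
    buf[0] = '\0';

    while (strstr(buf, HTTP_HDR_END_DELIM) == NULL) {
        if (total >= DEMO_MAX_HEADER_SIZE) return -1; /* headers too large */

        const ssize_t n = recv(fd, buf + total, DEMO_MAX_HEADER_SIZE - total, 0);
        if (n <= 0) return -1; /* error, or connection closed by the peer */

        total += (size_t) n;
        buf[total] = '\0'; /* keep buf a valid C string for strstr() */
    }
    return (ssize_t) total;
}
```

Appending at `buf + total` (rather than at `buf`) is what lets the delimiter be found even when it is split across two reads.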
http_reply()
The http_reply()
function is a tool function to send a general reply a bit more complex than the above two, with some content.
allocates a buffer at the proper size (to be computed, read further);
+starts filling this buffer with the header in the format:
+ HTTP_PROTOCOL_ID <status> HTTP_LINE_DELIM <headers> Content-Length: <body_len> HTTP_HDR_END_DELIM
+
+where <status>
, <headers>
and <body_len>
have to be replaced by the corresponding parameter values;
for instance, the call
+ http_reply(1234, HTTP_OK, "Content-Type: text/html; charset=utf-8" HTTP_LINE_DELIM, buffer, 6789);
+
+will create the header
+ "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nContent-Length: 6789\r\n\r\n"
+
+then adds (copies) the body to the end of the buffer;
+and sends everything to the socket.
+The body
parameter may be NULL
(as long as body_len
is 0). It is useful for responses with an empty body.
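The buffer construction can be sketched as follows, leaving the actual sending aside. `build_reply()` is a hypothetical helper, and the values of the `HTTP_*` macros are assumptions inferred from the example header above (the real ones are provided with the project). The size is computed with a first `snprintf()` call on a `NULL` buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values, inferred from the example header in this handout. */
#define HTTP_PROTOCOL_ID   "HTTP/1.1 "
#define HTTP_OK            "200 OK"
#define HTTP_LINE_DELIM    "\r\n"
#define HTTP_HDR_END_DELIM "\r\n\r\n"

#define REPLY_HEADER_FMT "%s%s%s%sContent-Length: %zu%s"

/* Builds header + body in one heap buffer and stores its size in *total_len.
 * Returns NULL on invalid argument or allocation error. The caller frees it. */
char *build_reply(const char *status, const char *headers,
                  const char *body, size_t body_len, size_t *total_len)
{
    if (status == NULL || headers == NULL || total_len == NULL) return NULL;
    if (body == NULL && body_len != 0) return NULL;

    /* First pass: with a NULL buffer, snprintf() only computes the length. */
    const int hdr_len = snprintf(NULL, 0, REPLY_HEADER_FMT, HTTP_PROTOCOL_ID, status,
                                 HTTP_LINE_DELIM, headers, body_len, HTTP_HDR_END_DELIM);
    if (hdr_len < 0) return NULL;

    /* +1 because snprintf() also writes a terminating '\0'. */
    char *buffer = malloc((size_t) hdr_len + body_len + 1);
    if (buffer == NULL) return NULL;

    snprintf(buffer, (size_t) hdr_len + 1, REPLY_HEADER_FMT, HTTP_PROTOCOL_ID, status,
             HTTP_LINE_DELIM, headers, body_len, HTTP_HDR_END_DELIM);

    /* The body may be binary: copy it with memcpy(), not with string functions. */
    if (body_len > 0) memcpy(buffer + hdr_len, body, body_len);

    *total_len = (size_t) hdr_len + body_len;
    return buffer;
}
```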
Use the provided http-test-server.c
to make some tests. Simply launch this server; and, as a client, use curl
:
curl -v localhost:8000
+curl -H 'test: ok' -v localhost:8000
+curl -H 'test: fail' -v localhost:8000
+
+The final step for this week is to create a simple version of our future HTTP server for ImgFS services.
+This is separated over two files (copy them from provided
):
imgfs_server_service.c
, which implements the main functionalities needed by our server;imgfs_server.c
, which runs the server.In imgfs_server_service.c
:
declare two static
global variables, one to store the ImgFS file and another to store the port number (uint16_t
);
define the function server_startup()
, which receives argc
and argv
, and:
DEFAULT_LISTENING_PORT
;In imgfs_server.c
:
server_startup()
;http_receive()
as long as there is no error (see http-test-server.c
for an example);Finally, we'd like to properly close the server. For this we will add a signal handler that will close the HTTP connection and the ImgFS file on server termination.
+First of all, add a call to http_close()
into server_shutdown()
.
Then, to imgfs_server.c
:
add the function
+static void signal_handler(int sig _unused)
+which simply calls server_shutdown()
, then stops the program using exit(0)
;
and call set_signal_handler()
from the main()
.
Try it by sending a Ctrl-C to a running server.
+You can test your new server with the same curl
commands as above. Test different port numbers.
There is no other "end-to-end" test for this week (except the self-made, mentioned in this handout) since we did not finish the implementation of a "final product".
+Similarly, there is no unit-test, since we don't really have independent tool functions this week.
+ +This week we continue our client-server application by adding a (simplified) HTTP server layer.
+Last week, our HTTP layer was only able to detect some specific string in the header. We now want to be able to read and write full HTTP requests (only the ones needed for our purposes; not the full RFCs (9110-9112) ;-)
). Mainly, your work this week will consist in writing http_prot.c
and use it.
To recap the design of our ImgFS client-server architecture: it has been layered as, from lower to upper level:
+socket_layer
: TCP client and server;http_net
: minimal HTTP network services (no parsing of the content);http_prot
: (simple) HTTP content parsing;imgfs_server_service
: tools to build an ImgFS server over HTTP (the client will be either curl
or your browser; you won't write any HTTP client);imgfs_server
: the ImgFS server (over HTTP) itself.Regarding the communication design, the three steps that could be considered are:
+This week we will implement version 1 and next week move to version 2. Polling won't be addressed in this project (but those who'd like to, can do it).
+We thus have three things to be done this week:
+tools to parse and create HTTP messages (http_prot.c
);
create a more appropriate (but still generic) handle_connection()
(in http_net.c
);
develop ad-hoc services for our ImgFS server (imgfs_server_service.c
).
IMPORTANT NOTICE: it's really important you proceed step by step and test each progress separately to be sure you're building upon safe grounds. We will propose you several testing steps but feel free to develop/use your own whenever needed!
+Also, help yourself by displaying informative error messages whenever possible. For instance, instead of writing:
+fprintf(stderr, "error with URL parameter\n");
+
+it may be more effective to do something like:
+fprintf(stderr, "http_get_var(): URL parameter \"%s\" not found in \"%s\"\n", name, url->val);
+
+You can also use debug_printf()
whenever needed.
This week, we provide you with unit tests in tests/unit/unit-test-http.c
and, in src/week12_provided_code.txt
, some code to be added to imgfs_server_service.c
.
The first thing to pay close attention to is the difference between the strings used by HTTP, which are not null-terminated, and C "strings" (null-terminated char*
). When using a C "string", always ensure it is indeed null-terminated (going the other way round is not a problem, as the size is always sent over HTTP: simply use the C "string" size minus one).
To properly handle HTTP string, we propose you the struct http_string
type (in http_prot.h
; have a look!).
+Similarly, to make your life easier, we also propose a few other data structures. Feel free to use them when needed!
Printing those HTTP strings is a bit tricky; using the usual "%s"
will not work, since the string is not null-terminated. You can instead use the "%.*s"
specifier, and pass the length of the string before its value :
struct http_string s = {.val = "Hello world!<this is outside the http string>", .len = 12};
+printf("C string: %s\n", s.val);
+// C string: Hello world!<this is outside the http string>
+printf("HTTP string: %.*s\n", (int) s.len, s.val);
+// HTTP string: Hello world!
+
+http_match_uri()
and http_match_verb()
Let's start with the simple http_match_uri()
and http_match_verb()
functions: see their description in http_prot.h
and implement them. Notice the difference between URIs, where only the prefix matters (e.g. HTTP string "https://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg
" matches any of the C strings "https://"
, "https://localhost:8000/"
, "https://localhost:8000/imgfs"
, etc.; this is very simple indeed) and "verbs" where the whole string matters (e.g. HTTP string "POST
" does not match C string "POS"
, nor "POST /localhost:8000/imgfs"
).
As usual, check that the arguments you receive are valid.
+You can test your function with the above examples, as well as those:
+"/universal/resource/identifier" match uri "/universal/resource/"
+"/universal/resource/identifier" match uri "/universal"
+
+{val = "POST / HTTP/1.1", len = 4} match verb "POST"
+{val = "GET / HTTP/1.1" , len = 3} match verb "GET"
+{val = "GET / HTTP/1.1" , len = 3} does not match verb "GET /"
+{val = "GET / HTTP/1.1" , len = 3} does not match verb "G"
+
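The prefix-versus-whole-string distinction can be sketched with the two hypothetical helpers below. They are not the real `http_match_uri()` and `http_match_verb()` (whose exact prototypes are in `http_prot.h`); the layout of `struct http_string` is assumed from the printing example above.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: a pointer plus a length, no terminating '\0'. */
struct http_string { const char *val; size_t len; };

/* Returns 1 if the C string `prefix` is a prefix of the HTTP string `uri`. */
int uri_has_prefix(const struct http_string *uri, const char *prefix)
{
    if (uri == NULL || uri->val == NULL || prefix == NULL) return 0;
    const size_t prefix_len = strlen(prefix);
    return prefix_len <= uri->len && memcmp(uri->val, prefix, prefix_len) == 0;
}

/* Returns 1 if the HTTP string `method` is exactly the C string `verb`. */
int verb_equals(const struct http_string *method, const char *verb)
{
    if (method == NULL || method->val == NULL || verb == NULL) return 0;
    const size_t verb_len = strlen(verb);
    /* Both lengths must agree, otherwise "G" or "GET /" would match "GET". */
    return verb_len == method->len && memcmp(method->val, verb, verb_len) == 0;
}
```

Only the length comparison differs: `<=` for a prefix, `==` for a full match.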
+http_get_var()
The purpose of http_get_var()
function is to extract values of parameters from URL. For instance, get "orig"
for parameter "res"
in http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg
. Or extract "mure.jpg"
for parameter "img_id"
in the same URL.
For this we recommend:
+name
parameter into a new string and append =
to it;'&'
somewhere after that string; if yes consider the position of this '&'
as the end of the value, otherwise consider the end of the URL as the end of the value;out
(shall be a valid C string).Note: this method is not complete, we should also check that the parameter is located after the first "?"
in the url, and right after the "?"
or a "&"
, and decode the argument. This is left as a bonus exercise for those interested.
Regarding the return values:
+ERR_INVALID_ARGUMENT
;out_len
), return ERR_RUNTIME
;We strongly recommend you to write at least a few unit tests for this function. Here are some test values:
+"http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg", "res" -> "orig"
+"http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg", "img_id" -> "mure.jpg"
+"http://localhost:8000/imgfs/read?res=orig&img_id=mure.jpg", "max_files" -> <not found>
+
+Writing tests of your own is something really important in real-life projects.
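The recommended steps can be sketched as below. `get_var_sketch()` is a hypothetical name, its parameter list and its return convention (value length on success, 0 when the parameter is absent, a negative error code otherwise) are assumptions, and the numeric error values are placeholders. It keeps the limitation mentioned in the note above: it does not check where in the URL the parameter name appears.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct http_string { const char *val; size_t len; }; /* assumed layout */
enum { ERR_INVALID_ARGUMENT = -1, ERR_RUNTIME = -2 }; /* placeholder values */

/* Bounded substring search: an HTTP string is not null-terminated,
 * so strstr() cannot be applied to it directly. */
static const char *find_in(const char *hay, size_t hay_len, const char *needle, size_t needle_len)
{
    if (needle_len == 0 || hay_len < needle_len) return NULL;
    for (size_t i = 0; i + needle_len <= hay_len; ++i)
        if (memcmp(hay + i, needle, needle_len) == 0) return hay + i;
    return NULL;
}

int get_var_sketch(const struct http_string *url, const char *name, char *out, size_t out_len)
{
    if (url == NULL || url->val == NULL || name == NULL || out == NULL || out_len == 0)
        return ERR_INVALID_ARGUMENT;

    /* Step 1: build "<name>=" (names longer than 62 characters are rejected here). */
    char key[64];
    const int key_len = snprintf(key, sizeof(key), "%s=", name);
    if (key_len < 0 || (size_t) key_len >= sizeof(key)) return ERR_RUNTIME;

    /* Step 2: look for it in the URL. */
    const char *start = find_in(url->val, url->len, key, (size_t) key_len);
    if (start == NULL) return 0; /* parameter not found */
    start += key_len;

    /* Step 3: the value ends at the next '&', or at the end of the URL. */
    const char *url_end = url->val + url->len;
    const char *end = memchr(start, '&', (size_t) (url_end - start));
    if (end == NULL) end = url_end;

    /* Step 4: copy the value into out as a valid C string. */
    const size_t value_len = (size_t) (end - start);
    if (value_len >= out_len) return ERR_RUNTIME; /* value too long for out */
    memcpy(out, start, value_len);
    out[value_len] = '\0';
    return (int) value_len;
}
```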
+http_parse_message()
The most complex function of this module is definitely http_parse_message()
, the aim of which is to parse a HTTP message (only the ones needed for our purposes; not the full RFCs (9110-9112)).
Such a message is made of:
+HTTP_LINE_DELIM
); this first line either describes the request to be implemented, or its status (success/failure);HTTP_HDR_END_DELIM
) indicating the end of the header;For simplicity, we will consider the start-line to be part of the header (we will call "the header", the start line and all the non empty header lines).
+For instance, the message:
+GET /imgfs/read?res=orig&img_id=mure.jpg HTTP/1.1
+Host: localhost:8000
+User-Agent: curl/8.5.0
+Accept: */*
+
+consists only of a header (no body).
+This example:
+POST /imgfs/insert?&name=papillon.jpg HTTP/1.1
+Host: localhost:8000
+User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0
+Accept: */*
+Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
+Accept-Encoding: gzip, deflate, br
+Referer: http://localhost:8000/index.html
+Content-Length: 72876
+Origin: http://localhost:8000
+DNT: 1
+Connection: keep-alive
+Sec-Fetch-Dest: empty
+Sec-Fetch-Mode: cors
+Sec-Fetch-Site: same-origin
+
+<some binary data>
+
+has a 14 lines long header, a blank line (HTTP_HDR_END_DELIM
actually), and then a body of 72876 bytes (a JPG image in this case). The body length is indicated by the Content-Length:
header line.
For such a complex task, we recommend, as usual, to split it into relevant pieces; for instance (please read below, next 3 subsections! But also feel free to follow your own way):
+"Host"
from "Host: localhost:8000"
with delimiter HTTP_HDR_KV_DELIM
;Of course, feel free to develop more tool functions if appropriate.
+get_next_token()
To ease the processing of a message, we suggest writing a tool function that extracts the first substring (= prefix) of the string before some delimiter:
+static const char* get_next_token(const char* message, const char* delimiter, struct http_string* output)
+
+For instance get_next_token("abcdefg", "de", &token)
will put "abc"
(as an HTTP string, not a C string) into token
, and return a pointer to "fg"
.
+If output
is a NULL
pointer, which can be accepted, simply don't store the value. This may be useful to simply skip tokens without storing them.
Note: this function must not perform any copies of the string, instead the output
must contain a reference inside the message
string.
As for http_get_var()
, you will need to write unit tests for your function. This is a bit trickier, since it's a static function -- i.e. it cannot be used from another .c
file. You can use the following workaround:
get_next_token()
(without static
) and add #define IN_CS202_UNIT_TEST
;http_prot.c
instead of static
, use the static_unless_test
defined as follows:#ifdef IN_CS202_UNIT_TEST
+#define static_unless_test
+#else
+#define static_unless_test static
+#endif
+
+You can thus call get_next_token()
from your unit-test .c
code.
Here are some suggestions of test data:
+message, delim -> output, return value
+
+"abcdefg", "de" -> "abc", "fg"
+"Content-Length: 0\r\nAccept: */*", ": " -> "Content-Length", "0\r\nAccept: */*"
+"0\r\nAccept: */*", "\r\n" -> "0", "Accept: */*"
+
+(for the second example, use HTTP_HDR_KV_DELIM
)
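A minimal sketch of such a function is given below, assuming that `message` is a null-terminated C string and that the layout of `struct http_string` is a pointer plus a length. Returning `NULL` when the delimiter is absent is our choice, not something imposed by the handout.

```c
#include <stddef.h>
#include <string.h>

struct http_string { const char *val; size_t len; }; /* assumed layout */

static const char *get_next_token(const char *message, const char *delimiter,
                                  struct http_string *output)
{
    if (message == NULL || delimiter == NULL) return NULL;

    const char *found = strstr(message, delimiter);
    if (found == NULL) return NULL; /* assumption: no delimiter, no token */

    if (output != NULL) {
        output->val = message; /* a reference into message, not a copy */
        output->len = (size_t) (found - message);
    }
    /* Continue right after the delimiter. */
    return found + strlen(delimiter);
}
```

Because `output->val` points inside `message`, the token is only valid as long as the message buffer itself is.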
http_parse_headers()
Another tool function that may be worth creating is
+static const char* http_parse_headers(const char* header_start, struct http_message* output)
+
+to fill all headers
key-value pairs of output
(have a look at struct http_message
in http_prot.h
).
For this: until you find an empty line (in the HTTP sense: use HTTP_LINE_DELIM
), do
HTTP_HDR_KV_DELIM
and store it as a new key (in the headers
of output
);HTTP_HDR_KV_DELIM
delimiter;HTTP_LINE_DELIM
and store it as the value associated to the preceding key.We found it useful to return the position right after the last header line, i.e. where the body starts; but feel free to choose your own: remember this is a tool function, for you.
+Notice that the above algorithm assumes that HTTP headers end with an empty line, that is, that HTTP_HDR_END_DELIM
is simply twice HTTP_LINE_DELIM
, which is indeed the case.
+For those who want to be really strict, you can statically assert this assumption, e.g. by:
_Static_assert(strcmp(HTTP_HDR_END_DELIM, HTTP_LINE_DELIM HTTP_LINE_DELIM) == 0, "HTTP_HDR_END_DELIM is not twice HTTP_LINE_DELIM");
+
+To test this function, you will need the same trick as for get_next_token()
regarding the static
qualifier.
+As a test example, the following string:
"Host: localhost:8000\r\nUser-Agent: curl/8.5.0\r\nAccept: */*\r\n\r\n"
+
+should yield the key-value pairs:
+"Host" -> "localhost:8000"
+"User-Agent" -> "curl/8.5.0"
+"Accept" -> "*/*"
+
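The loop described above can be sketched as follows. Everything named `demo_*` or `*_sketch` is hypothetical: the real `struct http_message` is in `http_prot.h` and certainly has more fields, the capacity `DEMO_MAX_HEADERS` is an assumed value, and `next_token()` is a local copy of a token-extraction helper so that the example stays self-contained.

```c
#include <stddef.h>
#include <string.h>

#define HTTP_LINE_DELIM   "\r\n"
#define HTTP_HDR_KV_DELIM ": "
#define DEMO_MAX_HEADERS  40 /* assumed capacity */

struct http_string  { const char *val; size_t len; }; /* assumed layout */
struct demo_header  { struct http_string key, value; };
struct demo_message { struct demo_header headers[DEMO_MAX_HEADERS]; size_t num_headers; };

/* Stores the prefix of message before delimiter, returns what follows it. */
static const char *next_token(const char *message, const char *delimiter,
                              struct http_string *output)
{
    const char *found = strstr(message, delimiter);
    if (found == NULL) return NULL;
    if (output != NULL) {
        output->val = message;
        output->len = (size_t) (found - message);
    }
    return found + strlen(delimiter);
}

/* Parses "Key: value\r\n" lines until the empty line.
 * Returns a pointer right after the empty line (where the body starts),
 * or NULL if the headers are malformed, incomplete or too numerous. */
static const char *parse_headers_sketch(const char *header_start, struct demo_message *output)
{
    if (header_start == NULL || output == NULL) return NULL;

    const size_t line_delim_len = strlen(HTTP_LINE_DELIM);
    const char *cursor = header_start;
    output->num_headers = 0;

    /* An empty line is a line delimiter found right at the cursor. */
    while (strncmp(cursor, HTTP_LINE_DELIM, line_delim_len) != 0) {
        if (output->num_headers >= DEMO_MAX_HEADERS) return NULL;
        struct demo_header *header = &output->headers[output->num_headers];

        cursor = next_token(cursor, HTTP_HDR_KV_DELIM, &header->key);
        if (cursor == NULL) return NULL;
        cursor = next_token(cursor, HTTP_LINE_DELIM, &header->value);
        if (cursor == NULL) return NULL;

        ++output->num_headers;
    }
    return cursor + line_delim_len;
}
```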
+http_parse_message()
Once you have all your desired tools (create more on-the-fly when needed), you can write the parsing of a whole HTTP message.
+HTTP_HDR_END_DELIM
shall be present, otherwise simply return 0, indicating message is incomplete);GET /imgfs/read?res=orig&img_id=mure.jpg HTTP/1.1
) by:
+method
field of out
argument;uri
field of out
argument;HTTP_LINE_DELIM
(it should match "HTTP/1.1"
);"Content-Length"
value from the parsed header lines;Content-Length
header, or its value is 0
) or you were able to read the full body;Test it with:
+"GET / HTTP/1.1\r\nHost: localhost:8000\r\nAc" -> incomplete headers
+"GET / HTTP/1.1\r\nHost: localhost:8000\r\nAccept */*\r\n\r\n\r\n" -> OK
+"GET / HTTP/1.1\r\nHost: localhost:8000\r\nContent-Length: 10\r\n\r\n01234" -> incomplete body (content_len: 10)
+"GET / HTTP/1.1\r\nHost: localhost:8000\r\nContent-Length: 10\r\n\r\n0123456789" -> OK
+
+handle_connection()
It's now time to have a more appropriate version of handle_connection()
(in http_net.c
), which is able to properly handle HTTP messages, in a generic manner through a global variable of type EventCallback
: handle_connection()
will do all the generic job and then call EventCallback
for the specific parts to be done.
Rather than simply checking if we have a header containing "test: ok"
(as done last week), we now have a bit more work to do:
rcvbuf
(using our newly created http_parse_message()
);rcvbuf
) to MAX_HEADER_SIZE
plus the content length (this is the reason why we extend only once);tcp_read()
) to the right position, (which is rcvbuf
plus the number of already read bytes);EventCallback
(global variable), it takes as parameters the http_message
and the socket file descriptor;tcp_read()
.And of course, everywhere all errors shall be properly handled, simply (deallocating/closing all what should be and) returning the corresponding error code.
+handle_http_message()
To finalize the work, we still have to write the parts specific to our ImgFS server, the EventCallback
. This is the job of the handle_http_message()
(of imgfs_server_service.c
).
In order to reduce the workload, we provided you the required code in the file week12_provided_code.txt
.
+Include this code in your imgfs_server_service.c
.
The very last step is to have handle_http_message()
as our event handler. This is as easy as passing it to http_init()
rather than NULL
in server_startup()
.
Try it out with the following curl commands:
+curl -i http://localhost:8000/imgfs # Should fail
+curl -i http://localhost:8000/imgfs/read # Should succeed
+curl -i http://localhost:8000/imgfs/insert # Should fail
+curl -X POST -i http://localhost:8000/imgfs/insert # Should succeed
+
+(of course, launch your server first)
+Don't hesitate to look at curl(1)
manpage to create other commands to test your program. This will be especially useful next week when we will build the final HTTP API for our server, which will use more complex requests.
Now that we have the low layers of a quite generic HTTP server, we can start offering our first real ImgFS services.
+The main goal of this last week is to provide over HTTP, the equivalent of the command-line interface (CLI) commands. When the server will be completed, it will implement the same functionalities as the CLI imgfscmd
, with the exception of the create
command, which remains available only through the CLI.
In index.html
, we provide an example of client code, written in JavaScript (like many of today's web applications), that you can use in your browser to test your server. You can also use curl
on the command line as an alternative client.
We will also take the opportunity to improve our server so as to handle multiple connections through multi-threading.
+There are thus basically three things to be done this week:
+list
command to provide the content in JSON format, useful for Web clients;As usual, we recommend you split the work over the team members. Moreover, remember the early advice and choose what you want to do, or not, in the remaining time.
+This week, we provide you with:
+tests/unit/unit-test-imgfslist.c
;tests/end-to-end/week13.robot
;src/week13_provided_code.txt
, some code to be added to http_net.c
.Normally, the client code provided/src/index.html
was already provided at the beginning of the project.
libjson
You will need the libjson
library, which lets you parse and write data in JSON format. It is the standard format used for JavaScript applications, easy to read both for the computer and for a human developer (and much simpler than XML).
If you're on your own machine and haven't already done so, start by installing the libjson
library:
sudo apt install libjson-c-dev
+
+To check if you have the correct version, use apt-cache show libjson-c-dev
and check that the Homepage
is https://github.com/json-c/json-c/wiki (there may be several variants of this library).
To use the library:
+the interface is defined in <json-c/json.h>
-- worth looking at; add the include
in all source files that need the library;
add the following lines to your Makefile
:
# Add options for the compiler to include the library's headers
+CFLAGS += -I/usr/include/json-c
+
+# Add the library to the linker
+LDLIBS += -ljson-c
+
+The API's documentation is located here: https://json-c.github.io/json-c/json-c-current-release/doc/html/
+The functions you will need are:
+json_object_put()
; we can ignore its return value.
If any of the above functions returns an error, you must return ERR_RUNTIME
.
do_list()
The first objective is to integrate the JSON format in the application imgfscmd
; this part is independent from the web server integration and can be done in parallel, for example by your teammate.
For this, you have to complement the do_list()
function so that if its output mode is JSON
, it returns a string (rather than directly printing to stdout
as it does when output mode is STDOUT
).
The function must use the libjson
library (see above) to build a JSON object with the following structure:
{
+ "Images": ["pic1", "pic2"] // an array of the strings of the img_id fields from the metadata
+}
+
+That is, a JSON object containing an array of strings, which are the img_id
values of the images in the filesystem. The function then converts this object to a string and returns it.
Beware of the lifetime/scope of the data you manipulate! In particular, the strings used in a JSON object are owned by the object, and are freed when json_object_put()
is called on it.
You can easily test your implementation by temporarily editing do_list_cmd()
and changing the call to do_list()
so as to have JSON output rather than usual textual output and use imgfscmd list
to test. For instance:
empty.imgfs -> { "Images": [ ] }
+test02.imgfs -> { "Images": [ "pic1", "pic2" ] }
+
+You can also (the two are not mutually exclusive) run the two new unit tests with:
+make test-imgfslist
+
+The next thing to be done is to update handle_http_message()
to serve our needs. For this:
create four functions handle_list_call()
, handle_read_call()
, handle_delete_call()
and handle_insert_call()
; these functions are the equivalent for our server of the do_X_cmd()
functions for the CLI and are detailed below;
for the moment make them simply return reply_302_msg(connection);
adapt handle_http_message()
to call the appropriate function in each case (URI match either /list
, /read
, /delete
or /insert
(and verb is POST
in this latter case, as already done last week));
add a first condition which is:
+ if (http_match_verb(&msg->uri, "/") || http_match_uri(msg, "/index.html")) {
+ return http_serve_file(connection, BASE_FILE);
+ }
+
+The server must answer with a valid HTTP response, using the JSON format, at the URI /imgfs/list
. To achieve this, update the
+handle_list_call()
function so as to call do_list()
with the proper format, and then reply.
The HTTP message that the list
command must produce is:
HTTP/1.1 200 OK\r\n
+Content-Type: application/json\r\n
+Content-Length: XXX\r\n\r\n
+YYY
+
+This can easily be achieved with the functions that you implemented in the previous weeks (and the JSON update of do_list()
).
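As an illustration of the response format shown above, here is a minimal sketch that formats the status line and headers with snprintf(). The function format_ok_header() is a hypothetical helper, not part of the provided code; in the project you would rather reuse the reply functions you wrote in the previous weeks.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the provided code): writes the status
 * line and the headers of a "200 OK" response into buf, ending with the
 * empty line that separates the headers from the body.
 * Returns the header length, or -1 if buf is too small. */
int format_ok_header(char *buf, size_t buf_size,
                     const char *content_type, size_t body_len)
{
    const int n = snprintf(buf, buf_size,
                           "HTTP/1.1 200 OK\r\n"
                           "Content-Type: %s\r\n"
                           "Content-Length: %zu\r\n\r\n",
                           content_type, body_len);
    if (n < 0 || (size_t) n >= buf_size) return -1; /* error or truncation */
    return n;
}
```

The body (the JSON string returned by do_list()) is then sent right after these headers, and Content-Length must be its exact length in bytes.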
Test this first functionality by launching your server and querying it with curl:
+curl -i 'http://localhost:8000/imgfs/list'
+
+(use curl -v
if you want more information to debug).
Next, update the function handle_read_call()
, equivalent to handle_list_call()
but for the URI /imgfs/read
.
This function must use http_get_var()
to get the following arguments:
res
: the resolution of the image queried; to be converted with resolution_atoi()
(see the read
from imgfscmd
);img_id
: the identifier of the image (its "name").Those two parameters are required, but the order does not matter. Example of URI:
+http://localhost:8000/imgfs/read?res=orig&img_id=pic2
+
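To illustrate what extracting these two parameters involves, here is a simplified sketch working on plain C strings. The function get_query_var() is hypothetical: it is not the project's http_get_var(), whose signature and return conventions are those of your own code.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: copies the value of the query parameter `name` found
 * in `uri` into `out`. Returns the length of the value (0 if the parameter
 * is absent or empty), or -1 if `out` is too small. */
int get_query_var(const char *uri, const char *name, char *out, size_t out_len)
{
    const char *p = strchr(uri, '?');
    if (p == NULL) return 0;               /* no query string at all */
    const size_t name_len = strlen(name);
    ++p;                                   /* skip the '?' */
    while (*p != '\0') {
        const char *end = strchr(p, '&');
        if (end == NULL) end = p + strlen(p);
        /* the parameter matches only if "name=" starts exactly at p */
        if ((size_t)(end - p) > name_len
            && strncmp(p, name, name_len) == 0 && p[name_len] == '=') {
            const char *val = p + name_len + 1;
            const size_t val_len = (size_t)(end - val);
            if (val_len >= out_len) return -1;
            memcpy(out, val, val_len);
            out[val_len] = '\0';
            return (int) val_len;
        }
        p = (*end == '&') ? end + 1 : end; /* move to the next parameter */
    }
    return 0;
}
```

Since the parameters are looked up by name, their order in the URI does not matter, which is what the statement above requires.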
+Then call the function do_read()
with the correct arguments.
On success, return the following HTTP response:
+HTTP/1.1 200 OK
+Content-Type: image/jpeg
+Content-Length: <XXX>
+
+<YYY>
+
+The Content-Length
must be the size of the image (in bytes).
+Note: the lines above are, as always, terminated with "\r\n"
, which we no longer write out, for readability.
If an error occurs, call the function reply_error_msg()
.
Test with:
+curl -i 'http://localhost:8000/imgfs/read?res=orig&img_id=pic1'
+
+Test also error cases (missing argument, wrong resolution, ...).
+Implement the handle_delete_call()
to answer the request at the URI /imgfs/delete
.
+Those requests only need one argument: img_id
.
Once the (valid) argument has been retrieved, call do_delete()
. If successful, return the following HTTP response to make the client reload index.html
:
HTTP/1.1 302 Found
+Location: http://<URL>/index.html
+
+where <URL>
is the HTTP address used by the server.
If an error occurs, call the function reply_error_msg()
, as usual.
Implement the function handle_insert_call()
, the most complex one, to handle the URI /imgfs/insert
.
The insertion logic is different from that used to return a list (list
) or an image (read
). Insertion uses the HTTP POST
command, while the other two use HTTP GET
. Basically, a GET
contains all the arguments in the URI, whereas a POST
carries additional data besides the URI. In particular, the /imgfs/insert
command uses a POST
for the actual content of the image to be inserted.
To avoid overloading the server's RAM, large files are generally sent piece by piece (in "chunks") in several successive POST
requests. To simplify things in this project, we've set an image size limit in index.html
that allows the image to be sent all at once in a single chunk. This spares you from retrieving the pieces one by one and reassembling them in the server!
The handle_insert_call()
function must therefore essentially:
name
), which we'll use as an identifier to insert it into the database;do_insert()
.
In the event of an error, be sure to return an appropriate error message.
+If successful, proceed as with delete
to redisplay the index page.
Finally, since image processing uses the VIPS library (indirectly), don't forget to start it (VIPS_INIT
) when you launch the server, and close it (vips_shutdown()
) when you stop it.
To test your web server, simply launch your imgfs_server
after having copied the provided index.html
to your done/
, then open http://localhost:8000/
in a web browser. You should get something like this (depending on the ImgFS with which you run your server; here the test02.imgfs
-- which we always recommend you copy before your tests and test on the copy):
delete
.read
).insert
).
You can also test URIs directly, e.g. http://localhost:8000/imgfs/read?res=small&img_id=pic1
to test the "small" resolution, directly in your browser, or e.g. on the command line (in another terminal):
curl -v 'http://localhost:8000/imgfs/read?res=small&img_id=pic1' --output myowntest.jpg
+
+To test an insert with curl
use a command such as:
curl -v -X POST 'http://localhost:8000/imgfs/insert?name=pic3' --data-binary @../provided/tests/data/brouillard.jpg
+
+Finally, there's always make check
, and then make feedback
, available (tests performed via curl
).
The main problem with the current server design is that we open a single socket for the communication and that this socket is blocking: only one communication can take place at a time. This is not suitable for a Web server... (try with several tabs to the same server in your browser).
+The most advanced way to solve this problem is to poll non-blocking connections (using poll()
or even epoll()
for larger servers).
+In this project, we chose a simpler approach, which also illustrates the lectures you recently had: multithreaded blocking connections.
+Each socket will be handled in a new thread, thus allowing several communications with the server in parallel.
But then, of course, all accesses to the ImgFS must be locked.
+(We assume here that any interaction with the ImgFS may change its internal state; thus any interaction with it must be locked against the other threads and unlocked as soon as the interaction with the ImgFS is over.)
+In http_net.c
:
It is handle_connection()
that will run in its own thread, so we first have to create a thread in http_receive()
. However, in order to avoid race conditions between threads on the active file descriptor used to communicate (the one returned by tcp_accept()
), this value has to be stored (on the heap) separately for each call to http_receive()
.
Concretely, in http_receive()
:
tcp_accept()
be stored on the heap; and of course free()
it whenever needed (don't forget error cases); let's name this value: active_socket
(needed below);PTHREAD_CREATE_DETACHED
some pthread attributes; see pthread_attr_init()
and pthread_attr_setdetachstate()
man-pages; notice that "detached" threads automatically release their resources on exit (but then there is no way to get their return value; we'll ignore them);pthread_create()
) that will run handle_connection()
with active_socket
as a parameter.pthread_attr_t
with pthread_attr_destroy()
.
Note: this is a practice exercise in programming threads in C. Part of the work is therefore on your side: understanding, reading man pages (and asking questions).
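The steps listed above can be sketched as follows. This is only a sketch under assumptions: connection_worker() stands in for handle_connection(), spawn_detached() is a hypothetical name for the part of http_receive() that follows tcp_accept(), and the actual error codes to return are those of your project.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for handle_connection(): takes ownership of the heap-allocated
 * socket descriptor and releases it before returning. */
static void *connection_worker(void *arg)
{
    int *active_socket = arg;
    /* ... serve the client on *active_socket, then close it ... */
    free(active_socket);
    return NULL;
}

/* Hypothetical helper: starts a detached thread for one accepted socket.
 * Returns 0 on success, -1 on failure (nothing is leaked either way). */
int spawn_detached(int accepted_fd)
{
    int *active_socket = malloc(sizeof *active_socket);
    if (active_socket == NULL) return -1;
    *active_socket = accepted_fd;

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        free(active_socket);
        return -1;
    }

    pthread_t tid;
    int err = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (err == 0) {
        err = pthread_create(&tid, &attr, connection_worker, active_socket);
    }
    pthread_attr_destroy(&attr);  /* no longer needed once the thread exists */

    if (err != 0) {               /* thread never started: we still own the memory */
        free(active_socket);
        return -1;
    }
    return 0;
}
```

The design point is ownership: once pthread_create() succeeds, the thread is the only one allowed to free the heap-allocated descriptor, so the creating function frees it only on the failure paths.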
+Now that handle_connection()
is multi-threaded, we simply don't want the SIGTERM
and SIGINT
signals to be intercepted by it (but leave them to the main thread).
+For this, simply add this code at the beginning of handle_connection()
:
sigset_t mask;
+ sigemptyset(&mask);
+ sigaddset(&mask, SIGINT );
+ sigaddset(&mask, SIGTERM);
+ pthread_sigmask(SIG_BLOCK, &mask, NULL);
+
+Notice also that since handle_connection()
is now multi-threaded, we have to close and release its active_socket
on exit (which, depending on your design, was maybe previously handled by http_receive()
).
Finally, in imgfs_server_service.c
, we have to lock all access to the ImgFS:
pthread_mutex_t
;server_startup()
(see pthread_mutex_init()
man-page); and release it in server_shutdown()
(see pthread_mutex_destroy()
);pthread_mutex_lock()
) and unlock around all your do_X()
calls (all the calls that interact with the ImgFS data).
Test the multithreaded approach by launching several clients at the same time: multiple tabs in your browser and multiple curl
calls.
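The locking scheme described above can be sketched as follows. All names here are hypothetical stand-ins: service_startup() and service_shutdown() play the role of server_startup() and server_shutdown(), do_something() plays the role of a do_X() call, and a plain integer replaces the ImgFS.

```c
#include <pthread.h>

static pthread_mutex_t imgfs_mutex;  /* one global lock for the ImgFS */
static int fake_imgfs_state = 0;     /* stand-in for the shared ImgFS */

int service_startup(void)  { return pthread_mutex_init(&imgfs_mutex, NULL); }
int service_shutdown(void) { return pthread_mutex_destroy(&imgfs_mutex); }

/* Stand-in for a do_X() call that modifies the ImgFS. */
static int do_something(int delta)
{
    fake_imgfs_state += delta;
    return 0;
}

/* Lock only around the ImgFS interaction, unlock as soon as it is over. */
int locked_do_something(int delta)
{
    pthread_mutex_lock(&imgfs_mutex);
    const int ret = do_something(delta);
    pthread_mutex_unlock(&imgfs_mutex);
    return ret;  /* build and send the HTTP reply after unlocking */
}

static void *worker(void *arg)
{
    const int iterations = *(const int *) arg;
    for (int i = 0; i < iterations; ++i) locked_do_something(1);
    return NULL;
}

/* Runs up to 16 concurrent workers, waits for them, returns the final state. */
int run_workers(int n_threads, int iterations)
{
    pthread_t tids[16];
    if (n_threads > 16) n_threads = 16;
    int started = 0;
    for (; started < n_threads; ++started) {
        if (pthread_create(&tids[started], NULL, worker, &iterations) != 0) break;
    }
    for (int i = 0; i < started; ++i) pthread_join(tids[i], NULL);
    return fake_imgfs_state;
}
```

Without the lock, the increments of the concurrent workers could interleave and the final count could be lower than expected; with it, every update is applied.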
So this is the end! Next week will indeed be "free", no new content, only to finalize your project before the deadline which is:
+SUNDAY JUNE 02, 11:59pm
For this deadline, there is nothing special to be done, except to commit and push, and to provide a (short) README.md
file which must contain:
Don't forget to push everything before the above deadline. The content of your project will be the state of your main
branch at the deadline (in case this is relevant for you: thus don't forget to merge your branch(es) into the main
branch).
The goal of this tutorial is to teach you how to use tools for debugging memory issues (dynamic tools, thus: "run-time"). But first of all, do not forget to use the other tools already presented for debugging: compiler options, the static analyzer and gdb
.
In this tutorial, we present Address Sanitizer (a.k.a. "ASAN") and Valgrind.
+But for that, we need some memory bugs. Download here a program containing a collection of pointer errors:
+Start by reading the provided program and understanding how it works and what its (indicated) errors are.
+Before using new tools, try to compile and then statically analyze the provided code.
+With these tools (compiler options and scan-build
), you should easily find errors 1 and 2 above.
+Leave them for the moment.
Address Sanitizer (a.k.a. "ASAN") is a tool that analyzes memory access defects with the help of the compiler. To use it, add the option
+-fsanitize=address
+
+to the compiler command line.
+Compile with this option (together with -g
, in any case, and any other options you wish), then run the program. You should get something like:
3-2i
+0
+-5+i
+AddressSanitizer:DEADLYSIGNAL
+=================================================================
+==165699==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55bf9afe15e8 bp 0x7ffd6a193c50 sp 0x7ffd6a193c30 T0)
+==165699==The signal is caused by a READ memory access.
+==165699==Hint: address points to the zero page.
+ #0 0x55bf9afe15e7 in affiche complexe.c:76
+ #1 0x55bf9afe13ab in main complexe.c:38
+ #2 0x7f8874e2cbba in __libc_start_main ../csu/libc-start.c:308
+ #3 0x55bf9afe1159 in _start (complexe+0x1159)
+
+AddressSanitizer can not provide additional info.
+SUMMARY: AddressSanitizer: SEGV complexe.c:76 in affiche
+==165699==ABORTING
+
+This tells you that there is a "Segmentation Fault" (SEGV) in affiche()
at line 76, and that this function was called from line 38 of main()
.
+Can you see what the problem is?
We will come back to it later, but let us first look at the other tool.
+Valgrind is a suite of dynamic code analysis tools based on a virtual machine and just-in-time (JIT) compilation.
+To use it, simply put valgrind
in front of the name of the program to run. To do so:
delete the previously compiled executable (since valgrind and ASAN must not be used at the same time!):
+ rm complexe
+
+recompile, but WITHOUT the -fsanitize=address
option (but do keep at least the -g
option);
run:
+ valgrind ./complexe
+
+You should get something like:
+==165821== Memcheck, a memory error detector
+==165821== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
+==165821== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
+==165821== Command: ./complexe
+==165821==
+3-2i
+==165821== Use of uninitialised value of size 8
+==165821== at 0x109336: affiche (complexe.c:76)
+==165821== by 0x1091BE: main (complexe.c:38)
+==165821==
+==165821== Use of uninitialised value of size 8
+==165821== at 0x109358: affiche (complexe.c:76)
+==165821== by 0x1091BE: main (complexe.c:38)
+==165821==
+[... several repetitions possible depending on your machine ...]
+0 // or some other value
+-5+i
+==165821== Invalid read of size 8
+==165821== at 0x109336: affiche (complexe.c:76)
+==165821== by 0x109207: main (complexe.c:44)
+==165821== Address 0x0 is not stack'd, malloc'd or (recently) free'd
+==165821==
+==165821==
+==165821== Process terminating with default action of signal 11 (SIGSEGV)
+==165821== Access not within mapped region at address 0x0
+==165821== at 0x109336: affiche (complexe.c:76)
+==165821== by 0x109207: main (complexe.c:44)
+==165821== If you believe this happened as a result of a stack
+==165821== overflow in your program's main thread (unlikely but
+==165821== possible), you can try to increase the size of the
+==165821== main thread stack using the --main-stacksize= flag.
+==165821== The main thread stack size used in this run was 8388608.
+==165821==
+==165821== HEAP SUMMARY:
+==165821== in use at exit: 0 bytes in 0 blocks
+==165821== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
+==165821==
+==165821== All heap blocks were freed -- no leaks are possible
+==165821==
+==165821== Use --track-origins=yes to see where uninitialised values come from
+==165821== For lists of detected and suppressed errors, rerun with: -s
+==165821== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
+Erreur de segmentation
+
+There is more to see here between the display of a
(3-2i
) and the two following displays, and also something else before the final crash. What is it about?
==165821== Use of uninitialised value of size 8
+==165821== at 0x109336: affiche (complexe.c:76)
+==165821== by 0x1091BE: main (complexe.c:38)
+
+tells you that, in the call to the function affiche()
made at line 38 of main()
, you are using an uninitialized value.
+It even tells you so at least twice in a row. Why?
+Simply because (1) the pointer p_b
is not initialized (first error) and (2) the pointed-to value (at some arbitrary address) has not been initialized either (second error). Then, depending on this uninitialized value, several more lines of affiche()
are executed (or not), each producing its own error message.
Finally, the:
+==165821== Invalid read of size 8
+==165821== at 0x109336: affiche (complexe.c:76)
+==165821== by 0x109207: main (complexe.c:44)
+==165821== Address 0x0 is not stack'd, malloc'd or (recently) free'd
+
+just before the crash tells you precisely that you are reading 8 invalid bytes (size 8
, i.e. 64 bits) during the call to affiche()
at line 44 of main()
. This is the same thing we already saw with the compiler options, the static analysis and also ASAN (the error is so big that everyone sees it! ;-)
).
It is now time to fix this program.
+I advise you to always start by fixing the errors detected by the simplest tools.
+Normally, if you followed the advice of the previous weeks, you should be compiling with enough options to easily find the error of returning the address of a local variable.
+Fix it (simply delete the function bad_addition()
and its call), then recompile. Normally, this should compile without any warning (any major one, that is; those of you who use -Wcast-qual
, do not use this option here).
Use the static analyzer (scan-build
; review the other tools presented for debugging if necessary) to find another error. Fix it (e.g. by deleting line 37).
Run the static analyzer again. It finds another one!
+Fix it as well (e.g. by moving the line containing the free
).
Run the static analyzer once more. It finds yet another one!!
+Fix that one too (by deleting the line).
Run the static analyzer yet again. It still manages to find two more!!!!
+Fix them too (by deleting line 42 and adding a free at the end).
Run the static analyzer one last time. At last, it passes!
+Summary at this stage: 6 errors out of 7 found.
+Compile with ASAN added and run the program.
+It finds the buffer overflow:
+==178554==ERROR: AddressSanitizer: heap-buffer-overflow on address [...]
+WRITE of size 16 at [...]
+ #0 0x556e52e43592 in main complexe.c:56
+[...]
+
+Leave it for the moment and let us see what valgrind
says.
Delete the executable and recompile it without ASAN; then run it with valgrind
.
+It finds it too:
==178672== Invalid write of size 8
+==178672== at 0x1092D3: main (complexe.c:56)
+==178672== Address 0x4a39540 is 0 bytes after a block of size 32 alloc'd
+==178672== at 0x4838B65: calloc (vg_replace_malloc.c:762)
+==178672== by 0x109290: main (complexe.c:53)
+==178672==
+==178672== Invalid write of size 8
+==178672== at 0x1092D6: main (complexe.c:56)
+==178672== Address 0x4a39548 is 8 bytes after a block of size 32 alloc'd
+==178672== at 0x4838B65: calloc (vg_replace_malloc.c:762)
+==178672== by 0x109290: main (complexe.c:53)
+
+This is the same error as the one pointed out by ASAN, but valgrind sees it as 2 writes of 8 bytes each, whereas ASAN reports it as a single write of 16 bytes. It is a matter of point of view (the two fields of the Complexe
, or the whole Complexe
itself).
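To make the numbers in these reports concrete, here is a small sketch. It assumes that Complexe holds two double fields (the provided complexe.c is not reproduced here, so the field names re and im and the function make_table() are hypothetical): a block of size 32 then corresponds to two elements, and writing a whole Complexe just past the end is one 16-byte write, or two 8-byte writes.

```c
#include <stdlib.h>

/* Assumed layout: two 8-byte fields, hence 16 bytes per Complexe. */
typedef struct {
    double re;
    double im;
} Complexe;

/* Allocates n zero-initialized Complexe and sets the last valid element,
 * at index n - 1. Writing to tab[n] instead would be an out-of-bounds
 * write of the kind reported above. Returns NULL on failure or if n is 0. */
Complexe *make_table(size_t n, Complexe last)
{
    if (n == 0) return NULL;
    Complexe *tab = calloc(n, sizeof *tab);
    if (tab == NULL) return NULL;
    tab[n - 1] = last;  /* valid indices are 0 .. n-1 */
    return tab;
}
```

The caller owns the returned table and must free() it.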
Fix the error and test again with ASAN and with valgrind.
+Which of the two to use is a matter of taste. See for yourself as you use them.
+Are there errors that one sees and the other does not? Personally, I do not know. And I use both to be sure ;-)
These tools can also detect memory leaks (which the static analyzer might have missed). For example (remove the free calls you added):
+valgrind --leak-check=full ./complexe
+
+(ASAN should not need any extra option. If that is not the case on your machine, run:
+export ASAN_OPTIONS=detect_leaks=1
+
+)
+Docker is an alternative to virtual machines. It is a software architecture that lets you run code (including system software) locally on your machine but in an isolated environment (called a "container"). To learn more, see the Wikipedia page.
+Docker relies on two basic concepts:
+It is important to understand the distinction between the two. In particular, changes made in a container do not affect its image (unlike what happens with a virtual machine, for example).
+To see all the images available on your machine (once Docker is installed, see below):
+docker images
+
+To list all the containers currently on your machine:
+docker ps -a
+
+Installing Docker on your machine is normally fairly easy. See their installation page for more details.
+Check that Docker works:
+docker run hello-world
+
+On success, you will see a confirmation message "Hello from Docker".
+However, some installations require additional privileges to run Docker.
+If you get an error message, follow the instructions here and change the permissions of the socket if necessary.
+With Ubuntu as the host, the procedure is:
+sudo groupadd docker
+sudo usermod -aG docker ${USER}
+sudo chmod 666 /var/run/docker.sock
+
+If you wish to develop under Docker (although this is not what we recommend as a first choice, in particular for those who do not code on the command line (vim
) but rather use a graphical interface), you can create your own working image.
+If you will only use Docker to receive the course feedback, there is no need to build a specific image (we will provide our image for the tests).
Docker already offers many images on its "hub". The simplest way to create a development image for the course is to start from an Ubuntu image:
+docker pull ubuntu
+
+Then run the image (= create a new container; see the very bottom of the page for a reminder of the main commands):
+docker run -ti ubuntu bash
+
+In case the image is not up to date (these commands are to be run in the shell of the Ubuntu container):
+apt update
+apt upgrade -y
+
+Install the tools needed for the course (these commands are to be run in the shell of the Ubuntu container):
+apt install build-essential clang check wdiff colordiff git openssh-client manpages manpages-dev doxygen curl
+apt install libssl-dev libssl-doc libcurl4-openssl-dev libjson-c-dev
+
+Leave the container:
+exit
+
+Create a new image from the new state of your container:
+look up the id of the container:
+ docker ps -a
+
+create the image:
+ docker commit CONTAINER_ID projet-cs212
+
+replacing CONTAINER_ID
with the right id; for example:
docker commit 55959d62b348 projet-cs212
+
+delete the container:
+ docker rm CONTAINER_ID
+
+You can now run your new image and, for example, compile your project in it.
+For this, we advise you to mount the directory containing your source code into the container with the -v
option of docker run
.
For example, if you are on a Unix machine, go to the directory of your source code and run:
+docker run -ti --rm -v $(pwd):/localhost projet-cs212
+
+Another example:
+docker run -ti --rm -v /home/chezmoi/projet:/localhost projet-cs212
+
+You will then have access, in the container, to your files via /localhost
. For example:
ls /localhost
+
+You can then compile your project there. For example:
+cd /localhost
+make
+
+Leave the container with:
+exit
+
+(or simply CTRL-D).
+get help:
+ docker help COMMAND
+
+for example:
+ docker help ps
+
+List all containers:
+ docker ps -a
+
+Delete a container:
+ docker rm CONTAINER_ID
+
+Delete all containers:
+ docker rm $(docker ps -aq)
+
+List all images:
+ docker images
+
+Delete an image (which no longer has any container):
+ docker rmi IMAGE_ID
+
+Create a container in interactive mode and delete it automatically when it ends:
+ docker run -ti --rm IMAGE_ID
+ docker run -ti --rm IMAGE_ID COMMAND
+
+for example:
+ docker run -ti --rm ubuntu bash
+
+Create a container in interactive mode without deleting it automatically:
+ docker run -ti IMAGE_ID
+
+Create a container in interactive mode while mounting the local file system (i.e. giving the container access to some location on the local disk):
+ docker run -ti --rm -v local_dirname:container_dirname IMAGE_ID
+
+for example:
+ docker run -ti --rm -v /home/machin:/tmp/home_machin_local IMAGE_ID
+
+Note: file/directory names must be absolute (not relative).
+Restart a stopped container:
+ docker start CONTAINER_ID
+
+Copy files between the local machine and a container:
+from the local machine to the container:
+docker cp local_filename CONTAINER_ID:where_to_put
+
+from the container to the local machine:
+docker cp CONTAINER_ID:where_to_get local_filename
+
+Create a new image from the state of a container:
+ docker commit CONTAINER_ID IMAGE_ID
+
+A debugger is a program that lets you follow the execution of another program, stop it, inspect the state of memory (values of variables, for example), etc., which is particularly useful when looking for programming errors.
+We explain here the basics of using a debugger, with the gdb
debugger on the command line, but you can of course use versions with a graphical interface, often integrated into IDEs; the basic principles remain the same. In the CO rooms, you have for example ddd
or the debuger
module integrated into Geany
, for which you can find a tutorial there (beware: it belongs to another course); for other GUIs, see this link, among which we recommend gdbgui (official site; GitHub site).
You can also use another debugger, such as lldb
; there too, the basic principles remain the same. The correspondence between gdb
and lldb
commands can be found here.
NOTE for macOS: since OS X 10.9, Apple has switched to LLVM; gdb
is therefore no longer available by default. If you are on a Mac, you then have two options:
either use lldb
;
or install gdb
(via brew
) and sign it;
+OS X has a mechanism controlling access to other processes, which a debugger needs and which requires a signed binary;
+to sign the gdb
binary after installing it, follow the instructions that can be found on the Internet; for example:
gdb-entitlement.xml
" is necessary for Macs running the Big Sur operating system.
Before reading on, we suggest, by way of introduction, that you watch a 23-minute video tutorial created by Chris Bourke and available on YouTube. This tutorial explains how to find errors in code using the gdb
debugger. Most of the notions mentioned in this video are then covered again, step by step, in the rest of this tutorial.
A few remarks to help you follow this video tutorial:
+At 1:08, Chris says that the debugger, in his example, is started with the following command: gdb a.out
. Note that, for you, a.out
should be replaced by the name of your program (executable).
At 1:14, (temporarily) ignore the part about program arguments. You will see that later in the course.
+At 2:34, he runs the program with the command ./a.out
. Here too, use the name of your program (executable) instead of a.out
.
Up to 16:17, everything should be fairly clear (except the program arguments at 1:14, as said above). From there on, he uses C notions not yet seen in the course. You can therefore stop the video here and come back to it later, or keep watching to see how to use gdb
but without trying to understand in depth the C problems that are mentioned:
You are now ready to read and follow the instructions and explanations below.
+The first thing to do to be able to use a debugger is to ask the compiler to put additional information in the program so that the debugger can find its way around. This is done by adding the -g
option when compiling. For example:
gcc -g -o mon_programme mon_programme.c
+
+Compile one of the provided programs in this way; e.g.:
+gcc -g -std=c99 -o ex1 ex1.c
+
+or
+gcc -g -std=c99 -o stats stats.c -lm
+
+NOTE: we use the C99 standard for this course; with some compilers, compiling may then require adding the -std=c99
option as shown above. You can of course also use more recent standards (e.g. -std=c17
).
Then, the program can be run in the debugger. To do so, start the debugger with the program to debug as its argument; e.g.:
+gdb ./ex1
+
+or
+gdb ./stats
+
+We are now in the debugger (here, a command interpreter), where there is not much to see for the moment. To see the code, type
+layout src
+
+The code is not displayed yet because gdb
has not started our program yet.
+Simply start its execution with the command:
run
+
+The program then runs normally (you can already notice one or two bugs ;-)
. Type Ctrl-C
to stop it when you have had enough).
Type
+quit
+
+to leave the debugger.
+If you have not done so already, open the code stats.c
in an editor to see what it is about.
+The purpose of this program is to compute the mean and the (unbiased) standard deviation of the ages of a set of 1 to 1024 people.
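As a reminder of what these two statistics are, here is a minimal sketch. It is not the provided stats.c (which is the program to debug); the function names mean() and unbiased_variance() are hypothetical. The unbiased standard deviation is the square root of the unbiased variance, which divides by n - 1 rather than n.

```c
#include <stddef.h>

/* Arithmetic mean of n ages; returns 0.0 for an empty set. */
double mean(const int *ages, size_t n)
{
    if (n == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += ages[i];
    return sum / (double) n;
}

/* Unbiased sample variance: sum of squared deviations divided by n - 1.
 * Returns 0.0 when fewer than two values are given. The unbiased standard
 * deviation is the square root of this value (sqrt() from <math.h>). */
double unbiased_variance(const int *ages, size_t n)
{
    if (n < 2) return 0.0;
    const double m = mean(ages, n);
    double sq_sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double d = ages[i] - m;
        sq_sum += d * d;
    }
    return sq_sum / (double) (n - 1);
}
```

For the ages 20, 30 and 40, the mean is 30 and the unbiased variance is (100 + 0 + 100) / 2 = 100, hence a standard deviation of 10.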
At the beginning of the program, you can see a variable nb_people
which is read from the keyboard at line 22. Let us use the debugger to look at the value read.
+To do so, restart the debugger on our program:
gdb ./stats
+
+then
+layout src
+
+But this time, let us add a "breakpoint" before starting the execution. This is done with the break
command:
break 22
+
+NOTE: to learn more about this command, you can type:
+help break
+
+You will then see that you can give not only line numbers, but also function names (among others).
+You can also set as many breakpoints as you want.
Once the breakpoint is set, start the execution:
+run
+
+This time, the debugger stops the execution of the program at line 22 and tells you so.
+At this point, you can give commands to the debugger, such as viewing the value of a variable, advancing the execution of the program by one step, continuing the execution or adding another breakpoint.
+Let us start by looking at the value of the variable nb_people
:
print nb_people
+
+displays the result:
+$1 = 0
+
+($1
simply means that this is the first expression you asked for that is displayed here).
NOTE: all gdb
commands can be abbreviated as long as they are not ambiguous. Here, we could thus simply have entered:
p nb_people
+
+Note also that automatic completion is available with the TAB
key. Try:
p nb_<TAB>
+
+It is nevertheless tedious to keep retyping useful print
commands. There are two ways to avoid this:
gdb
keeps all commands in memory; you can thus simply navigate through the history with the arrow keys (Up and Down) to find a command already entered;
display
automatically displays the requested expression each time the debugger stops (provided the expression makes sense where it stops).
Let us try the display
command (we will see its effect better in a moment):
display nb_people
+
+Let us now try to continue the execution.
+If you no longer know where you are in the program, the command:
where
+
+will tell you (here: in the function main()
at line 22 of the program stats.c
).
NOTE: where
is in fact an alias for backtrace
or bt
, which are also often used.
To advance by one step, type:
+next
+
+The debugger then executes the scanf. That is why the text of the question appears.
+Answer it.
The debugger then tells you that it has stopped at line 26 (since there is no code on lines 23 to 25).
+The next command indeed executes only a single line of the program.
+If we had wanted to continue the execution without stopping any more (in fact: continue until the next breakpoint, but since we have no other one...), we would have used the command (WARNING! Do NOT do it here):
cont
+
+You may also notice that, in addition to line 26, the debugger displayed the new value (the one you entered) of the variable nb_people. This is the result of your earlier display command. Without it, the new value would not have been displayed and you would have had to enter a new print to see it.
REMARKS:
+next can be abbreviated n;
if you enter no command at all, the previous command is simply applied again; this is particularly handy with next: you just press Enter several times to advance step by step;
next can be given a repetition count:
next 8
+
+will for example perform next 8 times;
+next on its own is thus the same as next 1;
A frequent confusion when getting started with a debugger is the one between next and step:
next goes to the following line while staying at the same level, without entering subroutines (= function calls);
step goes to the next line to be executed, wherever it is, even if it is inside a subroutine (and even if it is not one of our own subroutines).
Let's illustrate this by adding an extra breakpoint a bit further on:
+break 42
+
+and continue the execution up to there with a simple:
+cont
+
+(answer the questions normally).
+Once at line 42, type
+next
+
+to continue. You can see that line 42 is executed and that we move to line 43.
+Let's redo the example by restarting the execution from the beginning:
+run
+y
+
+The debugger again stops the execution at line 22. Since this no longer interests us, let's delete this breakpoint:
+info br
+
+shows us that it is breakpoint number 1, which we delete:
+delete 1
+
+Then we continue the execution:
+cont
+
+up to line 42.
+If we now type step instead of next, we move to line...
+...28? [Note: this does not work on macOS for this example (printf), but it will work with your own functions.]
__printf (format=0x40094d "\nMoyenne : %g\n") at printf.c:28
+28 printf.c: No such file or directory.
+
+Yes, 28! But not of our program: line 28 of printf.c, which is the file that was compiled (a long time ago) to produce the code of printf in the C library!
+And we do not have access to it (it is most likely not on your computer).
What happened?
+With step, we moved to the next C instruction, which actually happens to be inside printf itself (someone had to write it!).
Try a few more step commands (at least 7). You can see that we are going "deeper" into the C library...
+A
where
+
+after more than 7 step commands is in fact interesting:
+#0 _IO_vfprintf_internal (s=0x7ffff7ad1740 <_IO_2_1_stdout_>, format=0x40094d "\nMoyenne : %g\n", ap=ap@entry=0x7fffffffdcc8) at vfprintf.c:1278
+#1 0x00007ffff7781209 in __printf (format=<optimized out>) at printf.c:33
+#2 0x0000000000400839 in main () at stats.c:42
+
+We are in a function _IO_vfprintf_internal, which was itself called by a function __printf, which we called from line 42 of our program.
+This is starting to look like Java exception messages ;-)!
Since we are lost, let's finish the execution of the program (and this tutorial) with a simple
+cont
+
+layout src
run or r
help
break LINE_NUMBER or br LINE_NUMBER
break FUNCTION_NAME or br FUNCTION_NAME
delete
info br
where or bt (or backtrace)
print or p
display
cont or c
next or n
step or s
Later in the project, you may use unit tests with the check library. These unit tests, however, launch a new subprocess for each test (fork()), which makes them harder to follow. If you want to debug such unit-test programs with gdb, here are a few additional hints:
enter these options in gdb:
set follow-fork-mode child
+ set detach-on-fork off
+
+keep track of which subprocess you are in with the command (infe is short for inferiors):
+ info infe
+
+switch process with infe followed by a number (as shown by info infe); e.g.:
infe 1
+
+do not put breakpoints on the code of the unit-test-* files themselves (since they are in fact written with macros), but on "real" C code, either that of the utility functions used by these tests, or directly on your own code.
Example:
+Suppose it is in the 5th test that you have problems. The 5th subprocess is thus the one of interest.
+Then start as usual by launching the debugger on the unit-test program:
+gdb ./unit-test-machin
+
+Add the suggested options:
+set follow-fork-mode child
+set detach-on-fork off
+
+Set the breakpoint at the place of interest, e.g. here on a function fait_machin_truc():
break fait_machin_truc
+
+And start the execution in the debugger:
+run
+
+gdb will stop at the first breakpoint (or else at the first crash ;-)).
+Let's see where we are:
info infe
+
+We are, for instance, in the 2nd process, i.e. in the 1st test (because process 1 is the main(), and the tests create a new subprocess each time); this is not the one we are interested in, so we continue:
cont
+
+gdb then tells us, for example, that the 2nd process ("Inferior 2") has finished, but gdb is still positioned on it (do "info infe" to see). We thus have to bring gdb back to the parent process:
infe 1
+
+and we continue:
+cont
+
+It stops us again at the breakpoint. We check again where we are:
+info infe
+
+...And we go on like this until the breakpoint of interest.
+There we can use next, display, print etc. as usual.
We can in this way "wander" from process to process (infe <number>) and know where we are (info infe).
+With a bit of practice you find your way around ;-)
The first thing to do before using git is to configure it.
+This only has to be done once.
In the command line, type the following commands, line by line,
+replacing the #<XXXX># placeholders with your corresponding personal information:
git config --global user.name #<A USERNAME>#
+git config --global user.email #<YOUR EPFL EMAIL>#
+
+Then, if you like colors in the terminal, you can add:
+git config --global color.diff auto
+git config --global color.status auto
+git config --global color.branch auto
+
+Let's add a few aliases. To do so, create/edit a file ~/.gitconfig
+(i.e. a file named ".gitconfig" at the root of your home directory.
+On the CO machines, remember to copy it into your myfiles).
+Put the following lines in it:
[alias]
+ lg = log --graph --abbrev-commit --decorate --date=relative --format=format:'%C(bold blue)%h%C(reset) - %C(bold green)(%ar)%C(reset) %C(white)%s%C(reset) %C(dim white)- %an%C(reset)%C(bold yellow)%d%C(reset)' --all
+ glog = log --graph --decorate --oneline --all
+unstage = reset HEAD --
+ last = log -1 HEAD
+
+Before starting to use git, you need to understand its purpose and its logic:
the purpose is to work together on shared content and, to that end, to archive the successive versions (git is a "version control system");
the logic is to have three levels:
+level 1: the main server, shared by everyone;
+level 2: the local archive, which records the versions you have validated;
+level 3: your working directory, obtained with git clone, in which you will work (each of you).
Level 2 is in fact rather "abstract" in the sense that you do not see it concretely; it is entirely managed by git commands.
To bring something from level 1 to levels 2 and 3 at the same time (i.e. to retrieve something made available by others on the main server), we do (all these commands are detailed below; here we focus on the concepts):
+git pull
+
+I advise you to do it fairly regularly, and in any case systematically before your commit/push (explained below).
To validate something local, i.e. to go from level 3 to level 2 only:
+either, to validate a new version of an already known file:
+ git commit -m "MESSAGE" FILE
+
+please write a meaningful message each time;
+e.g., to validate a new version of the file core.c:
git commit -m "fix the computation bug" core.c
+
+or, to add a new file (all of this is covered in detail below):
+ git add FILE
+ git commit -m "add EXPLANATION"
+
+e.g. to add the new file io.c:
git add io.c
+ git commit -m "add input/output"
+
+Note: there is no need to repeat the file name in the commit that follows one or more add commands; this in fact makes it possible to add several files at once; for example:
git add io_core.c
+ git add io_errors.c
+ git add io.h
+ git commit -m "add input/output"
+
+To publish your local validations to everyone, i.e. to go from level 2 to level 1:
+git push
+
+We thus insist on the fact that, to publish a local modification to everyone, you have to do TWO things:
+git commit
+
+then
+git push
+
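The three levels can be tried out without any real server. In the sketch below, which is only an illustration under stated assumptions, a local bare repository named `server.git` plays the role of the main server (level 1), and the clone `alice` holds both the local archive (level 2, its `.git` directory) and the working directory (level 3). All names, including `core.c`, are made up for the example.

```shell
#!/bin/sh
# Sketch of the three levels; server.git, alice and core.c are illustrative names.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Level 1: a local bare repository stands in for the main server.
git init -q --bare server.git

# Levels 2 and 3: the clone contains the local archive and the working directory.
git clone -q server.git alice 2>/dev/null
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"

# Level 3: create a file in the working directory.
echo "int answer(void) { return 42; }" > core.c

# Level 3 -> level 2: validate it locally.
git add core.c
git commit -q -m "add core.c"

# At this point the server has not received anything yet.
before=$(git --git-dir="$workdir/server.git" for-each-ref | wc -l | tr -d ' ')

# Level 2 -> level 1: publish the validated version.
git push -q origin HEAD

after=$(git --git-dir="$workdir/server.git" for-each-ref | wc -l | tr -d ' ')
echo "branches on the server: before push = $before, after push = $after"
```

The count stays at 0 after the commit and only changes with the push, which is the reason for insisting on the two separate steps.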
+Let's go over all this (and more) in detail.
+Let's look at the first principle of git: archiving the work done.
+To do so, create a directory and go into it:
mkdir alice
+cd alice
+
+Add a file to it:
+echo "This is a README file" > README.md
+
+then archive the current state of this directory in git:
git add .
+git commit -m "Initial commit: README"
+
+git add lets you propose a change (without it being confirmed yet);
+here we added the whole current directory using its short name, ".", but we could also have added just one file, for example:
git add README.md
+
+We actually ADVISE AGAINST using git add . because it often adds lots of unwanted things: all the files of the current directory, including temporary files, drafts, etc.
We also advise you to run git status BEFORE your git commit to check carefully what you are adding.
git commit confirms the (local) recording of the proposed changes;
+it is strongly recommended (if not mandatory ;-)) to
+comment your changes by adding a message to the commit; this is
+what we did with the -m option.
If something is modified, Git can tell you so:
+echo "A second line for the README" >> README.md
+git status
+
+We could propose to add this new modification for a future commit:
git add README.md
+
+This way of proceeding in two (then three, as we will see shortly) steps may seem tedious, but it is a good protection against mistakes and a good way of doing things little by little, one at a time.
+Once you are ready to record your modifications (locally), do a commit...
+...without forgetting to add a meaningful comment with -m:
git commit -m "Adding a second line to the README file"
+
+With Git, you can see all the recorded states (snapshots) and even move from one to another (but this is more advanced and you should not need it):
+git log
+
+or:
+git lg # if you defined the alias above...
+
+To move around (here by way of illustration; understanding this part is not required for this course):
+git checkout 5d340 # Use an appropriate, older commit number
+cat README.md
+
+You can see that it is an older version.
+Let's come back to the current state:
+git checkout master
+cat README.md
+
+You have now understood the notion of states archived by Git (snapshots), and thus the difference between the current working directory and the archive.
+The second concept that must be well understood is that of the TWO archives that exist.
+Git indeed makes it possible to work with several people (see next section) and uses two different archives for that (the local one, on your machine, and the one on the central server).
To "push" your locally recorded changes (made with commit) to the central server, you have to do:
git push
+
+I strongly recommend that you first do a
+git pull
+
+before each of your
+git push
+
+The pull synchronizes in the other direction: it fetches
+the modifications recorded on the server and applies them
+locally.
More details follow...
+Git is above all a collaborative working tool that many people use
+precisely to work "in parallel". It is thus designed to ease the
+handling of "concurrent" (or at least "parallel" ;-)) modifications.
Suppose that Alice has a collaborator, Bob, on her project, and that he has made some modifications on his side:
+echo "Hey, this is a line added by Bob" >> README.md
+git commit -m "Add greeting from Bob" README.md
+
+while, in parallel, Alice also kept working:
+sed -i 's/This is a README file/This is a README file with a twist/' README.md
+git commit -am "Add a twist to the first line of the README"
+
+Where are we? What are the states in Git?
+Starting from the last common state (last pull on both sides), there are in fact
+two clearly separate commits:
So far, no confusion is possible.
+Alice and Bob can now collaborate by sharing their contributions.
+Suppose that Bob "pushes" first (no need to pull beforehand here):
git push
+
+When Bob does this, the central server receives and records Bob's modification. No problem here.
+A git lg on Bob's side shows that origin/master (the server's one) and master (the local one) are
+now the same.
Alice, on her side, does not know that Bob's change has been propagated to the central server.
+When she tries to "push" her modifications to the server:
+git push
+
+she runs into a problem: a message tells her that her push has
+failed and that she must first fetch (= retrieve) the modifications
+recorded on the central server...
+Which she obediently does:
git fetch
+
+With a
+git lg
+
+on her side, Alice sees that she now has two commits: one called
+origin/master, which corresponds to Bob's, and another called
+master or HEAD, which corresponds to hers.
+If Alice wants to "push" her modifications to the central server, she must first
+merge these 2 different states. If there is no conflict (parallel but
+non-conflicting modifications), this is simply done as follows:
git merge
+# Enter a commit message...
+
+She can check the state:
+git lg
+
+and "now" push this new state, resulting from the merge of the two modifications:
+git push
+
+From then on, Alice and the central server are synchronized. For Bob to be
+synchronized as well, he too has to do a
+git fetch
+
+(followed by a git merge), or more simply a
+git pull
+
+This command (git pull) does a fetch followed by a merge in one go.
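The whole Alice/Bob scenario can be replayed locally. The sketch below is an illustration under assumptions, not the course's setup: a bare repository `server.git` stands in for the central server, a throwaway clone `seed` publishes the common starting point, and the README has a few extra lines so that the two changes are clearly non-conflicting. The branch name is read from git rather than assumed to be `master`, and the merge names the remote branch explicitly with `--no-edit` so that no editor opens.

```shell
#!/bin/sh
# Replays the Alice/Bob scenario with local repositories; all paths are illustrative.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q --bare server.git

# A first clone publishes the common starting point.
git clone -q server.git seed 2>/dev/null
cd seed
git config user.name "Seed"
git config user.email "seed@example.com"
printf 'This is a README file\n\nSome text in the middle\n\nLast line\n' > README.md
git add README.md
git commit -q -m "Initial commit: README"
git push -q origin HEAD
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on the git setup
cd ..

# Alice and Bob each get their own clone of the same state.
for who in alice bob; do
    git clone -q server.git "$who"
    git -C "$who" config user.name "$who"
    git -C "$who" config user.email "$who@example.com"
done

# Bob appends a line, commits and pushes first.
echo "Hey, this is a line added by Bob" >> bob/README.md
git -C bob commit -q -m "Add greeting from Bob" README.md
git -C bob push -q origin HEAD

# Alice changes the first line in parallel and commits locally.
printf 'This is a README file with a twist\n\nSome text in the middle\n\nLast line\n' > alice/README.md
git -C alice commit -q -am "Add a twist to the first line of the README"

# Her push is rejected: the server holds a commit she does not have yet.
if git -C alice push -q origin HEAD 2>/dev/null; then
    echo "unexpected: push accepted"
else
    echo "push rejected: fetch and merge first"
fi

# fetch + merge (what git pull does in one go), then push again.
git -C alice fetch -q
git -C alice merge -q --no-edit "origin/$branch"
git -C alice push -q origin HEAD

# Bob synchronizes; for him this is a plain fast-forward.
git -C bob pull -q --ff-only
cat bob/README.md
```

At the end, both clones point to the same merge commit and Bob's README contains both Alice's new first line and his own last line.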
A "tag" (label) is simply a name given to a recorded state (snapshot).
+Unlike in previous years (for those who experienced them), we will not make any particular use of tags. You may nevertheless use them, if you wish, to mark a particular version of your project, typically to remember a stable version. But this is a detail.
To give a tag to the current state, simply do:
+git tag -a TAG_NAME -m "message"
+
+For example, if you want to name the current state "version1.1", do:
+git tag -a version1.1 -m "Version 1.1 stable"
+
+The command
+git tag
+
+simply lists all your tags.
+To see what a given tag corresponds to, use git show TAG_NAME; for example:
git show version1.1
+
+To push the tags to GitHub, add --tags to the push:
git push --tags
+
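As a small illustration of what a tag buys you, the sketch below tags a first commit, adds a second one, and then reads the tagged state back. The repository name `project` and the file contents are made up for the example; only the `git tag -a` call mirrors the text above.

```shell
#!/bin/sh
# Minimal tag demonstration in a throwaway repository (illustrative names).
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q project
cd project
git config user.name "Alice"
git config user.email "alice@example.com"

echo "stable content" > README.md
git add README.md
git commit -q -m "Initial commit: README"

# Name the current state with an annotated tag.
git tag -a version1.1 -m "Version 1.1 stable"

# A later commit does not move the tag.
echo "work in progress" >> README.md
git commit -q -am "Start the next version"

git tag                                # lists the existing tags
git log -1 --format=%s version1.1      # subject of the tagged commit
git show version1.1:README.md          # README.md as it was when tagged
```

The last command shows that the tagged version of the file is still retrievable after further commits, which is the typical use for remembering a stable version.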
+If you want to learn more about Git and GitHub, you can have a look at this tutorial (in English).
+Here (as a quick introduction, or later as a reminder) is the bare minimum you need to know (but you are, of course, welcome to read on):
+a Makefile is just a plain text file (simply named "Makefile", with no extension), which is automatically read by the make command, and which simply contains a "to-do list" (whose items are known as "targets");
one line of the Makefile
simply describes one target and what is needed to make it (known as "dependencies"), in the format:
target: list of dependencies
+
+for example (fictitious):
+ cake: flour eggs butter sugar chocolate yeast
+
+and that's it! Simple as that! Except that for us, targets are executables and dependencies are .o
files; for example:
calculCplx: calculCplx.o complex.o calculator.o
+
+compilation dependencies (for the creation of a .o
file, then) are simply the corresponding .c
file, together with the list of required .h
files; e.g.:
calculator.o: calculator.c calculator.h complex.h
+
+Note that all these target-dependency lines for compilation can be obtained simply by typing the command:
+ gcc -MM *.c
+
+Often, by convention, the first target is called "all
" and designates all the executables you wish to build with this Makefile
.
To sum up, here's a simple but complete example of a Makefile
:
all: calculCplx
+
+calculCplx: calculCplx.o complex.o calculator.o
+
+# These lines were copied from the gcc -MM *.c command
+
+complex.o: complex.c complex.h
+
+calculator.o: calculator.c calculator.h complex.h
+
+calculCplx.o: calculCplx.c calcGUI.h
+
+And that's it! As simple as this!
+Note: this is a written tutorial. You might prefer the video lectures; choose whichever way of learning suits you best (or benefit from both).
+For the sake of modularization, the source code of a complete program written in C is often distributed over several text files called "source files". Source files are of two kinds: header files and main files (often called "definition files", or even simply "source files", hence some terminological confusion). By convention, header files have the .h
extension, while definition files have the .c
extension.
These are "glued together" by the compiler to create an executable program from the source code.
+A pair (header file, definition file) corresponding to a given concept is called a "module".
+What's the purpose of a header file, then?
+A header file is there to announce to the other modules the functionality (API) provided by the module it is part of.
+For example, a matrix.h
file will contain the module's API for matrices.
In header files, we typically write:
+#pragma once
(see below);
directives to include the other header files necessary for this header file only (see below);
+(very frequent) declarations of types offered by the module;
+(very frequent) declarations of the functions offered by the module (corresponding to the "public" part in an OO design);
+(frequent) some "macros" (lines beginning with the #define
symbol);
(rare) declarations of (global) variables to be shared with other modules by the current module.
+In the definition file (with extension .c
), we typically write:
directives to include the header files necessary for this source file only (see below);
+declarations of variables or functions used exclusively in the current module;
+definitions of (variables and) shared functions (offered by the header file).
+Header files are not compiled directly into machine code, but their content is copied as a whole into all other modules that include them. These other modules (which need them) request a copy of a header file by indicating #include
followed by the header file name. For example:
#include "matrix.h"
+
+in a source file that requires matrices.
+This copy is made by the compiler when compiling the module requesting the inclusion.
+[ Note: the inclusion of "local" files (specific to our application) is written with double quotation marks (e.g. #include "matrix.h"
), whereas the inclusion of standard libraries is written with "angle brackets" (e.g. #include <stdio.h>
)
+]
Compiling a program consists of two main stages:
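+the actual compilation stage: each source file is translated into an object file (extension .o);
the "linking" stage: the object files are combined to form the executable program.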
+Let's take a look at two examples.
+The sum_odd.c
file provided in done/ex_single
is a (single) source file containing the code to request a positive number n and then calculate the sum of the first n odd numbers.
The program starts with a #include <stdio.h>
directive which requests the inclusion (= copying) of standard definitions (std
) for input-output (io
), such as printf()
.
Try following the steps illustrated in the image below:
+These steps are performed automatically (and transparently) when you compile from an IDE.
+But, in order to understand them well, let's do them step by step.
+First, we'll create the object "files" (here, only one) using the following command:
+gcc -c sum_odd.c -o sum_odd.o
+
+The -c
option tells the compiler not to perform linking, but only compilation (hence the c
as "compile").
This option is followed by the name of the file from which you want to create the object file, then the name you want for the object file in question (the -o
option means "output").
Run this command and check that the object file is actually present in the directory. Don't try to read or open it - it's machine code!
+Next, you need to link the object files. And here, there are already several of them, unbeknownst to you: the one created from our source file and those of the standard libraries used, which are automatically linked by the compiler without our having to name them explicitly.
+To make these links, we simply use the following command:
+gcc -o sum_odd sum_odd.o
+
+Once again, the -o
option followed by the name of the desired file (in our example, the file is called sum_odd
) is used to create the executable program with that name. Note that you can put this option and its associated file name wherever you like in the command (here we've put them first, whereas in the previous example, compiling, we put them last).
Then we need to specify the files to be linked together to create the executable program. In our example, all we need to do is specify our only sum_odd.o
(as standard libraries are linked automatically).
Check that the executable program has been successfully created and run it from the terminal by typing:
+./sum_odd
+
+A large program is usually broken down into several modules. In addition to bringing clarity to the program organization, this technique (known as "modular design") enables the reuse of elements (modules) in different programs (for example, one module for matrices, another for "ask for a number", etc.).
+Let's take a look at how such programs are produced.
+In the done/ex_multiples
directory, you'll find five source files and four header files.
Look at the contents of all the files and try to reconstruct the dependencies illustrated below:
+To create such a program, you must first compile all .c
files into object files:
gcc -c array_filter.c
+gcc -c array_sort.c
+gcc -c array_std.c
+gcc -c swap.c
+gcc -c main.c
+
+And then produce the executable (called selection_sort
in our example):
gcc -o selection_sort array_filter.o array_sort.o array_std.o swap.o main.o
+
+Create the executable as described above (tedious, isn't it? We'll come back to that in the next section), then run it. Its purpose is to sort, using the "selection sort" algorithm, an array of integers, whose size and range of values are given by the user.
+What happens if, by mistake or indirectly, the same module header is included several times?
+For example, have you ever tried to include a "#include <stdio.h>
" twice in one of your programs?
If .h
files are not protected against multiple inclusions, the compiler may refuse to compile, for example because of redefinition of a type already defined in the first inclusion.
It is therefore necessary to protect your .h
files against multiple inclusions by starting them with the line:
#pragma once
+
+
+This must be the very first line of your .h
files.
make
In the case of large (modular) programs, compiling and linking can become tedious (perhaps you've already found it to be the case for just 5 modules...): you have to compile each module ("separate compilation") into its own object file, then "link" all the object files produced.
+And since it's highly likely that several modules will themselves call upon other modules, a modification to one of the modules may require recompiling not only the modified module, but also those that depend on it, recursively, and of course the final executable.
+The make
tool enables you to automate the sequence of commands
+that are dependent on each other. It can be used for many purposes, but its primary use (and the one we're interested in here)
+is the compilation of (executable) programs from
+source files. Benefits:
you don't have to do it by hand;
+it recompiles only what is strictly necessary.
+To use make
, all you have to do is write a few simple rules describing the project's various dependencies in a simple text file named Makefile
(or makefile
).
Let's see how this tool is presented to us, in its manual:
+man make
+
+(Don't read everything! Just skim it to get an idea of what it is about.)
+A Makefile
is essentially made up of rules, which define, for a given target,
all the dependencies of the target (i.e. the elements on which the target depends),
+as well as the set of commands to be performed to update the target (from its dependencies).
+It's a bit like a list of recipes:
+"rule" = recipe;
+"target" = result (e.g. chocolate cake);
+"dependencies" = ingredients (e.g. flour, eggs, chocolate, sugar, butter);
+"commands" = instructions for making the recipe.
+But we're not cooking here. If we illustrate these concepts with the previous example (program selection_sort
), we'd have, for example a rule for linking (program selection_sort
), another rule for compiling array_sort.c
(into array_sort.o
), and so on.
For the linking rule, we'd have:
+target: selection_sort
;
dependencies: array_filter.o
, array_sort.o
, array_std.o
, swap.o
and main.o
.
all these .o
files must exist to produce the selection_sort
executable;
command: the linking command used above.
+For the array_sort.c
compilation rule, we would have:
target: array_sort.o
;
dependencies: array_sort.c
, swap.h
, array_filter.h
(see previous figure, which shows the dependencies);
command: gcc -c array_sort.c
.
The general syntax of a rule is:
+target: dependencies
+[tab]command 1
+[tab]command 2
+
+where:
+target is most often the name of a file that will be generated
+by the commands (the executable program, object files,
+etc.), but it can also represent a "fileless" target, such as
+install
or clean
;
dependencies are the prerequisites for the target to be achievable, usually the files on which the target depends (e.g. declaration files like header files), but they can also be other targets (i.e. the name of the target of another rule);
+to specify several dependencies, simply separate them with a space; a rule may also have no dependencies;
+if a dependency occurs several times in the same rule, only the first occurrence is taken into account by make
;
the commands are the actions that make
must undertake to
+update the target; they are one or several shell commands;
we have one command per line, and group the commands +related to a target below the dependency line;
+a special syntax feature is that each command line must begin with a tab character ("TAB" key), and NOT with spaces; this is certainly the most archaic and annoying aspect of make
!
It is possible to omit commands for a target; then either a default rule applies, or nothing at all (which might be useful simply for forcing dependencies/checks).
+In fact, make
has a number of implicit rules (typically for compilation), so we don't have to write too many things, as we'll see below.
Another piece of good news is that you can automatically generate a list of all dependencies using the -MM
option in gcc
:
gcc -MM *.c
+
+Try it out! You should immediately recognize the target-dependency lines described above. It's very handy to put them at the end of your Makefile
.
Note that the order of the rules is not important, except when determining the default target (i.e. when the user types make
on its own, without any arguments: the first rule is then launched; otherwise, simply type make target
on the command line).
The simplest example of Makefile
is... ...an empty file!
Thanks to its implicit rules, make
already knows how to
+do(=make) lots of things without you having to write anything.
(in done/ex_single
) Delete the files sum_odd.o
and sum_odd
and run make
like this:
make sum_odd
+
+All done. Great!
+make
"knows" that to make an X
file from an X.c
source file, you need to call the C compiler.
If you wanted to write a Makefile
to do this, you could have written (try it!):
sum_odd: sum_odd.c
+
+and that's it!
+The target here is the sum_odd
executable and its dependency, unique here, the sum_odd.c
source file.
This Makefile
does not specify any commands to be executed. It simply uses the default commands known to make
.
If we wanted to make the command explicit (but why?), a more complete Makefile
would have been:
sum_odd: sum_odd.c
+ gcc -o sum_odd sum_odd.c
+
+where the command that produces the target from its dependency is made explicit (preceded by a TAB
character).
Let's try to write a completely artificial Makefile
:
all: dep1 dep2
+ @echo "target 'all' completed."
+
+dep1:
+ @echo "dependency 1 completed."
+
+dep2:
+ @echo "dependency 2 ok..."
+
+dep3:
+ echo "banzai!"
+
+(You can either add these lines to the Makefile
written for sum_odd
if you tried the exercise above, or now create a Makefile
file with the above lines).
If you simply type the command
+make
+
+you get:
+dependency 1 completed.
+dependency 2 ok...
+target 'all' completed.
+
+In this example, make
is called on its own, with no indication of a particular target.
+make
will thus search the Makefile
for the first acceptable target, in this case all
.
+(There are particular targets that are not acceptable as default targets, but this is beyond the scope of this
+introduction.)
The rule for this target specifies two dependencies, dep1
and dep2
, which don't exist (they don't correspond to any existing files); make
will thus attempt to create them successively.
Since dep1
has no dependencies, make
immediately proceeds
+to executing the commands accompanying the target, i.e.
+display a message on the terminal (using the echo
command).
The same applies to the second dependency (dep2
).
Once all dependencies have been built, make returns to the initial target, all, whose build commands then get executed.
If we now type the command
+make dep3
+
+you get:
+echo "banzai!"
+banzai!
+
+In this example, the target dep3
is specified as the goal when
+invoking make
. This target has no dependencies;
+make
thus directly executes the build commands for this target
+(displaying the string "banzai!
").
Let's note a slight difference in behavior between our two examples: in the first case, the target is created by executing the commands directly, whereas in the second case, make
first displays the command it will execute ("echo "banzai!"
").
The reason for this behavior lies in the @
character preceding the command in the first case, and absent in the second.
+By default, make
displays each command before actually executing it.
+To suppress this automatic display simply prefix the command with the @
character.
Tip: always let make
display the commands it is supposed to do
+(especially compilations), except for pure display commands, such as echo
.
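The effect of the @ prefix can be checked with a script. The sketch below recreates the artificial Makefile with `printf`, which has the side benefit of writing real TAB characters (`\t`) in front of each command, and captures what `make` prints in both cases. The message for `all` is slightly simplified (no inner quotes) to keep the shell quoting readable; that change is mine, not the tutorial's.

```shell
#!/bin/sh
# Recreates the artificial Makefile and compares output with and without @.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# printf turns each \t into the TAB character make requires before a command.
printf 'all: dep1 dep2\n\t@echo "target all completed."\n\n' > Makefile
printf 'dep1:\n\t@echo "dependency 1 completed."\n\n'       >> Makefile
printf 'dep2:\n\t@echo "dependency 2 ok..."\n\n'            >> Makefile
printf 'dep3:\n\techo "banzai!"\n'                          >> Makefile

quiet=$(make)        # commands prefixed with @ are not displayed
loud=$(make dep3)    # without @, the command line is displayed first

echo "$quiet"
echo "---"
echo "$loud"
```

Only the `dep3` run shows the `echo "banzai!"` command line before its result, which matches the behavior described above.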
Makefile
That's all interesting, but what use is it "in real life", since we've seen that with the default implicit rules we don't need to write anything?
+Sure! But in more complex projects, the default rules are no longer sufficient.
Let's say we have a program implementing a calculator for complex numbers, split into modules as follows:
+in addition to the standard library, we have a graphics library, LibGraph
, with its header file, libgraph.h
, and a library file libgraph.so
;
modeling of complex numbers and their arithmetic, with its header file complex.h
+and its implementation file complex.c
;
calculator modeling (basic functions, memory, parentheses, etc.), with its header file
+calculator.h
, which depends on complex.h
, and source file calculator.c
(no dependency);
modeling of the calculator's graphical interface, with calcGUI.h
, dependent on calculator.h
and libgraph.h
, and calcGUI.c
;
the main program (containing the main()
function), provided as calculCplx.c
file, which depends on calcGUI.h
;
each source code (.c
) also depends on its header file (.h
).
Here's an illustration:
+ +To write the corresponding Makefile
, all we have to do is to add
a target for each module, i.e. one target for each object file resulting from compilation of the source file;
+and another one to link the whole into an executable program.
+The dependencies of each of these targets are all the files it depends on (!). +But we only consider dependencies that can be modified as part of our project. +We can therefore ignore dependencies on the graphics library, for example, just as we ignore dependencies on any other standard library.
+These dependencies can be automatically generated using the command
+ gcc -MM *.c
+
+All we have to do is to copy its result into our Makefile
.
The build commands are, of course, the compilation instruction;
+but we don't need to write it explicitly, as we have seen above: make
has default commands which are perfectly fine in this case.
The only build command that needs to be specified is the "linking" command, which puts all the object files together to form the final executable. This is because the default linking rule will not make use of the required libgraph
library.
A possible Makefile
could therefore be:
all: calculCplx
+
+ calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
+ gcc -o calculCplx calculCplx.o complex.o calculator.o calcGUI.o -lgraph
+
+ # These lines have been copied from gcc -MM *.c
+ complex.o: complex.c complex.h
+ calculator.o: calculator.c calculator.h complex.h
+ calcGUI.o: calcGUI.c calcGUI.h calculator.h
+ calculCplx.o: calculCplx.c calcGUI.h
+
+With such a Makefile
, our project can be compiled using the make
+command alone, as the first target, the all
target, here is an alias for the calculCplx
target.
To build this target, make
must first build the targets listed as dependencies (the set of object
+files).
Note that make
will only (re)construct a target
+if at least one of its dependencies is more recent than the target itself.
+It is this mechanism that enables make
to compile
+only what is strictly necessary. So, if you run the
+make
command a second time, after the first
+compilation, it will report:
make: Nothing to be done for `all'.
+
+which means there's nothing new to be done! Everything is up to date.
+Similarly, if you were to modify only the complex.c
file, the make
command would only lead to the
+recompilation of the latter (creation of the target complex.o
, since complex.c is one of its dependencies),
+and the linker command, which in turn updates the target calculCplx
(for the same reason: complex.o is one of its dependencies).
If, on the other hand, the complex.h
file is modified, the targets complex.o
, calculator.o
and calculCplx
will be updated.
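The timestamp mechanism can be observed on a much smaller case. The following is only a sketch; the file names in.txt and out.txt are hypothetical:

```makefile
# in.txt and out.txt are hypothetical file names.
# out.txt is rebuilt only if it is missing or older than in.txt.
out.txt: in.txt
	cp in.txt out.txt
```

Assuming in.txt exists (for instance created with touch in.txt), a first make runs the cp command. A second make does nothing, since out.txt is now more recent than in.txt. After modifying in.txt (or simply running touch in.txt again), make runs the cp command once more.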
Finally, it should be noted that some libraries, particularly our own, must be specified when linking: this is the case, for example, of the graph
library. This is done by adding the -lgraph
option to the end of the linker command; this is why the build command has to be written explicitly.
In the done/ex_multiples
directory, create a Makefile
to compile the selection_sort
program described above.
Test it.
+There's a slight subtlety here: there's no selection_sort.c
, but the main()
function is in main.c
. This is simply to make you write a rule once (instead of using the default rule). Obviously, main.c
would "normally" be called selection_sort.c
. But you're not allowed to rename this file (or to make a symbolic link ;-)).
That covers the basics. The rest of this document describes more advanced material, which is not strictly necessary for you but can be useful if you want to go beyond the bare minimum.
+And if you'd prefer a more "classroom" video/presentation on the subject of separate compilation and Makefile
, here are a few lecture videos (52 min.).
If what has been presented here is enough for you (you've already spent enough time), you can simply continue this week's series where you left it.
+What has been presented so far is sufficient to enable you
+to write a functional Makefile
; however, as the previous
+example shows, writing a functional Makefile
may be relatively
+tedious. The information in this section will enable you to
+considerably increase the expressive power of the Makefile
instructions, making them
+easier to write.
To make writing Makefiles
easier (and more concise), you can define and use
+variables (actually, they're more like macro-commands, but who cares?)
The general syntax for defining a variable in a Makefile
is:
NAME = value(s)
+
+(or its more advanced variants +=
, :=
, ::=
, ?=
)
+where:
NAME
: the name of the variable you wish to define; this name must not contain the following
+characters :
, #
or =
, nor accented letters; the use of characters other than letters,
+digits or underscores is strongly discouraged;
variable names are case-sensitive;
+value(s)
: a list of strings, separated by spaces.
Example:
+RUBS = *.o *~ *.bak
+
+Note also that for GNU make
(also called gmake
), the following syntax
+can be used to add one or more elements to the list
+of values already associated with a variable:
NAME += value(s)
+
+To use a variable (i.e. to substitute it for the list of values
+associated with it), simply enclose the variable name in parentheses, preceded by the $
sign:
$(NAME)
+
+Example:
+-@$(RM) $(RUBS)
+
+which, with the above definition of RUBS
, deletes all *.o
, *~
and *.bak
files; the RM
variable is one of the predefined variables in make
(remove the @
to see the command actually executed).
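Put together, the definition and the use of RUBS could look like the sketch below. The target name clean is an assumption (it does not appear in the text above); in GNU make, RM defaults to rm -f, and the leading - tells make to carry on even if the command fails:

```makefile
RUBS = *.o *~ *.bak

# 'clean' is a hypothetical target name.
# '-' : ignore errors from the command; '@' : do not display it.
clean:
	-@$(RM) $(RUBS)
```

With this Makefile, make clean silently removes the matching files from the current directory.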
Note: These variables can be redefined when calling make
; e.g.:
make LDLIBS=-lm my_target
+
+redefines the LDLIBS
variable.
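To see such a redefinition at work without compiling anything, here is a small sketch; the variable GREETING and the target greet are hypothetical names:

```makefile
# GREETING and greet are hypothetical names.
GREETING = hello

greet:
	@echo "$(GREETING)"
```

Running make greet displays hello, whereas make GREETING=goodbye greet displays goodbye: the value given on the command line takes precedence over the one written in the Makefile.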
Suppose we want to systematically specify a certain number of options to the compiler; e.g.
+to enable the use of a debugger (-g
), to force a level 2 optimization of the compiled code
+(-O2
), and to make the compiler strictly comply with the C17 standard (-std=c17 -pedantic
).
Rather than adding each of these options to every compile command (and having to re-modify everything when we want to change those options), it would be wiser to use a variable (for example CFLAGS
, which is the default name used by
+make
) to store the options to be passed on to the compiler. Our Makefile
would then become:
CFLAGS = -std=c17 -pedantic
+ CFLAGS += -O2
+ CFLAGS += -g
+
+ all: calculCplx
+
+ calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
+ gcc -o calculCplx calculCplx.o complex.o calculator.o calcGUI.o -lgraph
+
+ # These lines have been copied from gcc -MM *.c
+ complex.o: complex.c complex.h
+ calculator.o: calculator.c calculator.h complex.h
+ calcGUI.o: calcGUI.c calcGUI.h calculator.h
+ calculCplx.o: calculCplx.c calcGUI.h
+
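The CFLAGS lines above rely on +=. As a rough sketch of how this variant and the other assignment operators mentioned earlier behave in GNU make, consider the following Makefile; all names (A, B, C, D, show) are hypothetical:

```makefile
A = one
# '=' : the right-hand side is expanded each time B is used
B = $(A)
# ':=' : the right-hand side is expanded once, right here
C := $(A)
# '?=' : assigns only if D has no value yet
D ?= default
# '+=' : appends to the current value
A = two
A += three

show:
	@echo "B=$(B)"
	@echo "C=$(C)"
	@echo "D=$(D)"
```

Here make show should display B=two three, C=one and D=default (provided D is not already defined in the environment or on the command line).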
+It's possible to add comments in a Makefile
(line-oriented, i.e. like the //...
of C99 or Java), by marking the beginning of the comment with the #
symbol. Note that
+comments in command lines are not removed by make
before the command is executed by the Shell. For example:
# Here's a comment line
+
+all: dep1 dep2
+ @echo "target 'all' completed."
+
+dep1:
+ @echo "dependency 1 completed."
+
+dep2:
+ @echo "dependency 2 ok..."
+
+dep3: # this target is not built by default
+ echo "banzai!" # comment submitted to Shell
+
+Examples of execution:
+$> make
+
+dependency 1 completed.
+dependency 2 ok...
+target 'all' completed.
+
+$> make dep3
+
+echo "banzai!" # comment submitted to Shell
+
+banzai!
+
+Notice that the # comment submitted to Shell
is indeed passed to the Shell, but since #
is also the comment-character for the Shell, it is considered as a comment by the Shell.
make
automatically maintains a number of predefined variables, updating them as each rule gets executed,
+depending on the target and its dependencies.
These variables include:
+$@
name of the target (file) of the current rule;
$<
name of the first dependency of the current rule (the source file, in the default make
rules);
$?
list of all dependencies (separated by a space) more recent than the current target (i.e. the dependencies that trigger the update of the target);
$^
[GNU Make] list of all dependencies (separated by a space) of the target; if a dependency occurs several times in the same dependency list, it will only be reported once;
$(CC)
compiler name (C);
$(CPPFLAGS)
preprocessor options;
$(CFLAGS)
compiler options;
$(LDFLAGS)
linker options;
$(LDLIBS)
libraries to be added.
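The automatic variables are easiest to understand by displaying them. The following sketch uses hypothetical names (report, a.txt, b.txt); since no file named report is ever created, its commands run at every call:

```makefile
# 'report', a.txt and b.txt are hypothetical names.
report: a.txt b.txt
	@echo 'target             : $@'
	@echo 'first dependency   : $<'
	@echo 'all dependencies   : $^'
	@echo 'newer dependencies : $?'

a.txt:
	touch a.txt

b.txt:
	touch b.txt
```

On the first call, make creates a.txt and b.txt, then displays report for $@, a.txt for $<, and a.txt b.txt for both $^ and $? (when the target does not exist, all dependencies count as more recent).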
For instance, the calculator's Makefile
could be rewritten as follows (modification of the linker command):
CFLAGS = -std=c17 -pedantic
+ CFLAGS += -O2
+ CFLAGS += -g
+
+ all: calculCplx
+
+ calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
+ gcc -o $@ $^ -lgraph
+
+ complex.o: complex.c complex.h
+ calculator.o: calculator.c calculator.h complex.h
+ calcGUI.o: calcGUI.c calcGUI.h calculator.h
+ calculCplx.o: calculCplx.c calcGUI.h
+
+As mentioned above, make
has a number of implicit rules (i.e. rules that the user doesn't need to specify), which enable it to "behave" in the presence of a source file without any further instructions.
+For instance, it "knows" how to produce object files from sources in assembly, Fortran, Pascal,
+Modula-2, Yacc, Lex, TeX, ..., and of course C and C++.
For example:
+the target file.o
will be automatically created from the file file.c
by means of an (implicit) command of the form:
$(CC) -c $(CPPFLAGS) $(CFLAGS) -o $@ $<
+
+which can also be simplified to
+ $(COMPILE.c) -o $@ $<
+
+Usually, the CC
variable is associated with the cc
command.
a target file
can be automatically created from the file.o
object file, or from a set of object files (specified in the list of dependencies) of which file.o
is a part, such as x.o file.o z.o
, using a command of the form:
$(CC) $(LDFLAGS) -o $@ $^ $(LOADLIBES) $(LDLIBS)
+
+a target file
can be automatically created from the file.c
source file, and possibly a set of object files (specified in the list of dependencies), such as y.o z.o
, using a command of the form:
$(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ $^ $(LOADLIBES) $(LDLIBS)
+
+which can be simplified to
+ $(LINK.c) -o $@ $^ $(LOADLIBES) $(LDLIBS)
+
+Therefore, we can transform our previous Makefile
to make it even more concise, as follows:
CFLAGS = -std=c17 -pedantic
+ CFLAGS += -O2
+ CFLAGS += -g
+
+ all: calculCplx
+
+ complex.o: complex.c complex.h
+ calculator.o: calculator.c calculator.h complex.h
+ calcGUI.o: calcGUI.c calcGUI.h calculator.h
+ calculCplx.o: calculCplx.c calcGUI.h
+
+ calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
+ $(LINK.c) -o $@ $^ -lgraph
+
+or even:
+ CFLAGS = -std=c17 -pedantic
+ CFLAGS += -O2
+ CFLAGS += -g
+ LDLIBS = -lgraph
+
+ all: calculCplx
+
+ complex.o: complex.c complex.h
+ calculator.o: calculator.c calculator.h complex.h
+ calcGUI.o: calcGUI.c calcGUI.h calculator.h
+ calculCplx.o: calculCplx.c calcGUI.h
+
+ calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
+
+where we have now completely removed the command associated with the last target (executable production).
+When an element (variable definition, list of target dependencies, commands, ... and even a comment, although this is not recommended) is too long to reasonably fit on one line,
+it is possible to place a line break by telling make
to consider the next line as a continuation of the previous one.
This is achieved by placing the \
character at the end of the line to be extended:
# here's a comment \
+ on two lines
+
+all: dep1 \
+ dep2
+ @echo "target 'all' done"
+
+dep1:
+ @echo "dependency 1 completed"
+
+dep2:
+ @echo "dependency 2 ok..." \
+"indeed!"
+
+Example of execution:
+$> make
+
+dependency 1 completed
+dependency 2 ok... indeed!
+
+target 'all' done
+
+This example shows that clumsy use of this option
+can considerably impair the readability of the Makefile
.
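Used with care, line continuation can on the contrary improve readability, typically for long lists written one item per line. Here is a sketch based on the calculator example; the variable name OBJS is an assumption, and no character may follow the \ on its line:

```makefile
# OBJS is a hypothetical variable name.
OBJS = calculCplx.o \
       complex.o \
       calculator.o \
       calcGUI.o

calculCplx: $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^ -lgraph
```

make joins the continued lines with a single space, so OBJS holds the four object file names, exactly as if they had been written on one line.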
Despite the name of the previous section, we have still covered only a small part of what is possible with make
.
For those who would like to know even more, don't hesitate to consult +the following references (all external):
+[GNU make website](http://www.gnu.org/software/make/)
+[The (GNU) make manual, taken from the previous site](http://www.gnu.org/software/make/manual/make.html)
+Finally, please note that there are many more modern
+project build tools (CMake, SCons, GNU autotools, tools integrated into IDEs: KDevelop, Anjuta,
+NetBeans, Code::Blocks, ...), but we feel that a good knowledge of
+make
is a real bonus on your programmer CV.
Click here to be redirected.
diff --git a/tutorials/vmvb/index.html b/tutorials/vmvb/index.html new file mode 100644 index 0000000..68a28d7 --- /dev/null +++ b/tutorials/vmvb/index.html @@ -0,0 +1,297 @@
+Installing VirtualBox on your machine is normally fairly easy. See their website.
+Note that some machines with Intel processors (e.g. HP, Lenovo, etc.) may require changing the "Intel Virtualization" setting (or a similarly named one) in the BIOS. Start with the installation described here and, if necessary (VirtualBox will display a message), reboot to change the BIOS setting.
+Download a 64-bit Ubuntu LTS image from their download site.
+Start VirtualBox and create a new machine:
+Intel Virtualization
is disabled in the BIOS) Once the new machine has been created, and before starting it, "load" the previously downloaded Ubuntu ISO image into it:
+Once the ISO "CD-ROM" has been "loaded" into the virtual machine (previous step), start it and simply follow the instructions.
+Once the installation is complete, restart the virtual machine and
+update it, either with the update tool (Software Updater) or "by hand" in a terminal:
+sudo apt update
+sudo apt upgrade -y
+
+install the "Guest Additions"; they give you better integration of your virtual machine with your real machine (screen resizing, copy-paste between the two, access to the local disk from the VM, ...):
+Restart the virtual machine
+Finally, install the tools required for the course:
+ sudo apt install build-essential clang check wdiff colordiff git openssh-client manpages manpages-dev doxygen curl
+ sudo apt install libssl-dev libssl-doc libcurl4-openssl-dev libjson-c-dev
+
+You can now compile your project on your VM, either by accessing a local disk (Devices -> Shared Folders -> Shared Folders Settings) or by cloning your GitHub repository.
+ +Installing VMware (Fusion on OSX, Workstation Player on Windows or Linux) is normally fairly easy.
+Since 2021, VMware is no longer available under a general license for EPFL students. Using VirtualBox (free) is recommended.
+Note that some machines with Intel processors (e.g. HP, Lenovo, etc.) may require changing the "Intel Virtualization" setting (or a similarly named one) in the BIOS. Start with the installation described here and, if necessary (VMware will display a message), reboot to change the BIOS setting.
+Download an Ubuntu LTS image from their download site.
+Start VMware Fusion;
+File -> New -> Install from disk or image;
+You can now select the Ubuntu ISO image previously downloaded to your disk;
+Select Easy Install, enter the password, and tick the checkbox to share files with your "host" computer;
+Select "Customize Settings" with the following options:
+After the installation (which may take a while), you can set the highest resolution of your screen in the VM (System settings -> Display).
+Make sure the keyboard is correctly configured (System settings -> Region&Language -> Input Sources)
+Update the system, either with the update tool (Software Updater) or "by hand" in a terminal:
+ sudo apt update
+ sudo apt upgrade -y
+
+Make sure the 'host' folders are visible in the '/mnt/hgfs' directory. If they are not, follow the instructions here.
+Finally, install the tools required for the course:
+ sudo apt install build-essential clang check wdiff colordiff git openssh-client manpages manpages-dev doxygen curl
+ sudo apt install libssl-dev libssl-doc libcurl4-openssl-dev libjson-c-dev
+
+You can now compile your project on your VM, either by accessing a local disk (Devices -> Shared Folders -> Shared Folders Settings) or by cloning your GitHub repository.
+ +