Page not found :(
+The page you are looking for doesn't exist or has been moved.
+diff --git a/404.html b/404.html new file mode 100644 index 0000000..a0073fb --- /dev/null +++ b/404.html @@ -0,0 +1,195 @@ + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +The page you are looking for doesn't exist or has been moved.
+Computer Science and Communication Systems
+ +first deadline (Sunday 05/05 23:59): steps 1 to 3 (weeks 7, 8 and 9)
+ + +Project finalization and delivery
+seconde deadline (Sunday 02/06 23:59): the whole project
+ + +Regarding the project, you'll have two deadlines with deliverables:
+an intermediate evaluation, to be delivered at the end of week 10, corresponding to weeks 7 to 9 (project steps 1 to 3); this part will be graded for 30%, but, since it will be graded again, with improvement, in the final grading step, the actual overall weight of this part is ∼52%;
+the final, end of week 14; weighted 70%; this final deliverable includes also the first part, which may be corrected based on first evaluation feedback.
+Each of those two grading steps will be evaluated based on:
+We also will pay attention to the regularity of your workflow
+(commit
on GiHub).
This work (project) has to be the own work of the corresponding group +(pair of students). No other help is allowed, be it human or artificial. +The code delivered has to be your own (except the one provided by us).
+The above warning includes both getting external help as well as providing +help (this is also part of cheating), directly or indirectly. This +thus also include being sure you to not make you code publicly +available in any manner.
+Any plagiarism, in whatever form, will be considered as cheating and will be handled accordingly, including informing the EPFL Legal Affairs.
+We designed that project to correspond on average to 4 hours of work per student per week (i.e. 8 hours of work per group). We'd like to insist on these two aspects:
+VERY IMPORTANT: +As any homework without any time constraint, the "danger" is to work too much on it, more than expected from us, trying to reach the best possible complete code regardless of the amount of time spent. This is not the proper way to handle homeworks, especially big projects: rather than trying to do perfect final (whole) project in an infinite time, try to do want you can in the amount of time you planed to dedicate to it. We do not mean that you have to deliver bad code, but you can deliver good enough code on a decent fraction of the project (e.g. do 75% of the project at a 80% quality level -- rather than doing 100% of the project at at 33% level [bad grade], or spending an indecent amount of time to reach 100% level (on 100% of the project)).
+DO NOT HESITATE TO COME TO US AND TALK ABOUT IT!
+A good way to reduce the workload is to:
+A bad example of points 2 and 5 above: some groups in the past completely recoded some functionalities that are present in the C library (e.g. string functions).
+In order to help you handling your workload and priorities (and also see if we don't fool ourselves), we ask you to weekly commit a CSV file time.csv
(in the done/
directory) counting the total number of hours (decimal number) you spent on the project for the corresponding week (sum for the two students).
+This is absolutely not to control you; your really can, without any penalty nor judgment on our side, put a 0 on some week if that was the case. This is only a tool for you, as a kind of "compass", we put in place following suggestions of former students.
The format for this file is very simple:
+one line for each week, starting with the week number and then, comma separated (CSV), the total number of hours (in decimal) spent by the group on the project that week, regardless of the handout number (I mean, if in week 8 you are still working on the week 6 of the project, count this in week 8, the week you do the work, not the week number of the handout you're doing).
For example:
++ 6,2.75 + 7,3.5 ++ +
Welcome in the project part of the CS202 course!
+This project is organized as follows:
The objectives of the project are:
+to concretely illustrate several aspects of the lectures;
+to let you develop a real system application in C (with files, pointers, sockets, threads, external library calls, ...);
+to let you practice usual development tools, among which: control version systems (git
), manpages, make
, debugger (gdb
);
to teach you how to use system (or external) libraries;
+to practice (a bit of) refactoring.
+This week, we will setup and learn several tools that will be useful for the project:
+The aim of this first week is to guarantee that you are ready to start with the project; that you have the proper working environment to do so. It is really important that this objective is fulfilled before the actual start of the project (week 7 of the semester). Do not hesitate to come to us for help.
+Concretely, what we expect you to do this week is to:
+make
);For this project, you have to work on Linux. For this you can either:
+No other OS will be supported (nor accepted) for this project.
+In addition to the standard C development framework (editor, compiler, debugger), you'll nee the following tools (sudo apt install <package>
if you are on your own Debian/Ubuntu-like computer):
git
(this is the package name to be installed);openssh-client
;manpages
and manpages-dev
;doxygen
if you want to automatically produce the documentation out of your source code;graphviz
to see the graphs generated by Doxygen;libssl-dev
: some cryptographic function we will use to compute "hash"-code of images;libvips-dev
to process images from C code;libjson-c-dev
to process JSON content from C code.You certainly already know Git and GitHub (not to be confused!), maybe from some former classes. This is just a quick recap, or a gentle introduction if you don't know them yet.
+GitHub is one of the public servers to offer Git services. Each student will first receive a personal repository on GitHub for this first step (warm-up); then each group (pair of two students) will also get another repository for its core project (this will be explained later in the semester).
+The first thing to do is to have a GitHub account. If you don't have one yet, create it by registering here. A free account is more than enough for this course.
+(if you already have a GitHub account, please use it for this class).
Then, once you have a GitHub account, join the first assignment, here: https://classroom.github.com/a/WG78CBVj.
+GitHub Classroom may ask you the right to access your repositories:
+ +then to join this first assignment (click on YOUR SCIPER NUMBER; please don't use someone else SCIPER!):
+ +and then to create a GitHub repository for that first assignment:
+ +Once all this done, you should receive a message from GitHub that you joined the "Warm-up" assignment and that you have a new repository cs202-24-warmup-YOUR_GITHUB_ID
, the URI of which looks like:
git@github.com:projprogsys-epfl/cs202-24-warmup-GITHUBID.git
+
+To be able to clone this repository, you have to add your SSH public key (the one of the computer you are using) to GitHub.
+If you don't have any SSH key yet (on the computer you are using), you can generate one with:
+ssh-keygen
+
+Copy then the content of the file ~/.ssh/id_rsa.pub
into GitHub SSH public keys.
NOTE: you can also use https
URI rather than SSH:
https://github.com/projprogsys-epfl/cs202-24-warmup-YOURID.git
+
+but then you'll have to authenticate each time (each command).
+It's not the purpose of this class to teach you Git, nor to present all its details. The purpose of this section is to provided you a short description on the necessary commands:
+git clone [REPO_URI]
git pull
git add [FILE]
git commit -m "Commit message"
git push
git status
git tag
For each command, you can get help from git
by doing:
git help <COMMAND>
+
+In case you need a recap on git, either go to your former material (e.g. CS-214 if you took it), or see this complementary recitation page (in French).
+(If you have received the confirmation email from GitHub) +Now go and get the content of this warm-up assignment:
+git clone REPO_URI
+
+This will create a local directory on your computer, named like cs202-24-warmup-YOURID
(with your GitHub ID at the end).
Go into that directory:
+cd cs202-24-warmup-YOURID
+
+You should find two sub-directories: done
and provided
. This is how we will proceed for the project:
provided
sub-directories; THIS SUB-DIRECTORY (provided
) SHALL NOT BE MODIFIED (by you);done
directory; (incrementally) copy the necessary files from provided
to done
and then proceed.Before moving on, let us recap that the manpages are THE reference documentation in Unix world.
+You can read them with the man
command (they can also be read on line).
The first manpage to read (maybe not in whole ;-)
, but at least have a look at it) is the manpage of the man
command itself:
man man
+
+Use the space bar to move forward, 'b' to go backward and 'q' to quit. Type 'h' to get more help.
+man
actually uses another command, a "page viewer". In most of the modern Unix systems, this page viewer is less
(replacing former more
command!). Thus maybe the second manpage to read is the one of less
:
man less
+
+On of the first function you have dealt with in C was printf()
. Let's try to see its manpage:
man printf
+
+Hmm?... This does not seem to be the right printf
...
+If you have a "PRINTF(1)
" on the very top of the page, this is indeed not the expected C printf()
function.
There can indeed be several manpages with the same "title". To mark the difference, the manpages are organized in "Sections". Go and read the manpage of man
again if you missed that information:
man man
+
+To go to the desired printf
manpage, we have to look for the one in "section 3". Try to do it by yourself (maybe read the manpage of man
once again).
And don't forget to use man
in the future, whenever needed!
The aim of this first exercise is to continue setting up your environment to be able to properly code the project.
+In the coming project, we will make use of several libraries (as explained in the introduction). Let's try here to use the first one:
libssl
; this library offers cryptographic functionalities (see man ssl
); we will use it to calculate the hash ("SHA code") of some images.If you work on your own Linux (not on EPFL VMs), and you didn't install it yet, please install libssl-dev
:
sudo apt install libssl-dev libssl-doc
+
+In the provided
sub-directory you find a file sha.c
. First copy it to your done
and work there:
cd done
+cp ../provided/sha.c .
+
+To compile it, you need to add the ssl
can crypto
libraries. This is done by adding the -lssl
and -lcrypto
flags; e.g.:
gcc -std=c99 -o sha sha.c -lssl -lcrypto
+
+If everything is properly installed, the above compilation should succeed and you should have a ./sha
program in your done
sub-directory. This exec does not much for the moment as its main part is still missing. This is what you have to add now.
A "SHA code", or "SHA" (which stands for "Secure Hash Algorithm"), is a compact representation, almost certainly unique, and hardly invertible (reciprocal), of any data. More concretely:
+compact: whatever data, whatever their length, will be represented by the same amount of bits; it this project, we will use 256 bits ("SHA256");
+for example, the SHA256 of "hello" (without newline, nor quotes) is
+2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
;
almost certainly unique: different data will most probably have two different SHA; this is not guaranteed (there are "only" 1077 different SHA256), but highly probable: with 1035 different data, the probability to get the same SHA is 10-6; +when two different data have the same SHA, this is called a "collision";
+hardly invertible : from a SHA code, it's extremely difficult (= impossible in practice) to guess its corresponding data; one consequence of that is that a small variation in the data leads to a completely different SHA; for instant, the SHA256 for "hello!" is
+ce06092fb948d9ffac7d1a376e404b26b7575bcc11ee05a4615fef4fec3a308b
+(to be compared to the one of "hello" above) and the one of "hello\n" (i.e. with a newline) is
+5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
.
The provided code (sha.c
) compiles but does nothing really interesting. Actually there is no computation of the SHA256 of the input; no call to the SHA256()
function from the libssl
.
Have a look at how to use this function:
+man SHA256
+
+(if you have the manpages for this library installed on your computer; otherwise, read it online.
+Add, where indicated by "TODO
", a call to compute the SHA256 of the input string.
Example:
+If everything is properly done, you should get:
+Type a sentence: Hello world!
+The SHA256 code of
+"Hello world!
+"
+is:
+0ba904eae8773b70c75333db4de2f3ac45a8ad4ddba1b242f0b3cfc199391dd8
+
+You can also debug with the "hello" string given above (with the newline!).
+The source code of real-life applications written in C is often distributed over several text files called "source files", which are "glued together" by the compiler to create an executable program from the source code. This way of proceeding is called "modularization" and is detailed
+in those video lectures:
+the slides of which are on Moodle.
+Choose your favorite learning way (maybe benefit from both).
+We strongly recommend you go through this material before moving on.
+The objectives of this section is that
+Makefile
s;Makefile
;Makefile
s.We don't expect you to write your own Makefile
s from scratch, neither to master all the arcane details (while reading).
Makefile
sGo ahead with the above mentioned tutorial. +Follow (and understand) examples 1 and 2, then exercises 1 and 2, and then 3.
+The sub-directory bigprj
contains a "big" project for which we propose you to write its Makefile
.
IMPORTANT REMARK : the code provided in bigprj
is under copyrights and shall not be copied nor reused anywhere else, neither in total nor any piece of it.
+(It's furthermore quite bad code and is thus not at all a good example of good practice.)
+It's there only for you to learn Makefile
s by trying to write one to compile this project.
That code uses sub-directories and one C99 function (roundf()
).
To tell the compiler to search for header files in some sub-directories, add an -I
option per sub-directory. For example, when compiling machin.c
, to tell the compiler to search for a header file in the stuff
sub-directory, you would do :
gcc -c -I stuff machin.c -o machin.o
+
+(See also the CFLAGS
variable in the make
tutorial.
To compile to the C99 standard (or higher), pass the -std=c99
(or -std=c17
or -std=c2x
) option to the compiler (see the CFLAGS
variable).
Finally, to be able to use C99's roundf()
function, you need to link to the math library by adding -lm
.
(See also the LDLIBS
variable in the make
tutoriak.)
Notes:
+The provided code compiles with many "warnings". We do not ask you to fix these errors, but simply to write a Makefile
that produces a hand
executable.
You don't need to run it. If you have, you can simply quit it by typing Ctrl-C
.
In this last part, we'd like you to practice debugging of C code. This is very important to master debugging before going deeper into the project. Otherwise, without a good practice of debugging, you'll really loose lots of times.
+For those who took the CS-214 Software Construction class, also remember/review all the methodology, good practices, your learned there about debugging. Maybe have a refresh of that material first.
+To help you find faults in code (especially in your own code later on in the project; think about it!), there are several tools available:
+compiler options
+static code analysis
+dynamic memory analysis;
+and, of course, debuggers.
+The compiler is a great help when you know how to use it and interpret its messages.
+Its behavior, more or less verbose, can be modified using compiler options, the most useful of which are detailed here.
+In the same spirit (using the compiler to find errors), it can also be useful to use different compilers (with the options below) on the same code, as they don't necessarily detect the same things. On VMs, you have gcc
and clang
.
The first thing to do is to specify the standard used (as there are many non-standard "dialects"). This is done with the -std=
option. We recommend -std=c99
or -std=c17
. To stick strictly to the specified standard (and reject the associated "GNU dialect") add the -pedantic
option.
Then, it can be useful to let the compiler warn us with lots of the usual warning. This is done with the -Wall
option (like "all warnings", even if they're actually not all there ;-)`).
For even more warnings, add -Wextra
.
And here are a few more that we think are worth adding (you're free not to if you find them too fussy):
+-Wuninitialized
: warns of uninitialized variables;
-Wfloat-equal
: warns of equality tests on floating-point numbers;
-Wshadow
: warns if one name masks another (risk of scope problems);
-Wbad-function-cast
: warns of incorrect function return type conversion;
-Wcast-qual
: warns of pointed type conversion that removes a qualifier (typically const
);
-Wcast-align
: warns of pointed type conversions that do not respect memory word alignment;
-Wwrite-strings
: warns of (risk of) confusion between const char *
and char *
;
-Wconversion
: warns of implicit type conversion;
-Wunreachable-code
: warns of useless (unreachable) code;
-Wformat=2
: increases the level of format warnings (such as printf
and scan
) compared to -Wall
;
-Winit-self
: warns of recursive initialization (such as int i = 3 * i;
);
-Wstrict-prototypes
: warns of function declarations without arguments;
-Wmissing-declarations
: warns of functions defined but not prototyped; this can be useful for detecting the omission of a prototype in a .h
(or the omission of a #include
).
Finally, you can of course add other options if you feel they are useful. As usual, check out the "man pages" for more details.
+The static code analyzer is a tool that tries to find errors in code by "imagining" all possible execution paths. The scan-build
(and scan-view
) analyzer is available on VMs. It is used by simply adding scan-build
in front of the build command, e.g. :
scan-build make
+scan-build make cecicela
+scan-build gcc -o monexo monexo.c
+
+The easiest way is to try :
+scan-build gcc stats.c -lm
+
+This command tells you (at the very end) to look at its analysis using scan-view
, e.g. :
scan-view /tmp/scan-build-2024-01-17-175346-107146-1
+
+(but this file name changes every time).
+We'll let you have a look at what it found...
+See this tutorial for instructions on using the gdb
debugger.
This tutorial takes as its example the first program you'll have to hand in (stats.c
), but we encourage you to try your hand at the other codes too (ex1.c
and ex2.c
) and to go back and forth between this topic and the tutorial in question (rather than reading it linearly and then continuing with this topic).
To find an error efficiently, we suggest the following general tips (other more job-specific tips are also provided below):
+try to correct only one bug at a time;
+always start with the first error;
+isolate/identify the bug in a reproducible way: always retest with exactly the same values each time;
+apply the following methodology (it may seem trivial, but all too often we've seen students waste their time looking for bugs in the wrong place because one of the following 2 "dots" was not placed on the right side; often due to over-strong assumptions (wrong guesses) or wrong/too-fast deductions):
+always have 2 clear places (2 "dots") in your code:
+one place where you are absolutely sure that the bug has not yet occurred (e.g. the very beginning of the program);
+and another where you are absolutely sure that the bug has occurred (e.g. the point where the program crashes, or simply the end/beginning of the program);
+move (advance/reassemble) the most promising of these two points, being sure not to "cross over" to the other side of the bug; check this aspect ("not cross over") with certainty;
+at the end of this process (of dichotomous searching in fact), the two "points" will be exactly on the spot of the bug.
+if you're searching for bugs using display messages (printf()
) :
always put a \n
at the end of each message;
mark the beginning of each of your debugging messages with a clear identifier reserved only for this purpose (e.g. "####
"); this allows you :
to easily see these messages in the program output;
+find them easily in your code to edit/delete later;
+have a unique part in each message (e.g. "debug(1):", "debug(2):", "debug(3):", etc., or "here i=", "here j=", "here k=", etc.; you can of course combine);
+having this discipline with debugging messages may seem like a waste of time (especially when you're looking for the bug), but, believe me, it actually saves a lot of time in the end!
+Here are two exercises to help you get to grips with gdb
.
Look at the ex1.c
code to get an initial idea.
Then compile it for the debugger (either by hand or by making a small Makefile
).
Run it in the terminal to see what it produces.
+Then use the debugger to determine the values of d0, d1 and d17 :
+set one or more well-placed breakpoints
+try the commands :
+step
and next
;
continue
(abbreviated cont
), followed, or not, by a number;
print
and/or display
.
You can also try advance
and finish
.
NOTES:
+you can see the syntax and explanation of a command in gdb
by using help
followed by the command; e.g. :
help adv
+
+you can list all your breakpoints via :
+ info break
+
+Take a look at the ex2.c
code to get an initial idea. The aim of this code is to calculate the entropy of a given distribution by its frequencies (= integer counts).
Some examples (useful for debugging):
+the entropy without any count is 0, no matter how you enter it:
+0
+0 0 0
+0 0 0 0 0 0
+etc;
the entropy of any 1-value distribution is 0, however you enter it:
+0
+0 1 0
+0 12 0
+0 0 12
+0 0 0 33 0 0 0
+etc;
the entropy of any distribution with 2 equiprobable values is 1 bit, regardless of how it is entered:
+1 1
+0 1 1 0
+0 12 0 12
+etc;
the entropy of any distribution with 3 equiprobable values is 1.58496 bit, however you enter it;
+the entropy of the distribution :
+1 2 3 4 5
+is 2.14926 bit.
+The code provided contains several errors. Try to find them using the debugger: breakpoints, next, cont, display, etc.
+You can even start by running the code directly in gdb
, typing run
.
+then enter :
1 0
+
+and see what happens.
+To locate the error in the call stack, do :
+where
+
+To see the code :
+layout src
+
+To navigate the call stack :
+up
+down
+
+Give it a try...
+Note: there are four things to be corrected.
+All the above tools (compiler options, static code analysis, dynamic memory analysis (when you have pointers) and debugger) will help you to be more efficient in your project. We therefore ask you to start using them to correct the stats.c
code provided.
Try to fix entirely the stats.c
file provided, whose purpose is to calculate the mean and standard deviation (unbiased) of the age of a set of 1 to 1024 people (beware! it contains several errors, of different kinds; there are, however, no errors of a mathematical nature: the formulas are mathematically correct; but note however that the standard deviation of a population reduced to a single individual must be zero).
The first thing to do might be to complete your Makefile
so that it can produce stats
with information useful to the debugger (option -g
). You could also take the opportunity to turn on the compiler's warnings and look in detail at what it's telling you and, above all, understand why it's telling you.
Once the program has compiled, if possible without warning, here are 3 ways to go further in correcting the program:
+test values that are outside the expected limits, and see if the program reacts as you'd expect. For example: what happens if you enter a negative number of people? a negative age?
+calculate by hand the mean and standard deviation (following the provided formula!) of a small sample and compare them with the output of your program. If there are differences, use the debugger to find out where they come from;
+remember to test all the limiting cases of these formulas.
+When editing the sha.c
file, you may have noticed that it is commented (always comment your programs!), in a rather peculiar format ("what's with the @
?").
It's not just for show; it's also useful!
+Type :
+doxygen -g
+doxygen Doxyfile
+
+then view the file html/index.html
in your favorite browser.
+Click on "Files", then on "sha.c".
+Cool, isn't it?
Clean it up with the command
+rm -r latex html
+
+In future, remember to document your code with Doxygen-compatible comments.
+Examples will be provided, but if you want to know more, have a look at the Doxygen website.
+IMPORTANT REMARK: For the project, make your code anonymous: don't put any author's name, SCIPER, email, etc.
+ +The aim of this week is to:
+So start by reading the main project description file to understand the general framework of the project. Once you've done that, you can continue below.
+In your group's GitHub repository, you will find the following files in provided/src/
:
imgfs.h
: function prototypes for the operations described here;imgfscmd.c
: the core of your "Filesystem Manager", the command line interface (CLI) to handle imgFS
; it reads a command and calls the corresponding functions to manipulate the database;imgfs_tools.c
: the tool functions for imgFS
; for example to display the data structure;imgfscmd_functions.h
and imgfs_cmd.c
: prototypes and definitions of the functions used by the CLI;util.h
and util.c
: macros and miscellaneous functions; you do not need to use them (have a look to see if some may be useful);error.h
and error.c
: error code and messages;Makefile
containing useful rules and targets;provided/tests/{unit,end-to-end}/
;provided/tests/data
.To avoid any trouble, the contents of the provided/
directory must never be modified!
Start by copying the files you need from provided/src/
into the done/
directory at the root of the project and registering it in git (git add
); for instance:
cp provided/src/*.h provided/src/Makefile provided/src/imgfs*.c provided/src/util.c provided/src/error.c done
+git add done
+
+You'll proceed similarly in the next weeks, whenever you'll need new files from provided/src
.
The provided code does not compile; some work is still required, in the following steps (which are further detailed below):
+imgFS
;do_open()
and do_close()
;do_list()
;do_list_cmd()
.After reaching that point, the code should compile without errors. You will then have to test it.
+An example usage of the CLI (the name of which is imgfscmd
) is:
./imgfscmd list empty.imgfs
+
+where list
is a command provided to the CLI and empty.imgfs
is an argument for that command, here simply an ImgFS file (thus a file containing a whole filesystem).
Important Note: writing clean code, readable by everyone is very important. From experience, it seems that not everyone does this spontaneously at first ;-)
. There are tools that can help. For example, astyle
is a program designed to reformat source code to follow a standard (man astyle
for more details).
We provide you with a shortcuts (which uses astyle
): see the target style
in the provided Makefile (make style
to use it). We recommend you do a make style
before any (Git) commit.
The exact format of the header
and metadata
is given in the global project description. The types
struct imgfs_header
;struct img_metadata
;struct imgfs_file
;are to be defined in replacement of the "TODO WEEK 07: DEFINE YOUR STRUCTS HERE.
"" in imgfs.h
.
The second objective of this week is to process the arguments received from the command line. For modularization purposes, we will use function pointers.
+To achieve this, the signatures of the functions do_COMMAND_cmd()
(and help()
) are uniform:
int do_COMMAND_cmd(int argc, char* argv[])
+
+Those functions will handle the parsing of their respective additional arguments, while the main()
dispatches through them using the first CLI argument.
To process all the different commands, we would like to avoid an "if-then-else" approach. Indeed, this would make adding new commands (which will arrive in the following weeks) more difficult, since it would require to add new cases for each of them. It would also make the code much less readable.
+To avoid that, we put the various do_COMMAND_cmd()
(and help()
) functions in an array. We will take advantage of this to associate the names of the commands with their respective functions (e.g. the string "list"
with the do_list_cmd()
function), and then simply add a loop to the main()
function, to search for the received command among the list of possible commands -- for the moment, "list"
, "create"
, "help"
and "delete"
-- and call the corresponding function.
In imgfscmd.c
:
command
type, a pointer to functions such as those unified above;struct command_mapping
type containing a string (constant) and a command
.Then use these definitions to create an array named commands
associating the commands
+"list", "create", "help", and "delete" to the corresponding functions.
+Note: The "create"
, "help"
and "delete"
commands are not yet implemented, but you can already add them to the array.
Finally, complete the main()
using this array inside a loop. When the right command is found, simply call the function pointed to in the corresponding array entry, passing all the command line arguments.
For example, if you call the program
+./imgfscmd list imgfs_file
+
+then your code must call do_list_cmd()
with the following parameters: argc = 1
and argv = { "imgfs_file", NULL }
.
Your code must correctly handle the case where the command is not defined: in this case, simply call help()
and return ERR_INVALID_COMMAND
.
Your code can perfectly well assume that all commands in the commands
array are distinct.
do_open()
and do_close()
Now, we will implement the functions to open and close existing imgfs
files.
You need to write the definitions of do_open()
and do_close()
in the file imgfs_tools.c
.
The do_open()
function takes as arguments:
const char *
);const char *
, e.g. "rb"
, "rb+"
);imgfs_file
structure in which to store read data.The function must
+The function should return the value ERR_NONE
if all went well, and otherwise an appropriate error code in case of problems. You need to handle all possible error cases in this function, using the definitions in error.h
(see unit tests below).
+Note: to check the validity of a pointer given as parameter, you can use the macro M_REQUIRE_NON_NULL(ptr)
, which will make the function return ERR_INVALID_ARGUMENT
if ptr == NULL
(see util.h
).
The do_close()
function takes a single argument of structure type imgfs_file
and must close the file and free the metadata array. It returns no value. Here too, remember to handle the possible error case: if the file (FILE*
) is NULL
. This should be a reflex when you're writing code, especially when you're using a pointer. We won't mention it again.
do_list()
Then create a new file imgfs_list.c
to implement the do_list()
function. If output_mode
is STDOUT
, the purpose of do_list()
is first to print the contents of the "header" using the supplied print_header()
tool function, and then to print (examples below)
either
+<< empty imgFS >>
+
+if the database does not contain any images;
+or the metadata of all valid images (see print_metadata()
, provided in imgfs.h
).
The case output_mode == JSON
will be implemented later in the project; you may just call TO_BE_IMPLEMENTED()
in this case (see util.h
).
Warning: there may be "holes" in the metadata array: one or more invalid images may exists between two valid ones.
+do_list_cmd()
In order to be able to use the do_list()
function from the command line, implement the do_list_cmd()
function in imgfscmd_functions.c
, which receives the command line arguments as parameters (as explained before).
The first element of the array is the name of the file containing the database. After checking that the parameters are correct, open the database and display its contents, using the above functions.
+To make it easier to understand the various functions described above, a few examples are given here. These examples are +in the provided tests (see below).
+It's best to start testing your code on simple cases that you're familiar with.
+You can test your code with the supplied .imgfs
files: the command
./imgfscmd list ../provided/tests/data/empty.imgfs
+
+should display (exact file here):
+*****************************************
+********** IMGFS HEADER START ***********
+TYPE: EPFL ImgFS 2024
+VERSION: 0
+IMAGE COUNT: 0 MAX IMAGES: 10
+THUMBNAIL: 64 x 64 SMALL: 256 x 256
+*********** IMGFS HEADER END ************
+*****************************************
+<< empty imgFS >>
+
+while
+./imgfscmd list ../provided/tests/data/test02.imgfs
+
+should display (exact file here) :
+*****************************************
+********** IMGFS HEADER START ***********
+TYPE: EPFL ImgFS 2024
+VERSION: 2
+IMAGE COUNT: 2 MAX IMAGES: 100
+THUMBNAIL: 64 x 64 SMALL: 256 x 256
+*********** IMGFS HEADER END ************
+*****************************************
+IMAGE ID: pic1
+SHA: 66ac648b32a8268ed0b350b184cfa04c00c6236af3a2aa4411c01518f6061af8
+VALID: 1
+UNUSED: 0
+OFFSET ORIG.: 21664 SIZE ORIG.: 72876
+OFFSET THUMB.: 0 SIZE THUMB.: 0
+OFFSET SMALL: 0 SIZE SMALL: 0
+ORIGINAL: 1200 x 800
+*****************************************
+IMAGE ID: pic2
+SHA: 95962b09e0fc9716ee4c2a1cf173f9147758235360d7ac0a73dfa378858b8a10
+VALID: 1
+UNUSED: 0
+OFFSET ORIG.: 94540 SIZE ORIG.: 98119
+OFFSET THUMB.: 0 SIZE THUMB.: 0
+OFFSET SMALL: 0 SIZE SMALL: 0
+ORIGINAL: 1200 x 800
+*****************************************
+
+Note: you may compare your results by using:
+./imgfscmd list ../provided/tests/data/test02.imgfs > mon_res_02.txt
+diff -w ../provided/tests/data/list_out/test02.txt mon_res_02.txt
+
+More details: man diff
.
The provided test suites require several dependencies: Check and Robot Framework (and its own dependency, parse). On (your own) Ubuntu, you can install them with:
+sudo apt install check pip pkg-config
+
+then, depending on how you're used to work in Python, either as root or in your Python virtual environment (maybe to be created):
+pip install parse robotframework
+
+(Of course you'll have to run the tests in that Python venv, if that's your usual way to work with Python.)
+ON EPFL VMs, you have to setup a personnal Python virtual environment.
+If you already have one, activate it and install the two above mentioned packages (parse
and robotframework
).
It you don't, we recommand you create your personnal Python virtual environment in myfiles
:
cd ~/Desktop/myfile
+python -m venv mypyvenv
+cd mypyvenv
+cp -r lib lib64 ## this fixes the first warning
+cd ..
+python -m venv mypyvenv
+
+Ignore the (second) warnings.
+Then activate it:
+source mypyvenv/bin/activate
+
+and then install the required packages:
+pip install parse robotframework
+
+And you're done.
+The only thing you'll have to do next time you login and you want to run the "end to end" tests, is to activate your Python virtual environment:
+source ~/Desktop/myfiles/mypyvenv/bin/activate
+
+Of course, you can also add that to your ~/.bashrc
!
We provide you with a few tests to run against your code by using make check
, both unit tests (testing functions one by one) and end-to-end tests (testing the whole executable at once).
We strongly advise you to complete them by adding you own tests for edge cases; the imgFS
files are in provided/test/data
. You can check the unit tests in provided/test/unit
and the end-to-end ones in provided/test/end-to-end
to understand how to write your own.
+Note: Don't forget to never push modifications in the provided/
directory; instead move the test/
directory to done/
and update the TEST_DIR
variable in the Makefile
accordingly.
We also provide a make feedback
(make feedback-VM-CO
if you're working on EPFL VMs) which gives partial feedback on your work. This is normally used for a minimal final check of your work, before handing it in. It's better to run local tests directly on your machine beforehand (including more tests you've added yourself, if necessary).
The Docker image used by make feedback
will be tagged latest
every week, but if you want to run feedback for a specific week, change (in the Makefile
at the line that defines IMAGE
) this latest
tag to weekNN
where NN
is the desired week number, e.g.:
IMAGE=chappeli/cs202-feedback:week07
+
+It's up to you to organize the group work as best you can, according to your objectives and constraints; but remember to divide the task properly between the two members of the group. +If you haven't already read it in full, we recommend you read the end of the foreword page.
+You don't have to formally deliver your work for this first week of the project, as the first deliverable will only be due at the end of the week 10 (deadline: Sunday May 5th, 23:59), together with weeks 8 and 9 work.
+Having said that, we strongly advise you to mark with a commit when you think you've completed some part of the work and especially once you reached the end of this week (you can do other commits beforehand, of course!):
add the new imgfs_list.c
file to the done/
directory (of your group GitHub repository; i.e. corresponding to the project), along with your own tests if required:
git add imgfs_list.c
+
+also add the modified files (but NOT the .o
, nor the executables!): imgfs_tools.c
, imgfs.h
and maybe Makefile
:
git add -u
+
+check that everything is ok:
+git status
+
+or
+git status -uno
+
+to hide unwanted files, but be careful to not hide any required file!
+create the commit:
+git commit -m "final version week07"
+
+In fact, we strongly advise you to systematically make these regular commits, at least weekly, when your work is up and running. This will help you save your work and measure your progress.
+ +The aim of this project is to have you develop a large program in C on a "system" theme. The framework chosen this year is the construction of a command-line utility to manage images in a specific format file system, inspired by the one used by Facebook. For your information, Facebook's system is called "Haystack" and is described in the following paper: https://www.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf.) You are not required to read this paper as part of the course (it's just for information) because, obviously, we'll be implementing a simplified version of this system. All the basic concepts required for this project are introduced here in a simple way, assuming only standard "user" knowledge of a computer system.
+Social networks have to manage hundreds of millions of images. The usual file systems (such as the one used on your hard disk, for example) have efficiency problems with such large numbers of files. Furthermore, they aren't designed to handle the fact that we want to have each of these images in several resolutions, e.g. very small (icon), medium for a quick preview and normal size (original resolution).
+In the "Haystack" approach, several images are contained in a single file. What's more, different resolutions of the same image are stored automatically. This single file contains both data (the images) and metadata (information about each image). The key idea is that the image server has a copy of this metadata in memory, to enable very rapid access to a specific photo in the right resolution.
+This approach has a number of advantages: firstly, it reduces the number of files managed by the operating system; secondly, it elegantly implements two important aspects of image database management:
+This deduplication is done using a "hash" function, which summarize a binary content (an image in our case) into a signature much smaller. Here, we will use the "SHA-256" function, which produces a 256 bits signature, and has the useful property that it is collision resistant: it is almost impossible for two different contents to have the same signature. In this project, we will use the assumption that two images with the same signature are identical. Although it may seem surprising, many systems are based on this principle.
+You will build an image server, in a version inspired and simplified of Haystack. During the first weeks, it will consist of implementing the basic functions of the system, which are:
+During this first part, those functions will be exposed through a command line interface (CLI). Further on, you will build a true webserver to distribute the image over the network using the HTTP protocol.
+Here, we will describe the main concepts and structures you will need for this project. Their implementation details will be specified later in the weekly handouts.
+You will use a specific format -- let's call it "imgfs
" -- to represent an "image file system". A file of type imgfs
contains three distinct parts:
imgfs
creation;max_files
field of the header
; each of its entry describe the metadata of a single image, especially their position in the file;This file format will be used by the two tools that you will develop:
+The three parts explained above consists of the following data structures:
+struct imgfs_header
: the header with the configuration data:
name
: a string of at most MAX_IMGFS_NAME
characters, the name of the database;version
: a 32-bits unsigned int
; the version of the database, it is incremented after each insertion/deletion;nb_files
: a 32-bits unsigned int
; the current number of images in the system;max_files
: a 32-bits unsigned int
; the maximum number of images that the system can contain; this field is specified during the creation and must not be modified afterwards;resized_res
: an array of 2 times (NB_RES
- 1) elements, each of which is a 16-bits unsigned int
; the resolutions of the "thumbnail" and "small" images (in order: "thumbnail width", "thumbnail height", "small width", "small height"); this field is specified during the creation and must not be modified afterwards; the handling of the original resolution is explained below;unused_32
and unused_64
: two unsigned int
(of 32 and 64 bits); unused (but intended for future evolutions or temporary information - it is often useful to include fields of this type in large-scale projects; this allows old data structures to be used directly in newer versions of the software);struct img_metadata
: image metadata:
img_id
: a string of at most MAX_IMG_ID
characters, containing a unique identifier (name) for the image;
SHA
: an array of SHA256_DIGEST_LENGTH
unsigned char
; the image hash code, as explained above;
orig_res
: an array of two 32-bit unsigned int; the resolution of the original image;
size
: an array of 32-bit NB_RES
unsigned int
; memory sizes (in bytes) of images at different resolutions ("thumbnail", "small" and "original"; in this order, given by X_RES
indices defined in imgfs.h
);
offset
: an array of 64-bit NB_RES
unsigned int
; the positions in the "image database" file of images at the various possible resolutions (in the same order as for size; also use the X_RES
indices defined in imgfs.h
to access the elements of this array);
is_valid
: a 16-bit unsigned int
; indicates whether the image is in use (value NON_EMPTY
) or not (value EMPTY
);
unused_16
: a 16-bit unsigned int
; not used (but intended for future evolutions).
struct imgfs_file
:
file
: a FILE*
indicating the file containing everything (on disk);
header
: a struct imgfs_header
; the general information ("header") of the image database;
metadata
: a dynamic array of struct img_metadata
; the "metadata" of the images in the database.
header
and dynamically allocated to max_files
;is_valid
; there may therefore be "holes" in the metadata array, and unused parts in the file (since the images themselves are not deleted); the basic idea behind all this is to be prepared to lose a little space to save time;
+At a more complex level, we can imagine a "garbage collector" (or a "defrag") which, in parallel, when "there's time", effectively deletes images that are no longer in use, reorganizes metadata to reduce gaps, and so on.
+We won't go into such considerations in this project, but you may implement it as an extension.(To check, whatever the architecture, sizeof(struct img_metadata)
must give 216.)
This week's objective is to implement three features for our image management system:
+create
command, to create a new (empty) file in imgFS
format (= a new image database);delete
);help
command, a standard and essential element of any command line interface.One of the aims of this exercise is to learn how to write data structures to disk using basic I/O operations.
+As in previous weeks, you'll be writing your own code, modifying the elements provided.
+Except new tests, there is no new provided material.
+You will continue to modify the files used last week: imgfscmd.c
and imgfscmd_functions.c
.
This week's work consists of five modifications, summarized here and detailed below if necessary:
+in a new imgfs_create.c
file (to be created), implement the do_create()
function (prototyped in imgfs.h
), the purpose of which is to create a new image database in a (binary) file on disk;
complete the do_create_cmd()
function in the imgfscmd_functions.c
file in order to call do_create()
correctly;
implement the do_delete()
function (prototyped in imgfs.h
) in a new imgfs_delete.c
file; the do_delete()
function must "delete" a specified image (we'll see below what this really means);
complete the do_delete_cmd()
function in the imgfscmd_functions.c
file in order to call do_delete()
correctly;
define the help()
function, which will print instructions for using the imgfscmd
command line interface (CLI).
do_create()
.do_create()
must create a new database for the imgfs
format. It receives the name of the database file, and a partially filled imgfs_file
structure, containing only, in the header, max_files
and resized_res
.
This function should finish initializing the received imgfs_file
structure before writing it to disk, first the header, then the metadata. It must use standard C input/output functions to create the new image base in a binary file on disk. If the file already exists, it is simply overwritten (without message nor error).
It is important to initialize all relevant elements explicitly before writing. And, of course, it's essential to write the right-sized array of metadata
in the file.
+Note: the database name must be set by do_create()
from the provided constant CAT_TXT
.
It is also important to handle all possible errors. In the absence of an error, do_create()
should return ERR_NONE
; in the event of an error, it returns the corresponding value code as defined in error.h
.
As the create
command is only used once (to create a database) and always from the command line utility imgfscmd
(it will never be launched from a Web server, for example), we are exceptionally going to add a side effect in the form of a display indicating the (true) number of objects saved on disk.
+For example, with one header then ten metadatas, we'll have the following display:
11 item(s) written
+
+11
because the header and then each of the ten metadatas have been successfully written by fwrite()
.
do_create_cmd()
.We have provided you with an incomplete implementation of do_create_cmd()
. As part of your solution, you need to create an imgfs_file
, initialize the max_files
and resized_res
fields of its header with the values provided, then call do_create()
(which will initialize the other fields).
create
command argumentsThe main role of do_create_cmd()
is to correctly parse all of its arguments, both mandatory and optional.
Your solution should have the following structure:
+start by retrieving the mandatory argument (<imgFS_filename>
)
iterate on argv
;
at each iteration, first determine whether it's an acceptable optional argument (-max_files
, -thumb_res
or -small_res
; see also the help
text below);
if so, check if there are still enough parameters for the corresponding values (at least one for -max_files
and at least 2 for the other two); if not, return ERR_NOT_ENOUGH_ARGUMENTS
;
then convert the next parameter(s) to the correct type; check that the value is correct (neither zero nor too large); if not, return either ERR_MAX_FILES
(for -max_files
), or ERR_RESOLUTIONS
;
+note that util.c
, already supplied in the past, offers two tool functions (atouint16()
and atouint32()
) for converting a character string containing a number into its uint16
or uint32
value; we encourage you to use these two functions to convert character strings in command line arguments; they handle the various error cases in the event of converting an invalid number, or a number too large for the specified type (e.g., trying to convert 1000000 to a 16-bit number); they return 0 in these cases; use them to implement your code correctly;
if not an optional argument, return error ERR_INVALID_ARGUMENT
.
Please note:
+optional arguments may be repeated, e.g. -max_files 1000 -max_files 1291
; in this case, only the last value is valid;
the mandatory argument cannot be repeated.
+do_delete()
.We here describe how to implement the functionality for deleting an image. The idea is as follows: we don't actually delete the contents of the image, as this would be too costly (especially in terms of time). In fact, the size of the image base file on disk never decreases, even when you ask to "delete" an image from the base.
+Rather, an image is "deleted" by
EMPTY
in is_valid
;Changes must be made first to the metadata (memory, then disk), then to the header if successful.
+Note: for reasons of compatibility between systems, it is preferable to rewrite the entire "struct
" to disk, rather than just the modified fields.
The do_delete()
function takes the following arguments:
const char *
);imgfs_file
structure.To write the changes to disk, you first need to set the position at the right place in the file, using fseek()
(see the course and man fseek
) and then fwrite()
.
Of course, if the reference in the image database does not exist (and there is no invalidation), this must be handled correctly.
+Don't forget to update the header if the operation is successful. You also need to increase the version number (imgfs_version
) by 1, adjust the number of valid images stored (nb_files
) and write the header to disk.
do_delete_cmd()
Complete the code for do_delete_cmd()
. If the received imgID
is empty or its length is greater than MAX_IMG_ID
, do_delete_cmd()
should return the error ERR_INVALID_IMGID
(defined in error.h
).
help()
.The help
command is intended to be used in two different cases (already covered):
imgfscmd help
.The command output must have exactly the following format:
+imgfscmd [COMMAND] [ARGUMENTS]
+ help: displays this help.
+ list <imgFS_filename>: list imgFS content.
+ create <imgFS_filename> [options]: create a new imgFS.
+ options are:
+ -max_files <MAX_FILES>: maximum number of files.
+ default value is 128
+ maximum value is 4294967295
+ -thumb_res <X_RES> <Y_RES>: resolution for thumbnail images.
+ default value is 64x64
+ maximum value is 128x128
+ -small_res <X_RES> <Y_RES>: resolution for small images.
+ default value is 256x256
+ maximum value is 512x512
+ delete <imgFS_filename> <imgID>: delete image imgID from imgFS.
+
+Write the function in imgfscmd_functions.c
.
It's best to start testing your code on a simple case you're familiar with.
+Use a copy of the provided/tests/data/test02.imgfs
file from previous weeks (we insist: make a copy!!) to see its contents, delete one or two image(s). Check each time by looking at the result with list
.
Also test any edge cases you can think of.
+Test your two new commands (use help
to find out how to use create
;-P ).
To check that the binary file has been correctly written to disk, use last week's list
command.
We provide you with a bunch of unit and end-to-end tests, you can run them as usual.
+If you're on your own VM, please install libvips-dev
, e.g.:
sudo apt install libvips-dev
+
+As we move forward with the project, it is important that you can write your own tests, to complete the provided ones. You can find those in provided/tests/unit/
. Before adding new tests, don't forget to copy the test/
directory in done/
. You will also need to modify the TEST_DIR
variable in the Makefile
.
We strongly advise you to edit these files to add your own tests, or even to create new ones as you move forward. This can be done quite simply by adding your own values or lines of code to the tests already provided, or by copying this file and drawing inspiration from it (don't forget to update the tests' Makefile
accordingly). You don't need to understand everything in this file, at least not initially, but it is important you start to get familiar with its content.
That said, for those who want to go further, the main test functions available in the environment we use (Check) are described over there: https://libcheck.github.io/check/doc/check_html/check_4.html#Convenience-Test-Functions. For example, to test whether two int
are equal, use the ck_assert_int_eq
macro: ck_assert_int_eq(a, b)
.
We have also defined the following "functions" in tests.h
:
ck_assert_err(int actual_error, int expected_error)
: assert that actual_error
is expected_error
;ck_assert_err_none(int error)
: assert that error
is ERR_NONE
;ck_assert_invalid_arg(int error)
: assert that error
is ERR_INVALID_ARGUMENT
(i.e. correspond to the return code of a function which received a invalid argument; see error.h
) ;ck_assert_ptr_nonnull(void* ptr)
: assert that ptr
is not NULL
;ck_assert_ptr_null(void* ptr)
: assert that ptr
is NULL
.Finally, we'd like to remind you that just because 100% of the tests provided here pass doesn't mean you'll get 100% of the points. Firstly, because these tests may not be exhaustive (it's also part of a programmer's job to think about tests), but also and above all (as indicated on the page explaining the project grading scale, because we attach great importance to the quality of your code, which will therefore be evaluated by a human review (and not blindly by a machine).
+ +This week consists of two distinct objectives (remember to divide up the work):
+read
and insert
) which will be finalized next week;Notice also that the work up to this week (included, i.e. weeks 7, 8 and 9) is the first of the two deliverables that will be evaluated for this project. More details in the foreword.
+So don't forget to submit it before the deadline. Submission procedure is indicated at the end of this handout.
This week we provide you new tests as usual, as well as the script used to submit your first version of the project.
+One of the aims of this project course is to learn how to incorporate complex external libraries into your own work. In our case, we will make use of the VIPS library, for compressing images.
+First, you need to update your Makefile
to include the library in the compilation, by adding the following lines:
# Add options for the compiler to include the library's headers
+CFLAGS += $(shell pkg-config vips --cflags)
+
+# Add the library to the linker
+LDLIBS += $(shell pkg-config vips --libs)
+
+Then, you need to
+VIPS_INIT()
at the start of your main()
function, and give it argv[0]
as parameter;vips_shutdown()
at the end of the execution.To help you, please take a look at the online documentation of this library. You will need to use the following functions:
+vips_jpegload_buffer()
vips_jpegsave_buffer()
vips_thumbnail_image()
g_object_unref()
: equivalent of free()
for all VipsObject*
. To convert a VipsSOMETHING*
to a VipsObject*
, use the VIPS_OBJECT()
functional macro.Be aware that the first three functions take a variable number of parameters, thus you must terminate the parameter list by passing a NULL
pointer.
We stress that it's a significant part of your work this week to understand how to use this library.
+Note: You must be very careful when managing allocated memory and using VIPS at the same time. VIPS executes some operations lazily, i.e. they are deferred to the last moment. This means that, even if it does seem that you won't need an object anymore, it may actually still be needed to complete operations later on.
One of the main functions of imgFS is to transparently and efficiently manage the different resolutions of the same image (as a reminder: in this project, we'll have the original resolution plus the "small" and "thumbnail" resolutions).
As a first step this week, you'll need to implement a function called lazily_resize(). Its name suggests its usage: in computing, "lazy" refers to a commonly used strategy of deferring work until the last moment, thus avoiding unnecessary work.
(Teacher's note: don't confuse "computer science" with "studies in computer science" ;-)).
This function has three arguments:
- a resolution code: THUMB_RES or SMALL_RES (see imgfs.h); if ORIG_RES is passed, the function simply does nothing and returns no error (ERR_NONE);
- a pointer to an imgfs_file structure (the one we're working with);
- a size_t, the position/index of the image to be processed.

It must implement the following logic:
- check the validity of the arguments (see error.h and error.c);
- do nothing if the image already exists in the requested resolution;
- otherwise, create the new image variant (resized_res field) for the requested resolution; this is already the case when using vips_thumbnail_image() with the simplest (= almost none) options;
- copy the content of this new variant to the end of the imgFS file;
- update the metadata, both in memory and on disk.

To create the new image variant, you'll use the VIPS library introduced above.
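The core of that work could look like the following sketch (assumptions: buffer/len hold the original JPEG data already read from the imgFS file, and res is the requested width; error handling and the write-back to the file are omitted):

VipsImage* in = NULL;
VipsImage* out = NULL;
void* out_buf = NULL;
size_t out_len = 0;

/* decode the original JPEG, resize it, then re-encode it */
if (vips_jpegload_buffer(buffer, len, &in, NULL)  ||
    vips_thumbnail_image(in, &out, res, NULL)     ||
    vips_jpegsave_buffer(out, &out_buf, &out_len, NULL)) {
    /* a VIPS error occurred; clean up and report it */
}

g_object_unref(VIPS_OBJECT(out));
g_object_unref(VIPS_OBJECT(in));
/* out_buf now holds the resized JPEG; write it to the imgFS file,
 * then release it with g_free(). */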
Your solution should consist of:
- a new image_content.c file implementing the lazily_resize() function;
- an updated Makefile (see above).

The second component of the week concerns the de-duplication of images, to avoid the same image (same content) being present several times in the database. For a social network, this type of optimization saves a lot of space (and time).
To do this, you need to write a do_name_and_content_dedup() function, to be defined in a new image_dedup.c file (and prototyped in image_dedup.h).
This function returns an error code (int) and takes two arguments (in this order):
- a previously opened imgFS file;
- an index (of type uint32_t here) which specifies the position of a given image in the metadata array.
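Its prototype is thus presumably something like the following sketch (the exact parameter names are up to you):

int do_name_and_content_dedup(struct imgfs_file* imgfs_file, uint32_t index);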
In the image_dedup.c file, implement this function as follows.
For all valid images in the imgfs_file (other than the one at position index, and in ascending positions):
- if the name (img_id) of the image is identical to that of the image at position index, return ERR_DUPLICATE_ID; this is to ensure that the image database does not contain two images with the same internal identifier;
- (then,) if the SHA value of the image is identical to that of the image at position index, we can avoid duplicating the image at position index (for all its resolutions).
To de-duplicate, you need to modify the metadata at the index position so that it references the attributes of the copy found (its three offsets and sizes; note that the original size is necessarily the same).
Note: don't modify the name (img_id) of the image at the index position: only the contents are de-duplicated; you'll have two images with different names, but pointing to the same contents.
This is, by the way, a good illustration of how indirection tables are used in file systems.
If the image at position index has no duplicate content, set its ORIG_RES offset to 0.
If the image at position index has no duplicate name (img_id), return ERR_NONE.
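One possible outline of the whole function (a sketch only: field names such as header.max_files or metadata[i] follow our reading of imgfs.h and must be adapted to the actual structures):

int do_name_and_content_dedup(struct imgfs_file* imgfs_file, uint32_t index)
{
    /* 1. validate the arguments (non-NULL pointer, index within bounds) */
    for (uint32_t i = 0; i < imgfs_file->header.max_files; i++) {
        if (i == index) continue;        /* skip the image itself        */
        /* 2. skip empty/invalid metadata entries                        */
        /* 3. same img_id as the image at `index`? -> ERR_DUPLICATE_ID   */
        /* 4. same SHA? -> copy the offsets and sizes of entry i into
         *    the metadata at `index` (contents are shared, not copied)  */
    }
    /* 5. no duplicate content found -> set the ORIG_RES offset to 0     */
    return ERR_NONE;
}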
As always, we provide you with a few tests, to run with make check. We strongly advise you to write your own tests to complement those. Once you have finished your testing, you can also use make feedback.
As mentioned in the introduction, this week's work, together with the work of weeks 7 and 8, constitutes the first submission of the project.
The deadline for this assignment is Sunday May 05, 23:59; make sure you don't fall behind schedule, and properly divide up the work between you.
The easiest way to submit is to do

make submit1

from your done/ directory. This simply adds a project01_1 tag to your commit (in the main branch).
Although you can do make submit1 as many times as you want, we really recommend doing it only when you are sure you want to deliver your work.
The goal of this tutorial is to teach you how to use tools for debugging memory aspects (dynamic ones, i.e. at "run time"). But first of all, don't forget to use the other tools already presented for debugging: the compiler options, the static analyzer and gdb.
In this tutorial, we are going to present Address Sanitizer (a.k.a. "ASAN") and Valgrind.
But for that, we will need some memory bugs. Download here a program containing an anthology of pointer errors:
Start by looking at the provided program and understanding how it works and what its (indicated) errors are.
Before using new tools, try to compile and then statically analyze the provided code.
With these tools (compiler options and scan-build), you should easily find errors 1 and 2 above.
Leave them alone for the moment.
Address Sanitizer (a.k.a. "ASAN") is a tool, relying on the compiler, for analyzing faulty memory accesses. To use it, you have to add the option

-fsanitize=address

to the compiler.
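For instance (assuming, as in the outputs below, that the source file is called complexe.c):

gcc -g -fsanitize=address -o complexe complexe.c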
Compile with this option (together with -g, in any case, and any other options you wish), then run the program. You should get something like:
3-2i
0
-5+i
AddressSanitizer:DEADLYSIGNAL
=================================================================
==165699==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55bf9afe15e8 bp 0x7ffd6a193c50 sp 0x7ffd6a193c30 T0)
==165699==The signal is caused by a READ memory access.
==165699==Hint: address points to the zero page.
 #0 0x55bf9afe15e7 in affiche complexe.c:76
 #1 0x55bf9afe13ab in main complexe.c:38
 #2 0x7f8874e2cbba in __libc_start_main ../csu/libc-start.c:308
 #3 0x55bf9afe1159 in _start (complexe+0x1159)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV complexe.c:76 in affiche
==165699==ABORTING
This tells you that there is a "Segmentation Fault" (SEGV) in affiche() at line 76, and that this function was called from line 38 of main().
Do you see what this is about?
We will come back to it later, but let's first have a look at the other tool.
Valgrind is a suite of dynamic code-analysis tools using a virtual machine and just-in-time (JIT) compilation.
You use it by simply putting valgrind in front of the name of the program to run. For this:
- delete the previously compiled executable (since we are not going to use Valgrind and ASAN at the same time!):

  rm complexe

- recompile, but WITHOUT the -fsanitize=address option (do keep at least the -g option, though);
- run:

  valgrind ./complexe

You should get something like:
==165821== Memcheck, a memory error detector
==165821== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==165821== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==165821== Command: ./complexe
==165821==
3-2i
==165821== Use of uninitialised value of size 8
==165821== at 0x109336: affiche (complexe.c:76)
==165821== by 0x1091BE: main (complexe.c:38)
==165821==
==165821== Use of uninitialised value of size 8
==165821== at 0x109358: affiche (complexe.c:76)
==165821== by 0x1091BE: main (complexe.c:38)
==165821==
[... possibly repeated several times, depending on your machine ...]
0 // or some other value
-5+i
==165821== Invalid read of size 8
==165821== at 0x109336: affiche (complexe.c:76)
==165821== by 0x109207: main (complexe.c:44)
==165821== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==165821==
==165821==
==165821== Process terminating with default action of signal 11 (SIGSEGV)
==165821== Access not within mapped region at address 0x0
==165821== at 0x109336: affiche (complexe.c:76)
==165821== by 0x109207: main (complexe.c:44)
==165821== If you believe this happened as a result of a stack
==165821== overflow in your program's main thread (unlikely but
==165821== possible), you can try to increase the size of the
==165821== main thread stack using the --main-stacksize= flag.
==165821== The main thread stack size used in this run was 8388608.
==165821==
==165821== HEAP SUMMARY:
==165821== in use at exit: 0 bytes in 0 blocks
==165821== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==165821==
==165821== All heap blocks were freed -- no leaks are possible
==165821==
==165821== Use --track-origins=yes to see where uninitialised values come from
==165821== For lists of detected and suppressed errors, rerun with: -s
==165821== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
Segmentation fault
We can see more things here between the display of a (3-2i) and the two following displays, and even something else before the final crash. What is all this?

==165821== Use of uninitialised value of size 8
==165821== at 0x109336: affiche (complexe.c:76)
==165821== by 0x1091BE: main (complexe.c:38)

tells you that, in the call to the affiche() function made at line 38 of main(), you are using an uninitialized value.
It even tells you so at least twice in a row. Why?
Simply because (1) the pointer p_b is not initialized (first error) and (2) the pointed-to value (at some arbitrary address) has not been initialized either (second error). Then, depending on this uninitialized value, several more lines of affiche() are executed (or not), producing as many error messages.
Finally, the:

==165821== Invalid read of size 8
==165821== at 0x109336: affiche (complexe.c:76)
==165821== by 0x109207: main (complexe.c:44)
==165821== Address 0x0 is not stack'd, malloc'd or (recently) free'd

just before the crash tells you that you are reading 8 invalid bytes (size 8, i.e. 64 bits) during the affiche() call at line 44 of main(). This is the same thing we had already seen with the compiler options, the static analysis, and also ASAN (it's such a big error that everybody sees it! ;-)).
It is now time to fix this program.
I advise you to always start by fixing the errors detected with the simplest tools first.
Normally, if you have followed the advice of the previous weeks, you should be compiling with enough options to easily find the error of returning the address of a local variable.
Fix it (simply delete the bad_addition() function and its call), then recompile. This should now compile without any (major) warning (those of you who use -Wcast-qual: do not use that option here).
Use the static analyzer (scan-build; review, if necessary, the other tools presented for debugging) to find another error. Fix it (e.g. by deleting line 37).
Run the static analyzer again. It finds another one!
Fix it as well (e.g. by moving the free line).
Run the static analyzer once more. It finds yet another one!!
Fix it too (delete the line).
Run the static analyzer yet again. It still manages to find two more!!!!
Fix them too (delete line 42 and add a free at the end).
Run the static analyzer one last time. There you go, it passes!
Assessment at this stage: 6 errors out of 7 found.
Compile with ASAN added and run the program.
It finds the buffer overflow:

==178554==ERROR: AddressSanitizer: heap-buffer-overflow on address [...]
WRITE of size 16 at [...]
 #0 0x556e52e43592 in main complexe.c:56
[...]
Leave it for the moment and let's see what valgrind says.
Delete the executable and recompile it without ASAN; then run it with valgrind.
It finds it too:

==178672== Invalid write of size 8
==178672== at 0x1092D3: main (complexe.c:56)
==178672== Address 0x4a39540 is 0 bytes after a block of size 32 alloc'd
==178672== at 0x4838B65: calloc (vg_replace_malloc.c:762)
==178672== by 0x109290: main (complexe.c:53)
==178672==
==178672== Invalid write of size 8
==178672== at 0x1092D6: main (complexe.c:56)
==178672== Address 0x4a39548 is 8 bytes after a block of size 32 alloc'd
==178672== at 0x4838B65: calloc (vg_replace_malloc.c:762)
==178672== by 0x109290: main (complexe.c:53)

It is the same error as the one pointed out by ASAN, but valgrind sees it as two 8-byte writes, whereas ASAN reported it as a single 16-byte write. It is a matter of point of view (the two fields of the Complexe, or the whole Complexe itself).
Fix the error and test again with ASAN and with valgrind.
So, ASAN or Valgrind? It is a matter of taste; it's up to you to decide with practice.
Are there errors that one sees and the other doesn't? Personally, I have no idea. And I use both, just to be sure ;-)
These tools can also detect memory leaks (which the static analyzer would have missed). For example (delete the free calls you had added):

valgrind --leak-check=full ./complexe

(ASAN should not need any additional option for this. If that is not the case on your machine, do:

export ASAN_OPTIONS=detect_leaks=1

)
Docker is an alternative to virtual machines. It is a software architecture that lets you run code (including system software) locally on your machine, but in an isolated environment (called a "container"). To learn more, see the Wikipedia page.
Docker relies on two basic concepts:
- the image: a frozen "template" (filesystem and software) from which containers are created;
- the container: a (runnable) instance created from an image.

It is important to understand the distinction between the two. In particular, modifications made inside a container will not affect its image (unlike what happens with a virtual machine, for example).
To see all the images available on your machine (once Docker is installed, see below):

docker images

To see all the containers on your machine:

docker ps -a
Installing Docker on your machine is normally quite easy. See their installation page for more details.
Check that Docker works:

docker run hello-world

If successful, you will see a "Hello from Docker" confirmation message.
However, some installations require additional privileges to run Docker.
If you get an error message, follow the instructions here and change the permissions of the socket if necessary.
For an Ubuntu host, the procedure is:

sudo groupadd docker
sudo usermod -aG docker ${USER}
sudo chmod 666 /var/run/docker.sock
If you wish to develop inside Docker (but this is not what we recommend as a first choice, in particular for those of you who do not code on the command line (vim) but rather use a graphical interface), you can create your own working image.
If you will use Docker only to receive the course feedback, there is no need to build a specific image (we will provide our own image for the tests).
Docker already offers several images on its "hub". The simplest way to create a development image for this course is to start from an Ubuntu image:

docker pull ubuntu

Then launch the image (= create a new container; see the very bottom of this page for a reminder of the main commands):

docker run -ti ubuntu bash

In case the image is not up to date (these commands are to be run in the shell of the Ubuntu container):

apt update
apt upgrade -y

Then install the tools needed for the course (this command is to be run in the shell of the Ubuntu container):

apt install build-essential clang check wdiff colordiff git openssh-client manpages manpages-dev doxygen curl
apt install libssl-dev libssl-doc libcurl4-openssl-dev libjson-c-dev
Exit the container:

exit

Create a new image from the new state of your container:
- look for the id of the container:

  docker ps -a

- create the image:

  docker commit CONTAINER_ID projet-cs212

  replacing CONTAINER_ID with the right id; for example:

  docker commit 55959d62b348 projet-cs212

- delete the container:

  docker rm CONTAINER_ID
You can now launch your new image and, for example, compile your project in it.
For this, we advise you to "mount" the directory containing your source code onto the container, with the -v option of docker run.
For example, if you are on a Unix machine, go to the directory of your source code and do:

docker run -ti --rm -v $(pwd):/localhost projet-cs212

Another example:

docker run -ti --rm -v /home/chezmoi/projet:/localhost projet-cs212

Inside the container, you will then have access to your files via /localhost. For example:

ls /localhost

You can then compile your project there. For example:

cd /localhost
make

Exit the container with:

exit

(or simply CTRL-D).
- get help:

  docker help COMMAND

  for example:

  docker help ps

- list all containers:

  docker ps -a

- delete a container:

  docker rm CONTAINER_ID

- delete all containers:

  docker rm $(docker ps -aq)

- list all images:

  docker images

- delete an image (which no longer has any container):

  docker rmi IMAGE_ID

- create a container in interactive mode, deleting it automatically at the end (note that docker run takes an image, not a container, as its argument):

  docker run -ti --rm IMAGE
  docker run -ti --rm IMAGE COMMAND

  for example:

  docker run -ti --rm ubuntu bash

- create a container in interactive mode without deleting it automatically:

  docker run -ti IMAGE

- create a container in interactive mode while "mounting" the local file system (i.e. having, inside the container, access to some part of the local disk):

  docker run -ti --rm -v local_dirname:container_dirname IMAGE

  for example:

  docker run -ti --rm -v /home/machin:/tmp/home_machin_local IMAGE

  Note: the file/directory names must be absolute (not relative).

- restart a paused container:

  docker start CONTAINER_ID

- copy files between the local machine and a container:

  from the local machine to the container:

  docker cp local_filename CONTAINER_ID:where_to_put

  from the container to the local machine:

  docker cp CONTAINER_ID:where_to_get local_filename

- create a new image from the state of a container:

  docker commit CONTAINER_ID IMAGE_NAME
A debugger is a program that lets you follow the execution of another program, stop it, inspect the state of the memory (the value of variables, for example), etc.; this is particularly useful for hunting programming errors.
We explain here the basics of using a debugger with the command-line debugger gdb, but you can of course use versions with a graphical interface, often integrated into IDEs; the basic principles remain the same. In the CO rooms, you have for example ddd, or the debugger plugin integrated into Geany, for which you can find a tutorial over there (beware! it belongs to another course); for other GUIs, see this link, among which we recommend gdbgui (official site; GitHub site).
You can also use another debugger, such as lldb; there too, the basic principles remain the same. The correspondence between gdb and lldb commands can be found here.
NOTE for macOS: since OS X 10.9, Apple has switched to LLVM; there is thus no gdb by default anymore. If you are on a Mac, you then have two options:
- either use lldb;
- or install gdb (via brew) and sign it;
  OS X has an access-control mechanism for other processes which requires a signed binary (something a debugger needs);
  to sign the gdb binary after its installation, follow the instructions that can be found on the Internet (note, for example, that the "gdb-entitlement.xml" file is necessary for Macs running the Big Sur operating system).
. La plupart des notions évoquées dans cette vidéo sont ensuite reprise pas à pas dans la suite de ce tutoriel.
Quelques remarques pour vous faciliter la comprehension de ce tutoriel vidéo :
+À 1:08, Chris dit que le débogueur, dans son exemple, se lance avec la commande suivante : gdb a.out
. Il faut remarquer que, pour vous, a.out
devrait être remplacé par le nom de votre programme (executable).
À 1:14, il faut (temporairement) ignorer l'histoire d'arguments du programme. Vous verrez ca plus tard dans le cours.
+À 2:34, il lance le programme avec la commande ./a.out
. Ici aussi, utilisez le nom de votre programme (executable) à la place de a.out
.
Jusqu'à 16:17, tout devrait être assez clair (sauf les arguments du programme en 1:14, comme dit ci-dessus). À partir de là, il utilise des notions de C non encore vues en cours. Vous pouvez donc arrêter ici cette vidéo et y revenir plus tard, ou continuer à la regarder pour voir comment utiliser gdb
mais sans chercher à comprendre en profondeur les problèmes de C qui sont évoqués :
Vous êtes maintenant prêts à lire et suivre les instructions et explications ci-dessous.
+La première chose à faire pour pouvoir utiliser un débogueur est de demander au compilateur de mettre des informations supplémentaires dans le programme afin de permettre au débogueur de se repérer. Cela se fait en ajoutant l'option -g
lors de la compilation. Par exemple :
gcc -g -o mon_programme mon_programme.c
+
+Compilez de la sorte un des programmes fournis ; p.ex. :
+gcc -g -std=c99 -o ex1 ex1.c
+
+ou
+gcc -g -std=c99 -o stats stats.c -lm
+
+NOTE : nous utiliserons pour ce cours la norme C99 ; pour certains compilateurs, la compilation peut alors nécessiter l'ajout de l'option -std=c99
comme indiqué ci-dessus. Vous pouvez bien sûr aussi utiliser des normes plus récentes (p.ex. -std=c17
).
Ensuite, on peut exécuter le programme dans le débogueur. On lance pour cela le débogueur avec comme argument le programme à déboguer ; p.ex. :
+gdb ./ex1
+
+ou
+gdb ./stats
+
You end up in the debugger (which is a command interpreter), in which you don't see much for the moment. To see the code, type

layout src

The code is not displayed yet, because gdb has not launched our program yet.
Simply start its execution with the command:

run

The program then runs normally (you may already notice one or two bugs ;-). Type Ctrl-C to stop it when you have had enough).
Type

quit

to leave the debugger.
If you have not already done so, open the code of stats.c in an editor to see what it is about.
The purpose of this program is to compute the mean and the (unbiased) standard deviation of the ages of a set of 1 to 1024 people.
At the beginning of the program, you can see a variable nb_people, which is read from the keyboard at line 22. Let's use the debugger to have a look at the value read.
For this, launch the debugger again on our program:

gdb ./stats

then

layout src

But this time, let's add a "breakpoint" before starting the execution. This is done with the break command:

break 22

NOTE: to learn more about this command, you can type:

help break

You will then see that you can give not only line numbers, but also function names (among other things).
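For example, to stop at the entry of the main() function, you could have typed:

break main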
You can, by the way, set as many breakpoints as you want.
Once the breakpoint is in place, start the execution:

run
This time, the debugger stops the execution of the program at line 22, and tells you so.
At this point, you can give the debugger commands, such as: look at the value of a variable, advance the execution of the program by one step, continue the execution, or add another breakpoint.
Let's start by looking at the value of the variable nb_people:

print nb_people

displays the result:

$1 = 0

($1 simply means that this is the first expression you asked for that is displayed here).
NOTE: all gdb commands can be abbreviated, as long as they are not ambiguous. Here, we could thus simply have entered:

p nb_people

Note also that you have automatic completion with the TAB key. Try it:

p nb_<TAB>

It nevertheless remains tedious to always have to retype useful print commands. There are two ways to avoid this:
- gdb keeps all commands in its history; you can simply navigate through that history with the (Up and Down) arrow keys to recall a command already entered;
- the display command automatically displays the requested expression at each stop of the debugger (provided the expression makes sense at the stopping point).

Let's try the display command (we will see its effect better in a moment):

display nb_people
+Si vous ne savez plus où vous en êtes dans le programme, la commande :
where
+
+vous l'indiquera (ici : dans la fonction main()
à la ligne 22 du programme stats.c
).
NOTE : where
est en fait un alias pour backtrace
ou bt
, qui sont aussi souvent utilisés.
Pour avancer d'un pas, tapez :
+next
+
+Le débogueur exécute alors le scanf
. C'est pour cela que vous avez le texte de la question qui apparaît.
+Répondez-y.
Le débogueur vous indique alors s'être arrêté à la ligne 26 (vu qu'il n'y a pas de code aux lignes 23 à 25).
+La commande next
n'exécute en effet qu'une seule ligne du programme.
+Si l'on avait voulu continuer l'exécution sans ne plus s'arrêter (en fait : continuer jusqu'au prochain point d'arrêt, mais comme nous n'en avons pas d'autre...), on aurait utilisé la commande (ATTENTION ! NE le faites PAS ici) :
cont
+
+Vous pouvez également remarquer qu'en plus de la ligne 26, le débogueur vous a affiché la nouvelle valeur (celle saisie) de la variable nb_people
. C'est le résultat de votre display
précédent. Sans cette commande display
, la nouvelle valeur n'aurait pas été affichée et il vous aurait fallu entrer un nouveau print
pour la voir.
REMARQUES :
+next
peut s'abréger n
;
si l'on entre aucune commande, c'est simplement la commande précédente qui s'applique à nouveau ; cela est particulièrement pratique avec next
: il suffit d'appuyer ensuite sur Enter plusieurs fois pour avancer pas à pas ;
next
peut être complété d'un nombre de répétitions :
next 8
+
+fera par exemple 8 fois next
;
+next
tout seul est donc la même chose de next 1
;
Une confusion fréquente lors de la prise en main de débogueur est celle entre next
+et step
:
next
passe à l'expression suivante en restant au même niveau ; sans rentrer dans les sous-routines (= appel de fonctions) ;step
passe à la prochaine expression à évaluer, où qu'elle soit ; même si celle-ci est dans une sous-routine (et même si ce n'est pas une sous-routine à nous).Illustrons cela en ajoutant un point d'arrêt supplémentaire un peu plus loin :
+break 42
+
+et continuez l'exécution jusque là-bas avec un simple :
+cont
+
+(répondez normalement aux questions).
+Arrivé à la ligne 42, tapez
+next
+
+pour continuer. Vous voyez que la ligne 42 est exécutée et que l'on passe à la ligne 43.
Let's redo the example, restarting the execution from the beginning:

run
y

The debugger stops the execution at line 22 again. Since this no longer interests us, let's delete that breakpoint:

info br

shows us that it is breakpoint number 1; which we delete:

delete 1

Then we continue the execution:

cont

up to line 42.
If we now type step instead of next, we move to line...
...28? [Note: this does not work on macOS with this example (printf), but it will work with your own functions.]

__printf (format=0x40094d "\nMoyenne : %g\n") at printf.c:28
28 printf.c: No such file or directory.

Yes, 28! But not of our program: line 28 of printf.c, which is the file that was compiled (a long time ago) to produce the code of printf in the C library!
And to which we have no access (it is most certainly not on your computer).
What happened?
With the step, we moved to the next C instruction, which actually happens to be inside printf itself (someone had to write it!!).
Try a few more steps (at least 7). You can see that we keep "sinking" into the C library...
A

where

after more than 7 steps is, by the way, quite interesting:

#0 _IO_vfprintf_internal (s=0x7ffff7ad1740 <_IO_2_1_stdout_>, format=0x40094d "\nMoyenne : %g\n", ap=ap@entry=0x7fffffffdcc8) at vfprintf.c:1278
#1 0x00007ffff7781209 in __printf (format=<optimized out>) at printf.c:33
#2 0x0000000000400839 in main () at stats.c:42

We are in a function _IO_vfprintf_internal, which was itself called by a function __printf, which we called from line 42 of our program.
This is starting to look like Java exception messages ;-)!
Since we are lost, let's end the execution of the program (and this tutorial) with a simple

cont

Summary of the main commands seen here:
- layout src
- run or r
- help
- break LINE_NUMBER or br LINE_NUMBER
- break FUNCTION_NAME or br FUNCTION_NAME
- delete
- info br
- where or bt (or backtrace)
- print or p
- display
- cont or c
- next or n
- step or s
. Mais ces tests unitaires se lancent un nouveau sous-processus par test (fork()
) et c'est donc plus difficile à suivre. Si vous souhaitez debogguer avec gdb
ces programmes de tests-unitaires, voici quelques compléments :
entrez ces options dans gdb
:
set follow-fork-mode child
+ set detach-on-fork off
+
+suivez dans quel sous-processus vous êtes avec la commande :
+ info infe
+
+changez de processus avec infe
suivi d'un numéro (tel qu'indiqué par info infe
) ; p.ex. :
infe 1
+
+ne mettez pas de breakpoints sur le code des unit-test-*
eux-mêmes (car ils sont écrit avec des macros en fait), mais sur du « vrai » code C, soit celui des fonctions-outils utilisées pour ces tests, soit carrément sur votre propre code à vous.
Exemple :
+Supposons que ce soit dans le 5e test que vous ayez des problèmes. Ce sera donc le 5e sous-processus qui vous intéresse.
+Commencez alors comme d'habitude par lancer le débogueur sur le programme de tests-unitaires :
+gdb ./unit-test-machin
+
+Ajoutez les options suggérées :
+set follow-fork-mode child
+set detach-on-fork off
+
+Mettez le breakpoint à l'endroit qui vous intéresse, p.ex. ici sur une fonction fait_machin_truc()
:
break fait_machin_truc
+
+Et lancez l'exécution dans le débogueur :
+run
+
+gdb
s'arrêtera au premier break (ou alors au premier crash ;-)
).
Let's look at where we are:

info infe

We are, say, in the 2nd process, i.e. in the 1st test (because process 1 is the main(), and the tests create a new subprocess each time); this is not the one of interest, so we continue:

cont

gdb then tells us, for example, that the 2nd process ("Inferior 2") has finished, but it is still inside it (do "info infe" to see this). We thus have to bring gdb back to the parent process:

infe 1

and we continue:

cont

It stops us again at the breakpoint. We look again at where we are:

info infe

...And we keep going like this, until the breakpoint of interest.
There, we can do next, display, print, etc., as usual.
In this way, we can "walk around" from process to process (infe <number>) and know where we are (info infe).
With a bit of practice, you will manage to find your way around ;-)
The first thing to do before using git is to configure it.
This only needs to be done once.
On the command line, type the following, line by line, replacing the #<XXXX># by the corresponding personal information:

git config --global user.name #<A USERNAME>#
git config --global user.email #<YOUR EPFL EMAIL>#

Then, if you like colors in the terminal, you can add:

git config --global color.diff auto
git config --global color.status auto
git config --global color.branch auto

Let's add a few aliases. For this, create/edit a file ~/.gitconfig (i.e. a file named ".gitconfig" at the root of your home directory. On the CO machines, remember to copy it into your myfiles).
Put the following lines in it:

[alias]
 lg = log --graph --abbrev-commit --decorate --date=relative --format=format:'%C(bold blue)%h%C(reset) - %C(bold green)(%ar)%C(reset) %C(white)%s%C(reset) %C(dim white)- %an%C(reset)%C(bold yellow)%d%C(reset)' --all
 glog = log --graph --decorate --oneline --all
 unstage = reset HEAD --
 last = log -1 HEAD
Before diving into the use of git, you need to understand its purpose and its logic:
- the purpose is to work together on shared content and, for this, to archive the different versions (git is a "version control system");
- the logic is to have three levels:
  1. the main (central) server, which holds the version shared between all the collaborators;
  2. your local archive (your local copy of the repository);
  3. your local working directory, which you got with git clone and in which you will work (each of you).

Level 2 is actually rather "abstract", in the sense that you do not see it concretely; it is entirely managed by git commands.
+git pull
+
+Je vous conseille de le faire assez régulièrement et en tout cas systématiquement avant vos commit
/push
(expliqués ci-dessous).
Pour valider quelque chose de local, c.-à-d. passer du niveau 3 au niveau 2 uniquement :
+soit, pour valider une nouvelle version d'un fichier déjà connu :
+ git commit -m "MESSAGE" FICHIER
+
+veuillez à chaque fois mettre un message pertinent ;
+p.ex., pour valider une nouvelle version du fichier core.c
:
git commit -m "correction du bug de calcul" core.c
+
+soit, pour ajouter un nouveau fichier (tout ceci est repris en détails ci-dessous) :
+ git add FICHIER
+ git commit -m "ajout de EXPLICATION"
+
+p.ex. pour ajouter le nouveau fichier io.c
:
git add io.c
+ git commit -m "ajout des entrées/sorties"
+
+Note : pas besoin de remettre le nom du fichier au commit
suivant un ou des add
; cela permet en fait d'ajouter plusieurs fichiers d'un coup ; par exemple :
git add io_core.c
+ git add io_errors.c
+ git add io.h
+ git commit -m "ajout des entrées/sorties"
+
To publish your local validations to everyone, i.e. to move them from level 2 to level 1:

git push

We thus insist on the fact that, to publish a local modification to everyone, you really have to do TWO things:

git commit

then

git push

Let's now go over all this (and more) in detail.
Let's look at the first principle of git: archiving the work done.
For this, create a directory and go into it:

mkdir alice
cd alice

Add a file to it:

echo "This is a README file" > README.md

then archive, in git, the current state of this directory:

git add .
git commit -m "Initial commit: README"
git add lets you propose a change (without it being confirmed yet); here we added the whole current directory, via its short name: ".", but we could also have added a single file, for example:

git add README.md

We actually RECOMMEND AGAINST doing git add ., because it often adds plenty of bad things: all the files of the current directory, including temporary files, drafts, etc.
We also advise you to do a git status BEFORE your git commit, to double-check what you are about to add.
git commit confirms the (local) recording of the proposed changes;
it is strongly recommended (if not mandatory ;-)) to comment your changes by adding a message to the commit; this is what we did with the -m option.
If something has been modified, Git can tell you:

echo "A second line for the README" >> README.md
git status

We could propose to add this new modification for a future commit:

git add README.md

This way of proceeding in two steps (then three, as we shall see shortly) may seem tedious, but it is a good protection against blunders and a good way of doing things little by little, one at a time.
Once you are ready to record your modifications (locally), do a commit...
...without forgetting to add a relevant comment with -m:

git commit -m "Adding a second line to the README file"
With Git, you can see all the recorded states (snapshots) and even move from one to another (but this is more advanced and you should not need it):

git log

or:

git lg # if you defined the alias above...

To move around (just as an illustration here; it is not necessary to understand this part at the level of this course):

git checkout 5d340 # Put an appropriate, old, commit number
cat README.md

See that this is an old version.
Let's come back to the current state:

git checkout master
cat README.md
You have now understood the notion of states archived by Git (snapshots), and thus the difference between the current working directory and the archive.
The second concept you have to understand well is that there are TWO archives.
Git indeed allows several people to work together (see the next section) and, for this, uses two different archives: your local one (on your machine) and the one on the central server (for us: GitHub).
To "push" your locally recorded changes (done with commits) to the central server, you have to do:

git push

I highly recommend first doing a

git pull

before each of your

git push

The pull synchronizes in the other direction: it goes and fetches the modifications recorded on the server, and applies them locally.
More details in what follows...
Git is above all a collaborative work tool, which many people use precisely to work "in parallel". It is thus designed to ease the management of "concurrent" (or at least "parallel" ;-)) modifications.
Suppose Alice has a collaborator, Bob, on her project, and that he has made modifications on his side:

echo "Hey, this is a line added by Bob" >> README.md
git commit -m "Add greeting from Bob" README.md

while, in parallel, Alice also kept working:

sed -i 's/This is a README file/This is a README file with a twist/' README.md
git commit -am "Add a twist to the first line of the README"

Where do we stand? What are Git's states?
Starting from the last common state (the last pull on both sides), there are actually two quite separate commits: Bob's, on his machine, and Alice's, on hers.
Up to here, no confusion is possible, then.
Alice and Bob can now collaborate by sharing their contributions.
Suppose that Bob "pushes" first (no need to pull beforehand here):

git push

When Bob does this, the central server receives and records Bob's modification. No problem here.
A git lg on Bob's side shows that origin/master (the server's state) and master (the local one) are now the same.
Alice, on her side, does not know that Bob's change has been propagated to the central server. When she tries to "push" her modifications to the server:

git push

she runs into a problem: a message tells her that her push failed and that she must first fetch (= retrieve) the modifications recorded on the central server...
Which she obediently does:

git fetch
With a

git lg

on her side, Alice sees that she now has two commits: one called origin/master, which corresponds to Bob's, and another one, called master or HEAD, which corresponds to hers.
If Alice wants to "push" her modifications to the central server, she must first merge these 2 different states. If there is no conflict (parallel but non-concurrent modifications), this is simply done as follows:

git merge
# Enter a commit message...

She can check the state:

git lg

and then, "now", push this new state resulting from the merge of the two modifications:

git push
From then on, Alice and the central server are synchronized. For Bob to be synchronized as well, he too has to do a

git fetch

or, more simply, a

git pull

This command (git pull) does a fetch and then a merge in a single step.
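In other words, a plain git pull is roughly the combination of:

git fetch
git merge

(a simplification: git pull accepts options that can change this behavior).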
A "tag" is simply a name given to a recorded state (snapshot).
Unlike in previous years (for those who might have known them), we will not use them in any special way. But you may use them, if you wish, to mark a particular version of your project, typically to remember a stable version. But this is a detail.
To give a tag to the current state, simply do:

git tag -a TAG_NAME -m "message"

For example, if you want to name the current state "version1.1", you do:

git tag -a version1.1 -m "Version 1.1 stable"

The command

git tag

simply lists all your "tags".
To see what a given "tag" corresponds to: git show TAG_NAME; for example:

git show version1.1

To push a tag to GitHub, add --tags to the push:

git push --tags

If you wish to learn more about Git and GitHub, you can go and have a look at this tutorial (in English).
Here (as a quick introduction, or later as a reminder) is the bare minimum you need to know (but you are, of course, welcome to read on):
- a Makefile is just a simple text file (simply called "Makefile", on its own, with no extension), which is automatically read by the make command, and which simply contains a "to-do list" (of so-called "targets");
- one line of the Makefile simply describes one target and what is needed to make it (known as its "dependencies"), in the format:

  target: list of dependencies

  for example (fictitious):

  cake: flour eggs butter sugar chocolate yeast

  and that's it! Simple as that! Except that for us, targets are executables and dependencies are .o files; for example:

  calculCplx: calculCplx.o complex.o calculator.o

- compilation dependencies (for the creation of a .o file, then) are simply the corresponding .c file, together with the list of required .h files; e.g.:

  calculator.o: calculator.c calculator.h complex.h

  Note that all these target-dependency lines for compilation can be obtained simply by typing the command:

  gcc -MM *.c

- often, by convention, the first target is called "all" and designates all the executables you wish to build with this Makefile.
:
all: calculCplx
+
+calculCplx: calculCplx.o complex.o calculator.o
+
+# These lines were copied from the gcc -MM *.c command
+
+complex.o: complex.c complex.h
+
+calculator.o: calculator.c calculator.h complex.h
+
+calculCplx.o: calculCplx.c calcGUI.h
+
+And that's it! As simple as this!
Note: this is a written tutorial. You might prefer the video lectures; choose your favorite way of learning (or maybe benefit from both).
For the sake of modularization, the source code of a complete program written in C is often distributed over several text files called "source files". Source files are of two kinds: header files and main files (often called "definition files", or even simply "source files", hence some terminological confusion). By convention, header files have the .h extension, while definition files have the .c extension.
These are "glued together" by the compiler to create an executable program from the source code.
A pair (header file, definition file) corresponding to a given concept is called a "module".
What is the purpose of a header file, then?
A header file is there to announce to the other modules the functionality (API) provided by the module it is part of.
For example, a matrix.h file will contain the module's API for matrices.
+#pragma once
(see below);
directives to include the other header files necessary for this header file only (see below);
+(very frequent) declarations of types offered by the module;
+(very frequent) declarations of the functions offered by the module (corresponding to the "public" part in an OO design);
+(frequent) some "macros" (lines beginning with the #define
symbol);
(rare) declarations of (global) variables to be shared with other modules by the current module.
+In the definition file (with extension .c
), we typically write:
directives to include the header files necessary for this source file only (see below);
+declarations of variables or functions used exclusively in the current module;
+definitions of (variables and) shared functions (offered by the header file).
+Header files are not compiled directly into machine code, but their content is copied as a whole into all other modules that include them. These other modules (which need them) request a copy of a header file by indicating #include
followed by the header file name. For example:
#include "matrix.h
+
+in a source file that requires matrices.
+This copy is made by the compiler when compiling the module requesting the inclusion.
+[ Note: the inclusion of "local" files (specific to our application) is written with double quotation marks (e.g. #include "matrice.h"
), whereas the inclusion of standard libraries is written with "angle brackets" (e.g. #include <stdio.h>
)
+]
Compiling a program consists of two main stages:
+the actual compilation stage:
+.o
);the "linking" stage:
+Let's take a look at two examples.
+The sum_odd.c
file provided in done/ex_single
is a (single) source file containing the code to request a positive number n
and then calculate the sum of n
first odd numbers.
The program starts with a #include <stdio.h>
directive which requests the inclusion (= copying) of standard definitions (std
) for input-output (io
), such as printf()
.
Try following the steps illustrated in the image below:
+ +These steps are automatically performed (transparently) when you compile an IDE. +But, in order to understand well, let's do them step by step.
+First, we'll create the object "files" (here, only one) using the following command:
+gcc -c sum_odd.c -o sum_odd.o
+
+The -c
option tells the compiler not to perform linking, but only compilation (hence the c
as "compile").
This option is followed by the name of the file from which you want to create the object file, then the name you want for the object file in question (the -o
option means "output").
Run this command and check that the object file is actually present in the directory. Don't try to read or open it - it's machine code!
+Next, you need to link the object files. And here, there are already several of them, unbeknownst to you: the one created from our source file and those of the standard libraries used, which are automatically linked by the compiler without our having to name them explicitly.
+To make these links, we simply use the following command:
+gcc -o sum_odd sum_odd.o
+
+Once again, the -o
option followed by the name of the desired file (in our example, the file is called odd_sum
) is used to create the executable program with that name. Note that you can put this option and its associated file name wherever you like in the command (here we've put them first, whereas in the previous example, compiling, we put them last).
Then we need to specify the files to be linked together to create the executable program. In our example, all we need to do is specify our only sum_odd.o
(as standard libraries are linked automatically).
Check that the executable program has been successfully created and run it from the terminal by typing:
+./sum_odd
+
A large program is usually broken down into several modules. In addition to bringing clarity to the program organization, this technique (known as "modular design") enables the reuse of elements (modules) in different programs (for example, one module for matrices, another for "asking for a number", etc.).
Let's take a look at how such programs are produced.
In the done/ex_multiples directory, you'll find five source files and four header files.
Look at the contents of all the files and try to reconstruct the dependencies illustrated below:

To create such a program, you must first compile all the .c files into object files:

gcc -c array_filter.c
gcc -c array_sort.c
gcc -c array_std.c
gcc -c swap.c
gcc -c main.c

And then produce the executable (called selection_sort in our example):

gcc -o selection_sort array_filter.o array_sort.o array_std.o swap.o main.o

Create the executable as described above (tedious, isn't it? We'll come back to that in the next section), then run it. Its purpose is to sort, using the "selection sort" algorithm, an array of integers whose size and range of values are given by the user.
What happens if, by mistake or indirectly, the same module header is included several times?
For example, have you ever tried to include "#include <stdio.h>" twice in one of your programs?
If .h files are not protected against multiple inclusions, the compiler may refuse to compile, for example because of the redefinition of a type already defined at the first inclusion.
It is therefore necessary to protect your .h files against multiple inclusions, by starting them with the line:

#pragma once

This must be the very first line of your .h files.
make
In the case of large (modular) programs, compiling and linking can become tedious (perhaps you've already found this to be the case with just 5 modules...): you have to compile each module ("separate compilation") into its own object file, then "link" all the object files produced.
And since it's highly likely that several modules will themselves make calls to other modules, a modification to one of the modules may require recompiling not only the modified module, but also those that depend on it, recursively, and of course the final executable.
The make tool lets you automate sequences of commands that depend on each other. It can be used for many purposes, but its primary use (and the one we're interested in here) is the compilation of (executable) programs from source files. Benefits:
- you don't have to do it by hand;
- it recompiles only what is strictly necessary.
To use make, all you have to do is write a few simple rules describing the project's various dependencies, in a simple text file named Makefile (or makefile).
Let's see how this tool presents itself, in its manual:

man make

(Don't read everything! Just take an overview to get an idea of what it is about.)
A Makefile is essentially made up of rules which define, for a given target, all the dependencies of the target (i.e. the elements on which the target depends), as well as the set of commands to be performed to update the target (from its dependencies).
It's a bit like a list of recipes:
- "rule" = recipe;
- "target" = result (e.g. chocolate cake);
- "dependencies" = ingredients (e.g. flour, eggs, chocolate, sugar, butter);
- "commands" = instructions for making the recipe.

But we're not cooking here. If we illustrate these concepts with the previous example (the selection_sort program), we'd have, for example, one rule for the linking (the selection_sort program), another rule for compiling array_sort.c (into array_sort.o), and so on.
For the linking rule, we'd have:
- target: selection_sort;
- dependencies: array_filter.o, array_sort.o, array_std.o, swap.o and main.o; all these .o files must exist to produce the selection_sort executable;
- command: the linking command used above.

For the array_sort.c compilation rule, we would have:
- target: array_sort.o;
- dependencies: array_sort.c, swap.h, array_filter.h (see the previous figure, which shows the dependencies);
- command: gcc -c array_sort.c.
The general syntax of a rule is:

target: dependencies
[tab]command 1
[tab]command 2

where:
- the target is most often the name of a file that will be generated by the commands (the executable program, object files, etc.), but it can also represent a "fileless" target, such as install or clean;
- the dependencies are the prerequisites for the target to be achievable, usually the files on which the target depends (e.g. declaration files like header files), but they can also be rules (e.g. the name of the target of another rule);
- to specify several dependencies, simply separate them with a space; a rule may also have no dependencies at all;
- if a dependency occurs several times in the same rule, only the first occurrence is taken into account by make;
- the commands are the actions that make must undertake to update the target; they are one or several shell commands;
- there is one command per line, and the commands related to a target are grouped below the dependency line;
- a special syntax feature is that each command line must begin with the tabulation character (the "TAB" key), and NOT with spaces; this is certainly the most archaic and annoying aspect of make!

It is possible to omit the commands for a target; then either a default rule applies, or nothing at all (which might be useful simply for forcing dependencies/checks).
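For instance, a typical "fileless" target is the conventional clean target, used to remove generated files (an illustrative example; adapt the file list to your own project):

clean:
	rm -f *.o selection_sort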
In fact, make has a number of implicit rules (typically for compilation), so we don't have to write too many things ourselves, as we'll see below.
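These implicit rules are driven by standard variables; for example, the implicit C compilation rule uses CC and CFLAGS, so you can customize the compiler and its options at the top of your Makefile without writing any rule yourself (the values below are illustrative):

CC = gcc
CFLAGS = -std=c99 -g -Wall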
Another piece of good news is that you can automatically generate the list of all dependencies, using the -MM option of gcc:

gcc -MM *.c

Try it out! You should immediately see the correspondence with the dependency lists discussed above. It's very handy to put this output at the end of your Makefile.
Note that the order of the rules is not important, except for determining the default target (i.e. when the user types make on its own, without any argument: the first rule is then launched; otherwise, simply type make target on the command line).
The simplest example of a Makefile is... ...an empty file!
Thanks to its implicit rules, make already knows how to do (= make) lots of things without you having to write anything.
(In done/ex_single) Delete the files sum_odd.o and sum_odd, and run make like this:

make sum_odd

All done. Great!
make "knows" that, to make a file X from a source file X.c, it needs to call the C compiler.
If you wanted to write a Makefile to do this, you could have written (try it!):

sum_odd: sum_odd.c

and that's it!
The target here is the sum_odd executable, and its dependency, unique here, is the sum_odd.c source file.
This Makefile does not specify any command to be executed. It simply uses the default commands known to make.
Had we wanted to make the command more explicit (but why?), a more complete Makefile would have been:

sum_odd: sum_odd.c
	gcc -o sum_odd sum_odd.c

where the command to go from the dependency to the target is made explicit (preceded by a TAB character).
Let's try to write a completely artificial Makefile:

all: dep1 dep2
	@echo "target 'all' completed."

dep1:
	@echo "dependency 1 completed."

dep2:
	@echo "dependency 2 ok..."

dep3:
	echo "banzai!"

(You can either add these lines to the Makefile written for sum_odd, if you tried the exercise above, or create a new Makefile file with the above lines.)
If you simply type the command

make

you get:

dependency 1 completed.
dependency 2 ok...
target 'all' completed.

In this example, make is called on its own, with no particular target indicated.
make will thus search the Makefile for the first acceptable target, in this case all.
(There are particular targets that are not acceptable as default targets, but this is beyond the scope of this introduction.)
The rule for this target specifies two dependencies, dep1 and dep2, which do not exist (they don't correspond to any existing files); make will thus attempt to create them successively.
Since dep1 has no dependencies, make immediately proceeds to executing the commands accompanying the target, i.e. displaying a message on the terminal (using the echo command).
The same applies to the second dependency (dep2).
Once all the dependencies have been realized, make returns to the initial target, all, whose build commands get executed.
If we now type the command

make dep3

we get:

echo "banzai!"
banzai!

In this example, the target dep3 is specified as the goal when invoking make. This target has no dependencies; make thus directly executes the build commands for this target (displaying the string "banzai!").
Let's note a slight difference in behavior between our two examples: in the first case, the target is built by executing the commands directly, whereas in the second case, make first displays the command it is about to execute ("echo "banzai!"").
The reason for this behavior lies in the @ character, which precedes the commands in the first case and is absent in the second.
By default, make first displays each command before actually running it.
To suppress this automatic display, simply prefix the command with the @ character.
Tip: always let make display the commands it is supposed to run (especially compilations), except for pure display commands such as echo.
Makefile
That's all interesting, but what use is it "in real life", since we've seen that with the default implicit rules we don't need to write anything?
Sure! But in more complex projects, the default rules are no longer sufficient.
+in addition to the standard library, we have a graphics library, LibGraph
, with its header file, libgraph.h
, and a library file libgraph.so
;
modeling of complex numbers and their arithmetic, with its header file complex.h and its implementation file complex.c;
modeling of the calculator (basic functions, memory, parentheses, etc.), with its header file calculator.h, which depends on complex.h, and its source file calculator.c (no other dependency);
modeling of the calculator's graphical interface, with calcGUI.h
, dependent on calculator.h
and libgraph.h
, and calcGUI.c
;
the main program (containing the main()
function), provided as calculCplx.c
file, which depends on calcGUI.h
;
each source code (.c
) also depends on its header file (.h
).
Here's an illustration:
To write the corresponding Makefile, all we have to do is add:
a target for each module, i.e. one target for each object file resulting from the compilation of a source file;
and another one to link everything into an executable program.
The dependencies of each of these targets are all the files it depends on (!). But we only consider dependencies that can be modified as part of our project. We can therefore ignore the dependency on the graphics library, for example, just as we ignore dependencies on any other standard library.
+These dependencies can be automatically generated using the command
+ gcc -MM *.c
+
All we have to do is copy its output into our Makefile.
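For this project, its output would look something like the following (the exact lines depend on the actual #include directives in each file):
calcGUI.o: calcGUI.c calcGUI.h calculator.h
calculCplx.o: calculCplx.c calcGUI.h
calculator.o: calculator.c calculator.h complex.h
complex.o: complex.c complex.h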
The build commands are, of course, the compilation instructions; but we don't need to write them explicitly, as seen above: make has default commands which are perfectly fine in this case.
The only build command that needs to be specified is the "linking" command, which puts all the object files together to form the final executable. This is because the default linking rule will not make use of the required libgraph
library.
A possible Makefile
could therefore be:
all: calculCplx

calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
	gcc -o calculCplx calculCplx.o complex.o calculator.o calcGUI.o -lgraph

# These lines have been copied from gcc -MM *.c
complex.o: complex.c complex.h
calculator.o: calculator.c calculator.h complex.h
calcGUI.o: calcGUI.c calcGUI.h calculator.h
calculCplx.o: calculCplx.c calcGUI.h
+
With such a Makefile, our project can be compiled with the make command alone, since the first target, all, is here just an alias for the calculCplx target.
To build this target, make must first build the targets indicated as dependencies (the set of object files).
Note that make will only (re)build a target if at least one of its dependencies is more recent than the target itself. It is this mechanism that enables make to compile only what is strictly necessary. So, if you run the make command a second time, right after the first compilation, the program will report:
make: Nothing to be done for `all'.
+
+which means there's nothing new to be done! Everything is up to date.
Similarly, if you were to modify only the complex.c file, the make command would trigger only the recompilation of that file (rebuilding the target complex.o, since complex.c is one of its dependencies), followed by the linker command, which in turn updates the target calculCplx (for the same reason as above).
If, on the other hand, the complex.h
file is modified, the targets complex.o
, calculator.o
and calculCplx
will be updated.
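A hypothetical session illustrating this (the exact compile command printed depends on your make's built-in rules):
$ touch complex.c
$ make
cc    -c -o complex.o complex.c
gcc -o calculCplx calculCplx.o complex.o calculator.o calcGUI.o -lgraph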
Finally, it should be noted that some libraries, in particular our own, must be specified when linking: this is the case, for example, of the graph library. This is done by adding the -lgraph option at the end of the linker command; hence the need to write this build command explicitly.
In the done/ex_multiples
directory, create a Makefile
to compile the selection_sort
program described above.
Test it.
There's a slight subtlety here: there's no selection_sort.c, but the main() function is in main.c. This is simply to make you write a rule yourself for once (instead of relying on the default rule). Obviously, main.c would "normally" be called selection_sort.c. But you're not allowed to rename this file (nor to make a symbolic link ;-)).
That's pretty much it for the basics. The rest of this document describes more advanced material, not strictly necessary for you, but useful if you want to go further than the bare minimum.
And if you'd prefer a more "classroom" video presentation on the subject of separate compilation and Makefiles, here are a few lecture videos (52 min.).
If what has been presented here is enough for you (you've already spent enough time), you can simply continue this week's series where you left it.
What has been presented so far is sufficient to write a functional Makefile; however, as the previous example shows, doing so can be relatively tedious. The information in this section will considerably increase the expressive power of your Makefile instructions, making them easier to write.
To make writing Makefiles easier (and more concise), you can define and use variables (strictly speaking, they're closer to macros, but who cares?).
The general syntax for defining a variable in a Makefile
is:
NAME = value(s)
+
+(or its more advanced variants +=
, :=
, ::=
, ?=
)
+where:
NAME: the name of the variable you wish to define; this name must not contain the characters :, # or =, nor accented letters; using characters other than letters, digits or underscores is strongly discouraged; variable names are case-sensitive;
+value(s)
: a list of strings, separated by spaces.
Example:
+RUBS = *.o *~ *.bak
+
+Note also that for GNU make
(also called gmake
), the following syntax
+can be used to add one or more elements to the list
+of values already associated with a variable:
NAME += value(s)
+
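As an illustration of the assignment variants mentioned above (a sketch; the variable values here are made up):
# recursive (=): the right-hand side is re-expanded each time the variable is used
CC = gcc
# simple (:=, GNU make): the right-hand side is expanded once, at definition time
OPT := -O2
# conditional (?=): assigned only if the variable is not already defined
CFLAGS ?= -g
# append (+=): adds to the values already associated with the variable
CFLAGS += $(OPT)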
+To use a variable (i.e. to substitute it for the list of values
+associated with it), simply enclose the variable name in parentheses, preceded by the $
sign:
$(NAME)
+
+Example:
+-@$(RM) $(RUBS)
+
+which, with the above definition of RUBS
, deletes all *.o
, *~
and *.bak
files; the RM
variable is one of the predefined variables in make
(remove the @
to see the command actually executed).
Note: These variables can be redefined when calling make
; e.g.:
make LDLIBS=-lm my_target
+
+redefines the LDLIBS
variable.
Suppose we want to systematically pass a certain number of options to the compiler; e.g. to enable the use of a debugger (-g), to force level-2 optimization of the compiled code (-O2), and to make the compiler strictly comply with the C17 standard (-std=c17 -pedantic).
Rather than adding each of these options to every compile command (and having to re-modify everything when we want to change those options), it would be wiser to use a variable (for example CFLAGS
, which is the default name used by
+make
) to store the options to be passed on to the compiler. Our Makefile
would then become:
CFLAGS = -std=c17 -pedantic
CFLAGS += -O2
CFLAGS += -g

all: calculCplx

calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
	gcc -o calculCplx calculCplx.o complex.o calculator.o calcGUI.o -lgraph

# These lines have been copied from gcc -MM *.c
complex.o: complex.c complex.h
calculator.o: calculator.c calculator.h complex.h
calcGUI.o: calcGUI.c calcGUI.h calculator.h
calculCplx.o: calculCplx.c calcGUI.h
+
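With this Makefile, a full build would display something like the following (a sketch; the exact form of the implicit compilation commands may vary with your make version):
$ make
cc -std=c17 -pedantic -O2 -g   -c -o calculCplx.o calculCplx.c
cc -std=c17 -pedantic -O2 -g   -c -o complex.o complex.c
cc -std=c17 -pedantic -O2 -g   -c -o calculator.o calculator.c
cc -std=c17 -pedantic -O2 -g   -c -o calcGUI.o calcGUI.c
gcc -o calculCplx calculCplx.o complex.o calculator.o calcGUI.o -lgraph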
It's possible to add comments to a Makefile (line-oriented, i.e. like the //... comments of C99 or Java) by marking the beginning of the comment with the # symbol. Note, however, that comments on command lines are not removed by make before the command is passed to the shell. For example:
# Here's a comment line
+
+all: dep1 dep2
+ @echo "target 'all' completed."
+
+dep1:
+ @echo "dependency 1 completed."
+
+dep2:
+ @echo "dependency 2 ok..."
+
+dep3: # this target is not built by default
+ echo "banzai!" # comment submitted to Shell
+
+Examples of execution:
+$> make
+
+dependency 1 completed.
+dependency 2 ok...
+target 'all' completed.
+
+$> make dep3
+
+echo "banzai!" # comment submitted to Shell
+
+banzai!
+
Notice that # comment submitted to Shell is indeed passed to the shell; but since # is also the shell's comment character, the shell treats it as a comment too.
make automatically maintains a number of predefined variables, updating them as each rule is executed, depending on the target and its dependencies.
These variables include:
+$@
name of the target (file) of the current rule;
$< the first dependency (prerequisite) of the current rule; this is the variable used by make's default rules;
$? list of all dependencies (separated by spaces) that are more recent than the current target, i.e. those that caused the target to be rebuilt;
$^
[GNU Make] list of all dependencies (separated by a space) on the target; if a dependency occurs several times in the same dependency list, it will only be reported once;
$(CC)
compiler name (C);
$(CPPFLAGS)
precompilation options;
$(CFLAGS)
compiler options;
$(LDFLAGS) linker options;
$(LDLIBS)
libraries to be added.
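A minimal, self-contained illustration of the first few of these variables (the target and file names are made up):
demo: dep1.txt dep2.txt
	@echo "target:            $@"
	@echo "first dependency:  $<"
	@echo "all dependencies:  $^"
	@echo "newer than target: $?"

dep1.txt dep2.txt:
	touch $@
Since no demo file is ever created, each make demo re-runs the rule, and $? then lists every dependency.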
For instance, the calculator's Makefile
could be rewritten as follows (modification of the linker command):
CFLAGS = -std=c17 -pedantic
+ CFLAGS += -O2
+ CFLAGS += -g
+
+ all: calculCplx
+
+ calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
+ gcc -o $@ $^ -lgraph
+
+ complex.o: complex.c complex.h
+ calculator.o: calculator.c calculator.h complex.h
+ calcGUI.o: calcGUI.c calcGUI.h calculator.h
+ calculCplx.o: calculCplx.c calcGUI.h
+
+As mentioned above, make
has a number of implicit rules (i.e. rules that the user doesn't need to specify), which enable it to "behave" in the presence of a source file without any further instructions.
+For instance, it "knows" how to produce object files from sources in assembly, Fortran, Pascal,
+Modula-2, Yacc, Lex, TeX, ..., and of course C and C++.
For example:
+the target file.o
will be automatically created from the file file.c
by means of an (implicit) command of the form:
$(CC) -c $(CPPFLAGS) $(CFLAGS) -o $@ $<
+
+which can also be simplified to
+ $(COMPILE.c) -o $@ $<
+
Usually, the CC variable is set to the cc command.
a target file
can be automatically created from the file.o
object file, or from a set of object files (specified in the list of dependencies) of which file.o
is a part, such as x.o file.o z.o
, using a command of the form:
$(CC) $(LDFLAGS) -o $@ $< $(LOADLIBES) $(LDLIBS)
+
+a target file
can be automatically created from the file.c
source file, and possibly a set of object files (specified in the list of dependencies), such as y.o z.o
, using a command of the form:
$(CC) $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ $< $(LOADLIBES) $(LDLIBS)
+
+which can be simplified to
+ $(LINK.c) -o $@ $< $(LOADLIBES) $(LDLIBS)
+
+Therefore, we can transform our previous Makefile
to make it even more concise, as follows:
CFLAGS = -std=c17 -pedantic
CFLAGS += -O2
CFLAGS += -g

all: calculCplx

complex.o: complex.c complex.h
calculator.o: calculator.c calculator.h complex.h
calcGUI.o: calcGUI.c calcGUI.h calculator.h
calculCplx.o: calculCplx.c calcGUI.h

calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
	$(LINK.c) -o $@ $^ -lgraph
+
+or even:
+ CFLAGS = -std=c17 -pedantic
+ CFLAGS += -O2
+ CFLAGS += -g
+ LDLIBS = -lgraph
+
+ all: calculCplx
+
+ complex.o: complex.c complex.h
+ calculator.o: calculator.c calculator.h complex.h
+ calcGUI.o: calcGUI.c calcGUI.h calculator.h
+ calculCplx.o: calculCplx.c calcGUI.h
+
+ calculCplx: calculCplx.o complex.o calculator.o calcGUI.o
+
where we have now completely removed the command associated with the last target (producing the executable): the implicit linking rule, which appends $(LDLIBS), now does the job.
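While we're at it, a conventional addition is a clean target to remove the generated files; here is a sketch (reusing a RUBS-style variable as defined earlier; .PHONY tells make that clean is not the name of a file to build):
RUBS = *.o *~ *.bak

.PHONY: clean
clean:
	-@$(RM) $(RUBS) calculCplx
Invoking make clean then removes the object files and the executable (the leading - tells make to ignore any error from the command).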
When an element (a variable definition, a list of target dependencies, commands, ..., or even a comment, although this is not recommended) is too long to fit reasonably on one line, you can break it by telling make to consider the next line as a continuation of the current one. This is achieved by placing the \ character at the end of the line to be extended:
# here's a comment \
+ on two lines
+
+all: dep1 \
+ dep2
+ @echo "target 'all' done"
+
+dep1:
+ @echo "dependency 1 completed"
+
+dep2:
+ @echo "dependency 2 ok..." \
+"indeed!"
+
+Example of execution:
+$> make
+
+dependency 1 completed
+dependency 2 ok... indeed!
+
+target 'all' done
+
This example shows that clumsy use of this feature can considerably impair the readability of the Makefile.
Despite the name of the previous section, we're still a long way from having covered all the possibilities of make.
For those who would like to know even more, don't hesitate to consult +the following references (all external):
[GNU make website](http://www.gnu.org/software/make/)
[The (GNU) make manual, from the same site](http://www.gnu.org/software/make/manual/make.html)
Finally, please note that there are many more modern redesigns of development-project management tools (CMake, SCons, GNU autotools, tools integrated into IDEs: KDevelop, Anjuta, NetBeans, Code::Blocks, ...), but we feel that a good knowledge of make is a real bonus on your programmer's CV.
VirtualBox

Installing VirtualBox on your machine is normally quite easy. See their website.
Note that some machines with Intel processors (e.g. HP, Lenovo, etc.) may require changing the "Intel Virtualization" (or similarly named) setting in the BIOS. Start with the installation as described and, if necessary (VirtualBox error message), reboot to go change the BIOS setting.
Download a 64-bit Ubuntu LTS image from their download site.
Start VirtualBox and create a new machine (if no 64-bit version is offered in the list, it is because Intel Virtualization is disabled in the BIOS).
est désactivée dans le BIOS)Une fois la nouvelle machine créée, avant de la lancer, « chargez » lui l'image ISO Ubuntu précédemment téléchargée :
Once the ISO "cdrom" is "loaded" into the virtual machine (previous step), start the machine and simply follow the instructions.
Once the installation is complete, restart the virtual machine and update it, either with the update tool (software updater) or "by hand" in a terminal:
sudo apt update
sudo apt upgrade -y
install the "Guest Additions"; they give you better integration of your virtual machine with your real machine (screen resizing, copy-paste between the two, access to the local disk from the VM, ...):
Restart the virtual machine.
Finally, install the tools needed for the course:
sudo apt install build-essential clang check wdiff colordiff git openssh-client manpages manpages-dev doxygen curl
sudo apt install libssl-dev libssl-doc libcurl4-openssl-dev libjson-c-dev
You can now compile your project in your VM, either by accessing a local disk (Devices -> Shared Folders -> Shared Folders Settings) or by cloning your GitHub repository.
VMware

Installing VMware (Fusion on macOS, Workstation Player on Windows or Linux) is normally quite easy.
Since 2021, VMware is no longer available under a general license for EPFL students. Using VirtualBox (free) is recommended instead.
Note that some machines with Intel processors (e.g. HP, Lenovo, etc.) may require changing the "Intel Virtualization" (or similarly named) setting in the BIOS. Start with the installation as described and, if necessary (VMware error message), reboot to go change the BIOS setting.
Download an Ubuntu LTS image from their download site.
Start VMware Fusion;
File -> New -> Install from disk or image;
Now choose the Ubuntu ISO image previously downloaded to your disk;
Select Easy Install, enter the password, and tick the check-box to share files with your "host" computer;
Select "Customize Settings" with the following options:
After the installation (it can take a little while), you can set your screen's highest resolution in the VM (System settings -> Display).
Make sure the keyboard is correctly configured (System settings -> Region & Language -> Input Sources).
Update the system, either with the update tool (software updater) or "by hand" in a terminal:
sudo apt update
sudo apt upgrade -y
Make sure the host's folders are visible in the '/mnt/hgfs' directory. If this is not the case, follow the instructions here.
Finally, install the tools needed for the course:
sudo apt install build-essential clang check wdiff colordiff git openssh-client manpages manpages-dev doxygen curl
sudo apt install libssl-dev libssl-doc libcurl4-openssl-dev libjson-c-dev
You can now compile your project in your VM, either by accessing a local disk (Devices -> Shared Folders -> Shared Folders Settings) or by cloning your GitHub repository.