From 7fb5839fdb8d1e0107ee56170bf77df444ab7710 Mon Sep 17 00:00:00 2001 From: Mats Wichmann Date: Wed, 31 Jan 2024 13:55:45 -0700 Subject: [PATCH 1/2] Fix scanner examples in User Guide [skip appveyor] Examples (except for the ones which are intentionally fragments) can now be run, and they are set to do so via the markup. More explanation added. Syntax errors corrected. Also reworded and expanded on some parts of the Scanner Objects section of the manpage. Fixes #4468 Signed-off-by: Mats Wichmann --- CHANGES.txt | 2 + RELEASE.txt | 6 +- .../examples/scanners_builders_1.xml | 5 + doc/generated/examples/scanners_scan_1.xml | 3 + doc/generated/examples/scanners_scan_foo.k | 5 + doc/man/scons.xml | 151 ++++++++---- doc/user/scanners.xml | 221 +++++++++++++----- 7 files changed, 281 insertions(+), 112 deletions(-) create mode 100644 doc/generated/examples/scanners_builders_1.xml create mode 100644 doc/generated/examples/scanners_scan_1.xml create mode 100644 doc/generated/examples/scanners_scan_foo.k diff --git a/CHANGES.txt b/CHANGES.txt index 358275b0e0..44a458a8f7 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -72,6 +72,8 @@ RELEASE VERSION/DATE TO BE FILLED IN LATER - Be more cautious about encodings fetching command output on Windows. Problem occurs in piped-spawn scenario, used by Configure tests. Fixes #3529. + - Clarify/fix documentation of Scanners in User Guide and Manpage. + Fixes #4468. RELEASE 4.6.0 - Sun, 19 Nov 2023 17:22:20 -0700 diff --git a/RELEASE.txt b/RELEASE.txt index 88731939bf..2952d4f396 100644 --- a/RELEASE.txt +++ b/RELEASE.txt @@ -73,9 +73,9 @@ PACKAGING DOCUMENTATION ------------- -- List any significant changes to the documentation (not individual - typo fixes, even if they're mentioned in src/CHANGES.txt to give - the contributor credit) +- Fixed the Scanner examples in the User Guide to be runnable and added + some more explanation. Clarified discussion of the scanner function in + the Scanner Objects section of the manpage.
DEVELOPMENT ----------- diff --git a/doc/generated/examples/scanners_builders_1.xml b/doc/generated/examples/scanners_builders_1.xml new file mode 100644 index 0000000000..12d935647c --- /dev/null +++ b/doc/generated/examples/scanners_builders_1.xml @@ -0,0 +1,5 @@ +% scons -Q +DEBUG: scan of 'file.input' found ['other_file'] +DEBUG: scanned dependencies found: ['inc/other_file'] +build_function(["file.k"], ["file.input"]) + diff --git a/doc/generated/examples/scanners_scan_1.xml b/doc/generated/examples/scanners_scan_1.xml new file mode 100644 index 0000000000..035ae52f26 --- /dev/null +++ b/doc/generated/examples/scanners_scan_1.xml @@ -0,0 +1,3 @@ +% scons -Q +scons: *** [foo] Implicit dependency `other_file' not found, needed by target `foo'. + diff --git a/doc/generated/examples/scanners_scan_foo.k b/doc/generated/examples/scanners_scan_foo.k new file mode 100644 index 0000000000..1e3d804124 --- /dev/null +++ b/doc/generated/examples/scanners_scan_foo.k @@ -0,0 +1,5 @@ + +some initial text +include other_file +some other text + diff --git a/doc/man/scons.xml b/doc/man/scons.xml index 6808cd2b3f..d4b52d1d66 100644 --- a/doc/man/scons.xml +++ b/doc/man/scons.xml @@ -5634,7 +5634,7 @@ may be a string or a list of strings. - + target_scanner A Scanner object that @@ -5652,7 +5652,7 @@ for information about creating Scanner objects. - + source_scanner A Scanner object that @@ -7243,11 +7243,12 @@ the rest are optional: function -A scanner function to call to process +A function which can process ("scan") a given Node (usually a file) and return a list of Nodes -representing the implicit -dependencies (usually files) found in the contents. +representing any implicit +dependencies (usually files) which will be tracked +for the Node. 
The function must accept three required arguments, node, env and @@ -7260,52 +7261,51 @@ the internal &SCons; node representing the file to scan, the scan, and path is a tuple of directories that can be searched for files, as generated by the optional scanner -path_function (see below). -If argument was supplied when the Scanner -object was created, it is given as arg -when the scanner function is called; since argument -is optional, the default is no arg. +. +If the +parameter was supplied when the Scanner object was created, +it is passed as the arg parameter +to the scanner function when it is called. +Since argument is optional, +the scanner function may be +called without an arg parameter. + -The function can use use +The scanner function can make use of str(node) to fetch the name of the file, -node.dir +node.dir to fetch the directory the file is in, -node.get_contents() +node.get_contents() to fetch the contents of the file as bytes or -node.get_text_contents() +node.get_text_contents() to fetch the contents of the file as text. + -The function must take into account the path -directories when generating the dependency Nodes. To illustrate this, -a C language source file may contain a line like -#include "foo.h". However, there is no guarantee -that foo.h exists in the current directory: -the contents of &cv-link-CPPPATH; is passed to the C preprocessor which -will look in those places for the header, -so the scanner function needs to look in those places as well -in order to build Nodes with correct paths. -Using &f-link-FindPathDirs; with an argument of CPPPATH -as the path_function in the &f-Scanner; call -means the scanner function will be called with the paths extracted -from &cv-CPPPATH; in the environment env -passed as the paths parameter. - - -Note that the file to scan is -not -guaranteed to exist at the time the scanner is called - -it could be a generated file which has not been generated yet - -so the scanner function must be tolerant of that. 
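To illustrate the calling convention described above, here is a minimal sketch that runs without SCons. FakeNode is a hypothetical stand-in for an SCons Node that provides only str() and get_text_contents(); the sketch returns plain strings, whereas a real scanner function would wrap them with env.File(). Giving arg a default lets the same function be called with or without the optional argument.

```python
import re


# Hypothetical stand-in for an SCons Node: it provides only the two
# operations the scanner function below uses.
class FakeNode:
    def __init__(self, name, text):
        self.name = name
        self._text = text

    def __str__(self):
        return self.name

    def get_text_contents(self):
        return self._text


include_re = re.compile(r"^include\s+(\S+)$", re.M)


def kfile_scan(node, env, path, arg=None):
    """Collect the names on 'include' lines of a .k file.

    'arg' defaults to None because it is only passed when the Scanner
    was created with an argument. A real scanner would wrap the names
    with env.File(); plain strings are returned so this runs without SCons.
    """
    names = include_re.findall(node.get_text_contents())
    if arg is not None:
        print(f"{arg}: scanned {node}")
    return names


node = FakeNode("foo.k", "some initial text\ninclude other_file\nsome other text\n")
print(kfile_scan(node, None, ()))           # like a Scanner created without argument
print(kfile_scan(node, None, (), "kscan"))  # like a Scanner created with argument="kscan"
```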
+The scanner function should account for any directories +listed in the path parameter +when determining the existence of possible dependencies. +External tools such as the C/C++ preprocessor are given +lists of directories to search for source file inclusion directives +(e.g. #include "myheader.h"). +That list is generated from the relevant path variable +(e.g. &cv-link-CPPPATH; for C/C++). The Scanner can be +directed to pass the same list on to the scanner function +via the path parameter so it can +search in the same places. +The Scanner passes this list only if the + argument was supplied at Scanner creation time. + -Alternatively, you can supply a dictionary as the -function parameter, -to map keys (such as file suffixes) to other Scanner objects. +Instead of a scanner function, you can supply a dictionary as the +function parameter. +The dictionary must map keys (such as file suffixes) +to other Scanner objects. A Scanner created this way serves as a dispatcher: -the Scanner's skeys parameter is +the Scanner's parameter is automatically populated with the dictionary's keys, indicating that the Scanner handles Nodes which would be selected by those keys; the mapping is then used to pass @@ -7313,6 +7313,46 @@ the file on to a different Scanner that would not have been selected to handle that Node based on its own skeys. + + +Note that the file to scan is +not +guaranteed to exist at the time the scanner is called - +it could be a generated file which has not been generated yet - +so the scanner function must be tolerant of that. + + + +While many scanner functions operate on source code files by +looking for known patterns in the code, they can really +do anything they need to. +For example, the &b-link-Program; Builder is assigned a + which examines the +list of libraries supplied for the build (&cv-link-LIBS;) +and decides whether to add them as dependencies; +it does not look inside the built binary.
+ + + +It is up to the scanner function to decide whether or not to +generate an &SCons; dependency for candidates identified by scanning. +Dependencies are a key part of &SCons; operation, +enabling both rebuild determination and correct ordering of builds. +It is particularly important that generated files which are +dependencies are added into the Node graph, +or use-before-create failures are likely. +However, not everything may need to be tracked as a dependency. +In some cases, implementation-provided header files change +infrequently but are included very widely, +so tracking them in the &SCons; node graph could become quite +expensive for limited benefit - +consider for example the C standard header file +string.h. +The scanner function is not passed any special information +to help make this choice, so the decision-making encoded +in the scanner function must be carefully considered. + + @@ -7325,7 +7365,7 @@ The default value is "NONE". - + argument If specified, @@ -7339,7 +7379,7 @@ as the optional parameter each of those functions takes. - + skeys Scanner key(s) indicating the file types @@ -7355,10 +7395,13 @@ it will be expanded into a list by the current environment. - + path_function -A Python function that takes four or five arguments: + +If specified, a function to generate paths to pass to +the scanner function to search while generating dependencies. +The function must accept four or five arguments: a &consenv;, a Node for the directory containing the &SConscript; file in which @@ -7366,16 +7409,28 @@ the first target was defined, a list of target nodes, a list of source nodes, and the value of argument -if it was supplied when the Scanner was created. +if it was supplied when the Scanner was created +(since argument is optional, +the function may be called without this argument, +so the path_function +should be prepared for this). Must return a tuple of directories that can be searched for files to be returned by this Scanner object.
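As a rough illustration of the path_function signature described above, the sketch below shows a hand-written function that works whether or not the Scanner was created with an argument. Assumptions, all labeled as such: env is modeled as a plain dict rather than a construction environment, directories are plain strings rather than Dir Nodes, KPATH is the hypothetical variable used in the User Guide example, and resolving relative entries against the SConscript directory is a choice made for this sketch only.

```python
import os


def kpath_function(env, sconscript_dir, targets, sources, argument=None):
    """Hypothetical path_function returning a tuple of search directories.

    'env' is a plain dict standing in for an SCons construction
    environment. 'argument' defaults to None so the same function works
    whether or not the Scanner was created with an argument.
    """
    value = env.get("KPATH") or []
    if isinstance(value, str):
        value = [value]  # treat a single string as one directory
    dirs = []
    for entry in value:
        # Assumption for this sketch: interpret relative entries
        # against the SConscript file's directory.
        if not os.path.isabs(entry):
            entry = os.path.join(sconscript_dir, entry)
        dirs.append(entry)
    return tuple(dirs)


print(kpath_function({"KPATH": ["inc", "lib"]}, "src", [], []))
print(kpath_function({"KPATH": "inc"}, "src", [], [], "extra"))
```

Returning a tuple, even an empty one, matches the requirement that a path_function return a tuple of directories.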
-(Note that the -&f-link-FindPathDirs; -function can be used to return a ready-made + + + +The &f-link-FindPathDirs; +function can be called to return a ready-made path_function for a given &consvar; name, -instead of having to write your own function from scratch.) +which is often easier than writing your own function from scratch. +For example, +path_function=FindPathDirs('CPPPATH') +means the scanner function will be called with the paths extracted +from &cv-CPPPATH; in the &consenv; env, +and passed as the path parameter +to the scanner function. diff --git a/doc/user/scanners.xml b/doc/user/scanners.xml index ac0b84fc4f..aec4cac83d 100644 --- a/doc/user/scanners.xml +++ b/doc/user/scanners.xml @@ -141,14 +141,14 @@ over the file scanning rather than being called for each input line: -
+
A Simple Scanner Example Suppose, for example, that we want to create a simple &Scanner; - for .foo files. - A .foo file contains some text that + for .k files. + A .k file contains some text that will be processed, and can include other files on lines that begin with include @@ -157,7 +157,7 @@ over the file scanning rather than being called for each input line: -include filename.foo +include filename.k @@ -175,7 +175,7 @@ import re include_re = re.compile(r'^include\s+(\S+)$', re.M) -def kfile_scan(node, env, path, arg): +def kfile_scan(node, env, path, arg=None): contents = node.get_text_contents() return env.File(include_re.findall(contents)) @@ -184,7 +184,8 @@ def kfile_scan(node, env, path, arg): It is important to note that you have to return a list of File nodes from the scanner function, simple - strings for the file names won't do. As in the examples we are showing here, + strings for the file names won't do. + As in the examples we are showing here, you can use the &f-link-File; function of your current &consenv; in order to create nodes on the fly from a sequence of file names with relative paths. @@ -194,7 +195,7 @@ def kfile_scan(node, env, path, arg): The scanner function must - accept the four specified arguments + accept three or four specified arguments and return a list of implicit dependencies. Presumably, these would be dependencies found from examining the contents of the file, @@ -258,9 +259,22 @@ def kfile_scan(node, env, path, arg): - An optional argument that you can choose to - have passed to this scanner function by - various scanner instances. + An optional argument that can be passed + to this scanner function when it is called from + a scanner instance. The argument is only supplied + if it was given when the scanner instance is created + (see the manpage section "Scanner Objects"). + This can be useful, for example, to distinguish which + scanner type called us, if the function might be bound + to several scanner objects. 
+ Since the argument is only supplied in the function + call if it was defined for that scanner, the function + needs to be prepared to possibly be called in different + ways if multiple scanners are expected to use this + function - giving the parameter a default value as + shown above is a good way to do this. + If each scanner will use its own function (a 1:1 relationship), + just make sure the function signature matches how that scanner calls it. @@ -286,7 +300,13 @@ env.Append(SCANNERS=kscan) - When we put it all together, it looks like: + Let's put this all together. + Our new file type, with the .k suffix, + will be processed by a command named kprocess, + which lives in a non-standard location + /usr/local/bin, + so we add that path to the execution environment, + which lets &SCons; find it. Here's what it looks like: @@ -302,17 +322,22 @@ def kfile_scan(node, env, path): return env.File(includes) kscan = Scanner(function=kfile_scan, skeys=['.k']) -env = Environment(ENV={'PATH': '__ROOT__/usr/local/bin'}) +env = Environment() +env.AppendENVPath('PATH', '__ROOT__/usr/local/bin') env.Append(SCANNERS=kscan) env.Command('foo', 'foo.k', 'kprocess < $SOURCES > $TARGET') - + +some initial text include other_file +some other text - -other_file + @@ -321,30 +346,33 @@ cat - -
-
+
Adding a search path to a Scanner: &FindPathDirs; @@ -352,14 +380,26 @@ cat If the build tool in question will use a path variable to search for included files or other dependencies, then the &Scanner; will need to take that path variable into account as well - - &cv-link-CPPPATH; and &cv-link-LIBPATH; are used this way, - for example. The path to search is passed to your - Scanner as the path argument. Path variables - may be lists of nodes, semicolon-separated strings, or even - contain &consvars; which need to be expanded. - &SCons; provides the &f-link-FindPathDirs; function which returns - a callable to expand a given path (given as a SCons &consvar; - name) to a list of paths at the time the Scanner is called. + the same way &cv-link-CPPPATH; is used for files processed + by the C Preprocessor (used for C, C++, Fortran and others). + Path variables may be lists of nodes or semicolon-separated strings + (&SCons; uses a semicolon here irrespective of + the pathlist separator used by the native operating system), + and may contain &consvars; to be expanded. + A Scanner can take a path_function + to process such a path variable; + the function produces a tuple of paths that is passed to the + scanner function as its path parameter. + + + + + + To make this easy, + &SCons; provides the premade &f-link-FindPathDirs; + function which returns a callable to expand a given path variable + (given as an &SCons; &consvar; name) + to a tuple of paths at the time the Scanner is called. Deferring evaluation until that point allows, for instance, the path to contain &cv-link-TARGET; references which differ for each file scanned. @@ -368,37 +408,56 @@ cat - Using &FindPathDirs; is quite easy. Continuing the above example, - using KPATH as the &consvar; with the search path + Using &FindPathDirs; is easy. 
Continuing the above example, + using $KPATH as the &consvar; to hold the paths (analogous to &cv-link-CPPPATH;), we just modify the call to - the &f-link-Scanner; factory function to include a path keyword arg: + the &f-link-Scanner; factory function to include a + path_function keyword argument: -kscan = Scanner(function=kfile_scan, skeys=['.k'], path_function=FindPathDirs('KPATH')) +kscan = Scanner( + function=kfile_scan, + skeys=['.k'], + path_function=FindPathDirs('KPATH'), +) - &FindPathDirs; returns a callable object that, when called, will - essentially expand the elements in env['KPATH'] - and tell the Scanner to search in those dirs. It will also properly - add related repository and variant dirs to the search list. As a side - note, the returned method stores the path in an efficient way so + &FindPathDirs; is called when the Scanner is created, + and the callable object it returns is stored + as an attribute in the scanner. + When the scanner is invoked, it calls that object, + which processes the $KPATH from the + current &consenv;, performing any needed expansions and, + if necessary, adding related repository and variant directories, + producing a (possibly empty) tuple of paths + that is passed on to the scanner function. + The scanner function is then responsible for using that list + of paths to locate the include files identified by the scan. + The next section will show an example of that. + + + + + + As a side note, + the returned method stores the path in an efficient way so lookups are fast even when variable substitutions may be needed. This is important since many files get scanned in a typical build. +
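The deferred-evaluation idea behind &FindPathDirs; can be sketched in plain Python as a factory that returns a closure: the variable name is captured when the Scanner is created, and its value is read from the environment only when the closure is called. This is a simplified, hypothetical imitation, not the SCons implementation: find_path_dirs_sketch is an invented name, env is a plain dict, and no $-substitution, repository, or variant-directory handling is attempted.

```python
def find_path_dirs_sketch(variable):
    """Return a path_function that looks up `variable` when called.

    Simplified, hypothetical imitation of the idea behind FindPathDirs:
    no $-substitution, Repository, or VariantDir handling is done.
    """
    def path_function(env, sconscript_dir=None, targets=None, sources=None, argument=None):
        value = env.get(variable)
        if not value:
            return ()
        if isinstance(value, str):
            value = [value]
        return tuple(str(v) for v in value)

    return path_function


pf = find_path_dirs_sketch("KPATH")  # created once, as when the Scanner is built
env = {}
print(pf(env))                       # () because KPATH is not set yet
env["KPATH"] = ["inc", "other_inc"]
print(pf(env))                       # ('inc', 'other_inc'), read at call time
```

Because the lookup happens inside the closure, changing the environment after the Scanner is created still affects the paths the scanner function receives.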
-
+
Using scanners with Builders - One approach for introducing a &Scanner; into the build is in conjunction with a &Builder;. There are two relvant optional parameters we can use when creating a Builder: @@ -407,20 +466,32 @@ kscan = Scanner(function=kfile_scan, skeys=['.k'], path_function=FindPathDirs('K source_scanner is used for scanning source files, and target_scanner is used for scanning the target once it is generated. - -import re - -include_re = re.compile(r'^include\s+(\S+)$', re.M) - -def kfile_scan(node, env, path, arg): - contents = node.get_text_contents() - return env.File(include_re.findall(contents)) - -kscan = Scanner(function=kfile_scan, skeys=['.k'], path_function=FindPathDirs('KPATH') +import os, re + +include_re = re.compile(r"^include\s+(\S+)$", re.M) + +def kfile_scan(node, env, path, arg=None): + includes = include_re.findall(node.get_text_contents()) + print(f"DEBUG: scan of {str(node)!r} found {includes}") + deps = [] + for inc in includes: + for incdir in path: + candidate = os.path.join(str(incdir), inc) + if os.path.exists(candidate): + deps.append(candidate) + break + print(f"DEBUG: scanned dependencies found: {deps}") + return env.File(deps) + +kscan = Scanner( + function=kfile_scan, + skeys=[".k"], + path_function=FindPathDirs("KPATH"), +) def build_function(target, source, env): # Code to build "target" from "source" @@ -428,15 +499,43 @@ def build_function(target, source, env): bld = Builder( action=build_function, - suffix='.foo', + suffix=".k", source_scanner=kscan, - src_suffix='.input', + src_suffix=".input", ) -env = Environment(BUILDERS={'Foo': bld}) -env.Foo('file') + +env = Environment(BUILDERS={"KFile": bld}, KPATH="inc") +env.KFile("file") + + +some initial text +include other_file +some other text + + +text to include + + Running this example would only show that the stub + build_function is getting called, + so some debug prints were added to the scanner function, + just to show the scanner is being invoked.
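The search loop in kfile_scan can be exercised on its own, without SCons, by pulling it into a plain function and running it against a temporary directory tree. This is a sketch under stated assumptions: find_includes is a hypothetical helper name, and it returns path strings where the real scanner function passes them to env.File(). Like the loop in kfile_scan, it stops at the first directory that contains a match and silently skips names that are not found.

```python
import os
import re
import tempfile

include_re = re.compile(r"^include\s+(\S+)$", re.M)


def find_includes(text, search_dirs):
    """Return the first existing path for each name on an 'include' line.

    Mirrors the search loop in kfile_scan, minus the SCons Node
    handling: names not found in any directory are skipped.
    """
    deps = []
    for name in include_re.findall(text):
        for incdir in search_dirs:
            candidate = os.path.join(str(incdir), name)
            if os.path.exists(candidate):
                deps.append(candidate)
                break  # first match wins, as in a preprocessor-style search
    return deps


with tempfile.TemporaryDirectory() as top:
    inc = os.path.join(top, "inc")
    os.mkdir(inc)
    with open(os.path.join(inc, "other_file"), "w") as f:
        f.write("text to include\n")
    text = "some initial text\ninclude other_file\ninclude missing_file\n"
    print(find_includes(text, [os.path.join(top, "nope"), inc]))
```

Because the check uses os.path.exists(), an include target that has not been generated yet is simply skipped. A production scanner may need to handle that case differently, since the scanned file's dependencies can include files that do not exist yet.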
+ + + + scons -Q + + + + The path-search implementation in + kfile_scan works, + but is quite simple-minded - a production scanner + will probably do something more sophisticated. + + + An emitter function can modify the list of sources or targets @@ -448,7 +547,7 @@ env.Foo('file') A scanner function will not affect the list of sources or targets seen by the Builder during the build action. The scanner function - will however affect if the Builder should rebuild (if any of + will, however, affect whether the Builder should rebuild (if any of the files sourced by the Scanner have changed for example). From a30f12fa49e05dd99a370213dd8650c19d77b6ec Mon Sep 17 00:00:00 2001 From: Mats Wichmann Date: Fri, 2 Feb 2024 07:35:49 -0700 Subject: [PATCH 2/2] Undo unintended change in Scanners chapter [skip appveyor] Signed-off-by: Mats Wichmann --- doc/user/scanners.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user/scanners.xml b/doc/user/scanners.xml index aec4cac83d..c3f3294e03 100644 --- a/doc/user/scanners.xml +++ b/doc/user/scanners.xml @@ -195,7 +195,7 @@ def kfile_scan(node, env, path, arg=None): The scanner function must - accept three or four specified arguments + accept the four specified arguments and return a list of implicit dependencies. Presumably, these would be dependencies found from examining the contents of the file,