Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression tests fail if /etc/tidy.conf or ~/.tidyrc exists #778

Closed
ler762 opened this issue Nov 21, 2018 · 10 comments
Closed

regression tests fail if /etc/tidy.conf or ~/.tidyrc exists #778

ler762 opened this issue Nov 21, 2018 · 10 comments
Labels

Comments

@ler762
Copy link
Contributor

ler762 commented Nov 21, 2018

Just doing touch ~/.tidyrc or touch /etc/tidy.conf is enough to make some the regression tests fail.

$ ls -l /etc/tidy.conf ~/.tidyrc
ls: cannot access '/etc/tidy.conf': No such file or directory
ls: cannot access '/home/Lee/.tidyrc': No such file or directory
 $ tidy -q -export-config | grep -i indent
indent: no
indent-attributes: no
indent-cdata: no
indent-spaces: 2
indent-with-tabs: no
$ touch ~/.tidyrc
$ tidy -q -export-config | grep -i indent
indent: no
indent-attributes: no
indent-cdata: no
indent-spaces: 0
indent-with-tabs: no

If tidy is going to have multiple init files it needs to wait until all are read before calling AdjustConfig.
Or not call AdjustConfig at all & make the user responsible for specifying a consistent set of options.
Or fix AdjustConfig to not bork the config.
Or .. what?

Not calling AdjustConfig is enough to have all the regression tests pass if /etc/tidy.conf and/or ~/.tidyrc exist:

$ git diff
diff --git a/src/config.c b/src/config.c
index f5e8379..67fc804 100644
--- a/src/config.c
+++ b/src/config.c
@@ -1056,8 +1056,6 @@ int TY_(ParseConfigFileEnc)( TidyDocImpl* doc, ctmbstr file, ctmbstr charenc )
     if ( fname != (tmbstr) file )
         TidyDocFree( doc, fname );

-    AdjustConfig( doc );
-
     /* any new config errors? If so, return warning status. */
     return (doc->optionErrors > opterrs ? 1 : 0);
 }
@geoffmcl
Copy link
Contributor

@ler762 thank you for adding this issue... this takes over from Tests Issue 30...

And showing an easy way to see some of what config has changed, just by calling AdjustConfig( doc );... this goes a long way to explain the test anomalies, like missing indent... thanks...

But it is not just an issue of our regression tests! As expressed elsewhere, they are code testers after all, and should ensure all environment and runtime configs are removed... They should never see the problem... but...

This could yield strange results for any tidy user, anytime... and needs to be addressed. It is a BUG!

Do not see the removal of AdjustConfig( doc ); as the obvious answer...

I have not yet had a chance to study it in detail, but there could be important issues with config consistency that are addressed by this function... that, in itself, needs to be checked, understood, and reported...

So I completely agree fix AdjustConfig to not bork the config could be the best, first attack...

Look forward to further feedback... thanks...

@geoffmcl geoffmcl added the Bug label Nov 22, 2018
@ler762
Copy link
Contributor Author

ler762 commented Nov 22, 2018

@geoffmcl htacg/tidy-html5-tests#30 is a different problem

The fix for htacg/tidy-html5-tests#30 is to add a -no-config option that tells tidy not to load any of the default init files. Then add -no-config to all the regression tests and that issue is resolved.

And adding a -no-config option to tidy also allows me to put tidy -no-config ... in a script or make file and having tidy do the right thing even if the user has a ~/.tidyrc, the sysadmin has created an /etc/tidy.conf or the user set HTML_TIDY=xxx in their startup.

This problem is different. Calling AdjustConfig after reading each config file is wrong - some settings get reset. For example:

$ cat /etc/tidy.conf
indent-spaces: 4

$ cat ~/.tidyrc
cat: /home/Lee/.tidyrc: No such file or directory

$ tidy -q -indent -export-config | grep -i indent
indent: auto
indent-attributes: no
indent-cdata: no
indent-spaces: 2
indent-with-tabs: no

Now try removing the AdjustConfig( doc ); call in TY_(ParseConfigFileEnc):

$ ./tidy -q -indent -export-config | grep -i indent
indent: auto
indent-attributes: no
indent-cdata: no
indent-spaces: 4
indent-with-tabs: no

I gave up on trying to figure out exactly what correct behavior would be for AdjustConfig & went with the idea that AdjustConfig shouldn't be called until after all the init files & program options have been read.

Turns out I got lucky & removing the AdjustConfig( doc ); call in TY_(ParseConfigFileEnc) is the fix; you get the desired behavior of calling AdjustConfig after reading all the config files/params and just before reading the html file.

$ cat /etc/tidy.conf

$ cat ~/.tidyrc
cat: /home/Lee/.tidyrc: No such file or directory

$ ./DOregression-test

and all the tests pass.

@geoffmcl
Copy link
Contributor

@ler762 this is the problem I faced when I opened Tests 30 - It is the SAME problem!

And as expressed at length, would never paper over the problem by adding say a -no-def-configs option, which I am now getting very cold about, but that's about issue #772, which I have yet to respond on...

If you can not understand the need for the AdjustConfig call, then I can only suggest try reading the code... and you need to explore that, at length, before suggesting to delete it...

That is, try again to figure out exactly what **correct behavior** would be for **AdjustConfig** ... and this is NOT about running our regression tests... forget them for this issue... please...

My method, is to sort of write research notes, as reading the code, and this is some of my thoughts... I hope they help...

The default state of TidyIndentContent is TidyNoState so the code that does this in AdjustConfig is -

    if ( cfgAutoBool(doc, TidyIndentContent) == TidyNoState )
        TY_(SetOptionInt)( doc, TidyIndentSpaces, 0 );

In tidy.c, if the config later contains an indent option, then if TidyIndentSpaces is 0, it is reset to the default 2...

    else if ( strcasecmp(arg, "indent") == 0 ) or
    case 'i':
        tidyOptSetInt( tdoc, TidyIndentContent, TidyAutoState );
        if ( tidyOptGetInt(tdoc, TidyIndentSpaces) == 0 )
            tidyOptResetToDefault( tdoc, TidyIndentSpaces );
        break;

However, if the option indent: auto is found in a config file, then AdjustConfig will not pick this change up, and potentially reset indent-spaces, if it is zero... so it can remain zero even though indent is now required, been requested...

This suggests the AdjustConfig be expanded a little, to perform the reset, like -

    if ( cfgAutoBool(doc, TidyIndentContent) == TidyNoState )
    {
        TY_(SetOptionInt)( doc, TidyIndentSpaces, 0 ); /* why this is required! does `pprint` expect, use it...*/
    } 
    else if ( tidyOptGetInt(tdoc, TidyIndentSpaces) == 0 )
    {
        /* indent is 'on', or 'auto' so reset the 'spaces', as `tidy.c` does with
         * tidyOptResetToDefault( tdoc, TidyIndentSpaces ); but we are in-lib, so -
         */
        TY_(ResetOptionToDefault)( doc, TidyIndentSpaces ); /* Is #778 - loss of indent */
    }

I would really appreciate it if someone could test, try, improve this code, maybe in an issue-778 branch...

The simple test setup is -

$ rm -f ~/.tidyrc # clean runtimes
$ sudo rm -f /etc/tidy.conf # clean
$ echo "indent: auto" > test.conf # setup a local config
$ echo "hullo world" | tidy -config test.conf # html output should be indented
$ sudo touch /etc/tidy.conf # create null
$ echo "hullo world" | tidy -config test.conf # output indent lost
$ sudo rm -f /etc/tidy.conf # clean up

Look forward to further feedback... thanks...

@ler762
Copy link
Contributor Author

ler762 commented Nov 22, 2018

@geoffmcl I see two different problems:

Issue 1: don't adjust the config until you have the complete config

If /etc/tidy.conf and ~/.tidyrc don't exist the regression tests all pass.
touch ~/.tidyrc and run the regression tests - some fail.
delete the call to AdjustConfig in TY_(ParseConfigFileEnc), rebuild & run the regression tests - all pass

So it sure seems like the fix for this problem is not calling AdjustConfig until tidy has the complete config - which means all the init files and command line options.

Issue 2: need the ability to ignore the default init files when tidy is called in a script

echo "wrap: 120" > ~/.tidyrc
run the regression tests & some fail

The fix for this problem is not reading ~/.tidyrc
and the only way to tell tidy not read a default init file is to add a new command line option.

@geoffmcl
Copy link
Contributor

@ler762 we seem to have a big problem in communication... ;=))

Have I not tried to imply just deleting AdjustConfig would cause all sorts of problems... maybe there is no present regression test to show this... that would be a good exercise, to write, add one... can you? But that is beside the point...

And AdjustConfig must be run exactly after each config file read...

Take a chained command like tidy -config test1.conf test1.html -config test2.conf test2.html ... adinfinitum..., well until the command line length becomes a factor, so is correctly built into the config file read... this can not be easily changed...

In other words, there is no moment when tidy has the complete config...

Configuration is an additive process, and at the end of each config file read, but we do not know yet if there will be more, that complete config must be checked for consistency... to be used on the next html input... and so on...

If you still do not get it, what else can I explain? But please do not repeat things over, and over...

Have suggested a fix, on which I need help... I will get to this soonest...

Look forward to further constructive feedback... thanks...

@ler762
Copy link
Contributor Author

ler762 commented Nov 23, 2018

@geoffmcl yes, we do seem to be doing a bit of talking past each other :(

Take a chained command like ... there is no moment when tidy has the complete config

I'd say that just before tidy starts processing the file to be tidy'ed it has, by definition, the complete config. It's run out of options, config files to be read, whatever else there is and it's to the point of ok - time to process this file.
At that point it's got all the config settings it's going to get for that file, so I'm saying it's got the complete config and it's finally time to call AdjustConfig

Start tracing the procedure calls beginning at console/tidy.c where it starts processing the file:

status = tidyParseFile( tdoc, htmlfil );

tidyParseFile ->
  tidyDocParseFile ->
    TY_(DocParseStream) ->
      TY_(TakeConfigSnapshot) ->
        AdjustConfig ->
         TY_(ParseDocument)

AdjustConfig gets called before processing the document.

Let's add a bunch of print statements to tidy & come up with an extreme example:

$ cat /tmp/x
$TIDY -q /tmp/test.html \
      -config /tmp/i4 -indent -wrap 40 /tmp/test.html \
      -config /tmp/i8 -indent /tmp/test.html \
      --indent no --tidy-mark no -wrap 40 /tmp/test.html

$ cat /tmp/i4
indent-spaces: 4

$ cat /tmp/i8
indent-spaces: 8

$ sh /tmp/x
debug: enter tidyParseFile
debug: enter tidyDocParseFile
debug: enter TY_(DocParseStream)
debug: enter TY_(TakeConfigSnapshot)
debug: enter AdjustConfig
deubg: enter TY_(ParseDocument)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta name="generator" content="HTML Tidy for HTML5 for Cygwin version 5.7.17.ler762.nullrc">
<title>The Title</title>
</head>
<body>
<p>A paragraph.</p>
</body>
</html>
debug: enter tidyParseFile
debug: enter tidyDocParseFile
debug: enter TY_(DocParseStream)
debug: enter TY_(TakeConfigSnapshot)
debug: enter AdjustConfig
deubg: enter TY_(ParseDocument)
<!DOCTYPE
HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <meta name="generator" content=
    "HTML Tidy for HTML5 for Cygwin version 5.7.17.ler762.nullrc">
    <title>The Title</title>
</head>
<body>
    <p>A paragraph.</p>
</body>
</html>
debug: enter tidyParseFile
debug: enter tidyDocParseFile
debug: enter TY_(DocParseStream)
debug: enter TY_(TakeConfigSnapshot)
debug: enter AdjustConfig
deubg: enter TY_(ParseDocument)
<!DOCTYPE
HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
        <meta name="generator" content=
        "HTML Tidy for HTML5 for Cygwin version 5.7.17.ler762.nullrc">
        <title>The Title</title>
</head>
<body>
        <p>A paragraph.</p>
</body>
</html>
debug: enter tidyParseFile
debug: enter tidyDocParseFile
debug: enter TY_(DocParseStream)
debug: enter TY_(TakeConfigSnapshot)
debug: enter AdjustConfig
deubg: enter TY_(ParseDocument)
<!DOCTYPE
HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>The Title</title>
</head>
<body>
<p>A paragraph.</p>
</body>
</html>
debug: enter TY_(FreeConfig)
debug: enter TY_(TakeConfigSnapshot)
debug: enter AdjustConfig

$

Looks like tidy did the right thing each time.

@geoffmcl
Copy link
Contributor

@ler762 now it becomes move AdjustConfig to a different place in the library ;=))

Why, I do not exactly know... some discussion about complete... inconclusive, especially if there is only one, probably the majority use case... brief simple test says this could work... one test! ... LOL

I do not have time to build, and debug now... not against maybe better logic here, but at what cost, gain, or losses? Needs to be explored...

Is after TakeConfigSnapshot the best place, an order change? Why?... Other issues that it may raise... possibly lots of questions, or it works perfectly, and is great... don't yet know...

But this should be a separate issue, if you want to pursue it... Want to move AdjustConfig to a better place, or something...

Moving it may or may not solve the problem addressed here... I don't think it will... maybe harder to setup...

That is, in one special case where the indent is off, the indent-spaces is zeroed, there is no library code to restore this, if indent is later enabled...

Such code presently resides only in tidy.c, where it monitors the i options... suggested a simple fix to AdjustConfig... in the library...

It has an action to be performed, however it finds the indent - looks good, but also still to be tested... maybe I am wrong? Hope not!

Hope I can get some help in this... thanks...

PS: Have you tried -DENABLE_DEBUG_LOG:BOOL=ON in your tidy build... it provided a very good trace of the order of the library code? enters, exits, etc... It may beat adding a bunch of printf, but that is certainly the traditional unix trace method... I have it enabled for my MSVC Debug build, so can watch, while debug source tracing... it helps a lot... and the output is to a log file...

@geoffmcl
Copy link
Contributor

Ok, found another way to show this "loss of indent" bug...

It is really nothing to do with regression tests, nor the existence of /etc/tidy.conf and/or ~/.tidyrc, so in effect the title of this issue is real wrong!

The title should be something like "indent is not restored if it gets zeroed", or something...

But to test, first make sure runtime init config files are all removed...

$ tidy -help-env
$ sudo rm -f /etc/tidy.conf
$ rm -f ~/.tidyrc

Setup, and run the following...

$ echo "hullo world" > hw.html
$ echo "indent: no" > c1.conf
$ tidy -config c1.conf hw.html
# This output has no indent as expected
$ echo "indent: yes" > c2.conf
$ tidy -config c2.conf hw.html
# this output has current default indent 2
# now combine them together, in one command
$ tidy -config c1.conf hw.html -config c2.conf hw.html
# == big bad bug == the 2nd output contains no indent!

Now, I hope it can be clearly seen moving AdjustConfig to a different place in the library will not fix this! As stated, this move may have logical merit, but it is not a fix for this bug... its current location is not the problem, so why change... but has to be a new issue, if someone thinks it is important...

Sorry about allowing the misleading current title to stand... but it was through the regression tests, and existence of a /etc/tidy.conf, put there by a debian tidy install, and I did not notice, that I first discovered it, so left that in the title... probably a very bad idea... sorry...

I certainly seek some help in solving this "loss of indent" bug... thanks...

@geoffmcl
Copy link
Contributor

20210225:
@ler762, revisited this, loss of indent over multiple configs, and related #779...

First I tried, moving AdjustConfig around, and even deleting it, in some places, but could not get the results desired... maybe I need to try harder ;=))

But I have found a simple patch that seems to work, in ResetConfigToSnapshot -

diff --git a/src/config.c b/src/config.c
index bae56b8..5db8e28 100644
--- a/src/config.c
+++ b/src/config.c
@@ -739,6 +739,10 @@ void TY_(ResetConfigToSnapshot)( TidyDocImpl* doc )
     }
     if ( needReparseTagsDecls )
         ReparseTagDecls( doc, changedUserTags );
+
+    if (cfg(doc, TidyIndentSpaces) < 2) /* Is. #778 */
+        TY_(SetOptionInt)(doc, TidyIndentSpaces, 2);
+
 }
 

Maybe that < 2 should be < 1, but there is also the case --indent-with-tabs yes, which resets the indent-spaces to 1... and maybe the optional TY_(ResetOptionToDefault)( doc, TidyIndentSpaces ); should be done in issue #108? It gets real messy when fully considering the interaction between various options... and even their order of presentation...

Perhaps that patch should be expanded to -

    if (cfg(doc, TidyIndentSpaces) < 2) /* Is. #778 */
        if (cfgBool(doc, TidyPPrintTabs))
            TY_(SetOptionInt)(doc, TidyIndentSpaces, 1);
        else
            TY_(SetOptionInt)(doc, TidyIndentSpaces, 2);

or something...

Incidentally, I also ran the regression tests, using this tidy-5.7.45.I778.exe, and nothing popped... but of course there is no test case for multiple configs with multiple files on the command line... so would not expect anything...

Will do some more testing before pushing say an issue-778 branch for testing and merging...

Meantime, look forward to further feedback, comments, patches, etc... thanks...

@balthisar
Copy link
Member

AdjustConfig moved to be called only a single time after all configuration is loaded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants