Debug
Debugging javascript is a major endeavor, as I first discovered when debugging nasa.gov, which came up empty. It has megabytes of js, and we're not really sure why, since it is a readonly site that merely disseminates information. You don't buy things, you don't check your bank account, you can't even log in. As Avril Lavigne says, "Why'd you have to go and make things so complicated?" Most informational sites, like newspapers, can be read with js disabled and they're fine, but some, like nasa, come up empty without js enabled. Even with js on, it still came up empty, thus there was a problem. I spent many days finding and fixing the bug, but in doing so, I realized we need better debugging tools, including tracing, debug prints, and breakpoints. This article describes these tools.
Follow-up: nasa.gov has since become more accessible; you can read it without javascript. Maybe people like me complained. But this is not the norm. Many sites are getting more complicated, so on we go.
First, we have the javascript debugger, which you enter via the jdb command. It is basically a javascript shell inside edbrowse. Type bye to exit the shell and return to edbrowse proper. A few edbrowse commands are interpreted here, such as db3 for debug level, e7 to go to another session, shell escape, etc. See jdb_passthrough() for the pass-through commands. Javascript expressions are evaluated, and the document objects are available. document.head is the head of your document <head>, document.body is the body <body>, document.body.firstChild is the first node under <body>, and so on. You want at least db3 to show you the errors as you enter expressions. At db2, you can type the syntactically incorrect line
7 7
and you are greeted with unhelpful silence. At db3 or above you get
jdb line 1: Syntax Error: unterminated statement
If o is an object, ok(o) shows you the object keys, at least those that are enumerable. If x is one of the keys, then o.x returns the value of x. If you're not sure what x is, typeof o.x will tell you. Warning: react is a complex js framework used by some websites, and it redefines ok. You can use Object.keys instead, but it's longer to type and isn't quite as powerful. The alias $ok will work, and is just one keystroke longer than ok.
natok is a native version of ok, and it gives all the properties under an object, even those that are not enumerable.
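The difference between enumerable and non-enumerable keys can be seen with plain javascript, outside of edbrowse. This is only a sketch: Object.keys and Object.getOwnPropertyNames stand in for ok and natok, whose actual implementations may report more than this, and the object o with its keys title and secret is made up for illustration.

```javascript
// Stand-in object: one ordinary key and one hidden (non-enumerable) key.
const o = { title: "hello" };
Object.defineProperty(o, "secret", { value: 42, enumerable: false });

// Enumerable keys only, roughly what the text says ok(o) shows.
const visible = Object.keys(o);

// Every own property, enumerable or not, comparable to what natok(o) is described as giving.
const all = Object.getOwnPropertyNames(o);

console.log(visible); // [ 'title' ]
console.log(all);     // [ 'title', 'secret' ]
```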
There is an attribute system in DOM javascript, something I did not understand for many years. It isn't complicated, but is poorly documented, or not at all, and occasionally it actually is complicated, so it is probably worth writing about.
Each tag becomes an object in javascript. Let us say that a tag <p> becomes an object of type HTMLParagraphElement, which I will assign to the variable p. p.snork refers to a member, or property, of the object p. This typically has nothing to do with html or the running of the web page. I did not realize that for a long time. There is instead an attribute system. After p.setAttribute("snork", 77), p.getAttribute("snork") produces 77. Even if you do this, p.snork is not defined. They have nothing to do with each other. Similarly, you can set p.snork = 85, and p.getAttribute("snork") will still be 77. They run separately.
If your line of html says <p snork=77> and p is the corresponding tag object, p.snork will be undefined. html tag attributes go into the attribute system. p.getAttribute("snork") will be 77.
If s is the tag object for your select, with <select size=20>, that means s.getAttribute("size") is 20, and that I'm pretty sure is the effective display size of the menu. Unless we act differently, that has nothing to do with s.size.
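Here is a small sketch of that separation that runs in any javascript engine. FakeElement is a made-up stand-in for a DOM element, not edbrowse code; it stores attributes in their own map and converts values to strings, as the standard DOM does.

```javascript
// Minimal stand-in for a DOM element: attributes live in their own map,
// completely apart from the object's ordinary properties.
class FakeElement {
  constructor() { this.attrs = new Map(); }
  // The standard DOM stores attribute values as strings, so convert on the way in.
  setAttribute(name, value) { this.attrs.set(name, String(value)); }
  getAttribute(name) { return this.attrs.has(name) ? this.attrs.get(name) : null; }
}

const p = new FakeElement();
p.setAttribute("snork", 77);
console.log(p.getAttribute("snork")); // "77"
console.log(p.snork);                 // undefined, the property was never set

p.snork = 85;                         // set the property only
console.log(p.getAttribute("snork")); // still "77"
```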
Now, there are two effects that I call spillup and spilldown. This is not well documented, it is my own terminology. spilldown is when the value of the attribute spills down to the property. spillup is when setting the property spills up to the attribute. They usually go together. Look at this from the documentation on HTMLSelectElement and its members.
HTMLSelectElement.name: A string reflecting the {name} HTML attribute, containing the name of this control used by servers and DOM search functions.
This implies spilldown for sure, and I think spillup as well, though that is less clear. It says that if we use setAttribute to make the name fred, then s.name also becomes fred. It spills down to the property of s. It also suggests that s.name = "fred" spills up to the attribute. This often happens with name, and always with id. The id is so important that it spills down from attribute to property and up from property to attribute. This is accomplished by the following javascript in startwindow.js
Object.defineProperty(HTMLElement.prototype, "id", {
get:function(){ var t = this.getAttribute("id");
return typeof t == "string" ? t : undefined; },
set:function(v) { this.setAttribute("id", v)}});
Getting, or setting, the id property of any html element vectors through the setAttribute system. Once I understood this a lot of html javascript made more sense. Now go back to select.size. Note this from the documentation.
HTMLSelectElement.size: A long reflecting the {size} HTML attribute, which contains the number of visible items in the control. The default is 1, unless multiple is true, in which case it is 4.
This means to me, spilldown at the very least. Probably spillup as well, though that is less clear. If the attribute is a number or a string that looks like a number, it spills down to the property. I have to convert it to a number because it says long in the documentation. Every html attribute is a string; even if it looks like a number or boolean it is a string, so I convert. If there is no attribute, or the type is wrong, return the default of 0.
Object.defineProperty(HTMLSelectElement.prototype, "size", {
get:function(){ var t = this.getAttribute("size");
if(typeof t == "number") return t;
if(typeof t == "string" && t.match(/^\d+$/)) return parseInt(t);
return 0;},
set:function(v) { this.setAttribute("size", v);}});
There it is, spillup and spilldown. It's just a few lines of code. I thought the default return would be 1 or 4, but other browsers return 0. The documentation says default is 1 or 4, but I'm guessing that is the default display size, not the default spilldown from attribute to property.
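To watch spilldown and spillup in action without a browser, the same getter and setter can be attached to a stand-in class. SizeSelect is hypothetical and only mimics the attribute storage; the accessor logic follows the code above, with a radix added to parseInt.

```javascript
// Hypothetical stand-in for HTMLSelectElement with string-valued attributes.
class SizeSelect {
  constructor() { this.attrs = new Map(); }
  setAttribute(name, value) { this.attrs.set(name, String(value)); }
  getAttribute(name) { return this.attrs.has(name) ? this.attrs.get(name) : null; }
}

// Same shape as the size accessor shown above.
Object.defineProperty(SizeSelect.prototype, "size", {
  get: function () {
    const t = this.getAttribute("size");
    if (typeof t == "number") return t;
    if (typeof t == "string" && t.match(/^\d+$/)) return parseInt(t, 10);
    return 0; // no attribute, or not a number
  },
  set: function (v) { this.setAttribute("size", v); }
});

const s = new SizeSelect();
const before = s.size;             // no attribute yet, so the default 0
s.setAttribute("size", "20");
const down = s.size;               // spilldown: attribute "20" becomes number 20
s.size = 5;
const up = s.getAttribute("size"); // spillup: property 5 becomes attribute "5"
s.setAttribute("size", "big");
const bad = s.size;                // not numeric, so the default 0

console.log(before, down, up, bad);
```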
Sometimes type conversion is implied. Take <select multiple>: the multiple attribute looks like an empty string, but it actually means multiple = true. Several attributes work this way, including required, disabled, and hidden.
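A boolean attribute like this can be sketched the same way. The assumption here, taken from how the standard DOM treats boolean attributes and not from edbrowse source, is that the property is true whenever the attribute is present, whatever its value. FlagSelect is a made-up stand-in class.

```javascript
// Hypothetical stand-in element that can also test for and remove attributes.
class FlagSelect {
  constructor() { this.attrs = new Map(); }
  setAttribute(name, value) { this.attrs.set(name, String(value)); }
  getAttribute(name) { return this.attrs.has(name) ? this.attrs.get(name) : null; }
  hasAttribute(name) { return this.attrs.has(name); }
  removeAttribute(name) { this.attrs.delete(name); }
}

// Assumption: presence of the attribute means true, absence means false.
Object.defineProperty(FlagSelect.prototype, "multiple", {
  get: function () { return this.hasAttribute("multiple"); },
  set: function (v) {
    if (v) this.setAttribute("multiple", "");
    else this.removeAttribute("multiple");
  }
});

const sel = new FlagSelect();
const initially = sel.multiple;            // false, no attribute
sel.setAttribute("multiple", "");          // what <select multiple> amounts to
const afterAttr = sel.multiple;            // true, although the attribute is ""
sel.multiple = false;                      // spillup removes the attribute
const stillThere = sel.hasAttribute("multiple");

console.log(initially, afterAttr, stillThere);
```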
Finding nodes buried somewhere in the tree can be challenging, but querySelectorAll() can help. This isn't a debugging tool, it is standard in all browsers, but it can be helpful.
a = querySelectorAll("p") list all the paragraphs in the document
a = querySelectorAll("input") list all the input elements in the document
a = querySelectorAll("p.instruction") list all the paragraphs with class=instruction
a = querySelectorAll("#login") list the node whose id=login
There should only be one node with a given id, so instead of an array, you can fetch that particular node directly.
a = querySelector("#login") returns a node not an array of nodes
a = querySelectorAll("#login")[0] equivalent
by_esn(76) is a home grown function to return the node with edbrowse sequence number 76. I assign sequence numbers to all the nodes, for tracking and debugging purposes. Type g? on a hyperlink and get the usual information, plus the sequence number at db3 or higher. i? gives the field information, and the sequence number.
Once you have found a node, use dumptree(n) to see the descendants of a node, and uptrace(n) for the ancestors. The latter shows the class and id of each node, as they are very important in html, javascript, and css.
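The idea behind uptrace can be sketched as a walk up the parentNode chain. This is not the edbrowse implementation, and the output format is invented; uptraceSketch and the little four-node tree are hypothetical and use plain objects so the example runs anywhere.

```javascript
// Walk from a node to the root, recording tag name, class, and id at each step.
function uptraceSketch(n) {
  const lines = [];
  for (let t = n; t; t = t.parentNode) {
    let s = t.nodeName;
    if (t.className) s += "." + t.className;
    if (t.id) s += "#" + t.id;
    lines.push(s);
  }
  return lines;
}

// Plain-object stand-ins for html > body > div.menu#nav > a
const htmlNode = { nodeName: "HTML", parentNode: null };
const bodyNode = { nodeName: "BODY", parentNode: htmlNode };
const divNode = { nodeName: "DIV", className: "menu", id: "nav", parentNode: bodyNode };
const linkNode = { nodeName: "A", parentNode: divNode };

const trace = uptraceSketch(linkNode);
console.log(trace.join("\n"));
```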
Scripts are of particular importance, so there is a special function showscripts() to gather all the scripts, even those that were dynamically created, and put them into a special array $ss. You will see the length of each script, and where it comes from (url), and whether it was deminimized. (More on deminimization below.) Use ok($ss[3]) to find out what we know about script number 3. src holds the url, if it came from another web page, and data holds the actual data. Since > is an operator in javascript, I use ^> to write to a file. Thus you can save a copy of the script to disk like this.
$ss[3].data ^> script3.js
There may be dozens of scripts, so the searchscripts() function helps you find the script that contains a particular string.
searchscripts("foobar")
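In spirit, the search is a scan over the data member of each script. The sketch below is not the real searchscripts(); findScripts and the sample array are made up, with the array mimicking the src and data members that $ss entries are described as having.

```javascript
// Hypothetical stand-in for $ss: src is the url (absent for inline scripts),
// data is the script text.
const scripts = [
  { src: "https://example.com/a.js", data: "var x = 1;" },
  { src: "https://example.com/b.js", data: "function foobar() { return 27; }" },
  { data: "foobar();" }
];

// Return the index of every script whose text contains the search string.
function findScripts(list, needle) {
  const hits = [];
  list.forEach(function (s, i) {
    if (typeof s.data === "string" && s.data.includes(needle)) hits.push(i);
  });
  return hits;
}

const found = findScripts(scripts, "foobar");
console.log(found); // [ 1, 2 ]
```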
Since the css files are more static, they are already gathered into an array; you don't need to issue a command to do it. cssSource is an array of objects, similar to $ss, but each object describes a css file, rather than a javascript file. src holds the url of the css file, and data holds the actual data. The url is typically the href attribute in the <link> tag, and the data is the contents of that web page.
<link href="https://something.com/foobar.min.css" rel="stylesheet" type="text/css">
$pjobs is an array that holds the Promise jobs, at level 3 or above. Thus you can see what functions were executed by Promise.
As you browse and unbrowse, and fill out forms, and push buttons, and move in and out of jdb, there are debug flags beyond the numeric debug level. These are toggle flags.
dbev debugs the events and the listeners for those events. You will see when listeners are added and removed, when events are dispatched, and when handlers are called. You want to be familiar with the edbrowse convention for event handlers. Node.onclick is the function to run when a node captures or bubbles the onclick event, and that is standard javascript, we can't change that. So we put in our own onclick$$fn() function that calls all the handlers that have been added by addEventListener(). These are stored in Node.onclick$$array. If there was an onclick function originally, either from an html tag setting the onclick attribute, or because javascript added an onclick function to the Node, and then more handlers are added via addEventListener(), the original handler is stored in Node.onclick$$orig. So our onclick function checks for onclick$$orig first and runs that, and if it does not fail, it steps through the handlers in onclick$$array and runs those in turn as long as they don't fail. The same pattern applies for onload, oninput, onchange, and all the other events. Of course things like onmouseout and onfocus and such we ignore, because edbrowse has no such events.
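The handler convention is easier to follow in code. What follows is a rough sketch, not edbrowse source: addListenerSketch is a hypothetical helper, the node is a plain object, and I am assuming that a handler "fails" when it returns false and that onclick is pointed at onclick$$fn.

```javascript
// Hypothetical helper that follows the naming convention described above.
function addListenerSketch(node, evname, handler) {
  const on = "on" + evname;
  if (!node[on + "$$array"]) {
    // First listener: remember any pre-existing handler, then take over on<event>.
    if (typeof node[on] === "function") node[on + "$$orig"] = node[on];
    node[on + "$$array"] = [];
    node[on + "$$fn"] = function (e) {
      const orig = this[on + "$$orig"];
      // Assumption: returning false counts as failure and stops the chain.
      if (orig && orig.call(this, e) === false) return false;
      for (const h of this[on + "$$array"]) {
        if (h.call(this, e) === false) return false;
      }
      return true;
    };
    node[on] = node[on + "$$fn"];
  }
  node[on + "$$array"].push(handler);
}

const calls = [];
const button = { onclick: function () { calls.push("orig"); } };
addListenerSketch(button, "click", function () { calls.push("first"); });
addListenerSketch(button, "click", function () { calls.push("second"); return false; });
addListenerSketch(button, "click", function () { calls.push("third"); });

const result = button.onclick({ type: "click" });
console.log(calls, result); // [ 'orig', 'first', 'second' ] false
```

The third handler never runs because the second one returned false, which is the "as long as they don't fail" rule under the assumption above.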
dbcss debugs css. There can be thousands of selectors and rules; the results are stored in /tmp/css.
dberr shows you all the errors, even those caught in a try catch block. Many of these errors are ok, false positives, as far as the website is concerned, that's why they are in try blocks; so turning this on can be more distracting than helpful.
timers disables javascript timers, and you often want to do this so that things are frozen while you debug.
Now, before we can trace execution or add breakpoints, we need a local copy of the website. I'll illustrate with nasa.gov.
- Make a directory somewhere called nasa and cd into it. This directory is empty.
- Call up edbrowse and set demin for deminimization, and trace to add trace points, and db3 if you want to see what is going on.
demin trace db3 b https://www.nasa.gov
- Verify you have about 100 lines, you got real stuff. Try to ignore the errors for now. Type timers- to turn off the action.
- Jump into jdb and run snapshot(). This is a special function that creates a local copy of the website as best it can. It prints a reminder of what you need to do next.
- Use bye to exit out of jdb, then ub to unbrowse. Read the file called from into the current html file just after the <head> tag. You may need to cut the first few tags apart if they are all on one line, so you can place the Base tag just after the Head tag and before other tags. The from file will look like this.
This sets the base tag so the local website looks like it came from the server at nasa.gov. Finally, save this to a local file called base, and then quit.
- ls to see what has happened. There are lots of new files.
- edbrowse the base file, just a local file. db3 to see what is happening, and b to browse. Almost all of the javascript files, and certainly the ones we care about, are fetched locally from your directory. Also the css files. You should get the same errors, and the same 100 lines of stuff.
- You can change the js files, add alert statements, breakpoints, etc, and browse and change and browse again, and otherwise debug.
For ongoing work, you can change the filenames to be more intuitive. Perhaps f2.js is vendor.js and f3.js is nasa.js. Just make the same change in the jslocal map.
The js files are deminimized, with lots of trace points. A trace point looks like trace@(d221). The global variable step$l controls the trace. If step$l = 0 then the javascript runs as before, though much slower. If step$l = 1 then each trace label is printed as execution proceeds. Combine this with db3 to see where errors are happening. If step$l = 2 then edbrowse stops at each trace point, as though it were a breakpoint. The breakpoint shell is similar to jdb but much more restricted. You can't issue any edbrowse commands at all, since javascript is actively running here. You can view and even modify the local variables that are in scope, or the global variables. The object known as this is preserved. The special symbol arg$ is the arguments object of the currently running function. Type . to exit and resume execution. You can change step$l to change tracing.
You can add your own breakpoint with the bp@ macro, i.e. bp@(label). Edbrowse will say break at line label, then you can look around. Again, type . to exit.
I worked on one site that added its own toString() functions to various prototypes, and these in turn had trace points, so when I asked for the value of x, and it tried to turn x into a string, it entered the toString() function associated with x, which triggered more breakpoints, which was really, really confusing. I entered . . . and finally got back to the string value of x. I hope this is a rare occurrence. But it's not. Lots of sites do it. Either that, or they all use a common js library, like jquery, which does it. It sure makes debugging complicated. If you know this is going to happen, set step$l = 0 so you don't see the tracing; then x gives you the value of x right away. But remember to put step$l back, if you wish, before you type . to exit the breakpoint.
At the beginning of your base file, you can add <script> step$go = 'd221'; </script>, and step$l will change to 2 when execution reaches trace@(d221). Edbrowse breaks at every trace point thereafter, until you downgrade step$l. Another trigger mechanism is <script> step$exp = "foobar==27"; </script>. Your expression will be evaluated everywhere, so make sure it refers to global variables, or nodes that are uniquely identified in the dom tree, by using querySelector("#something") for example.
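The interplay of step$l and step$go can be modeled in a few lines. This is a toy model and not how edbrowse expands trace@(); tracePoint is a hypothetical function, and instead of printing or opening the breakpoint shell it records what would have happened in an array.

```javascript
// Stand-ins for the globals named in the text.
let step$l = 1;        // start out printing each trace label
let step$go = "d221";  // switch to breakpoint mode at this label
const traceLog = [];

// Hypothetical model of what happens at each trace point.
function tracePoint(label) {
  if (step$go && label === step$go) step$l = 2;
  if (step$l === 1) traceLog.push("trace " + label);
  // Real edbrowse would stop here and open the breakpoint shell.
  if (step$l === 2) traceLog.push("break " + label);
}

["d219", "d220", "d221", "d222"].forEach(tracePoint);
console.log(traceLog);
// [ 'trace d219', 'trace d220', 'break d221', 'break d222' ]
```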
When you jump into jdb, you do so within the frame of the current line, and within its context. This might be a subpage of the entire web page. parent takes you to a higher frame, if there is one. If parent === window then you are at the top. top takes you to the top window. This is standard javascript; it is not my creation. It is often best to enter jdb from line 1, so you are at the top.
The frames array lists the frames within the current window. These are the actual windows beneath your window.
eb$ctx is a context number, which I assign to each window. These are just sequence numbers, like the ones I assign to nodes. frames[1].eb$ctx gives the context number of the second frame in the current window. When you first enter jdb it might say something like
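To get a picture of the frame tree, one can walk parent upward and frames downward. The sketch below uses plain objects as stand-in windows so it runs anywhere; makeWin, findTop, and listContexts are hypothetical helpers, and only parent, frames, and eb$ctx come from the text.

```javascript
// Build a stand-in window with a context number; at the top, parent === window.
function makeWin(ctx, parent) {
  const w = { eb$ctx: ctx, frames: [] };
  w.parent = parent || w;
  if (parent) parent.frames.push(w);
  return w;
}

// Climb until parent points back at the window itself.
function findTop(w) {
  while (w.parent !== w) w = w.parent;
  return w;
}

// List context numbers, indented by depth in the frame tree.
function listContexts(w, depth, out) {
  depth = depth || 0;
  out = out || [];
  out.push("  ".repeat(depth) + "cx " + w.eb$ctx);
  for (let i = 0; i < w.frames.length; ++i) listContexts(w.frames[i], depth + 1, out);
  return out;
}

const topWin = makeWin(1);
const frameA = makeWin(2, topWin);
const frameB = makeWin(3, topWin);
const innerFrame = makeWin(4, frameB);

const ctxList = listContexts(findTop(innerFrame));
console.log(ctxList.join("\n"));
```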
cx 1 http://www.nasa.gov.browse
This is just a reminder of where you are: the url and its context number.
Event handlers have sequence numbers. They are in eb$hsn in the various functions in onwhatever$$array. You'll see these numbers if you have dbev enabled.
Timers have sequence numbers. You'll see these but only at db4. Timers run so often I don't always want to see them at db3. Browse jsrt for example, and after 30 seconds:
exec timer 3 context 1
exec complete
When debugging, timers live on, even after they are complete. They are in gc$xx variables under window. For example, the above timer is in gc$411, and to confirm, gc$411.tsn == 3.
Some people are able to run this local snapshot in firefox, with tracing on, and then compare that with the trace from the edbrowse run, looking for divergence. Timers can run in any order in firefox, and same for Promise jobs, truly asynchronous, nor can we turn these timers off, as we can in edbrowse, so this isn't a simple matter of doing a diff. It is a visual scan through millions of lines looking for patterns and discrepancies. It is more an art than a science. To be quite honest, I (Karl Dahlke) can't do it, so I rely on others. This technique has proved essential in finding certain bugs.
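If you know the timer sequence number but not which gc$ variable holds it, a small loop can find it. This is a sketch under assumptions: findTimer is a hypothetical helper, the window is a plain stand-in object, and I use Object.getOwnPropertyNames in case the gc$ variables are not enumerable.

```javascript
// Stand-in for window with a couple of retained timer objects.
const win = {
  title: "test page",
  gc$410: { tsn: 2 },
  gc$411: { tsn: 3 }
};

// Return the name of the gc$ variable whose timer sequence number matches.
function findTimer(w, tsn) {
  for (const name of Object.getOwnPropertyNames(w)) {
    if (!/^gc\$\d+$/.test(name)) continue;
    const t = w[name];
    if (t && t.tsn === tsn) return name;
  }
  return null;
}

const holder = findTimer(win, 3);
console.log(holder); // gc$411
```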
Have fun.