
Querying over ASTs with Rascal

Kangcor edited this page Apr 24, 2014 · 2 revisions

ASTs are represented as Rascal terms built using a variety of constructors. To load the AST constructors, do the following:

import lang::php::ast::AbstractSyntax;

Each PHP file yields a term of type Script, which (except in cases where errors are found during parsing) is defined as:

script(list[Stmt] body)

Searching ASTs for specific features is just a matter of using Rascal's standard pattern matching features, especially matches over constructed terms and deep matches. As an example, assume we have loaded WordPress 3.4 as follows:

import lang::php::ast::AbstractSyntax;
import lang::php::util::Utils;
pt = loadBinary("WordPress", "3.4");

To find all function calls in the file wp-admin/user/menu.php, you can use the following code (this assumes the corpus is in a root directory named corpus):

[ c | /c:call(_,_) := pt[|file:///corpus/WordPress/wordpress-3.4/wp-admin/user/menu.php|] ]

This code looks for all call nodes anywhere in the script associated with this file. call itself is defined as:

call(NameOrExpr funName, list[ActualParameter] parameters)

but, since we do not care about either the function name or the function parameters, we just use _ for each so that any value will match.
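To see the deep match and the _ wildcard in isolation, here is a minimal sketch on a toy AST. The types ToyExpr and ToyScript and their constructors are hypothetical and much simpler than the real PHP AST definitions. They are named differently on purpose so they do not clash with the constructors from lang::php::ast::AbstractSyntax.

```rascal
// Hypothetical toy AST, for illustration only (not the real PHP AST).
data ToyExpr
    = toyCall(str funName, list[ToyExpr] args)
    | toyLit(int val)
    ;

data ToyScript = toyScript(list[ToyExpr] body);

// Deep match (/): collect every toyCall node at any nesting depth.
// Both arguments are ignored with _, so any call matches.
list[ToyExpr] allCalls(ToyScript s) = [ c | /c:toyCall(_, _) := s ];

// Same deep match, but bind the function name instead of ignoring it.
set[str] calledNames(ToyScript s) = { n | /toyCall(str n, _) := s };
```

For a script such as toyScript([toyCall("f", [toyCall("g", [toyLit(1)])]), toyLit(2)]), both the outer call to f and the nested call to g are found, because the deep match visits nested values as well as top-level ones.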

To instead do this across all of WordPress, you would do the following:

[c | /c:call(_,_) := pt ]

To look for all uses of function mysql_query, do the following:

[c | /c:call(name(name("mysql_query")),_) := pt ]

Finally, to look for all calls to any function starting with mysql, do the following:

[c | /c:call(name(name(n)),_) := pt, /^mysql/ := n ]
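The regular expression match can also be pulled out into a small helper. PHP function names are case-insensitive, so if the AST keeps names as written in the source (an assumption here), a call written as MySQL_Query would not match /^mysql/. The sketch below adds the i modifier to cover that case. The helper names isMysqlName and isMysqlNameAnyCase are hypothetical.

```rascal
// Hypothetical helpers, for illustration only.

// True when the name starts with "mysql" (case-sensitive).
bool isMysqlName(str n) = /^mysql/ := n;

// Same check, ignoring case via the i modifier.
bool isMysqlNameAnyCase(str n) = /^mysql/i := n;
```

With pt loaded as above, such a helper could be used as the filter in the comprehension, for example [c | /c:call(name(name(n)),_) := pt, isMysqlNameAnyCase(n) ].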

These results can be saved into variables and examined further. For instance, to find out how many occurrences of each call there are in WordPress, first collect the matching calls and the set of called function names:

mysqlCalls = [c | /c:call(name(name(n)),_) := pt, /^mysql/ := n ];
calledFunctions = { n | call(name(name(n)),_) <- mysqlCalls };

This gives back the following functions:

{"mysql_unbuffered_query","mysql2date","mysql_fetch_field","mysql_fetch_row","mysql_free_result","mysql_set_charset","mysql_num_fields","mysql_num_rows","mysql_connect","mysql_error","mysql_real_escape_string","mysql_query","mysql_affected_rows","mysql_get_server_info","mysql_insert_id","mysql_select_db","mysql_fetch_object"}

Now, we can use a map to count the occurrences:

map[str,int] callCounts = ( fn : 0 | fn <- calledFunctions );
for (call(name(name(n)),_) <- mysqlCalls) callCounts[n] += 1;
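As an alternative to initializing the map and incrementing it in a loop, the same counts could be computed in one step. This is a sketch that assumes the distribution function from the standard List module; countNames is a hypothetical helper name.

```rascal
import List;

// Hypothetical helper: count how often each name occurs in a list.
// distribution maps each distinct element to its number of occurrences.
map[str, int] countNames(list[str] names) = distribution(names);
```

With mysqlCalls defined as above, this could be used as callCounts = countNames([ n | call(name(name(n)),_) <- mysqlCalls ]);. Only names that actually occur end up in the map, which here gives the same keys as calledFunctions.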

Now, we can print these sorted on function name:

import IO;
import List;
import Set;
for (fn <- sort(toList(calledFunctions))) println("<fn> is called <callCounts[fn]> times");

which will yield the following results:

mysql2date is called 75 times
mysql_affected_rows is called 2 times
mysql_connect is called 3 times
mysql_error is called 2 times
mysql_fetch_field is called 1 times
mysql_fetch_object is called 1 times
mysql_fetch_row is called 5 times
mysql_free_result is called 1 times
mysql_get_server_info is called 1 times
mysql_insert_id is called 1 times
mysql_num_fields is called 1 times
mysql_num_rows is called 2 times
mysql_query is called 17 times
mysql_real_escape_string is called 17 times
mysql_select_db is called 2 times
mysql_set_charset is called 1 times
mysql_unbuffered_query is called 3 times