Querying over ASTs with Rascal
ASTs are represented as Rascal terms built using a variety of constructors. To load the AST constructors, do the following:
import lang::php::ast::AbstractSyntax;
Each PHP file yields a term of type Script, which (except when errors are found during parsing) is built with the following constructor:
script(list[Stmt] body)
Searching ASTs for specific features is just a matter of using Rascal's standard pattern matching features, especially matches over constructed terms and deep matches. As an example, assume we have loaded WordPress 3.4 as follows:
import lang::php::ast::AbstractSyntax;
import lang::php::util::Utils;
pt = loadBinary("WordPress", "3.4");
To find all function calls in the file wp-admin/user/menu.php, you can use the following code (this assumes the corpus is in a root directory named corpus):
[ c | /c:call(_,_) := pt[|file:///corpus/WordPress/wordpress-3.4/wp-admin/user/menu.php|] ]
This code looks for all call nodes anywhere in the script associated with this file. The call constructor itself is defined as:
call(NameOrExpr funName, list[ActualParameter] parameters)
but, since we do not care about either the function name or the function parameters, we use the wildcard _ for each, so any value will match.
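To see the deep match in isolation, here is a minimal sketch that can be pasted into a fresh Rascal console. The data types are hypothetical, simplified stand-ins for the real AST definitions (for example, the function name is a single name(...) rather than the nested form used by the PHP AST), and allCalls is an illustrative helper, not part of the library.

```rascal
import List;

// Hypothetical, simplified stand-ins for the real PHP AST types.
data Name = name(str n);
data Expr
  = call(Name funName, list[Expr] parameters)
  | scalar(int v)
  ;
data Stmt = exprstmt(Expr e);
data Script = script(list[Stmt] body);

// The leading / makes the pattern match at any depth, so nested calls
// are found as well as top-level ones.
list[Expr] allCalls(Script s) = [ c | /c:call(_,_) := s ];

// Models the PHP statements "f(g(1)); 2;": one call nested inside
// another, plus a statement that is not a call.
Script example = script([
  exprstmt(call(name("f"), [call(name("g"), [scalar(1)])])),
  exprstmt(scalar(2))
]);

// Both the outer call to f and the nested call to g are collected.
size(allCalls(example));
```

Without the leading /, the pattern would only be tried against the script term itself and would find nothing, because a script is not a call.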
To instead do this across all of WordPress, you would do the following:
[c | /c:call(_,_) := pt ]
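When matching across the whole system, it can be useful to keep track of which file each call came from. The following is a minimal sketch, assuming the loaded system can be enumerated like a map from file locations to scripts (the earlier example indexes pt by location; enumeration is an assumption here). The types, the toy system, and the file locations are hypothetical stand-ins, intended for a fresh Rascal console.

```rascal
// Hypothetical, simplified stand-ins for the real PHP AST types.
data Name = name(str n);
data Expr = call(Name funName, list[Expr] parameters) | scalar(int v);
data Stmt = exprstmt(Expr e);
data Script = script(list[Stmt] body);

// A toy "system": file location -> script.
map[loc, Script] toy = (
  |file:///corpus/a.php| : script([exprstmt(call(name("f"), []))]),
  |file:///corpus/b.php| : script([exprstmt(call(name("g"), [call(name("f"), [])]))])
);

// Pair every call with the file it was found in.
rel[loc, Expr] callsPerFile = { <l, c> | l <- toy, /c:call(_,_) := toy[l] };

// Files that contain at least one call to f.
set[loc] filesCallingF = { l | <l, call(name("f"), _)> <- callsPerFile };
```

Keeping the location next to each match makes it possible to answer per-file questions later without searching the system again.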
To look for all calls to the function mysql_query, do the following:
[c | /c:call(name(name("mysql_query")),_) := pt ]
Finally, to look for all calls to any function whose name starts with mysql, do the following:
[c | /c:call(name(name(n)),_) := pt, /^mysql/ := n ]
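Note that /^mysql/ is anchored only at the start, so it matches any name with that prefix, not just PHP's mysql_* functions. A minimal sketch on plain strings (the sample names are chosen for illustration) shows the difference a trailing underscore makes:

```rascal
list[str] names = ["mysql_query", "mysql2date", "mysqli_connect", "printf"];

// Prefix match: keeps every name that starts with "mysql".
// loose == ["mysql_query", "mysql2date", "mysqli_connect"]
list[str] loose = [ n | n <- names, /^mysql/ := n ];

// Tighter match: also requires the underscore of the mysql_* family.
// strict == ["mysql_query"]
list[str] strict = [ n | n <- names, /^mysql_/ := n ];
```

Which pattern is appropriate depends on whether names such as mysql2date should be part of the result.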
The results can be saved into variables and examined further. For instance, to find out how many times each of these functions is called in WordPress, first collect the calls and the set of distinct function names:
mysqlCalls = [c | /c:call(name(name(n)),_) := pt, /^mysql/ := n ];
calledFunctions = { n | call(name(name(n)),_) <- mysqlCalls };
This gives back the following functions:
{"mysql_unbuffered_query","mysql2date","mysql_fetch_field","mysql_fetch_row","mysql_free_result","mysql_set_charset","mysql_num_fields","mysql_num_rows","mysql_connect","mysql_error","mysql_real_escape_string","mysql_query","mysql_affected_rows","mysql_get_server_info","mysql_insert_id","mysql_select_db","mysql_fetch_object"}
Now, we can use a map to count the occurrences:
map[str,int] callCounts = ( fn : 0 | fn <- calledFunctions );
for (call(name(name(n)),_) <- mysqlCalls) callCounts[n] += 1;
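As an alternative to initializing and updating the map by hand, Rascal's List module provides distribution, which counts how often each element occurs in a list. Here is a minimal sketch on a hypothetical list of names; with the real data, the list would hold the function names extracted from mysqlCalls, one entry per call.

```rascal
import List;

// Hypothetical sample: one entry per call site.
list[str] calledNames = ["mysql_query", "mysql_error", "mysql_query"];

// distribution maps each distinct element to its number of occurrences.
map[str, int] sampleCounts = distribution(calledNames);
```

The explicit loop is easier to extend (for example, to count only calls with a given number of arguments), while distribution is shorter when a plain tally is all that is needed.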
Now, we can print these sorted by function name (println comes from the IO module):
import IO;
import List;
import Set;
for (fn <- sort(toList(calledFunctions))) println("<fn> is called <callCounts[fn]> times");
which will yield the following results:
mysql2date is called 75 times
mysql_affected_rows is called 2 times
mysql_connect is called 3 times
mysql_error is called 2 times
mysql_fetch_field is called 1 times
mysql_fetch_object is called 1 times
mysql_fetch_row is called 5 times
mysql_free_result is called 1 times
mysql_get_server_info is called 1 times
mysql_insert_id is called 1 times
mysql_num_fields is called 1 times
mysql_num_rows is called 2 times
mysql_query is called 17 times
mysql_real_escape_string is called 17 times
mysql_select_db is called 2 times
mysql_set_charset is called 1 times
mysql_unbuffered_query is called 3 times
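To rank the functions by how often they are called instead of by name, the sort function from the List module also accepts a comparison function. Here is a minimal sketch using three of the counts reported above; rankedCounts and ranked are illustrative names.

```rascal
import IO;
import List;

// Three of the counts from the results above.
map[str, int] rankedCounts = ("mysql_query": 17, "mysql_error": 2, "mysql_connect": 3);

// Order the function names by descending count. The comparison must be
// strict (> rather than >=) to be a valid ordering for sort.
list[str] ranked = sort([ fn | fn <- rankedCounts ],
  bool (str a, str b) { return rankedCounts[a] > rankedCounts[b]; });

for (fn <- ranked) println("<fn> is called <rankedCounts[fn]> times");
```

Enumerating a map with fn <- rankedCounts yields its keys, which is why no separate set of function names is needed here.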