Made regexp extracting GFS directory names more robust. (#1410)
Issue 1409: It seems the directory listing HTML changed, and this
caused the regexp extracting the directory names (e.g. `gfs.*` or
the cycle directory) to capture HTML elements along with the actual
value, thus making the names always invalid.

Resolves #1409.
Resolves #1410 (PR).
wwlwpd authored Nov 1, 2024
1 parent 4a725f6 commit 93cc670
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion bin/get_gfs_status.pl
@@ -75,7 +75,11 @@
 my $url = sprintf( qq{https://ftp.ncep.noaa.gov/%s}, $dir );
 my $res = $ua->get($url);
 my $raw_listing = $res->{content};
-my @dirs = ( $raw_listing =~ m/>(.+)\/</g );
+# Welcome to the perils of parsing HTML, the following match is set up
+# to work on the following 2 examples,
+# 1. <tr><td><a href="00/">00/</a></td><td align="right">31-Oct-2024 03:32
+# 2. <tr><td><a href="gfs.20241027/">gfs.20241027/</a></td><td align="right">27-Oct
+my @dirs = ( $raw_listing =~ m/href="(gfs\.\d{8}|\d\d)\/"/g );
 if ( not @dirs ) {
 warn "!! No directories found via $url\n";
 }
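
To illustrate the failure described in the commit message, here is a minimal standalone sketch that runs both patterns against the two sample rows quoted in the code comment. The variable names `@old_dirs` and `@new_dirs` are hypothetical and exist only for this comparison. The sample rows are used exactly as truncated in the comment, and the real listing may contain other markup that this sketch does not cover.

```perl
use strict;
use warnings;

# Two listing rows copied from the code comment in the diff
# (truncated exactly as they appear there).
my $raw_listing = join "\n",
    q{<tr><td><a href="00/">00/</a></td><td align="right">31-Oct-2024 03:32},
    q{<tr><td><a href="gfs.20241027/">gfs.20241027/</a></td><td align="right">27-Oct};

# Old pattern: the greedy .+ spans from the first '>' on a line
# to the last '/<' on that line, so it drags markup along.
my @old_dirs = ( $raw_listing =~ m/>(.+)\/</g );

# New pattern: keyed on the href attribute and restricted to the two
# expected name shapes (gfs.YYYYMMDD or a two-digit cycle).
my @new_dirs = ( $raw_listing =~ m/href="(gfs\.\d{8}|\d\d)\/"/g );

print "old: $_\n" for @old_dirs;
print "new: $_\n" for @new_dirs;
```

On these two rows the old pattern yields `<td><a href="00/">00` and `<td><a href="gfs.20241027/">gfs.20241027`, while the new pattern yields `00` and `gfs.20241027`. The trade-off is that the new pattern silently skips any directory name that does not match one of the two shapes.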