
Figure out a full-featured scraping framework #11

Open

lassik opened this issue Aug 21, 2020 · 9 comments

lassik commented Aug 21, 2020

Currently we use a hand-written scraper for these implementations:

  • chez
  • guile
  • kawa
  • larceny
  • loko
  • mit
  • racket
  • scheme48
  • slib
  • tinyscheme
  • ypsilon

It would be nice to extend the scraper generator in listings.scm so it can handle all of the above.

The scraper generator currently writes Unix shell scripts that use curl, tar and grep, but that's just an implementation detail. If Scheme had a more fully developed archive framework, we could just as well skip the shell scripts and run the scrapers directly in Scheme.

The main thing is to discover a specification language for the scrapers.
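As an illustration of what the grep stage of one of those generated shell scripts boils down to, here is a self-contained sketch. The sample input text, the `SRFI [0-9]+` pattern, and the exact pipeline shape are assumptions for illustration, not the actual output of listings.scm; a real script would feed the pipeline from `curl ... | tar` rather than `printf`.

```shell
# Illustrative sketch of a generated scraper's extraction stage.
# Sample text stands in for a downloaded, unpacked manual.
printf 'Supports SRFI 1 and SRFI 13.\nAlso SRFI 1.\n' \
  | grep -oE 'SRFI [0-9]+' \
  | sort -u
# Prints each distinct SRFI mention once: "SRFI 1", "SRFI 13".
```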

lassik commented Aug 21, 2020

The current definitions look like this:

(define schemes

lassik commented Aug 21, 2020

IMHO it's a good idea to scrape the listings from manuals whenever possible. Manuals usually describe the most "official" features of the implementation - they tend to put emphasis on polished features, not experimental ones. So if a SRFI is mentioned in the manual, support is probably reasonably complete.

jpellegrini (Contributor) commented:

> it's a good idea to scrape the listings from manuals whenever possible

Some of the manuals are generated programmatically (STklos is one example). Maybe (I don't know) this could be used, somehow.

lassik commented Aug 21, 2020

Yes - in that case it doesn't matter whether we scrape the source or result. Whichever is easier to do. For STklos we currently scrape doc/skb/srfi.stk with one regexp.
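A one-regexp scrape of a source file like that can be sketched as follows. The sample file contents and the `(srfi N` pattern are made up for illustration; the actual regexp run against doc/skb/srfi.stk may differ.

```shell
# Hypothetical one-regexp scrape over a Scheme source file.
# The sample file stands in for doc/skb/srfi.stk.
cat > sample-srfi.stk <<'EOF'
(srfi 0 "feature-based conditional expansion")
(srfi 1 "list library")
EOF
# Match lines declaring a SRFI, then keep only the number.
grep -oE '^\(srfi [0-9]+' sample-srfi.stk | grep -oE '[0-9]+'
rm sample-srfi.stk
# Prints the supported SRFI numbers, one per line: 0, 1.
```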

erkin commented Nov 5, 2020

I added GitLab support to the scraper generator to integrate Loko and Kawa. I also added support for Snow Fort and chez-srfi, which doesn't depend on Unix shell scripts but depends on external Racket packages. Maybe a compromise can be made.

  • For Racket, we can scrape the racket/srfi repo.
  • Guile and MIT/GNU Scheme are both hosted on Savannah with cgit, so maybe we can add a third Git host to the scraper generator.
  • TinyScheme, Ypsilon, Chez, Scheme48, SLIB and Larceny don't seem like they're going to add new SRFIs any time soon, so we can even remove the existing handwritten scrapers and keep static data.

lassik commented Nov 5, 2020

Fantastic. Thank you very much for continuing to work on this!

erkin commented Nov 6, 2020

No problem! (I had to neglect it for a while though. ^^;)

I added Racket to the scraper generator (although it's not yet complete, see racket/srfi#10). Now only CHICKEN, Guile and MIT/GNU Scheme remain.

erkin commented Nov 8, 2020

CHICKEN down with d25c7b7! Thanks to the work of @diamond-lizard.

lassik commented Nov 11, 2020

> I added Racket to the scraper generator (although it's not yet complete, see racket/srfi#10). Now only CHICKEN, Guile and MIT/GNU Scheme remain.

Excellent work!
