
NugetScraper is an HTML parser that reads an HTML page retrieved from NuGet.org and provides a list containing NuGet packages released by a specified organization.


maroontress/NugetScraper


NugetScraper


Example

A typical usage example is as follows:

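package com.example;

import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
// ... (import of NugetScraper from this library; package name not shown here)
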
public final class Demo {

    public static void run(String organizationName)
            throws IOException, InterruptedException {
        var urlBase = "https://www.nuget.org";
        var id = URLEncoder.encode(organizationName, StandardCharsets.UTF_8);
        var request = HttpRequest.newBuilder()
                .uri(URI.create(urlBase + "/profiles/" + id + "/"))
                .build();
        var client = HttpClient.newHttpClient();
        var response = client.send(request, HttpResponse.BodyHandlers.ofString());
        var profile = NugetScraper.toProfile(response.body());

        for (var i : profile.packageList()) {
            System.out.println(i.title() + ":" + i.totalDownloads());
        }
        var maybePath = profile.nextPageUrl();
        if (maybePath.isPresent()) {
            var path = maybePath.get();
            if (!path.startsWith("/")) {
                throw new IllegalStateException("unexpected URL: " + path);
            }
            var nextPageUrl = URI.create(urlBase + path);
            System.out.println("The next page URL: " + nextPageUrl);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java com.example.Demo ID");
            return;
        }
        var organizationName = args[0];
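        try {
            run(organizationName);
        } catch (IOException e) {
            System.out.println("failed (ignored)");
        } catch (InterruptedException e) {
            // Restore the interrupt status instead of silently swallowing it
            Thread.currentThread().interrupt();
            System.out.println("failed (ignored)");
        }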
    }
}

For example, running "java com.example.Demo Microsoft", which parses https://nuget.org/profiles/Microsoft/, prints the following:

Microsoft.Extensions.Primitives:3179750929
Microsoft.NETCore.Platforms:3174900796
Microsoft.Extensions.DependencyInjection.Abstractions:3114882542
System.Runtime.CompilerServices.Unsafe:2688797891
Microsoft.Extensions.Options:2661649028
Microsoft.Extensions.Logging.Abstractions:2638017167
Microsoft.Extensions.Configuration.Abstractions:2578117049
System.Diagnostics.DiagnosticSource:2574167053
System.Threading.Tasks.Extensions:2290236144
Microsoft.CSharp:2003285729
System.Buffers:1987825889
Microsoft.Extensions.DependencyInjection:1958339674
Microsoft.Extensions.Configuration:1916727034
Microsoft.Extensions.FileProviders.Abstractions:1844208793
Microsoft.Extensions.Logging:1837702027
System.Memory:1792564237
Microsoft.Extensions.Configuration.Binder:1668198620
System.Security.Principal.Windows:1637388455
Microsoft.NETCore.Targets:1590023964
System.Security.Cryptography.Cng:1539977932
The next page URL: https://www.nuget.org/profiles/Microsoft?page=2

Note that each number is the package's total download count at the time the page was retrieved, so a fresh run will print different figures.
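Several of these counts exceed Integer.MAX_VALUE (2,147,483,647), so any code that stores or sums them needs a 64-bit type. The following is a minimal sketch, not part of NugetScraper, that parses lines in the demo's "title:count" output format into a long. The class DownloadLine and its Entry record are hypothetical names, and the sketch makes no assumption about what type totalDownloads() itself returns.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of NugetScraper): parses lines printed by
 * the demo in "title:totalDownloads" form. Counts such as 3179750929 exceed
 * Integer.MAX_VALUE, so they are held in a long.
 */
public final class DownloadLine {

    /** One parsed output line. */
    public record Entry(String title, long totalDownloads) { }

    /** Splits at the last ':' so the count is always the trailing field. */
    public static Entry parse(String line) {
        int colon = line.lastIndexOf(':');
        if (colon <= 0 || colon == line.length() - 1) {
            throw new IllegalArgumentException("not a title:count line: " + line);
        }
        var title = line.substring(0, colon);
        // Long.parseLong throws NumberFormatException for a non-numeric count
        long count = Long.parseLong(line.substring(colon + 1));
        return new Entry(title, count);
    }

    /** Sums the counts; Math.addExact throws instead of silently overflowing. */
    public static long total(List<String> lines) {
        long sum = 0;
        for (var line : lines) {
            sum = Math.addExact(sum, parse(line).totalDownloads());
        }
        return sum;
    }
}
```

Lines not in that format, such as the trailing "The next page URL" line, are rejected with an IllegalArgumentException (NumberFormatException is a subclass of it), so filter them out before calling total.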

🚧 The structure of the HTML that nuget.org generates is subject to change; future releases of this parser will follow such changes.
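The demo prints the next page URL but does not follow it. One way to collect every page is to loop until nextPageUrl() is empty. The sketch below uses hypothetical stand-ins so it runs without network access or the library: Page models a parsed profile (package titles plus an optional root-relative next path), and the fetcher function stands in for the HTTP request followed by NugetScraper.toProfile(...). It keeps the demo's check that the next path starts with "/" and adds a page cap as a guard against link loops.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/**
 * Hypothetical sketch (not part of NugetScraper): follows "next page" links
 * until none remain. Page and the fetcher are stand-ins for the HTTP request
 * plus NugetScraper.toProfile(...) used in the demo.
 */
public final class PageWalker {

    /** Hypothetical page: package titles plus an optional root-relative next path. */
    public record Page(List<String> titles, Optional<String> nextPath) { }

    /** Resolves a root-relative path against the base URL, rejecting anything else. */
    static URI resolve(String urlBase, String path) {
        if (!path.startsWith("/")) {
            throw new IllegalStateException("unexpected URL: " + path);
        }
        return URI.create(urlBase + path);
    }

    /**
     * Fetches the first page, then keeps following nextPath until it is empty
     * or maxPages pages have been visited.
     */
    public static List<String> collectTitles(String urlBase, URI first,
            Function<URI, Page> fetcher, int maxPages) {
        var titles = new ArrayList<String>();
        URI next = first;
        for (int visited = 0; next != null && visited < maxPages; visited++) {
            var page = fetcher.apply(next);
            titles.addAll(page.titles());
            // An empty nextPath ends the loop; a non-root-relative one throws
            next = page.nextPath()
                    .map(p -> resolve(urlBase, p))
                    .orElse(null);
        }
        return titles;
    }
}
```

In real use, the fetcher would send the HttpRequest shown in the demo and map profile.packageList() and profile.nextPageUrl() into a Page.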

API Reference
