Skip to content
This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

ANY23-321 Add openie toggle functionality to service #56

Merged
merged 18 commits into from
Feb 28, 2018

Conversation

lewismc
Copy link
Member

@lewismc lewismc commented Jan 3, 2018

This issue is a large step towards addressing https://issues.apache.org/jira/browse/ANY23-321
I am however not able to currently register the OpenIEExtractor within the ExtractorGroup/Factory when undertaking an extractor. I've debugged this down to the code in the URLClassLoader.addURL of Any23PluginManager. The OpenIE JAR's are dynamically loaded however the Extractor implementation does not seem to be registered when the extraction is executed.

@ansell if you are able to pull this code and debug it would be greatly appreciated. I have specific lines for debugging if this would be helpful. Thank you in advance for any assistance here.

P.S. you will also see that I've been making attempts to update the plugin documentation.

P.P.S I actually remember encountering a similar issue previously when attempting to register a plugin via the command line... I think dynamic ClassLoading is broken in Any23 right now. I am keen to fix it so any help here is appreciated folks.

@lewismc
Copy link
Member Author

lewismc commented Feb 3, 2018

@HansBrende @jgrzebyta can you both please have a look at dynamic plugin loading in this patch? I am at a loss, having tried to debug this with no progress. Thanks in advance,

@HansBrende
Copy link
Member

@lewismc you said you had some specific lines for debugging?

@lewismc
Copy link
Member Author

lewismc commented Feb 3, 2018

Thanks @HansBrende :)
Once you have built the Webservice and deployed to Tomcat, you should make sure that you check the openie checkbox in the GUI.
You can set a breakpoint at the openie conditional toggle on in Servlet.java which will enable you to see that the parameter is being received on server side. From here, you need to look into how the Any23PluginManager does dynamic classloading.
Let me know how you get on and I can provide a bit more context as to how the directory openie structure works for dynamic loading.
Thanks in advance,

@HansBrende
Copy link
Member

@lewismc I really have no idea how to deploy to Tomcat, being a bit of a noob, but have you tried doing something like this in Servlet.java?

if (loadedJars) { 
    ExtractorRegistry r = ExtractorRegistryImpl.getInstance();
    pManager.getExtractors().forEachRemaining(r::register);
    ...
}

@jgrzebyta
Copy link
Contributor

jgrzebyta commented Feb 5, 2018

@lewismc

I have: Failed to execute goal on project apache-any23-service: Could not resolve dependencies for project org.apache.any23:apache-any23-service:war:2.3-SNAPSHOT: Failure to find org.allenai.openie:openie_2.11:jar:4.2.6 in https://svn.apache.org/repos/asf/any23/repo-ext/ was cached in the local repository, resolution will not be reattempted until the update interval of any23-repository-external has elapsed or updates are forced.

IMHO it shouldn't be said in the apache-any23-service pom that the dependency is provided. Provided means it comes together with tomcat. It did not help. However I found this jar in my local repo.

@HansBrende did you had that problem as well? I have it also when I exec mvn in command line.

@HansBrende
Copy link
Member

HansBrende commented Feb 5, 2018

@jgrzebyta Yes, I had that problem too. Mvn build fails, but that's not related to the classloader problem @lewismc is having (which my comment above should fix--the extractor registry automatically registers classes only via the system classloader, so extractors that are loaded by other means have to be explicitly registered).

@HansBrende
Copy link
Member

@lewismc have you been able to test whether or not my suggested insertion fixes your problem?

@lewismc
Copy link
Member Author

lewismc commented Feb 8, 2018

@HansBrende not yet sorry, I have been overloaded :( I will try tonight.

@ansell
Copy link
Member

ansell commented Feb 8, 2018

I don't have time right now to debug this or work on its design. It isn't a pattern I have worked on in the past. All ServiceLoader projects that I have worked on in the past either have all of the service Factory objects initialised as startup, or they do periodic scans, but always create and initialise the service Factory objects whenever they are discovered.

Is this issue attempting to delay/extract the OpenIE initialisation by delaying/postponing the Factory initialisation? If so, the focus could possibly be on making the OpenIE initialisation lazy to only occur when the Factory is actually required/used to create an object instance.

A more configurable lifecycle possibly requires OSGi, which I again have absolutely no experience with, but from the little understanding each time I look, it appears to allow user driven initialisation (push rather than pull, if that makes sense).

@lewismc
Copy link
Member Author

lewismc commented Feb 23, 2018

Hi @HansBrende your suggestion fixed the code so I have pushed for further review.
@ansell thank you for the suggestions I think you are absolutely right... I've posted a question on users@maven along with my followup so hopefully I can keep the service pom clean by obtaining the correct solution.
I'll post here once I know. Thanks all.

@lewismc
Copy link
Member Author

lewismc commented Feb 23, 2018

I've messed this PR up now so I will close and reopen another once I have the dependency issue resolved. I'll also provide an update to the documentation as to how dynamic classloading is done per example.

@lewismc
Copy link
Member Author

lewismc commented Feb 24, 2018

OK folks, so this issue is now available for testing. I can see two immediate issues

  1. OpenIE is memory-intensive and can prety easily exhaust >6GB memory for larger jobs. This is an issue if we are running this as a service via any23.org,
  2. There is a considerable amount of bloat on the resulting service artifacts e.g. WAR, tarballs, zip files, etc. See below
lmcgibbn@LMC-056430 /usr/local/any23/service/target(ANY23-321) $ ls -lh
total 13499344
drwxr-xr-x  5 lmcgibbn  wheel   170B Feb 23 17:47 apache-any23-service-2.2-SNAPSHOT
-rw-r--r--  1 lmcgibbn  wheel   851M Feb 23 17:50 apache-any23-service-2.2-SNAPSHOT-server-embedded.tar.gz
-rw-r--r--  1 lmcgibbn  wheel   851M Feb 23 17:50 apache-any23-service-2.2-SNAPSHOT-server-embedded.zip
-rw-r--r--  1 lmcgibbn  wheel   846M Feb 23 17:48 apache-any23-service-2.2-SNAPSHOT-with-deps.tar.gz
-rw-r--r--  1 lmcgibbn  wheel   846M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-with-deps.zip
-rw-r--r--  1 lmcgibbn  wheel   784M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-without-deps.tar.gz
-rw-r--r--  1 lmcgibbn  wheel   783M Feb 23 17:48 apache-any23-service-2.2-SNAPSHOT-without-deps.war
-rw-r--r--  1 lmcgibbn  wheel   784M Feb 23 17:49 apache-any23-service-2.2-SNAPSHOT-without-deps.zip
-rw-r--r--  1 lmcgibbn  wheel   846M Feb 23 17:47 apache-any23-service-2.2-SNAPSHOT.war
drwxr-xr-x  2 lmcgibbn  wheel    68B Feb 23 17:47 archive-tmp
drwxr-xr-x  4 lmcgibbn  wheel   136B Feb 23 17:47 classes
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 generated-sources
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 generated-test-sources
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 maven-archiver
drwxr-xr-x  3 lmcgibbn  wheel   102B Feb 23 17:47 maven-status
drwxr-xr-x  4 lmcgibbn  wheel   136B Feb 23 17:47 test-classes
drwxr-xr-x  4 lmcgibbn  wheel   136B Feb 23 17:47 war-legals

@lewismc
Copy link
Member Author

lewismc commented Feb 27, 2018

Any comments here folks?

@HansBrende
Copy link
Member

I mean, I'm not sure what kind of resources you want your webservice to be using, but 6GB memory sounds like a blocker to me. Using OpenIE with the core api would be opt-in only, right?

@lewismc
Copy link
Member Author

lewismc commented Feb 27, 2018 via email

@HansBrende
Copy link
Member

I haven't tried it locally, but I agree it's pretty cool functionality and would aptly demonstrate that plugin-loading works.

My main concern, if this will be deployed to any23.org, would be the RAM usage.

One solution might be to add a configuration flag that either enables or disables OpenIE processing (the OpenIE toggle would only be displayed if the flag were enabled)--and disable the flag by default. (Unless whoever's paying for the any23.org web service doesn't mind each request using 6 GB--in that case I guess it wouldn't matter.)

@HansBrende
Copy link
Member

HansBrende commented Feb 27, 2018

(The flag could even be read at the build-level: for example, you might add a build step that reads your configuration file and modifies the HTML to show/enable or hide/disable the toggle according to whether or not the OpenIE flag is enabled--so as not to make an unnecessary web request. Of course on the server side, you could just read the config file in on the fly--and respond with an error if the user requested OpenIE, but it was disabled in the config. I personally have no experience with building complex build tasks, so I have no idea whether or not this is feasible, or even a good idea. But just a thought.)

@lewismc
Copy link
Member Author

lewismc commented Feb 27, 2018

My main concern, if this will be deployed to any23.org, would be the RAM usage.

We can always request more RAM for the dedicated VM powering any23.org

One solution might be to add a configuration flag that either enables or disables OpenIE processing (the OpenIE toggle would only be displayed if the flag were enabled)--and disable the flag by default.

This is exactly whats provided in the patch. I guessed the it should be disabled by default. You can see examples of the toggle functionality in the snapshot's below

screen shot 2018-02-26 at 10 00 44 pm

Note the new OpenIE toggle box above

screen shot 2018-02-26 at 10 00 59 pm

The above information is displayed when you click on the '?' icon.

@HansBrende
Copy link
Member

HansBrende commented Feb 27, 2018

@lewismc my suggestion was to provide a config flag in addition to the existing checkbox (the difference being: the config flag would not be controlled by the user accessing the webpage--but by the server itself, in order to specify whether or not the server allows the user to switch OpenIE "on".)

(It would have been just a way to leave the plugin-loading code there so that people could look it over and test it out locally by changing the config flag, while not actually allowing them to do so on the any23 webservice.)

But if the RAM usage is not a problem for any23.org, then there's no need for it.

@HansBrende
Copy link
Member

HansBrende commented Feb 27, 2018

In lieu of the config flag, it might be nice to just handle OutOfMemoryErrors gracefully. Maybe something like this in OpenIEExtractor.run()?

} catch (OutOfMemoryError e) {
    //let the gc do its thang
    openIE = null;
    out.notifyIssue(IssueReport.IssueLevel.FATAL, "Not enough memory available to perform OpenIE extraction.", -1, -1);
    return;
}

In any case, if any23.org can handle 6 GB of memory per instance (or rather, 10 GB, the -Xmx value recommended by the OpenIE readme), then I say +1. Cool functionality, and it would be great to be able to use it in the any23 webservice!

@HansBrende
Copy link
Member

HansBrende commented Feb 27, 2018

On a slightly tangential topic, according to this github page, the current version of OpenIE is OpenIE 5, located here, whereas it looks like we are using OpenIE 4.2.6.

@lewismc
Copy link
Member Author

lewismc commented Feb 27, 2018 via email

@HansBrende
Copy link
Member

Ah, I see. Well, it may not be the end of the world if we stick with 4.2.6 for now--looks like that version requires less memory anyway.

What do you think about my OOM handling suggestion? I edited my comment to provide some sample code.

@lewismc
Copy link
Member Author

lewismc commented Feb 27, 2018

@HansBrende

What do you think about my OOM handling suggestion? I edited my comment to provide some sample code.

Excellent suggestion, I've accommodated it. I also agree that another setting, possibly within core Any23 Configuration would be good, however i think that this can be added if the exception handling is not sufficient. That would be a different issue.

@asfgit asfgit merged commit 71bf171 into apache:master Feb 28, 2018
@lewismc lewismc deleted the ANY23-321 branch February 28, 2018 04:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants