Enhancement: Selenium Grid Capability #1123
Signed-off-by: Maimur Hasan <[email protected]>
missed a commit
Thanks @msghasan! Very kind of you to share your work with the community
A few initial comments
- Formatting a bit off, please run mvn git-code-format:format-code -Dgcf.globPattern=**/* prior to committing
- Would be good to explain your approach in a comment in the PR + explain how you tested it
- remove the commented-out config from the Elastic/OpenSearch modules and add it to https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/resources/crawler-default.yaml instead
- remove change to URLFrontier which is unrelated
- Add tests, possibly extending what we recently added in https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/test/java/com/digitalpebble/stormcrawler/protocol/selenium/ProtocolTest.java
Signed-off-by: Maimur Hasan <[email protected]>
@jnioche

Pull Request #1023 Explanation: Enhancing Selenium Grid Capability in Storm Crawler

This pull request aims to improve Selenium Grid support within the Storm Crawler framework. The implementation takes a Selenium Grid web address and uses it to retrieve remote objects. The key components are two classes: SeleniumGrid.java and its corresponding implementation, SeleniumGridImpl.java.

A LinkedBlockingQueue manages the browsers available in the grid. Every 5 minutes, a check determines whether each driver object in the queue is still active: we look up the session id of the driver object on the grid; if the session is still available we keep the driver, otherwise we delete it and create a new one. A Holder POJO class stores each driver object added to the queue together with its timing information. When an exception is not session-related, specifically one arising from a website loading issue, the browser object is not discarded; it is returned to the queue and reused.

Testing Approach: A Docker image of Selenium Grid was run.
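For readers following the thread, the queue-plus-Holder approach described above can be sketched roughly as follows. This is not the code from the PR: the class name DriverPoolSketch, its method names, and the factory and sessionAlive callbacks are all hypothetical, the generic type D stands in for a remote driver, and the session check runs lazily when a driver is borrowed instead of on a 5-minute timer. The current time is passed in explicitly so the behaviour is easy to test.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch of a driver pool; D stands in for a remote driver type. */
class DriverPoolSketch<D> {

    /** Pairs a driver with the time its session was last verified. */
    public static final class Holder<D> {
        public final D driver;
        public final long lastCheckedMillis;

        Holder(D driver, long lastCheckedMillis) {
            this.driver = driver;
            this.lastCheckedMillis = lastCheckedMillis;
        }
    }

    private final BlockingQueue<Holder<D>> queue = new LinkedBlockingQueue<>();
    private final Supplier<D> factory;       // assumption: opens a new session on the grid
    private final Predicate<D> sessionAlive; // assumption: asks the grid if the session id still exists
    private final long checkIntervalMillis;

    DriverPoolSketch(int size, Supplier<D> factory, Predicate<D> sessionAlive,
            long checkIntervalMillis, long nowMillis) {
        this.factory = factory;
        this.sessionAlive = sessionAlive;
        this.checkIntervalMillis = checkIntervalMillis;
        for (int i = 0; i < size; i++) {
            queue.add(new Holder<>(factory.get(), nowMillis));
        }
    }

    /** Blocks until a driver is free; re-checks its session if the last check is too old. */
    public Holder<D> borrow(long nowMillis) {
        Holder<D> holder;
        try {
            holder = queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a driver", e);
        }
        if (nowMillis - holder.lastCheckedMillis < checkIntervalMillis) {
            return holder; // checked recently enough, reuse as-is
        }
        // Stale check: keep the driver if its session survives, otherwise replace it.
        // A real implementation would also quit the discarded driver.
        D driver = sessionAlive.test(holder.driver) ? holder.driver : factory.get();
        return new Holder<>(driver, nowMillis);
    }

    /** Returns a driver to the queue; only a session-level failure replaces it. */
    public void giveBack(Holder<D> holder, boolean sessionBroken, long nowMillis) {
        queue.add(sessionBroken ? new Holder<>(factory.get(), nowMillis) : holder);
    }

    public static void main(String[] args) {
        int[] counter = {0};
        DriverPoolSketch<String> pool = new DriverPoolSketch<>(
                2, () -> "driver-" + (++counter[0]), d -> true,
                5 * 60 * 1000L, System.currentTimeMillis());
        Holder<String> h = pool.borrow(System.currentTimeMillis());
        System.out.println("Borrowed " + h.driver);
        pool.giveBack(h, false, System.currentTimeMillis()); // page-load error: keep the driver
    }
}
```

Because a blocking queue is used, a caller that asks for a driver when all of them are in use simply waits, which caps concurrency at the number of browsers placed in the queue.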
I think it does expose the grid. I ran the existing test, opened http://localhost:32779/ui (the port number might be different) and got the Grid UI. It would have only one node running on it, but it is still worth using for testing.
Can we implement something similar to RemoteDriverProtocol in order to test without running an actual SC crawl?
core/src/main/java/com/digitalpebble/stormcrawler/protocol/selenium/SeleniumGridProtocol.java
external/elasticsearch/archetype/src/main/resources/archetype-resources/crawler-conf.yaml
core/src/main/java/com/digitalpebble/stormcrawler/protocol/selenium/SeleniumGridImpl.java
Closing because of lack of activity. Can be reopened later if someone wants to pick it up.
Signed-off-by: Maimur Hasan [email protected]
Thanks for contributing to StormCrawler, your efforts are appreciated!
Developer Certificate of Origin
By contributing to StormCrawler, you accept and agree to the following terms and conditions (the Developer Certificate of Origin) for your present and future contributions submitted to StormCrawler.
Please refer to the Developer Certificate of Origin section in CONTRIBUTING.md for details.
Before opening a PR, please check that:
Thanks!