Add additional sources (comment here to suggest another source) #23

Open
nicobrenner opened this issue Mar 9, 2024 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@nicobrenner
Owner

  • workatastartup.com
  • public LinkedIn.com listings

========

Idea: create a peer-to-peer (p2p) network of jobs. Anyone in the network can post a job, and the rest of the network receives the listing and processes/matches it locally.

This would allow anyone who runs a scraper to add listings to the network. How do we prevent a malicious actor (someone who wants to spam the network) from flooding it? How do we curate the listings?

One way would be for local nodes to be responsible for filtering what they receive. There could also be filtering nodes that individual nodes subscribe to.
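A minimal sketch of that split, assuming a node sees each listing as a small record and that a subscribed filtering node can be reduced to a keep/drop verdict. `Listing`, `LocalNode` and `subscribe` are hypothetical names, not part of commandjobs:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Listing:
    """A job listing as received from the network (illustrative fields)."""
    listing_id: str
    poster_node: str
    title: str
    text: str


# A filter returns True to keep a listing and False to drop it.
Filter = Callable[[Listing], bool]


class LocalNode:
    """Hypothetical node: applies its own filters plus those of subscribed filtering nodes."""

    def __init__(self, local_filters: Iterable[Filter] = ()):
        self.local_filters: list[Filter] = list(local_filters)
        self.subscriptions: list[Filter] = []

    def subscribe(self, filtering_node: Filter) -> None:
        # A remote filtering node is modeled as just another keep/drop verdict.
        self.subscriptions.append(filtering_node)

    def accept(self, listing: Listing) -> bool:
        # Keep a listing only if every local filter and every subscription keeps it.
        return all(f(listing) for f in self.local_filters + self.subscriptions)

    def receive(self, listings: Iterable[Listing]) -> list[Listing]:
        return [item for item in listings if self.accept(item)]
```

For example, a node could keep a cheap local length check and subscribe to someone else's keyword filter; both must pass for a listing to be kept.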

What would be the incentive to post a job? Well, all companies want to post jobs, so they would probably do it for free.

What would be the incentive to filter the jobs? For individuals it's easy: they don't want to get spammed with jobs.

How about for the filtering nodes that want to act as a sort of aggregator and curator? Why would they do it?

That filtering could also be LLM-based, but it would have a cost, unless, for example, individuals contribute their own filtered jobs back to the network.

Maybe nodes could signal to the network that they made a match with a post, thus validating the post's usefulness.
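One way to sketch that match signal, assuming each node has a stable ID and broadcasts the ID of the listing it matched. `MatchTally` and the default threshold of 3 are hypothetical. Counting distinct node IDs stops a single node from inflating a post, though it does not stop one actor from running many nodes:

```python
from collections import defaultdict


class MatchTally:
    """Hypothetical tally of 'this listing matched me' signals broadcast by nodes."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # listing_id -> set of node IDs that reported a match
        self._matches: dict[str, set[str]] = defaultdict(set)

    def signal_match(self, listing_id: str, node_id: str) -> None:
        # A set means a node signalling the same listing repeatedly still counts once.
        self._matches[listing_id].add(node_id)

    def match_count(self, listing_id: str) -> int:
        # .get() avoids creating empty entries for listings nobody matched.
        return len(self._matches.get(listing_id, ()))

    def is_validated(self, listing_id: str) -> bool:
        return self.match_count(listing_id) >= self.threshold
```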

Users could mark posts as spam. That would filter out the job, add it to a spam list that could help an LLM-based filter, and mark the posting node as a spammer (blacklist it).
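A sketch of that spam-marking flow, assuming each listing carries the ID of the node that posted it. `SpamGuard` is a hypothetical name. Here a single report blacklists the poster, as described above, although a real network would probably want several reports before doing that:

```python
from dataclasses import dataclass, field


@dataclass
class SpamGuard:
    """Hypothetical per-node spam handling for the p2p idea."""
    hidden: set[str] = field(default_factory=set)           # listing IDs filtered out locally
    spam_examples: list[str] = field(default_factory=list)  # texts an LLM filter could use as examples
    blacklist: set[str] = field(default_factory=set)        # poster nodes marked as spamming

    def mark_spam(self, listing_id: str, poster_node: str, text: str) -> None:
        self.hidden.add(listing_id)
        self.spam_examples.append(text)
        self.blacklist.add(poster_node)

    def allows(self, listing_id: str, poster_node: str) -> bool:
        # Drop listings already marked as spam and anything from a blacklisted poster.
        return listing_id not in self.hidden and poster_node not in self.blacklist
```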

@nicobrenner
Owner Author

As suggested here: https://news.ycombinator.com/item?id=39624542 and here: https://news.ycombinator.com/item?id=39692590

Add support for scraping Workday career portals. For example (from https://news.ycombinator.com/item?id=39692590), here are some with tech jobs listed:

https://workday.wd5.myworkdayjobs.com/Workday
https://shipt.wd1.myworkdayjobs.com/en-US/Shipt_External

It doesn't seem like there's a global listing across myworkdayjobs.com sites: each company using it has its own sub-domain. That sort of makes sense, since Workday is an internal HR/employee portal tool licensed to companies.

@mbafford

mbafford commented Mar 13, 2024

Another is greenhouse.io:

https://boards.greenhouse.io/remotecom
https://boards.greenhouse.io/notion

As with Workday, there doesn't seem to be a public master list: each company lists its own jobs under its own sub-folder.
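Greenhouse does publish a public Job Board API, where a board's jobs are listed at `https://boards-api.greenhouse.io/v1/boards/{board_token}/jobs` and the board token is the path segment in URLs like the two above. A minimal sketch, assuming that endpoint and its `jobs` response field; the helper names are hypothetical, and the network call in `fetch_jobs` is not exercised by the example:

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen

API_BASE = "https://boards-api.greenhouse.io/v1/boards"


def board_token(board_url: str) -> str:
    """Extract the company token from a URL like https://boards.greenhouse.io/notion."""
    parsed = urlparse(board_url)
    segments = [s for s in parsed.path.split("/") if s]
    if parsed.hostname != "boards.greenhouse.io" or not segments:
        raise ValueError(f"not a Greenhouse board URL: {board_url}")
    return segments[0]


def jobs_api_url(token: str) -> str:
    return f"{API_BASE}/{token}/jobs"


def parse_jobs(payload: dict) -> list[dict]:
    """Keep only the fields a matcher would likely need."""
    return [
        {
            "id": job.get("id"),
            "title": job.get("title"),
            "url": job.get("absolute_url"),
            "location": (job.get("location") or {}).get("name"),
        }
        for job in payload.get("jobs", [])
    ]


def fetch_jobs(board_url: str) -> list[dict]:
    # Performs a real HTTP request to the Greenhouse API.
    with urlopen(jobs_api_url(board_token(board_url)), timeout=30) as resp:
        return parse_jobs(json.load(resp))
```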

@nicobrenner nicobrenner self-assigned this Mar 14, 2024
@nicobrenner nicobrenner added enhancement New feature or request help wanted Extra attention is needed labels Mar 14, 2024
@nicobrenner
Owner Author

@christosgousis and @GiorgosPapageorgiou contributed a new source, workatastartup.com/jobs, in this excellent PR: #66

It's a good example of how to add new sources. I'm also realizing a whole "sources" section might be necessary in the future.
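PR #66 is the actual reference for adding a source. Purely as an illustration of what a "sources" section could look like, here is a hypothetical shape where each source implements one `fetch()` method and registers itself; `JobSource`, `RawListing`, `register` and `SOURCES` are invented names, not the project's API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RawListing:
    """Source-agnostic listing record (hypothetical fields)."""
    source: str
    external_id: str
    title: str
    url: str


class JobSource(ABC):
    """Hypothetical interface every scraper would implement."""
    name = "base"

    @abstractmethod
    def fetch(self) -> Iterator[RawListing]:
        """Yield listings from this source."""


# Registry a menu could read to show the available sources.
SOURCES: dict[str, type] = {}


def register(cls):
    SOURCES[cls.name] = cls
    return cls


@register
class StaticSource(JobSource):
    """Toy source returning fixed listings, standing in for a real scraper."""
    name = "static"

    def __init__(self, listings: Iterable[RawListing]):
        self.listings = list(listings)

    def fetch(self) -> Iterator[RawListing]:
        yield from self.listings


def collect(sources: Iterable[JobSource]) -> Iterator[RawListing]:
    """Merge listings from several sources, skipping duplicates by (source, external_id)."""
    seen = set()
    for src in sources:
        for listing in src.fetch():
            key = (listing.source, listing.external_id)
            if key not in seen:
                seen.add(key)
                yield listing
```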

@noameron
Contributor

Hi @nicobrenner, first I wanted to say that the project looks great!
I had a very similar idea and came across your project.

Is this project still alive and maintained?

@nicobrenner
Owner Author

Hi @noameron, this project is still alive, but it definitely needs some more attention. Is there anything in particular you are interested in? Right now, the most useful feature would be for it to automatically update its Hacker News sources, so that it always gets the latest one.
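A minimal sketch of that auto-update, assuming the public Algolia HN Search API (`search_by_date` with the `author_whoishiring` tag) and that the monthly threads keep the "Ask HN: Who is hiring?" title prefix. This is not how commandjobs currently finds its source, and the function names are hypothetical:

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://hn.algolia.com/api/v1/search_by_date"
TITLE_PREFIX = "Ask HN: Who is hiring?"  # assumed to stay stable month to month


def whoishiring_query_url() -> str:
    # Stories posted by the 'whoishiring' account, newest first.
    return SEARCH_URL + "?" + urlencode({"tags": "story,author_whoishiring"})


def latest_hiring_thread(hits: list[dict]) -> Optional[str]:
    """Pick the newest 'Who is hiring?' story and return its HN URL."""
    candidates = [h for h in hits if (h.get("title") or "").startswith(TITLE_PREFIX)]
    if not candidates:
        return None
    newest = max(candidates, key=lambda h: h.get("created_at_i", 0))
    return f"https://news.ycombinator.com/item?id={newest['objectID']}"


def fetch_latest_hiring_thread() -> Optional[str]:
    # Performs a real HTTP request to the Algolia API.
    with urlopen(whoishiring_query_url(), timeout=30) as resp:
        return latest_hiring_thread(json.load(resp).get("hits", []))
```

Filtering on the title matters because the same account also posts the "Who wants to be hired?" and freelancer threads.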

@noameron
Contributor

noameron commented Nov 4, 2024

@nicobrenner Hi :) I was thinking of adding a new source as well. I have a generic Workday scraper that I need to adjust a bit, but it's working as it is at the moment.

I'd be happy to modify the HN scraper as well, and maybe add SQLAlchemy as the DB handler?

wdyt?

@nicobrenner
Owner Author

@noameron that would be amazing, feel free to create a PR. Do you need any pointers on how to get started? Check out PR #66 mentioned above and let me know. I'll start keeping a closer eye on this and be more responsive ;)

@noameron
Contributor

Hey @nicobrenner, I opened this PR, but it's still not ready since I'm having issues building the Docker image. Do you have an idea how to solve this? I'm not sure how to download Chrome and ChromeDriver successfully.

This is the current error I'm getting:
Traceback (most recent call last):
commandjobs |   File "/commandjobs/src/menu.py", line 17, in <module>
commandjobs |     from job_scraper.workday.scraper import WorkdayScraper
commandjobs |   File "/commandjobs/job_scraper/workday/scraper.py", line 137, in <module>
commandjobs |     scraper = WorkdayScraper()
commandjobs |               ^^^^^^^^^^^^^^^^
commandjobs |   File "/commandjobs/job_scraper/workday/scraper.py", line 17, in __init__
commandjobs |     self.driver = webdriver.Chrome(options=self.get_selenium_configs())
commandjobs |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
commandjobs |   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/chrome/webdriver.py", line 45, in __init__
commandjobs |     super().__init__(
commandjobs |   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/chromium/webdriver.py", line 50, in __init__
commandjobs |     if finder.get_browser_path():
commandjobs |        ^^^^^^^^^^^^^^^^^^^^^^^^^
commandjobs |   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/common/driver_finder.py", line 47, in get_browser_path
commandjobs |     return self._binary_paths()["browser_path"]
commandjobs |            ^^^^^^^^^^^^^^^^^^^^
commandjobs |   File "/usr/local/lib/python3.12/site-packages/selenium/webdriver/common/driver_finder.py", line 78, in _binary_paths
commandjobs |     raise NoSuchDriverException(msg) from err
commandjobs | selenium.common.exceptions.NoSuchDriverException: Message: Unable to obtain driver for chrome; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location
commandjobs |
commandjobs exited with code 1
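The error means Selenium could not find a Chrome/ChromeDriver pair inside the container. A minimal sketch, assuming the image is Debian-based (as the default `python:3.12` images are) and that Chromium and its driver are installed from the `chromium` and `chromium-driver` apt packages so they end up on `PATH`. `find_binary` and `make_chrome_driver` are hypothetical helpers, and the candidate names are guesses to adjust:

```python
import shutil
from typing import Optional, Sequence

# Assumed executable names; Debian's chromium / chromium-driver packages
# typically provide "chromium" and "chromedriver" on PATH.
BROWSER_CANDIDATES = ("chromium", "chromium-browser", "google-chrome")
DRIVER_CANDIDATES = ("chromedriver",)


def find_binary(candidates: Sequence[str]) -> Optional[str]:
    """Return the path of the first candidate found on PATH (or given as a path), else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def make_chrome_driver():
    browser = find_binary(BROWSER_CANDIDATES)
    driver = find_binary(DRIVER_CANDIDATES)
    if browser is None or driver is None:
        raise RuntimeError(
            f"Chrome/ChromeDriver not found (browser={browser}, driver={driver}); "
            "install them in the image first"
        )
    # Imported lazily so this module can be imported without Selenium or Chrome.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = browser
    options.add_argument("--headless=new")          # no display inside the container
    options.add_argument("--no-sandbox")            # commonly needed when running as root in Docker
    options.add_argument("--disable-dev-shm-usage") # avoid the small default /dev/shm
    return webdriver.Chrome(service=Service(executable_path=driver), options=options)
```

The traceback also shows `scraper = WorkdayScraper()` running at module level (`scraper.py`, line 137), so merely importing the module from `menu.py` starts Chrome. Creating the driver lazily, only when the Workday source is actually run, would keep the menu usable even when Chrome is missing.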


3 participants