fix: web-scraping blog minor code adjustments (#30)
llorenspujol authored Oct 21, 2023
1 parent 644c5c0 commit 2d8282a
Showing 2 changed files with 32 additions and 22 deletions.
@@ -78,9 +78,9 @@ Aquesta és la part central d'aquest blog, on ens submergim en el procés d'acc
To access LinkedIn's job listings, we need to construct a URL using the function `urlQueryPage`:

```ts:src/linkedin.ts
-export const urlQueryPage = (search: ScraperSearchParams) =>
-  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${search.searchText}
-&start=${search.nPage * 25}${search.locationText ? '&location=' + search.locationText : ''}`
+export const urlQueryPage = (searchParams: ScraperSearchParams) =>
+  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${searchParams.searchText}
+&start=${searchParams.pageNumber * 25}${searchParams.locationText ? '&location=' + searchParams.locationText : ''}`
```

In this case, I have already done the investigation needed to find this URL. Our objective is to find a URL that we can parameterize with our desired search parameters.
@@ -117,19 +117,24 @@ Amb la nostra URL identificada, podem procedir amb les dues accions principals r

```ts:src/linkedin.ts

+export interface ScraperSearchParams {
+  searchText: string;
+  locationText: string;
+  pageNumber: number;
+}
+
 /** main function */
-export function getJobsFromLinkedinPage(page: Page, searchParams): Observable<JobInterface[]> {
-  return defer(() => navigateToJobsPage(page, searchParams))
+export function goToLinkedinJobsPageAndExtractJobs(page: Page, searchParams: ScraperSearchParams): Observable<JobInterface[]> {
+  return defer(() => fromPromise(navigateToJobsPage(page, searchParams)))
     .pipe(switchMap(() => getJobsFromLinkedinPage(page)));
 }

 /* Utility functions */
+export const urlQueryPage = (searchParams: ScraperSearchParams) =>
+  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${searchParams.searchText}
+&start=${searchParams.pageNumber * 25}${searchParams.locationText ? '&location=' + searchParams.locationText : ''}`

-export const urlQueryPage = (search: ScraperSearchParams) =>
-  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${search.searchText}
-&start=${search.nPage * 25}${search.locationText ? '&location=' + search.locationText : ''}`
-
-function navigateToJobsPage(page: Page, searchParams): Promise<Response | null> {
+function navigateToJobsPage(page: Page, searchParams: ScraperSearchParams): Promise<Response | null> {
   return page.goto(urlQueryPage(searchParams), { waitUntil: 'networkidle0' });
 }

@@ -380,7 +385,7 @@ export function getJobsFromLinkedin(browser: Browser): Observable<ScraperResult>
   const scrapeJobs = (page: Page): Observable<ScraperResult> =>
     fromArray(searchParamsList).pipe(
       concatMap(({ searchText, locationText }) =>
-        getJobsFromPageRecursive(page, { searchText, locationText, nPage: 0 })
+        getJobsFromAllPages(page, { searchText, locationText, pageNumber: 0 })
       )
     )

27 changes: 16 additions & 11 deletions blog/en/3_web-scraping-linkedin-jobs-using-puppeteer-and-rxjs.mdx
@@ -78,9 +78,9 @@ This is the core part of this blog, where we dive into the process of accessing
To access LinkedIn's job listings, we need to construct a URL using the function `urlQueryPage`:

```ts:src/linkedin.ts
-export const urlQueryPage = (search: ScraperSearchParams) =>
-  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${search.searchText}
-&start=${search.nPage * 25}${search.locationText ? '&location=' + search.locationText : ''}`
+export const urlQueryPage = (searchParams: ScraperSearchParams) =>
+  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${searchParams.searchText}
+&start=${searchParams.pageNumber * 25}${searchParams.locationText ? '&location=' + searchParams.locationText : ''}`
```

In this case, I have already done the investigation needed to find this URL. Our objective is to find a URL that we can parameterize with our desired search parameters.
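
As a minimal sketch of what "parameterizing" this URL can look like, the snippet below builds the same query string but percent-encodes the search values, so that a search text containing spaces or `&` does not break the query. The helper name `buildJobsUrl` and the constant `RESULTS_PER_PAGE` are hypothetical and not part of the post; the page size of 25 is an assumption taken from the `* 25` in `urlQueryPage`.

```typescript
// Same shape as the ScraperSearchParams interface introduced in this commit.
interface ScraperSearchParams {
  searchText: string;
  locationText: string;
  pageNumber: number;
}

// Assumption: the endpoint pages results in steps of 25, as `pageNumber * 25` implies.
const RESULTS_PER_PAGE = 25;

// Hypothetical helper (not from the post): builds the jobs URL on a single line
// and percent-encodes user-supplied values with encodeURIComponent.
function buildJobsUrl(searchParams: ScraperSearchParams): string {
  const base = 'https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search';
  const query = [
    `keywords=${encodeURIComponent(searchParams.searchText)}`,
    `start=${searchParams.pageNumber * RESULTS_PER_PAGE}`,
  ];
  // The location parameter is only appended when a location was provided.
  if (searchParams.locationText) {
    query.push(`location=${encodeURIComponent(searchParams.locationText)}`);
  }
  return `${base}?${query.join('&')}`;
}

// Usage: third page (pageNumber 2) of a search with a space in the search text.
const exampleUrl = buildJobsUrl({
  searchText: 'Angular developer',
  locationText: 'Barcelona',
  pageNumber: 2,
});
// exampleUrl ends with "?keywords=Angular%20developer&start=50&location=Barcelona"
```

Whether the encoding step is needed depends on the search terms used; the version in the post interpolates the values directly, which works for simple single-word searches.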
@@ -117,19 +117,24 @@ With our target URL identified, we can proceed with the two primary actions requ

```ts:src/linkedin.ts

+export interface ScraperSearchParams {
+  searchText: string;
+  locationText: string;
+  pageNumber: number;
+}
+
 /** main function */
-export function getJobsFromLinkedinPage(page: Page, searchParams): Observable<JobInterface[]> {
-  return defer(() => navigateToJobsPage(page, searchParams))
+export function goToLinkedinJobsPageAndExtractJobs(page: Page, searchParams: ScraperSearchParams): Observable<JobInterface[]> {
+  return defer(() => fromPromise(navigateToJobsPage(page, searchParams)))
     .pipe(switchMap(() => getJobsFromLinkedinPage(page)));
 }

 /* Utility functions */
+export const urlQueryPage = (searchParams: ScraperSearchParams) =>
+  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${searchParams.searchText}
+&start=${searchParams.pageNumber * 25}${searchParams.locationText ? '&location=' + searchParams.locationText : ''}`

-export const urlQueryPage = (search: ScraperSearchParams) =>
-  `https://linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=${search.searchText}
-&start=${search.nPage * 25}${search.locationText ? '&location=' + search.locationText : ''}`
-
-function navigateToJobsPage(page: Page, searchParams): Promise<Response | null> {
+function navigateToJobsPage(page: Page, searchParams: ScraperSearchParams): Promise<Response | null> {
   return page.goto(urlQueryPage(searchParams), { waitUntil: 'networkidle0' });
 }

@@ -380,7 +385,7 @@ export function getJobsFromLinkedin(browser: Browser): Observable<ScraperResult>
   const scrapeJobs = (page: Page): Observable<ScraperResult> =>
     fromArray(searchParamsList).pipe(
       concatMap(({ searchText, locationText }) =>
-        getJobsFromPageRecursive(page, { searchText, locationText, nPage: 0 })
+        getJobsFromAllPages(page, { searchText, locationText, pageNumber: 0 })
       )
     )


1 comment on commit 2d8282a


@vercel vercel bot commented on 2d8282a Oct 21, 2023
