-
Notifications
You must be signed in to change notification settings - Fork 342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Integrate browserforge fingerprints #829
base: master
Are you sure you want to change the base?
Conversation
TODO: Tests, JS page function for injecting fingerprint json.
Todo: Make it work with page.add_init_script
Added dev test TODO: Add proper tests Sync header generation with fingerprint generation (use gen from browserforge)
Use browser_pool_options from crawler to pass fingerprint related stuff to be similar to JS
Good job! Why don't we use browserforge's |
I was not aware of that. Wow, that guy did everything. I will try to use that injector, |
@@ -152,8 +143,45 @@ async def _create_browser_context(self, proxy_info: ProxyInfo | None = None) -> | |||
else None | |||
) | |||
|
|||
return await self._browser.new_context( | |||
if self._use_fingerprints: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This can be simpler if we decide to rely purely on browserforge and remove our own header generation. )
Draft adapter and basic test.
TODO: Solve circular imports
Remove types unused in this change. Format, lint, type check.
7bd985c
to
25aa4e2
Compare
(Hint about browserforge being just implementation detail and not core functionality.)
Why not expose the entire class directly, instead of creating an additional one just for the options and exposing that? fingerprint_generator = FingerprintGenerator(
header_generator=HeaderGenerator(...),
screen=Screen(...),
)
crawler = PlaywrightCrawler(
headless=True,
fingerprint_generator=fingerprint_generator,
)
I would completely remove our implementation of |
At least for the HeaderGenaratorOptions that I think is necessary to avoid users mixing various header generators with fingerprint generators. HeaderGenerator is internal to FingerprintGenerator to ensure that generated fingerprint and headers are consistent. FingerprintGeneratorOptions can be removed and the inputs defined in abstract class instead. |
I agree, but that step I would rather take in separate follow up PR to not make this PR too large. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know this part of the codebase that well, so I found just some minor design nitpicks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but I'd prefer to wait for @vdusek's approval as well
Do I understand it correctly that it will not require any changes on the interface? Also, could you open an issue for it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be documented later by #481 , which already has decent PR #576
I would prefer to include the documentation along with this PR. It could provide a clearer understanding and demonstrate how it is being used. This is one of the reasons we keep the documentation in the same repository as the source code.
Otherwise it's great.
from browserforge.fingerprints import Fingerprint | ||
|
||
|
||
class FingerprintGenerator(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing class docstring? It should explain what it is and what it is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
from ._types import HeaderGeneratorOptions, ScreenOptions | ||
|
||
|
||
class BrowserforgeFingerprintGenerator(FingerprintGenerator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing class docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
Co-authored-by: Vlada Dusek <[email protected]>
Yes, I think it should be without any interface changes. |
(To avoid some import race conditions downloads when running pytest with multiple processes)
I added example code + page. I guess I should also make Python variant of this doc in other repo: https://docs.apify.com/academy/anti-scraping/mitigation/generating-fingerprints |
Description
Integrate
browserforge
package and use it's fingerprint and header generation capabilities to enable fingerprint generation in PlaywrightCrawlerHow can users use it?
Closely following JS implementation would lead to:
I suggest this instead:
Reasoning:
use_fingerprints
andfingerprint_generator_options
. Instead have argumentfingerprint_generator
that is implicit foruse_fingerprints
, it contains all thefingerprint_generator_options
and it also allows custom implementations of generator to be passed.DefaultFingerprintGenerator
is very convenient for us, as we can set it tobrowserforge adapter
for now and change it later.Currently we do not have our own injector, so we have to use browser forge injector. That means that temporarily we have to accept
Fingerprint
definition frombrowserforge
as output that is expected fromAbstractFingerprintGenerator.generate()
.It is explicitly mentioned there as something that is subject to change.
Why doing it like this?
This change is already usable as it is and brings most value now. Common user can use some sort of fingerprint generation offered by
browserforge
.Adding our own type
Fingerprint
and our own injector is some additional effort and has almost no value for external users and can be part of future refactoring. This refactoring would affect only very advanced users that created their own custom fingerprint generator (quite advanced for experimental feature that is not yet even available. So purely theoretical.)Issues