Skip to content

scrapfly/fingerprint-generator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fingerprint Generator

A fast browser data generator that mimics actual traffic patterns in the wild. With extensive data coverage.

Created by daijro. Data provided by Scrapfly.


Features

  • Uses a Bayesian generative network to mimic real-world web traffic patterns
  • Extensive data coverage for nearly all known browser data points
  • Creates complete fingerprints in a few milliseconds ⚡
  • Easily specify custom criteria for any data point (e.g. "only Windows + Chrome, with Intel GPUs")
  • Simple for humans to use 🚀

Demo Video

Here is a demonstration of what fpgen generates & its ability to filter data points:

demo.mp4

Installation

Install the package using pip:

pip install fpgen

Downloading the model

Fetch the latest model:

fpgen fetch

This will be ran automatically on the first import, or every 5 weeks.

To decompress the model for faster generation (up to 10-50x faster!), run:

fpgen decompress

Note: This action will use an additional 100mb+ of storage.

CLI Usage
Usage: python -m fpgen [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  decompress  Decompress model files for speed efficiency (will take 100mb+)
  fetch       Fetch the latest model from GitHub
  recompress  Compress model files after running decompress
  remove      Remove all downloaded and/or extracted model files


Usage

Generate a fingerprint

Use the Generator object:

>>> import fpgen
>>> gen = fpgen.Generator()
>>> gen.generate()

See example output.


Filtering the output

Setting fingerprint criteria

You can narrow down generated fingerprints by specifying filters for any data field.

# Only generate fingerprints with Windows, Chrome, and Intel GPU:
>>> gen.generate(
...     os='Windows',
...     browser='Chrome',
...     gpu={'vendor': 'Google Inc. (Intel)'}
... )
This can also be passed as a dictionary.
>>> gen.generate({
...     'os': 'Windows',
...     'browser': 'Chrome',
...     'gpu': {'vendor': 'Google Inc. (Intel)'},
... })

Multiple constraints

Pass in multiple constraints for the generator to select from.

gen.generate({
    'os': ('Windows', 'Mac OS X'),
    'browser': ('Firefox', 'Chrome'),
})

If you are passing many nested constraints, run fpgen decompress to improve model performance.

Control the window size

Constrain the minimum/maximum width and height of the window:

bounds = fpgen.WindowBounds(
    min_width=100,
    max_width=1280,
    min_height=400,
    max_height=720,
)
gen.generate(window_bounds=bounds)

Control what data is generated

To generate specific data fields, use the target parameter with a string (or a list of strings).

Examples

Only generate HTTP headers:

>>> gen.generate(target='headers')
{'accept-language': 'uk-UA,uk;q=0.9,en-US;q=0.8,en;q=0.7', 'accept-encoding': 'gzip, deflate, br, zstd', 'accept': '*/*', 'priority': 'u=1, i', 'sec-ch-ua': '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"', 'sec-ch-ua-mobile': '?0', 'sec-ch-ua-platform': '"macOS"', 'sec-fetch-dest': 'empty', 'sec-fetch-mode': 'cors', 'sec-fetch-site': 'same-site', 'sec-gpc': None}

Generate a User-Agent for Windows & Chrome:

>>> gen.generate(
...     os='Windows',
...     browser='Chrome',
...     # Nested targets must be seperated by dots:
...     target='headers.user-agent'
... )
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0'

Generate a Firefox TLS fingerprint:

>>> gen.generate(
...     browser='Firefox',
...     target='network.tls.scrapfly_fp'
... )
{'version': '772', 'ch_ciphers': '4865-4867-4866-49195-49199-52393-52392-49196-49200-49162-49161-49171-49172-156-157-47-53', 'ch_extensions': '0-5-10-11-13-16-23-27-28-34-35-43-45-51-65037-65281', 'groups': '4588-29-23-24-25-256-257', 'points': '0', 'compression': '0', 'supported_versions': '772-771', 'supported_protocols': 'h2-http11', 'key_shares': '4588-29-23', 'psk': '1', 'signature_algs': '1027-1283-1539-2052-2053-2054-1025-1281-1537-515-513', 'early_data': '0'}

You can provide multiple targets as a list.


Query possible values

You can get a list of a target's possible values by passing it into fpgen.query:

List all possible browsers:

>>> fpgen.query('browser')
['Chrome', 'Edge', 'Firefox', 'Opera', 'Safari', 'Samsung Internet', 'Yandex Browser']

Passing a nested target:

>>> fpgen.query('navigator.maxTouchPoints') # Dot seperated path
[0, 1, 2, 5, 6, 9, 10, 17, 20, 40, 256]

Note

Since fpgen is trained on live data, queries may occasionally return invalid or anomalous values. These values will typically only appear in about 1 out of every 20,000 generations.


Custom fingerprint filters

You can manually omit possible values, then pass a new list into the generator:

# Get possible values for screen.width
values = fpgen.query('screen.width')

# Only allow values above 1000
def width_filter(width):
    return width > 1000
values = filter(width_filter, values)

# Pass in the new list of possible widths:
output = gen.generate(screen: {'width': values})

Generated data

Here is a rough list of the data fpgen can generate:

  • Browser data:
    • All navigator data
    • All mimetype data: Audio, video, media source, play types, PDF, etc
    • All window viewport data (position, inner/outer viewport sizes, toolbar & scrollbar sizes, etc)
    • All screen data
    • Supported & unsupported DRM modules
    • Memory heap limit
  • System data:
    • GPU data (vendor, renderer, WebGL/WebGL2, extensions, context attributes, parameters, shader precision formats, etc)
    • Battery data (charging, charging time, discharging time, level)
    • Screen size, color depth, taskbar size, etc.
    • Full fonts list
    • Cast receiver data
  • Network data:
    • HTTP headers
    • TLS fingerprint data
    • HTTP/2 fingerprint & frames
    • RTC video & audio capabilities, codecs, clock rates, mimetypes, header extensions, etc
  • Audio data:
    • Audio signal
    • All Audio API constants (AnalyserNode, BiquadFilterNode, DynamicsCompressorNode, OscillatorNode, etc)
  • Internationalization data:
    • Regional internationalization (Locale, calendar, numbering system, timezone, date format, etc)
    • Voices
  • And much more!

For a more complete list, see the full example output.