An open-source software for synthetic web-based user interface and content dataset generation. To cite this Original Software Publication: https://www.sciencedirect.com/science/article/pii/S2352711022000073

WebGenerator

Easily generate probabilistic datasets of web interfaces and content. The tool lets you generate HTML files, their corresponding screenshots, and a JSON file with the labeled HTML elements, so you can train supervised and unsupervised models. You can also set the generation probabilities and batch options to suit your needs.

This development is kindly supported by the awesome SDAS Group.

Some selected examples

Example 1

Example 2

Example 3

A full dataset of 1000 elements of size 800x600 generated with the tool can be viewed here and downloaded here. In this dataset you will find folders with CSS, JS, and HTML files, image folders, and JSON files. The html directory contains HTML files named with the rw prefix (rw_0.html, rw_1.html, ..., rw_n.html). The CSS folder contains the Bootstrap distribution file with the web page's color palette and another file with the CSS rules needed for the sidebar and extra required styling. The js folder contains the required jQuery and Bootstrap JavaScript files.
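As a rough illustration of the layout described above, the sketch below lists the pages of the html directory in index order. The helper name list_pages and the prefix argument are hypothetical and not part of WebGenerator; the default rw prefix is assumed from the file names mentioned above, so pass a different prefix if your files are named differently.

```python
import re
from pathlib import Path


def list_pages(html_dir, prefix="rw"):
    """Return the generated HTML files in html_dir, sorted by numeric index."""
    # Assumed naming scheme: <prefix>_<index>.html, for example rw_0.html.
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.html$")
    indexed = []
    for path in Path(html_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    # Sort numerically so that rw_10.html comes after rw_9.html.
    return [path for _, path in sorted(indexed)]
```

Calling list_pages on the html folder of a downloaded dataset would return pathlib.Path objects that can be matched by index with the screenshots and JSON files.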

Requirements

Browser and driver

The Chrome driver lets WebGenerator manage browser instances to take the screenshots and create tag annotations of the inner HTML elements.

  1. If you already have a Chrome or Chromium browser installed, you can skip this step. Otherwise, download either an installer or a zip file of the browser. We recommend downloading Chromium from this builds website, where you should select "Archive" (zip folder) or "Installer".
  2. Next, download the Chrome Driver from here. Make sure the driver and the browser have the SAME VERSION. Once the driver is downloaded, extract it and put the file in your browser's executable folder. If you installed Chrome, the path could be C:/Program Files/Google/Chrome/Application.
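The version requirement in step 2 can be checked with a small script. The sketch below is only an illustration: the function names are hypothetical, the version strings are made-up examples, and comparing only the major version number is an assumption about what "same version" needs to mean in practice.

```python
import re


def major_version(version_output):
    """Extract the major version number from a '--version' output string."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def same_major_version(browser_output, driver_output):
    """Return True when browser and driver report the same major version."""
    return major_version(browser_output) == major_version(driver_output)


# Example strings only; on your machine they would come from commands such as
# `chromedriver --version` and your browser's own version command.
print(same_major_version("Chromium 98.0.4758.80",
                         "ChromeDriver 98.0.4758.48 (abc123)"))
```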

You can always check the official Selenium documentation.

Installation

Simply git clone this repository or download the zip folder:

git clone https://github.com/agsoto/webgenerator.git
cd webgenerator

Then install the dependencies:

pip install -r requirements.txt

Since the screen-capturing feature depends on the Selenium driver, you should add the driver's path to the system's environment variables. See how to set environment variables on Windows and Mac. If you're using Linux, you can instead create a symbolic link: ln -s path-to-executable-driver chromedriver.
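To confirm that the driver is actually reachable after changing the environment variables or creating the link, a quick check like the one below may help. It is a minimal sketch using only the Python standard library; the function name find_driver is hypothetical and the executable name chromedriver is assumed.

```python
import shutil


def find_driver(name="chromedriver", search_path=None):
    """Return the full path of the driver executable, or None if not found.

    search_path uses the same format as PATH; None means the current PATH.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("chromedriver was not found on PATH")
    else:
        print(f"chromedriver found at {location}")
```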

However, if you don't want to add an environment variable, you can set the path to the driver when using the ScreenShutter class:

ScreenShutter(driver_path="path-to-executable-driver")

This optional parameter can be set as shown on line 18 of the Main.py file.

Execution

There's a code example of the use of the generator in the Main.py file. Once you're all set just run:

python ./Main.py

Potential Applications

The generated dataset has potential applications in web GUI generation. Here you will find three example deep learning models.

  • GAN: To generate web GUI images from WebGenerator images.
  • Faster RCNN: To detect components in images of web pages.
  • Pix2Pix: To generate web GUI images from image edges (Canny mask).

GAN

Faster RCNN

Pix2Pix

Generation Probabilities

The parameters of the WebLayoutProbabilities object, which is used for the generation, are described below.

| Param # | Name | Type | Description |
|---|---|---|---|
| 1 | with_sidebar_p | float | Probability that the Sidebar is present |
| 2 | with_header_p | float | Probability that the Header is present |
| 3 | with_navbar_p | float | Probability that the Navbar is present |
| 4 | with_footer_p | float | Probability that the Footer is present |
| 5 | layouts_p | list[4] | List with the probabilities for each possible layout. The sum of the probabilities should be 1 |
| 6 | boxed_body_p | float | Probability that the page's Body is boxed inside a container |
| 7 | big_header_p | float | Probability of having a big header (a big header is considered 50% or more of the screen height) |
| 8 | sidebar_first_p | float | Probability of the Sidebar being at the left side of the Body |
| 9 | navbar_first_p | float | Probability of the Navbar being above the Header |
| 10 | bg_color_classes_p | list[3] | List with the probabilities for the combination of Bootstrap's CSS background color classes. The sum of the probabilities should be 1 |
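Before passing values to the generator, it can be useful to check them against the constraints listed above. The sketch below is a hypothetical helper, not part of WebGenerator: it assumes each float must lie between 0 and 1, and it checks the list lengths and sums given in the table. The example values are illustrative only and are not the project's defaults.

```python
import math

FLOAT_PARAMS = [
    "with_sidebar_p", "with_header_p", "with_navbar_p", "with_footer_p",
    "boxed_body_p", "big_header_p", "sidebar_first_p", "navbar_first_p",
]
# Expected number of entries for each list parameter.
LIST_PARAMS = {"layouts_p": 4, "bg_color_classes_p": 3}


def validate_probabilities(params):
    """Raise ValueError if any value breaks the constraints in the table."""
    for name in FLOAT_PARAMS:
        value = params[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    for name, length in LIST_PARAMS.items():
        values = params[name]
        if len(values) != length:
            raise ValueError(f"{name} needs {length} entries, got {len(values)}")
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError(f"{name} entries must be between 0 and 1")
        # Tolerate floating-point rounding when checking the sum.
        if not math.isclose(sum(values), 1.0, abs_tol=1e-9):
            raise ValueError(f"{name} must sum to 1, got {sum(values)}")


# Illustrative values only; they are not defaults taken from the project.
example = {
    "with_sidebar_p": 0.5, "with_header_p": 0.9, "with_navbar_p": 0.8,
    "with_footer_p": 0.7, "layouts_p": [0.25, 0.25, 0.25, 0.25],
    "boxed_body_p": 0.3, "big_header_p": 0.2, "sidebar_first_p": 0.5,
    "navbar_first_p": 0.5, "bg_color_classes_p": [0.6, 0.3, 0.1],
}
validate_probabilities(example)
```

How the validated values are handed to WebLayoutProbabilities is shown in Main.py; the dictionary form used here is just a convenient way to name each parameter.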
