Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upload target filesystem, Override File Naming #142

Closed
alexfornuto opened this issue Dec 6, 2019 · 36 comments · Fixed by #328
Closed

Upload target filesystem, Override File Naming #142

alexfornuto opened this issue Dec 6, 2019 · 36 comments · Fixed by #328
Labels

Comments

@alexfornuto
Copy link

alexfornuto commented Dec 6, 2019

Thanks to #139 I can now handle the JSON and HTML reports locally without having to upload the results to an LHCI server (thanks for that!).

In order to programmatically handle the files generated, I'd like to be able to define a consistent pattern for the files. For example:

IF I run lchi collect --url=https://pantheon.io/docs --url=https://some.staging.environment.io/docs,
AND I define some extra parameter like --format=$URL-$PASS_NUM (just spit-balling on this)
THEN the contents of .lighthouseci would look like:

.lighthouseci
|- lhr-https://pantheon.io/docs-1.html
|- lhr-https://pantheon.io/docs-1.json
|- lhr-https://pantheon.io/docs-2.html
|- lhr-https://pantheon.io/docs-2.json
|- lhr-https://pantheon.io/docs-3.html
|- lhr-https://pantheon.io/docs-3.json
|- lhr-https://some.staging.environment.io/docs-1.html
|- lhr-https://some.staging.environment.io/docs-1.json
|- lhr-https://some.staging.environment.io/docs-2.html
|- lhr-https://some.staging.environment.io/docs-2.json
|- lhr-https://some.staging.environment.io/docs-3.html
|- lhr-https://some.staging.environment.io/docs-3.json

Not that the filename needs to contain the full URL, this is just an example where it uses a parameter already defined in another flag. Anything consistently predictable would suite my needs.

@jzabala
Copy link
Contributor

jzabala commented Dec 6, 2019

Hey @alexfornuto is the incremental number sequence on the files something that you are looking for too?

In your example I see you used a pattern: {url}-{sequential-number}, but a result from the lib looks more like this:

Screen Shot 2019-12-06 at 1 44 10 PM

Or based on the image the only thing you want to change is the prefix lhr-?

@alexfornuto
Copy link
Author

Sorry for not being more precise. The numbers in my example correlate to the three passes I've configured on my end. Since I know that there are three passes, those numbers are predictable, whereas the current timestamp format is not.

I don't care about the lhr prefix, and my example includes it. The main point here is that the timestamp data can't be predicted, and I'd like to override it.

@patrickhulce
Copy link
Collaborator

@alexfornuto we really don't want to encourage folks to read directly from the .lighthouseci folder because it's an internal implementation detail we don't intend on treating as a public API.

If this is something you'd really like to see, I think we treat this as a feature request for upload --target=filesystem --report-name-pattern="%url%-%date%.%extension%" or something

@patrickhulce patrickhulce added enhancement New feature or request P3 labels Dec 6, 2019
@patrickhulce patrickhulce changed the title Feature Request - Override Default File Naming Upload target filesystem, Override Default File Naming Dec 6, 2019
@patrickhulce patrickhulce changed the title Upload target filesystem, Override Default File Naming Upload target filesystem, Override File Naming Dec 6, 2019
@jzabala
Copy link
Contributor

jzabala commented Dec 6, 2019

@alexfornuto we really don't want to encourage folks to read directly from the .lighthouseci folder because it's an internal implementation detail we don't intend on treating as a public API.

If this is something you'd really like to see, I think we treat this as a feature request for upload --target=filesystem --report-name-pattern="%url%-%date%.%extension%" or something

@patrickhulce that --target=filesystem option would probably be a better solution that writing the html files to the .lighthouseci folder as was introduced in #139. By writing the html's there, I suppose it gives the notion that you can use the content of the folder.

@jzabala
Copy link
Contributor

jzabala commented Dec 6, 2019

Sorry for not being more precise. The numbers in my example correlate to the three passes I've configured on my end. Since I know that there are three passes, those numbers are predictable, whereas the current timestamp format is not.

I don't care about the lhr prefix, and my example includes it. The main point here is that the timestamp data can't be predicted, and I'd like to override it.

Thanks for the detailed response @alexfornuto 🙂

@patrickhulce
Copy link
Collaborator

Haha, it most certainly is the better option and sounds like something we'll eventually need to tackle. I was looking for a quick way to unblock you @alexfornuto, but I suppose if you give a mouse a cookie... 😉

For now I view this as pretty low priority because all of the information is already there to be able to rename the files yourself (by parsing JSON), so a naming pattern is more convenience than new power (unlike #139).

@alexfornuto
Copy link
Author

Welp, the can of worms has been opened now. Sorry!

... all of the information is already there to be able to rename the files yourself (by parsing JSON)

Forgive my ignorance. I know that this must be the case, but I don't know how. If you could give me a brief example of how one would do this, I could probably unblock myself from there.

@patrickhulce
Copy link
Collaborator

Ah gotcha, no worries! Each of the .json files are a Lighthouse Result (LHR, for short), so they're structured like any regular Lighthouse result is structured.

You can pluck out .finalUrl or .fetchTime out of lhr-12345.json and use it to decide how to name lhr-12345.html :)

@alexfornuto
Copy link
Author

OK, thanks! That definitely helps. But in order to build a script that plucked data out of those files, I'd need to know what they would be named, right? It seems like a bit of a catch-22 to me.

@paulirish
Copy link
Member

start by reading all *.json files in the folder?

@alexfornuto
Copy link
Author

@paulirish indeed - I suppose this is where my own limitations as a technical writer forced to dev his own platform are catching up to me. I'll go away now and see if I can't figure out globbing filenames in JS.

@jzabala
Copy link
Contributor

jzabala commented Dec 7, 2019

hey @patrickhulce, just to clarify this issue:

  1. To add a filesystem choice for the target option.
  2. To add a --report-name-pattern required given the target=filesystem.
  3. The --report-name-pattern is a string that will replace the special characters:
    • %url%: url or file name (for static dir)
    • %date%: timestamp
    • %index%: the index number
  4. Will override files if they exists.

Questions:

  • Will --report-name-pattern include the full path or only the filename and another option will be created to provide the dir name?

  • In case the dir doesn't exist do we create it?

@patrickhulce
Copy link
Collaborator

These questions raise good points @jzabala in that I don't think we should pursue this right now until we have more than one potential consumer to iron out what the API should look like :)

@jzabala
Copy link
Contributor

jzabala commented Dec 7, 2019

These questions raise good points @jzabala in that I don't think we should pursue this right now until we have more than one potential consumer to iron out what the API should look like :)

Awesome @patrickhulce. If this issue gets a couple more +1, I wouldn't mind on returning the discussion about the API design and implementing it 👍🏼

@muratgozel
Copy link

I think this is the most important issue in this library. I would like to automate reading the results easily.

The only reasonable approach that comes to my mind right now is

  1. To read the contents of the directory .lighthouseci,
  2. Sort *.json files by creation date,
  3. Get the first *.json file content,
  4. Read scores.

That's too much and it actually would be another automation script. There may be a map between input URLs and output. The library may continue to create files as it wants in .lighthouseci folder but may give us a map file between URLs and scores of the last run.

@patrickhulce
Copy link
Collaborator

To read the contents of the directory .lighthouseci,
Sort *.json files by creation date,
Get the first *.json file content,
Read scores.

Can I ask what you're doing with the scores? There might be a higher level function we can automate within lighthouse-ci, and I'm not sure that a filesystem dump will help you here. You'll still have to read the directory to get the list of reports, do some parsing to find the one you're interested in, and then read the scores from the JSON whether this feature is implemented or not.

may give us a map file between URLs and scores of the last run.

I'm not sure I understand what you mean by this. Do you just want a summary JSON file that looks like

{
  "https://example.com/page1": [
    {"performance": 90, "pwa": 32},
    {"performance": 91, "pwa": 32},
    {"performance": 89, "pwa": 32},
  ],
  "https://example.com/page2": [
    // ...
  ]
}

@muratgozel
Copy link

I have a system that sends an email that contains a changelog, to the client for each deploy. I have started to think about attaching the scores of performance, accessibility etc. to the bottom of the email when I saw this tool.

The folder .lighthouseci is like a database for this tool I understand but then we need an API to access certain properties inside it.

The JSON summary you wrote works for me but one still may need to access more properties.

The way I'm going to use it will probably be like: (after parsing, inside an email)

Performance Scores Report

Homepage
Performance: 91  |  Accessibility: 92  |  Best Practices: 93  |  SEO: 94

Product Pages
Performance: 91  |  Accessibility: 92  |  Best Practices: 93  |  SEO: 94

Newly come to my mind that we may specify an identifier with the run, it may be a version number (1.3.44) or some other identifier and then we can access to the results in JSON format:

lhci autorun --id=1.3.44
# later on
lhci report --id=1.3.44
# and we have a JSON file: lhci-1.3.44.json :)

This would be better from the previous mapping idea.

@patrickhulce
Copy link
Collaborator

The folder .lighthouseci is like a database for this tool I understand but then we need an API to access certain properties inside it.

Ah, there's a slight disconnect here, .lighthouseci is the temporary storage folder for the reports from the most recent run until they get uploaded. It is cleared out every time you invoke lhci collect unless you use the --additive flag. It's just meant to store results for a single build, not a long-term database of reports.

As for your use case, I'm not sure there's much in a --target=filesystem that will significantly change the amount of work you need to do for that email. No matter what, we'd be saving each full report in its own file, so it'd still require inspecting the directory and/or parsing JSON to determine which report is the "homepage"/"product page"/etc that you need to link to. At best, I suppose a manifest replaces the fs.readdirSync('./dir') call with a require('./dir/manifest.json')? Perhaps you have a proposed data format for the manifest that would make your case easier, yet still generally applicable?

@muratgozel
Copy link

Yes. A manifest may save us from detecting the filenames of the report and it would be more sustainable. I'm not much interested with the --target=filesystem option actually. But the manifest idea can resolve both of our demands, I think.

@pahan35
Copy link
Contributor

pahan35 commented Jan 29, 2020

@patrickhulce could you please specify expected API to be able to help you with it?

@patrickhulce
Copy link
Collaborator

patrickhulce commented Jan 29, 2020

I'm currently thinking that the usage looks something like this...

lhci upload --target=filesystem --output-dir=./path/to/dir --report-filename-pattern="%%ORIGIN%%-%%PATHNAME%%-%%DATE%%.report.%%EXTENSION%%"

and it would output the median LHR reports into ./path/to/dir like

lhci-manifest.json
localhost_8000-path_to_page-2020_01_30_15_12_12.report.html
localhost_8000-path_to_page-2020_01_30_15_12_12.report.json

contents of lhci-manifest.json would be

[
  {
    "url": "http://localhost:8080/path/to/page",
    "representative": true,
    "extension": "html",
    "filePath": "./localhost_8000-path_to_page-2020_01_30_15_12_12.report.html",
    "summary": {"performance": 0.52, "accessibility": 0.79, "seo": 1, "best-practices": 0.23}
  },
  // ...
]

eventually support options like --include-all-reports which would add %%INDEX%% and copy the non-median reports too.

Any thoughts on this API folks?

@pahan35
Copy link
Contributor

pahan35 commented Jan 29, 2020

What if we replace remove extension from manifest and save here paths to HTML report and JSON data?

Also, with --include-all-reports I'd like to see those reports inside a property of representative record like

[
  {
    "url": "http://localhost:8080/path/to/page",
    "representative": true,
    "jsonPath": "./localhost_8000-path_to_page-2020_01_30_15_12_12.report.json,
    "reportPath": "./localhost_8000-path_to_page-2020_01_30_15_12_12.report.html",
    "summary": {"performance": 0.52, "accessibility": 0.79, "seo": 1, "best-practices": 0.23}
    // this is here only if  --include-all-reports
    "reports": [
      {
        "jsonPath": "path/to/report.json",
        "reportPath": "path/to/report.html",
        "summary": {"performance": 0.52, "accessibility": 0.79, "seo": 1, "best-practices": 0.23}
      }
    ]
  },
  // ...
]

Also, it would be nice to be able to use %%BRANCH_NAME%% inside file pattern to see it clearly from which branch each specific report is when sharing it out of PR context: discuss on a meeting, etc

@staceytay
Copy link

Are there any updates on this?

I'm working on running Lighthouse on each app release using https://github.com/treosh/lighthouse-ci-action. I would like to have the contents of lhci-manifest.json as output for that GH Action (as suggested by @patrickhulce in treosh/lighthouse-ci-action#41 (comment)) so that I can then send out the representative scores on each run. It seems that this issue is blocking treosh/lighthouse-ci-action#41.

I don't mind creating a PR for this too, but I've not very familiar with this project and might need some guidance if that's ok 🙂

@patrickhulce
Copy link
Collaborator

patrickhulce commented May 12, 2020

Thanks for the offer @staceytay! This is a slightly larger unscoped effort that has some interplay with breaking changes in #119 though, so I'm not sure it's a great candidate for external PR help at the moment.

FWIW, all of the data is available for the GH action to build this output themselves. If we just agree on an interface for the manifest then there's no need for this to block that issue.

How does everyone feel about...

interface EntrySummary {
  performance: number // all category scores on 0-1 scale
  accessibility: number
  'best-practices': number
  seo: number
  pwa: number
}

interface Entry {
  url: string // finalUrl of the run
  isRepresentativeRun: boolean // whether it was the median run for the URL
  jsonPath: string
  htmlPath: string
  summary: EntrySummary
}

// Manifest contains an array of entries, 1 for each run. 
// Let's say all are included by default, skipping over the `--include-all-reports` bit.
type Manifest = Array<Entry>

// Alternative, we group the entries by url
type Manifest = Record<string, Array<Entry>> 

@staceytay
Copy link

@patrickhulce sure thing, thanks!

At least for the use case I'm thinking of, this interface works, with either Manifest shapes.

@alekseykulikov
Copy link

@patrickhulce +1 for the format as an array.

lighthouse-ci-action@v3 sets content from links.json and assertion-results.json as output to improve composition with other tools (treosh/lighthouse-ci-action@b2e4263).

It would be great to support something like reports.json on the LHCI level so that we could expose it directly.

It's possible to build this data manually, but it may be different from the future LHCI output, and the logic of isRepresentativeRun depends on the internals and is not available in .lighthouseci folder.

@patrickhulce
Copy link
Collaborator

patrickhulce commented May 25, 2020

+1 for the format as an array.

Interesting, I would've expected a vote in favor of the map to do things like output.reports['https://example.com'][0].summary.performance without any .find. I'm not even sure how that works in actions output though tbh

We can do the array, SGTM :)

@alexfornuto
Copy link
Author

alexfornuto commented May 27, 2020

Unfortunately, #328 doesn't solve one of the main parts of this feature request, dictating the file naming. My job building CI processes to format this data is made harder by not having predictable and repeatable names for the the data files.

But thanks for the improvement!

EDIT:
P.S.: Unless manifest.json solves that. Is that what your example JSON is showing?

@patrickhulce
Copy link
Collaborator

dictating the file naming

That's what reportFilenamePattern is for.

--reportFilenamePattern [filesystem only] The pattern to use for naming Lighthouse reports.

manifest.json also lists all the reports and paths to them. I'm not sure how it can get any more configurable and predictable 😅

@alexfornuto
Copy link
Author

Oh, excellent! --reportFilenamePattern wasn't in the PR description, and I hadn't yet dived in to the files changed.

👍

@patrickhulce
Copy link
Collaborator

Quick poll on the expected behavior if manifest.json already exists in the target folder, should LHCI...

🚀 overwrite it
😕 throw an error and bail

@alexfornuto
Copy link
Author

alexfornuto commented May 27, 2020

What about:
🎶 each object includes a date key, and subsequent writes to manifest.json amended
?

@patrickhulce
Copy link
Collaborator

each object includes a date key, and subsequent writes to manifest.json amended

I don't want to assume that manifest.json is owned by us or encourage the idea that you're keeping some sort of database of all runs on local filesystem. If that's the direction I think we should throw :)

@alexfornuto
Copy link
Author

I don't want to ... encourage the idea that you're keeping some sort of database of all runs on local filesystem. If that's the direction I think we should throw :)

For more context: I'm not keeping any data longer than a CI job. I know that this is against Lighthouse best practice or whatever, but I need to run the test a few times to pull out outliers and then average scores. The tests can be wildly unpredictable and inaccurate sometimes. I'm trying to mitigate that within the confines of a CI job reporting the affects of a code/content change from a single PR.

Having said all that, I've voted for 🚀 as my second choice.

@patrickhulce
Copy link
Collaborator

Gotcha thanks for the extra context!

I need to run the test a few times to pull out outliers and then average scores

I'm very interested to hear why increasing --numberOfRuns doesn't satisfy your use case. What you're describing is exactly what Lighthouse CI is trying to do for you already, so if there are missing features there we want to plug them. Best to discuss on a separate issue though.

@alexfornuto
Copy link
Author

I'm very interested to hear why increasing --numberOfRuns doesn't satisfy your use case.

I'll be honest, I haven't looked at my CI testing since December, when I made this issue. When I can look at it again, I'll look into --numberOfRuns and make another issue if need me. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants