-
-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow packages without repository #36
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a minor refactoring suggestion, but this looks good to me. I will defer to @eridal for a final approval of this.
Updated cache to store packages without repository as well. |
This looks good to me. What do you think, @eridal? |
I have a few concerns with this proposal: The metadata is not handled correctly, and over time the collected stats will deviate from the packages. Currently the sum of repos in the metadata is exactly the number of packages: const packages = require('./data/packages.json')
const metadata = require('./data/metadata.json')
Object.keys(packages).length === Object.values(metadata.repos).reduce((a, b) => a + b, 0) There are a few changes required to get those numbers correctly:
This new feature is quite tricky as there are multiple cases to take into consideration, like a repo that used to have an url but now is null, or the other way around. |
@eridal thanks for your review and suggestions. But I didn't get this one:
Isn't it already return a string URL and we parse/validate it later? I don't see what do you want to change in it. |
It could be easier to implement this if we have tests for it. I already have tests that cover the update process and caching but as I mentioned before it requires some significant changes to the The biggest issue I had is the self-invoking anonymous function which makes it almost impossible to test because of Currently, I use a workaround to prevent the script from immediate execution like: module.exports = update
if (process.env.NODE_ENV !== 'test') {
/**
* Main
*/
update().then(
() => {
console.log(metadata)
process.exit(0)
},
(err) => {
console.error(err)
process.exit(1)
}
)
} The better way IMO is to split it and move all the update logic to a separate file. @eridal I can push my tests here and we will work on edge cases/new requirements as it seems this PR will never get finished without it. |
👍🏼 |
# Conflicts: # package-lock.json # package.json # scripts/stats.js
@zarianec I owe you an apology. You have been putting a good amount of efforts into this feature. It was easy for me to quickly hack a solution, submit that PR, but doing so disregarded your work in the process. I'm sorry about it and I'm gonna take action to fix that. This is open source, the code I wrote is from the community, and the more people involved the better; so I'd like you to take over this feature and push it to go live. So I'm gonna go ahead and close my PR, please use it as a source of inspiration --or not, you decide :) Some ideas behind this PR are super good. My only suggestion is try not to change too much the code, but rather make smaller, incremental changes. If changes are not part of this feature, please submit another PR only for that particular change (ie: mocha update) so that this PR will stay minimal while also making progress. Also it'd be nice if you can execute a full-history reprocess, so more people knows how to do so. Please accept my apologies, I am totally confident you will be able to deliver this feature! |
# Conflicts: # package-lock.json # package.json
@eridal thank you for your understanding, I really appreciate it. Some of the changes in your PR are really helpful and I moved it there (like handling invalid packages and better stats in the readme).
I didn't want to refactor the tests as part of this PR and planned to open a new one after we finish with this. So I revert it for now and add the metadata tests you created in #39
I did it already many times and must say that the caching you implemented is amazing - it saved me a LOT of time.
Unfortunately no. I ran the full rebuild of the index and the metadata test suddenly failed:
It took me some time to figure out why 2 packages are missing and the reason is pretty interesting. There is an NPM package called all-the-package-repos/scripts/update.js Line 218 in 5dd54c5
Is the same as: let foo = repos['constructor'] // we simply grab the object's constructor function
console.log(foo) // [Function: Object] The better (and safer) way to handle the arbitrary list of key-value pairs in javascript is const packages = new Map(fs.existsSync(files.packages) ? Object.entries(require(files.packages)) : []) |
scripts/update.js
Outdated
@@ -427,7 +427,7 @@ const writeChanges = (deferred) => { | |||
fs.writeFileSync(files.metadata, toJson(metadata)) | |||
|
|||
if (batch.found > 0) { | |||
fs.writeFileSync(files.packages, toJson(sort(packages))) | |||
fs.writeFileSync(files.packages, toJson(sort(Object.fromEntries(packages)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're using a Map when collecting data then turning it into an Object when finished? That makes sense to me. 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, basically this is it. I updated the sort
function so it will work with the Map as an input instead of an object - no need to convert the map to an object before sorting it.
scripts/update.js
Outdated
sorted[key] = object[key] | ||
delete object[key] | ||
sorted[key] = packagesMap.get(key) | ||
delete packagesMap.delete(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's an extra delete
keyword here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, thanks
|
||
if (change.deleted) { | ||
if (typeof curr === 'string') { | ||
if (typeof curr !== 'undefined') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can take advantage of the map: package.has(name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looked strange to me to check with packages.has()
and use the curr
variable then. And replacing curr
with packages.get()
everywhere looked too wordy.
Hey, this is looking great! I wonder if it makes sense that this package exposes the |
The issue with a So exposing the Map instead of Object makes sense but we should consider memory overhead for Object->Map conversion. Or we might expose the Object without a prototype ( |
With repository | ${notNull} | ${perc(notNull)}% | ||
Null repository | ${areNull} | ${perc(areNull)}% | ||
**Total** | ${total} | ${perc(total)}% | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unnecessary whitespace here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's required here to split two tables. See how it looks without it:
Packages | Count | Percentage |
---|---|---|
With repository | 1004148 | 64.84% |
Null repository | 544609 | 35.16% |
Total | 1548757 | 100.00% |
Providers | Count | Percentage |
:-------- | -----: | ----------: |
GitHub | 986341 | 63.69% |
GitLab | 3624 | 0.23% |
Bitbucket | 1148 | 0.07% |
Others | 13035 | 0.84% |
Total | 1004148 | 64.84% |
hostnames.js
Outdated
@@ -3,7 +3,7 @@ | |||
const URL = require('url') | |||
const countValues = require('count-array-values') | |||
const urls = Object.values(require('.')) | |||
const hostnames = urls.map(url => URL.parse(url).hostname.replace(/^www\./i, '')) | |||
const hostnames = urls.filter(u => u).map(url => URL.parse(url).hostname.replace(/^www\./i, '')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use .filter(Boolean)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fixed.
expect(urls.some(url => isUrl(url)), 'should contain packages with repos').to.equal(true) | ||
expect(urls.some(url => url === null), 'should contain packages without repos').to.equal(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is expecting at least a null
, which will not be the case if this is merged into current master. I'd say use the test to expect what we know it might be there, either a null or an url..
for (const url of Object.values(repos)) {
expect(url === null || isUrl(url), url).to.equal(true)
}
Also nothing prevents the packages to be all set, and while super low probability there could be no-null at some point :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which will not be the case if this is merged into current master
Agree, it will fail if we merge it without updating the data. On the other hand master with these changes and old index should be considered broken so probably it's fine.
while super low probability there could be no-null at some point
The probability of that is so low that I would rather consider it a bug if our index doesn't have any null
value. It will prevent the situation when packages without a repository are missing in the index because of a bug in the code.
Leaving aside those small comments, this looks ready to ship. Great work @zarianec! The const foo = {}
console.log(typeof foo.hasOwnProperty) // function
foo.hasOwnProperty = 'nope'
console.log(typeof foo.hasOwnProperty) // string Let's talk about release. This a breaking change for others so we need a major version. |
🍿 Great to see the progress here! |
The release plan should be the same as @eridal described in the #32 (comment) with one additional step - update the major version of the package. |
Sounds good. Wanna have a zoom call soon to pair on the release? |
@zeke sorry, I had a lot of stuff to do last week. I run the update script locally to have metadata and packages ready. Probably it will run all night. I will push updated data to this PR tomorrow and it will be ready for merge. So the plan is like this:
good idea but could be tricky since I'm in a CEST timezone. DM me in Twitter so we could find the best time slot. |
Followed you, but don't see an option to DM just yet. You DM me. :) |
Thanks for the updates @zarianec. I'll try to get this released in the new few days. I see that there's a conflict. Is that expected? Should I force the changes from your updated version of |
Yes, it's safe to force the changes from |
@zeke I just checked if I have all required permissions to release this PR. Looks like I'm good to go so I'm going to release this PR tomorrow (need to rebuild index from scratch to eliminate NPM's private packages leakage) |
Thanks @zarianec Sorry I totally dropped the ball on this. 🙈 |
This is the follow-up of #30 and #31. This PR allows packages without a repository to be added to the
packages.json
with anull
value.resolve #30