To provide an answer to the question "How many projects use more than one license?" #58
Comments
Hmmm, this brings another vital question to the table: how can we recognize the license(s) of a project? GitHub provides no metadata about it; in fact, GitHub does not even know which license a given project uses. We have to design some heuristic/technique to obtain this information. The approaches that come to mind are to look for a LICENSE file in the root of the project directory, or to look for a "License" section in the README. Then we scrape the text and look for patterns that allow us to determine which license it is. |
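A rough sketch of the two lookups suggested above (a LICENSE file in the project root, or a "License" section in the README) might look like the following. It assumes the project has already been checked out to a local directory and that the README uses Markdown `#` headings; the function names and file-name patterns are hypothetical and are not part of groundhog.

```python
import re
from pathlib import Path

# Assumed file-name patterns; the discussion only names LICENSE and README.
LICENSE_FILE_RE = re.compile(r"^licen[sc]e(\.\w+)?$", re.IGNORECASE)
README_FILE_RE = re.compile(r"^readme(\.\w+)?$", re.IGNORECASE)
# A Markdown heading such as "## License" (assumes "#"-style headings).
LICENSE_HEADING_RE = re.compile(
    r"^#+[ \t]*licen[sc]e\b.*$", re.IGNORECASE | re.MULTILINE
)
ANY_HEADING_RE = re.compile(r"^#+\s", re.MULTILINE)


def readme_license_section(readme_text):
    """Return the text under the first 'License' heading, or None."""
    heading = LICENSE_HEADING_RE.search(readme_text)
    if heading is None:
        return None
    rest = readme_text[heading.end():]
    next_heading = ANY_HEADING_RE.search(rest)
    section = rest[:next_heading.start()] if next_heading else rest
    return section.strip()


def license_text_candidates(project_root):
    """Collect (source, text) pairs that may contain license text."""
    candidates = []
    for path in sorted(Path(project_root).iterdir()):
        if not path.is_file():
            continue
        if LICENSE_FILE_RE.match(path.name):
            text = path.read_text(encoding="utf-8", errors="replace")
            candidates.append((path.name, text))
        elif README_FILE_RE.match(path.name):
            text = path.read_text(encoding="utf-8", errors="replace")
            section = readme_license_section(text)
            if section:
                candidates.append((path.name + "#license", section))
    return candidates
```

This only gathers candidate text; deciding which license that text describes is a separate step.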
I agree. How hard would it be to implement one such heuristic and how comprehensive would the search be? |
Hi @gustavopinto, how do you plan to extract and discover the license of the project files? Are you planning to use something like FOSSology? I've started studying it to see how we could use their system with groundhog. (Maybe you already know that.) |
I don't know yet how FOSSology works. But I think a good starting point could be analyzing all text files in the root directory. We read all of them and then perform a string search looking for words like "license", "copy[right|left]" or similar ones. But I still don't know how to identify the exact license from the text. :P |
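As an illustration of that string search, here is a minimal sketch that flags text mentioning "license", "copyright" or "copyleft". The pattern and function names are assumptions made for the example; it only tells us a file is worth inspecting, not which license it contains.

```python
import re

# Assumed keyword pattern: license/licence (plus -s/-d), copyright, copyleft.
KEYWORD_RE = re.compile(
    r"\b(licen[sc]e[sd]?|copyright|copyleft)\b", re.IGNORECASE
)


def mentions_license(text):
    """True if the text contains any license-related keyword."""
    return KEYWORD_RE.search(text) is not None


def keyword_hits(text):
    """Return the distinct keywords found, lowercased and sorted."""
    return sorted({m.group(1).lower() for m in KEYWORD_RE.finditer(text)})
```

Files for which `mentions_license` returns `True` would then be passed on to whatever step tries to name the license.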
Maybe we can catalog all well-known license names and then try to find these names in the text files (only in files whose text contains "license" or a similar word). We would then need two more license categories: |
I think using FOSSology, or at least trying to use it, would be a better idea. Why duplicate the effort that someone else has already spent? Someone has actually used it to answer precisely that question in the recent past: http://www.theregister.co.uk/2013/04/18/github_licensing_study/ |
Yes, you are right. Let's give FOSSology a chance. |
Here I am again. I think it will be very hard to integrate FOSSology into groundhog, mainly because FOSSology is written in PHP 👎 Do any of you know of alternatives? |
But is it a library or a Web page/service? |
Both! I've tested their demo system on [1] but it seems to be only a web based service (I couldn't find any API). So I believe that if we want to use FOSSology we only have two options: either we run our own FOSSology server or we could rely on their web based demo system. IMO we should follow @gustavopinto 's first idea and develop some kind of algorithm to discover the project's licence, unless we find another FOSSology server. [1] - https://fossology.ist.unomaha.edu/ |
You can test their system at: https://fossology.ist.unomaha.edu/?mod=agent_nomos_once Just create a license text file and upload it; it really works! |
yeah, @dnr2 is right. |
We are then set to discuss how we can develop this algorithm for identifying licenses. To begin the discussion, IMO there are three places in a GitHub project to search for the license text:
BTW, this is another limitation of GitHub. The service does not formally collect licensing information for its hosted projects. This is based on my experience using GitHub, which is of course subject to my personal biases. But I think it's a good starting point anyway. Before figuring out how we will identify/classify the licensing data, we must know how we will obtain it. Bring your ideas to the table! -- rodrigo On Jul 12, 2013, at 7:41 PM, Gustavo [email protected] wrote:
|
Sometimes the license is part of the GitHub webpage of a project, e.g.: https://github.com/johannilsson/android-pulltorefresh and https://github.com/openaphid/android-flip |
The project webpage is built using the README.md file. So, this case will be addressed by
|
Have any of you seen projects with two or more licenses? |
I've never seen one -- rodrigo On Jul 16, 2013, at 1:37 PM, Gustavo [email protected] wrote:
|
Great news! https://github.com/blog/1530-choosing-an-open-source-license We can observe whether projects will use this feature. If so, our problem would be solved! |
Guys, I have created this class to organize the well-known license names. If any of you know of another license, please add it to this file. |
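For readers without access to that class, a catalog of well-known license names and a lookup against it could be sketched roughly as below. This is not the class mentioned above: the catalog entries, phrases and function name are illustrative assumptions, and a phrase match is far weaker than what a tool like FOSSology does.

```python
# Illustrative catalog: canonical name -> phrases that suggest that license.
# The entries are assumptions for this sketch, not an exhaustive list.
LICENSE_CATALOG = {
    "MIT": ["mit license"],
    "Apache-2.0": ["apache license, version 2.0", "apache license 2.0"],
    "GPL": ["gnu general public license"],
    "BSD": ["bsd license"],
}


def detect_licenses(text):
    """Return the sorted catalog names whose phrases appear in the text."""
    # Collapse whitespace so phrases broken across lines still match.
    normalized = " ".join(text.split()).lower()
    return sorted(
        name
        for name, phrases in LICENSE_CATALOG.items()
        if any(phrase in normalized for phrase in phrases)
    )
```

Because the function returns every matching name, a result with more than one entry would hint at a project that uses more than one license, which is the question this issue is about.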
ok! great |
@rodrigoalvesvieira and I were wondering whether we could use FOSSology's "One-Shot License Analysis" [1] together with some kind of browser automation API like Watij [2] to scan the files and get the related licenses... => Pros: So what do you guys think? (Sorry for going back to the FOSSology discussion again, but I believe that this system really works, so we could save time using it.) [1] - https://fossology.ist.unomaha.edu/?mod=agent_nomos_once |
By the way, here are some interesting links: FOSSology algorithm: http://www.fossology.org/projects/fossology/wiki/Symbolic_Alignment_Matrix |
Hi Danilo. Could we follow your plan without having to spend a whole lot of effort? If so, I think it's worth a shot. I think that, in our next meeting, we should discuss how to make it easy to extend groundhog. Rodrigo and I have talked a little bit about that, but we need to settle it. |
We need to implement and test the features required to use Groundhog to answer the question in the title of the issue. We then have to use it to actually answer the question.