FSDG and pip

17 replies
chaosmonk

Joined: 07/07/2017

As discussed in this bug report,[1] pip allows the user to search for and install software from pypi.org, some of which is proprietary. It looks like pip is going to be removed entirely[2] to address this freedom issue. However, since most software in the PyPI repository is free, I think it would be preferable to modify pip so that it refuses to recommend or install any non-free software in the PyPI repo.

For each PyPI package there is a JSON file at https://pypi.org/pypi/[package name]/json containing metadata, including licensing information and a list of dependencies. I've written some code that uses this information to check the license of a package and of the packages in its dependency tree to determine whether it is or requires non-free software. I've modified pip's code so that 'pip search' only lists packages that pass the license check and 'pip install [package]' will refuse to install a package that is non-free or has non-free dependencies.
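
A minimal sketch of reading the license data out of that JSON metadata. The keys "info", "license", "classifiers" and "requires_dist" are the ones the PyPI JSON endpoint returns; the helper names fetch_metadata, license_statements and dependency_specs are hypothetical and are not taken from the patched pip.

```python
import json
from urllib.request import urlopen


def fetch_metadata(package):
    """Download the JSON metadata for one package (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)


def license_statements(metadata):
    """Collect every license statement found in a metadata dict.

    Returns the free-text "license" field (if non-empty) followed by all
    trove classifiers that start with "License ::".
    """
    info = metadata.get("info", {})
    statements = []
    free_text = (info.get("license") or "").strip()
    if free_text:
        statements.append(free_text)
    for classifier in info.get("classifiers") or []:
        if classifier.startswith("License ::"):
            statements.append(classifier)
    return statements


def dependency_specs(metadata):
    """Return the raw requirement strings, or an empty list if none are declared."""
    return list(metadata.get("info", {}).get("requires_dist") or [])
```

Keeping the download separate from the parsing means the parsing can be exercised on saved metadata without touching the network.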

Before I put more time into this, I'd like some feedback on several points.

(1) Is this approach potentially sufficient to satisfy the FSDG? If not, the rest of these questions are irrelevant.

(2) With what I have now, attempting to install proprietary software returns

[package] has non-free or unclear license.
Not installing [package]

and attempting to install free software with proprietary dependencies returns

one or more dependencies of [package] has non-free or unclear license.
Not installing [package]

Is this appropriate, or should pip act as if the software does not exist at all?

(3) The biggest challenge has been that the license data is inconsistent and often unclear. Many packages do state the license using a consistent format, such as

License :: OSI Approved :: GNU General Public License v3 (GPLv3)

but others use a variety of formats such as "GPLv3", "GNU GPL", "GNU General Public License", "gpl", or even the entire text of the GPL. Some packages have multiple license statements, and some have none.

This makes it a real pain to compile a whitelist of acceptable license statements, and it seems inevitable that some free packages are going to be inaccurately excluded. While we should avoid excluding free software as much as possible, the most important thing is that no proprietary software slips through. I would appreciate input on the standard of clarity that should be required in order to whitelist a license statement. If you think you can provide guidance, please see here[3] for the whitelist as it currently stands and a description of my approach so far.
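
A minimal sketch of a conservative whitelist lookup, assuming statements are compared after lowercasing and collapsing whitespace. The whitelist contents and the names ACCEPTED_STATEMENTS, normalize and is_whitelisted are hypothetical; this is not the code in the linked whitelist.py. Whether a version-less statement such as "GNU GPL" belongs on the list is exactly the policy question asked above, so it is left off here purely as an assumption.

```python
# Hypothetical whitelist; a real one would have to be reviewed by hand.
ACCEPTED_STATEMENTS = frozenset({
    "license :: osi approved :: gnu general public license v3 (gplv3)",
    "gplv3",
})


def normalize(statement):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(statement.lower().split())


def is_whitelisted(statement):
    """Accept a statement only on an exact (normalized) whitelist match.

    Anything else, including an empty string or a full license text,
    is treated as unclear and therefore rejected.
    """
    return normalize(statement) in ACCEPTED_STATEMENTS
```

Exact matching errs on the side of excluding free software rather than letting an unclear statement through.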

(4) Is anyone aware of situations other than 'pip search [query]' and 'pip install [non-free package]' in which pip's behavior needs to be modified in order to satisfy the FSDG?

Thanks for any guidance or assistance anyone can provide.

[1] https://trisquel.info/en/issues/3741
[2] https://devel.trisquel.info/trisquel/ubuntu-purge/merge_requests/33
[3] https://notabug.org/chaosmonk/pip/src/fsdg/src/pip/whitelist.py

chaosmonk

Joined: 07/07/2017

(5) My programming experience is limited and I took this on partially as an educational project, so technical feedback is also welcome.

jxself
Joined: 09/13/2010

It's a nice idea, but as you've said, it's hard to do in an automated way. Human intervention will always be needed. This is probably why Trisquel is doing what it's doing: it's a far easier task to remove it than it is to filter and maintain it.

This problem isn't specific to Trisquel though. It affects all FSF-endorsed distros. Therefore, it's probably best to work on this in a collaborative, cross-distro way. There is a mailing list set up for cross-distro collaboration like this called gnu-linux-libre: https://lists.nongnu.org/mailman/listinfo/gnu-linux-libre

This effort is probably best discussed and organized there. In this way, people from all FSF-endorsed distros will be able to participate and something can be worked out such that this problem only needs to be solved once, and for the collective benefit of all distros instead of each distro reinventing the wheel for themselves and doing the same filtering/maintenance work.

Personally, I don't know that your method of warning people is sufficient because this filter only acts if they use the client program to access it. It doesn't prevent other methods, like if someone accesses it directly. (A similar thing could be pointed out by filtering out non-free .debs but I could just open my browser to look at the repo and still see them.) People still get sent to/referred to/whatever-you-call it to that place regardless.

Rather, there shouldn't *be* a repo with non-free things in it. So: Copy the free things into a new repo, and then all FSF-endorsed distros can change the address in their copy of pypi to access the new location instead.

Effectively, this is the option #2 mentioned by RMS in:
https://lists.libreplanet.org/archive/html/libreplanet-discuss/2016-04/msg00078.html

But, this is something that can be discussed on the mailing list with input from other distros.

Magic Banana

Joined: 07/24/2010

RMS wrote:

> We should look for volunteers to make replacement repositories for a couple of them, based on automatic filtering not manual vetting.
https://lists.libreplanet.org/archive/html/libreplanet-discuss/2016-04/msg00078.html

Maybe an issue could be filed to propose to adopt a machine-readable license format. Maybe that of Debian's packages: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ which itself uses the SPDX License List, i.e., https://spdx.org/licenses/

pip's searching capabilities would then be enhanced and could be used to reliably list the free libraries to be copied to a new 100%-free repository that the FSF could host.

chaosmonk

Joined: 07/07/2017

> Maybe an issue could be filed to propose to adopt a machine-readable
> license format. Maybe that of Debian's packages:
> https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ which
> itself uses the SPDX License List, i.e., https://spdx.org/licenses/

That would be a massive improvement. There are two places in the
metadata to add a license. The one here[1] is the one that does
always appear to have a consistent format like

License :: OSI Approved :: GNU General Public License v2 (GPLv2)
License :: CeCILL-B Free Software License Agreement (CECILL-B)
License :: OSI Approved :: MIT License

so it seems like the inconsistently formatted licenses are deprecated,
but unless someone goes and updates the licenses of older packages or
PyPI removes packages that don't conform there is still this problem.

Another challenge is that even with the newer license format, multiple
license statements are allowed. If multiple license statements indicated
dual licensing then this would not be a problem, but this package[2]
seems to indicate otherwise.

In the older "license" field it says "Artistic-License-2.0 +
Forced-Fairplay-Constraints", which indicates that it is non-free, but it's
not machine readable. The newer "classifiers" field has two license
statements:

License :: OSI Approved :: Artistic License
License :: Free To Use But Restricted

"Artistic License" is already too vague, because it does not specify the
version and version 1.0 is not FSF-approved, but this could be addressed
by Debian's format. However, allowing multiple license statements, no
matter how specific they are, is a problem if one of those is a free
license and the software is not free.
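
A minimal sketch of why multiple statements have to be read conjunctively, using the two classifiers quoted above. The whitelist here is hypothetical, and it deliberately includes the version-less Artistic classifier only to isolate the multiple-statement problem from the vagueness problem; the function names are also assumptions.

```python
# Hypothetical whitelist, for illustration only.
FREE_CLASSIFIERS = frozenset({
    "License :: OSI Approved :: Artistic License",
    "License :: OSI Approved :: MIT License",
})


def free_if_any(classifiers):
    """Treat multiple statements as alternatives (dual licensing)."""
    return any(c in FREE_CLASSIFIERS for c in classifiers)


def free_if_all(classifiers):
    """Treat multiple statements as cumulative conditions.

    An empty list is rejected: no statement means the license is unclear.
    """
    return bool(classifiers) and all(c in FREE_CLASSIFIERS for c in classifiers)


classifiers = [
    "License :: OSI Approved :: Artistic License",
    "License :: Free To Use But Restricted",
]
print(free_if_any(classifiers))  # the dual-licensing reading lets the package through
print(free_if_all(classifiers))  # the cumulative reading rejects it
```

The cumulative reading will wrongly exclude genuinely dual-licensed packages whose second license is not whitelisted, which is the price of not letting restricted software slip through.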

[1] https://packaging.python.org/tutorials/packaging-projects/
[2] https://pypi.org/pypi/yamldata/json

calher

Joined: 06/19/2015

linux-libre isn't even allowed to mention the name of non-free
software.

chaosmonk

Joined: 07/07/2017

> It's a nice idea but as you've said it's hard to do in an automated way.
> Some human intervention will always be needed.

Yeah, you're probably right. There's so much ambiguity in the license
statements that any automated approach aggressive enough to remove all
proprietary software would also remove so much free software that pip
would no longer be very useful, especially since a package can't be
considered free unless its entire dependency tree can.
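
A minimal sketch of that dependency-tree rule, assuming metadata dicts shaped like PyPI's JSON (an "info" key holding "requires_dist"). The names requirement_name and tree_is_free are hypothetical, the metadata lookup and the license check are passed in by the caller, and name handling is simplified to lowercasing.

```python
import re


def requirement_name(spec):
    """Extract the project name from a requirement string such as
    "requests (>=2.0) ; extra == 'socks'"."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec.strip())
    return match.group(0).lower() if match else None


def tree_is_free(package, get_metadata, is_free, seen=None):
    """Return True only if `package` and everything it depends on pass `is_free`.

    `get_metadata(name)` returns a metadata dict or None, and
    `is_free(metadata)` is the license check; both come from the caller.
    """
    seen = set() if seen is None else seen
    name = package.lower()
    if name in seen:
        return True  # already visited or in progress; prevents endless cycles
    seen.add(name)
    metadata = get_metadata(name)
    if metadata is None or not is_free(metadata):
        return False
    for spec in metadata.get("info", {}).get("requires_dist") or []:
        dep = requirement_name(spec)
        if dep is None or not tree_is_free(dep, get_metadata, is_free, seen):
            return False
    return True
```

Optional dependencies (requirement strings carrying an "extra ==" marker) are treated as required here, which is the conservative choice and one more way free packages can end up excluded.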

> This is probably why Trisquel
> is doing what it's doing; It's easier to remove it than it is to filter and
> maintain it.

Yes, there's the work-to-return ratio to consider. I think it was right
to just remove Snap completely, for example, since it was already not very
useful to people who don't want to install proprietary software. A
patch that could be implemented as a package helper seemed like it might
be worth it to salvage pip, but especially after reading onpon4's
comment I'm beginning to doubt whether additional efforts are worth it.

> Personally, I don't know that your method of warning people is sufficient
> because this filter only acts if they use the client program to access it.
> It doesn't prevent other methods, like if someone accesses it directly. (A
> similar thing could be pointed out by filtering out non-free .debs but I
> could just open my browser to look at the repo and still see them.) People
> still get sent to/referred to/whatever-you-call it to that place regardless.

My thinking was that if the non-free package is excluded from pip's
search results then anyone who tries to install it already knows it
exists, and that explaining that the package is non-free is better than
letting them assume that pip is just not working and installing the
package by another means. However, I see your point. Maybe it is
better to just ignore the package. That's more similar to how apt
behaves when you try to install an Ubuntu package rejected by Trisquel,
such as

$ sudo apt install chromium
...
Package chromium is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
chromium-bsu:i386 chromium-bsu

E: Package 'chromium' has no installation candidate

> Rather, there shouldn't *be* a repo with non-free things in it.

That's a good point. A non-free repo shouldn't exist, so it would be
strange to rely on one.

> So: Copy the
> free things into a new repo, and then all FSF-endorsed distros can change the
> address in their copy of pypi to access the new location.
>
> Effectively, this is the option #2 mentioned by RMS in:
> https://lists.libreplanet.org/archive/html/libreplanet-discuss/2016-04/msg00078.html

I read that thread and agreed that a new repo would be a good solution,
especially since as you point out this problem is not specific to
Trisquel and a free PyPI replacement could be used by all FSDG distros.

I get frustrated, though, that I spend so much time agreeing that
things should be done and not much time doing them. I wouldn't know
where to begin creating a replacement for the PyPI repo, so I approached
the problem in a way that was within my skill set. It seems not to have
been a great approach, although the automatic filtering could at least
reduce the amount of additional manual work needed to create a new repo.

SuperTramp83

Joined: 10/31/2014

>Rather, there shouldn't *be* a repo with non-free things in it. So: Copy the free things into a new repo, and then all FSF-endorsed distros can change the address in their copy of pypi to access the new location instead.

this.

onpon4
Joined: 05/30/2012

I tend to think that PyPI is less important than you might think. Yes, it's convenient. But it's a language-specific installer, easy to install yourself if you really want it, and even if you don't have it, it's perfectly easy to just download the files from PyPI, extract, and do "python{2|3} setup.py build && sudo python{2|3} setup.py install" (note: the target audience of pip is developers, not end-users). Not to mention, important libraries should just be included in the regular repo.

aloniv

Joined: 01/11/2011

I agree with onpon4. In addition, pip does not require cryptographic package signing using tools such as GPG so you could be downloading altered packages if someone breaks into the PyPI website and replaces a package with a malicious version.

PyPI did in fact contain malicious packages in the past - the issue was reported online, e.g. here:

https://developers.slashdot.org/story/17/09/16/2030229/pythons-official-repository-included-10-malicious-typo-squatting-modules

Of course, the package signing problem can also occur in code repositories such as GitLab if those do not impose GPG signing of commits (which I gather most do not). Likewise, the GNU/Linux package managers do not solve the problem if they grab code from such repositories without verifying the cryptographic signatures of the original developers.

chaosmonk

Joined: 07/07/2017

> I tend to think that PyPI is less important than you might think.

Thanks, onpon4. You would know much better than I how useful pip is
to developers, so I'm sure you're right. Do you mind clarifying though
whether by

> important libraries
> should just be included in the regular repo.

you mean that important libraries are already included in the repo or
that they ought to be?

onpon4
Joined: 05/30/2012

Most that I'm aware of already are, though there's sometimes a problem of them not being available for Python 3 (in particular Pygame and Pyglet; note, both of these libraries work with Python 3, so this is a packaging problem in both cases).

I tend to think of my own libraries (sge, xsge_*, tmx) as important, and those aren't included, but I'm probably the only developer on the planet who uses those. ;)

chaosmonk

Joined: 07/07/2017

It occurs to me that if creating a cross-distro free replacement
repository is realistic, a better target might be addons.mozilla.org.

Firefox and Thunderbird addons are used more frequently and by
non-developers, Trisquel is already attempting to maintain a free
replacement[1] manually, and the issue of non-free addons is also a
problem for all FSDG distros.

From this page[2] it appears that, unlike PyPI, Mozilla requires a
license statement and provides a clearly defined set of options:

Mozilla Public License, version 2.0
GNU General Public License, version 2.0
GNU General Public License, version 3.0
GNU Lesser General Public License, version 2.1
GNU Lesser General Public License, version 3.0
MIT/X11 License
BSD License
All Rights Reserved
Other

Apart from "All Rights Reserved" and "Other" all of these licenses are
free under the FSDG (and GPL-compatible), so automatic filtering should
be much more realistic. Anything labeled "Other" would have to be
rejected, and this would probably mean leaving out some free software,
but surely not as much as what gets left out by Trisquel's manual
approach.
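
A minimal sketch of that automatic filter, assuming each add-on has already been reduced to a (name, license name) pair. The seven names are assumed to match the strings Mozilla actually reports, with the LGPL 3 entry written as version 3.0; FREE_AMO_LICENSES and filter_free are hypothetical names, and the add-ons in the example are made up.

```python
# The seven free licenses from Mozilla's fixed list of options.
FREE_AMO_LICENSES = frozenset({
    "Mozilla Public License, version 2.0",
    "GNU General Public License, version 2.0",
    "GNU General Public License, version 3.0",
    "GNU Lesser General Public License, version 2.1",
    "GNU Lesser General Public License, version 3.0",
    "MIT/X11 License",
    "BSD License",
})


def filter_free(addons):
    """Keep only the add-ons whose license name is one of the seven free options.

    `addons` is an iterable of (addon_name, license_name) pairs. Everything
    else, including "All Rights Reserved" and "Other", is dropped.
    """
    return [name for name, license_name in addons
            if license_name in FREE_AMO_LICENSES]


# Made-up add-ons, for illustration only.
example = [
    ("example-adblock", "GNU General Public License, version 3.0"),
    ("example-toolbar", "All Rights Reserved"),
    ("example-theme", "Other"),
]
print(filter_free(example))
```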

[1] https://trisquel.info/en/browser
[2] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Distribution/Submitting_an_add-on

jxself
Joined: 09/13/2010

Right. Add-ons, Firefox, Thunderbird, and other Mozilla-branded things such as the Rust programming language (among other things) are all good candidates to be handled in a cross-distro way as well. Currently the FSF-endorsed distros each solve that problem on their own, which results in duplication of effort.

There is also an effort to include plugins in the FSF's online directory at https://directory.fsf.org/wiki/Main_Page

This seems a better resource to send people to when they want add-ons. This resource can also be updated in a distro-agnostic way.

Magic Banana

Joined: 07/24/2010

> Mozilla requires a license statement and provides a clearly defined set of options

I was not aware of that. I thought the license name was a free text field. That may mean the two small scripts (in the attached archive) I wrote some time ago could be modified to automatically list the add-ons whose authors chose one of the seven free software licenses that Mozilla suggests. I guess most of the free software add-ons must be under one of those seven licenses. More add-ons (e.g., "HTTPS Everywhere" under the "Multiple" license, which is basically meaningless for deciding the free/proprietary status) could then be added.

In the archive:

  1. 'pop-addons' aims to download the Web pages of add-ons from addons.mozilla.org;
  2. 'free-addons' takes as its argument the directory created by the previous script (or nothing, if the directory has not been renamed and the script is called from the parent directory) and classifies the add-ons according to their licenses, either by asking or automatically, if it already has the answer (from the same execution, a previous execution, or even the execution of another user who has shared her accept/reject files).

Writing down the seven licenses in the accept file and slightly modifying the script to automatically reject any other license is easy.
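
A minimal Python sketch of the accept/reject idea described above, assuming the two files have been loaded into sets of license names. It is not a port of 'free-addons' (a shell script that is not reproduced in this thread); the names classify and review and the ask callback are hypothetical.

```python
def classify(license_name, accepted, rejected):
    """Return "accept", "reject", or "ask" for a license name."""
    if license_name in accepted:
        return "accept"
    if license_name in rejected:
        return "reject"
    return "ask"


def review(addons, accepted, rejected, ask):
    """Return the names of the accepted add-ons.

    `addons` is an iterable of (addon_name, license_name) pairs. `ask` is
    called once per unknown license name and returns True to accept it; the
    answer is remembered in `accepted` or `rejected` for later add-ons.
    """
    free = []
    for name, license_name in addons:
        verdict = classify(license_name, accepted, rejected)
        if verdict == "ask":
            verdict = "accept" if ask(license_name) else "reject"
            (accepted if verdict == "accept" else rejected).add(license_name)
        if verdict == "accept":
            free.append(name)
    return free
```

Passing an ask callback that always returns False gives the fully automatic variant: only licenses already in the accept set get through.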

Attachment: pop-n-free.tar.gz (1.34 KB)

chaosmonk

Joined: 07/07/2017

> I thought the license name was a free text field.

I was partly mistaken. If you select "Other" you are indeed given a text field in which to enter a different license. However, as you say, most free addons are probably under one of the seven licenses listed, and anyone using one of these licenses would be unlikely to bother clicking "Other" and typing out the license in a different format. As a result, most free addons have a machine-readable license statement.

It's true that the yes/no dialog in free-addons would not be needed to automatically reject all licenses other than those seven, but it only takes seven yes's to accept them all, at which point your yes/no dialog works well for dealing with the handful of other free licenses.

Mozilla requires developers who select "Other" to upload the text of their license, which is available at

https://addons.mozilla.org/en-US/firefox/addon/[name of addon]/license

so, in addition to your y/n options, I tried adding a 'v' option to view the text of the license and an 'o' option to add only the current addon without accepting other addons with the same license statement. This might help with situations like HTTPS Everywhere. The text for HTTPS Everywhere is

HTTPS Everywhere:
Copyright © 2010-2018 Electronic Frontier Foundation and others
Licensed GPL v2+
HTTPS Everywhere Rulesets (src/chrome/content/rules):
To the extent copyright applies to the rulesets, they can be used according to
GPL v2 or later.
Issue Format Bot (utils/issue-format-bot/*):
Copyright © 2017 AJ Jordan, AGPLv3+
The build system incorporates code from Python 3.6
Copyright © 2001-2018 Python Software Foundation; All Rights Reserved

from which one can determine that the addon is free and add HTTPS Everywhere without accepting "Multiple" for other addons.

I'm having a little trouble with your script and the MIT/X11 License. I am prompted to accept or reject "MIT\u002FX11 License", and accepting it does add it to pop-n-free/accept, but at the next X11 addon it appears not to find it in the list of accepted licenses. "MIT\u002FX11 License" gets added to pop-n-free/accept an additional time each time I accept it and is never recognized as having already been accepted. It seems like it might have something to do with '/'? Is it a locale thing?

Attachment: pop-n-free.tar.gz (1.41 KB)

Magic Banana

Joined: 07/24/2010

> so, in addition to your y/n options, I tried adding a 'v' option to view the text of the license and an 'o' option to add only the current addon without accepting other addons with the same license statement.

Nice! I guess we could organize a day where volunteers on this forum would each take care of a few pages (whose numbers are in the interval given as the argument of 'pop-addons'). However, I do not really know how to add add-ons on https://trisquel.info/en/browser/addons and it would be nice to have 'pop-addons' check whether the add-on is already in Abrowser's catalog.

> I accept it and is never recognized as having already been accepted.

That is because the backslash is the escape character for 'grep' which is given the license name to search in the accept/reject files. The two 'grep -qx "$license"' in free-addons should be changed into 'fgrep -qx "$license"' so that "$license" is seen as a fixed string.

That said, showing the Unicode escape "\u002F" instead of "/" is not nice: "| sed 's/\\u002F/\//g'" could be appended to grep -o '"license":{[^}]*,"name":"[^"]*' $dir/addons/$a | sed -n '1 s/.*"//p' to only treat that case (I do not know if there are others).
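
For comparison, a minimal Python sketch of both points: a name containing a backslash stops matching itself once it is used as a pattern, and "\u002F" is just a JSON string escape that a JSON parser will decode, along with any other escapes. Python's re module reads "\u002F" as the character "/", so grep's handling differs in detail, but the effect is the same. The helper name decode_json_string is hypothetical.

```python
import json
import re

# The license name exactly as it appears in the page source.
raw = r"MIT\u002FX11 License"

# Used as a pattern, the backslash sequence is interpreted instead of being
# matched literally, so the string does not even match itself.
assert re.fullmatch(raw, raw) is None

# Escaping the pattern (the analogue of a fixed-string search) restores the match.
assert re.fullmatch(re.escape(raw), raw) is not None


def decode_json_string(fragment):
    """Decode the body of a JSON string literal, e.g. turn \\u002F into /."""
    return json.loads(f'"{fragment}"')


print(decode_json_string(raw))
```

Decoding before comparing would also make the names in the accept/reject files readable, at the cost of needing a JSON-aware step in the pipeline.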

chaosmonk

Joined: 07/07/2017

> However, I do not really know how to add add-ons on
> https://trisquel.info/en/browser/addons

I'm not sure either. I've sent David an email asking how.

> That is because the backslash is the escape character for 'grep' which is
> given the license name to search in the accept/reject files. The two 'grep
> -qx "$license"' in free-addons should be changed into 'fgrep -qx "$license"'
> so that "$license" is seen as a fixed string.

That did it. Thanks!