Do youtube-dl/HTML5 Video Everywhere run nonfree JS?

calher

I am a member!

Offline
Joined: 06/19/2015

I looked at the source code to youtube-dl. youtube.py, the code that handles YouTube, calls JSInterpreter on line 1056. Does this mean that youtube-dl is running nonfree JavaScript code from Google on my computer?

Does HTML5 Video Everywhere do the same thing in IceCat? It seems to work when I disable JS.

SuperTramp83

I am a translator!

Offline
Joined: 10/31/2014

AFAIK by default youtube-dl does not run any javascript code.

calher

I am a member!

Offline
Joined: 06/19/2015

Then what does JSInterpreter do? What does youtube.py do with it?

It looks fishy, and I've seen discussions on the GitHub page about "fixing" support for a site by putting the JS through JSInterpreter.

https://github.com/rg3/youtube-dl/issues/12129#issuecomment-279642351

onpon4
Offline
Joined: 05/30/2012

I am not familiar with JavaScript, so looking at the class in question, I can't really tell what it's doing. But from the naming, it definitely looks like a JavaScript interpreter.

If youtube-dl does interpret JavaScript code to find the video file, maybe a no-JavaScript option should be added, or better yet, a warning that JavaScript code is about to be executed could be added (asking whether or not to continue).

calher

I am a member!

Offline
Joined: 06/19/2015

Yes, that would be a good solution, I think.

hack and hack
Offline
Joined: 04/02/2015

I don't have the time to compare right now, but if it's the same as this one (https://github.com/NeilFraser/JS-Interpreter), it's sandboxed (curious about how it works, if it's the same one).

calher

I am a member!

Offline
Joined: 06/19/2015

A Windows 10 virtual machine is sandboxed. So what?

hack and hack
Offline
Joined: 04/02/2015

So your example is flawed, since Windows is rotten from the inside already.

JS in youtube-dl is used to get the video's ID.
How? We don't know exactly yet.

JS is problematic because:
- it gathers my data
- it can be executed on my PC, most likely to gather data anyway (regarding youtube)

JS is problematic if:
- it's proprietary (no access to what it really does)
- it's executed without limits.

Sandboxing can matter regarding the last point.
For example, simply firejailing youtube-dl should work to limit how much of my PC this program has access to.

Then there's the matter of linking my IP to whatever I'm watching or listening to.
In theory, avoiding this is only possible through Tor, using a convoluted process (download from the link I get through Tor, but only read the file when Tor is turned off).
But maybe a VPN is enough (even if it doesn't make me truly anonymous), because why would anyone go to the effort of inspecting further if it costs too many resources? Disclaimer: my reasoning probably isn't without flaws.

calher

I am a member!

Offline
Joined: 06/19/2015

My point is that sandboxing is a security solution that has nothing to do with software freedom.

hack and hack
Offline
Joined: 04/02/2015

And mine is that privacy matters more to me than absolute software freedom, which might also interest people other than you.

hack and hack
Offline
Joined: 04/02/2015

https://github.com/rg3/youtube-dl/blob/master/youtube_dl/jsinterp.py

So this is the infamous JS interpreter.
It doesn't seem that bad, but I don't understand most of it, so...

For example, I couldn't find those things imported at the top (after a glance).

Soon.to.be.Free
Offline
Joined: 07/03/2016

The imports at the top are modules provided with Python: json provides tools for handling json, re is for handling regular expressions, and operator simply allows binary operations (e.g. x+y, x*y) to be expressed as functions.
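To make the role of those three modules concrete, here is a minimal sketch. It is not taken from jsinterp.py: the OPERATORS table and the apply_binary helper are names made up for this illustration. It only shows how operator lets "x+y" be looked up as a function, how re can pick an expression apart, and how json decodes a JSON string.

```python
import json
import operator
import re

# Hypothetical lookup table: each operator symbol maps to the function
# from the operator module that performs the same binary operation.
OPERATORS = {'+': operator.add, '*': operator.mul}


def apply_binary(expr):
    """Evaluate one 'x op y' expression on non-negative integers, e.g. '2+3'."""
    match = re.fullmatch(r'\s*(\d+)\s*([+*])\s*(\d+)\s*', expr)
    if match is None:
        raise ValueError('Unsupported expression: %r' % expr)
    left, symbol, right = match.groups()
    return OPERATORS[symbol](int(left), int(right))


result_sum = apply_binary('2+3')          # same as operator.add(2, 3)
result_product = apply_binary('4 * 5')    # same as operator.mul(4, 5)
decoded = json.loads('{"a": [1, 2]}')     # JSON text becomes a Python dict
```

An interpreter written this way never hands the text to Python's own eval; it only performs the operations it has an entry for in its table.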

As you, its capacity seems rather limited. With the caveat that my 'audit' didn't involve reading every single line of code to death, the interpreter's capacity seems largely limited to basic arithmetic and string operations, assignment, and function/class definitions. It also appears to be able to handle json dumps, although what exactly this involves is not apparent.

Overall, it could perhaps be argued it's no worse than allowing arbitrary CSS to run in the browser, since that already permits mathematical operations. However, this is certainly something to be wary of.

hack and hack
Offline
Joined: 04/02/2015

Interesting, thanks :)

I wonder whether it makes function calls to other files.
I'll try and roughly figure out those json dumps things.

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

As you, its capacity seems rather limited.

There are assignments, recursions, etc. That probably is enough for the interpreted language to be Turing-complete: https://en.wikipedia.org/wiki/Turing_completeness

The problem is not the interpreter but the code that is interpreted, which can be free or not (and can do anything a computer can do if the interpreted language is Turing-complete). But is the interpreter really taking arbitrary code from the Web?

Soon.to.be.Free
Offline
Joined: 07/03/2016

>That probably is enough for the interpreted language to be Turing-complete

It seems to be so. On the other hand- and it's no excuse for running proprietary software- there doesn't seem to be a great deal of functionality: for example, there seems to be no way to communicate over the Internet, access a permanent data store, invoke third-party functions, and so on. It seems relatively harmless from a privacy/security perspective, though of course it wouldn't take much for that to change.

>But is the interpreter really taking arbitrary code from the Web?

Unfortunately, yes- perhaps not in actual usage, but it's set up to do so. The module containing the interpreter is imported by youtube_dl/extractor/youtube.py, and the function _parse_sig_js invokes that to run some code it's fed. The following block of code then calls that function with the source of a webpage it downloads:

    if player_type == 'js':
        code = self._download_webpage(
            player_url, video_id,
            note=download_note,
            errnote='Download of %s failed' % player_url)
        res = self._parse_sig_js(code)

This seems to be the only use of the system for YouTube (I haven't looked at other sites), and what exactly sets the player type to 'js' I don't know. It may be worth noting that there's also an SWF interpreter, which is invoked very similarly to the way the JS one is (except with player type 'swf' instead).
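As a rough illustration of the pattern (fetch some source text, then feed it to a restricted interpreter), here is a toy sketch. Everything in it is an assumption for the sake of the example: run_toy_script, the three statement names and the hard-coded downloaded_code string are made up, and this is not how jsinterp.py is actually implemented.

```python
import re


def run_toy_script(source, value):
    """Apply each ';'-separated statement in `source` to the string `value`.

    Supported (made-up) statements:
      reverse()   reverse the characters
      drop(n)     remove the first n characters
      swap(n)     swap character 0 with character n (modulo the length)
    """
    chars = list(value)
    statements = [s.strip() for s in source.split(';') if s.strip()]
    for stmt in statements:
        match = re.fullmatch(r'(\w+)\((\d*)\)', stmt)
        if match is None:
            raise ValueError('Unsupported statement: %r' % stmt)
        name, arg = match.group(1), match.group(2)
        if name == 'reverse' and arg == '':
            chars.reverse()
        elif name == 'drop' and arg:
            chars = chars[int(arg):]
        elif name == 'swap' and arg:
            if chars:  # nothing to swap in an empty string
                i = int(arg) % len(chars)
                chars[0], chars[i] = chars[i], chars[0]
        else:
            raise ValueError('Unsupported statement: %r' % stmt)
    return ''.join(chars)


# Stand-in for text that a real tool would have downloaded from a web page.
downloaded_code = 'reverse(); drop(2); swap(3)'
transformed = run_toy_script(downloaded_code, 'abcdefgh')
```

The point of the sketch is the split of responsibilities: the interpreter decides which operations exist at all, but which of them run, and in what order, is dictated by text that came from somewhere else.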

EDIT: Probably irrelevant, but a re-read suggests I forgot the "said" in "As you said".

calher

I am a member!

Offline
Joined: 06/19/2015

Someone wants to talk to you about this post. Could I give them your email address so they can talk to you about it?

Soon.to.be.Free
Offline
Joined: 07/03/2016

If you can find it, for sure- although could you then please tell me how you found it? Otherwise, I'm afraid it's not possible to publish my e-mail address (spambots and what not). I'm still happy to discuss whatever is of interest with them, but it would have to be through some other means- potentially (though not necessarily) this forum.

Also, although I'm happy to discuss it, do be aware that I'm not particularly experienced with the issue. My expertise is largely limited to a (barely) functional knowledge of Python and enough time and patience to Ctrl+F and grep through a codebase.

calher

I am a member!

Offline
Joined: 06/19/2015

Send it to me, then.

IRC: CharlieBrown on Freenode

Email: name at domain

http://www.interhack.net/pubs/munging-harmful/

Soon.to.be.Free
Offline
Joined: 07/03/2016

I've sent the address to you. On the other hand, the article you linked does make a very good point. For confirmation/future reference, I'll post the address here:

gpast [underscore] panama [at] protonmail [dot] com

ADFENO
Offline
Joined: 12/31/2012

I tried to report this issue to the Workgroup for free system
distributions
([[https://lists.gnu.org/mailman/listinfo/gnu-linux-libre]]) and they
are essentially still in doubt that youtube-dl is really running
non-free software. We might need some actual proof that the code is
being run (instead of simple variable declarations).

NOTE Even though I'm not a developer, I know there is a distinction
between running the code that's inside a variable and putting the
contents of that variable inside another.
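That distinction can be shown in a few lines of Python. This is only a generic sketch of "copying code" versus "running code"; the variable names are made up, and it uses Python's built-in exec as the interpreter, not anything from youtube-dl.

```python
# A piece of program text held in a variable, plus some state it would change.
source = 'counter["runs"] += 1'
counter = {'runs': 0}

# Copying: the text is placed in another variable. Nothing is executed,
# so the counter is untouched.
copy_of_source = source
runs_after_copy = counter['runs']

# Running: the text is handed to an interpreter (here Python's own exec),
# which carries out what the text says and changes the counter.
exec(source, {'counter': counter})
runs_after_exec = counter['runs']
```

Proof that code is being run therefore means showing that the downloaded text reaches something that plays the role of exec here, rather than only being stored or passed around.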

See the replies I got so far at:
[[http://lists.nongnu.org/archive/html/gnu-linux-libre/2017-04/msg00001.html]]. Please
subscribe to the workgroup's list and reply there instead if you do want
to contribute.

--
- [[https://libreplanet.org/wiki/User:Adfeno]]
- Speaker and consultant on free /software/ (not to be confused with
gratis).
- "WhatsApp"? It is not free, so I don't use it. Among similar programs,
I prefer GNU Ring or Tox. Want other ways to contact me? Add the vCard
found at the address above to your contacts.
- Planning to send me .doc, .ppt, .cdr, or .mp3 files? OK, I accept
them, but I don't pass them on. I deliver only in formats favorable to
free /software/. Please get in touch if in doubt.

SuperTramp83

I am a translator!

Offline
Joined: 10/31/2014

Well, I just asked on their IRC channel..

SuperTramp83> I'd like to ask one of the developers a question: does youtube-dl run proprietary javascript?
einstein95> What do you mean
RiCON> limbo_: make a page that prints the result from "youtube-dl -J ", done.
SuperTramp83> einstein95, does it download and execute javascript from say youtube when you use it in order to then download the vid?
einstein95> Have a look at the code and see for yourself
SuperTramp83> einstein95, unfortunately I am not able to do that but I'm interested in finding out if it does. I'd like to know if it is bad for muh freedom..
einstein95> What freedom
einstein95> https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/youtube.py#L1049
SuperTramp83> https://trisquel.info/en/forum/do-youtube-dlhtml5-video-everywhere-run-nonfree-js
SuperTramp83> https://www.gnu.org/philosophy/free-sw.en.html
einstein95> All it does is run a bit of JS to get the video signature
SuperTramp83> I see. so the answer is yes, it does run some non-free js. tx
einstein95> Nothing worth seriously worring about
einstein95> *y

hack and hack
Offline
Joined: 04/02/2015

Thanks SuperTramp :)

To get the video signature, huh.
The remaining question is how it runs said JS.
30 matches for the word "signature" in that youtube.py.

This isn't easy to understand.

It seems it's also at lines 960 and 964.
And maybe line 1049, as mentioned by OP.

calher

I am a member!

Offline
Joined: 06/19/2015

Thank you for all your help.

It appears YouTube videos require running JavaScript.

So, where do we go from here? Is it possible to view YouTube anymore? Do we need to encourage people not to post YouTube links now?

Soon.to.be.Free
Offline
Joined: 07/03/2016

>It appears YouTube videos require running JavaScript.

Possibly, but that hasn't been established here. It's clear that using the YouTube interface provided by Google requires JS, and that youtube-dl uses it, but I'm not sure if that extends to all other video download/viewing tools. There are some mentioned in https://trisquel.info/en/forum/you-cannot-watch-youtube-libre-software-computer: ViewTube and VLC are two worth a look.

Regardless, even if this were the case, it's still not necessarily that bad. JavaScript itself isn't necessarily an issue: it's just a programming language, like Python and C. It can hurt freedom when the code is proprietary, requires proprietary components to run, or is indiscriminately copied from someone else's domain, but none of these is an integral part of the language.

>So, where do we go from here? Is it possible to view YouTube anymore? Do we need to encourage people not to post YouTube links now?

In regards to viewing YouTube, it is still possible: see above. As for posting links, that's by corollary not necessarily an incitement to submit to Alphabet Corporation. That said, linking to alternative sources where possible would be ideal. Posting to YT, of course, is strongly advised against, as the cost in privacy and security is a significant one.

Where to go from here is an interesting question. Switching to an alternative program is probably a good idea, if you weren't using one before. Potentially it might be worth petitioning the YouTube-DL developers, but that presumes it's their fault: obviously they were the ones who decided to implement the code, but did they have a choice? It's still unclear, as far as I'm aware, whether other 'interfaces' to the video-sharing service are equally problematic. Even the kind of code being run is not currently clear, though it is of course obvious that the potential to execute a Turing-Complete subset of JavaScript exists. Overall, whilst it's definitely worth investigating further, it's still too early to state what exactly the issue is, let alone lay blame and take action. One thing is already obvious, though: Google don't always follow their motto.

calher

I am a member!

Offline
Joined: 06/19/2015

The JavaScript code that youtube-dl is dealing with is the JavaScript code that came from YouTube. We are talking about proprietary JavaScript code here.

So, it is bad.

Soon.to.be.Free
Offline
Joined: 07/03/2016

To clarify, the statement "Regardless, even if this were the case, it's still not necessarily that bad." wasn't intended to excuse the execution of proprietary JavaScript. The reference was to the more general nature of the assertion that "YouTube videos require running JavaScript"- I probably misinterpreted here, but the response was to the implicit predicate that JavaScript is a problem.