Writing custom programs for yt-dlp's jsinterp

Jacob K

This is a pretty technical post, but maybe some people will find it interesting. The short version is that yt-dlp's JavaScript interpreter is actually running the JavaScript sent by YouTube, but I'm not sure how much of a problem that is.

Someone in the #hyperbola IRC channel linked a post from jxself about yt-dlp and JavaScript interpretation [1].

I've heard people say before that youtube-dl's JavaScript interpreter isn't a "real interpreter" because it isn't Turing-complete and doesn't have branching [2] (though there seems to be disagreement about this even in the past [3]). I accepted the idea that it wasn't a real interpreter for a long time, without investigating further.

Inspired by jxself's post, I decided to try writing some programs in yt-dlp's interpreter. I found that I was able to write programs that suggest the interpreter is Turing-complete. It seems like it probably wasn't Turing-complete a long time ago, but at some point enough functionality was added to make it so. In particular, for loops and switch statements seem like sufficient additions to make the language Turing-complete. Maybe those aren't even necessary.

I'm still running Trisquel 11, so I'm using yt-dlp version 2022.04.08-1. You can go to a directory for programs (e.g. ~/programs) and run `apt source yt-dlp`. Then, navigate to yt-dlp-2022.04.08 and run `python3`. Since you are in the yt-dlp source folder, you can run `from yt_dlp.jsinterp import JSInterpreter` to get access to the JSInterpreter class. From there, you can create an instance of JSInterpreter, passing the JavaScript source as a string, and call `.call_function('functionName', argument)` on the instance to get the output of a function defined in that source. For example, here's a function to output a list of Fibonacci numbers: [4]

(I couldn't figure out how to make the code look good on the forum so I used Codeberg links instead.)

If you copy/paste the code from that file into the interpreter, you can then call the function like this: `print(jsi.call_function('fib', 10))` That will give you the first 10 Fibonacci numbers. You can alternatively copy the Python file to the working directory, and run it with -i so you can continue interacting afterwards.
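The call pattern can also be tried with something much smaller than the linked Fibonacci program. This is only a rough sketch: `JS_SOURCE`, `twice`, and `call_js` are names made up for illustration, and it assumes `yt_dlp` is importable (for example because python3 was started from the yt-dlp source folder). The JavaScript is kept to a single argument and one multiplication on purpose, since the subset of JavaScript the interpreter accepts differs between versions.

```python
try:
    from yt_dlp.jsinterp import JSInterpreter
except ImportError:
    # Not in the yt-dlp source folder and yt-dlp is not installed.
    JSInterpreter = None

# Hypothetical one-line JavaScript program, not the Fibonacci code from [4].
JS_SOURCE = 'function twice(a){return a*2;}'


def call_js(source, name, *args):
    """Build an interpreter for `source` and call the JS function `name`."""
    if JSInterpreter is None:
        raise RuntimeError('yt_dlp is not importable from here')
    jsi = JSInterpreter(source)
    return jsi.call_function(name, *args)


if JSInterpreter is not None:
    print(call_js(JS_SOURCE, 'twice', 21))
```

Creating the interpreter only stores the source string; the function body is looked up and interpreted when `call_function` is called.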

I also wrote a program to simulate elementary cellular automata, which are known to be Turing-complete (in the case of Rule 110 [5]): [6]

Then you can use `print(jsi.call_function('simulate', [[1], [0, 1, 1, 0, 1, 1, 1, 0], 3]))` to print the first few steps of Rule 110 starting from just 1 filled cell.
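For cross-checking results, here is a plain-Python sketch of the same kind of simulation. It is not the program from [6] and its output format may differ. It assumes the 8-entry rule list is in the conventional order (neighborhoods 111, 110, ..., 000), which is consistent with `[0, 1, 1, 0, 1, 1, 1, 0]` being the binary form of 110, and it assumes the row is allowed to grow by one cell on each side per step. The function names are just illustrative.

```python
# Rule table in the conventional order: neighborhoods 111, 110, ..., 000.
RULE_110 = [0, 1, 1, 0, 1, 1, 1, 0]


def step(cells, rule):
    """Return the next generation; the row grows by one cell on each side."""
    padded = [0, 0] + cells + [0, 0]
    nxt = []
    for i in range(1, len(padded) - 1):
        # Read (left, center, right) as a 3-bit number from 0 to 7.
        neighborhood = padded[i - 1] * 4 + padded[i] * 2 + padded[i + 1]
        # Index 0 of the table is neighborhood 111, so count down from 7.
        nxt.append(rule[7 - neighborhood])
    return nxt


def simulate(cells, rule, steps):
    """Return a list of rows: the start row followed by `steps` generations."""
    rows = [cells]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


for row in simulate([1], RULE_110, 3):
    print(''.join('#' if c else '.' for c in row))
```

Starting from a single filled cell, the pattern only grows to the left, so the extra cells added on the right stay empty.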

I also played around with jsinterp from youtube-dl version 2017.04.16, from git [7]. It lacks for loops, switch statements, and push/pop for arrays. I chose this version because it's from before Magic Banana and rain1 made claims about whether the interpreter was Turing-complete. With the use of loop unrolling and some clever bitwise math, I was able to write a simulator for Rule 110 [8], but it seems like the number of iterations has to be baked into the program rather than passed as an argument, since the loop has to be unrolled by hand. I think this still satisfies the colloquial definition of Turing-complete.
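To give an idea of how Rule 110 can be done with only bitwise operators and no loop, here is a sketch in plain Python. It is not the program from [8] and may not use the same trick. It assumes the row is stored as one integer with bit 0 as the rightmost cell, and the function names are made up. It relies on the observation that under Rule 110 a cell is filled when the center or right cell is filled, except when all three cells are filled.

```python
def rule110_step(x):
    """One Rule 110 step on a row stored as an integer (bit 0 = rightmost cell)."""
    left = x >> 1    # bit i now holds the left neighbor of cell i
    right = x << 1   # bit i now holds the right neighbor of cell i
    # Filled when center or right is filled, unless all three are filled.
    return (x | right) ^ (left & x & right)


def three_steps_unrolled(x):
    """Three generations with the loop written out by hand."""
    x = (x | (x << 1)) ^ ((x >> 1) & x & (x << 1))
    x = (x | (x << 1)) ^ ((x >> 1) & x & (x << 1))
    x = (x | (x << 1)) ^ ((x >> 1) & x & (x << 1))
    return x


print(bin(three_steps_unrolled(1)))  # 0b1101
```

Because each generation is a separate line, the number of generations is fixed when the program is written, which is the limitation described above.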

youtube-dl version 2015.02.01 was the release that added support for bitwise operations, and this version of Rule 110 works there, though you have to run it in python2.7 instead of python3. In version 2015.01.30.2, there are no arithmetic or bitwise operations, and the lack of the call_function Python interface makes testing a bit more inconvenient. There are still several string manipulation functions, and maybe it is possible to use them in clever ways to simulate a Turing machine, but I am not sure. I think this is an open problem: is the JavaScript interpreter in version 2015.01.30.2 of youtube-dl Turing-complete?

More generally, we could ask: what's the oldest release version of youtube-dl with a Turing-complete interpreter, and what's the newest that doesn't have a Turing-complete interpreter?

Google could at any time decide to send programs that (in addition to calculating the string needed to get the video) generate many numbers in the Fibonacci sequence, or possibly something more complicated (seems somewhat unlikely to me). There are a couple of forks of youtube-dl that aim to avoid running software downloaded from the video site, such as avideo [9] (notabug is down right now though so you can't download it) and hypervideo (I think this was a rebranding of avideo, but it's now removed from Hyperbola repos).

The code Google sends that youtube-dl or yt-dlp runs does seem nonfree, but I also think it's very different from most nonfree software. The software is designed to force a specific interface (the official YouTube app) and prevent unauthorized saving of videos, but in the context of yt-dlp it fails at both of those things. It doesn't seem to affect my life as much as the official YouTube app would, even if both are nonfree software. I'm not completely convinced that it would really be better to avoid this software. Nonfree software is bad for reasons that might not really apply here. I'm pretty unsure about this, so I'd like to hear what others have to say about this aspect also.

Invidious often allows downloading videos from YouTube without running any JavaScript at all. I would guess that operators of Invidious instances are still running the nonfree server there, but I think most people here would agree it's still a good thing overall. I guess it's a bit like joining a Microsoft Teams room as a group instead of each individually: fewer people need to run the nonfree software, and it's also symbolic.

Maybe in the future it will be possible to make some sort of algorithm that takes a nonfree signature function as input and outputs a free signature function. Then one person could run that algorithm each time YouTube updates the signature function and distribute the resulting free function to anyone who wants it. It seems far-fetched, but this is what humans do when they reverse-engineer software to write a free replacement, so maybe someone will figure out how to automate that eventually, at least for a known subset of possible functions.

[1] https://jxself.org/shifting-the-trap.shtml
[2] https://lists.gnu.org/archive/html/gnu-linux-libre/2017-07/msg00003.html
[3] https://trisquel.info/en/forum/do-youtube-dlhtml5-video-everywhere-run-nonfree-js#comment-113488
[4] https://codeberg.org/JacobK/jsinterp.py-custom-programs/src/branch/main/fib-yt-dlp.py
[5] https://en.wikipedia.org/wiki/Rule_110
[6] https://codeberg.org/JacobK/jsinterp.py-custom-programs/src/branch/main/elementary-cellular-automata-yt-dlp.py
[7] https://github.com/ytdl-org/youtube-dl/
[8] https://codeberg.org/JacobK/jsinterp.py-custom-programs/src/branch/main/left-life-youtube-dl.py
[9] https://notabug.org/GPast/avideo