Need help with a Perl script

9 replies [Last post]
GNUser
Offline
Joined: 07/17/2013

Hey guys, I am not skilled in Perl (I always do my stuff in Bash), but I found a cool YouTube downloader script in Perl and wanted to use it. I tried to modify the curl calls to add "--socks5-hostname 127.0.0.1:9150" so that they use my Tor connection (which I sometimes do in Bash scripts). I was able to modify the first occurrence of curl (line 13), but the second occurrence (line 197) gives me an error. I hope someone more knowledgeable in Perl can help me. The script is below (it's quite big, but I hope that's OK).

#!/usr/bin/perl -T

use strict;
use warnings;

#
## Calomel.org ,:, Download Youtube videos
## Script Name : youtube_download.pl
## Version : 0.58
## Valid from : March 2016
## URL Page : https://calomel.org/youtube_wget.html
## OS Support : Linux, Mac OSX, OpenBSD, FreeBSD
# `:`
## Two arguments
## $1 Youtube URL from the browser
## $2 prefix to the file name of the video (optional)
#

############ options ##########################################

# Option: what file type do you want to download? The string is used to search
# in the youtube URL so you can choose mp4, webm, avi or flv. mp4 is the most
# compatible and plays on android, ipod, ipad, iphones, vlc and mplayer.
my $fileType = "mp4";

# Option: what visual resolution or quality do you want to download? List
# multiple values just in case the highest quality video is not available, the
# script will look for the next resolution. You can choose "itag=22" for 720p,
# "itag=18" which means standard definition 640x380 and "itag=17" which is
# mobile resolution 144p (176x144). The script will always prefer to download
# the first listed resolution video format from the list if available.
my $resolution = "itag=22,itag=18";

# Option: How many times should the script retry if the download fails?
my $retryTimes = 2;

# Option: turn on DEBUG mode. Use this to reverse engineer this code if you are
# making changes or you are building your own youtube download script.
my $DEBUG=0;

#################################################################

# initialize global variables and sanitize the path
$ENV{PATH} = "/bin:/usr/bin:/usr/local/bin:/opt/local/bin";
my $prefix = "";
my $retry = 1;
my $retryCounter = 0;
my $user_url = "";
my $user_prefix = "";

# collect the URL from the command line argument
chomp($user_url = $ARGV[0]);
my $url = "$1" if ($user_url =~ m/^([a-zA-Z0-9\_\-\&\?\=\:\.\/]+)$/ or die "\nError: Illegal characters in YouTube URL\n\n" );

# declare the user defined file name prefix if specified
if (defined($ARGV[1])) {
chomp($user_prefix = $ARGV[1]);
$prefix = "$1" if ($user_prefix =~ m/^([a-zA-Z0-9\_\-\.\ ]+)$/ or die "\nError: Illegal characters in filename prefix\n\n" );
}

# if the url down below does not parse correctly we start over here
tryagain:

# make sure we are not in a tryagain loop by checking the counter
if ( $retryTimes < $retryCounter ) {
print "\n\n Stopping the loop because the retryCounter has exceeded the retryTimes option.";
print "\n The video may not be available at the requested resolution or may be copy protected.\n\n";
print "\nretryTimes counter = $retryTimes\n\n" if ($DEBUG == 1);
exit;
}

# download the html from the youtube page containing the page title and video
# url. The page title will be used for the local video file name and the url
# will be sanitized to download the video.
my $html = `curl --socks5-hostname 127.0.0.1:9150 -A "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0" -H "Accept-Language: en-us,en;q=0.5" -sS -L --compressed "$url"` or die "\nThere was a problem downloading the HTML page.\n\n";

# format the title of the page to use as the file name
my ($title) = $html =~ m/<title>(.+)<\/title>/si;
$title =~ s/[^\w\d]+/_/g or die "\nError: we could not find the title of the HTML page. Check the URL.\n\n";
$title = lc ($title);
$title =~ s/_youtube//ig;
$title =~ s/^_//ig;
$title =~ s/_amp//ig;
$title =~ s/_39_s/s/ig;
$title =~ s/_quot//ig;

# filter the URL of the video from the HTML page
my ($download) = $html =~ /"url_encoded_fmt_stream_map"(.*)/ig;

# Print the raw separated strings in the HTML page
#print "\n$download\n\n" if ($DEBUG == 1);

# This is where we loop through the HTML code and select the file type and
# video quality.
my @urls = split(',', $download);
OUTERLOOP:
foreach my $val (@urls) {
# print "\n$val\n\n";

if ( $val =~ /$fileType/ ) {
my @res = split(',', $resolution);
foreach my $ress (@res) {
if ( $val =~ /$ress/ ) {
print "\n html to url separation complete.\n\n" if ($DEBUG == 1);
print "$val\n" if ($DEBUG == 1);
$download = $val;
last OUTERLOOP;
}
}
}
}

# clean up by translating url encoding and removing unwanted strings
print "\n Start regular expression clean up...\n" if ($DEBUG == 1);
$download =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
$download =~ s/sig=/signature=/g;
$download =~ s/\\u0026/\&/g;
$download =~ s/(type=[^&]+)//g;
$download =~ s/(fallback_host=[^&]+)//g;
$download =~ s/(quality=[^&]+)//g;
$download =~ s/&+/&/g;
$download =~ s/&$//g;
$download =~ s/%2C/,/g;
$download =~ s/%252F/\//g;
$download =~ s/^:"url=//g;
$download =~ s/\"//g;
$download =~ s/\?itag=22&/\?/;

# print the URL before adding the page title.
print "\n The download url string: \n\n$download\n" if ($DEBUG == 1);

# check for &itag instances and either remove extras or add an additional
my $counter1 = () = $download =~ /&itag=\d{2,3}/g;
print "\n number of itag= (counter1): $counter1\n" if ($DEBUG == 1);
if($counter1 > 1){ $download =~ s/&itag=\d{2,3}//; }
if($counter1 == 0){ $download .= '&itag=22' }

# save the URL starting with http(s)...
my ($youtubeurl) = $download =~ /(https?:.+)/;

# is the URL in youtubeurl the variable? If not, go to tryagain above.
if (!defined $youtubeurl) {
print "\n URL did not parse correctly. Let's try another mirror...\n";
$retryCounter++;
sleep 2;
goto tryagain;
}

# collect the title of the page
my ($titleurl) = $html =~ m/<title>(.+)<\/title>/si;
$titleurl =~ s/ - YouTube//ig;

# combine file variables into the full file name
my $filename = "unknown";
$filename = "$prefix$title.$fileType";

# url title to url encoding. all special characters need to be converted
$titleurl =~ s/([^A-Za-z0-9\+-])/sprintf("%%%02X", ord($1))/seg;

# combine the youtube url and title string
$download = "$youtubeurl\&title=$titleurl";

# Process check: Are we currently downloading this exact same video? Two of the
# same download processes will overwrite each other and corrupt the file.
my $running = `ps auwww | grep [c]url | grep -c "$filename"`;
print "\n Is the same file name already being downloaded? $running" if ($DEBUG == 1);
if ($running >= 1)
{
print "\n Already $running process, exiting." if ($DEBUG == 1);
exit 0;
};

# Print the long, sanitized youtube url for testing and debugging
print "\n The following url will be passed to curl:\n" if ($DEBUG == 1);
print "\n$download\n" if ($DEBUG == 1);

# print the file name of the video being downloaded for the user
print "\n Download: $filename\n\n" if ($retryCounter == 0 || $DEBUG == 1);

# print the itag quantity for testing
my $counter2 = () = $download =~ /&itag=\d{2,3}/g;
print "\n Does itag=1 ? $counter2\n\n" if ($DEBUG == 1);
if($counter2 < 1){
print "\n URL did not parse correctly (itag).\n";
exit;
}

# Background the script before the download starts. Use "ps" if you need to
# look for the process running or use "ls -al" to look at the file size and
# date.
fork and exit;

# Download the video, resume if necessary
# print "$filename";
# print "$download";
# sleep 20;
system("curl", "-sSRL", "--socks5-hostname 127.0.0.1:9150", "-A 'Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0'", "-H 'Accept-Language: en-us,en;q=0.5'", "-o", "$filename", "--retry", "5", "-C", "-", "$download");

# Print the exit error code
print "\n exit error code: $?\n" if ($DEBUG == 1);

# Exit Status: Check if the file exists and we received the correct error code
# from the curl system call. If the download experienced any problems the
# script will run again and try to continue the download until the retryTimes
# count limit is reached.

if( $? == 0 && -e "$filename" && ! -z "$filename" )
{
print "\n Finished: $filename\n\n" if ($DEBUG == 1);
}
else
{
print STDERR "\n FAILED: $filename\n\n" if ($DEBUG == 1);
$retryCounter++;
sleep $retryCounter;
goto tryagain;
}

#### EOF #####

danieru
Offline
Joined: 01/06/2013

http://stackoverflow.com/

Also:
>"gives me an error"
What error?

GNUser
Offline
Joined: 07/17/2013

curl: option --socks5-hostname 127.0.0.1:9150: is unknown
curl: try 'curl --help' or 'curl --manual' for more information

Which is weird, considering that the first time curl runs it uses the Tor proxy just fine (I checked, and it is using the proxy, not bypassing it).
I think there is something wrong in this line:
system("curl", "-sSRL", "--socks5-hostname 127.0.0.1:9150", "-A 'Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0'", "-H 'Accept-Language: en-us,en;q=0.5'", "-o", "$filename", "--retry", "5", "-C", "-", "$download");
but I don't know what.

danieru
Offline
Joined: 01/06/2013

>"curl: option --socks5-hostname 127.0.0.1:9150: is unknown"
>"I think there is something wrong in the line:
system("curl", "-sSRL", "--socks5-hostname 127.0.0.1:9150", "-A 'Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0'", "-H 'Accept-Language: en-us,en;q=0.5'", "-o", "$filename", "--retry", "5", "-C", "-", "$download");
but i don't know what."

Why do you think that? Actually I too think there's something wrong with that line.
You added that "--socks5-hostname 127.0.0.1:9150", right?
Because I think it should be:
"--socks5-hostname '127.0.0.1:9150'"

GNUser
Offline
Joined: 07/17/2013

Wish it was that simple.
I had already tried that and it gave the same error.
Also, notice that the first curl occurrence is okay (just copy paste the script and give it a go, check what happens). The second one is where the script fails. It actually manages to do the "discover video title and url" thing very well, it just fails to download.

danieru
Offline
Joined: 01/06/2013

Wait just a sec.
Why are you using this script instead of using youtube-dl?
youtube-dl supports proxy too.

--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
To enable experimental SOCKS proxy, specify
a proper scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty
string (--proxy "") for direct connection

GNUser
Offline
Joined: 07/17/2013

Because it's still experimental. I am not even sure whether they resolve hostnames over the proxy or directly. curl, on the other hand, is software that I have tried and used many times, and I have relative confidence in its SOCKS implementation.
Also, I am not even sure whether the youtube-dl version in the current repo has it or not (though I suspect it does, given that youtube-dl gets automatic updates almost every other day on Trisquel).
Thanks anyway, if I was into youtube-dl it would have been nice to know this :)

BugRep
Offline
Joined: 04/05/2012

This gives you a clue:
curl: option --socks5-hostname 127.0.0.1:9150: is unknown

curl thinks that the whole thing is one parameter. You can get the same result by running
curl '--socks5-hostname 127.0.0.1:9150' localhost

What you should do is make the option and its value separate. On a shell command line you can remove the quotes (") altogether, or put both parts in separate quotes like this:
"--socks5-hostname" "127.0.0.1:9150"
Inside the Perl system() call, write them as two list elements:
"--socks5-hostname", "127.0.0.1:9150"
In the latter case the comma is part of the Perl syntax.
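
To see the difference without involving curl or Tor, here is a minimal sketch. It assumes a Unix-like system and uses the running Perl interpreter as a stand-in for curl; count_args, $perl and $child are made-up names for illustration only. The helper uses a list-form pipe open instead of system() so that the child's output can be captured, but both start the program without a shell and hand over each list element as exactly one argument. The backtick line goes through the shell, which is why the first curl call in the script works with the option and its value written in one string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Path of the running Perl interpreter; it stands in for curl here.
my $perl = $^X;

# Child program: print how many arguments it was given.
my $child = 'print scalar(@ARGV)';

# count_args is a hypothetical helper. It starts the child WITHOUT a shell
# (list-form pipe open), so each list element arrives as exactly one argument.
sub count_args {
    my @args = @_;
    open(my $fh, '-|', $perl, '-e', $child, '--', @args)
        or die "cannot start child: $!";
    my $count = <$fh>;
    close($fh);
    return $count;
}

# Option and value glued together in one list element.
my $joined = count_args("--socks5-hostname 127.0.0.1:9150");

# Option and value as two list elements.
my $split = count_args("--socks5-hostname", "127.0.0.1:9150");

# Backticks pass one string to the shell, which splits it on whitespace.
my $via_shell = `"$perl" -e '$child' -- --socks5-hostname 127.0.0.1:9150`;

print "list form, one element:  $joined argument(s)\n";
print "list form, two elements: $split argument(s)\n";
print "backticks via the shell: $via_shell argument(s)\n";
```

The first count is 1 and the other two are 2: only the glued-together list element reaches the child as a single argument, which is what curl was complaining about.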

GNUser
Offline
Joined: 07/17/2013

Thanks! That worked! I love to learn new stuff :)
However there seems to be a problem with the script itself (which may have been a mistake on my part) so I will have to check it better and let you guys know if it worked or not :)

GNUser
Offline
Joined: 07/17/2013

It's working now! I even corrected a bug at the end of the script: it would never write "Finished name_of_video" and now it does (I would give this back to the people who wrote the script, but they never answered my emails, so I don't think they even read them).
Also, I updated the curl parameters to look more like Firefox, or in this case the latest TBB. Hope you guys enjoy it (if anyone needs it lol).

Also, I made some small tests with youtube-dl's SOCKS proxy support and it looks good. I will use it if necessary, but I am unsure what info youtube-dl might leak about the system, version, date/time, etc. So far socks5 looks good.

Here is the script:

#!/usr/bin/perl -T

use strict;
use warnings;

#
## Calomel.org ,:, Download Youtube videos
## Script Name : youtube_download.pl
## Version : 0.58
## Valid from : March 2016
## URL Page : https://calomel.org/youtube_wget.html
## OS Support : Linux, Mac OSX, OpenBSD, FreeBSD
# `:`
## Two arguments
## $1 Youtube URL from the browser
## $2 prefix to the file name of the video (optional)
#

############ options ##########################################

# Option: what file type do you want to download? The string is used to search
# in the youtube URL so you can choose mp4, webm, avi or flv. mp4 is the most
# compatible and plays on android, ipod, ipad, iphones, vlc and mplayer.
my $fileType = "mp4";

# Option: what visual resolution or quality do you want to download? List
# multiple values just in case the highest quality video is not available, the
# script will look for the next resolution. You can choose "itag=22" for 720p,
# "itag=18" which means standard definition 640x380 and "itag=17" which is
# mobile resolution 144p (176x144). The script will always prefer to download
# the first listed resolution video format from the list if available.
my $resolution = "itag=22,itag=18";

# Option: How many times should the script retry if the download fails?
my $retryTimes = 2;

# Option: turn on DEBUG mode. Use this to reverse engineer this code if you are
# making changes or you are building your own youtube download script.
my $DEBUG=0;

#################################################################

# initialize global variables and sanitize the path
$ENV{PATH} = "/bin:/usr/bin:/usr/local/bin:/opt/local/bin";
my $prefix = "";
my $retry = 1;
my $retryCounter = 0;
my $user_url = "";
my $user_prefix = "";

# collect the URL from the command line argument
chomp($user_url = $ARGV[0]);
my $url = "$1" if ($user_url =~ m/^([a-zA-Z0-9\_\-\&\?\=\:\.\/]+)$/ or die "\nError: Illegal characters in YouTube URL\n\n" );

# declare the user defined file name prefix if specified
if (defined($ARGV[1])) {
chomp($user_prefix = $ARGV[1]);
$prefix = "$1" if ($user_prefix =~ m/^([a-zA-Z0-9\_\-\.\ ]+)$/ or die "\nError: Illegal characters in filename prefix\n\n" );
}

# if the url down below does not parse correctly we start over here
tryagain:

# make sure we are not in a tryagain loop by checking the counter
if ( $retryTimes < $retryCounter ) {
print "\n\n Stopping the loop because the retryCounter has exceeded the retryTimes option.";
print "\n The video may not be available at the requested resolution or may be copy protected.\n\n";
print "\nretryTimes counter = $retryTimes\n\n" if ($DEBUG == 1);
exit;
}

# download the html from the youtube page containing the page title and video
# url. The page title will be used for the local video file name and the url
# will be sanitized to download the video.
my $html = `curl --socks5-hostname 127.0.0.1:9150 -A "Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.0" -H "Accept-Language: en-us,en;q=0.5" -sS -L --compressed "$url"` or die "\nThere was a problem downloading the HTML page.\n\n";

# format the title of the page to use as the file name
my ($title) = $html =~ m/<title>(.+)<\/title>/si;
$title =~ s/[^\w\d]+/_/g or die "\nError: we could not find the title of the HTML page. Check the URL.\n\n";
$title = lc ($title);
$title =~ s/_youtube//ig;
$title =~ s/^_//ig;
$title =~ s/_amp//ig;
$title =~ s/_39_s/s/ig;
$title =~ s/_quot//ig;

# filter the URL of the video from the HTML page
my ($download) = $html =~ /"url_encoded_fmt_stream_map"(.*)/ig;

# Print the raw separated strings in the HTML page
#print "\n$download\n\n" if ($DEBUG == 1);

# This is where we loop through the HTML code and select the file type and
# video quality.
my @urls = split(',', $download);
OUTERLOOP:
foreach my $val (@urls) {
# print "\n$val\n\n";

if ( $val =~ /$fileType/ ) {
my @res = split(',', $resolution);
foreach my $ress (@res) {
if ( $val =~ /$ress/ ) {
print "\n html to url separation complete.\n\n" if ($DEBUG == 1);
print "$val\n" if ($DEBUG == 1);
$download = $val;
last OUTERLOOP;
}
}
}
}

# clean up by translating url encoding and removing unwanted strings
print "\n Start regular expression clean up...\n" if ($DEBUG == 1);
$download =~ s/\%([A-Fa-f0-9]{2})/pack('C', hex($1))/seg;
$download =~ s/sig=/signature=/g;
$download =~ s/\\u0026/\&/g;
$download =~ s/(type=[^&]+)//g;
$download =~ s/(fallback_host=[^&]+)//g;
$download =~ s/(quality=[^&]+)//g;
$download =~ s/&+/&/g;
$download =~ s/&$//g;
$download =~ s/%2C/,/g;
$download =~ s/%252F/\//g;
$download =~ s/^:"url=//g;
$download =~ s/\"//g;
$download =~ s/\?itag=22&/\?/;

# print the URL before adding the page title.
print "\n The download url string: \n\n$download\n" if ($DEBUG == 1);

# check for &itag instances and either remove extras or add an additional
my $counter1 = () = $download =~ /&itag=\d{2,3}/g;
print "\n number of itag= (counter1): $counter1\n" if ($DEBUG == 1);
if($counter1 > 1){ $download =~ s/&itag=\d{2,3}//; }
if($counter1 == 0){ $download .= '&itag=22' }

# save the URL starting with http(s)...
my ($youtubeurl) = $download =~ /(https?:.+)/;

# is the URL in youtubeurl the variable? If not, go to tryagain above.
if (!defined $youtubeurl) {
print "\n URL did not parse correctly. Let's try another mirror...\n";
$retryCounter++;
sleep 2;
goto tryagain;
}

# collect the title of the page
my ($titleurl) = $html =~ m/<title>(.+)<\/title>/si;
$titleurl =~ s/ - YouTube//ig;

# combine file variables into the full file name
my $filename = "unknown";
$filename = "$prefix$title.$fileType";

# url title to url encoding. all special characters need to be converted
$titleurl =~ s/([^A-Za-z0-9\+-])/sprintf("%%%02X", ord($1))/seg;

# combine the youtube url and title string
$download = "$youtubeurl\&title=$titleurl";

# Process check: Are we currently downloading this exact same video? Two of the
# same download processes will overwrite each other and corrupt the file.
my $running = `ps auwww | grep [c]url | grep -c "$filename"`;
print "\n Is the same file name already being downloaded? $running" if ($DEBUG == 1);
if ($running >= 1)
{
print "\n Already $running process, exiting." if ($DEBUG == 1);
exit 0;
};

# Print the long, sanitized youtube url for testing and debugging
print "\n The following url will be passed to curl:\n" if ($DEBUG == 1);
print "\n$download\n" if ($DEBUG == 1);

# print the file name of the video being downloaded for the user
print "\n Download: $filename\n\n" if ($retryCounter == 0 || $DEBUG == 1);

# print the itag quantity for testing
my $counter2 = () = $download =~ /&itag=\d{2,3}/g;
print "\n Does itag=1 ? $counter2\n\n" if ($DEBUG == 1);
if($counter2 < 1){
print "\n URL did not parse correctly (itag).\n";
exit;
}

# Background the script before the download starts. Use "ps" if you need to
# look for the process running or use "ls -al" to look at the file size and
# date.
fork and exit;

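# Download the video, resume if necessary. The list form of system() does not
# go through a shell, so every option and every value is its own list element
# and the values need no inner quotes.
system("curl", "--socks5-hostname", "127.0.0.1:9150", "-A", "Mozilla/5.0 (Windows NT 6.1; rv:45.0) Gecko/20100101 Firefox/45.0", "-H", "Accept-Language: en-us,en;q=0.5", "-o", "$filename", "--retry", "5", "-C", "-", "$download");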

# Print the exit error code
print "\n exit error code: $?\n" if ($DEBUG == 1);

# Exit Status: Check if the file exists and we received the correct error code
# from the curl system call. If the download experienced any problems the
# script will run again and try to continue the download until the retryTimes
# count limit is reached.

if( $? == 0 && -e "$filename" && ! -z "$filename" )
{
print "\n Finished: $filename\n\n" if ($DEBUG == 0);
}
else
{
print STDERR "\n FAILED: $filename\n\n" if ($DEBUG == 1);
$retryCounter++;
sleep $retryCounter;
goto tryagain;
}

#### EOF #####

Thanks for the help guys.