Batch-Downloading Images from a List of URLs (Getting Past Hotlink Protection)

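This post provides a Perl script that batch-downloads images from a list of URLs. It can get past hotlink protection by sending a Referer header with each request.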

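As for obtaining the image URLs: most browsers let you right-click an image and choose "Copy Image Link". Some sites use a regular pattern for their image URLs, which is the ideal case, because you can fill the series down in Excel to generate n URLs.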
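When the image URLs follow a numeric pattern, the series can also be generated with a few lines of Perl instead of Excel. The sketch below is only an illustration: `make_urls` is a hypothetical helper, and the base address and numbers are taken from the sample URLs used later in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build $count consecutive URLs of the form
# <base><number><extension>, starting at number $first.
sub make_urls {
    my ( $base, $first, $count, $ext ) = @_;
    my @urls;
    for my $i ( $first .. $first + $count - 1 ) {
        push @urls, $base . $i . $ext;
    }
    return @urls;
}

# Assumed pattern, copied from the sample URLs file
my @urls = make_urls( 'http://www.wzsky.net/img2013/uploadimg/20130906/',
    12162977, 3, '.jpg' );

print "$_\n" for @urls;
```

Redirecting the output to a file and adding a `referer = ...` line at the top gives a URLs file the download script can read.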

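Firefox can list the address of every image on a page (right-click an image and choose "View Image Info") and save them in bulk with "Save As". However, some image links, such as those on QQ空间 (Qzone), do not end in an image file extension. In that case Firefox saves every image under the same file name, so each one overwrites the previous one and only a single file remains. When a URL does not end in a file name with an extension, this script names the file with a running counter, so each of those images gets its own file.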
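The naming rule can be sketched on its own as follows. This is a simplified illustration rather than the script itself: `name_for_url` is a hypothetical helper, and the two `example.com` addresses are made-up stand-ins for links that carry no file extension.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $counter = 0;    # running index for URLs without a usable file name

# Return a local file name for a URL: the trailing "name.ext" segment if
# there is one, otherwise "file_<n>.jpg" with an incrementing counter.
sub name_for_url {
    my ($url) = @_;
    return $1 if $url =~ m{/([^/]+\.\w+)$};
    return 'file_' . $counter++ . '.jpg';
}

for my $url (
    'http://www.wzsky.net/img2013/uploadimg/20130906/12162977.jpg',
    'http://example.com/photo/view?id=1',    # made-up, no extension
    'http://example.com/photo/view?id=2',    # made-up, no extension
  )
{
    print name_for_url($url), "\n";
}
# prints:
# 12162977.jpg
# file_0.jpg
# file_1.jpg
```

Note that two URLs ending in the same "name.ext" would still map to the same local file under this rule; the counter only covers URLs without a recognizable file name.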

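The first time you run batch_download_images.pl without arguments, it generates a default URLs file, urls.txt (unless one already exists), and then prints its usage message. The generated file looks like this: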

# The referer is set to get around hotlink protection; any page on the target site will do
referer = http://www.wzsky.net/html/Website/Color/117958_11.html

# urls
http://www.wzsky.net/img2013/uploadimg/20130906/12162977.jpg
http://www.wzsky.net/img2013/uploadimg/20130906/12162978.jpg
http://www.wzsky.net/img2013/uploadimg/20130906/12162979.jpg
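Hotlink protection typically works by checking the Referer header of each image request, which is why the URLs file carries a `referer` setting. The following sketch, which assumes the HTTP::Request module (distributed alongside LWP) is installed, builds such a request without sending it so the header can be inspected. The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;

my $url     = 'http://www.wzsky.net/img2013/uploadimg/20130906/12162977.jpg';
my $referer = 'http://www.wzsky.net/html/Website/Color/117958_11.html';

# Build the request without sending it, so the headers can be inspected
my $request = HTTP::Request->new( GET => $url );
$request->header( Referer => $referer );

# Show the request line and headers as they would be sent
print $request->as_string;
```

To the server, a request built this way looks as if the image had been loaded from a page on its own site.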

The script is as follows:

#!/usr/bin/perl
# Name    : Batch download images by url file.
# Author  : Wei Shen
# Contact : shenwei356#gmail.com
# Site    : http://shenwei.me
# Date    : 2013-10-22
# Update  : 2013-10-22

use strict;
use warnings;
use LWP::UserAgent;

unless ( @ARGV >= 1 ) {
    &create_sample_urls_file() unless -e "urls.txt";
    die "\nUsage: $0 <URLs File> [<URLs File> ...]\n\n";
}

my $s = "file_";
my $n = 0;
my ( $file, $url, $f );

my $browser = LWP::UserAgent->new;
my $response;
my $referer;
my ( $para, $urls );

while ( @ARGV > 0 ) {
    $file = shift @ARGV;
    ( $para, $urls ) = &read_urls($file);
    $referer = $$para{referer};

    for $url (@$urls) {
        if ( $url =~ /\/([^\/]+?\.[\w]+?)$/ ) {
            $f = $1;
        }
        else {
            $f = "$s$n.jpg";
            $n++;
        }

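        # Send the Referer header only if the URLs file defines one
        my @headers = defined $referer ? ( Referer => $referer ) : ();
        $response = $browser->get( $url, @headers );

        # Skip failed downloads instead of saving an error page as an image
        unless ( $response->is_success ) {
            warn "Failed to download $url: ", $response->status_line, "\n";
            next;
        }

        open( my $out, '>', $f ) or die "Cannot write $f: $!\n";
        binmode($out);
        print $out $response->content;
        close($out);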

        print $f, "\n";
    }
}

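# Parse a URLs file into ( \%parameters, \@urls ).
# "key = value" lines set parameters (e.g. referer); every other
# non-blank, non-comment line is treated as a URL.
sub read_urls {
    my ($file) = @_;
    my $para   = {};
    my $urls   = [];
    open( my $in, '<', $file ) or die "File $file failed to open: $!\n";
    while (<$in>) {
        s/^\s+//;
        s/\s+$//;
        next if $_ eq ''    # blank line
          or /^#/;          # comment line
        s/\s+#.*//;         # strip a trailing comment

        # Anchored so that URLs containing "=" (query strings) are not
        # mistaken for parameter definitions
        if (/^(\w+)\s*=\s*(.*)$/) {
            warn "$1 was defined more than once\n" if defined $$para{$1};
            warn "value of $1 undefined!\n" if $2 eq '';
            $$para{$1} = $2;
        }
        else {
            push @$urls, $_;
        }
    }
    close $in;
    return ( $para, $urls );
}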

# Create sample urls file
sub create_sample_urls_file {
    # Non-interpolating heredoc: the sample text is written verbatim
    my $content = <<'URL';
# The referer is set to get around hotlink protection; any page on the target site will do
referer = http://www.wzsky.net/html/Website/Color/117958_11.html

# urls
http://www.wzsky.net/img2013/uploadimg/20130906/12162977.jpg
http://www.wzsky.net/img2013/uploadimg/20130906/12162978.jpg
http://www.wzsky.net/img2013/uploadimg/20130906/12162979.jpg

URL
    open( my $out, '>', 'urls.txt' )
      or die "Failed to create default url file: $!\n";
    print $out $content;
    close $out;
}

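This script is shared for learning and discussion only. Please respect copyright and do not misappropriate image resources from other people's websites.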