Monday, September 12, 2005

"Fetch! Good Spidey!" *pat* *pat*

Quick definitions:
Spider/crawler: A program that browses (usually) multiple webpages for you and finds the content you ask it for.
Scraper: A program that "scrapes" data of a certain type off of a webpage. If you download a picture you are "scraping" that picture. Automating the process means you can let the computer get it for you.

Ok, so my most recent project has been this spider/scraper, Spiderfetch.pl. It was originally intended to scrape manga off of webservers, but it had the (obvious) functionality to pull porn off of free servers. I found it necessary to add countermeasures for some of the obfuscation methods webmasters were implementing (they really don't want you to scrape their content :-P ).

I later incorporated the ability to follow links and search the linked pages for the content you want. There are a lot of pages that sort, categorize, and link directly to sample/free content. This makes your free porn collection endeavors considerably easier. However, many "link pages" run each link through some form of process that counts how many times it is clicked, so they can get stats on what most visitors prefer to see and where they came from. This also serves as a form of link obfuscation that must be circumvented, so I added the functionality to deal with these "processed links". I still can't follow unobvious links, but that functionality should be coming soon!

Since I had already built in the ability to match different file extensions, it could already pick up movies in different formats, mp3s, whatever you want. I've yet to add the functionality to pick up content on the page itself (it only picks up content linked from the page), but that will come soon too.
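For the curious, here is roughly what the "collect linked files by extension" and "processed links" ideas look like. This is not Spiderfetch.pl: it's a minimal sketch in Python rather than Perl, using only the standard library. The names `LinkCollector`, `unwrap_processed_link` and `linked_files` are made up for illustration, and the guess that a click counter carries the real target in a query parameter called `url`, `u` or `link` is an assumption; real link pages vary.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def unwrap_processed_link(url, param_names=("url", "u", "link")):
    """Return the target hidden in a click-counter link's query string.

    The parameter names are an assumption. If none of them holds an
    absolute http(s) URL, the link is returned unchanged.
    """
    query = parse_qs(urlparse(url).query)  # parse_qs also percent-decodes
    for name in param_names:
        for value in query.get(name, []):
            if value.startswith(("http://", "https://")):
                return value
    return url


def linked_files(html, base_url, extensions=(".jpg", ".mp3")):
    """List absolute URLs linked from the page whose path ends in one of the extensions."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href in parser.hrefs:
        # Resolve relative links first, then look through any counter link.
        target = unwrap_processed_link(urljoin(base_url, href))
        if urlparse(target).path.lower().endswith(tuple(extensions)):
            found.append(target)
    return found
```

Feed `linked_files` the HTML of a page plus that page's URL and it hands back the list of files to download. Actually fetching them, and following links to further pages, is left out here.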

But does it work???

Currently, my girlfriend and I *actually* have more porn than we know what to do with.

Impressive? Yes.

I even added more functionality to circumvent spider prevention tools/forms/etc. After a while it became less about the porn and more about the programming challenge... but that's how most of my projects work out anyway!

My friend was (understandably) interested in the prospect of the script, and asked that I send him a copy. I had just worked through a tough site and had a version of the script hard-coded for it, and I found it amusing that instead of trading porn, I was providing the means to obtain the porn I had. Know what I mean? Kind of like BitTorrent: you don't send the file, you send the ability to obtain the file.

Essentially I've reduced 100MB of data into a 3K Perl script that he can run, or not run, at his leisure to obtain the mass of data. Cool eh?
Of course, unlike BitTorrent, this was put together without any concern for the host's bandwidth, but... then... hey... free porn! :-D

I've been meaning to get some hosting, but I don't care enough to. And since some of the characters in the script get fuxored with the HTML formatting, I can't just post the plain text. Whatever.
