Metatag Extractor Script in PHP/Perl/Python/Shell Script


We are looking for a script that extracts metatags from all pages of a website. The metatags we are interested in are:

– Title

– Keywords

– Description
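
As a rough illustration of the extraction step, here is a minimal sketch using only Python's standard library `html.parser`. The names `MetaTagParser` and `extract_metatags` are made up for this example, and it assumes the keywords and description live in `<meta name="..." content="...">` tags; a real site may need more forgiving handling.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the first <title> text and the keywords/description meta tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.keywords = ""
        self.description = ""
        self._in_title = False
        self._title_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._title_done:
            self._in_title = True
        elif tag == "meta":
            # Attribute names arrive lowercased; values may be None.
            attr = {k: (v or "") for k, v in attrs}
            name = attr.get("name", "").lower()
            content = attr.get("content", "").strip()
            if name == "keywords" and not self.keywords:
                self.keywords = content
            elif name == "description" and not self.description:
                self.description = content

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._title_done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metatags(html):
    """Return title, keywords and description found in an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title.split()),  # collapse whitespace
        "keywords": parser.keywords,
        "description": parser.description,
    }
```

Missing tags simply come back as empty strings, so every page still produces a complete row.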

The inputs to this script would be:

– Website’s URL (e.g. http://www.domain.com)

– Option to extract metatags from all pages of the website or only from the single URL given in the previous input

– Directories to exclude (e.g. http://www.domain.com/directory/ , http://www.domain.com/directory_2/)
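
One possible way to apply the directory exclusions is a simple path-prefix check, sketched below. The function name `is_allowed` is hypothetical, and the sketch assumes excluded directories are given as full URLs (as in the example above) and that only pages on the same host as the website's URL should be visited.

```python
from urllib.parse import urlsplit


def is_allowed(url, base_url, excluded_dirs):
    """Return True if url is on the same host as base_url and outside every excluded directory."""
    target = urlsplit(url)
    base = urlsplit(base_url)
    if target.scheme not in ("http", "https"):
        return False
    if target.netloc.lower() != base.netloc.lower():
        return False
    path = target.path or "/"
    for excluded in excluded_dirs:
        prefix = urlsplit(excluded).path
        if not prefix.endswith("/"):
            prefix += "/"
        # Match pages inside the directory, and the directory itself
        # when it is requested without the trailing slash.
        if path.startswith(prefix) or path + "/" == prefix:
            return False
    return True
```

Comparing against the prefix with its trailing slash keeps `/directory_2/` from being excluded just because `/directory/` is.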

The output of this script will be a CSV file containing the following columns, in this order:

URL, Title, Keywords, Description
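
The column order above could be produced with Python's standard `csv` module, which also takes care of quoting values that contain commas (common in keyword lists). This is only a sketch; `write_csv` and `rows_to_csv_text` are names invented for the example, and each row is assumed to be a dict keyed by the column names.

```python
import csv
import io

FIELDS = ["URL", "Title", "Keywords", "Description"]


def write_csv(rows, out):
    """Write a header line plus one line per row to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Missing columns become empty cells.
        writer.writerow({field: row.get(field, "") for field in FIELDS})


def rows_to_csv_text(rows):
    """Convenience wrapper that returns the CSV as a string."""
    buffer = io.StringIO()
    write_csv(rows, buffer)
    return buffer.getvalue()
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, for example `open("metatags.csv", "w", newline="", encoding="utf-8")`.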

Additional inputs:

– Delay between each GET

– Maximum number of URLs to crawl (for testing purposes)

– Option to export only the URLs to the CSV file, without metatags (for testing purposes)
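
To make the inputs concrete, here is one way they might map onto a command line, sketched with Python's `argparse`. All flag names (`--single`, `--exclude`, `--delay`, `--max-urls`, `--urls-only`, `--output`) and their defaults are assumptions made for this example, not part of the requirements.

```python
import argparse


def build_parser():
    """Build a command-line parser covering the inputs listed above."""
    parser = argparse.ArgumentParser(
        description="Extract title, keywords and description metatags into a CSV file."
    )
    parser.add_argument("url", help="website URL, e.g. http://www.domain.com")
    parser.add_argument("--single", action="store_true",
                        help="process only the given URL instead of crawling all pages")
    parser.add_argument("--exclude", action="append", default=[],
                        help="directory URL to skip; may be given several times")
    parser.add_argument("--delay", type=float, default=1.0,
                        help="seconds to wait between GET requests")
    parser.add_argument("--max-urls", type=int, default=None,
                        help="stop after this many URLs (for testing)")
    parser.add_argument("--urls-only", action="store_true",
                        help="export only the URLs, without metatags (for testing)")
    parser.add_argument("--output", default="metatags.csv",
                        help="path of the CSV file to write")
    return parser
```

A test run limited to ten URLs might then be started with arguments such as `http://www.domain.com --exclude http://www.domain.com/directory/ --delay 0.5 --max-urls 10 --urls-only`.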

The script must be able to cope with crawling about 2,000 pages on the website. If crawling and extracting these details for 2,000 URLs takes excessively long, the script is not suitable for our use.

You can use any programming language, such as PHP, Perl or Python, or a shell scripting language.

We would like to avoid a database unless you think it is absolutely necessary.
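
For a site of about 2,000 pages, the crawl state plausibly fits in memory, so a database may not be needed. The sketch below keeps a queue and a set of seen URLs in memory and crawls breadth-first. The names `LinkParser` and `crawl` are hypothetical, and `fetch` is an assumed caller-supplied function that returns the page's HTML as a string, or `None` on failure.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_urls=None):
    """Breadth-first crawl of one host; returns the URLs in visiting order."""
    host = urlsplit(start_url).netloc.lower()
    queue = deque([start_url])
    seen = {start_url}  # in-memory replacement for a database table
    visited = []
    while queue and (max_urls is None or len(visited) < max_urls):
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            absolute = urldefrag(urljoin(url, href)).url
            parts = urlsplit(absolute)
            same_site = parts.scheme in ("http", "https") and parts.netloc.lower() == host
            if same_site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Passing `fetch` in as a parameter makes it easy to test the crawl logic against a dictionary of fake pages before pointing it at a live server.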

The target website’s server may restrict how many requests can be sent in a short time, so you will need to include a delay between each GET.
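
The delay between requests could be enforced with a small helper like the one below, which waits only as long as needed to keep a minimum gap between consecutive GETs. The class name `Throttle` is made up for this sketch; the `clock` and `sleep` parameters are there so the behaviour can be tested without real waiting.

```python
import time


class Throttle:
    """Ensures at least `delay` seconds pass between consecutive calls to wait()."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each GET is enough. Note that the delay sets a floor on total run time: with a 1-second delay, 2,000 pages need at least about 2,000 seconds (roughly 33 minutes), so the delay should be chosen with the performance requirement above in mind.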

We need this script as soon as possible.

The script must be fully tested against the above-mentioned requirements.

Provide your best solution, stating the language you will use and the cost of the final script.

If you already have a similar script, you are welcome to show us a working example.

Apply using the job application form on this page.
