Monday, July 4, 2011

Script Crawler Python - Web Crawler Security Tool

The Web Crawler is a Python-based tool that automatically spiders a web site. It also looks for directory indexing and re-crawls any directories with indexing enabled to list every file in them. There is also an option to download the files found, and the results can be fed to FOCA or other software to extract metadata from the files.
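As a rough illustration of the directory-indexing check described above, the sketch below looks for an Apache-style ‘Index of /’ title in a fetched page. The function name and the heuristic itself are assumptions for illustration, not taken from the tool's source:

```python
import re

# Assumed heuristic: Apache-style directory listings start their <title>
# with "Index of /". Other servers may use different markup.
INDEX_RE = re.compile(r"<title>\s*Index of /", re.IGNORECASE)

def is_directory_index(html):
    """Return True if the page looks like a server-generated directory listing."""
    return bool(INDEX_RE.search(html))

sample = "<html><head><title>Index of /backups</title></head><body>...</body></html>"
print(is_directory_index(sample))  # True
```

A real crawler would run a check like this on every response and queue matching directories for a second pass to enumerate the listed files.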

The current stable version is 0.4, and its main features are:

Crawls http and https web sites.
Crawls http and https web sites on non-standard ports.
Uses regular expressions to find ‘href’ and ‘src’ HTML attributes, as well as links embedded in page content.
Identifies relative links.
Identifies domain-related emails.
Identifies directory indexing.
Detects references to URLs like ‘file:’, ‘feed=’, ‘mailto:’, ‘javascript:’ and others.
Supports CTRL-C to stop the current crawler stage and continue working.
Identifies file extensions (zip, swf, sql, rar, etc.).
Downloads files to a directory:
Download every important file (images, documents, compressed files, etc.),
or download only specified file types,
or download a predefined set of file types (like ‘document’ files: .doc, .xls, .pdf, .odt, .gnumeric, etc.).
Sets a maximum number of links to crawl (default: 5000 URLs).
Follows redirections issued via HTML, JavaScript ‘location’ changes, and HTTP response codes.
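A minimal sketch of how several of the features above could fit together in Python: regex-based ‘href’/‘src’ extraction, relative-link resolution, filtering of schemes like ‘mailto:’ and ‘javascript:’, and domain-related email harvesting. The regexes and function names here are illustrative assumptions, not the tool's actual implementation:

```python
import re
from urllib.parse import urljoin

# Assumed patterns for illustration only.
HREF_SRC_RE = re.compile(r"""(?:href|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SKIP_SCHEMES = ("mailto:", "javascript:", "file:", "feed:")

def extract_links(base_url, html):
    """Pull href/src values, skip non-crawlable schemes, resolve relative links."""
    links = []
    for raw in HREF_SRC_RE.findall(html):
        if raw.lower().startswith(SKIP_SCHEMES):
            continue
        links.append(urljoin(base_url, raw))  # handles relative links
    return links

def domain_emails(domain, html):
    """Return de-duplicated emails belonging to the target domain."""
    return sorted(e for e in set(EMAIL_RE.findall(html))
                  if e.lower().endswith("@" + domain))

html = '''<a href="/docs/report.pdf">report</a>
<img src="logo.png">
<a href="mailto:admin@example.com">mail</a>
Contact: admin@example.com, other@elsewhere.org'''

print(extract_links("http://example.com/dir/", html))
print(domain_emails("example.com", html))
```

Note how `urljoin` resolves ‘/docs/report.pdf’ against the site root but ‘logo.png’ against the current directory, which is the behavior a crawler needs when identifying relative links.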

Note: This crawler can be used with Domain Analyzer Security Tool. (See Domain Analyzer)

http://sourceforge.net/projects/webcrawler-py/
