Sam-I-Am on Web Development

Sam Foster on the web and web-ish software development

Tuesday, February 18, 2003

HTML::Index on CPAN.
This is a set of Perl modules for creating, storing, and searching indexes of HTML files, and it looks like a handy starting point for my HTML indexer. It seems I might be able to subclass it to use my own parser, storing the markup and throwing out the content. Then I could search for things like: which pages on site.com are still using font tags? Which pages call such-and-such stylesheet or JavaScript library?
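To make the idea concrete, here is a rough sketch of indexing markup instead of content. It is written in Python with the standard-library html.parser, not with HTML::Index, and it says nothing about how that module works. The names MarkupCollector and build_index, the sample pages, and the shape of the input are all made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MarkupCollector(HTMLParser):
    """Records tag names and linked resources; text content is ignored."""

    def __init__(self):
        super().__init__()
        self.tags = set()
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        self.tags.add(tag)
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower()
        if tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.resources.add(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self.resources.add(attrs["src"])


def build_index(pages):
    """pages maps a page name to its HTML source (hypothetical input shape)."""
    tag_index = defaultdict(set)       # tag name -> pages using it
    resource_index = defaultdict(set)  # stylesheet/script URL -> pages using it
    for name, html in pages.items():
        collector = MarkupCollector()
        collector.feed(html)
        collector.close()
        for tag in collector.tags:
            tag_index[tag].add(name)
        for resource in collector.resources:
            resource_index[resource].add(name)
    return tag_index, resource_index


# Made-up sample pages, just to exercise the sketch.
pages = {
    "index.html": '<html><head><link rel="stylesheet" href="site.css"></head>'
                  '<body><font size="2">Hi</font></body></html>',
    "about.html": '<html><head><script src="lib.js"></script></head>'
                  '<body><p>About</p></body></html>',
}
tag_index, resource_index = build_index(pages)
print(sorted(tag_index["font"]))         # pages still using font tags
print(sorted(resource_index["lib.js"]))  # pages calling lib.js
```

Both indexes are plain inverted indexes, so "which pages use X" is a single dictionary lookup.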
The real trick is going to be getting useful search results for tag combinations.
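One possible reading of "tag combinations" is nesting, for example font tags that sit inside table cells. The sketch below assumes that reading. It is Python with the standard-library html.parser, and NestingCollector, pages_with_nesting, and the sample pages are hypothetical names and data of my own. It keeps a stack of open elements and records every (ancestor, descendant) pair it sees.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class NestingCollector(HTMLParser):
    """Collects (ancestor, descendant) tag pairs for one page."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, outermost first
        self.pairs = set()  # every (ancestor, descendant) pair seen

    def handle_starttag(self, tag, attrs):
        for ancestor in self.stack:
            self.pairs.add((ancestor, tag))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: ignore stray end tags, and pop back to
        # the most recent matching open tag when inner tags were left open.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def pages_with_nesting(pages, outer, inner):
    """Return the sorted names of pages where `inner` appears inside `outer`."""
    hits = []
    for name, html in pages.items():
        collector = NestingCollector()
        collector.feed(html)
        collector.close()
        if (outer, inner) in collector.pairs:
            hits.append(name)
    return sorted(hits)


# Made-up sample pages, just to exercise the sketch.
pages = {
    "old.html": "<table><tr><td><font color='red'>x</font></td></tr></table>",
    "new.html": "<table><tr><td>x</td></tr></table><p><font>y</font></p>",
}
print(pages_with_nesting(pages, "td", "font"))
```

Storing every ancestor pair is wasteful on deep pages, but it keeps the query itself trivial. Whether the results are actually useful is still the open question.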
And don't forget I want to offer a download of the results in CSV or XLS format!
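The CSV half of that is simple to sketch. This uses Python's standard csv module. The (page, tag, count) row layout, the results_to_csv name, and the sample numbers are assumptions for illustration, not real results. XLS would need a third-party library, so it is left out here.

```python
import csv
import io


def results_to_csv(results):
    """Serialize search results to CSV text.

    results is an iterable of (page, tag, count) rows; that shape is a
    hypothetical one chosen for this sketch.
    """
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.writer(buffer)
    writer.writerow(["page", "tag", "count"])  # header row
    writer.writerows(results)
    return buffer.getvalue()


# Made-up sample rows, just to exercise the sketch.
results = [("index.html", "font", 12), ("about.html", "font", 3)]
csv_text = results_to_csv(results)
print(csv_text)
```

The returned string could be written to a file or sent as the body of a download response.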
