Fun Detagger Script in PHP
Question:
Anyone know of any free/shareware utility that will strip out
*all* the tags from HTML *other* than the table structure ?
I keep needing to cut stuff out of web pages
(e.g. my overture results pages)
and paste them into MS Excel or CSV or basic, basic
HTML.
Answer: Sure! With PHP:
<?php
// a fun detagger script
// lives at: https://artlung.com/lab/php/detagger/
// created by Joe Crawford
// feel free to use and modify
// December 18 2002
// https://artlung.com/
// main functions used:
// http://www.php.net/manual/en/function.strip-tags.php
// http://www.php.net/manual/en/function.file-get-contents.php
/* **************************************** */
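// this is a version of file_get_contents()
// for older versions of PHP (before 4.3.0) that lack it.
// the function_exists() check skips it on newer versions,
// where redeclaring a built-in function is a fatal error.
if (!function_exists('file_get_contents')) {
    function file_get_contents($filename) {
        $fp = @fopen($filename, "r");
        if (!($fp)) {
            // match the built-in, which returns false on failure
            return false;
        }
        // start with an empty string so .= has something to append to
        $temp = "";
        while (!feof($fp)) {
            $temp .= fread($fp, 4096);
        }
        fclose($fp);
        return $temp;
    }
}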
/* **************************************** */
/* **************************************** */
// what url or file do we want to parse?
// (reading a URL requires allow_url_fopen to be enabled in php.ini)
$url = "http://www.google.com/";
// what tags do we want to save?
$tags_to_keep = "<table><tr><td><th>";
// grab the url
$string = file_get_contents ($url);
// stop here if nothing could be read
if (!$string) {
    die("Could not read " . $url);
}
// strip the tags
$cleaned_string = strip_tags ($string, $tags_to_keep);
// print the results
print $cleaned_string;
/* **************************************** */
?>
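
To see what the script keeps and what it throws away without fetching a live page, here is a minimal sketch that runs the same strip_tags() call on a small inline string. The sample markup in $html is made up for illustration; only strip_tags() and the $tags_to_keep list come from the script above.

```php
<?php
// Hypothetical sample page, invented for this example.
$html = '<html><body><h1>Results</h1>'
      . '<table border="1"><tr><th>Term</th><th>Bid</th></tr>'
      . '<tr><td><a href="/x"><b>php</b></a></td><td>0.10</td></tr></table>'
      . '<p>Footer</p></body></html>';

// Same allow-list as the detagger script.
$tags_to_keep = '<table><tr><td><th>';

// Every tag not in the allow-list is removed; the text inside it stays.
$cleaned = strip_tags($html, $tags_to_keep);

print $cleaned;
// prints: Results<table border="1"><tr><th>Term</th><th>Bid</th></tr><tr><td>php</td><td>0.10</td></tr></table>Footer
```

Note that the border attribute survives: strip_tags() leaves attributes on allowed tags untouched, so anything like style or onclick on a kept tag comes through as well. Text outside the table ("Results", "Footer") also stays, since only the tags are stripped, not their contents.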