Fun Detagger Script in PHP
Question:
Anyone know of any free/shareware utility that will strip out *all* the tags from HTML *other* than the table structure?
I keep needing to cut stuff out of web pages (e.g. my Overture results pages) and paste it into MS Excel, CSV, or basic, basic HTML.
Answer: Sure! With PHP:
<?php
// a fun detagger script
// lives at: http://artlung.com/lab/php/detagger/
// created by Joe Crawford
// feel free to use and modify
// December 18 2002
// http://artlung.com/

// main functions used:
// http://www.php.net/manual/en/function.strip-tags.php
// http://www.php.net/manual/en/function.file-get-contents.php

/* **************************************** */
// this is a version of file_get_contents()
// for older versions of PHP (before 4.3.0) that lack it.
// the function_exists() check skips it when PHP already
// provides the function, so it does not have to be removed by hand.
if (!function_exists('file_get_contents')) {
    function file_get_contents($filename)
    {
        $fp = @fopen($filename, "r");
        if (!$fp) {
            return false;
        }
        $temp = "";
        while (!feof($fp)) {
            $temp .= fread($fp, 4096);
        }
        fclose($fp);
        return $temp;
    }
}
/* **************************************** */

/* **************************************** */
// what url or file do we want to parse?
// (remote URLs need allow_url_fopen enabled)
$url = "http://www.google.com/";

// what tags do we want to save?
$tags_to_keep = "<table><tr><td><th>";

// grab the url
$string = file_get_contents($url);

// stop if the url could not be read
if ($string === false) {
    die("Could not read " . $url);
}

// strip the tags
$cleaned_string = strip_tags($string, $tags_to_keep);

// print the results
print $cleaned_string;
/* **************************************** */
?>
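To see what strip_tags() keeps without fetching a remote page, here is a minimal sketch that runs on an inline HTML string. The helper name detag_keep_tables() and the sample markup are made up for illustration and are not part of the script above.

```php
<?php
// Hypothetical helper (an assumption, not from the original script):
// strip everything except table markup.
function detag_keep_tables($html)
{
    // The second argument lists the tags strip_tags() leaves in place.
    return strip_tags($html, "<table><tr><td><th>");
}

// Made-up sample input for illustration.
$sample = '<div><h1>Results</h1>'
        . '<table border="1"><tr><td><b>foo</b></td>'
        . '<td><a href="#">bar</a></td></tr></table></div>';

print detag_keep_tables($sample);
// prints: Results<table border="1"><tr><td>foo</td><td>bar</td></tr></table>
```

Note that text outside the table (here "Results") survives, since only the tags are removed, and that attributes on the kept tags, such as border="1", are left untouched.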