ARTLUNG LAB

Fun Detagger Script in PHP

Question:

Anyone know of any free/shareware utility that will strip out *all* the tags from HTML *other* than the table structure?

I keep needing to cut stuff out of web pages (e.g. my Overture results pages) and paste it into MS Excel, CSV, or very basic HTML.

Answer: Sure! With PHP:

<?php

// a fun detagger script
// lives at: https://artlung.com/lab/php/detagger/
// created by Joe Crawford
// feel free to use and modify
// December 18 2002
// https://artlung.com/

// main functions used:
// http://www.php.net/manual/en/function.strip-tags.php
// http://www.php.net/manual/en/function.file-get-contents.php



/* **************************************** */
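// this is a version of file_get_contents()
// for older versions of PHP (before 4.3.0) that lack it.
// the function_exists() check skips it on newer versions,
// where redefining a built-in function is a fatal error.
if (!function_exists('file_get_contents')) {
   function file_get_contents($filename) {
      $fp = @fopen($filename, "r");
      if (!$fp) {
         return false;
      }
      $temp = "";
      while (!feof($fp)) {
         $temp .= fread($fp, 4096);
      }
      fclose($fp);
      return $temp;
   }
}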
/* **************************************** */



/* **************************************** */
// what url or file do we want to parse?
$url = "http://www.google.com/";

// what tags do we want to save?
$tags_to_keep = "<table><tr><td><th>";

// grab the url (fetching over http needs allow_url_fopen enabled)
$string = file_get_contents($url);
if (!$string) {
   die("Could not read $url");
}

// strip the tags
$cleaned_string = strip_tags ($string, $tags_to_keep);

// print the results
print $cleaned_string;
/* **************************************** */

?>
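
One thing worth knowing about strip_tags(): it removes the <script> and <style> tags themselves but leaves the text between them, so JavaScript and CSS can end up in the output as plain text. The sketch below is one possible way around that, not part of the original script. The helper name detag_keep_tables() and the sample markup are made up for illustration, and the regular expression is a rough filter rather than a real HTML parser.

```php
<?php

// Hypothetical helper (an assumption, not from the original script):
// keeps only table markup from an HTML string.
function detag_keep_tables($html, $tags_to_keep = "<table><tr><td><th>") {
   // drop whole <script>...</script> and <style>...</style> blocks first,
   // because strip_tags() would keep the text inside them
   $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);

   // then strip every tag that is not in the keep list
   return strip_tags($html, $tags_to_keep);
}

// made-up sample page for illustration
$sample = '<html><head><style>td { color: red; }</style></head><body>'
        . '<table border="1"><tr><td><b>Term</b></td>'
        . '<td><a href="#">42</a></td></tr></table>'
        . '<script>alert("hi");</script></body></html>';

print detag_keep_tables($sample);
// prints: <table border="1"><tr><td>Term</td><td>42</td></tr></table>

?>
```

Note that strip_tags() leaves the attributes on the tags it keeps (border="1" above), so pasted tables may still carry some presentational markup.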