Foros del Web - How do I get the HTML of the current node? (DOMXPath)

InKarC · #1 · 28/11/2010, 14:18

OK, let me be clear: I'm a complete beginner in PHP, and this question will probably seem very silly to anyone experienced...

Here is my problem: I need a PHP script that grabs all the "img" tags from a site. I found a script that "almost" does it. I say "almost" because it only grabs the "src" attribute, and I need the whole HTML of each tag.

$href is an element returned by a DOMXPath query.
The crucial part must be this:

Code:

$url = $href->getAttribute('src');
// this returns only the src attribute, e.g. 'http://www.google.com.co/logos/classicplus.png'
_______________________
_______________________

I need it to be something like this (this does not work, it is only an example of what I want to achieve):

Code:

$url = $href->getThisHtml();

// should return the whole tag, e.g. '<img src="http://www.google.com.co/logos/classicplus.png" alt="logo google" />'
_______________________
_______________________
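
A minimal sketch of one way to get the whole tag, assuming PHP 5.3.6 or later, where DOMDocument::saveHTML() accepts a node argument. The helper name outerHtml() is hypothetical (it is not part of the DOM extension), and the inline HTML string stands in for a fetched page:

```php
<?php
// Hypothetical helper: returns the markup of a node, including the tag itself.
function outerHtml(DOMNode $node)
{
    // Passing a node to saveHTML() serializes only that node, not the whole document.
    return $node->ownerDocument->saveHTML($node);
}

// Sample markup assumed for illustration.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><img src="http://www.google.com.co/logos/classicplus.png" alt="logo google"></body></html>');

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img') as $img) {
    // Prints the complete <img ...> tag with all of its attributes.
    echo outerHtml($img), "\n";
}
```

On older PHP versions, $node->ownerDocument->saveXML($node) should serve the same purpose, although it serializes the tag as XML (self-closing form).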

For anyone interested, here is the full code:

Code:

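<?php
// Open the MySQL connection and select the "test" database.
// Uses mysqli, because the old mysql_* functions were removed in PHP 7.
function Conectarse()
{
   // Make mysqli return false on failure instead of throwing (the default since PHP 8.1).
   mysqli_report(MYSQLI_REPORT_OFF);
   if (!($link = mysqli_connect("localhost", "root", "")))
   {
      //echo "Error connecting to the database.";
      exit();
   }
   if (!mysqli_select_db($link, "test"))
   {
      //echo "Error selecting the database.";
      exit();
   }
   return $link;
}

$link = Conectarse();
echo "Database connection established.<br>";

// Store one URL together with the page it was gathered from.
function storeLink($url, $gathered_from) {
	global $link;
	// A prepared statement keeps quotes in scraped URLs from breaking the query.
	$stmt = mysqli_prepare($link, "INSERT INTO geted (url, gathered_from) VALUES (?, ?)");
	if (!$stmt) {
		die('Error, insert query failed');
	}
	mysqli_stmt_bind_param($stmt, 'ss', $url, $gathered_from);
	mysqli_stmt_execute($stmt) or die('Error, insert query failed');
}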

$target_url = "http://www.deviantart.com";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
	echo "<br />cURL error number:" .curl_errno($ch);	
	echo "<br />cURL error:" . curl_error($ch);
	exit;
}

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the <img> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//img");

for ($i = 0; $i < $hrefs->length; $i++) {
	$href = $hrefs->item($i);
	$url = $href->getAttribute('src');
	storeLink($url,$target_url);
	echo "<br />Link stored: $url";
}
?>
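
As a rough sketch of how the loop above could keep the complete tag next to each src value, the following leaves out the cURL and database steps and assumes a small inline HTML string in their place. It also swaps the @ operator for libxml_use_internal_errors(), which is a substitution made here and not part of the original script:

```php
<?php
// Stand-in for the page fetched with cURL (assumed sample markup).
$html = '<html><body><p>Logos</p>'
      . '<img src="a.png" alt="first"><img src="b.png" alt="second">'
      . '</body></html>';

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tags = array();
foreach ($xpath->query('/html/body//img') as $img) {
    // Key: the src attribute. Value: the complete tag as markup.
    $tags[$img->getAttribute('src')] = $dom->saveHTML($img);
}

foreach ($tags as $src => $tag) {
    // Escape the tag so the browser shows it as text instead of rendering the image.
    echo $src, ' => ', htmlspecialchars($tag), "<br />\n";
}
```

In the full script, the value of $tag could be passed to storeLink() in place of (or alongside) $url, depending on what the geted table is meant to hold.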