Foros del Web - View Single Post

andreiya · #7 · 12/01/2015, 14:49

Quote:

Originally posted by enlinea777

Hi Andreiya

Here is the code for parsing Google results; I hope it helps.

Here is the live example:

http://creaelicita.cl/ayuda/foro_1117870_v1.php?texto=gatos

Use the texto variable to search for whatever you like.

PHP code:

  <?php
include('simple_html_dom.php');

        $handler = curl_init(); // start a cURL session from PHP
        curl_setopt($handler, CURLOPT_URL, 'https://www.google.com/search?q=gatitos&ie=utf-8&oe=utf-8');
        curl_setopt($handler, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
        curl_setopt($handler, CURLOPT_USERAGENT, 'Mozilla/4.0 (PDA; PalmOS/sony/model prmr/Revision:1.1.54 (en)) NetFront/3.0'); // IMPORTANT: POSING AS A PALM DEVICE GETS YOU A CLEANER RESULT
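        // OTHER EXTRAS
        // cURL has no CURLOPT_HOTS, CURLOPT_ACCEPT or CURLOPT_ACCEPTLANGUAGE constants;
        // extra request headers go through CURLOPT_HTTPHEADER (Host is derived from the URL).
        curl_setopt($handler, CURLOPT_HTTPHEADER, array(
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: es-cl,es;q=0.8,en-us;q=0.5,en;q=0.3',
            'Connection: keep-alive',
            'Pragma: no-cache',
            'Cache-Control: no-cache',
        ));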
        curl_setopt($handler,CURLOPT_COOKIE,'NID=67=nHypWZWZS_T-SlSSuXAJTMhZB4gIvYOtOUP0vjoOPGjE8HYSK0ONSY1luZeWkIvH7iqpVdxH6Xf0BEcTqIj1Y6rrZJQ8z8O3HdX2gFekYHs-Eh0dJ6d_aw8Lq2VHtFwlBLVxEk5P4t77cyvTNK7EeEDrmUtrgoBFEX72EDgPHvJ42GJshGlaYLTa-Gr2pguDypLsCNRySdDN5dyxSnIn; SID=DQAAAAICAABeps3N4B3sU4HJ1vIE97a98emN7n94wpIoW5MJRRMNHexTSzoY_NfXTV8CIZ2NW6vnC0K2G8Sq7YqEWI_pEEuHPvWA4CkLxMFMC_An3BkmgX9ZD9MywaLuUGlMF-3Hyf8PgJbc6NYTqaN1Qo6duT3aMplkRVmVMHVdPfq46VffGIq0ayEzcd-uYmTjE7ckT0rpb-RqrcuGUyvqUZPnvFbCwDqcGtSd98bEj6bfwDyG3Ap6apF_v6DmQan-eNkIE1FEXzHB0bXd_nKROCm4FKJrz1tONYJGRVP43XHbgv89UkEyF9OAgiZvUB3PyFUU4ihkv0raYQZBcLiiY6NvRPANpKfN885MNwaFPHLHR1woTELjrorj-YASvVxr8ai3blqs7nFI8fy1S_DwWe_weJxn6PvZsX757xh0GQBMOfKznHlx6XFNKaVmwlhdhwlVY3UfJO-svRhpOSCReeLzFlhye19A1sSY-HCTlBEZVI6eXKhCkb5YzTr5DZM79bC2Px45RBu39M4i8znhjIgNe3a6xC5P6crGjAkes6qx7LEdc9keJB27X0ZN5e_S6wccrp9zbMGn9jJ03hW47A4FISDqhyETipj40sXIx0xEDungKl9U_4cGbAMeyWdjwc-HqVu4jSHNw5mOwNloTY_zFWlAdXnalvNzRoBEiRWHHh59Z9rtvxnekr33q7t3xMqo8Gg; HSID=AbEEV6EbfNcqLumK2; SSID=ApgzPR1FFU_d2zQU7; APISID=HugUvpV0VR3gJMIQ/AVyBODG-BvQpt5r7O; SAPISID=oCYqAoqsB6_lPdrh/AFw8b_LEX6RubiuAr; PREF=ID=b904718712cc0d24:U=39272d8d77cdf325:FF=0:LD=es-419:NR=100:TM=1407341101:LM=1421081382:GM=1:SG=1:S=Iy_OMhncF0Buyype; GOOGAPPUID=739; enabledapps.uploader=0; llbcs=0'); 
        // Connection, Pragma and Cache-Control are request headers, not cURL options, and $ch was never defined;
        // headers like these are sent with CURLOPT_HTTPHEADER on $handler.
    $response = curl_exec($handler); // EXECUTE AND RETURN THE RESULT

    curl_close($handler); // CLOSE THE SESSION
     
    // SOME FILTERS FOR THE HTML
    // preg_replace is used throughout because ereg_replace was removed in PHP 7
    $response = preg_replace('/<script([^>]*)>([^<]*)<\/script>/', '', $response);
    // Strip presentational and scripting attributes from the tags
    $attributes = array('style', 'class', 'bgcolor', 'border', 'title', 'onDblClick', 'onClick',
                        'role', 'tabindex', 'aria-expanded', 'aria-haspopup', 'data-ved');
    foreach ($attributes as $attribute) {
        $response = preg_replace('/(<[^>]+) ' . $attribute . '=".*?"/i', '$1', $response);
    }
    // Remove bare formatting tags; the text inside them is kept
    $response = preg_replace('/<\/?(ol|ul|h.|span|b)>/i', '', $response);
    $response = preg_replace('/<br>/i', '', $response);
    $response = preg_replace('/<cite([^>]*)>([^<]*)<\/cite>/', '', $response);
    // Turn table header and body markup into plain rows and cells
    $response = preg_replace('/thead/', 'tr', $response);
    $response = preg_replace('/<\/th>/', '</td> ', $response);
    $response = preg_replace('/tbody/', 'tr', $response);
 
?><h1>Links</h1><?php
$html = str_get_html($response);  // USE simple_html_dom TO MAKE SEARCHING EASIER
$ret = $html->find('a'); // FIND THE LINKS

    foreach($ret as $element){ // LOOP OVER THE LINKS
        $link=$element->href;
        $link='http'.preg_replace('/http(.*)http/', '', $link); // REWRITE GOOGLE'S REDIRECT LINKS
        $texto=$element->plaintext; // LINK TEXT
        $q=substr($link,0,5); // FIRST 5 CHARACTERS, USED TO TELL GOOGLE'S OWN PAGES FROM SEARCH RESULTS

        if($q=='http:' && $texto!='Cached'){ // KEEP SEARCH RESULTS, SKIP GOOGLE'S OWN PAGES
        ?>
        <a href="<?=$link?>"><?=$texto?$texto:substr($link,0,(strpos(substr($link,8),"/")+8)) // ADVANCED CODE (DO NOT TOUCH :-)  )?></a><br>
        <?php
        }

}
?>
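
The link rewriting inside the loop is terse, so here is a minimal sketch of it pulled out into two helpers. The function names and the sample hrefs are assumptions made for illustration; the real markup Google returns may differ.

```php
<?php
// Hypothetical helper mirroring the href rewriting done inside the loop.
function clean_google_link($href)
{
    // Drop everything from the first "http" through the last "http",
    // then put a single "http" back in front.
    return 'http' . preg_replace('/http(.*)http/', '', $href);
}

// Hypothetical helper mirroring the filter in the if() condition.
function is_external_result($link, $text)
{
    return substr($link, 0, 5) == 'http:' && $text != 'Cached';
}

// Assumed href shapes, for illustration only.
$wrapped  = 'http://www.google.com/url?q=http://example.com/gatos&sa=U';
$internal = '/search?q=gatos&start=10';

echo clean_google_link($wrapped), "\n";   // http://example.com/gatos&sa=U
echo clean_google_link($internal), "\n";  // http/search?q=gatos&start=10
```

The second value does not start with `http:`, which is how Google's own pages get filtered out. Under the same logic a result whose target starts with `https:` would also be skipped, so the check may need widening for current pages.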

NOTE: Use the code from the previous comment for the include.
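
The live example takes the search term from the `texto` parameter, while the posted code hard-codes `gatitos` in the URL. One possible way to connect the two is sketched below; `build_search_url` is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper: builds the search URL from a user-supplied term.
function build_search_url($term)
{
    // http_build_query URL-encodes the values, so spaces and accents are safe.
    $params = array('q' => $term, 'ie' => 'utf-8', 'oe' => 'utf-8');
    return 'https://www.google.com/search?' . http_build_query($params, '', '&');
}

// Read ?texto=... from the request, falling back to the term used in the post.
$texto = isset($_GET['texto']) ? $_GET['texto'] : 'gatitos';
$url = build_search_url($texto);

// The result would then replace the hard-coded URL:
// curl_setopt($handler, CURLOPT_URL, $url);
echo $url, "\n";
```

Encoding the term before it goes into the URL matters as soon as the search contains spaces or non-ASCII characters.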

Thanks, sweetheart. I've spent days fiddling with that page's options, but it wasn't turning out very well. In the end I built a custom function.
I'm saving this code to try it on other little things ;)
I'll add an important point that, being a newbie, I didn't know.

The file_get_contents function drove me crazy for a few days (for me it has usually run slower than cURL, though it sometimes happens with cURL too).

The reason: it does not faithfully store what the site shows when you open it from your computer.
Example:
I was downloading a page with file_get_contents, and when I ran my searches on the stored string, the results did not show up.

After going over the code again and again, I realized that the user-agent was the problem. Once I sent the request with cURL (which lets you set the user-agent), the page did return the version I needed (the one where the links appeared).
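
For what it's worth, file_get_contents can also send a custom user-agent if it is given a stream context, so the same fix is probably available without switching to cURL. A minimal sketch follows; `make_http_context` is a hypothetical helper and the timeout is an arbitrary value.

```php
<?php
// Hypothetical helper: builds a stream context that sends a custom User-Agent.
function make_http_context($userAgent)
{
    return stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; arbitrary value chosen for this sketch
        ),
    ));
}

// Same user-agent string as in the quoted cURL code.
$context = make_http_context('Mozilla/4.0 (PDA; PalmOS/sony/model prmr/Revision:1.1.54 (en)) NetFront/3.0');

// Usage (performs a real request, so it is left commented out here):
// $html = file_get_contents('https://www.google.com/search?q=gatitos', false, $context);
```

Whether the server then returns the same page as it does to cURL still depends on the other headers sent, so this is a starting point rather than a guarantee.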

Told like this it all sounds silly, but it gave me headaches and made me rewrite the code a thousand times until I realized that the strings returned by the two functions were different.
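
A quick way to spot this kind of mismatch early is to compare the two downloads directly instead of re-reading the parsing code. The sketch below uses made-up sample strings in place of real responses, and both helper names are hypothetical.

```php
<?php
// Hypothetical helper: counts <a ... href=...> tags in an HTML string.
function count_links($html)
{
    return preg_match_all('/<a\s[^>]*href=/i', $html, $matches);
}

// Hypothetical helper: summarizes how two fetched versions of a page differ.
function compare_responses($a, $b)
{
    return array(
        'identical' => $a === $b,
        'length_a'  => strlen($a),
        'length_b'  => strlen($b),
        'links_a'   => count_links($a),
        'links_b'   => count_links($b),
    );
}

// Made-up sample strings standing in for the two downloads.
$fromFileGetContents = '<html><body><p>No results</p></body></html>';
$fromCurl = '<html><body><a href="http://example.com/">Example</a></body></html>';

print_r(compare_responses($fromFileGetContents, $fromCurl));
```

If the lengths or link counts differ, the server is sending different pages to the two requests, and the request headers are the first thing to check.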

I consider the topic solved, and I'm sending you a big kiss, enlinea777, for the help you gave me.