List of spider-bot names

  #1
Posted 04/04/2008, 19:49
(Deactivated user)

Join date: December 2006
Posts: 529
Member for: 17 years, 4 months
Points: 11

Hello, community.

I recently finished a web statistics system in PHP.
The only thing I am still missing is a complete list of the spider-bot names used by the most important search engines.

For example: Google -> Googlebot

PHP code:
// Each entry: [0] display name, [1] bot name in the User-Agent, [2] host fragment,
// [3] site URL, [4] bot info URL, [5] search-query marker (meanings inferred from the values).
$this->se_def = array(
    'MSN' => array('MSN', 'msnbot', 'msn.', 'http://www.msn.com', 'http://search.msn.com/msnbot.htm', 'q='),
    'Yahoo!' => array('Yahoo!', 'Yahoo', 'yahoo.', 'http://www.yahoo.com', 'http://help.yahoo.com/help/us/ysearch/slurp', 'p='),
    'AOL' => array('AOL', '', 'search.aol.', 'http://www.aol.com', 'http://www.aol.com', '&query='),
    'Google' => array('Google', 'Googlebot', 'google.', 'http://www.google.com', 'http://www.google.com/bot.html', 'q='),
    'Dog Pile' => array('Dog Pile', '', 'dogpile.', 'http://www.dogpile.com', 'http://www.dogpile.com', 'Web/'),
    'MetaCrawler' => array('MetaCrawler', '', 'metacrawler.', 'http://www.metacrawler.com', 'http://www.metacrawler.com', 'web/'),
    'Ask Jeeves' => array('Ask Jeeves', '', 'ask.', 'http://www.ask.com', 'http://www.ask.com', 'q='),
    'All The Web' => array('AllTheWeb', '', 'alltheweb.', 'http://www.alltheweb.com', 'http://www.alltheweb.com', 'q='),
    'Google Image' => array('Google Image', 'Googlebot-Image', 'images.google.', 'http://images.google.com', 'http://www.google.com/bot.html', 'q='),
    'Altavista' => array('Altavista', 'Scooter', 'altavista.', 'http://www.altavista.com', 'http://www.altavista.com', 'q=')
);
This is the incomplete list I have managed to put together so far, but you will see that some entries are still missing the bot name at index [1].

Does anyone know of an up-to-date list with this information?

Thanks, I look forward to your contributions.
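
A minimal sketch of how an array shaped like the one in this post could be used to recognise a crawler from its User-Agent string. The function name `detect_search_bot` and the sample User-Agent values are assumptions made for illustration; they are not part of the original statistics system. It assumes PHP 7.1 or later because of the nullable return type.

```php
<?php
// Hypothetical helper, not part of the original class: returns the key of the
// first engine whose bot name (index [1]) occurs in the User-Agent string.
function detect_search_bot(array $se_def, string $user_agent): ?string
{
    foreach ($se_def as $engine => $info) {
        $token = $info[1];
        // Entries without a known bot name hold '' at [1]; skip them,
        // because an empty needle cannot identify anything.
        if ($token === '') {
            continue;
        }
        // Case-insensitive substring search.
        if (stripos($user_agent, $token) !== false) {
            return $engine;
        }
    }
    return null;
}

// Two entries taken from the array in this post, for illustration only.
$se_def = array(
    'Google' => array('Google', 'Googlebot', 'google.', 'http://www.google.com', 'http://www.google.com/bot.html', 'q='),
    'AOL'    => array('AOL', '', 'search.aol.', 'http://www.aol.com', 'http://www.aol.com', '&query='),
);

$bot  = detect_search_bot($se_def, 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
$none = detect_search_bot($se_def, 'Mozilla/5.0 (Windows NT 10.0)');
// $bot is 'Google', $none is null
```

Because the search stops at the first hit, a User-Agent containing `Googlebot-Image` would match the `Google` entry before `Google Image`; placing the more specific names first in the array is one way to avoid that.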
  #2
Posted 04/04/2008, 20:14
(Deactivated user)

Re: List of spider-bot names

In the end... I found a very good list.
In case anyone ever finds it useful, here it is:

These are the spider-bot names that let you tell whether a robot has visited your website, once you implement the check in a web statistics system.

Code:
 
(1, 'acme.spider', 'Acme Spider'),
(2, 'ahoythehomepagefinder', 'Ahoy! The Homepage Finder'),
(3, 'alkaline', 'Alkaline'),
(4, 'appie', 'Walhello appie'),
(5, 'arachnophilia', 'Arachnophilia'),
(6, 'architext', 'ArchitextSpider'),
(7, 'aretha', 'Aretha'),
(8, 'ariadne', 'ARIADNE'),
(9, 'arks', 'arks'),
(10, 'aspider', 'ASpider (Associative Spider)'),
(11, 'atn.txt', 'ATN Worldwide'),
(12, 'atomz', 'Atomz.com Search Robot'),
(13, 'auresys', 'AURESYS'),
(14, 'backrub', 'BackRub'),
(15, 'bigbrother', 'Big Brother'),
(16, 'bjaaland', 'Bjaaland'),
(17, 'blackwidow', 'BlackWidow'),
(18, 'blindekuh', 'Die Blinde Kuh'),
(19, 'bloodhound', 'Bloodhound'),
(20, 'brightnet', 'bright.net caching robot'),
(21, 'bspider', 'BSpider'),
(22, 'cactvschemistryspider', 'CACTVS Chemistry Spider'),
(23, 'calif[^r]', 'Calif'),
(24, 'cassandra', 'Cassandra'),
(25, 'cgireader', 'Digimarc Marcspider/CGI'),
(26, 'checkbot', 'Checkbot'),
(27, 'churl', 'churl'),
(28, 'cmc', 'CMC/0.01'),
(29, 'collective', 'Collective'),
(30, 'combine', 'Combine System'),
(31, 'conceptbot', 'Conceptbot'),
(32, 'coolbot', 'CoolBot'),
(33, 'core', 'Web Core / Roots'),
(34, 'cosmos', 'XYLEME Robot'),
(35, 'cruiser', 'Internet Cruiser Robot'),
(36, 'cusco', 'Cusco'),
(37, 'cyberspyder', 'CyberSpyder Link Test'),
(38, 'deweb', 'DeWeb(c) Katalog/Index'),
(39, 'dienstspider', 'DienstSpider'),
(40, 'digger', 'Digger'),
(41, 'diibot', 'Digital Integrity Robot'),
(42, 'directhit', 'Direct Hit Grabber'),
(43, 'dnabot', 'DNAbot'),
(44, 'download_express', 'DownLoad Express'),
(45, 'dragonbot', 'DragonBot'),
(46, 'dwcp', 'DWCP (Dridus Web Cataloging Project)'),
(47, 'e-collector', 'e-collector'),
(48, 'ebiness', 'EbiNess'),
(49, 'eit', 'EIT Link Verifier Robot'),
(50, 'elfinbot', 'ELFINBOT'),
(51, 'emacs', 'Emacs-w3 Search Engine'),
(52, 'emcspider', 'ananzi'),
(53, 'esther', 'Esther'),
(54, 'evliyacelebi', 'Evliya Celebi'),
(55, 'nzexplorer', 'nzexplorer'),
(56, 'fdse', 'Fluid Dynamics Search Engine robot'),
(57, 'felix', 'Felix IDE'),
(58, 'ferret', 'Wild Ferret Web Hopper #1, #2, #3'),
(59, 'fetchrover', 'FetchRover'),
(60, 'fido', 'fido'),
(61, 'finnish', 'Hämähäkki'),
(62, 'fireball', 'KIT-Fireball'),
(63, '[^a]fish', 'Fish search'),
(64, 'fouineur', 'Fouineur'),
(65, 'francoroute', 'Robot Francoroute'),
(66, 'freecrawl', 'Freecrawl'),
(67, 'funnelweb', 'FunnelWeb'),
(68, 'gama', 'gammaSpider, FocusedCrawler'),
(69, 'gazz', 'gazz'),
(70, 'gcreep', 'GCreep'),
(71, 'getbot', 'GetBot'),
(72, 'geturl', 'GetURL'),
(73, 'golem', 'Golem'),
(74, 'googlebot', 'Googlebot (Google)'),
(75, 'grapnel', 'Grapnel/0.01 Experiment'),
(76, 'griffon', 'Griffon'),
(77, 'gromit', 'Gromit'),
(78, 'gulliver', 'Northern Light Gulliver'),
(79, 'hambot', 'HamBot'),
(80, 'harvest', 'Harvest'),
(81, 'havindex', 'havIndex'),
(82, 'hometown', 'Hometown Spider Pro'),
(83, 'htdig', 'ht://Dig'),
(84, 'htmlgobble', 'HTMLgobble'),
(85, 'hyperdecontextualizer', 'Hyper-Decontextualizer'),
(86, 'iajabot', 'iajaBot'),
(87, 'ibm', 'IBM_Planetwide'),
(88, 'iconoclast', 'Popular Iconoclast'),
(89, 'ilse', 'Ingrid'),
(90, 'imagelock', 'Imagelock'),
(91, 'incywincy', 'IncyWincy'),
(92, 'informant', 'Informant'),
(93, 'infoseek', 'InfoSeek Robot 1.0'),
(94, 'infoseeksidewinder', 'Infoseek Sidewinder'),
(95, 'infospider', 'InfoSpiders'),
(96, 'inspectorwww', 'Inspector Web'),
(97, 'intelliagent', 'IntelliAgent'),
(98, 'irobot', 'I, Robot'),
(99, 'iron33', 'Iron33'),
(100, 'israelisearch', 'Israeli-search'),
(101, 'javabee', 'JavaBee'),
(102, 'jbot', 'JBot Java Web Robot'),
(103, 'jcrawler', 'JCrawler'),
(104, 'jeeves', 'Jeeves'),
(105, 'jobo', 'JoBo Java Web Robot'),
(106, 'jobot', 'Jobot'),
(107, 'joebot', 'JoeBot'),
(108, 'jubii', 'The Jubii Indexing Robot'),
(109, 'jumpstation', 'JumpStation'),
(110, 'katipo', 'Katipo'),
(111, 'kdd', 'KDD-Explorer'),
(112, 'kilroy', 'Kilroy'),
(113, 'ko_yappo_robot', 'KO_Yappo_Robot'),
(114, 'labelgrabber.txt', 'LabelGrabber'),
(115, 'larbin', 'larbin'),
(116, 'legs', 'legs'),
(117, 'linkidator', 'Link Validator'),
(118, 'linkscan', 'LinkScan'),
(119, 'linkwalker', 'LinkWalker'),
(120, 'lockon', 'Lockon'),
(121, 'logo_gif', 'logo.gif Crawler'),
(122, 'lycos', 'Lycos'),
(123, 'macworm', 'Mac WWWWorm'),
(124, 'magpie', 'Magpie'),
(125, 'marvin', 'marvin/infoseek'),
(126, 'mattie', 'Mattie'),
(127, 'mediafox', 'MediaFox'),
(128, 'merzscope', 'MerzScope'),
(129, 'meshexplorer', 'NEC-MeshExplorer'),
(130, 'mindcrawler', 'MindCrawler'),
(131, 'moget', 'moget'),
(132, 'momspider', 'MOMspider'),
(133, 'monster', 'Monster'),
(134, 'motor', 'Motor'),
(135, 'muscatferret', 'Muscat Ferret'),
(136, 'mwdsearch', 'Mwd.Search'),
(137, 'myweb', 'Internet Shinchakubin'),
(138, 'netcarta', 'NetCarta WebMap Engine'),
(139, 'netcraft', 'Netcraft Web Server Survey'),
(140, 'netmechanic', 'NetMechanic'),
(141, 'netscoop', 'NetScoop'),
(142, 'newscan-online', 'newscan-online'),
(143, 'nhse', 'NHSE Web Forager'),
(144, 'nomad', 'Nomad'),
(145, 'northstar', 'The NorthStar Robot'),
(146, 'occam', 'Occam'),
(147, 'octopus', 'HKU WWW Octopus'),
(148, 'openfind', 'Openfind data gatherer'),
(149, 'orb_search', 'Orb Search'),
(150, 'packrat', 'Pack Rat'),
(151, 'pageboy', 'PageBoy'),
(152, 'parasite', 'ParaSite'),
(153, 'patric', 'Patric'),
(154, 'pegasus', 'pegasus'),
(155, 'perignator', 'The Peregrinator'),
(156, 'perlcrawler', 'PerlCrawler 1.0'),
(157, 'phantom', 'Phantom'),
(158, 'piltdownman', 'PiltdownMan'),
(159, 'pimptrain', 'Pimptrain.com\'s robot'),
(160, 'pioneer', 'Pioneer'),
(161, 'pitkow', 'html_analyzer'),
(162, 'pjspider', 'Portal Juice Spider'),
(163, 'pka', 'PGP Key Agent'),
(164, 'plumtreewebaccessor', 'PlumtreeWebAccessor'),
(165, 'poppi', 'Poppi'),
(166, 'portalb', 'PortalB Spider'),
(167, 'puu', 'GetterroboPlus Puu'),
(168, 'python', 'The Python Robot'),
(169, 'raven', 'Raven Search'),
(170, 'rbse', 'RBSE Spider'),
(171, 'resumerobot', 'Resume Robot'),
(172, 'rhcs', 'RoadHouse Crawling System'),
(173, 'roadrunner', 'Road Runner: The ImageScape Robot'),
(174, 'robbie', 'Robbie the Robot'),
(175, 'robi', 'ComputingSite Robi/1.0'),
(176, 'robofox', 'RoboFox'),
(177, 'robozilla', 'Robozilla'),
(178, 'roverbot', 'Roverbot'),
(179, 'rules', 'RuLeS'),
(180, 'safetynetrobot', 'SafetyNet Robot'),
(181, 'scooter', 'Scooter (AltaVista)'),
(182, 'search_au', 'Search.Aus-AU.COM'),
(183, 'searchprocess', 'SearchProcess'),
(184, 'senrigan', 'Senrigan'),
(185, 'sgscout', 'SG-Scout'),
(186, 'shaggy', 'ShagSeeker'),
(187, 'shaihulud', 'Shai\'Hulud'),
(188, 'sift', 'Sift'),
(189, 'simbot', 'Simmany Robot Ver1.0'),
(190, 'site-valet', 'Site Valet'),
(191, 'sitegrabber', 'Open Text Index Robot'),
(192, 'sitetech', 'SiteTech-Rover'),
(193, 'slcrawler', 'SLCrawler'),
(194, 'slurp', 'Inktomi Slurp'),
(195, 'smartspider', 'Smart Spider'),
(196, 'snooper', 'Snooper'),
(197, 'solbot', 'Solbot'),
(198, 'spanner', 'Spanner'),
(199, 'speedy', 'Speedy Spider'),
(200, 'spider_monkey', 'spider_monkey'),
(201, 'spiderbot', 'SpiderBot'),
(202, 'spiderline', 'Spiderline Crawler'),
(203, 'spiderman', 'SpiderMan'),
(204, 'spiderview', 'SpiderView(tm)'),
(205, 'spry', 'Spry Wizard Robot'),
(206, 'ssearcher', 'Site Searcher'),
(207, 'suke', 'Suke'),
(208, 'suntek', 'suntek search engine'),
(209, 'sven', 'Sven'),
(210, 'tach_bw', 'TACH Black Widow'),
(211, 'tarantula', 'Tarantula'),
(212, 'tarspider', 'tarspider'),
(213, 'techbot', 'TechBOT'),
(214, 'templeton', 'Templeton'),
(215, 'teoma_agent1', 'TeomaTechnologies'),
(216, 'titin', 'TitIn'),
(217, 'titan', 'TITAN'),
(218, 'tkwww', 'The TkWWW Robot'),
(219, 'tlspider', 'TLSpider'),
(220, 'ucsd', 'UCSD Crawl'),
(221, 'udmsearch', 'UdmSearch'),
(222, 'urlck', 'URL Check'),
(223, 'valkyrie', 'Valkyrie'),
(224, 'victoria', 'Victoria'),
(225, 'visionsearch', 'vision-search'),
(226, 'voyager', 'Voyager'),
(227, 'vwbot', 'VWbot'),
(228, 'w3index', 'The NWI Robot'),
(229, 'w3m2', 'W3M2'),
(230, 'wallpaper', 'WallPaper'),
(231, 'wanderer', 'the World Wide Web Wanderer'),
(232, 'wapspider', 'w@pSpider by wap4.com'),
(233, 'webbandit', 'WebBandit Web Spider'),
(234, 'webcatcher', 'WebCatcher'),
(235, 'webcopy', 'WebCopy'),
(236, 'webfetcher', 'Webfetcher'),
(237, 'webfoot', 'The Webfoot Robot'),
(238, 'weblayers', 'Weblayers'),
(239, 'weblinker', 'WebLinker'),
(240, 'webmirror', 'WebMirror'),
(241, 'webmoose', 'The Web Moose'),
(242, 'webquest', 'WebQuest'),
(243, 'webreader', 'Digimarc MarcSpider'),
(244, 'webreaper', 'WebReaper'),
(245, 'websnarf', 'Websnarf'),
(246, 'webspider', 'WebSpider'),
(247, 'webvac', 'WebVac'),
(248, 'webwalk', 'webwalk'),
(249, 'webwalker', 'WebWalker'),
(250, 'webwatch', 'WebWatch'),
(251, 'wget', 'Wget'),
(252, 'whatuseek', 'whatUseek Winona'),
(253, 'whowhere', 'WhoWhere Robot'),
(254, 'wired-digital', 'Wired Digital'),
(255, 'wmir', 'w3mir'),
(256, 'wolp', 'WebStolperer'),
(257, 'wombat', 'The Web Wombat'),
(258, 'worm', 'The World Wide Web Worm'),
(259, 'wwwc', 'WWWC Ver 0.2.5'),
(260, 'wz101', 'WebZinger'),
(261, 'xget', 'XGET'),
(262, 'nederland.zoek', 'Nederland.zoek'),
(263, 'antibot', 'Antibot'),
(264, 'awbot', 'AWBot'),
(265, 'baiduspider', 'BaiDuSpider'),
(266, 'bobby', 'Bobby'),
(267, 'boris', 'Boris'),
(268, 'bumblebee', 'Bumblebee (relevare.com)'),
(269, 'cscrawler', 'CsCrawler'),
(270, 'daviesbot', 'DaviesBot'),
(271, 'digout4u', 'Digout4u'),
(272, 'echo', 'EchO!'),
(273, 'exactseek', 'ExactSeek Crawler'),
(274, 'ezresult', 'Ezresult'),
(275, 'fast-webcrawler', 'Fast-Webcrawler (AllTheWeb)'),
(276, 'gigabot', 'GigaBot'),
(277, 'gnodspider', 'GNOD Spider'),
(278, 'ia_archiver', 'Alexa (IA Archiver)'),
(279, 'internetseer', 'InternetSeer'),
(280, 'jennybot', 'JennyBot'),
See you soon.
  #3
Posted 04/04/2008, 20:15
(Deactivated user)

Re: List of spider-bot names

Items that were missing from the previous list...
Code:
 
(281, 'justview', 'JustView'),
(282, 'linkbot', 'LinkBot'),
(283, 'linkchecker', 'LinkChecker'),
(284, 'mercator', 'Mercator'),
(285, 'msiecrawler', 'MSIECrawler'),
(286, 'perman', 'Perman surfer'),
(287, 'petersnews', 'Petersnews'),
(288, 'pompos', 'Pompos'),
(289, 'psbot', 'psBot'),
(290, 'redalert', 'Red Alert'),
(291, 'shoutcast', 'Shoutcast Directory Service'),
(292, 'slysearch', 'SlySearch'),
(293, 'turnitinbot', 'Turn It In'),
(294, 'ultraseek', 'Ultraseek'),
(295, 'unlost_web_crawler', 'Unlost Web Crawler'),
(296, 'voila', 'Voila'),
(297, 'webbase', 'WebBase'),
(298, 'webcompass', 'webcompass'),
(299, 'wisenutbot', 'WISENutbot (Looksmart)'),
(300, 'yandex', 'Yandex bot'),
(301, 'zyborg', 'Zyborg (Looksmart)'),
(308, 'mixcat', 'morris - mixcat crawler'),
(305, 'netresearchserver', 'Net Research Server'),
(306, 'vagabondo', 'vagabondo (test version WiseGuys webagent)'),
(307, 'szukacz', 'Szukacz crawler'),
(309, 'grub-client', 'Grub\'s distributed crawler'),
(310, 'fluffy', 'fluffy (searchhippo)'),
(311, 'webtrends link analyzer', 'webtrends link analyzer'),
(312, 'naverrobot', 'naver'),
(313, 'steeler', 'steeler'),
(314, 'bordermanager', 'bordermanager'),
(315, 'nutch', 'Nutch'),
(316, 'teradex', 'Teradex'),
(317, 'deepindex', 'DeepIndex'),
(318, 'npbot', 'NPBot'),
(319, 'webcraftboot', 'Webcraftboot'),
(320, 'franklin locator', 'Franklin locator'),
(321, 'internet ninja', 'Internet ninja'),
(322, 'space bison', 'Space bison'),
(323, 'gornker', 'gornker crawler'),
(324, 'gaisbot', 'Gaisbot'),
(325, 'cj spider', 'CJ spider'),
(326, 'semanticdiscovery', 'Semantic Discovery'),
(327, 'zao', 'Zao'),
(328, 'web downloader', 'Web Downloader'),
(329, 'webstripper', 'Webstripper'),
(330, 'zeus', 'Zeus'),
(331, 'webrace', 'Webrace'),
(332, 'christcrawler', 'ChristCENTRAL'),
(333, 'webfilter', 'Webfilter'),
(334, 'webgather', 'Webgather'),
(335, 'surveybot', 'Surveybot'),
(336, 'nitle blog spider', 'Nitle Blog Spider'),
(337, 'galaxybot', 'Galaxybot'),
(338, 'fangcrawl', 'FangCrawl'),
(339, 'searchspider', 'SearchSpider'),
(340, 'msnbot', 'msnbot'),
(341, 'computer_and_automation_research_institute_crawler', 'computer and automation research institute crawler'),
(342, 'overture-webcrawler', 'overture-webcrawler'),
(343, 'exalead ng', 'exalead ng'),
(344, 'denmex websearch', 'denmex websearch'),
(345, 'linkfilter.net url verifier', 'linkfilter.net url verifier'),
(346, 'mac finder', 'mac finder'),
(347, 'polybot', 'polybot'),
(348, 'quepasacreep', 'quepasacreep'),
(349, 'xenu link sleuth', 'xenu link sleuth'),
(350, 'hatena antenna', 'hatena antenna'),
(351, 'timbobot', 'timbobot'),
(352, 'waypath scout', 'waypath scout'),
(353, 'technoratibot', 'technoratibot'),
(354, 'frontier', 'frontier'),
(355, 'blogosphere', 'blogosphere'),
(356, 'my little bot', 'my little bot'),
(357, 'illinois state tech labs', 'illinois state tech labs'),
(358, 'splatsearch.com', 'splatsearch'),
(359, 'blogshares bot', 'blogshares bot'),
(360, 'fastbuzz.com', 'fastbuzz'),
(361, 'obidos-bot', 'obidos'),
(362, 'blogwise.com-metachecker', 'blogwise.com metachecker'),
(363, 'bravobrian bstop', 'bravobrian bstop'),
(364, 'feedster crawler', 'feedster'),
(365, 'isspider', 'blogpulse'),
(366, 'syndic8', 'syndic8'),
(367, 'blogvisioneye', 'blogvisioneye'),
(368, 'downes/referrers', 'downes/referrers'),
(369, 'naverbot', 'naverbot'),
(370, 'soziopath', 'soziopath'),
(371, 'nextopiabot', 'nextopiabot'),
(372, 'ingrid', 'ingrid'),
(373, 'vspider', 'vspider'),
(374, 'yahoo', 'Yahoo'),
(375, 'sherlock-spider', 'Sherlock Spider'),
(376, 'mercubot', 'Mercubot'),
(377, 'mediapartners-google', 'Mediapartners Google'),
(378, 'jetbot', 'JetBot'),
(379, 'faxobot', 'FaxoBot'),
(380, 'cosmixcrawler', 'cosmix crawler'),
(381, 'exabot', 'exabot'),
(382, 'sitespider', 'sitespider'),
(383, 'pipeliner', 'pipeliner'),
(384, 'ccgcrawl', 'ccgcrawl'),
(385, 'cydralspider', 'cydralspider'),
(386, 'crawlconvera', 'crawlconvera'),
(387, 'blogwatcher', 'blogwatcher'),
(388, 'mozdex', 'mozdex'),
(389, 'aleksika spider', 'aleksika spider'),
(390, 'e-societyrobot', 'e-societyrobot'),
(391, 'enterprise_search', 'enterprise search'),
(392, 'seekbot', 'seekbot')
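
Some patterns in this list, such as `calif[^r]` and `[^a]fish`, look like regular-expression fragments rather than plain strings, so a regex match seems a safer way to apply them than a substring search. The following is a rough sketch under that assumption; the function name `match_robot` and the `$robots` array layout are hypothetical, and only a few rows from the list are included.

```php
<?php
// A few rows from the list, as [pattern, display name] pairs (ids dropped).
$robots = array(
    array('calif[^r]', 'Calif'),
    array('googlebot', 'Googlebot (Google)'),
    array('downes/referrers', 'downes/referrers'),
    array('msnbot', 'msnbot'),
);

// Hypothetical helper: returns the display name of the first robot whose
// pattern matches the User-Agent string, or null when none matches.
function match_robot(array $robots, string $user_agent): ?string
{
    foreach ($robots as $row) {
        list($pattern, $name) = $row;
        // '~' is used as the delimiter because some patterns contain '/';
        // the 'i' modifier makes the match case-insensitive.
        if (preg_match('~' . $pattern . '~i', $user_agent) === 1) {
            return $name;
        }
    }
    return null;
}

$hit  = match_robot($robots, 'msnbot/1.0 (+http://search.msn.com/msnbot.htm)');
$miss = match_robot($robots, 'Mozilla/5.0 (Windows NT 10.0)');
// $hit is 'msnbot', $miss is null
```

Very short patterns in the list (for example `core`, `eit` or `ibm`) may also match ordinary browser User-Agent strings, so it could be worth reviewing them before counting every hit as a robot visit.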