Project author: VIPnytt

Project description:
User-Agent parser for robots.txt, X-Robots-tag and Robots-meta-tag rule sets
Language: PHP
Repository: git://github.com/VIPnytt/UserAgentParser.git
Created: 2016-04-08T14:53:39Z
Project community: https://github.com/VIPnytt/UserAgentParser

License: MIT License

User-Agent parser for robot rule sets

Parser and group determiner optimized for robots.txt, X-Robots-tag and Robots-meta-tag use cases.

Requirements:

  • PHP 5.5+, 7.0+ or 8.0+

Installation

The library can be installed via Composer. Add this to your composer.json file:

    {
        "require": {
            "vipnytt/useragentparser": "^1.0"
        }
    }

Then run composer update.

Features

  • Stripping of the version tag.
  • List any rule groups the User-Agent belongs to.
  • Determine the correct group of records by finding the group with the most specific User-agent that still matches.

When to use it?

  • When parsing robots.txt rule sets for web robots.
  • When parsing the X-Robots-Tag HTTP header.
  • When parsing Robots meta tags in HTML / XHTML documents.

Note: Full User-agent strings, such as those sent by web browsers, are not supported. This is by design.
Supported User-agent strings have the form UserAgentName/version, with or without the version tag, e.g. MyWebCrawler/2.0 or just MyWebCrawler.
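
To illustrate the supported format, here is a minimal sketch of splitting such a string into its name and optional version. The function splitProductToken is a hypothetical helper written for this example (assuming PHP 7.0+). It is not part of the library and does not claim to mirror its internals.

```php
<?php
// Hypothetical helper, NOT part of vipnytt/UserAgentParser.
// Splits "Name/version" into its two parts; the version is null when absent.
function splitProductToken(string $userAgent): array
{
    // Limit of 2 keeps any further slashes inside the version part.
    $parts = explode('/', trim($userAgent), 2);

    return [
        'product' => $parts[0],
        'version' => $parts[1] ?? null,
    ];
}
```

Called with 'MyWebCrawler/2.0' this yields the product 'MyWebCrawler' and the version '2.0'. Called with just 'MyWebCrawler' the version is null.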

Getting Started

Strip the version tag.

    use vipnytt\UserAgentParser;

    $parser = new UserAgentParser('googlebot/2.1');
    $product = $parser->getProduct(); // googlebot

List different groups the User-agent belongs to

    use vipnytt\UserAgentParser;

    $parser = new UserAgentParser('googlebot-news/2.1');
    $userAgents = $parser->getUserAgents();
    // Result:
    // array(
    //     'googlebot-news/2.1',
    //     'googlebot-news/2',
    //     'googlebot-news',
    //     'googlebotnews',
    //     'googlebot'
    // );
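
The 'googlebot-news/2' entry in the listing suggests that the version is shortened one segment at a time. A minimal sketch of that idea follows. The function versionPrefixes is a hypothetical name chosen for this example (assuming PHP 7.0+), and the way the library actually derives its list is not confirmed here.

```php
<?php
// Hypothetical helper, NOT the library's code.
// Lists a dotted version string and its shorter prefixes,
// ordered from most to least specific.
function versionPrefixes(string $version): array
{
    $segments = explode('.', $version);
    $prefixes = [];

    while (count($segments) > 0) {
        $prefixes[] = implode('.', $segments);
        array_pop($segments); // drop the last segment for the next round
    }

    return $prefixes;
}
```

For the version '2.1' this produces '2.1' followed by '2', which lines up with the first two entries of the listing once the product name is prepended.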

Determine the correct group

Determine the correct group of records by finding the group with the most specific User-agent that still matches your rule sets.

    use vipnytt\UserAgentParser;

    $parser = new UserAgentParser('googlebot-news');
    $match = $parser->getMostSpecific(['googlebot/2.1', 'googlebot-images', 'googlebot']); // googlebot
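
The matching idea can be sketched without the library. The function pickMostSpecificGroup below is a hypothetical, simplified stand-in (assuming PHP 7.1+). It ignores versioned group names and only drops trailing "-segment" parts of the product name, so it should be read as an illustration of "most specific group that still matches", not as the library's algorithm.

```php
<?php
// Simplified sketch, NOT the library's implementation.
// Strips the version, then tries the product name and progressively shorter
// names (dropping trailing "-segment" parts) until one equals a group name.
function pickMostSpecificGroup(string $userAgent, array $groups): ?string
{
    $product = strtolower(explode('/', trim($userAgent), 2)[0]);
    $segments = explode('-', $product);

    // Case-insensitive lookup: lowercased name => original spelling.
    $lookup = [];
    foreach ($groups as $group) {
        $lookup[strtolower($group)] = $group;
    }

    while (count($segments) > 0) {
        $candidate = implode('-', $segments);
        if (isset($lookup[$candidate])) {
            return $lookup[$candidate];
        }
        array_pop($segments); // fall back to a less specific name
    }

    return null; // no group matches (a choice made for this sketch)
}
```

With the User-agent 'googlebot-news' and the groups 'googlebot/2.1', 'googlebot-images' and 'googlebot', the sketch first tries 'googlebot-news', finds no such group, and then settles on 'googlebot', the same result as in the example above.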

Cheat sheet

    $parser = new UserAgentParser('MyCustomCrawler/1.2');

    // Determine the correct rule set (robots.txt / robots meta tag / x-robots-tag)
    // $array holds the User-agent group names found in your rule set
    $parser->getMostSpecific($array); // string

    // Parse
    $parser->getUserAgent(); // string 'MyCustomCrawler/1.2'
    $parser->getProduct(); // string 'MyCustomCrawler'
    $parser->getVersion(); // string '1.2'

    // Crunch the data into groups, from most to least specific
    $parser->getUserAgents(); // array
    $parser->getProducts(); // array
    $parser->getVersions(); // array

Specifications