Project author: essence

Project description:
Extracts information about web pages, like youtube videos, twitter statuses or blog articles.
Language: PHP
Repository: git://github.com/essence/essence.git
Created: 2012-08-03T18:26:52Z
Project community: https://github.com/essence/essence

License: Other


Essence

Essence is a simple PHP library to extract media information from websites, like youtube videos, twitter statuses or blog articles.

If you were already using Essence 2.x.x, you should take a look at the migration guide.

Installation

    composer require essence/essence

Example

Essence is designed to be really easy to use.
Using the main class of the library, you can retrieve information in just a few lines:

    $Essence = new Essence\Essence();
    $Media = $Essence->extract('http://www.youtube.com/watch?v=39e3KYAmXK4');

    if ($Media) {
        // That's all, you're good to go!
    }

Then, just do anything you want with the data:

    <article>
        <header>
            <h1><?php echo $Media->title; ?></h1>
            <p>By <?php echo $Media->authorName; ?></p>
        </header>
        <div class="player">
            <?php echo $Media->html; ?>
        </div>
    </article>

What you get

Using Essence, you will mainly interact with Media objects.
Media is a simple container for all the information fetched from a URL.

Here are the default properties it provides:

  • type
  • version
  • url
  • title
  • description
  • authorName
  • authorUrl
  • providerName
  • providerUrl
  • cacheAge
  • thumbnailUrl
  • thumbnailWidth
  • thumbnailHeight
  • html
  • width
  • height

These properties were gathered from the OEmbed and OpenGraph specifications and merged into a unified interface.
Because they are based on these standards, they should be a solid starting point.

However, "non-standard" properties can and will also be set.

Here is how you can manipulate the Media properties:

    // through dedicated methods
    if (!$Media->has('foo')) {
        $Media->set('foo', 'bar');
    }

    $value = $Media->get('foo');

    // or directly like a class attribute
    $Media->customValue = 12;

Note that Essence will always try to fill the html property when it is not available.
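
To make the two access styles concrete, here is a minimal sketch of a container that supports both. MediaSketch is a hypothetical stand-in written for this illustration. It is not the actual Media class, which may be implemented differently.

```php
<?php
// Hypothetical stand-in for a Media-like container; not Essence's actual class.
class MediaSketch
{
    /** @var array property name => value */
    private $properties = [];

    public function has($name)
    {
        return array_key_exists($name, $this->properties);
    }

    public function get($name, $default = null)
    {
        return $this->has($name) ? $this->properties[$name] : $default;
    }

    public function set($name, $value)
    {
        $this->properties[$name] = $value;
        return $this;
    }

    // Magic accessors route attribute-style access to the same storage.
    public function __get($name)
    {
        return $this->get($name);
    }

    public function __set($name, $value)
    {
        $this->set($name, $value);
    }
}

$Media = new MediaSketch();

if (!$Media->has('foo')) {
    $Media->set('foo', 'bar');
}

$Media->customValue = 12;
```

Both styles read and write the same underlying array, so a value set through set() is visible as an attribute and vice versa.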

Advanced usage

The Essence class provides a few utility methods that help you find extractable URLs and retrieve information about them.

Extracting URLs

The crawl() and crawlUrl() methods let you crawl extractable URLs from a web page, either directly from its source, or from its URL (in which case Essence will take care of fetching the source).

For example, here is how you could get the URL of all videos in a blog post:

    $urls = $Essence->crawlUrl('http://www.blog.com/article');

    array(2) {
        [0] => 'http://www.youtube.com/watch?v=123456',
        [1] => 'http://www.dailymotion.com/video/a1b2c_lolcat-fun'
    }

You can then get information from all the extracted URLs:

    $medias = $Essence->extractAll($urls);

    array(2) {
        ['http://www.youtube.com/watch?v=123456'] => object(Media) {}
        ['http://www.dailymotion.com/video/a1b2c_lolcat-fun'] => object(Media) {}
    }
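
As a rough idea of what crawling a page source involves, the sketch below collects URLs from href and src attributes and keeps those that match a list of patterns. crawlSketch and its parameters are hypothetical names made up for this example. The regex scan is an assumption chosen for brevity and does not describe how Essence itself parses pages.

```php
<?php
/**
 * Hypothetical helper: returns the URLs found in href/src attributes of
 * $html that match at least one of the given regex filters.
 */
function crawlSketch($html, array $filters)
{
    // Naive attribute scan; a real crawler would rather use an HTML parser.
    preg_match_all('~(?:href|src)\s*=\s*["\']([^"\']+)["\']~i', $html, $matches);

    $urls = [];

    foreach (array_unique($matches[1]) as $url) {
        foreach ($filters as $pattern) {
            if (preg_match($pattern, $url)) {
                $urls[] = $url;
                break; // one matching filter is enough
            }
        }
    }

    return $urls;
}

$html = '<a href="http://www.youtube.com/watch?v=123456">video</a>'
      . '<a href="http://www.blog.com/about">about</a>';

$urls = crawlSketch($html, ['~youtube\.com/watch~i']);
```

Only the first link survives here, because the second one does not match any filter.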

Replacing URLs in text

Essence can replace any extractable URL in a text with information about it.
By default, each URL is replaced with the html property of the Media found for it.

    echo $Essence->replace('Look at this: http://www.youtube.com/watch?v=123456');

    Look at this: <iframe src="http://www.youtube.com/embed/123456"></iframe>

But you can do more by passing a callback to control which information will replace the URL:

    echo $Essence->replace($text, function($Media) {
        return <<<HTML
            <p class="title">$Media->title</p>
            <div class="player">$Media->html</div>
    HTML;
    });

    Look at this:
    <p class="title">Video title</p>
    <div class="player">
        <iframe src="http://www.youtube.com/embed/123456"></iframe>
    </div>

This makes it easy to build rich templates or even to integrate a templating engine:

    echo $Essence->replace($text, function($Media) use ($TwigTemplate) {
        return $TwigTemplate->render($Media->properties());
    });
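
The callback mechanism itself can be sketched with plain PHP. The helper below, replaceUrlsSketch, is hypothetical and only illustrates the pattern. It passes the matched URL to the callback, whereas Essence passes a Media object, and its URL regex is a simplifying assumption.

```php
<?php
/**
 * Hypothetical helper: replaces each URL in $text with the string
 * returned by $callback, which receives the matched URL.
 */
function replaceUrlsSketch($text, callable $callback)
{
    return preg_replace_callback(
        '~https?://[^\s<>"]+~i',
        function (array $match) use ($callback) {
            return $callback($match[0]);
        },
        $text
    );
}

$result = replaceUrlsSketch(
    'Look at this: http://www.youtube.com/watch?v=123456',
    function ($url) {
        return '<a href="' . htmlspecialchars($url) . '">link</a>';
    }
);
```

Text around the URL is left untouched. Only the matched span is swapped for whatever the callback returns.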

Configuring providers

It is possible to pass some options to the providers.

For example, OEmbed providers accept the maxwidth and maxheight parameters, as specified in the OEmbed spec.

    $options = [
        'maxwidth' => 800,
        'maxheight' => 600
    ];

    $Media = $Essence->extract($url, $options);
    $medias = $Essence->extractAll($urls, $options);
    $text = $Essence->replace($text, null, $options);

Other providers will just ignore the options they don’t handle.
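
To illustrate where such options could end up, here is a sketch that builds an OEmbed request URL from an endpoint template using a :url placeholder, like the one shown in the configuration section. buildOEmbedUrlSketch is a hypothetical helper. How Essence actually assembles its requests is not covered here, so treat this as one plausible approach.

```php
<?php
/**
 * Hypothetical helper: fills the :url placeholder of an endpoint template
 * and appends the maxwidth/maxheight options when they are provided.
 */
function buildOEmbedUrlSketch($endpoint, $url, array $options = [])
{
    $request = str_replace(':url', urlencode($url), $endpoint);

    // Keep only the two size hints; any other option is ignored.
    $params = array_intersect_key(
        $options,
        ['maxwidth' => true, 'maxheight' => true]
    );

    if ($params) {
        $separator = (strpos($request, '?') === false) ? '?' : '&';
        $request .= $separator . http_build_query($params, '', '&');
    }

    return $request;
}

$request = buildOEmbedUrlSketch(
    'http://soundcloud.com/oembed?format=json&url=:url',
    'http://soundcloud.com/a/b',
    ['maxwidth' => 800, 'maxheight' => 600, 'unknown' => 1]
);
```

The unknown option is dropped silently, which mirrors the idea that a provider ignores options it does not handle.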

Configuration

Essence currently supports 68 specialized providers:

    23hq          Deviantart  Kickstarter     Sketchfab
    Animoto       Dipity      Meetup          SlideShare
    Aol           Dotsub      Mixcloud        SoundCloud
    App.net       Edocr       Mobypicture     SpeakerDeck
    Bambuser      Flickr      Nfb             Spotify
    Bandcamp      FunnyOrDie  Official.fm     Ted
    Blip.tv       Gist        Polldaddy       Twitter
    Cacoo         Gmep        PollEverywhere  Ustream
    CanalPlus     HowCast     Prezi           Vhx
    Chirb.it      Huffduffer  Qik             Viddler
    CircuitLab    Hulu        Rdio            Videojug
    Clikthrough   Ifixit      Revision3       Vimeo
    CollegeHumor  Ifttt       Roomshare       Vine
    Coub          Imgur       Sapo            Wistia
    CrowdRanking  Instagram   Screenr         WordPress
    DailyMile     Jest        Scribd          Yfrog
    Dailymotion   Justin.tv   Shoudio         Youtube

Plus the OEmbed and OpenGraph providers, which can be used to extract any URL.

You can configure these providers on instantiation:

    $Essence = new Essence\Essence([
        // the SoundCloud provider is an OEmbed provider with a specific endpoint
        'SoundCloud' => Essence\Di\Container::unique(function($C) {
            return $C->get('OEmbedProvider')->setEndpoint(
                'http://soundcloud.com/oembed?format=json&url=:url'
            );
        }),

        'filters' => [
            // the SoundCloud provider will be used for URLs that match this pattern
            'SoundCloud' => '~soundcloud\.com/[a-zA-Z0-9-_]+/[a-zA-Z0-9-]+~i'
        ]
    ]);

You can also disable the default ones:

    $Essence = new Essence\Essence([
        'filters' => [
            'SoundCloud' => false
        ]
    ]);
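
The way filters select a provider can be pictured with a short sketch. findProviderSketch is a hypothetical function written for this illustration. It assumes that filters are tried in order and that a filter set to false means the provider is skipped, which matches the configuration shown here but is not taken from the Essence source.

```php
<?php
/**
 * Hypothetical helper: returns the name of the first provider whose
 * filter matches $url, or null. A filter set to false is skipped.
 */
function findProviderSketch($url, array $filters)
{
    foreach ($filters as $name => $pattern) {
        if ($pattern === false) {
            continue; // provider disabled
        }

        if (preg_match($pattern, $url)) {
            return $name;
        }
    }

    return null;
}

$filters = [
    'SoundCloud' => '~soundcloud\.com/[a-zA-Z0-9-_]+/[a-zA-Z0-9-]+~i',
];

$url = 'http://soundcloud.com/some-artist/some-track';

$enabled  = findProviderSketch($url, $filters);
$disabled = findProviderSketch($url, ['SoundCloud' => false]);
```

With the pattern in place the URL resolves to the SoundCloud provider. With the filter set to false no provider is returned.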

You will find the default configuration in the standard DI container of Essence (see the next section).

Customization

Almost everything in Essence can be configured through dependency injection.
Under the hood, the constructor uses a dependency injection container to return a fully configured instance of Essence.

To customize the Essence behavior, the easiest way is to configure injection settings when building Essence:

    $Essence = new Essence\Essence([
        // the container will return a unique instance of CustomHttpClient
        // each time an HTTP client is needed
        'Http' => Essence\Di\Container::unique(function() {
            return new CustomHttpClient();
        })
    ]);

The default injection settings are defined in the Standard container class.
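
For readers unfamiliar with the pattern, the following sketch shows one way a container can hand out a single shared instance from a factory. ContainerSketch is a hypothetical class written for this example. It borrows the unique() name from the snippet above but makes no claim about how Essence\Di\Container is implemented.

```php
<?php
// Hypothetical minimal container; not Essence's actual Di\Container.
class ContainerSketch
{
    private $definitions = [];

    public function __construct(array $definitions = [])
    {
        $this->definitions = $definitions;
    }

    // Wraps a factory so it runs once and then returns the cached result.
    public static function unique(callable $factory)
    {
        $instance = null;

        return function (ContainerSketch $C) use ($factory, &$instance) {
            if ($instance === null) {
                $instance = $factory($C);
            }

            return $instance;
        };
    }

    public function get($name)
    {
        $definition = $this->definitions[$name];

        // Closures are treated as factories; anything else is a plain value.
        return ($definition instanceof Closure)
            ? $definition($this)
            : $definition;
    }
}

$C = new ContainerSketch([
    'Http' => ContainerSketch::unique(function () {
        return new stdClass(); // stands in for a custom HTTP client
    }),
]);
```

Every call to $C->get('Http') returns the same object, because the factory only runs the first time.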

Try it out

Once you’ve installed Essence, try running ./cli/essence.php in a terminal.
This script allows you to test Essence quickly:

  1. # will fetch and print information about the video
    ./cli/essence.php extract http://www.youtube.com/watch?v=4S_NHY9c8uM

    # will fetch and print all extractable URLs found at the given HTML page
    ./cli/essence.php crawl http://www.youtube.com/watch?v=4S_NHY9c8uM

Third-party libraries

If you’re interested in embedding videos, you should take a look at the Multiplayer lib.
It allows you to build customizable embed codes painlessly:

    $Multiplayer = new Multiplayer\Multiplayer();

    if ($Media->type === 'video') {
        echo $Multiplayer->html($Media->url, [
            'autoPlay' => true,
            'highlightColor' => 'BADA55'
        ]);
    }