Project author: lrlna

Project description: a puppeteer walker 🕷 🕸
Primary language: JavaScript
Repository: git://github.com/lrlna/puppeteer-walker.git
Created: 2017-11-14T13:27:01Z
Project page: https://github.com/lrlna/puppeteer-walker

License: Apache License 2.0



puppeteer-walker


A crawler that walks through a given site in headless Chrome using
Puppeteer. Returns an object containing the host, the current path, and
the current DOM object.
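The README does not spell out how the host and current path are derived. As an illustration only, the hypothetical helper below (locationOf is not part of puppeteer-walker) splits a page URL, such as the string returned by Puppeteer's page.url(), with the WHATWG URL class. It assumes "host" means hostname plus any port and "path" means the URL's pathname.

```javascript
// Hypothetical helper, not part of puppeteer-walker: split a page URL into
// the host (hostname plus any non-default port) and the path (no query/hash).
function locationOf (pageUrl) {
  const { host, pathname } = new URL(pageUrl)
  return { host, path: pathname }
}

console.log(locationOf('https://avocado.choo.io/about?ref=1'))
```
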

Usage

  var Walker = require('puppeteer-walker')
  var walker = Walker()

  // Fires once the walker has finished
  walker.on('end', () => console.log('finished walking'))
  walker.on('error', (err) => console.log('error', err))

  // Called for every visited page with a Puppeteer Page instance
  walker.on('page', async (page) => {
    var title = await page.title()
    console.log(`title: ${title}`)
  })

  walker.walk('https://avocado.choo.io')

API

walker = PuppeteerWalker()

Create a new walker instance.

walker.on('page', async cb(Page, push))

Listen for page events. The callback receives an instance of the Puppeteer
Page class and a push function. The callback must be an async function.

Call push(url) to add more pages to the walker's internal queue. This is
useful for getting past login forms and similar obstacles.
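If you push URLs you discover yourself, it can help to filter them first. The README does not say whether the walker de-duplicates pushed URLs or stays on the starting host, so the hypothetical helper below (sameHostLinks is not part of puppeteer-walker) is a precaution: it resolves relative links, drops fragments, keeps only http(s) links on the current page's host, and removes duplicates.

```javascript
// Hypothetical helper, not part of puppeteer-walker: keep only links that
// point at the same host as the current page, without duplicates.
function sameHostLinks (pageUrl, hrefs) {
  const base = new URL(pageUrl)
  const seen = new Set()
  const result = []
  for (const href of hrefs) {
    let target
    try {
      // Resolve relative links against the current page
      target = new URL(href, base)
    } catch (err) {
      continue // skip malformed hrefs
    }
    // Drop the fragment so /about and /about#team count as one page
    target.hash = ''
    if (target.host !== base.host) continue
    if (!/^https?:$/.test(target.protocol)) continue
    if (seen.has(target.href)) continue
    seen.add(target.href)
    result.push(target.href)
  }
  return result
}

console.log(sameHostLinks('https://avocado.choo.io/', ['/about', 'https://example.com/']))
```

Inside a page handler, you could gather hrefs with Puppeteer's page.$$eval('a', (anchors) => anchors.map((a) => a.href)) and call push for each URL returned by sameHostLinks(page.url(), hrefs).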

walker.on('error', cb(err))

Listen to error events.

walker.on('end', cb)

Listen for the end event, emitted when the walk has finished.

walker.walk(url)

Start walking from the given URL.
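To make the event flow concrete, here is a minimal, self-contained sketch of a queue-driven walker with the same surface: walk(url), a push function handed to 'page' listeners, and 'error' and 'end' events. It is not puppeteer-walker's implementation. SketchWalker and loadPage are hypothetical names, and loadPage stands in for opening a URL in headless Chrome. The sketch assumes that pushed URLs are de-duplicated, that pages are visited breadth-first, and that a failed page emits 'error' without stopping the walk.

```javascript
const { EventEmitter } = require('events')

// Illustrative sketch only: a breadth-first walker exposing 'page', 'error'
// and 'end' events. loadPage is a hypothetical async function (url) => page.
class SketchWalker extends EventEmitter {
  constructor (loadPage) {
    super()
    this.loadPage = loadPage
    this.queue = []
    this.visited = new Set()
  }

  // Queue a URL unless it has already been queued or visited
  push (url) {
    if (this.visited.has(url)) return
    this.visited.add(url)
    this.queue.push(url)
  }

  async walk (url) {
    this.push(url)
    while (this.queue.length > 0) {
      const next = this.queue.shift()
      try {
        const page = await this.loadPage(next)
        // Await each 'page' listener in turn so handlers can push() new URLs
        // before the queue is checked again
        for (const listener of this.listeners('page')) {
          await listener(page, (u) => this.push(u))
        }
      } catch (err) {
        // With no 'error' listener, EventEmitter throws here and walk() rejects
        this.emit('error', err)
      }
    }
    this.emit('end')
  }
}
```

In the real library the handler receives a Puppeteer Page; here the page is a plain object so the control flow can run without a browser. Because each listener is awaited before the next URL leaves the queue, URLs pushed from a handler are always queued in time; that may be one reason the README requires an async callback, though it does not say so.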

See Also

License

Apache-2.0