Project author: howie6879

Project description:
Google search results crawler; get the Google search results that you need
Language: Python
Repository: git://github.com/howie6879/magic_google.git
Created: 2017-01-12T06:55:21Z
Project community: https://github.com/howie6879/magic_google

magic_google

1. What’s magic_google?

This is a simple Google search crawler that lets you pull whatever you need from the results page.

While crawling, keep in mind that Google limits requests per IP address and may respond with warnings or errors, so I suggest pausing between runs and using your own proxy IPs.
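As a rough illustration of preparing proxy IPs, the sketch below turns a list of `host:port` strings into a list of proxy dicts with `http` and `https` keys, the shape passed to `MagicGoogle` in the Example section. The helper name `build_proxies` is hypothetical and not part of magic_google, and it assumes plain HTTP proxies.

```python
from typing import Dict, List


def build_proxies(addresses: List[str]) -> List[Dict[str, str]]:
    """Turn 'host:port' strings into proxy dicts with 'http' and 'https' keys.

    Hypothetical helper: assumes every address is a plain HTTP proxy.
    """
    return [
        {"http": f"http://{address}", "https": f"http://{address}"}
        for address in addresses
    ]


if __name__ == "__main__":
    # The address below is the one used in the Example section.
    print(build_proxies(["192.168.2.207:1080"]))
```

The resulting list could then be passed as `MagicGoogle(build_proxies([...]))`, assuming the constructor accepts the list format shown in the Example section.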

php - MagicGoogle

2. How to Use?

Run

    pip install magic_google
    # Or
    pip install git+https://github.com/howie6879/magic_google.git
    # Or
    git clone https://github.com/howie6879/magic_google.git
    cd magic_google
    vim google_search.py
    # Or
    python setup.py install

Example

    from magic_google import MagicGoogle
    import pprint

    # Or PROXIES = None
    PROXIES = [{
        'http': 'http://192.168.2.207:1080',
        'https': 'http://192.168.2.207:1080'
    }]

    # Or MagicGoogle()
    mg = MagicGoogle(PROXIES)

    # Crawling the whole page
    result = mg.search_page(query='python')

    # Crawling url
    for url in mg.search_url(query='python'):
        pprint.pprint(url)

    # Output
    # 'https://www.python.org/'
    # 'https://www.python.org/downloads/'
    # 'https://www.python.org/about/gettingstarted/'
    # 'https://docs.python.org/2/tutorial/'
    # 'https://docs.python.org/'
    # 'https://en.wikipedia.org/wiki/Python_(programming_language)'
    # 'https://www.codecademy.com/courses/introduction-to-python-6WeG3/0?curriculum_id=4f89dab3d788890003000096'
    # 'https://www.codecademy.com/learn/python'
    # 'https://developers.google.com/edu/python/'
    # 'https://learnpythonthehardway.org/book/'
    # 'https://www.continuum.io/downloads'

    # Get {'title','url','text'}
    for i in mg.search(query='python', num=1):
        pprint.pprint(i)

    # Output
    # {'text': 'The official home of the Python Programming Language.',
    # 'title': 'Welcome to Python .org',
    # 'url': 'https://www.python.org/'}

For a complete example, see google_search.py.

If you need to run a large number of queries but have only one IP address, I suggest waiting 5 to 30 seconds between queries.
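The pause between queries could be sketched roughly as follows. The names `run_throttled`, `search_fn` and `sleep_fn` are hypothetical and not part of magic_google, and picking a random delay inside the 5 to 30 second range is an assumption of this sketch, not something the library does for you.

```python
import random
import time
from typing import Callable, Dict, Iterable, List


def run_throttled(
    queries: Iterable[str],
    search_fn: Callable[[str], Iterable],
    min_delay: float = 5.0,
    max_delay: float = 30.0,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> Dict[str, List]:
    """Call search_fn once per query, sleeping a random interval between calls."""
    results: Dict[str, List] = {}
    for index, query in enumerate(queries):
        if index > 0:
            # Pause before every query except the first one.
            sleep_fn(random.uniform(min_delay, max_delay))
        results[query] = list(search_fn(query))
    return results


if __name__ == "__main__":
    # Stand-in search function so the sketch runs without network access.
    fake_search = lambda q: [f"https://example.com/?q={q}"]
    print(run_throttled(["python", "golang"], fake_search, min_delay=0, max_delay=0))
```

With the library installed, `search_fn` could be something like `lambda q: mg.search_url(query=q)`, using the `mg` object from the Example section.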

If a search always returns empty results, the reason may be that Google is answering with a redirect like the following:

    <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
    <TITLE>302 Moved</TITLE></HEAD><BODY>
    <H1>302 Moved</H1>
    The document has moved
    <A HREF="https://ipv4.google.com/sorry/index?continue=https://www.google.me/s****">here</A>.
    </BODY></HTML>
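
One way to notice this situation in code is to check a fetched page for the `/sorry/` redirect shown above. This is only a heuristic sketch: `looks_blocked` is a hypothetical helper, not part of magic_google, and it assumes you have the response body available as a string.

```python
def looks_blocked(html: str) -> bool:
    """Heuristic: True if the HTML links to Google's /sorry/ page."""
    return "/sorry/" in html.lower()


if __name__ == "__main__":
    sample = (
        "<TITLE>302 Moved</TITLE>"
        '<A HREF="https://ipv4.google.com/sorry/index?continue=x">here</A>'
    )
    if looks_blocked(sample):
        print("Blocked: wait longer between queries or switch proxy")
```

If `mg.search_page(query=...)` returns the page HTML as a string (an assumption here), its result could be passed to `looks_blocked` before parsing, so the program can back off or change proxy when the check succeeds.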