The aim of the project is to find email addresses for a certain group of professionals who have an affiliation with, or authorship of, published articles, using Google X-Ray Search.
a) The key goal of the project is to scrape email addresses by using Google.
b) The input data is “Google search operators and desired keywords”.
c) Another key motivation was to scrape the emails from Google without relying on third-party service providers.
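As a rough illustration of the input described above, the sketch below combines a `site:` operator with quoted keywords into an X-Ray query and a search URL. The function names, the example site, and the keywords are assumptions made for illustration; they are not taken from the project code.

```python
from urllib.parse import urlencode


def build_xray_query(site, keywords, exclude=()):
    """Combine a site: operator, quoted keywords, and excluded terms into one query."""
    parts = [f"site:{site}"]
    parts += [f'"{kw}"' for kw in keywords]      # exact-phrase keywords
    parts += [f"-{term}" for term in exclude]    # terms to exclude
    return " ".join(parts)


def build_search_url(query, start=0):
    """Build a Google search URL; start is the result offset used for paging."""
    return "https://www.google.com/search?" + urlencode({"q": query, "start": start})


# Hypothetical input: the site and keywords are placeholders.
query = build_xray_query("linkedin.com/in", ["data scientist", "published author"])
url = build_search_url(query)
print(query)
print(url)
```

Keeping the query builder separate from the URL builder makes it easy to try different operators and keywords without touching the request code.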
There are 3 projects as below.
Here, g refers to Google, whereas 4 refers to the number of spiders.
To get email addresses, we came up with an idea that was found to be worth trying: Google X-Ray search techniques were considered while developing these spiders.
In the g4spiders project, the spiders were cloned and the 4 spiders were merged into 2: the name spider is kept as one, and the other 3 spiders (jobtitle, companydomain, email) are merged into another.
2.1 spider1: get the name and the corresponding LinkedIn URLs
2.2 spider2: get the “job title” and “currently working company name” by feeding in “the name” and “LinkedIn URL” obtained from spider1
2.3 spider3: get the domain name by using the “job title” and “current company” information obtained from spider2
2.4 spider4: get the email addresses using the domain name and individual’s name obtained from spider3
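The last step above can be sketched as follows, assuming spider4 searches for the person's name together with the domain and then pulls addresses out of the result text. The README does not say how the project does this, so the query format, the regular expression, and the function names are all assumptions.

```python
import re

# Simplified email pattern; an assumption, not the project's actual expression.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def build_email_query(full_name, domain):
    """Query that looks for the person's name next to the company domain."""
    return f'"{full_name}" "@{domain}"'


def extract_emails(text, domain):
    """Return unique addresses in text that belong to domain, in first-seen order."""
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email.endswith("@" + domain.lower()) and email not in found:
            found.append(email)
    return found


# Made-up snippet standing in for search result text.
snippet = "Contact Jane Doe at Jane.Doe@example.com or sales@other.org. jane.doe@example.com"
emails = extract_emails(snippet, "example.com")
print(build_email_query("Jane Doe", "example.com"))
print(emails)
```

Filtering by domain keeps unrelated addresses that happen to appear in the same snippet out of the results.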
First of all, to extract emails primarily from Google search results, the task was split into 4 parts:
Name_Spider_1
JobTitle_Spider_2
Domain_Spider_3
Email_Spider_4
Once the 4 spiders above were tested and worked as expected, the next effort was to merge them into one so that they would be easier to maintain.
To that end, 2 spiders were built: one is called “spider4in1” and the other is “spider3in1”.
The “spider4in1” was built, but it has an issue: around 70% of the results found by spider1 are duplicates.
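One possible way to reduce such duplicates is to key each spider1 row on a normalized LinkedIn URL. This is a minimal sketch that assumes the duplicates are variants of the same profile URL (country subdomains, trailing slashes, tracking parameters); the README does not state the cause, and the field names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_linkedin_url(url):
    """Reduce a profile URL to a comparable key, ignoring subdomain, query, and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.endswith("linkedin.com"):
        host = "linkedin.com"  # treat www. and country subdomains as the same site
    return host + parts.path.rstrip("/").lower()


def dedupe_profiles(rows):
    """Keep the first row seen for each normalized LinkedIn URL."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize_linkedin_url(row["linkedin_url"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows in the shape spider1 is described as producing.
rows = [
    {"name": "Jane Doe", "linkedin_url": "https://www.linkedin.com/in/jane-doe/"},
    {"name": "Jane Doe", "linkedin_url": "https://uk.linkedin.com/in/jane-doe?trk=x"},
    {"name": "John Roe", "linkedin_url": "https://www.linkedin.com/in/john-roe"},
]
unique_rows = dedupe_profiles(rows)
print(len(unique_rows))
```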
The way it works is that spider1 uses Google search to extract the “Name” and “LinkedIn URL”.
The raw data from spider1 is fed into spider_3in1.
spider3in1 extracts “job title”, “domain”, and “emails”.
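The data flow described here can be sketched as a small generator that enriches each spider1 row in three steps. The lookup functions are passed in as parameters and replaced by stubs with made-up data, since the real Google-backed lookups are not shown in this README; all names below are hypothetical.

```python
def run_spider3in1(rows, find_job, find_domain, find_emails):
    """Enrich each spider1 row with job title, company, domain, and emails."""
    for row in rows:
        job_title, company = find_job(row["name"], row["linkedin_url"])
        domain = find_domain(job_title, company)
        emails = find_emails(row["name"], domain)
        yield {**row, "job_title": job_title, "company": company,
               "domain": domain, "emails": emails}


# Stub lookups standing in for the real search steps (made-up data).
def fake_find_job(name, linkedin_url):
    return "Research Scientist", "Example Labs"


def fake_find_domain(job_title, company):
    return "example.com"


def fake_find_emails(name, domain):
    return [name.lower().replace(" ", ".") + "@" + domain]


spider1_rows = [{"name": "Jane Doe", "linkedin_url": "https://www.linkedin.com/in/jane-doe"}]
results = list(run_spider3in1(spider1_rows, fake_find_job, fake_find_domain, fake_find_emails))
print(results[0])
```

Passing the lookups in as arguments keeps each step testable on its own, which is close in spirit to testing the 4 original spiders separately before merging them.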
This tool is for educational purposes only. The author is not responsible for any damage caused by its use.