Scraping Sales Leads with Selenium and HubSpot
For this tutorial I will be taking on a task that almost every company faces at some point: filling a database with sales leads. The tools I chose for this project are Python, Selenium and HubSpot, because they are commonly used, well supported and, most importantly, free.
After reading this tutorial you should have a high-level understanding of:
· How to scrape data using Python and Selenium
· How to process data with pandas for upload to the HubSpot API
· How to bulk upload contacts using the HubSpot API
For this example, let’s imagine that I am working for an MLS provider that wants to reach out to real estate agents in the area to sell them access to its listing platform. The first thing we need to do is open a Python file, import all of our packages and set up our Selenium driver.
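As a rough sketch of the driver setup, something like the following should work. I am assuming Chrome and Selenium 4 here; the helper name `make_driver` and the headless flag are my own choices for illustration, not a requirement of the approach.

```python
def make_driver(headless=True):
    """Create a Chrome driver (assumes Selenium 4 and a local Chrome install)."""
    # Imported inside the function so this file can still be imported
    # on a machine where Selenium is not installed yet.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        # Run without opening a visible browser window.
        options.add_argument("--headless=new")

    driver = webdriver.Chrome(options=options)
    # Wait up to 10 seconds for elements to appear before giving up.
    driver.implicitly_wait(10)
    return driver
```

Calling `driver = make_driver()` once at the top of the script gives the rest of the code a single browser session to work with.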
Now, let’s find a website. After a quick Google search I was able to find the website for a popular real estate company that lists contact information for its realtors. Here is the link that we will be scraping; it currently contains contact information for 350 agents in my area.
Now let’s start scraping data. Due to time constraints, I will not be going into detail about selecting HTML elements with the Selenium driver; for more info, please check this documentation. First, let’s loop through the pages of this website and dump all of the data we need into an empty list.
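The page loop could look roughly like the sketch below. The `?page=N` URL pattern and the CSS selector are assumptions for illustration; the real site may paginate differently, so both would need to be adjusted after inspecting the page.

```python
def scrape_pages(driver, base_url, page_count, selector):
    """Visit each results page and collect the text of every matching element."""
    records = []
    for page in range(1, page_count + 1):
        # Hypothetical pagination scheme: ?page=1, ?page=2, ...
        driver.get(f"{base_url}?page={page}")
        # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
        for element in driver.find_elements("css selector", selector):
            records.append(element.text)
    return records
```

Each entry in the returned list is the raw text of one agent card, with line breaks still embedded, ready for cleaning in the next step.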
Great, now that we have the data loaded in memory we can clean it up and process it for upload to the HubSpot API. After a quick glance we can see that this data arrived relatively clean. We will only need to split each record on the ‘\n’ delimiter and strip the ‘M:’ prefix from the beginning of each phone number. Let’s convert the data to a pandas DataFrame and do just that.
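Here is a minimal sketch of that cleanup. I am assuming each scraped record has exactly three lines in the order name, phone, email; the column names and the sample record in the usage note are made up for illustration.

```python
import pandas as pd

def to_frame(raw_records):
    """Split each raw record on newlines and strip the 'M:' phone prefix."""
    # Assumed field order: name, phone, email (one per line).
    rows = [record.split("\n") for record in raw_records]
    df = pd.DataFrame(rows, columns=["name", "phone", "email"])
    # Remove a leading "M:" and any whitespace that follows it.
    df["phone"] = df["phone"].str.replace(r"^M:\s*", "", regex=True).str.strip()
    return df
```

For example, `to_frame(["Jane Doe\nM: 555-010-0001\njane@example.com"])` yields a one-row DataFrame whose `phone` value is `555-010-0001`. If some records have extra or missing lines, the `pd.DataFrame` call will raise, which is a useful early warning that the scrape needs another look.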
Just like that, we have turned this webpage into structured data suitable for a .csv file. For the final step I will be pushing the new contacts into HubSpot CRM using the HubSpot Python API. This API allows admins to bulk upload up to 100 new contacts at a time to HubSpot via Python. The first thing we will need to do is create a temporary folder to hold our .csv files. Next we will need to chunk our DataFrame into smaller pieces that fit the upload limit and save the files to the temporary folder.
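The chunking step can be sketched as follows. The function name `write_chunks` and the `contacts_NNN.csv` file naming are my own choices for illustration; the chunk size of 100 comes from the upload limit mentioned above.

```python
import os

import pandas as pd

def write_chunks(df, folder, chunk_size=100):
    """Write df to folder as CSV files of at most chunk_size rows each."""
    paths = []
    for i, start in enumerate(range(0, len(df), chunk_size)):
        # iloc slicing past the end is safe; the last chunk is just shorter.
        chunk = df.iloc[start:start + chunk_size]
        path = os.path.join(folder, f"contacts_{i:03d}.csv")
        # index=False keeps the DataFrame index out of the upload file.
        chunk.to_csv(path, index=False)
        paths.append(path)
    return paths
```

The temporary folder itself can come from `tempfile.mkdtemp()` in the standard library. With 350 rows and the default chunk size, this writes four files, the last one holding 50 rows.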
Next we will loop through the files in the temporary folder and bulk upload each to the HubSpot API. Lastly, I will delete my temporary folder and quit the Selenium driver.
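The upload loop and cleanup might be structured like this. Note that `upload_file` is a hypothetical placeholder: it stands in for whatever HubSpot client call performs the import, which I am not reproducing here. Passing it in as a parameter keeps the loop independent of any particular client library.

```python
import os
import shutil

def upload_folder(folder, upload_file):
    """Call upload_file(path) for every .csv file in folder, in name order."""
    uploaded = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".csv"):
            continue  # Skip anything that is not one of our chunk files.
        # upload_file is a placeholder for the real HubSpot import call.
        upload_file(os.path.join(folder, name))
        uploaded.append(name)
    return uploaded

def clean_up(folder, driver=None):
    """Delete the temporary folder and close the browser session."""
    shutil.rmtree(folder, ignore_errors=True)
    if driver is not None:
        driver.quit()
```

Keeping cleanup in its own function makes it easy to call from a `finally` block, so the folder and browser are released even if one of the uploads fails.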
After the script has run, we can check the ‘Past Imports’ page in HubSpot to see if the files have uploaded successfully.
After that, we can check the ‘Contacts’ page to ensure everything looks correct.
Success! We now have 350 new contacts in HubSpot!
The full script is available on my GitHub.