Web automation with Selenium and Python

I recently discovered Selenium, a very useful tool for automating browser navigation. Selenium lets you write scripts that automatically perform actions in a web browser (visit a page, click on a link, fill in a form…) and retrieve the results of these actions. In this post, I am going to show you how to use Selenium from Python to automatically send messages to a list of Flickr contacts through the contact form of your Flickr account.

First you can use pip to install the selenium package.

pip install selenium

Selenium requires a driver to interface with the chosen browser. Since we are going to use Firefox in this example, we need to download the latest release of geckodriver and add the directory containing the executable to the system path, because Selenium looks for the driver there. On Unix systems you can use the command export PATH=$PATH: and on Windows you will need to go to System->Advanced->Environment Variables->Path->Edit. Don’t forget to restart your system after adding the path to the geckodriver executable to your system path!
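Before launching the browser, it can help to check that the driver is really visible from Python. The snippet below is only a minimal sketch using the standard library's shutil.which; the helper name find_driver is my own and is not part of Selenium.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("geckodriver was not found on the system path")
    else:
        print("geckodriver found at", path)
```

If the function returns None, the directory containing the executable is probably missing from the system path.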

You can now import the webdriver and the Firefox component from the selenium library.

#Import libraries
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
import time

Then, we import the list of Flickr users’ contact IDs (a one-column CSV file with a header).

#Import the list of user IDs
listcontact = open(pathtoinput)
next(listcontact)  #Skip the header line
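As an alternative to reading the raw file, the IDs could be loaded with the csv module, which skips the header explicitly and closes the file when done. This is a sketch under assumptions: read_contact_ids is a hypothetical helper, and the ';' delimiter is taken from the split(';') call used in the loop of this post.

```python
import csv


def read_contact_ids(path, delimiter=";"):
    """Return the first column of a CSV file, skipping the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip the header line
        # Ignore empty lines and strip stray whitespace around each ID
        return [row[0].strip() for row in reader if row and row[0].strip()]
```

The function returns a plain list of strings, so it can be looped over in the same way as the open file.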

The next step is to create an instance of the Firefox WebDriver with the command webdriver.Firefox(). It is also possible to associate this instance with an existing Firefox profile. Since I didn’t manage to log in automatically to my fake Yahoo account with Selenium, I decided to create a dummy Firefox profile that contains a login cookie for Flickr. In practice, you just need to log in to your Yahoo account from this profile before running the script. Firefox profiles are usually located in the AppData folder on Windows (AppData/Roaming/Mozilla/Firefox/Profiles/xxxx.default) and in a hidden folder in your home directory on Ubuntu (~/.mozilla/firefox/xxxx.default).

#Instantiate a Firefox browser (the profile should already be logged in to Flickr)
profile = FirefoxProfile(pathtoprofile)
browser = webdriver.Firefox(profile)
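If you are unsure which folder to pass as pathtoprofile, a small helper can list the candidates. The sketch below assumes that profile folders contain ".default" in their name, as in the xxxx.default pattern mentioned above; find_profiles is a hypothetical name, not a Selenium or Firefox function.

```python
from pathlib import Path


def find_profiles(root):
    """List the folders under root whose names contain '.default'."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir() if p.is_dir() and ".default" in p.name
    )
```

On Ubuntu you would typically call find_profiles("~/.mozilla/firefox") and pick the dummy profile from the returned list.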

We can finally loop over the Flickr IDs, open the contact form URL of each Flickr user, and automatically fill in the subject and the content of the message. At this stage you need to identify the source code behind the objects you want to interact with. You can install the Firefox plugins Firebug and FirePath to easily identify the paths associated with the fields and buttons on a web page.

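#Loop over your contact list
for line in listcontact:
    #Extract Flickr ID (first field of the line)
    attr = line.rstrip('\n\r').split(';')
    flickr_id = attr[0]

    #Contact form url
    url = "https://www.flickr.com/mail/write/?to=" + flickr_id
    browser.get(url)

    #Identification of subject and message xpath
    subject = browser.find_element_by_xpath(".//*[@id='subject']")
    message = browser.find_element_by_xpath(".//*[@id='message']")

    #Write the subject and the message
    #(subject_text and message_text are placeholders to define yourself)
    subject.send_keys(subject_text)
    message.send_keys(message_text)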
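The parsing and URL steps of the loop can also be isolated in small functions that are easy to check without opening a browser. This is only a sketch: parse_flickr_id and contact_url are hypothetical names, and leaving "@" unescaped is my assumption so that the result matches plain string concatenation for IDs that contain it.

```python
from urllib.parse import quote

BASE_URL = "https://www.flickr.com/mail/write/?to="


def parse_flickr_id(line, delimiter=";"):
    """Return the first field of a line from the contact file."""
    return line.rstrip("\r\n").split(delimiter)[0].strip()


def contact_url(flickr_id):
    """Build the contact-form URL for one Flickr user."""
    # Escape unsafe characters but keep "@" as-is (assumption, see above)
    return BASE_URL + quote(flickr_id, safe="@")
```

Inside the loop, url = contact_url(parse_flickr_id(line)) would then replace the manual split and concatenation.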

The code above is available on my website along with an R script to download data with the Flickr API.