
# Using Selenium with PHP to crawl web pages


Selenium is an awesome tool for automating the testing of your application, although there are a number of better-performing headless solutions available today for testing (PhantomJS, Zombie.js). Selenium can still be extremely useful for loading a web page, performing some actions such as a search, and extracting data from it.

Here are two answers I posted on StackOverflow.com that demonstrate the basic concept. Let’s have a look and get up and running with Selenium.

## Installing the tools

Before we do anything, we need to install all the tools. This is straightforward; nothing out of the ordinary here.

1. Install the Facebook WebDriver bindings for PHP with Composer.

composer require facebook/webdriver

2. Download the Selenium standalone server and start it.

# Download it (selenium-server-standalone-2.53.0.jar)

# Start it
java -jar selenium-server-standalone-2.53.0.jar

# INFO - Launching a standalone Selenium Server
# INFO - Java: Oracle Corporation 25.45-b02
# ... ... ...
# INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
# INFO - Selenium Server is up and running

3. Download the QuickJava Firefox extension.

# Download QuickJava to your project dir
# We'll need to reference it later on

4. Download Firefox, if you don’t have it already. We’ll use Firefox as our web driver for this tutorial.
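Before moving on, it can help to confirm from PHP that the server is actually listening. The following is a minimal sketch, assuming the server runs at the address printed in the startup log above and answers on the `/status` path under the hub URL. The function names `seleniumStatusUrl` and `isSeleniumReachable` are made up for this example and are not part of the WebDriver library.

```php
<?php

/**
 * Build the status URL for a Selenium hub URL (hypothetical helper).
 */
function seleniumStatusUrl(string $hubUrl): string
{
    return rtrim($hubUrl, '/') . '/status';
}

/**
 * Return true if the hub answers an HTTP request within $timeout seconds.
 */
function isSeleniumReachable(string $hubUrl, float $timeout = 2.0): bool
{
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    // @ silences the warning PHP raises when the connection is refused
    $body = @file_get_contents(seleniumStatusUrl($hubUrl), false, $context);

    return $body !== false;
}

// Address taken from the startup log; adjust it if your server runs elsewhere
$hub = 'http://127.0.0.1:4444/wd/hub';
echo isSeleniumReachable($hub)
    ? "Selenium is up\n"
    : "Selenium is not reachable at $hub\n";
```

This only checks that something responds over HTTP; it does not verify that a browser session can be created.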

## Using Selenium with the PHP Facebook WebDriver

Now that we’ve got everything installed and up and running, let’s start playing with it!

Somewhere in your project (or in a new PHP script), place the following code. Remember to change the QuickJava path to wherever you downloaded it.

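use Facebook\WebDriver\Firefox\FirefoxDriver;
use Facebook\WebDriver\Firefox\FirefoxProfile;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;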

// Change this to the path of your xpi
// (kernel.root_dir is a Symfony container parameter; outside Symfony, use your own path)
$rootDir = $this->container->getParameter('kernel.root_dir');
$extensionPath = $rootDir.'/../bin/selenium/quickjava-2.0.6-fx.xpi';

// Build our firefox profile
$profile = new FirefoxProfile();
$profile->addExtension($extensionPath);
$profile->setPreference('thatoneguydotnet.QuickJava.curVersion', '2.0.6.1');

// Disable all these
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.Images', 2);
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.AnimatedImage', 2);
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.CSS', 2);
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.Flash', 2);
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.Java', 2);
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.Silverlight', 2);

//$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.Cookies', 2);
//$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.JavaScript', 2);

// Create the desired capabilities (DC)
$dc = DesiredCapabilities::firefox();
$dc->setCapability(FirefoxDriver::PROFILE, $profile);

// Create our new driver
// $host is the address the Selenium server printed at startup
$host = 'http://127.0.0.1:4444/wd/hub';
$driver = RemoteWebDriver::create($host, $dc);
$driver->get('http://stackoverflow.com');

// The HTML source code
$html = $driver->getPageSource();

// Firefox should be open and you can see no images or CSS were loaded

## Disable loading images with selenium

You can also disable the loading of images; this dramatically improves speed and saves bandwidth. We can do this by simply setting a preference on our FirefoxProfile.

// Build our firefox profile
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.Images', 2);

// Create DC ...
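Since every QuickJava switch in this post follows the same pattern, the preference names can be generated rather than typed out. Here is a small sketch under that assumption; the helper `quickJavaDisablePreferences` is hypothetical, and the key prefix and the value `2` are simply copied from the snippets in this post.

```php
<?php

/**
 * Build QuickJava startup preferences that disable the given features.
 * Hypothetical helper: prefix and value 2 are taken from the snippets in this post.
 */
function quickJavaDisablePreferences(array $features): array
{
    $prefs = [];
    foreach ($features as $feature) {
        $prefs['thatoneguydotnet.QuickJava.startupStatus.' . $feature] = 2;
    }

    return $prefs;
}

$prefs = quickJavaDisablePreferences(['Images', 'CSS', 'Flash']);

// Applying them to an existing FirefoxProfile ($profile is assumed to exist):
// foreach ($prefs as $name => $value) {
//     $profile->setPreference($name, $value);
// }

print_r($prefs);
```

Keeping the feature list in one array makes it easy to toggle what gets disabled for a given crawl.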


// Build our firefox profile
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.Cookies', 2);
$profile->setPreference('thatoneguydotnet.QuickJava.startupStatus.JavaScript', 2);
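Once `getPageSource()` has returned the HTML, the extraction step can be done with plain PHP. This is one possible sketch, not the only approach: it assumes the DOM extension is enabled, uses a hard-coded string as a stand-in for the real page source, and the function name `extractLinks` is invented for this example.

```php
<?php

/**
 * Extract link text and href pairs from an HTML string (hypothetical helper).
 */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect parse warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    // Only anchors that actually carry an href attribute
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = [
            'text' => trim($anchor->textContent),
            'href' => $anchor->getAttribute('href'),
        ];
    }

    return $links;
}

// Stand-in for $driver->getPageSource()
$html = '<html><body><a href="/questions">Questions</a> <a href="/tags"> Tags </a><a>No href</a></body></html>';

print_r(extractLinks($html));
```

In a real crawl you would pass the `$html` variable from the driver instead of the sample string, and swap the XPath expression for whatever elements you are after.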