How to write a simple scraper in PHP without Regex

Web scrappers are simple programs that are used to extract certain data from the web. Usually the structure of the the pages is known so scrappers have reduced complexity compared to parsers and crawlers.

In this tutorial we are going to create a simple parser that extract the title and favicon from any html page.

Usually scrappers are based on regular expressions but we are going to avoid them because they are difficult to manage and sometimes they have unexpected results. We are going to use simple php string functions instead.
Continue reading

Php Class to Retrieve Alexa Rank

Alexa is a service acquired by Amazon which offers a web traffic report for websites. They retrieve the data from toolbar that can be installed in different browsers, centralize the data and display reports to anyone. The most important indicator is the Alexa Rank. It represents the rank of a webpage in a list of all the websites. It’s not the 100% accurate but it gives a good indication.

Alexa does not offer any free API to obtain Alexa Rank. However there is a simple method to obtain it in the same way the Alexa Toolbar does. All you have to do is to invoke the following url(replacing php-html.net with your domain): http://data.alexa.com/data?cli=10&dat=s&url=php-html.net
Continue reading

How to handle URLs in PHP

URL handling is one of the tasks you have to do from time to time in PHP. Sometimes you have to do it because you want to record the referral sites, other times because you want to write your own spider or just because you want to retrieve your current URL.

PHP is a language developed around web for web developers and it contains all the functions you might need in your quests. There is a section in php documentation which groups the URL functions. Along with a few functions used to encode/decode which are rarely used the package contains the functions you can not live without:
Continue reading