HTML5 is one of the most fashionable topics in today’s web programming world. HTML5 contains many new features that promise to change the face of the web, big players like Google and Apple are pushing for it, and browsers compete to implement more and more HTML5 features. Under these conditions, it is no surprise that everyone is talking about it.
In this post we are going to analyse in depth a simple HTML5 feature that is already implemented in all the modern browsers: local storage. Local Storage is just one part of the Web Storage API. It is defined in the HTML5 specification and is supported by all the modern browsers:
Web scrapers are simple programs used to extract specific data from the web. Usually the structure of the pages is known, so scrapers are less complex than parsers and crawlers.
In this tutorial we are going to create a simple scraper that extracts the title and favicon from any HTML page.
Scrapers are usually based on regular expressions, but we are going to avoid them because they are difficult to maintain and sometimes produce unexpected results. We are going to use simple PHP string functions instead.
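The approach can be sketched with nothing but stripos(), strpos() and substr(). This is a minimal sketch, not the tutorial's original code: it assumes reasonably well-formed markup with quoted attributes, and the helper names extract_between(), attr_value(), extract_title() and extract_favicon() are hypothetical.

```php
<?php
// Minimal sketch: pull the <title> text and the favicon href out of an HTML
// string with plain string functions (no regular expressions). It assumes
// reasonably well-formed, quoted markup; the helper names are hypothetical.

// Returns the text between $start and the next $end, or null if either is missing.
function extract_between(string $haystack, string $start, string $end): ?string
{
    $from = stripos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = stripos($haystack, $end, $from);
    return $to === false ? null : substr($haystack, $from, $to - $from);
}

// Reads a quoted attribute (name="..." or name='...') from a single tag.
function attr_value(string $tag, string $name): ?string
{
    foreach (['"', "'"] as $quote) {
        $value = extract_between($tag, $name . '=' . $quote, $quote);
        if ($value !== null) {
            return $value;
        }
    }
    return null;
}

function extract_title(string $html): ?string
{
    $open = stripos($html, '<title');
    if ($open === false) {
        return null;
    }
    $gt = strpos($html, '>', $open);          // end of the opening <title ...> tag
    $close = $gt === false ? false : stripos($html, '</title>', $gt);
    if ($close === false) {
        return null;
    }
    return trim(html_entity_decode(substr($html, $gt + 1, $close - $gt - 1)));
}

// Returns the href of the first <link> tag whose rel attribute contains "icon".
function extract_favicon(string $html): ?string
{
    $pos = 0;
    while (($start = stripos($html, '<link', $pos)) !== false) {
        $end = strpos($html, '>', $start);
        if ($end === false) {
            break;
        }
        $tag = substr($html, $start, $end - $start + 1);
        $rel = attr_value($tag, 'rel');
        if ($rel !== null && stripos($rel, 'icon') !== false) {
            return attr_value($tag, 'href');
        }
        $pos = $end + 1;
    }
    return null;
}
```

When extract_favicon() returns null, a caller could fall back to /favicon.ico at the site root, which is the path browsers conventionally request.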
Caching is a programming concept used in a wide range of applications and for various purposes. A cache library can store database query results for later use, store rendered pages so they can be served again without being regenerated, or save indexed pages in a crawler application so that multiple modules can process them.
A cache mechanism is simpler than it might sound. It is just a module that implements two actions:
- store a value (identified by a key);
- retrieve a value, if it has not expired;
- additionally, it can offer a way to invalidate a set of values or the entire cache.
In this tutorial we are going to create a disk cache script. It stores string values in files: each value is stored in its own file, alongside an additional file that holds the expiration date. Performance-wise this is not the best approach, but the script is designed this way on purpose: the additional file can hold other attributes besides the expiration date. Imagine an application that crawls pages with several modules: each time a module processes a page, it adds its result to the additional file.
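A minimal sketch of that design follows, assuming values are strings and the expiration time is given in seconds. The DiskCache class, its methods, and the .data/.meta file layout are hypothetical illustrations, not the original script; the attributes file is JSON so that other modules can add their own keys later.

```php
<?php
// Minimal sketch of the disk cache described above: each value is stored in
// "<md5(key)>.data" and its attributes in a sibling "<md5(key)>.meta" file.
// The class, method, and file names are hypothetical, not the original script.

class DiskCache
{
    private string $dir;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0777, true);
        }
    }

    // Hashing the key turns any string into a safe file name.
    private function path(string $key, string $ext): string
    {
        return $this->dir . '/' . md5($key) . '.' . $ext;
    }

    // Stores a value; the meta file is JSON so other modules can add attributes.
    public function set(string $key, string $value, int $ttlSeconds): void
    {
        file_put_contents($this->path($key, 'data'), $value);
        $meta = ['expires' => time() + $ttlSeconds];
        file_put_contents($this->path($key, 'meta'), json_encode($meta));
    }

    // Returns the value, or null if it is missing or expired.
    public function get(string $key): ?string
    {
        $metaFile = $this->path($key, 'meta');
        if (!is_file($metaFile)) {
            return null;
        }
        $meta = json_decode((string) file_get_contents($metaFile), true);
        if (!is_array($meta) || ($meta['expires'] ?? 0) < time()) {
            $this->delete($key);  // expired or unreadable: remove both files
            return null;
        }
        $value = @file_get_contents($this->path($key, 'data'));
        return $value === false ? null : $value;
    }

    // Removes a single entry (value file and attributes file).
    public function delete(string $key): void
    {
        foreach (['data', 'meta'] as $ext) {
            $file = $this->path($key, $ext);
            if (is_file($file)) {
                unlink($file);
            }
        }
    }

    // Invalidates the entire cache.
    public function clear(): void
    {
        $files = array_merge(glob($this->dir . '/*.data') ?: [], glob($this->dir . '/*.meta') ?: []);
        foreach ($files as $file) {
            unlink($file);
        }
    }
}
```

In the crawler scenario, a module could read a page's .meta file, add its own result under a new key, and write the JSON back without touching the cached value.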
Alexa is a service acquired by Amazon that offers web traffic reports for websites. It collects data from a toolbar that can be installed in various browsers, centralizes the data, and displays reports to anyone. The most important indicator is the Alexa Rank, which represents a website’s position in a ranking of all websites. It is not 100% accurate, but it gives a good indication.
Alexa does not offer a free API for obtaining the Alexa Rank. However, there is a simple way to obtain it, the same way the Alexa Toolbar does. All you have to do is invoke the following URL (replacing php-html.net with your domain): http://data.alexa.com/data?cli=10&dat=s&url=php-html.net
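The request can be sketched in PHP as below. This is a hedged sketch: it assumes the response is XML containing an element like <POPULARITY URL="example.com/" TEXT="12345" ... /> with the rank in the TEXT attribute, and the endpoint may no longer answer at all. The functions alexa_url(), parse_alexa_rank() and alexa_rank() are hypothetical names; parsing uses string functions so it does not depend on an XML extension.

```php
<?php
// Sketch only. Assumptions: the endpoint returns XML containing an element
// such as <POPULARITY URL="example.com/" TEXT="12345" ... />, with the rank
// in TEXT, and it may no longer respond at all. Helper names are hypothetical.

// Builds the toolbar-style URL shown above for a given domain.
function alexa_url(string $domain): string
{
    return 'http://data.alexa.com/data?' . http_build_query([
        'cli' => 10,
        'dat' => 's',
        'url' => $domain,
    ]);
}

// Extracts the TEXT attribute of the <POPULARITY> tag with string functions.
function parse_alexa_rank(string $xml): ?int
{
    $open = strpos($xml, '<POPULARITY');
    $close = $open === false ? false : strpos($xml, '>', $open);
    if ($close === false) {
        return null;
    }
    $tag = substr($xml, $open, $close - $open + 1);
    $start = strpos($tag, 'TEXT="');
    if ($start === false) {
        return null;
    }
    $start += strlen('TEXT="');
    $end = strpos($tag, '"', $start);
    if ($end === false) {
        return null;
    }
    $rank = substr($tag, $start, $end - $start);
    $isNumber = $rank !== '' && strspn($rank, '0123456789') === strlen($rank);
    return $isNumber ? (int) $rank : null;
}

// Fetching a URL with file_get_contents() requires allow_url_fopen=On.
function alexa_rank(string $domain): ?int
{
    $xml = @file_get_contents(alexa_url($domain));
    return $xml === false ? null : parse_alexa_rank($xml);
}
```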
The model-view-controller (MVC) pattern is the most widely used pattern in today’s web applications. It was first used in Smalltalk and was later adopted and popularized by Java. There are currently more than a dozen PHP web frameworks based on the MVC pattern.
Despite the popularity of the MVC pattern in PHP, it is hard to find a proper tutorial accompanied by a simple source code example. That is the purpose of this tutorial.
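As a rough illustration of how the three roles divide the work, here is a minimal sketch, not the tutorial's own code. The BookModel, BookView and BookController classes and their sample data are hypothetical; the controller receives the query array as a parameter (standing in for $_GET) so it can be exercised without a web server.

```php
<?php
// Minimal MVC sketch: the model owns the data, the view turns data into
// HTML, and the controller maps a request to model calls plus a view.
// All class and method names here are hypothetical illustrations.

class BookModel
{
    private array $books = [
        1 => ['title' => 'Design Patterns', 'author' => 'Gamma et al.'],
        2 => ['title' => 'Refactoring', 'author' => 'Fowler'],
    ];

    public function all(): array
    {
        return $this->books;
    }

    public function find(int $id): ?array
    {
        return $this->books[$id] ?? null;
    }
}

class BookView
{
    public function renderList(array $books): string
    {
        $items = '';
        foreach ($books as $id => $book) {
            $items .= '<li><a href="?book=' . $id . '">'
                . htmlspecialchars($book['title']) . '</a></li>';
        }
        return "<ul>$items</ul>";
    }

    public function renderBook(array $book): string
    {
        return '<h1>' . htmlspecialchars($book['title']) . '</h1>'
            . '<p>by ' . htmlspecialchars($book['author']) . '</p>';
    }
}

class BookController
{
    private BookModel $model;
    private BookView $view;

    public function __construct(BookModel $model, BookView $view)
    {
        $this->model = $model;
        $this->view = $view;
    }

    // $query stands in for $_GET so the controller is easy to test.
    public function handle(array $query): string
    {
        if (isset($query['book'])) {
            $book = $this->model->find((int) $query['book']);
            return $book === null ? '<p>Book not found</p>' : $this->view->renderBook($book);
        }
        return $this->view->renderList($this->model->all());
    }
}

// In a front controller (e.g. index.php) one might write:
// echo (new BookController(new BookModel(), new BookView()))->handle($_GET);
```

Keeping the view free of data access and the model free of HTML is what lets either be replaced (for example, a database-backed model) without touching the other two.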
URL handling is a task you have to do from time to time in PHP: sometimes because you want to record referring sites, other times because you want to write your own spider, or simply because you want to retrieve the current URL.
PHP is a language developed around the web for web developers, and it contains all the functions you might need along the way. The PHP documentation has a section that groups the URL functions. Besides a few rarely used encoding/decoding functions, the package contains the functions you cannot live without:
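The three tasks mentioned above can be sketched with built-ins such as parse_url(), parse_str() and http_build_query(). This is a sketch under stated assumptions: current_url() and referrer_host() are hypothetical helpers that take a $_SERVER-style array, and HTTP_HOST and HTTP_REFERER come from client-supplied headers, so they should not be trusted blindly.

```php
<?php
// Sketch of common URL tasks with built-in functions: parse_url() splits a
// URL into parts, parse_str() decodes the query string, and
// http_build_query() rebuilds one. The two helpers below are hypothetical.

// Reconstructs the URL of the current request from $_SERVER-style data.
// HTTP_HOST comes from the client's Host header, so treat it as untrusted.
function current_url(array $server): string
{
    $https = !empty($server['HTTPS']) && $server['HTTPS'] !== 'off';
    return ($https ? 'https' : 'http') . '://' . $server['HTTP_HOST'] . $server['REQUEST_URI'];
}

// Host name of the referring site, e.g. for logging where visitors come from.
function referrer_host(array $server): ?string
{
    if (empty($server['HTTP_REFERER'])) {
        return null;
    }
    $host = parse_url($server['HTTP_REFERER'], PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

// Split a URL, change one query parameter, and put it back together.
$parts = parse_url('https://php-html.net/tutorials/page.php?id=7&lang=en#top');
parse_str($parts['query'], $query);   // $query = ['id' => '7', 'lang' => 'en']
$query['lang'] = 'fr';
echo $parts['scheme'] . '://' . $parts['host'] . $parts['path'] . '?' . http_build_query($query), "\n";
// prints https://php-html.net/tutorials/page.php?id=7&lang=fr
```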
I often need to run shell commands in a hosting environment, and I do it via a simple PHP script. I have tested it on GoDaddy, DreamHost, and other hosting environments, and it works fine.
Before starting the tutorial, note that if this script is not handled carefully it can have undesired results. A wrong rm command can delete all the files on your hosting account, so run commands with care.
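The core of such a script can be sketched with exec(), which returns the output lines and the exit code. This is a minimal sketch, not the original script: it assumes exec() is not listed in the host's disable_functions and that the script itself is protected (for example with HTTP authentication), because anyone who can reach it can run commands as the web server user. run_command() is a hypothetical helper.

```php
<?php
// Sketch of running a shell command from PHP and capturing its output.
// Assumptions: exec() is not disabled on the host, and the script is
// protected, since anyone who can reach it can run commands as the web
// server user. run_command() is a hypothetical helper.

function run_command(string $command): array
{
    $output = [];
    $exitCode = 0;
    // 2>&1 folds stderr into the captured output so errors are visible too.
    exec($command . ' 2>&1', $output, $exitCode);
    return ['exit' => $exitCode, 'output' => implode("\n", $output)];
}

// Quote any user-supplied argument with escapeshellarg() before use.
$dir = '.';
$result = run_command('ls -la ' . escapeshellarg($dir));
echo htmlspecialchars($result['output']), "\n";
```

Checking the exit code, rather than only the printed output, is what tells you whether a command such as a cleanup actually succeeded.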
It is not a common problem, but sometimes you have to check whether two texts are similar. If you have ever aggregated data from multiple sources, you probably know what I’m talking about.
The simplest thing you can try is to compare the two strings directly. However, a simple comparison will not help if one of the strings contains an extra space. Such cases require a more robust algorithm. Fortunately, PHP provides several functions that can be used.
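A minimal sketch of one possible check: normalise case and whitespace, then fall back to similar_text(), which reports a 0 to 100 similarity percentage. The texts_similar() helper and its 90% default threshold are hypothetical choices to tune on real data; levenshtein(), soundex() and metaphone() are shown as alternative built-in measures.

```php
<?php
// Sketch: decide whether two texts are "similar" with PHP built-ins.
// texts_similar() and its 90% default threshold are hypothetical choices.

function normalise_text(string $s): string
{
    // Lower-case and collapse runs of whitespace so "Foo  Bar " equals "foo bar".
    return strtolower(trim(preg_replace('/\s+/', ' ', $s)));
}

function texts_similar(string $a, string $b, float $minPercent = 90.0): bool
{
    $a = normalise_text($a);
    $b = normalise_text($b);
    if ($a === $b) {
        return true;
    }
    similar_text($a, $b, $percent);  // $percent receives a 0-100 score
    return $percent >= $minPercent;
}

// Other built-ins measure similarity differently:
echo levenshtein('kitten', 'sitting'), "\n";            // edit distance: 3
echo soundex('Robert'), ' ', soundex('Rupert'), "\n";   // same phonetic key: R163 R163
echo metaphone('Thompson'), "\n";                       // key based on English pronunciation
```

Edit distance suits typos, while soundex() and metaphone() suit names that sound alike but are spelled differently.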
Sending mail from PHP can cause certain problems. Usually mail is sent from PHP through a simple function: mail(…). However, the function relies on mail settings in the php.ini file (a sendmail binary on Unix, an SMTP server on Windows). Not all hosting providers configure it, and on shared hosting you cannot change the PHP configuration. In this case another alternative should be used. This tutorial describes two PHP libraries I’ve tried for sending mail: PHPMailer and the Mail PEAR package.
1. SVN commands on the server
Create a repository: to create an SVN repository (server) you need access to the server machine. The command works only on local paths; it cannot work on network paths (Windows and Linux examples):
svnadmin create c:\svn\serverfiles
svnadmin create /svn/serverfiles
Start the server: to start the server, run the following command (if svnserve is not in your PATH, add it or use the full path name):
svnserve --daemon --root c:\svn\serverfiles