Caching is a programming concept that applies to a wide range of applications and purposes. A cache library can store the results of database queries for later use, keep rendered pages so they can be served again without being regenerated, or save indexed pages in a crawler application so that multiple modules can process them.
A cache mechanism is simpler than it might sound. It is just a module that implements two actions:
- store a value (identified by a key);
- retrieve a value if it has not expired.
Additionally, it can offer a mechanism to invalidate a set of values or the entire cache.
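To make these actions concrete before moving to disk, here is a minimal in-memory sketch. The class name `TinyMemoryCache`, the `$ttl` parameter and the `invalidate` method are hypothetical and are not part of the tutorial's SimpleCache class; they only illustrate the contract described above.

```php
<?php
// Hypothetical illustration only: an in-memory cache showing the three actions.
class TinyMemoryCache
{
    private $items = [];

    // Store a value under a key, valid for $ttl seconds.
    public function put($key, $value, $ttl = 60)
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttl];
    }

    // Return the value if it is present and not expired, otherwise null.
    public function get($key)
    {
        if (!isset($this->items[$key])) {
            return null;
        }
        if ($this->items[$key]['expires'] < time()) {
            unset($this->items[$key]); // drop the stale entry
            return null;
        }
        return $this->items[$key]['value'];
    }

    // Invalidate one key, or the entire cache when no key is given.
    public function invalidate($key = null)
    {
        if ($key === null) {
            $this->items = [];
        } else {
            unset($this->items[$key]);
        }
    }
}
```

The disk cache built in the rest of the tutorial follows the same put/get shape, with files taking the place of the array.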
In this tutorial we are going to create a disk cache script. It stores string values in files: each value gets its own file, plus an additional file that holds the expiration data. Performance-wise this is not the best approach, but the script is designed this way for a clear purpose: the additional file can store other attributes besides the expiration data. Imagine an application that crawls pages with different modules. Each time a module processes a page, it adds its result to the additional file.
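As a rough illustration of the crawler idea, the additional file could hold a small JSON document instead of a bare timestamp. This is only a sketch under that assumption: the tutorial's code stores nothing but the time, and the helper names `readInfo` and `addModuleResult` as well as the `created` and `modules` keys are made up for the example.

```php
<?php
// Hypothetical helpers: the tutorial's .info file holds only a timestamp.
// Here we assume it holds JSON so that modules can attach their results.

// Read the info file and return its attributes as an array.
function readInfo($infoFile)
{
    if (!file_exists($infoFile)) {
        return [];
    }
    $data = json_decode(file_get_contents($infoFile), true);
    return is_array($data) ? $data : [];
}

// Record the result of one module next to the other attributes.
function addModuleResult($infoFile, $module, $result)
{
    $info = readInfo($infoFile);
    if (!isset($info['created'])) {
        $info['created'] = time(); // time of the first write
    }
    $info['modules'][$module] = $result;
    file_put_contents($infoFile, json_encode($info));
    return $info;
}
```

Each module would call `addModuleResult` with its own name, so the file accumulates one entry per module while keeping the creation time.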
The cache library in our tutorial is based only on a few functions grouped in a single class. Let’s start with the skeleton of the SimpleCache class:
class SimpleCache
{
    public function exists($key) { }

    public function get($key) { }

    public function put($key, $content) { }
}
Once we have the skeleton class, we create a test class:
// SimpleCacheTest.php - file to test the class
require_once "SimpleCache.php";

// instantiate the class
$cache = new SimpleCache();

// cache a value
$cache->put('key', 'this value');

// retrieve the cached value by key
$val = $cache->get('key');

// assert the retrieved value
if ($val != 'this value')
    echo 'Something went wrong!';
else
    echo 'Value as expected. Cache Test Passed';
In the next step we add variables to the class, for configuration:
- the location where the cache files are stored. The location can be absolute or relative to the starting point of the script (in the code below it is the cache subdirectory, located in the same directory that index.php is invoked from).
- the expiry interval, in seconds, after which cached values expire.
class SimpleCache
{
    private $cacheDir = 'cache';
    private $expiryInterval = 2592000; // 30*24*60*60

    public function setCacheDir($val) { $this->cacheDir = $val; }

    public function setExpiryInterval($val) { $this->expiryInterval = $val; }

    ...
}
Then we implement the put function. The code is very simple: we create the cache directory if it doesn’t exist, then we write the data to the main cache file and the time of the last update to the additional file:
public function put($key, $content)
{
    $time = time(); // current time

    if (! file_exists($this->cacheDir))
        mkdir($this->cacheDir);

    $filename_cache = $this->cacheDir . '/' . $key . '.cache'; // cache filename
    $filename_info = $this->cacheDir . '/' . $key . '.info';   // cache info

    file_put_contents($filename_cache, $content); // save the content
    file_put_contents($filename_info, $time);     // save the time of last cache update
}
The method to retrieve a value is just as simple. We read the time of the last update from the additional file and add the expiry interval to it. If both files exist and the data has not expired, we read the content of the cache file and return it:
public function get($key)
{
    $filename_cache = $this->cacheDir . '/' . $key . '.cache'; // cache filename
    $filename_info = $this->cacheDir . '/' . $key . '.info';   // cache info (time of last update)

    if (file_exists($filename_cache) && file_exists($filename_info)) {
        // the entry expires at: time of last update + expiry interval
        $expiry_time = (int) file_get_contents($filename_info) + (int) $this->expiryInterval;
        $time = time(); // current time

        if ($expiry_time >= $time) { // the entry is still valid
            return file_get_contents($filename_cache); // get contents from file
        }
    }

    return null;
}
We create an additional function to check whether valid data exists in the cache, similar to the get function, for cases where we need to check without reading the value:
public function exists($key)
{
    $filename_cache = $this->cacheDir . '/' . $key . '.cache'; // cache filename
    $filename_info = $this->cacheDir . '/' . $key . '.info';   // cache info (time of last update)

    if (file_exists($filename_cache) && file_exists($filename_info)) {
        // the entry expires at: time of last update + expiry interval
        $expiry_time = (int) file_get_contents($filename_info) + (int) $this->expiryInterval;
        $time = time(); // current time

        if ($expiry_time >= $time) { // the entry is still valid
            return true;
        }
    }

    return false;
}
Now that the code is written, it can be tested using the test file. In a future tutorial we are going to rewrite this cache mechanism using the file modification time instead of the additional file.
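For readers curious about that variant, here is a minimal sketch of how it might look. It is not the follow-up tutorial's code: the function names `mtimeCachePut` and `mtimeCacheGet` are hypothetical, and the example assumes one file per key with the expiry interval passed in by the caller.

```php
<?php
// Hypothetical sketch: expiration based on the file modification time,
// so no additional .info file is needed.

// Write the content; the file system records the modification time.
function mtimeCachePut($file, $content)
{
    file_put_contents($file, $content);
}

// Return the content if the file exists and was modified recently enough.
function mtimeCacheGet($file, $expiryInterval)
{
    clearstatcache(true, $file); // avoid a stale cached modification time
    if (!file_exists($file)) {
        return null;
    }
    if (filemtime($file) + $expiryInterval < time()) {
        return null; // expired
    }
    return file_get_contents($file);
}
```

The trade-off is that the file's modification time is the only metadata available, so per-entry attributes would need another home.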
Why store the time when cache was created in a separate file? You can use filemtime() for that, remember to use touch() when you write to cache file
First of all, I wanted to have something very robust (I was not very confident that the file change date is never altered by other processes on Windows or Linux, though I assume it should be safe). Another reason is that I wanted the option to set a different expiry interval for different files. Let’s assume you write a crawler that caches pages, and you want some files cached for one hour and others for one week (if the page contains a 404 error I would cache it for one hour, then if it returns the same code 3 times I would increase the expiry time to 30 days). And another reason is my twisted mind.
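A per-file expiry interval could be sketched like this, assuming the additional file stores the absolute expiry timestamp instead of the time of the last update. The function names `putWithExpiry` and `getWithExpiry` are hypothetical and not part of the SimpleCache class above.

```php
<?php
// Hypothetical sketch: each entry carries its own expiry time in the .info file.

function putWithExpiry($dir, $key, $content, $expiryInterval)
{
    if (!file_exists($dir)) {
        mkdir($dir, 0777, true);
    }
    file_put_contents($dir . '/' . $key . '.cache', $content);
    // store the moment the entry expires, not the moment it was written
    file_put_contents($dir . '/' . $key . '.info', (string) (time() + $expiryInterval));
}

function getWithExpiry($dir, $key)
{
    $cacheFile = $dir . '/' . $key . '.cache';
    $infoFile = $dir . '/' . $key . '.info';

    if (file_exists($cacheFile) && file_exists($infoFile)) {
        $expiresAt = (int) file_get_contents($infoFile);
        if ($expiresAt >= time()) {
            return file_get_contents($cacheFile);
        }
    }
    return null;
}
```

With this layout a 404 page could be stored with an interval of 3600 seconds and a regular page with 604800, and both would be read back by the same function.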
However, your question is valid; I’m planning to create a version based on file change date.
I would prefer an opcode cache to a file cache. Speed vs. disk reading… Nice demo 🙂
IMHO, it’s a good pedagogic sample, but you often need some small enhancements.
In production, PEAR::Cache_Lite is light but complete for most cases.
Nice tutorial, with good explain for each step.
But I think that your code creates a bottleneck with a lot of I/O access…
I have already written some file caches, and I think the best way is to remove the cache file when it expires. You use a cron job to remove those files, and the cache is regenerated when a visitor needs to access the data. The PHP code only needs to test whether the cache file exists, and generate it only if it is not available.
There are a lot of problems with caching, but this post is just about a “simple cache” 🙂
But we can discuss about it 😉
Thanks, you are right about the bottleneck. Your approach is much better performance-wise. When I created the script, performance was not a priority; a crawler, for example, is not supposed to work in real time. File deletion might as well be a separate process, or it could be a simple hook that empties a value once all the modules finish processing it.
There is another option which performs better, using the file datetime field for expiration purposes. Maybe in a later post I’ll cover those 2 options.
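The cleanup approach discussed in this exchange could be sketched roughly as follows. Both function names are hypothetical, and the example assumes the purge function is called from a script that cron runs periodically, while page requests only check whether the file exists.

```php
<?php
// Hypothetical sketch of "delete on expiry, regenerate on demand".

// Meant to be run periodically (for example from cron): remove old cache files.
function purgeExpired($cacheDir, $maxAge)
{
    clearstatcache();
    $removed = 0;
    foreach (glob($cacheDir . '/*.cache') ?: [] as $file) {
        if (filemtime($file) + $maxAge < time()) {
            unlink($file);
            $removed++;
        }
    }
    return $removed;
}

// Called on each request: a single existence check, no expiry arithmetic.
function fetchOrGenerate($cacheDir, $key, callable $generate)
{
    $file = $cacheDir . '/' . $key . '.cache';
    if (file_exists($file)) {
        return file_get_contents($file);
    }
    $content = $generate(); // regenerate only when the file is missing
    file_put_contents($file, $content);
    return $content;
}
```

The read path touches only one file per request, which is where the I/O saving comes from; the cost is that entries can outlive their interval until the next purge runs.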
Thanks, Cache Lite looks very nice, especially the file locking option for concurrent uses.
I don’t understand why I should use touch() when writing to the cache file.
I created one cache file without touching it and another where I called touch() before writing the contents, then I used fileatime() and filemtime() and got the same result for both files.
Can you explain it to me?
Thanks a lot for the script!
I have no idea if it’s the best solution or not, but it’s working just fine for me 🙂
cool, this script was nice …
sometimes create 2 diff file, one with content(big size), and time info(less size) is nice idea 😀
Nice post.
In your setter methods, like setCacheDir and setExpiryInterval, you can end them with return $this; that will allow chaining.
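A minimal sketch of that suggestion, using a stand-alone class so it runs on its own. The class name `ChainableCacheConfig` and the getters are hypothetical; the two setters mirror the ones in the tutorial with `return $this` added.

```php
<?php
// Hypothetical stand-alone class showing chainable setters.
class ChainableCacheConfig
{
    private $cacheDir = 'cache';
    private $expiryInterval = 2592000; // 30*24*60*60

    public function setCacheDir($val)
    {
        $this->cacheDir = $val;
        return $this; // returning the object allows chaining
    }

    public function setExpiryInterval($val)
    {
        $this->expiryInterval = $val;
        return $this;
    }

    public function getCacheDir() { return $this->cacheDir; }

    public function getExpiryInterval() { return $this->expiryInterval; }
}

// Both settings in one expression:
$config = (new ChainableCacheConfig())
    ->setCacheDir('tmp/cache')
    ->setExpiryInterval(3600);
```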
Pretty sweet script for caching the server side files.
Very nice!
Thanks for your code.
You could make it available on GitHub.
[]s
Perfect!!!! Thank you so much!
This article is good for the cache topic.