Memory Management and Doctrine 1.2/PHP 5.2.x

by rp

Recently, part of a project I am working on required me to import about 1 million records from CSV into a MySQL database. It sounds like a straightforward job, though it was slightly complicated by the fact that the data was not normalized.
I added a new task to my symfony app and added the required calls to Doctrine to save records. On the first pass, however, my code ran out of its 16 MB memory limit after just 3000 inserts. After a few more tweaks, talking on IRC and ranting on Twitter, I managed to get the code working so that it can chug along happily and insert all the records (or at least that lot of a million) into the database. Here is what I learned from it.

  • PHP (5.2.x) does garbage collection under three conditions: a) when you tell it to, b) when a variable goes out of scope, c) when the script ends. The code path taken by b and c is the same as a, so it's worth writing the extra bit of code to unset variables when you don’t need them anymore.
  • PHP 5.2.x is unable to garbage collect object graphs that have circular references. Doctrine objects are like that (ever tried doing a print_r on a model object?), so they are not cleared up even when they go out of scope, and the memory leaks. This is usually fine for short scripts, since everything is cleared up when the script exits, but my import ran long enough for the leaked memory to hit the limit and kill the script.
    To work around that, Doctrine provides a free() method on its Doctrine_Record, Doctrine_Collection, and Doctrine_Query objects. It gets rid of the circular references so that the objects can be garbage collected as soon as you unset them or they go out of scope. So make sure you call it. The code looks like this:

        $object->save();
        // Passing true also frees the related records.
        $object->free(true);
        unset($object);
    
  • Hydrate Doctrine results as arrays rather than objects whenever you can, especially when you only need the data to send it to the view. Array graphs are more lightweight than object graphs. Also make sure you select only the fields that you need. I agree that’s a few extra lines of code, but it will save you a lot of headache later.
  • Calling Doctrine_Manager::connection()->clear() explicitly also keeps the memory requirements under control. Make sure you call it only after you are done with everything else, though, or your references will no longer be available.
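
Putting the points above together, an import loop and a lightweight read might be sketched like this. It is not the exact code from my task: the Customer model, its name and email fields, the two-column CSV layout and the batch size of 500 are all made up for illustration, and it assumes Doctrine 1.2 is already bootstrapped, as it is inside a symfony task.

```php
<?php
// Sketch only. Assumes Doctrine 1.2 is bootstrapped elsewhere.
// "Customer", its fields and the CSV column order are hypothetical.

function importCsv($path, $batchSize = 500)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    while (($row = fgetcsv($handle)) !== false) {
        if (count($row) < 2) {
            continue; // skip blank or short lines
        }

        $customer = new Customer();
        $customer->name  = $row[0];
        $customer->email = $row[1];
        $customer->save();

        // Break the circular references, then drop our own reference.
        $customer->free(true);
        unset($customer);

        // Every $batchSize rows, once nothing saved so far is still needed,
        // let the connection clear what it is holding on to.
        if (++$count % $batchSize === 0) {
            Doctrine_Manager::connection()->clear();
        }
    }

    fclose($handle);
    Doctrine_Manager::connection()->clear();

    return $count;
}

function fetchCustomerNames()
{
    // Select only the needed fields and hydrate as plain arrays.
    $query = Doctrine_Query::create()
        ->select('c.id, c.name')
        ->from('Customer c');

    $rows = $query->fetchArray();
    $query->free(); // queries hold circular references too

    return $rows;
}
```

The batch size is a trade-off you will have to tune yourself: clearing less often is cheaper per row but lets more memory pile up in between.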

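If you want to see the circular reference problem without Doctrine in the picture, here is a small plain PHP sketch. The Node class is a hypothetical stand-in for a record and its related record, and its free() method only mimics the idea behind Doctrine's free(); it is not how Doctrine implements it. It counts destructor calls to show when objects actually get destroyed.

```php
<?php
// Hypothetical stand-in for two records that point at each other.
class Node
{
    public static $destroyed = 0;
    public $peer = null;

    public function link(Node $other)
    {
        $this->peer  = $other;
        $other->peer = $this; // this back-reference creates the cycle
    }

    // Break the cycle by hand, similar in spirit to Doctrine's free().
    public function free()
    {
        if ($this->peer !== null) {
            $peer = $this->peer;
            $this->peer = null;
            $peer->peer = null;
        }
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

function makePair($free)
{
    $a = new Node();
    $b = new Node();
    $a->link($b);

    if ($free) {
        $a->free();
    }
    // $a and $b go out of scope when the function returns.
}

makePair(false);
$destroyedWithoutFree = Node::$destroyed; // the cycle keeps both objects alive

makePair(true);
$destroyedWithFree = Node::$destroyed - $destroyedWithoutFree; // both destroyed at once
```

Without free(), neither object is destroyed when the function returns, because each one still holds a reference to the other. With free(), both are destroyed as soon as they go out of scope.
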
I am pretty excited about PHP 5.3.x and its new features, including the new garbage collector, which uses the algorithm described in the IBM paper Concurrent Cycle Collection in Reference Counted Systems.
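
For the curious, this is roughly how the PHP 5.3 collector can be exercised by hand. It is a small sketch that needs PHP 5.3 or later; SelfReferencing is a made-up class for illustration, while gc_enable() and gc_collect_cycles() are the functions PHP 5.3 adds.

```php
<?php
// Requires PHP 5.3+. SelfReferencing is a hypothetical class.
class SelfReferencing
{
    public static $destroyed = 0;
    public $self;

    public function __construct()
    {
        $this->self = $this; // simplest possible reference cycle
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();

for ($i = 0; $i < 100; $i++) {
    $object = new SelfReferencing();
    unset($object); // unreachable now, but the cycle keeps its refcount above zero
}

$destroyedBefore = SelfReferencing::$destroyed;

// Ask the cycle collector to run now instead of waiting for it to kick in.
$collected = gc_collect_cycles();

$destroyedAfter = SelfReferencing::$destroyed;
```

With so few objects the collector does not kick in by itself, so nothing is destroyed until gc_collect_cycles() is called. In a long import you would not normally need the explicit call, but it is handy for checking whether cycles are what is eating your memory.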

Hope this post is useful for anyone else trying to do something similar.