Memory Management and Doctrine 1.2/PHP 5.2.x

Recently, part of a project I am working on required me to import about 1 million records from a CSV file into a MySQL database. It sounds like a straightforward job, though it was slightly complicated by the fact that the data was obviously not normalized.
I added a new task to my symfony app with the required Doctrine calls to save the records. However, on the first pass my code ran out of its 16 MB memory limit after just 3,000 inserts. After a few tweaks, some talking on IRC and some ranting on Twitter, I got the code to chug along happily and insert all of the records (at least that lot of a million) into the database. Here is what I learned.

  • PHP 5.2.x does garbage collection under three conditions: (a) when you tell it to, (b) when a variable goes out of scope and (c) when the script ends. The code path taken by (b) and (c) is the same as (a), so it is worth writing the extra bit of code to unset variables as soon as you no longer need them.
  • PHP 5.2.x cannot garbage collect object graphs that contain circular references. Doctrine objects are like that (ever tried doing a print_r on a model object?), so they are not cleaned up even when they go out of scope, and memory leaks. That is fine for most short scripts, since everything is cleaned up when the script exits, but my import ran for quite a while, so the leak eventually killed it.
    To work around this, Doctrine provides a free() method on its Doctrine_Record, Doctrine_Collection and Doctrine_Query objects. It removes the circular references, so the objects can be reclaimed as soon as you unset them or they go out of scope. So make sure you call it. The code looks like this:

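        $object->save();
        $object->free(true); // true: also free related objects (deep free)
        unset($object);      // drop the last reference so PHP can reclaim the memory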
    
  • Always hydrate Doctrine results as arrays where you can, especially when the data is only going to the view: array graphs are much lighter than object graphs. Also select only the fields you need. That is a few extra lines of code, but it will save you a lot of headaches later.
  • Calling Doctrine_Manager::connection()->clear() explicitly also helps keep memory requirements under control. However, make sure you call it only after you are done with everything else, or your references will no longer be available.
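To make the circular-reference point concrete, here is a minimal plain-PHP sketch that does not use Doctrine at all. Author, Book, buildGraph() and measureGrowth() are hypothetical names, and the free() method below only mimics the general idea behind Doctrine's free() (dropping references to related objects); it is not Doctrine's implementation. On PHP 5.3 and later it calls gc_disable() to approximate 5.2, which has no cycle collector.

```php
<?php
// Sketch: why breaking reference cycles matters when PHP relies on refcounting alone.
// Author/Book are hypothetical stand-ins for Doctrine records that reference each other.

class Author
{
    public $books = array();

    // Break the cycle by dropping references to related objects.
    // Only the general idea behind Doctrine's free(); the real method does more.
    public function free()
    {
        foreach ($this->books as $book) {
            $book->author = null;
        }
        $this->books = array();
    }
}

class Book
{
    public $author;
}

// Build one author with a few books; each book points back at its author.
function buildGraph($bookCount)
{
    $author = new Author();
    for ($i = 0; $i < $bookCount; $i++) {
        $book = new Book();
        $book->author = $author;   // child -> parent
        $author->books[] = $book;  // parent -> child, completing the cycle
    }
    return $author;
}

// Create and discard many graphs, optionally calling free() before discarding each one.
// Returns how many bytes are still allocated afterwards.
function measureGrowth($iterations, $callFree)
{
    $before = memory_get_usage();
    for ($i = 0; $i < $iterations; $i++) {
        $author = buildGraph(5);
        if ($callFree) {
            $author->free();
        }
        unset($author); // refcount-only cleanup: only works if the cycle is gone
    }
    return memory_get_usage() - $before;
}

// Emulate PHP 5.2, which has no cycle collector; gc_disable() exists from PHP 5.3 on.
if (function_exists('gc_disable')) {
    gc_disable();
}

$leaked = measureGrowth(2000, false);
$freed  = measureGrowth(2000, true);

echo "without free(): {$leaked} bytes retained\n";
echo "with free():    {$freed} bytes retained\n";
```

With the collector out of the picture, the run without free() should keep memory for every discarded graph, while the run with free() should stay close to flat, because each graph's refcounts can now drop to zero.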

I am pretty excited about PHP 5.3.x and its new features, including the new garbage collector, which uses the algorithm described in the IBM paper “Concurrent Cycle Collection in Reference Counted Systems”.
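A minimal sketch of the 5.3 cycle collector at work (requires PHP 5.3 or later; the variable names are illustrative only). Two objects that reference each other survive unset() under plain refcounting, and gc_collect_cycles() reclaims them. Normally the collector runs on its own once its root buffer fills up; calling it explicitly here just makes the effect visible.

```php
<?php
// Sketch: PHP 5.3+ can reclaim a reference cycle that plain refcounting cannot.
gc_enable(); // usually already on via zend.enable_gc

$a = new stdClass();
$b = new stdClass();
$a->other = $b;
$b->other = $a; // $a and $b now reference each other

unset($a, $b); // refcounts drop to 1, not 0, so nothing is freed yet

$collected = gc_collect_cycles(); // run the cycle collector explicitly
echo "collected: {$collected}\n";
```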

I hope this post is useful to anyone else trying to do something similar.


Hacking DNS323 NAS

I recently bought a DNS323 network-attached storage (NAS) box, which runs two 1 TB drives, and I use it for my ever-growing media collection, including photos shot in RAW format. The good thing about this NAS box is that it runs Linux and the manufacturer provides a hook for hacking the system to run additional software.

I was interested in hacking this box for the following reasons:

  • I wanted to run my own SVN repository on it, since the data was already being mirrored.
  • Instead of RAID 1 mirroring, I wanted rsync to copy the data across to the other hard disk, as RAID 1 has its own issues and I can’t take chances with my media.
  • I wanted to mod the USB port on this NAS box so that I could plug in an external hard disk and copy the data to it without having to open anything up or use another computer.

I am listing the steps to get it up and running as a quick-start guide for the future, so I don’t have to wade through quite a few pages of the actual wiki.

Copy fun_plug and fun_plug.tgz to the top-most level of Volume 1 and restart the NAS box.

The manufacturer has a hook in the firmware itself that looks for a file called fun_plug and executes it at boot. That script installs the additional software and kick-starts the process by enabling telnet.

Connecting for the first time
Once the box has rebooted, try telnetting to it. You will need to find its IP address first, possibly using nmap (sudo nmap -sS -O 192.168.1.0/24). If you get a shell prompt, everything was installed properly.

/#

You are now in!
Secure the box, FIRST
First, run the pwconv utility to update the shadow file:

pwconv

Then change the ‘root’ password straight away:

passwd

Now activate the root user, which is disabled by default, by giving it a login shell:

usermod -s /ffp/bin/sh root

Now make sure the password is written out and stored permanently on the drive, so it survives a reboot:

store-passwd.sh

Enable ssh
It’s a good idea to run shell sessions over SSH, since telnet sends everything, including your password, in clear text. Enable SSH and disable telnet with the following commands:

chmod a+x /ffp/start/sshd.sh
sh /ffp/start/sshd.sh start

chmod -x /ffp/start/telnetd.sh # remove the execute bit so telnetd is not started at boot

Now reboot to confirm that telnet is off and SSH is on. Try both: if everything worked, you should be able to log in via SSH but not via telnet.

What next?
Now that there is a proper operating system running on the box, there are quite a few things you can do with it. I shall install a few more packages and post about them, and then set up the three objectives I started this for in the first place.


Steps mostly taken from:
