I am currently working on some new user import code for Oxfords installation of Bodington (WebLearn) which takes a feed of about 35000 users and creates accounts and groups for them. Due to the large number of users involved and the small number of changes that are made each day it is important to try and keep the import running fast. At the moment the bottleneck seems to be the database which eats up most of the CPU during the import, a quick look at a process listing shows PostgreSQL using 80% or the CPU and Java using 10%. Looking at the statement logging it seemed that for each user it was always updating the user in the database even if nothing had changed. Update statements are expensive for a database compared to select statements so cutting down the number of updates seemed an obvious way to improve the performance.
Every object that is stored in the database by Bodington extends org.bodington.database.PersistentObject which provides the skeleton for database objects One feature of PersistentObject is that it keeps track of wether an object has unsaved changes or not. This is done through calling the setUnsaved() method which is done by most setters of the subclasseds. Eg:
public void setName(String name)
{
this.name = name;
setUnsaved();
}
Now this means that even if the name is set to the same value the object is flagged as being unsaved. One option would have been to have add checking to the userimporter so that it only updateded the name of a user if it was different to the current value:
if (!user.getName().equals(newName))
user.setName(newName);
However there is no reason why other sections of Bodington shouldn't make use of this checking and it is also a database issue so I moved this code into the setter and ended up with:
public void setName( String name )
{
if (name == null)
throw new IllegalArgumentException("Name cannot be null");
if (name.equals(this.name)
return;
this.name = name;
setUnsaved();
}
The checking to see that if name == null is because the database has a NOT NULL constraint and this also prevents a NullPointerException from being thrown if setName(null) is called. I added this checking to org.bodington.server.realm.Users and org.bodington.server.realm.Aliases and a quick test of updating 1000 users gave the results:
- Always Saving: 1minute 3 seconds.
- Saving Changes: 30 seconds.
which isn't a bad improvement in performance. One side note is that PersistentObject.save() always saves the object and I could have changed it so that it only saved changed objects but I'm not sure that all the setters of the subclasses call setUnsaved() so I am doing my checking in the importer.
The group mapping has come wholesale from some old user import code with a few changes and just needs a little tweaking and it should be nearly ready for some proper testing.