Saturday 4 February 2012

Incremental Builds in the Real World

The term "incremental build" (also known as "delta load") gets used a lot in the QlikView world, often as a magic solution to reducing reload times and minimising the load on underlying systems and IT infrastructure. The theory of how to implement them is, on the surface, pretty simple but as with many things in the QV world, theory and reality can be two very different things.

The first question you should ask yourself is "are incremental builds even possible?"

You might think answering that question is simple, the source table in question has a a unique record identifier and a last modified date or timestamp, so the answer is of course yes. Wrong! You need to ask another question, and forgetting this is a trap I've not only seen many people fall in to but which I'm guilty of forgetting more than once.

The important next question before laying finger to keyboard should be "are the last modified date or timestamps accurate and reliable?"

At this point you might be thinking why wouldn't they be accurate. The short answer is because people make mistakes. I have literally lost count of the number of customers I've worked with who, after running incremental builds within their QlikView apps for some time, have discovered the answers they are getting  have drifted from the numbers they get when running other reporting.

The most common cause lies with the accuracy of the last modified fields. A stored procedure or batch process is periodically run which updates records but doesn't update the last modified fields. I've even come across commercial software which updates the last modified fields sometimes and not others. Working out what is changing data but not updating the last modified fields can be tricky. The fasted way is often to talk to any system administrators and DBAs that might have an idea of what other processes update records. A rogue stored procedure might be simple to sort; a tactful word with the DBA could be all that's needed. Bugs in commercial software are not so simple to identify and solve though, even if a fix can be obtained, it may be many months before it can be implemented within the live environment.

If a fix can't be found, unfortunately there is no simple work around in QlikView, but there are a few things you can do to help. The simplest is to perform a full reload over night or after the culprit script has run.

Another important question to consider is "can records be deleted from the datasource?"

This can again cause the version of the data held by QlikView to drift from that in the source system. This is however something which can be addressed in QlikView during the incremental build process. Managing deletions is a whole topic on its own though and one I'll explain another day.

Thinking back to the first question we asked, what if there aren't last modified dates in the datasource tables? Well that doesn't necessarily mean the end of the incremental world. Ideally a DBA should be able to add a last modified field to the table which is updated each time the record is modified using a trigger, allowing you to perform a true incremental build. If this is not possible then there are still techniques and tricks than can be used to speed up a rebuild giving you what is sometimes known as a "pseudo-incremental build". Again this is a whole topic in itself and a subject for another post.

So to sum up, when thinking of implementing an incremental build process in your QlikView solution, ask the following questions before writing a single line of script:
  • Do the source system fact tables have last modified date or timestamps and unique record identifiers?
  • Are the last modified fields reliable?
  • Can records be deleted from the source system tables?

2 comments:

  1. Hi Matthew,
    I am more interested on 'pseudo-incremental build' concept and can you post this topic asap...

    Thanks,
    Sree

    ReplyDelete
  2. Hi,
    It's very interesting, can you please tell us the alternatives to perform Incremental Load.
    I'm very new to qlikview.

    Thanks in Advance.
    Pavan

    ReplyDelete