Wednesday 28 March 2012

Creating Empty Tables

I've had a couple of conversations over the last week or so on the subject of building scripts to be as robust as possible. It's a pretty large subject though and would be a huge post, so instead I've decided to pick out one topic in particular that came up and leave the rest for another day.

You'll have guessed from the title of this post that the topic I've chosen is around how to create an empty table in QlikView, and more importantly, why the hell would I want to anyway.

Let's consider a scenario, I have a number of spreadsheets containing budget data, they all have a similar structure and each spreadsheet contains a single month's budget. The file names all use the same naming convention starting "Budget" followed by the year and month for which they hold data (eg. "Budget 2012 Jan.xls").

My first thought might be to simply use a wild card load in QlikView like this:

MyBudgets:
LOAD
    *
FROM [Budget*.xls] (biff, embedded labels, table is Budget$);


There us one problem with this though, the keen eyed will notice I said that the files have a similar structure which would suggest they aren't all exactly the same. Because of this a wildcard load would fail. Even if it did work, without the same structure the files wouldn't be concatenated into the same table anyway.  

Alternatively I could choose load the files in one-by-one as I know their file names, concatenating them into the same table to use in my app. But this would mean that if I add more budget files in the future, I have to modify the script to add the new tables and this would create a high maintenance app.

Instead I'd like to loop through all the available files and load them into the same table which is simple enough to do in QlikView like this:

FOR EACH File IN filelist('Budgets\Budget*.xls')
    CONCATENATE(MyBudgets)
    LOAD
        *
    FROM [$(File)] (biff, embedded labels, table is Budget$);
NEXT


There is a problem with this code though, the table MyBudgets doesn't yet exist and thus, for the first file it tries to load, the CONCATENATION statement will fail. As with everything in QlikView there is more than one solution to this problem. One approach is to check within the loop if the table exists and if not then create the table instead. This is a perfectly valid method but doubles the amount of code I need to write. My preferred is to create an empty table called MyBudgets first so that the loop can simply concatenate all files to it.

Creating an empty table in QlikView is really simple using an inline load statement. Going back to my scenario, I know that the budget files all contain the field BudgetID and so I can create an empty MyBudgets table containing that field and then concatenate all my files to it like this:

MyBudgets:
LOAD * INLINE [BudgetID];

FOR EACH File IN filelist('Budgets\Budget*.xls')
    CONCATENATE(MyBudgets)
    LOAD
        *
    FROM [$(File)] (biff, embedded labels, table is Budget$);
NEXT


Problem solved! You can use this trick even if you don't know the name of a field. Simply create the empty table with a dummy field name such as "MyDummyField" and then drop the field after you've concatenated the tables to it.

Monday 19 March 2012

When QlikView Data Types Go Wrong

Writing the previous post Removing Leading Zeros reminded me of a lesson I learnt (the hard way) a few years ago about how QlikView perceives data types.

Anyone who has worked with QlikView for more than a few days will have realised that you don't have to explicitly tell it which data type a field contains. Instead, QV decides for itself and it does this at a value level rather than having a single data type for every entry in a field. Normally this makes a developer's life very simple and helps to speed up the process of building data models. I'm sure no one, including myself, would argue it is a bad thing but there is one situation where this can cause you a huge headache.

QlikView is possibly a little too clever when it comes to what it considers to be a number, and as well as looking for values which are composed of all digits, it looks for other possible number formats also. For example, QlikView considers that a value of 5E3 means 5 times 10 to the power of 3 which if you perform the math is 5000 in digit form.

On the surface this might not seem a problem but a few years ago I came across a customer who was complaining of lost products in their QlikView application and after scratching my head for 2 days trying to find the problem I finally discovered the cause of the issue. The client's product IDs where made up of 6 hexadecimal characters. For those who aren't familiar with hexadecimal, as well as the numbers 0 to 9, hex uses the letters A through F also and this obviously means its perfectly possible to have values such as 0005E3. And as you will have guessed by now, this was considered the same product as 005000.

What makes this problem even more difficult to find is that QlikView displays whichever of two values it happens to come across first. So if it finds product 0005E3 first, product 005000 seems to have disappeared and its data is linked to the former. If it comes across product 005000 first, product 0005E3 seems to have disappeared instead.

Thankfully fixing the problem is a lot easier than finding it and simply wrapping the field in the text() function forces QlikView to consider them as strings rather than numbers and thus won't consider the values as the same.

text(ProductID) AS ProductID

The chances of you stumbling upon this same issue are pretty slim but if you ever do, knowing this will hopefully save you a lot of time and a very real headache. That aside, hopefully I've made you think a little about how QlikView interprets the data you give it.

Wednesday 14 March 2012

Removing Leading Zeros

I was talking to someone this morning who asked me if it was possible to remove the leading zeros coming from his data source. After an initial "yes of course" followed by a longer period of thought, the solution we ended up with was as follows:

replace(ltrim(replace(MyField,'0',' ')),' ','0') AS MyField

This code first uses the replace() function to replace all zeros in the field with a space and then uses the ltrim() function which removes spacing from the begining of a string only. We then use replace() again to convert any remaining spaces back to zeros.

Thursday 8 March 2012

Inline Load Trick

Even the newest of QV developers have likely used an inline load and be familiar with the below screen. For those that might not have come across them, inline loads allow you to define a table of data within the QV script. You can use to wizard to create them which is available from the Edit Script dialog by using the menu, Insert -> Load Statement -> Load Inline.



There is however a feature of the Inline Data Wizard that often gets overlooked. If you look you will notice a Tools menu containing only a single tool, "Document Data...". Clicking on this opens up a new window like this:


This gives you the option of inserting values from existing fields in the data model. Please note, QlikView can't guess what the values will be when the script is next run so will show you the fields and values currently available in the app from the last time the script was run successfully. Clicking OK will paste the values into the Inline Data Wizard window.


And it's as simple as that. I most commonly use inline loads to create small, static mapping tables and this obviously means taking data from the existing data model and mapping them to groups or new values. This little trick is perfect for this and can save you having to create a temporary list box in the front end t copy the values from.

Wednesday 7 March 2012

Looping Through Excel Sheets

I answered a question on the QlikCommunity the other day and it reminded me of something I've not done for a little while, looping through sheets in an Excel file within a QV script.

The key to achieving this is to use an ODBC connection to connect to the Excel file first, and then use the SQLTables command to return a table listing the sheets (tables) within the file. Once you have this you can loop through this table and load the sheets like this:

ODBC CONNECT TO [Excel Files;DBQ=ExcelFile.xlsx];

XlsTables:
SQLTables;

DISCONNECT;

LET vRows = NoOfRows('XlsTables');

FOR i = 0 TO $(vRows)-1
    LET vSheetName = subfield(peek('TABLE_NAME', i,'XlsTables'),'$',1);

    $(vSheetName):
    LOAD
        *
    FROM [ExcelFile.xlsx]
    (ooxml, embedded labels, table is '$(sheetName)');
NEXT i

DROP TABLE XlsTables;


This little trick can come in handy and with a few tweaks it can also address a number of other senarios for dealing with Excel spreadsheets such as:
  • What if I want to load from an .xls file instead of a .xlsx file? Not a problem, simply change the Connect and Load statements accordingly.
  • What if I want all the sheets loaded into a single table? You can create a blank table first, then use the above to concatenate the sheets into it rather than into their own named tables.
  • What if the Excel file always has one sheet but it is named differently each time? Tweak the above to load the sheet into a known table in QV.
  • What if, rather than loading all sheets from the file, I want to load only those which start with a known string (eg. "My Data Tab 01/01/2012", "My Data Tab 01/02/2012", etc). Simply wrap an IF statement around the LOAD to check if the table name starts with the required string.
I've probably only ever used this trick a handful of times, but its a very useful tool to have in your pocket should you need it.

Friday 2 March 2012

Explaining Joins

Those familier with manipulating data in SQL or other BI technologies should have no problem understanding the different types of joins available in QlikView. But those who aren't sometimes need a little explanation and it's a topic that I've fond myself explaining often when training people.

The four primary logical join types, LEFT, RIGHT, INNER and OUTER are supported in QlikView. Let's consider the following eample tables and what the output for each will be:

Field1Field2
A1
B2
C3
Field2Field3
2X
3Y
4Z

As you can see, both tables contain a column called Field2 for which some, but not all, of the values have a matching entry in the other table. The syntax for performing a join in QlikView is relatively simple as follows:

MyTable:
LOAD
    Field1,
    Field2
FROM myqvd1.qvd (qvd);

LEFT JOIN (MyTable)
LOAD
    Field2,
    Field3
FROM myqvd2.qvd (qvd);


It is important to note that QlikView links records based on matching field names in the two tables being joined. In the example above, both tables contain a field called Field2 and thus QlikView will look for matches across these fields.

Also worth noting is the the table name defined in curved brackets. This defines the previously loaded table which the following table should be joined to. It is not mandetory but it is considered best practice to always specify a table to join to inorder to avoid confusion or mistakes. Without it, QlikView joins the table of data to the previously loaded table.

LEFT JOIN

In a LEFT join, all the records from the first table are kept, and only those records from the second table that match a record in the first table are kept. Using the above example tables, the output would be as follows:

Field1Field2Field3
A1
B2X
C3Y

You will notice that there is a blank entry in Field3. This is because no matching record exists in the second table. You will also notice that the record in the second table for which there is no match in the first table has not been included.

RIGHT JOIN

In a RIGHT join, all the records from the second table are kept, and only those records from the first table that match a record in the second table are kept. Using the above example tables, the output would be as follows:

Field1Field2Field3
B2X
C3Y

4Z

You will notice that there is a blank entry in Field1. This is because no matching record exists in the first table. You will also notice that the record in the first table for which there is no match in the second table has not been included.

INNER JOIN

In an INNER join, only records with a match across both tables will be kept. Using the above example tables, the output would be as follows:

Field1Field2Field3
B2X
C3Y

You will notice that there are no blank entries. Records from either table for which a match was not found have not been included.

OUTER JOIN

In an OUTER join, All records will be kept and where possible, records will be matched. Using the above example tables, the output would be as follows:

Field1Field2Field3
A1
B2X
C3Y

4Z

You will notice that there are blank entries in both Field1 and Field3. This is because a matching record was not found in the other table.

When Joins Go Wrong

The most common and easily made mistake when joining tables is not having at least one field named the same in both tables. When this occurs, QlikView can't make any matches within the data and so outputs all possible combinations of the records in both tables. This is known as the cartesian product. If we take the above example but rename Field2 in the second table by mistake so it no longer matches the first table like this:

Field1Field2
A1
B2
C3
Field22Field3
2X
3Y
4Z

When performing any of the join types on these two tables, QlikView would create the following result:

Field1Field2Field22Field3
A12X
A13Y
A14Z
B22X
B23Y
B24Z
C32X
C33Y
C34Z

The produced table contains all possible combinations and therefor the number of rows it contains will be the number of records in the two tables multiplied together. In this example 3 x 3 = 9 records. This isn't the end of the world when data sizes are relatively small, but when mistakes are made joining tables containing millions of records, the resulting table can take a very long time to calculate and be so large that it uses up all the physical RAM and result in the machine becoming unresponsive. Anyone working with QlikView on large datasets for very long will have learnt this lesson the hard way. I remember making a mistake whilst joinging a billion record table to itself and had to explain red faced to the customer's IT department why I needed them to restart the non-responsive server as it tried to calculate the result.

A cartesian product isn't always the result of a mistake though and can also be performed on purpose in advanced data transformations. For example, if I have a calendar table with a record for each day but what I really need is to expand it to have a record for each 8 hour shift in each day, there is more than one way to approach the problem. I could load the calendar into a temp table and then concatenate it 3 times into a resulting table denoting the shift each time, or I could create a Shifts table containing a record for each shift and join it to the calendar table providing no matching field and thus purposfully causing a cartesian product.