Friday 2 March 2012

Explaining Joins

Those familier with manipulating data in SQL or other BI technologies should have no problem understanding the different types of joins available in QlikView. But those who aren't sometimes need a little explanation and it's a topic that I've fond myself explaining often when training people.

The four primary logical join types, LEFT, RIGHT, INNER and OUTER are supported in QlikView. Let's consider the following eample tables and what the output for each will be:

Field1Field2
A1
B2
C3
Field2Field3
2X
3Y
4Z

As you can see, both tables contain a column called Field2 for which some, but not all, of the values have a matching entry in the other table. The syntax for performing a join in QlikView is relatively simple as follows:

MyTable:
LOAD
    Field1,
    Field2
FROM myqvd1.qvd (qvd);

LEFT JOIN (MyTable)
LOAD
    Field2,
    Field3
FROM myqvd2.qvd (qvd);


It is important to note that QlikView links records based on matching field names in the two tables being joined. In the example above, both tables contain a field called Field2 and thus QlikView will look for matches across these fields.

Also worth noting is the the table name defined in curved brackets. This defines the previously loaded table which the following table should be joined to. It is not mandetory but it is considered best practice to always specify a table to join to inorder to avoid confusion or mistakes. Without it, QlikView joins the table of data to the previously loaded table.

LEFT JOIN

In a LEFT join, all the records from the first table are kept, and only those records from the second table that match a record in the first table are kept. Using the above example tables, the output would be as follows:

Field1Field2Field3
A1
B2X
C3Y

You will notice that there is a blank entry in Field3. This is because no matching record exists in the second table. You will also notice that the record in the second table for which there is no match in the first table has not been included.

RIGHT JOIN

In a RIGHT join, all the records from the second table are kept, and only those records from the first table that match a record in the second table are kept. Using the above example tables, the output would be as follows:

Field1Field2Field3
B2X
C3Y

4Z

You will notice that there is a blank entry in Field1. This is because no matching record exists in the first table. You will also notice that the record in the first table for which there is no match in the second table has not been included.

INNER JOIN

In an INNER join, only records with a match across both tables will be kept. Using the above example tables, the output would be as follows:

Field1Field2Field3
B2X
C3Y

You will notice that there are no blank entries. Records from either table for which a match was not found have not been included.

OUTER JOIN

In an OUTER join, All records will be kept and where possible, records will be matched. Using the above example tables, the output would be as follows:

Field1Field2Field3
A1
B2X
C3Y

4Z

You will notice that there are blank entries in both Field1 and Field3. This is because a matching record was not found in the other table.

When Joins Go Wrong

The most common and easily made mistake when joining tables is not having at least one field named the same in both tables. When this occurs, QlikView can't make any matches within the data and so outputs all possible combinations of the records in both tables. This is known as the cartesian product. If we take the above example but rename Field2 in the second table by mistake so it no longer matches the first table like this:

Field1Field2
A1
B2
C3
Field22Field3
2X
3Y
4Z

When performing any of the join types on these two tables, QlikView would create the following result:

Field1Field2Field22Field3
A12X
A13Y
A14Z
B22X
B23Y
B24Z
C32X
C33Y
C34Z

The produced table contains all possible combinations and therefor the number of rows it contains will be the number of records in the two tables multiplied together. In this example 3 x 3 = 9 records. This isn't the end of the world when data sizes are relatively small, but when mistakes are made joining tables containing millions of records, the resulting table can take a very long time to calculate and be so large that it uses up all the physical RAM and result in the machine becoming unresponsive. Anyone working with QlikView on large datasets for very long will have learnt this lesson the hard way. I remember making a mistake whilst joinging a billion record table to itself and had to explain red faced to the customer's IT department why I needed them to restart the non-responsive server as it tried to calculate the result.

A cartesian product isn't always the result of a mistake though and can also be performed on purpose in advanced data transformations. For example, if I have a calendar table with a record for each day but what I really need is to expand it to have a record for each 8 hour shift in each day, there is more than one way to approach the problem. I could load the calendar into a temp table and then concatenate it 3 times into a resulting table denoting the shift each time, or I could create a Shifts table containing a record for each shift and join it to the calendar table providing no matching field and thus purposfully causing a cartesian product.

14 comments:

  1. Dear Matthew,

    thanks for the examples, they really cleared my mind.
    one remark: the table resulting from the OUTER JOIN shouldn't rather be

    A 1 (empty)
    B 2 X
    C 3 Y
    (empty) 4 Z

    You wrote
    A 2 (empty) and
    (empty) 2 Z

    Many thanks
    Erik

    ReplyDelete
    Replies
    1. Hi Erik

      I'm glad it helps. Yep you are right, I've corrected it above now.


      Regards
      Matt

      Delete
  2. Hey Matthew,

    Thanks for the tips... but I have a question about a necessary treatment of the nullable fields, when some data is not present in one of the tables present in an OUTER JOIN...

    specifically is:
    I need to transform the null (empty) value into -1... how can I do it?

    ReplyDelete
    Replies
    1. Hi Carlos

      I'm not sure exactly what you mean but QlikView won't connect records with a null entry in the key field.

      You can transform a null into anything by modifying the load statement like this:
      if(isnull(MyField), -1,0) AS MyField

      You can read more on working with nulls in QlikView here http://www.qlikviewaddict.com/2012/04/handling-nulls-in-qlikview.html

      Regards
      Matt

      Delete
  3. Matthew,

    I got some clues from Qlikview Community, to my problem, but it still don't answer my question... but I think that help to explain what I wanted to say...

    The clue, was to use Mapping to Map a null value and the value I want... like the example code...

    // Map null into another value
    Map_Null1:
    MAPPING LOAD
    Null(),
    -2
    Autogenerate 1;

    Map_Null2:
    MAPPING LOAD
    Null(),
    'NotApply'
    Autogenerate 1;

    //now decide what map to use where
    MAP id_subchoice USING Map_Null1;
    MAP detail USING Map_Null2;

    //now load your data
    choices:
    LOAD *
    inline [ id_choice , name , formatted
    1 , 'choice1' , 'Choice 1.'
    2 , 'choice2' , 'Choice 2.'
    3 , 'choice3' , 'Choice 3.' ]
    ;
    outer join (choices)
    LOAD *
    inline [ id_subchoice , id_choice , detail
    1 , 1 , 'choice1_A'
    2 , 1 , 'choice1_B' ];

    //Now force the reload to apply the null mapping
    LOAD id_choice,
    id_subchoice,
    name,
    formatted,
    detail
    RESIDENT choices;


    This code do near what I need... the unique problem, is that the results for the OUTER JOIN are duplicated... and the duplicated are showed without the transformation.... :/


    With this example, do you think in any possibility to make the code work?

    Thanks,
    Carlos Figueiredo

    ReplyDelete
  4. Hi Matthew,
    thank you very much it helps me a lot on clearing my doubts.

    ReplyDelete
  5. I am newbie in Qlikview Scripting. About concatenating vars... I stay all afternoon trying resolve this issue.

    Problem: How to concat a variable with a string? It is basic and stupid but I am testing all possible formats but it is impossible.

    example:

    set a = "aaaaa";
    set b = $(a) & "bbbb";

    The target is to use b like a string composed by more variables.

    ReplyDelete
    Replies
    1. Finally I obtain the solution guys! I answer myself:

      ------[my example code]-----------------------------------
      //two vars declared ('b' is 'a' and string concat)
      set a = "aaaaa";
      set b = $(a)bbbb;

      //this creates a MessageBox with message 'b', title 'msgbox', //one OK button, with icon asterisk message and put the focus //in the first button defined.
      //
      //The following code: 'as x, 2 as r Autogenerate 1;' I do not //understand but if you do not put this we will have an error //executing script

      Load
      MsgBox('$(b)', 'msgbox', 'OK', 'ICONASTERISK', 'DEFBUTTON1') as x, 2 as r
      AutoGenerate 1;
      ----------------------------------------------------------

      We define two variables like String. 'b' is a new String concatenated with 'a'. The concatenation is direct without using any special modificator like another languages as typical '.', '+' or '&'.

      For use one variable and expand it inside String, we put directly inside String declarated, between ''.

      Enjoy!

      Delete
    2. Hi

      Your method of concatenating the strings will work but there is another solution which indicates why your original wasn't working:

      SET a = 'aaa';
      LET b = a & 'bbb';

      You will notice there are 2 differences in the second line. Firstly, you will notice the command SET has been replaced with LET. If you read this post you'll see the difference between the two http://www.qlikviewaddict.com/2012/02/set-vs-let.html

      The second difference is that once we use LET, we not longer need to use $() to expand the contents of the variable "a" and can simple use the variable name instead.

      Hope this helps!
      Matt

      Delete
    3. Hello, I watched many videos and read many documents about integrating between Google map and QlikView but I still failed to integrate between them. I want to add the country of Kuwait to my QlikView, could anyone help me from A to Z on how to integrate between QlikView and Google map and I would be thankful if it was a youtube tutorial.

      Delete
  6. Thank u
    What is the defalut join in qlikview?
    Please tell me

    ReplyDelete
  7. Hello Can you please help me with join condition:

    I have 2 qvds:
    QVD1 ..(15 ml. rows, 180 columns):
    A 1 2 3 4 5
    B 1234

    QVD2 (14 ml. rows and 190 columns)

    A 1234
    B 2345
    .
    .
    Z
    Name of columns are same.

    Looking to create a QVD3 ( as a result of Inner Join of QVD 1+2)

    Will this work? OR can you suggest optimum way to do this?

    QVD3:
    aaa:
    Load *
    from QVD1.qvd

    Inner join (aaa)

    Load * from QVD2
    Where field a='x' and b='y';

    let vdate=peek(date);
    store aaa into QVD3_$(vdate);

    ReplyDelete
  8. Hi Matthew, thanks for the post.
    I don't know why I cannot make it ! :(
    I have to files with column ID in common. But does not show the fields in common in same row... it creates two row for each ID. Could you please help? Thanks!

    ReplyDelete