Tuesday, 4 December 2012

QlikView Functions: dual()

As I touched upon in the first part of my series of posts on Dates in QlikView, one of the first things developers learn about QlikView is that there are no traditional data types. What they some times don't learn is that instead QlikView stores field data as dual values containing both a string, which is used for displaying to the user, and a numerical value used for sorting and calculations.

Let's consider a simple example of a field containing calendar months, where as well as the month name, QlikView also stores the number of the month.

NumberText
1Jan
2Feb
3Mar
......

Where possible, QlikView works out the text and numeric parts automatically for many data types when loading the data. All inbuilt QlikView functions also return a dual value and when passed a dual value they automatically use either the text or numeric part depending on what type of input they are expecting. Because of this automatic behaviour, many people are totally unaware that this is how QlikView stores field data. It is however a very powerful feature which adds to the speed and efficiency of QlikView. As a developer, being able to control this behaviour and define the dual values when QlikView can't work them out automatically would also be very powerful and thankfully this is indeed possible using the dual() function.

The dual() function allows us to specify the individual text and number parts to be combined into a single dual value:

dual(MyText, MyNumber) AS MyField
As I mentioned above, QlikView automatically works out which part of the dual value to use depending upon what you are asking it to do and so we normally don't need to give it a second thought. But again we can control this behaviour using the text() and num() functions should we need override and return one specific part:

= text(MyField)
= num(MyField)

There is one important point to note about QlikView automatically assigning dual values when loading data, which confused me for a while when I first came across it and I know has caught out many others. If I load 2 records into the same field for which the text part is different but the underlying number is the same, QlikView will assign the first of the text values it comes across to all following records where the numeric value is the same. So source data that looks like this:

NumberText
4Open
5Complete
5Closed

would be converted and stored in QlikView like this:

NumberText
4Open
5Complete
5Complete

As you can see, the last text value of "Closed" is replaced with the first value QlikView reads in for the same numeric value of 5, which happens in this example to be "Complete".

Once aware that this is how QlikView behaves you can use this to your advantage however. Because it prioritises the first text value it comes across, if you control which value this is, all other values with a matching underlying numeric value will be assigned to that first value and will result in a form of data cleansing. I'll leave it up to you to experiment and find ways to leverage this.

Because of it's simplicity and a lack of understanding on how QlikView stores data, the dual() function isn't used very often and so I thought it would be good to look at some practical examples of how it can be used and the benefits it gives. Hopefully these examples will make you consider how you could used it within your own applications.

Calendar Quarters

In the next part of my series of posts on Dates in QlikView, we'll look at separating a date into it's useful component parts, one of which is calendar or fiscal quarters. When using fiscal quarters particularly, it can be confusing to the user which calendar months fall within a given quarter and so we can use the dual function to give a useful text description to the user as well as retaining the underlying quarter number to use when performing calculations. One way to do this would be to use an inline mapping table like this:

QuarterMap:
MAPPING LOAD * INLINE [
    QuarterNum, QuarterDesc
    1, Apr - Jun
    2, Jul - Sep
    3, Oct - Dec
    4, Jan - Mar
]
;

Calendar:
LOAD
    Date,
    Month,
    Year,
    dual(applymap('QuarterMap', Quarter), Quarter) AS Quarter
RESIDENT MyData;

Assuming our underlying data table has a Quarter field containing the number of the quarter, we can use the applymap() function to look up the quarter number within the mapping table and return a more meaningful description. The dual() function then combines this description with the quarter number to make a single dual value field called Quarter. When displaying the field to the users, QlikView will use the text descriptions which are a lot more meaningful. When asked to perform a calculation using the field or when sorting the quarters, QlikView uses the underlying numeric value.

Shipping Statuses

This technique for using dual() was coined "Dual Flags" by Rob Wunderlich and is both simple and powerful. The basic principal is that rather than simply using 1 or 0 as a flag, you use dual() to create a dual value flag containing a more meaningful description also. For example, let's imagine we have a set of shipping data. Each record relates to an order that has either been shipped or not. Many would be tempted to create a simple flag to indicate the shipping status like this:

if(NOT isnull(ShipDate), 1, 0) AS Shipped
Instead we can use the dual() function as follows:

if(NOT isnull(ShipDate), dual("Shipped", 1), dual("Not Shipped", 0)) AS Shipped
This creates a dual value flag which opens up many more options for how we can use it. We can add the field in a list box which will display the more logical "Shipped" rather than a simple 1. All of the following would now also be possible and valid to use within expressions and/ or the script:

if(Shipped, ....
if(Shipped=1, ....
if(Shipped="Shipped", ....
count({$<Shipping={"Shipped"}>} ShipID)
sum(Shipped)

An interesting point to note here is that we can use sum(Shipped) rather than the less efficient count({$<Shipped={"Shipped"}>}ID) or even just count(Shipped). A sum is a simpler computational calculation than a count and so when working with very large data sets, the speed improvement of using sum() can be considerable.

CRM Statuses

I load data from a CRM system and each opportunity has a status such as "Lead", "In Progress", "Qualified", etc. Like the example of Quarters above, I can use a mapping table and applymap() to give each status an underlying number as well as the text description as follows:

StatusMap:
MAPPING LOAD * INLINE [
     OpStatus, OpStatusID
     Failed, 0
     Lead, 1
     In Progress, 2
     Qualified, 3
     Won, 4
     Closed, 5
]
;

Opportunities:
LOAD
     OpID,
     OpDate,
     dual(OpStatus, applymap('StatusMap', OpStatus)) AS OpStatus
RESIDENT MyData;

As before, when displaying OpStatus to the users we see the meaningful text description. But we can also use some interesting and powerful calculations because of the underlying number. For example we can count all active opportunities using the following simple set analysis:

count({$<OpStatus={">=Lead <=Won"}>} OpID)

I hope these examples show just some of the uses for the dual() function and that you might consider the benefits of using it in your QlikView applications going forward.

14 comments:

  1. awesome. Thanks for the tips and well written examples. Really helps to see it spelled out like this.

    ReplyDelete
  2. Excellent article, thank you.

    ReplyDelete
  3. Very deeply explained.. i was not aware of this magical behavior of dual function..
    Thanks for such a nice article

    ReplyDelete
  4. Great post with some nice examples.

    ReplyDelete
  5. excellent article. crisp and clear, thanks.

    ReplyDelete
  6. i have a column\
    ------------
    NONCORE- COLLECTION
    CORE-CODING
    ABSET
    BREAK
    -----------------------
    i used "Left(NAME,INDEX(NAME,'-')-1)AS CATAGORIES;"
    then it wil come

    -------
    CORE
    NONCORE
    ------------
    but not absent and break
    how do i get absent and break

    ReplyDelete
  7. Try looking at the subfield() function. It should do what you need. Something like subfield(NAME, '-', 1)

    Regards
    Matt

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. Matthew,
    I have been trying to replicate your last Example (CRM) but the count expression does not work for me.
    When I change the Load and Won to 1 and 4 is working but not with the Text values.What the data behind the 'Opportunities' table?

    ReplyDelete
  10. Hi Robert
    It sound like you haven't created correct dual values as shown in the last but one example?

    Regards
    Matt

    ReplyDelete
    Replies
    1. Are you able to attached a sample or link to a sample?
      Thank you

      Delete
  11. Hi Matthew,

    Thanks for the article. Very well explained!

    One question: I usually prefer to use numeric values on the set analysis expressions because I was told text is more costly to evaluate. Thinking about large applications, are there any performance draw backs for using text on the set analysis expressions when using dual fields?

    Regards,
    Cesar

    ReplyDelete
  12. Hi Cesar

    That's a great question, and one that doesn't have a straightforward answer. It's a topic that could be the subject of an entire post in itself, but I'll try and keep the answer short here.

    One thing to keep in mind is that set analysis isn't like applying a where clause which is then tested against every record one by one. When you specify values in set analysis, you are essentially specifying a set of selections that should be made for the purpose of performing the calculation. Because it is simply making a selection, how fast that will be applied can be demonstrated by how quickly QlikView can apply a selection when you make one manually within a list box.

    As QlikView only stores distinct values within a field and what are effectively pointers between those values to indicate when they relate to values in other fields, whether you select a string or a number, it still has to follow the same number of pointers. And so any difference results solely in it's ability to test the desired value against the distinct list of values within that field once.

    Hopefully you can see that any difference would in most cases be so small that it's one of those things where I would personally go with the option that offered the best usability and only look at alternatives if I had an explicit performance issue. And if I did have a performance issue, there are a lot of other factors that would have a much more dramatic effect on it that whether I was specifying a string or a number in my set analysis.

    Regards
    Matt

    ReplyDelete