2 questions on Smart Analytics (Bank Data) Use Case from App Market


Two questions on the above-mentioned use case:
1) In the 'Column Dependencies' for worksheet 'columnDependenciesPurchased', we are only filtering on the column 'PurchasedProduct', whose value can be 'no' or 'yes'.
The tutorial PDF says "the duration of the call had the greatest impact on the number of purchased products". How can we conclude that, since we have not checked whether 'PurchasedProduct' = 'yes'?
Is my understanding correct?

2) In the Decision Tree infographics, if I add all the outcomes (all Yes and No) at the leaf nodes, should I not get the exact sample data size that we started with? If I add all the outcomes from the infographics, I get 3404 but our data size is 4521.
Could you please let me know the explanation for this?

thanks & regards,

Rahul Dhond Answered


  • Jason Arrigo

    Hello Rahul and thanks for your careful consideration!

    To answer:
    1) The statement "the duration of the call had the greatest impact on the number of purchased products" holds even though we do not know how many products were purchased. The chart with blue bars in the PDF shows that the duration of the call has the highest correlation with whether the customer purchased a product or not. The dependency is measured against the PurchasedProduct column as a whole, covering both 'yes' and 'no', so no filter on 'yes' is needed. We don't know whether call duration increased or decreased the number, only that it affected the number of customers with a PurchasedProduct and therefore the number of purchased products.

    2) The leaf totals are lower than the data size because of pruning and its validation size setting. This is expected behavior.

    With pruning enabled, the default validation size fraction of 0.25 is used. Pruning requires a performance measurement, and the algorithm needs test data for that. Some records therefore have to be held out and not used when building the tree, which reduces the number of records shown at the leaf nodes.
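As a rough sanity check, the figures from the question can be compared against the 0.25 validation fraction. This is only an arithmetic sketch in Python. The variable names are made up for illustration, and how the tool actually draws the held-out records is an assumption, so the leaf total need not be exactly 75% of the data.

```python
# Figures taken from the thread; variable names are illustrative only.
total_records = 4521        # size of the sample data
leaf_total = 3404           # sum of all Yes/No outcomes at the leaf nodes
validation_fraction = 0.25  # default validation size with pruning enabled

# Records we would expect to remain for building the tree (about 75%).
expected_training = total_records * (1 - validation_fraction)

# What the infographic actually accounts for.
observed_fraction = leaf_total / total_records
held_out = total_records - leaf_total

print(f"expected training records: about {expected_training:.0f}")
print(f"observed share in tree:    {observed_fraction:.3f}")
print(f"records held out:          {held_out}")
```

The observed share is close to, but not exactly, 0.75, which is consistent with about a quarter of the records being set aside for validation.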

    We are working on improving how the Flipside visualization informs users of this. Rest assured that once the model is built, all records are scored and the data on the sheet displays the full results. You can validate this by adding an additional sheet where:

    Column A = GROUPBY(#DecisionTreeSheet1!Prediction)
    Column B = GROUPCOUNT()

    This gives you no=4,155 and yes=366. The total is your original 4,521.
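The same group-and-count check can be sketched outside the tool. The Python snippet below is not Datameer syntax. The `predictions` list is a hypothetical stand-in for the Prediction column, filled with the counts quoted above rather than read from the real sheet.

```python
from collections import Counter

# Hypothetical stand-in for the Prediction column of the decision tree sheet,
# built from the counts quoted in the answer (not read from real data).
predictions = ["no"] * 4155 + ["yes"] * 366

# Equivalent of GROUPBY(Prediction) plus GROUPCOUNT().
counts = Counter(predictions)
total = sum(counts.values())

for label, count in sorted(counts.items()):
    print(f"{label}: {count}")
print(f"total: {total}")
```

If the total matches the original data size, every record received a prediction even though only part of the data is shown in the tree.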


  • Rahul Dhond

    Hi Jason,
    Thanks a lot for your detailed answer. I appreciate it.
