Applied Business Intelligence, Module Code: 55-600268
Assignment 1
Individual/Group: Group
1.1 Learning Outcomes
This assignment assesses your ability to:
Describe and critically evaluate the role and relevance of business intelligence and analytical investigation to the solution of business information problems
Explain the concepts that underpin the subject area of business intelligence, making reference to the main established concepts and to developing areas
Apply concepts and justify decisions when modelling and designing practical examples of applications using appropriate industry-standard software.
1.2 Assessment Criteria
This module will be assessed via a case study, which involves the analysis of the data set described below. This first assignment contributes 40% of the final mark and is a 10-minute group presentation that should address the questions outlined below; the marks are indicated next to each question.
1.3 Submission Details
One group member is required to submit the PowerPoint presentation through the “001 Group Presentation - Submission Point” on Blackboard by the given deadline.
1.4 Presentation Details
Although you should answer the questions below, bear in mind that in real life this scenario would be presented to the bank manager. You should therefore provide suitable output on your slides and be prepared to interpret this output in your presentation. Include an introductory slide, a slide with some specific conclusions, and another with applications to the bank. The bank manager is unlikely to have any knowledge of data mining or statistics and will neither understand nor be interested in SAS code or Enterprise Miner settings, so you must interpret the outputs in a way the bank manager can understand. The presentation should last no more than 10 minutes and each group member is expected to present. The presentation will be stopped at 10 minutes, after which there will be approximately 3 minutes of questions.
1.6 Problem Outline
For this assignment you are required to analyse a data set concerning financial transactions and details for customers at a Czech bank.
1.7 Data Provided
The final query is saved as a SAS dataset for use in Enterprise Miner. It is called czechbk15.sas7bdat and is available on the SHU server in the path:
You will need to create a library to access the data.
1.8 Details of the Query and Resulting Data
In this assignment you will investigate whether there are any groups of accounts with similar properties. You will also build a model to predict which accounts have a second account holder attached to them. For this purpose, a subset of variables is selected from the final combination of tables for each account. For each account, these variables represent the credits and the different types of withdrawal that take place:
Credits (payments in): there is one pair of variables that gives the total paid into the account (creditt) and the number of times money is paid in (creditn).
Withdrawals (taking money out): there are two separate variables for each of the following methods of withdrawing money:
Other bank withdrawal
For each of these payment types, the number of payments (variable names ending in -n) and the value of payments (ending in -t) have been recorded over a period of five years.
Finally additional information is held about each account:
Account id, age of the primary account holder, whether they have a credit card with this bank, the number of days the account has been open, whether they have a loan, whether there is a second user of the account, and the gender of the main account holder (sex). There is one nominal variable: the frequency of their bank statements, which is monthly, weekly or after each transaction. Together these give the set of variables listed in Appendix 1; make sure you fully understand what each of them represents.
For this assignment we will be using only the following variables in the data set. While you are working on the assignment, set all the other variables to Rejected so that you do not have to keep changing them.
1.9 Analysis Required
Since the cluster analysis (which we will carry out in the next assignment) requires fields that are as symmetrical as possible, you should first investigate each of the interval fields in the data.
Produce suitable summary measures and plots and fully interpret your results. Are there any unusual features in any of your plots? By examining the actual data records, discuss why these features might occur. (Hint: you may initially find it easiest to do this with the “Explore” feature available within Enterprise Miner.)
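As an illustration of which summary measures bear on symmetry, the sketch below computes the usual location and spread figures for one interval variable together with the adjusted Fisher-Pearson sample skewness and a count of zero values. It is a minimal Python sketch, not Enterprise Miner output; the function name `summarise` and the use of `None` for a SAS missing value (`.`) are our own assumptions.

```python
import statistics


def summarise(values):
    """Summary measures for one interval variable; None stands for a SAS missing value '.'."""
    present = [v for v in values if v is not None]
    n = len(present)  # needs at least 3 non-constant values for the skewness formula
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample standard deviation (n - 1 divisor)
    # Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / sd) ** 3)
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in present)
    return {
        "n": n,
        "missing": len(values) - n,
        "mean": mean,
        "median": statistics.median(present),
        "sd": sd,
        "min": min(present),
        "max": max(present),
        "skewness": skew,  # 0 for a symmetric variable, > 0 for a long right tail
        "zeros": sum(1 for v in present if v == 0),  # a spike at zero shows up clearly on a histogram
    }
```

A skewness far from 0, or a large share of zeros, is the kind of feature worth tracing back to the individual records.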
Investigate the remaining binary and nominal variables by producing suitable plots. Fully discuss your results. (5 marks)
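A bar chart of a binary or nominal variable is built from a simple frequency table. The minimal sketch below (our own helper, `frequency_table`, not an Enterprise Miner feature) returns each level with its count and percentage, reporting missing values as a separate level so they are not silently dropped.

```python
from collections import Counter


def frequency_table(values):
    """(level, count, percent) for each level of a binary/nominal variable, most frequent first."""
    counts = Counter("(missing)" if v is None else v for v in values)
    total = sum(counts.values())
    return [(level, n, 100.0 * n / total) for level, n in counts.most_common()]
```

Each tuple corresponds to one bar; for example, the levels of frequency would be monthly, weekly and after transaction.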
Use the Transform Variables node in Enterprise Miner with the "Maximum Normal" option for interval variables to find suitable transformations of the interval variables. Ensure that, in your score settings, you still retain a copy of the original variables (set both Hide and Reject to "No").
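To make the idea behind "Maximum Normal" concrete: as we understand the Enterprise Miner documentation, it picks, from a set of candidate transformations, the one whose sample quantiles lie closest to those of a normal distribution. The Python sketch below imitates that idea only. The candidate list, the +1 offsets that keep zero values valid, and the mean-squared quantile gap used as the score are all our own assumptions, not the software's exact algorithm.

```python
import math
import statistics

# Hypothetical candidate set; Enterprise Miner's own list and offsets may differ.
CANDIDATES = {
    "none": lambda x: x,
    "log(x+1)": lambda x: math.log(x + 1),
    "sqrt(x)": lambda x: math.sqrt(x),
    "1/(x+1)": lambda x: 1 / (x + 1),
    "x^2": lambda x: x * x,
}


def normality_distance(values):
    """Mean squared gap between standardised sorted values and standard normal quantiles."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return math.inf  # a constant column cannot be made normal
    z = sorted((v - mean) / sd for v in values)
    normal = statistics.NormalDist()
    expected = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return sum((a - b) ** 2 for a, b in zip(z, expected)) / n


def pick_transformation(values):
    """Name of the candidate whose output looks most normal (assumes all values >= 0)."""
    scores = {name: normality_distance([f(v) for v in values]) for name, f in CANDIDATES.items()}
    return min(scores, key=scores.get)
```

If no candidate beats "none", the variable is left untransformed, which is one possible reason for the outcome asked about in part a).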
a) Explain which transformations the software has picked. Were any of the interval variables not transformed? If so, why do you think this is? (Hint: you may wish to include the SAS transformations table as a screenshot on your slides.) (2 marks)
b) Produce further plots of the transformed variables and use these to present evidence of whether the transformations have been successful. (Refer to the lower branch of Figure 1.0 for guidance on the Enterprise Miner stream you need for this.) Comment clearly on your results. (Please note that it may not be possible to make all variables totally symmetrical.) Then state, for each interval variable, whether subsequent analysis should use the original (untransformed) variable or the new transformed variable, and hence list the set of interval variables you would use for clustering. (Hint: you may wish to show the plots of the original and transformed variables side by side in your slides.) (8 marks)
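One simple, checkable input to the original-versus-transformed decision is whether the transformation moves the skewness closer to 0. The sketch below (hypothetical helper names `skewness` and `choose_version`) applies that single rule; it is a supporting figure for what the plots show, not a replacement for reading them.

```python
import statistics


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 non-constant values)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


def choose_version(original, transformed):
    """Prefer the transformed variable only if it is closer to symmetric (|skewness| nearer 0)."""
    if abs(skewness(transformed)) < abs(skewness(original)):
        return "transformed"
    return "original"
```

Where a transformation improves the skewness only marginally, keeping the original variable may still be the better choice, because it is easier to explain to the bank manager.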
The bank would also like to have an idea of the characteristics of customers who have chosen to have a second account holder. In the data node, set second as the target variable. Now add a Data Partition node to the data node and set the training proportion to 70%, validation to 30% and test to 0%. Add a Decision Tree node to the Data Partition node as in Figure 1.0, adjust the tree settings as shown in Figure 1.1, and then run the Decision Tree node.
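The Data Partition step can be pictured as a seeded random shuffle followed by cutting the list at the requested proportions. The sketch below is a simple random split with a hypothetical seed of 12345; Enterprise Miner may instead stratify on the target, which this sketch does not attempt.

```python
import random


def partition(records, train=0.70, validate=0.30, test=0.0, seed=12345):
    """Simple random split into (training, validation, test) lists."""
    if abs(train + validate + test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validate)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])  # empty when test = 0
```

The tree is grown on the training part, while the validation part gives an honest check of how it performs on accounts it has not seen.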
a) Using the tree diagram, fully interpret the derived tree and discuss the Fit Statistics. (Hint: include a screenshot of your tree and Fit Statistics in your slides.)
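Two figures commonly read from the Fit Statistics for a binary target are the misclassification rate and the average squared error. The sketch below shows how they are calculated, assuming the target is coded 1 for a second account holder and 0 otherwise and that a predicted probability of 0.5 or more counts as "yes"; both codings are our own assumptions.

```python
def fit_statistics(actual, prob_yes, cutoff=0.5):
    """Misclassification rate and average squared error for a binary (0/1) target."""
    n = len(actual)
    predicted = [1 if p >= cutoff else 0 for p in prob_yes]
    misclassification = sum(a != p for a, p in zip(actual, predicted)) / n
    ase = sum((a - p) ** 2 for a, p in zip(actual, prob_yes)) / n  # lower is better
    return {"misclassification_rate": misclassification, "average_squared_error": ase}
```

Computing these separately for the training and validation partitions and comparing them is one way to judge whether the tree has overfitted the training data.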
b) If a customer has the following characteristics, would they be likely to have a second account holder? Discuss the results in terms of their practicality to the bank. (Hint: you might wish to show the path followed through the decision tree in your slides.)
Age = ., creditn = 0.01, creditt = 200, stmentn = 0.02, stmentt = 10, card = y, cardwdn = 0, cardwdt = 0, insuren = 0, insuret = 0, overdtn = 0.42, overdtt = 600, days = 800, frequency = monthly, householdn = 0, householdt = 0, othbwdn = 1000, othbwdt = 500, loanpayn = 6000, loanpayt = 98894, sex = M, cashwdn = 0, cashwdt = 0
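Following a customer through a tree means applying one split rule per node until a leaf is reached, with a rule for where missing values (such as Age = .) go. The sketch below shows only those mechanics. Every split variable, threshold, missing-value branch and leaf probability in `TREE` is invented for illustration and is not the tree your model will produce.

```python
# Hypothetical tree: every split, threshold and leaf probability here is invented.
TREE = {
    "var": "loanpayt", "threshold": 50000, "missing": "left",
    "left": {"leaf": 0.12},
    "right": {
        "var": "age", "threshold": 40, "missing": "right",
        "left": {"leaf": 0.35},
        "right": {"leaf": 0.61},
    },
}


def score(node, record, path=None):
    """Follow a binary tree from the root; return (leaf probability, path taken)."""
    path = [] if path is None else path
    if "leaf" in node:
        return node["leaf"], path
    value = record.get(node["var"])
    if value is None:  # SAS missing value '.': use the node's missing-value branch
        branch = node["missing"]
    else:
        branch = "left" if value < node["threshold"] else "right"
    path.append(f"{node['var']}={value} -> {branch}")
    return score(node[branch], record, path)


# A few fields from the customer in question b); Age '.' becomes None.
customer = {"age": None, "loanpayt": 98894, "othbwdn": 1000, "overdtn": 0.42}
probability, route = score(TREE, customer)
```

In the real tree, where a missing Age is sent depends on the missing-value settings chosen for the Decision Tree node.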
c) If the bank wished to use this model to look at the important factors that influence a customer’s decision to have a second account holder, what reservations might you have? (3 marks)
2.0 Presentation
You should put all of your findings in a PowerPoint presentation. The slides should look clear, neat and professional and contain the correct information. The group will deliver the presentation in a 10-15 minute slot, and each group member is expected to present. The presentation should last no more than 10 minutes, and there will be 3 minutes at the end for questions; further marks will be allocated for your group’s response to these questions.
(5 marks)
Total marks available: 40