Saturday, June 23, 2018

8 Data Science Projects in Banking

Financial data analysis is as broad an area as finance itself. You can use it for managing and mitigating different types of financial risk, making investment decisions, managing portfolios, valuing assets, and more. Below are a few beginner-level projects you can try working on.



1- Build a Credit Scorecard Model - Credit scorecards are used to assess the creditworthiness of customers. Use the German Credit data set (publicly available credit data) to build a credit scorecard. The data set has historical data on the default status of 1,000 customers, along with factors possibly correlated with a customer's chances of defaulting, such as salary, age, and marital status, and attributes of the loan contract, such as term and APR. Build a classification model (using techniques like Logistic Regression, LDA, Decision Trees, Random Forests, Boosting, or Bagging) to separate good and bad (non-default and default) customers, then use the model to score new customers in the future and lend to those above a minimum score. Credit scorecards are heavily used in the industry for making decisions on granting credit, monitoring portfolios, calculating expected loss, and more.
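A minimal sketch of such a scorecard model with scikit-learn, assuming the German credit data has been saved as a CSV named german_credit.csv with a binary default column (the file and column names here are placeholders):

# Minimal credit-scoring sketch with logistic regression.
# Assumes "german_credit.csv" has a binary "default" column
# (1 = bad customer, 0 = good customer).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("german_credit.csv")
X = pd.get_dummies(df.drop(columns="default"))  # one-hot encode categoricals
y = df["default"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]  # probability of default
print("AUC:", roc_auc_score(y_test, probs))

The predicted probabilities can then be mapped onto a points scale to produce the actual scorecard.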



2- Build a Stock Price Forecasting Model - These models predict the price of a stock or an index over a given future period. You can download the stock price of any publicly listed company, such as Apple, Microsoft, Facebook, or Google, from Yahoo Finance. Such data is known as univariate time series data. You can use the ARIMA class of models (AR, MA, ARMA, ARIMA) or Exponential Smoothing models.
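A minimal ARIMA sketch with statsmodels, assuming the yfinance package is used to pull prices (the ticker and the (1,1,1) order are illustrative; in practice, choose the order from ACF/PACF plots or by AIC):

# Minimal univariate ARIMA forecasting sketch.
import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA

# Daily closing prices for a publicly listed company.
prices = yf.download("AAPL", start="2017-01-01", end="2018-06-01")["Close"]

# Fit an ARIMA(1,1,1); differencing once (d=1) handles the trend.
model = ARIMA(prices, order=(1, 1, 1))
fitted = model.fit()

forecast = fitted.forecast(steps=30)  # 30 trading days ahead
print(forecast.head())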



3- Portfolio Optimization Problem - Assume you are working as an adviser to a high-net-worth individual who wants to diversify 1 million in cash across 20 different stocks. How would you advise them? You can find the 20 least correlated stocks (to mitigate risk) using a correlation matrix, then use optimization algorithms (operations research techniques) to work out how to distribute the 1 million among those 20 stocks.
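A minimal minimum-variance allocation sketch with scipy, assuming daily returns for the 20 chosen stocks are already saved in a CSV (the file name is a placeholder):

# Minimal minimum-variance portfolio sketch.
import numpy as np
import pandas as pd
from scipy.optimize import minimize

returns = pd.read_csv("daily_returns.csv", index_col=0)  # hypothetical file
cov = returns.cov().values
n = cov.shape[0]

def portfolio_variance(w):
    return w @ cov @ w

constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1}]  # fully invested
bounds = [(0, 1)] * n                                         # long-only

result = minimize(portfolio_variance, x0=np.full(n, 1 / n),
                  bounds=bounds, constraints=constraints)

allocation = 1_000_000 * result.x  # dollars per stock
print(np.round(allocation, 2))

Minimizing variance is only one possible objective; you could instead maximize the Sharpe ratio or impose a target return constraint.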



4- Segmentation Modelling - Financial services are increasingly becoming tailor-made, which helps banks target customers more efficiently. How do banks do this? They use segmentation modelling to cater differently to different segments of customers. You need historical data on customer attributes and on financial products/services to build a segmentation model. Techniques such as Decision Trees and Clustering are used to build segmentation models.
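A minimal k-means clustering sketch, assuming numeric customer attributes in a hypothetical customers.csv:

# Minimal customer-segmentation sketch with k-means.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("customers.csv")        # hypothetical numeric attributes
X = StandardScaler().fit_transform(df)   # scale so no feature dominates

kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
df["segment"] = kmeans.fit_predict(X)

print(df.groupby("segment").mean())      # profile each segment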



5- Revenue Forecasting - Revenue forecasting can be done using statistical analysis as well (apart from the conventional accounting practices that companies follow). You can take data on factors affecting the revenue of a company, or a group of companies, over equally spaced periods (monthly, quarterly, half-yearly, or annual) to build a regression model. Make sure you correct for autocorrelation: because the data has a time series component, the errors are likely to be correlated, which violates the assumptions of regression analysis.
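A minimal regression sketch that also checks the Durbin-Watson statistic for autocorrelated errors (the file and driver column names are placeholders):

# Minimal revenue-regression sketch with an autocorrelation check.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("revenue.csv")  # hypothetical quarterly data
X = sm.add_constant(df[["marketing_spend", "gdp_growth"]])
model = sm.OLS(df["revenue"], X).fit()

# Values far from 2 suggest autocorrelated errors; consider
# Newey-West standard errors or modelling the error structure.
print("Durbin-Watson:", durbin_watson(model.resid))
print(model.summary())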



6- Pricing Financial Products - You can build models to price financial products such as mortgages, auto loans, and credit card transactions. (Pricing in this case means charging the right interest rate to account for the risk involved and earn a profit from the contract, while remaining competitive in the market.) You can also build models to price forwards, futures, options, and swaps, though these are relatively more complicated.
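As a taste of derivative pricing, here is a minimal sketch of the standard Black-Scholes closed-form price for a European call option:

# Minimal Black-Scholes sketch for a European call option.
from math import log, sqrt, exp
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Spot S, strike K, maturity T (years), risk-free rate r,
    annualized volatility sigma."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

print(bs_call(S=100, K=105, T=1.0, r=0.02, sigma=0.25))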



7- Prepayment Models - Prepayment is a problem for banks in loan contracts. Use loan data to predict which customers could potentially prepay. You can build another model in parallel to estimate, if a customer does prepay, when in the lifetime of the loan they are likely to do so (time to prepay). You may also build a model to estimate how much loss the bank would incur if a section of the portfolio prepays in the future.
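A minimal time-to-prepay sketch using a Cox proportional hazards model from the lifelines package (the file and column names here are placeholders):

# Minimal survival-analysis sketch for time to prepayment.
import pandas as pd
from lifelines import CoxPHFitter

# Assumes "loans.csv" has a duration column (months on book),
# an event flag "prepaid" (1 = prepaid, 0 = still active),
# and hypothetical covariates such as rate_spread and loan_age.
df = pd.read_csv("loans.csv")

cph = CoxPHFitter()
cph.fit(df, duration_col="months_on_book", event_col="prepaid")
cph.print_summary()  # hazard ratios: which factors speed up prepayment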



8- Fraud Models - These models are used to determine whether a particular transaction is fraudulent. Historical data with details of fraud and non-fraud transactions can be used to build a classification model that predicts the chance of fraud in a transaction. Since we normally have a high volume of data, try not just relatively simple models like Logistic Regression or Decision Trees but also more sophisticated ensemble models.
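A minimal gradient boosting sketch, assuming transactions with an is_fraud label in a hypothetical transactions.csv:

# Minimal fraud-classification sketch with gradient boosting.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("transactions.csv")
X, y = df.drop(columns="is_fraud"), df["is_fraud"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

clf = GradientBoostingClassifier(random_state=42)
clf.fit(X_train, y_train)

# Report precision/recall per class; accuracy alone is misleading
# when frauds are rare.
print(classification_report(y_test, clf.predict(X_test)))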










Wednesday, June 20, 2018

ANOVA | Analysis of Variance

Analysis of variance (ANOVA) is used to compare the means of two or more samples. While a t-test can compare the means of two samples, it cannot be used for more than two; ANOVA is used in that situation.



ANOVA was invented by Sir Ronald Fisher, who first applied the technique to agriculture and the cotton industry.



It is now a popular technique used in many areas, most notably in design of experiments.
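A minimal one-way ANOVA sketch with scipy, using toy data for three groups:

# Minimal one-way ANOVA sketch comparing three sample means.
from scipy.stats import f_oneway

group_a = [21, 23, 19, 24, 22]
group_b = [30, 28, 27, 31, 29]
group_c = [22, 25, 24, 23, 26]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one group mean differs.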













Sunday, June 10, 2018

Common Mistakes Made in Cross Validation | Machine Learning



In this video we will discuss a few of the common mistakes often made while performing cross validation of machine learning models. While root mean square error and accuracy rate are the two most popular metrics for evaluating model performance in cross validation, they have limitations when performance matters more to the researcher in one section of the data than in the others.



For example, we could be interested in better performance when predicting house prices for one segment of the sample (say, mid-priced houses) than for other segments. Similarly, we could be more interested in correctly predicting default customers than non-default customers in a classification setup.
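A minimal sketch of scoring only the segment that matters, here recall on the default class during stratified cross-validation (toy imbalanced data in place of a real loan data set):

# Minimal sketch: evaluate recall on the rare (default) class
# instead of overall accuracy during cross validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=42)  # ~10% positives

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="recall")  # recall on class 1
print("Recall per fold:", scores.round(3))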





Sunday, June 3, 2018

Occam's Razor (Parsimony) in Machine Learning | Model Selection

Occam's Razor (parsimony) is a principle which states that, out of all possible models that provide similar results (or performance), the simplest one should be selected as the final model.



The principle dates back many centuries; it was originally studied not in relation to machine learning but as a general idea. It is now a widely accepted means of selecting the best model from many candidates.
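A minimal illustration of the principle: among polynomial models with similar cross-validated error, prefer the lowest degree (toy data where the true relationship is linear):

# Minimal parsimony sketch: compare polynomial degrees by CV error.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.5 * X.ravel() + rng.normal(scale=1.0, size=200)  # truly linear

for degree in (1, 2, 5, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: CV MSE = {mse:.3f}")
# When the scores are close, Occam's razor favors degree 1.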




Why Everyone Should Learn Some Data Science



Everyone, irrespective of career choice, should learn some data science. Data science skills are useful everywhere. As part of data science you learn statistical analysis, forecasting, data visualization, and mathematical programming, all of which are valuable no matter which career you are interested in. Computational skills are going to be important in almost any future job.










Friday, June 1, 2018

No Free Lunch theorem in Machine Learning

The No Free Lunch theorem in machine learning says that no single machine learning algorithm is universally the best. In fact, the goal of machine learning is not to find one algorithm that is always the best.

If an algorithm works well for a given problem, it may not work well for some other problem. So there is no universally best algorithm that works very well in all cases.
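A quick illustration with two algorithms on two toy problems; typically the decision tree does better on the non-linear data and logistic regression on the linear data, so neither wins everywhere:

# Minimal No-Free-Lunch illustration: no single winner across data sets.
from sklearn.datasets import make_moons, make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

datasets = {
    "moons (non-linear)": make_moons(n_samples=500, noise=0.25,
                                     random_state=42),
    "linear": make_classification(n_samples=500, random_state=42),
}
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
}

for dname, (X, y) in datasets.items():
    for mname, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{dname:20s} {mname:20s} accuracy = {acc:.3f}")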