As we continue our multipart series on teaching bankers, step by step, how to use artificial intelligence, we move on to the intermediate lesson. After laying out the basics of programming (Part 1 and Part 2), we now introduce the ideas of artificial intelligence (AI) and machine learning (ML). Artificial intelligence is defined as the theory and development of computer systems able to perform tasks that normally require human intelligence. Machine learning is the process by which a machine learns from data on its own, without being explicitly programmed for each decision. Machine learning, at least as we think of it, is an application of the theory of AI. In Part 3, we go hands-on with designing an algorithm that trains itself in order to help banks make better decisions.
Getting Comfortable With Computer Science
Throughout this entire series, we are confident that you have not been shocked, blown away, or intimidated by anything we have shown you. Computer science can be a complex and technical discipline, but, for the most part, we believe it to be misunderstood in banking. Senior managers often leave AI to the technology group, and we think that is a mistake. With today’s tools, AI is now within reach of every banker. While computer science is a highly coveted, upper-echelon skill, so are finance, marketing, and accounting. The only difference is that people are not worried about balance sheets taking over the world.
Let’s walk you through a simple machine learning linear regression project. Regression, or the process of understanding how datasets are related, is our most common analysis effort in banking.
While this lesson is more difficult than the previous two, the rewards are greater. It teaches the basics of ML and is the hardest lesson in the series. After this, things get easier, as we will next explore some tools that simplify the whole process. After spending just a few more minutes completing this lesson, you will have a grasp of ML usually reserved for programmers.
Machine Learning In A Nutshell
To put it in context, machine learning is simply the next logical iteration of how human beings accomplish work. Machine learning can be broken down into two efforts: processing data (such as running high-level statistics) and recognizing patterns. Benedict Evans, a Silicon Valley venture capitalist, does a great job of outlining the nuances of machine learning in his recent blog “Ways to Think About Machine Learning.” In his piece, he discusses the practical applications for machine learning in business. For example, Evans discusses how ML has helped retailers become more efficient by tracking a store’s most popular purchases and optimizing the distance between these goods. It has helped corporations become more organized by sorting emails into tranches of priority based on sentiment and tone. It has also helped data analysts become more dynamic by processing the content of audio, images, and video, where before these could only be sorted by file size.
Beyond analyzing data, ML can pick up patterns that human beings do not have the time or ability to recognize. For example, AlphaZero is a chess-playing program developed by engineers at DeepMind, Google’s sister company. Unlike previous chess programs, which were given strategies to analyze and recall, AlphaZero was given only a chess board and the rules of the game. After playing millions of games against itself, AlphaZero recognized patterns that could not be uncovered in multiple lifetimes by the greatest of grandmasters.
However, chess is a relatively simple game, played in a vacuum, with a simple end goal. If given a simple enough procedure with rules and a score, machines (due to their speed and computational ability) can uncover patterns that humans could then apply to more complex business scenarios.
Computer Science Concepts
Machine learning varies in levels of complexity. But, at its core, machine learning is a set of techniques that deal with vast data in the most intelligent fashion (using statistical concepts/algorithms/rules) to derive actionable insights.
In order for a machine, or any learning entity, to learn, it must have some training. More complex algorithms, such as AlphaZero, can generate their own training data using methods built on “deep learning.” However, for more rudimentary ML applications, you will need to write or supply the training data for the machine to process.
After you train your algorithm, your computer will make predictions based on the data it was given. These predictions are machine learning put into practice.
A Linear Regression Project to Demonstrate Machine Learning in Action
Getting Python, Add-ons and ML Libraries
For this project, we have exceeded the functionality found in codeskulptor (the code editor we used in the previous blogs), so this will be a graduation of sorts. We now need new tools to handle the more complex AI applications. In order to use more of Python’s functionality, follow the YouTube tutorial below to download Python, the ML libraries, and Anaconda (a Python distribution that bundles many of these libraries) to your computer. This setup should take around 30 minutes.
Using these added libraries, you will have the power to run a myriad of machine learning programs. For those wanting to move beyond this lesson, you can check out the other ML functionality in the associated documentation online.
Using ML For Regression – Online Reviews as a Predictor of Credit
Using these added libraries, we are going to run a linear regression. We chose linear regression because it should be a familiar concept to most of you. While regression is best known as a statistical model, it is also a machine learning algorithm because it learns from input data in order to make predictions. Statistics and ML overlap in many ways, but they differ in emphasis: statistics is a mathematical discipline, while ML is an applied field concerned primarily with optimization. If we were to take this project just one leap forward, we could use machine learning to accomplish many more tasks. For example, we might analyze which variable or collection of variables is the best predictor of credit, or which statistical technique is the most accurate for prediction.
For this project, we are going to assume there is a relationship between Yelp reviews and credit. We will explain this project more in the future, but it is one of the many efforts we are working on. The question is, should we be looking at reviews (Yelp/Google/Glassdoor/etc.) to incorporate into our commercial credit analysis?
To start to answer this question, we created two parallel lists that you can access and open in “Spyder” attached as source code below. Spyder is an integrated development environment (IDE) that comes along with Anaconda. Spyder allows you to edit, open, run, and save various Python programs. The first list is a series of Yelp ratings while the second list represents corresponding internal credit ratings on a scale of 1-10 (1 being the best).
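To make the idea of two parallel lists concrete, here is a minimal sketch. The numbers are hypothetical values we made up for illustration and are not the data in the attached source code; the names yelp_ratings and credit_ratings are likewise our own assumptions. The item at each position in the first list belongs to the same business as the item at that position in the second list.

```python
# Hypothetical illustration data; NOT the dataset from the attached source code.
# Position i in both lists describes the same business.
yelp_ratings = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 5.0]
credit_ratings = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]  # 1 is the best, 10 the worst

# Parallel lists only make sense if they are the same length.
if len(yelp_ratings) != len(credit_ratings):
    raise ValueError("Each Yelp rating needs a matching credit rating.")

# zip() walks both lists in step, pairing each Yelp rating with its credit rating.
for yelp, credit in zip(yelp_ratings, credit_ratings):
    print(f"Yelp {yelp:.1f} stars -> internal credit rating {credit}")
```

Checking the lengths up front is a cheap safeguard, since a regression on mismatched lists would either fail or quietly pair the wrong businesses together.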
Training The Model
First, we are going to “train” our algorithm. One of the libraries we installed, SciPy, lets us run a regression using a built-in function called “stats.linregress().” “stats.linregress()” is a function that runs a simple linear regression on the training data in two parallel arrays and computes the slope, intercept, correlation coefficient (R, which you can square to get R2), P-value, and standard error.
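The training step described above might look like the following sketch. The data is hypothetical (made up for illustration, not taken from the attached source code), and the variable names are our own. Note that stats.linregress() returns the correlation coefficient R rather than R2, so we square it ourselves.

```python
from scipy import stats

# Hypothetical illustration data; NOT the dataset from the attached source code.
yelp_ratings = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 5.0]
credit_ratings = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]  # 1 is the best, 10 the worst

# "Training": fit a straight line to the paired observations.
# The first argument is the input (x), the second is the value to predict (y).
slope, intercept, r_value, p_value, std_err = stats.linregress(
    yelp_ratings, credit_ratings
)

print("Slope:", slope)
print("Intercept:", intercept)
print("R-squared:", r_value ** 2)  # linregress returns R, so square it
print("P-value:", p_value)
print("Standard error of the slope:", std_err)
```

With this made-up data the slope comes out negative, which simply reflects how we built the lists: higher Yelp ratings were paired with better (lower) credit ratings.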
After we run the regression, we define a function that estimates an establishment’s credit rating based on its Yelp rating. The implied credit rating is equal to the intercept plus the slope multiplied by the input variable (the Yelp rating).
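The formula above translates almost word for word into Python. This is a sketch under our own assumptions: the data is hypothetical, and while the function name estimate_value() comes from this lesson, its exact signature in the attached source code may differ from the one shown here.

```python
from scipy import stats

# Hypothetical illustration data; NOT the dataset from the attached source code.
yelp_ratings = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 5.0]
credit_ratings = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]  # 1 is the best, 10 the worst

slope, intercept, r_value, p_value, std_err = stats.linregress(
    yelp_ratings, credit_ratings
)


def estimate_value(yelp_rating):
    """Implied credit rating = intercept + slope * Yelp rating."""
    return intercept + slope * yelp_rating


# Estimate the credit rating of a business with a 4.0-star Yelp rating.
print("Implied credit rating:", estimate_value(4.0))
```

Keep in mind that a straight line can produce estimates outside the 1-10 scale if you feed it a Yelp rating far from the training data, so treat the result as an estimate rather than a valid rating in every case.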
After creating that function, it is important to verify that the regression run by our machine is statistically significant. We choose a significance level (alpha) of 0.05 and compare it to the P-value using an if-statement. If the P-value is less than alpha, we treat the relationship as statistically significant and go on to estimate the credit rating for a given Yelp rating. We can call and print this value using the “estimate_value()” function we created.
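A sketch of that significance check follows. The helper name estimate_if_significant() and the ALPHA constant are our own hypothetical choices, not names from the attached source code, and the data is again made up for illustration. The function only returns an estimate when the P-value clears the alpha threshold.

```python
from scipy import stats

ALPHA = 0.05  # significance level; an assumption matching the 0.05 used above


def estimate_if_significant(x, y, new_x, alpha=ALPHA):
    """Return the regression estimate for new_x, or None if the fit is not significant."""
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    if p_value < alpha:
        return intercept + slope * new_x
    return None


# Hypothetical illustration data; NOT the dataset from the attached source code.
yelp_ratings = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 5.0]
credit_ratings = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]  # 1 is the best, 10 the worst

estimate = estimate_if_significant(yelp_ratings, credit_ratings, 4.0)
if estimate is None:
    print("The regression is not statistically significant at alpha =", ALPHA)
else:
    print("Implied credit rating for a 4.0-star business:", estimate)
```

A P-value below alpha tells us the slope is unlikely to be zero by chance. It does not, by itself, tell us how accurate the estimates are, which is why it is worth looking at R2 and the standard error as well.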
We can also plot the scatter chart and regression line using Matplotlib, the graphics library we downloaded. Feel free to create this graphic yourself using the code below.
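One way to draw the scatter chart and regression line is sketched here. The helper name plot_regression(), the axis labels, and the data are all our own assumptions for illustration and may not match the attached source code.

```python
import matplotlib.pyplot as plt
from scipy import stats


def plot_regression(x, y):
    """Draw the observations as a scatter chart with the fitted line on top."""
    slope, intercept, *_ = stats.linregress(x, y)

    fig, ax = plt.subplots()
    ax.scatter(x, y, label="Observed ratings")

    # A straight line only needs two points: the smallest and largest x.
    x_line = [min(x), max(x)]
    y_line = [intercept + slope * value for value in x_line]
    ax.plot(x_line, y_line, color="red", label="Regression line")

    ax.set_xlabel("Yelp rating")
    ax.set_ylabel("Internal credit rating (1 is best)")
    ax.legend()
    return fig, ax


# Hypothetical illustration data; NOT the dataset from the attached source code.
yelp_ratings = [1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 5.0]
credit_ratings = [9, 8, 8, 7, 6, 5, 5, 4, 3, 2]

fig, ax = plot_regression(yelp_ratings, credit_ratings)
# Spyder usually displays the figure on its own; in a plain script, call plt.show().
```

Returning the figure and axes from the function, rather than showing the chart inside it, leaves you free to save the image or adjust the labels afterward.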
Finally, below is the output after you run the code within Spyder.
Congratulations on coding your first machine learning algorithm! This is a massive accomplishment, as few bankers have ventured this far.
Putting This Into Action
We hope this series proved to be worthwhile and you learned a lot about ML and computer science in general. Below you can find our solution to this linear regression algorithm as well as the source code to get you started.
We will be publishing our findings on how online reviews can be a predictor of credit risk in addition to continuing this series on learning artificial intelligence. In the next part, we will look at some of the tools available to bankers to make all the above easier.
Until then, be sure to play around with the code and some of the libraries in order to enhance your basic understanding. As data becomes more central to making decisions, being able to manipulate and analyze data using ML is a critical skill that most bankers will need over the next ten years.
Source Code: HERE
About the Author: This is a guest post by Dan Kim, an analyst at CenterState Bank.
Submitted by Chris Nichols on August 13, 2018