In the world of finance, predicting stock prices accurately is a challenging yet essential task. One popular method used for making predictions is linear regression. In this post, we’ll delve into the fundamentals of linear regression and demonstrate how to use it in Python to forecast stock prices.
What is Linear Regression?
Linear Regression is a statistical method that seeks to establish a linear relationship between a dependent variable (target) and one or more independent variables (features). It assumes that this relationship can be represented as a straight line and aims to find the best-fitting line through the data points, minimizing the sum of squared residuals (the difference between the predicted and actual values). Once the line is established, it can be used to make predictions for new data points based on their features.
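To make the idea of "minimizing the sum of squared residuals" concrete, here is a minimal sketch with NumPy. The data points are made up for illustration, and the variable names are my own, not part of any particular workflow.

```python
import numpy as np

# Hypothetical toy data: one feature x and a target y that is roughly 2*x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Design matrix with a column of ones so the fitted line can have an intercept.
A = np.column_stack([x, np.ones_like(x)])

# Least squares picks the slope and intercept that minimize the sum of squared residuals.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Residuals are the differences between actual and predicted values.
residuals = y - (slope * x + intercept)
sum_squared_residuals = float(np.sum(residuals ** 2))
print(f"slope={slope:.3f}, intercept={intercept:.3f}, SSR={sum_squared_residuals:.4f}")
```

Any other choice of slope or intercept would give a larger sum of squared residuals on these points, which is what "best-fitting" means here.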
Using Linear Regression to Forecast Stock Prices:
Step 1: Data Collection
The first step is to collect historical stock price data for the particular stock you want to forecast. Various financial APIs (e.g., Yahoo Finance, Alpha Vantage) or data providers offer access to historical stock price information, which can be obtained using Python libraries like Pandas and yfinance.
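As a rough sketch of this step, the helper below wraps a yfinance download of monthly prices. The function name, the default period, and the example tickers are my own assumptions for illustration, and the call needs network access, so it is left commented out.

```python
def download_monthly_closes(tickers, period="5y"):
    """Return a DataFrame of monthly closing prices, one column per ticker."""
    # Imported here so the function can be defined even without yfinance installed.
    import yfinance as yf

    # interval="1mo" requests monthly bars; period="5y" covers the last five years.
    data = yf.download(tickers, period=period, interval="1mo")
    return data["Close"]


# Example call (requires network access), with illustrative tickers:
# closes = download_monthly_closes(["AAPL", "MSFT"])
```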
Step 2: Data Preprocessing
Clean and preprocess the data to ensure it is suitable for training the model. This involves handling missing values, scaling the data if necessary, and splitting it into training and testing sets.
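One possible way to do this with pandas is sketched below. The numbers and column names are invented, dropping rows is only one of several ways to handle missing values, and the 75/25 split is an arbitrary choice. The split is chronological rather than random, since shuffling time series lets the model see the future.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data with two gaps; column names are illustrative.
dates = pd.date_range("2020-01-01", periods=10, freq="MS")
df = pd.DataFrame(
    {
        "oil_change": [0.02, np.nan, -0.01, 0.03, 0.00, 0.01, -0.02, 0.04, np.nan, 0.01],
        "rate_change": [0.00, 0.01, 0.00, -0.01, 0.02, 0.00, 0.01, 0.00, -0.01, 0.00],
        "stock_return": [0.05, 0.01, -0.03, 0.04, 0.02, 0.00, -0.01, 0.06, 0.02, 0.01],
    },
    index=dates,
)

# Handle missing values: here, rows with any gap are simply dropped.
clean = df.dropna()

# Split chronologically so the test set is strictly later than the training set.
split = int(len(clean) * 0.75)
train, test = clean.iloc[:split], clean.iloc[split:]

features = ["oil_change", "rate_change"]

# Scale using statistics from the training set only, to avoid leaking test information.
mean, std = train[features].mean(), train[features].std()
X_train = (train[features] - mean) / std
X_test = (test[features] - mean) / std
y_train, y_test = train["stock_return"], test["stock_return"]
```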
Step 3: Feature Selection
Identify the relevant features that could impact the stock price. These features could include factors like market index values, trading volume, and past stock prices. However, it’s crucial to avoid overfitting by selecting only the most meaningful features.
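A simple first pass at feature selection is to rank candidate features by their correlation with the target. The sketch below uses randomly generated data in which only one feature truly matters. The feature names and the 0.5 cutoff are assumptions for illustration, and correlation ranking is just one of many selection techniques.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 60  # e.g. five years of monthly observations

# Hypothetical candidate features; only "index_return" actually drives the target here.
candidates = pd.DataFrame(
    {
        "index_return": rng.normal(0, 0.04, n),
        "volume_change": rng.normal(0, 0.10, n),
        "noise": rng.normal(0, 0.05, n),
    }
)
target = 1.2 * candidates["index_return"] + rng.normal(0, 0.01, n)

# Rank features by the absolute value of their correlation with the target.
ranking = candidates.corrwith(target).abs().sort_values(ascending=False)
print(ranking)

# Keep only features above an arbitrary, illustrative threshold.
selected = ranking[ranking > 0.5].index.tolist()
print(selected)
```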
Step 4: Implementing Linear Regression
In Python, you can use libraries like Scikit-learn to implement Linear Regression easily. Import the necessary libraries and instantiate the Linear Regression model. Then, fit the model to the training data using the fit() method.
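A minimal sketch of this step with Scikit-learn might look like the following. The training data is synthetic, generated from coefficients I made up, so that the fitted values can be compared against something known.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Hypothetical training data: 48 months, 3 features.
X_train = rng.normal(size=(48, 3))
true_coefs = np.array([0.5, -0.2, 0.1])  # made-up coefficients for the simulation
y_train = X_train @ true_coefs + 0.01 + rng.normal(scale=0.05, size=48)

# Instantiate the model and fit it to the training data.
model = LinearRegression()
model.fit(X_train, y_train)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

After fitting, coef_ holds one coefficient per feature and intercept_ holds the constant term.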
Step 5: Model Evaluation
Evaluate the performance of your trained model using metrics like Mean Squared Error (MSE), R-squared (R²), or Mean Absolute Error (MAE). These metrics will give you insights into how well your model is performing on the test data.
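All three metrics are available in sklearn.metrics. The sketch below computes them on a handful of invented actual and predicted returns, so the values only illustrate the calls, not any real model.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical test-set returns and the model's predictions for them.
y_test = [0.02, -0.01, 0.03, 0.00]
y_pred = [0.01, 0.00, 0.02, 0.01]

mse = mean_squared_error(y_test, y_pred)   # average squared error
mae = mean_absolute_error(y_test, y_pred)  # average absolute error
r2 = r2_score(y_test, y_pred)              # share of variance explained

print(f"MSE={mse:.6f}, MAE={mae:.4f}, R2={r2:.2f}")
```

Lower MSE and MAE are better, while R² is better the closer it gets to 1.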
Step 6: Forecasting Stock Prices
With your trained and validated Linear Regression model, you can now use it to forecast future stock prices. Prepare the features for the future time periods, and input them into the model’s predict() function to obtain the predicted stock prices.
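Here is a small sketch of the forecasting call. The historical data is generated from a made-up linear rule with no noise, and the "future" feature values are assumptions. In practice those future values are themselves unknown and have to be forecast or assumed, which is a major source of error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical history: two features per month, target follows a made-up linear rule.
X_hist = np.array(
    [[0.01, 0.00], [0.03, -0.01], [-0.02, 0.02], [0.00, 0.01], [0.02, 0.03]]
)
y_hist = 0.5 * X_hist[:, 0] - 0.2 * X_hist[:, 1] + 0.01

model = LinearRegression().fit(X_hist, y_hist)

# Assumed feature values for two future months (these must be supplied by you).
X_future = np.array([[0.02, 0.01], [-0.01, 0.00]])
forecast = model.predict(X_future)
print(forecast)
```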
Following these steps, I attempted to forecast stock prices for 12 companies, 6 tech and 6 non-tech, based on interest rates, crude oil prices, and Google search trends. The 6 technology companies I selected were Apple, Google, Amazon, Microsoft, IBM, and Meta, which are the best-performing technology stocks in my portfolio. For the non-tech group, I selected a few of the best-performing non-technology stocks in my portfolio along with a few others: Johnson & Johnson, Norwegian Cruise Line, Walmart, Target, Home Depot, and Starbucks.
Using Yahoo Finance and the yfinance library in Python, I downloaded the monthly historical data for all of these stocks for the past 5 years. The monthly historical interest rate and crude oil price data were downloaded from FRED, and the Google search trends from Google Analytics. Using pandas, I created one large data frame that contained the percentage change of all of the stock prices as well as the change in interest rates, crude oil prices, and Google search volume.
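Building such a combined data frame can be sketched as follows. The price and macro levels here are invented stand-ins for the downloaded series, and I use a percentage change for every column. Whether interest rates should instead use a simple difference is a modelling choice.

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=4, freq="MS")

# Hypothetical monthly levels; real values would come from the downloaded data.
prices = pd.DataFrame({"AAPL": [100.0, 110.0, 99.0, 108.9]}, index=dates)
macro = pd.DataFrame(
    {"interest_rate": [4.0, 4.5, 4.5, 5.0], "crude_oil": [80.0, 76.0, 79.8, 79.8]},
    index=dates,
)

# Align on the monthly index, convert levels to month-over-month percentage changes,
# and drop the first row, which has no previous month to compare against.
combined = prices.join(macro).pct_change().dropna()
print(combined)
```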
Using the Scikit-learn package in Python, I ran a linear regression for each of the 12 stocks and identified the most significant relationships by analyzing the p-values. When the p-value is less than 0.05, the relationship is considered to be statistically significant. However, since stocks are known to be incredibly volatile, many data scientists consider using a looser significance level of 0.1 instead. The linear regression table output for Apple, as well as the significant relationships found from this model, are listed below:
Something interesting to note from these results is that crude oil was found to have a positive relationship with stocks at a monthly frequency. This contradicts many studies that have found that commodities negatively affect the stock market. It could point to an error in my study and will be revisited later.
Following this, I used the .predict() method of the linear regression model to do an in-sample prediction for stock prices. To visualize the accuracy of these predictions, I graphed the actual values on the x-axis and the predicted values on the y-axis and drew a line of best fit to compare the results. The graph for Apple is shown below:
The value of the slope is the same as the R² value in the Apple linear regression table output. It can be seen that the model was not very accurate, which was to be expected since there were only 3 features. Many factors affect stock prices, so a model usually needs more features to be accurate. At the same time, it is critical to find the right balance: enough features to capture those factors, but not so many that the model overfits.
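The link between the slope and R² is not a coincidence. For an in-sample least-squares fit with an intercept, the line of best fit through the (actual, predicted) points has a slope equal to R². The sketch below checks this on simulated data. The coefficients and noise level are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)

# Simulated data: 60 observations, 3 features, and a noisy target.
X = rng.normal(size=(60, 3))
y = X @ np.array([0.3, -0.1, 0.2]) + rng.normal(scale=0.5, size=60)

model = LinearRegression().fit(X, y)
predicted = model.predict(X)      # in-sample predictions
r_squared = model.score(X, y)     # R² on the same data

# Line of best fit with actual values on the x-axis and predictions on the y-axis.
slope, intercept = np.polyfit(y, predicted, deg=1)

print(f"slope={slope:.4f}, R2={r_squared:.4f}")
```

A perfect model would put every point on the 45-degree line with a slope of 1, so a slope well below 1 is another way of seeing a low R².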
Disclaimer:
Remember that predicting stock prices involves inherent risks, and past performance is not indicative of future results. Always seek professional financial advice and conduct extensive research before making any investment choices.
The dashboard featured on Quantann is a great tool to look at the historical prices of these stocks and apply technical indicators as well. I hope to expand on the results of this study and see if there are any existing relationships between technical indicators and stock prices through linear regression using the dashboard.