Forecasting and Clustering in Google Colab
Data analysis often involves multiple steps — cleaning, exploring, visualizing, modeling. Two common and powerful techniques are forecasting (predicting future trends) and clustering (grouping similar data points).
In this post, we’ll show how to do both using Google Colab, walk through the code, and highlight the complexity involved — then reveal how Datastripes can simplify this to just a couple of visual nodes, no code required.
Time Series Forecasting with Prophet in Colab
Suppose you have daily sales data, and you want to forecast the next 30 days. Prophet, an open-source forecasting library developed at Facebook (now Meta), is well suited to this kind of data.
The Data
Imagine a CSV like this:
ds | y |
---|---|
2024-01-01 | 200 |
2024-01-02 | 220 |
2024-01-03 | 215 |
... | ... |
Here, ds is the date and y is the sales figure for that day. Prophet expects exactly these two column names.
Step-by-step Code Walkthrough
# Install Prophet (needed once per Colab runtime)
!pip install prophet
This command installs Prophet in the current Colab runtime. It might take a minute, and you will need to run it again if the runtime is reset.
import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt
Here we import the necessary libraries:
- pandas for data handling
- Prophet for forecasting
- matplotlib for plotting
# Load your sales data CSV into a DataFrame
df = pd.read_csv('sales.csv')
You’ll need to upload your sales.csv file to the Colab session first, or pass pd.read_csv a URL that points to the file.
# Take a peek at your data to ensure it loaded correctly
print(df.head())
Always check your data early! Look for correct date formats, missing values, or typos.
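The checks mentioned above can be sketched in a few lines of pandas. This is only an illustration: the inline sample below is a made-up stand-in for sales.csv, with one missing value and one repeated date planted so the checks have something to find.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales.csv: the second row has no sales value
# and the last row repeats a date.
sample_csv = io.StringIO(
    "ds,y\n"
    "2024-01-01,200\n"
    "2024-01-02,\n"
    "2024-01-03,215\n"
    "2024-01-03,215\n"
)
df = pd.read_csv(sample_csv)

# Convert ds to datetime; errors="coerce" turns unparseable entries into NaT
df["ds"] = pd.to_datetime(df["ds"], errors="coerce")

bad_dates = int(df["ds"].isna().sum())              # dates that failed to parse
missing_y = int(df["y"].isna().sum())               # rows without a sales value
duplicate_dates = int(df["ds"].duplicated().sum())  # dates that appear more than once

print(df.dtypes)
print(f"bad dates: {bad_dates}, missing y: {missing_y}, duplicate dates: {duplicate_dates}")
```

With your own file, replace sample_csv with the path to sales.csv. How to handle what the checks find (dropping, filling, or aggregating rows) depends on your data.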
# Initialize the Prophet model
model = Prophet()
This creates the Prophet model with default parameters. You can customize it later.
# Fit the model on your data
model.fit(df)
This is the training step: Prophet estimates the trend and seasonal patterns from your historical data.
# Create a DataFrame with future dates to forecast
future = model.make_future_dataframe(periods=30)
print(future.tail())
make_future_dataframe returns a DataFrame that holds your historical dates plus 30 additional days, so the model has future dates to predict for.
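To make the idea concrete, here is a rough pandas-only sketch of what extending the dates by 30 days looks like. It is not Prophet's implementation, just an approximation for daily data, and the five-day history is a made-up example.

```python
import pandas as pd

# Hypothetical history: 5 daily dates standing in for the ds column
history = pd.DataFrame({"ds": pd.date_range("2024-01-01", periods=5, freq="D")})

periods = 30
last_date = history["ds"].max()

# Dates starting the day after the last observation
future_dates = pd.date_range(
    start=last_date + pd.Timedelta(days=1), periods=periods, freq="D"
)

# Historical dates followed by the new future dates, in a single ds column
future = pd.concat(
    [history[["ds"]], pd.DataFrame({"ds": future_dates})], ignore_index=True
)

print(len(history), len(future))
print(future.tail())
```

The result has one row per historical date plus 30 new rows, which is the shape model.predict expects in the next step.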
# Use the model to predict future sales
forecast = model.predict(future)
forecast now contains the predicted values (yhat) and the bounds of the uncertainty interval (yhat_lower and yhat_upper), which is an 80% interval by default.
# Visualize the forecast
model.plot(forecast)
plt.title('Sales Forecast')
plt.show()
The plot shows the historical observations as points, the forecast as a line, and the uncertainty interval as a shaded band.
Tips for Better Forecasts
- Ensure your dates (ds) are in datetime format.
- Check for missing or outlier data points before fitting.
- Tune Prophet’s parameters like seasonality or holidays for your context.
Clustering Customers Using KMeans in Colab
Now, let’s say you want to segment customers based on income and spending behavior.
The Data
A CSV with columns:
CustomerID | Annual Income (k$) | Spending Score (1-100) |
---|---|---|
1 | 15 | 39 |
2 | 16 | 81 |
3 | 17 | 6 |
... | ... | ... |
Step-by-step Code Walkthrough
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import pandas as pd
We import KMeans for clustering, matplotlib for plotting, and pandas to load data.
# Load the customer data CSV
df = pd.read_csv('customers.csv')
print(df.head())
Always check the data to understand its shape and content.
# Select the two features to cluster on
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
These columns will form a 2D space for clustering.
# Initialize KMeans with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=42)
Choosing the number of clusters is a key step. Here we pick 3 for illustration.
# Fit the model and predict cluster assignments
kmeans.fit(X)
df['Cluster'] = kmeans.labels_
Each customer is assigned a cluster label (0, 1, or 2).
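Before plotting, it can help to look at what the labels mean. The sketch below counts customers per cluster and prints the cluster centers. The nine customers are made-up values that form three obvious groups, and n_init=10 is set explicitly only to keep behavior the same across scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical customers forming three loose groups, made up for illustration
df = pd.DataFrame({
    "Annual Income (k$)":     [15, 16, 17, 60, 62, 65, 120, 125, 130],
    "Spending Score (1-100)": [80, 85, 78, 48, 52, 50, 15, 20, 10],
})
X = df[["Annual Income (k$)", "Spending Score (1-100)"]]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df["Cluster"] = kmeans.fit_predict(X)  # fit and return labels in one call

# How many customers landed in each cluster
cluster_sizes = df["Cluster"].value_counts().sort_index()
print(cluster_sizes)

# Cluster centers, in the same units as the input features
centers = pd.DataFrame(kmeans.cluster_centers_, columns=X.columns)
print(centers.round(1))
```

Reading the centers as "typical" customers (for example, low income with high spending) is usually the quickest way to give each segment a name.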
# Plot clusters with colors
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c=df['Cluster'], cmap='viridis')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Customer Segmentation')
plt.show()
The scatter plot shows customers grouped by clusters in different colors.
Tips for Better Clustering
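- Normalize or scale features if they have different units or ranges; KMeans relies on Euclidean distance, so a feature with a larger range would otherwise dominate.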
- Experiment with cluster counts and validate with metrics like silhouette score.
- Visualize results to make business sense of clusters.
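The first two tips can be combined in one short sketch: scale the features, then compare silhouette scores for several cluster counts. The data is again a made-up nine-customer example, and the range of 2 to 5 clusters is an arbitrary choice for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical customers forming three loose groups, made up for illustration
df = pd.DataFrame({
    "Annual Income (k$)":     [15, 16, 17, 60, 62, 65, 120, 125, 130],
    "Spending Score (1-100)": [80, 85, 78, 48, 52, 50, 15, 20, 10],
})
X = df[["Annual Income (k$)", "Spending Score (1-100)"]]

# Put both features on a comparable scale (mean 0, standard deviation 1)
X_scaled = StandardScaler().fit_transform(X)

# Try several cluster counts and record the silhouette score for each
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")

best_k = max(scores, key=scores.get)
print("Highest silhouette score at k =", best_k)
```

A higher silhouette score means tighter, better separated clusters, but it is only one signal. The cluster count should still make sense for the business question you are asking.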
Why Is This Hard for Most People?
If you’re not a coder, these steps look intimidating: installing packages, writing code, understanding APIs, and debugging errors.
Even for tech-savvy folks, repeating these steps every time the data updates is tedious.
It takes time away from what really matters: interpreting results and making decisions.
How Datastripes Makes This Effortless
With Datastripes, you don’t need to write or understand code:
- Upload your data.
- Drag a "Forecast" node and configure date and value columns.
- Drag a "Cluster" node, pick features, and watch clusters appear.
- Everything updates live and visually, directly in your browser.
- No installs, no scripts, no errors.
Datastripes is built to turn these complex workflows into intuitive flows — freeing you to focus on insight, not syntax.
Try the live demo at datastripes.com and see how forecasting and clustering go from tens of lines of code to just two nodes.
When data analysis becomes simple, you can explore more, decide faster, and actually enjoy the process.