
Forecasting and Clustering in Google Colab

· 5 min read
Vincenzo Manto
Founder @ Datastripes
Alessia Bogoni
Chief Data Analyst @ Datastripes

Data analysis often involves multiple steps — cleaning, exploring, visualizing, modeling. Two common and powerful techniques are forecasting (predicting future trends) and clustering (grouping similar data points).

In this post, we’ll show how to do both using Google Colab, walk through the code, and highlight the complexity involved — then reveal how Datastripes can simplify this to just a couple of visual nodes, no code required.


Time Series Forecasting with Prophet in Colab

Suppose you have daily sales data and want to forecast the next 30 days. Prophet, an open-source forecasting library from Facebook (now Meta), is a good fit: it is designed for business time series with trends and seasonal patterns.

The Data

Imagine a CSV like this:

ds,y
2024-01-01,200
2024-01-02,220
2024-01-03,215
...

Here ds is the date and y is the daily sales figure. Prophet expects exactly these two column names.
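If your export uses different column names, rename them before fitting. Here is a minimal sketch. It assumes a hypothetical raw table with columns called date and sales. The helper name to_prophet_format is ours and is not part of Prophet.

```python
import pandas as pd


def to_prophet_format(raw: pd.DataFrame, date_col: str, value_col: str) -> pd.DataFrame:
    """Return a copy with only the two columns Prophet expects: ds (dates) and y (numbers)."""
    out = raw.rename(columns={date_col: "ds", value_col: "y"})[["ds", "y"]].copy()
    out["ds"] = pd.to_datetime(out["ds"])  # strings such as "2024-01-01" become datetimes
    out["y"] = pd.to_numeric(out["y"], errors="coerce")  # non-numeric values become NaN
    return out.sort_values("ds").reset_index(drop=True)


# Hypothetical raw export with different column names and unsorted rows
raw = pd.DataFrame({"date": ["2024-01-02", "2024-01-01", "2024-01-03"],
                    "sales": [220, 200, 215]})
df = to_prophet_format(raw, date_col="date", value_col="sales")
print(df)
```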

Step-by-step Code Walkthrough

# Install Prophet in the current Colab runtime (a fresh runtime may need it again)
!pip install prophet

This command installs Prophet in the Colab environment. It might take a minute.

import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt

Here we import the necessary libraries:

  • pandas for data handling
  • Prophet for forecasting
  • matplotlib for plotting

# Load your sales data CSV into a DataFrame, parsing ds as dates
df = pd.read_csv('sales.csv', parse_dates=['ds'])

Upload sales.csv to the Colab session (for example, through the Files panel), or pass a URL to pd.read_csv instead of a filename.

# Take a peek at your data to ensure it loaded correctly
print(df.head())

Always check your data early! Look for correct date formats, missing values, or typos.
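Some of these checks can be automated. The helper below is a minimal sketch that reports unparseable dates, duplicate dates, and missing values in a ds/y table. The name check_sales_data is hypothetical and is not a library function. The sample rows are made up so that each check is triggered once.

```python
import pandas as pd


def check_sales_data(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems found in a ds/y sales table."""
    missing_cols = {"ds", "y"} - set(df.columns)
    if missing_cols:
        return [f"missing columns: {sorted(missing_cols)}"]
    problems = []
    dates = pd.to_datetime(df["ds"], errors="coerce")  # unparseable dates become NaT
    if dates.isna().any():
        problems.append(f"{int(dates.isna().sum())} unparseable or empty dates")
    if dates.duplicated().any():
        problems.append(f"{int(dates.duplicated().sum())} duplicate dates")
    if df["y"].isna().any():
        problems.append(f"{int(df['y'].isna().sum())} missing sales values")
    return problems


# Made-up rows: one duplicate date, one missing value, one bad date
sample = pd.DataFrame({"ds": ["2024-01-01", "2024-01-02", "2024-01-02", "not a date"],
                       "y": [200, 220, None, 215]})
print(check_sales_data(sample))
```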

# Initialize the Prophet model
model = Prophet()

This creates the Prophet model with default parameters. You can customize it later.

# Fit the model on your data
model.fit(df)

This is where the magic happens — Prophet learns the patterns from your historical data.
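For intuition, the Prophet documentation describes the default model as additive. It combines a trend, seasonal components, and holiday effects, plus noise. Fitting estimates the parameters of each component from your history.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

In this formula, g(t) is the trend (piecewise linear by default), s(t) is the periodic seasonality (modelled with Fourier terms), h(t) is the holiday effect, and ε_t is the error. If seasonal swings grow with the level of sales, Prophet also accepts seasonality_mode='multiplicative'.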

# Create a DataFrame with future dates to forecast
future = model.make_future_dataframe(periods=30)
print(future.tail())

make_future_dataframe returns your historical dates plus 30 extra days after the last one. This lets the model produce both in-sample fits and future predictions.
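The sketch below is a rough pandas equivalent of what make_future_dataframe(periods=30) returns with its defaults: daily frequency, with history included. It is an approximation for illustration and assumes unique, sorted dates. It is not Prophet's own code, and the helper name is hypothetical.

```python
import pandas as pd


def future_dates_like_prophet(history: pd.DataFrame, periods: int) -> pd.DataFrame:
    """Approximate make_future_dataframe(periods, freq='D', include_history=True)."""
    last = history["ds"].max()
    # Start one day after the last observed date and generate `periods` daily dates
    new_dates = pd.date_range(start=last + pd.Timedelta(days=1), periods=periods, freq="D")
    all_dates = pd.concat([history["ds"], pd.Series(new_dates)], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})


history = pd.DataFrame({"ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
                        "y": [200, 220, 215]})
future = future_dates_like_prophet(history, periods=30)
print(len(future))                    # 33: 3 historical rows + 30 new ones
print(future["ds"].iloc[-1].date())   # 2024-02-02
```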

# Use the model to predict future sales
forecast = model.predict(future)

forecast now contains the predicted values (yhat) and uncertainty intervals (yhat_lower and yhat_upper, 80% wide by default).
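Usually you only want the new dates. Here is a minimal sketch. It uses a hypothetical helper and a tiny made-up stand-in for Prophet's output. A real forecast has many more columns, such as trend and the seasonal components.

```python
import pandas as pd


def future_predictions(forecast: pd.DataFrame, history: pd.DataFrame) -> pd.DataFrame:
    """Keep rows after the last historical date, with the prediction and interval columns."""
    cutoff = history["ds"].max()
    cols = ["ds", "yhat", "yhat_lower", "yhat_upper"]
    return forecast.loc[forecast["ds"] > cutoff, cols].reset_index(drop=True)


# Toy stand-in for Prophet's output (made-up numbers, for illustration only)
history = pd.DataFrame({"ds": pd.to_datetime(["2024-01-01", "2024-01-02"]), "y": [200, 220]})
forecast = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "trend": [205.0, 210.0, 215.0],
    "yhat": [204.0, 216.0, 218.0],
    "yhat_lower": [190.0, 201.0, 200.0],
    "yhat_upper": [219.0, 231.0, 236.0],
})
print(future_predictions(forecast, history))
```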

# Visualize the forecast
model.plot(forecast)
plt.title('Sales Forecast')
plt.show()

You get a clear graph showing past data, predicted future, and uncertainty.

Tips for Better Forecasts

  • Ensure your dates (ds) are in datetime format.
  • Check for missing or outlier data points before fitting.
  • Tune Prophet’s parameters like seasonality or holidays for your context.
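The first two tips can be combined. The Prophet documentation notes that the model copes with missing values, so one simple way to handle outliers is to blank out their y values while keeping the dates. The sketch below flags outliers with the interquartile-range rule (Tukey's fences). That choice of method is ours, not something Prophet prescribes. The helper name is hypothetical, and the spike in the sample data is made up.

```python
import numpy as np
import pandas as pd


def blank_out_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Set y to NaN where it falls outside [Q1 - k*IQR, Q3 + k*IQR]; keep every date."""
    q1, q3 = df["y"].quantile([0.25, 0.75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    out = df.copy()
    out["ds"] = pd.to_datetime(out["ds"])   # make sure ds is datetime
    out["y"] = out["y"].astype(float)       # float column so it can hold NaN
    out.loc[(out["y"] < low) | (out["y"] > high), "y"] = np.nan
    return out


# Made-up daily sales with one obvious spike
sales = pd.DataFrame({"ds": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
                      "y": [200, 220, 215, 5000, 210]})
clean = blank_out_outliers(sales)
print(clean)
```

You would then call model.fit(clean) as before. Prophet still produces a prediction for the blanked-out dates.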

Clustering Customers Using KMeans in Colab

Now, let’s say you want to segment customers based on income and spending behavior.

The Data

A CSV with columns:

CustomerID,Annual Income (k$),Spending Score (1-100)
1,15,39
2,16,81
3,17,6
...

Step-by-step Code Walkthrough

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import pandas as pd

We import KMeans for clustering, matplotlib for plotting, and pandas to load data.

# Load the customer data CSV
df = pd.read_csv('customers.csv')
print(df.head())

Always check the data to understand its shape and content.

# Select the two features to cluster on
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

These columns will form a 2D space for clustering.

# Initialize KMeans with 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # n_init: runs with different starting centers; the best is kept

Choosing the number of clusters is a key step. Here we pick 3 for illustration.
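One common way to choose k is to fit several candidates and compare their silhouette scores. A higher score means tighter, better-separated clusters. Below is a minimal sketch using scikit-learn's KMeans and silhouette_score. The helper best_k_by_silhouette is hypothetical. X_demo is synthetic data with three obvious groups, so the right answer is known in advance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values=range(2, 7), random_state=42):
    """Fit KMeans for each candidate k; return the best k and all scores."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data: 50 points scattered around each of three distant centers
rng = np.random.default_rng(0)
true_centers = np.array([[20.0, 20.0], [60.0, 80.0], [100.0, 20.0]])
X_demo = np.vstack([c + rng.normal(scale=3.0, size=(50, 2)) for c in true_centers])

best_k, scores = best_k_by_silhouette(X_demo)
print(best_k, {k: round(float(s), 2) for k, s in scores.items()})
```

On the customer data you would call best_k_by_silhouette(X) instead. Treat the score as a guide rather than a verdict: a k that makes business sense can be worth a slightly lower score.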

# Fit the model and predict cluster assignments
kmeans.fit(X)
df['Cluster'] = kmeans.labels_

Each customer is assigned a cluster label: 0, 1, or 2.
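Those labels come from distance. K-means puts each point in the cluster whose center is nearest in Euclidean distance; in scikit-learn, the centers are stored in kmeans.cluster_centers_. The NumPy sketch below reproduces only that assignment step, using made-up centers and a hypothetical helper name. It is not scikit-learn's implementation.

```python
import numpy as np


def assign_to_nearest_center(points: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Return, for each point, the index of the closest center (Euclidean distance)."""
    # Broadcasting gives a (n_points, n_centers) matrix of distances
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


# Made-up centers in (income, score) space, for illustration only
centers = np.array([[20.0, 20.0], [55.0, 50.0], [90.0, 80.0]])
customers = np.array([[15.0, 39.0], [16.0, 81.0], [17.0, 6.0], [88.0, 85.0]])
print(assign_to_nearest_center(customers, centers))  # [0 1 0 2]
```

For new customers, a fitted model does this for you with kmeans.predict(new_X).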

# Plot clusters with colors
plt.scatter(df['Annual Income (k$)'], df['Spending Score (1-100)'], c=df['Cluster'], cmap='viridis')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Customer Segmentation')
plt.show()

The scatter plot shows customers grouped by clusters in different colors.

Tips for Better Clustering

  • Normalize or scale features if they have different units.
  • Experiment with cluster counts and validate with metrics like silhouette score.
  • Visualize results to make business sense of clusters.
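On the first tip: income in k$ and a score from 1 to 100 have different spreads, and k-means distances are dominated by whichever feature varies more. Below is a minimal pandas sketch of z-score standardization. The helper standardize is hypothetical, and the fourth customer is made up.

```python
import pandas as pd


def standardize(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Z-score each column: subtract its mean, divide by its population standard deviation."""
    cols = df[columns]
    # ddof=0 (population std) matches scikit-learn's StandardScaler;
    # a column with zero spread would divide by zero here
    return (cols - cols.mean()) / cols.std(ddof=0)


customers = pd.DataFrame({"Annual Income (k$)": [15, 16, 17, 137],
                          "Spending Score (1-100)": [39, 81, 6, 83]})
X_scaled = standardize(customers, ["Annual Income (k$)", "Spending Score (1-100)"])
print(X_scaled.round(2))
```

scikit-learn's StandardScaler applies the same transformation. Fit KMeans on X_scaled, and keep the original columns for plotting so the axes stay readable.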

Why Is This Hard for Most People?

If you’re not a coder, these steps look intimidating: installing packages, writing code, understanding APIs, and debugging errors.

Even for tech-savvy folks, repeating these steps every time the data updates is tedious.

It takes time away from what really matters: interpreting results and making decisions.


How Datastripes Makes This Effortless

With Datastripes, you don’t need to write or understand code:

  • Upload your data.
  • Drag a "Forecast" node and configure date and value columns.
  • Drag a "Cluster" node, pick features, and watch clusters appear.
  • Everything updates live and visually, directly in your browser.
  • No installs, no scripts, no errors.

Datastripes is built to turn these complex workflows into intuitive flows — freeing you to focus on insight, not syntax.

Try the live demo at datastripes.com and see how forecasting and clustering go from tens of lines of code to just two nodes.


When data analysis becomes simple, you can explore more, decide faster, and actually enjoy the process.

How to use Power BI and Datastripes for data analysis

· 6 min read
Vincenzo Manto
Founder @ Datastripes

If you’re diving into data analytics, you’ve probably heard of Power BI — Microsoft’s powerful and widely used tool. But now there’s Datastripes, a fresh platform focused on making data work simple and visual, no coding needed. Let’s break down how these two stack up, so you can decide which one fits your style and needs best.

Why Datastripes Might Win the Data Race

· 5 min read
Alessia Bogoni
Chief Data Analyst @ Datastripes

In the world of data tools, it’s easy to get overwhelmed. So many platforms promise powerful analytics, dashboards, or integrations — but which one really gets you? Which one keeps things simple without sacrificing muscle?

Let’s cut through the noise and see why Datastripes stands out from the crowd — and why it might just be your new best data buddy.

Is Tableau still the king of data visualization up to 2025?

· 9 min read
Vincenzo Manto
Founder @ Datastripes

If you’ve worked with data, chances are you’ve heard of Tableau — a leading tool for data visualization and business intelligence. Tableau has earned a reputation for creating beautiful, interactive dashboards and handling complex datasets with ease. But what if you want something that’s easier to start with, requires no coding, and gives you full visibility into your data’s entire journey? Enter Datastripes.

Datastripes is a modern, no-code data platform designed to simplify data workflows from start to finish. Whether you’re cleaning data, creating visualizations, or generating reports, Datastripes puts everything in one intuitive, visual workspace — no scripts, no complicated formulas, just drag-and-drop simplicity combined with powerful AI assistance.

Let’s dive deeper and compare how Tableau and Datastripes stack up — so you can pick the right tool for your data adventure.

The magic of Datastripes — Easy Peasy Data Squeezy!

· 8 min read
Vincenzo Manto
Founder @ Datastripes
Alessia Bogoni
Chief Data Analyst @ Datastripes

Welcome to Datastripes, the freshest, most flexible data workspace designed for anyone and everyone who wants to master their data — without headaches, without fuss, and with a whole lot of fun! Whether you’re a data newbie, a savvy analyst, or a seasoned pro, Datastripes turns your complex workflows into a smooth, flowing adventure. Think of it like LEGO blocks for data: snap together powerful tools, build workflows, and watch insights come alive — all with zero coding stress.

At the heart of Datastripes lies a rich catalog of nodes — tiny engines of magic that fetch, transform, visualize, compute, and export data — each designed with simplicity, flexibility, and fun in mind.