In a traditional data pipeline designed to handle large volumes of data, the classic go‑to for distributed batch processing has long been Hadoop (HDFS + YARN + MapReduce). Hadoop is powerful, but it’s also famously resource‑hungry. We’re talking 64GB to 96GB of RAM just to run the cluster comfortably, before you even touch your actual data processing. It’s no damn surprise that many startups and SMEs skip the on‑prem headache entirely and move straight to cloud‑based data warehouses like Databricks, BigQuery, or Snowflake. But those come with their own cost spiral: every query, every second, every gigabyte slowly chips away at your budget.
What if I told you there’s a middle path?
You don’t need racks of expensive hardware to host your storage layer. You don’t need to be tied to a cloud provider that bills you like a metered taxi ride.
In this post, I’ll show you a lightweight data stack centered around MinIO, an S3‑compatible object store that replaces HDFS. You can host it on a small VPS or even your own laptop. MinIO itself isn’t a data warehouse; it’s the storage foundation. You pair it with a lightweight query engine (e.g., DuckDB, Polars, or Presto) that runs on the same box and queries data directly from MinIO using S3 APIs.
The result? A stack that won’t handle every extreme edge case, but will cover 95% of what a growing startup actually needs, for a fraction of the cost and complexity of either a classic Hadoop cluster or a cloud warehouse.
MinIO Introduction
MinIO is a high-performance, open-source object storage system that’s fully compatible with Amazon S3. Think of it as a self‑hostable alternative to S3, same APIs, same ecosystem, same tooling, but running on your own hardware (or a modest VPS).
Written in Go, MinIO is surprisingly lightweight. You can start with a single small server and later scale out to petabytes of distributed storage without changing a single line of application code. That’s the real beauty: if your app can talk to S3, it can talk to MinIO (no rewrites, no lock‑in).
For a lean startup data pipeline, MinIO replaces the heavyweight HDFS (Hadoop Distributed File System) with a modern, cloud‑native storage layer that is generally faster than HDFS.

NOTE: In this tutorial, we’ll be setting up the commercial version of MinIO (AIStor Free Tier).
MinIO has two major distributions:
| | Open Source (AGPLv3) | Commercial AIStor (Free Tier) |
|---|---|---|
| Docker image | quay.io/minio/minio | quay.io/minio/aistor/minio |
| License key needed? | No | Yes (free) |
| Official updates? | ⚠️ No longer published | ✅ Actively maintained |
| Web console | Minimal (stripped down) | Full-featured |
| Cluster support | ✅ Yes | ❌ No |

We’ll be setting up the free commercial version in this post.
MinIO Setup (Commercial AIStor)
The Manual Setup
The commercial version of MinIO is actively maintained and has a much better web UI than the open-source one. In this post, I’ll be setting up the free commercial version, as it’s better for demonstrating MinIO’s features.
Expect to:
- Download the MinIO binary
- Set up storage directories and permissions manually
- Set up systemd services (so it survives reboots)
- Get a free license key from MinIO
Download the MinIO binary
We’ll download the MinIO binary directly. Because MinIO is written in Go, a compiled language, there’s no need to set up Java or a JVM to run it.

    wget https://dl.min.io/aistor/minio/release/linux-amd64/minio
    chmod +x minio
    ./minio --version

The output of ./minio --version should look something like this:

    minio version RELEASE.2026-04-14T21-32-45Z (commit-id=9d4e0e68f7c891e26b363d17f6dde2a00f6b69d1)
    Runtime: go1.26.2 linux/amd64
    License: MinIO AIStor License
    Copyright: 2015-2026 MinIO, Inc.

Set up storage directories & permissions manually
Step 1: Create the MinIO User
For security, MinIO should run as its own user, not root:

    sudo useradd -r -s /sbin/nologin minio-user

Step 2: Create the Configuration Files
First, create the environment file that systemd reads:

    sudo mkdir -p /etc/default
    sudo tee /etc/default/minio << 'EOF'
    # Volume to be used for MinIO server
    MINIO_VOLUMES="/data"
    # API and Console ports
    MINIO_OPTS="--address :9000 --console-address :9001"
    # Pointer to the main configuration file
    MINIO_CONFIG_ENV_FILE=/etc/minio/config.env
    EOF

Next, create the main config file with your credentials:

    sudo mkdir -p /etc/minio
    sudo tee /etc/minio/config.env << 'EOF'
    # Root credentials - CHANGE THESE IN PRODUCTION
    MINIO_ROOT_USER=admin
    MINIO_ROOT_PASSWORD=admin123
    EOF

    # Lock down permissions: this file contains secrets!
    sudo chmod 600 /etc/minio/config.env
    sudo chown -R minio-user:minio-user /etc/minio

CAUTION: Reminder to change the password admin123 in production.
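If you’d like to double-check that lock-down from code rather than eyeballing ls -l, here’s a minimal sketch using only the Python standard library. The helper name `is_private` is my own invention, not something MinIO ships; it just checks that group and others have no permission bits on a file, which is what chmod 600 gives you.

```python
import os
import stat


def is_private(path: str) -> bool:
    """True if group and others have no permissions on the file (e.g. mode 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits
    return mode & 0o077 == 0
```

After the chmod above, `is_private("/etc/minio/config.env")` should come back True. If it doesn’t, the secrets file is readable by more accounts than it should be.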
Step 3: Create the Data Directory

    sudo mkdir -p /data
    sudo chown -R minio-user:minio-user /data

Set up systemd services
Now that you have the MinIO binary ready and permissions set up, it’s time to make it run like a proper citizen on your Linux server: starting on boot, restarting if it crashes, and logging properly. That’s where systemd comes in.
MinIO uses a two-file configuration approach that’s actually pretty clever:
| File | Purpose |
|---|---|
| /etc/default/minio | Systemd environment variables (changes here require systemctl restart) |
| /etc/minio/config.env | Main configuration (changes here only need mc admin service restart) |
This split lets you change things like root credentials without a full systemd restart. Neat, right?
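To make the two-file idea concrete, here’s a rough sketch of how a script could read the first file and follow its MINIO_CONFIG_ENV_FILE pointer to the second. This is my own illustration, not MinIO’s actual loader: the function names are hypothetical, the parser only handles plain KEY=VALUE lines, and the "main config wins on conflict" rule is an assumption.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def load_minio_settings(default_file: str = "/etc/default/minio") -> dict[str, str]:
    """Read the systemd environment file, then follow MINIO_CONFIG_ENV_FILE if set."""
    settings = parse_env_text(Path(default_file).read_text())
    pointer = settings.get("MINIO_CONFIG_ENV_FILE")
    if pointer and Path(pointer).is_file():
        # Assumption: values from the main config file win on conflict
        settings.update(parse_env_text(Path(pointer).read_text()))
    return settings


def uses_placeholder_credentials(settings: dict[str, str]) -> bool:
    """True when this tutorial's demo credentials are still in place."""
    creds = (settings.get("MINIO_ROOT_USER"), settings.get("MINIO_ROOT_PASSWORD"))
    return creds == ("admin", "admin123")
```

Since config.env is locked down to minio-user, you’d need to run this with enough privileges to read it. A quick `uses_placeholder_credentials(load_minio_settings())` is a cheap reminder before you expose the box to the internet.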
Step 1: Download the systemd Service File

    sudo curl -o /etc/systemd/system/minio.service \
      https://raw.githubusercontent.com/minio/minio-service/master/linux-systemd/minio.service
    sudo systemctl daemon-reload

The service file is pre-configured to:
- Run as minio-user
- Expect the binary at /usr/local/bin/minio
- Read environment variables from /etc/default/minio
Step 2: Move the Binary to the Right Location

    sudo mv minio /usr/local/bin/

Step 3: Start and Enable MinIO

    # Start the service now
    sudo systemctl start minio
    # Make it start automatically on boot
    sudo systemctl enable minio
    # Check if everything is healthy
    sudo systemctl status minio

If you see Active: active (running) in green, then congratulations, you’ve finished the hard part.
To monitor logs in real time:

    sudo journalctl -u minio -f

Quick Test
Once MinIO is running, open your browser to:

    http://your-server-ip:9001

If you see this screen, then you’re ready to rock.
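If you’d rather confirm this from a script than a browser, here’s a small sketch that polls MinIO’s liveness endpoint with nothing but the standard library. I’m assuming the /minio/health/live path from MinIO’s healthcheck docs is available on your build, and note that it lives on the API port (9000), not the console port (9001). The function names are mine.

```python
import urllib.error
import urllib.request


def liveness_url(base_url: str) -> str:
    """Build the liveness probe URL from the API endpoint (port 9000, not the console)."""
    return base_url.rstrip("/") + "/minio/health/live"


def is_alive(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers the liveness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(liveness_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response all count as "not alive"
        return False
```

Calling `is_alive("http://your-server-ip:9000")` returns a plain boolean, which makes it easy to drop into a cron job or a deploy script.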

Get a license key from MinIO
The next step is to get a license key from MinIO via this link. Make sure to pick the free version and enter your name and email.

After you’ve filled this in, they should send a license key straight to your inbox.

Add the license key to your environment configuration file:

    sudo tee -a /etc/minio/config.env << 'EOF'
    # Your MinIO AIStor License Key
    MINIO_LICENSE=<your-license-key>
    EOF

Then restart MinIO:

    sudo systemctl restart minio
    sudo systemctl status minio

Then go back to your web browser and enter the credentials you set in /etc/minio/config.env (in this walkthrough, username admin and password admin123).

If you’ve logged in and landed on this page, then congratulations, you’ve set up MinIO manually.
NOTE: Don’t worry, the AIStor Free Tier license does not expire.
Navigating the Web UI
Go to Buckets in the navigation menu on the left, and you’ll see this page

Click the Add Bucket button, select Basic, and then enter raw-data as the name.

What all the types mean:
Basic (Default): Standard “last write wins” storage. Upload report.csv → old report.csv disappears forever.
Versioned: Every upload creates a new version. Delete something? That version still exists. You can time-travel back to any previous state.
Locked (WORM): Write Once, Read Many + legal holds. Once an object is written, nobody can delete or modify it until the lock expires. Not you. Not root. Not MinIO support with a gun to their head.
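If the difference between Basic and Versioned still feels abstract, here’s a toy in-memory model of the two behaviours. To be clear, this is just an illustration I wrote for this post, not how MinIO stores anything: the class names are made up, and real version IDs are opaque strings rather than list indexes.

```python
from collections import defaultdict


class BasicBucket:
    """Last write wins: each key holds only its most recent content."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class VersionedBucket:
    """Every put appends a new version; older ones stay readable."""

    def __init__(self):
        self._versions = defaultdict(list)

    def put(self, key, data):
        self._versions[key].append(data)
        return len(self._versions[key]) - 1  # toy version id

    def get(self, key, version=-1):
        return self._versions[key][version]


basic = BasicBucket()
basic.put("report.csv", "v1")
basic.put("report.csv", "v2")
print(basic.get("report.csv"))  # prints "v2"; the first upload is gone

versioned = VersionedBucket()
first = versioned.put("report.csv", "v1")
versioned.put("report.csv", "v2")
print(versioned.get("report.csv"))         # prints "v2"
print(versioned.get("report.csv", first))  # prints "v1"; still recoverable
```

A Locked bucket would be the versioned model plus a rule that refuses deletes and overwrites until a retention date passes.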

After creating the bucket, you should see this. Next, go to Object Browser at the top right of the UI.
Now you can upload any type of file, such as pdf, jpg, gif, or mp4, using the upload button.

Introduction to mc (MinIO Client)
You’ve got MinIO running. You can see that slick web console. But let’s be real, clicking around a UI gets old fast. You’re building a data pipeline, not running a museum exhibit. You need a CLI tool that talks to MinIO the same way your code will.
Enter mc (MinIO Client).
Think of it as the aws s3 CLI, but actually pleasant to use. It speaks native S3 API, so it works with AWS S3, Google Cloud Storage, and obviously your self-hosted MinIO. Same commands, same syntax, no context switching.
Installing mc
One binary. One chmod. That’s it.

    # Download the binary
    wget https://dl.min.io/client/mc/release/linux-amd64/mc
    # Make it executable
    chmod +x mc
    # Move it to your PATH
    sudo mv mc /usr/local/bin/
    # Verify it works
    mc --version

You should see something like this:

    mc version RELEASE.2025-08-13T08-35-41Z (commit-id=7394ce0dd2a80935aded936b09fa12cbb3cb8096)
    Runtime: go1.24.6 linux/amd64
    Copyright (c) 2015-2025 MinIO, Inc.
    License GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>

Connect mc to Your MinIO Server
mc doesn’t magically know where your MinIO instance lives. You need to create an alias, a shortcut that stores the endpoint and credentials.

    mc alias set myminio http://localhost:9000 admin admin123

Test the Connection

    mc ls myminio

You should see the raw-data bucket that we created earlier in the Web UI section:

    [2026-05-03 15:53:54 +08] 0B raw-data/

To list the files you uploaded to that bucket, run:

    mc ls myminio/raw-data/

Basic mc Commands You’ll Actually Use
Here’s the 80% of mc you’ll need for 95% of your work:
| Command | What it does | Example |
|---|---|---|
| mc ls | List buckets or objects | mc ls myminio/ |
| mc mb | Make (create) a bucket | mc mb myminio/analytics |
| mc cp | Copy files in/out | mc cp data.csv myminio/analytics/ |
| mc mv | Move objects | mc mv myminio/analytics/old.csv myminio/archive/ |
| mc rm | Remove objects | mc rm myminio/analytics/bad.csv |
| mc cat | Display file contents | mc cat myminio/logs/app.log |
| mc du | Show storage usage | mc du myminio/analytics/ |
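Since the goal is a pipeline, you’ll eventually want to call these commands from code. Here’s a minimal sketch that shells out to mc with its --json flag and parses the line-delimited output. The helper names are mine, and the field names you get back (I’d expect things like key and size) are an assumption worth checking against your own mc version.

```python
import json
import subprocess


def parse_mc_json(output: str) -> list[dict]:
    """Turn mc's line-delimited JSON output into a list of dicts."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def mc_ls(target: str) -> list[dict]:
    """Run `mc ls --json <target>` and return the parsed entries."""
    result = subprocess.run(
        ["mc", "ls", "--json", target],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mc exits non-zero
    )
    return parse_mc_json(result.stdout)
```

With the alias from earlier, something like `for entry in mc_ls("myminio/raw-data"): print(entry.get("key"), entry.get("size"))` gives you structured listings without scraping the human-readable output.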
Create Your Second Bucket

    mc mb myminio/raw-logs

Verify it worked:

    mc ls myminio/

Output:

    [2026-05-03 15:53:54 +08] 0B raw-data/
    [2026-05-03 16:20:29 +08] 0B raw-logs/

Now upload something:

    echo "hello, minio" > test.txt
    mc cp test.txt myminio/raw-logs/

List the contents:

    mc ls myminio/raw-logs/

You should see test.txt staring back at you.

TIP: mc works identically with real AWS S3. Just change the alias:

    mc alias set aws https://s3.amazonaws.com YOUR_AWS_KEY YOUR_AWS_SECRET
There’s no new CLI to learn. No context switching. The same muscle memory works whether you’re on your laptop’s MinIO or production AWS.
Python + Boto3: Proof That MinIO Is Just S3
Here’s where theory meets practice. You’ve got MinIO running. You’ve got buckets ready. Now let’s write code that talks to it, which is exactly the same code that would talk to AWS S3.
No MinIO SDK. No custom libraries. No vendor lock-in.
Just standard boto3, the AWS SDK for Python, pointed at your self-hosted endpoint.
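One way to keep that "pointed at your endpoint" switch out of your code entirely is to read it from the environment. Here’s a minimal sketch, assuming hypothetical variable names (S3_ENDPOINT_URL, S3_ACCESS_KEY, S3_SECRET_KEY, S3_REGION) that I picked for this post. It only builds a dict of keyword arguments, so it runs without boto3 installed.

```python
import os


def s3_client_kwargs(env=None) -> dict:
    """Build keyword arguments for boto3.client('s3', ...) from environment variables.

    S3_ENDPOINT_URL and friends are hypothetical names: set the endpoint for
    MinIO, leave it unset to fall back to the default AWS endpoint.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("S3_REGION", "us-east-1")}
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        # Only present when targeting MinIO or another S3-compatible store
        kwargs["endpoint_url"] = endpoint
    if env.get("S3_ACCESS_KEY") and env.get("S3_SECRET_KEY"):
        kwargs["aws_access_key_id"] = env["S3_ACCESS_KEY"]
        kwargs["aws_secret_access_key"] = env["S3_SECRET_KEY"]
    return kwargs
```

Once boto3 is installed, `boto3.client("s3", **s3_client_kwargs())` talks to MinIO on your laptop and to AWS in production, with the only difference living in the environment.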
Install the SDK

    pip install boto3

That’s it. Same library AWS customers use. Same documentation. Same Stack Overflow answers.

For a remote VPS, swap localhost with your server’s IP:

    endpoint_url='http://your-server-ip:9000'

The Connection String (This Is the Only Difference)
When you connect to AWS S3, you use the default endpoint. When you connect to MinIO, you override it:

    import boto3
    from botocore.config import Config

    # This exact same code works for AWS, MinIO, or any S3-compatible store
    s3 = boto3.client(
        's3',
        endpoint_url='http://localhost:9000',   # ← ONLY difference
        aws_access_key_id='admin',              # ← your MinIO username
        aws_secret_access_key='admin123',       # ← your MinIO password
        config=Config(signature_version='s3v4'),
        region_name='us-east-1'                 # dummy value, MinIO ignores it
    )

Full Python Script
Here’s a Python script that demos the features and integration:

    #!/usr/bin/env python3
    """MinIO + boto3 demo"""

    import boto3
    from botocore.config import Config
    from botocore.exceptions import ClientError

    # Configuration - EDIT THESE FOR YOUR SETUP
    MINIO_ENDPOINT = 'http://localhost:9000'  # or http://your-server-ip:9000
    ACCESS_KEY = 'admin'
    SECRET_KEY = 'admin123'
    BUCKET_NAME = 'python-demo'

    # Create client (same as AWS, just with custom endpoint)
    s3 = boto3.client(
        's3',
        endpoint_url=MINIO_ENDPOINT,
        aws_access_key_id=ACCESS_KEY,
        aws_secret_access_key=SECRET_KEY,
        config=Config(signature_version='s3v4'),
        region_name='us-east-1'
    )

    def demo():
        print("🚀 MinIO + boto3 Demo\n")

        # 1. Create bucket
        try:
            s3.create_bucket(Bucket=BUCKET_NAME)
            print(f"✅ Created bucket: {BUCKET_NAME}")
        except ClientError as e:
            # Re-running the script lands here: a bucket you already own
            # is reported as BucketAlreadyOwnedByYou
            if e.response['Error']['Code'] in ('BucketAlreadyOwnedByYou', 'BucketAlreadyExists'):
                print(f"ℹ️ Bucket already exists: {BUCKET_NAME}")
            else:
                print(f"❌ Error: {e}")

        # 2. Create and upload a test file
        test_content = "date,product,sales\n2024-01-01,Widget,100\n2024-01-02,Gadget,200"
        with open('/tmp/test.csv', 'w') as f:
            f.write(test_content)

        s3.upload_file('/tmp/test.csv', BUCKET_NAME, 'data/test.csv')
        print(f"✅ Uploaded: s3://{BUCKET_NAME}/data/test.csv")

        # 3. List objects
        response = s3.list_objects_v2(Bucket=BUCKET_NAME)
        print(f"\n📁 Objects in '{BUCKET_NAME}':")
        for obj in response.get('Contents', []):
            print(f" - {obj['Key']} ({obj['Size']} bytes)")

        # 4. Download and read
        response = s3.get_object(Bucket=BUCKET_NAME, Key='data/test.csv')
        content = response['Body'].read().decode('utf-8')
        print(f"\n📄 Content of test.csv:\n{content}")

        print("\n✨ Demo complete! Your MinIO is working like a real S3 endpoint.")

    if __name__ == '__main__':
        demo()

Output:

    🚀 MinIO + boto3 Demo

    ✅ Created bucket: python-demo
    ✅ Uploaded: s3://python-demo/data/test.csv

    📁 Objects in 'python-demo':
     - data/test.csv (62 bytes)

    📄 Content of test.csv:
    date,product,sales
    2024-01-01,Widget,100
    2024-01-02,Gadget,200

Confirm it using mc
The Python script claims it uploaded a file. Don’t take its word for it. Check it yourself:

    mc ls myminio/python-demo/data/

You should see:

    [2026-05-03 16:34:06 +08] 62B STANDARD test.csv

Want to see the actual content?

    mc cat myminio/python-demo/data/test.csv

Output:

    date,product,sales
    2024-01-01,Widget,100
    2024-01-02,Gadget,200

If you prefer clicking, open the web UI at http://your-server-ip:9001, navigate to Buckets → python-demo → Object Storage → data, and you’ll see test.csv sitting there with its size and information.
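If you want a check that’s stricter than "the file is there", you can compare checksums. For a simple single-part upload like this one, S3-style stores generally report the object’s MD5 hex digest as its ETag, so a small helper can confirm the bytes match. This is a sketch under that assumption: multipart uploads and some encryption modes produce ETags that are not plain MD5s, and the function names are mine.

```python
import hashlib


def expected_etag(data: bytes) -> str:
    """MD5 hex digest, which S3-style stores report as the ETag of a single-part upload."""
    return hashlib.md5(data).hexdigest()


def matches_etag(data: bytes, etag_header: str) -> bool:
    """Compare local content against an ETag value, ignoring the surrounding quotes."""
    return expected_etag(data) == etag_header.strip('"')
```

To use it with the demo script, read /tmp/test.csv as bytes and pass it along with `s3.head_object(Bucket='python-demo', Key='data/test.csv')['ETag']` to `matches_etag`. A True result means what landed in MinIO is byte-for-byte what you uploaded.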

Conclusion
Let’s step back for a second so you can give yourself a pat on the back.
You just built a production-ready, S3-compatible object store on your own hardware. From scratch.
No cloud bill. No vendor charging you an arm and a leg just because you accidentally uploaded a massive file or left your instance running overnight.
You have full control of your storage layer now. Go wild in testing. Break things. It’s fine. That’s the whole point of owning your stack.
If anything goes sideways? Come back to this post, skim through it again, and reinstall. Ten minutes later, you’re back in business. No support tickets. No “please upgrade to Enterprise” emails. No mystery charges on a credit card you forgot you attached.
MinIO is that.