In a traditional data pipeline designed to handle large volumes of data, the classic go‑to for distributed batch processing has long been Hadoop (HDFS + YARN + MapReduce). Hadoop is powerful, but it’s also famously resource‑hungry. We’re talking 64GB to 96GB of RAM just to run the cluster comfortably, before you even touch your actual data processing. It’s no damn surprise that many startups and SMEs skip the on‑prem headache entirely and move straight to cloud‑based data warehouses like Databricks, BigQuery, or Snowflake. But those come with their own cost spiral: every query, every second, every gigabyte slowly chips away at your budget.

What if I told you there’s a middle path?

You don’t need racks of expensive hardware to host your storage layer. You don’t need to be tied to a cloud provider that bills you like a metered taxi ride.

In this post, I’ll show you a lightweight data stack centered around MinIO, an S3‑compatible object store that replaces HDFS. You can host it on a small VPS or even your own laptop. MinIO itself isn’t a data warehouse; it’s the storage foundation. You pair it with a lightweight query engine (e.g., DuckDB, Polars, or Presto) that runs on the same box and queries data directly from MinIO using S3 APIs.

The result? A stack that won’t handle every extreme edge case, but will cover 95% of what a growing startup actually needs, for a fraction of the cost and complexity of either a classic Hadoop cluster or a cloud warehouse.

MinIO Introduction#

MinIO is a high-performance, open-source object storage system that’s fully compatible with Amazon S3. Think of it as a self‑hostable alternative to S3: same APIs, same ecosystem, same tooling, but running on your own hardware (or a modest VPS).

Written in Go, MinIO is surprisingly lightweight. You can start with a single small server and later scale out to petabytes of distributed storage without changing a single line of application code. That’s the real beauty: if your app can talk to S3, it can talk to MinIO (no rewrites, no lock‑in).

For a lean startup data pipeline, MinIO replaces the heavyweight HDFS (Hadoop Distributed File System) and gives you a modern, cloud‑native storage layer that is generally faster than HDFS.

NOTE
In this tutorial, we’ll be setting up the commercial version of MinIO (AIStor Free Tier).

MinIO has two major distributions:

| Feature | Open Source (AGPLv3) | Commercial AIStor (Free Tier) |
| --- | --- | --- |
| Docker image | `quay.io/minio/minio` | `quay.io/minio/aistor/minio` |
| License key needed? | No | Yes (free) |
| Official updates? | ⚠️ No longer published | ✅ Actively maintained |
| Web console | Minimal (stripped down) | Full-featured |
| Cluster support | ✅ Yes | ❌ No |

We’ll be setting up the free commercial version in this post.

MinIO Setup (Commercial AIStor)#

The Manual Setup#

The commercial version of MinIO is actively maintained and has a much better web UI than the open-source one. In this post, I’ll be setting up the free commercial version, as it’s better for demonstrating MinIO’s features.

Expect to:

Download the MinIO binary
Set up storage directories and permissions manually
Set up systemd services (so it survives reboots)
Get a free license key from MinIO

Download the MinIO binary#

Start by downloading the MinIO binary. Because MinIO is written in Go and ships as a single compiled binary, there is no need to set up Java or a JVM to run it.

wget https://dl.min.io/aistor/minio/release/linux-amd64/minio
chmod +x minio
./minio --version

Running `./minio --version` should print something like this:

minio version RELEASE.2026-04-14T21-32-45Z (commit-id=9d4e0e68f7c891e26b363d17f6dde2a00f6b69d1)
Runtime: go1.26.2 linux/amd64
License: MinIO AIStor License
Copyright: 2015-2026 MinIO, Inc.

Setup storage directories & permissions manually#

Step 1: Create the MinIO User#

For security, MinIO should run as its own user, not root:

sudo useradd -r -s /sbin/nologin minio-user

Step 2: Create the Configuration Files#

First, create the environment file that systemd reads:

sudo mkdir -p /etc/default
sudo tee /etc/default/minio << 'EOF'
# Volume to be used for MinIO server
MINIO_VOLUMES="/data"
# API and Console ports
MINIO_OPTS="--address :9000 --console-address :9001"
# Pointer to the main configuration file
MINIO_CONFIG_ENV_FILE=/etc/minio/config.env
EOF

Next, create the main config file with your credentials:

sudo mkdir -p /etc/minio
sudo tee /etc/minio/config.env << 'EOF'
# Root credentials - CHANGE THESE IN PRODUCTION
MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD=admin123
EOF

# Lock down permissions: this file contains secrets!
sudo chmod 600 /etc/minio/config.env
sudo chown -R minio-user:minio-user /etc/minio

CAUTION
Change the admin123 password before running this in production.
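
Before starting the service, it can help to sanity-check the credentials file. The sketch below is a hypothetical helper, not part of MinIO: the function names are mine, it assumes the simple KEY=VALUE format shown above, and it relies on MinIO’s minimum lengths of 3 characters for the root user and 8 for the root password. The list of weak passwords is purely illustrative.

```python
# Illustrative only: a few passwords you would never want in production.
WEAK_PASSWORDS = {"admin123", "minioadmin", "password", "changeme"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"')
    return env


def check_credentials(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems (empty list means OK)."""
    problems = []
    user = env.get("MINIO_ROOT_USER", "")
    password = env.get("MINIO_ROOT_PASSWORD", "")
    if len(user) < 3:
        problems.append("MINIO_ROOT_USER should be at least 3 characters")
    if len(password) < 8:
        problems.append("MINIO_ROOT_PASSWORD should be at least 8 characters")
    if password in WEAK_PASSWORDS:
        problems.append("MINIO_ROOT_PASSWORD is a well-known default; change it")
    return problems


# Same values as the tutorial's config.env, inlined so the sketch runs anywhere.
sample = "MINIO_ROOT_USER=admin\nMINIO_ROOT_PASSWORD=admin123\n"
for problem in check_credentials(parse_env_text(sample)):
    print("WARNING:", problem)
```

To check the real file, read /etc/minio/config.env (as root, since it is chmod 600) and pass its contents to parse_env_text instead of the inlined sample.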

Step 3: Create the Data Directory#

sudo mkdir -p /data
sudo chown -R minio-user:minio-user /data

Set up systemd services#

Now that you have the MinIO binary ready and permissions set up, it’s time to make it run like a proper citizen on your Linux server: starting on boot, restarting if it crashes, and logging properly. That’s where systemd comes in.

MinIO uses a two-file configuration approach that’s actually pretty clever:

| File | Purpose |
| --- | --- |
| `/etc/default/minio` | Systemd environment variables (changes here require `systemctl restart`) |
| `/etc/minio/config.env` | Main configuration (changes here only need `mc admin service restart`) |

This split lets you change things like root credentials without a full systemd restart. Neat, right?

Step 1: Download the systemd Service File#

sudo curl -o /etc/systemd/system/minio.service \
  https://raw.githubusercontent.com/minio/minio-service/master/linux-systemd/minio.service
sudo systemctl daemon-reload

The service file is pre-configured to:

Run as minio-user
Expect the binary at /usr/local/bin/minio
Read environment variables from /etc/default/minio

Step 2: Move the Binary to the Right Location#

sudo mv minio /usr/local/bin/

Step 3: Start and Enable MinIO#

# Start the service now
sudo systemctl start minio
# Make it start automatically on boot
sudo systemctl enable minio
# Check if everything is healthy
sudo systemctl status minio

If you see Active: active (running) in green, congratulations: you’ve finished the hard part.

To monitor logs in real time:

sudo journalctl -u minio -f

Quick Test#

Once MinIO is running, open your browser to:

http://your-server-ip:9001

If you see this screen then you’re ready to rock
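
For a check you can script (say, from a deploy step), MinIO also exposes a liveness endpoint at /minio/health/live on the API port (9000 here, not the console port). This is a minimal sketch using only the standard library; the helper names are mine, and the timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the liveness URL from the API endpoint (port 9000, not 9001)."""
    return base_url.rstrip("/") + "/minio/health/live"


def is_live(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers the liveness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Calling is_live("http://your-server-ip:9000") should return True once the service is up, and False while it is stopped or unreachable.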

Get a license key from MinIO#

The next step is to get a license key from MinIO via this link. Make sure to pick the free version, then enter your name and email.

After you’ve filled this in, they should send a license key straight to your inbox.

Add the license key to your environment configuration file:

sudo tee -a /etc/minio/config.env << 'EOF'
# Your MinIO AIStor License Key
MINIO_LICENSE=<your-license-key>
EOF

Then restart MinIO:

sudo systemctl restart minio
sudo systemctl status minio

Then go back to your web browser and enter the credentials you set in /etc/minio/config.env (in this walkthrough, username admin and password admin123).

If you’ve logged in and arrived at this page, congratulations: you’ve set up MinIO manually.

NOTE
Don’t worry, the AIStor Free Tier license does not expire.

Navigating the Web UI#

Go to Buckets in the navigation menu on the left.

Click the Add Bucket button, select Basic, and enter raw-data as the name.

What the bucket types mean:

Basic (Default): Standard “last write wins” storage. Upload report.csv → old report.csv disappears forever.

Versioned: Every upload creates a new version. Delete something? That version still exists. You can time-travel back to any previous state.

Locked (WORM): Write Once, Read Many + legal holds. Once an object is written, nobody can delete or modify it until the lock expires. Not you. Not root. Not MinIO support with a gun to their head.
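
To make the difference between Basic and Versioned concrete, here is a toy in-memory model. It is only an illustration of the semantics described above, not how MinIO stores anything; the ToyBucket class is hypothetical, and object locking is deliberately left out.

```python
class ToyBucket:
    """In-memory stand-in for a bucket; `versioned` toggles history keeping."""

    def __init__(self, versioned: bool = False):
        self.versioned = versioned
        self._objects: dict[str, list[bytes]] = {}

    def put(self, key: str, data: bytes) -> None:
        if self.versioned:
            # Versioned: append, so earlier uploads stay retrievable.
            self._objects.setdefault(key, []).append(data)
        else:
            # Basic: last write wins, the previous content is gone.
            self._objects[key] = [data]

    def get(self, key: str, version: int = -1) -> bytes:
        """Return the latest version by default, or an older one by index."""
        return self._objects[key][version]

    def version_count(self, key: str) -> int:
        return len(self._objects.get(key, []))


basic = ToyBucket(versioned=False)
versioned = ToyBucket(versioned=True)
for bucket in (basic, versioned):
    bucket.put("report.csv", b"v1")
    bucket.put("report.csv", b"v2")

print(basic.version_count("report.csv"))        # 1
print(versioned.version_count("report.csv"))    # 2
print(versioned.get("report.csv", version=0))   # b'v1'
```

The same two uploads leave one copy in the basic bucket and two in the versioned one, which is exactly the trade-off you pick when creating the bucket.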

After creating the bucket, go to Object Browser at the top right of the UI.

Now you can upload any type of file (PDF, JPG, GIF, MP4, and so on) using the Upload button.

Introduction to `mc` (MinIO Client)#

You’ve got MinIO running. You can see that slick web console. But let’s be real, clicking around a UI gets old fast. You’re building a data pipeline, not running a museum exhibit. You need a CLI tool that talks to MinIO the same way your code will.

Enter mc (MinIO Client).

Think of it as the `aws s3` CLI, but actually pleasant to use. It speaks the native S3 API, so it works with AWS S3, Google Cloud Storage, and obviously your self-hosted MinIO. Same commands, same syntax, no context switching.

Installing `mc`#

One binary. One chmod. That’s it.

# Download the binary
wget https://dl.min.io/client/mc/release/linux-amd64/mc
# Make it executable
chmod +x mc
# Move it to your PATH
sudo mv mc /usr/local/bin/
# Verify it works
mc --version

You should see something like this:

mc version RELEASE.2025-08-13T08-35-41Z (commit-id=7394ce0dd2a80935aded936b09fa12cbb3cb8096)
Runtime: go1.24.6 linux/amd64
Copyright (c) 2015-2025 MinIO, Inc.
License GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>

Connect `mc` to Your MinIO Server#

mc doesn’t magically know where your MinIO instance lives. You need to create an alias, a shortcut that stores the endpoint and credentials.

mc alias set myminio http://localhost:9000 admin admin123

Test the connection:

mc ls myminio

You should see the raw-data bucket that we created earlier in the Web UI section:

[2026-05-03 15:53:54 +08]     0B raw-data/

To list the files you uploaded to that bucket, run:

mc ls myminio/raw-data/

Basic `mc` Commands You’ll Actually Use#

Here’s the 80% of mc you’ll need for 95% of your work:

| Command | What it does | Example |
| --- | --- | --- |
| `mc ls` | List buckets or objects | `mc ls myminio/` |
| `mc mb` | Make (create) a bucket | `mc mb myminio/analytics` |
| `mc cp` | Copy files in/out | `mc cp data.csv myminio/analytics/` |
| `mc mv` | Move objects | `mc mv myminio/analytics/old.csv myminio/archive/` |
| `mc rm` | Remove objects | `mc rm myminio/analytics/bad.csv` |
| `mc cat` | Display file contents | `mc cat myminio/logs/app.log` |
| `mc du` | Show storage usage | `mc du myminio/analytics/` |
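
If you end up calling these commands from a pipeline script, the global `--json` flag makes the output much easier to consume than the human-readable listing. The sketch below wraps `mc --json ls` with subprocess and parses one JSON object per line. The helper names are mine, and the field names in the sample (`type`, `key`, `size`) reflect what I expect mc to emit rather than captured output, so check them against your own version.

```python
import json
import subprocess


def parse_mc_json(output: str) -> list[dict]:
    """Turn mc's JSON-lines output into a list of dicts, one per entry."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def mc_ls(target: str) -> list[dict]:
    """Run `mc --json ls <target>` and return the parsed entries."""
    result = subprocess.run(
        ["mc", "--json", "ls", target],
        capture_output=True, text=True, check=True,
    )
    return parse_mc_json(result.stdout)


# Hand-written sample in the shape mc is assumed to emit; not real output.
sample = (
    '{"status": "success", "type": "file", "key": "test.txt", "size": 13}\n'
    '{"status": "success", "type": "folder", "key": "data/", "size": 0}\n'
)
for entry in parse_mc_json(sample):
    print(entry["type"], entry["key"], entry["size"])
```

With mc installed and the alias configured, mc_ls("myminio/raw-data/") would return the same kind of list for a real bucket.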

Create Your Second Bucket#

mc mb myminio/raw-logs

Verify it worked:

mc ls myminio/

Output:

[2026-05-03 15:53:54 +08]     0B raw-data/
[2026-05-03 16:20:29 +08]     0B raw-logs/

Now upload something:

echo "hello, minio" > test.txt
mc cp test.txt myminio/raw-logs/

List the contents:

mc ls myminio/raw-logs/

You should see test.txt staring back at you.

TIP
mc works identically with real AWS S3. Just change the alias: `mc alias set aws https://s3.amazonaws.com YOUR_AWS_KEY YOUR_AWS_SECRET`

There’s no new CLI to learn. No context switching. The same muscle memory works whether you’re on your laptop’s MinIO or production AWS.

Python + Boto3: Proof That MinIO Is Just S3#

Here’s where theory meets practice. You’ve got MinIO running. You’ve got buckets ready. Now let’s write code that talks to it, which is exactly the same code that would talk to AWS S3.

No MinIO SDK. No custom libraries. No vendor lock-in.

Just standard boto3, the AWS SDK for Python, pointed at your self-hosted endpoint.

Install the SDK#

pip install boto3

That’s it. Same library AWS customers use. Same documentation. Same Stack Overflow answers.

The snippets below connect to http://localhost:9000. If MinIO runs on a remote VPS, swap localhost for your server’s IP:

endpoint_url='http://your-server-ip:9000'

The Connection String (This Is the Only Difference)#

When you connect to AWS S3, you use the default endpoint. When you connect to MinIO, you override it:

import boto3
from botocore.config import Config

# This exact same code works for AWS, MinIO, or any S3-compatible store
s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',  # ← ONLY difference
    aws_access_key_id='admin',              # ← your MinIO username
    aws_secret_access_key='admin123',       # ← your MinIO password
    config=Config(signature_version='s3v4'),
    region_name='us-east-1'                 # dummy value, MinIO ignores it
)
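
One way to make the “only difference” switchable without editing code is to read the endpoint from the environment. This is a sketch under my own assumptions: S3_ENDPOINT_URL, S3_ACCESS_KEY, S3_SECRET_KEY and S3_REGION are hypothetical variable names chosen for this example, not something boto3 or MinIO defines.

```python
import os


def s3_client_kwargs(env=None) -> dict:
    """Build keyword arguments for boto3.client('s3', **kwargs).

    The S3_* variable names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {
        "aws_access_key_id": env["S3_ACCESS_KEY"],
        "aws_secret_access_key": env["S3_SECRET_KEY"],
        "region_name": env.get("S3_REGION", "us-east-1"),
    }
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        # Set for MinIO; leave unset to fall back to the default AWS endpoint.
        kwargs["endpoint_url"] = endpoint
    return kwargs


# Values from this tutorial, passed as a plain dict instead of real env vars.
minio_env = {
    "S3_ENDPOINT_URL": "http://localhost:9000",
    "S3_ACCESS_KEY": "admin",
    "S3_SECRET_KEY": "admin123",
}
print(s3_client_kwargs(minio_env)["endpoint_url"])  # http://localhost:9000
```

You would then create the client with boto3.client('s3', **s3_client_kwargs()), and moving from MinIO to AWS becomes a matter of unsetting one variable.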

Full Python Script#

Here’s a Python script that demos the features and the integration:

#!/usr/bin/env python3
"""
MinIO + boto3 demo
"""

import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Configuration - EDIT THESE FOR YOUR SETUP
MINIO_ENDPOINT = 'http://localhost:9000'  # or http://your-server-ip:9000
ACCESS_KEY = 'admin'
SECRET_KEY = 'admin123'
BUCKET_NAME = 'python-demo'

# Create client (same as AWS, just with custom endpoint)
s3 = boto3.client(
    's3',
    endpoint_url=MINIO_ENDPOINT,
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    config=Config(signature_version='s3v4'),
    region_name='us-east-1'
)

def demo():
    print("🚀 MinIO + boto3 Demo\n")

    # 1. Create bucket (re-running the script should not fail)
    try:
        s3.create_bucket(Bucket=BUCKET_NAME)
        print(f"✅ Created bucket: {BUCKET_NAME}")
    except ClientError as e:
        # A bucket you already own is reported as BucketAlreadyOwnedByYou
        if e.response['Error']['Code'] in ('BucketAlreadyExists', 'BucketAlreadyOwnedByYou'):
            print(f"ℹ️ Bucket already exists: {BUCKET_NAME}")
        else:
            print(f"❌ Error: {e}")
            raise  # no point uploading to a bucket we could not create

    # 2. Create and upload a test file
    test_content = "date,product,sales\n2024-01-01,Widget,100\n2024-01-02,Gadget,200"
    with open('/tmp/test.csv', 'w') as f:
        f.write(test_content)

    s3.upload_file('/tmp/test.csv', BUCKET_NAME, 'data/test.csv')
    print(f"✅ Uploaded: s3://{BUCKET_NAME}/data/test.csv")

    # 3. List objects
    response = s3.list_objects_v2(Bucket=BUCKET_NAME)
    print(f"\n📁 Objects in '{BUCKET_NAME}':")
    for obj in response.get('Contents', []):
        print(f"   - {obj['Key']} ({obj['Size']} bytes)")

    # 4. Download and read
    response = s3.get_object(Bucket=BUCKET_NAME, Key='data/test.csv')
    content = response['Body'].read().decode('utf-8')
    print(f"\n📄 Content of test.csv:\n{content}")

    print("\n✨ Demo complete! Your MinIO is working like a real S3 endpoint.")

if __name__ == '__main__':
    demo()

Output:

🚀 MinIO + boto3 Demo

✅ Created bucket: python-demo
✅ Uploaded: s3://python-demo/data/test.csv

📁 Objects in 'python-demo':
   - data/test.csv (62 bytes)

📄 Content of test.csv:
date,product,sales
2024-01-01,Widget,100
2024-01-02,Gadget,200

✨ Demo complete! Your MinIO is working like a real S3 endpoint.
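
One caveat with step 3 of the script: list_objects_v2 returns at most 1,000 keys per call, so a bucket with more objects needs pagination. Here is a minimal sketch using boto3’s built-in paginator, assuming `s3` is a client created as in the script; the helper name list_all_keys is mine.

```python
def list_all_keys(s3, bucket: str, prefix: str = "") -> list[str]:
    """Collect every key under a prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent on empty pages, hence the default.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

For the demo bucket, list_all_keys(s3, 'python-demo', 'data/') would return a list containing data/test.csv.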

Confirm it using `mc`#

The Python script claims it uploaded a file. Don’t take its word for it. Check it yourself:

mc ls myminio/python-demo/data/

You should see:

[2026-05-03 16:34:06 +08]    62B STANDARD test.csv

Want to see the actual content?

mc cat myminio/python-demo/data/test.csv

Output:

date,product,sales
2024-01-01,Widget,100
2024-01-02,Gadget,200

If you prefer clicking, open the web UI at http://your-server-ip:9001, navigate to Buckets → python-demo → Object Storage → data, and you’ll see test.csv sitting there with its size and information.
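
If you want a stronger check than eyeballing the size, you can compare the object’s ETag with the MD5 of the local file. This sketch assumes a single-part, unencrypted upload, which should hold for a tiny file like test.csv; multipart uploads produce ETags with a "-N" suffix that are not plain MD5 digests. The helper names are mine.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a local file, read in chunks to keep memory flat."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches(path: str, etag: str) -> bool:
    """Compare a local file against an S3 ETag (surrounding quotes are stripped)."""
    etag = etag.strip('"')
    if "-" in etag:
        # Multipart ETags are not plain MD5 digests; this check cannot apply.
        raise ValueError("multipart ETag, cannot compare against MD5")
    return md5_hex(path) == etag.lower()
```

With the client from the script, the ETag comes from s3.head_object(Bucket='python-demo', Key='data/test.csv')['ETag'], and etag_matches('/tmp/test.csv', etag) should return True if the upload arrived intact.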

Conclusion#

Let’s step back for a second and give yourself a pat on the back.

You just built a production-ready, S3-compatible object store on your own hardware. From scratch.

No cloud bill. No vendor charging you an arm and a leg just because you accidentally uploaded a massive file or left your instance running overnight.

You have full control of your storage layer now. Go wild in testing. Break things. It’s fine. That’s the whole point of owning your stack.

If anything goes sideways? Come back to this post, skim through it again, and reinstall. Ten minutes later, you’re back in business. No support tickets. No “please upgrade to Enterprise” emails. No mystery charges on a credit card you forgot you attached.

MinIO is that.
