Getting Started with MinIO For Data Engineers

In a traditional data pipeline designed to handle large volumes of data, the classic go‑to for distributed batch processing has long been Hadoop (HDFS + YARN + MapReduce). Hadoop is powerful, but it’s also famously resource‑hungry. We’re talking 64GB to 96GB of RAM just to run the cluster comfortably, before you even touch your actual data processing. It’s no damn surprise that many startups and SMEs skip the on‑prem headache entirely and move straight to cloud‑based data warehouses like Databricks, BigQuery, or Snowflake. But those come with their own cost spiral: every query, every second, every gigabyte slowly chips away at your budget.

What if I told you there’s a middle path?

You don’t need racks of expensive hardware to host your storage layer. You don’t need to be tied to a cloud provider that bills you like a metered taxi ride.

In this post, I’ll show you a lightweight data stack centered around MinIO, an S3‑compatible object store that replaces HDFS. You can host it on a small VPS or even your own laptop. MinIO itself isn’t a data warehouse; it’s the storage foundation. You pair it with a lightweight query engine (e.g., DuckDB, Polars, or Presto) that runs on the same box and queries data directly from MinIO using S3 APIs.

The result? A stack that won’t handle every extreme edge case, but will cover 95% of what a growing startup actually needs, for a fraction of the cost and complexity of either a classic Hadoop cluster or a cloud warehouse.
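To make the "query engine on top of MinIO" idea a bit more concrete, here is a minimal sketch of how DuckDB could be pointed at an S3-compatible endpoint. It assumes DuckDB's httpfs extension and its `s3_*` settings; the helper names `duckdb_s3_statements` and `query_minio` are made up for this post, and the endpoint and credentials are whatever you configure later in the setup.

```python
def duckdb_s3_statements(endpoint, access_key, secret_key, use_ssl=False):
    """Build the SQL that points DuckDB's httpfs extension at an S3-compatible store."""

    def quote(value):
        # Double any single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return [
        "INSTALL httpfs",
        "LOAD httpfs",
        f"SET s3_endpoint={quote(endpoint)}",  # host:port, without http://
        "SET s3_url_style='path'",  # bucket in the URL path, not the hostname
        f"SET s3_use_ssl={'true' if use_ssl else 'false'}",
        f"SET s3_access_key_id={quote(access_key)}",
        f"SET s3_secret_access_key={quote(secret_key)}",
    ]


def query_minio(sql, endpoint, access_key, secret_key):
    """Run one query against files stored in MinIO (needs `pip install duckdb`)."""
    import duckdb  # imported lazily so the helper above works without DuckDB

    con = duckdb.connect()
    for statement in duckdb_s3_statements(endpoint, access_key, secret_key):
        con.execute(statement)
    return con.execute(sql).fetchall()
```

Once a server is running, something like `query_minio("SELECT count(*) FROM read_csv_auto('s3://my-bucket/events.csv')", "localhost:9000", key, secret)` should work, where `my-bucket/events.csv` is a placeholder for a file you have actually uploaded.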

MinIO Introduction#

MinIO is a high-performance, open-source object storage system that’s fully compatible with Amazon S3. Think of it as a self-hostable alternative to S3: same APIs, same ecosystem, same tooling, but running on your own hardware (or a modest VPS).

Written in Go, MinIO is surprisingly lightweight. You can start with a single small server and later scale out to petabytes of distributed storage without changing a single line of application code. That’s the real beauty: if your app can talk to S3, it can talk to MinIO (no rewrites, no lock‑in).

For a lean startup data pipeline, MinIO replaces the heavyweight HDFS (Hadoop Distributed File System) with a modern, cloud-native storage layer that is generally faster than HDFS.


NOTE

In this tutorial, we’ll be setting up the commercial version of MinIO (AIStor Free Tier).

MinIO has two major distributions:

| | Open Source (AGPLv3) | Commercial AIStor (Free Tier) |
|---|---|---|
| Docker image | quay.io/minio/minio | quay.io/minio/aistor/minio |
| License key needed? | No | Yes (free) |
| Official updates? | ⚠️ No longer published | ✅ Actively maintained |
| Web console | Minimal (stripped down) | Full-featured |
| Cluster support | ✅ Yes | ❌ No |

We’ll be setting up the free commercial version in this post.

MinIO Setup (Commercial AIStor)#

The Manual Setup#

The commercial version of MinIO is actively maintained and has a much better web UI than the open-source one. In this post, I’ll be setting up the free commercial version, as it’s better for demonstrating MinIO’s features.

Expect to:

  • Download the MinIO binary
  • Set up storage directories and permissions manually
  • Set up systemd services (so it survives reboots)
  • Get a free license key from MinIO

Download the MinIO binary#

We’ll download the MinIO binary. Because MinIO is written in Go and ships as a compiled binary, there’s no need to set up Java or a JVM to run it.

Terminal window
wget https://dl.min.io/aistor/minio/release/linux-amd64/minio
chmod +x minio
./minio --version

Running ./minio --version should print something like this:

minio version RELEASE.2026-04-14T21-32-45Z (commit-id=9d4e0e68f7c891e26b363d17f6dde2a00f6b69d1)
Runtime: go1.26.2 linux/amd64
License: MinIO AIStor License
Copyright: 2015-2026 MinIO, Inc.

Setup storage directories & permissions manually#

Step 1: Create the MinIO User#

For security, MinIO should run as its own user, not root:

Terminal window
sudo useradd -r -s /sbin/nologin minio-user

Step 2: Create the Configuration Files#

First, create the environment file that systemd reads:

Terminal window
sudo mkdir -p /etc/default
sudo tee /etc/default/minio << 'EOF'
# Volume to be used for MinIO server
MINIO_VOLUMES="/data"
# API and Console ports
MINIO_OPTS="--address :9000 --console-address :9001"
# Pointer to the main configuration file
MINIO_CONFIG_ENV_FILE=/etc/minio/config.env
EOF

Next, create the main config file with your credentials:

Terminal window
sudo mkdir -p /etc/minio
sudo tee /etc/minio/config.env << 'EOF'
# Root credentials - CHANGE THESE IN PRODUCTION
MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD=admin123
EOF
# Lock down permissions — this file contains secrets!
sudo chmod 600 /etc/minio/config.env
sudo chown -R minio-user:minio-user /etc/minio

CAUTION

Change the admin123 password before running this in production.
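If you want something better than admin123 without inventing a password yourself, a short sketch like the one below can generate one. It relies on Python's standard `secrets` module; the helper name `generate_root_credentials` and the `minio-admin` prefix are just illustrations. It sticks to letters and digits so the values need no quoting in the env file, and it respects MinIO's minimum of 8 characters for the root password.

```python
import secrets
import string


def generate_root_credentials(user_prefix="minio-admin", password_length=32):
    """Return a (user, password) pair for MINIO_ROOT_USER / MINIO_ROOT_PASSWORD."""
    if password_length < 8:
        # MinIO rejects root passwords shorter than 8 characters.
        raise ValueError("password_length must be at least 8")
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(6))
    password = "".join(secrets.choice(alphabet) for _ in range(password_length))
    return f"{user_prefix}-{suffix}", password


# Print lines that can be pasted into /etc/minio/config.env.
user, password = generate_root_credentials()
print(f"MINIO_ROOT_USER={user}")
print(f"MINIO_ROOT_PASSWORD={password}")
```

Paste the two printed lines over the demo values in /etc/minio/config.env. The rest of this post keeps using admin and admin123 so the commands stay easy to follow.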

Step 3: Create the Data Directory#

Terminal window
sudo mkdir -p /data
sudo chown -R minio-user:minio-user /data

Set up systemd services#

Now that you have the MinIO binary ready and permissions set up, it’s time to make it run like a proper citizen on your Linux server: starting on boot, restarting if it crashes, and logging properly. That’s where systemd comes in.

MinIO uses a two-file configuration approach that’s actually pretty clever:

| File | Purpose |
|---|---|
| /etc/default/minio | Systemd environment variables (changes here require systemctl restart) |
| /etc/minio/config.env | Main configuration (changes here only need mc admin service restart) |

This split lets you change things like root credentials without a full systemd restart. Neat, right?
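Since the two files have to agree with each other, a quick sanity check can save a confusing failed start. The sketch below is a small, hand-rolled parser (not something MinIO ships) that reads simple KEY=VALUE lines and reports missing settings. The function names are hypothetical, and the parser ignores shell features such as variable expansion.

```python
from pathlib import Path


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_minio_env(default_text, config_text):
    """Return a list of problems in the two files (an empty list means OK)."""
    default_env = parse_env_file(default_text)
    config_env = parse_env_file(config_text)
    problems = []
    for key in ("MINIO_VOLUMES", "MINIO_CONFIG_ENV_FILE"):
        if not default_env.get(key):
            problems.append(f"/etc/default/minio is missing {key}")
    for key in ("MINIO_ROOT_USER", "MINIO_ROOT_PASSWORD"):
        if not config_env.get(key):
            problems.append(f"config.env is missing {key}")
    password = config_env.get("MINIO_ROOT_PASSWORD")
    if password and len(password) < 8:
        problems.append("MINIO_ROOT_PASSWORD is shorter than 8 characters")
    return problems


def check_installed_files():
    """Check the real files; config.env is mode 600, so run this with sudo."""
    default_text = Path("/etc/default/minio").read_text()
    config_path = parse_env_file(default_text).get(
        "MINIO_CONFIG_ENV_FILE", "/etc/minio/config.env"
    )
    return check_minio_env(default_text, Path(config_path).read_text())
```

Calling `check_installed_files()` on the server returns an empty list when both files contain what the service expects.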

Step 1: Download the systemd Service File#

Terminal window
sudo curl -o /etc/systemd/system/minio.service \
https://raw.githubusercontent.com/minio/minio-service/master/linux-systemd/minio.service
sudo systemctl daemon-reload

The service file is pre-configured to:

  • Run as minio-user
  • Expect the binary at /usr/local/bin/minio
  • Read environment variables from /etc/default/minio

Step 2: Move the Binary to the Right Location#

Terminal window
sudo mv minio /usr/local/bin/

Step 3: Start and Enable MinIO#

Terminal window
# Start the service now
sudo systemctl start minio
# Make it start automatically on boot
sudo systemctl enable minio
# Check if everything is healthy
sudo systemctl status minio

If you see Active: active (running) in green, then congratulations, you’ve finished the hard part.

To monitor logs in real time:

Terminal window
sudo journalctl -u minio -f

Quick Test#

Once MinIO is running, open your browser to:

http://your-server-ip:9001

If the MinIO console loads, you’re ready to rock.
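The browser check covers the console on port 9001. For a scripted check, MinIO also exposes an unauthenticated liveness endpoint at /minio/health/live on the API port (9000 in this setup). Here is a minimal sketch using only the standard library; `minio_is_live` is a name chosen for this post.

```python
import urllib.error
import urllib.request


def minio_is_live(base_url, timeout=3.0):
    """Return True if the MinIO liveness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False
```

Call it as `minio_is_live("http://localhost:9000")`, or with your server’s IP. Note that it targets the API port, not the console port.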


Get a license key from MinIO#

The next step is to get a license key from MinIO via this link. Make sure to pick the free version, then enter your name and email.

After you’ve filled this in, they should send a license key straight to your inbox.


Add the license key to your environment configuration file:

Terminal window
sudo tee -a /etc/minio/config.env << 'EOF'
# Your MinIO AIStor License Key
MINIO_LICENSE=<your-license-key>
EOF

Then restart MinIO:

Terminal window
sudo systemctl restart minio
sudo systemctl status minio

Then go back to your web browser and log in with the credentials you set in /etc/minio/config.env (in this walkthrough, username admin and password admin123).


If the login succeeds, congratulations, you’ve set up MinIO manually.

NOTE

Don’t worry, the AIStor Free Tier license does not expire.

Navigating the Web UI#

Go to Buckets in the navigation menu on the left to open the bucket list.

Click the Add Bucket button, select Basic, and enter raw-data as the name.

What all the types mean:

Basic (Default): Standard “last write wins” storage. Upload report.csv → old report.csv disappears forever.

Versioned: Every upload creates a new version. Delete something? That version still exists. You can time-travel back to any previous state.

Locked (WORM): Write Once, Read Many, plus legal holds. Once an object is written, nobody can delete or modify it until the lock expires. Not you. Not root. Not MinIO support with a gun to their head.
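The same bucket types are reachable through the S3 API. As a rough sketch, this is how versioning could be switched on and inspected from Python. It assumes `s3` is a boto3 S3 client like the one built in the Python section later in this post; the helper names are made up, while `put_bucket_versioning`, `get_bucket_versioning`, and `list_object_versions` are standard boto3 client calls.

```python
def enable_versioning(s3, bucket):
    """Turn on versioning for `bucket` and return the resulting status string."""
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # The Status key is absent on buckets that never had versioning configured.
    return s3.get_bucket_versioning(Bucket=bucket).get("Status", "Disabled")


def list_versions(s3, bucket, key):
    """Return (version_id, is_latest) pairs for one object."""
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    return [
        (version["VersionId"], version["IsLatest"])
        for version in response.get("Versions", [])
        if version["Key"] == key  # Prefix also matches longer keys
    ]
```

After `enable_versioning(s3, "raw-data")`, uploading report.csv twice and calling `list_versions(s3, "raw-data", "report.csv")` should return two entries instead of one.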


After creating the bucket, open Object Browser at the top right of the UI.

Now you can upload any type of file (PDF, JPG, GIF, MP4, and so on) using the Upload button.


Introduction to mc (MinIO Client)#

You’ve got MinIO running. You can see that slick web console. But let’s be real, clicking around a UI gets old fast. You’re building a data pipeline, not running a museum exhibit. You need a CLI tool that talks to MinIO the same way your code will.

Enter mc (MinIO Client).

Think of it as the aws s3 CLI, but actually pleasant to use. It speaks the S3 API natively, so it works with AWS S3, Google Cloud Storage, and obviously your self-hosted MinIO. Same commands, same syntax, no context switching.

Installing mc#

One binary. One chmod. That’s it.

Terminal window
# Download the binary
wget https://dl.min.io/client/mc/release/linux-amd64/mc
# Make it executable
chmod +x mc
# Move it to your PATH
sudo mv mc /usr/local/bin/
# Verify it works
mc --version

You should see something like this:

mc version RELEASE.2025-08-13T08-35-41Z (commit-id=7394ce0dd2a80935aded936b09fa12cbb3cb8096)
Runtime: go1.24.6 linux/amd64
Copyright (c) 2015-2025 MinIO, Inc.
License GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>

Connect mc to Your MinIO Server#

mc doesn’t magically know where your MinIO instance lives. You need to create an alias, a shortcut that stores the endpoint, credentials, and region.

Terminal window
mc alias set myminio http://localhost:9000 admin admin123

Test the Connection

Terminal window
mc ls myminio

You should see the raw-data bucket that we created earlier in the Web UI navigation section:

[2026-05-03 15:53:54 +08] 0B raw-data/

To list the files you uploaded to that bucket, run:

Terminal window
mc ls myminio/raw-data/

Basic mc Commands You’ll Actually Use#

Here’s the 80% of mc you’ll need for 95% of your work:

| Command | What it does | Example |
|---|---|---|
| mc ls | List buckets or objects | mc ls myminio/ |
| mc mb | Make (create) a bucket | mc mb myminio/analytics |
| mc cp | Copy files in/out | mc cp data.csv myminio/analytics/ |
| mc mv | Move objects | mc mv myminio/analytics/old.csv myminio/archive/ |
| mc rm | Remove objects | mc rm myminio/analytics/bad.csv |
| mc cat | Display file contents | mc cat myminio/logs/app.log |
| mc du | Show storage usage | mc du myminio/analytics/ |

Create Your Second Bucket#

Terminal window
mc mb myminio/raw-logs

Verify it worked:

Terminal window
mc ls myminio/

Output

[2026-05-03 15:53:54 +08] 0B raw-data/
[2026-05-03 16:20:29 +08] 0B raw-logs/

Now upload something:

Terminal window
echo "hello, minio" > test.txt
mc cp test.txt myminio/raw-logs/

List the contents:

mc ls myminio/raw-logs/

You should see test.txt staring back at you.
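If you ever need that listing inside a script, mc can emit JSON with its global `--json` flag, one record per line. Here is a small sketch that wraps it. The field names `status`, `key`, and `size` are what I would expect in that output; treat them as an assumption and check them against your mc version. The function names are made up.

```python
import json
import subprocess


def parse_mc_ls(json_lines):
    """Turn the JSON-lines output of `mc ls --json` into (key, size) tuples."""
    entries = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("status") != "success":
            continue  # skip error records
        entries.append((record["key"], record.get("size", 0)))
    return entries


def mc_ls(target):
    """Run `mc ls --json <target>` and return the parsed entries."""
    result = subprocess.run(
        ["mc", "ls", "--json", target],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if mc exits with an error
    )
    return parse_mc_ls(result.stdout)
```

With the alias from earlier, `mc_ls("myminio/raw-logs/")` should return a list containing an entry for test.txt.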

TIP

mc works identically with real AWS S3. Just change the alias: mc alias set aws https://s3.amazonaws.com YOUR_AWS_KEY YOUR_AWS_SECRET

There’s no new CLI to learn. No context switching. The same muscle memory works whether you’re on your laptop’s MinIO or production AWS.

Python + Boto3: Proof That MinIO Is Just S3#

Here’s where theory meets practice. You’ve got MinIO running. You’ve got buckets ready. Now let’s write code that talks to it, which is exactly the same code that would talk to AWS S3.

No MinIO SDK. No custom libraries. No vendor lock-in.

Just standard boto3, the AWS SDK for Python, pointed at your self-hosted endpoint.

Install the SDK#

pip install boto3

That’s it. Same library AWS customers use. Same documentation. Same Stack Overflow answers.

The examples below connect to localhost. For a remote VPS, swap localhost for your server’s IP:

endpoint_url='http://your-server-ip:9000'

The Connection String (This Is the Only Difference)#

When you connect to AWS S3, you use the default endpoint. When you connect to MinIO, you override it:

import boto3
from botocore.config import Config

# This exact same code works for AWS, MinIO, or any S3-compatible store
s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',      # ← ONLY difference
    aws_access_key_id='admin',                 # ← your MinIO username
    aws_secret_access_key='admin123',          # ← your MinIO password
    config=Config(signature_version='s3v4'),
    region_name='us-east-1'                    # dummy value, MinIO ignores it
)
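Hard-coding credentials is fine for a demo, but the "only difference" point gets clearer if the endpoint comes from configuration. The sketch below builds the client arguments from environment variables. The variable names (`S3_ENDPOINT_URL`, `S3_ACCESS_KEY`, `S3_SECRET_KEY`, `S3_REGION`) are hypothetical names chosen for this post, not anything boto3 or MinIO defines.

```python
import os


def s3_client_kwargs(env=None):
    """Build keyword arguments for boto3.client('s3', ...) from environment variables."""
    env = os.environ if env is None else env
    kwargs = {
        "aws_access_key_id": env["S3_ACCESS_KEY"],
        "aws_secret_access_key": env["S3_SECRET_KEY"],
        "region_name": env.get("S3_REGION", "us-east-1"),
    }
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        # Set only for MinIO or another S3-compatible store; leave unset for AWS.
        kwargs["endpoint_url"] = endpoint
    return kwargs
```

Then `s3 = boto3.client("s3", **s3_client_kwargs())` talks to MinIO when `S3_ENDPOINT_URL=http://localhost:9000` is exported and to AWS when it is not, with no code change in between.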

Full Python Script#

Here’s a Python script that demos the features and the integration:

#!/usr/bin/env python3
"""
MinIO + boto3 demo
"""
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError

# Configuration - EDIT THESE FOR YOUR SETUP
MINIO_ENDPOINT = 'http://localhost:9000'  # or http://your-server-ip:9000
ACCESS_KEY = 'admin'
SECRET_KEY = 'admin123'
BUCKET_NAME = 'python-demo'

# Create client (same as AWS, just with custom endpoint)
s3 = boto3.client(
    's3',
    endpoint_url=MINIO_ENDPOINT,
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    config=Config(signature_version='s3v4'),
    region_name='us-east-1'
)


def demo():
    print("🚀 MinIO + boto3 Demo\n")

    # 1. Create bucket (a re-run reports BucketAlreadyOwnedByYou)
    try:
        s3.create_bucket(Bucket=BUCKET_NAME)
        print(f"✅ Created bucket: {BUCKET_NAME}")
    except ClientError as e:
        if e.response['Error']['Code'] in ('BucketAlreadyExists', 'BucketAlreadyOwnedByYou'):
            print(f"ℹ️ Bucket already exists: {BUCKET_NAME}")
        else:
            print(f"❌ Error: {e}")
            raise  # nothing below can work without the bucket

    # 2. Create and upload a test file
    test_content = "date,product,sales\n2024-01-01,Widget,100\n2024-01-02,Gadget,200"
    with open('/tmp/test.csv', 'w') as f:
        f.write(test_content)
    s3.upload_file('/tmp/test.csv', BUCKET_NAME, 'data/test.csv')
    print(f"✅ Uploaded: s3://{BUCKET_NAME}/data/test.csv")

    # 3. List objects
    response = s3.list_objects_v2(Bucket=BUCKET_NAME)
    print(f"\n📁 Objects in '{BUCKET_NAME}':")
    for obj in response.get('Contents', []):
        print(f" - {obj['Key']} ({obj['Size']} bytes)")

    # 4. Download and read
    response = s3.get_object(Bucket=BUCKET_NAME, Key='data/test.csv')
    content = response['Body'].read().decode('utf-8')
    print(f"\n📄 Content of test.csv:\n{content}")

    print("\n✨ Demo complete! Your MinIO is working like a real S3 endpoint.")


if __name__ == '__main__':
    demo()

Output:

🚀 MinIO + boto3 Demo

✅ Created bucket: python-demo
✅ Uploaded: s3://python-demo/data/test.csv

📁 Objects in 'python-demo':
 - data/test.csv (62 bytes)

📄 Content of test.csv:
date,product,sales
2024-01-01,Widget,100
2024-01-02,Gadget,200

✨ Demo complete! Your MinIO is working like a real S3 endpoint.
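One caveat about step 3 of the script: a single `list_objects_v2` call returns at most 1,000 keys, on MinIO just as on AWS. For bigger buckets you follow continuation tokens. The sketch below does that by hand so the mechanics are visible; `iter_objects` and `total_size` are hypothetical helper names, and boto3’s built-in `get_paginator('list_objects_v2')` is the ready-made alternative.

```python
def iter_objects(s3, bucket, prefix=""):
    """Yield every object under `prefix`, following continuation tokens."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3.list_objects_v2(**kwargs)
        # 'Contents' is missing entirely when a page has no objects.
        yield from response.get("Contents", [])
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def total_size(s3, bucket, prefix=""):
    """Sum the sizes (in bytes) of all objects under `prefix`."""
    return sum(obj["Size"] for obj in iter_objects(s3, bucket, prefix))
```

With the client from the script, `total_size(s3, "python-demo")` would add up every object in the bucket, however many pages that takes.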

Confirm it using mc#

The Python script claims it uploaded a file. Don’t take its word for it. Check it yourself.

Terminal window
mc ls myminio/python-demo/data/

You should see:

[2026-05-03 16:34:06 +08] 62B STANDARD test.csv

Want to see the actual content?

Terminal window
mc cat myminio/python-demo/data/test.csv

Output:

date,product,sales
2024-01-01,Widget,100
2024-01-02,Gadget,200
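The same "trust but verify" check can be done from Python with a HEAD request. This sketch compares the stored object’s size and ETag against local bytes. It assumes a single-part, unencrypted upload against a server running with default settings, where the ETag is the MD5 of the body; for multipart uploads the MD5 comparison will not match. `check_upload` is a made-up helper name.

```python
import hashlib


def check_upload(s3, bucket, key, data):
    """Compare local bytes with a stored object using head_object metadata."""
    head = s3.head_object(Bucket=bucket, Key=key)
    return {
        "size_ok": head["ContentLength"] == len(data),
        # S3 wraps the ETag in double quotes, so strip them before comparing.
        "md5_ok": head["ETag"].strip('"') == hashlib.md5(data).hexdigest(),
    }
```

For the demo file, `check_upload(s3, "python-demo", "data/test.csv", open("/tmp/test.csv", "rb").read())` should report both checks as True.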

If you prefer clicking, open the web UI at http://your-server-ip:9001, open the python-demo bucket, and go into the data folder. You’ll see test.csv sitting there with its size and other details.


Conclusion#

Let’s step back for a second. Give yourself a pat on the back.

You just built a working, S3-compatible object store on your own hardware. From scratch.

No cloud bill. No vendor charging you an arm and a leg just because you accidentally uploaded a massive file or left your instance running overnight.

You have full control of your storage layer now. Go wild in testing. Break things. It’s fine. That’s the whole point of owning your stack.

If anything goes sideways? Come back to this post, skim through it again, and reinstall. Ten minutes later, you’re back in business. No support tickets. No “please upgrade to Enterprise” emails. No mystery charges on a credit card you forgot you attached.

MinIO is that.