
Deploying a ResNet50 EuroSAT Image Classifier on NVIDIA Triton Inference Server

A step-by-step guide to taking a ResNet50 model fine-tuned on the EuroSAT dataset from Hugging Face, converting it to ONNX, and serving it at production scale with NVIDIA Triton Inference Server.

Krylox Team · May 18, 2026 · 8 min read

Serving a trained model in production is a different beast from training it. You need low latency, high throughput, hardware utilization, and stability, all at once. NVIDIA Triton Inference Server is built to handle exactly that. In this post, we'll walk through taking a ResNet50 model fine-tuned on the EuroSAT dataset from Hugging Face, converting it to ONNX, and serving it via Triton.


What We're Building

EuroSAT is a satellite image classification dataset covering 10 land-use classes: forests, highways, industrial zones, residential areas, and more. A ResNet50 fine-tuned on it can classify 64x64 Sentinel-2 patches into these categories with strong accuracy.

NVIDIA Triton is a production-grade inference server that supports multiple backends (PyTorch, TensorFlow, ONNX Runtime, TensorRT, and more), dynamic batching, concurrent model execution, and both HTTP and gRPC endpoints out of the box.

Hugging Face ResNet50 (EuroSAT) → ONNX Export → Triton Model Repository → Triton Server → HTTP/gRPC Inference

Prerequisites

  • Docker with NVIDIA Container Toolkit installed
  • A GPU (the commands use --gpus=all)
  • The Hugging Face model ID for your ResNet50 EuroSAT checkpoint

Step 1: Pull the Container Images

We use two separate NVIDIA containers, one for model conversion and one for serving.

# PyTorch container for ONNX conversion
docker pull nvcr.io/nvidia/pytorch:26.03-py3

# Triton Inference Server
docker pull nvcr.io/nvidia/tritonserver:26.04-py3

Keeping these separate is intentional. The PyTorch image is heavyweight and optimized for model work; the Triton image is lean and optimized for serving. There's no reason to bloat your production inference container with training-time dependencies.


Step 2: Convert the Hugging Face Model to ONNX

ONNX (Open Neural Network Exchange) is the format Triton's ONNX Runtime backend expects. Hugging Face's Optimum library makes this conversion straightforward.

2a. Start the PyTorch Container

docker run --gpus=all -it -v ${PWD}:/workspace nvcr.io/nvidia/pytorch:26.03-py3

The -v ${PWD}:/workspace flag mounts your current directory into the container so exported files persist after the container exits.

2b. Install Dependencies

pip install timm accelerate
pip install "optimum-onnx[onnxruntime]"

2c. Export to ONNX

optimum-cli export onnx \
  --model <hf/resnet50-eurosat> \
  --task image-classification \
  resnet50-eurosat-onnx

Replace <hf/resnet50-eurosat> with your actual Hugging Face model ID, for example your own fine-tune of microsoft/resnet-50 or a community EuroSAT checkpoint. The --task image-classification flag tells optimum which input/output signature to produce.

This generates a resnet50-eurosat-onnx/ directory containing model.onnx along with the model and image-processor configuration files (an image model has no tokenizer).


Step 3: Inspect the ONNX Model

Before writing the Triton config, confirm the exact input and output tensor names and shapes. Triton is strict about these.

import onnx

model = onnx.load("resnet50-eurosat-onnx/model.onnx")

print("Inputs:")
print(model.graph.input)

print("Outputs:")
print(model.graph.output)

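For a standard image classification ResNet50, expect something like the following. The ONNX graph prints dynamic dimensions as named symbols (dim_param); they are written here as -1, the form Triton's config uses: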

  • Input: pixel_values, shape [-1, 3, 224, 224], dtype float32
  • Output: logits, shape [-1, 10], dtype float32 (10 classes for EuroSAT)

Keep these handy for the next step.
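The raw protobuf printout above is verbose, so it can help to boil it down to just names and Triton-style dims. Here is a minimal sketch, assuming the onnx package from this step is installed. The helper names to_triton_dims and summarize_io are ours for illustration; they are not part of onnx or Triton.

```python
def to_triton_dims(onnx_dims):
    """Map ONNX dims to Triton dims: fixed sizes stay, symbolic or unknown become -1."""
    return [d if isinstance(d, int) and d > 0 else -1 for d in onnx_dims]


def summarize_io(model_path):
    """Return {"inputs": [...], "outputs": [...]} holding (name, dims) pairs."""
    import onnx  # imported lazily so to_triton_dims works without onnx installed

    graph = onnx.load(model_path).graph

    def describe(value_infos):
        result = []
        for vi in value_infos:
            # Each dimension carries either a fixed dim_value or a symbolic dim_param.
            dims = [
                dim.dim_value if dim.HasField("dim_value") else dim.dim_param
                for dim in vi.type.tensor_type.shape.dim
            ]
            result.append((vi.name, to_triton_dims(dims)))
        return result

    return {"inputs": describe(graph.input), "outputs": describe(graph.output)}
```

Calling summarize_io("resnet50-eurosat-onnx/model.onnx") should give you the tensor names and dims in the form the Triton config expects.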


Step 4: Prepare the Model Repository

Triton expects models in a specific directory layout:

model_repository/
└── <model-name>/
    ├── config.pbtxt
    └── 1/
        └── model.onnx

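The 1/ subdirectory is the model version. Triton can serve multiple versions simultaneously (which ones are loaded is controlled by the model's version_policy; by default only the latest is), and clients can request a specific version, which is useful for A/B testing or canary deployments.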

mkdir -p model_repository/image_classification/1

cp resnet50-eurosat-onnx/model.onnx model_repository/image_classification/1/model.onnx
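If you rebuild the repository often, or want to drop in a second version later, the same two commands can be scripted. This is a small sketch using only the standard library; build_repository is a hypothetical helper name, and the version argument simply picks the numbered subdirectory.

```python
import shutil
from pathlib import Path


def build_repository(onnx_file, repo_root, model_name, version=1):
    """Copy an exported ONNX file into <repo_root>/<model_name>/<version>/model.onnx."""
    version_dir = Path(repo_root) / model_name / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    target = version_dir / "model.onnx"
    shutil.copy2(onnx_file, target)
    return target
```

For this guide the call would be build_repository("resnet50-eurosat-onnx/model.onnx", "model_repository", "image_classification").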

Step 5: Write the Triton Config

Create model_repository/image_classification/config.pbtxt:

name: "image_classification"
backend: "onnxruntime"
max_batch_size: 0

input [
  {
    name: "pixel_values"
    data_type: TYPE_FP32
    dims: [ -1, 3, 224, 224 ]
  }
]

output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, 10 ]
  }
]
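Since the names and dims come straight from Step 3, the config can also be generated rather than typed by hand. The sketch below is one way to do that; render_config is a hypothetical helper, not a Triton tool. The max_batch_size > 0 branch goes beyond the config above and is an assumption on our part: it only applies if the exported model has a dynamic first dimension, in which case Triton adds the batch dimension itself and it has to be left out of dims.

```python
def render_config(name, inputs, outputs, max_batch_size=0):
    """Render config.pbtxt text. inputs/outputs are lists of (name, data_type, dims)."""

    def tensor_block(kind, tensors):
        entries = []
        for tensor_name, data_type, dims in tensors:
            # With batching enabled, the batch dimension is implicit,
            # so the leading dim is dropped from the config.
            shown = dims[1:] if max_batch_size > 0 else dims
            dims_text = ", ".join(str(d) for d in shown)
            entries.append(
                "  {\n"
                f'    name: "{tensor_name}"\n'
                f"    data_type: {data_type}\n"
                f"    dims: [ {dims_text} ]\n"
                "  }"
            )
        return f"{kind} [\n" + ",\n".join(entries) + "\n]"

    parts = [
        f'name: "{name}"',
        'backend: "onnxruntime"',
        f"max_batch_size: {max_batch_size}",
        "",
        tensor_block("input", inputs),
        "",
        tensor_block("output", outputs),
    ]
    if max_batch_size > 0:
        # Empty block: turn on dynamic batching with default settings.
        parts += ["", "dynamic_batching { }"]
    return "\n".join(parts) + "\n"
```

With the default max_batch_size=0 and the tensors from Step 3, the output matches the config shown above.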

Your final directory structure should look like this:

model_repository/
└── image_classification/
    ├── config.pbtxt
    └── 1/
        └── model.onnx

Step 6: Start Triton Inference Server

docker run \
  --gpus=all \
  --rm \
  --shm-size=256m \
  -p 8000:8000 \
  -p 8001:8001 \
  -p 8002:8002 \
  -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:26.04-py3 \
  tritonserver --model-repository=/models

Port mapping:

Port   Protocol   Use
8000   HTTP       REST inference + management
8001   gRPC       gRPC inference
8002   HTTP       Metrics
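Port 8002 serves metrics in Prometheus text format at /metrics. Once the server is running, a quick way to look at them without a full Prometheus setup is to fetch and parse that text yourself. This is a deliberately simple sketch: parse_metrics and fetch_metrics are illustrative names, and the parser ignores anything beyond plain "name{labels} value" lines.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus exposition text into {metric_with_labels: float}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        key, _, raw = line.rpartition(" ")
        try:
            values[key] = float(raw)
        except ValueError:
            continue  # ignore lines this simple parser does not understand
    return values


def fetch_metrics(url="http://localhost:8002/metrics", timeout=5.0):
    """Fetch and parse Triton's metrics endpoint (requires a running server)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

After a few requests, counters such as nv_inference_count for image_classification should show up in the returned dictionary.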

When the server starts successfully, the log ends with lines like these (timestamps and source locations trimmed):

Started GRPCInferenceService at 0.0.0.0:8001
Started HTTPService at 0.0.0.0:8000
Started Metrics Service at 0.0.0.0:8002

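Triton also prints a model status table in the startup log. The model should be listed as READY:

+----------------------+---------+--------+
| Model                | Version | Status |
+----------------------+---------+--------+
| image_classification | 1       | READY  |
+----------------------+---------+--------+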
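Rather than watching the logs, you can ask the server directly. Triton's HTTP endpoint exposes readiness checks for the server (/v2/health/ready) and for each model (/v2/models/<name>/ready). Below is a minimal polling sketch using only the standard library; the function names and the retry defaults are our own choices, not anything Triton prescribes.

```python
import time
import urllib.error
import urllib.request


def readiness_urls(base_url, model_name):
    """Return the server-level and model-level readiness URLs."""
    base = base_url.rstrip("/")
    return [f"{base}/v2/health/ready", f"{base}/v2/models/{model_name}/ready"]


def is_ready(url, timeout=2.0):
    """True if the endpoint answers HTTP 200, False on any error or other status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(base_url, model_name, attempts=30, delay=1.0):
    """Poll both readiness endpoints until they pass or attempts run out."""
    urls = readiness_urls(base_url, model_name)
    for _ in range(attempts):
        if all(is_ready(url) for url in urls):
            return True
        time.sleep(delay)
    return False
```

In a deployment script, wait_until_ready("http://localhost:8000", "image_classification") can gate the first inference request.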

Step 7: Send an Inference Request

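With the server running, you can query it over HTTP with plain curl or with Triton's Python client. Here's a quick example using tritonclient (installable with pip install "tritonclient[http]"); it also needs Pillow and torchvision for preprocessing: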

import tritonclient.http as httpclient
import numpy as np
from PIL import Image
from torchvision import transforms

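# Preprocess the image the same way the model was trained. These are the
# standard ImageNet values; if your checkpoint was fine-tuned with different
# ones, use the values from its preprocessor_config.json instead.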
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("eurosat_sample.jpg").convert("RGB")
tensor = transform(image).unsqueeze(0).numpy()  # shape: [1, 3, 224, 224]

client = httpclient.InferenceServerClient(url="localhost:8000")

inputs = [httpclient.InferInput("pixel_values", tensor.shape, "FP32")]
inputs[0].set_data_from_numpy(tensor)

outputs = [httpclient.InferRequestedOutput("logits")]

response = client.infer(model_name="image_classification", inputs=inputs, outputs=outputs)
logits = response.as_numpy("logits")

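# Alphabetical EuroSAT label order. Confirm it against id2label in the
# checkpoint's config.json, since the index-to-label mapping comes from there.
EUROSAT_CLASSES = [
    "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway",
    "Industrial", "Pasture", "PermanentCrop", "Residential",
    "River", "SeaLake"
]

# logits has shape [1, 10]; take the single row before argmax
predicted_class = EUROSAT_CLASSES[int(logits[0].argmax())]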
print(f"Predicted: {predicted_class}")
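The example above only reports the top class. Raw logits are not probabilities, so if you want a confidence score or the runner-up classes, apply a softmax first. Here is a small dependency-free sketch; softmax and top_k are our own helper names, and they take a plain list of floats such as logits[0].tolist().

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat sequence of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For the response above, top_k(logits[0].tolist(), EUROSAT_CLASSES) returns the three most likely land-use classes with their probabilities.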

Summary

Step               What Happens
Pull containers    Get PyTorch (conversion) and Triton (serving) images
Export to ONNX     Use optimum-cli to convert the HF model
Inspect ONNX       Confirm tensor names and shapes
Build model repo   Set up the model_repository/<name>/1/model.onnx structure
Write config       Define backend, inputs, outputs in config.pbtxt
Start server       tritonserver --model-repository=/models
Infer              Send requests via HTTP, gRPC, or tritonclient

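Triton handles the serving infrastructure, including multi-GPU routing, health checks, metrics, and request batching (the config in this guide sets max_batch_size: 0, so batching stays off until you raise it and enable dynamic_batching), so your application code only needs to worry about sending requests and interpreting logits.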