Integrate PyTorch with IDrive® e2

    PyTorch is an open-source deep learning framework that offers dynamic computation graphs, flexible model building, and efficient training across CPUs and GPUs. It supports a wide range of AI applications, including computer vision, natural language processing, and generative models. Integrating PyTorch with IDrive® e2 pairs these capabilities with scalable, secure, and cost-effective cloud object storage for both research and production workloads.

    Prerequisites

    Before you begin, ensure the following:

    1. An active IDrive® e2 account. Sign up here if you do not have one.
    2. A bucket in IDrive® e2. See how to create a bucket.
    3. Valid Access Key ID and Secret Access Key. Learn how to create an access key.
    4. Python: 3.8 – 3.13
    5. PyTorch: ≥ 2.0
    6. AWS CLI installed and configured (aws configure)

    The following steps describe how to configure PyTorch with IDrive® e2 cloud object storage.


    Install the S3 Connector for PyTorch

    Shell
    pip install s3torchconnector torchvision torch

    Note: Pre-built wheels are available for Linux and macOS. On Windows, you may need to build from source (see GitHub repo).
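
    After the installation completes, a quick check (a minimal sketch using only the standard library) confirms that the packages are importable and reports their installed versions:

    Python
    from importlib.metadata import version

    # Report the installed versions of the connector and PyTorch packages.
    for pkg in ("s3torchconnector", "torch", "torchvision"):
        print(pkg, version(pkg))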

    Configure Credentials

    You can authenticate in either of the following ways:

    1. AWS CLI profile:
      Shell
      aws configure --profile myprofile

      Then set:
      Shell
      export AWS_PROFILE=myprofile
    2. Environment variables:
      Shell
      export AWS_ACCESS_KEY_ID=YOUR_KEY
      export AWS_SECRET_ACCESS_KEY=YOUR_SECRET
      export AWS_DEFAULT_REGION=us-la-1
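
    The connector resolves credentials through the standard AWS credential provider chain, so either approach works without code changes. If you prefer to set the values from Python (for example, in a notebook), the following minimal sketch exports the same variables programmatically, assuming the placeholder values shown above:

    Python
    import os

    # Set the same variables programmatically before constructing any S3 datasets
    # or checkpoints; replace the placeholder values with your own e2 credentials.
    os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_KEY"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET"
    os.environ["AWS_DEFAULT_REGION"] = "us-la-1"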

    Loading Data from S3 into PyTorch

    The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either map-style or iterable-style, by specifying an S3 URI (a bucket and optional prefix), the region where the bucket is located, and the IDrive® e2 endpoint URL.

    Example: Iterable-style and Map-style Datasets
    Python
    from s3torchconnector import S3MapDataset, S3IterableDataset, S3ClientConfig

    # Update <BUCKET>, <PREFIX>, <e2_ENDPOINT>, and <e2_REGION> with your own values.
    ENDPOINT_URL = "https://<e2_ENDPOINT>"
    DATASET_URI = "s3://<BUCKET>/<PREFIX>"
    REGION = "<e2_REGION>"

    # Optional client settings (part size, throughput target, etc.); the defaults work for most cases.
    cfg = S3ClientConfig()

    iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT_URL, region=REGION, s3client_config=cfg)

    # Datasets are also iterators.
    for item in iterable_dataset:
        print(item.key)

    # S3MapDataset supports random access by listing all objects under the given prefix.
    # The listing happens on the first access to an element or the first call to len(),
    # whichever comes first. For large prefixes this can take a while and may appear
    # unresponsive.

    map_dataset = S3MapDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT_URL, region=REGION, s3client_config=cfg)
    # Randomly access an item in map_dataset.
    item = map_dataset[0]

    # Inspect the bucket, key, and content of the object.
    bucket = item.bucket
    key = item.key
    content = item.read()
    print(bucket, key, len(content))
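
    In practice, you will usually pass a transform to from_prefix so each S3 object is decoded into a tensor, and then wrap the dataset in a torch.utils.data.DataLoader for batching. The sketch below assumes the prefix contains JPEG or PNG images and reuses the DATASET_URI, ENDPOINT_URL, REGION, and cfg values defined above:

    Python
    import torch
    import torchvision
    from torch.utils.data import DataLoader
    from s3torchconnector import S3IterableDataset

    def object_to_tensor(obj):
        # Each dataset item exposes the object's bytes via read();
        # decode them into a CHW uint8 image tensor.
        data = torch.frombuffer(bytearray(obj.read()), dtype=torch.uint8)
        return torchvision.io.decode_image(data)

    dataset = S3IterableDataset.from_prefix(
        DATASET_URI,
        endpoint=ENDPOINT_URL,
        region=REGION,
        s3client_config=cfg,
        transform=object_to_tensor,
    )

    # batch_size=1 avoids collation errors when the images have different sizes.
    loader = DataLoader(dataset, batch_size=1)
    for batch in loader:
        print(batch.shape)
        break

    The same transform argument is also available on S3MapDataset.from_prefix for map-style datasets.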

    Saving & Loading Checkpoints Directly to e2

    In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.

    Python
    from s3torchconnector import S3Checkpoint, S3ClientConfig
    import torchvision
    import torch

    # Update <BUCKET>, <PREFIX>, <e2_ENDPOINT>, and <e2_REGION> with your own values.
    ENDPOINT_URL = "https://<e2_ENDPOINT>"
    CHECKPOINT_URI = "s3://<BUCKET>/<PREFIX>/"
    REGION = "<e2_REGION>"

    # Optional client settings; the defaults work for most cases.
    cfg = S3ClientConfig()

    checkpoint = S3Checkpoint(region=REGION, endpoint=ENDPOINT_URL, s3client_config=cfg)

    model = torchvision.models.resnet18()

    # Save checkpoint to S3
    with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
        torch.save(model.state_dict(), writer)

    # Load checkpoint from S3
    with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
        state_dict = torch.load(reader)

    model.load_state_dict(state_dict)
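
    The writer and reader accept anything torch.save can serialize, so you can also checkpoint the full training state rather than only the model weights. A minimal sketch, reusing the checkpoint object and CHECKPOINT_URI above and assuming an optimizer and epoch counter from your own training loop:

    Python
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    epoch = 1  # e.g., the epoch your training loop just finished

    # Save model weights, optimizer state, and the epoch number as one object.
    with checkpoint.writer(CHECKPOINT_URI + f"epoch{epoch}.ckpt") as writer:
        torch.save(
            {
                "epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
            },
            writer,
        )

    # Restore everything later to resume training where you left off.
    with checkpoint.reader(CHECKPOINT_URI + f"epoch{epoch}.ckpt") as reader:
        ckpt = torch.load(reader)

    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    start_epoch = ckpt["epoch"] + 1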