Integrate PyTorch with IDrive® e2

    PyTorch is an open-source deep learning framework that offers dynamic computation graphs, flexible model building, and efficient training across CPUs and GPUs. It supports a wide range of AI applications, including computer vision, natural language processing, and generative models. Integrating PyTorch with IDrive® e2 pairs these capabilities with scalable, secure, and cost-effective cloud object storage for both research and production workloads.

    Prerequisites

    Before you begin, ensure the following:

    1. An active IDrive® e2 account. Sign up here if you do not have one.
    2. A bucket in IDrive® e2. See how to create a bucket.
    3. Valid Access Key ID and Secret Access Key. Learn how to create an access key.
    4. Python: 3.8 – 3.13
    5. PyTorch: ≥ 2.0
    6. AWS CLI installed and configured (aws configure)

    The following steps describe how to configure PyTorch with IDrive® e2 cloud object storage.


    Install the S3 Connector for PyTorch

    Shell
    pip install s3torchconnector torchvision torch

    Note: Pre-built wheels are available for Linux and macOS. On Windows, you may need to build from source (see GitHub repo).
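
    After the installation completes, a quick check (a minimal sketch using only the standard library) confirms that the packages are importable and reports their installed versions:

    Python
    from importlib.metadata import version

    # Report the installed versions of the connector and PyTorch packages.
    for pkg in ("s3torchconnector", "torch", "torchvision"):
        print(pkg, version(pkg))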

    Configure Credentials

    You can authenticate in either of the following ways:

    1. AWS CLI profile:
      Shell
      aws configure --profile myprofile

      Then set:
      Shell
      export AWS_PROFILE=myprofile
    2. Environment variables:
      Shell
      export AWS_ACCESS_KEY_ID=YOUR_KEY
      export AWS_SECRET_ACCESS_KEY=YOUR_SECRET
      export AWS_DEFAULT_REGION=us-la-1
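
    The connector resolves credentials through the standard AWS credential provider chain, so either approach works without code changes. If you prefer to set the values from Python (for example, in a notebook), the following minimal sketch exports the same variables programmatically, assuming the placeholder values shown above:

    Python
    import os

    # Set the same variables programmatically before constructing any S3 datasets
    # or checkpoints; replace the placeholder values with your own e2 credentials.
    os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_KEY"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET"
    os.environ["AWS_DEFAULT_REGION"] = "us-la-1"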

    Loading Data from S3 into PyTorch

    The simplest way to use the S3 Connector for PyTorch is to construct a dataset, either map-style or iterable-style, by specifying an S3 URI (a bucket and optional prefix), the region where the bucket is located, and the IDrive® e2 endpoint URL.

    Example: Iterable-style and Map-style Datasets
    Python
    from s3torchconnector import S3MapDataset, S3IterableDataset, S3ClientConfig

    # Update <BUCKET>, <PREFIX>, <e2_ENDPOINT>, and <e2_REGION> with your own values.
    ENDPOINT_URL = "https://<e2_ENDPOINT>"
    DATASET_URI = "s3://<BUCKET>/<PREFIX>"
    REGION = "<e2_REGION>"

    # Optional client settings (part size, throughput target, etc.); the defaults work for most cases.
    cfg = S3ClientConfig()

    iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT_URL, region=REGION, s3client_config=cfg)

    # Datasets are also iterators.
    for item in iterable_dataset:
        print(item.key)

    # S3MapDataset supports random access by listing all objects under the given prefix.
    # The listing happens on the first access to an element or the first call to len(),
    # whichever comes first. For large prefixes this can take a while and may appear
    # unresponsive.

    map_dataset = S3MapDataset.from_prefix(DATASET_URI, endpoint=ENDPOINT_URL, region=REGION, s3client_config=cfg)
    # Randomly access an item in map_dataset.
    item = map_dataset[0]

    # Inspect the bucket, key, and content of the object.
    bucket = item.bucket
    key = item.key
    content = item.read()
    print(bucket, key, len(content))
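
    In practice, you will usually pass a transform to from_prefix so each S3 object is decoded into a tensor, and then wrap the dataset in a torch.utils.data.DataLoader for batching. The sketch below assumes the prefix contains JPEG or PNG images and reuses the DATASET_URI, ENDPOINT_URL, REGION, and cfg values defined above:

    Python
    import torch
    import torchvision
    from torch.utils.data import DataLoader
    from s3torchconnector import S3IterableDataset

    def object_to_tensor(obj):
        # Each dataset item exposes the object's bytes via read();
        # decode them into a CHW uint8 image tensor.
        data = torch.frombuffer(bytearray(obj.read()), dtype=torch.uint8)
        return torchvision.io.decode_image(data)

    dataset = S3IterableDataset.from_prefix(
        DATASET_URI,
        endpoint=ENDPOINT_URL,
        region=REGION,
        s3client_config=cfg,
        transform=object_to_tensor,
    )

    # batch_size=1 avoids collation errors when the images have different sizes.
    loader = DataLoader(dataset, batch_size=1)
    for batch in loader:
        print(batch.shape)
        break

    The same transform argument is also available on S3MapDataset.from_prefix for map-style datasets.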

    Saving & Loading Checkpoints Directly to e2

    In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.

    Python
    from s3torchconnector import S3Checkpoint, S3ClientConfig
    import torchvision
    import torch

    # Update <BUCKET>, <PREFIX>, <e2_ENDPOINT>, and <e2_REGION> with your own values.
    ENDPOINT_URL = "https://<e2_ENDPOINT>"
    CHECKPOINT_URI = "s3://<BUCKET>/<PREFIX>/"
    REGION = "<e2_REGION>"

    # Optional client settings; the defaults work for most cases.
    cfg = S3ClientConfig()

    checkpoint = S3Checkpoint(region=REGION, endpoint=ENDPOINT_URL, s3client_config=cfg)

    model = torchvision.models.resnet18()

    # Save checkpoint to S3
    with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
        torch.save(model.state_dict(), writer)

    # Load checkpoint from S3
    with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
        state_dict = torch.load(reader)

    model.load_state_dict(state_dict)
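
    The writer and reader accept anything torch.save can serialize, so you can also checkpoint the full training state rather than only the model weights. A minimal sketch, reusing the checkpoint object and CHECKPOINT_URI above and assuming an optimizer and epoch counter from your own training loop:

    Python
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    epoch = 1  # e.g., the epoch your training loop just finished

    # Save model weights, optimizer state, and the epoch number as one object.
    with checkpoint.writer(CHECKPOINT_URI + f"epoch{epoch}.ckpt") as writer:
        torch.save(
            {
                "epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict(),
            },
            writer,
        )

    # Restore everything later to resume training where you left off.
    with checkpoint.reader(CHECKPOINT_URI + f"epoch{epoch}.ckpt") as reader:
        ckpt = torch.load(reader)

    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    start_epoch = ckpt["epoch"] + 1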