Pluralis Research

    Senior Platform Engineer

    Posted 11/6/2025
    Senior Level
    Full-time
    Technology
    Infrastructure-as-Code
    Python Engineering
    Container & GPU
    Networking
    ML Infrastructure

    Job Description

    Overview

    Pluralis Research is pioneering Protocol Learning, a fully decentralised way to train and deploy AI models that opens this layer to individuals rather than well-resourced corporates. By pooling compute from many participants, incentivising their efforts, and preventing any single party from controlling a model’s full weights, we’re creating a genuinely open, collaborative path to frontier-scale AI.

    We’re looking for a Senior Platform Engineer with startup experience, or senior DevOps experience in big tech, and a passion for ML, to help scale and own our systems infrastructure, orchestration, and services integration.

    Responsibilities

    • Multi-Cloud Infrastructure: Design resource management systems that provision and orchestrate compute across AWS, GCP, and Azure using infrastructure-as-code (Pulumi/Terraform). Handle dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes.
    • Distributed Training Systems: Architect fault-tolerant infrastructure for distributed ML: GPU clusters, NVIDIA runtime, S3 checkpointing, large dataset management and streaming, health monitoring, and resilient retry strategies.
    • Real-World Networking: Build systems that simulate and handle real-world network conditions (bandwidth shaping, latency injection, packet loss) while managing dynamic node churn and ensuring efficient data flow across workers with heterogeneous connectivity. Our training happens on consumer nodes and non-co-located infrastructure, not in a datacenter.

    What You’ll Bring

    Ideally, you’ll have 5+ years of professional experience, with depth in:

    • Infrastructure-as-Code: Production Pulumi/Terraform/CloudFormation managing multi-cloud deployments. Lifecycle orchestration, automated provisioning, self-healing systems at scale.
    • Python Engineering: Idiomatic async Python with error handling, retry logic, concurrent execution. Asyncio, SSH libraries, cloud SDKs, CLI tools.
    • Container & GPU: Docker, Kubernetes/EKS, GPU workloads, heterogeneous clusters. Multi-GPU optimization, resource scheduling.
    • Networking: Decentralized topologies and routing, NAT hole punching, P2P multi-address coordination, traffic shaping, real-world bandwidth constraints.
    • ML Infrastructure: Distributed training workflows, checkpoint management, data sharding, model versioning, long-running job operations.
    • Observability & SRE: Monitoring systems (Prometheus/Grafana), logging, SLOs, incident response, bottleneck profiling, performance optimization.
    What we’re looking for

    • Experience in a startup environment with an emphasis on micro-services orchestration, or a big tech background
    • Deep understanding of multi-cloud infra & distributed training systems
    • A team player with high attention to detail
    • A strong passion to work at the intersection of AI and decentralized systems

    Backed by Union Square Ventures and other tier-1 investors, we’re a world-class, deeply technical team of ML researchers. Pluralis is unapologetically ideological. We view the world as a better place if we are able to implement what we are attempting, and we see Protocol Learning as the only plausible approach to preventing a handful of massive corporations from monopolising model development, access and release, and from achieving massive economic capture. If this resonates, please apply.
