Aalyria

    Staff Site Reliability Engineer - Spacetime

    Aalyria
    Posted 11/10/2025
    Lead/Manager
    Full-time
    Technology
    Site Reliability Engineering
    Observability
    Prometheus
    OpenTelemetry
    Terraform

    Job Description

    Role Overview:

    This isn't a "keep the lights on" SRE role. It is a strategic, high-impact opportunity to build the nervous system for a platform that transforms how networks of satellites, ground stations, and fleets are interconnected and orchestrated. You will build the core observability stack that ensures the reliability of systems critical to the operation of satellite megaconstellations and deep-space missions. This is a greenfield/brownfield opportunity: you will be the foundational expert, defining the strategy and building the tools that empower our engineers. You will own the roadmap to mature our observability stack into a robust, scalable, and insightful platform based on best-in-class technologies (e.g., Prometheus and OpenTelemetry). If you are an SRE who thrives on platform-building challenges and wants to own a production-grade observability stack from the ground up, this role is for you. Note: this role includes on-call responsibilities.

    Key Responsibilities:

    • Design, build, and own the technical roadmap for Aalyria's centralized observability platform, integrating and scaling tools for metrics (Prometheus), logging (Loki), and distributed tracing (Tempo/OpenTelemetry).
    • Define, implement, and manage a robust framework of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for our core products, ensuring we are launch-ready.
    • Establish and evangelize observability best practices, providing standards, documentation, and tooling (e.g., OpenTelemetry libraries) to empower our Go and Java application teams to instrument their services effectively.
    • Partner with core software engineers to provide the tools and insights needed to debug performance, optimize computational pipelines (including CPU/GPU workloads), and ensure the reliability of large-scale distributed systems.
    • Automate the deployment, scaling, and management of the entire observability stack using Infrastructure as Code (Terraform) and GitOps principles (ArgoCD).
    • Partner closely with the core infrastructure team to ensure deep visibility into our Kubernetes clusters and underlying GCP and AWS environments.
    • Develop and lead the company's monitoring, alerting, and incident response strategy, driving a culture of proactive reliability and blameless post-mortems.

    Required Qualifications:

    • 7+ years of experience in an SRE or platform engineering role, with a focus on observability for large-scale, distributed compute or network systems.
    • Deep, hands-on expertise building, scaling, and managing observability platforms (e.g., Prometheus, Grafana, Loki/ELK, OpenTelemetry, Tempo/Jaeger, Honeycomb), with proven experience using these tools to support performance analysis and debugging of complex distributed systems.
    • Strong production-level experience with Google Cloud Platform (GCP) and Kubernetes.
    • Proven mastery of Infrastructure as Code (IaC) with Terraform and GitOps principles (e.g., ArgoCD).
    • Proficiency in a systems programming language, with a strong preference for Go and Python for debugging and writing tooling.
    • Demonstrable experience defining, implementing, and managing SLOs, SLIs, and error budgets for production services.

    Preferred Qualifications:

    • Experience operating a multi-cloud environment, specifically GCP and AWS.
    • Hands-on experience with GitLab CI for CI/CD pipelines.
    • Working knowledge of service mesh technologies such as Istio or Linkerd.
    • Experience with high-performance computing (HPC) environments and instrumenting numerical optimization workloads.
    • Familiarity with instrumenting applications written in Golang and C++.
    • Experience with JVM observability (tuning, monitoring) for Java-based applications.
    • An active Secret clearance, or higher, is preferred for this position.

    What We Offer:

    • Innovative Environment: Work at a cutting-edge company shaping the future of aerospace communications.
    • Impactful Work: Directly contribute to critical national security programs and initiatives.
    • Growth Opportunities: Expand your career with opportunities for professional development and advancement.
    • Inclusive Culture: Be part of a collaborative, supportive, and inclusive workplace where your contributions matter.
    • Flexibility: Flexible working arrangements including hybrid remote/in-office schedules.
    • Compensation and Equity: Competitive salary, comprehensive benefits (401(k), dental, vision, health, life insurance), paid time off, and equity options.

    ITAR/EAR Requirements:

    This position involves access to export-controlled information. To comply with U.S. government export regulations, applicants must meet one of the following criteria:

    (A) Qualify as a U.S. person, which includes:

    • U.S. citizen or national
    • U.S. lawful permanent resident (green card holder)
    • Refugee under 8 U.S.C. 1157
    • Asylee under 8 U.S.C. 1158

    (B) Be eligible to access export-controlled information without requiring an export authorization.

    (C) Be eligible and reasonably likely to obtain the necessary export authorization from the appropriate U.S. government agency.

    The company reserves the right to decline pursuing an export licensing process for legitimate business-related reasons.

    Equal Opportunity Employer Statement:

    Aalyria is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based on race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), national origin, age, disability status, genetic information, protected veteran status, or any other characteristic protected by law. Qualified applicants from all backgrounds are encouraged to apply.
