
This isn't a "keep the lights on" SRE role. This is a strategic, high-impact opportunity to build the nervous system for a platform that transforms how networks of satellites, ground stations, and fleets are interconnected and orchestrated. You will be building the core observability stack that ensures the reliability of systems critical to the operation of satellite megaconstellations and missions to deep space. This is a greenfield/brownfield opportunity. You will be a trusted expert, helping to define and implement the strategy and building the tools that empower our engineers. You will support the roadmap to mature our observability stack, moving from cloud-native tools to a robust, scalable, and insightful platform built on best-in-class technologies (Prometheus, OpenTelemetry, etc.). If you are an SRE who thrives on platform-building challenges and wants to be relied upon to build a production-grade observability stack from the ground up, this role is for you. Note: this role includes on-call responsibilities.
Help design and build Aalyria's centralized observability platform, integrating and scaling tools for metrics (e.g., Prometheus), logging (e.g., Loki), and distributed tracing (e.g., Tempo/OpenTelemetry).
Define, implement, and manage a robust framework of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for our core products, ensuring we are launch-ready.
Partner with SWEs to implement observability best practices, develop standard templates and documentation, and configure tooling (e.g., OpenTelemetry libraries).
Automate the deployment, scaling, and management of the entire observability stack using Infrastructure as Code (e.g., Terraform) and GitOps principles (e.g., ArgoCD).
Partner closely with the core infrastructure team to ensure deep visibility into our Kubernetes clusters and underlying GCP and AWS environments.
Develop and lead the company's monitoring, alerting, and incident response strategy, driving a culture of proactive reliability and blameless post-mortems.
4+ years of experience in an SRE or platform engineering role, with a focus on observability for large-scale, distributed compute or network systems.
Deep, hands-on expertise building, scaling, and managing observability platforms (e.g., Prometheus, Grafana, Loki/ELK, OpenTelemetry, Tempo/Jaeger, Honeycomb), with proven experience using these tools to support performance analysis and debugging of complex distributed systems.
Strong production-level experience with Google Cloud Platform (GCP) and Kubernetes.
Experience using Infrastructure as Code (IaC) and GitOps principles (e.g., ArgoCD).
Proficiency in a systems programming language for debugging and writing tooling, with a strong preference for Go and Python.
Demonstrable experience defining, implementing, and managing SLOs, SLIs, and error budgets for production services in high-availability distributed systems.
(A) U.S. citizen or national; U.S. lawful permanent resident (green card holder); refugee under 8 U.S.C. 1157; or asylee under 8 U.S.C. 1158.
(B) Be eligible to access export-controlled information without requiring an export authorization.
(C) Be eligible and reasonably likely to obtain the necessary export authorization from the appropriate U.S. government agency.
The company reserves the right to decline pursuing an export licensing process for legitimate business-related reasons.
Aalyria is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based on race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), national origin, age, disability status, genetic information, protected veteran status, or any other characteristic protected by law. Qualified applicants from all backgrounds are encouraged to apply.