
We are seeking a Senior Platform/DevOps Engineer to join our growing Platform Engineering team. This role will focus on building and maintaining automation, infrastructure as code, and platform tooling that enables our development teams to ship reliable software quickly across our multi-cloud infrastructure (Azure/AKS and AWS/EKS).
Core Responsibilities
Automation & Infrastructure
Design, implement, and maintain infrastructure automation using Terraform/OpenTofu
Build, optimize, and improve CI/CD pipelines and processes using GitHub Actions for .NET, Python, and Go applications
Improve developer experience through workflow automation and tooling enhancements
Develop processes to enable better testing and debugging of production issues in lower environments
Develop Infrastructure as Code patterns using Kustomize and Helm for Kubernetes deployments
Implement GitOps workflows using Flux for declarative infrastructure management
Create self-service platform capabilities that empower development teams
Automate operational tasks to reduce manual overhead and improve reliability

Platform Engineering
Manage and optimize Kubernetes clusters (Azure AKS) across multiple environments (CI, QA, RC, Production)
Contribute to maintaining and upgrading existing Azure infrastructure
Contribute to Azure B2C authentication replacement/upgrade initiative planned for early 2026
Contribute to AWS/EKS infrastructure research, planning, and buildout initiatives for 2026 expansion
Design and implement platform services and tools that improve developer productivity
Build and maintain observability infrastructure (Grafana, Prometheus, Loki, Tempo)
Establish platform engineering best practices and standards across both cloud providers
Collaborate with application teams to understand platform requirements
Optimize resource utilization and cost efficiency across Azure and AWS infrastructure

Documentation & Knowledge Sharing
Create comprehensive documentation for platform services, tools, and processes
Develop runbooks and troubleshooting guides for operational procedures
Build knowledge base for platform operations and best practices
Conduct knowledge sharing sessions with team members and application developers
Document architecture decisions and infrastructure patterns
Maintain up-to-date system diagrams and technical documentation

Operations & Reliability
Participate in on-call rotation for platform infrastructure support (required)
Investigate and resolve infrastructure incidents
Perform root cause analysis and implement preventive measures
Monitor platform health and proactively address issues
Contribute to incident response and post-mortem processes
Technical Skills
5+ years of experience with Azure cloud services (Azure primary focus)
5+ years of hands-on experience with Kubernetes
Experience with AWS services and willingness to lead AWS/EKS expansion initiatives
Deep understanding of Kubernetes architecture, networking, storage, and security
Production experience with container orchestration and microservices architectures
Multi-cloud architecture understanding and cross-cloud portability considerations

Expert proficiency with Terraform or OpenTofu
Strong experience with Kustomize and Helm for Kubernetes deployments
Experience with GitOps methodologies and tools (Flux, ArgoCD, or similar)
Understanding of declarative infrastructure management

Strong experience with GitHub and GitHub Actions
Proven track record of building and optimizing CI/CD pipelines
Experience automating operational tasks using scripting (Bash, Python, or Go)
Understanding of automated testing strategies and deployment patterns

Experience supporting .NET applications in production environments
Experience with JavaScript services and deployment patterns
Familiarity with Python application deployment and runtime requirements
Understanding of application observability and monitoring needs

Strong understanding of DevOps principles and methodologies
Experience with monitoring and observability tools (Prometheus, Grafana, or similar)
Knowledge of log aggregation systems (Loki, ELK, or similar)
Understanding of distributed tracing concepts and tools

Professional Skills
Exceptional technical writing skills with ability to create clear, comprehensive documentation
Strong verbal communication skills for knowledge sharing and collaboration
Experience creating runbooks, architecture diagrams, and technical specifications
Ability to explain complex technical concepts to various audiences

Strong analytical and troubleshooting skills
Proactive approach to identifying and solving problems
Curiosity-driven mindset for discovering better solutions and practices
Ability to balance pragmatic solutions with long-term architectural considerations

Experience mentoring junior engineers and sharing knowledge
Collaborative working style with ability to work independently
Strong stakeholder management skills
Experience working in cross-functional teams

Experience with on-call rotations and incident response
Understanding of SRE principles and practices
Focus on reliability, availability, and performance
Experience with capacity planning and performance optimization

Preferred Qualifications
Experience with service mesh technologies (Istio, Linkerd)
Knowledge of Kubernetes operators and custom resource definitions (CRDs)
Experience with distributed tracing systems (Tempo, Jaeger)
Familiarity with policy enforcement tools (OPA, Kyverno)
Experience with secrets management (Azure Key Vault, Vault, Sealed Secrets)
Experience with advanced deployment strategies (Blue/Green, Canary, automated rollbacks)

Certifications: Azure Administrator Associate, Azure DevOps Engineer Expert, CKA/CKAD
Experience with Active Directory and Azure Active Directory (Entra ID)
Experience with Spacelift or similar infrastructure orchestration platforms
Cloud cost optimization experience and financial operations (FinOps) practices
Experience with security scanning and compliance tooling
Background in software development or site reliability engineering
Experience with AI-powered tooling and workflow automation platforms
Technical writing and standards documentation experience

Experience in [relevant industry vertical]
Understanding of compliance requirements (SOC 2, FedRAMP, etc.)
Experience with multi-region deployments and disaster recovery
Knowledge of networking fundamentals and Azure networking services