Senior Site Reliability Engineer (SRE) at Articul8, Dublin, CA

  • Articul8
  • Dublin, CA

Job Description

About Us

Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.

Position Overview

We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.

Key Responsibilities
  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.
  • Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
  • Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
  • Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
  • Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
  • Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
  • Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads.
  • Implement and enforce security best practices across all systems and environments.
  • Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Qualifications

Required
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
  • 5+ years of experience in DevOps, SRE, or similar roles
  • Strong experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
  • Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
  • Solid background in containerization technologies (Docker, Kubernetes)
  • Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
  • Strong understanding of CI/CD pipelines and automation
  • Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems
Preferred
  • Experience supporting AI/ML systems in production
  • Knowledge of GPU infrastructure management and optimization
  • Familiarity with distributed systems and high-performance computing
  • Experience with database systems (SQL and NoSQL)
  • Certifications in cloud platforms (AWS, GCP, Azure)
  • Experience with chaos engineering and resilience testing
  • Knowledge of security best practices and compliance requirements

Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow's AI at Articul8 AI!
