Senior Site Reliability Engineer (SRE) - (Dublin, CA) Job at Articul8, Dublin, CA

  • Articul8
  • Dublin, CA

Job Description

About Us

Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.

Position Overview

We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to maintain our service reliability objectives while supporting rapid innovation.

Key Responsibilities
  • Architect and maintain scalable, highly available infrastructure for our GenAI platform.
  • Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
  • Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
  • Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
  • Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
  • Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
  • Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
  • Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads.
  • Implement and enforce security best practices across all systems and environments.
  • Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.

Qualifications

Required
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience
  • 5+ years of experience in DevOps, SRE, or similar roles
  • Strong experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
  • Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
  • Solid background in containerization technologies (Docker, Kubernetes)
  • Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
  • Strong understanding of CI/CD pipelines and automation
  • Exceptional problem-solving skills and the ability to troubleshoot complex systems

Preferred
  • Experience supporting AI/ML systems in production
  • Knowledge of GPU infrastructure management and optimization
  • Familiarity with distributed systems and high-performance computing
  • Experience with database systems (SQL and NoSQL)
  • Certifications in cloud platforms (AWS, GCP, Azure)
  • Experience with chaos engineering and resilience testing
  • Knowledge of security best practices and compliance requirements

Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow's AI at Articul8 AI!
