SRE (Site Reliability Engineer) Job at Diverse Lynx, Dallas, TX

  • Diverse Lynx
  • Dallas, TX

Job Description

SRE (Site Reliability Engineer)

Required Skills (please screen candidates against the skills below and do not share generic SRE/Site Reliability Engineer resumes):
  • Bachelor's degree in computer science, information technology, or a related field.
  • 8+ years in software or operations engineering
  • 3+ years of DevOps and Site Reliability Engineering: proven experience working on large-scale, cloud-based, enterprise-level software platforms and a deep understanding of multi-cloud architectures
  • 3+ years of practical experience with Infrastructure-as-Code and CI/CD tools such as Terraform, GitHub Actions, and the like
  • 2+ years of documented hands-on experience with Azure (Azure ML is a plus)
  • 2+ years of practical experience in containerization technologies (Kubernetes, Docker) and orchestration
  • 2+ years of practical experience in scripting and automation
  • Advanced proficiency in scripting languages such as Bash to support automation and system integration efforts.
Job Description:
  • Act as production gatekeeper for all changes (product and infrastructure)
  • Perform detailed root cause analysis on recurring system issues and work with the engineering team on permanent solutions
  • Provide Tier 2 application/platform support for client AI applications
  • Participate in periodic on-call rotations and be available outside normal business hours, including evenings and weekends, during critical production releases or issue escalation periods
  • The Site Reliability Engineer (SRE) is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning
  • This role will be a member of a team that focuses on DevOps, DevSecOps, and SRE for the client AI organization
  • The role drives continuous improvement in delivery of resilient, scalable, performant, secure, and high-quality cloud-native services
  • Collaborating with SecOps and development teams, the SRE identifies cross-team issues that create operational risk across the organization and resolves them with a mixture of engineering, troubleshooting expertise, and general operational guidance
  • Proactively drive improvement of enterprise cloud capabilities while creating best practices and tools to empower developers to create, deploy, and operationally support services
  • As a key contributor in the organization, this role is responsible for working with the Principal SRE and guiding junior team members in DevOps culture, highly scalable architectures, and lean development using agile practices
  • Educate yourself and others on anything that helps service teams build, test, deploy, and run their services more quickly, easily, and reliably
  • Plan, design, deploy, and operate Site Reliability Engineering capabilities for cloud products & services
  • Recognize and address sub-standard performance based on key performance indicators (KPIs)
  • Build monitoring that alerts on symptoms rather than outages
  • Continuously build, automate, and improve upon capabilities that are secure, scalable, performant, and resilient
  • Work closely with Infrastructure, Network, Security, Architecture, and Development teams to build highly performing, scalable, and secure Azure/AWS/GCP (cloud) environments
  • Define needs by documenting processes; includes research, planning, and writing supporting documentation
  • Participate in regulatory and compliance activities as necessary
  • Remediate security vulnerabilities discovered in non-production and production scans.
  • Participate in new vendor/product/service onboarding and assess partner technical readiness (such as Azure AI Studio, Azure model catalog, AWS SageMaker).
  • Develop or maintain dashboards for operational analysis and status reports.
  • Perform operational readiness testing for every release package to proactively predict performance degradation across all components of a critical asset (for example, portal, workspace creation, project creation, model inference, and API response times).

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.

Job Tags

Permanent employment, Weekend work, Afternoon shift
