DevOps & SRE

Staff Site Reliability Engineer

An established hedge fund firm is looking for a senior SRE to join their Singapore office.

Position: Staff Site Reliability Engineer (Bare Metal Experience) - Hedge Fund Domain

Location: Singapore

Company Overview:

Our client is an established hedge fund in Singapore. With a strong track record and a commitment to innovation, they use technology and data-driven insights to drive investment strategies and deliver exceptional returns to their clients.

Job Description:

We are seeking a talented Staff Site Reliability Engineer (SRE) with expertise in bare metal infrastructure to join our client's team. As a Staff SRE, you will play a critical role in designing, building, and maintaining a reliable, scalable, and performant infrastructure stack, with a focus on bare metal environments. You will work closely with cross-functional teams to keep the firm's trading systems running smoothly and to support its mission-critical operations.

Responsibilities:

- Design, implement, and maintain highly available, fault-tolerant infrastructure solutions for our trading systems, with a focus on bare metal environments.

- Collaborate with engineering teams to define reliability and performance requirements, and implement solutions to meet those requirements.

- Develop and maintain automation tools and frameworks for infrastructure provisioning, configuration management, and monitoring.

- Implement and maintain robust disaster recovery and failover mechanisms to ensure business continuity.

- Lead efforts to identify and address performance bottlenecks, scalability challenges, and other operational issues.

- Define and enforce best practices for infrastructure security, compliance, and operational excellence.

- Mentor and coach junior members of the SRE team and contribute to the overall technical growth of the organization.

Location:

Singapore, Remote

Requirements:

- Bachelor's or advanced degree in Computer Science, Engineering, or a related field.

- Extensive experience as a Site Reliability Engineer, or in a similar role, within the hedge fund, financial services, or high-frequency trading domain.

- Strong expertise in designing, building, and managing bare metal infrastructure at scale.

- Proficiency in scripting and automation using languages such as Python, Bash, or similar.

- Experience with configuration management tools (e.g., Ansible, Puppet, Chef) and infrastructure as code (IaC) principles.

- Deep understanding of networking concepts, protocols, and technologies.

- Strong problem-solving skills and ability to troubleshoot complex issues in distributed systems.

- Excellent communication and collaboration skills, with the ability to work effectively in a fast-paced, dynamic environment.

Good to have:

- Exposure to HFT or a similar trading background.

- Previous exposure to K8s operators.

- Proficiency in containerisation technologies.

- Understanding of security principles from both operational and implementation perspectives.