Montash have a long-term requirement for a Site Reliability Engineer to support a financial services client on their Hadoop ecosystem.
Role: Site Reliability Engineer
Length: 6 months initially, with a likely extension to 9-12 months
Client Location: Brussels, Belgium
Remote Work: 100% remote until COVID-19 restrictions are lifted. Thereafter, the role will be mostly on-site in Brussels, Belgium.
Start Date: Mid-to-late May or early June (flexible)
Project Language: English (French or Dutch language skills are a bonus)
Working with a small team of Engineers & Administrators, you will be responsible for the latency, efficiency, emergency response, monitoring, capacity planning, change management, availability and performance of the Hadoop big data platform, with a focus on the production environment.
* 50% focus on operational management of the system (monitoring, incident management, manual tasks)
* 50% focus on improving the automation and scalability of the system
The team's current focus is on achieving a continuously stable system and, eventually, a fully automated environment that can run with minimal intervention. The system is built on IEP, Hadoop distribution and underlying hardware.
Aside from previous experience as a Site Reliability Engineer (SRE), you may have had job titles such as:
Hadoop system engineer / Linux system administrator / Hadoop admin
Experience & Skills
* Cloudera Hadoop
* Troubleshooting incidents and problems on the Hadoop ecosystem and infrastructure
* Daily maintenance of the Hadoop system
* Optimisation of the reliability and performance of a big data platform
* Disaster recovery setup
* Linux administration and troubleshooting
* Red Hat system administration
* Defining KPIs to monitor complex systems and drive improvements
Please apply via this advert for immediate consideration.