Job Description
What We Do
What sets our group apart is end-to-end ownership of our models and services, which are distributed, high-throughput, low-latency systems that are collectively called billions of times a day. In order to deliver at such scale, we are building platforms that enable our application-focused ML engineering teams to go from an idea to a model to a scalable service with minimal overhead. We also offer higher-level abstractions and UIs that enable domain experts to easily build, deploy, and maintain production ML models for their applications in a self-service manner, with little engineering intervention.
What We Need From You
As a Senior Data Engineer on the team, you will have the opportunity to enhance the data pipelines and platforms that enable machine learning tasks. You will work with application, product, platform, and infrastructure teams to extract data from Bloomberg’s vast data ecosystem and prepare it for analysis, annotation, training, evaluation, serving, and other tasks. Typical activities include:
- Work with application and product teams to define data needs and SLAs
- Design, build, and deploy resilient, monitorable pipelines to transport and store data in various storage solutions
- Identify the appropriate storage solution for a given use case, understanding the tradeoffs in performance, cost, and complexity
- Model datasets with idiomatic schema design, tailored for a dataset’s intended use
- Design and implement efficient data retrieval and processing solutions suited for our ML environments
- Collaborate with platform and infrastructure teams to influence the direction and support of various managed solutions
Colleagues who excel in this role often exhibit these qualities:
- Experience with message queueing systems such as Kafka
- Proven track record building ETL pipelines leveraging technologies such as Kafka Streams, Kafka Connect, Flink, or Spark Streaming
- Prior experience building or extending data lakes, using various storage, cataloging, and retrieval technologies such as S3, HDFS, Hadoop, HBase, Hive, Trino, Presto, Cassandra, Spark
- Experience instrumenting pipelines and the ability to quickly pinpoint problems from dashboards and logs
- Proficiency in a programming language such as Scala, Java, or Python, and a willingness to learn new languages
- Familiarity with a workflow orchestration technology, such as Argo, Airflow, or Oozie
- Excellent communication skills and a willingness to collaborate with various stakeholders
This position requires at least one of the following:
- A bachelor’s degree in computer science or a related field, or
- An equivalent combination of education, specialized training, and/or related professional experience
Job ID: 123050