Description
The engineer works mainly on (batch or near-real-time) data processing pipelines that run on the data platform and process data into the data lake and/or data warehouse (DWH). Data is first ingested into the platform via a landing zone (AWS S3) and from there made available in the (cloud) data lake and data warehouse, where it is used to build domain-oriented data products for AI/ML, BI and data integration use cases.
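By way of illustration only, the sketch below shows this flow in Snowflake SQL: the S3 landing zone is exposed as an external stage and the landed Parquet files are copied into a raw warehouse table. All bucket, schema and table names are hypothetical, and a real setup would typically authenticate via a storage integration.

```sql
-- Illustrative only: expose the (hypothetical) S3 landing zone as an external stage.
-- A production setup would normally reference a STORAGE_INTEGRATION for credentials.
CREATE STAGE IF NOT EXISTS raw.landing_sales
  URL = 's3://example-landing-zone/sales/'   -- hypothetical bucket/prefix
  FILE_FORMAT = (TYPE = PARQUET);

-- Copy the landed files into a raw table (assumed to exist with matching columns),
-- from where the data can be modelled into data products downstream.
COPY INTO raw.sales_orders
  FROM @raw.landing_sales
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;   -- map Parquet columns to table columns by name
```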
The engineer will be part of an agile engineering team focused on delivering business value while following architectural guidelines and best practices.
Way of working keywords: Agile principles, continuous improvement, peer reviews / pull requests, clean code, testing, … Good listening and communication skills (can communicate with both business and IT stakeholders).
Responsibilities
- Structuring and modelling data, e.g. schema design, dimensional & Data Vault modelling, … (see the modelling sketch after this list)
- Creating technical designs for data products that support AI, BI and/or data integration use cases.
- Building, maintaining and running end-to-end (E2E) data pipelines according to the defined architectural guidelines, development patterns and standards.
- Defining development standards, guidelines & best practices.
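As a purely illustrative example of the modelling work above, a minimal Data Vault pattern for a hypothetical customer entity is sketched below: a hub holding the business key and a satellite holding its descriptive attributes. All names, keys and data types are assumptions, not an existing model.

```sql
-- Illustrative Data Vault sketch (hypothetical names and types).
CREATE TABLE dv.hub_customer (
    hub_customer_hk  BINARY(32)     NOT NULL,  -- hash of the business key
    customer_bk      VARCHAR(50)    NOT NULL,  -- business key from the source system
    load_dts         TIMESTAMP_NTZ  NOT NULL,  -- load timestamp
    record_source    VARCHAR(100)   NOT NULL,  -- originating source system
    CONSTRAINT pk_hub_customer PRIMARY KEY (hub_customer_hk)
);

CREATE TABLE dv.sat_customer_details (
    hub_customer_hk  BINARY(32)     NOT NULL,  -- reference to the hub
    load_dts         TIMESTAMP_NTZ  NOT NULL,
    hash_diff        BINARY(32)     NOT NULL,  -- hash of the attributes, for change detection
    customer_name    VARCHAR(200),
    country_code     VARCHAR(2),
    record_source    VARCHAR(100)   NOT NULL,
    CONSTRAINT pk_sat_customer_details PRIMARY KEY (hub_customer_hk, load_dts)
);
```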
Skills & Expertise
Hands-on, proven experience with (“the basics”):
- SQL
- Python & PySpark
- Snowflake
- Airflow
- JSON, Parquet
- Data modelling, dimensional modelling, Data Vault modelling
- Building data pipelines with dbt (see the sketch after this list)
- Data processing in an AWS cloud environment: AWS services (S3, Glue, Athena, AppFlow, DMS, …) & IAM
- Agile delivery, CI/CD, Git
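For the dbt item in the list above, a minimal sketch of a staging model is shown here. It assumes a raw.sales_orders source is declared in the project's sources file; all column names and types are hypothetical.

```sql
-- models/staging/stg_sales_orders.sql (illustrative sketch)
with source as (

    -- assumes a 'raw' source with a 'sales_orders' table is declared in sources.yml
    select * from {{ source('raw', 'sales_orders') }}

),

renamed as (

    select
        order_id,
        customer_id,
        cast(order_ts as timestamp_ntz)     as ordered_at,
        cast(order_amount as number(18, 2)) as order_amount
    from source

)

select * from renamed
```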
Fluent in English (spoken and written); Dutch or French is a plus.
Experience with, basic knowledge of, or a strong interest in (“nice to have”):
- Qlik
- Data ingestion (e.g. CDC, API, …)
- Apache Kafka
- Apache Iceberg
- Event-based data processing
- Data product architectures
- Data quality metrics
- Data catalogue, data lineage tooling (e.g. DataHub)