The Data Engineer is a critical role responsible for designing, building, and maintaining timely and robust pipelines and data assets that enable data-driven insights and applications.
Working in close collaboration with other data teams, the Data Engineer seamlessly integrates new data assets with existing data products.
The Data Engineer maximizes productivity by developing flexible, reusable data pipeline code through dynamic configuration and templating, enabling rapid adaptation to new data sources and minimizing development overhead.
The Data Engineer maintains a high level of professional integrity by producing clean, modular, and well-documented code that promotes collaboration, reduces operational overhead, and accelerates future development efforts.
This role requires expertise in Google Cloud Platform (GCP) and its data services to effectively manage and process large-scale datasets.
Key Responsibilities:
- Data Ingestion and Processing: Design and implement robust data pipelines to collect, clean, transform, and store large volumes of data using GCP services such as BigQuery and Cloud Storage, as well as Databricks.
- Data Warehousing and Analytics: Build and optimize data warehouses on Google BigQuery for efficient data analysis and reporting.
- Data Modeling: Design and implement scalable data models to support strategic data products, business intelligence, and machine learning applications.
- ETL/ELT Development: Develop and maintain Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) processes using GCP tools.
- Performance Optimization: Continuously monitor and optimize data pipelines and queries for performance and cost-effectiveness.
- Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver data solutions.
Required Skills:
- Google Cloud Platform (GCP): Strong proficiency in GCP data services, including BigQuery and Cloud Storage, as well as Databricks. Experience with Dataflow is an asset.
- Data Engineering Tools: Hands-on experience with Cloud Composer, Apache Airflow, Apache Beam, or similar data pipeline and orchestration tools. Ability to author dynamic DAGs in Airflow is a key skill for this role.
- Programming Skills: Strong proficiency in Python for data processing and pipeline development is essential for this role.
- SQL: Expert-level SQL skills for data analysis and for developing and optimizing complex queries.
- Data Modeling: Expertise in data modeling techniques and schema design.
- Problem Solving: Ability to analyze and solve complex data engineering challenges.
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
- Minimum of 3 to 5 years of experience in a hands-on data engineering role, with a focus on pipeline development and data modeling on Google Cloud Platform.
- Experience with large-scale data processing and distributed systems.
- Google Cloud Professional Data Engineer Certification or equivalent is a strong asset for this role.
- Experience with machine learning and data science workflows.
- Knowledge of data visualization and reporting tools.
- Strong communication and collaboration skills.
Salary Range: $69,000 - $127,000