6–8 Years of Relevant Experience
We are looking for a highly skilled Senior AWS Data Engineer with extensive hands-on experience in building scalable, high-performance batch and streaming data pipelines. The ideal candidate will have deep expertise in Databricks, Apache Spark, Python/PySpark, and various AWS data services, with a strong focus on delivering production-grade data products at scale.
Key Responsibilities:
- Design, build, and optimize data pipelines and transformation workflows for both batch and real-time streaming data.
- Develop scalable, reliable, and maintainable data solutions using Databricks, PySpark, and AWS services.
- Architect and implement complex data processing workflows using orchestrators such as Apache Airflow and AWS Step Functions.
- Work with petabyte-scale data and ensure high-quality, observable data products aligned with modern data product principles.
- Integrate diverse data sources and design robust data architectures that support analytics, machine learning, and BI use cases.
- Collaborate with data scientists, analysts, and business stakeholders to translate requirements into data engineering deliverables.
- Ensure monitoring, logging, and alerting are in place using tools such as CloudWatch and AWS Powertools for reliability and observability.
Required Skills & Experience:
- Strong expertise in the AWS cloud platform, with hands-on experience in:
  - EMR, EC2, Lambda, Step Functions, SQS, CloudWatch, Airflow, AWS Powertools
- Extensive experience in Databricks, Apache Spark, Python, and PySpark
- Proven ability to build and deploy data pipelines for both batch and streaming use cases
- Solid understanding of data orchestration frameworks (e.g., Apache Airflow)
- Familiarity with key big data platform components, including:
  - Data catalogs
  - Compute engines
  - SQL engines
  - Observability/monitoring layers
- Experience in large-scale data engineering (handling petabyte-scale data)
- Strong understanding of data integration, ETL/ELT, and modern data architecture principles
Nice to Have:
- Experience with Scala
- Knowledge of Apache Kafka
- Familiarity with real-time event streaming architectures