6–8 Years of Relevant Experience
We are seeking a highly skilled and experienced Senior AWS Data Engineer to join our data engineering team. The ideal candidate will have a strong background in building modern data platforms on AWS, with hands-on experience in data lakes, data warehouses, orchestration tools, data quality frameworks, and CI/CD pipelines. This role plays a key part in designing scalable, secure, and efficient data solutions that support advanced analytics and data governance.
Key Responsibilities
- Design, develop, and maintain scalable data lake architectures on AWS using S3, Glue Data Catalog, Glue Jobs, and Apache Hudi for efficient storage and processing.
- Build and manage data ingestion, transformation, and ETL pipelines using AWS Glue and integrate with downstream analytics tools such as Amazon Athena and Redshift Spectrum.
- Architect and implement data warehouse solutions using Amazon Redshift Serverless, focusing on performance, cost-efficiency, and data accessibility.
- Implement data access control and governance using Lake Formation, IAM, Secrets Manager, and Secret Server, ensuring data security and compliance.
- Automate and orchestrate workflows using Apache Airflow (MWAA), ensuring robust scheduling, monitoring, and logging of data pipeline executions.
- Apply data quality frameworks using AWS Glue Data Quality to ensure completeness, accuracy, and integrity of data across systems.
- Set up and maintain CI/CD pipelines using GitHub and GitHub Actions, enabling continuous integration and deployment of data engineering solutions.
- Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and deliver reliable, scalable solutions.
- Monitor and optimize performance across data pipelines and queries, implementing best practices in partitioning, schema design, and lifecycle policies.
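For illustration, the partitioning best practices mentioned above typically mean writing lake files under Hive-style key prefixes that Glue and Athena can prune on. This is a minimal sketch, not a prescribed layout; the table and file names are hypothetical.

```python
from datetime import date

def partition_key(table: str, dt: date, filename: str) -> str:
    """Build a Hive-style S3 object key (year=/month=/day=) so that
    Glue crawlers and Athena queries can prune partitions by date.
    Table and file names here are hypothetical examples."""
    return f"{table}/year={dt.year:04d}/month={dt.month:02d}/day={dt.day:02d}/{filename}"

key = partition_key("sales_orders", date(2024, 5, 17), "part-0000.parquet")
print(key)  # sales_orders/year=2024/month=05/day=17/part-0000.parquet
```

Keeping date components in separate `key=value` segments lets Athena skip whole prefixes when a query filters on year or month, which is where most of the scan-cost savings come from.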
Required Skills & Experience
- 6–8 years of professional experience in data engineering, with at least 3 years on AWS.
- Deep expertise in AWS Glue, S3, Glue Data Catalog, Glue Data Quality, and Athena.
- Hands-on experience with Amazon Redshift (Serverless & Spectrum) and data modeling for analytics.
- Strong understanding of data governance and access control using Lake Formation, IAM, and Secrets Manager.
- Proficient in building and managing data workflows with Apache Airflow (MWAA).
- Solid experience with Python and SQL for data transformation and scripting.
- Experience using version control systems (GitHub) and automating deployments via GitHub Actions.
- Familiarity with Apache Hudi for managing large-scale transactional data in data lakes.
- Excellent problem-solving skills with a focus on performance tuning and optimization.
- Strong communication and collaboration skills, with the ability to work across teams and manage stakeholders effectively.
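As a rough sketch of the completeness and uniqueness rules that AWS Glue Data Quality expresses (there via DQDL rule sets), the same idea in plain Python; the column names and sample rows below are made up for illustration.

```python
def completeness(rows, column):
    """Fraction of rows in which `column` is present and non-null,
    analogous to a DQDL Completeness rule."""
    if not rows:
        return 0.0
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows)

def is_unique(rows, column):
    """True if every non-null value of `column` occurs exactly once,
    analogous to a DQDL Uniqueness rule."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))

# Hypothetical sample data
orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 7.5},
]

print(completeness(orders, "order_id"))  # 1.0
print(is_unique(orders, "order_id"))     # True
```

In practice these checks run inside Glue jobs against full datasets; the point here is only the shape of the rules, not the engine.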
Preferred Qualifications
- AWS certification (e.g., AWS Certified Data Analytics, Solutions Architect) is a plus.
- Experience with data cataloging and lineage tools.
- Exposure to DevOps, monitoring tools (e.g., CloudWatch), and cost optimization techniques in AWS.
- Experience in real-time or near real-time data streaming is advantageous.
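The near-real-time streaming experience mentioned above often reduces to windowed micro-batching; a toy tumbling-window grouper in plain Python, with hypothetical event data, assuming integer epoch timestamps:

```python
def tumbling_windows(events, window_seconds):
    """Group (timestamp, value) events into fixed-size, non-overlapping
    windows keyed by window start time. A toy model of the micro-batch
    aggregation step in a near-real-time pipeline."""
    windows = {}
    for ts, value in events:
        start = (ts // window_seconds) * window_seconds
        windows.setdefault(start, []).append(value)
    return windows

# Hypothetical events: (epoch_seconds, metric_value)
events = [(0, 1), (3, 2), (5, 3), (11, 4)]
print(tumbling_windows(events, 5))  # {0: [1, 2], 5: [3], 10: [4]}
```

A production system would do this with a managed service (e.g., Kinesis or Spark Structured Streaming) rather than in-process dictionaries, but the windowing arithmetic is the same.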