Experience: 7+ years of relevant experience, including more than 3 years in data integration, pipeline development, and data warehousing, with a strong focus on AWS Databricks.
Job Responsibilities:
- Administer, manage, and optimize the Databricks environment to ensure efficient data processing and pipeline development
- Perform advanced troubleshooting, query optimization, and performance tuning in a Databricks environment
- Collaborate with development teams to guide, optimize, and refine data solutions within the Databricks ecosystem
- Ensure high performance in data handling and processing, including the optimization of Databricks jobs and clusters
- Engage with and support business teams to deliver data and analytics projects effectively
- Manage source control systems and utilize Jenkins for continuous integration
- Actively participate in the entire software development lifecycle, focusing on data integrity and efficiency within Databricks
Technical Skills:
- Proficiency with the Databricks platform, including its management and optimization
- Strong experience in AWS Cloud, particularly in data engineering and administration, with expertise in Apache Spark, S3, Athena, Glue, Kafka, Lambda, Redshift, and RDS
- Proven experience in data engineering performance tuning, with an analytical understanding of business and program contexts
- Solid experience in Python development, specifically PySpark, within the AWS Cloud environment, along with experience using Terraform
- Knowledge of databases (Oracle, SQL Server, PostgreSQL, Redshift, MySQL, or similar) and advanced database querying skills
- Experience with source control systems (Git, Bitbucket) and Jenkins for build and continuous integration
- Understanding of continuous integration and continuous deployment (CI/CD) processes
- Experience with Apache Airflow is advantageous, as is additional Apache Spark knowledge
- Exposure to ETL tools, including Informatica
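The "advanced database querying" skill above can be illustrated with a minimal sketch. This uses SQLite (Python's standard-library `sqlite3`) as a stand-in for the engines named in the posting (PostgreSQL, Redshift, etc.); the `sales` table and its columns are hypothetical, chosen only to demonstrate a window-function query of the kind a candidate would be expected to write.

```python
import sqlite3

# Hypothetical sales table; SQLite stands in for PostgreSQL/Redshift here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 300.0), ("west", 50.0)],
)

# Window function: rank each sale within its region by amount,
# highest first -- a typical "advanced querying" pattern.
rows = conn.execute(
    """
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk
    """
).fetchall()

for region, amount, rnk in rows:
    print(region, amount, rnk)
```

The same `PARTITION BY ... ORDER BY` pattern carries over directly to Redshift, PostgreSQL, and Spark SQL on Databricks.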