GCP Data Engineer-Remote
- Country: United States
- Region: Oregon
Role: GCP Data Engineer
Location: Remote
Job Description:
The Role:
We are looking for an experienced Data Engineer who can help move on-prem Hadoop workloads to Google Cloud Platform (GCP).
Responsibilities:
- Export data from the Hadoop ecosystem to ORC or Parquet files
- Build scripts to move data from on-prem to GCP
- Build Python/PySpark pipelines
- Transform the data per the outlined data model
- Proactively improve pipeline performance and efficiency
"Must Have" Experience:
- 4+ years of Data Engineering work experience
- 2+ years of building Python/PySpark pipelines
- 2+ years working with Hadoop/Hive
- 4+ years of experience with SQL
- Experience with Data Warehousing & Data Lakes
- Understanding of Data Modeling
- Google Cloud experience: Cloud Storage, Cloud Composer, Dataproc & BigQuery
- Understanding of data file formats such as ORC, Parquet, and Avro
"Nice to Have" Experience:
- Understanding of GCP services
- Experience using cloud data warehouses such as BigQuery (preferred), Amazon Redshift, or Snowflake
- Working knowledge of distributed file systems such as GCS, S3, and HDFS
- Understanding of Airflow / Cloud Composer
- CI/CD and DevOps experience
- ETL tools, e.g., Informatica (IICS), Ab Initio, Infoworks, Pentaho, SSIS