Wednesday, 20 May 2020

Airflow SparkSubmitOperator example

Creating a connection




Note:
  • Config values passed from SparkSubmitOperator take precedence over the connection parameters.
  • Setting spark-home in the connection's extra properties selects which Spark installation is used, which helps when multiple Spark versions are installed.
  • The default deployment mode is client. Adding 'deploy-mode':'cluster' to the extra properties changes it to cluster mode.
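
As a rough illustration of the notes above, here is a minimal sketch of the connection extras and a DAG that uses them. It assumes Airflow 2.x with the apache-airflow-providers-apache-spark package installed (on Airflow 1.10 the operator lives under airflow.contrib instead). The connection id spark_default, the /opt/spark path, the DAG and task ids, and the application path are all placeholders for illustration, not values from this post. Newer provider releases may no longer honour the spark-home extra, so check the version you run.

```python
import json

# Extra properties for the Spark connection, as they would be pasted
# into the "Extra" field of the connection form.
# "/opt/spark" is a hypothetical install path.
spark_conn_extra = {
    "deploy-mode": "cluster",    # default is client when omitted
    "spark-home": "/opt/spark",  # picks one of several Spark installs
}
extra_json = json.dumps(spark_conn_extra)


def build_dag():
    """Build a one-task DAG; Airflow is imported lazily so the
    extras above can be inspected without Airflow installed."""
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import (
        SparkSubmitOperator,
    )

    with DAG(
        dag_id="spark_submit_example",       # hypothetical DAG id
        start_date=datetime(2020, 5, 20),
        catchup=False,
    ) as dag:
        SparkSubmitOperator(
            task_id="submit_job",            # hypothetical task id
            conn_id="spark_default",         # assumed connection id
            application="/path/to/app.py",   # placeholder application
            name="airflow-spark-example",
            # Values set here override the connection parameters.
            conf={"spark.executor.memory": "2g"},
        )
    return dag


if __name__ == "__main__":
    # Print the JSON to paste into the connection's Extra field.
    print(extra_json)
```

In a real DAG file you would call build_dag() at module level (for example, dag = build_dag()) so the scheduler can discover it.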
