Wednesday, 20 May 2020

Airflow SparkSubmitOperator example

Creating a connection




Notes:
  • Config values passed to SparkSubmitOperator take precedence over the connection parameters.
  • Setting spark-home in the extra properties lets you choose which Spark installation is used, which works for any version when several Spark installations exist on the same machine.
  • The default deploy mode is client. Adding "deploy-mode": "cluster" to the extra properties switches it to cluster mode.
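The three notes can be illustrated with a small stdlib-only model. This is a simplified sketch, not Airflow's actual SparkSubmitHook code: the function names resolve_settings and build_command, the placeholder path /opt/spark-2.4 and the application name app.py are all hypothetical. It assumes the connection's Extra field holds JSON, that deploy mode falls back to client, that spark-home selects which spark-submit binary is used, and that anything passed from the operator side overrides the connection.

```python
import json
import posixpath

# Hypothetical Extra field of a Spark connection (JSON). The path is a placeholder.
CONNECTION_EXTRA = '{"deploy-mode": "cluster", "spark-home": "/opt/spark-2.4"}'


def resolve_settings(connection_extra: str, operator_conf: dict) -> dict:
    """Simplified model (NOT Airflow's code) of how effective settings are resolved."""
    extra = json.loads(connection_extra) if connection_extra else {}
    settings = {
        # Client mode unless the connection's extras say otherwise.
        "deploy-mode": extra.get("deploy-mode", "client"),
        # Optional: pins a specific Spark installation.
        "spark-home": extra.get("spark-home"),
    }
    # Values supplied from the operator side win over the connection.
    settings.update(operator_conf)
    return settings


def build_command(settings: dict, application: str) -> list:
    """Build a spark-submit command line from the resolved settings (hypothetical helper)."""
    binary = "spark-submit"  # whatever is first on PATH
    if settings.get("spark-home"):
        # Use the binary from the chosen installation instead of the PATH default.
        binary = posixpath.join(settings["spark-home"], "bin", "spark-submit")
    return [binary, "--deploy-mode", settings["deploy-mode"], application]


if __name__ == "__main__":
    print(build_command(resolve_settings(CONNECTION_EXTRA, {}), "app.py"))
```

In this model, an empty Extra field yields client mode with the spark-submit found on PATH, while passing {"deploy-mode": "client"} from the operator side overrides the connection's cluster setting.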
