When writing scripts for remote sensing, it pays to take a few things into account from the beginning, to save work later on:

  1. Run and test your code on a Linux distribution. It is not hard to write cross-platform code in any modern language, but having to port your codebase after writing it is no fun.
  2. Prepare your code for distributed processing from day one. Spark is perfectly suitable for writing processing jobs that run locally, and it is quite easy to learn; a sketch follows after this list. This again avoids costly refactoring work at a later point.
  3. Try running your code on the cluster as soon as possible. You will learn a number of very useful things by doing so and will know what to avoid in the future.
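
As an illustration of point 2, here is a minimal sketch of a PySpark job that runs locally but is cluster-ready. The scene paths and the processing function are hypothetical placeholders; the key idea is that the master is not hard-coded, so the same script can later be handed to spark-submit unchanged.

```python
from pyspark.sql import SparkSession

def process_scene(path):
    # Placeholder for real per-scene work, e.g. computing statistics
    # over a remote sensing image. Here we just return the path length.
    return (path, len(path))

if __name__ == "__main__":
    # No master is set here: spark-submit (or spark-defaults.conf) decides
    # where the job runs. For a quick local test, pass --master "local[*]"
    # on the command line instead of hard-coding it.
    spark = SparkSession.builder.appName("SceneProcessing").getOrCreate()
    sc = spark.sparkContext

    scene_paths = ["scene_001.tif", "scene_002.tif"]  # hypothetical inputs
    results = sc.parallelize(scene_paths).map(process_scene).collect()
    for path, value in results:
        print(path, value)

    spark.stop()
```

Keeping the master out of the code and supplying it at submit time is what makes the transition from local development to the cluster painless.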

Running a Spark job on the cluster can be done using the spark-submit command, specifying YARN cluster mode as the master: 'yarn-cluster' on Spark 1.x, or '--master yarn --deploy-mode cluster' on Spark 2.0 and later. More info on writing Spark jobs can be found in the official Spark documentation.
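
As a sketch, a submission on Spark 2.x might look like the following; the script name and resource settings are illustrative placeholders, not recommended values.

```
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-memory 4G \
  process_scenes.py
```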