This guide is intended for remote sensing developers who want to work on the MEP. It covers most things that are specific to the platform, but note that the platform aims to be very open with respect to how you work with it.
The platform is not tied to a specific language. Our examples use a lot of Python, because it is widely used in the remote sensing community. We also use a lot of Java and Scala, because many of the more advanced APIs in the Hadoop ecosystem are written for these languages. We have, however, also done extensive testing of Hadoop with Python (PySpark) and of integrating C++ or Fortran code.
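As a small illustration of reusing native code from Python, the sketch below calls a C function through the standard library's `ctypes` module. This is only one of several integration routes (alongside e.g. compiled extension modules); here we load the system math library and call its `cos` function as a stand-in for your own compiled C, C++, or Fortran routine.

```python
import ctypes
import ctypes.util

# Locate and load the system math library (libm on Linux).
# In practice you would load your own shared library built from
# C/C++/Fortran sources, e.g. ctypes.CDLL("./libmyalgo.so").
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

value = libm.cos(0.0)  # call into native code; cos(0) == 1.0
```

The same declaration pattern (`restype`/`argtypes`) applies to functions exported from your own shared objects, which keeps the Python side thin while the numerical work stays in compiled code.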
The only limitations on the software that can be used in a VM or on the cluster are:
- It has to run on CentOS 7, a stable, enterprise-grade Linux distribution.
- Commercial software that requires a license is not provided, but it can be used as long as you bring your own license and no technical or legal issues prevent you from using it in a VM or cluster environment.
When writing scripts for remote sensing, it is advisable to keep a few things in mind from the beginning, to save work later on:
- Run and test your code on a Linux distribution. It is not hard to write cross-platform code in any modern language, but having to port your codebase after writing it is no fun.
- Prepare your code for distributed processing from day one. Spark is perfectly suitable for writing processing jobs that run locally and is quite easy to learn. This again avoids costly refactoring at a later point.
- Try running your code on the cluster as soon as possible. You will learn a number of very useful things by doing so, and will know what to avoid in the future.