Kerberos

The Hadoop security system relies on Kerberos to enable single sign-on across multiple machines and services. Kerberos is not specific to Hadoop, so many IT users may already be familiar with it. The central component of Kerberos is the Key Distribution Center (KDC), which on this platform is provided by 'FreeIPA'.

The integration of Kerberos with Hadoop is explained in more detail on these pages:

https://blog.cloudera.com/hadoop-delegation-tokens-explained/

https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/w...

Kerberos tickets

To interact with a Kerberos-secured system, a user needs a Kerberos 'ticket'. Most regular platform users obtain a ticket by running 'kinit' on the command line and entering their password. You can check whether you hold a valid ticket by running 'klist'.
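
The check-then-login routine described above can be wrapped in a small shell helper. This is a minimal sketch, assuming the MIT Kerberos client tools, where 'klist -s' prints nothing and only reports through its exit status whether a valid ticket is cached. The function names has_ticket and ensure_ticket are made up for this illustration.

```shell
# Succeeds when the default credential cache holds a valid, unexpired ticket.
# 'klist -s' is silent; only its exit status is used.
has_ticket() {
    klist -s
}

# Requests a new ticket only when none is valid.
# Any arguments (for example the principal) are passed on to kinit,
# which then prompts for the password.
ensure_ticket() {
    if has_ticket; then
        echo "valid ticket found"
    else
        echo "no valid ticket, running kinit"
        kinit "$@"
    fi
}
```

After sourcing the functions, 'ensure_ticket my_user@VGT.VITO.BE' only asks for the password when the cached ticket is missing or expired.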

For systems where it is not appropriate to store a password, a keytab can be generated. This is a small file that holds the user's Kerberos keys and takes the place of the password. Using the keytab, a ticket can be obtained without a password prompt:

kinit -kt my.keytab my_user@VGT.VITO.BE

Once a keytab is generated for a user, that user's password is no longer valid!
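
Because a keytab works like a stored password, it is worth handling it with some care in scripts. The sketch below is one possible wrapper around the kinit command above. The function name keytab_login is hypothetical, and tightening the file mode to 600 is a suggested precaution rather than a platform requirement.

```shell
# Obtains a ticket from a keytab without a password prompt.
#   $1: path to the keytab file
#   $2: principal, for example my_user@VGT.VITO.BE
keytab_login() {
    keytab=$1
    principal=$2
    if [ ! -r "$keytab" ]; then
        echo "keytab not readable: $keytab" >&2
        return 1
    fi
    # Anyone who can read the keytab can log in as the principal,
    # so restrict access to the file owner.
    chmod 600 "$keytab"
    kinit -kt "$keytab" "$principal"
}
```

To see which principals a keytab contains, 'klist -kt my.keytab' lists its entries with their timestamps.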

Long-running jobs

After a certain amount of time, a Kerberos ticket expires unless it is renewed. Long-running applications can use a keytab to renew the Kerberos ticket automatically.

For Spark jobs, the spark-submit command accepts the --keytab and --principal arguments, which enable automatic renewal for long-running jobs.
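
A submission with these two arguments could look like the sketch below. It assumes the job runs on YARN in cluster mode, which this page does not state explicitly; the function name submit_long_running and the SPARK_SUBMIT override are illustrative only.

```shell
# Submits a Spark application with a keytab so that Spark can obtain
# fresh tickets itself during a long run.
#   $1: principal, $2: keytab path, remaining arguments: the application
# SPARK_SUBMIT may point to a specific spark-submit binary; by default
# the one on the PATH is used.
submit_long_running() {
    principal=$1
    keytab=$2
    shift 2
    "${SPARK_SUBMIT:-spark-submit}" \
        --master yarn \
        --deploy-mode cluster \
        --principal "$principal" \
        --keytab "$keytab" \
        "$@"
}
```

A call would then be 'submit_long_running my_user@VGT.VITO.BE my.keytab my_job.py', where my_job.py stands for your own application.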

On a Linux machine, the 'k5start' utility can be used to automatically renew a Kerberos ticket based on a keytab.
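
As a rough illustration, k5start can be started as a background daemon that keeps the ticket cache valid. The sketch assumes k5start from the kstart package is installed; the function name, the 10-minute interval and the file locations are arbitrary choices for this example.

```shell
# Keeps the ticket cache fresh in the background using a keytab.
#   -f  keytab to authenticate with
#   -U  read the principal from the keytab
#   -K  wake up every 10 minutes and get a new ticket if the current
#       one is about to expire
#   -b  detach and run in the background
#   -p  write the daemon's PID to a file so it can be stopped later
# With -b, k5start changes its working directory to /, so both paths
# should be absolute.
start_ticket_renewal() {
    keytab=$1
    pidfile=$2
    k5start -f "$keytab" -U -K 10 -b -p "$pidfile"
}
```

For example, 'start_ticket_renewal /home/my_user/my.keytab /home/my_user/k5start.pid' starts the daemon, and killing the PID stored in the file stops it.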

User impersonation

Some applications, such as web services, need to interact with the cluster on behalf of other users. Upon request, we can create a user with this capability, so that your application can run jobs for other users. This also makes it possible to measure cluster usage per user and to isolate user code from the rest of your application.

The details of this feature are described here:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common...
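
For Spark applications, impersonation is typically requested with the --proxy-user argument of spark-submit. The sketch below assumes the service account has been granted impersonation rights, already holds a valid ticket (for example from its keytab), and submits to YARN in cluster mode; the function name submit_as_user and the user name in the usage line are hypothetical. Note that spark-submit does not accept --proxy-user together with --principal and --keytab.

```shell
# Submits a Spark application on behalf of another user.
#   $1: name of the user to impersonate, remaining arguments: the application
# The calling service account must hold its own valid Kerberos ticket.
submit_as_user() {
    end_user=$1
    shift
    "${SPARK_SUBMIT:-spark-submit}" \
        --master yarn \
        --deploy-mode cluster \
        --proxy-user "$end_user" \
        "$@"
}
```

With 'submit_as_user some_user my_job.py', the job runs under the identity some_user, so cluster usage is attributed to that user.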