Configuring Hadoop with Kerberos from the command line, without Ambari or CM
All config files:

Setup:
MasterNode (namenode): master.company.com
Node1 (datanode): datanode.company.com
Hadoop client: laptop.company.com
Main tasks:
1. Set up the Kerberos server
2. Configure & start the namenode
3. Set up the Kerberos client
4. Configure & start the datanode
5. Verify the setup
6. MapReduce job configuration
7. Running a MapReduce job
1. Setting up the Kerberos server
1.1 Install Kerberos via yum

yum install krb5-server krb5-libs krb5-auth-dialog krb5-pkinit-openssl krb5-workstation
1.2 Edit /etc/krb5.conf

vi /etc/krb5.conf

Here STERIN.COM is the realm. The Kerberos server runs on the same host where the namenode is installed.
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = STERIN.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
 default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
 permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5

[realms]
 STERIN.COM = {
  kdc = master.company.com
  admin_server = master.company.com
 }
1.3 Edit /var/kerberos/krb5kdc/kadm5.acl

vi /var/kerberos/krb5kdc/kadm5.acl

*/admin@STERIN.COM *
1.4 Edit /var/kerberos/krb5kdc/kdc.conf

vi /var/kerberos/krb5kdc/kdc.conf

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 STERIN.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }
The encryption types and the realm here should match /etc/krb5.conf.
1.5

cd /var/kerberos/krb5kdc/
1.6 Create the KDC database

kdb5_util create

< This process will take some time >
1.7 Start the Kerberos processes

- service krb5kdc start
- service kadmin start
1.8 Start kadmin.local (do this as a sudo user)

kadmin.local
1.9 In kadmin.local

addprinc cm/admin
addprinc -randkey HTTP/master.company.com@STERIN.COM
addprinc -randkey hduser/master.company.com@STERIN.COM
xst -norandkey -k hduser.keytab hduser/master.company.com@STERIN.COM HTTP/master.company.com@STERIN.COM
cm/admin is the Kerberos admin user.
The HTTP principal is created for the master server.
The hduser principal is created for the user who starts the Hadoop processes.
The keytab is created for hduser to start the namenode process.

If you are using Ambari or Cloudera Manager, this step is sufficient; the rest of the work is done in the manager.
1.10 Change the ownership of the keytab and place it anywhere convenient (here, the home directory)

sudo chown hduser:hduser /home/hduser/hduser.keytab
1.11 Verify the principals in the keytab
klist -e -k -t /home/hduser/hduser.keytab

Keytab name: FILE:/home/hduser/hduser.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 07/15/16 18:46:09 hduser/master.company.com@STERIN.COM (aes256-cts-hmac-sha1-96)
   1 07/15/16 18:46:09 hduser/master.company.com@STERIN.COM (aes128-cts-hmac-sha1-96)
   1 07/15/16 18:46:10 hduser/master.company.com@STERIN.COM (des3-cbc-sha1)
   1 07/15/16 18:46:10 hduser/master.company.com@STERIN.COM (arcfour-hmac)
   1 07/15/16 18:46:10 hduser/master.company.com@STERIN.COM (des-hmac-sha1)
   1 07/15/16 18:46:10 hduser/master.company.com@STERIN.COM (des-cbc-md5)
   1 07/15/16 18:46:10 HTTP/master.company.com@STERIN.COM (aes256-cts-hmac-sha1-96)
   1 07/15/16 18:46:10 HTTP/master.company.com@STERIN.COM (aes128-cts-hmac-sha1-96)
   1 07/15/16 18:46:10 HTTP/master.company.com@STERIN.COM (des3-cbc-sha1)
   1 07/15/16 18:46:10 HTTP/master.company.com@STERIN.COM (arcfour-hmac)
   1 07/15/16 18:46:10 HTTP/master.company.com@STERIN.COM (des-hmac-sha1)
   1 07/15/16 18:46:10 HTTP/master.company.com@STERIN.COM (des-cbc-md5)
2. Configure & start the namenode

Make sure the keytab has the proper permissions (600) and that the hduser principal is present in the keytab, for example as shown below.
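A quick check (a minimal sketch; the keytab path is the one used in step 1.10):

chmod 600 /home/hduser/hduser.keytab
ls -l /home/hduser/hduser.keytab        # should show -rw------- hduser hduser
klist -k -t /home/hduser/hduser.keytab  # should list the hduser/... and HTTP/... principals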
2.1 Edit core-site.xml

vi core-site.xml
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
<!-- A value of "simple" would disable security. -->
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
2.2 Edit hdfs-site.xml

vi hdfs-site.xml
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/home/hduser/hduser.keytab</value>
<!-- path to the keytab created in step 1.10 -->
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hduser/_HOST@STERIN.COM</value>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>HTTP/_HOST@STERIN.COM</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@STERIN.COM</value>
</property>
2.3 Update local_policy.jar and US_export_policy.jar (the JCE Unlimited Strength policy files) under $JAVA_HOME/jre/lib/security, so the JVM can use the AES-256 encryption types configured above, for example as sketched below.
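A minimal sketch, assuming the JCE Unlimited Strength policy zip for your JDK version has already been downloaded and unpacked to /tmp/UnlimitedJCEPolicy (the path is illustrative):

cp /tmp/UnlimitedJCEPolicy/local_policy.jar $JAVA_HOME/jre/lib/security/
cp /tmp/UnlimitedJCEPolicy/US_export_policy.jar $JAVA_HOME/jre/lib/security/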
2.4 Start the namenode only

hadoop-daemon.sh start namenode
2.5 In the namenode logs, verify entries like:
2016-07-16
06:50:56,947 INFO org.apache.hadoop.security.UserGroupInformation:
Login successful for user hduser/master.company.com@STERIN.COM using
keytab file /home/hduser/hduser.keytab
2016-07-16
06:50:58,075 INFO
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler:
Login using keytab /home/hduser/hduser.keytab, for principal
HTTP/master.company.com@STERIN.COM
2016-07-16
06:50:58,157 INFO
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler:
Login using keytab /home/hduser/hduser.keytab, for principal
HTTP/master.company.com@STERIN.COM
If the keytab verification fails, the namenode will not start.
The namenode configuration is easy compared to the datanode.
2.6 Quick verification

First make sure there is no valid ticket, then try an HDFS command:

klist
hadoop fs -ls /
[hduser@master tmp]$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_500)
[hduser@master tmp]$ hadoop fs -ls /
16/07/16 12:39:34 WARN
16/07/16 12:39:35 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "master.company.com/172.16.102.137"; destination host is: "master.company.com":8020;
Now get a ticket and try again:

kinit cm/admin
hadoop fs -ls /
[hduser@master tmp]$ kinit cm/admin
Password for cm/admin@STERIN.COM:
[hduser@master tmp]$ hadoop fs -ls /
16/07/16 12:40:45 WARN
Found 2 items
drwxr-xr-x - hduser supergroup 0 2016-07-16 08:18 /cm
drwxr-xr-x - hduser supergroup 0 2016-07-15 16:02 /new
2.7 Even though the namenode is up, we still need to build jsvc (it is required to start the datanode on privileged ports in secure mode).

Build steps:
wget https://archive.apache.org/dist/commons/daemon/source/commons-daemon-1.0.10-src.tar.gz
yum install autoconf make
tar -xvzf commons-daemon-1.0.10-src.tar.gz
cd commons-daemon-1.0.10-src/src/native/unix/
sh -x support/buildconf.sh
./configure --with-java=/opt/jdk
make
./jsvc -help
The last step should show output like this:

-jvm <JVM name>   use a specific Java Virtual Machine. Available JVMs: 'server'
2.8 Copy jsvc to the bigtop-utils directory

mkdir -p /usr/lib/bigtop-utils/
cp jsvc /usr/lib/bigtop-utils
3. Set up the Kerberos client
3.1 Install via yum on datanode.company.com

yum install krb5-workstation
3.2 Copy /etc/krb5.conf from the master to the datanode

scp master:/etc/krb5.conf /etc/krb5.conf
3.3 Verify the ticket

kinit cm/admin

This should give a valid ticket on the datanode (check with klist).
3.4 Build jsvc as in step 2.7
3.5 Update local_policy.jar and US_export_policy.jar as in step 2.3
3.6 Create the keytab for the datanode

Run this in kadmin.local on the master:

addprinc -randkey HTTP/datanode.company.com@STERIN.COM
addprinc -randkey hduser/datanode.company.com@STERIN.COM
xst -norandkey -k hduser.keytab hduser/datanode.company.com HTTP/datanode.company.com

Get the keytab and copy it to the home directory on the datanode, as sketched below.
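A minimal sketch of copying the keytab over (assumes kadmin.local wrote hduser.keytab into root's working directory on the master; adjust the source path to wherever xst created it):

scp master:/root/hduser.keytab /home/hduser/hduser.keytab
chown hduser:hduser /home/hduser/hduser.keytab
chmod 600 /home/hduser/hduser.keytab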
4. Configure & start the datanode

On the master:
4.1 Edit hadoop-env.sh

vi hadoop-env.sh

Add:

export HADOOP_SECURE_DN_USER=hduser
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop/$HADOOP_SECURE_DN_USER
export HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop/$HADOOP_SECURE_DN_USER
export JSVC_HOME=/usr/lib/bigtop-utils/
4.2 Update hdfs-site.xml

vi hdfs-site.xml
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<!--
DataNode security config -->
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>750</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/home/hduser/hduser.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hduser/_HOST@STERIN.COM</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:1022</value>
</property>
<property>
<name>dfs.datanode.https.address</name>
<value>0.0.0.0:50475</value>
</property>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:8010</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:1019</value>
</property>
4.3 Create the new secure run/log folders and restart the namenode

- sudo mkdir -p /var/run/hadoop
- sudo mkdir -p /var/log/hadoop
- hadoop-daemon.sh stop namenode
- hadoop-daemon.sh start namenode

No valid ticket is needed while starting the server.
4.4 Copy the Hadoop config folder to datanode.company.com

scp -r /etc/hadoop datanode:`pwd`
4.5 Log in to the datanode as root

A secure datanode only starts as root unless SASL is configured; SASL is not configured here, so log in as root.
4.6 Create the same folders as in step 4.3, as sketched below
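A minimal sketch (the same directories as step 4.3, run on the datanode):

sudo mkdir -p /var/run/hadoop
sudo mkdir -p /var/log/hadoop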
4.7 Start the datanode as root

hadoop-daemon.sh start datanode
2016-07-16 07:54:13,716 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP. Using privileged resources in combination with SASL RPC data transfer protection is not supported.
If you are getting this error, then check:
1. The jsvc folder and its access permissions
2. local_policy.jar & US_export_policy.jar
3. The running user; it should be root (a quick check for all three is sketched below)
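A quick check (a minimal sketch; the paths match the earlier steps):

ls -ld /usr/lib/bigtop-utils /usr/lib/bigtop-utils/jsvc
ls -l $JAVA_HOME/jre/lib/security/local_policy.jar $JAVA_HOME/jre/lib/security/US_export_policy.jar
whoami    # should print root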
5.
Verify the setup
[hduser@master hadoop]$ kdestroy
[hduser@master hadoop]$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_500)
[hduser@master hadoop]$ hadoop fs -ls /
16/07/16 13:15:54 WARN
16/07/16 13:15:55 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ls: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "master.company.com/172.16.102.137"; destination host is: "master.company.com":8020;
[hduser@master hadoop]$ kinit hduser/admin
Password for hduser/admin@STERIN.COM:
[hduser@master hadoop]$ hadoop fs -ls /new
16/07/16 13:16:50 WARN
Found 1 items
-rw-r--r-- 1 hduser supergroup 716 2016-07-15 16:02 /new/hadoop-hduser-datanode-datanode.company.com.out
[hduser@master hadoop]$ hadoop fs -cat /new/hadoop-hduser-datanode-datanode.company.com.out
16/07/16 13:17:00 WARN
ulimit -a for user hduser
core file size (blocks, -c) 0
5.2 Similarly, on the laptop (laptop.company.com), do the same steps 3.1 to 3.3.

5.2.1 Then access the namenode web port. Without a valid ticket it will show an authentication error.
5.2.2 In the address bar of Firefox, type about:config to display the list of current configuration options.
- In the Filter field, type negotiate to restrict the list of options.
- Double-click the network.negotiate-auth.trusted-uris entry to display the Enter string value dialog box.
- Enter the name of the domain against which you want to authenticate, for example .company.com
5.2.3 Get a valid ticket, then access it (a command-line check is sketched below).
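A minimal sketch of checking SPNEGO access from the command line (assumes the namenode web UI is on the default port 50070 and that curl was built with GSS/Negotiate support):

kinit cm/admin
curl --negotiate -u : http://master.company.com:50070/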
6. MapReduce job configuration
6.1 Here the same keytab is used for everything; in a real production cluster, different keytabs should be used for the hdfs and yarn processes.

Edit yarn-site.xml:

vi yarn-site.xml
<!--
yarn process -->
<property>
<name>yarn.nodemanager.container-executor.class</name>
<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
<name>yarn.nodemanager.linux-container-executor.group</name>
<value>hduser</value>
</property>
<!--
resource manager secure configuration info -->
<property>
<name>yarn.resourcemanager.principal</name>
<value>hduser/_HOST@STERIN.COM</value>
</property>
<property>
<name>yarn.resourcemanager.keytab</name>
<value>/home/hduser/hduser.keytab</value>
</property>
<!--
NodeManager -->
<property>
<name>yarn.nodemanager.principal</name>
<value>hduser/_HOST@STERIN.COM</value>
</property>
<property>
<name>yarn.nodemanager.keytab</name>
<value>/home/hduser/hduser.keytab</value>
</property>
6.2 Edit mapred-site.xml

vi mapred-site.xml
<property>
<name>mapreduce.jobhistory.keytab</name>
<value>/home/hduser/hduser.keytab</value>
</property>
<property>
<name>mapreduce.jobhistory.principal</name>
<value>hduser/_HOST@STERIN.COM</value>
</property>
6.3 Edit container-executor.cfg

vi container-executor.cfg
yarn.nodemanager.linux-container-executor.group=hduser
banned.users=bin
min.user.id=500
allowed.system.users=hduser
At this point you can start both the nodemanager and the resourcemanager, but MapReduce jobs might fail due to missing native library files.
This can be tested by running the hadoop fs -ls / command. If there is no warning, we can go ahead and start YARN; if not, the native libraries need to be built.
Wrong:
[hduser@master sbin]$ hadoop fs -ls /
16/07/17 15:38:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 5 items
drwxr-xr-x - hduser supergroup 0 2016-07-17 10:57 /cm

Correct:
[hduser@master sbin]$ hadoop fs -ls /
Found 5 items
drwxr-xr-x - hduser supergroup 0 2016-07-17 10:57 /cm
If there is no warning here, skip to step 6.6.

From the Apache site you can get Hadoop native builds compiled against GLIBC_2.14, but the latest glibc available via yum on Red Hat / Oracle Linux is GLIBC_2.12, so the Hadoop native files have to be built against GLIBC_2.12.
6.4 Build the native libraries (a sketch of the build is given below)

< This will take a minimum of 30 mins >
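A minimal sketch of the native build (assuming the Hadoop 2.x source tarball plus the usual prerequisites -- Maven, gcc, cmake, zlib-devel, openssl-devel and protobuf 2.5 -- are installed; the version number is illustrative):

tar -xzf hadoop-2.7.2-src.tar.gz
cd hadoop-2.7.2-src
mvn package -Pdist,native -DskipTests -Dtar
# the native libraries end up under hadoop-dist/target/hadoop-2.7.2/lib/native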
6.5 Copy the newly built lib files into place, and verify by running hadoop fs -ls /

- $HADOOP_HOME/bin
- $HADOOP_HOME/lib/native
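Another quick way to verify the native libraries (a small sketch; hadoop checknative ships with Hadoop 2.x):

hadoop checknative -a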
6.6 Give container-executor the proper ownership and permissions (chown is done first, since changing the owner can clear the setuid/setgid bits)

- chown root:hduser container-executor
- chmod 050 container-executor
- chmod u+s container-executor
- chmod g+s container-executor
- ./container-executor

The permissions should end up as ---Sr-s--- (6050), owned by root:hduser. The last command should run fine without any warning or error. Change the permissions of the folders and files if required.
6.7 Copy all the updated files to the datanode server, for example as sketched below.
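A minimal sketch (assumes $HADOOP_HOME is the same path on both nodes; adjust to whatever you actually changed):

scp -r $HADOOP_HOME/etc/hadoop datanode:$HADOOP_HOME/etc/
scp $HADOOP_HOME/bin/container-executor datanode:$HADOOP_HOME/bin/
scp -r $HADOOP_HOME/lib/native datanode:$HADOOP_HOME/lib/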
6.8 Start the namenode and resourcemanager on the master

- hadoop-daemon.sh start namenode
- yarn-daemon.sh start resourcemanager
6.9 Start the datanode and nodemanager as root on the datanode

./hadoop-daemon.sh start datanode
./yarn-daemon.sh start nodemanager
6.10 Make sure that all the daemons are running (a quick jps check is sketched below).
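A minimal sketch (jps lists running JVMs; the secure datanode is started through jsvc, so on the datanode it may appear under the jsvc process rather than as a plain DataNode entry):

jps    # on the master: expect NameNode and ResourceManager
jps    # on the datanode (as root): expect NodeManager plus the secure datanode process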
7. Running a MapReduce job
7.1 Make sure there is a valid ticket before running the job

klist
Ticket cache: FILE:/tmp/krb5cc_500
Default principal: hduser/admin@STERIN.COM

Valid starting       Expires              Service principal
07/17/16 10:31:59    07/18/16 10:31:59    krbtgt/STERIN.COM@STERIN.COM
        renew until 07/17/16 10:31:59
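If there is no ticket, one can be obtained non-interactively from the keytab created earlier (a small sketch; the principal and path match steps 1.9 and 1.10):

kinit -k -t /home/hduser/hduser.keytab hduser/master.company.com@STERIN.COM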
7.2 Run the job

hadoop jar ./Batting.jar BattingExample /new/2.csv /cm/output

Correct output:
16/07/17 15:45:58 INFO client.RMProxy: Connecting to ResourceManager at master.company.com/172.16.102.137:8050
16/07/17 15:45:59 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 27 for hduser on 172.16.102.137:8020
16/07/17 15:45:59 INFO security.TokenCache: Got dt for hdfs://172.16.102.137:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.102.137:8020, Ident: (HDFS_DELEGATION_TOKEN token 27 for hduser)
16/07/17 15:46:00 INFO input.FileInputFormat: Total input paths to process : 1
16/07/17 15:46:00 INFO mapreduce.JobSubmitter: number of splits:1
16/07/17 15:46:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468784561982_0001
16/07/17 15:46:01 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 172.16.102.137:8020, Ident: (HDFS_DELEGATION_TOKEN token 27 for hduser)
16/07/17 15:46:03 INFO impl.YarnClientImpl: Submitted application application_1468784561982_0001
16/07/17 15:46:03 INFO mapreduce.Job: The url to track the job: http://master.company.com:8088/proxy/application_1468784561982_0001/
16/07/17 15:46:03 INFO mapreduce.Job: Running job: job_1468784561982_0001
16/07/17 15:46:19 INFO mapreduce.Job: Job job_1468784561982_0001 running in uber mode : false
16/07/17 15:46:19 INFO mapreduce.Job: map 0% reduce 0%
16/07/17 15:46:26 INFO mapreduce.Job: map 100% reduce 0%
16/07/17 15:46:34 INFO mapreduce.Job: map 100% reduce 100%
16/07/17 15:46:34 INFO mapreduce.Job: Job job_1468784561982_0001 completed successfully