Using Ansible on a Mac Pro to automatically monitor server process status

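This article grew out of my need to keep an eye on the service processes in a cloud development environment. That environment has no runtime monitoring deployed, so I wanted a script that runs a basic check before I start using it.

If a service is not in a usable state, the script corrects it. Running the script takes less than a second, which is enough to get the environment ready for use. You could of course ssh into each server and inspect the service processes one by one, but that costs a lot of repetitive work.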

Scenario

A Mac Pro running OS X needs to log in remotely to the controller server (Ubuntu 16.04) and run commands on it.

Preparation

Local hostname resolution for controller

cat /private/etc/hosts

192.168.4.101 controller
192.168.4.102 compute
192.168.4.104 compute02
192.168.4.103 network
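Before going further it can help to confirm that every host name really has an entry in the hosts file. The following is a minimal sketch; `missing_hosts` is a hypothetical helper name, and the file path and host names are passed in as arguments.

```shell
#!/bin/bash
# missing_hosts FILE NAME...
# Prints each NAME that has no entry in FILE.
# (Hypothetical helper for illustration; not part of any tool.)
missing_hosts() {
    local file=$1; shift
    local name
    for name in "$@"; do
        # Skip comment lines, then look for NAME as a whole field
        # after the IP address in the first column.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file"; then
            echo "$name"
        fi
    done
}
```

Called as `missing_hosts /private/etc/hosts controller compute compute02 network`, it prints nothing when all four names are present.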

Passwordless login to the controller server

Step 1: On the controller server, create authorized_keys if it does not exist yet:

touch /root/.ssh/authorized_keys

Step 2: Find the public key of the Mac Pro

➜  ~ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDH7uxw50HQG0SwvLCcHAgcCarLw5DP4gDqDDki/+E85STu5Di++u4F8SeHLphkMiuvqsDWZzzOwx7+H32JXDu+aev/A2a8vQ9TRnH257+n4SOjWQD07QsyDQ+U0A4I3oofXY2kul3KBeQ9f8z/2lW7yAN1AEeJ/SW+TFeNqvLlkHfGNOUsw6NmfH5uujxbhxIREB0T7kH9q+gjLVcyMgRYdCKk8fvdzWZ99w/+xuUhCkhs1kLdqgRWuqQ6iI9ZPmcZU7pJD3DDQPqIUUxzgGFRkb3SJ7ewczdKm0XV3BupmwRlEXvuS2o26zoVui7X1ndqahLdjQSH6ZGggn/w6KJX qinlong@QinlongdeMacBook-Pro.local

Step 3: Put the Mac Pro's public key into /root/.ssh/authorized_keys on the controller server

root@controller:~# cat /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDH7uxw50HQG0SwvLCcHAgcCarLw5DP4gDqDDki/+E85STu5Di++u4F8SeHLphkMiuvqsDWZzzOwx7+H32JXDu+aev/A2a8vQ9TRnH257+n4SOjWQD07QsyDQ+U0A4I3oofXY2kul3KBeQ9f8z/2lW7yAN1AEeJ/SW+TFeNqvLlkHfGNOUsw6NmfH5uujxbhxIREB0T7kH9q+gjLVcyMgRYdCKk8fvdzWZ99w/+xuUhCkhs1kLdqgRWuqQ6iI9ZPmcZU7pJD3DDQPqIUUxzgGFRkb3SJ7ewczdKm0XV3BupmwRlEXvuS2o26zoVui7X1ndqahLdjQSH6ZGggn/w6KJX qinlong@QinlongdeMacBook-Pro.local
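Steps 1 and 3 can be combined into one helper that is safe to run more than once on the controller. This is only a sketch: `add_pubkey` is a hypothetical function name, and the 700/600 permissions are an assumption based on what sshd usually requires under its default StrictModes setting. Where it is installed, OpenSSH's `ssh-copy-id root@controller`, run from the Mac, is a ready-made alternative as long as password login still works.

```shell
#!/bin/bash
# add_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Appends the public key to AUTHORIZED_KEYS_FILE unless that exact line
# is already present. (Hypothetical helper for illustration.)
add_pubkey() {
    local pubkey_file=$1 auth_file=$2
    local ssh_dir key
    ssh_dir=$(dirname "$auth_file")
    key=$(cat "$pubkey_file")

    # Create the directory and file with restrictive permissions.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file" && chmod 600 "$auth_file"

    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$key" "$auth_file"; then
        echo "$key" >> "$auth_file"
    fi
}
```

For example, after copying id_rsa.pub to the controller: `add_pubkey ./id_rsa.pub /root/.ssh/authorized_keys`.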

Installing Ansible on OS X

Install:

brew install ansible

Verify the installation:

➜  ~ ansible --version
ansible 2.3.0.0
config file =
configured module search path = Default w/o overrides
python version = 2.7.13 (default, Dec 18 2016, 07:03:39) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]
➜ ~
➜ ~

Configure the Ansible hosts file (the inventory) with the hosts you want to reach:

➜  ~ cat /usr/local/etc/ansible/hosts
[controller]
controller

Use Ansible to log in to controller as root and run the ls command

➜  ~ ansible controller  -u root  -m command -a "ls"
[WARNING]: Found both group and host with same name: controller

[WARNING]: Found both group and host with same name: network

controller | SUCCESS | rc=0 >>
1.sh
a
admin-openrc
arxan_0916
arxan_0916.tar.gz
arxan-manager_1.0.0-7_all.deb
a.tar.gz
br-sw-set.sh
b.tar.gz
centec_driver.py
cirros-0.3.4-x86_64-disk.img
demo
demo-openrc
etcd_2.2.2_amd64.deb
id_rsa.pub
neutron-l2-arxan-agent_2.1.1-14_all.deb
neutron-local-controller_2.1.1-14_all.deb
p.tar.gz
python-arxan_1.0.0-7_all.deb
python-dragonflow_2.1.1-14_all.deb
python-etcd_0.4.5-1_all.deb
python-ovsdbapp_0.4.0-0ubuntu2_all.deb
q.tar.gz
xcmdb.py
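The two warnings in the output above appear because the inventory uses the same name for a group and for a host. One way to avoid them is to give the group a name that no host uses. The sketch below writes such an inventory; the function name `write_inventory`, the group name `check`, and the output path are all illustrative assumptions.

```shell
#!/bin/bash
# write_inventory FILE
# Writes an example inventory whose group name differs from every host name.
# (The group name "check" is an assumption made for this example.)
write_inventory() {
    local inventory=$1
    cat > "$inventory" <<'EOF'
[check]
controller
EOF
}
```

After `write_inventory ./hosts.example`, the same ad-hoc command can target the group with `ansible check -i ./hosts.example -u root -m command -a "ls"`.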

Running a shell script on the cloud host

The script on the cloud host

root@controller:~# cat 1.sh
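#!/bin/bash
# Print the host name so each host's output can be told apart.
echo "$HOSTNAME"
echo "-----------------------------------------"
# Services to check.
arr_string=("rabbitmq-server" "etcd")

for var in "${arr_string[@]}"
do
    echo "$var"
    # Count the status lines that report "active (running)".
    wc_rab=$(systemctl status "$var" | grep -c "active (running)")

    # Restart the service when no such line is found.
    if [ "$wc_rab" -lt 1 ]
    then
        echo " $var is down, now restarting it"
        systemctl restart "$var"
    else
        echo "ok"
    fi
done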

The script checks the status of each service and restarts any that is not running. The services it watches are rabbitmq-server and etcd.
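The same check-and-restart logic can also be written as a function that takes the service names as arguments and signals through its exit status whether anything had to be restarted. This is a sketch rather than the script used above: `ensure_running` is a hypothetical name, and it relies on `systemctl is-active --quiet`, which exits with status 0 only when the unit is active.

```shell
#!/bin/bash
# ensure_running SERVICE...
# Restarts every listed systemd unit that is not active.
# Returns 0 if nothing needed a restart, 1 otherwise.
# (Hypothetical helper for illustration.)
ensure_running() {
    local svc restarted=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: ok"
        else
            echo "$svc: down, restarting"
            systemctl restart "$svc"
            restarted=$((restarted + 1))
        fi
    done
    [ "$restarted" -eq 0 ]
}
```

It would be called as `ensure_running rabbitmq-server etcd`. Asking systemd for the unit state directly avoids depending on the exact wording of the `systemctl status` output.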

Running the cloud host's script from the Mac Pro

ansible check  -u root  -m command -a "bash ./1.sh"
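The command above expects 1.sh to already be present in root's home directory on the remote host. Ansible's `script` module instead transfers a local script to the remote node and runs it there, so the script only needs to live on the Mac. A minimal sketch, assuming a local copy of the script and a hypothetical wrapper name `run_check`:

```shell
#!/bin/bash
# run_check TARGET LOCAL_SCRIPT
# Runs a script stored on the local machine on every host matched by TARGET,
# using Ansible's "script" module. (Hypothetical wrapper for illustration.)
run_check() {
    local target=$1 script=$2
    ansible "$target" -u root -m script -a "$script"
}
```

For example: `run_check check ./1.sh`.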