Dragonflow Security Group Implementation, Part 3 (Details)

Scenario: traffic is initiated by another instance and this instance only passively replies — how the security groups behave in this case

Ingress security group

The ingress security group is the one that controls traffic entering the instance; it is analyzed in detail below. There are several ways to slice the analysis; here it is organized by traffic phase: before the ingress rule is opened, the first packet after the rule is opened, and the traffic after the session is established. Walking through these phases shows the whole process of the security group taking effect and makes it easier to understand.

Before the ingress rule is opened

cookie=0x0, duration=2362.677s, table=105, n_packets=1563, n_bytes=153174, idle_age=0, priority=100,ip,reg7=0xa3 actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=4987.907s, table=110, n_packets=662, n_bytes=71525, idle_age=334, priority=1 actions=drop

Based on the marker reg7=0xa3, which tags traffic flowing into the instance, the packet is identified as needing to go through the ingress security group pipeline.

The ct() action injects the traffic into table 110 for processing, using the network metadata as the conntrack zone; the sessions created by the security group are therefore partitioned per network (conntrack entries from different networks cannot collide, because they live in different zones).

Since no ingress rule has been opened, the packet can only hit the default drop flow and is discarded.
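For reference, the flow dumps quoted throughout this article can be reproduced with ovs-ofctl; the bridge name br-int below is an assumption (Dragonflow's usual integration bridge), adjust it to your deployment:

# entry point into the ingress security group and the ingress rule table
sudo ovs-ofctl dump-flows br-int table=105
sudo ovs-ofctl dump-flows br-int table=110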

First packet after the rule is opened

cookie=0x0, duration=2362.677s, table=105, n_packets=1563, n_bytes=153174, idle_age=0, priority=100,ip,reg7=0xa3 actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=2093.711s, table=110, n_packets=4, n_bytes=392, idle_age=51, priority=11,conj_id=9,ip actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=2204.825s, table=110, n_packets=0, n_bytes=0, idle_age=2204, priority=11,ct_state=+new-est-rel-inv+trk,reg7=0xa3 actions=conjunction(9,1/2)
cookie=0xb00000000, duration=1291.328s, table=110, n_packets=0, n_bytes=0, idle_age=1291, priority=11,icmp actions=conjunction(9,2/2)

At this point the security group rule has been opened.

Based on the marker reg7=0xa3, the packet is again sent into the ingress security group pipeline.

In table 110, where the security group decision is made, there is a special match condition: conj_id=9. It means the real match conditions still live in this table; the packet is further matched against the flows whose actions are conjunction(9,1/2) and conjunction(9,2/2). Only if both conditions are hit does the conj_id=9 flow fire: the session is committed to conntrack and the traffic is handed to table 115 to be sent out.

conjunction(9,1/2): ct_state=+new-est-rel-inv+trk — the session is not yet established (a session becomes est only after one packet in each direction), the packet is not related to another session, it is not in an invalid state (OVS trouble or a packet that fails the state check), and it is tracked, i.e. it is the packet re-injected after ct() rather than something that wandered in from elsewhere.

conjunction(9,2/2): the protocol condition of the security group rule.

In short, conjunction(9,1/2) is the state check from the session's point of view, and conjunction(9,2/2) is the protocol whitelist check.

conj_id=9 means that both conjunction(9,1/2) and conjunction(9,2/2) must be satisfied at the same time before the flow's actions are executed.

After this step a conntrack entry exists, but it is still in the initial +new state; if the conditions are not met, commit is never executed and no conntrack entry is created.

stack@p-controller:~/dragonflow$ sudo conntrack -E -e ALL|grep 2.2.2.6

[NEW] icmp 1 30 src=2.2.2.10 dst=2.2.2.6 type=8 code=0 id=43521 [UNREPLIED] src=2.2.2.6 dst=2.2.2.10 type=0 code=0 id=43521 zone=11

So when does the session created by the security group get its state updated? A brief aside here, since it does not strictly fit the ingress traffic phases: the session is only updated after the instance receives the traffic and replies according to the protocol; that reply goes through CT again and triggers the following update:

[UPDATE] icmp 1 30 src=2.2.2.10 dst=2.2.2.6 type=8 code=0 id=43521 src=2.2.2.6 dst=2.2.2.10 type=0 code=0 id=43521 zone=11
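Stepping back to the conjunctive-match mechanism itself: it is an OVS feature, not something Dragonflow-specific, and it can be tried in isolation. The sketch below uses a hypothetical scratch bridge br-test and made-up matches, purely to show the shape of the flows:

# all flows of one conjunction must share a priority;
# the conj_id flow carries the real actions, the conjunction() flows carry the two halves of the match
sudo ovs-vsctl add-br br-test
sudo ovs-ofctl add-flow br-test "table=0,priority=10,conj_id=9,ip,actions=normal"
sudo ovs-ofctl add-flow br-test "table=0,priority=10,ip,nw_src=10.0.0.0/24,actions=conjunction(9,1/2)"
sudo ovs-ofctl add-flow br-test "table=0,priority=10,icmp,actions=conjunction(9,2/2)"
# a packet counts as matching conj_id=9 only if it hits at least one flow from each clause
sudo ovs-ofctl dump-flows br-test
# clean up the demo bridge afterwards: sudo ovs-vsctl del-br br-test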

After the session is established

cookie=0x0, duration=2362.677s, table=105, n_packets=1563, n_bytes=153174, idle_age=0, priority=100,ip,reg7=0xa3 actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=4947.447s, table=110, n_packets=1470, n_bytes=144060, idle_age=0, priority=65534,ct_state=-new+est-rel-inv+trk actions=resubmit(,115)

Subsequent traffic after the rule is opened:

Based on the marker reg7=0xa3, the packet is sent into the ingress security group pipeline.

In the rule table 110, ct_state=-new+est-rel-inv+trk means the session is fully established; no further rule lookup is needed and the packet is passed straight through to the instance based on the existing session.

Egress security group

After the traffic reaches the instance, the instance replies to it, and the reply goes through the egress security group pipeline, analyzed in detail below. Again the analysis is organized by traffic phase. In this passive-reply scenario the phases are the first reply packet and the subsequent reply traffic. At the flow-table level the two phases look identical, but the underlying behaviour differs: the first reply packet flips the conntrack session from new to est, while subsequent replies stay in est with no state transition.

First reply packet

cookie=0x0, duration=9890.264s, table=10, n_packets=8389, n_bytes=822122, idle_age=0, priority=100,ip,reg6=0xa3 actions=ct(table=15,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=12447.715s, table=15, n_packets=8658, n_bytes=848484, idle_age=1, priority=65534,ct_state=-new+est-rel-inv+trk actions=resubmit(,20)

Based on the marker reg6=0xa3, the first packet is looked up in conntrack, which updates the session state to est; ct() then re-injects the packet into table 15, where the egress security group rules are evaluated.

Table 15 is where the egress security group rules are looked up. The rules fall into two kinds: state checks and security group condition checks. In the passive-reply scenario only the state check matters; the condition checks come into play in the actively-initiated scenario.

The state check is ct_state=-new+est-rel-inv+trk: as long as the packet belongs to an established session and is tracked (it was re-injected after ct() rather than arriving from somewhere else), the check passes, the egress security group evaluation is skipped, and the traffic is handed to the next stage (table 20).
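To confirm the state flip from the conntrack side, the session table can be inspected directly (zone 11 and the 2.2.2.x addresses are the example values used earlier in this article):

sudo ovs-appctl dpctl/dump-conntrack | grep "zone=11"
sudo conntrack -L -p icmp | grep 2.2.2.6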

Subsequent reply traffic

cookie=0x0, duration=9890.264s, table=10, n_packets=8389, n_bytes=822122, idle_age=0, priority=100,ip,reg6=0xa3 actions=ct(table=15,zone=OXM_OF_METADATA[0..15])

cookie=0x0, duration=12447.715s, table=15, n_packets=8658, n_bytes=848484, idle_age=1, priority=65534,ct_state=-new+est-rel-inv+trk actions=resubmit(,20)

Since the session is already held in the est state, the packet passes the state check in table 15 directly, leaves the security group pipeline, and is handed to table 20.

Scenario: traffic is initiated by the current instance and the replies come back to it — how the security groups behave in this case

In this scenario the traffic goes through the instance's egress security group first and then through its ingress security group, exactly the opposite of the passive-reply scenario.

Egress security group

By default all egress traffic is allowed, so the traffic-phase analysis can be simplified to two phases: the first packet through the open rule, and the traffic after the session is established.

First packet through the open rule

cookie=0x0, duration=11539.315s, table=10, n_packets=8580, n_bytes=840840, idle_age=5, priority=100,ip,reg6=0xa3 actions=ct(table=15,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=11264.259s, table=15, n_packets=258, n_bytes=25284, idle_age=5, priority=11,conj_id=9,ip actions=ct(commit,table=20,zone=NXM_NX_CT_ZONE[])
cookie=0xa00000000, duration=11366.388s, table=15, n_packets=0, n_bytes=0, idle_age=11366, priority=11,ip actions=conjunction(9,2/2)
cookie=0x0, duration=11406.839s, table=15, n_packets=0, n_bytes=0, idle_age=11406, priority=11,ct_state=+new-est-rel-inv+trk,reg6=0xa3 actions=conjunction(9,1/2)

conjunction(9,2/2): the security group protocol condition, here allowing all IP packets; every rule added to the security group is reflected in this set of flows.

conjunction(9,1/2): the check from the session's point of view; its main purpose is that once the session is established, later packets no longer need to be matched against the protocol conditions.
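To watch which of the two paths the traffic actually takes (the conj_id=9 commit flow for the first packet versus the established fast path afterwards), the packet counters can be observed while generating traffic (br-int assumed):

watch -n1 'sudo ovs-ofctl dump-flows br-int table=15 | grep -E "conj_id=9|ct_state=-new\+est"'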

After the session is established

cookie=0x0, duration=11539.315s, table=10, n_packets=8580, n_bytes=840840, idle_age=5, priority=100,ip,reg6=0xa3 actions=ct(table=15,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=22154.590s, table=15, n_packets=8869, n_bytes=869162, idle_age=1, priority=65534,ct_state=-new+est-rel-inv+trk actions=resubmit(,20)

Once the traffic sent by the instance is replied to, the session state changes to est, and subsequent traffic is handled purely by looking at the session state.

Ingress security group

cookie=0x0, duration=31.769s, table=105, n_packets=13, n_bytes=1274, idle_age=1, priority=100,ip,reg7=0xa3 actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=55.210s, table=110, n_packets=34, n_bytes=3332, idle_age=0, priority=65534,ct_state=-new+est-rel-inv+trk actions=resubmit(,115)

Cookie value: in table 15 and table 110, the protocol-allow flows (the ones feeding a conj_id) carry a non-zero cookie. This appears to be an extra convenience: the cookie makes it very easy to filter out exactly the security group flows you want to look at.
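For example, with the cookie/mask filter of ovs-ofctl (a mask of -1 means an exact cookie match; br-int is assumed):

sudo ovs-ofctl dump-flows br-int "cookie=0xb00000000/-1"
sudo ovs-ofctl dump-flows br-int "table=110,cookie=0xb00000000/-1"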

Default behaviour: all egress traffic is allowed, ingress traffic is not; if the instance is used as a server (ssh, http, and so on), the corresponding ingress rules have to be opened explicitly.
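For instance, with the standard OpenStack CLI (a hedged example; exact option syntax may vary between releases), inbound ssh and ICMP could be opened on the default group like this:

openstack security group rule create --ingress --protocol tcp --dst-port 22 default
openstack security group rule create --ingress --protocol icmp default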

The security group feature is a filtering policy applied at the IP layer.

Do security group flows have different priorities?

They certainly do. Security groups are assigned priorities in creation order, growing from low to high, so a lookup starts from the highest-priority group and, with bad luck, has to walk through all of them.
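One way to see that ordering is to print the table sorted by priority, highest first (br-int assumed):

sudo ovs-ofctl --rsort=priority dump-flows br-int table=110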

How are different security groups distinguished?

As shown above, conjunction(9,1/2) is the check from the session's point of view, but it also carries a register match such as reg6=0x9c (reg7 on the ingress side). This condition lets different instance ports hit different security group policies, and it is the key to telling security groups apart.
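So a practical way to pull out the flows belonging to one port's security group is to filter on that register (the values below are the examples used in this article; br-int is assumed):

sudo ovs-ofctl dump-flows br-int "table=110,reg7=0xa3"
sudo ovs-ofctl dump-flows br-int "table=15,reg6=0xa3"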

What does a security group look like in etcd?

[('{"name": "default", "unique_key": 4, "rules": [
{"direction": "ingress", "protocol": 6, "ethertype": "IPv4", "port_range_max": 65535, "security_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "topic": "5e347d7b2f7541198031e12916acaa0b", "version": 0, "port_range_min": 1, "remote_ip_prefix": "0.0.0.0/0", "id": "19bf513a-89a0-4f2e-91a8-53e7e11c0a7c"},

{"direction": "egress", "ethertype": "IPv4", "security_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "topic": "5e347d7b2f7541198031e12916acaa0b", "version": 0, "id": "6384d6b0-684d-4102-a2f0-44ae2abff45a"},
{"direction": "egress", "ethertype": "IPv6", "security_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "topic": "5e347d7b2f7541198031e12916acaa0b", "version": 0, "id": "63902a8d-375b-485e-ba44-d5679d16128a"},

{"remote_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "direction": "ingress", "ethertype": "IPv4", "security_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "topic": "5e347d7b2f7541198031e12916acaa0b", "version": 0, "id": "79c9bca5-ec8f-4e8c-bb6b-059cac18490c"},

{"direction": "ingress", "protocol": 1, "ethertype": "IPv4", "security_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "topic": "5e347d7b2f7541198031e12916acaa0b", "version": 0, "remote_ip_prefix": "0.0.0.0/0", "id": "d3116bc4-614f-4e3e-889b-562261714e24"},

{"remote_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "direction": "ingress", "ethertype": "IPv6", "security_group_id": "253df9b8-3e41-4153-ae43-6c4990dcd74f", "topic": "5e347d7b2f7541198031e12916acaa0b", "version": 0, "id": "f95d6c8e-b4f9-4de0-b16f-87a9bdb4513b"}],

"topic": "5e347d7b2f7541198031e12916acaa0b", "version": 6, "id": "253df9b8-3e41-4153-ae43-6c4990dcd74f"}’, {u'mod_revision': u'44933', u'create_revision': u'621', u'version': u'3', u'key': '/secgroup/253df9b8-3e41-4153-ae43-6c4990dcd74f’}

)]

How are remote security groups implemented?

The setup used to study remote security groups: two instances in the same VPC with different security groups (group default and group xxxx). Both groups allow actively-initiated outbound traffic; in addition, only one remote-security-group rule is opened — an ingress rule whose remote group is default, i.e. it allows access from instances belonging to the default group.

So how is this implemented?

Let's look at how the ingress flows change.

Before the remote security group rule is pushed:

cookie=0x0, duration=422.928s, table=105, n_packets=375, n_bytes=36750, idle_age=13, priority=100,ip,reg7=0xaa actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=422.928s, table=105, n_packets=0, n_bytes=0, idle_age=422, priority=100,ipv6,reg7=0xaa actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=422.828s, table=105, n_packets=262, n_bytes=25676, idle_age=0, priority=100,ip,reg7=0xad actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=422.828s, table=105, n_packets=0, n_bytes=0, idle_age=422, priority=100,ipv6,reg7=0xad actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=425.459s, table=110, n_packets=495, n_bytes=48510, idle_age=13, priority=65534,ct_state=-new+est-rel-inv+trk actions=resubmit(,115)
cookie=0x0, duration=425.459s, table=110, n_packets=0, n_bytes=0, idle_age=425, priority=65534,ct_state=-new+rel-inv+trk actions=resubmit(,115)
cookie=0x0, duration=425.459s, table=110, n_packets=0, n_bytes=0, idle_age=425, priority=65534,ct_state=+new+rel-inv+trk,ip actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=425.459s, table=110, n_packets=0, n_bytes=0, idle_age=425, priority=65534,ct_state=+new+rel-inv+trk,ipv6 actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=425.459s, table=110, n_packets=0, n_bytes=0, idle_age=425, priority=65534,ct_state=+inv+trk actions=drop
cookie=0x0, duration=422.928s, table=110, n_packets=0, n_bytes=0, idle_age=422, priority=6,conj_id=4,ip actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=422.928s, table=110, n_packets=0, n_bytes=0, idle_age=422, priority=6,conj_id=4,ipv6 actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=422.828s, table=110, n_packets=3, n_bytes=294, idle_age=78, priority=13,conj_id=11,ip actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=422.828s, table=110, n_packets=0, n_bytes=0, idle_age=422, priority=13,conj_id=11,ipv6 actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=422.928s, table=110, n_packets=0, n_bytes=0, idle_age=422, priority=6,ct_state=+new-est-rel-inv+trk,reg7=0xaa actions=conjunction(4,1/2)
cookie=0x0, duration=422.828s, table=110, n_packets=0, n_bytes=0, idle_age=422, priority=13,ct_state=+new-est-rel-inv+trk,reg7=0xad actions=conjunction(11,1/2)
cookie=0x0, duration=425.459s, table=110, n_packets=139, n_bytes=13622, idle_age=0, priority=1 actions=drop

After the remote security group rule is pushed:

cookie=0x0, duration=361.237s, table=105, n_packets=326, n_bytes=31948, idle_age=1, priority=100,ip,reg7=0xaa actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=361.237s, table=105, n_packets=0, n_bytes=0, idle_age=361, priority=100,ipv6,reg7=0xaa actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=361.137s, table=105, n_packets=200, n_bytes=19600, idle_age=1, priority=100,ip,reg7=0xad actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=361.137s, table=105, n_packets=0, n_bytes=0, idle_age=361, priority=100,ipv6,reg7=0xad actions=ct(table=110,zone=OXM_OF_METADATA[0..15])
cookie=0x0, duration=363.768s, table=110, n_packets=397, n_bytes=38906, idle_age=1, priority=65534,ct_state=-new+est-rel-inv+trk actions=resubmit(,115)
cookie=0x0, duration=363.768s, table=110, n_packets=0, n_bytes=0, idle_age=363, priority=65534,ct_state=-new+rel-inv+trk actions=resubmit(,115)
cookie=0x0, duration=363.768s, table=110, n_packets=0, n_bytes=0, idle_age=363, priority=65534,ct_state=+new+rel-inv+trk,ip actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=363.768s, table=110, n_packets=0, n_bytes=0, idle_age=363, priority=65534,ct_state=+new+rel-inv+trk,ipv6 actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=363.768s, table=110, n_packets=0, n_bytes=0, idle_age=363, priority=65534,ct_state=+inv+trk actions=drop
cookie=0x0, duration=361.237s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=6,conj_id=4,ip actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=361.237s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=6,conj_id=4,ipv6 actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=361.137s, table=110, n_packets=3, n_bytes=294, idle_age=17, priority=13,conj_id=11,ip actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x0, duration=361.137s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=13,conj_id=11,ipv6 actions=ct(commit,table=115,zone=NXM_NX_CT_ZONE[])
cookie=0x800000000, duration=361.137s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=13,icmp,nw_src=192.168.58.107 actions=conjunction(11,2/2)
cookie=0x800000000, duration=361.137s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=13,icmp,nw_src=192.168.58.109 actions=conjunction(11,2/2)
cookie=0x0, duration=361.237s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=6,ct_state=+new-est-rel-inv+trk,reg7=0xaa actions=conjunction(4,1/2)
cookie=0x0, duration=361.137s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=13,ct_state=+new-est-rel-inv+trk,reg7=0xad actions=conjunction(11,1/2)
cookie=0x0, duration=363.768s, table=110, n_packets=126, n_bytes=12348, idle_age=22, priority=1 actions=drop

Comparing the two dumps, the difference is:

cookie=0x800000000, duration=361.137s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=13,icmp,nw_src=192.168.58.107 actions=conjunction(11,2/2)

cookie=0x800000000, duration=361.137s, table=110, n_packets=0, n_bytes=0, idle_age=361, priority=13,icmp,nw_src=192.168.58.109 actions=conjunction(11,2/2)

If you access the instance in security group xxxx, packets with source address 192.168.58.107 or 192.168.58.109 are now allowed in. Both of those source instances belong to security group default, so this is equivalent to opening access for the default group. In one sentence: remote security groups are implemented through source-address match conditions. Digging further shows that Dragonflow also handles the case where there are many source IP addresses: addresses that fall in the same range are aggregated with a mask to reduce the number of flow entries.
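For reference, one way to capture comparable snapshots of table 110 before and after pushing the rule is to strip the counters that always differ and sort the rest (br-int assumed):

sudo ovs-ofctl dump-flows br-int table=110 | sed -e 's/duration=[^,]*, //' -e 's/n_packets=[^,]*, //' -e 's/n_bytes=[^,]*, //' -e 's/idle_age=[^,]*, //' | sort > table110-before.txt
# ...add the remote security group rule, repeat the command into table110-after.txt...
diff table110-before.txt table110-after.txt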

How entries are reduced

e.g. 192.168.58.110 and 192.168.58.111 are replaced by the single condition 192.168.58.110/31

cookie=0xa00000000, duration=149.523s, table=110, n_packets=0, n_bytes=0, idle_age=149, priority=13,icmp,nw_src=192.168.58.110/31 actions=conjunction(11,2/2)
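As an illustration of the arithmetic only (this is not Dragonflow's actual code path), Python's standard ipaddress module collapses the two /32 hosts into the same /31:

python3 -c "
import ipaddress
hosts = ['192.168.58.110/32', '192.168.58.111/32']
print(list(ipaddress.collapse_addresses(map(ipaddress.ip_network, hosts))))
"
# prints: [IPv4Network('192.168.58.110/31')]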

Is a remote security group only pushed within the same subnet?

No. Remote security groups have nothing to do with subnets: Dragonflow does not care whether the instances using the group are in the same subnet, it only decides which flows to add based on their source IPs. As a result you can sometimes see rules that are open but never hit. That is understandable if you think about it: a security group policy cannot intelligently (or arbitrarily) work out the real connectivity between instances, so it can only implement the feature in a one-size-fits-all way.

Where does the security group code live?

sg.py

A problem encountered

2018-04-19 06:52:09.803 DEBUG dragonflow.db.api_nb [-] Could not get object 487e697b-d4e7-4471-b958-22b1367c1bd8 from table lport from (pid=28510) get /opt/stack/dragonflow/dragonflow/db/api_nb.py:296
2018-04-19 06:52:09.803 DEBUG dragonflow.db.api_nb [-] ('Traceback (most recent call last):\n File "/opt/stack/dragonflow/dragonflow/db/api_nb.py", line 290, in get\n _get_topic(lean_obj),\n File "/opt/stack/dragonflow/dragonflow/db/drivers/etcd_db_driver.py", line 117, in get_key\n return self._get_key(self._make_key(table, key), key)\n File "/opt/stack/dragonflow/dragonflow/db/drivers/etcd_db_driver.py", line 123, in _get_key\n raise df_exceptions.DBKeyNotFound(key=key)\nDBKeyNotFound: DB Key not found, key=487e697b-d4e7-4471-b958-22b1367c1bd8\n',) from (pid=28510) get /opt/stack/dragonflow/dragonflow/db/api_nb.py:297
2018-04-19 06:52:09.803 WARNING dragonflow.controller.topology [-] No logical port found for ovs port: OvsPort object

2018-04-19 06:52:09.807 DEBUG dragonflow.db.api_nb [-] Could not get object bf8c2c33-b61f-4bf1-937d-ce48ec63439c from table lport from (pid=28510) get /opt/stack/dragonflow/dragonflow/db/api_nb.py:296
2018-04-19 06:52:09.807 DEBUG dragonflow.db.api_nb [-] ('Traceback (most recent call last):\n File "/opt/stack/dragonflow/dragonflow/db/api_nb.py", line 290, in get\n _get_topic(lean_obj),\n File "/opt/stack/dragonflow/dragonflow/db/drivers/etcd_db_driver.py", line 117, in get_key\n return self._get_key(self._make_key(table, key), key)\n File "/opt/stack/dragonflow/dragonflow/db/drivers/etcd_db_driver.py", line 123, in _get_key\n raise df_exceptions.DBKeyNotFound(key=key)\nDBKeyNotFound: DB Key not found, key=bf8c2c33-b61f-4bf1-937d-ce48ec63439c\n',) from (pid=28510) get /opt/stack/dragonflow/dragonflow/db/api_nb.py:297
2018-04-19 06:52:09.807 WARNING dragonflow.controller.topology [-] No logical port found for ovs port: OvsPort object

2018-04-19 06:52:09.817 DEBUG dragonflow.db.api_nb [-] Could not get object 71c111ac-3eac-4dba-91c6-6195b50091a4 from table lport from (pid=28510) get /opt/stack/dragonflow/dragonflow/db/api_nb.py:296
2018-04-19 06:52:09.817 DEBUG dragonflow.db.api_nb [-] ('Traceback (most recent call last):\n File "/opt/stack/dragonflow/dragonflow/db/api_nb.py", line 290, in get\n _get_topic(lean_obj),\n File "/opt/stack/dragonflow/dragonflow/db/drivers/etcd_db_driver.py", line 117, in get_key\n return self._get_key(self._make_key(table, key), key)\n File "/opt/stack/dragonflow/dragonflow/db/drivers/etcd_db_driver.py", line 123, in _get_key\n raise df_exceptions.DBKeyNotFound(key=key)\nDBKeyNotFound: DB Key not found, key=71c111ac-3eac-4dba-91c6-6195b50091a4\n',) from (pid=28510) get /opt/stack/dragonflow/dragonflow/db/api_nb.py:297
2018-04-19 06:52:09.817 WARNING dragonflow.controller.topology [-] No logical port found for ovs port: OvsPort object
(Pdb) bt
/usr/local/bin/df-local-controller(10)<module>()
-> sys.exit(main())
/opt/stack/dragonflow/dragonflow/cmd/eventlet/df_local_controller.py(17)main()
-> df_local_controller.main()
/opt/stack/dragonflow/dragonflow/controller/df_local_controller.py(352)main()
-> controller.run()
/opt/stack/dragonflow/dragonflow/controller/df_local_controller.py(121)run()
-> self.nb_api.process_changes()
/opt/stack/dragonflow/dragonflow/db/api_nb.py(207)process_changes()
-> self._notification_cb(next_update)
/opt/stack/dragonflow/dragonflow/controller/df_local_controller.py(292)_handle_update()
-> self._handle_db_change(update)
/opt/stack/dragonflow/dragonflow/controller/df_local_controller.py(306)_handle_db_change()
-> self.sync()
/opt/stack/dragonflow/dragonflow/controller/df_local_controller.py(137)sync()
-> self.topology.check_topology_info()
/opt/stack/dragonflow/dragonflow/controller/topology.py(294)check_topology_info()
-> lport = self._get_lport(ovs_port)
/opt/stack/dragonflow/dragonflow/controller/topology.py(278)_get_lport()
-> lport = self.nb_api.get(ovs_port.lport)
> /opt/stack/dragonflow/dragonflow/db/api_nb.py(296)get()
-> LOG.debug(

(Pdb) ovs_port

OvsPort(attached_mac=EUI('fa:16:3e:f3:c7:0a'), id=u'db4b9d2d-1bb2-4648-9308-0e5aa32719bf', lport=LogicalPortProxy(id=487e697b-d4e7-4471-b958-22b1367c1bd8), name=u'tap487e697b-d4', ofport=-1, type=u'compute')

This problem has been tracked down. During check_topology_info, the controller uses the OVS port to look up etcd; if a port still exists in ovsdb but the actual device is already gone, there is no corresponding data in etcd, which produces the warnings above. The warnings themselves are harmless; the root cause is the "No such device" condition on the OVS port (note ofport=-1 in the OvsPort object above).
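One way to spot such stale ports is to list the Interface records in ovsdb; a port whose device has disappeared shows ofport=-1 together with a "No such device" error (column names are from the standard OVSDB schema):

sudo ovs-vsctl --columns=name,ofport,error list Interface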