NSX-T Logical switches get stuck in “In Progress” state

Recently I ran into an Enterprise PKS cluster creation failure that turned out to be caused by NSX-T: newly created logical switches went into the “In Progress” state and never left it. On further investigation, we traced the problem to security hardening that had been applied to the ESXi hosts. I reproduced the issue, and here are the details.

Symptoms:

  • Newly created logical switches get stuck in the “In Progress” state
  • The issue is observed with NSX-T 2.4.0 and 2.4.1

The customer had recently enabled Lockdown mode on the ESXi hosts as part of security hardening. To reproduce the issue, I enabled Lockdown mode on my own ESXi hosts.

Before enabling Lockdown mode, I checked the state of the existing logical switches in NSX-T and confirmed that they were healthy.

I then enabled Lockdown mode from the DCUI on the ESXi hosts that are NSX-T transport nodes.

Next, I created some test logical switches from the NSX-T Manager.

In the screenshot below, you can see the newly created logical switches go into the Pending state and then into the In Progress state. They get stuck there and do not progress any further.
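If you would rather check this from a script than from the UI, here is a rough sketch of how it could be done. It is not something I used during the troubleshooting. It assumes the NSX-T Manager REST API exposes `/api/v1/logical-switches` and `/api/v1/logical-switches/<id>/state`, that basic authentication is enabled, and that the state strings are `pending` and `in_progress`; verify all of this against the API guide for your version. The function names are my own.

```python
import base64
import json
import urllib.request

# Assumption: state strings reported for a switch that is not realized yet.
UNREALIZED_STATES = {"pending", "in_progress"}


def unrealized_switches(states):
    """Return the sorted names of switches still pending or in progress.

    `states` maps a switch display name to its reported state string.
    """
    return sorted(
        name for name, state in states.items()
        if state.lower() in UNREALIZED_STATES
    )


def _get_json(url, user, password):
    """GET a URL with HTTP basic authentication and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_switch_states(manager, user, password):
    """Build a {display_name: state} mapping from the manager.

    The endpoint paths and field names used here are assumptions.
    """
    base = f"https://{manager}/api/v1/logical-switches"
    states = {}
    for switch in _get_json(base, user, password).get("results", []):
        state = _get_json(f"{base}/{switch['id']}/state", user, password)
        name = switch.get("display_name", switch["id"])
        states[name] = state.get("state", "unknown")
    return states
```

Calling `unrealized_switches(fetch_switch_states(...))` a few minutes apart should return the same names for as long as the switches are stuck. A manager with a self-signed certificate will be rejected by the default TLS verification, so a lab setup would need its certificate trusted first.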

The /var/log/nsxaVim.log file on the ESXi host reports NotAuthenticated and NoPermission faults, as shown below.

[root@esxi-1:/var/log] less nsxaVim.log
 2019-08-21T02:06:50Z nsxaVim: [2100253]: ERROR status changed from [ready] to [error: waitForUpdate failed: {'msg': 'The session is not authenticated.', 'faultMessage': [], 'fault': 'NotAuthenticated'}, reconnecting…]^@
 2019-08-21T02:06:54Z nsxaVim: [2100253]: ERROR status changed from [error: waitForUpdate failed: {'msg': 'The session is not authenticated.', 'faultMessage': [], 'fault': 'NotAuthenticated'}, reconnecting…] to [error: failed to connect to hostd: {'msg': 'Permission to perform this operation was denied.', 'faultMessage': [], 'fault': 'NoPermission'}]^@
 2019-08-21T02:10:34Z nsxaVim: [2100253]: INFO data size= [161]. actual data length = [161]^@
 2019-08-21T02:10:34Z nsxaVim: [2100253]: INFO Result msg:[b"error: not ready. reason: error: failed to connect to hostd: {'msg': 'Permission to perform this operation was denied.', 'faultMessage': [], 'fault': 'NoPermission'}"]^@
 2019-08-21T02:10:34Z nsxaVim: [2100253]: INFO Reading data from socket.^@
 2019-08-21T02:19:43Z nsxaVim: [2100253]: INFO data size= [161]. actual data length = [161]^@
 2019-08-21T02:19:43Z nsxaVim: [2100253]: INFO Result msg:[b"error: not ready. reason: error: failed to connect to hostd: {'msg': 'Permission to perform this operation was denied.', 'faultMessage': [], 'fault': 'NoPermission'}"]^@
 2019-08-21T02:19:43Z nsxaVim: [2100253]: INFO Reading data from socket.^@
 2019-08-21T02:21:58Z nsxaVim: [3618751]: INFO resync.py started^@
 2019-08-21T02:21:58Z nsxaVim: [3618751]: INFO [resync] resync.py started^@
 2019-08-21T02:21:58Z nsxaVim: [3618751]: INFO setProcessName to [resync.py]^@
 2019-08-21T02:21:58Z nsxaVim: [3618751]: INFO [resync] setProcessName to [resync.py]^@
 2019-08-21T02:22:02Z nsxaVim: [3618751]: ERROR Failed to connect to hostd: [{'msg': 'Permission to perform this operation was denied.', 'fault': 'NoPermission', 'faultMessage': []}]^@
 2019-08-21T02:22:02Z nsxaVim: [3618751]: ERROR [resync] Failed to connect to hostd: [{'msg': 'Permission to perform this operation was denied.', 'fault': 'NoPermission', 'faultMessage': []}]^@
 2019-08-21T02:22:02Z nsxaVim: [3618751]: INFO resync.py replied error: [failed to connect to hostd: {'msg': 'Permission to perform this operation was denied.', 'fault': 'NoPermission', 'faultMessage': []}]^@
 2019-08-21T02:22:02Z nsxaVim: [3618751]: INFO [resync] resync.py replied error: [failed to connect to hostd: {'msg': 'Permission to perform this operation was denied.', 'fault': 'NoPermission', 'faultMessage': []}]^@
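When there are many hosts to check, counting the faults is quicker than paging through each log. The sketch below is a minimal example that assumes the log lines follow the layout shown above (timestamp, `nsxaVim:`, PID in brackets, level, message). The function names are hypothetical, and I have only matched it against the excerpt in this post.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: "<timestamp> nsxaVim: [<pid>]: <LEVEL> <message>"
LINE_RE = re.compile(
    r"^\s*(?P<ts>\S+)\s+nsxaVim:\s+\[(?P<pid>\d+)\]:\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)
# Fault names appear in the message as 'fault': 'NoPermission'.
FAULT_RE = re.compile(r"'fault':\s*'(?P<fault>\w+)'")


def summarize_faults(lines):
    """Count fault names (e.g. NoPermission) across nsxaVim log lines."""
    counts = Counter()
    for raw in lines:
        # Drop the NUL terminator, which `less` displays as "^@".
        line = raw.replace("\x00", "").rstrip()
        if line.endswith("^@"):
            line = line[:-2]
        match = LINE_RE.match(line)
        if not match:
            continue  # Skip anything that does not look like a log entry.
        for fault in FAULT_RE.findall(match.group("msg")):
            counts[fault] += 1
    return counts


def summarize_file(path):
    """Read a log file and return the fault counts."""
    with open(path, errors="replace") as handle:
        return summarize_faults(handle)
```

For example, `summarize_file("/var/log/nsxaVim.log")` returns a `Counter` keyed by fault name. A steadily growing NoPermission count while Lockdown mode is enabled matches the behaviour described in this post.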

Now let’s disable Lockdown mode on the ESXi hosts.

As you can see below, the Test 4 and Test 5 logical switches were created successfully after Lockdown mode was disabled on the ESXi hosts.

Summary:

To conclude, this is a known issue with NSX-T 2.4.0 and 2.4.1, and it should be resolved in a future NSX-T release. To work around it, disable Lockdown mode on the ESXi hosts.

