VMware Enterprise PKS cluster creation fails

While working on my PKS lab recently, I ran into an issue where PKS cluster creation kept failing, and the error message did not help identify the cause. This post describes how I fixed it.

PKS cluster creation fails with error “1 of 5 post-start scripts failed. Failed Jobs: kubelet”

Symptoms

  • Enterprise PKS cluster creation fails with the error “1 of 5 post-start scripts failed. Failed Jobs: kubelet”.
  • The master and worker nodes are created successfully and are in a running state.

The BOSH task output shows the following error:

Task 2060 | 07:01:49 | Preparing deployment: Preparing deployment
Task 2060 | 07:01:53 | Warning: DNS address not available for the link provider instance: pivotal-container-service/ad445ac4-6a37-47e2-a4dd-61e085804f27
Task 2060 | 07:01:53 | Warning: DNS address not available for the link provider instance: pivotal-container-service/ad445ac4-6a37-47e2-a4dd-61e085804f27
Task 2060 | 07:01:54 | Warning: DNS address not available for the link provider instance: pivotal-container-service/ad445ac4-6a37-47e2-a4dd-61e085804f27
Task 2060 | 07:02:07 | Preparing deployment: Preparing deployment (00:00:18)
Task 2060 | 07:02:07 | Preparing deployment: Rendering templates (00:00:09)
Task 2060 | 07:02:16 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 2060 | 07:02:16 | Creating missing vms: master/25edd43e-c566-41c8-930e-b8f0e010bf3e (0)
Task 2060 | 07:02:16 | Creating missing vms: worker/4e598359-a5bc-4544-bfd9-ff6071658522 (0)
Task 2060 | 07:04:45 | Creating missing vms: master/25edd43e-c566-41c8-930e-b8f0e010bf3e (0) (00:02:29)
Task 2060 | 07:04:58 | Creating missing vms: worker/4e598359-a5bc-4544-bfd9-ff6071658522 (0) (00:02:42)
Task 2060 | 07:04:58 | Updating instance master: master/25edd43e-c566-41c8-930e-b8f0e010bf3e (0) (canary) (00:06:14)
Task 2060 | 07:11:12 | Updating instance worker: worker/4e598359-a5bc-4544-bfd9-ff6071658522 (0) (canary) (00:35:32)
                     L Error: Action Failed get_task: Task eb75b17c-0027-42bc-44ab-358733b8d976 result: 1 of 5 post-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns, telemetry-agent-image, wavefront-proxy-images, sink-resources-images.
Task 2060 | 07:46:44 | Error: Action Failed get_task: Task eb75b17c-0027-42bc-44ab-358733b8d976 result: 1 of 5 post-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns, telemetry-agent-image, wavefront-proxy-images, sink-resources-images.

Task 2060 Started  Tue Jul  2 07:01:49 UTC 2019
Task 2060 Finished Tue Jul  2 07:46:44 UTC 2019
Task 2060 Duration 00:44:55
Task 2060 error

Capturing task '2060' output:
Expected task '2060' to succeed but state is 'error'

Exit code 1
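
When the task output is long, it can help to pull the failing job names out of the error line programmatically. The following is a minimal Python sketch, assuming the error line has exactly the format shown in the task output above; the function name `parse_post_start_failure` is hypothetical and not part of BOSH or PKS.

```python
import re

# Matches the post-start summary as it appears in the task output above.
# Assumption: job names never contain a period.
_PATTERN = re.compile(
    r"(?P<failed>\d+) of (?P<total>\d+) post-start scripts failed\. "
    r"Failed Jobs: (?P<failed_jobs>.+?)\. "
    r"Successful Jobs: (?P<ok_jobs>.+?)\.\s*$"
)


def parse_post_start_failure(line):
    """Return failed and successful job names from an error line, or None."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return {
        "failed_count": int(match.group("failed")),
        "total": int(match.group("total")),
        "failed_jobs": [j.strip() for j in match.group("failed_jobs").split(",")],
        "successful_jobs": [j.strip() for j in match.group("ok_jobs").split(",")],
    }
```

Feeding it the last error line of task 2060 would report `kubelet` as the only failed job, which points to the kubelet logs on the worker as the next place to look.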

The kubelet.stderr.log file, located in /var/vcap/sys/log/kubelet/ on the worker node, contains:

Failed to patch IP as MAC address 02:42:29:34:95:46 does not belong to a VMware platform

Cause

This issue can occur when the ephemeral disk fills up before the addons errand runs.
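
To confirm that a full ephemeral disk is the cause, the disk usage on the worker can be checked before changing anything. Below is a minimal Python sketch using only the standard library. It assumes that Python 3 is available on the worker VM and that the ephemeral disk is mounted at /var/vcap/data, which is the usual location on BOSH-deployed VMs; verify both in your environment. The function names and the 90% threshold are illustrative choices.

```python
import shutil


def disk_usage_percent(path):
    """Return the used space of the filesystem containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold_percent=90.0):
    """Return True when usage is at or above the (arbitrary) threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Assumption: /var/vcap/data is the ephemeral disk mount point.
    mount = "/var/vcap/data"
    print(f"{mount}: {disk_usage_percent(mount):.1f}% used")
```

If the reported usage is close to 100%, increasing the ephemeral disk size as described in the resolution is the likely fix.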

Resolution

Use the following steps to work around the issue:

  1. Log in to the Ops Manager GUI.
  2. On the PKS tile, open the appropriate plan and increase the ephemeral disk size by changing the Errand vm_type for the workers.
  3. Apply the changes.
  4. Recreate the cluster.
