VCF Automation ran out of disk space? Fleet Manager's got you.

My life is full of equally true statements:

  • I have VCF Operations monitoring everything.
  • I have VCF Operations sending me Slack notifications.
  • Things go wrong in my lab all the time.
  • I ignore the alarms and notifications.
  • I wonder why things suddenly break just before a demo.

This post isn't here to teach you to pay attention to your alarms. It assumes that you're like me and need to know how to get out of the pickle your life choices have gotten you into! It all started the day before I needed to do an automation demo:

Oh no

I resorted to rebooting, and then rebooting again, to no avail. I ended up using a different approach for my demo, and a week or two later I decided to try to fix it.

What was actually broken

The alert in question

It turns out that there was a space alarm that I had ignored. I SSH'd into the VCF host and, sure enough, there was a full volume.

Oh no, a volume is full.

Not knowing what volume this was for, I did what any enterprising admin would do — I poked my nose where it doesn't belong.

A database mount point

Upon discovering that this was the mount point for the PostgreSQL instance that serves VCFA, it was clear why things were unhealthy.
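A check like this can be scripted so it runs before the next demo rather than after. The following is a minimal Python sketch, not necessarily the commands used on the appliance: it reads mount points from `/proc/mounts` (Linux only) and flags any filesystem at or above a threshold. The function names and the 90% threshold are assumptions made for illustration, and the percentages can differ slightly from what `df` reports because of reserved blocks.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total (0.0 for empty pseudo filesystems)."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def list_mount_points(mounts_file="/proc/mounts"):
    """Read mount points from a /proc/mounts-style file (second whitespace-separated field)."""
    points = []
    with open(mounts_file) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                points.append(fields[1])
    return points


def nearly_full(mount_points, threshold_pct=90.0, usage_fn=shutil.disk_usage):
    """Return (mount_point, percent_used) pairs at or above the threshold."""
    flagged = []
    for path in mount_points:
        try:
            usage = usage_fn(path)
        except OSError:
            # Unreadable mounts are skipped; so are paths with spaces,
            # which /proc/mounts stores in octal-escaped form.
            continue
        pct = percent_used(usage.total, usage.used)
        if pct >= threshold_pct:
            flagged.append((path, round(pct, 1)))
    return flagged


# Example call on a Linux host: nearly_full(list_mount_points())
```

The `usage_fn` parameter exists only so the logic can be exercised without a real full disk; in normal use the default `shutil.disk_usage` is what you want.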

Doing some poking around, I didn't see any obvious temporary files or other easy space reclamation targets. Rather than keep poking blindly, I consulted the oracle (ChatGPT), which recommended that I try to expand the persistent volume (PV). Its first suggestion was to check whether the PV would grow automatically if we expanded the disk:

Auto Expansion is on!

This was good news! It meant that as soon as I expanded the VMDK backing the volume, the PV would grow to match and the database could recover.
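In Kubernetes terms, one common place this kind of setting lives is the `allowVolumeExpansion` field on a StorageClass. Assuming that is the field being checked (which may not match this appliance exactly), here is a small Python sketch that parses the output of `kubectl get storageclass -o json` and reports which classes permit expansion. The function name and the sample class names are made up for illustration.

```python
import json


def expansion_by_storage_class(kubectl_json):
    """Map each StorageClass name to whether allowVolumeExpansion is true."""
    doc = json.loads(kubectl_json)
    return {
        item["metadata"]["name"]: bool(item.get("allowVolumeExpansion", False))
        for item in doc.get("items", [])
    }


# Hypothetical sample shaped like `kubectl get storageclass -o json` output.
SAMPLE_CLASSES = """
{"items": [
  {"metadata": {"name": "example-expandable"}, "allowVolumeExpansion": true},
  {"metadata": {"name": "example-fixed"}}
]}
"""

print(expansion_by_storage_class(SAMPLE_CLASSES))
```

A class that omits the field is treated as not expandable, which matches the Kubernetes default.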

Fleet Manager to the rescue

Fleet Manager has a resize option

I knew from my experience with Aria Suite Lifecycle Manager (the predecessor to Fleet Manager) that there were often day 2 capabilities to resize appliances. As it turns out, Fleet Manager carries on the tradition. Inside the Lifecycle section of VCF Operations there is a resize option: Fleet Management -> Lifecycle -> Components -> (select VCF Automation) -> Storage Resize.

Storage resize UI

It shows you the details of all the volume groups and lets you update their capacity. Once you tell it to go, it drives the persistent volume claim (PVC) expansion through the supported channel under the hood, including whatever reconciliation the operator stack needs along the way.

Nature heals

Resize in progress

Once I kicked the resize off in Fleet Management, giving it 60 GB, I got to watch it work: the VMDK grew, the appliance detected the additional space, and the PV was finally updated to the total size!

Back in business
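To confirm a resize like this from the Kubernetes side, one option is to compare what each PVC requests (`spec.resources.requests.storage`) with what it actually reports (`status.capacity.storage`). The sketch below is a hedged Python example that works on `kubectl get pvc -A -o json` output. The helper names, the sample PVC, and its sizes are hypothetical, and the quantity parser only handles the common suffixes.

```python
import json

# Common Kubernetes quantity suffixes. Two-letter binary suffixes come first
# so that "Gi" is not mistaken for "G".
_SUFFIXES = [
    ("Ki", 2**10), ("Mi", 2**20), ("Gi", 2**30), ("Ti", 2**40),
    ("k", 10**3), ("M", 10**6), ("G", 10**9), ("T", 10**12),
]


def parse_quantity(quantity):
    """Convert a quantity string such as '60Gi' into a number of bytes."""
    for suffix, multiplier in _SUFFIXES:
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * multiplier)
    return int(float(quantity))


def pending_resizes(kubectl_json):
    """Return (namespace, name) for PVCs whose reported capacity is below the request."""
    pending = []
    for item in json.loads(kubectl_json).get("items", []):
        requested = item["spec"]["resources"]["requests"]["storage"]
        actual = item.get("status", {}).get("capacity", {}).get("storage", "0")
        if parse_quantity(actual) < parse_quantity(requested):
            meta = item["metadata"]
            pending.append((meta.get("namespace", ""), meta["name"]))
    return pending


# Hypothetical PVC that has been asked to grow but has not caught up yet.
SAMPLE_PVCS = """
{"items": [
  {"metadata": {"namespace": "example-ns", "name": "example-db"},
   "spec": {"resources": {"requests": {"storage": "60Gi"}}},
   "status": {"capacity": {"storage": "40Gi"}}}
]}
"""

print(pending_resizes(SAMPLE_PVCS))
```

An empty list means every PVC has reached its requested size, which is the state you want to see once the resize has finished.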

At this point, I thought everything was great, until I discovered a second error that had been both hidden and triggered by the database failure. I'll save that for my next post, which is equal parts interesting and terrifying. Stay tuned.