Sunday, June 25, 2017

Start order of software services in VMware vCenter Server Appliance 6.0 U2

vCenter Server Appliance 6.0 U2 services are started in the following order ...

  1. vmafdd (VMware Authentication Framework)
  2. vmware-rhttpproxy (VMware HTTP Reverse Proxy)
  3. vmdird (VMware Directory Service)
  4. vmcad (VMware Certificate Service)
  5. vmware-sts-idmd (VMware Identity Management Service)
  6. vmware-stsd (VMware Security Token Service)
  7. vmware-cm (VMware Component Manager)
  8. vmware-cis-license (VMware License Service)
  9. vmware-psc-client (VMware Platform Services Controller Client)
  10. vmware-sca (VMware Service Control Agent)
  11. applmgmt (VMware Appliance Management Service)
  12. vmware-netdumper (VMware vSphere ESXi Dump Collector)
  13. vmware-syslog (VMware Common Logging Service)
  14. vmware-syslog-health (VMware Syslog Health Service)
  15. vmware-vapi-endpoint (VMware vAPI Endpoint)
  16. vmware-vpostgres (VMware Postgres)
  17. vmware-invsvc (VMware Inventory Service)
  18. vmware-mbcs (VMware Message Bus Configuration Service)
  19. vmware-vpxd (VMware vCenter Server)
  20. vmware-eam (VMware ESX Agent Manager)
  21. vmware-rbd-watchdog (VMware vSphere Auto Deploy Waiter)
  22. vmware-sps (VMware vSphere Profile-Driven Storage Service)
  23. vmware-vdcs (VMware Content Library Service)
  24. vmware-vpx-workflow (VMware vCenter Workflow Manager)
  25. vmware-vsan-health (VMware VSAN Health Service)
  26. vmware-vsm (VMware vService Manager)
  27. vsphere-client ()
  28. vmware-perfcharts (VMware Performance Charts)
  29. vmware-vws (VMware System and Hardware Health Manager) 

Thursday, June 22, 2017

CLI for VMware Virtual Distributed Switch

A few weeks ago one of my customers asked me whether the VMware Virtual Distributed Switch (aka VDS) supports a Cisco-like command line interface. The key idea behind the request was to integrate the vSphere switch with the open-source tool Network Tracking Database (NetDB), which they use to track MAC addresses within their network. The customer told me that NetDB can telnet/ssh to Cisco switches and do screen scraping, so wouldn't it be cool to have the most popular switch CLI commands for VDS? These commands are

  • show mac-address-table
  • show interface status
The official answer is NO, but wait a minute. Almost anything is possible with the VMware API. My solution uses VMware's vSphere Perl SDK to pull information out of Distributed Virtual Switches. I have prepared a Perl script which currently supports the two commands mentioned above. It goes through all VMware Distributed Switches on a single vCenter. The script, along with shell wrappers, is available on GitHub here

See the screenshots below to get an idea of what the script does.

The output of the command --username readonly --password readonly --cmd show-port-status
looks as depicted in the screenshot below.

And here is the output of the command --username readonly --password readonly --cmd show-mac-address-table
Now I'm working on a telnet daemon which will simulate a remotely accessible switch CLI. It will just call the Perl scripts above, but that is what network administrators want for their day-to-day operations.
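To show what "screen-scraping friendly" output means in practice, here is a minimal Python sketch of the formatting side only. It is not the Perl script from GitHub. It assumes the MAC/VLAN/port data has already been pulled from the VDS, and MacEntry, the port labels and the STATIC type are hypothetical choices. The layout only loosely imitates Cisco IOS "show mac address-table" output.

```python
from dataclasses import dataclass


@dataclass
class MacEntry:
    """One VM MAC address on a distributed port (hypothetical structure)."""
    vlan: int
    mac: str   # colon-separated, as vSphere reports it, e.g. "00:50:56:aa:bb:cc"
    port: str  # hypothetical port label, e.g. "dvSwitch01/Port12"


def to_cisco_mac(mac: str) -> str:
    """Convert 00:50:56:aa:bb:cc into Cisco dotted notation 0050.56aa.bbcc."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def show_mac_address_table(entries: list[MacEntry]) -> str:
    """Render entries in a layout loosely modelled on Cisco IOS output."""
    lines = [
        "          Mac Address Table",
        "-------------------------------------------",
        "",
        "Vlan    Mac Address       Type        Ports",
        "----    -----------       --------    -----",
    ]
    for e in sorted(entries, key=lambda x: (x.vlan, x.mac)):
        # The VDS knows VM MACs from configuration rather than learning them,
        # so this sketch reports every entry as STATIC.
        lines.append(f"{e.vlan:>4}    {to_cisco_mac(e.mac):<14}    {'STATIC':<8}    {e.port}")
    lines.append(f"Total Mac Addresses for this criterion: {len(entries)}")
    return "\n".join(lines)


print(show_mac_address_table([MacEntry(10, "00:50:56:AA:BB:CC", "dvSwitch01/Port12")]))
```

A fixed column layout like this is what makes the output easy for a tool such as NetDB to parse with simple regular expressions.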

Tuesday, June 13, 2017

Storage DRS integration with storage profiles

This is a very quick blog post. In vSphere 6.0, VMware has introduced Storage DRS integration with storage profiles (aka SPBM - Storage Policy Based Management).

Here is the link to the official documentation.

Generally, it is all about the SDRS advanced option EnforceStorageProfiles. This option takes one of the integer values 0, 1 or 2; the default value is 0.

  • When the option is set to 0, there is NO storage profile or policy enforcement on the SDRS cluster.
  • When the option is set to 1, there is SOFT storage profile or policy enforcement on the SDRS cluster. It is analogous to DRS soft rules. SDRS will comply with the storage profile/policy as far as possible. However, if required, SDRS will violate storage profile compliance.
  • When the option is set to 2, there is HARD storage profile or policy enforcement on the SDRS cluster. It is analogous to DRS hard rules. SDRS will never violate storage profile or policy compliance.
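The three values can be pictured as a filter on which datastores a placement may consider. The following Python snippet is a conceptual model of the semantics described above, not the actual SDRS algorithm. The function name and datastore names are hypothetical, and the "fall back to every datastore" behaviour for SOFT is a simplification of "violate if required".

```python
from enum import IntEnum


class EnforceStorageProfiles(IntEnum):
    """Values of the SDRS advanced option EnforceStorageProfiles."""
    NONE = 0  # default: no profile/policy enforcement
    SOFT = 1  # like DRS soft rules: comply if possible, violate if required
    HARD = 2  # like DRS hard rules: never violate compliance


def placement_candidates(datastores, compliant, mode):
    """Return the datastores an initial placement may consider (hypothetical model).

    datastores: all datastores in the SDRS cluster, in order
    compliant:  the subset compatible with the VM's storage policy
    """
    mode = EnforceStorageProfiles(mode)
    if mode is EnforceStorageProfiles.NONE:
        return list(datastores)
    matching = [ds for ds in datastores if ds in compliant]
    if mode is EnforceStorageProfiles.HARD:
        return matching  # may be empty, in which case placement cannot proceed
    return matching or list(datastores)  # SOFT: fall back when nothing complies
```

For example, with datastores ["ds1", "ds2"] and no compliant datastore, mode 1 still returns both datastores, while mode 2 returns an empty list. As noted below, this enforcement applies only at initial placement.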

Please note that at the time of writing this post, SDRS storage profile enforcement works only during initial placement and NOT for already provisioned VMs during load balancing. Therefore, when the VM Storage Policy is changed for a particular VM, SDRS will neither make it compliant automatically nor generate any recommendation.

Another limitation is that vCloud Director (vCD) backed by an SDRS cluster does NOT support Soft (1) or Hard (2) storage profile enforcement. vCloud Director (vCD) works well with the Default (0) option.


Wednesday, June 07, 2017

VMware Photon OS with PowerCLI

Photon OS is a Linux distribution maintained by VMware with multiple benefits for the virtualized form factor; therefore, any virtual appliance should be based on Photon OS.

I have recently been playing with Photon OS, and here are some of my notes.

IP Settings

Network configuration files are in directory
IP settings are leased from DHCP by default. This is configured in file /etc/systemd/network/

The file contains the following config
To use static IP settings, it is good to move the DHCP config file down in alphabetical order and create a config file with the static IP settings.
mv 99-dhcp-en.networkcp
The file /etc/systemd/network/ should look similar to

The network can be restarted with the command
systemctl restart systemd-networkd
and the network settings can be checked with the command

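The "move the DHCP file down in alphabetical order" trick works because systemd-networkd sorts all .network files in lexical order and applies the first one whose [Match] section fits the interface. The Python sketch below generates a static configuration and shows that ordering. The file name 10-static-en.network, the e* interface glob and all addresses are hypothetical examples; [Match] Name=, Address=, Gateway= and DNS= are standard systemd.network keys.

```python
def static_network_file(iface_glob: str, address_cidr: str, gateway: str, dns: list[str]) -> str:
    """Build the text of a systemd-networkd .network file with static IP settings."""
    lines = [
        "[Match]",
        f"Name={iface_glob}",
        "",
        "[Network]",
        f"Address={address_cidr}",
        f"Gateway={gateway}",
    ]
    lines += [f"DNS={server}" for server in dns]
    return "\n".join(lines) + "\n"


def first_matching_file(filenames: list[str]):
    """Return the .network file systemd-networkd would apply first.

    Assumes every listed file matches the interface, so the lexically first one wins.
    Files not ending in .network (e.g. a renamed backup) are ignored.
    """
    candidates = sorted(f for f in filenames if f.endswith(".network"))
    return candidates[0] if candidates else None


print(static_network_file("e*", "192.168.1.10/24", "192.168.1.1", ["192.168.1.1"]))
print(first_matching_file(["99-dhcp-en.network", "10-static-en.network"]))
```

Whatever names you choose, the static file simply has to sort before the DHCP file, or the DHCP file must no longer end in .network.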

Package management

Photon OS uses the TDNF (Tiny DNF) package manager. It is based on Fedora's DNF. TDNF was developed by VMware and comes with compatible repository and package management capabilities. Note that not every dnf command is available, but the basic ones are there.

  • tdnf install libxml2
  • tdnf install openssl-devel
  • tdnf install binutils
  • tdnf install pkg-config
  • tdnf install perl-Crypt-SSLeay
  • tdnf install cpan
  • tdnf install libuuid-devel
  • tdnf install make
Update of the whole operating system can be done by command
tdnf update

Log Management

You will not find the typical Linux /var/log/messages.
Instead, journald is used, and you have to use the journalctl command.

The equivalent of tail -f /var/log/messages is
journalctl -f 

System services

System services are controlled by the systemctl command.

To check a service's status, use
systemctl status docker
To start a service, use
systemctl start docker
To enable a service so that it starts at boot, use
systemctl enable docker

Docker and containerized PowerCLI

One of the key use cases for Photon OS is to be a Docker host; therefore, Docker is preinstalled in Photon OS. You can see further Docker information with the command
docker info
If Docker is running on your system, you can spin up a Docker container very quickly. Let's use containerized PowerCLI as an example. To download the container image from Docker Hub, use the command
docker pull vmware/powerclicore
To check all downloaded images, use the command
docker images -a   
 root@photon-machine [ ~ ]# docker images -a    
 REPOSITORY      TAG         IMAGE ID      CREATED       SIZE  
 vmware/powerclicore  latest       a8e3349371c5    6 weeks ago     610 MB  
 root@photon-machine [ ~ ]#   

Now you can run the PowerCLI container interactively (-i) with an allocated pseudo-TTY (-t). Option --rm stands for "Automatically remove the container when it exits".
docker run --rm -it vmware/powerclicore 
 root@photon-machine [ ~ ]# docker run --rm -it --name powercli vmware/powerclicore         
 Copyright (C) Microsoft Corporation. All rights reserved.  
      Welcome to VMware vSphere PowerCLI!  
 Log in to a vCenter Server or ESX host:       Connect-VIServer  
 To find out what commands are available, type:    Get-VICommand  
 Once you've connected, display all virtual machines: Get-VM  
     Copyright (C) VMware, Inc. All rights reserved.  
 Loading personal and system profiles took 3083ms.  
 PS /powershell#   

Now you can use PowerCLI running in a Linux container. The very first PowerCLI command is usually Connect-VIServer, but you may get the following warning and error messages

 PS /powershell> Connect-VIServer                                                                         
 cmdlet Connect-VIServer at command pipeline position 1  
 Supply values for the following parameters:  
 Specify Credential  
 Please specify server credential  
 User: cdave  
 Password for user cdave: *********  
 WARNING: Invalid server certificate. Use Set-PowerCLIConfiguration to set the value for the InvalidCertificateAction option to Prompt if you'd like to connect once or to add  
  a permanent exception for this server.  
 Connect-VIServer : 06/07/2017 19:25:44     Connect-VIServer          An error occurred while sending the request.       
 At line:1 char:1  
 + Connect-VIServer  
 + ~~~~~~~~~~~~~~~~  
   + CategoryInfo     : NotSpecified: (:) [Connect-VIServer], ViError  
   + FullyQualifiedErrorId : Client20_ConnectivityServiceImpl_Reconnect_Exception,VMware.VimAutomation.ViCore.Cmdlets.Commands.ConnectVIServer  
 PS /powershell>   

To solve the problem, you have to adjust the PowerCLI configuration with
Set-PowerCLIConfiguration -InvalidCertificateAction Ignore -Confirm:$false -Scope AllUsers
The command above changes the PowerCLI configuration for all users.

To use other Docker commands, you can open another ssh session and, for example, list the running containers

 root@photon-machine [ ~ ]# docker ps -a     
 CONTAINER ID    IMAGE         COMMAND       CREATED       STATUS       PORTS        NAMES  
 6ecccf77891e    vmware/powerclicore  "powershell"    7 minutes ago    Up 7 minutes              powercli  
 root@photon-machine [ ~ ]#   

... or issue any other docker command.

That's cool, isn't it?

Tuesday, June 06, 2017

VMware VVOLs scalability

I'm personally a big fan of the VMware Virtual Volumes concept. If you are not familiar with VVOLs, check this blog post with the recording of the VMworld session and read the VMware KB Understanding Virtual Volumes (VVols) in VMware vSphere 6.0

We all know that the devil is always in the details. The same is true of VVOLs. VMware prepared the conceptual framework, but the implementation always depends on the storage vendor, so it varies across storage products.

Recently, I had a VVOLs discussion with one of my customers, and he claimed that their particular storage vendor supports a very small number of VVOLs. That discussion inspired me to do some research.

Please note that the numbers below are valid at the time of writing this article. You should always check the current status with your particular storage vendor.

Vendor / Storage Array | Maximum VVOLs | Maximum Snapshots or Clones
DELL / Compellent SC 8000 | 2,000 | TBD
EMC / Unity 300 | 9,000 | TBD
EMC / Unity 400 | 9,000 | TBD
EMC / Unity 500 | 13,500 | TBD
EMC / Unity 600 | 30,000 | TBD
EMC / VMAX 3 | 64,000 | TBD
Hitachi / VSP G200 | 2,000 | 100,000
Hitachi / VSP G400 | 4,000 | 100,000
Hitachi / VSP G600 | 4,000 | 100,000
Hitachi / VSP G800 | 16,000 | 100,000
Hitachi / VSP G1000 | 64,000 | 1,000,000

The numbers above are very important because a single VM has at least 3 VVOLs (home, data, swap) and usually even more (snapshots or additional data disks). If you assume 10 VVOLs per VM, you will end up with just 200 VMs on Dell Compellent or Hitachi VSP G200. On the other hand, EMC Unity 600 would give you up to 3,000 VMs, which is not bad, and enterprise storage systems (EMC VMAX and Hitachi G1000) would give you up to 6,400 VMs, which is IMHO very good scalability.
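The arithmetic above is simply the array's VVOL limit divided by the VVOLs consumed per VM. The minimal Python sketch below reproduces it, assuming a flat 10 VVOLs per VM as in the text; max_vms and MAX_VVOLS are illustrative names, and the limits are copied from the table (valid only at the time of writing).

```python
def max_vms(max_vvols: int, vvols_per_vm: int = 10) -> int:
    """Upper bound on VMs an array can host if every VM consumes vvols_per_vm VVOLs."""
    if vvols_per_vm < 3:
        raise ValueError("a VM needs at least 3 VVOLs (home, data, swap)")
    return max_vvols // vvols_per_vm


# Maximum VVOLs per array, copied from the table above.
MAX_VVOLS = {
    "DELL / Compellent SC 8000": 2_000,
    "Hitachi / VSP G200": 2_000,
    "EMC / Unity 600": 30_000,
    "EMC / VMAX 3": 64_000,
    "Hitachi / VSP G1000": 64_000,
}

for array, limit in MAX_VVOLS.items():
    print(f"{array}: {max_vms(limit)} VMs at 10 VVOLs/VM, "
          f"{max_vms(limit, 3)} VMs at the 3-VVOL minimum")
```

Changing vvols_per_vm to match your own snapshot and disk habits is the quickest way to sanity-check a vendor's limit against your VM count.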

So as always, it really depends on what storage system you have or are planning to buy.

If you know the numbers for other storage systems, please share them in the comments below this blog post.

Wednesday, May 31, 2017

vROps & vSphere Tags, Custom Attributes

As many of my customers have recently started to customize their vROps, and we are working together on various use cases, I find it useful to summarize my notes here and possibly help others with their investigation and customization.

This time I will focus on custom descriptions for objects in vROps. When you provide access to vRealize Operations to your company management, they are often not familiar with the IT naming convention, and it is very hard for them to analyze why some object is marked red and whether it is important at all.

We've been thinking this through with David for a bit, and there are two very easy alternatives to tackle this use case: vSphere Tags and Custom Attributes in vSphere. In the following lines I will explain a step-by-step procedure to use them and tackle possible problems you might hit along the way.

1) Create the preferred description in vSphere. For Custom Attributes you can use local (object-based) or global definitions; both work fine. At the end of this article you can see what vSphere Tags and Custom Attributes look like and which is better for your specific use case.

2) Afterwards, switch to vROps and check whether the metric is being propagated to the object. Bear in mind that it might take a couple of minutes for the metric to be collected.

3) Once the metric is available, you can start working with it, for example in your Views. For this post I've created a couple of Tags on my vCenter appliance called APPL_vCenter; therefore, selecting Virtual Machine as the subject of the view is the logical choice.

4) Now the tricky part. The problem I personally had (I would like to thank our great vROps consultant Oleg Ulyanov for helping me out) was that the metric was simply not available in the view. The thing is that if you have a big environment with hundreds of VMs, vROps will randomly choose a few (I think the number was 5) and, based on those 5, show a merge of the available metrics. If you are as lucky as I was and APPL_vCenter is not among them, the Tags will not be available. To force vROps to use a specific machine, you can use the square next to the Metrics/Properties button.

In the newly opened window, you can filter for the VM you want.

5) Afterwards, just choose the VM you've created the Tag on (in my case again APPL_vCenter), and the metric should now be visible.

6) In the final screenshot I would like to compare both solutions: vSphere Tags and Custom Attributes (for some reason marked in vROps as Custom Tag).

vSphere Tags are consolidated into one field. I've created the Tag "Purpose" and the Tag "OS" for the vCenter Appliance. Custom Attributes, on the other hand, are always kept separate, so doing the same would create two Custom Tags, each with just a value in it. If you need, for example, filtering or any other logic behind the Tags, Custom Attributes seem to be the better choice.

Sunday, May 14, 2017

VM Snapshots Deep-Dive

A while ago I received an interesting question about snapshot consolidation from one of my customers, and as I was not 100% sure about the particular details (file naming, consolidation, pointers, etc.), I did some testing in a lab. The scenario was pretty simple: create a virtual machine with a non-linear snapshot tree and start removing the snapshots.

Lessons learned: when doing such tests, it is always good to add some files, or something a bit more sizable, into each snapshot. My initial work started with just creating folders named snap[1-7], which was really not helpful during consolidation for identifying where the data from a snapshot actually went.

The non-linear snapshot tree I mentioned earlier looks like this:

The first confusion, and the one that took me the longest to get my head around, was the file naming convention. Roughly speaking, the file SnapTest-flat.vmdk is the main data file of the server, in this case the C: drive of a Microsoft Windows server, with a size of around 26GB. This file is not visible in the Web Client, as only the descriptor <VM name>.vmdk (in our case SnapTest.vmdk) is directly visible. When you create the first snapshot, this is the file it uses, as you can see in the following image:

The command grep -E 'displayName|fileName' SnapTest.vmsd lists all lines containing displayName and/or fileName from the file SnapTest.vmsd. In the vSphere documentation you will find:
A .vmsd file that contains the virtual machine's snapshot information and is the primary source of information for the Snapshot Manager. This file contains line entries, which define the relationships between snapshots and between child disks for each snapshot.

With that said, the output of the command above lists our predefined snapshot names (I used the number of the snapshot and the size of the file I had added) and their respective files. So the first snapshot created is named Snap1+342MB and uses the file SnapTest.vmdk.
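The grep above can be turned into a small parser that pairs each snapshot's name with its disk file(s). The Python sketch below assumes .vmsd entries of the form snapshotN.displayName = "..." and snapshotN.diskM.fileName = "..."; the sample text is a hypothetical, trimmed .vmsd, not output from the lab described here.

```python
import re
from collections import defaultdict

# Matches e.g.  snapshot0.displayName = "Snap1+342MB"
#         and   snapshot0.disk0.fileName = "SnapTest.vmdk"
LINE = re.compile(r'^(snapshot\d+)\.(displayName|disk\d+\.fileName)\s*=\s*"([^"]*)"')


def snapshots_from_vmsd(text: str) -> dict[str, list[str]]:
    """Map each snapshot's displayName to the disk file(s) it references."""
    names: dict[str, str] = {}
    files: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skips snapshot.lastUID, snapshot.current, uid lines, ...
        snap, key, value = m.groups()
        if key == "displayName":
            names[snap] = value
        else:
            files[snap].append(value)
    return {names[s]: files[s] for s in names}


# Hypothetical, trimmed .vmsd content for illustration only.
SAMPLE_VMSD = '''\
snapshot.lastUID = "2"
snapshot0.uid = "1"
snapshot0.displayName = "Snap1+342MB"
snapshot0.disk0.fileName = "SnapTest.vmdk"
snapshot1.uid = "2"
snapshot1.displayName = "Snap2+348MB"
snapshot1.disk0.fileName = "SnapTest-000001.vmdk"
'''

print(snapshots_from_vmsd(SAMPLE_VMSD))
```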

The second useful command during this test, grep parentFileNameHint SnapTest-00000[0-9].vmdk, goes through all the snapshot files and lists their parentFileNameHint. As you have probably guessed, this is the snapshot (parent file) each one depends on.
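Following parentFileNameHint from descriptor to descriptor reconstructs a snapshot chain back to the base disk. Here is a minimal Python sketch of that walk. It assumes each descriptor's text is already available in a dictionary keyed by file name and that hints are bare file names; the sample descriptors are hypothetical one-line stand-ins for real .vmdk descriptors.

```python
import re
from typing import Optional

HINT = re.compile(r'^parentFileNameHint\s*=\s*"([^"]+)"', re.MULTILINE)


def parent_of(descriptor_text: str) -> Optional[str]:
    """Return the parentFileNameHint value, or None for a base disk."""
    m = HINT.search(descriptor_text)
    return m.group(1) if m else None


def chain(start: str, descriptors: dict[str, str]) -> list[str]:
    """Follow parentFileNameHint from a delta disk back to the base disk."""
    path = [start]
    seen = {start}
    current = start
    while (parent := parent_of(descriptors.get(current, ""))) is not None:
        if parent in seen:
            raise ValueError(f"loop in snapshot chain at {parent}")
        path.append(parent)
        seen.add(parent)
        current = parent
    return path


# Hypothetical descriptor fragments for illustration only.
descriptors = {
    "SnapTest-000002.vmdk": 'parentFileNameHint="SnapTest-000001.vmdk"\n',
    "SnapTest-000001.vmdk": 'parentFileNameHint="SnapTest.vmdk"\n',
    "SnapTest.vmdk": 'createType="vmfs"\n',
}

print(" -> ".join(chain("SnapTest-000002.vmdk", descriptors)))
```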

List of tests I performed:
1) Remove Snapshot 5 (Snap5+366MB)
2) Remove Snapshot 4 (Snap4+356MB)
3) Remove Snapshot 3 (Snap3+337MB)
4) Remove Snapshot 2 (Snap2+348MB)
5) Move You Are Here
6) Remove Snapshot 6 (Snap6+168MB)
7) Remove Snapshot 7 (Snap7+348MB)

Now, in more detail, case by case.

1) Remove Snapshot 5 (Snap5+366MB)
The result can be seen in this visualisation. After removing Snapshot 5 in the Web Client, the Snapshot 6 and Snapshot 5 vmdk files were consolidated, and the size was updated accordingly, as was the snapshot's vmdk file.

For the first example I will also add the command outputs here for illustration. The following scenarios should be understandable even without them.

2) Remove Snapshot 4 (Snap4+356MB)
I did this test just to prove to myself that it works properly, so it is very similar to the previous case.

3) Remove Snapshot 3 (Snap3+337MB)
Now, with the removal of Snapshot 3, things become a bit more challenging. Three more snapshots currently depend on Snapshot 3 (Snap6, Snap7 and You Are Here). As consolidation in this case would need to be performed with each of them, it would be a very "costly" operation. The result was that the snapshot was removed from the GUI, but the files remained on disk and all the dependencies were preserved.

4) Remove Snapshot 2 (Snap2+348MB)
Although it might seem complicated on "paper", the removal process for Snapshot 2 was very similar to every other snapshot removal; only in this case, Snapshot 2 was consolidated with the temporary file preserved from the previous step.

5) Move "You Are Here"
Moving the active state of the virtual machine, labelled "You Are Here", is also quite a simple operation. I performed this test mainly to validate how many snapshots can depend on the parent snapshot before the snapshots are consolidated. To spoil the surprise, it has to be just one file, as in this case only Snapshot 6 and Snapshot 7 depend on the temporary file.

6) Remove Snapshot 6 (Snap6+168MB)
As mentioned in the previous step, if there is only one child snapshot of the parent snapshot and the parent snapshot is removed, the data is consolidated. Otherwise, a temporary file is preserved for the child snapshots to work with.

7) Remove Snapshot 7 (Snap7+348MB)
The final step was to remove the last snapshot, Snapshot 7, and be left with just one snapshot, Snap1+342MB, and the main file. If this last snapshot were removed, all the data would be consolidated into the main VMDK, and there would be no delta file for the "You Are Here" state and therefore no point to go back to.
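The rule observed in these tests (a removed snapshot is consolidated only when at most one child depends on it; otherwise its file stays behind as a hidden parent) can be modelled as a small tree operation. The Python sketch below is a simplified model of that observation, not how ESXi implements consolidation; Node, attach and remove_snapshot are hypothetical names, and the model does not try to express in which direction the data is merged.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/child links
class Node:
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list, repr=False)
    visible: bool = True  # False once removed from the GUI


def attach(parent: Node, child: Node) -> Node:
    """Make child depend on parent."""
    child.parent = parent
    parent.children.append(child)
    return child


def remove_snapshot(node: Node) -> str:
    """Model what happened when a snapshot was deleted in the Web Client."""
    node.visible = False
    if len(node.children) > 1:
        # Several children still read from this delta: keep it as a hidden parent file.
        return "kept"
    # At most one child: the delta is consolidated and the node leaves the chain,
    # so its child (if any) now hangs directly off the node's former parent.
    parent = node.parent
    if parent is not None:
        parent.children.remove(node)
    for child in node.children:
        child.parent = parent
        if parent is not None:
            parent.children.append(child)
    node.children.clear()
    node.parent = None
    return "consolidated"


# Case 3 from the list: Snapshot 3 has several dependents, so its file is kept.
base = Node("SnapTest.vmdk")
snap3 = attach(base, Node("Snap3"))
attach(snap3, Node("Snap6"))
attach(snap3, Node("You Are Here"))
print(remove_snapshot(snap3))
```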

Overall, working with snapshots is not rocket science, but today's test showed me in a bit more detail what happens in the background with the file names, snapshot IDs in the vmdk files, and data consolidation. It also showed that temporary parent files are left behind if more than one direct child snapshot depends on them. It also forced me to refresh my knowledge of Space Efficient Sparse Virtual Disks (SE Sparse Disks for short), which my colleague Cormac Hogan explained well in late 2012.

Thursday, April 20, 2017

Back to the basics - VMware vSphere networking

As software-defined networking (VMware NSX) gets more and more traction, I have recently often been asked to explain the basics of VMware vSphere networking to networking experts who have no experience with the VMware vSphere platform. First of all, the networking team should familiarize themselves with the vSphere platform, at least at a high level. The following two videos can help them understand what the vSphere platform is.

vSphere Overview Video

What is vCenter (Watch the first two minutes)

Once they understand basic vSphere terms like vCenter and ESXi, we can start talking about virtual networking.

First things first: a VMware vSwitch is not a switch. Let me repeat it again ...
VMware vSwitch is not a typical ethernet switch.
It is not a typical network (ethernet) switch because not all switch ports are equal. In a VMware vSwitch you have to configure switch uplinks (physical NICs) and internal switch ports (software constructs). If an ethernet frame comes from the physical network via an uplink, the vSwitch will never forward it to any other uplink, only to internal switch ports, where virtual machines are connected. This behavior guarantees that the vSwitch will never cause an L2 loop. It also means that the vSwitch does not need to implement or participate in the spanning tree protocol (STP) usually running in your physical network. Another difference from a traditional ethernet switch is that the vSwitch does not learn external MAC addresses. It only knows the MAC addresses of virtual machines running on the particular ESXi host (hypervisor). Such devices are often called port extenders. For example, Cisco FEX (fabric extender) is a physical device with the same behavior.
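The forwarding rules above fit in a few lines of code. The following Python snippet is a conceptual model only, not ESXi's datapath. Port and forward are hypothetical names, and "send unknown unicast out of the first uplink" stands in for whatever the configured teaming policy would choose. The model shows that a frame entering on an uplink can never leave on another uplink, and that the switch only needs to know local VM MAC addresses.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Port:
    name: str
    is_uplink: bool
    mac: str = ""  # MAC of the VM vNIC / vmkernel port; empty for uplinks


def forward(frame_dst: str, ingress: Port, ports: list[Port]) -> list[str]:
    """Return the names of the ports a frame is delivered to (conceptual model)."""
    internal = [p for p in ports if not p.is_uplink and p != ingress]
    uplinks = [p for p in ports if p.is_uplink]
    if ingress.is_uplink:
        # From the physical network: internal ports only, never another uplink,
        # so the vSwitch cannot create an L2 loop and needs no STP.
        if frame_dst == BROADCAST:
            return [p.name for p in internal]
        return [p.name for p in internal if p.mac == frame_dst]
    # From a local VM: the vSwitch knows every local MAC, so anything else is external.
    if frame_dst == BROADCAST:
        return [p.name for p in internal] + [uplinks[0].name] if uplinks else [p.name for p in internal]
    local = [p.name for p in internal if p.mac == frame_dst]
    if local:
        return local
    # External destination: out of exactly one uplink (chosen by teaming; first here).
    return [uplinks[0].name] if uplinks else []


vm1 = Port("vm1", False, "00:50:56:00:00:01")
vm2 = Port("vm2", False, "00:50:56:00:00:02")
ports = [vm1, vm2, Port("vmnic0", True), Port("vmnic1", True)]
print(forward(BROADCAST, ports[2], ports))  # broadcast from vmnic0 reaches VMs only
```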

Now let's talk about network redundancy. In production environments, we usually have a redundant network where multiple NICs are connected to different physical switches.

Each NIC connected to a different physical switch
vSwitch network redundancy is achieved by NIC teaming. NIC teaming is also known as link aggregation, link bundling, port channeling or ethernet bonding; VMware uses the term Network teaming or NIC teaming. So what teaming options do we have in the VMware vSphere platform? It depends on which edition (license) you have and which vSwitch you want to use. VMware offers two types of vSwitches.
  • VMware vSphere standard switch (aka vSwitch or vSS)
  • VMware vSphere distributed virtual switch (aka dvSwitch or vDS)
Let's start with VMware's standard switch available on all editions.

VMware vSphere standard switch (vSS)

VMware vSphere standard switch supports multiple switch independent active/active and active/standby teaming methods and also one switch dependent active/active teaming method.

The standard switch can use the following switch independent load balancing algorithms:
  • Route based on originating virtual port - (default) switch independent active/active teaming where each internal vSwitch port, to which a virtual machine vNIC or an ESXi vmkernel port is connected, is pinned to one of the active network adapters (NICs) based on its port ID. Traffic is therefore spread across the active NICs per port, not per packet.
  • Route based on source MAC hash - switch independent active/active teaming where the active network adapter (NIC) is selected based on a hash of the source MAC address seen by the standard vSwitch.
  • Use explicit failover order - another switch independent teaming method, but active/passive. Only one adapter from all active adapters is used, and if it fails, the next one is used. In other words, it always uses the highest-order uplink from the list of Active adapters that passes the failover detection criteria.
and only one switch dependent load balancing algorithm
  • Route based on IP hash - switch dependent active/active teaming where the traffic is load balanced based on a hash of the source and destination IP addresses of each packet. For non-IP packets, whatever is at those offsets is used to compute the hash. Because this is switch dependent teaming, a static port-channel (aka EtherChannel) has to be configured on the physical switch side; otherwise, it will not work.
It is worth mentioning that for all active/active teaming methods you can add standby adapters, which are used only if an active adapter fails, and you can also define unused adapters which you do not want to use at all. For further information, read the VMware vSphere documentation.
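The difference between pinning by port ID and hashing by IP can be illustrated with two tiny functions. This Python sketch is a simplified model: the modulo-based port pinning and the XOR-then-modulo IP hash are commonly used illustrations, and ESXi's exact internal calculation may differ. The function names and uplink names are hypothetical.

```python
import ipaddress


def uplink_by_port_id(port_id: int, active_uplinks: list[str]) -> str:
    """Originating virtual port (simplified): each port ID is pinned to one uplink."""
    return active_uplinks[port_id % len(active_uplinks)]


def uplink_by_ip_hash(src: str, dst: str, active_uplinks: list[str]) -> str:
    """IP hash (simplified): the uplink depends on both source and destination IP."""
    h = int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))
    return active_uplinks[h % len(active_uplinks)]


uplinks = ["vmnic0", "vmnic1"]
# A VM on port 7 always uses the same uplink, whatever it talks to.
print(uplink_by_port_id(7, uplinks))
# With IP hash, one VM talking to two destinations can use both uplinks.
print(uplink_by_ip_hash("10.0.0.1", "10.0.0.2", uplinks),
      uplink_by_ip_hash("10.0.0.1", "10.0.0.3", uplinks))
```

Because IP hash can send one VM's MAC address out of several uplinks at once, the physical switch must treat those ports as one port-channel. That is why this policy is switch dependent, while port-ID pinning needs no physical switch configuration.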

VMware vSphere distributed switch (vDS)

If you have a vSphere Enterprise Plus license or a VSAN license, you are eligible to use the VMware vSphere distributed switch. The key advantages of the VMware distributed switch are
  • centralized management
  • advanced enterprise functionality
When you use the virtual distributed switch, you do not need to configure each vSwitch individually; instead, you have a single distributed vSwitch across multiple ESXi hosts, and you can manage it centrally. On top of centralized management, you get the following advanced enterprise functionality:
  • NIOC (Network I/O Control) which allows QoS and marking (802.1p tagging, DSCP)
  • LACP - dynamic switch dependent teaming
  • Route based on physical NIC load - another switch independent teaming with optimized load balancing
  • ACLs - Access Control Lists
  • LLDP
  • Port mirroring
  • NetFlow
  • Configuration backup and restore
  • and more
It is worth mentioning that when LACP is used, you can leverage significantly enhanced load balancing algorithms for more optimal bandwidth usage of the physical NICs.

vSphere 6.0 LACP supports following twenty (20) hash algorithms:
  1. Destination IP address
  2. Destination IP address and TCP/UDP port
  3. Destination IP address and VLAN
  4. Destination IP address, TCP/UDP port and VLAN
  5. Destination MAC address
  6. Destination TCP/UDP port
  7. Source IP address
  8. Source IP address and TCP/UDP port
  9. Source IP address and VLAN
  10. Source IP address, TCP/UDP port and VLAN
  11. Source MAC address
  12. Source TCP/UDP port
  13. Source and destination IP address
  14. Source and destination IP address and TCP/UDP port
  15. Source and destination IP address and VLAN
  16. Source and destination IP address, TCP/UDP port and VLAN
  17. Source and destination MAC address
  18. Source and destination TCP/UDP port
  19. Source port ID
  20. VLAN
Note: Advanced LACP settings are available via esxcli commands.  
esxcli network vswitch dvs vmware lacp
esxcli network vswitch dvs vmware lacp config get
esxcli network vswitch dvs vmware lacp status get
esxcli network vswitch dvs vmware lacp timeout set
Unfortunately, I do not have LACP-ready hardware in my lab, so for further details see this blog post.

Hope this was informative and useful.


Sunday, April 02, 2017

ESXi Host Power Management

I have just listened to Qasim Ali's VMworld session "INF8465 - Extreme Performance Series: Power Management's Impact on Performance" about ESXi Host Power Management (P-States, C-States, TurboMode and more), and here are his general recommendations:
  • Configure the BIOS to give the ESXi host the most flexibility in using the power management features offered by the hardware
  • Select "OS Control mode", "Performance per Watt", or equivalent
  • Enable everything: P-States, C-States and Turbo mode
  • To achieve the best performance per watt for most workloads, leave the power policy at the default, which is "Balanced"
  • For applications that require maximum performance, switch to "High Performance" from within the ESXi host
Ali's VMworld session linked above is really worth watching. I encourage you to watch it yourself.

Saturday, March 18, 2017

VMware vSphere 6.5 products enhancements and basic concepts behind

VMware Tech Marketing has produced a bunch of cool vSphere 6.5 related whiteboard videos. Great stuff to review to understand VMware product enhancements and the basic concepts behind them.

It is definitely worth watching, but please keep in mind that the devil is in the details, so be prepared for further planning, designing and testing before you implement it in production.

Friday, March 10, 2017

High level introduction to VMware products

My blog posts usually go into low-level technical details and are targeted at VMware subject matter experts. However, sometimes it is good to step back and look at things from a high-level perspective. It can be especially helpful when you need to explain VMware products to somebody who is not an expert in VMware technologies.

vSphere Overview Video

What is vCenter (Watch the first two minutes)

HTML5 Web Client (This is how vSphere is managed now - no more client. Minute 3 shows you how to create a virtual machine)

vR Ops Overview

Troubleshooting VM Performance in vR Ops

How to Build Blueprints in vRA - Single Machine, Application, and with AWS

NSX - Network Concepts Overview (Watch up until minute 4)

NSX - Microsegmentation (Watch 2:50 to 4:40)

vSAN Overview

Hope you find it useful! Either way, sharing is welcome!

Sunday, March 05, 2017

ESXi localcli

I have just read the very informative blog post "Adding new vNICs in UCS changes vmnic order in ESXi". The author (Michael Rudloff) uses localcli with undocumented functions to achieve the correct NIC order. So what is this localcli? All vSphere admins probably know the esxcli command for ESXi configuration. esxcli manages many aspects of an ESXi host. You can run ESXCLI commands remotely or in the ESXi Shell.

You can use esxcli in the following three ways:
  • vCLI package. Install the vCLI package on the server of your choice, or deploy a vMA virtual machine and target the ESXi system that you want to manipulate. You can run ESXCLI commands against a vCenter Server system and target the host indirectly. Running against vCenter Server systems by using the --vihost parameter is required if the host is in lockdown mode.
  • ESXi shell. Run ESXCLI commands in the local ESXi shell to manage that host.
  • You can also run ESXCLI commands from the vSphere PowerCLI prompt by using the Get-EsxCli cmdlet.
So esxcli is well known, but what about localcli? According to VMware documentation, it is a set of commands for use with VMware Technical Support. localcli commands are equivalent to ESXCLI commands, but bypass hostd. The localcli commands are only for situations when hostd is unavailable and cannot be restarted. After you run a localcli command, you must restart hostd. Run ESXCLI commands after the restart.

Warning: If you use a localcli command in other situations, an inconsistent system state and potential failure can result.
So it is obvious that using localcli is unsupported, and it should be used only when instructed by VMware Support.
However, the command is very interesting, because when you use the special internal plugin directory, some undocumented namespaces appear. You can browse these namespaces and discover some cool functionality. Just log in to your ESXi host and use the command localcli --plugin-dir /usr/lib/vmware/esxcli/int/

 [root@esx11:~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/   
 Usage: localcli [disp options]    
 For esxcli help please run localcli --help  
 Available Namespaces:   
 boot       operations for system bootstrapping                                          
 debug       Options related to VMkernel debugging. These commands should be used at the direction of VMware Support Engineers.   
 device      Device manager commands                                                
 deviceInternal  Device layer internal commands                                             
 elxnet      elxnet esxcli functionality                                              
 esxcli      Commands that operate on the esxcli system itself allowing users to get additional information.            
 fcoe       VMware FCOE commands.                                                 
 graphics     VMware graphics commands.                                               
 hardware     VMKernel hardware properties and commands for configuring hardware.                          
 hardwareinternal VMKernel hardware properties and commands for configuring hardware, which are not exposed to end users.        
 iscsi       VMware iSCSI commands.                                                 
 network      Operations that pertain to the maintenance of networking on an ESX host. This includes a wide variety of commands   
          to manipulate virtual networking components (vswitch, portgroup, etc) as well as local host IP, DNS and general   
          host networking settings.  
 networkinternal  Operations used by partner software, but are not exposed to the end user. These operations must be kept compatible   
          across releases.  
 rdma       Operations that pertain to remote direct memory access (RDMA) protocol stack on an ESX host.              
 rdmainternal   Operations that pertain to the remote direct memory access (RDMA) protocol stack on an ESX host, but are not   
          exposed to the end user. These operations must be kept compatible across releases.  
 sched       VMKernel system properties and commands for configuring scheduling related functionality.               
 software     Manage the ESXi software image and packages                                      
 storage      VMware storage commands.                                                
 system      VMKernel system properties and commands for configuring properties of the kernel core system and related system   
 systemInternal  Internal VMKernel system properties andcommands for configuring properties of the kernel core system.         
 user       VMKernel properties and commands for configuring user level functionality.                       
 vm        A small number of operations that allow a user to Control Virtual Machine operations.                 
 vsan       VMware Virtual SAN commands                                              
 Available Commands:   
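The namespace table above is plain text, so it is easy to turn into data. That is handy, for example, for seeing which namespaces only show up with the internal plugin directory. Below is a minimal Python sketch. It assumes the output was saved to a text file and that wrapped descriptions are indented deeper than the namespace names, as in the listing above. parse_namespaces and internal_namespaces are hypothetical helpers, not part of any VMware tool.

```python
from typing import Dict, List


def parse_namespaces(listing: str) -> Dict[str, str]:
    """Map namespace name -> description from an 'Available Namespaces:' listing."""
    namespaces: Dict[str, str] = {}
    in_section = False
    base_indent = None  # indentation of namespace names (taken from the first entry)
    current = None
    for raw in listing.splitlines():
        line = raw.rstrip()
        text = line.strip()
        if text.startswith("Available Namespaces:"):
            in_section = True
            continue
        if text.startswith("Available Commands:"):
            break  # end of the namespace table
        if not in_section or not text:
            continue
        indent = len(line) - len(line.lstrip())
        if base_indent is None:
            base_indent = indent
        if indent > base_indent and current is not None:
            # Deeper indentation: wrapped description of the previous namespace
            namespaces[current] += " " + text
            continue
        name, _, description = text.partition(" ")
        namespaces[name] = description.strip()
        current = name
    return namespaces


def internal_namespaces(namespaces: Dict[str, str]) -> List[str]:
    """Namespaces whose name ends in 'internal' (case-insensitive), e.g. systemInternal."""
    return [name for name in namespaces if name.lower().endswith("internal")]
```

Running the same parser over the regular esxcli namespace listing and comparing the two key sets would be one way to spot the extra namespaces.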

Let me say it again: this command is unsupported, so do not use it in production. On the other hand, it is very cool to test it in our labs ...

Tuesday, February 28, 2017

Maximum client sessions vCenter server can accept

I work as a VMware TAM (Technical Account Manager), and one of my customers recently had a significant incident in which clients (vSphere admins) were not able to connect to vCenter Server. It worked neither from the old C# Client nor from the new Web Client. Interestingly, some admins were sometimes able to connect and stay connected, while others were not able to connect at all.

The error message was very general, saying ...
Call "ServiceInstance.RetrieveContent" for object "ServiceInstance" on Server "" failed.
The C# Client returned a further explanation ...
The server '' could not interpret the client's request. (The remote server returned an error: (503) Server Unavailable.) 
See the error messages in the screenshot below ...

C# Client error messages
As you can see, both error messages are very general, so further holistic troubleshooting was necessary. After several theories, one of the customer's vSphere/Windows administrators analyzed the Windows OS with the Windows perfmon tool and realized that during the incident there were more than 1400 open threads with client connections to vCenter Server. This led to the hypothesis that we had reached the maximum number of client sessions vCenter can accept.

A hypothesis is always important, but even more important is proof that the hypothesis is valid and that it is the root cause of the particular issue.

Unfortunately, the maximum number of total client sessions to vCenter Server is not documented. The only related numbers documented in "Configuration Maximums - vSphere 5.5" are:
  • Concurrent vSphere Client connections to vCenter Server = 100
  • Concurrent vSphere Web Client connections to vCenter Server = 180
However, my customer uses automation extensively, so PowerCLI scripts can open additional connections. The only way to find the maximum is to test it.

My customer is still on vCenter 5.5, but I prepared and executed the test in my home lab, where I have vCenter 6.0 U2. I prepared a PowerCLI script that creates 2000 new client sessions and keeps them open. The purpose of the script is to find the maximum number of established sessions vCenter can accept and to see the error message when the maximum is reached.

The PowerCLI script is available on GitHub here and is based on the excellent blog post and scripts "List and Disconnect vCenter Sessions" by Alan Renouf.
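The PowerCLI script itself is not reproduced here. Its probing logic is to open sessions one by one, keep them all open, and stop at the first failure, and it can be sketched in Python as follows. open_session and close_session are hypothetical callables you would supply. With pyVmomi they might wrap pyVim.connect.SmartConnect and pyVim.connect.Disconnect, but that wiring is an untested assumption and not part of the original test.

```python
from typing import Any, Callable, List, Optional, Tuple


def find_session_limit(
    open_session: Callable[[], Any],
    close_session: Callable[[Any], None],
    attempts: int = 2000,
) -> Tuple[int, Optional[Exception]]:
    """Open up to `attempts` sessions, keeping all of them open, until one fails.

    Returns (number of sessions open when the first login failed, that exception),
    or (attempts, None) if every login succeeded.
    """
    sessions: List[Any] = []  # hold references so every session stays established
    error: Optional[Exception] = None
    try:
        for _ in range(attempts):
            try:
                sessions.append(open_session())
            except Exception as exc:  # the first failed login marks the limit
                error = exc
                break
        return len(sessions), error
    finally:
        # Always clean up, otherwise the probe itself leaves vCenter saturated
        for session in sessions:
            close_session(session)
```

Closing every session in the finally block matters. A probe like this is itself a script that exhausts the session pool, which is exactly the failure mode described in this article.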

I ran the script in my lab and waited for it to fail in order to find the maximum. You can see the expected failure in the screenshot below ...

Expected connection failure to find what is the maximum
And the result is ...
vCenter Server 6.0 U2 accepts a maximum of 1995 established client sessions
When this maximum is reached, you are no longer able to connect to vCenter Server and you will see the error messages mentioned at the beginning of this article.

Business impact and visibility

It is worth mentioning that this technical issue was observed during a Disaster Recovery fail-over test and silently disappeared after fail-back of all services. That's why this incident had very high internal business visibility and was escalated to top IT management, which required a very quick Root Cause Analysis and proper problem management.

That's just more proof of how critical vCenter and the vSphere platform are in modern IT environments.

It seems that my customer is using an automation script that establishes connections to vCenter Server. Under some circumstances that occur only when services are running at the disaster recovery backup site, the script does not disconnect its sessions. The vCenter Server maximum is then exceeded and vCenter does not accept any new connections. In such a situation, the vSphere platform is unmanageable.

This is good to know, especially in the age of automation, when a single badly written automation script can break vSphere manageability.

As a VMware TAM, I can communicate and justify my customers' product feature requests internally within the VMware organization. That's another benefit of the VMware TAM Program.

So here is a publicly written vCenter product feature request, which I will open with our Product Management.

Feature Request: The maximum number of supported client sessions should be documented in the "vSphere Configuration Maximums" document. When the maximum is reached, vCenter Server should accept at least one more connection for the vSphere Administrator (for example administrator@vsphere.local), to be used as a last resort or, if you wish, a back door. The most recent vSphere Admin connection should terminate and replace any existing "back door" connection, so the platform stays manageable in such a situation.
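To make the proposed behavior concrete, here is a toy Python model of the admission rule described in the feature request. It is purely hypothetical and is not how vCenter Server manages sessions. SessionPool, login and the reserved slot are names invented for illustration.

```python
from typing import List, Optional


class SessionPool:
    """Toy model of the proposed rule: one reserved 'back door' slot for the admin."""

    def __init__(self, max_sessions: int, admin_user: str) -> None:
        self.max_sessions = max_sessions
        self.admin_user = admin_user
        self.regular: List[str] = []          # normal sessions, capped at max_sessions
        self.reserved: Optional[str] = None   # the single back-door admin session
        self._counter = 0

    def _new_session_id(self, user: str) -> str:
        self._counter += 1
        return f"{user}#{self._counter}"

    def login(self, user: str) -> str:
        if len(self.regular) < self.max_sessions:
            session_id = self._new_session_id(user)
            self.regular.append(session_id)
            return session_id
        if user == self.admin_user:
            # Pool is full: the most recent admin login takes the reserved slot,
            # implicitly terminating any previous back-door session.
            self.reserved = self._new_session_id(user)
            return self.reserved
        raise RuntimeError("(503) Server Unavailable: session limit reached")
```

The key design point is that the reserved slot never grows. A misbehaving admin script can hold at most one extra session, and a human admin logging in later always wins it back.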

Sunday, February 26, 2017

How to install VMware tools on FreeBSD server

FreeBSD is my favorite operating system. All my FreeBSD servers (except embedded systems on physical microcomputers) are running as virtual machines. FreeBSD is an officially supported guest OS in VMware, so nothing stops you from virtualizing FreeBSD even for production use.

VMware Tools is a suite of utilities that enhances the performance of the virtual machine's guest operating system and improves management of the virtual machine. Although the guest operating system can run without VMware Tools, you would lose important functionality and convenience. In other words, VMware Tools are not strictly necessary but highly recommended for virtual machines running on top of VMware ESXi hosts.

There are multiple options for installing VMware Tools on FreeBSD, but I personally use the native FreeBSD Open VM Tools package, because Open VM Tools is VMware's latest recommendation for Unix-like systems, which includes FreeBSD. The reason I use Open VM Tools instead of the VMware Tools delivered by VMware on ESXi hosts or on the VMware download site is that I can use the default FreeBSD package management system (pkg) for simple deployment. It is fast, convenient and fully integrated with standard operating system update and upgrade procedures.

As you can see below, the installation on FreeBSD 10.x and above is very straightforward: essentially a single command and 5 lines in the FreeBSD system configuration file.

# You have to switch to administrator account (root)
su -l root

# and install Open VM Tools by FreeBSD package manager
pkg install open-vm-tools-nox11

To run the Open Virtual Machine tools at startup, you must add the following settings to your /etc/rc.conf


Easy, right?

And just for your information, Open VM Tools is a set of four kernel modules (vmmemctl, vmxnet, vmblock, vmhgfs) and one daemon (guestd).

vmmemctl is the driver for memory ballooning.
vmxnet is the paravirtualized network driver.
vmhgfs is the driver that enables the shared folders feature of VMware Workstation and other products that use it. It is not useful on a server, therefore we do not enable it.
vmblock is a block filesystem driver that provides drag-and-drop functionality from the remote console.
VMware Guest Daemon (guestd) is the daemon that controls communication between the guest and the host, including time synchronization.

On Windows and supported Linux distributions there are other VMware Tools modules/drivers, but those are not supported on FreeBSD. For further information about all VMware Tools components look at

Tuesday, January 24, 2017

VMware vSphere 6.0 PSC and SSO Domain useful resources

I do not have real numbers, but it seems obvious and logical that SMB and midrange customers adopt the latest VMware software much more quickly than large enterprise customers. To be more precise, they are probably already running vSphere 6.0 and planning to upgrade to 6.5 now or soon. Some of them are just waiting for 6.5 U1, which is expected soon.

On the other hand, the largest VMware customers are logically more conservative and are starting migrations from vSphere 5.5 to 6.0 just now, at the time of writing this article (beginning of 2017). These large customers operate at a significantly larger scale, therefore their PSC/SSO topology is much more complex.

During the last few weeks I have discussed some vSphere 6 PSC/vCenter topology design decision points with these customers, and I decided to write a blog post about a few useful, publicly available resources/documents for such discussions.

First and foremost, the FAQ below is the most comprehensive VMware KB article about this topic.

FAQ: VMware Platform Services Controller in vSphere 6.0 (2113115)

The most surprising information, even for long-time VMware customers, is in the following two Q&As from the FAQ above.

Q: Can I merge two vSphere Domains together?
A: No, there is no way to merge two vSphere domains together.

Q: Can I get Enhanced Linked Mode (ELM) between two, separate vSphere domains?
A: No, Enhanced Linked Mode requires that all PSCs be in the same domain and replicating. Since two separate vSphere Domains do not have a means of replicating, the new APIs that provide ELM cannot display the contents of both domains.

What does it mean?
Well, if you have multiple independent vSphere 5.5 SSO domains and you want to merge them, you have to do it in vSphere 5.5 before upgrading to 6.0, because you will not be able to do so in vSphere 6 and later.
Note: I do not know how this will change in the longer term, but it is true even for vSphere 6.5, which is the latest version at the time of writing this blog post.

Q: One of my customers asked me if the same vSphere SSO name (vsphere.local) in their two separate datacenters means that it is the same vSphere domain.
A: No. If you do not have replication between the domains, they are not the same domain, even if they have the same name.

Another good question you have to ask yourself is whether you should merge your vSphere domains at all. The typical reason for a single vSphere domain is a requirement for Enhanced Linked Mode (ELM). What will Enhanced Linked Mode give you? Below are several benefits of ELM:
  • You can log in to all linked vCenter Server systems simultaneously with a single user name and password.
  • With Enhanced Linked Mode, you can view and search across all linked vCenter Server systems. This mode replicates roles, permissions, licenses, and other key data across systems.
  • You can view and search the inventories of all linked vCenter Server systems within the vSphere Web Client.
  • Roles, permissions, licenses, tags, and policies are replicated across linked vCenter Server systems.
  • You can use the Web Client GUI to perform cross-vCenter vMotion.
However, any technology has some limits. In the case of vSphere, we should always look at the vSphere Configuration Maximums. The relevant values from the configuration maximums are:

  • Maximum PSCs per vSphere Domain - 8
  • Maximum PSCs per site, behind a load balancer - 4
  • Maximum number of VMware Solutions connected to a single PSC - 4
  • Maximum number of VMware Solutions in a vSphere Domain - 10
What are VMware Solutions?
A VMware Solution is defined as a product that creates a Machine Account and one or more Solution Users (a collection of vSphere services) within the VMware Directory Service when the product is joined to the PSC, and thus to the vSphere Domain. The Machine Account and Solution User(s) are used to broker and secure communication between other Solutions available within the vSphere environment. In order to count against these maximums, the Machine Account and Solution Users must be fully integrated with all of the PSC's available feature sets (Identity Management and Authentication Brokering, Certificate Management, Licensing, etc.) such that the product makes full use of the PSC. At this time, only vCenter Server is defined as a fully integrated solution and counts against these maximums. Partially integrated solutions, such as vCenter Site Recovery Manager, vCloud Director, vRealize Orchestrator, vRealize Automation Center, and vRealize Operations, do not count against these defined maximums.
In other words, vCenter Servers are currently the only solutions that count toward the maximum of 10 VMware Solutions per vSphere Domain.
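These limits are easy to check against a planned topology. Below is a minimal Python sketch. It assumes a domain is described as a mapping from each PSC to the vCenter Servers registered with it, and it counts only vCenter Servers as solutions, as the definition above states. The per-site load balancer limit is not modeled, and check_domain is a hypothetical helper.

```python
from typing import Dict, List

# Limits as quoted above; check the Configuration Maximums for your exact release.
MAX_PSCS_PER_DOMAIN = 8
MAX_SOLUTIONS_PER_PSC = 4
MAX_SOLUTIONS_PER_DOMAIN = 10


def check_domain(topology: Dict[str, List[str]]) -> List[str]:
    """topology maps each PSC name to the vCenter Servers registered with it."""
    problems: List[str] = []
    if len(topology) > MAX_PSCS_PER_DOMAIN:
        problems.append(f"{len(topology)} PSCs in domain (max {MAX_PSCS_PER_DOMAIN})")
    for psc, vcenters in topology.items():
        if len(vcenters) > MAX_SOLUTIONS_PER_PSC:
            problems.append(f"{psc}: {len(vcenters)} vCenter Servers (max {MAX_SOLUTIONS_PER_PSC})")
    total = sum(len(vcenters) for vcenters in topology.values())
    if total > MAX_SOLUTIONS_PER_DOMAIN:
        problems.append(f"{total} vCenter Servers in domain (max {MAX_SOLUTIONS_PER_DOMAIN})")
    return problems
```

An empty result only means the topology fits these particular maximums. It says nothing about whether merging domains is the right design decision.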

Now, once you know whether you really need and want to merge vSphere domains, remember that it must be done in vSphere 5.5, because it is not possible in vSphere 6.

One of my customers asked me where it is documented that vSphere domain merging is supported and how it can be done.

Below are two blog posts written by blogger Thom Greene ...

Merging SSO Domains in vCenter 5.5 part 1: Why?

Merging SSO Domains in vCenter Server 5.5 pt 2: How?

and a very detailed blog post by Andreas Peetz, referenced by Thom in his posts.

Re-pointing vCenter Server 5.5: A Survival Guide to KB2033620

... but the resources above are not official VMware documents, so where are the official VMware documents? Andreas' blog posts refer to the following VMware KBs:

Migrating two VMware vCenter Single Sign-On embedded VMware vCenter Servers in the same VMware vCenter Single Sign-On domain (2130433)

How to repoint and re-register vCenter Server 5.1 / 5.5 and components (2033620)

VMware vCenter Server 5.1/5.5 fails to start after re-registering with vCenter Single Sign-On (2048753)

Old but still informative blog post ... vSphere Datacenter Design – vCenter Architecture Changes in vSphere 6.0 – Part 1

Additional VMware resources:

Platform Services Controller Topology Decision Tree

vCenter Server Topology Considerations

Reconfigure a Standalone vCenter Server with an Embedded Platform Services Controller to a vCenter Server with an External Platform Services Controller

How to repoint vCenter Server 6.x between External PSC within a site (2113917)

Using the cmsso command to unregister vCenter Server from Single Sign-On (2106736)

and another related blog post from William Lam:
How to split vCenter Servers configured in an Enhanced Linked Mode (ELM)?

Useful VMware KB articles before upgrading to vSphere 6.5

I have just found the following very useful VMware KB articles and blog posts, which should be read before any vSphere 6.5 upgrade or design refresh.

Update sequence for vSphere 6.5 and its compatible VMware products (2147289) 

Important information before upgrading to vSphere 6.5 (2147548)

Best practices for upgrading to vCenter Server 6.5 (2147686)

Platform Services Controller Topology Decision Tree

Reconfigure a Standalone vCenter Server with an Embedded Platform Services Controller to a vCenter Server with an External Platform Services Controller

How to repoint vCenter Server 6.x between External PSC within a site (2113917)

Wednesday, January 11, 2017

Using esxtop to identify storage performance issues for ESX / ESXi

ESXi performance statistics are exposed to administrators through the vSphere Clients. You can see real-time performance statistics, which are collected in 5-minute intervals, where each interval consists of fifteen 20-second samples. A 20-second sample is obviously very coarse for storage performance, where we are working on a millisecond or even microsecond scale.
20 seconds contains 20,000 milliseconds
Let's be clear here: we will never have full visibility, but a smaller monitoring sample gives us a better clue about what is really happening inside the system. It works like a microscope.

The smallest monitoring samples can be achieved with the ESXi utility esxtop. The default esxtop delay between monitoring points (samples) is 5 seconds. However, it can be lowered to as little as 2 seconds with the parameter -d 2.

For real analytics, the esxtop data must be exported to an external file. In esxtop terminology this is batch mode, and it is enabled by the parameter -b.

Another important factor is which statistics (metrics) we are going to collect. It is best to collect all statistics, because during performance analytics you have to correlate multiple values against each other. This is achieved by the parameter -a.

The last parameter is -n, which defines how many iterations you want to perform in batch mode. So in the example below we will have 30 iterations with a delay of 2 seconds between them, for a total monitoring time of 60 seconds.

esxtop -b -a -d 2 -n 30 > esxtop-data.csv
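The relation between the parameters is simple: iterations = monitoring duration / delay. Below is a small Python helper that builds the batch command line from a desired duration and keeps the 2 second floor mentioned above. The helper name esxtop_batch_command is hypothetical, for illustration only.

```python
import math


def esxtop_batch_command(duration_s: int, delay_s: int = 2,
                         outfile: str = "esxtop-data.csv") -> str:
    """Build an esxtop batch-mode command that samples all statistics for duration_s seconds."""
    if delay_s < 2:
        raise ValueError("esxtop delay cannot be lower than 2 seconds")
    iterations = math.ceil(duration_s / delay_s)  # -n = duration / delay, rounded up
    return f"esxtop -b -a -d {delay_s} -n {iterations} > {outfile}"


print(esxtop_batch_command(60))  # esxtop -b -a -d 2 -n 30 > esxtop-data.csv
```

For a 24-hour capture at the 2 second delay, the helper would produce -n 43200.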

For all esxtop parameters, see the help output below.

 [root@esx11:~] esxtop -h  
 usage: esxtop [-h] [-v] [-b] [-l] [-s] [-a] [-c config file] [-R vm-support-dir-path]   
         [-d delay] [-n iterations]  
        [-export-entity entity-file] [-import-entity entity-file]   
        -h prints this help menu.  
        -v prints version.  
        -b enables batch mode.  
        -l locks the esxtop objects to those available in the first snapshot.  
        -s enables secure mode.  
        -a show all statistics.  
        -c sets the esxtop configuration file, which by default is .esxtop60rc  
        -R enables replay mode.  
        -d sets the delay between updates in seconds.  
        -n runs esxtop for only n iterations. Use "-n infinity" to run esxtop forever.  
        -----Experimental Features-------------  
        -export-entity writes the entity ids into a file, which can be modified  
         to select interesting entities.  
        -import-entity reads the file of selected entities. If this opion   
         is used, esxtop only shows the data for the selected entities.  

It is important to know that esxtop gives you significantly more statistics than you can see at the vSphere Client level. That's another important benefit of esxtop. But each benefit also has some drawbacks or impact. The impact is that a single esxtop output line can have several thousand statistic counters. For example, an ESXi 6.0 host with just 2 running VMs in my home lab has 27,314 counters. My customer's production ESXi host has over 330,000 counters! So the output file can be pretty large if you run it for 24 hours. Count on it.
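In batch mode every iteration writes one output line with one value per counter, so the amount of data grows as counters × iterations. Here is a quick Python estimate using the counter counts quoted above for a 24-hour capture at the 2 second delay. esxtop_values is a hypothetical helper, and the estimate counts values, not bytes.

```python
def esxtop_values(counters: int, duration_s: int, delay_s: int = 2) -> int:
    """Total counter values written in batch mode: one line per sample, one value per counter."""
    samples = duration_s // delay_s
    return counters * samples


DAY = 24 * 3600
print(f"{esxtop_values(27_314, DAY):,}")   # home lab host: 1,179,964,800
print(f"{esxtop_values(330_000, DAY):,}")  # customer production host: 14,256,000,000
```

More than 14 billion values for the production host is a good argument for shortening the capture window. Another option is the experimental -export-entity/-import-entity options listed in the help output above, which limit the capture to selected entities.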

The file contains many interesting counters. The following counters for physical disk devices are the most interesting:
### Response times
Average Guest MilliSec/Command
Average Kernel MilliSec/Command
Average Queue MilliSec/Command
Average Queue MilliSec/Read
Average Driver MilliSec/Command
Average Driver MilliSec/Write
### Queue
Adapter Q Depth
### IOPS
### MB/s
MBytes Read/sec
MBytes Written/sec
### Split commands
Split Commands/sec
### SCSI Reservations
Failed Reserves/sec
### Failures
Failed Commands/sec
Failed Reads/sec
Failed Writes/sec
Failed Bytes Read/sec
Failed Bytes Written/sec
Some of the above counters are not available in the vSphere Client, but the big benefit is that esxtop gives you data at a 2 second interval, which is much better granularity.
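As a starting point for that kind of analysis, here is a minimal Python sketch. It pulls one counter for every instance of one object out of a batch file and reports the maximum and average. It assumes the header layout \\host\Object(instance)\Counter and the object name "Physical Disk SCSI Device", so please verify both against the header row of your own file. summarize is a hypothetical helper, not the author's script.

```python
import csv
import re
from statistics import mean
from typing import Dict, Iterable, List

# Assumed esxtop batch header layout: \\host\Object(instance)\Counter
HEADER = re.compile(r"^\\\\[^\\]+\\(?P<obj>[^\\(]+)\((?P<inst>[^)]*)\)\\(?P<counter>.+)$")


def summarize(lines: Iterable[str], obj: str, counter: str) -> Dict[str, Dict[str, float]]:
    """Return {instance: {"max": ..., "avg": ...}} for one counter of one esxtop object."""
    reader = csv.reader(lines)
    header = next(reader)
    columns: Dict[int, str] = {}  # column index -> instance (e.g. a naa.* device id)
    for idx, name in enumerate(header):
        match = HEADER.match(name)
        if match and match["obj"] == obj and match["counter"] == counter:
            columns[idx] = match["inst"]
    samples: Dict[str, List[float]] = {inst: [] for inst in columns.values()}
    for row in reader:
        for idx, inst in columns.items():
            try:
                samples[inst].append(float(row[idx]))
            except (IndexError, ValueError):
                pass  # blank or truncated cell
    return {inst: {"max": max(vals), "avg": mean(vals)}
            for inst, vals in samples.items() if vals}
```

Usage would look like `with open("esxtop-data.csv", newline="") as f: summarize(f, "Physical Disk SCSI Device", "Average Driver MilliSec/Command")`. Only the selected columns are kept in memory, which matters when a single row has hundreds of thousands of counters.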

I hear your question: so what now? How do I analyze the esxtop output file?
Well, you can replay it back in esxtop, or you can use any of the following tools:

  • VisualEsxtop
  • perfmon
  • Excel
  • esxplot
To be honest, none of the tools above fulfilled my requirements, therefore I'm writing my own Python script for esxtop output analysis.

I will blog about it in my next post, when the script is good enough for public use and published on GitHub.

Stay tuned.