Tuesday, August 15, 2017

NSX Basic Concepts, Tips and Tricks

NSX and Network Teaming

There are multiple options for achieving network teaming from ESXi to the physical network. For more information, see another of my blog posts, "Back to the basics - VMware vSphere networking".

In a nutshell, there are three generally supported methods of connecting NSX VTEP(s) to the physical network:
  1. Explicit failover - only a single physical NIC is active at any given time, so there is no load balancing at all
  2. LACP - a single aggregated interface where load balancing is done based on a hashing algorithm
  3. Switch-independent teaming achieved with multiple VTEPs, where each VTEP is bound to a different ESXi pNIC (see the sketch below).
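For the multi-VTEP option, it is useful to first confirm how many VTEP vmkernel interfaces the host actually has. On an NSX-prepared host the VTEPs live in a dedicated "vxlan" netstack, so a quick sketch (assuming a default NSX VTEP configuration) looks like this:
 # list the VTEP vmkernel interfaces (dedicated vxlan netstack)
 esxcli network ip interface list --netstack=vxlan
 # show their IP configuration
 esxcli network ip interface ipv4 get --netstack=vxlan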
Let's assume we have switch-independent teaming with multiple independent uplinks to the physical network. Now the question is how to check the VM vNIC to ESXi host pNIC mapping. I'm aware of at least four methods to check this mapping:
  1. ESXTOP
  2. ESXCLI
  3. NSX Controller
  4. NSX Manager
1/ ESXTOP method
  • ssh to ESXi
  • run esxtop
  • Press the [n] key to switch to the network view
  • Check the TEAM-PNIC column – it should show a different vmnic (ESXi pNIC) for each VM
2/ ESXCLI method
  • ssh to ESXi
  • Use the command “esxcli network vm list” and locate the World ID of the VM
  • Use “esxcli network vm port list -w <World ID>” and check the “Team Uplink” value. It should be a different vmnic (ESXi pNIC) for each VM (see the sketch below)
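If you want to check all running VMs at once, a quick-and-dirty loop in the ESXi shell can combine both commands. This is just a sketch: it assumes the default esxcli output layout (two header lines, World ID in the first column), so adjust it if your build prints something different.
 # print the Port ID and Team Uplink for every port of every running VM
 for W in $(esxcli network vm list | awk 'NR>2 {print $1}'); do
   esxcli network vm port list -w $W | grep -E "Port ID|Team Uplink"
 done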
3/ NSX Controller method
  • Identify the MAC address of the VM
  • Log in to the NSX Controller nodes (ssh or console) one by one
  • Use the command “show control-cluster logical-switches mac-table <VNI>” to show MAC address to VTEP mappings. I assume a multi-VTEP configuration where each VTEP is statically bound to a particular ESXi pNIC (vmnic)
4/ NSX Manager method
  • Identify the MAC address of the VM
  • Log in to the NSX Manager (ssh or console)
  • Go through all controllers and display the MAC address table, which also shows behind which VTEP a particular MAC address resides
  • i) show controller list all
  • ii) show logical-switch controller controller-1 vni 10001 mac
  • iii) show logical-switch controller controller-2 vni 10001 mac
  • iv) show logical-switch controller controller-3 vni 10001 mac
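If you do not know the VNI of your logical switch, you can look it up first from the NSX Manager central CLI (the VNI 10001 used above is just an example):
 # list all logical switches with their names and VNIs
 show logical-switch list all
 # then query the MAC table for the VNI of interest on each controller
 show logical-switch controller controller-1 vni 10001 mac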
The appropriate method is typically chosen based on the administrator role and Role Based Access Control. A vSphere administrator will probably use esxtop or esxcli, while a network administrator will use the NSX Manager or Controller.

Distributed Logical Router (DLR)

DLR is a virtual router distributed across multiple ESXi hosts. You can imagine it as a chassis with multiple line cards. The chassis is virtual (software based) and the line cards are software modules spread across multiple ESXi hosts (physical x86 servers).

The basic concept of DLR is that every routing decision is made locally: the NSX DLR always performs routing on the DLR instance running in the kernel of the ESXi host hosting the workload that initiates the communication. When VM traffic needs to be routed to another logical switch, it first reaches the DLR on the same ESXi host where the VM is running. Each DLR "line card" module (ESXi host) has all logical switches (VXLANs) connected locally, so the DLR forwards the packet to the appropriate destination logical switch. If the target VM runs on another ESXi host, the packet is encapsulated on the local ESXi host and decapsulated on the target ESXi host.

It is good to know that the DLR always uses the same MAC address for the default gateway addresses of all logical switches. This MAC address is called the VMAC and it is used on the DLR logical L3 interfaces (LIFs) connected to logical switches (VXLANs).

However, there must be some coordination between the multiple DLR "line card" modules (ESXi hosts), therefore each DLR module must also have its own physical MAC address. This MAC address is called the PMAC.

To show the DLR PMAC and VMAC, run the following command on an ESXi host:
net-vdr -l -C
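If I remember the flags correctly, you can also list the DLR instances present on the host first; treat the sketch below as a hint rather than exact syntax:
 # list all DLR instances on this ESXi host (name, number of LIFs and routes)
 net-vdr --instance -l
 # dump the DLR connection information, including VMAC and PMAC
 net-vdr -l -C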

Distributed Logical Firewall (DFW) - firewall rules

NSX Distributed Firewall applies firewall rules directly to VM vNICs. At the vNIC level there is a concept of slots where different services are bound and chained together. NSX DFW sits in slot 2 and, for example, a third-party firewall sits in slot 4.

DFW firewall rules are applied automatically to each vNIC, so the question is how to double-check which rules are actually present at the vNIC level.

There are two methods to check it:
  1. ESXi commands
  2. NSX Manager commands
1/ ESXi method
  • ssh to ESXi
  • Use the command “summarize-dvfilter”, locate the VM of interest, and find its vNIC filter name in slot 2 used by the agent vmware-sfw
  • grep can help us here ... "summarize-dvfilter | grep -A 10 <VM name>"
  • The filter name should look similar to nic-24565940-eth0-vmware-sfw.2
  • Now you can list the firewall rules with the command "vsipioctl getfwrules -f nic-24565940-eth0-vmware-sfw.2" (a complete sketch follows below)
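Putting the ESXi method together, a minimal sketch (the VM name is a placeholder and the filter name is taken from the example above; vsipioctl can also dump the address sets referenced by the rules):
 # find the dvfilter name of the VM's vNIC (slot 2, agent vmware-sfw)
 summarize-dvfilter | grep -A 10 <VM name>
 # list the firewall rules applied on that filter
 vsipioctl getfwrules -f nic-24565940-eth0-vmware-sfw.2
 # list the address sets (IP sets) referenced by those rules
 vsipioctl getaddrsets -f nic-24565940-eth0-vmware-sfw.2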

2/ NSX Manager method (https://kb.vmware.com/kb/2125482)
  • Log in to the NSX Manager with the admin credentials
  • To display a summary of DVFilter information, run the command "show dfw host host-id summarize-dvfilter"
  • To display detailed information about a vnic, run the command "show dfw host host-id vnic"
  • To display the rules configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name rules"
  • To display the addrsets configured on the filter, run the command "show dfw host host-id vnic vnic-id filter filter-name addrsets"
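The same workflow as a sketch; the host ID below is hypothetical, and the vNIC ID and filter name come from the output of the previous commands:
 show dfw host host-28 summarize-dvfilter
 show dfw host host-28 vnic
 show dfw host host-28 vnic <vnic-id> filter <filter-name> rules
 show dfw host host-28 vnic <vnic-id> filter <filter-name> addrsets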
And again, the appropriate method is typically chosen based on the administrator role and Role Based Access Control. 

Distributed Logical Firewall (DFW) - third party integration and availability considerations

NSX Distributed Firewall supports integration with third-party solutions. This integration is also called service chaining. A third-party solution is hooked into a particular vNIC slot, and some selected traffic, or potentially all traffic (not recommended), can be redirected to the third-party agent running on each ESXi host as a special virtual machine. The third-party solution can inspect the traffic and allow or deny it. However, what happens when the agent VM is not available? It is easy to test: you can power off the agent VM and see what happens. The behavior actually depends on the service failOpen/failClosed policy. You can check the policy setting as depicted on the screenshot below ...

Service failOpen/failClosed policy
If failOpen is set to false, the virtual machine traffic is dropped when the agent is unavailable. This has a negative impact on availability but a positive impact on security. If failOpen is set to true, the VM traffic is allowed and everything keeps working even when the agent is not available. In such a situation the security policy cannot be enforced, which is a potential security risk. So this is a typical design decision point, where the decision depends on customer-specific requirements.

Now the question is how the failOpen setting can be changed. My understanding is that it depends on the third-party solution. Here is the link to the Trend Micro how-to - "Set vNetwork behavior when appliances shut down"

Monday, August 14, 2017

Remote text based console to ESXi over IPMI SOL

I have just bought another server for my home lab. I already have 6 Intel NUCs, but a lot of RAM is needed for a full VMware SDDC with all products like Log Insight, vROps, vRNI, vRA, vRO, ... but that's another story.

Anyway, I have decided to buy a used Dell rack server (PowerEdge R810) with 256 GB RAM, mainly because of the amount of RAM but also because all Dell servers since the 9th generation support IPMI, which is very useful. The server can be remotely managed (power on, power off, etc.) over IPMI and it also supports SOL (Serial-over-LAN) for the server console. IPMI SOL is an inexpensive alternative to the iDRAC Enterprise virtual console.

You can read more about IPMI at the links below.

So, if you follow the instructions in the links above, you will be able to use IPMI SOL to see and manage the server during the boot process and change, for example, BIOS settings. I have tested it and it works like a charm. You see the boot progress, you can go into the BIOS and change whatever you want. Console redirection works and the keyboard can be used to control the server during POST. However, after the server POST phase and boot loading of ESXi, the ESXi console was unfortunately not redirected to SOL. I think it is because the ESXi DCUI is not a pure text-based console. Instead, it is a graphics mode simulating text mode, and a graphics-mode console cannot, for obvious reasons, be transferred over IPMI SOL.

So there is another possibility. The ESXi Direct Console (aka DCUI) can be redirected to a serial port. The setup procedure is nicely described in the documentation here. It is done by setting the ESXi host advanced setting "VMkernel.Boot.tty2Port" to the value "com2". It is worth mentioning that server console redirection and ESXi DCUI redirection cannot be configured at the same time, for obvious reasons. So I unconfigured the server console redirection and configured the ESXi DCUI redirection. It worked great, but the keyboard was not working. It is pretty useless to see the ESXi DCUI without the possibility to use it, right? To be honest, I do not know why my keyboard did not work over IPMI SOL.
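For reference, this is roughly how I drive IPMI SOL from a Linux management station with ipmitool; the BMC IP address and credentials are placeholders:
 # open the Serial-over-LAN session to the server's BMC
 ipmitool -I lanplus -H 192.168.4.120 -U root -P <password> sol activate
 # power control works over the same interface
 ipmitool -I lanplus -H 192.168.4.120 -U root -P <password> chassis power status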

So what is the conclusion? Unfortunately,  I have hit another AHA effect ...
"Aha, IPMI SOL will not help me too much with remote access to ESXi DCUI console."
And as always, any feedback or tips and tricks are more than welcome as comments to this blog post.

Update: I have just found and bought a very cheap iDRAC Enterprise Remote Access Card on eBay, which supports remote virtual console and media. So, it is a hardware workaround for my software problem :-)

iDrac6 on Ebay





Sunday, June 25, 2017

Start order of software services in VMware vCenter Server Appliance 6.0 U2

vCenter Server Appliance 6.0 U2 services are started in the following order ...

  1. vmafdd (VMware Authentication Framework)
  2. vmware-rhttpproxy (VMware HTTP Reverse Proxy)
  3. vmdird (VMware Directory Service)
  4. vmcad (VMware Certificate Service)
  5. vmware-sts-idmd (VMware Identity Management Service)
  6. vmware-stsd (VMware Security Token Service)
  7. vmware-cm (VMware Component Manager)
  8. vmware-cis-license (VMware License Service)
  9. vmware-psc-client (VMware Platform Services Controller Client)
  10. vmware-sca (VMware Service Control Agent)
  11. applmgmt (VMware Appliance Management Service)
  12. vmware-netdumper (VMware vSphere ESXi Dump Collector)
  13. vmware-syslog (VMware Common Logging Service)
  14. vmware-syslog-health (VMware Syslog Health Service)
  15. vmware-vapi-endpoint (VMware vAPI Endpoint)
  16. vmware-vpostgres (VMware Postgres)
  17. vmware-invsvc (VMware Inventory Service)
  18. vmware-mbcs (VMware Message Bus Configuration Service)
  19. vmware-vpxd (VMware vCenter Server)
  20. vmware-eam (VMware ESX Agent Manager)
  21. vmware-rbd-watchdog (VMware vSphere Auto Deploy Waiter)
  22. vmware-sps (VMware vSphere Profile-Driven Storage Service)
  23. vmware-vdcs (VMware Content Library Service)
  24. vmware-vpx-workflow (VMware vCenter Workflow Manager)
  25. vmware-vsan-health (VMware VSAN Health Service)
  26. vmware-vsm (VMware vService Manager)
  27. vsphere-client (VMware vSphere Web Client)
  28. vmware-perfcharts (VMware Performance Charts)
  29. vmware-vws (VMware System and Hardware Health Manager) 
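To check the actual state of these services on a running vCSA, you can use the service-control utility from the appliance's Bash shell; a quick sketch (options as I remember them):
 # list all services managed by the appliance
 service-control --list
 # show which services are running and which are stopped
 service-control --status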


Thursday, June 22, 2017

CLI for VMware Virtual Distributed Switch

A few weeks ago I was asked by one of my customers whether VMware Virtual Distributed Switch (aka VDS) supports a Cisco-like command line interface. The key idea behind it was to integrate the vSphere switch with the open-source tool Network Tracking Database (NetDB), which they use for tracking MAC addresses within their network. I was told by the customer that NetDB can telnet/ssh to Cisco switches and do screen scraping, so wouldn't it be cool to have the most popular switch CLI commands for VDS? These commands are

  • show mac-address-table
  • show interface status
The official answer is NO, but wait a minute. Almost anything is possible with the VMware API. So my solution leverages VMware's vSphere Perl SDK to pull information out of Distributed Virtual Switches. I have prepared a Perl script, vdscli.pl, which currently supports the two commands mentioned above. It goes through all VMware Distributed Switches on a single vCenter. The script, along with shell wrappers, is available on GitHub here: https://github.com/davidpasek/vdscli

See the screenshots below to get an idea of what the script does.

The output of the command
vdscli.pl --server=vc01.home.uw.cz --username readonly --password readonly --cmd show-port-status
looks as depicted in the screenshot below.


and the output of the command
vdscli.pl --server=vc01.home.uw.cz --username readonly --password readonly --cmd show-mac-address-table
Now I'm working on a telnet daemon which will simulate a remotely accessible switch CLI. It will just call the Perl scripts above, but that is what network administrators want for their day-to-day operations.

Tuesday, June 13, 2017

Storage DRS integration with storage profiles

This is a very quick blog post. In vSphere 6.0, VMware introduced Storage DRS integration with storage profiles (aka SPBM - Storage Policy Based Management).

Here is the link to official documentation.

Generally, it is about the SDRS advanced option EnforceStorageProfiles. The advanced option EnforceStorageProfiles takes one of these integer values: 0, 1 or 2, where the default value is 0.

  • When the option is set to 0, there is NO storage profile or policy enforcement on the SDRS cluster.
  • When the option is set to 1, there is SOFT storage profile or policy enforcement on the SDRS cluster. It is analogous to DRS soft rules. SDRS complies with the storage profile/policy as far as possible; however, if required, SDRS will violate storage profile compliance.
  • When the option is set to 2, there is HARD storage profile or policy enforcement on the SDRS cluster. It is analogous to DRS hard rules. SDRS will never violate storage profile or policy compliance.

Please note that at the time of writing this post, SDRS storage profile enforcement works only during initial placement and NOT for already provisioned VMs during load balancing. Therefore, when the VM Storage Policy is changed for a particular VM, SDRS will not make it compliant automatically, nor will it throw any recommendation.

Another limitation is that vCloud Director (vCD) backed by an SDRS cluster does NOT support Soft (1) or Hard (2) storage profile enforcement. vCloud Director (vCD) works well only with the Default (0) option.

Relevant references to other resources:

Wednesday, June 07, 2017

VMware Photon OS with PowerCLI

Photon OS is a Linux distribution maintained by VMware with multiple benefits for the virtualized form factor, therefore any virtual appliance should be based on Photon OS.

I have recently tried to play with Photon OS and here are some of my notes.

IP Settings

Network configuration files are located in the directory
/etc/systemd/network/
IP settings are leased from DHCP by default. This is configured in the file /etc/systemd/network/10-dhcp-en.network

The file contains the following config:
[Match]
Name=e*
[Network]
DHCP=yes
To use static IP settings, it is good to move the DHCP config file down in alphabetical order and create a config file with static IP settings:
mv 10-dhcp-en.network 99-dhcp-en.network
cp 99-dhcp-en.network 10-static-en.network
The file /etc/systemd/network/10-static-en.network should then look similar to
[Match]
Name=eth0
[Network]
Address=192.168.4.56/24
Gateway=192.168.4.254
DNS=192.168.4.4

The network can be restarted with the command
systemctl restart systemd-networkd
and the network settings can be checked with the command

networkctl

Package management

Photon OS uses the TDNF (Tiny DNF) package manager, which is based on Fedora's DNF. It is developed by VMware and comes with compatible repository and package management capabilities. Note that not every dnf command is available, but the basic ones are there.

Examples:
  • tdnf install libxml2
  • tdnf install openssl-devel
  • tdnf install binutils
  • tdnf install pkg-config
  • tdnf install perl-Crypt-SSLeay
  • tdnf install cpan
  • tdnf install libuuid-devel
  • tdnf install make
The whole operating system can be updated with the command
tdnf update
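A few more tdnf sub-commands I find handy; treat this as a sketch, since not every dnf sub-command is implemented:
 # show details about a package
 tdnf info openssl-devel
 # list everything currently installed
 tdnf list installed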

Log Management

You will not find the typical Linux /var/log/messages.
Instead, journald is used and you have to use the command journalctl.

The equivalent of tail -f /var/log/messages is
journalctl -f 
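Other journalctl invocations that work on Photon OS as on any systemd-based distribution:
 # messages from the current boot only
 journalctl -b
 # messages of a single service, for example the network daemon
 journalctl -u systemd-networkd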

System services

System services are controlled with the command systemctl

To check service status use
systemctl status docker
To start service use
systemctl start docker
To enable service after system start use
systemctl enable docker

Docker and containerized PowerCLI

One of the key use cases for Photon OS is to be a Docker host, therefore Docker is preinstalled in Photon OS. You can see further Docker information with the command
docker info
If Docker is running on your system, you can very quickly spin up a Docker container. Let's use the example of containerized PowerCLI. To download the container image from Docker Hub, use the command
docker pull vmware/powerclicore
To check all downloaded images, use the command
docker images -a   
 root@photon-machine [ ~ ]# docker images -a    
 REPOSITORY      TAG         IMAGE ID      CREATED       SIZE  
 vmware/powerclicore  latest       a8e3349371c5    6 weeks ago     610 MB  
 root@photon-machine [ ~ ]#   

Now you can run the PowerCLI container interactively (-i) and with an allocated pseudo-TTY (-t). The option --rm stands for "Automatically remove the container when it exits".
docker run --rm -it vmware/powerclicore 
 root@photon-machine [ ~ ]# docker run --rm -it --name powercli vmware/powerclicore         
 PowerShell   
 Copyright (C) Microsoft Corporation. All rights reserved.  
      Welcome to VMware vSphere PowerCLI!  
 Log in to a vCenter Server or ESX host:       Connect-VIServer  
 To find out what commands are available, type:    Get-VICommand  
 Once you've connected, display all virtual machines: Get-VM  
     Copyright (C) VMware, Inc. All rights reserved.  
 Loading personal and system profiles took 3083ms.  
 PS /powershell#   

Now you can use PowerCLI running in a Linux container. The very first PowerCLI command is usually Connect-VIServer, but you can get the following warning and error messages

 PS /powershell> Connect-VIServer                                                                         
 cmdlet Connect-VIServer at command pipeline position 1  
 Supply values for the following parameters:  
 Server: vc01.home.uw.cz  
 Specify Credential  
 Please specify server credential  
 User: cdave  
 Password for user cdave: *********  
 WARNING: Invalid server certificate. Use Set-PowerCLIConfiguration to set the value for the InvalidCertificateAction option to Prompt if you'd like to connect once or to add  
  a permanent exception for this server.  
 Connect-VIServer : 06/07/2017 19:25:44     Connect-VIServer          An error occurred while sending the request.       
 At line:1 char:1  
 + Connect-VIServer  
 + ~~~~~~~~~~~~~~~~  
   + CategoryInfo     : NotSpecified: (:) [Connect-VIServer], ViError  
   + FullyQualifiedErrorId : Client20_ConnectivityServiceImpl_Reconnect_Exception,VMware.VimAutomation.ViCore.Cmdlets.Commands.ConnectVIServer  
 PS /powershell>   

To solve the problem, you have to adjust the PowerCLI configuration with
Set-PowerCLIConfiguration -InvalidCertificateAction ignore -confirm:$false -scope All
The command above changes PowerCLI configuration for all users.

To use other docker commands, you can open another ssh session and, for example, list the running containers

 root@photon-machine [ ~ ]# docker ps -a     
 CONTAINER ID    IMAGE         COMMAND       CREATED       STATUS       PORTS        NAMES  
 6ecccf77891e    vmware/powerclicore  "powershell"    7 minutes ago    Up 7 minutes              powercli  
 root@photon-machine [ ~ ]#   

... or issue any other docker command.

That's cool, isn't it?

Tuesday, June 06, 2017

VMware VVOLs scalability

I'm personally a big fan of the VMware Virtual Volumes concept. If you are not familiar with VVOLs, check this blog post with the recording of a VMworld session and read the VMware KB Understanding Virtual Volumes (VVols) in VMware vSphere 6.0.

We all know that the devil is always in the details. The same is true for VVOLs. VMware prepared the conceptual framework, but the implementation always depends on the storage vendor, thus it varies across storage products.

Recently, I had a VVOLs discussion with one of my customers and he claimed that their particular storage vendor supports only a very small number of VVOLs. That discussion inspired me to do some research.

Please note that the numbers below were valid at the moment of writing this article. You should always check the current status with your particular storage vendor.

Vendor / Storage Array        Maximum VVOLs / Snapshots or Clones
DELL / Compellent SC 8000     2,000 / TBD
EMC / Unity 300               9,000 / TBD
EMC / Unity 400               9,000 / TBD
EMC / Unity 500               13,500 / TBD
EMC / Unity 600               30,000 / TBD
EMC / VMAX 3                  64,000 / TBD
Hitachi / VSP G200            2,000 / 100,000
Hitachi / VSP G400            4,000 / 100,000
Hitachi / VSP G600            4,000 / 100,000
Hitachi / VSP G800            16,000 / 100,000
Hitachi / VSP G1000           64,000 / 1,000,000

The numbers above are very important because a single VM has at least 3 VVOLs (home, data, swap) and usually even more (snapshots or additional data disks). If you assume 10 VVOLs per VM, you will end up with just 200 VMs on a Dell Compellent or a Hitachi VSP G200. On the other hand, an EMC Unity 600 would give you up to 3,000 VMs, which is not bad, and enterprise storage systems (EMC VMAX 3 and Hitachi VSP G1000) would give you up to 6,400 VMs, which is IMHO very good scalability.

So as always, it really depends on what storage system you have or are planning to buy.

If you know the numbers for other storage systems, please share them in the comments below this blog post.