Wednesday, January 28, 2015

The fastest vMotion over Force10 MXL?

Here is a question I got yesterday ...
My customer has two M1000e chassis in a single rack with MXL blade switches in fabrics A and B. MXL fabric B is connected to 10G EQL SAN. The goal is to allow vMotion to occur very fast between the two chassis using fabric A without going to the top of rack 10G switch. The question is what interconnect between the A fabrics in both chassis is best?
Is VLT or stacking preferred?
Is it best to vertically stack chassis 1 MXL A1 to chassis 2 MXL A1 and then LACP to TOR S4810?
Or is it better to horizontally stack chassis 1 MXL A1 to chassis 1 MXL A2 and then LACP to TOR S4810?
Let's do a consultative design exercise.

Requirements
  • R1 - vMotion to occur very fast between two chassis
  • R2 - vMotion over fabric A without going to the TOR switch
  • R3 - use fabric B only for iSCSI
Constraints
  • C1 - 2x Blade chassis DELL M1000e
  • C2 - Each blade chassis has Force10 MXL blade switch IO modules on fabrics A1 and A2 for Ethernet/IP traffic
  • C3 - Each blade chassis has Force10 MXL blade switch IO modules on fabrics B1 and B2 for iSCSI traffic
  • C4 - VMware vSphere ESXi hypervisor on each blade server
  • C5 - maximum 8 vMotions per ESXi host
Assumptions
  • A1 - Blade servers have 2x 10Gb NICs connected to fabric A (A1, A2)
  • A2 - Blade servers have 2x 10Gb NICs connected to fabric B (B1, B2)
  • A3 - 16x half-height blade servers are used in each blade chassis
  • A4 - Each ESXi server has NIC Teaming with dual homing to A1 and A2 IO modules
Design decision and justification
  • MXL switches in fabric A are stacked vertically across the two chassis (chassis 1 A1 with chassis 2 A1, and chassis 1 A2 with chassis 2 A2) via a 160Gb interconnect for east/west traffic, which gives a fan in/out ratio of 1:1
  • Vertical stacking allows single-point management of the two switches in fabric A1 but still allows non-disruptive firmware upgrades, because fabric A2 is an independent fault zone and NIC teaming handles automated failover. That's the reason horizontal stacking is not used.
  • Northbound connectivity for north/south traffic is done via a VLT port-channel giving 80Gb total upstream bandwidth for each MXL, which is a fan in/out ratio of 2:1.
  • Top of rack switches (2x Force10 S4810) are formed into a single VLT domain (aka virtual chassis) to have a loop-free topology and utilize the full upstream bandwidth.
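The fan in/out ratios quoted in the design decisions can be checked with simple arithmetic. Below is a minimal Python sketch of that check. It assumes that each of the 16 blades (assumption A3) contributes one 10Gb port to every fabric A MXL (assumption A1); the function name fan_ratio is only illustrative.

```python
from fractions import Fraction

def fan_ratio(downstream_gb: int, uplink_gb: int) -> str:
    """Return the fan in/out (oversubscription) ratio as 'N:M'."""
    ratio = Fraction(downstream_gb, uplink_gb)
    return f"{ratio.numerator}:{ratio.denominator}"

BLADES_PER_CHASSIS = 16   # assumption A3
NIC_SPEED_GB = 10         # assumption A1, one port per blade per MXL

# Server-facing bandwidth of one MXL switch
downstream = BLADES_PER_CHASSIS * NIC_SPEED_GB

stack_ratio = fan_ratio(downstream, 160)   # 160Gb stacking interconnect
uplink_ratio = fan_ratio(downstream, 80)   # 80Gb VLT port-channel to TOR

print("east/west:", stack_ratio)
print("north/south:", uplink_ratio)
```

The same helper can be reused to see how the ratios change with fewer blades or a smaller stacking interconnect.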
Design impact
  • vMotion VMkernel interfaces have to be configured on the same physical NIC (vmnic) on each ESXi host. This keeps vMotion traffic inside a single fabric A stack and avoids unwanted traffic through the TOR switch.
  • MultiNIC vMotion cannot be used, otherwise vMotion traffic between A1 and A2 would potentially go through the TOR switch, which is against requirement R2.
  • LACP/EtherChannel teaming cannot be used because the two MXL switches each ESXi host is connected to (A1 and A2) are not in the same stack or VLT domain. Therefore IP hash based load balancing cannot be used, the traffic of a single VM will always go over a single physical NIC, and a single VM will not be able to use more than 10Gb.
  • VM traffic for particular portgroups (L2 segments) should be configured as active/standby consistently across servers. In a non-degraded state this gives optimal east-west traffic within a particular VLAN and eliminates VM traffic flow across the TOR switch.
  • L3 traffic will be routed over TOR.
Alternative
  • Leveraging vSphere MultiNIC vMotion can improve vMotion performance (R1) but would be against R2 because vMotion communication would flow over the TOR switch.
Design decision qualities
  • Availability: Great
  • Performance: Very good for vMotion, Good for VM Traffic 
  • Manageability: Good - just two logical switches to manage
  • Scalability: max 6 members in stack
Logical design drawing


Monday, January 26, 2015

DELL 13G servers with PERC H730 finally certified for VSAN

I'm reading and learning a lot about VMware's VSAN. I really believe there will be a lot of use cases in the future for software defined distributed storage. However, I don't see VSAN momentum right now because of several factors. The three most obvious factors are mentioned below:

  • Maturity
  • TCO
  • Single point of support - if you compare it to traditional SAN based storage vendors support

That's the reason I haven't had a chance and time to play with VMware VSAN so far, but I'm getting a lot of questions from colleagues, DELL partners, customers and folks from the VMware community about the right DELL storage controller for VSAN which can be used on the latest DELL server generation.

The DELL 13th server generation was unveiled on September 8, 2014. Since then, there has not been any DELL storage controller for DELL 13G servers officially supported by VMware for VSAN.
Today I got the information that DELL PERC H730 is officially supported by DELL and VMware for VSAN. For more information look here.
This is really great info for VSAN early adopters planning to use DELL servers. One little piece of advice to all VSAN enthusiasts ... If you are not going to use officially supported VSAN nodes or an EVO:Rail appliance and you are designing your own VSAN cluster, do it very carefully. Don't forget to do a PoC before or during the design phase and perform design and operational validation tests (aka test plan) before putting VSAN into real production. Be sure you know something about the queue depth of adapters (AQLEN) and disks (DQLEN).

If you build your own software defined storage then you are the storage architect, with a little bit higher risk and responsibility in comparison to a classic storage system (this is my opinion). That's the risk of any modern (aka emerging) technology before it becomes a commodity. On the other hand, this can be your added value to your customers and there is no doubt there are some benefits.

But never forget why "data centers" are so important and business critical. It is because we usually keep very valuable data there, which must be always available with reasonable performance. Think about 99.999% storage uptime with some reasonable response time (3-20ms) for the expected IOPS workload.

I wish everybody a lot of success with hyper-converged systems like VSAN. Please leave a comment with your (hopefully) success stories and use cases. And I'm still looking forward to my first VSAN project :-)

vCenter SSO: Active Directory as a LDAP Server

Recently I needed to add a secondary Active Directory (VPOD02.example.com) to my vCenter SSO in the lab, which is already integrated with Active Directory (VPOD01.example.com).

Here are several facts just to give you a brief overview of my lab.

I have two independent vPODs in my lab. Each vPOD has everything that's needed for a VMware vSphere infrastructure. There is dedicated hardware (Compute, Storage, Network), vSphere components like vCenter, SSO, ESXi hosts, Site Recovery Manager, vSphere Replication Appliance, and also Domain Controllers and DNS servers.

vCenter SSO placed in VPOD01 is using Integrated Windows Authentication with Microsoft Active Directory "VPOD01.example.com". Therefore another integration with Microsoft Active Directory "VPOD02.example.com" can be done only via LDAP. The configuration of the additional identity source is shown below.

SSO: Add identity source
Identity source type: Active Directory as a LDAP Server
Identity source settings:
  Name: vpod02.example.com
  Base DN for users: dc=vpod02,dc=example,dc=com
  Domain name: vpod02.example.com
  Domain alias: vpod02
  Base DN for groups: dc=vpod02,dc=example,dc=com
  Primary server URL: ldap://10.2.22.51:389
  Secondary server URL: empty
  Username: administrator@vpod02.example.com
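The Base DN values above follow the usual convention of turning each label of the AD DNS domain name into a dc= component. Here is a small Python sketch of that conversion. The helper name domain_to_base_dn is hypothetical, and it assumes you want to search from the domain root rather than from a specific OU.

```python
def domain_to_base_dn(domain: str) -> str:
    """Convert a DNS domain name into an LDAP base DN made of dc= components."""
    labels = [label for label in domain.strip(".").split(".") if label]
    if not labels:
        raise ValueError("empty domain name")
    return ",".join(f"dc={label.lower()}" for label in labels)

# Domain name taken from the identity source settings above
base_dn = domain_to_base_dn("VPOD02.example.com")
print(base_dn)
```

If users and groups live in dedicated OUs, a narrower Base DN can be used instead of the domain root.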
I know that two Microsoft domains can be integrated into a single "Domain Trust", but because I'm not too familiar and experienced with Microsoft Active Directory, I think that the vCenter Single Sign-On capability of multiple identity sources is another nice design option.

Simpler manageability for non-Microsoft oriented vSphere Admin was the primary reason and justification to use this option in my vSphere lab :-)




Monday, January 19, 2015

DELL Force10 : DCB configuration - design decision justification and configuration

Introduction to DCB

Data center bridging (DCB) is a group of protocols providing modern QoS mechanisms on Ethernet networks. There are four key DCB protocols described in more detail here. In this blog post I'll show you how to configure DCB ETS, PFC and DCBX on Force10 S4810.

ETS (Enhanced Transmission Selection) is bandwidth management that allows reservations of link bandwidth when the link is congested. DCB QoS is based on 802.1p CoS (Class of Service), which can handle up to 8 classes of service (aka priority levels). Any QoS is always done via dedicated queues for different classes of service and an I/O scheduler which understands the configured priorities.

The S4810 has 4 queues, and 802.1p CoS values are mapped to them by default as shown below:
DCSWCORE-A#show qos dot1p-queue-mapping
Dot1p Priority : 0  1  2  3  4  5  6  7
         Queue : 0  0  0  1  2  3  3  3
The service-class dot1p-mapping command can reconfigure this mapping, but let's use the default mapping for our example. Queue to CoS mapping:

  • To Queue 0 are mapped CoS values 0,1,2
  • To Queue 1 is mapped CoS 3
  • To Queue 2 is mapped CoS 4
  • To Queue 3 are mapped CoS values 5,6,7
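The same default mapping can be expressed as a simple lookup. The following Python sketch is only an illustration: it copies the values from the show qos dot1p-queue-mapping output above into a list and groups the CoS values by queue. The names DOT1P_TO_QUEUE and cos_per_queue are made up for this example.

```python
# Default mapping as printed by "show qos dot1p-queue-mapping" above:
# list index = 802.1p priority (CoS), value = switch queue.
DOT1P_TO_QUEUE = [0, 0, 0, 1, 2, 3, 3, 3]

def cos_per_queue(mapping):
    """Group 802.1p CoS values by the switch queue they are mapped to."""
    queues = {}
    for cos, queue in enumerate(mapping):
        queues.setdefault(queue, []).append(cos)
    return queues

queue_members = cos_per_queue(DOT1P_TO_QUEUE)
for queue, cos_values in sorted(queue_members.items()):
    print(f"Queue {queue}: CoS {cos_values}")
```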

PFC (Priority Flow Control) is nothing else than the classic Ethernet flow control protocol, but applied to just one specific 802.1p CoS. Force10 S4810 supports PFC on two queues.

Now, let's define our design requirements and constraints for our specific design decision.

Design decision justification

R1: 4Gb guarantee for iSCSI traffic on each 10Gb converged link is required.
R2: Lossless Ethernet is required for iSCSI traffic.
R3: 1Gb guarantee for Hypervisor Management network on each 10Gb converged link is required.
R4: 2Gb guarantee for Hypervisor Live Migration network on each 10Gb converged link is required.
R5: 3Gb guarantee for production networks on each 10Gb converged link is required.

C1: We have 10Gb links to edge devices (servers and storage)
C2: We have only four switch queues for DCB on DELL Force10 S4810
C3: We have DCB capable iSCSI storage DELL EqualLogic

A1: No storage protocol other than iSCSI is required
A2: No other network traffic type requires QoS
A3: We have iSCSI traffic in 802.1p CoS 4

Let's design the best DCB mapping based on the requirements, constraints and assumptions above. The following priority groups reflect all requirements and constraints.

  • PG0 - Hypervisor management; 10% reservation; lossy ethernet;  CoS 0,1,2 -> Switch Queue 0
  • PG1 - Hypervisor live migrations; 20% reservation; lossy ethernet;  CoS 3 -> Switch Queue 1
  • PG2 - iSCSI; 40% reservation; loss-less ethernet;  CoS 4 -> Switch Queue 2
  • PG3 - Production; 30% reservation; lossy ethernet;  CoS 5,6,7 -> Switch Queue 3
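A quick sanity check of this design is to verify that the reservations add up to 100% and translate back to the guarantees from R1, R3, R4 and R5 on a 10Gb link. The Python sketch below does that; the data structure and variable names are illustrative assumptions, only the numbers come from the priority groups above.

```python
LINK_GB = 10  # constraint C1: 10Gb converged links

# priority group id -> (name, bandwidth %, PFC enabled, CoS values)
PRIORITY_GROUPS = {
    0: ("Hypervisor management", 10, False, [0, 1, 2]),
    1: ("Hypervisor live migration", 20, False, [3]),
    2: ("iSCSI", 40, True, [4]),
    3: ("Production", 30, False, [5, 6, 7]),
}

# ETS reservations must cover exactly the whole link
total_percent = sum(pg[1] for pg in PRIORITY_GROUPS.values())

# Guaranteed bandwidth per priority group in Gb
guaranteed_gb = {pgid: LINK_GB * pg[1] / 100 for pgid, pg in PRIORITY_GROUPS.items()}

# One priority group id per CoS 0..7, the order used by a priority-pgid statement
priority_pgid = [None] * 8
for pgid, (_, _, _, cos_values) in PRIORITY_GROUPS.items():
    for cos in cos_values:
        priority_pgid[cos] = pgid

# Groups with PFC enabled (lossless Ethernet)
lossless_groups = [pgid for pgid, pg in PRIORITY_GROUPS.items() if pg[2]]

print(total_percent, guaranteed_gb, priority_pgid, lossless_groups)
```

Only one priority group is lossless here, which fits within the two PFC-capable queues of the S4810.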

Below is a Force10 configuration snippet of the DCB map and its mapping to 802.1p CoS values.
dcb-map converged
  priority-group 0 bandwidth 10 pfc off
  priority-group 1 bandwidth 20 pfc off
  priority-group 2 bandwidth 40 pfc on
  priority-group 3 bandwidth 30 pfc off
  priority-pgid 0 0 0 1 2 3 3 3
The DCB map has to be applied to each particular Force10 switch port. A configuration snippet for one switch port is below.
interface TenGigabitEthernet 0/6
 no ip address
 mtu 12000
 switchport
 spanning-tree rstp edge-port
 dcb-map converged
!
 protocol lldp
  dcbx port-role auto-downstream
 no shutdown
The following technologies are configured on switch port Te 0/6 by the configuration snippet above.

  • DCB ETS and PFC defined in dcb-map converged
  • LLDP protocol with DCBX, propagating the DCB information configured in the network down to the edge device
  • MTU 12000 (Force10 maximum) because Jumbo Frames are beneficial for iSCSI. iSCSI Jumbo Frames require a payload of 9000 bytes plus some Ethernet and TCP/IP protocol overhead. MTU 9216 would be enough, but why not set the maximal MTU in the core network? Performance overhead is negligible and we are ready for everything.
  • Edge port configuration for faster port transition to forwarding state

Friday, January 16, 2015

BPDU filter and Forged Transmit on VMware vSwitch to prevent loops

Do you know there is a potential risk of a Spanning Tree loop when someone bridges two vNICs inside a VMware vSphere VM? Or that a rogue tool in the VM guest OS can send BPDUs from the VM to your physical network?

Let's assume we have Rapid STP enabled on our network. Below is a typical Force10 configuration snippet for server access ports.
interface TenGigabitEthernet 0/2
 no ip address
 switchport
 spanning-tree rstp rootguard 
! [updated]
 spanning-tree rstp edge-port bpduguard shutdown-on-violation
 no shutdown
The same or similar configs are usually used for ESXi servers as well. ESXi NICs are used as vSwitch uplinks. It is important to note that a VMware vSwitch is not a switch but some kind of port extender, so it cannot make a loop in your network and it does not generate BPDUs at all. However, when some VM on top of ESXi generates BPDUs, these BPDUs will arrive at the switch ports and your ESXi access switch ports will be shut down by the bpduguard feature. That's good from a network stability point of view, because this is what we want and what we configured on the switch, right?

But what will happen on ESXi? The VMware ESXi vSwitch will detect that the link is down and will fail over to another uplink in the vSwitch, connected to another physical switch port, which will eventually be disabled by bpduguard as well.

Ok, but the problem is that at the end all ESXi physical NICs (vSwitch uplinks) will be down and all VMs running on top of ESXi will be disconnected from the network. That can be a serious problem. The best solution would be to have BPDU Guard functionality on the VMware vSwitch but, surprisingly, such a feature does not exist. There is a relatively new possibility (since ESXi 5.1) to use BPDU Filter, which can help us keep our shared switch port up and running because no BPDUs arrive at the physical switch port, but that's not everything we need to prevent loops. VMware's BPDU Filter functionality has to be configured for each ESXi host by altering the advanced setting Net.BlockGuestBPDU.
Default setting is: Net.BlockGuestBPDU = 0
To enable BPDU Filter: Net.BlockGuestBPDU = 1
Look at ESXi 5.1 and BPDU Guard for full article with all details about this topic.

By the way, there is yet another possibility to protect your network against unwanted attacks or misconfigurations. It is generally recommended to set the vSphere vSwitch security policy "Forged Transmits" to Reject so that frames with unauthorized source MAC addresses are dropped. In that case only the burned-in MAC address (actually the virtual BIA assigned by VMware) will be allowed to communicate on the network. Therefore in-guest virtual networking will be disabled and your network will be protected against potential STP issues like simulating a root switch from the vSphere environment. For further information about STP attacks from a Linux guest look for example here and here.

Thursday, January 08, 2015

Can you please tell me more about VN-Link?

Back in 2010, when I worked for CISCO Advanced Services as a UCS Architect, Consultant, Engineer, I compiled a presentation about CISCO's point of view on virtual networking in enterprise environments. Later I published this presentation on Slideshare as "VMware Networking, CISCO Nexus 1000V, and CISCO UCS VM-FEX". I used this presentation to educate CISCO partners and customers because it was a really abstract topic for regular network specialists without server virtualization awareness. Please note that SDN (Software Defined Networking) was not a known and abused term at that time.

Yesterday I received the following Slideshare comment / question about this preso from John A.
Hi David, Thanks for this great material. Can you please tell me more about VN-Link?
I have decided to write a blog post instead of a simple answer in the Slideshare comments.

Disclaimer: I haven't worked for CISCO for more than 3 years and I work for a competitor (DELL), so this blog post is my private opinion and my own understanding of CISCO technology. I might oversimplify some definitions or might be inaccurate in some statements, but I believe I'm right conceptually, which is the most important thing for John and other readers interested in CISCO virtual networking technologies for enterprise environments.

VN-link was a CISCO marketing and conceptual term which has since been replaced with the new term VM-FEX. VM-FEX (Virtual Machine Fabric Extender) is in my opinion a more understandable term for CISCO networking professionals familiar with CISCO FEX technology. However, the VN-link/VM-FEX term is a purely conceptual and abstract construct achievable by several different technologies or protocols. I have always imagined VN-link as a permanent virtual link between a virtual machine's virtual NIC (for example a VMware vNIC) and a CISCO switch port with virtualization capabilities (vEth). When I say switch port virtualization capabilities, there are several technologies which can be used to fulfill the conceptual idea of VN-link. The conceptual and logical idea of VN-link is always the same but the implementation differs. Generally it is some kind of network overlay and each VN-link (virtual link) is a tunnel implemented by some standard protocol or proprietary technology. One tunnel end point of a CISCO VN-link is always the same: it is a vEth on some model of CISCO Nexus switch. It can be a physical Nexus switch (UCS FI, N5K, N7K, ...) or the virtual switch Nexus 1000v (N1K). The second tunnel end point (vNIC) can be terminated in several places of your virtualized infrastructure. Below is a conceptual view of VN-link, or virtual wire if you wish.



So let's dive deep into two different technologies for implementing CISCO VN-link tunnels.

Nexus 1000v  (VN-link in software)

VN-link can be implemented in software by CISCO Nexus 1000v. The first VN-link tunnel end point (vEth) in this particular case is in the Nexus 1000v VSM (Virtual Supervisor Module) and the second tunnel end point (vNIC) is instantiated in the CISCO virtual switch Nexus 1000v VEM (Virtual Ethernet Module) on the particular hypervisor. Nexus 1000v architecture is not in the scope of this blog post, but someone familiar with CISCO FEX technology can imagine the VSM as a parent Nexus switch and the VEM as a remote line card (aka FEX - Fabric Extender).

VN-link in software is hardware independent and everything is done in software. Is it Software Defined Networking? I can imagine the Nexus 1000v VSM as a form of SDN controller. However, when I spoke personally with Martin Casado about this analogy at VMworld 2012, he was against it. I agree that Nexus 1000v has smaller scalability than the NSX controller, but conceptually this analogy works for me quite well. It always depends on what kind of scalability, performance, availability and manageability you are looking for in a particular environment. There are always some pros and cons to each technology.

CISCO UCS and hypervisor module (VN-link in hardware)

For VN-link in hardware you must have appropriate CISCO UCS (Unified Computing System) hardware supporting protocol 802.1Qbh. Protocol 802.1Qbh (aka VN-TAG) allows virtualization of a physical switch port and a server NIC port, effectively establishing virtual links over a physical link. This technology dynamically creates vEth interfaces on top of a physical switch interface (UCS FI). This vEth is one end point of the VN-link (virtual link, virtual wire) established between the CISCO UCS FI vEth and the virtual machine vNIC. The virtual machine can be a virtual server instance on top of any server virtualization platform (VMware vSphere, Microsoft Hyper-V, KVM, etc.) for which CISCO has a plugin/module in the hypervisor. The CISCO VN-TAG (802.1Qbh) protocol is conceptually similar to HP Multichannel VEPA (802.1Qbg), but the VN-TAG advantage is that a virtual wire can be composed of several segments. This multisegment advantage is leveraged in UCS because one virtual link is combined from the two following virtual links. The first virtual link is in hardware and the second is in software. Below are listed the two segments of a single virtual link over UCS infrastructure.
  1. From UCS FI vEth to UCS VIC (Virtual Interface Card) logical NIC (it goes through UCS IOM which is effectively normal CISCO physical FEX) 
  2. From UCS VIC logical NIC to VM vNIC (it goes through hypervisor module - software FEX)  
Below are the hardware components required for VN-link in hardware.
  • CISCO UCS FI (Fabric Interconnects). The UCS FI acts as the first VN-link tunnel end point where the vEths exist.
  • CISCO UCS IOM (I/O Module) on each UCS blade chassis, working as a regular FEX
  • CISCO VIC (Virtual Interface Card) on each server hosting specific hypervisor and allowing NIC partitioning of single physical adapter into logical NICs or HBAs.   

Conclusion

I hope this explanation of VN-link answers John A.'s question and helps others who want to know what VN-link really is. I forgot to mention that the primary VN-link use case is mostly about operational collaboration between virtualization and network admins. CISCO believes that VN-link allows virtual networking administration to stay with legacy network specialists. To be honest, I'm not very optimistic about this idea because it makes the infrastructure significantly more complex. In my opinion IT silos (network, compute, storage) have to be merged into one team and modern datacenter administrators must be able to administer servers, storage and networking. However, I agree that this is a pretty big mental shift and it will definitely take some time, especially in big enterprise environments.

Dell Virtual Racks

Virtual racks with Dell equipment are available at http://esgvr.dell.com/

Dell Server Virtual Rack

Direct link to DELL Server Virtual Rack where you can see how particular compute systems physically look.

Dell Storage Virtual Rack

Direct link to DELL Storage Virtual Rack where you can see how particular storage systems physically look.

Dell Networking Virtual Rack

Direct link to DELL Networking Virtual Rack where you can see how particular network systems physically look.