Sunday, April 20, 2014

Potential Network Black Hole Issue

When I do vSphere and hardware infrastructure health checks, I very often encounter misconfigured networks, usually (but not only) in blade server environments. That's the reason I've decided to write a blog post about this issue. The issue is general and should be considered and checked in any vendor's solution. However, because I'm very familiar with DELL products, I'll use the DELL blade system and its I/O modules to show you the specifications and configurations in more depth.  

Blade server chassis typically have switch modules, as depicted in the figure below.


When blade chassis switch modules are connected to another network layer (aggregation or core), there is a possibility of a network black hole, which I would like to discuss in depth in this post.

Let's assume you lose a single uplink from I/O module A. This situation is depicted below.


In this situation there is no availability problem, because network traffic can still flow via the second uplink port of the I/O module. Admittedly, only half of the uplink bandwidth remains, so throughput may degrade and congestion can occur. Still, everything works, and it is not an availability issue.

But what happens when the second uplink port of the I/O switch module fails as well? Look at the figure below.

   
If the I/O switch module is in normal switch mode, its uplink ports go into the link-down state, but its downlink server ports stay in the link-up state. The ESX host NIC ports therefore also stay up. ESX teaming doesn't know that something is wrong further down the path, so it keeps sending traffic to both NIC uplinks. We call this situation a "black hole" because traffic sent via NIC1 will never reach its destination, and your infrastructure is in trouble.

To overcome this issue, some I/O modules in blade systems can be configured as I/O Aggregators. Other modules are designed as I/O Aggregators by default, and this cannot be changed.

Here are examples of DELL blade switch modules that are switches by default but can be configured to work as I/O Aggregators (aka Simple Switch Mode):
  • DELL PowerConnect M6220
  • DELL PowerConnect M6348
  • DELL PowerConnect M8024-k
An example of an implicit I/O Aggregator is the DELL Force10 IOA.

Another I/O Aggregator-like option is the Fabric Extender architecture, implemented in the DELL blade system as the CISCO Nexus B22. CISCO FEX is a slightly different topic, but it also helps you effectively avoid our black hole issue.

When you use "simple switch mode", you have limited configuration possibilities. For example, you can use the module only for L2, and you cannot use advanced features like access control lists (ACLs). That can be a reason to leave the I/O module in normal switch mode. Even with I/O modules in normal switch mode, you can still configure your switch to overcome the potential "black hole" issue. Here are examples of DELL blade switches and the technologies they use to overcome this issue:
  • DELL PowerConnect M6220 (Link Dependency)
  • DELL PowerConnect M6348 (Link Dependency)
  • DELL PowerConnect M8024-k (Link Dependency)
  • DELL Force10 MXL (Uplink Failure Detection)
  • CISCO 3130X (Link State Tracking)
  • CISCO 3130G (Link State Tracking)
  • CISCO 3032 (Link State Tracking)
  • CISCO Nexus B22 (Fabric Extender)
If you leverage any of the technologies listed above, the link states of the I/O module's uplink ports are synchronized to the configured downlink ports. The ESX teaming driver can then effectively provide ESX uplink high availability. This situation is depicted in the figure below.


Below are detailed CLI configuration examples for some of the port tracking technologies described above.


DELL PowerConnect Link Dependency

A link dependency configuration on both blade access switch modules can solve the "Network Black Hole" issue.

 ! Server port configuration  
 interface Gi1/0/1  
 switchport mode general  
 switchport general pvid 201  
 switchport general allowed vlan add 201  
 switchport general allowed vlan add 500-999 tagged  
 ! Physical Uplink port configuration   
 interface Gi1/0/47  
 channel-group 1 mode auto  
 exit  
 ! Physical Uplink port configuration   
 interface Gi1/0/48  
 channel-group 1 mode auto  
 exit  
 ! Logical Uplink port configuration (LACP Port Channel)  
 interface port-channel 1  
 switchport mode trunk  
 exit   
 ! Link dependency configuration  
 link-dependency group 1  
 add Gi1/0/1-16  
 depends-on port-channel 1  

Force10 Uplink Failure Detection (UFD)

Force10 calls the link dependency feature UFD. Here is a configuration example:

 FTOS#show running-config uplink-state-group  
 !  
 uplink-state-group 1  
 downstream TenGigabitEthernet 0/0  
 upstream TenGigabitEthernet 0/1  
 FTOS#  

The configuration of a UFD group can be displayed with the "show configuration" command from within the group's configuration mode:

 FTOS(conf-uplink-state-group-16)# show configuration  
 !  
 uplink-state-group 16  
 description test  
 downstream disable links all  
 downstream TengigabitEthernet 0/40  
 upstream TengigabitEthernet 0/41  
 upstream Port-channel 8  
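The "show configuration" output above lists what is configured in the group, not whether the group is currently up or has disabled its downstream ports. To check the runtime state, FTOS also provides a "show uplink-state-group" command. The line below is a sketch that assumes your FTOS release accepts the optional group ID and "detail" keyword. Group 16 is simply reused from the example above.

 FTOS#show uplink-state-group 16 detail  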

CISCO Link State Tracking

Link state tracking is a feature available on Cisco switches that manages the link state of downstream ports (ports connected to servers) based on the status of upstream ports (ports connected to aggregation/core switches).
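Below is a minimal configuration sketch of link state tracking on a Cisco blade switch such as the 3130X, assuming Cisco IOS link state tracking syntax. The port-channel number and the internal server port range (GigabitEthernet1/0/1 - 16) are hypothetical placeholders, so adjust them to your chassis. Port-channel 1 becomes the upstream member of link state group 1, and the server ports become its downstream members. When all upstream interfaces in the group go down, the switch error-disables the downstream ports, so the ESX teaming driver sees a link failure.

 ! Enable link state group 1 (global configuration)  
 link state track 1  
 ! Upstream: LACP uplink to aggregation/core (hypothetical port-channel number)  
 interface Port-channel1  
 link state group 1 upstream  
 exit  
 ! Downstream: internal server-facing ports (hypothetical port range)  
 interface range GigabitEthernet1/0/1 - 16  
 link state group 1 downstream  
 end  
 ! Verify the group state from privileged EXEC mode  
 show link state group detail  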

11 comments:

GregSchulz said...

Great post David, thanks for sharing...

Anonymous said...

Thanks for your blog, it's very interesting.

I just want to check: if I have 2x10Gb NICs in an M620 and use NPAR to get a total of 8 links, how should I configure link state tracking for a particular partition?

Thanks,
Johnny

David Pasek said...

Hi Johnny,
NPAR doesn't have any impact on link dependency (aka uplink failure detection, link state tracking). The dependency is between uplink and downlink switch ports. All NPAR logical NICs are connected to the same physical switch port, which acts as the downlink port in the link dependency configuration.

David.

ritchie james said...

Hi David,

Thanks for your post. Can you please advise what feature needs to be enabled on the M8428-k switch?

Rgds
Ritchie James

David Pasek said...

Hi Ritchie.

The M8428-k uses interface tracking. See the example below ...

interface InTengigabitEthernet 0/16
track enable
track interface port-channel 1
mtu 9208
fcoeport
switchport
switchport mode converged
switchport converged allowed vlan all
no shutdown

In this particular case, port-channel 1 is the uplink and interface 0/16 is the downlink, which goes into the down state when port-channel 1 is down.

Anonymous said...

Hi David
Thanks for your immediate reply. Actually, I used the tracking feature you mentioned. The problem is that when I removed the 10G network cable, it should have brought down only the 10 internal ports, but in this case it is bringing the FCoE port down as well.



thanks
Ritchie James

David Pasek said...

Ritchie,

I don't agree with your statement that the FCoE port should stay up when the downstream interface is shut down by interface tracking. It is the same Ethernet port, which also carries FCoE.

However, your secondary SAN fabric should stay up, so the SAN is degraded but still working thanks to MPIO, right?

DELL calls this approach switch agnostic.

To be honest, this scenario is better handled by vendor solutions championed by HP (VEPA) and CISCO (VN-Tag/VN-Link). These solutions are switch dependent. Each physical switch port carries virtual Ethernet ports, and each CNA presents multiple virtual interfaces, so there are multiple independent virtual cables between a CNA port and a switch port. You can read more about CISCO VN-Link at
http://blog.igics.com/2015/01/can-you-please-tell-me-more-about-vn.html

If there are independent virtual cables to the server, they can be handled separately. Does that make sense?

Anonymous said...

Hi David,

Sorry for the late reply. I got your point. Thanks for your quick responses.

Regards
Ritchie James

Terminal said...

This is an excellent writeup! I had been trying to explain this issue to people and this article perfectly details it with pictures!

I have one question: do you know if these changes can be made live on Force10 MXLs without impact? And what would the commands be to link all the internal interfaces to an external port-channel?

David Pasek said...

Hi Terminal. Actually you have two questions ;-)

Answer 1/ Yes, on Force10 MXL it can be configured live.

Answer 2/ The commands should look like this:
uplink-state-group 1
downstream TenGigabitEthernet 0/1-32
upstream port-channel 1

Don't forget to test it at least once. Plan the test in a maintenance window.

Hope this helps.

Terminal said...

Hah, yes..You got me :P

Excellent information! Thank you very much. I actually love the detail that you've put into this and it will save me time going forward when explaining this to others.

Thanks!
Ben