I am in need of a little education on multiqueue and was wondering if someone here might be able to help me.

Given the Intel igb network driver, it appears I can do something like:

    tc qdisc add dev eth0 root handle 1: multiq

which works and reports 4 bands:

    dev eth0 root refcnt 4 bands 4/4

But our network is a little more complicated. Above the ethernet we have the bonding driver, which is using mode 2 bonding with two ethernet slaves. Then we have vlans on the bond interface. Our production traffic is on a vlan, and resource contention is an issue as these are busy machines.

It is my understanding that the vlan driver became multiqueue aware in 2.6.32 (we are currently using 2.6.31). It would seem that the first thing the kernel would encounter with traffic headed out would be the vlan interface, then the bond interface, and then the physical ethernet interface. Is that correct?

So with my kernel, I would seem to get no utility from multiq on the ethernet interface if the vlan interface is going to be a single-threaded bottleneck. What about the bond driver? Is it currently multiqueue aware?

I am trying to get some sort of logical picture of how all these things interact with each other, to make things a little more efficient and reduce resource contention in the application while still being efficient in the use of network ports/interfaces.

If someone feels up to the task of sending a little education my way, I would be most appreciative. There doesn't seem to be a whole lot of documentation floating around about multiqueue other than a blurb of text in the kernel and David's presentation of last year.

Thanks!

George
--
Hi George,

Vlan is multiqueue aware, but unfortunately bonding is not at the moment.

We could make it 'multiqueue' (a patch was submitted by Oleg A. Arkhangelsky a while ago), but the bonding xmit routine needs to take a central lock, shared by all queues, so it won't be very efficient...

Since this bothers me a bit, I will probably work on this in the near future (adding real multiqueue capability and RCU to the bonding fast paths).

Ref: http://permalink.gmane.org/gmane.linux.network/152987
--
The lock is a read lock, so theoretically it should be possible to enter the bonding transmit function on multiple CPUs at the same time.

The question I have about it (and the above patch) is: what does multi-queue "awareness" really mean for a bonding device? How does allocating a bunch of TX queues help, given that the determination of the transmitting device hasn't necessarily been made?

I haven't had the chance to acquire some multi-queue network cards and check things out with bonding, so I'm not really sure how it should work. Should the bond look, from a multi-queue perspective, like the largest slave, or should it look like the sum of the slaves? Some of this may be mode-specific, as well.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
Yes, and with 10Gb cards, this is a limiting factor, if you want to send
14 million packets per second ;)
read_lock() is one atomic op, dirtying a cache line
read_unlock() is one atomic op, dirtying the cache line again (if contended)
In active-passive mode, RCU use should be really easy, given netdevices
are already RCU compatible. This way, each cpu only reads bonding state,
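As a rough illustration of the read_lock()/read_unlock() versus RCU point above, here is a toy Python model (not kernel code). All class and attribute names are invented for this sketch, and the `shared_writes` counter is only a stand-in for the atomic operations that dirty a shared cache line; real costs depend on the hardware and on contention.

```python
class RwlockBond:
    """Toy model: every transmit takes and releases a shared read lock."""

    def __init__(self, active_slave):
        self.active_slave = active_slave
        self.readers = 0         # shared reader count, as a rwlock would keep
        self.shared_writes = 0   # stand-in for atomic ops on a shared cache line

    def xmit(self, packet):
        self.readers += 1        # models read_lock(): one write to shared state
        self.shared_writes += 1
        slave = self.active_slave
        self.readers -= 1        # models read_unlock(): a second shared write
        self.shared_writes += 1
        return (slave, packet)


class RcuBond:
    """Toy model: transmit only loads a reference to an immutable snapshot."""

    def __init__(self, active_slave):
        self.state = (active_slave,)  # published snapshot, never modified in place
        self.shared_writes = 0        # the transmit path never touches this

    def xmit(self, packet):
        snapshot = self.state         # read side: a plain load, no shared write
        return (snapshot[0], packet)

    def set_active(self, new_slave):
        self.state = (new_slave,)     # update side: publish a fresh snapshot
```

In this model, sending 1000 packets costs 2000 writes to shared state with the lock-based path and none with the snapshot-based path, which is the effect the RCU conversion is after.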
Well, it is a problem that was also taken into account with vlan, you
might take a look at this commit :
commit 669d3e0babb40018dd6e78f4093c13a2eac73866
Author: Vasu Dev <vasu.dev@intel.com>
Date: Tue Mar 23 14:41:45 2010 +0000
vlan: adds vlan_dev_select_queue
This is required to correctly select vlan tx queue for a driver
supporting multi tx queue with ndo_select_queue implemented since
currently selected vlan tx queue is unaligned to selected queue by
real net_devce ndo_select_queue.
Unaligned vlan tx queue selection causes thrash with higher vlan
tx lock contention for least fcoe traffic and wrong socket tx
queue_mapping for ixgbe having ndo_select_queue implemented.
-v2
As per Eric Dumazet<eric.dumazet@gmail.com> comments, mirrored
vlan net_device_ops to have them with and without vlan_dev_select_queue
and then select according to real dev ndo_select_queue present or not
for a vlan net_device. This is to completely skip vlan_dev_select_queue
calling for real net_device not supporting ndo_select_queue.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
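
The idea in that commit message can be sketched as a small Python model (not the kernel code). The class names, the `flow_id` argument, and the modulo fallback are assumptions made for illustration; only the behaviour described in the commit message is modelled: the vlan device asks the real device for the queue when the real device has its own selection hook, and skips that path entirely otherwise.

```python
class RealDev:
    """Hypothetical stand-in for a physical net_device."""

    def __init__(self, num_tx_queues, select_queue=None):
        self.num_tx_queues = num_tx_queues
        # Optional driver hook; plays the role of ndo_select_queue.
        self.select_queue = select_queue


def default_pick(dev, flow_id):
    # Stand-in for the stack's generic choice when no driver hook exists.
    return flow_id % dev.num_tx_queues


class VlanDev:
    """Hypothetical stand-in for a vlan device stacked on a RealDev."""

    def __init__(self, real_dev):
        self.real_dev = real_dev
        self.num_tx_queues = real_dev.num_tx_queues
        # Mirrors the "two sets of ops" idea: the choice is made once,
        # at setup, according to whether the real device has a hook.
        if real_dev.select_queue is not None:
            self.pick = self._pick_via_real_dev
        else:
            self.pick = self._pick_default

    def _pick_via_real_dev(self, flow_id):
        # Delegate, so vlan and real device agree on the queue.
        return self.real_dev.select_queue(self.real_dev, flow_id)

    def _pick_default(self, flow_id):
        return default_pick(self, flow_id)
```

With a real device whose hook pins one flow to a dedicated queue, the vlan device in this model returns the same queue for that flow, so the two layers no longer disagree about where the packet goes.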
--
I would say that having the number of bands be either the number of cores or 4, whichever is the smaller, would be a good start. That is probably fine for GigE. Of the network cards we have that support multiqueue, they are either 4 or 8 bands.

In an optimal world, you would have the number of bands that you have available at the physical ethernet level, but changing those on the fly in case of a change in available interfaces might be more trouble than it is worth. Four or eight would seem to be a good number to start with, as I don't think I have seen an ethernet card with less than 4. If you have fewer than 4 CPUs there probably isn't much utility in having more bands than processors, or maybe that utility rapidly diminishes as the number of bands increases beyond the number of CPUs. At that point you have probably just spent a lot of work building a bigger buffer.

I would be happy with 4 bands. I guess it just depends on where you want the bottleneck. If you have 8 bands on the bond driver (another reasonable alternative) and only 4 bands available for output, you have just moved the contention down a layer, to between the bond and the ethernet driver. But I am a fan of moving the point of contention as far away from the application interface as possible. If I have one big lock around the bond driver and have 6 things waiting to talk to the network, those are six things that can't be doing anything else. I would rather have the application handle its network task and get back to other things. Now if you have 8 bands of bond and only 4 bands of ethernet, or even one band of ethernet, oh well.

Maybe have 1 to 8 bands configurable by an option to the driver that could be set explicitly and defaults to, say, 4?

Thanks for taking the time to answer.

George
--
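The sizing rule suggested above can be written down as a short Python sketch. The function names are hypothetical, and the modulo fold from bond bands onto slave queues is an assumption used only to show where contention would reappear; the thread does not specify how such a mapping would be done.

```python
def default_bands(num_cpus, cap=4):
    """Number of bands: the number of cores or `cap`, whichever is smaller."""
    return max(1, min(num_cpus, cap))


def fold_queue(bond_queue, slave_queues):
    """Assumed mapping of a bond band onto a slave with fewer queues."""
    return bond_queue % slave_queues


# 8 bond bands folded onto a 4-queue slave: two bands share each queue,
# so contention moves down to the bond/ethernet boundary.
sharing = [fold_queue(q, 4) for q in range(8)]
```

Here `default_bands(2)` gives 2 and `default_bands(16)` gives 4, and `sharing` shows bands 0 and 4 landing on the same slave queue, bands 1 and 5 on the next, and so on.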
That would be great and you would have my sincere thanks.

And if anyone is interested, what we do is take a pair of "top of rack" switches and cluster them together so they appear as one switch. We configure a LAG consisting of a port on each physical switch to a pair of bonded interfaces on the server and use mode 2 bonding. In normal operation, both interfaces are active. Should one switch experience a power or interface failure, the server sees one of the interfaces fail but just keeps working on the remaining interface. There is no "failover" event going on.

Thanks,

George
--
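For readers unfamiliar with mode 2 (balance-xor), the bonding documentation describes its default transmit policy as source MAC XOR destination MAC, modulo the slave count. The Python sketch below models that formula only; the function names and the example MAC addresses are made up, and the driver's actual implementation may use fewer bits of the addresses or a different hash policy if one is configured.

```python
def parse_mac(mac):
    """Turn 'aa:bb:cc:dd:ee:ff' into a 48-bit integer."""
    return int.from_bytes(bytes.fromhex(mac.replace(":", "")), "big")


def xor_slave(src_mac, dst_mac, slaves):
    """Pick the transmitting slave: (src XOR dst) modulo slave count."""
    if not slaves:
        raise ValueError("no active slaves")
    index = (parse_mac(src_mac) ^ parse_mac(dst_mac)) % len(slaves)
    return slaves[index]
```

With both slaves up, different peers are spread across the two interfaces; if one switch fails and the list shrinks to a single slave, every flow maps to the remaining interface, which matches the "no failover event" behaviour described above.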
What kind of traffic do your machines manage exactly?

On the server, do you use two ports of the same kind (same number of queues)?
--
