Message-ID: <554CDB4D.5050108@gmail.com>
Date: Fri, 08 May 2015 08:50:37 -0700
From: Alexander Duyck
To: Denys Vlasenko, "David S. Miller"
CC: Jiri Pirko, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, netfilter-devel@vger.kernel.org
Subject: Re: [PATCH] net: deinline netif_tx_stop_queue() and netif_tx_stop_all_queues()
References: <1430998870-1453-1-git-send-email-dvlasenk@redhat.com> <554B9D82.80101@gmail.com> <554C85B2.1010605@redhat.com>
In-Reply-To: <554C85B2.1010605@redhat.com>

On 05/08/2015 02:45 AM, Denys Vlasenko wrote:
> On 05/07/2015 07:14 PM, Alexander Duyck wrote:
>> On 05/07/2015 04:41 AM, Denys Vlasenko wrote:
>>> These functions compile to ~60 bytes of machine code each.
>>>
>>> With this .config: http://busybox.net/~vda/kernel_config
>>> there are 617 calls to netif_tx_stop_queue()
>>> and 49 calls to netif_tx_stop_all_queues() in vmlinux.
>>>
>>> Code size is reduced by 27 kbytes:
>>>
>>>     text     data      bss       dec     hex filename
>>> 82426986 22255416 20627456 125309858 77813a2 vmlinux.before
>>> 82399481 22255416 20627456 125282353 777a831 vmlinux
>>>
>>> It may seem strange that a seemingly simple code like one in
>>> netif_tx_stop_queue() compiles to ~60 bytes of code.
>>> Well, it's true. Here's its disassembly:
>>>
>>> netif_tx_stop_queue:
> ...
>>> 55                         push   %rbp
>>> be 7a 18 00 00             mov    $0x187a,%esi
>>> 48 c7 c7 50 59 d8 85       mov    $.rodata+0x1d85950,%rdi
>>> 48 89 e5                   mov    %rsp,%rbp
>>> e8 54 5a 7d fd             callq
>>> 48 c7 c7 5f 59 d8 85       mov    $.rodata+0x1d8595f,%rdi
>>> 31 c0                      xor    %eax,%eax
>>> e8 b0 47 48 00             callq
>>> eb 09                      jmp
>> This is the WARN_ON action. One thing you might try doing is moving
>> this to a function of its own instead of moving the entire thing
>> out of being an inline.
> If WARN_ON check would be moved into a function, the call overhead
> would still be there, while each callsite will be larger than with
> this patch.
>
>> You may find you still get most
>> of the space savings as I wonder if the string for the printk
>> isn't being duplicated for each caller.
> Yes, strings are duplicated:
>
> $ strings vmlinux0 | grep 'cannot be called before register_netdev'
> 6netif_stop_queue() cannot be called before register_netdev()
> 6tun: netif_stop_queue() cannot be called before register_netdev()
> 6cc770: netif_stop_queue() cannot be called before register_netdev()
> 63c589_cs: netif_stop_queue() cannot be called before register_netdev()
> 63c574_cs: netif_stop_queue() cannot be called before register_netdev()
> 6typhoon netif_stop_queue() cannot be called before register_netdev()
> 6axnet_cs: netif_stop_queue() cannot be called before register_netdev()
> 6pcnet_cs: netif_stop_queue() cannot be called before register_netdev()
> ...
>
> However, they amount only to ~5.7k out of 27k:
>
> $ strings vmlinux0 | grep 'cannot be called before register_netdev' | wc -c
> 5731
>

Yeah, they are probably coalesced per .c file since the compiler cannot
span files. The average driver probably calls it 2 or more times, which
is why the strings are only about 1/5 instead of 1/2 of the total byte
count. Also the count above excludes carriage returns and NUL characters.

>>> f0 80 8f e0 01 00 00 01    lock orb $0x1,0x1e0(%rdi)
>> This is your set bit operation.
>> If you were to drop the whole WARN_ON
>> then this is the only thing you would be inlining.
> It's up to networking people to decide. I would happily send a patch which drops
> WARN_ON if they say that's ok with them. Davem?

This was added under commit 18543a643fae6 ("net: Detect and ignore
netif_stop_queue() calls before register_netdev()"). I think the time
for allowing drivers to ignore the WARN_ON has passed. At this point,
drivers that still haven't resolved the issue should be strongly
encouraged to fix it via a NULL pointer dereference, so we can track
them down and fix them. I'd say add a comment here in case someone
triggers this and does some debugging, but the WARN_ON at this point has
proven it is too expensive.

>> That is only 8 bytes in size which would probably be comparable to the callq
>> and register sorting needed for a function call.
> "lock or" in my tests takes 21 cycles even on exclusively cached
> L1 data cache line. Added "call+ret" is 4-5 cycles.

It is an expensive instruction, but pushing it out into a separate
function just adds that much more overhead.

>> Have you done any performance testing on this change?
> No.

The most likely thing to exercise this would probably be something like
a standard pktgen test. It should be able to put enough stress on a
single queue for the function to be called frequently.

>> I suspect there will likely be a noticeable impact on some tests.
> (1) It's *transmit off* operation. Usually it means that we have to turn
> transmit off because hw TX queue is full. So the bottleneck is likely
> the network, not the CPU.

That is true. However, there are still scenarios such as pktgen where we
would be triggering this often, and I would prefer to keep it as fast as
possible since it is still kind of hard to maintain line rate 10Gb/s for
some of my traffic generator setups.

>
> (2) It was auto-deinlined by gcc anyway. We already were unknowingly
> using the uninlined version for some time.
> Apparently, it wasn't noticed.

Depends on what cases were uninlined. I believe the compiler is making
the decision per .c file, so each driver is handling it differently. For
example, it looks like the ixgbe driver was still inlining this in my -O2
build, so that is one case that was not deinlined.
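
For reference, here is a rough userspace sketch of the split discussed
in this thread: the warning goes into its own out-of-line function and
the inline caller keeps only a NULL test plus the atomic OR. This is not
the kernel code. Every name in it (txq_sketch, TXQ_SKETCH_DRV_XOFF,
txq_sketch_stop and so on) is made up for illustration, C11 atomics stand
in for the kernel's set_bit(), fprintf() stands in for the printk, and
the noinline/cold attributes assume gcc or clang.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for a TX queue; only the state word matters here. */
struct txq_sketch {
    atomic_ulong state;
};

/* Hypothetical "stopped by driver" bit, mirroring the "orb $0x1" above. */
#define TXQ_SKETCH_DRV_XOFF 1UL

/* Counts how often the cold path ran, so the behavior can be checked. */
static unsigned int txq_sketch_warn_count;

/*
 * Cold path, kept out of line (gcc/clang attributes, an assumption) so
 * the message string and the call setup are emitted once per file
 * instead of at every inlined callsite.
 */
__attribute__((noinline, cold))
static void txq_sketch_warn_unregistered(void)
{
    txq_sketch_warn_count++;
    fprintf(stderr, "stop_queue called before the queue was registered\n");
}

/* Hot path: an inlined caller pays for a NULL test and one atomic OR. */
static inline void txq_sketch_stop(struct txq_sketch *q)
{
    if (!q) {
        txq_sketch_warn_unregistered();
        return;
    }
    atomic_fetch_or(&q->state, TXQ_SKETCH_DRV_XOFF);
}
```

Calling txq_sketch_stop() on a valid queue sets the bit without touching
the warning path, while passing NULL only bumps the counter and prints
the message. Whether this layout actually saves space or cycles compared
to deinlining the whole function would still need to be measured.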