Subject: Re: RFC: MTU for serving NFS on Infiniband
From: Eric Dumazet
To: Stephen Hemminger
Cc: Ben Hutchings, Marc Aurele La France, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, "David S. Miller", Alexey Kuznetsov, "Pekka Savola (ipv6)", James Morris, Hideaki YOSHIFUJI, Patrick McHardy
In-Reply-To: <20100824153920.63360072@s6510>
References: <20100823080543.319143e3@nehalam> <1282672647.2302.15.camel@achroite.uk.solarflarecom.com> <1282688441.22839.34.camel@localhost> <20100824153920.63360072@s6510>
Date: Wed, 25 Aug 2010 07:54:58 +0200
Message-ID: <1282715698.2467.681.camel@edumazet-laptop>

On Tuesday, 24 August 2010 at 15:39 -0700, Stephen Hemminger wrote:

> If the NFS server is smart enough to generate:
>   Header (skb) + one or more pages in fragment list
> then IP fragmentation could do fragmentation by allocating
> new (small) header skbs and assigning the same pages to
> multiple skbs using the page ref count.
>
> It obviously isn't working that way.
>

It is, but ip_append_data() allocates a huge head if the MTU is huge.

NFS is trying to build paged skbs, to avoid order-X allocations (X > 0).

> The whole problem is moot because NFS over UDP has known data corruption
> issues in the face of packet loss. The sequence number of the IP fragment
> can easily wrap around, causing old data to be grouped with new data, and
> the UDP checksum is so weak that the resulting UDP packet will be consumed by the NFS
> client and passed to the user application as a corrupted disk block.
>
> DON'T USE NFS OVER UDP!

But Marc's point is about using a big MTU, so that no IP fragmentation is needed.

All UDP applications using MSG_MORE will hit the order-2 allocations if MTU=9000, for example...
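
For anyone wanting to check the order-2 figure, here is a rough back-of-the-envelope sketch. It is not kernel code: it only assumes 4 KiB pages, and the overhead constants (SHARED_INFO_SIZE, LL_RESERVE) are illustrative guesses that vary by kernel version and architecture. The helper names are made up for this example.

```python
# Rough estimate of the page allocation order needed when a whole
# MTU-sized frame is placed in one linear (non-paged) buffer.
# All constants below are assumptions for illustration only.

PAGE_SIZE = 4096         # assumption: 4 KiB pages (typical on x86)
SHARED_INFO_SIZE = 320   # assumption: approximate size of struct skb_shared_info
LL_RESERVE = 16          # assumption: room reserved for the link-layer header


def alloc_order(size, page_size=PAGE_SIZE):
    """Return the smallest order n such that page_size << n >= size."""
    order = 0
    while (page_size << order) < size:
        order += 1
    return order


def linear_head_order(mtu):
    """Order needed if one MTU-sized frame is built in a single linear buffer."""
    return alloc_order(mtu + LL_RESERVE + SHARED_INFO_SIZE)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU={mtu}: order-{linear_head_order(mtu)} allocation")
```

With these assumptions MTU=1500 fits in a single page (order 0), while MTU=9000 already exceeds two pages (8192 bytes) before any overhead is added, so the allocator has to find four physically contiguous pages (order 2). A paged skb avoids this by keeping the head small and putting the payload in page fragments.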