Date: Wed, 7 Jan 2009 12:49:46 +0000
From: Jarek Poplawski
To: Jens Axboe
Cc: Willy Tarreau, Changli Gao, Evgeniy Polyakov, Herbert Xu,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: Data corruption issue with splice() on 2.6.27.10
Message-ID: <20090107124946.GA9677@ff.dom.local>
In-Reply-To: <20090107123504.GN32491@kernel.dk>

On Wed, Jan 07, 2009 at 01:35:04PM +0100, Jens Axboe wrote:
> On Wed, Jan 07 2009, Jarek Poplawski wrote:
> > On Wed, Jan 07, 2009 at 01:22:05PM +0100, Willy Tarreau wrote:
> > > [ CCing Evgeniy and Herbert who also participate to the thread ]
> > ...
> > > Well, I've just tested it. It did not fix the problem but made it worse.
> > ...
> >
> > Terrible mistake! Here is take 2.
>
> Not sure what this:
>
> > +static inline struct page *linear_to_page(struct page *page, unsigned int len,
> > +					  unsigned int offset)
> > +{
> > +	struct page *p = alloc_pages(GFP_KERNEL, 0);
> > +
> > +	if (!p)
> > +		return NULL;
> > +	memcpy((void *)p + offset, (void *)page + offset, len);
>
> is trying to do. I'm assuming you want to copy the page contents? If so,
> you'd want something like
>
>	memcpy(page_address(p) + offset, page_address(page) + offset, len);
>
> with possible kmaps for 'page'.

My BAD!!! Thanks!

> Irregardless of that particular oddity, I don't think this is the right
> path to take at all. We need to delay the pipe buffer consumption until
> the appropriate time.

Hmm... in any case: take 3.

Jarek P.

---
 net/core/skbuff.c |   41 +++++++++++++++++++++++++++--------------
 1 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5110b35..6e43d52 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -73,17 +73,13 @@ static struct kmem_cache *skbuff_fclone_cache __read_mostly;
 static void sock_pipe_buf_release(struct pipe_inode_info *pipe,
				  struct pipe_buffer *buf)
 {
-	struct sk_buff *skb = (struct sk_buff *) buf->private;
-
-	kfree_skb(skb);
+	put_page(buf->page);
 }
 
 static void sock_pipe_buf_get(struct pipe_inode_info *pipe,
				struct pipe_buffer *buf)
 {
-	struct sk_buff *skb = (struct sk_buff *) buf->private;
-
-	skb_get(skb);
+	get_page(buf->page);
 }
 
 static int sock_pipe_buf_steal(struct pipe_inode_info *pipe,
@@ -1334,9 +1330,19 @@ fault:
  */
 static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
 {
-	struct sk_buff *skb = (struct sk_buff *) spd->partial[i].private;
+	put_page(spd->pages[i]);
+}
 
-	kfree_skb(skb);
+static inline struct page *linear_to_page(struct page *page, unsigned int len,
+					  unsigned int offset)
+{
+	struct page *p = alloc_pages(GFP_KERNEL, 0);
+
+	if (!p)
+		return NULL;
+	memcpy(page_address(p) + offset, page_address(page) + offset, len);
+
+	return p;
 }
 
 /*
@@ -1344,16 +1350,23 @@ static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
  */
 static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page *page,
				unsigned int len, unsigned int offset,
-				struct sk_buff *skb)
+				struct sk_buff *skb, int linear)
 {
	if (unlikely(spd->nr_pages == PIPE_BUFFERS))
		return 1;
 
+	if (linear) {
+		page = linear_to_page(page, len, offset);
+		if (!page)
+			return 1;
+	}
+
	spd->pages[spd->nr_pages] = page;
	spd->partial[spd->nr_pages].len = len;
	spd->partial[spd->nr_pages].offset = offset;
-	spd->partial[spd->nr_pages].private = (unsigned long) skb_get(skb);
	spd->nr_pages++;
+	get_page(page);
+
	return 0;
 }
 
@@ -1369,7 +1382,7 @@ static inline void __segment_seek(struct page **page, unsigned int *poff,
 static inline int __splice_segment(struct page *page, unsigned int poff,
				   unsigned int plen, unsigned int *off,
				   unsigned int *len, struct sk_buff *skb,
-				   struct splice_pipe_desc *spd)
+				   struct splice_pipe_desc *spd, int linear)
 {
	if (!*len)
		return 1;
@@ -1392,7 +1405,7 @@ static inline int __splice_segment(struct page *page, unsigned int poff,
		/* the linear region may spread across several pages */
		flen = min_t(unsigned int, flen, PAGE_SIZE - poff);
 
-		if (spd_fill_page(spd, page, flen, poff, skb))
+		if (spd_fill_page(spd, page, flen, poff, skb, linear))
			return 1;
 
		__segment_seek(&page, &poff, &plen, flen);
@@ -1419,7 +1432,7 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset,
	if (__splice_segment(virt_to_page(skb->data),
			     (unsigned long) skb->data & (PAGE_SIZE - 1),
			     skb_headlen(skb),
-			     offset, len, skb, spd))
+			     offset, len, skb, spd, 1))
		return 1;
 
	/*
@@ -1429,7 +1442,7 @@ static int __skb_splice_bits(struct sk_buff *skb, unsigned int *offset,
		const skb_frag_t *f = &skb_shinfo(skb)->frags[seg];
 
		if (__splice_segment(f->page, f->page_offset, f->size,
-				     offset, len, skb, spd))
+				     offset, len, skb, spd, 0))
			return 1;
	}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/