Date: Wed, 18 May 2011 19:36:33 +0300
From: "Michael S. Tsirkin"
To: Shirley Ma
Cc: Michał Mirosław, Ben Hutchings, David Miller, Eric Dumazet, Avi Kivity, Arnd Bergmann, netdev@vger.kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V5 2/6 net-next] netdevice.h: Add zero-copy flag in netdevice
Message-ID: <20110518163633.GB22001@redhat.com>
References: <1305588738.3456.65.camel@localhost.localdomain> <1305671318.10756.49.camel@localhost.localdomain> <20110518103819.GL7589@redhat.com> <20110518111734.GO7589@redhat.com> <1305729507.32080.6.camel@localhost.localdomain> <20110518154746.GA21378@redhat.com> <1305734857.32080.53.camel@localhost.localdomain>
In-Reply-To: <1305734857.32080.53.camel@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 18, 2011 at 09:07:37AM -0700, Shirley Ma wrote:
> On Wed, 2011-05-18 at 18:47 +0300, Michael S. Tsirkin wrote:
> > On Wed, May 18, 2011 at 07:38:27AM -0700, Shirley Ma wrote:
> > > On Wed, 2011-05-18 at 13:40 +0200, Michał Mirosław wrote:
> > > > >> >> Not more other restrictions, skb clone is OK. pskb_expand_head() looks
> > > > >> >> OK to me from code review.
> > > > >> > Hmm. pskb_expand_head calls skb_release_data while keeping
> > > > >> > references to pages. How is that ok? What do I miss?
> > > > >> It's making copy of the skb_shinfo earlier, so the pages refcount
> > > > >> stays the same.
> > > > > Exactly. But the callback is invoked so the guest thinks it's ok to
> > > > > change this memory. If it does a corrupted packet will be sent out.
> > > >
> > > > Hmm. I took a quick look at skb_clone(), and it looks like this
> > > > sequence will break this scheme:
> > > >
> > > > skb2 = skb_clone(skb...);
> > > > kfree_skb(skb) or pskb_expand_head(skb); /* callback called */
> > > > [use skb2, pages still referenced]
> > > > kfree_skb(skb); /* callback called again */
> > > >
> > > > This sequence is common in bridge, might be in other places.
> > > >
> > > > Maybe this ubuf thing should just track clones? This will make it work
> > > > on all devices then.
> > >
> > > The callback was only invoked when last reference of skb was gone.
> > > skb_clone does increase skb refcnt. I tested tcpdump on lower device, it
> > > worked.
> >
> > Right, it will normally work, but two issues I think you miss:
> > 1. malicious guest can change the memory between when it is sent out by
> >    device and consumed by tcpdump, so you will see different things
> >    (not sure how important this is).
> > 2. if tcpdump stops consuming stuff from the packet socket (it's
> >    userspace, can't be trusted) then we won't get a callback for
> >    page potentially forever, guest networking will get blocked etc.
> >
> > > For the sequence of:
> > >
> > > skb_clone -> last refcnt + 1
> > > kfree_skb() or pskb_expand_head -> callback not called
> > > kfree_skb() -> callback called
> > >
> > > I will check page refcount to see whether it's balanced.
> > >
> > > Thanks
> > > shirley
> >
> >
> > pskb_expand_head is a problem anyway I think as it
> > can hang on to pages after it calls release_data.
> > Then guest will modify these pages and you get trash there.
>
> This can be avoided by allowing pskb_expand_head in fastpath only, I
> think.
> But not sure whether tcpdump can still work with this.
>
> Thanks
> Shirley

Yes, I agree. I think for tcpdump, we really need to copy the data
anyway, to avoid guest changing it in between. So we do that and then
use the copy everywhere, release the old one. Hmm?

-- 
MST
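
For reference, the clone sequence debated above can be replayed in a toy userspace model. This is only a sketch under the assumption that the completion callback is tied to the last reference on the shared data, as Shirley describes. All names here (toy_buf, toy_clone, toy_put, toy_complete, toy_replay) are made up for illustration and are not kernel APIs. It does not model pskb_expand_head holding on to pages, nor a packet socket that never releases its clone.

```c
#include <assert.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_buf {
    int dataref;    /* number of holders of the shared data */
    int callbacks;  /* how many times the completion callback ran */
};

/* Stand-in for the zero-copy completion callback that tells the
 * guest its pages may be reused. */
static void toy_complete(struct toy_buf *b)
{
    b->callbacks++;
}

/* Models a clone: one more holder of the same data. */
static void toy_clone(struct toy_buf *b)
{
    b->dataref++;
}

/* Models dropping one holder; the callback fires only on the last put. */
static void toy_put(struct toy_buf *b)
{
    assert(b->dataref > 0);
    if (--b->dataref == 0)
        toy_complete(b);
}

/* Replays: clone, free the original, free the clone.
 * Stores the callback count seen after the first free in
 * *after_first_free (if non-null) and returns the final count. */
static int toy_replay(int *after_first_free)
{
    struct toy_buf b = { .dataref = 1, .callbacks = 0 };

    toy_clone(&b);      /* skb2 = skb_clone(skb) */
    toy_put(&b);        /* first kfree_skb() */
    if (after_first_free)
        *after_first_free = b.callbacks;
    toy_put(&b);        /* second kfree_skb() */
    return b.callbacks;
}
```

Under that assumption the callback fires once, on the second free. If the callback were instead fired on every put, the same replay would report it twice, which is the double invocation Michał is worried about. If the clone holder never calls toy_put, the callback never runs, which is the stalled-tcpdump case.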