Date: Tue, 29 Jun 2010 09:55:59 +0300
From: "Michael S. Tsirkin"
To: Sridhar Samudrala
Cc: Aristeu Rozanski, Herbert Xu, Juan Quintela, "David S. Miller",
    kvm@vger.kernel.org, virtualization@lists.osdl.org,
    netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    ykaul@redhat.com, markmc@redhat.com
Subject: Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
Message-ID: <20100629065559.GB3603@redhat.com>
References: <20100628100807.GA30685@redhat.com> <1277763581.23755.16.camel@w-sridhar.beaverton.ibm.com>
In-Reply-To: <1277763581.23755.16.camel@w-sridhar.beaverton.ibm.com>

On Mon, Jun 28, 2010 at 03:19:41PM -0700, Sridhar Samudrala wrote:
> On Mon, 2010-06-28 at 13:08 +0300, Michael S. Tsirkin wrote:
> > Userspace virtio server has the following hack
> > so guests rely on it, and we have to replicate it, too:
> >
> > Use port number to detect incoming IPv4 DHCP response packets,
> > and fill in the checksum for these.
> >
> > The issue we are solving is that on linux guests, some apps
> > that use recvmsg with AF_PACKET sockets, don't know how to
> > handle CHECKSUM_PARTIAL;
> > The interface to return the relevant information was added
> > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> > and older userspace does not use it.
> > One important user of recvmsg with AF_PACKET is dhclient,
> > so we add a work-around just for DHCP.
> >
> > Don't bother applying the hack to IPv6 as userspace virtio does not
> > have a work-around for that - let's hope guests will do the right
> > thing wrt IPv6.
> >
> > Signed-off-by: Michael S. Tsirkin
> > ---
> >
> > Dave, I'm going to put this patch on the vhost tree,
> > no need for you to bother merging it - you'll get
> > it with a pull request.
> >
> >
> >  drivers/vhost/net.c |   44 +++++++++++++++++++++++++++++++++++++++++++-
> >  1 files changed, 43 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index cc19595..03bba6a 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -24,6 +24,10 @@
> >  #include
> >  #include
> >
> > +#include
> > +#include
> > +#include
> > +
> >  #include
> >
> >  #include "vhost.h"
> > @@ -186,6 +190,44 @@ static void handle_tx(struct vhost_net *net)
> >  	unuse_mm(net->dev.mm);
> >  }
> >
> > +static int peek_head(struct sock *sk)
>
> This routine is doing more than just peeking the head of sk's receive
> queue. May be this should be named similar to what qemu calls
> 'work_around_broken_dhclient()'

> > +{
> > +	struct sk_buff *skb;
> > +
> > +	lock_sock(sk);
> > +	skb = skb_peek(&sk->sk_receive_queue);
> > +	if (unlikely(!skb)) {
> > +		release_sock(sk);
> > +		return 0;
> > +	}
> > +	/* Userspace virtio server has the following hack so
> > +	 * guests rely on it, and we have to replicate it, too: */
> > +	/* Use port number to detect incoming IPv4 DHCP response packets,
> > +	 * and fill in the checksum. */
> > +
> > +	/* The issue we are solving is that on linux guests, some apps
> > +	 * that use recvmsg with AF_PACKET sockets, don't know how to
> > +	 * handle CHECKSUM_PARTIAL;
> > +	 * The interface to return the relevant information was added in
> > +	 * 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> > +	 * and older userspace does not use it.
> > +	 * One important user of recvmsg with AF_PACKET is dhclient,
> > +	 * so we add a work-around just for DHCP. */
> > +	if (skb->ip_summed == CHECKSUM_PARTIAL &&
> > +	    skb_headlen(skb) >= skb_transport_offset(skb) +
> > +				sizeof(struct udphdr) &&
> > +	    udp_hdr(skb)->dest == htons(68) &&
> > +	    skb_network_header_len(skb) >= sizeof(struct iphdr) &&
> > +	    ip_hdr(skb)->protocol == IPPROTO_UDP &&
> > +	    skb->protocol == htons(ETH_P_IP)) {
>
> Isn't it more logical to check for skb->protocol, followed by ip_hdr and
> then udp_hdr?

Yes, but then we'll only exit after checking them all.
My way we'll almost always exit after port check.

> > +		skb_checksum_help(skb);
> > +		/* Restore ip_summed value: tun passes it to user. */
> > +		skb->ip_summed = CHECKSUM_PARTIAL;
> > +	}
> > +	release_sock(sk);
> > +	return 1;
> > +}
> > +
> >  /* Expects to be always run from workqueue - which acts as
> >   * read-size critical section for our kind of RCU. */
> >  static void handle_rx(struct vhost_net *net)
> > @@ -222,7 +264,7 @@ static void handle_rx(struct vhost_net *net)
> >  	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
> >  		vq->log : NULL;
> >
> > -	for (;;) {
> > +	while (peek_head(sock->sk)) {
> >  		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
> >  					 ARRAY_SIZE(vq->iov),
> >  					 &out, &in,
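
For illustration only, the ordering argument in the reply can be sketched in plain userspace C. This is not kernel code and not part of the patch: struct pkt_meta, check(), match_port_first() and match_outer_first() are hypothetical names, the struct merely models the sk_buff fields the patch reads, and all values are kept in host byte order to keep the sketch short. It counts how many conditions run before && short-circuits under each ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4   0x0800u
#define PROTO_UDP        17u
#define DHCP_CLIENT_PORT 68u
#define UDP_HDR_LEN      8u
#define IPV4_MIN_HDR_LEN 20u

/* Hypothetical, simplified stand-in for the sk_buff state the patch tests. */
struct pkt_meta {
    int      csum_partial;  /* models skb->ip_summed == CHECKSUM_PARTIAL */
    size_t   headlen;       /* models skb_headlen(skb) */
    size_t   transport_off; /* models skb_transport_offset(skb) */
    size_t   network_len;   /* models skb_network_header_len(skb) */
    uint16_t ethertype;     /* models skb->protocol (host order here) */
    uint8_t  ip_proto;      /* models ip_hdr(skb)->protocol */
    uint16_t udp_dest;      /* models udp_hdr(skb)->dest (host order here) */
};

/* Count one evaluated condition and pass its result through. */
static int check(int cond, int *evaluated)
{
    ++*evaluated;
    return cond;
}

/* Ordering used in the patch: the port test precedes the protocol tests. */
static int match_port_first(const struct pkt_meta *p, int *evaluated)
{
    assert(p != NULL && evaluated != NULL);
    *evaluated = 0;
    return check(p->csum_partial, evaluated) &&
           check(p->headlen >= p->transport_off + UDP_HDR_LEN, evaluated) &&
           check(p->udp_dest == DHCP_CLIENT_PORT, evaluated) &&
           check(p->network_len >= IPV4_MIN_HDR_LEN, evaluated) &&
           check(p->ip_proto == PROTO_UDP, evaluated) &&
           check(p->ethertype == ETHERTYPE_IPV4, evaluated);
}

/* Ordering suggested in review: ethertype, then IP protocol, then port. */
static int match_outer_first(const struct pkt_meta *p, int *evaluated)
{
    assert(p != NULL && evaluated != NULL);
    *evaluated = 0;
    return check(p->csum_partial, evaluated) &&
           check(p->ethertype == ETHERTYPE_IPV4, evaluated) &&
           check(p->network_len >= IPV4_MIN_HDR_LEN, evaluated) &&
           check(p->ip_proto == PROTO_UDP, evaluated) &&
           check(p->headlen >= p->transport_off + UDP_HDR_LEN, evaluated) &&
           check(p->udp_dest == DHCP_CLIENT_PORT, evaluated);
}
```

Under these assumptions, for an IPv4 UDP packet addressed to port 53 with a partial checksum, match_port_first() stops after three tests while match_outer_first() runs all six; both return 0. For a packet addressed to port 68 both run all six tests and return 1, so the two orderings differ only in how early the common non-DHCP case is rejected.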