From: Vijay Pandurangan
Date: Sat, 30 Apr 2016 16:15:35 -0400
Subject: Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
To: Sabrina Dubroca
Cc: Ben Greear, Hannes Frederic Sowa, Ben Hutchings, LKML, stable@vger.kernel.org, akpm@linux-foundation.org, "David S. Miller", Cong Wang, Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita, Cong Wang

[oops – resending this because I was using gmail in HTML mode before by accident]

There was a discussion on a separate thread about this. I agree with Sabrina fully. I believe veth should provide an abstraction layer that correctly emulates a physical network in all ways.

Consider an environment where we have multiple physical computers. Each computer runs one or more containers, each of which has a publicly routable ip address.
When adding a new app to the cluster, a scheduler might decide to run this container on any physical machine of its choice, assuming that apps have a way of routing traffic to their backends (we did something similar at Google more than 10 years ago). This is something we might imagine happening with Docker and IPv6, for instance.

If app A sends raw Ethernet frames (the simplest case I could imagine) carrying TCP data with invalid checksums to app B, the behaviour of the system _will be different_ depending on whether app B is scheduled to run on the same machine as app A or not. This seems like a clear bug and a broken abstraction (especially as the default case), and something we should endeavour to avoid.

I do see Ben's point that enabling software checksum verification could incur a huge performance penalty (I haven't had a chance to measure it), work that is completely wasteful in the vast majority of cases. Unfortunately, I just don't see how we can solve this problem in a way that preserves a correct abstraction layer while also avoiding excess work. I guess the first useful piece of data would be how big the performance penalty actually is. If it's small, then maybe it doesn't matter.

On Thu, Apr 28, 2016 at 6:29 AM, Sabrina Dubroca wrote:
> Hello,
>
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
>> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
>> > Hi Ben,
>> >
>> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>> > > > >
>> > > > > 3.2.80-rc1 review patch. If anyone has any objections, please let me know.
>> > > > I would be careful about this. It causes regressions when sending
>> > > > PACKET_SOCKET buffers from user-space to veth devices.
>> > > >
>> > > > There was a proposed upstream fix for the regression, but it has not gone
>> > > > into the tree as far as I know.
>> > > >
>> > > > http://www.spinics.net/lists/netdev/msg370436.html
>> > > [...]
>> > >
>> > > OK, I'll drop this for now.
>> >
>> > The fall out from not having this patch is in my opinion a bigger
>> > fallout than not having this patch. This patch fixes silent data
>> > corruption vs. the problem Ben Greear is talking about, which might not
>> > be that a common usage.
>> >
>> > What do others think?
>> >
>> > Bye,
>> > Hannes
>> >
>>
>> This patch from Cong Wang seems to fix the regression for me, I think it should be added and
>> tested in the main tree, and then apply them to stable as a pair.
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
> Actually, no, this is not really a regression.
>
> If you capture packets on a device with checksum offloading enabled,
> the TCP/UDP checksum isn't filled. veth also behaves that way. What
> the "veth: don't modify ip_summed" patch does is enable proper
> checksum validation on veth. This really was a bug in veth.
>
> Cong's patch would also break cases where we choose to inject packets
> with invalid checksums, and they would now be accepted as correct.
>
> Your use case is invalid, it just happened to work because of a
> bug. If you want the stack to fill checksums so that you want capture
> and reinject packets, you have to disable checksum offloading (or
> compute the checksum yourself in userspace).
>
> Thanks.
>
> --
> Sabrina
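For readers following the "compute the checksum yourself in userspace" suggestion, here is a minimal C sketch (not from the thread, and not a kernel or libc API) of the RFC 1071 Internet checksum applied to a TCP segment over the IPv4 pseudo-header. It is also, roughly, the per-packet work that software checksum verification adds, which is the cost worth measuring. The names csum_add, csum_fold, tcp4_checksum and tcp4_checksum_ok are hypothetical. The sketch assumes IPv4, a segment shorter than 64 KiB, and addresses given as the 4 bytes that appear on the wire; the IP header checksum and the actual packet-socket injection are out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071: add buf to a running 32-bit sum as big-endian 16-bit words.
 * A trailing odd byte is treated as the high byte of a zero-padded word. */
static uint32_t csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;
    return sum;
}

/* Fold the carries back into 16 bits and return the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: checksum of a TCP segment (header + payload) over
 * the IPv4 pseudo-header. With bytes 16-17 of the segment zeroed, the
 * result is the value to store there (big-endian). Over a segment that
 * already carries a correct checksum, the result is 0. */
static uint16_t tcp4_checksum(const uint8_t saddr[4], const uint8_t daddr[4],
                              const uint8_t *seg, size_t seg_len)
{
    uint8_t pseudo[12];

    /* Keeps the 16-bit length field and the 32-bit sum from overflowing. */
    assert(seg_len <= 0xffff);

    memcpy(pseudo, saddr, 4);
    memcpy(pseudo + 4, daddr, 4);
    pseudo[8] = 0;
    pseudo[9] = 6;                          /* protocol number: TCP */
    pseudo[10] = (uint8_t)(seg_len >> 8);   /* TCP length, big-endian */
    pseudo[11] = (uint8_t)(seg_len & 0xff);

    return csum_fold(csum_add(csum_add(0, pseudo, sizeof pseudo), seg, seg_len));
}

/* Conceptually the check a receiver doing software verification performs. */
static int tcp4_checksum_ok(const uint8_t saddr[4], const uint8_t daddr[4],
                            const uint8_t *seg, size_t seg_len)
{
    return tcp4_checksum(saddr, daddr, seg, seg_len) == 0;
}
```

To use it before injecting a frame: zero bytes 16-17 of the TCP header, call tcp4_checksum(), and write the result there in big-endian order. Recomputing over the finished segment then yields 0, while flipping a payload bit makes tcp4_checksum_ok() return 0.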