Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965096AbbLRTm7 (ORCPT ); Fri, 18 Dec 2015 14:42:59 -0500 Received: from mail-vk0-f52.google.com ([209.85.213.52]:34000 "EHLO mail-vk0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964776AbbLRTm6 convert rfc822-to-8bit (ORCPT ); Fri, 18 Dec 2015 14:42:58 -0500 MIME-Version: 1.0 In-Reply-To: References: From: Vijay Pandurangan Date: Fri, 18 Dec 2015 12:42:37 -0700 X-Google-Sender-Auth: ImK0ggbvTQPlqb-gfUPQWxQeloM Message-ID: Subject: Re: [PATCH] veth: don't modify ip-summed; doing so treats packets with bad checksums as good. To: Cong Wang Cc: Nicolas Dichtel , Phil Sutter , Toshiaki Makita , Linux Kernel Network Developers , LKML , Evan Jones , Eric Biederman , Tom Herbert Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3777 Lines: 84 Evan and I have demonstrated this bug on Kubernetes as well, so it's not just a problem in Mesos. (See https://github.com/kubernetes/kubernetes/issues/18898) Sorry about my email client, I've re-sent the patch in another thread from git-email as I should have initially. I'll read through the TX path again to see if we missed something, but I'd love input from anyone else! -- Vijay Pandurangan https://www.twitter.com/vijayp http://www.vijayp.ca On Fri, Dec 18, 2015 at 2:00 PM, Cong Wang wrote: > (Cc'ing Eric B and Tom) > > On Fri, Dec 18, 2015 at 9:54 AM, Vijay Pandurangan wrote: >> Packets that arrive from real hardware devices have ip_summed == >> CHECKSUM_UNNECESSARY if the hardware verified the checksums, or >> CHECKSUM_NONE if the packet is bad or it was unable to verify it. The >> current version of veth will replace CHECKSUM_NONE with >> CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to >> a veth device to be delivered to the application. This caused applications >> at Twitter to receive corrupt data when network hardware was corrupting >> packets. > > Yeah, https://reviews.apache.org/r/41158/. > > This is because normally packets to a veth device are _only_ from its pair > device, Mesos network isolator redirects packets from a hardware interface > to veth, which violates this expectation. This is also why no one else sees > this bug. ;) > >> >> We believe this was added as an optimization to skip computing and >> verifying checksums for communication between containers. However, locally >> generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as >> written does nothing for them. As far as we can tell, after removing this >> code, these packets are transmitted from one stack to another unmodified >> (tcpdump shows invalid checksums on both sides, as expected), and they are >> delivered correctly to applications. We didn’t test every possible network >> configuration, but we tried a few common ones such as bridging containers, >> using NAT between the host and a container, and routing from hardware >> devices to containers. We have effectively deployed this in production at >> Twitter (by disabling RX checksum offloading on veth devices). > > > I am wondering if there is any other CHECKSUM_NONE case in the tx > path we could miss here. Mesos case is too special not only because > it redirects packets from hardware to veth, but also because it moves > packets from RX path to TX path. > > Eric? Tom? > >> >> This code dates back to the first version of the driver, commit >> ("[NET]: Virtual ethernet device driver"), so I >> suspect this bug occurred mostly because the driver API has evolved >> significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix >> packet checksumming") (in December 2010) fixed this for packets that get >> created locally and sent to hardware devices, by not changing >> CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming >> in from hardware devices. >> >> Co-authored-by: Evan Jones >> Signed-off-by: Evan Jones >> Cc: Nicolas Dichtel >> Cc: Phil Sutter >> Cc: Toshiaki Makita >> Cc: netdev@vger.kernel.org >> Cc: linux-kernel@vger.kernel.org >> Signed-off-by: Vijay Pandurangan > > Your patch looks good to me but your email client corrupts your patch, > so please resend. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/