From: Eric Dumazet
Date: Mon, 13 Dec 2021 07:56:20 -0800
Subject: Re: [PATCH] x86/lib: Remove the special case for odd-aligned buffers in csum_partial.c
To: David Laight
Cc: Dave Hansen, Noah Goldstein, "tglx@linutronix.de", "mingo@redhat.com", Borislav Petkov, "dave.hansen@linux.intel.com", X86 ML, "hpa@zytor.com", "peterz@infradead.org", "alexanderduyck@fb.com", open list, netdev
In-Reply-To: <96b6a476c4154da3bd04996139cd8a6d@AcuMS.aculab.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 13, 2021 at 7:37 AM David Laight wrote:
>
> From: Dave Hansen
> > Sent: 13 December 2021 15:02
> >
> > On 12/13/21 6:43 AM, David Laight wrote:
> > > There is no need to special case the very unusual odd-aligned buffers.
> > > They are no worse than 4n+2 aligned buffers.
> > >
> > > Signed-off-by: David Laight
> > > ---
> > >
> > > On an i7-7700 misaligned buffers add 2 or 3 clocks (in 115) to a 512 byte
> > > checksum.
> > > That is just measuring the main loop with an lfence prior to rdpmc to
> > > read PERF_COUNT_HW_CPU_CYCLES.
> >
> > I'm a bit confused by this changelog.
> >
> > Are you saying that the patch causes a (small) performance regression?
> >
> > Are you also saying that the optimization here is not worth it because
> > it saves 15 lines of code?  Or that the misalignment checks themselves
> > add 2 or 3 cycles, and this is an *optimization*?
>
> I'm saying that it can't be worth optimising for a misaligned
> buffer because the cost of the buffer being misaligned is so small.
> So the tests for a misaligned buffer are going to cost more than
> any plausible gain.
>
> Not only that, the buffer will never be odd-aligned at all.
>
> The code is left over from a previous version that did aligned
> word reads, so it had to do extra work for odd alignment.
>
> Note that the code is doing misaligned reads for the more likely 4n+2
> aligned Ethernet receive buffers.
> I doubt that even a test for that would be worthwhile, even if you
> were checksumming full-sized Ethernet packets.
>
> So the change is deleting code that is never actually executed
> from the hot path.
>

I think I left this code because I got confused by the odd/even case,
but this is handled by upper-level functions like csum_block_add().

What matters is not whether the start of a frag is odd or even, but at
what offset it sits in the overall 'frame', when a frame is split into
multiple areas (scatter/gather).

Reviewed-by: Eric Dumazet

Thanks!
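A minimal userspace sketch of the odd-address vs. odd-offset distinction, assuming a plain 16-bit ones-complement sum rather than the kernel's optimized csum_partial(). The partial sum depends only on each byte's position within a fragment, never on the buffer's address, so an odd-aligned buffer needs no special handling. What does need handling is a fragment that starts at an odd offset within the frame: its sum must be byte-rotated before it is added in. All names here (partial_sum, csum_add32, ror32, block_add, fold16, check_all_splits) are hypothetical stand-ins, not kernel API; block_add is modelled on what csum_block_add() does in recent kernels (rotate the fragment's 32-bit sum by 8 bits when the offset is odd).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-bit partial ones-complement sum of 16-bit words.
 * Words are loaded with memcpy in native byte order, so buf may be at any
 * address (odd included); only a byte's position within buf matters. */
static uint32_t partial_sum(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;

    while (len >= 2) {
        uint16_t w;
        memcpy(&w, buf, sizeof w);   /* alignment-agnostic load */
        sum += w;
        buf += 2;
        len -= 2;
    }
    if (len) {                       /* odd trailing byte, zero-padded */
        uint16_t w = 0;
        memcpy(&w, buf, 1);
        sum += w;
    }
    /* Fold 64 -> 32 bits with end-around carry; the value mod 0xffff
     * is preserved because 2^32 == 1 (mod 0xffff). */
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    return (uint32_t)sum;
}

/* 32-bit ones-complement add (end-around carry). */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    return r + (r < a);
}

/* Rotate right; s must be in 1..31. */
static uint32_t ror32(uint32_t v, unsigned int s)
{
    return (v >> s) | (v << (32 - s));
}

/* Add the sum of a fragment that starts at 'offset' within the frame.
 * An odd offset swaps which half of each 16-bit word its bytes land in,
 * so the fragment's sum is rotated by 8 bits first. */
static uint32_t block_add(uint32_t csum, uint32_t csum2, size_t offset)
{
    if (offset & 1)
        csum2 = ror32(csum2, 8);
    return csum_add32(csum, csum2);
}

/* Fold a 32-bit partial sum to 16 bits (not complemented). */
static uint16_t fold16(uint32_t s)
{
    s = (s & 0xffff) + (s >> 16);
    s = (s & 0xffff) + (s >> 16);
    return (uint16_t)s;
}

/* Split the frame into two fragments at every possible offset, recombine
 * with block_add(), and assert the result equals the whole-frame sum.
 * Returns the folded whole-frame sum. */
static uint16_t check_all_splits(const unsigned char *frame, size_t len)
{
    uint16_t whole = fold16(partial_sum(frame, len));

    for (size_t split = 0; split <= len; split++) {
        uint32_t head = partial_sum(frame, split);
        uint32_t tail = partial_sum(frame + split, len - split);
        assert(fold16(block_add(head, tail, split)) == whole);
    }
    return whole;
}
```

For example, with the same bytes stored once at an odd address and once at an aligned one, check_all_splits() returns the same folded sum for both, and its internal assertion holds for every split point, odd or even. Rotating by 8 works because, modulo 0xffff, moving a byte from the low to the high half of a 16-bit word multiplies its contribution by 256, and 256 * 256 = 65536 ≡ 1, so the same rotation also undoes it.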