MIME-Version: 1.0
References: <45d12aa0c95049a392d52ff239d42d83@AcuMS.aculab.com>
 <52edd5fd-daa0-729b-4646-43450552d2ab@intel.com> <96b6a476c4154da3bd04996139cd8a6d@AcuMS.aculab.com>
In-Reply-To: <96b6a476c4154da3bd04996139cd8a6d@AcuMS.aculab.com>
From:   Eric Dumazet <edumazet@google.com>
Date:   Mon, 13 Dec 2021 07:56:20 -0800
Message-ID: <CANn89i+4acJp8ohBMWU4sketLfitKCzmS8FQTvduxumYYketvw@mail.gmail.com>
Subject: Re: [PATCH] x86/lib: Remove the special case for odd-aligned buffers
 in csum_partial.c
To:     David Laight <David.Laight@aculab.com>
Cc:     Dave Hansen <dave.hansen@intel.com>,
        Noah Goldstein <goldstein.w.n@gmail.com>,
        "tglx@linutronix.de" <tglx@linutronix.de>,
        "mingo@redhat.com" <mingo@redhat.com>,
        Borislav Petkov <bp@alien8.de>,
        "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
        X86 ML <x86@kernel.org>, "hpa@zytor.com" <hpa@zytor.com>,
        "peterz@infradead.org" <peterz@infradead.org>,
        "alexanderduyck@fb.com" <alexanderduyck@fb.com>,
        open list <linux-kernel@vger.kernel.org>,
        netdev <netdev@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

On Mon, Dec 13, 2021 at 7:37 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Dave Hansen
> > Sent: 13 December 2021 15:02
> >
> > On 12/13/21 6:43 AM, David Laight wrote:
> > > There is no need to special case the very unusual odd-aligned buffers.
> > > They are no worse than 4n+2 aligned buffers.
> > >
> > > Signed-off-by: David Laight <david.laight@aculab.com>
> > > ---
> > >
> > > On an i7-7700 misaligned buffers add 2 or 3 clocks (in 115) to a 512 byte
> > >   checksum.
> > > That is just measuring the main loop with an lfence prior to rdpmc to
> > > read PERF_COUNT_HW_CPU_CYCLES.
> >
> > I'm a bit confused by this changelog.
> >
> > Are you saying that the patch causes a (small) performance regression?
> >
> > Are you also saying that the optimization here is not worth it because
> > it saves 15 lines of code?  Or that the misalignment checks themselves
> > add 2 or 3 cycles, and this is an *optimization*?
>
> I'm saying that it can't be worth optimising for a misaligned
> buffer because the cost of the buffer being misaligned is so small.
> So the tests for a misaligned buffer are going to cost more than
> any plausible gain.
>
> Not only that, the buffer will never be odd aligned at all.
>
> The code is left in from a previous version that did do aligned
> word reads - so had to do extra for odd alignment.
>
> Note that code is doing misaligned reads for the more likely 4n+2
> aligned ethernet receive buffers.
> I doubt that even a test for that would be worthwhile even if you
> were checksumming full sized ethernet packets.
>
> So the change is deleting code that is never actually executed
> from the hot path.
>

I think I left this code in because I got confused about the odd/even
case, but that is handled by upper-level functions like csum_block_add().

What matters is not whether the start of a frag is odd or even, but
what offset it has within the overall frame when the frame is split
into multiple areas (scatter/gather).
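
To illustrate the point, here is a rough userspace sketch (not the
kernel code; all sketch_* names are made up for this example, and it
sums big-endian 16-bit words for simplicity). Each fragment is summed
as if it started on an even boundary, and the caller byte-swaps the
partial sum when the fragment sits at an odd offset in the frame. As
far as I can tell this is the same idea csum_block_add() relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator to 16 bits with end-around carry. */
static uint16_t sketch_fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones' complement sum of buf, treating buf[0] as the high byte of the
 * first 16-bit word. A trailing odd byte is padded with a zero low byte.
 * The address alignment of buf is irrelevant here.
 */
static uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2) {
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
		sum = (sum & 0xffff) + (sum >> 16); /* keep accumulator small */
	}
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	return sketch_fold16(sum);
}

/*
 * Add the partial sum of a fragment that starts at 'offset' within the
 * frame. At an odd offset every byte of the fragment lands in the other
 * half of its 16-bit word, so the partial sum is byte-swapped first.
 */
static uint16_t sketch_block_add(uint16_t csum, uint16_t frag, size_t offset)
{
	if (offset & 1)
		frag = (uint16_t)((frag << 8) | (frag >> 8));
	return sketch_fold16((uint32_t)csum + frag);
}

/* Arbitrary sample bytes, for illustration only. */
const uint8_t sketch_sample[11] = {
	0x45, 0x00, 0x00, 0x1c, 0x12, 0x34, 0x40, 0x00, 0x40, 0x11, 0xab
};

/* Return 1 if summing two fragments split at 'split' matches the whole. */
int sketch_split_matches(const uint8_t *buf, size_t len, size_t split)
{
	uint16_t head = sketch_csum(buf, split);
	uint16_t tail = sketch_csum(buf + split, len - split);

	return sketch_block_add(head, tail, split) == sketch_csum(buf, len);
}
```

With this, sketch_split_matches(sketch_sample, sizeof(sketch_sample), 3)
should hold just as it does for an even split, without the per-buffer
sum ever looking at the address of the data.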

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !