2020-12-15 10:30:11

by Tony W Wang-oc

[permalink] [raw]
Subject: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, when
crc32c-intel and crc32c-generic are both registered, system will
use crc32c-intel because its .cra_priority is greater than
crc32c-generic.

When doing lmbench3 Create and Delete file test on partitions with
ext4 enabling metadata checksum, found using crc32c-generic driver
could get about 20% performance gain than using the driver crc32c-intel
on some Zhaoxin CPUs.

This case expect to use crc32c-generic driver for these Zhaoxin CPUs
to get performance gain, so remove these Zhaoxin CPUs support from
crc32c-intel.

Signed-off-by: Tony W Wang-oc <[email protected]>
---
arch/x86/crypto/crc32c-intel_glue.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..5171091 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -215,14 +215,31 @@ static struct shash_alg alg = {
};

static const struct x86_cpu_id crc32c_cpu_id[] = {
- X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
+ /*
+ * Negative entries; exclude these chips from using this driver.
+ * They match the positive rule below, but their CRC32 instruction
+ * implementation is so slow, it doesn't merit use.
+ */
+ X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false),
+ X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
+ X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
+ X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false),
+ X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, false),
+ X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, false),
+ /*
+ * Positive entry; SSE-4.2 instructions include special purpose CRC32
+ * instructions.
+ */
+ X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true),
{}
};
MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);

static int __init crc32c_intel_mod_init(void)
{
- if (!x86_match_cpu(crc32c_cpu_id))
+ const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+
+ if (!m || !m->driver_data)
return -ENODEV;
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
--
2.7.4


2020-12-22 03:04:05

by Tony W Wang-oc

[permalink] [raw]
Subject: Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

On December 22, 2020 3:27:33 AM GMT+08:00, [email protected] wrote:
>On December 20, 2020 6:46:25 PM PST, [email protected] wrote:
>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>><[email protected]> wrote:
>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>>
>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>> >>>> The driver crc32c-intel match CPUs supporting
>>>X86_FEATURE_XMM4_2.
>>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature,
>>When
>>>> >>>> crc32c-intel and crc32c-generic are both registered, system
>>will
>>>> >>>> use crc32c-intel because its .cra_priority is greater than
>>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver
>>>for
>>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these
>>>Zhaoxin
>>>> >>>> CPUs support from crc32c-intel.
>>>> >>>>
>>>> >>>> Signed-off-by: Tony W Wang-oc <[email protected]>
>>>> >>>
>>>> >>> Does this mean that the performance of the crc32c instruction
>on
>>>those CPUs is
>>>> >>> actually slower than a regular C implementation? That's very
>>>weird.
>>>> >>>
>>>> >>
>>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>>think yes.
>>>> >>
>>>> >
>>>> > Did you try measuring the performance of the hashing itself, and
>>>not some
>>>> > higher-level filesystem operations?
>>>> >
>>>>
>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>>>same
>>>> input value the generic C implementation takes fewer time than the
>>>> crc32c instruction implementation.
>>>>
>>>
>>>And that is really "working as intended"?
>>
>>These CPU's crc32c instruction is not working as intended.
>>
>> Why do these CPUs even
>>>declare that
>>>they support the crc32c instruction, when it is so slow?
>>>
>>
>>The presence of crc32c and some other instructions supports are
>>enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok
>>except the crc32c instruction.
>>
>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>>etc.) that
>>>these CPUs similarly declare support for but they are uselessly slow?
>>
>>No.
>>
>>Sincerely
>>Tonyw
>>
>>>
>>>- Eric
>
>Then the right thing to do is to disable the CPUID bit in the
>vendor-specific startup code.

This way makes these CPUs do not support all instruction sets enumerated
by CPUID.01:ECX[SSE4.2].
While only crc32c instruction is slow, just expect the crc32c-intel driver do not
match these CPUs.

Sincerely
Tonyw

2020-12-22 04:56:18

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

On December 21, 2020 7:01:39 PM PST, [email protected] wrote:
>On December 22, 2020 3:27:33 AM GMT+08:00, [email protected] wrote:
>>On December 20, 2020 6:46:25 PM PST, [email protected] wrote:
>>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>><[email protected]> wrote:
>>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>>>
>>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>wrote:
>>>>> >>>> The driver crc32c-intel match CPUs supporting
>>>>X86_FEATURE_XMM4_2.
>>>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature,
>>>When
>>>>> >>>> crc32c-intel and crc32c-generic are both registered, system
>>>will
>>>>> >>>> use crc32c-intel because its .cra_priority is greater than
>>>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver
>>>>for
>>>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these
>>>>Zhaoxin
>>>>> >>>> CPUs support from crc32c-intel.
>>>>> >>>>
>>>>> >>>> Signed-off-by: Tony W Wang-oc <[email protected]>
>>>>> >>>
>>>>> >>> Does this mean that the performance of the crc32c instruction
>>on
>>>>those CPUs is
>>>>> >>> actually slower than a regular C implementation? That's very
>>>>weird.
>>>>> >>>
>>>>> >>
>>>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>>>think yes.
>>>>> >>
>>>>> >
>>>>> > Did you try measuring the performance of the hashing itself, and
>>>>not some
>>>>> > higher-level filesystem operations?
>>>>> >
>>>>>
>>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>the
>>>>same
>>>>> input value the generic C implementation takes fewer time than the
>>>>> crc32c instruction implementation.
>>>>>
>>>>
>>>>And that is really "working as intended"?
>>>
>>>These CPU's crc32c instruction is not working as intended.
>>>
>>> Why do these CPUs even
>>>>declare that
>>>>they support the crc32c instruction, when it is so slow?
>>>>
>>>
>>>The presence of crc32c and some other instructions supports are
>>>enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok
>>>except the crc32c instruction.
>>>
>>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>AVX,
>>>>etc.) that
>>>>these CPUs similarly declare support for but they are uselessly
>slow?
>>>
>>>No.
>>>
>>>Sincerely
>>>Tonyw
>>>
>>>>
>>>>- Eric
>>
>>Then the right thing to do is to disable the CPUID bit in the
>>vendor-specific startup code.
>
>This way makes these CPUs do not support all instruction sets
>enumerated
>by CPUID.01:ECX[SSE4.2].
>While only crc32c instruction is slow, just expect the crc32c-intel
>driver do not
>match these CPUs.
>
>Sincerely
>Tonyw

Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2021-01-02 21:13:29

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote:
> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> On platforms with Zhaoxin CPUs supporting this X86 feature, when
> crc32c-intel and crc32c-generic are both registered, system will
> use crc32c-intel because its .cra_priority is greater than
> crc32c-generic.
>
> When doing lmbench3 Create and Delete file test on partitions with
> ext4 enabling metadata checksum, found using crc32c-generic driver
> could get about 20% performance gain than using the driver crc32c-intel
> on some Zhaoxin CPUs.
>
> This case expect to use crc32c-generic driver for these Zhaoxin CPUs
> to get performance gain, so remove these Zhaoxin CPUs support from
> crc32c-intel.
>
> Signed-off-by: Tony W Wang-oc <[email protected]>
> ---
> arch/x86/crypto/crc32c-intel_glue.c | 21 +++++++++++++++++++--
> 1 file changed, 19 insertions(+), 2 deletions(-)

This does not seem to address the latest comment from hpa.

Thanks,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2021-01-07 06:24:44

by Tony W Wang-oc

[permalink] [raw]
Subject: Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs


On 22/12/2020 12:54, [email protected] wrote:
> On December 21, 2020 7:01:39 PM PST, [email protected] wrote:
>> On December 22, 2020 3:27:33 AM GMT+08:00, [email protected] wrote:
>>> On December 20, 2020 6:46:25 PM PST, [email protected] wrote:
>>>> On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>>> <[email protected]> wrote:
>>>>> On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>>>>>
>>>>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>>>>>> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>>>>>>> On 12/12/2020 01:43, Eric Biggers wrote:
>>>>>>>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>> wrote:
>>>>>>>>>> The driver crc32c-intel match CPUs supporting
>>>>> X86_FEATURE_XMM4_2.
>>>>>>>>>> On platforms with Zhaoxin CPUs supporting this X86 feature,
>>>> When
>>>>>>>>>> crc32c-intel and crc32c-generic are both registered, system
>>>> will
>>>>>>>>>> use crc32c-intel because its .cra_priority is greater than
>>>>>>>>>> crc32c-generic. This case expect to use crc32c-generic driver
>>>>> for
>>>>>>>>>> some Zhaoxin CPUs to get performance gain, So remove these
>>>>> Zhaoxin
>>>>>>>>>> CPUs support from crc32c-intel.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Tony W Wang-oc <[email protected]>
>>>>>>>>>
>>>>>>>>> Does this mean that the performance of the crc32c instruction
>>> on
>>>>> those CPUs is
>>>>>>>>> actually slower than a regular C implementation? That's very
>>>>> weird.
>>>>>>>>>
>>>>>>>>
>>>>>>>> From the lmbench3 Create and Delete file test on those chips, I
>>>>> think yes.
>>>>>>>>
>>>>>>>
>>>>>>> Did you try measuring the performance of the hashing itself, and
>>>>> not some
>>>>>>> higher-level filesystem operations?
>>>>>>>
>>>>>>
>>>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>> the
>>>>> same
>>>>>> input value the generic C implementation takes fewer time than the
>>>>>> crc32c instruction implementation.
>>>>>>
>>>>>
>>>>> And that is really "working as intended"?
>>>>
>>>> These CPU's crc32c instruction is not working as intended.
>>>>
>>>> Why do these CPUs even
>>>>> declare that
>>>>> they support the crc32c instruction, when it is so slow?
>>>>>
>>>>
>>>> The presence of crc32c and some other instructions supports are
>>>> enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok
>>>> except the crc32c instruction.
>>>>
>>>>> Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>> AVX,
>>>>> etc.) that
>>>>> these CPUs similarly declare support for but they are uselessly
>> slow?
>>>>
>>>> No.
>>>>
>>>> Sincerely
>>>> Tonyw
>>>>
>>>>>
>>>>> - Eric
>>>
>>> Then the right thing to do is to disable the CPUID bit in the
>>> vendor-specific startup code.
>>
>> This way makes these CPUs do not support all instruction sets
>> enumerated
>> by CPUID.01:ECX[SSE4.2].
>> While only crc32c instruction is slow, just expect the crc32c-intel
>> driver do not
>> match these CPUs.
>>
>> Sincerely
>> Tonyw
>
> Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over.
>

Thanks for your suggestion. Have send new patch set.

Sincerely
Tonyw